What is a Dataset?
A dataset is a container of data in Python. It serves as data storage for various algorithms in Python and as the primary storage of data in data science. Below I discuss how to import a dataset as a dataframe in Python. For this post I’ll be using a public dataset called the Titanic Dataset, available on Kaggle. You can download it from the Titanic Dataset page on Kaggle.
- Importing data from various Excel sheets as a dataframe.
Microsoft Excel supports several spreadsheet file types, such as CSV and XLSX. Importing some of those files with pandas is illustrated below.
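1. Importing a CSV file.
import pandas as pd
# Quote the path and use a raw string (r"...") so the backslashes are not read as escape characters
my_df = pd.read_csv(r"C:\Users\geekycodesco\Downloads\titanic\test.csv")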
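2. Importing an Excel (.xls or .xlsx) file.
import pandas as pd
# The path must be a quoted string; read_excel needs an Excel reader package installed alongside pandas
my_def = pd.read_excel(r"C:\Users\geekycodesco\Downloads\titanic\test.xls")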
- Displaying the dataset imported by the code above.
By default, head() shows the first 5 rows of the dataset. If you want to see 10 rows, pass the number of rows to it, as in my_df.head(10).
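Here is a minimal sketch of previewing rows. It assumes a small made-up dataframe in place of the Titanic file so that it runs anywhere; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the imported data: 12 rows, two made-up columns.
my_df = pd.DataFrame({
    "PassengerId": range(1, 13),
    "Age": [22, 38, 26, 35, 35, 27, 54, 2, 27, 14, 4, 58],
})

first_five = my_df.head()    # head() returns the first 5 rows by default
first_ten = my_df.head(10)   # pass a number to get that many rows

print(first_five)
print(first_ten)
```

With a dataframe loaded from a real file, the same two calls work unchanged.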
- Exporting a dataframe as an Excel sheet in Python.
When you want to keep a dataframe on your local drive for later use, you can save it as a CSV file, which Excel can open.
my_def.to_csv('file_name.csv')  # saves to the current working directory
my_def.to_csv('C:/Users/abc/Desktop/file_name.csv')  # saves to a directory of your choice
If you want to save NaN values as Unknown, pass na_rep='Unknown' to to_csv.
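Replacing missing values on export can be sketched as follows. The dataframe here is made up, and the CSV is written to an in-memory buffer instead of a file so the result can be printed; index=False is an addition for this sketch to keep the output short.

```python
import io

import pandas as pd

# Hypothetical dataframe with one missing Age value.
my_def = pd.DataFrame({"Name": ["A", "B"], "Age": [22.0, float("nan")]})

# Write to an in-memory buffer; a file path such as 'file_name.csv' works the same way.
buffer = io.StringIO()
my_def.to_csv(buffer, na_rep="Unknown", index=False)  # NaN cells are written as "Unknown"

csv_text = buffer.getvalue()
print(csv_text)
```

In the printed text, the row for B shows Unknown in the Age column instead of an empty field.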
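Want to keep the header row or not? The header parameter controls this: header=False in to_csv leaves the column names out of the saved file, and header=None in read_csv tells pandas that the file has no header row.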
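Since the question can apply to both saving and loading, here is a rough sketch of each, assuming a made-up dataframe and an in-memory buffer in place of a real file.

```python
import io

import pandas as pd

# Hypothetical dataframe used only for illustration.
my_def = pd.DataFrame({"Name": ["A", "B"], "Age": [22, 38]})

# Export without the header row (and without the index column).
buffer = io.StringIO()
my_def.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()

# Import data that has no header row: header=None makes pandas
# label the columns 0, 1, ... instead of using the first row as names.
no_header_df = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header_df)
```

If the file does have a header row, leave header at its default and pandas uses the first row as column names.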