Appending a new row to DataFrame
import pandas as pd
df = pd.DataFrame(columns = ['A', 'B', 'C'])
df
Empty DataFrame
Columns: [A, B, C]
Index: []
Appending a row by a single column value:
df.loc[0, 'A'] = 1
df
   A    B    C
0  1  NaN  NaN
Appending a row given a list of values:
df.loc[1] = [2, 3, 4]
df
   A    B    C
0  1  NaN  NaN
1  2    3    4
Appending a row given a dictionary:
df.loc[2] = {'A': 3, 'C': 9, 'B': 9}
df
   A    B    C
0  1  NaN  NaN
1  2    3    4
2  3    9    9
The first input to .loc[] is the row label. If you use a label that already exists in the index, you will overwrite the values in that row:
df.loc[1] = [5, 6, 7]
df
   A    B    C
0  1  NaN  NaN
1  5    6    7
2  3    9    9
df.loc[0, 'B'] = 8
df
   A  B    C
0  1  8  NaN
1  5  6    7
2  3  9    9
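Because .loc either adds or overwrites depending on whether the label already exists, a common way to append safely is to use the current length of the DataFrame as the next label. The sketch below assumes the index is the default 0..n-1 integer range; with any other index, len(df) may collide with an existing label.

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])

# len(df) is the next unused label as long as the index is 0..n-1,
# so each of these assignments adds a new row.
df.loc[len(df)] = [2, 3, 4]
df.loc[len(df)] = [5, 6, 7]

# Label 0 already exists, so this overwrites the first row instead of adding one.
df.loc[0] = [8, 9, 10]

print(df)
```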
Append a DataFrame to another DataFrame
Let us assume we have the following two DataFrames, df1 and df2:
df1
    A   B
0  a1  b1
1  a2  b2
df2
    B   C
0  b1  c1
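The two DataFrames are not required to have the same set of columns. The append method does not change either of the original DataFrames. Instead, it returns a new DataFrame containing the rows of both. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls shown here only run on older versions; pd.concat([df1, df2]) is the replacement. On those older versions, appending one DataFrame to another is quite simple: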
df1.append(df2)
     A   B    C
0   a1  b1  NaN
1   a2  b2  NaN
0  NaN  b1   c1
As you can see, it is possible to have duplicate indices (0 in this example). To avoid this issue, you may ask Pandas to reindex the new DataFrame for you:
df1.append(df2, ignore_index = True)
     A   B    C
0   a1  b1  NaN
1   a2  b2  NaN
2  NaN  b1   c1
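On recent pandas versions, where DataFrame.append is no longer available, the same two results can be obtained with pd.concat. The following is a minimal sketch that rebuilds df1 and df2 from the tables above; the variable names stacked and renumbered are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a1', 'a2'], 'B': ['b1', 'b2']})
df2 = pd.DataFrame({'B': ['b1'], 'C': ['c1']})

# pd.concat returns a new DataFrame; df1 and df2 are left unchanged.
# Columns missing from one input are filled with NaN.
stacked = pd.concat([df1, df2])

# ignore_index=True discards the original labels and numbers the rows 0..n-1,
# which avoids the duplicated label 0.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```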
Boolean indexing of dataframes
Accessing rows in a DataFrame using the indexer objects .ix, .loc and .iloc, and how this differs from using a boolean mask.
Examples
Accessing a DataFrame with a boolean index
This will be our example data frame:
df = pd.DataFrame({"color": ['red', 'blue', 'red', 'blue']}, index=[True, False, True, False])
       color
True     red
False   blue
True     red
False   blue
Accessing with .loc
df.loc[True]
      color
True    red
True    red
Accessing with .iloc
df.iloc[True]
>> TypeError
df.iloc[1]
color    blue
Name: False, dtype: object
Note that older pandas versions did not distinguish between boolean and integer input, so there .iloc[True] would return the same as .iloc[1].
Accessing with .ix
df.ix[True]
      color
True    red
True    red
df.ix[1]
color    blue
Name: False, dtype: object
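As you can see, .ix has two behaviors: it looks up by label when it can and falls back to position otherwise. This ambiguity is bad practice in code, and .ix was deprecated in pandas 0.20 and removed in pandas 1.0. Use .loc or .iloc instead to be explicit.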
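The two lookups that .ix mixed together can be written explicitly, one with .loc and one with .iloc. This is a small sketch on the same example data frame; the names by_label and by_position are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"color": ['red', 'blue', 'red', 'blue']},
                  index=[True, False, True, False])

# Label-based: every row whose index label is True (a DataFrame here,
# because the label occurs more than once).
by_label = df.loc[True]

# Position-based: the second row, returned as a Series.
by_position = df.iloc[1]

print(by_label)
print(by_position)
```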
Applying a boolean mask to a dataframe
This will be our example data frame:
   color      name   size
0    red      rose    big
1   blue    violet    big
2    red     tulip  small
3   blue  harebell  small
Using the magic __getitem__ method, that is, the [] accessor: giving it a list of True and False values of the same length as the data frame will return only the rows marked True:
df[[True, False, True, False]]
  color   name   size
0   red   rose    big
2   red  tulip  small
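For reference, here is a self-contained sketch of this example. The constructor call is an assumption, since the original only shows the printed data frame, and the names mask and selected are illustrative.

```python
import pandas as pd

# Rebuilt from the printed table above (the original construction is not shown).
df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],
    'name': ['rose', 'violet', 'tulip', 'harebell'],
    'size': ['big', 'big', 'small', 'small'],
})

# One flag per row: True keeps the row, False drops it.
mask = [True, False, True, False]
selected = df[mask]

print(selected)
```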
Masking data based on column value
This will be our example data frame:
   color      name   size
0    red      rose    big
1   blue    violet  small
2    red     tulip  small
3   blue  harebell  small
After selecting a single column of the data frame, we can use a simple == comparison to compare every element in the column to a given value, producing a pd.Series of True and False values.
df['size'] == 'small'
0    False
1     True
2     True
3     True
Name: size, dtype: bool
A pd.Series is array-like (it is backed by a NumPy array), so this boolean Series can be handed to the [] accessor (__getitem__) just like the list of True and False values in the example above.
size_small_mask = df['size'] == 'small'
df[size_small_mask]
  color      name   size
1  blue    violet  small
2   red     tulip  small
3  blue  harebell  small
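The steps above can be run end to end as follows. This is a sketch that rebuilds the example data frame from its printed form (the constructor call is an assumption) and also shows that .loc accepts the same mask.

```python
import pandas as pd

# Rebuilt from the printed table above (the original construction is not shown).
df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],
    'name': ['rose', 'violet', 'tulip', 'harebell'],
    'size': ['big', 'small', 'small', 'small'],
})

# Element-wise comparison: a boolean Series that shares df's index.
size_small_mask = df['size'] == 'small'

small = df[size_small_mask]
same_rows = df.loc[size_small_mask]  # .loc accepts the same boolean mask

print(small)
```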
Masking data based on index value
This will be our example data frame:
          color   size
name
rose        red    big
violet     blue  small
tulip       red  small
harebell   blue  small
We can create a mask based on the index values, just like on a column value.
rose_mask = df.index == 'rose'
df[rose_mask]
      color size
name
rose    red  big
But doing this is almost the same as
df.loc['rose']
color    red
size     big
Name: rose, dtype: object
The important difference is that when .loc finds only one row in the index that matches, it returns a pd.Series; if it finds more than one matching row, it returns a pd.DataFrame. This makes the return type of this method hard to predict. You can control this behavior by giving .loc a list with a single entry, which forces it to return a data frame.
df.loc[['rose']]
      color size
name
rose    red  big
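The three ways of selecting the 'rose' row can be compared side by side. The sketch below rebuilds the example data frame from its printed form (the constructor call is an assumption) and checks the type each approach returns.

```python
import pandas as pd

# Rebuilt from the printed table above (the original construction is not shown).
df = pd.DataFrame(
    {'color': ['red', 'blue', 'red', 'blue'],
     'size': ['big', 'small', 'small', 'small']},
    index=pd.Index(['rose', 'violet', 'tulip', 'harebell'], name='name'),
)

as_series = df.loc['rose']          # scalar label with a single match: Series
as_frame = df.loc[['rose']]         # list of labels: always a DataFrame
via_mask = df[df.index == 'rose']   # boolean mask: always a DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(type(via_mask).__name__)
```

If downstream code expects a DataFrame regardless of how many rows match, the list form or the mask form is the safer choice.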