Appending a new row to DataFrame

import pandas as pd
df = pd.DataFrame(columns=['A', 'B', 'C'])
df
Empty DataFrame
Columns: [A, B, C]
Index: []

Appending a row by a single column value:

df.loc[0, 'A'] = 1
df
A B C
0 1 NaN NaN

Appending a row, given a list of values:

df.loc[1] = [2, 3, 4]
df
A B C
0 1 NaN NaN
1 2 3 4

Appending a row given a dictionary:

df.loc[2] = {'A': 3, 'C': 9, 'B': 9}
df
A B C
0 1 NaN NaN
1 2 3 4
2 3 9 9

The first argument to .loc[] is the row label (the index value). If you use a label that already exists, the assignment overwrites the values in that row instead of adding a new one:

df.loc[1] = [5, 6, 7]
df
A B C
0 1 NaN NaN
1 5 6 7
2 3 9 9
df.loc[0, 'B'] = 8
df
A B C
0 1 8 NaN
1 5 6 7
2 3 9 9
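
The add-versus-overwrite behavior described above can be checked with a short script. This is a minimal sketch: the `len(df)` idiom for picking the next free label is an assumption that is not part of the original text, and it is only safe while the row labels are the consecutive integers 0..n-1.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B", "C"])

# Label 0 does not exist yet, so a new row is added.
df.loc[0] = [1, 2, 3]

# len(df) is 1 here, which is an unused label, so another row is added.
# Assumption: the labels are 0..n-1; otherwise this could overwrite a row.
df.loc[len(df)] = [4, 5, 6]

# Label 0 already exists, so this overwrites the row instead of adding one.
df.loc[0] = [7, 8, 9]

print(df)
```

Adding rows one at a time like this is convenient for small frames; each assignment to a new label enlarges the frame.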

Append a DataFrame to another DataFrame

Let us assume we have the following two DataFrames:

df1
    A   B
0  a1  b1
1  a2  b2
df2
    B   C
0  b1  c1

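The two DataFrames are not required to have the same set of columns. The append method does not modify either of the original DataFrames; it returns a new DataFrame containing the rows of both. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls in this section only run on older versions. On current versions, pd.concat([df1, df2]) gives the same result. On versions that still have it, appending one DataFrame to another looks like this: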

df1.append(df2)
A B C
0 a1 b1 NaN
1 a2 b2 NaN
0 NaN b1 c1

As you can see, the result can contain duplicate index values (0 in this example). To avoid this, pass ignore_index=True and pandas assigns a fresh 0..n-1 index to the new DataFrame:

df1.append(df2, ignore_index=True)
A B C
0 a1 b1 NaN
1 a2 b2 NaN
2 NaN b1 c1
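
Because DataFrame.append is not available in pandas 2.0 and later, the same two results can be reproduced with pd.concat. The sketch below rebuilds df1 and df2 from the tables shown above; the constructor calls are an assumption, since the original only shows the printed frames.

```python
import pandas as pd

# Reconstructed from the printed tables above.
df1 = pd.DataFrame({"A": ["a1", "a2"], "B": ["b1", "b2"]})
df2 = pd.DataFrame({"B": ["b1"], "C": ["c1"]})

# Equivalent of df1.append(df2): original index values are kept,
# so the label 0 appears twice.
stacked = pd.concat([df1, df2])

# Equivalent of df1.append(df2, ignore_index=True): rows are relabelled 0..n-1.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```

As with append, neither input frame is modified; columns missing from one frame are filled with NaN in the result.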

Boolean indexing of dataframes

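This section shows how to access rows in a dataframe with the DataFrame indexer objects .ix, .loc and .iloc, and how that differs from using a boolean mask. Note that .ix only exists in pandas versions before 1.0; it is shown here for comparison because it still appears in older code.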

Examples

Accessing a DataFrame with a boolean index

This will be our example data frame:

df = pd.DataFrame({"color": ['red', 'blue', 'red', 'blue']}, index=[True, False, True, False])
color
True red
False blue
True red
False blue

Accessing with .loc

df.loc[True]
color
True red
True red

Accessing with .iloc

df.iloc[True]
>> TypeError
df.iloc[1]
color    blue
Name: False, dtype: object

Note that older pandas versions did not distinguish between boolean and integer input, so .iloc[True] returned the same result as .iloc[1].
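
The .iloc behavior above can be reproduced with the following sketch on a recent pandas version. The exception is caught somewhat broadly on purpose: the text above reports a TypeError, but treating the exact exception type as stable across versions is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {"color": ["red", "blue", "red", "blue"]},
    index=[True, False, True, False],
)

# Position 1 is the second row, regardless of its label (False here).
second_row = df.iloc[1]

# A boolean scalar is not accepted as a position.
try:
    df.iloc[True]
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(second_row["color"], rejected)
```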

Accessing with .ix

df.ix[True]
color
True red
True red
df.ix[1]
color    blue
Name: False, dtype: object

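As you can see, .ix has two behaviors: it treats True as a label but 1 as a position. This ambiguity makes code hard to reason about, which is why .ix was deprecated. Use .loc for label-based access and .iloc for position-based access so the intent is explicit.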
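
To make the label-versus-position distinction concrete, here is a small sketch using a hypothetical frame (not from the original example) whose integer labels deliberately do not match the row positions. With such an index, an indexer that guesses between labels and positions would be ambiguous, while .loc and .iloc are not.

```python
import pandas as pd

# Hypothetical frame: integer labels that differ from the row positions.
df = pd.DataFrame({"color": ["red", "blue", "green"]}, index=[2, 0, 1])

by_label = df.loc[0]       # the row whose label is 0
by_position = df.iloc[0]   # the first row, whatever its label is

print(by_label["color"])      # blue
print(by_position["color"])   # red
```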

Applying a boolean mask to a dataframe

This will be our example data frame:

color name size
0 red rose big
1 blue violet big
2 red tulip small
3 blue harebell small

Here we use the [] accessor (the __getitem__ method). Passing it a list of True and False values with the same length as the dataframe returns only the rows where the list is True:

df[[True, False, True, False]]
color name size
0 red rose big
2 red tulip small
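
For reference, a self-contained version of this example might look as follows. The constructor call is reconstructed from the printed table, and the .loc variant is added as an assumption-free equivalent for readers who prefer an explicit indexer.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue"],
    "name": ["rose", "violet", "tulip", "harebell"],
    "size": ["big", "big", "small", "small"],
})

mask = [True, False, True, False]

selected = df[mask]           # rows where the mask is True
same_with_loc = df.loc[mask]  # .loc accepts the same boolean list

print(selected)
```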

Masking data based on column value

This will be our example data frame:

color name size
0 red rose big
1 blue violet small
2 red tulip small
3 blue harebell small

After selecting a single column from a data frame, we can use a simple == comparison to compare every element in the column against a given value, producing a pd.Series of True and False values.

df['size'] == 'small'
0 False
1 True
2 True
3 True
Name: size, dtype: bool

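A boolean pd.Series like this is accepted by the [] accessor in the same way as the plain list of True and False values in the example above, so we can use it directly as a mask. pandas matches the mask to the rows by index label, which works here because the mask was built from the same data frame.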

size_small_mask = df['size'] == 'small'
df[size_small_mask]
color name size
1 blue violet small
2 red tulip small
3 blue harebell small
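
Masks built this way can also be combined. The sketch below goes slightly beyond the original example: it uses the element-wise operators & (and) and ~ (not) on boolean Series, with the frame reconstructed from the table above and the variable names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue"],
    "name": ["rose", "violet", "tulip", "harebell"],
    "size": ["big", "small", "small", "small"],
})

small = df["size"] == "small"
blue = df["color"] == "blue"

# Element-wise AND: rows that are both small and blue.
small_and_blue = df[small & blue]

# Element-wise NOT: rows that are not small.
not_small = df[~small]

print(small_and_blue)
print(not_small)
```

Python's `and`, `or` and `not` keywords do not work element-wise on a Series, which is why the operators are used here.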

Masking data based on index value

This will be our example data frame:

color size
name
rose red big
violet blue small
tulip red small
harebell blue small

We can create a mask based on the index values, just like on a column value.

rose_mask = df.index == 'rose'
df[rose_mask]
color size
name
rose red big

But doing this is almost the same as

df.loc['rose']
color red
size big
Name: rose, dtype: object

The important difference is that when .loc finds only one row in the index that matches, it returns a pd.Series; if it finds several matching rows, it returns a pd.DataFrame. The return type therefore depends on the data, which makes code built on it fragile. You can control this behavior by giving .loc a list containing a single entry, which forces it to return a data frame.

df.loc[['rose']]
color size
name
rose red big
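
The difference in return types can be verified with a short script. The frame is reconstructed from the table above; building the named index with pd.Index is one possible way to get the same layout, not necessarily how the original frame was created.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "blue"],
        "size": ["big", "small", "small", "small"],
    },
    index=pd.Index(["rose", "violet", "tulip", "harebell"], name="name"),
)

single = df.loc["rose"]            # one matching label -> Series
framed = df.loc[["rose"]]          # list of labels -> DataFrame
masked = df[df.index == "rose"]    # boolean mask -> DataFrame

print(type(single).__name__)   # Series
print(type(framed).__name__)   # DataFrame
print(type(masked).__name__)   # DataFrame
```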