Removing row of NaN values from a DataFrame

I joined two tables with pd.concat and ended up with many NaN values.

It turns out that one of the DataFrames has missing values. To simplify my Data Science studies, I want to remove every row that contains NaN values.

Other suggestions are welcome.

Data:

    Ano Country Name  Pobreza  Population
1     1960  Afghanistan      NaN   8996351.0
265   1961  Afghanistan      NaN   9166764.0
529   1962  Afghanistan      NaN   9345868.0
793   1963  Afghanistan      NaN   9533954.0
1057  1964  Afghanistan      NaN   9731361.0
1321  1965  Afghanistan      NaN   9938414.0
1585  1966  Afghanistan      NaN  10152331.0
1849  1967  Afghanistan      NaN  10372630.0
2113  1968  Afghanistan      NaN  10604346.0
2377  1969  Afghanistan      NaN  10854428.0
    
asked by anonymous 22.08.2017 / 03:39

1 answer

Use DataFrame.dropna:

import pandas as pd
import numpy as np

# Example frame: column C is all NaN, column D has no NaN
df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
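Before choosing what to drop, it can help to count the NaN values per column and per row. This is a suggestion beyond the original answer, using isna() and sum(); the variable names nan_per_column and nan_per_row are illustrative.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answer
df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# isna() marks each NaN cell as True; sum() counts the True values
nan_per_column = df.isna().sum()        # counts down each column
nan_per_row = df.isna().sum(axis=1)     # counts across each row

print(nan_per_column)
print(nan_per_row)
```

Here column C is NaN in every row, so any rule that drops a row as soon as it contains one NaN will remove every row of this frame.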

Drop columns in which all elements are NaN:

df.dropna(axis=1, how='all')
     A    B  D
0  NaN  2.0  0
1  3.0  4.0  1
2  NaN  NaN  5

Drop columns in which any element is NaN:

df.dropna(axis=1, how='any')
   D
0  0
1  1
2  5

Drop rows in which all elements are NaN (there are none here, so the frame is unchanged):

df.dropna(axis=0, how='all')
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
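The case the question asks about, dropping every row that contains at least one NaN, uses how='any' along axis=0. Both are the defaults, so df.dropna() alone does the same. In this example frame every row has a NaN in column C, so no row survives:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# axis=0 and how='any' are the defaults: equivalent to df.dropna()
rows_without_nan = df.dropna(axis=0, how='any')
print(rows_without_nan)
```

The result keeps the columns A, B, C and D but has no rows.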

Keep only rows with at least 2 non-NaN values:

df.dropna(thresh=2)
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
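Two further points, sketched on the same frame: the subset parameter restricts the NaN check to the listed columns, and dropna returns a new DataFrame instead of changing df, so the result has to be assigned to keep it (the name rows_with_b is illustrative).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Only column B is checked; NaN in A or C does not cause a drop here
rows_with_b = df.dropna(subset=['B'])
print(rows_with_b.index.tolist())

# dropna does not modify df in place; reassign to keep the result
df = df.dropna(subset=['A'])
print(df.index.tolist())
```

Applied to the question's concatenated frame (called df here only for illustration), df = df.dropna() removes every row with any NaN, while df.dropna(subset=['Pobreza']) checks only that column. In the rows shown in the question, Pobreza is NaN everywhere, so either call would drop all of those rows.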
    
22.08.2017 / 04:38