Counting NaN and null values in a pandas DataFrame


Imagine that we have a CSV file called data.csv:

col1  col2  col3  col4
1     2     3     4
5     6     7     8
9     10    11    12
13    14    15
33    44

import numpy as np
import pandas as pd

po = pd.read_csv('data.csv')

My goal is to better understand how to identify NaN/null data in a dataset.

Questions:

1. How can I count how many NaN values exist in the dataset above?

2. How can I count how many null values exist in the dataset above?

3. How can I count how many non-NaN values exist in the dataset above?

4. How can I count how many non-null values exist in the dataset above?

And the same questions as above but per column?

I tried, for example:

po[po['col4'].isna()].count()

thinking this would count how many NaN values exist in col4, but the answer was:

col1    2
col2    2
col3    1
col4    0
dtype: int64

What's wrong? How can I answer the questions above?

    
asked by anonymous 24.06.2018 / 21:52

2 answers


What's wrong?

The count() method does not count null data; it counts the non-null entries, per column or per row. That is why your attempt printed non-null counts of the filtered rows instead, as the sketch after the examples below shows. Its correct use is:

  • Non-null data count of all columns

    print(po.count())
    

    the output will be:  

    col1    5
    col2    5
    col3    4
    col4    3
    dtype: int64
  • Non-null data count of a specific column

    print(po.col4.count())
    

    the output will be:  

    3
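For completeness, here is why the attempt in the question printed those numbers (a minimal sketch, assuming the po DataFrame read from the CSV above):

    # select only the rows where col4 is NaN (the last two rows of the CSV)
    subset = po[po['col4'].isna()]

    # count() then reports the non-null values per column of that subset,
    # which is why col4 itself shows 0
    print(subset.count())

This reproduces exactly the output shown in the question.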


To count the missing data, you can combine isna() (or its alias isnull()) with sum():
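For example (a sketch, assuming the same po DataFrame; note that in pandas, NaN is the null marker, so questions 1/2 and 3/4 each have the same answer):

    # NaN/null count per column
    print(po.isna().sum())

    # NaN/null count for the whole dataset
    print(po.isna().sum().sum())

    # non-NaN/non-null count per column (questions 3 and 4)
    print(po.notna().sum())

With the data above, the per-column NaN counts are 0, 0, 1 and 2, for a dataset total of 3.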

28.06.2018 / 02:55

1 and 2 (how many NaN/null values exist in the dataset): combine isna() with sum().
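For instance (a minimal sketch, assuming the po DataFrame from the question): isna() marks each cell as missing or not, and sum() adds up the True values:

    print(po.isna().sum())        # NaN count per column
    print(po.isna().sum(axis=1))  # NaN count per row
    print(po.isna().sum().sum())  # NaN count for the whole dataset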

    
30.06.2018 / 20:36