Python fill values with data from other rows

Good morning. I have a lot of gaps in the data in my df. I need to fill each missing CO2 value using the CO2 values from other times with similar conditions, but I can't work out how to do this from the information in the rows. My df covers 1 year, with one value every 30 minutes. Temperature and Radiation have no missing values; only CO2 has missing values.

import numpy as np
import pandas as pd

df = pd.read_hdf('./dados.hd5')

df.head()

Year_DoY_Hour          Temperatura    radiacao    CO2
2016-01-01 00:00:00    22.44570       0           380
2016-01-01 00:30:00    22.445700      0           390
.
.
2016-01-15 00:00:00    22.88300       0           379
2016-01-15 00:30:00    22.445700      0           381
2016-01-15 01:00:00    22.388300      0           NaN
.
.
.
2016-01-30 00:00:00    22.400000      0           350
2016-01-30 00:30:00    16.393900      0           375
2016-01-30 01:00:00    17.133900      0           365

The fill has to follow these rules:

  • (a) Temperature must be within ±2.5 ºC of the temperature at the row with the missing CO2;
  • (b) Radiation must be within ±50 W/m² of the radiation at the row with the missing CO2;
  • The rows used must fall within a window of ±3 days around the NaN value of CO2;
  • Calculate the average of the CO2 values of the rows that satisfy (a) and (b), and put it where the CO2 value is missing.

In the df displayed above, CO2 is NaN at 2016-01-15 01:00:00, and I can't find a way to look up rows with similar temperature and radiation in order to fill in the CO2 value. I believe it can be done with conditions, but I haven't managed to do it.

    
asked by anonymous 21.03.2017 / 14:13

2 answers

# Build an index of the rows where CO2 is NaN
nan_index = df.index[df['CO2'].isnull()]
# For every NaN row
for idx in nan_index:
    # Get the value of the other column you want to match on (Temperatura here)
    dado_nan = df.loc[idx, 'Temperatura']
    # Replace with the mean of the CO2 values within the desired range (NaN is skipped)
    df.loc[idx, 'CO2'] = df.loc[(df['Temperatura'] - dado_nan).abs() < 2.5, 'CO2'].mean()
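
A fuller sketch that applies all of the conditions from the question (±2.5 ºC, ±50 W/m², ±3 days) might look like the following. It assumes the timestamps form a sorted, unique DatetimeIndex and that the columns are named Temperatura, radiacao and CO2 as in the question. fill_co2 and its parameters are hypothetical names, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_co2(df, temp_tol=2.5, rad_tol=50.0, window=pd.Timedelta(days=3)):
    """Fill NaN CO2 with the mean CO2 of similar rows within +-window."""
    out = df.copy()
    for ts in df.index[df['CO2'].isna()]:
        # Rows within +-window of the gap (label slicing needs a sorted index)
        near = df.loc[ts - window: ts + window]
        # Conditions (a) and (b), relative to the row with the missing CO2
        similar = (
            ((near['Temperatura'] - df.at[ts, 'Temperatura']).abs() <= temp_tol)
            & ((near['radiacao'] - df.at[ts, 'radiacao']).abs() <= rad_tol)
        )
        # mean() skips NaN; the result stays NaN if no similar row has a CO2 value.
        # Reading from df (not out) keeps filled values from feeding later fills.
        out.at[ts, 'CO2'] = near.loc[similar, 'CO2'].mean()
    return out


# Made-up sample: one gap at 2016-01-15 01:00
idx = pd.to_datetime([
    '2016-01-14 01:00', '2016-01-14 13:00', '2016-01-15 01:00',
    '2016-01-15 13:00', '2016-01-16 01:00', '2016-01-20 01:00',
])
sample = pd.DataFrame(
    {
        'Temperatura': [22.0, 23.0, 22.4, 30.0, 21.0, 22.5],
        'radiacao': [0, 10, 0, 0, 200, 20],
        'CO2': [380.0, 390.0, np.nan, 500.0, 450.0, 370.0],
    },
    index=idx,
)
filled = fill_co2(sample)
print(filled)
```

In this sample only the first two rows pass both conditions inside the window, so the gap is filled with their mean. Gaps with no similar row in the window stay NaN and may need a second pass, for example with a wider window.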
    
29.03.2017 / 19:52

Lucas, this process is called interpolation.

Since your data is in a DataFrame, take a look at the docs, and also at the section on Working with Missing Data.

According to the docs, try to run the command:

df['CO2'].interpolate()

You can also define which interpolation method to use:

method : {‘linear’, ‘time’, ‘index’, ‘values’, ‘nearest’, ‘zero’,
‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘krogh’, ‘polynomial’, ‘spline’, ‘piecewise_polynomial’, ‘from_derivatives’, ‘pchip’, ‘akima’}

Ex:

df['CO2'].interpolate(method='linear')
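
As a small illustration of how the method choice matters, the sketch below compares 'linear' and 'time' on a made-up series with uneven spacing. It assumes a DatetimeIndex, which method='time' requires; the values are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up CO2 readings; note the 1-hour gap before the last row
idx = pd.to_datetime([
    '2016-01-15 00:00', '2016-01-15 00:30',
    '2016-01-15 01:00', '2016-01-15 02:00',
])
co2 = pd.Series([379.0, 381.0, np.nan, 387.0], index=idx)

# 'linear' ignores the index and treats the rows as equally spaced
linear = co2.interpolate(method='linear')

# 'time' weights by the actual time elapsed between the neighbours
by_time = co2.interpolate(method='time')

print(linear.iloc[2])   # 384.0, halfway between 381 and 387
print(by_time.iloc[2])  # about 383, one third of the way from 381 to 387
```

With data on a regular 30-minute grid the two methods agree; they differ only where rows are missing from the index itself.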

You can also add conditional clauses so that the interpolation is applied only under certain conditions.
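
One possible way to do that, as a rough sketch: interpolate the whole column first, then keep the interpolated value only where a condition holds. The condition used here (radiacao equal to 0) and the CO2_filled column name are hypothetical, chosen only to show the pattern, and the data is made up.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    '2016-01-15 00:00', '2016-01-15 00:30', '2016-01-15 01:00',
    '2016-01-15 01:30', '2016-01-15 02:00',
])
sample = pd.DataFrame(
    {
        'radiacao': [0, 0, 120, 0, 0],
        'CO2': [380.0, np.nan, np.nan, 386.0, 388.0],
    },
    index=idx,
)

# Candidate values for every gap
candidate = sample['CO2'].interpolate(method='time')

# Hypothetical condition: only accept fills where radiation is zero
condition = sample['radiacao'] == 0
fill_here = sample['CO2'].isna() & condition

# mask() replaces values where fill_here is True; other gaps stay NaN
sample['CO2_filled'] = sample['CO2'].mask(fill_here, candidate)
print(sample)
```

Here the gap at 00:30 is filled, while the gap at 01:00 stays NaN because its radiation does not meet the condition.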

    
22.03.2017 / 18:05