Separating a dataframe by some criteria - Python Pandas

2

I have a csv file with more than 500,000 lines that I would like to separate into two dataframes.

The format of each line: date, time, value of the variable of interest

dd-mm-aa, hh: mm: 00, float

The lines are recorded every minute. In one year they are: 526,600 lines.

I would like to do a split by separating the day from the night, ie based on the time intervals. 08:00:00 to 20:00:00 have one df and another for the night.

How to do this using pandas?

    
asked by anonymous 03.06.2018 / 02:45

1 answer

2

Assuming the following data for this response, saved in 'dados.csv' :

data,hora,numero
01/01/2018,10:30,0.1
01/01/2018,20:05,0.2
01/01/2018,07:00,0.3
02/01/2018,11:10,0.4
02/01/2018,22:35,0.5
03/01/2018,03:10,0.6
03/01/2018,20:45,0.7
03/01/2018,12:20,0.8
04/01/2018,15:15,0.9
04/01/2018,23:59,0.95

When loading the DataFrame , specify the type of the columns data and hora as dates, informing the name of the columns in the parameter parse_dates of the command read_csv :

import pandas as pd
import datetime

df = pd.read_csv('dados.csv', parse_dates=['data','hora'])

Output:

Out[4]:
        data                hora  numero
0 2018-01-01 2018-06-02 10:30:00    0.10
1 2018-01-01 2018-06-02 20:05:00    0.20
2 2018-01-01 2018-06-02 07:00:00    0.30
3 2018-02-01 2018-06-02 11:10:00    0.40
4 2018-02-01 2018-06-02 22:35:00    0.50
5 2018-03-01 2018-06-02 03:10:00    0.60
6 2018-03-01 2018-06-02 20:45:00    0.70
7 2018-03-01 2018-06-02 12:20:00    0.80
8 2018-04-01 2018-06-02 15:15:00    0.90
9 2018-04-01 2018-06-02 23:59:00    0.95

To transform the hora column into time format:

df['hora'] = pd.to_datetime(df['hora']).dt.time

Output:

Out[6]:
        data      hora  numero
0 2018-01-01  10:30:00    0.10
1 2018-01-01  20:05:00    0.20
2 2018-01-01  07:00:00    0.30
3 2018-02-01  11:10:00    0.40
4 2018-02-01  22:35:00    0.50
5 2018-03-01  03:10:00    0.60
6 2018-03-01  20:45:00    0.70
7 2018-03-01  12:20:00    0.80
8 2018-04-01  15:15:00    0.90
9 2018-04-01  23:59:00    0.95

To create the DataFrame with data for the period "day", enter the condition 'time' between 8 and 20 in the query filter:

df_dia = df.loc[(df['hora'] >= datetime.time(hour=8)) & (df['hora'] <= datetime.time(hour=20))]

Result:

In [9]: df_dia
Out[9]:
        data      hora  numero
0 2018-01-01  10:30:00     0.1
3 2018-02-01  11:10:00     0.4
7 2018-03-01  12:20:00     0.8
8 2018-04-01  15:15:00     0.9

To create the DataFrame night, just enter the opposite condition:

df_noite = df.loc[(df['hora'] < datetime.time(hour=8)) | (df['hora'] > datetime.time(hour=20))]

Result:

In [12]: df_noite
Out[12]:
        data      hora  numero
1 2018-01-01  20:05:00    0.20
2 2018-01-01  07:00:00    0.30
4 2018-02-01  22:35:00    0.50
5 2018-03-01  03:10:00    0.60
6 2018-03-01  20:45:00    0.70
9 2018-04-01  23:59:00    0.95
    
03.06.2018 / 03:50