I do not know what the format really looks like in hd5 (I looked into it but did not understand it). If it is what you posted, then instead of doing .split(','),
as I do in the examples below, you would do .split('    ')
(4 spaces). The CSV format I used for testing is:
2016-01-01 00:00:00, NaN
2016-01-01 01:00:00, 22.445700
2016-01-01 02:00:00, 22.388300
2016-01-01 03:00:00, 22.400000
2016-01-01 04:00:00, NaN
2016-01-01 05:00:00, 22.133900
2016-01-01 06:00:00, 21.948999
2016-01-01 07:00:00, 21.787901
...
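As a rough sketch of the separator point above: the helper below splits a line on the last occurrence of whatever separator the file uses. The name parse_line and the space-separated sample line are assumptions made up for illustration, since I have not seen the real export.

```python
def parse_line(line, sep=','):
    """Split one line into (timestamp, value) on the LAST occurrence of sep.

    Pass sep=None to split on the last run of whitespace instead.
    """
    timestamp, value = line.rstrip('\n').rsplit(sep, 1)
    return timestamp.strip(), value.strip()


# Comma-separated, as in the test file above
print(parse_line('2016-01-01 01:00:00, 22.445700\n'))
# Hypothetical 4-space-separated variant of the same line
print(parse_line('2016-01-01 01:00:00    22.445700\n', sep='    '))
```

Splitting from the right keeps the single space inside the timestamp intact, whichever separator is used.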
With groupby you can do this:
from itertools import groupby

with open('tests.csv', 'r') as f:
    dados = [(l.split(',')[0], l.split(',')[1].strip()) for l in f]
print(dados) # [('2016-01-01 00:00:00', 'NaN'), ('2016-01-01 01:00:00', '22.445700'), ('2016-01-01 02:00:00', '22.388300'), ('2016-01-01 03:00:00', '22.400000'), ...]
# important: groupby only groups consecutive items, so sort by hour first
dados_sort = sorted((k.split()[1], v) for k, v in dados)
for hora, group in groupby(dados_sort, key=lambda x: x[0]):
    # count only the NaN readings for this hour, not every reading
    n_nan = sum(v == 'NaN' for k, v in group)
    if n_nan:
        print('There are {} NaN in the hour {}'.format(n_nan, hora))
Program output for the data you gave:
There are 2 NaN in the hour 00:00:00
There are 2 NaN in the hour 04:00:00
There are 1 NaN in the hour 09:00:00
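The sorted(...) step is marked as important in the code above because itertools.groupby only merges items that sit next to each other. A small illustration with made-up hours (not taken from the test file):

```python
from itertools import groupby

hours = ['00:00:00', '04:00:00', '00:00:00']  # made-up, unsorted sample

# Without sorting, the two '00:00:00' entries end up in separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(hours)]
# With sorting, equal keys are adjacent and form a single group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(hours))]

print(unsorted_groups)  # [('00:00:00', 1), ('04:00:00', 1), ('00:00:00', 1)]
print(sorted_groups)    # [('00:00:00', 2), ('04:00:00', 1)]
```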
But honestly I would not do it this way (unless I really needed to); I would do it like this:
from collections import Counter

dados = {}
with open('tests.csv', 'r') as f:
    for l in f:
        hora, val = l.split(',') # hour and temperature; in your case you should already have this split per line
        dados.setdefault(val.strip(), []).append(hora.split(' ')[1])
print(dados) # {'22.388300': ['02:00:00'], '23.810600': ['03:00:00'], '21.610300': ['08:00:00'], '22.400000': ['03:00:00'], '21.948999': ['06:00:00'], 'NaN': ['00:00:00', '04:00:00', '09:00:00', '00:00:00', '04:00:00'], '22.910700': ['02:00:00'], '22.445700': ['01:00:00'], '21.787901': ['07:00:00'], '22.133900': ['05:00:00'], '21.310800': ['01:00:00']}
print(Counter(dados['NaN']))
Counter({'00:00:00': 2, '04:00:00': 2, '09:00:00': 1})
Or, if you do not need to store the values, you can simply do:
from collections import Counter

list_NaN = []
with open('tests.csv', 'r') as f:
    for l in f:
        hora, val = l.split(',')
        if val.strip() == 'NaN':
            list_NaN.append(hora.split(' ')[1])
print(Counter(list_NaN))
Counter({'00:00:00': 2, '04:00:00': 2, '09:00:00': 1})
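If you want to try the idea without a file on disk, the same counting can be wrapped in a function that accepts any iterable of lines. This is only a sketch: count_nan_by_hour is a name I made up, it increments the Counter directly instead of building a list first, and the second-day line in the sample is invented for the demonstration.

```python
import io
from collections import Counter


def count_nan_by_hour(lines):
    """Count 'NaN' readings per hour of day from 'timestamp, value' lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        timestamp, value = line.split(',')
        if value.strip() == 'NaN':
            counts[timestamp.split(' ')[1]] += 1
    return counts


# In-memory stand-in for tests.csv (the last line is invented)
sample = io.StringIO(
    "2016-01-01 00:00:00, NaN\n"
    "2016-01-01 01:00:00, 22.445700\n"
    "2016-01-02 00:00:00, NaN\n"
)
print(count_nan_by_hour(sample))  # Counter({'00:00:00': 2})
```

An open file object can be passed in the same way, since it also yields one line at a time.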