keep only the rows of a column with the value with Pandas

0

Hello,

I am a beginner in the field and one of the exercises I am having difficulty responding to is the following. there is a multi-column file. In one of these columns, there are several zipcode (zipcode), and may or may not repeat. For each line of zip there is a column of price (price) of the houses. I discovered which is the zip with the most expensive houses and now I need to find the average house prices of this zip. I can find the total number of houses in this zip but I can not associate the price with the zip. To see if it becomes clear I send a print with the columns.

    
asked by anonymous 18.07.2017 / 21:20

1 answer

1

I do not know if I completely understood the question, but I tried to answer it, if you want to see the result in a jupyter notebook, clicking here. *

import pandas as pd
import random

# Random
rnds=[]
for i in range(20):
    rnds.append({'id': str(i), 'price': random.randrange(200,8500),
 'zipcode': str(random.randrange(9700,9710))})

# Criando o df
prices = pd.DataFrame(rnds)

prices
    id  price   zipcode
0   0   4939    9706
1   1   314 9708
2   2   4554    9705
3   3   5930    9700
4   4   8306    9706
5   5   717 9708
6   6   7105    9705
7   7   6136    9705
8   8   6882    9704
9   9   4364    9708
10  10  2384    9704
11  11  2003    9704
12  12  8119    9705
13  13  354 9707
14  14  3747    9701
15  15  1838    9709
16  16  3287    9706
17  17  7586    9709
18  18  3228    9702
19  19  4483    9701

Ok, so far a dataframe has been created to serve as an example, only with the columns that matter to the question. Now we will get the CEP with the highest value and a grouping of ceps with the highest values

# Obtenfo o maximo
prices.ix[prices['price'].idxmax()]

id            4
price      8306
zipcode    9706
Name: 4, dtype: object

# Agrupando os máximos
prices.groupby('zipcode', sort=False)['price'].max()

zipcode
9706    8306
9708    4364
9705    8119
9700    5930
9704    6882
9707     354
9701    4483
9709    7586
9702    3228
Name: price, dtype: int64

Finally, here is the answer to the heart of the matter.

# Agrupando pelas médias
prices.groupby('zipcode', sort=False)['price'].mean()

zipcode
9706    5510.666667
9708    1798.333333
9705    6478.500000
9700    5930.000000
9704    3756.333333
9707     354.000000
9701    4115.000000
9709    4712.000000
9702    3228.000000
Name: price, dtype: float64

Bonus: -)

# Agrupando pelos minimos
prices.groupby('zipcode', sort=False)['price'].min()
zipcode
9706    3287
9708     314
9705    4554
9700    5930
9704    2003
9707     354
9701    3747
9709    1838
9702    3228
Name: price, dtype: int64

* The results on the notebook may be slightly different, since I am using random values and, before uploading , I ran it once again after posting here.

    
19.07.2017 / 00:54