Python - Select 2 columns of a DF and classify them

0

I'm new to the programming world, and I'm doing some studies to gain knowledge in the area of Data Science.

I have a DataFrame with a lot of information, including gender and age. I want to count the rows for each gender (male and female) and classify them as children (0 to under 12 years), young people (12 to under 18 years) and adults (18+).

I'm so lost that I don't even know where to start.

Input: df.groupby("Sex").Age.unique()
Output: 
Sex
female    [38.0, 26.0, 35.0, 27.0, 14.0, 4.0, 58.0, 55.0...
male      [22.0, 35.0, 29.0, 54.0, 2.0, 20.0, 39.0, 34.0...
Name: Age, dtype: object

Variable:
classification = df.groupby("Sex").Age.unique()

Now I imagine I have to write a for loop, is that right? But how do I name each case?

    
asked by anonymous 21.08.2018 / 16:16

3 answers

1

Starting from this DataFrame:

# -*- coding: utf-8 -*-
import pandas as pd

d = {'Sex':['female','female', 'female', 'female', 'male', 'male','male','male'],
     'Age':[38.0,26.0,4.0,14.0,33.0,24.0,7.0,16.0]}

df = pd.DataFrame(data=d)

>>> print(df)
    Age     Sex
0  38.0  female
1  26.0  female
2   4.0  female
3  14.0  female
4  33.0    male
5  24.0    male
6   7.0    male
7  16.0    male

We classify each row by age:

def define_classe(idade):
    if idade >= 18:
        return 'Adulto'
    elif idade >= 12:
        return 'Jovem'
    return 'Criança'

df['Classification'] = df['Age'].map(define_classe)
>>> print(df)
    Age     Sex Classification
0  38.0  female         Adulto
1  26.0  female         Adulto
2   4.0  female        Criança
3  14.0  female          Jovem
4  33.0    male         Adulto
5  24.0    male         Adulto
6   7.0    male        Criança
7  16.0    male          Jovem
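
As an alternative to a hand-written function, the same three age ranges can be expressed as bins. This is only a sketch using pd.cut on the sample data above; the names bins and labels are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'Sex': ['female', 'female', 'female', 'female', 'male', 'male', 'male', 'male'],
    'Age': [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

# right=False makes each bin closed on the left: [0, 12), [12, 18), [18, inf).
bins = [0, 12, 18, float('inf')]
labels = ['Criança', 'Jovem', 'Adulto']
df['Classification'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

print(df)
```

One difference worth knowing: if Age contains missing values, pd.cut leaves them as NaN, whereas define_classe would label them 'Criança', because every comparison with NaN is False.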

Now just filter the rows. For example, adult males:

>>> print (len(df.loc[df['Classification'] == 'Adulto'].loc[df['Sex'] == 'male']))
2
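
The chained .loc calls work, but the two conditions can also be combined into a single boolean mask with &. A minimal sketch on the same sample data; mask and adult_males are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Sex': ['female', 'female', 'female', 'female', 'male', 'male', 'male', 'male'],
    'Age': [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

def define_classe(idade):
    if idade >= 18:
        return 'Adulto'
    elif idade >= 12:
        return 'Jovem'
    return 'Criança'

df['Classification'] = df['Age'].map(define_classe)

# The parentheses are required because & binds more tightly than ==.
mask = (df['Classification'] == 'Adulto') & (df['Sex'] == 'male')
adult_males = df[mask]

print(len(adult_males))  # number of adult males
print(int(mask.sum()))   # same count, without building the filtered frame
```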

Another way is to filter on the values directly, without classifying first:

>>> df.loc[df['Age'] >= 18].loc[df['Sex'] == 'male']
    Age   Sex Classification
4  33.0  male         Adulto
5  24.0  male         Adulto

>>> print(len(df.loc[df['Age'] >= 18].loc[df['Sex'] == 'male']))
2

>>> print(df.loc[df['Age'] >= 12].loc[df['Age'] < 18].loc[df['Sex'] == 'male'])
    Age   Sex Classification
7  16.0  male          Jovem

>>> print(len(df.loc[df['Age'] >= 12].loc[df['Age'] < 18].loc[df['Sex'] == 'male']))
1
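
To get the count for every gender and age class in one step, instead of writing one filter per case, grouping by both columns is probably simpler. A minimal sketch on the same sample data; counts and table are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Sex': ['female', 'female', 'female', 'female', 'male', 'male', 'male', 'male'],
    'Age': [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

def define_classe(idade):
    if idade >= 18:
        return 'Adulto'
    elif idade >= 12:
        return 'Jovem'
    return 'Criança'

df['Classification'] = df['Age'].map(define_classe)

# One count per (Sex, Classification) pair.
counts = df.groupby(['Sex', 'Classification']).size()
print(counts)

# The same counts as a table: one row per sex, one column per class.
table = pd.crosstab(df['Sex'], df['Classification'])
print(table)
```

With groupby the result is a Series indexed by both columns, so a single value can be read with counts.loc[('male', 'Adulto')].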
    
21.08.2018 / 19:14
0

If you just want to replace the values in this column with child, young person and adult, you can use the .apply method on the column:

First you create a function:

def classifica_idade(x):
    # Ranges from the question: under 12, 12 to under 18, 18 and over.
    if x < 12:
        return 'criança'
    elif x < 18:
        return 'jovem'
    return 'adulto'

Once that is done, apply it to the column you want, for example: df['Age'] = df['Age'].apply(classifica_idade)

This also works with lambda functions.
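
For illustration, a lambda version might look like the sketch below. The sample ages are made up for the example, and the cut-offs follow the ranges in the question (18 counts as adult).

```python
import pandas as pd

# Hypothetical sample ages, chosen to hit each range and the 18 boundary.
df = pd.DataFrame({'Age': [38.0, 4.0, 14.0, 18.0]})

# Nested conditional expression inside a lambda.
df['Classification'] = df['Age'].apply(
    lambda x: 'criança' if x < 12 else ('jovem' if x < 18 else 'adulto')
)

print(df)
```

A named function is usually easier to read once there are more than two branches, so the lambda is mostly a convenience for short rules.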

    
21.08.2018 / 17:34
0

Maybe I was not clear. My goal is a result like this:

Children Female: x amount

Young Female: y amount

Adult Female: z amount

Children Male: n amount

Young Male: k amount

Adult Male: j amount

I have made some progress by creating a new DataFrame with just the Sex and Age columns. I think it will be easier to move on from here.

    
21.08.2018 / 18:10