Error in indexing when separating line information

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

             campanha                           valor
1            Prospect | 5 dias | Com crédito       2
2            Prospect | 5 dias | Com crédito       5
3            Prospect | 5 dias | Com crédito       7 

I am trying to create a new column holding the second field of each row in the first column, i.e., I want to get "5 dias":

df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

However, it gives the error below:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-180-57ecc844181a> in <module>()
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

c:\users\iuri\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-180-57ecc844181a> in <lambda>(x)
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

IndexError: list index out of range

If I do the same with the first field, which is Prospect, it works:

df_teste = df['Segmento'].apply(lambda x: x.split("|")[0])
df_teste.head()
>>>>>>>
0    Prospect 
1    Prospect 
2    Prospect 
3    Prospect 
4    Prospect 

Does anyone have any tips on why I cannot get this information?

If I test by keeping the whole list from the split, like this:

df_teste = df['Segmento'].apply(lambda x: x.split("|"))
df_teste.head()

>>>>>

0     [Prospect ,  5 dias ,  Com crédito]
1    [Prospect ,  20 dias ,  Com crédito]
2    [Prospect ,  40 dias ,  Com crédito]
3    [Prospect ,  75 dias ,  Com crédito]
4     [Prospect ,  5 dias ,  Sem crédito]

From this output it looks like index 1, the days, should be retrievable, but that is not what happens.

Could anyone help me?
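
One way to check the suspicion that some rows simply have fewer fields than the ones shown by head() is to count the fields per row. This is only a minimal sketch: the small DataFrame is made-up sample data standing in for the CSV, and the names n_campos and linhas_curtas are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for dito_julho.csv
df = pd.DataFrame({
    "campanha": [
        "Prospect | 5 dias | Com crédito",
        "Prospect | 5 dias",
        "Prospect",
    ],
    "valor": [2, 5, 2],
})

# Number of "|"-separated fields in each row
n_campos = df["campanha"].str.split("|").str.len()

# Rows with fewer than 3 fields are the ones where [1] or [2] can raise IndexError
linhas_curtas = df[n_campos < 3]

print(n_campos.value_counts())
print(linhas_curtas)
```

If linhas_curtas is not empty, the IndexError comes from those rows and not from the ones visible in head().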

    
asked by anonymous 10.08.2018 / 21:55

2 answers


One solution is to use:

def cria_colunas(string_campanha):
    # str.split always returns at least one item, so there is no empty case
    lista = string_campanha.split("|")
    if len(lista) == 1:
        return lista[0], '', ''
    elif len(lista) == 2:
        return lista[0], lista[1], ''
    else:
        # Three or more fields: keep the first three
        return lista[0], lista[1], lista[2]

# apply() returns one tuple per row; zip(*...) regroups them into three columns
df['Ação'], df['Prazo'], df['Crédito'] = zip(*df['campanha'].apply(cria_colunas))

Or:

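def cria_acao(string_campanha):
    try:
        return string_campanha.split("|")[0]
    # IndexError: field missing; AttributeError: value is not a string (e.g. NaN)
    except (IndexError, AttributeError):
        return ''

def cria_prazo(string_campanha):
    try:
        return string_campanha.split("|")[1]
    except (IndexError, AttributeError):
        return ''

def cria_credito(string_campanha):
    try:
        return string_campanha.split("|")[2]
    except (IndexError, AttributeError):
        return ''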

df['Ação'] = df['campanha'].apply(cria_acao)
df['Prazo'] = df['campanha'].apply(cria_prazo)
df['Crédito'] = df['campanha'].apply(cria_credito)

This solves the problem, but I do not think it's the best way.
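
A possibly shorter alternative is to let pandas do the split with Series.str.split and expand=True, which fills the missing fields with missing values instead of raising an error. This is a sketch under assumptions: the sample DataFrame is made up, partes is a hypothetical name, and n=2 is my choice to cap the result at three fields.

```python
import pandas as pd

# Hypothetical sample data standing in for dito_julho.csv
df = pd.DataFrame({
    "campanha": [
        "Prospect | 5 dias | Com crédito",
        "Prospect | 5 dias",
        "Prospect",
    ],
    "valor": [2, 5, 2],
})

# Split into at most 3 fields; rows with fewer fields get missing values
partes = df["campanha"].str.split("|", n=2, expand=True)

# Guarantee exactly three columns, then replace missing values with ''
partes = partes.reindex(columns=range(3)).fillna("")

# Remove the spaces left around each field
partes = partes.apply(lambda col: col.str.strip())

partes.columns = ["Ação", "Prazo", "Crédito"]
df = df.join(partes)
print(df)
```

With n=2, any text after a third "|" stays inside the last field rather than being dropped.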

    
13.08.2018 / 18:52

Thanks for the comments above, which helped me with the problem. I have now identified what is happening, and there is another error I need to solve.

The data is like this:

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

             campanha                           valor
1            Prospect | 5 dias | Com crédito       2
2            Prospect | 5 Dias                     5
3            Prospect                              2

What I want to do is create one new column for each field of a row, where the fields are separated by "|".

What I have done so far is separate out the rows that contain "|".

Then I wrote the rule to split them and get the data:

df['Ação'] = df['Segmento'].apply(lambda x: x.split("|")[0])
df['Prazo'] = df['Segmento'].apply(lambda x: x.split("|")[1])
df['Credito'] = df['Segmento'].apply(lambda x: x.split("|")[2])

I get the indexing error because some rows have 2 fields and others have 3. I would like to know how to write a function that returns all 3 fields when a row has 3, and that avoids the indexing error when a row has only 2.

Could someone help a Python beginner here? Haha

Thank you so much !!
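
A function along the lines asked for above could pad short rows with empty strings, so every row ends up with the same number of fields. This is just a sketch: separa_campos and its n_campos parameter are hypothetical names, and the DataFrame is made-up sample data using the campanha column shown in the table.

```python
import pandas as pd

def separa_campos(texto, n_campos=3):
    """Split on '|' and pad with '' so the result always has n_campos items."""
    if not isinstance(texto, str):
        # e.g. NaN coming from an empty cell in the CSV
        return [""] * n_campos
    # At most n_campos - 1 splits, so the list never exceeds n_campos items
    campos = [c.strip() for c in texto.split("|", n_campos - 1)]
    return campos + [""] * (n_campos - len(campos))

# Hypothetical sample data standing in for dito_julho.csv
df = pd.DataFrame({
    "campanha": [
        "Prospect | 5 dias | Com crédito",
        "Prospect | 5 Dias",
        "Prospect",
    ],
    "valor": [2, 5, 2],
})

campos = df["campanha"].apply(separa_campos)

# Turn the per-row lists into three named columns aligned with df
novas = pd.DataFrame(campos.tolist(), index=df.index,
                     columns=["Ação", "Prazo", "Crédito"])
df = df.join(novas)
print(df)
```

Because every list has the same length, indexing positions 1 and 2 can no longer fail on rows that have only one or two fields.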

    
13.08.2018 / 15:29