Merging two columns into a new column named 'Class' (Pandas dataframe)


I want to get the rows of the 'df_downsampled' dataframe where the 'attack_cat' column has the value 'DoS'.

Dataset: link

colunas = ['srcip', 'sport', 'dstip', 'dsport', 'proto', 'state', 'dur', 'sbytes', 'dbytes', 'sttl', 'dttl',
           # ... remaining UNSW-NB15 column names are not shown in the post ...
           'ct_dst_sport_ltm', 'ct_dst_src_ltm', 'attack_cat', 'Label']

import pandas as pd

UNSW1 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_1.csv', dtype={"srcip": object}, names=colunas)
UNSW2 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_2.csv', dtype={"srcip": object}, names=colunas)
UNSW3 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_3.csv', dtype={"srcip": object}, names=colunas)
UNSW4 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_4.csv', dtype={"srcip": object}, names=colunas)

# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's index
UNSW = pd.concat([UNSW1, UNSW2, UNSW3, UNSW4], ignore_index=True)

previsores = UNSW.loc[:, UNSW.columns.isin(('Sload', 'Dload', 'ct_src_ltm', 'ct_src_dport_ltm',
                                            'ct_dst_sport_ltm', 'ct_dst_src_ltm'))].values  # predictor attributes
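By default pd.concat keeps the index of each input, so stacking the four CSV files without ignore_index=True leaves repeated row labels (every file starts counting at 0). Index-aligned operations later on, such as assigning a Series to a new column, can then raise an error or line up the wrong rows. A small check on two made-up frames standing in for UNSW1 and UNSW2:

```python
import pandas as pd

# Two tiny frames standing in for UNSW1 and UNSW2 (made-up values)
a = pd.DataFrame({"Label": [0, 1]})
b = pd.DataFrame({"Label": [1, 0]})

# Default concat keeps both original indexes: labels 0, 1, 0, 1
stacked = pd.concat([a, b])
print(list(stacked.index), stacked.index.is_unique)        # [0, 1, 0, 1] False

# ignore_index=True renumbers the combined rows 0..3
renumbered = pd.concat([a, b], ignore_index=True)
print(list(renumbered.index), renumbered.index.is_unique)  # [0, 1, 2, 3] True
```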

There are two columns I want to "merge":

One is called "Label"; its value is 1 when the record is an attack and 0 otherwise.

The other is 'attack_cat'. In it, I'm only interested in rows whose value is 'DoS' (for those rows, the value of 'Label' is 1).


Create a new column named "Class" that:

Takes the value of the 'Label' column only when 'attack_cat' is 'DoS' (in which case 'Label' is 1)

(there are other values in 'attack_cat' that do not interest me)

Takes ALL the values of the 'Label' column where it is 0 (no attack)

How to do it?

asked by anonymous 08.10.2018 / 18:28
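The rule described in the question can be sketched with a boolean mask and Series.where, which keeps 'Label' where the mask is True and puts NaN everywhere else. The frame below is a made-up stand-in for df_downsampled (a few invented rows, not real UNSW-NB15 data), with None marking normal traffic that has no attack category.

```python
import pandas as pd

# Made-up stand-in for df_downsampled
df = pd.DataFrame({
    "attack_cat": [None, "DoS", "Exploits", None, "DoS"],
    "Label":      [0,    1,     1,          0,    1],
})

# Rows that matter: DoS attacks and all normal traffic (Label == 0)
keep = (df["attack_cat"] == "DoS") | (df["Label"] == 0)

# Class copies Label on those rows; the "Exploits" row becomes NaN
df["Class"] = df["Label"].where(keep)
print(df["Class"].tolist())  # [0.0, 1.0, nan, 0.0, 1.0]
```

If the NaN rows should be discarded rather than kept, df.dropna(subset=["Class"]) drops them.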

1 answer


As the question is phrased, I understand that you need a single column holding the Label value wherever attack_cat == "DoS" and wherever Label == 0. For that, the solution would be something like:

# Keep Label where attack_cat is "DoS" or Label is 0; the assignment aligns on the
# index, so every other row (non-DoS attacks) gets NaN
df_downsampled['Class'] = pd.concat([df_downsampled.Label[df_downsampled.attack_cat == 'DoS'],
                                     df_downsampled.Label[df_downsampled.Label == 0]])
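The NaN comes from index alignment: the concatenated Series holds only the DoS rows and the Label == 0 rows, each under its original row label, and assigning it to a column reindexes it to the whole frame. A quick check on made-up rows (a hypothetical stand-in for df_downsampled); it assumes the frame's index is unique, since aligning against repeated labels can fail.

```python
import pandas as pd

# Made-up stand-in for df_downsampled (unique index 0..4)
df = pd.DataFrame({
    "attack_cat": [None, "DoS", "Exploits", None, "DoS"],
    "Label":      [0,    1,     1,          0,    1],
})

parts = pd.concat([df.Label[df.attack_cat == "DoS"], df.Label[df.Label == 0]])
print(list(parts.index))  # [1, 4, 0, 3]: DoS rows first, then normal rows

# Assignment matches rows by label; row 2 ("Exploits") is absent, so it gets NaN
df["Class"] = parts
print(df["Class"].tolist())  # [0.0, 1.0, nan, 0.0, 1.0]
```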

To avoid the NaN values, reference the column as:


To build a new dataframe containing only the DoS rows plus all normal (Label == 0) rows, you can do:

new_df_downsampled = pd.concat([df_downsampled[df_downsampled['attack_cat'] == "DoS"],
                                df_downsampled[df_downsampled.Label == 0]])
08.10.2018 / 21:39
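The same rows can also be selected with a single combined mask. One difference worth knowing: pd.concat stacks all DoS rows first and the normal rows after them, while a single mask keeps the original row order. A sketch on made-up rows (a hypothetical stand-in for df_downsampled):

```python
import pandas as pd

# Made-up stand-in for df_downsampled
df = pd.DataFrame({
    "attack_cat": [None, "DoS", "Exploits", None, "DoS"],
    "Label":      [0,    1,     1,          0,    1],
})

# Approach from the answer: DoS rows, then Label == 0 rows
by_concat = pd.concat([df[df["attack_cat"] == "DoS"], df[df.Label == 0]])

# Single-mask alternative: same rows, original order
by_mask = df[(df["attack_cat"] == "DoS") | (df["Label"] == 0)]

print(list(by_concat.index))                   # [1, 4, 0, 3]
print(list(by_mask.index))                     # [0, 1, 3, 4]
print(by_concat.sort_index().equals(by_mask))  # True
```

The mask version also never duplicates a row, which pd.concat would do if a row matched both conditions (for example, a DoS row with Label 0).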