Merging two columns into a new column named 'Class' (Pandas dataframe)

df_downsampled[df_downsampled['attack_cat']=="DoS"]

This returns every row of the 'df_downsampled' dataframe where the 'attack_cat' column has the value DoS.
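A minimal, self-contained illustration of the boolean-mask filter above. The small DataFrame is hypothetical toy data standing in for df_downsampled; its values are invented for illustration only:

```python
import pandas as pd

# Hypothetical toy data standing in for df_downsampled
df = pd.DataFrame({
    "attack_cat": ["DoS", "Exploits", None, "DoS", None],
    "Label": [1, 1, 0, 1, 0],
})

# The comparison yields a boolean Series (None compares as False);
# indexing the frame with it keeps only the rows where it is True
mask = df["attack_cat"] == "DoS"
dos_rows = df[mask]
print(dos_rows)
```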

Dataset: link

import pandas as pd

# Column names assigned to each CSV file via names=
colunas = ['srcip','sport','dstip','dsport','proto','state','dur','sbytes','dbytes','sttl','dttl',
           'sloss','dloss','service','Sload','Dload','Spkts','Dpkts','swin','dwin','stcpb','dtcpb',
           'smeansz','dmeansz','trans_depth','res_bdy_len','Sjit','Djit','Stime','Ltime','Sintpkt',
           'Dintpkt','tcprtt','synack','ackdat','is_sm_ips_ports','ct_state_ttl','ct_flw_http_mthd',
           'is_ftp_login','ct_ftp_cmd','ct_srv_src','ct_srv_dst','ct_dst_ltm','ct_src_ltm','ct_src_dport_ltm',
           'ct_dst_sport_ltm','ct_dst_src_ltm','attack_cat','Label']

UNSW1 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_1.csv', dtype={'srcip': object}, names=colunas)
UNSW2 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_2.csv', dtype={'srcip': object}, names=colunas)
UNSW3 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_3.csv', dtype={'srcip': object}, names=colunas)
UNSW4 = pd.read_csv('/home/users/p02543/ddos/UNSW-NB15_4.csv', dtype={'srcip': object}, names=colunas)


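# ignore_index=True renumbers the rows so index labels stay unique across the four files
UNSW = pd.concat([UNSW1, UNSW2, UNSW3, UNSW4], ignore_index=True)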

# Predictor attributes
previsores = UNSW.iloc[:, UNSW.columns.isin(('Sload', 'Dload', 'Spkts', 'Dpkts', 'swin', 'dwin',
                                             'smeansz', 'dmeansz', 'Sjit', 'Djit', 'Sintpkt', 'Dintpkt',
                                             'tcprtt', 'synack', 'ackdat', 'ct_srv_src', 'ct_srv_dst',
                                             'ct_dst_ltm', 'ct_src_ltm', 'ct_src_dport_ltm',
                                             'ct_dst_sport_ltm', 'ct_dst_src_ltm'))].values
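Each of the four CSV files is read into its own DataFrame, so each one starts its index at 0. The sketch below, using two hypothetical tiny frames, shows what pd.concat does to the index with and without ignore_index=True. This matters later in the thread, because assigning a Series back as a column aligns on index labels, and repeated labels make that alignment ambiguous:

```python
import pandas as pd

# Two hypothetical chunks, each with its own default RangeIndex starting at 0
part1 = pd.DataFrame({"Label": [0, 1]})
part2 = pd.DataFrame({"Label": [1, 0]})

stacked = pd.concat([part1, part2])                        # labels 0, 1, 0, 1
renumbered = pd.concat([part1, part2], ignore_index=True)  # labels 0, 1, 2, 3

print(stacked.index.is_unique)     # False
print(renumbered.index.is_unique)  # True
```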

There are two columns I want to "merge":

"Label", which is 1 for an attack and 0 otherwise.

"attack_cat", where I'm only interested in rows whose value is 'DoS' (for those rows, the value of 'Label' is 1).

Goal:

Create a new column named "Class" that:

takes the 'Label' value only where 'attack_cat' is 'DoS' (so 'Label' is 1), ignoring the other 'attack_cat' values, which don't interest me;

takes ALL the 'Label' values where 'Label' is 0 (no attack).

How can I do this?

    
asked by anonymous 08.10.2018 / 18:28


As I read the question, you need a single column that holds the Label value when attack_cat is "DoS" and also when Label is 0, leaving every other row empty (NaN). Something like this should work:

# Assignment aligns on the index (assumed unique), so rows in neither group get NaN
df_downsampled['Classe'] = pd.concat([df_downsampled.Label[df_downsampled.attack_cat == "DoS"], df_downsampled.Label[df_downsampled.Label == 0]])
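An alternative sketch, not part of the original answer: Series.where keeps the Label value where a condition holds and puts NaN elsewhere, which gives the intended column without depending on index alignment. The toy DataFrame is hypothetical data standing in for df_downsampled:

```python
import pandas as pd

# Hypothetical toy data standing in for df_downsampled
df = pd.DataFrame({
    "attack_cat": ["DoS", "Exploits", None, "DoS", None],
    "Label": [1, 1, 0, 1, 0],
})

# Rows to keep: DoS attacks, or any non-attack row.
# Parentheses are required because | binds tighter than ==.
keep = (df["attack_cat"] == "DoS") | (df["Label"] == 0)

# Label where keep is True, NaN elsewhere (the column becomes float64)
df["Classe"] = df["Label"].where(keep)
print(df)
```

Because NaN is a float, the resulting column is float64; cast it back with astype(int) after dropping the NaN entries if integers are needed.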

To leave out the NaN entries, read the column as follows (dropna() returns a new Series; df_downsampled itself is unchanged):

df_downsampled.Classe.dropna()
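A short sketch of the difference between dropping NaN from the column and dropping the corresponding rows from the frame. The DataFrame is hypothetical, with Classe already holding NaN for the unwanted rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where Classe already holds NaN for the unwanted rows
df = pd.DataFrame({
    "attack_cat": ["DoS", "Exploits", None],
    "Label": [1, 1, 0],
    "Classe": [1.0, np.nan, 0.0],
})

# dropna() on the column returns a new Series; df keeps all three rows
classe = df["Classe"].dropna()

# To keep whole rows instead of only the column, filter the frame itself
rows = df[df["Classe"].notna()]
rows_alt = df.dropna(subset=["Classe"])  # same rows, checking only Classe
```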

To build a new DataFrame containing only the attack_cat="DoS" rows plus all rows where Label is 0:

new_df_downsampled = pd.concat([df_downsampled[df_downsampled['attack_cat']=="DoS"],df_downsampled[df_downsampled.Label==0]])
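The same selection can also be written as one combined boolean mask. The sketch below, on hypothetical toy data, compares the two: the concat version lists the DoS rows first and then the Label == 0 rows, while the mask keeps the original row order. They select the same rows here because DoS rows have Label 1, so no row matches both conditions; if one did, concat would include it twice.

```python
import pandas as pd

# Hypothetical toy data standing in for df_downsampled
df = pd.DataFrame({
    "attack_cat": ["DoS", "Exploits", None, "DoS", None],
    "Label": [1, 1, 0, 1, 0],
})

# concat version from the answer: DoS rows first, then Label == 0 rows
via_concat = pd.concat([df[df["attack_cat"] == "DoS"], df[df["Label"] == 0]])

# Single combined mask: same rows, in their original order
via_mask = df[(df["attack_cat"] == "DoS") | (df["Label"] == 0)]
```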
    
08.10.2018 / 21:39