Change String Values from a Dataset to 'Float'

0

Good afternoon everyone! I'm having trouble completing an assignment for a college course. I'm using a ready-made dataset taken from a previously published article.

The Dataset looks something like this:

3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal
15,24.15,201196,23,24,tcp,1540,-------,16,6274,16092,24781700,Router,server1,20.176725,20.176725,20.186848,0,328.205808,505437,1540,0.236337,0,20.156478,20.186848,1,50.030211,Normal
24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood
24.9,9,443135,23,21,ack,55,-------,10,12670,16085,884675,Router,Switch1,39.62797,39.62797,39.637973,0,328.064183,18043.5,55,0.008437,0,39.617967,39.647976,1.030058,50.060098,Normal
24.8,8,157335,23,21,ack,55,-------,9,4901,16088,884840,Router,Switch1,16.039806,16.039806,16.04981,0,328.113525,18046.2,55,0.008438,0,16.029803,16.059813,1.030054,50.061864,Normal
24.1,1,219350,21,1,ack,55,-------,2,6837,16091,885005,Switch1,clien-1,21.885768,21.885768,21.895771,0,328.297902,18056.4,55,0.00844,0,21.865762,21.895771,1.030016,50.043427,Normal
24.13,13,480053,24,23,ack,55,-------,14,13609,16103,885665,server1,Router,42.45032,42.45032,42.460323,0,328.460278,18065.3,55,0.008446,0,42.45032,42.48033,1.030032,50.055747,Normal

It is a publicly available dataset about DDoS attacks. I will use it to apply supervised classifiers such as Naive Bayes, Random Forest and Multi-Layer Perceptron.

The language I'm using is Python (required), and I'm using NumPy to load the dataset. The code looks like this:

import numpy as np
np.set_printoptions(formatter={'float': lambda x: "{0:0.10f}".format(x)})
X = np.loadtxt("datasetTrabalho.data", delimiter=",")

But every time I try to do anything, it gives me errors like this:

  File "trabalho.py", line 190, in <module>
    main()
  File "trabalho.py", line 98, in main
    X = np.loadtxt("testeTrabalho.data", delimiter=",") # pega o dataset
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1101, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 746, in floatconv
    return float(x)
ValueError: could not convert string to float: 'tcp'

I need a way to convert the string values in this dataset to integer values so that I can use the appropriate classifiers for the assignment. I'd also be interested to hear if anyone knows another library that solves this problem. I'd be grateful for any help.

    
asked by anonymous 04.12.2018 / 21:26

2 answers

0

The error occurs because np.loadtxt tries to convert every field to float (its default dtype), but the first row already contains the string "tcp", which cannot be converted and raises the exception.
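If you want to stay with NumPy only, one possible workaround is to load everything as text and convert column by column. This is just a sketch: encode_columns is a hypothetical helper (not part of NumPy), and io.StringIO stands in for the real file using three rows copied from the question.

```python
import io
import numpy as np

# Three rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = (
    "3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal\n"
    "24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood\n"
    "24.9,9,443135,23,21,ack,55,-------,10,12670,16085,884675,Router,Switch1,39.62797,39.62797,39.637973,0,328.064183,18043.5,55,0.008437,0,39.617967,39.647976,1.030058,50.060098,Normal\n"
)


def encode_columns(raw):
    """Hypothetical helper: return a float array from an array of strings.

    Columns that parse as numbers are cast directly; any other column is
    replaced by integer codes (one code per distinct string, sorted order).
    """
    out = np.empty(raw.shape, dtype=float)
    for j in range(raw.shape[1]):
        col = raw[:, j]
        try:
            out[:, j] = col.astype(float)
        except ValueError:
            # Non-numeric column such as 'tcp' / 'ack': map strings to codes.
            _, codes = np.unique(col, return_inverse=True)
            out[:, j] = codes
    return out


# dtype=str keeps every field as text, so loadtxt no longer calls float('tcp').
raw = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", dtype=str)
X = encode_columns(raw)
print(X.shape)
```

Keep in mind that integer codes impose an arbitrary order on the categories, which some classifiers may interpret as meaningful, and that the last column (the class label) would normally be split off as the target rather than kept among the features.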

ALTERNATIVE

You could use the pandas library, which is a data processing library.

With it you can read this dataset without trouble. I suggest using it together with Jupyter.
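As a rough sketch of what that could look like: pandas infers a type per column, so the text columns are simply read as strings. The example assumes the file has no header row, uses made-up column names (col0, col1, ...), assumes the last column is the class label, and reads three rows from the question through io.StringIO instead of the real file. One-hot encoding with pd.get_dummies is only one possible way to turn the text features into numbers.

```python
import io
import pandas as pd

# Three rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = (
    "3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal\n"
    "24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood\n"
    "24.9,9,443135,23,21,ack,55,-------,10,12670,16085,884675,Router,Switch1,39.62797,39.62797,39.637973,0,328.064183,18043.5,55,0.008437,0,39.617967,39.647976,1.030058,50.060098,Normal\n"
)

# header=None: the data has no header row (assumption based on the sample).
df = pd.read_csv(io.StringIO(SAMPLE), header=None)
# Hypothetical column names, only to avoid mixing int and str labels later.
df.columns = ["col{}".format(i) for i in range(df.shape[1])]

# Assumption: the last column is the class label (Normal, UDP-Flood, ...).
y = df.iloc[:, -1]
features = df.iloc[:, :-1]

# Columns that pandas could not parse as numbers (tcp/ack, node names, ...).
cat_cols = [c for c in features.columns
            if not pd.api.types.is_numeric_dtype(features[c])]

# One-hot encode the text columns; numeric columns pass through unchanged.
X = pd.get_dummies(features, columns=cat_cols, dtype=float)
print(cat_cols)
print(X.shape)
```

After this step X contains only numeric columns and y holds the labels, which is the shape most classifiers expect.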

    
06.12.2018 / 23:05
0

To convert the data type of a DataFrame column (if you are using pandas), you can run:

DF['NomeDaColuna'] = DF['NomeDaColuna'].astype(float)   # converts to float, in this case

Since you will be converting strings to floats, make sure the strings have the format 'x.y', where x and y are digits (it also works without the decimal part '.y'). A string such as 'tcp' cannot be converted this way and raises a ValueError.
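A small sketch of the difference, using a toy DataFrame: the column names packet_size and protocol are invented for illustration, and the values are taken from the sample rows in the question. For text columns, category codes are shown as one possible alternative to astype(float).

```python
import pandas as pd

# Toy frame: hypothetical column names, values taken from the question's sample.
DF = pd.DataFrame({
    "packet_size": ["1540", "55", "55"],
    "protocol": ["tcp", "ack", "ack"],
})

# Numeric strings convert without problems.
DF["packet_size"] = DF["packet_size"].astype(float)

# Non-numeric strings do not: astype(float) raises an error.
try:
    DF["protocol"].astype(float)
    cast_failed = False
except (ValueError, TypeError) as exc:
    cast_failed = True
    print("cannot cast 'protocol' to float:", exc)

# One alternative for text columns: integer category codes
# (categories are sorted, so 'ack' becomes 0 and 'tcp' becomes 1).
DF["protocol_code"] = DF["protocol"].astype("category").cat.codes
print(DF)
```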

    
18.12.2018 / 19:18