Good afternoon, everyone! I'm having trouble with an assignment for a college course. I'm using a ready-made dataset taken from a previously published article.
The dataset looks something like this:
3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal
15,24.15,201196,23,24,tcp,1540,-------,16,6274,16092,24781700,Router,server1,20.176725,20.176725,20.186848,0,328.205808,505437,1540,0.236337,0,20.156478,20.186848,1,50.030211,Normal
24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood
24.9,9,443135,23,21,ack,55,-------,10,12670,16085,884675,Router,Switch1,39.62797,39.62797,39.637973,0,328.064183,18043.5,55,0.008437,0,39.617967,39.647976,1.030058,50.060098,Normal
24.8,8,157335,23,21,ack,55,-------,9,4901,16088,884840,Router,Switch1,16.039806,16.039806,16.04981,0,328.113525,18046.2,55,0.008438,0,16.029803,16.059813,1.030054,50.061864,Normal
24.1,1,219350,21,1,ack,55,-------,2,6837,16091,885005,Switch1,clien-1,21.885768,21.885768,21.895771,0,328.297902,18056.4,55,0.00844,0,21.865762,21.895771,1.030016,50.043427,Normal
24.13,13,480053,24,23,ack,55,-------,14,13609,16103,885665,server1,Router,42.45032,42.45032,42.460323,0,328.460278,18065.3,55,0.008446,0,42.45032,42.48033,1.030032,50.055747,Normal
It is a publicly available dataset about DDoS attacks. I want to apply supervised classifiers to it, such as Naive Bayes, Random Forest, and Multi-Layer Perceptron.
The language I'm using is Python (required), and I'm using NumPy to load the dataset. The loading code looks like this:
import numpy as np

np.set_printoptions(formatter={'float': lambda x: "{0:0.10f}".format(x)})
X = np.loadtxt("datasetTrabalho.data", delimiter=",")
But every time I run it, I get an error like this:
  File "trabalho.py", line 190, in <module>
    main()
  File "trabalho.py", line 98, in main
    X = np.loadtxt("testeTrabalho.data", delimiter=",") # pega o dataset
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1101, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1028, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/arthur/.local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 746, in floatconv
    return float(x)
ValueError: could not convert string to float: 'tcp'
I need help converting the string values in this dataset to integer values so that I can use the classifiers mentioned above. I'd also be interested to hear whether another library handles this better. Thanks in advance for any help.
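To make the goal concrete, here is a minimal sketch of the kind of conversion I have in mind, using only the standard library. It assumes that any column containing a value `float()` cannot parse should be treated as categorical, and that each distinct string simply gets the next free integer in order of first appearance. The names `SAMPLE`, `is_number`, and `encode_rows` are made up for this illustration, and the two rows are copied from the sample above.

```python
import csv
import io

# Two rows copied from the sample above; in practice, pass an open file instead.
SAMPLE = """\
3,24.3,389693,21,23,tcp,1540,-------,4,11339,16091,24780100,Switch1,Router,35.529786,35.529786,35.539909,0,328.240918,505490,1540,0.236321,0,35.519662,35.550032,1,50.02192,Normal
24.15,15,61905,23,22,ack,55,-------,16,1930,16092,885060,Router,Switch2,7.049955,7.049955,7.059958,0,328.206042,18051.3,55,0.008441,0,7.039952,7.069962,1.030045,50.060221,UDP-Flood
"""


def is_number(text):
    """Return True if float() can parse the text."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_rows(lines):
    """Return (rows, mappings).

    rows is a list of lists of floats; mappings[col] maps each string seen
    in a categorical column to the integer code that replaced it.
    """
    raw = [row for row in csv.reader(lines) if row]
    n_cols = len(raw[0])
    # Assumption: a column is categorical if any of its values fails float().
    categorical = [c for c in range(n_cols)
                   if not all(is_number(r[c]) for r in raw)]
    mappings = {c: {} for c in categorical}
    encoded = []
    for row in raw:
        out = []
        for c, value in enumerate(row):
            if c in mappings:
                codes = mappings[c]
                # setdefault assigns the next free integer on first sight
                # and returns the existing code afterwards.
                out.append(float(codes.setdefault(value, len(codes))))
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded, mappings


rows, mappings = encode_rows(io.StringIO(SAMPLE))
print(sorted(mappings))  # indices of the columns that were encoded
print(mappings[5])       # string -> integer codes for the protocol column
```

The resulting `rows` should be convertible with `np.array(rows, dtype=float)`, with the last column split off as the class label. One caveat I'm aware of: integer codes like these impose an arbitrary order on the categories, which may matter for some classifiers, so a one-hot encoding might be the better fit for the feature columns.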