Probabilistic Considerations for Computation of Shannon Entropy in Network Traffic
I have a capture file (CAP format) of network traffic recorded with tcpdump on Debian. Up to a certain point in time the traffic is attack-free; after that, a series of TCP SYN flooding attacks begins. My goal is to calculate the entropy of the traffic in each period (with and without attacks) and to compare the two.
I'm using the following Python code:
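import collections
import numpy as np
# Sample addresses, kept as zero-padded strings and compared as plain text
sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]
# Count how many times each distinct address occurs
C = collections.Counter(sample_ips)
# list(...) is needed on Python 3, where C.values() is a view rather than a list
counts = np.array(list(C.values()), dtype=float)
# Empirical probability of each distinct address
prob = counts / counts.sum()
# Shannon entropy in bits: H = -sum(p * log2(p))
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)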
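To compare the two periods, the same calculation would presumably be applied once per time window rather than once for the whole capture. A minimal sketch of that idea follows, assuming the packets have already been extracted from the capture as (timestamp, address) pairs. The helper names, the 5-second window and the sample data are made up for illustration and are not taken from my real capture.

```python
import collections
import math

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = collections.Counter(items)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)

def entropy_per_window(packets, window_seconds):
    """Group (timestamp, ip) pairs into fixed-length windows.

    Returns a dict mapping window index to the entropy of the
    addresses seen in that window.
    """
    windows = collections.defaultdict(list)
    for timestamp, ip in packets:
        windows[int(timestamp // window_seconds)].append(ip)
    return {w: shannon_entropy(ips) for w, ips in sorted(windows.items())}

# Hypothetical (timestamp in seconds, address) pairs, for illustration only
packets = [
    (0.1, "10.0.0.1"), (0.5, "10.0.0.2"), (0.9, "10.0.0.1"), (1.2, "10.0.0.2"),
    (5.0, "10.0.0.9"), (5.1, "10.0.0.9"), (5.2, "10.0.0.9"), (5.3, "10.0.0.9"),
]
window_entropy = entropy_per_window(packets, window_seconds=5.0)
print(window_entropy)
```

Each window then yields one entropy value, so the attack-free and attack periods each produce a list of values that can be compared.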
Calculating the entropy this way raises some questions:
2. How can I validate the experiment? I am thinking of a hypothesis test with the following null hypothesis: "The entropy value allows you to detect the attack". Is this coherent? What would be a good hypothesis test for this case (the sample space is about 40)?
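To make the question concrete, here is a sketch of one possible test. It assumes the null hypothesis is stated the conventional way round, as "there is no difference between the entropy values of attack-free and attack windows", with detection as the alternative. It uses a two-sample permutation test on the difference in means, which is only one candidate and is swapped in here for illustration. The function name and the entropy values are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in means of a and b.

    Returns an estimated p-value under the null hypothesis that both
    samples come from the same distribution.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical per-window entropy values, for illustration only
normal_entropy = [3.1, 3.3, 3.0, 3.2, 3.4, 3.1, 3.2, 3.3]
attack_entropy = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0, 1.4, 1.2]
p_value = permutation_test(normal_entropy, attack_entropy)
print(p_value)
```

A small p-value would be evidence against "no difference". A permutation test makes no normality assumption, which seems relevant with only about 40 values, but whether it is the right choice here is part of what I am asking.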