How to calculate the Shannon entropy based on the HTTP header


The Shannon entropy is given by the formula:

$$H(T) = -\sum_{i} p(T_i) \log_2 p(T_i)$$

Where T_i is the data extracted from my network dump (dump.pcap).

The end of an HTTP header on a regular connection is marked by \r\n\r\n:

Example of an incomplete HTTP header (could be a denial of service attack):

My goal is to calculate the entropy of the number of packets with \r\n\r\n and without \r\n\r\n in order to compare them.
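As a rough sketch of what that comparison could look like, the two packet counts can be treated as a two-class distribution and fed into the entropy formula. The function name `binary_entropy` and the counts below are made up for illustration; they do not come from the dump.

```python
import math

def binary_entropy(n_complete, n_incomplete):
    """Shannon entropy (in bits) of a two-class split, given the class counts."""
    total = n_complete + n_incomplete
    if total == 0:
        raise ValueError("at least one packet is required")
    entropy = 0.0
    for count in (n_complete, n_incomplete):
        if count > 0:  # a class with zero packets contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

# Illustrative counts only (not taken from dump.pcap)
print(binary_entropy(50, 50))   # even split between the two classes
print(binary_entropy(100, 0))   # every header is complete
print(binary_entropy(90, 10))   # mostly complete headers
```

With only two classes the result always lies between 0 bits (all packets in one class) and 1 bit (an even split), so on its own it mostly tells you how mixed the traffic is.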

I can read the PCAP file like this:

import pyshark
pkts = pyshark.FileCapture('dump.pcap')
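One possible way to check each packet for the terminator, sketched under an assumption: that pyshark exposes the raw TCP payload as a colon-separated hex string (for example through `pkt.tcp.payload`). The helper names `payload_bytes` and `has_header_end` are hypothetical, and the pyshark loop is left as a comment because it has not been run against this dump.

```python
HEADER_END = b"\r\n\r\n"

def payload_bytes(hex_payload):
    """Convert a colon-separated hex string such as '47:45:54' to bytes."""
    return bytes.fromhex(hex_payload.replace(":", ""))

def has_header_end(hex_payload):
    """True if the decoded payload contains the blank line ending an HTTP header."""
    return HEADER_END in payload_bytes(hex_payload)

def to_hex(data):
    """Helper for the demo below: bytes -> colon-separated hex string."""
    return ":".join(f"{b:02x}" for b in data)

# Assumed usage with the capture above (untested sketch):
# for pkt in pkts:
#     if hasattr(pkt, "tcp") and hasattr(pkt.tcp, "payload"):
#         print(pkt.ip.src, has_header_end(pkt.tcp.payload))

complete = to_hex(b"GET / HTTP/1.1\r\nHost: 192.168.1.216\r\n\r\n")
incomplete = to_hex(b"GET / HTTP/1.1\r\nHost: 192.168.1.216\r\n")
print(has_header_end(complete), has_header_end(incomplete))
```

Note that a header split across several TCP segments would be reported as incomplete by a per-packet check like this one, so reassembly may matter for real traffic.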

I have already calculated the entropy based on IP addresses:

import numpy as np
import collections

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
prob = counts / counts.sum()
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
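As a check on the arithmetic: the sample list holds three distinct strings occurring 2, 2 and 1 times out of 5, so the probabilities are 0.4, 0.4 and 0.2. Worked by hand, with values rounded to three decimals:

```latex
\begin{aligned}
H &= -\left(0.4\log_2 0.4 + 0.4\log_2 0.4 + 0.2\log_2 0.2\right) \\
  &\approx 0.529 + 0.529 + 0.464 \\
  &\approx 1.522 \text{ bits}
\end{aligned}
```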

Any ideas? Is it possible, and does it make sense, to calculate the entropy based on the number of packets with \r\n\r\n and without \r\n\r\n? If so, how should the calculation be done?

The network dump is here: link

Some lines from it:

30  2017/246 11:20:00.304515    192.168.1.18    192.168.1.216   HTTP    339 GET / HTTP/1.1 


GET / HTTP/1.1
Host: 192.168.1.216
accept-language: en-US,en;q=0.5
accept-encoding: gzip, deflate
accept: */*
user-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0
Connection: keep-alive
content-type: application/x-www-form-urlencoded; charset=UTF-8
    
asked by anonymous 01.09.2017 / 20:51

1 answer


I do not know the structure of the packets returned by pyshark, but I imagine each one carries two pieces of information: the IP address and the contents of the packet. Assuming you have these two pieces of information in a dict, you could do something like:

pkgs = [
    {
        'ip': '127.0.0.1',
        'content': 'Im a http header\r\n\r\n<html><body>',
    },
    {
        'ip': '127.0.0.1',
        'content': 'Im a not a http header',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a http header\r\n\r\n<html><body>',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a not a http header',
    },
    {
        'ip': '127.0.0.2',
        'content': 'Im a not a http header too',
    }
]

def is_http(content):
    return '\r\n\r\n' in content

classified_pkgs = [(p['ip'], is_http(p['content'])) for p in pkgs]
# classified_pkgs is now:
# [('127.0.0.1', True),
#  ('127.0.0.1', False),
#  ('127.0.0.2', True),
#  ('127.0.0.2', False),
#  ('127.0.0.2', False)]

Then you compute the probabilities and the entropy the same way you did before:

import numpy as np
import collections

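# Count how often each (ip, has_header_end) pair occurs
counter = collections.Counter(classified_pkgs)
counts = np.array(list(counter.values()), dtype=float)

# Relative frequencies, then the Shannon entropy in bits
prob = counts / counts.sum()
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)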
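The calculation above measures the entropy of the joint (ip, flag) pairs, so it mixes the spread of source addresses with the complete/incomplete split. If only the split is of interest, one variation is to compute the entropy of the flag alone, overall and per source address. This is a sketch using the same sample data; the helper name `entropy_bits` is made up for the example.

```python
import math
from collections import Counter, defaultdict

# Same shape as classified_pkgs above: (ip, has_header_end) pairs
classified = [
    ("127.0.0.1", True),
    ("127.0.0.1", False),
    ("127.0.0.2", True),
    ("127.0.0.2", False),
    ("127.0.0.2", False),
]

def entropy_bits(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Entropy of the complete/incomplete flag over all packets
overall = entropy_bits(flag for _, flag in classified)

# Entropy of the flag for each source address separately
flags_by_ip = defaultdict(list)
for ip, flag in classified:
    flags_by_ip[ip].append(flag)
per_ip = {ip: entropy_bits(flags) for ip, flags in flags_by_ip.items()}

print(round(overall, 3))
for ip, h in per_ip.items():
    print(ip, round(h, 3))
```

A source that sends only incomplete headers would score 0 bits here, so the per-address counts themselves are probably worth inspecting alongside the entropy.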
    
06.09.2017 / 03:03