Pack strings using struct


I'm doing a basic exercise using the struct module, and I came across a problem: to pack a string, we have to give the struct.pack() method the number of characters it has, right? But what if the string is entered by the user? In that case I don't know how many characters it will have, so how do I pack it?

    
asked by anonymous 05.07.2015 / 05:36

1 answer


Well, how do you create a string in C? Unless you declare and initialize it at the same time, you must specify the size of the string, right?

You should do the same when packing a string with struct. After all, the struct module converts between Python values and C structs represented as Python bytes objects. In fact, if I'm not mistaken, C does not let you declare a struct member as a string of unspecified size. The closest thing is a C99 flexible array member, which must be the last member and is not counted by sizeof.

The size requirement makes more sense once you consider that someone will later need to unpack this struct, so they need to know its size. Otherwise they may read it wrongly and mix the bytes of one field with those of another.
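The fields in the sketch below (b"abc", b"wxyz") are made up for illustration. They show the problem of mixed fields. The bytes of a packed struct carry no field boundaries, so the reader must already know the sizes.

```python
import struct

# Two fields packed back to back: a 3-byte string followed by a 4-byte string.
packed = struct.pack("3s4s", b"abc", b"wxyz")
print(packed)  # b'abcwxyz'

# A reader that knows the real sizes gets both fields back intact.
print(struct.unpack("3s4s", packed))  # (b'abc', b'wxyz')

# A reader that guesses the sizes wrong gets no error, just mixed-up fields.
print(struct.unpack("5s2s", packed))  # (b'abcwx', b'yz')
```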

Now, about your problem: if you really cannot limit the user input (which you should do, for security), you could do something like this:

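import struct

str_ = input()

bytes = str_.encode()  # encode first: the size must be counted in bytes
tamanho = len(bytes) + 1  # +1 for the terminating null byte

formatacao = "{}s".format(tamanho)
pacote = struct.pack(formatacao, bytes)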
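The same steps can be wrapped in a function so they can be tried without input(). This is a minimal sketch. pack_variable_str is a hypothetical name, and UTF-8 is assumed explicitly (it is also str.encode()'s default). The function returns the format string along with the data, because whoever unpacks needs it.

```python
import struct

def pack_variable_str(text):
    """Hypothetical helper: pack a str whose length is only known at runtime."""
    data = text.encode("utf-8")         # measure the size in bytes, not characters
    fmt = "{}s".format(len(data) + 1)   # +1 leaves room for a trailing null byte
    return fmt, struct.pack(fmt, data)  # the reader will need fmt to unpack

fmt, packed = pack_variable_str("sabão")
print(fmt)     # 7s
print(packed)  # b'sab\xc3\xa3o\x00'
```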

Remember that the length of a string can be different from its representation in bytes:

>>> len("ç")
1
>>> len("ç".encode())
2
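This is why the size has to come from the encoded bytes. If the character count is used instead, struct silently truncates the data. For a multi-byte character, truncation splits it in half, leaving bytes that no longer decode. A small sketch with the same "ç":

```python
import struct

text = "ç"
data = text.encode()

wrong = struct.pack("{}s".format(len(text)), data)  # 1 byte: half of the character
right = struct.pack("{}s".format(len(data)), data)  # 2 bytes: the whole character

print(wrong)  # b'\xc3'
print(right)  # b'\xc3\xa7'

try:
    wrong.decode()
except UnicodeDecodeError as exc:
    print("cannot decode the truncated bytes:", exc.reason)
```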

Also, add 1 to the size so that the string can be properly terminated with a null byte, \0 (remember strings in C?).
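You don't have to append the \0 yourself. For the s format code, struct pads data shorter than the count with null bytes, so the extra byte comes out as the terminator. This is a minimal sketch with a made-up "abc" input. Reading up to the first null byte mimics how C treats the string.

```python
import struct

data = "abc".encode()
fmt = "{}s".format(len(data) + 1)  # "4s": 3 bytes of text + 1 for the terminator
packed = struct.pack(fmt, data)    # struct fills the extra byte with b"\x00"
print(packed)  # b'abc\x00'

(raw,) = struct.unpack(fmt, packed)
# C-style reading: the string ends at the first null byte
text = raw.split(b"\x00", 1)[0].decode()
print(text)  # abc
```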

If you can persist the format string, unpacking your data is easy.

However, since this packet may be read by another program, a better approach is to pack the string together with its size in bytes:

formatacao = "i{}s".format(tamanho)
pacote = struct.pack(formatacao, tamanho, bytes)

That way, whoever reads your struct knows that the first value is an integer telling how many of the following bytes belong to the string. The idea is to have a header with the information needed to read the data and a body with the data itself.
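Here is the header/body idea as a pair of functions. pack_str, unpack_str and HEADER are hypothetical names. Two assumptions differ from the code above. The header uses "!I" (4-byte unsigned, network byte order) instead of native "i", so the layout does not depend on the platform. The trailing null byte is also dropped, since the header already gives the exact length.

```python
import struct

HEADER = struct.Struct("!I")  # assumption: 4-byte unsigned length, big-endian

def pack_str(text):
    """Hypothetical helper: header (body length in bytes) followed by the UTF-8 body."""
    body = text.encode("utf-8")
    return HEADER.pack(len(body)) + body

def unpack_str(packet):
    """Hypothetical helper: read the header, then exactly that many body bytes."""
    (size,) = HEADER.unpack_from(packet, 0)
    start = HEADER.size
    body = packet[start:start + size]
    if len(body) != size:
        raise ValueError("packet is shorter than its header says")
    return body.decode("utf-8")

packet = pack_str("sabão")
print(packet)              # b'\x00\x00\x00\x06sab\xc3\xa3o'
print(unpack_str(packet))  # sabão
```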

Does that make sense? Here is a complete example:

Packing

>>> import struct
>>> entrada = input()
sabão
>>> bytes = entrada.encode()
>>> tamanho = len(bytes) + 1
>>> formatacao = "i{}s".format(tamanho)
>>> pacote = struct.pack(formatacao, tamanho, bytes)
>>> pacote
b'\x07\x00\x00\x00sab\xc3\xa3o\x00'
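The first four bytes of that output, \x07\x00\x00\x00, are tamanho = 7 in the machine's native byte order, which is little-endian here. Native byte order (and the native size of i) can differ between platforms. If another program will read the packet, it may be safer to fix both with a prefix such as < or >. This sketch compares the two with the same "sabão" input. The variable name data is only for illustration.

```python
import struct

data = "sabão".encode()
tamanho = len(data) + 1

le = struct.pack("<i{}s".format(tamanho), tamanho, data)  # little-endian, standard sizes
be = struct.pack(">i{}s".format(tamanho), tamanho, data)  # big-endian, standard sizes

print(le)  # b'\x07\x00\x00\x00sab\xc3\xa3o\x00'
print(be)  # b'\x00\x00\x00\x07sab\xc3\xa3o\x00'
```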

Unpacking (with format string persistence)

>>> tamanho, bytes = struct.unpack(formatacao, pacote)
>>> str_ = bytes.decode().strip('\x00')

Unpacking (without format string persistence)

First, we calculate how many bytes an integer takes, so we know how many bytes of our packet the size header consumes, and we read the size from them:

>>> fim_int = struct.calcsize('i')
>>> tamanho_str = struct.unpack('i', pacote[:fim_int])
>>> tamanho_str = tamanho_str[0]  # unpack always returns a tuple, hence the [0]

Next, using the size we just read, we build the format for the string and calculate which bytes it occupies:

>>> formatacao = '{}s'.format(tamanho_str)
>>> inicio_str = fim_int
>>> fim_str = inicio_str + tamanho_str

Finally, we get the string and clean it up:

>>> bytes = struct.unpack(formatacao, pacote[inicio_str:fim_str])
>>> str_ = bytes[0].decode().strip('\x00')
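The same header/body reading generalizes to several strings stored one after another. This sketch keeps the answer's layout (a native i holding the byte length plus 1, then the bytes with a trailing null). It swaps the slicing for struct.unpack_from with a running offset. pack_strings and read_strings are hypothetical names.

```python
import struct

def pack_strings(texts):
    """Hypothetical helper: pack each string as above, 'i' size followed by 'Ns' bytes."""
    out = b""
    for text in texts:
        data = text.encode()
        size = len(data) + 1  # +1 so struct appends a null terminator
        out += struct.pack("i{}s".format(size), size, data)
    return out

def read_strings(packet):
    """Hypothetical helper: walk the packet with unpack_from instead of slicing."""
    texts = []
    offset = 0
    int_size = struct.calcsize("i")
    while offset < len(packet):
        (size,) = struct.unpack_from("i", packet, offset)  # header: string size
        offset += int_size
        (raw,) = struct.unpack_from("{}s".format(size), packet, offset)  # body
        offset += size
        texts.append(raw.decode().strip("\x00"))
    return texts

packet = pack_strings(["sabão", "água", "ok"])
print(read_strings(packet))  # ['sabão', 'água', 'ok']
```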

Obviously this example is not very idiomatic Python in places, but I tried to make it as didactic as possible.

    
05.07.2015 / 08:13