Content management in Python

try:
    with open('valores.bin', 'r+b') as arq:
        n  = struct.unpack('i', arq.read(4))[0]
        arq.seek(0)
        for i in range(n):
            arq.seek(0)
            if isinstance(struct.unpack('i', arq.read(4)), int) and struct.unpack('i', arq.read(4)) < 10:
                arq.write(struct.pack('i', 0))
            elif isinstance(struct.unpack('f', arq.read(4)), float) and struct.unpack('f', arq.read(4)) > 9.0:
                arq.write(struct.pack('f', 1000.0))
except IOError:
    print('Erro ao abrir ou ao manipular o arquivo.')
    
asked by anonymous 30.04.2018 / 05:32

1 answer


There are a few things you need to understand when dealing with binary files like this:

  • Reading X bytes with read advances the file position by X bytes. This happens every time you call read.

  • seek moves you to the position you pass. So seek(0) sends the file position back to the beginning.

  • A binary file contains nothing but bytes, laid out one after another.
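To see the first two points in action, here is a minimal sketch that tracks the position with tell(). It uses an in-memory io.BytesIO with 12 zero bytes as a stand-in for a real file (an assumption made only to keep the example self-contained); the list name positions is illustrative.

```python
import io

# Stand-in for a real binary file: 12 zero bytes held in memory.
arq = io.BytesIO(bytes(12))

positions = [arq.tell()]      # position right after "opening"
arq.read(4)
positions.append(arq.tell())  # read(4) advanced the position by 4 bytes
arq.read(4)
positions.append(arq.tell())  # and it advances again on every call
arq.seek(0)
positions.append(arq.tell())  # seek(0) sent us back to the beginning

print(positions)  # [0, 4, 8, 0]
```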

In your case, for example: the first four bytes represent an integer that says how many (int, float) pairs the file contains, followed by four bytes for the first int, four bytes for the first float, and so on.

Let's suppose that our file has 2 pairs. The binary will look something like this:

>0010 iiii ffff iiii ffff

Here 0010 stands for the integer 2 in binary, iiii represents a 4-byte integer, and ffff a 4-byte float. The > arrow marks the read position of the file. When we open the file, it starts at position 0.
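If you want a file with this layout to experiment on, something like the sketch below could produce one. The sample values in pairs, the variable names, and the use of a temporary directory are all assumptions for illustration, not part of the question.

```python
import os
import struct
import tempfile

# Made-up sample data: two (int, float) pairs.
pairs = [(2, 2.5), (12, 12.5)]

# First the count, then each int followed by its float, 4 bytes apiece.
payload = struct.pack('i', len(pairs))
for inteiro, real in pairs:
    payload += struct.pack('i', inteiro) + struct.pack('f', real)

# Written to a temporary directory so nothing in the working dir is touched.
caminho = os.path.join(tempfile.mkdtemp(), 'valores.bin')
with open(caminho, 'wb') as arq:
    arq.write(payload)

# 4 bytes for the count + 2 * (4 + 4) bytes for the pairs
print(len(payload))
```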

Let's take a look at your code:

with open('valores.bin', 'r+b') as arq:
    n  = struct.unpack('i', arq.read(4))[0]
    arq.seek(0)
    for i in range(n):
        arq.seek(0)
        if isinstance(struct.unpack('i', arq.read(4)), int) and struct.unpack('i', arq.read(4)) < 10:
            arq.write(struct.pack('i', 0))
        elif isinstance(struct.unpack('f', arq.read(4)), float) and struct.unpack('f', arq.read(4)) > 9.0:
            arq.write(struct.pack('f', 1000.0))

The first problem is that, right before entering the loop, you send the read position back to the beginning of the file, which is not what we want. Once we have read the first integer and know how many pairs the file holds, we do not need to read those first 4 bytes again. The seek(0) is unnecessary.

That is: after n = struct.unpack('i', arq.read(4))[0], since we called read, the read position is this:

0010 >iiii ffff iiii ffff

We are already in position to start reading the values. If we call seek(0), we return to the first position:

>0010 iiii ffff iiii ffff

And we are no longer interested in reading 0010 , because we already know that the file has 2 pairs of values.

Then you can see some more problems inside the loop:

  • We call seek(0) at the beginning of each iteration. So not only do we go back to the initial 0010, which no longer interests us, but we also never move forward in the following iterations; even if the file did not start with 0010, we would always read the first pair of values.

  • We call arq.read(4) several times without saving the value. Remember that every read(x) advances the read position by x bytes, so we can only call it once before it moves on to the next item. You may want to save the result of read in a variable to avoid having to read the same value twice.

  • We check whether the result is an int after we have already interpreted it as an int. When we call struct.unpack with the argument 'i', we are saying to interpret those bytes as an integer, and it will return an integer regardless. The problem is that if we interpret a float as an integer, the int value will have nothing to do with the float.
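The last point can be checked directly. The sketch below packs the float 2.5 (an arbitrary sample value) and unpacks the same bytes both ways. It also shows a related detail: struct.unpack always returns a tuple, so an isinstance check against int or float on its raw result is never true unless you index it with [0].

```python
import struct

# Four bytes that really hold a float (2.5 is just a sample value).
raw_float = struct.pack('f', 2.5)

as_tuple = struct.unpack('f', raw_float)    # unpack returns a tuple
as_float = as_tuple[0]                      # the actual float inside it
misread = struct.unpack('i', raw_float)[0]  # same bytes read as an int

print(as_tuple)                       # a 1-element tuple
print(isinstance(as_tuple, float))    # False: it is a tuple, not a float
print(as_float)                       # 2.5
print(misread)                        # an integer unrelated to 2.5
```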

  • What I recommend is to get the most basic part working first: let's read the file and make sure the positions are correct:

    import struct

    with open('valores.bin', 'r+b') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        print(n)
        for i in range(n):
            meu_inteiro = struct.unpack('i', arq.read(4))
            print(meu_inteiro)
    
            meu_float = struct.unpack('f', arq.read(4))
            print(meu_float)
        # Output: 3 (2,) (2.5,) (12,) (12.5,) (1337,) (314.70001220703125,)
    

    In my case, those are the values I had stored, so the reading is correct. Note that we do not use seek yet, because it is not needed for purely sequential reading. We will only need it to overwrite values. That is:

  • We read the first iiii value and put it in the variable meu_inteiro.

    0010 >iiii ffff iiii ffff
    ->
    0010 iiii >ffff iiii ffff
    
  • We compare meu_inteiro (without doing another read) with some value. If it is less than 10, we move back the necessary number of positions and overwrite it with 0:

    0010 iiii >ffff iiii ffff
    -> (seek to go back to the start of the int)
    0010 >iiii ffff iiii ffff
    -> (write the new int 0)
    0010 iiii >ffff iiii ffff
    (we proceed to read the float)
    
  • seek has 3 modes of operation, selected by the second argument. The first and default mode sets the absolute read/write position of the file. That is, seek(4) puts the position at byte 4. If we pass 1 as the second argument, the offset is relative to the current position. That is, seek(4, 1) pushes the position 4 bytes ahead of the current one; if we are at position 4, it goes to 8. The third mode, passing 2, is relative to the end of the file, but that does not matter to us.

    Since we want to go back 4 bytes when we are about to write, we should use seek(-4, 1) .
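The three seek modes can be tried out quickly. This is a small sketch on an in-memory io.BytesIO of 20 zero bytes (a stand-in chosen only so the example runs without a file on disk); the pos_* names are illustrative.

```python
import io

# 20 bytes: the same size as a count plus two (int, float) pairs.
arq = io.BytesIO(bytes(20))

arq.seek(4)          # mode 0 (default): absolute position
pos_abs = arq.tell()

arq.seek(4, 1)       # mode 1: 4 bytes ahead of the current position
pos_fwd = arq.tell()

arq.seek(-4, 1)      # mode 1 with a negative offset: 4 bytes back
pos_back = arq.tell()

arq.seek(-4, 2)      # mode 2: 4 bytes before the end of the file
pos_end = arq.tell()

print(pos_abs, pos_fwd, pos_back, pos_end)  # 4 8 4 16
```

The negative offset in mode 1 is the one that matters here: it steps back over the value that was just read so that the next write lands on top of it.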

    Then your code would look like this:

    import struct
    
    try:
        with open('valores.bin', 'r+b') as arq:
            n = struct.unpack('i', arq.read(4))[0]
            print(n)
            for i in range(n):
                meu_inteiro = struct.unpack('i', arq.read(4))[0]
                print(meu_inteiro)
                if meu_inteiro < 10:
                    arq.seek(-4, 1)  # Go back to the start of the iiii to be overwritten
                    arq.write(struct.pack('i', 0))
    
                meu_float = struct.unpack('f', arq.read(4))[0]
                print(meu_float)
                if meu_float > 9.0:
                    arq.seek(-4, 1)  # Go back to the start of the ffff to be overwritten
                    arq.write(struct.pack('f', 1000.0))
    except IOError:
        print('Erro ao abrir ou ao manipular o arquivo.')
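To check the result end to end, one option is to build a small test file, run the same logic on it, and read it back. The sketch below does that in a temporary directory. The function names corrigir_valores and ler_pares are hypothetical helpers introduced here, and the sample values are the ones from my test file above.

```python
import os
import struct
import tempfile


def corrigir_valores(caminho):
    """Same read/seek/write logic as above, wrapped in a function."""
    with open(caminho, 'r+b') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        for _ in range(n):
            meu_inteiro = struct.unpack('i', arq.read(4))[0]
            if meu_inteiro < 10:
                arq.seek(-4, 1)  # step back over the int just read
                arq.write(struct.pack('i', 0))
            meu_float = struct.unpack('f', arq.read(4))[0]
            if meu_float > 9.0:
                arq.seek(-4, 1)  # step back over the float just read
                arq.write(struct.pack('f', 1000.0))


def ler_pares(caminho):
    """Read the count, then each (int, float) pair, 8 bytes at a time."""
    with open(caminho, 'rb') as arq:
        n = struct.unpack('i', arq.read(4))[0]
        return [struct.unpack('if', arq.read(8)) for _ in range(n)]


# Sample data: 3 pairs, written in the layout described earlier.
pares = [(2, 2.5), (12, 12.5), (1337, 314.7)]
caminho = os.path.join(tempfile.mkdtemp(), 'valores.bin')
with open(caminho, 'wb') as arq:
    arq.write(struct.pack('i', len(pares)))
    for inteiro, real in pares:
        arq.write(struct.pack('i', inteiro) + struct.pack('f', real))

corrigir_valores(caminho)
resultado = ler_pares(caminho)
print(resultado)  # [(0, 2.5), (12, 1000.0), (1337, 1000.0)]
```

Only the int below 10 and the floats above 9.0 change; every other value keeps its original bytes, and the file size stays the same because each write replaces exactly 4 bytes.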
    
        
    01.05.2018 / 00:06