Text file manipulation - Separating blocks based on a pattern

I have a text file with some information. The first character of each line identifies the kind of line, following this pattern:

1 - Starting block

2 - Information

3 - Description of line 2

For example:

190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN

I need to separate the blocks into different variables, knowing that each block starts at a line beginning with 1 and ends just before the next line beginning with 1. For the example above, the result would look like this:

a = '190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN'

b = '120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN'

I've tried combining while + readline + startswith, but I could not get it to work.

asked by anonymous 08.05.2017 / 15:56

1 answer

You can use the following regular expression:

^1 - where ^ matches the beginning of a line and is followed by the character you are looking for, 1 in this case. Note that in Python ^ matches at the beginning of every line only when the re.MULTILINE flag is set; otherwise it matches only at the start of the whole string. In the example below I split the text with the regular expression and then rebuild each item by putting the 1 back in front of it.

To build and test the regular expression I like to use Rubular.
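
As a quick illustration of how the ^ anchor behaves in Python (a small sketch with a made-up three-line string, not the data from the question): without re.MULTILINE only the start of the whole string matches, while with the flag every line start is a candidate.

```python
import re

# Hypothetical sample: only the first and the last line start with "1".
sample = "1 first block\n2 info\n1 second block"

# Without re.MULTILINE, ^ matches only at the start of the whole string.
print(len(re.findall(r"^1", sample)))                      # 1

# With re.MULTILINE, ^ also matches right after each newline.
print(len(re.findall(r"^1", sample, flags=re.MULTILINE)))  # 2
```
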

Code: repl

import re
import textwrap

FINDER = "1"
# ^1 combined with re.MULTILINE: FINDER at the beginning of any line
# http://rubular.com/r/Sx8PL2qdR8
REGEX = '^' + re.escape(FINDER)

if __name__ == '__main__':
    # dedent() removes the literal's indentation so every line starts at column 0
    content = textwrap.dedent("""\
        190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
        22343 34234234 324234 324234234 234234 342324989856475959596    
        3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
        22343 34234234 324234 324234234 234234 342324989856475959596    
        3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
        22343 34234234 324234 324234234 234234 342324989856475959596    
        3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
        120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
        22343 34234234 324234 324234234 234234 342324989856475959596    
        3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN""")

    # Split at every line that starts with FINDER (the match consumes FINDER)
    splitted = re.split(REGEX, content, flags=re.MULTILINE)

    # Drop the first item if it is empty
    if splitted[0] == '':
        splitted = splitted[1:]

    # Rebuild each item by putting the initial character back in front
    result = []
    for split in splitted:
        result.append(FINDER + split.rstrip('\n'))

    # Show the result
    print(len(result))
    for r in result:
        print("----> " + r)
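
For comparison, the same grouping can be done without a regular expression, closer to the readline + startswith idea mentioned in the question. This is only a minimal sketch: split_blocks and the shortened sample lines are hypothetical, made up for illustration, and are not part of the original answer.

```python
def split_blocks(lines, marker="1"):
    """Group lines into blocks; a new block starts at each line beginning with marker."""
    blocks = []
    for line in lines:
        line = line.rstrip("\n")
        # Lines seen before the first marker line are collected in a block of their own.
        if line.startswith(marker) or not blocks:
            blocks.append([line])      # start a new block
        else:
            blocks[-1].append(line)    # continue the current block
    return ["\n".join(block) for block in blocks]


# Shortened, made-up sample in the same 1/2/3 layout as the question's data.
sample = [
    "190845 3890580235203895\n",
    "22343 34234234\n",
    "3SHDSHFUHDSF\n",
    "120845 3890580235203895\n",
    "22343 34234234\n",
    "3SHDSHFUHDSF\n",
]

blocks = split_blocks(sample)
print(len(blocks))  # 2
for block in blocks:
    print("----> " + block)
```

Because the function only iterates over its argument, an open file object could be passed instead of the list, so the file would be processed line by line without loading it all at once.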
    
08.05.2017 / 16:59