Replace words

3

I would like to know how to join the broken lines of a txt file. For example, the lines are split by \n even though they are still part of the same sentence:

SOCIAL HISTORY:Denies tobacco or alcohol use.   
PHYSICAL EXAMINATION: 
VITAL SIGNS: Age 34, blood pressure 128/78, pulse 70, temperature is 97.8,
weight is 207 pounds, and height is 5 feet 7 inches.  
GENERAL: The patient is healthy appearing; alert and oriented to person, place
and time; responds appropriately; in no acute distress.  
HEAD: Normocephalic. No masses or lesions noted.  
FACE: No facial tenderness or asymmetry noted. 

or whole blocks of text like:

A complete refractive work-up was performed today, in which we found a mild
change in her distance correction, which allowed her the ability to see 20/70
in the right eye and 20/200 in the left eye. With a pair of +4 reading
glasses, she was able to read 0.5M print quite nicely. I have loaned her a
pair of +4 reading glasses at this time and we have started her with fine-
detailed reading. She will return to our office in a matter of two weeks and
we will make a better determination on what near reading glasses to prescribe
for her. I think that she is an excellent candidate for low vision help. I am
sure that we can be of great help to her in the near future. 

I want each of these to end up on a single line.

Each line should correspond to its identifier, like this:

IDENTIFIER: SENTENCE WITHOUT LINE BREAK
IDENTIFIER: SENTENCE WITHOUT LINE BREAK

so that each identifier is on one line only. The words differ from entry to entry, so a simple replace won't work. Another problem is that some of the txt files are not broken at all and have several entries run together:

IDENTIFIER: SENTENCE WITHOUT LINE BREAK. IDENTIFIER: SENTENCE WITHOUT LINE BREAK. IDENTIFIER: SENTENCE WITHOUT LINE BREAK

I was using a regex, but it is not working.

    
asked by anonymous 14.11.2016 / 14:27

1 answer

2

I think I understand. Going by the example in the question, you can treat a line as the start of a new entry when its first word is uppercase and the line contains a ':'. Every other line is a continuation of the previous entry.

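with open('tests.txt', 'r') as f:
    # note: calling f.read() before this point would consume the file
    # and leave nothing for the loop below
    lines = [i.strip() for i in f]  # strip line breaks and surrounding spaces
    text = ''
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            if words[0].isupper() and ':' in line:
                # a new "ID:" entry starts here, so begin a new output line
                text += '\n{}'.format(line)
                continue
            # continuation of the previous entry: join with a space
            text += ' ' + line
    text = text.strip()  # drop the leading line break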

The final text ends up in the variable text.
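To try the idea without a file on disk, the same check can be wrapped in a function that takes the raw text as a string. This is only a sketch: the name join_broken_lines is made up for illustration, and joining continuation lines with a single space is my assumption about the output you want.

```python
def join_broken_lines(raw):
    """Join continuation lines onto the preceding 'LABEL:' line."""
    entries = []
    for line in raw.splitlines():
        line = line.strip()
        words = line.split()
        if not words:
            continue  # ignore blank lines
        if words[0].isupper() and ':' in line:
            entries.append(line)  # a new labelled entry starts here
        elif entries:
            entries[-1] += ' ' + line  # continuation of the previous entry
        else:
            entries.append(line)  # text that appears before any label
    return '\n'.join(entries)


# Sample taken from the question
sample = (
    "VITAL SIGNS: Age 34, blood pressure 128/78, pulse 70, temperature is 97.8,\n"
    "weight is 207 pounds, and height is 5 feet 7 inches.  \n"
    "HEAD: Normocephalic. No masses or lesions noted.  \n"
)
print(join_broken_lines(sample))
```

With this sample the two VITAL SIGNS lines come out as one line, followed by the HEAD line. Note that a word hyphenated across a break (like "fine-" / "detailed" in the second example) would still be joined with a space.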

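Here's another way to do it. First we check whether there is a ':' in the line, then we split the line on ':' and check whether everything before the first ':' is uppercase. Unlike the first version, this also requires labels with several words (such as PHYSICAL EXAMINATION) to be entirely uppercase: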

with open('tests.txt', 'r') as f:  # open and read the file
    lines = (i.strip() for i in f)  # strip the line break from every line
    text = ''
    for line in lines:
        if ':' in line:
            expression = line.split(':')[0]  # keep what comes before the ':'
            if expression.isupper():  # check that it is uppercase
                text += '\n{}'.format(line)
                continue
        if line:  # skip blank lines
            text += ' ' + line  # continuation: join with a space
    text = text.strip()  # drop the leading line break
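Neither loop above handles the files where several entries sit on the same line. For those, a regular expression might work better: flatten all whitespace first, then start a new line before every uppercase label that ends in ':'. This is a rough sketch, not a tested solution for your real files. The function name one_entry_per_line is hypothetical, and the pattern assumes labels consist only of ASCII capitals and spaces, so accented labels would need a wider character class.

```python
import re

# A label is a run of ASCII capitals and spaces followed by ':'.
# The lookbehind stops us from splitting inside a multi-word label
# such as "VITAL SIGNS:" (the space there is preceded by a capital).
LABEL_START = re.compile(r'(?<![A-Z])\s+(?=[A-Z][A-Z ]*:)')


def one_entry_per_line(raw):
    """Put every 'LABEL: text' entry on its own line."""
    flat = ' '.join(raw.split())  # collapse line breaks and repeated spaces
    return LABEL_START.sub('\n', flat)


raw = (
    "HEAD: Normocephalic. No masses\n"
    "or lesions noted. FACE: No facial tenderness. "
    "PHYSICAL EXAMINATION: VITAL SIGNS: Age 34."
)
print(one_entry_per_line(raw))
```

The same function covers both kinds of file, because the original line breaks are discarded before the labels are located. The trade-off is that any uppercase word followed by ':' inside a sentence would also be treated as a label.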
    
14.11.2016 / 16:35