How to check if the txt file has a space in the last line

I have a script here that looks for all the .txt files in a folder and then joins them into one file.

The problem is that the files do not end consistently: some have a "\n" on the last line and some do not, so in the joined file a line does not always start directly below the previous one, which causes errors when I import the final .txt.

Would it be possible to check whether the last line of a file has a "\n", delete it if it does, and add a "\n" if it does not?

My files are in this format:

00000011098720150131379000100011
00000021098720150131379000400011
00000021098720150131379000400011

Here is the code:

import os
import glob

found = False
source_folder = None

# Keep asking until the user enters an existing directory
while not found:
    source_folder = str(input("Adicione o diretório com os arquivos."))
    print(source_folder)
    if not os.path.isdir(source_folder):
        print(source_folder, 'A pasta não foi encontrada.')
    else:
        print("Pasta encontrada! ")
        found = True

os.chdir(source_folder)

read_files = glob.glob("*.txt")
print(read_files)

arq = str(input("Adicione o nome do arquivo: "))

# Copy the raw bytes of every input file into the output file
with open(arq, "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
    
asked by anonymous 09.11.2017 / 14:09

2 answers

A simple way, since you read each file as a single string, is to use the strip method, which removes all whitespace characters at the beginning and end of the string.

With this, blank lines at the end will be removed, along with the \n after the last line, which is still needed, so we add it back.

In short, you only need to rewrite this line:

outfile.write(infile.read())

leaving it like this (the files are opened in binary mode, so the newline has to be a bytes literal):

outfile.write(infile.read().strip() + b"\n")
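
To see that change in isolation, here is a minimal sketch. It assumes the data is read as bytes, as in the question's binary-mode code; the helper name normalize_ending and the sample contents are made up for illustration. Unlike the one-liner, it returns nothing for an empty file, so an empty input does not leave a blank line in the output.

```python
def normalize_ending(data: bytes) -> bytes:
    """Strip surrounding whitespace and end the data with exactly one newline."""
    stripped = data.strip()
    # An empty or whitespace-only file contributes nothing to the output.
    if not stripped:
        return b""
    return stripped + b"\n"


# One input ends with extra blank lines, the other has no final newline.
with_extra_newlines = normalize_ending(b"0001\n0002\n\n\n")
without_newline = normalize_ending(b"0001\n0002")

# Both inputs end up with the same normalized content.
print(with_extra_newlines == without_newline)
```

Note that strip also removes whitespace at the very start of the file; if leading spaces on the first line matter, rstrip would be the safer call.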

If the blank lines were in the middle of the file instead of at the end, you would have to iterate line by line and drop the blank ones. Thanks to Python's generator expressions, this also fits on a single line:

outfile.writelines(line for line in infile if line.strip())

That is all: the writelines method expects an iterable, which here is the generator expression between the parentheses. The for of this expression, in turn, uses the input file as an iterator, taking it line by line, and the filter expression after the if discards the blank lines: if a line has only spaces and the \n, strip turns it into an empty string, which has a false Boolean value, so it is dropped from the generator.
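
The filter can be tried without real files. This is only a sketch: it uses io.StringIO objects as stand-ins for infile and outfile, and the sample lines are invented.

```python
import io

# In-memory stand-ins for the input and output files (hypothetical sample data).
infile = io.StringIO("0001\n\n0002\n   \n0003\n")
outfile = io.StringIO()

# Same expression as above: keep only lines that are non-empty after strip().
outfile.writelines(line for line in infile if line.strip())

result = outfile.getvalue()
print(repr(result))  # prints '0001\n0002\n0003\n'
```

The kept lines are written unchanged, so a final line that has no \n of its own would still need one added.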

    
09.11.2017 / 14:16

If the intention is to work with text files, there is no reason to open the input and output files in binary mode.

Here is a tested solution that can "concatenate" all files with .txt extension contained in a given directory into a single file, ignoring the blank lines:

import os
import glob
import sys

source_folder = input("Entre com o diretorio de origem: ")

try:
    os.chdir(source_folder)
except FileNotFoundError:
    print("Diretorio nao encontrado: '%s'" % source_folder)
    sys.exit(1)

read_files = glob.glob("*.txt")
print(read_files)

arq = input("Entre com o nome do arquivo de saida: ")

with open(arq, "w") as outfile:              # Open the output file for writing
    for f in read_files:                     # For each input file...
        with open(f, "r") as infile:         # Open the input file for reading
            for ln in infile:                # For each line of the input file...
                if ln.strip():               # Skip blank lines
                    # Write the line, making sure it ends with exactly one "\n"
                    outfile.write(ln.rstrip("\n") + "\n")

arquivo1.txt

00000011098720150131379000101528
00000011098720150131379000101561
00000011098720150131379000101594
00000011098720150131379000101627

00000011098720150131379000101660
00000011098720150131379000101693
00000011098720150131379000101726
00000011098720150131379000101759

00000011098720150131379000101792
00000011098720150131379000101825
00000011098720150131379000101858

arquivo2.txt

00000011098720150131379000108227
00000011098720150131379000108260
00000011098720150131379000108293

00000011098720150131379000108326
00000011098720150131379000108359
00000011098720150131379000108392
00000011098720150131379000108425

Testing:

$ python3 teste.py 
Entre com o diretorio de origem: /tmp
['arquivo1.txt', 'arquivo2.txt']
Entre com o nome do arquivo de saida: saida.txt

saida.txt

00000011098720150131379000101528
00000011098720150131379000101561
00000011098720150131379000101594
00000011098720150131379000101627
00000011098720150131379000101660
00000011098720150131379000101693
00000011098720150131379000101726
00000011098720150131379000101759
00000011098720150131379000101792
00000011098720150131379000101825
00000011098720150131379000101858
00000011098720150131379000108227
00000011098720150131379000108260
00000011098720150131379000108293
00000011098720150131379000108326
00000011098720150131379000108359
00000011098720150131379000108392
00000011098720150131379000108425
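
The same behaviour can also be wrapped in a function and checked against temporary files instead of /tmp. This is a sketch under assumptions, not part of the tested script: the name merge_text_files, the sorted file order, and the short sample contents are all hypothetical. It also re-adds the newline to a final line that lacks one, so the first line of the next file always starts on its own line.

```python
import tempfile
from pathlib import Path


def merge_text_files(folder: Path, output: Path) -> None:
    """Concatenate every .txt file in folder into output, skipping blank lines."""
    # Build the input list first and never read the output file itself.
    sources = sorted(
        p for p in folder.glob("*.txt") if p.resolve() != output.resolve()
    )
    with output.open("w") as outfile:
        for source in sources:
            with source.open("r") as infile:
                for line in infile:
                    if line.strip():
                        # Guarantee one newline, even on a final line without one.
                        outfile.write(line.rstrip("\n") + "\n")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "arquivo1.txt").write_text("0001\n0002\n\n")  # trailing blank line
    (folder / "arquivo2.txt").write_text("0003\n\n0004")    # no final newline
    out = folder / "saida.txt"
    merge_text_files(folder, out)
    merged = out.read_text()

print(merged, end="")
```

Excluding the output path from the inputs matters when the script is run twice in the same folder, since saida.txt would otherwise match *.txt on the second run.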
    
09.11.2017 / 22:04