I'm new to Python and stuck on a problem I can't find a solution to. I have a folder with about 10,000 .txt files (written in widely varying formats). I need to extract the FIRST sequence of 17 digits, which appears in the first lines of each file, and rename the file with the extracted sequence.
This sequence sometimes appears as one unbroken run of digits and other times separated by periods and a hyphen (e.g. 00273200844202003 or 00588.2007.011.02.00-9). PS: The text contains other numeric sequences, some with 17 digits and some with a different length, but the one I need is always the first 17-digit sequence that appears.
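To make the matching rule concrete, here is a minimal sketch of one way it could be expressed with the standard `re` module. It assumes that the only separators inside a sequence are single periods or hyphens, and that a candidate counts only if it has exactly 17 digits once the separators are removed. The names `CANDIDATE` and `first_17_digit_sequence` are hypothetical, chosen just for illustration.

```python
import re

# A candidate is one or more digit groups joined by single periods or hyphens,
# e.g. "00273200844202003" or "00588.2007.011.02.00-9".
# Assumption: no other separators (spaces, slashes) occur inside the sequence.
CANDIDATE = re.compile(r"\d+(?:[.-]\d+)*")


def first_17_digit_sequence(text):
    """Return the first candidate with exactly 17 digits, or None if there is none."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # drop periods and hyphens
        if len(digits) == 17:
            return digits
    return None


print(first_17_digit_sequence("Ref. 12.05.2020, no. 00588.2007.011.02.00-9"))
```

Checking the digit count after stripping the separators, rather than inside the pattern, means that shorter runs such as dates and longer runs of 18 or more digits are skipped instead of being partially matched.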
I stored the current names of the documents in a list and tried to find the sequence of numbers in the text using the NLTK package, but without success. This is what I have so far:
import os

pasta_de_documentos = r'C:\Users\mateus.ferreira\Desktop\Estudos\Python\Doc_Classifier\TXT'
documentos = os.listdir(pasta_de_documentos)  # current file names in the folder
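For the renaming step, a rough sketch of what the loop over the folder might look like is shown here. Several things in it are assumptions rather than known facts about the files: the sequence is assumed to sit within the first 20 lines (`max_lines`), the encoding is unknown so undecodable bytes are ignored, and a file is skipped when the target name is already taken. The helper names `first_17_digit_sequence` and `rename_by_sequence` are hypothetical.

```python
import re
from itertools import islice
from pathlib import Path

# Digit groups joined by single periods or hyphens (assumed separators).
CANDIDATE = re.compile(r"\d+(?:[.-]\d+)*")


def first_17_digit_sequence(text):
    """Return the first candidate with exactly 17 digits, or None."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if len(digits) == 17:
            return digits
    return None


def rename_by_sequence(folder, max_lines=20, dry_run=True):
    """Return (old_name, new_name) pairs; rename on disk only if dry_run is False."""
    planned = []
    taken = set()  # target names already claimed in this run
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: encodings vary, so undecodable bytes are simply ignored.
        with open(path, encoding="utf-8", errors="ignore") as handle:
            head = "".join(islice(handle, max_lines))  # only the first lines
        digits = first_17_digit_sequence(head)
        if digits is None:
            continue  # no 17-digit sequence found near the top
        target = path.with_name(digits + ".txt")
        if target.exists() or target.name in taken:
            continue  # do not overwrite a file that already has this name
        taken.add(target.name)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling `rename_by_sequence(pasta_de_documentos)` with the default `dry_run=True` only returns the planned renames, which makes it possible to inspect them before passing `dry_run=False` on 10,000 files.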
If someone knows a better approach or can point me to a way to keep attacking the problem, I'd be grateful. (I'm using Python 3.)