I made this script to read a TXT file, find a 20-digit sequence in the text, and rename the file with the digit sequence found.
I used replace to remove all the characters that appear between the digits, but for some reason the hyphen was not removed when the file was renamed.
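import os
import re

# path_txt is the folder that holds the TXT files; it must be defined before this point
name_files5 = os.listdir(path_txt)
for TXT in name_files5:
    # Read the file and look for 20 digits, allowing separators between them
    with open(os.path.join(path_txt, TXT), "r") as content:
        search = re.search(r'(?:\d(?:[\s,.\-\xAD_]|(?:\r)|(?:\n))*){20}', content.read())
    # The file is closed at this point, so it can be renamed
    if search is not None:
        name5 = search.group(0)
        # Strip the separators so that only the digits remain
        name5 = name5.replace("\n", "")
        name5 = name5.replace("\r", "")
        name5 = name5.replace("n", "")
        name5 = name5.replace("r", "")
        name5 = name5.replace("-", "")
        name5 = name5.replace("\\", "")
        name5 = name5.replace("/", "")
        name5 = name5.replace(".", "")
        name5 = name5.replace(" ", "")
        # Add a numeric suffix so an existing file is never overwritten
        fp = os.path.join("20_digitos", name5 + "_%d.txt")
        postfix = 0
        while os.path.exists(fp % postfix):
            postfix += 1
        os.rename(
            os.path.join(path_txt, TXT),
            fp % postfix
        )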
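For comparison, the chain of replace calls above could be collapsed into a single substitution that drops everything except the digits. This is only a minimal sketch, assuming the new file name should contain nothing but the 20 digits; digits_only is a hypothetical helper name that is not part of the script.

```python
import re

def digits_only(text):
    """Return text with every character other than 0-9 removed.

    Hypothetical helper: it does not depend on knowing in advance
    which separator characters appear between the digits.
    """
    return re.sub(r"[^0-9]", "", text)

# The sequence from the example, typed here with ordinary ASCII separators
print(digits_only("0001018-88.2011.5.02.0002"))
```

Because the pattern lists what to keep rather than what to remove, a separator that none of the replace calls covers is removed as well.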
I wrote similar code to find sequences with more or fewer digits, using replace in the same way (including for the hyphen), and it worked without problems.
Edit: here is an example of how the sequence appears in the text and how the file was renamed. The "_0" suffix is just a counter that distinguishes files when one with the same name already exists.
As it appears in the text:
0001018-88.2011.5.02.0002
How the file was renamed:
0001018-8820115020002_0
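One way to check which character actually survived is to list the code point of every non-digit left in the new name. This is a minimal sketch, assuming the leftover character may be something other than the ASCII hyphen: the regex also matches \xAD (the soft hyphen), which none of the replace calls remove. describe_non_digits is a hypothetical helper name, and the sample string below is typed with an ASCII hyphen, so the real file name returned by os.listdir should be passed in instead.

```python
import unicodedata

def describe_non_digits(text):
    """Return (character, code point, Unicode name) for each non-digit in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isdigit()
    ]

# Sample input only; use the actual renamed file name to see the real character
for item in describe_non_digits("0001018-8820115020002"):
    print(item)
```

If the reported code point is not U+002D, the character is not the one that replace("-", "") looks for.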