I need to read TXT files and extract from them the names of people and their respective roles in the text. The files are minutes of hearings, in which I must find the names of the parties (Claimant and Respondent) and the names of the parties' attorneys. I wrote find_between to find a string between two substrings, and then save the result in objeto_processo. Skimming the files by eye, I noticed that most of them contain a passage that follows a standard pattern, in which the name of each party is followed by the name of the lawyer who represents it.
Here are some examples of text I'm using in the drive: link
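import os

caminho = 'temp'
lista_de_nomes = os.listdir(caminho)
objeto_processo = {}

def find_between(text, first, last):
    # Return the text between `first` and the next `last`, or "" if either is missing.
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        return ""

for txt in lista_de_nomes:
    # os.path.join builds the file path portably instead of concatenating a backslash.
    with open(os.path.join(caminho, txt), "r") as content:
        text = content.read()
    partes = find_between(text, "preposto", "/sp")
    # The same key is reused, so each file overwrites the previous value.
    objeto_processo["Partes&Avogados"] = partes
    print(objeto_processo)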
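To make the behaviour of find_between concrete, here is a minimal sketch run on a single made-up line. The sample text and the names in it are invented for illustration and are not taken from the real minutes. It also shows that the search is case-sensitive, so a file that writes "/SP" instead of "/sp" would return an empty string.

```python
def find_between(text, first, last):
    """Return the text between the first `first` and the next `last`."""
    try:
        start = text.index(first) + len(first)
        end = text.index(last, start)
        return text[start:end]
    except ValueError:
        # Raised by str.index when either marker is absent.
        return ""

# Hypothetical sample line, not from the real files.
sample = "presente o preposto Fulano de Tal, advogado Beltrano, OAB 12345/sp. Nada mais."

found = find_between(sample, "preposto", "/sp")
missing = find_between(sample, "preposto", "/SP")  # wrong case: no match

print(repr(found))
print(repr(missing))
```

The returned string keeps the leading space and still mixes the party and the lawyer, so it would need further splitting before it can fill separate columns.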
What I want to know is how I can extract this information from objeto_processo and transfer it to an Excel spreadsheet containing, for example, the following columns:
Claimant | Claimant's Attorney | Respondent | Respondent's Attorney
PS: Sometimes the parties are companies rather than individuals, so company names appear instead of people's names.
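For reference, once the four values are known for each file, the target layout could be produced along these lines. This is only a sketch using the standard csv module (a CSV file opens in Excel). The column labels, the helper name write_rows, the output file name and the sample row are all invented placeholders, and the step that splits the extracted text into four fields is not shown.

```python
import csv
import io

# Placeholder column labels for the desired spreadsheet layout.
COLUMNS = ["Claimant", "Claimant's Attorney", "Respondent", "Respondent's Attorney"]

def write_rows(rows, handle):
    """Write a header plus one line per dict in `rows` to a file-like object."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical row; a company name works the same way as a person's name.
rows = [
    {
        "Claimant": "Fulano de Tal",
        "Claimant's Attorney": "Beltrano",
        "Respondent": "Empresa Exemplo Ltda",
        "Respondent's Attorney": "Sicrano",
    },
]

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (utf-8-sig helps Excel detect the encoding):
# with open("partes.csv", "w", newline="", encoding="utf-8-sig") as f:
#     write_rows(rows, f)
```

Each file would contribute one dict to rows, so the spreadsheet ends up with one line per hearing.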