Break lines with regex

I'm having trouble separating lines from multiple txt files. The files follow a specific pattern, but some of them do not respect it. Here are two of the 5000 files I'm trying to read:

SAMPLE TYPE / MEDICAL SPECIALTY:Dentistry  
SAMPLE NAME: Teeth Extraction & I&D - 1 

DESCRIPTION:Extraction of teeth #2 and #19 and incision and drainage (I&D) of intraoral and extraoral of left mandibular dental abscess.  
(Medical Transcription Sample Report)  

* * *

PREOPERATIVE DIAGNOSES:Carious teeth #2 and #19 and left mandibular dental abscess.  

POSTOPERATIVE DIAGNOSES: Carious teeth #2 and #19 and left mandibular dental abscess.  

PROCEDURES: Extraction of teeth #2 and #19 and incision and drainage of intraoral and extraoral of left mandibular dental abscess.  

ANESTHESIA:General, oral endotracheal.  

COMPLICATIONS:None.  

DRAINS:Penrose 0.25 inch intraoral and vestibule and extraoral.  

CONDITION: Stable to PACU.  

DESCRIPTION OF PROCEDURE: Patient was brought to the operating room, placed on the table in the supine position and after demonstration of an adequate plane of general anesthesia via the oral endotracheal route, patient was prepped and draped in the usual fashion for an intraoral procedure. In addition, the extraoral area on the left neck was prepped with Betadine and draped accordingly. Gauze throat pack was placed and local anesthetic was administered in the left lower quadrant, total of 3.4 mL of lidocaine 2% with 1:100,000 epinephrine and Marcaine 1.7 mL of 0.5% with 1:200,000 epinephrine. An incision was made with #15 blade in the left submandibular area through the skin and blunt dissection was accomplished with curved mosquito hemostat to the inferior border of the mandible. No purulent drainage was obtained. The 0.25 inch Penrose drain was then placed in the extraoral incision and it was secured with 3-0 silk suture. Moving to the intraoral area, periosteal elevator was used to elevate the periosteum from the buccal aspect of tooth #19. The area did not drain any purulent material. The carious tooth #19 was then extracted by elevator and forceps extraction. After the tooth was removed, the 0.25 inch Penrose drain was placed in a subperiosteal fashion adjacent to the extraction site and secured with 3-0 silk suture. The tube was then repositioned to the left side allowing access to the upper right quadrant where tooth #2 was then extracted by routine elevator and forceps extraction. After the extraction, the throat pack was removed. An orogastric tube was then placed by Dr. X, and stomach contents were suctioned. The pharynx was then suctioned with the Yankauer suction. The patient was awakened, extubated, and taken to the PACU in stable condition.   

KEYWORDS:dentistry, yankauer suction, orogastric tube, carious teeth, penrose drain, forceps extraction, dental abscess, incision, elevator, mandibular, dental, abscess, teeth, intraoral, extraction, drainage

Or

SAMPLE TYPE / MEDICAL SPECIALTY:Dermatology  
SAMPLE NAME: Epidermal Autograft 

DESCRIPTION:A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings. Epidermal autograft on Integra to the back and application of allograft to areas of the lost Integra, not grafted on the back.  
(Medical Transcription Sample Report)  

* * *

PREOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.  

POSTOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.  

PROCEDURES PERFORMED:  
1\. Epidermal autograft on Integra to the back (3520 cm2).  
2\. Application of allograft to areas of the lost Integra, not grafted on the
back (970 cm2).  

ANESTHESIA:General endotracheal.  

ESTIMATED BLOOD LOSS: Approximately 50 cc.  

BLOOD PRODUCTS RECEIVED: One unit of packed red blood cells.  

COMPLICATIONS:None.  

INDICATIONS:The patient is a 26-year-old male, who sustained a 60% total body surface area flame burn involving the head, face, neck, chest, abdomen, back, bilateral upper extremities, hands, and bilateral lower extremities. He has previously undergone total burn excision with placement of Integra and an initial round of epidermal autografting to the bilateral upper extremities and hands. His donor sites have healed particularly over his buttocks and he returns for a second round of epidermal autografting over the Integra on his back utilizing the buttock donor sites, the extent they will provide coverage.  

OPERATIVE FINDINGS:  
1\. Variable take of Integra, particularly centrally and inferiorly on the
back. A fair amount of lost Integra over the upper back and shoulders.  
2\. No evidence of infection.  
3\. Healthy viable wound beds prior to grafting.  

PROCEDURE IN DETAIL: The patient was brought to the operating room and positioned supine. General endotracheal anesthesia was uneventfully induced and an appropriate time out was performed. He was then repositioned prone and perioperative IV antibiotics were administered. He was prepped and draped in the usual sterile manner. All staples were removed from the Integra and the adherent areas of Silastic were removed. The entire wound bed was further prepped with scrub brushes and more Betadine followed by a sulfamylon solution. Hemostasis of the wound bed was ensured using epinephrine-soaked Telfa pads. Following dermal tumescence of the buttocks, epidermal autografts were harvested 8 one-thousandths of an inch using the air Zimmer dermatome. These grafts were passed to the back table where they were meshed 3:1. The donor sites were hemostased using epinephrine-soaked Telfa and lap pads. Once all the grafts were meshed, we brought them back up onto the field, positioned them over the wounds beginning inferiorly and moving cephalad where we had best areas of Integra engraftment. We were happy with the lie of the grafts and they were stapled into place. The grafts were then overlaid with Conformant 2, which was also stapled into place. Utilizing all of his buttocks skin, we did not have enough to cover his entire back, so we elected to apply allograft to the cephalad and a few areas on his flanks where we had had poor Integra engraftment. Allograft was thawed and meshed 1:1. It was then brought up onto the field, trimmed to fit and stapled into place over the wound. Once the entirety of the posterior wounds on his back were covered out with epidermal autograft or allograft sulfamylon soaked dressings were applied. Donor sites on his buttocks were dressed in Acticoat and secured with staples. He was then repositioned supine and extubated in the operating room having tolerated the procedure without any apparent complications. 
He was transported to PACU in stable condition.   

KEYWORDS:dermatology, flame burns, body surface area, epidermal autograft, autograft, integra, integra engraftment, wound, grafts, epidermal, allograft

In other words, each of my files has the structure: Keyword (uppercase, lowercase, or capitalized): explanation, which may span multiple lines, fit on a single line, or be an enumerated list.

I would like to put each of these blocks on the same line as its keyword. As the OPERATIVE FINDINGS keyword illustrates, a block of lines can sometimes come out as:

Operative things:1\. Variable take of Integra, particularly centrally and inferiorly on theback. A fair amount of lost Integra over the upper back and shoulders.2\. No evidence of infection.3\. Healthy viable wound beds prior to grafting.\nPROCEDURE IN DETAIL: The patient was brought to the operating room and positioned supine. General endotracheal anesthesia was uneventfully induced and an appropriate time out was performed. He was then repositioned prone and perioperative IV antibiotics were administered. He was prepped and draped in the usual sterile manner. All staples were removed from the Integra and the adherent areas of Silastic were removed. The entire wound bed was further prepped with scrub brushes and more Betadine followed by a sulfamylon solution. Hemostasis of the wound bed was ensured using epinephrine-soaked Telfa pads. Following dermal tumescence of the buttocks, epidermal autografts were harvested 8 one-thousandths of an inch using the air Zimmer dermatome. These grafts were passed to the back table where they were meshed 3:1. 

Note that these snippets can contain number:number (for example 3:1), so splitting on the colon alone is not enough. My code looks like this:

    # requires "import os" at module level
    def arrumaDadosPEP(self, inDir, outDir):
        with open(inDir, 'r') as f:  # open and read the file
            lines = (i.strip() for i in f.readlines())  # strip each line, removing all line breaks
            text = ''
            for line in lines:
                if ':' in line:
                    expression = line.split(':')[0]  # keep what comes before the ":"
                    if expression.isupper():  # check whether it is uppercase
                        text += '\n{}'.format(line)
                        continue
                text += line
        directory, file_name = os.path.split(inDir)
        file = open(outDir + file_name, 'w')
        file.write(text.replace('* * *', '').replace('Keywords:', '\nKEYWORDS:'))
        file.close()
        return text

But it does not pick up keywords that are lowercase, and it sometimes removes the space between words and adds spaces where there should be none. This is a method of a class.

    
asked by anonymous 15.11.2016 / 00:42

1 answer

EDIT: Fixed the issues listed in the comments.

I arrived at a solution to your problem with this regex: ^(?:[\w \/]*\:(?:(?!\n\n|\n[\w ]+\:).)*)|\* \* \*

How does the regex work?

  • First, it matches an alphanumeric pattern (spaces and / are also allowed) followed by : (the keyword);
  • Next, it takes every character after the : (the content of the keyword), stopping before two consecutive line breaks \n\n OR before a line break \n followed by alphanumeric text and a colon, [\w ]+\: (the next keyword).

or

  • It matches * * *

It is a bit tricky to explain here, but I think looking at the code makes it easier.
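To make the parts listed above easier to see, here is the same pattern written with Python's re.VERBOSE flag, one commented piece per line. This is only an illustrative sketch: the name BLOCK_RE and the short sample text are made up for the example, and the unneeded backslashes before / and : are dropped, which should not change what is matched.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can carry a comment.
# Spaces are written as "\ " so they are clearly literal in VERBOSE mode.
BLOCK_RE = re.compile(
    r"""
    ^                             # start of a line (needs MULTILINE)
    (?:
        [\w\ /]*:                 # keyword: letters, digits, "_", spaces or "/", then ":"
        (?:
            (?!\n\n|\n[\w\ ]+:)   # stop before a blank line or before a new keyword line
            .                     # otherwise take one character (DOTALL: may be a line break)
        )*
    )
    |
    \*\ \*\ \*                    # or the literal separator
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Hypothetical sample: mixed-case keywords, a wrapped line, a list, and a "3:1" ratio.
sample = (
    "Anesthesia:General\nendotracheal.\n"
    "Blood loss: 50 cc, meshed 3:1.\n\n"
    "* * *\n\n"
    "Findings:\n1. None.\n"
)

blocks = [m.group().replace("\n", " ").strip() for m in BLOCK_RE.finditer(sample)]
for block in blocks:
    print(block)
```

Note that the "3:1" inside the second block does not start a new block, because the keyword part is only tried at the start of a line.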

How do you use it in the code?

  

I would like to put each of these blocks on the same line as its keyword

With this expression, we can capture every block in memory and then reassemble the text, applying whatever correction you need.

In this case, all you need to do is call .replace on every \n (or \r\n) contained in each match.
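As a rough sketch of that reassembly step, the helper below collects the matches and returns one block per line. The names BLOCK_RE and join_blocks are hypothetical, not part of the original code. Two choices here are assumptions on my part: the * * * separator is dropped (as the question's code does), and runs of whitespace are collapsed to a single space instead of using a plain .replace, so the trailing double spaces in the files do not pile up.

```python
import re

# The regex from this answer, without the unneeded backslashes before "/" and ":".
BLOCK_RE = re.compile(
    r"^(?:[\w /]*:(?:(?!\n\n|\n[\w ]+:).)*)|\* \* \*",
    re.MULTILINE | re.DOTALL,
)


def join_blocks(text):
    """Return text with each 'keyword: content' block collapsed onto one line."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings first
    lines = []
    for match in BLOCK_RE.finditer(text):
        block = match.group()
        if block == "* * *":
            continue  # assumption: the separator is not wanted in the output
        # Collapse every run of whitespace (line breaks included) to one space.
        lines.append(" ".join(block.split()))
    return "\n".join(lines)


# Hypothetical sample with a wrapped line, a separator, a list and a ratio.
sample = (
    "Description:Line one.  \nline two.\n\n"
    "* * *\n\n"
    "Procedures:\n1. First,\nsecond part.\n2. Ratio 3:1.\n\n"
    "KEYWORDS:a, b\n"
)
print(join_blocks(sample))
```

One caveat: any paragraph that follows a blank line and does not start with a keyword is not matched at all, so it would be missing from the output.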

Code (provided by regex101):

import re

regex = r"^(?:[\w \/]*\:(?:(?!\n\n|\n[\w ]+\:).)*)|\* \* \*"

test_str = ("SAMPLE TYPE / MEDICAL SPECIALTY:Dermatology  \nSAMPLE NAME: Epidermal Autograft \n\nDESCRIPTION:A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings. Epidermal autograft on Integra to the back and application of allograft to areas of the lost Integra, not grafted on the back.  \n(Medical Transcription Sample Report)  \n\n* * *\n\nPREOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.  \nfim here\nPOSTOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.  \n\nPROCEDURES PERFORMED:  \n1\. Epidermal autograft on Integra to the back (3520 cm2).  \n2\. Application of allograft to areas of the lost Integra, not grafted on the\nback (970 cm2).  \n\nANESTHESIA:General endotracheal.  \n\nESTIMATED BLOOD LOSS: Approximately 50 cc.  \n\nBLOOD PRODUCTS RECEIVED: One unit of packed red blood cells.  \n\nCOMPLICATIONS:None.  \n\nINDICATIONS:The patient is a 26-year-old male, who sustained a 60% total body surface area flame burn involving the head, face, neck, chest, abdomen, back, bilateral upper extremities, hands, and bilateral lower extremities. He has previously undergone total burn excision with placement of Integra and an initial round of epidermal autografting to the bilateral upper extremities and hands. His donor sites have healed particularly over his buttocks and he returns for a second round of epidermal autografting over the Integra on his back utilizing the buttock donor sites, the extent they will provide coverage.  \n\nOPERATIVE FINDINGS:  \n1\. Variable take of Integra, particularly centrally and inferiorly on the\nback. A fair amount of lost Integra over the upper back and shoulders.  \n2\. No evidence of infection.  \n3\. Healthy viable wound beds prior to grafting.  \n\nPROCEDURE IN DETAIL: The patient was brought to the operating room and positioned supine. General endotracheal anesthesia was uneventfully induced and an appropriate time out was performed. He was then repositioned prone and perioperative IV antibiotics were administered. He was prepped and draped in the usual sterile manner. All staples were removed from the Integra and the adherent areas of Silastic were removed. The entire wound bed was further prepped with scrub brushes and more Betadine followed by a sulfamylon solution. Hemostasis of the wound bed was ensured using epinephrine-soaked Telfa pads. Following dermal tumescence of the buttocks, epidermal autografts were harvested 8 one-thousandths of an inch using the air Zimmer dermatome. These grafts were passed to the back table where they were meshed 3:1. The donor sites were hemostased using epinephrine-soaked Telfa and lap pads. Once all the grafts were meshed, we brought them back up onto the field, positioned them over the wounds beginning inferiorly and moving cephalad where we had best areas of Integra engraftment. We were happy with the lie of the grafts and they were stapled into place. The grafts were then overlaid with Conformant 2, which was also stapled into place. Utilizing all of his buttocks skin, we did not have enough to cover his entire back, so we elected to apply allograft to the cephalad and a few areas on his flanks where we had had poor Integra engraftment. Allograft was thawed and meshed 1:1. It was then brought up onto the field, trimmed to fit and stapled into place over the wound. Once the entirety of the posterior wounds on his back were covered out with epidermal autograft or allograft sulfamylon soaked dressings were applied. Donor sites on his buttocks were dressed in Acticoat and secured with staples. He was then repositioned supine and extubated in the operating room having tolerated the procedure without any apparent complications. He was transported to PACU in stable condition.   \n\nKEYWORDS:dermatology, flame burns, body surface area, epidermal autograft, autograft, integra, integra engraftment, wound, grafts, epidermal, allograft\n")

matches = re.finditer(regex, test_str, re.MULTILINE | re.DOTALL)

for matchNum, match in enumerate(matches, start=1):
    # Put each block on a single line by replacing its line breaks with spaces
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group().replace('\n', ' ')))

Result:

Match 1 was found at 0-45: SAMPLE TYPE / MEDICAL SPECIALTY:Dermatology  
Match 2 was found at 46-79: SAMPLE NAME: Epidermal Autograft 
Match 3 was found at 81-363: DESCRIPTION:A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings. Epidermal autograft on Integra to the back and application of allograft to areas of the lost Integra, not grafted on the back.   (Medical Transcription Sample Report)  
Match 4 was found at 365-370: * * *
Match 5 was found at 372-508: PREOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.   fim here
Match 6 was found at 509-637: POSTOPERATIVE DIAGNOSIS: A 60% total body surface area flame burns, status post multiple prior excisions and staged graftings.  
Match 7 was found at 639-819: PROCEDURES PERFORMED:   1\. Epidermal autograft on Integra to the back (3520 cm2).   2\. Application of allograft to areas of the lost Integra, not grafted on the back (970 cm2).  
Match 8 was found at 821-855: ANESTHESIA:General endotracheal.  
Match 9 was found at 857-901: ESTIMATED BLOOD LOSS: Approximately 50 cc.  
Match 10 was found at 903-965: BLOOD PRODUCTS RECEIVED: One unit of packed red blood cells.  
Match 11 was found at 967-988: COMPLICATIONS:None.  
Match 12 was found at 990-1605: INDICATIONS:The patient is a 26-year-old male, who sustained a 60% total body surface area flame burn involving the head, face, neck, chest, abdomen, back, bilateral upper extremities, hands, and bilateral lower extremities. He has previously undergone total burn excision with placement of Integra and an initial round of epidermal autografting to the bilateral upper extremities and hands. His donor sites have healed particularly over his buttocks and he returns for a second round of epidermal autografting over the Integra on his back utilizing the buttock donor sites, the extent they will provide coverage.  
Match 13 was found at 1607-1859: OPERATIVE FINDINGS:   1\. Variable take of Integra, particularly centrally and inferiorly on the back. A fair amount of lost Integra over the upper back and shoulders.   2\. No evidence of infection.   3\. Healthy viable wound beds prior to grafting.  
Match 14 was found at 1861-3865: PROCEDURE IN DETAIL: The patient was brought to the operating room and positioned supine. General endotracheal anesthesia was uneventfully induced and an appropriate time out was performed. He was then repositioned prone and perioperative IV antibiotics were administered. He was prepped and draped in the usual sterile manner. All staples were removed from the Integra and the adherent areas of Silastic were removed. The entire wound bed was further prepped with scrub brushes and more Betadine followed by a sulfamylon solution. Hemostasis of the wound bed was ensured using epinephrine-soaked Telfa pads. Following dermal tumescence of the buttocks, epidermal autografts were harvested 8 one-thousandths of an inch using the air Zimmer dermatome. These grafts were passed to the back table where they were meshed 3:1. The donor sites were hemostased using epinephrine-soaked Telfa and lap pads. Once all the grafts were meshed, we brought them back up onto the field, positioned them over the wounds beginning inferiorly and moving cephalad where we had best areas of Integra engraftment. We were happy with the lie of the grafts and they were stapled into place. The grafts were then overlaid with Conformant 2, which was also stapled into place. Utilizing all of his buttocks skin, we did not have enough to cover his entire back, so we elected to apply allograft to the cephalad and a few areas on his flanks where we had had poor Integra engraftment. Allograft was thawed and meshed 1:1. It was then brought up onto the field, trimmed to fit and stapled into place over the wound. Once the entirety of the posterior wounds on his back were covered out with epidermal autograft or allograft sulfamylon soaked dressings were applied. Donor sites on his buttocks were dressed in Acticoat and secured with staples. He was then repositioned supine and extubated in the operating room having tolerated the procedure without any apparent complications. 
He was transported to PACU in stable condition.   
Match 15 was found at 3867-4019: KEYWORDS:dermatology, flame burns, body surface area, epidermal autograft, autograft, integra, integra engraftment, wound, grafts, epidermal, allograft
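To apply the same idea to a whole folder of files, as in the question, something along these lines may work. It is a minimal sketch and not a drop-in replacement for the arrumaDadosPEP method: fix_file and fix_folder are hypothetical names, the files are assumed to be UTF-8 encoded .txt files sitting directly in one folder, and the * * * separator is assumed to be unwanted in the output.

```python
import re
from pathlib import Path

# The regex from this answer, without the unneeded backslashes before "/" and ":".
BLOCK_RE = re.compile(
    r"^(?:[\w /]*:(?:(?!\n\n|\n[\w ]+:).)*)|\* \* \*",
    re.MULTILINE | re.DOTALL,
)


def fix_file(in_path, out_dir):
    """Write a copy of in_path into out_dir with one block per line."""
    # Assumption: the files are UTF-8; adjust the encoding if they are not.
    text = Path(in_path).read_text(encoding="utf-8").replace("\r\n", "\n")
    blocks = [
        m.group().replace("\n", " ").strip()
        for m in BLOCK_RE.finditer(text)
        if m.group() != "* * *"  # skip the separator line
    ]
    out_path = Path(out_dir) / Path(in_path).name
    out_path.write_text("\n".join(blocks) + "\n", encoding="utf-8")
    return out_path


def fix_folder(in_dir, out_dir):
    """Apply fix_file to every .txt file directly inside in_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return [fix_file(p, out_dir) for p in sorted(Path(in_dir).glob("*.txt"))]
```

Building the output path with Path(out_dir) / name avoids the missing-separator problem that outDir + file_name can have when outDir does not end with a slash.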

15.11.2016 / 15:37