We are using regex to normalize pharmaceutical data stored in a string field, and we need to distinguish very similar strings by means of an exclusion condition.
For example, in simplified form, we have the following records:
0.5 MG COM CT BL AL / AL X 30 ----> COM = Simple Tablet
0.4 MG COM REV CT BL AL AL X 90 ----> COM REV = Coated tablet
0.7 MG COM LIB PROL CT BL AL AL X 30 ----> COM LIB PROL = Prolonged Release Tablet
To identify a Coated Tablet, we use the pattern: COM\sREV\s
To identify a Prolonged Release Tablet, we use the pattern: COM\sLIB\sPROL\s
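For reference, here is a minimal sketch of how the two patterns above classify the sample records, assuming Python's re module. The helper name dosage_form and its "unknown" fallback are hypothetical, added only for illustration:

```python
import re

# The two patterns from the question, written without the stray spaces.
COATED = re.compile(r"COM\sREV\s")           # COM REV      -> Coated Tablet
PROLONGED = re.compile(r"COM\sLIB\sPROL\s")  # COM LIB PROL -> Prolonged Release Tablet


def dosage_form(record: str) -> str:
    """Hypothetical helper: report which of the two patterns matches the record."""
    # re.search scans the whole string, so the match can start mid-field.
    if PROLONGED.search(record):
        return "Prolonged Release Tablet"
    if COATED.search(record):
        return "Coated Tablet"
    return "unknown"  # neither pattern matched; a simple tablet ends up here


for rec in [
    "0.5 MG COM CT BL AL / AL X 30",
    "0.4 MG COM REV CT BL AL AL X 90",
    "0.7 MG COM LIB PROL CT BL AL AL X 30",
]:
    print(rec, "->", dosage_form(rec))
```

The simple tablet only falls through to "unknown"; there is no positive pattern for it yet, which is exactly the gap the question is about.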
In this simplified example, we need to identify a Simple Tablet, so we need an expression that matches COM alone, without REV or LIB. Something like:
COM\s[^(REV|LIB)]
...but this pattern did not work. Can anyone help?
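A likely reason: [^(REV|LIB)] is a negated character class, so it matches exactly one character that is not one of ( R E V | L I B ). It does not exclude the words REV or LIB. A short check, assuming Python's re module, shows both failure modes. The first record is a made-up simple tablet whose next word starts with B; the second is the record from the edit below:

```python
import re

# COM, one whitespace character, then ONE character outside the set ( R E V | L I B )
attempt = re.compile(r"COM\s[^(REV|LIB)]")

# Hypothetical simple tablet: rejected, because "B" (from BL) is in the excluded set.
print(attempt.search("0.5 MG COM BL AL X 30"))

# Coated tablet with REV later on: accepted, because "C" (from CT) is not in the set.
m = attempt.search("0.4 MG COM CT REV BL AL AL X 90")
print(m.group() if m else None)
```

The first call prints None and the second prints COM C, so the class is testing single letters rather than whole words.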
EDITED
REV will not always come immediately after COM. The string may look like this, for example:
0.4 MG COM CT REV BL AL AL X 90 (or with any other words in between)
The point is that REV must not appear anywhere in the string.
EDITED 27/07
The pattern \bCOM\b\s(?!.*REV|.*LIB) worked well when REV or LIB comes after COM, but it fails for the strings below, because REV and LIB appear before COM:
0.4 MG REV COM CT BL AL AL X 90
0.7 MG LIB PROL COM CT BL AL AL X 30
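To make the behavior of that pattern concrete, here is a minimal sketch assuming Python's re module. The negative lookahead is evaluated at the position right after "COM ", so it only inspects text to the right of COM:

```python
import re

# Pattern from the 27/07 edit: COM as a whole word, a space, then a negative
# lookahead that fails if REV or LIB occurs anywhere AFTER this position.
after_only = re.compile(r"\bCOM\b\s(?!.*REV|.*LIB)")

samples = [
    "0.4 MG COM CT REV BL AL AL X 90",       # REV after COM  -> rejected (correct)
    "0.5 MG COM CT BL AL / AL X 30",         # simple tablet  -> accepted (correct)
    "0.4 MG REV COM CT BL AL AL X 90",       # REV before COM -> accepted (wrong)
    "0.7 MG LIB PROL COM CT BL AL AL X 30",  # LIB before COM -> accepted (wrong)
]

for text in samples:
    print(text, "-> matched:", bool(after_only.search(text)))
```

The last two records are accepted because nothing in the pattern ever looks to the left of COM.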
So the pattern needs to match COM and reject the record if REV or LIB appears anywhere in it, before or after COM.
Something like: (?!.*REV|.*LIB)\bCOM\b\s(?!.*REV|.*LIB)
Is it possible?
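It should be possible, assuming the regex engine supports lookahead. In the proposed pattern the first lookahead sits directly before \bCOM\b, so it is evaluated at COM's position and, like the second one, only sees text to the right of COM. One common fix is to anchor a single lookahead at the start of the string with ^ so it scans the whole record. A minimal sketch, assuming Python's re module; the \b around REV and LIB is an added assumption so that only those whole words are excluded:

```python
import re

# ^ pins the lookahead to the start of the record, so it scans the WHOLE string.
# \b keeps REV / LIB from matching inside longer words (an assumption about the data).
simple_tablet = re.compile(r"^(?!.*\b(?:REV|LIB)\b).*\bCOM\b")

# The pattern proposed in the question, kept for comparison.
proposed = re.compile(r"(?!.*REV|.*LIB)\bCOM\b\s(?!.*REV|.*LIB)")

records = [
    "0.5 MG COM CT BL AL / AL X 30",
    "0.4 MG COM REV CT BL AL AL X 90",
    "0.4 MG COM CT REV BL AL AL X 90",
    "0.4 MG REV COM CT BL AL AL X 90",
    "0.7 MG LIB PROL COM CT BL AL AL X 30",
]

for rec in records:
    print(rec,
          "| proposed:", bool(proposed.search(rec)),
          "| anchored:", bool(simple_tablet.search(rec)))
```

Only the first record is accepted by the anchored pattern, while the proposed one still accepts the two records where REV or LIB precedes COM. The same syntax should carry over to other engines with lookahead support, such as PCRE or Java, but that is untested here.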