REGEX - search for expressions that do NOT contain specific words

8

We are using REGEX to normalize pharmaceutical data stored in a string field, and we need to tell very similar strings apart by means of an exclusion rule.

For example, in a very simple way, we have the following records:

0.5 MG COM CT BL AL / AL X 30 ----> COM = Simple Tablet

0.4 MG COM REV CT BL AL AL X 90 ----> COM REV = Coated tablet

0.7 MG COM LIB PROL CT BL AL AL X 30 ----> COM LIB PROL = Prolonged Release Tablet

To identify a Coated Tablet, we use the syntax: COM\sREV\s

To identify a Prolonged Release Tablet, we use the syntax: COM\sLIB\sPROL\s

In this simplified example we need to identify a Simple Tablet, and for that we need an expression that matches only when there is COM without REV and without LIB. Something like:

COM\s[^(REV|LIB)]

...but this syntax did not work. Can anyone help?
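
A likely reason for the failure: square brackets define a character class, so [^(REV|LIB)] means "one character that is not (, R, E, V, |, L, I, B or )", not "a word other than REV or LIB". The sketch below illustrates this with Python's re module (an assumption, the thread does not say which engine is used; character classes behave the same way in PCRE). The last record is hypothetical and is only there to show a false rejection.

```python
import re

# [^(REV|LIB)] is a negated character class: a single character that is
# none of ( R E V | L I B ). It does not test for the words REV or LIB.
pattern = re.compile(r"COM\s[^(REV|LIB)]")

simple = "0.5 MG COM CT BL AL / AL X 30"
coated = "0.4 MG COM REV CT BL AL AL X 90"
prolonged = "0.7 MG COM LIB PROL CT BL AL AL X 30"
blister = "0.5 MG COM BL AL / AL X 30"  # hypothetical simple-tablet record

print(pattern.search(simple) is not None)     # True: "C" is outside the class
print(pattern.search(coated) is not None)     # False: "R" is inside the class
print(pattern.search(prolonged) is not None)  # False: "L" is inside the class
print(pattern.search(blister) is not None)    # False: "B" is inside the class
```

The pattern appears to work on the three sample records only because REV and LIB happen to start with letters from the class; any word starting with B, E, I, L, R or V after COM would be rejected as well.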

EDITED

REV will not always come immediately after COM. The string may look like this, for example:

0.4 MG COM CT REV BL AL AL X 90 ----> or with any other words in between.

The point is that REV must not appear anywhere in the string.

EDITED 27/07

The syntax \bCOM\b\s(?!.*REV|.*LIB) worked well when REV or LIB comes after COM, but it does not handle the strings below, where REV or LIB comes before COM:

0.4 MG REV COM CT BL AL AL X 90

0.7 MG LIB PROL COM CT BL AL AL X 30

So the syntax needs to be comprehensive: identify COM and discard any string that contains REV or LIB, wherever they appear.

Something like: (?!.*REV|.*LIB)\bCOM\b\s(?!.*REV|.*LIB)

Is it possible?

    
asked by anonymous 26.07.2016 / 13:30

3 answers

2

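Since the sentences are separated by \n and you want to capture only those that do not contain the words REV and LIB, you can test each line as a whole. Note that lines containing REVENDEDOR or LIBERADO will still be captured, because the pattern only rejects REV and LIB when they are surrounded by spaces.

The expression could be ^(?!.* (REV|LIB) .*).*$

Apply it with the gm modifiers.

See it working at REGEX101.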

Explanation

  • ^ ... $ - the expression must span the line from beginning to end.
  • (?!...) - negative lookahead: if the pattern inside it matches, the line is rejected.
  • .* (REV|LIB) .* - any text that contains REV or LIB surrounded by spaces.
  • .* - anything.
  • Modifier g - global: find every match, not just the first.
  • Modifier m - multiline: ^ and $ also match at every \n, so each line is treated as a separate sentence.
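
As a rough illustration of the points above, here is a minimal sketch using Python's re module (an assumption; the lookahead syntax is the same as in PCRE). Python has no g modifier, so finditer plays that role, and re.MULTILINE corresponds to m. The REVENDEDOR record is hypothetical and only shows that such a word is not rejected.

```python
import re

text = "\n".join([
    "0,5 MG COM CT BL AL/AL X 30",
    "0,4 MG COM REV CT BL AL AL X 90",
    "0,7 MG COM LIB PROL CT BL AL AL X 30",
    "0,4 MG COM CT REV BL AL AL X 90",
    "0,5 MG COM REVENDEDOR CT X 30",  # hypothetical record
])

# re.MULTILINE makes ^ and $ match at every line break (the m modifier).
pattern = re.compile(r"^(?!.* (REV|LIB) .*).*$", re.MULTILINE)

# finditer walks over every match (the g modifier); group(0) is the whole line.
kept = [m.group(0) for m in pattern.finditer(text)]
print(kept)
```

Only the first and the last record are kept: the three lines that contain REV or LIB between spaces are rejected by the lookahead.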

Applying in PHP

$content = "
0,5 MG COM CT BL AL/AL X 30
0,4 MG COM REV CT BL AL AL X 90
0,7 MG COM LIB PROL CT BL AL AL X 30 
0,4 MG COM CT REV BL AL AL X 90
";

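// preg_match_all collects every matching line; preg_match would stop at the
// first match, which here is the empty line right after the opening quote.
preg_match_all('~^(?!.* (REV|LIB) .*).*$~m', $content, $matchs);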

Edit

As pointed out in the comments, I forgot about COM.

The new expression would look like ^(?!.* (REV|LIB) .*).* COM .*$

Explanation

  • (?!.* (REV|LIB) .*) - states what the line "must not match".
  • .* COM .* - states what the line "must match".

Note the spaces around COM and around (REV|LIB): they restrict the match to exactly these words.

Because these are two independent expressions, the "must not match" one and the "must match" one, it does not matter whether REV or LIB comes before or after COM: such lines will not be captured.
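
A minimal sketch of this combined expression, again in Python's re module (an assumption; the syntax is identical in PCRE). It tests each record from the question separately, so no m modifier is needed, and it includes the two records where REV or LIB comes before COM.

```python
import re

records = [
    "0.5 MG COM CT BL AL / AL X 30",
    "0.4 MG COM REV CT BL AL AL X 90",
    "0.7 MG COM LIB PROL CT BL AL AL X 30",
    "0.4 MG COM CT REV BL AL AL X 90",
    "0.4 MG REV COM CT BL AL AL X 90",
    "0.7 MG LIB PROL COM CT BL AL AL X 30",
]

# The lookahead runs from the start of the record, so it sees REV or LIB
# no matter where they sit relative to COM.
pattern = re.compile(r"^(?!.* (REV|LIB) .*).* COM .*$")

simple_tablets = [r for r in records if pattern.search(r)]
print(simple_tablets)
```

Only the first record, the simple tablet, is kept.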

See working at REGEX101

    
26.07.2016 / 22:22
2

If you need to search for an exact word, use the boundary anchor \b around it and the negative lookahead (?!) to negate the group.

For the example in the question, the regex becomes:

\bCOM\b\s(?!REV|LIB)

The match is four characters: COM followed by a space (COM_).
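
A small sketch of how this pattern behaves, using Python's re module (an assumption; \b and (?!...) work the same way in PCRE). The COMP record is hypothetical and only illustrates the effect of \b.

```python
import re

pattern = re.compile(r"\bCOM\b\s(?!REV|LIB)")

# The lookahead consumes nothing, so the match is just "COM" plus the space.
m = pattern.search("0.5 MG COM CT BL AL / AL X 30")
print(repr(m.group(0)))

# REV directly after "COM " makes the lookahead fail.
coated = pattern.search("0.4 MG COM REV CT BL AL AL X 90")
print(coated)

# \b requires a word boundary after COM, so COMP is not accepted.
other_word = pattern.search("0.5 MG COMP CT X 30")  # hypothetical record
print(other_word)

# REV further along the string is not checked, so this record still matches.
late_rev = pattern.search("0.4 MG COM CT REV BL AL AL X 90")
print(late_rev is not None)
```

The last case shows the limit of this version: the lookahead only inspects the text immediately after the space.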

Related:

Meaning of ?: ?= ?!

What is a boundary \b in a regular expression?

    
26.07.2016 / 14:11
2

You can do the following:

COM\s(?!REV|LIB)

Running expression example.

This expression will only select occurrences of COM that are not immediately followed by REV or LIB.

Explanation (Simple because I do not have advanced knowledge in Regular Expressions):

  • ? = as a standalone quantifier, indicates zero or one occurrence of the preceding element; inside (?! it is only part of the lookahead syntax

  • ! = negation sign

  • (?!) = negation of (?=): matches when the given pattern is absent from the current position onward, and does not include that pattern in the match. For example, the pattern car (?!yellow) matches in the text "car blue", whereas car (?!blue) does not.

Source: Regular Expression

Edit (for the new scenario)

If REV and LIB can appear at any point in the string, adding wildcards ( .* ) before and after the negated expression may already solve it. Something like this:

COM\s(?!.*(REV|LIB).*)

Functional example online.
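
A quick check of this version against the records from the question, sketched with Python's re module (an assumption; the pattern is written the same way in PCRE). The helper name is_match is hypothetical.

```python
import re

pattern = re.compile(r"COM\s(?!.*(REV|LIB).*)")

def is_match(record):
    """Return True when the pattern is found anywhere in the record."""
    return pattern.search(record) is not None

# REV or LIB anywhere after COM is now detected.
print(is_match("0.5 MG COM CT BL AL / AL X 30"))    # True
print(is_match("0.4 MG COM CT REV BL AL AL X 90"))  # False

# The lookahead starts after COM, so REV or LIB before COM goes unnoticed.
print(is_match("0.4 MG REV COM CT BL AL AL X 90"))       # True
print(is_match("0.7 MG LIB PROL COM CT BL AL AL X 30"))  # True
```

The last two results correspond to the case raised in the question's 27/07 edit: a lookahead placed after COM cannot see what comes before it.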

    
26.07.2016 / 14:42