Regex - Get text up to a certain string

9

I would like to use a regex to get the text up to the characters a) and, if possible, the answers as well, each one captured separately.

pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta<br /><br />

pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta pergunta<br /><br />

  a) resposta a<br />
  b) resposta b<br />
  c) resposta c<br />
  d) resposta d<br />
  e) resposta e<br />

I can already get the answers with a simple rule, but capturing the question is harder. Keep in mind that the question may span several paragraphs.

    
asked by anonymous 28.11.2017 / 20:05

4 answers

5

Use this regex:

((.|\n)*?)(a\).*?)\n*?(b\).*?)\n*?(c\).*?)\n*?(d\).*?)\n*?(e\).*?)$|\n*?

It will separate the text into groups where:

  • Group 1 - Contains the text before option a).
  • Group 2 - Not useful on its own: it is the inner (.|\n) group nested inside Group 1 and holds only the last character it matched.
  • Group 3 - Contains option a) up to the line break (where option b) starts in your example).
  • Group 4 - Contains option b) up to the line break.
  • Group 5 - Contains option c) up to the line break.
  • Group 6 - Contains option d) up to the line break.
  • Group 7 - Contains option e) up to the line break or the end of the text.

You can see this regex in action here

Explanation of regex

((.|\n)*?) - Lazily captures any character, including line breaks, until the first delimiter is reached.

(a\).*?)\n*? - a\) matches the literal text a) and serves as the delimiter, so the first capture group stops at the first occurrence of a). From there the regex captures everything up to the first line break.

(b\).*?)\n*? - This capture group works the same way as group 3, only capturing from b).

(c\).*?)\n*? - This capture group works the same way as group 3, only capturing from c).

(d\).*?)\n*? - This capture group works the same way as group 3, only capturing from d).

(e\).*?)$|\n*? - This capture group works the same way as group 3, only capturing from e) to the end of the text, or to a line break in case you use this regex on a file that contains many questions.
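As a rough illustration of the groups described above, here is a minimal Python sketch using the standard `re` module. The sample text is hypothetical and assumes each option starts at the very beginning of its line, with no indentation and no `<br />` tags; with leading spaces before `b)` the pattern as written would not match the options, because `\n*?` only skips line breaks.

```python
import re

# Pattern copied from the answer above.
PATTERN = re.compile(
    r"((.|\n)*?)(a\).*?)\n*?(b\).*?)\n*?(c\).*?)\n*?(d\).*?)\n*?(e\).*?)$|\n*?"
)

# Hypothetical sample: two question paragraphs followed by five options.
text = (
    "pergunta 1\n\n"
    "pergunta 2\n\n"
    "a) resposta a\n"
    "b) resposta b\n"
    "c) resposta c\n"
    "d) resposta d\n"
    "e) resposta e"
)

match = PATTERN.search(text)

# Group 1 is the question text before option a).
question = match.group(1).strip()

# Groups 3 to 7 hold options a) to e); group 2 is skipped on purpose.
options = [match.group(i) for i in range(3, 8)]

print(question)
print(options)
```

Group 2 is left out because it only holds the last character consumed by the inner `(.|\n)` group.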

    
28.11.2017 / 20:26
13

Match question and answer with a single expression

If you really want to match everything up to a letter followed by ), or up to the end of the string, you can use this RegExp: (regexr.com)

([\s\S]+?(?=\b[a-z][)]|$))
  • [\s\S] is a way to match all characters, including line breaks.
    Normally we would use the singleline flag to change the behavior of the dot, but that flag does not exist in ASP.
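To show how this expression splits the text, here is a small Python sketch (Python is used only for illustration; the question itself mentions ASP). The sample text and the variable names are assumptions. Each match runs up to the next letter followed by `)`, so the first match is the question and the remaining ones are the answers.

```python
import re

# Pattern from the answer; [\s\S] matches any character, line breaks included.
PATTERN = re.compile(r"([\s\S]+?(?=\b[a-z][)]|$))")

# Hypothetical sample text with a multi-paragraph question.
text = (
    "pergunta 1\n\n"
    "pergunta 2\n\n"
    "a) resposta a\n"
    "b) resposta b\n"
    "c) resposta c"
)

# findall returns the content of group 1 for every match, in order.
parts = [part.strip() for part in PATTERN.findall(text)]

question, answers = parts[0], parts[1:]

print(question)
print(answers)
```

Note that a question containing something like "item a)" in its own text would be split at that point too, since the lookahead cannot tell it apart from a real option.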
29.11.2017 / 11:10
9

You can use /(.*?a\))/s to get all the characters up to and including a)

You can use /(.*?)a\)/s to get all the characters before a)

Explanation:

.*? Matches any characters, as few as possible (with the s flag, the dot also matches line breaks)

a Matches a literal a (case sensitive)

\) Matches a literal closing parenthesis
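For reference, a minimal Python sketch of both variants. In Python the `/s` flag corresponds to `re.DOTALL`; the sample text and variable names are assumptions made for the example.

```python
import re

# Hypothetical sample text with a multi-paragraph question.
text = (
    "pergunta 1\n\n"
    "pergunta 2\n\n"
    "a) resposta a\n"
    "b) resposta b"
)

# re.DOTALL is Python's counterpart of the /s flag: the dot also matches "\n".
before = re.search(r"(.*?)a\)", text, re.DOTALL).group(1)
including = re.search(r"(.*?a\))", text, re.DOTALL).group(1)

print(repr(before))
print(repr(including))
```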

    
28.11.2017 / 20:49
5

If you're using a library with more features, such as PCRE, you could use:

/(?(?=\s+)|(?(?=\w\))(?<a>\w\).*)|(?<q>.*?\n)))/g

See it working at REGEX101

Here I am using "conditionals" and "named groups"

Explanation

  • (?(?=\s+)|...) - This conditional basically says to ignore whitespace: when the first condition is met, the branch that runs is empty ("do nothing").
  • (?(?=\w\)) - This is the actual condition that tells questions from answers: if it finds [A-Za-z0-9_] followed by ), it is an answer; if not, it is a question.
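Python's built-in `re` module does not accept a lookahead as a conditional test, so the sketch below does not run the PCRE pattern itself. It is only an approximation that reproduces the same decision logic line by line: skip blank lines, treat a line starting with a letter followed by `)` as an answer, and treat anything else as question text. The function name `classify`, the `ANSWER` pattern and the sample text are all hypothetical, and `[a-z]` with the ignore-case flag is used in place of `\w`.

```python
import re

# A letter followed by ")" at the start of a line marks an answer.
ANSWER = re.compile(r"\s*([a-z])\)\s*(.*)", re.IGNORECASE)


def classify(text):
    """Split text into question lines and a dict of answers keyed by letter."""
    questions, answers = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue  # whitespace only: the "do nothing" branch
        found = ANSWER.match(line)
        if found:
            answers[found.group(1)] = found.group(2)  # answer branch
        else:
            questions.append(line.strip())  # question branch
    return questions, answers


sample = "pergunta 1\n\npergunta 2\n\n  a) resposta a\n  b) resposta b\n"
questions, answers = classify(sample)
print(questions)
print(answers)
```

Because blank lines are skipped explicitly, this version does not produce the empty matches that the single PCRE expression does.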

Note

  • I used \w for convenience; I think the correct class would be [[:alpha:]], or, if you use the i modifier, simply [a-z].

Problems

  • This regex ends up generating a lot of garbage (empty matches), because nothing is captured when the first condition applies.
29.11.2017 / 13:58