Although we all know the classic answer given to people who try to parse HTML with regular expressions, the next answer to that same question adds an interesting point.
For one-off cases where I need to extract or manipulate some data in a piece of HTML text, it is often much faster and more practical to write a regular expression that does the job than to use an HTML parser. I have no problem using regex in this kind of situation.
With that clarified, here is the answer:
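var testString = '<div class="c2029" style="font-size:45px"><p class="auto">Testing 123...</p></div>';
// Replace the whole matched <div>...</div> with the contents of capture group 1.
var result = testString.replace(/<div class="c.*?>(.*?)<\/div>/, '$1');
console.log(result); // <p class="auto">Testing 123...</p>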
The regular expression itself:
<div class="c.*?>(.*?)<\/div>
Explanation:
- <div class="c.*?>: Here a lazy quantifier (.*?) is used to match the start of the pattern and stop at the first > that closes the opening tag.
- (.*?)<\/div>: A lazy quantifier is used again, this time inside a capturing group, followed by the closing div tag.
- Lastly, we use replace(), keeping group 1 from the match by means of the $1 marker.
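To illustrate why the lazy quantifier matters here, this is a minimal sketch that compares it with a greedy one. The twoDivs input is hypothetical and not part of the original question; it only serves to show the difference when more than one closing tag is present.

```javascript
// Hypothetical input with two sibling divs, used only to contrast the quantifiers.
const twoDivs = '<div class="c1">first</div><div class="c2">second</div>';

// Lazy: (.*?) stops at the first </div> it can reach.
const lazyMatch = twoDivs.match(/<div class="c.*?>(.*?)<\/div>/);
console.log(lazyMatch[1]); // first

// Greedy: (.*) runs on to the last </div> in the string.
const greedyMatch = twoDivs.match(/<div class="c.*?>(.*)<\/div>/);
console.log(greedyMatch[1]); // first</div><div class="c2">second
```

With the greedy version the group swallows the first closing tag and the whole second div, which is usually not what is wanted.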
Update
According to the OP, the desired result was actually different, since there are situations where the closing </div> does not appear (which was not specified in the question).
Solution 2:
<div class="c.*?>(((?!<\/div>)[\s\S])*)(<\/div>)?
This regular expression was adjusted to handle the new situation and also the possibility of line breaks.
Demo: regex101.com
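For anyone who prefers to try it in JavaScript instead of the demo, here is a minimal sketch of Solution 2 used with replace(). The two input strings are hypothetical examples of the cases described (a line break inside the div, and a missing closing tag); they are not taken from the question.

```javascript
const solution2 = /<div class="c.*?>(((?!<\/div>)[\s\S])*)(<\/div>)?/;

// Hypothetical inputs: one with a line break, one with no closing </div>.
const multiLine = '<div class="c2029">line one\nline two</div>';
const unclosed = '<div class="c2029"><p class="auto">Testing 123...</p>';

// In both cases the whole match is replaced by the contents of group 1.
const multiLineResult = multiLine.replace(solution2, '$1');
const unclosedResult = unclosed.replace(solution2, '$1');

console.log(JSON.stringify(multiLineResult)); // "line one\nline two"
console.log(unclosedResult); // <p class="auto">Testing 123...</p>
```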
Explanation:
- <div class="c.*?>: This is the start of the specified pattern. It matches any text up to the > that closes the opening tag.
- (((?!<\/div>)[\s\S])*): This is a slightly more complex trick. (?!<\/div>) is a negative lookahead that checks that the current position is not followed by <\/div>. Only then do I consume the next character with [\s\S], which matches any character, whitespace or not, including line breaks. It is necessary to check first and consume afterwards, because with the opposite order ([\s\S](?!<\/div>)) the last character before the pattern that must not be captured would not be captured either. (You can see how this happens by changing the regex101 demo.) In the end, I repeat this zero or more times and wrap it in a capturing group, resulting in (((?!<\/div>)[\s\S])*).
- (<\/div>)?: Finally, I match the closing div tag, marking it as optional with the ? quantifier. That way, even if the closing tag does not exist, there is no problem at all.
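The point about checking before consuming can also be seen in a small sketch. The sample string below is a hypothetical input, and the optional closing group is left out so that only the ordering differs between the two patterns.

```javascript
// Hypothetical input used only to compare the two orderings.
const sample = '<div class="c1">abc</div>';

// Check first, then consume (the order used in Solution 2).
const checkFirst = sample.match(/<div class="c.*?>(((?!<\/div>)[\s\S])*)/);
console.log(checkFirst[1]); // abc

// Consume first, then check: the character right before </div> is lost.
const consumeFirst = sample.match(/<div class="c.*?>(([\s\S](?!<\/div>))*)/);
console.log(consumeFirst[1]); // ab
```

In the second pattern the "c" is rejected because it is immediately followed by </div>, so the repetition stops one character too early.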