Regex for capturing fixed strings in HTML and JS code [closed]


I am writing automated tests for a legacy project built on the MVC template. One of the requirements is to capture every fixed (hard-coded) string in the HTML and JS code, because the company is internationalizing the project's content by moving its fixed strings into resource files.

I wrote this regex: ([\n]|^)(?<Value>(?!.*?\/\/|.*?@\*|.*?@.*?@|.*?\/\*|.*?<!--|.*?\\*)([^\n]*?)[áâãàéêèíîìóôõòúûù].*)

It partially solves my problem: it finds accented characters in the code, capturing them only when they are not inside comments.

Since no HTML or JS keyword or function name contains accented characters, I can assume that these matches are fixed strings.
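For reference, here is a minimal Python sketch of the same line-based heuristic. It is a simplification, not a port of the regex above: it assumes a line is a comment if one of the markers `//`, `/*`, `<!--` or `@*` appears anywhere on it (roughly what the negative lookahead does), and the sample markup is made up.

```python
import re

# The accented vowels the question's character class looks for.
ACCENTED = re.compile(r"[áâãàéêèíîìóôõòúûù]")
# Simplified set of comment markers the question's lookahead rejects.
COMMENT_MARKERS = ("//", "/*", "<!--", "@*")


def lines_with_accents(source: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain an accented vowel
    and no comment marker anywhere on the line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(marker in line for marker in COMMENT_MARKERS):
            continue  # like the lookahead, reject the whole line
        if ACCENTED.search(line):
            hits.append((number, line))
    return hits


sample = """<h1>Relatório</h1>
<!-- título antigo -->
var msg = "Atenção";
var total = 0; // soma
<p>Welcome</p>"""

print(lines_with_accents(sample))
```

Line 5 (`Welcome`) is never reported, which is exactly the limitation described next: strings without accents slip through.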

With it I identified some pages whose fixed strings should be moved into resource files, but the regex does not cover every case.

I would like a regex that:

  • Captures fixed strings in HTML and JS code even when they contain no accented characters.
  • Ignores strings inside comments.
  

Is there some particularity of the syntax that could help me delimit where the regex should capture, in order to identify these strings?
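For the JS side, the syntax does give you delimiters: string literals start and end with `'`, `"` or a backtick, and comments start with `//` or `/*`. A small scanner that walks the source character by character can use them directly, which a single regex struggles to do. The sketch below (Python, hypothetical function name) is deliberately simplified: it does not understand regex literals such as `/"/` or `${...}` expressions inside template literals, and it knows nothing about HTML or Razor.

```python
def js_string_literals(source: str) -> list[str]:
    """Collect the contents of '...', "..." and `...` literals in JS source,
    skipping // line comments and /* */ block comments.
    Simplified: regex literals and ${...} in templates are not handled."""
    literals = []
    i, n = 0, len(source)
    while i < n:
        if source.startswith("//", i):
            end = source.find("\n", i)
            i = n if end == -1 else end + 1  # skip to the next line
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2  # skip past the closing */
        elif source[i] in "'\"`":
            quote, j, buf = source[i], i + 1, []
            while j < n and source[j] != quote:
                if source[j] == "\\" and j + 1 < n:
                    buf.append(source[j:j + 2])  # keep escapes verbatim
                    j += 2
                else:
                    buf.append(source[j])
                    j += 1
            literals.append("".join(buf))
            i = j + 1  # continue after the closing quote
        else:
            i += 1
    return literals


code = r'''var title = "Relatório";  // "ignored"
/* alert('also ignored'); */
alert('It\'s done');
var url = `/api/items`;'''

print(js_string_literals(code))
```

Note that `/api/items` is returned too: finding literals is only the first step, and deciding which ones are user-facing text still needs a filter or a human review.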

    
asked by anonymous 11.05.2017 / 17:37

1 answer


You cannot use regex on HTML.

Repeat after me: you cannot use regex on HTML.

Write it on the blackboard 100 times:

# Write the sentence on the blackboard 100 times.
for _ in range(100):
    print('You cannot use regex on HTML.')

If you managed to use regex on HTML, you used it only on a snippet or in a very specific case, because in general it is not possible.
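To make "only on a snippet" concrete, here is a small Python sketch of a naive "text is whatever sits between `>` and `<`" pattern. The markup is made up for illustration; it contains two perfectly ordinary constructs, a `>` inside an attribute value and tags inside a comment.

```python
import re

# Made-up markup: a ">" inside an attribute value, and tags inside a comment.
html = '<p title="a > b">Olá</p><!-- <b>velho</b> -->'

# Naive idea: text is whatever sits between a '>' and the next '<'.
naive = re.findall(r">([^<]+)<", html)
print(naive)
```

The first match swallows part of the attribute (` b">Olá`) and the second one comes from inside the comment (`velho`). Every fix to the pattern tends to invite the next counterexample.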

Don't just take my word for it. The best answer of all time on the main (English) Stack Overflow was to a similar question, so check it out.

You cannot use regex on HTML.

However, as that answer on the main site says, you can use an XML parser instead.
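As an illustration of the parser approach, here is a minimal Python sketch. It swaps in the standard library's `html.parser` (an HTML parser rather than an XML parser, because real-world markup is often not well-formed XML). Comments go to `handle_comment`, so they never reach `handle_data`; the contents of `<script>` and `<style>` are skipped explicitly. The class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect non-empty text nodes, skipping <script>/<style> contents.
    Comments are delivered to handle_comment, which is left as a no-op."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.texts.append(text)


page = '''<p title="a > b">Olá</p>
<!-- <b>velho</b> -->
<script>var msg = "Atenção";</script>
<button>Salvar</button>'''

collector = TextNodeCollector()
collector.feed(page)
collector.close()
print(collector.texts)
```

Two caveats for an MVC project: user-facing text can also live in attributes such as `title`, `alt` or `placeholder` (available in `attrs` inside `handle_starttag`), and Razor syntax in `.cshtml` views (`@...`, `@* *@`) is not HTML, so the parser will report it as plain text. The JS inside `<script>` blocks still needs a JS-aware tool.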

    
11.05.2017 / 21:56