Assign the id according to the content of the text

2

I don't know whether JavaScript, PHP, or even Sublime Text is the best tool for this, but here is what I have done so far.

I copied the CF text and pasted it into a .txt file (to get rid of those strange tags).

Then, in Sublime Text, I:

  • Selected everything (Ctrl + A)
  • Pressed Alt + Shift + W to insert HTML tags on each line
  • Typed li, which creates the opening and closing tags for each line

Then I had it select every occurrence of:

</li>

<li>TÍTULO

and replace with:

      </li>
  </ul>

  <ul id="titulo" class="titulo">
      <li>TÍTULO

Then I repeated this step for each level: titulo>capítulo>seção>subseção>artigo

I got this result: cf_passo_1.txt

I would like an algorithm that goes through every ul and sets its id according to the content of that ul.

For example, it currently looks like this:

<ul id="titulo" class="titulo">
   <li>
      TÍTULO I ...

<ul id="artigo" class="artigo">
   <li>
      Art.  1º A ...

I would like it to look like this:

<ul id="titulo1" class="titulo">
   <li>
      TÍTULO I ...

<ul id="artigo1" class="artigo">
   <li>
      Art.  1º A ...
    
asked by anonymous 20.04.2015 / 23:12

1 answer

1

Since the content looks fairly uniform, a set of regular-expression substitutions should be enough for this. They can be done in any of the three ways you mention: in the text editor itself (using its Find/Replace function) or in a programming language such as PHP or JavaScript. The regex syntax is broadly similar in all three cases (PCRE-like), though not identical.

By way of example, the conversion of articles would be as follows:

Search:

<ul id="artigo" class="artigo">\n   <li>\n      Art\.  (\d+)º

Replace:

<ul id="artigo$1" class="artigo">\n   <li>\n      Art.  $1º
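The same substitution can be sketched in JavaScript. This is only a minimal sketch: numberArticles is a hypothetical helper name, and the looser pattern (any whitespace, one or more digits, no required ordinal sign) is an assumption about how the articles are written in the file.

```javascript
// Hypothetical helper: copy the article number from "Art.  N" into the id.
function numberArticles(html) {
  // \s* tolerates any indentation; (\d+) captures multi-digit numbers.
  // The text after the number (such as "º") is left untouched.
  var regex = /<ul id="artigo" class="artigo">(\s*<li>\s*Art\.\s+)(\d+)/g;
  return html.replace(regex, function (match, lead, number) {
    return '<ul id="artigo' + number + '" class="artigo">' + lead + number;
  });
}

// Made-up sample input in the same layout as the question.
var sample =
  '<ul id="artigo" class="artigo">\n   <li>\n      Art.  1º A ...\n' +
  '<ul id="artigo" class="artigo">\n   <li>\n      Art.  12. B ...\n';
console.log(numberArticles(sample));
```

Capturing the indentation as a group means the replacement does not have to hard-code the exact number of spaces.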

The remaining levels use Roman numerals, which adds the complication of converting them (if converting them really matters). Here a programming language helps, because you can replace each match with the return value of a function. JavaScript example:

// Capture the Roman numeral that follows "TÍTULO".
var regex = /<ul id="titulo" class="titulo">\n   <li>\n      TÍTULO ([^ ]+) /g;
// str holds the full text of the document.
var convertido = str.replace(regex, function(match, romano) {
    var arabico = deromanize(romano);
    return '<ul id="titulo' + 
           arabico + 
           '" class="titulo">\n   <li>\n      TÍTULO ' + 
           romano + 
           ' ';
});

This uses a helper function to convert Roman numerals to Arabic ones, such as the deromanize function described in this article:

function deromanize (str) {
    var str = str.toUpperCase(),
        validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/,
        token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g,
        key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1},
        num = 0, m;
    if (!(str && validator.test(str)))
        return false;
    while (m = token.exec(str))
        num += key[m[0]];
    return num;
}
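
The same idea might be extended to the other Roman-numbered levels in a single pass. The sketch below is self-contained, so it uses a compact romanToInt as a stand-in for deromanize. The levels map is hypothetical: the class names and heading words for capítulo, seção and subseção are assumptions and need to match whatever is actually in the file.

```javascript
// Compact stand-in for deromanize so this sketch runs on its own.
// It does not validate its input.
function romanToInt(roman) {
  var values = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };
  var total = 0;
  for (var i = 0; i < roman.length; i++) {
    var current = values[roman[i]];
    var next = values[roman[i + 1]] || 0;
    // A smaller numeral before a larger one is subtracted (IV = 4).
    total += current < next ? -current : current;
  }
  return total;
}

// Hypothetical map: class name -> heading word expected inside the <li>.
var levels = {
  titulo: 'TÍTULO',
  capitulo: 'CAPÍTULO',
  secao: 'SEÇÃO',
  subsecao: 'SUBSEÇÃO'
};

function numberLevels(html) {
  Object.keys(levels).forEach(function (cls) {
    // Build one regex per level; \s* tolerates any indentation.
    var regex = new RegExp(
      '<ul id="' + cls + '" class="' + cls + '">' +
      '(\\s*<li>\\s*' + levels[cls] + '\\s+)([IVXLCDM]+)\\b', 'g');
    html = html.replace(regex, function (match, lead, roman) {
      return '<ul id="' + cls + romanToInt(roman) + '" class="' + cls + '">' +
             lead + roman;
    });
  });
  return html;
}

// Made-up sample input in the same layout as the question.
var input =
  '<ul id="titulo" class="titulo">\n   <li>\n      TÍTULO I ...\n' +
  '<ul id="capitulo" class="capitulo">\n   <li>\n      CAPÍTULO IV ...\n';
console.log(numberLevels(input));
```

If chapter or section numbering restarts under each title, ids such as capitulo1 would repeat. Should unique ids matter, one option would be to prefix each id with the id of its parent level.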
    
20.04.2015 / 23:40