I have an HTML file containing URLs in this format: https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010
The pattern is protocol://domain/dynamic-string-000-0000-000 (each 0 standing for an uppercase letter or a digit, as in D22-1131-010).
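A minimal TypeScript sketch of that pattern, applied to one URL at a time, assuming the code segments are uppercase letters or digits (the names `singleUrlPattern`, `exampleUrl` and `parts` are illustrative only). Because the expression is anchored with `^` and `$`, the greedy `(.+)` cannot run past the end of the single URL it is given.

```typescript
// Sketch: the URL pattern anchored to a single URL.
// Group 1: the dynamic slug; group 2: the 000-0000-000 code (uppercase letters or digits).
const singleUrlPattern =
  /^https:\/\/www\.olympikus\.com\.br\/(.+)-([A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3})$/;

const exampleUrl =
  "https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010";

// exec() returns null when the URL does not follow the pattern.
const parts = singleUrlPattern.exec(exampleUrl);
if (parts !== null) {
  console.log(parts[1]); // tenis-olympikus-flower-415-feminino-cinza
  console.log(parts[2]); // D22-1131-010
}
```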
I want to get all the links that follow this pattern, so I wrote the following regular expression: (https\:\/\/?)www\.olympikus\.com\.br\/(.*)\-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}
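To reproduce the problem, a minimal TypeScript sketch that applies the same expression, with only the `g` flag added so that every match is returned, to a hypothetical line of HTML holding two product links (the second URL is invented for illustration). Because `(.*)` is greedy, it consumes the rest of the line and then backtracks only as far as the last `-000-0000-000`, so both links come back as one match.

```typescript
// Hypothetical HTML line with two product links (the second URL is invented).
const sampleHtml =
  '<a href="https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010">A</a>' +
  '<a href="https://www.olympikus.com.br/tenis-olympikus-exemplo-masculino-preto-D22-1131-020">B</a>';

// The expression as written above, unchanged except for the g flag.
const originalPattern =
  /(https\:\/\/?)www\.olympikus\.com\.br\/(.*)\-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}/g;

// With the g flag, match() returns every full match; the greedy (.*) merges both links.
const originalMatches: string[] = sampleHtml.match(originalPattern) ?? [];

console.log(originalMatches.length); // 1
console.log(originalMatches[0]);
```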
Unfortunately, the expression matches from the first protocol://domain/ on the line all the way to the last possible -000-0000-000, returning one long string with everything in between because of (.*). I can't find a way to handle the dynamic part of the URL.
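One possible approach, sketched in TypeScript under the assumption that the URLs sit inside HTML attributes, is to replace `(.*)` with a character class that cannot cross a URL boundary: `[^\s"'<>]+` stops at whitespace, quotes and angle brackets, so each match ends at the closing quote of its own `href`. The optional slash in `https\:\/\/?` is also dropped, since it would accept `https:/` too. The names `productUrlPattern`, `extractProductUrls` and `pageHtml` are hypothetical, and the sample HTML is invented.

```typescript
// Sketch: (.*) replaced by a class that excludes whitespace, quotes and angle
// brackets, so a match cannot run past the closing quote of an href attribute.
const productUrlPattern =
  /https:\/\/www\.olympikus\.com\.br\/[^\s"'<>]+-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}/g;

// Hypothetical helper: returns every product URL found in an HTML string.
function extractProductUrls(html: string): string[] {
  // With the g flag, match() returns all full matches, or null when there are none.
  return html.match(productUrlPattern) ?? [];
}

// Hypothetical page: a link without a product code, followed by two product links.
const pageHtml =
  '<a href="https://www.olympikus.com.br/ajuda">Help</a>' +
  '<a href="https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010">A</a>' +
  '<a href="https://www.olympikus.com.br/tenis-olympikus-exemplo-masculino-preto-D22-1131-020">B</a>';

console.log(extractProductUrls(pageHtml));
```

A lazy `(.*?)` would also separate the two product links here, but starting from a link that has no code suffix (like `/ajuda`) it would still run on into the next product link; the negated class fails at that link's closing quote and moves on instead.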
How can I write this regular expression so that it returns all the links individually?
I'm currently using egrep in the terminal, but JavaScript examples are welcome, since I intend to build a crawler in that language with Node.js.
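For the Node.js crawler, a minimal TypeScript sketch of a hypothetical helper, `collectProductUrls`, that gathers unique product URLs from pages that have already been downloaded (fetching is out of scope). It assumes the same negated-class idea as a replacement for `(.*)`. The regex is created per page as a defensive choice: with the `g` flag, `exec()` stores its position in `lastIndex`, and a shared instance left mid-scan (for example after breaking out of a loop early) would start the next page from the wrong offset.

```typescript
// Hypothetical crawler helper: collects unique product URLs from HTML pages
// that have already been downloaded (fetching is not covered here).
function collectProductUrls(pages: string[]): string[] {
  const unique: string[] = [];
  for (const html of pages) {
    // A fresh regex per page: with the g flag, exec() keeps its position in
    // lastIndex, so a new object always starts scanning at offset 0.
    const pattern =
      /https:\/\/www\.olympikus\.com\.br\/[^\s"'<>]+-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}/g;
    let match: RegExpExecArray | null;
    // exec() returns the next match on each call and null once the page is exhausted.
    while ((match = pattern.exec(html)) !== null) {
      if (unique.indexOf(match[0]) === -1) {
        unique.push(match[0]);
      }
    }
  }
  return unique;
}

// Hypothetical pages: the first product link appears on both.
const pages = [
  '<a href="https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010">A</a>',
  '<a href="https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010">A</a>' +
    '<a href="https://www.olympikus.com.br/tenis-olympikus-exemplo-masculino-preto-D22-1131-020">B</a>',
];

console.log(collectProductUrls(pages));
```

With egrep, the same idea needs two adjustments: in POSIX extended regular expressions a backslash inside a bracket expression is literal, so `[^[:space:]"<>]` plays the role of `[^\s"<>]`, and the `-o` option makes egrep print only the matched text, one match per line, instead of the whole line.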