How to create a regular expression

1

I have the following div containing address information:

<div class="endereco-item">
	<h2 class="azulclaro identify">Casa</h2>
	<div class="entrelinha_0"></div>
	<div class="font_15"></div>
	<div class="font_15"></div>
	<div class="font_15">R: Antonio Pires dos Santos, 647   praça central</div>
	<div class="font_15">Parque santo antonio - Sao Paulo - SP</div>
	<div class="font_15">CEP: 55555-555</div>
	<div class="font_15">Fone: (11)943-056-295 (55)555-555-555</div>
	<div id="ctl00_Body_rptEnderecos_ctl00_dvRadio" class="font_15 custom-checkbox">
	<input type="radio" id="radio0" name="radioSelect" checked onclick="setPrincipal(0)" />

What regular expression would extract the city (São Paulo) and the state (SP)?

    
asked by anonymous 08.11.2017 / 02:53

2 answers

2

If the format is always the one shown in the question, "neighborhood - city - state", where the state is represented by two uppercase letters, you can use:

>.* - (.*) - ([A-Z]{2})<

(See the expression in action on regex101: link)

That is: the > that closes a tag, followed by any sequence of characters (the neighborhood), followed by the " - " separator, followed by any sequence of characters that we capture in a group (the city), another separator, another capture group for a pair of uppercase letters (the state), and finally the < that opens the next tag.
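The breakdown above can also be written piece by piece using PCRE extended mode (the x flag), where whitespace in the pattern is ignored and # starts a comment. This is only a sketch: $line is a hypothetical one-line sample taken from the question's HTML, the ~ delimiter is an arbitrary choice, and [ ] stands in for each literal space because extended mode would otherwise discard it.

```php
<?php
// Hypothetical one-line sample taken from the question's HTML.
$line = '<div class="font_15">Parque santo antonio - Sao Paulo - SP</div>';

// Same expression, written with the x flag so each piece can carry a comment.
$regex = '~
    >            # end of the opening tag
    .*           # neighborhood
    [ ]-[ ]      # first separator
    (.*)         # capture group 1: city
    [ ]-[ ]      # second separator
    ([A-Z]{2})   # capture group 2: state, two uppercase letters
    <            # start of the closing tag
~x';

$matches = array();
$found = preg_match($regex, $line, $matches);

if ($found === 1) {
    // $matches[0] holds the whole match; 1 and 2 hold the two groups.
    echo $matches[1], ' / ', $matches[2], PHP_EOL;
}
```

Because the first .* is greedy, the engine backtracks until two " - " separators remain, so the pattern relies on the line containing exactly the "neighborhood - city - state" layout.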

In PHP, you can pass an array as the third argument to the preg_match function. The city and state will then be returned in elements 1 and 2 of that array, respectively:

<?php
$html='<div class="endereco-item">
    <h2 class="azulclaro identify">Casa</h2>
    <div class="entrelinha_0"></div>
    <div class="font_15"></div>
    <div class="font_15"></div>
    <div class="font_15">R: Antonio Pires dos Santos, 647   praça central</div>
    <div class="font_15">Parque santo antonio - Sao Paulo - SP</div>
    <div class="font_15">CEP: 55555-555</div>
    <div class="font_15">Fone: (11)943-056-295 (55)555-555-555</div>
    <div id="ctl00_Body_rptEnderecos_ctl00_dvRadio" class="font_15 custom-checkbox">
    <input type="radio" id="radio0" name="radioSelect" checked onclick="setPrincipal(0)" />';

// preg_match() fills this array: index 0 is the full match, 1 the city, 2 the state
$cidade_estado = array();
$regex = '/>.* - (.*) - ([A-Z]{2})</';
preg_match($regex, $html, $cidade_estado);

print_r($cidade_estado);

(See the sample PHP code on repl.it: link)

    
08.11.2017 / 03:53
2

Extracting the city and state from your HTML is a bit tricky.

I ran a test here and got it working with the following regular expression:

/(?![^<>]*>)-\s?(?P<cidade>[a-zA-Z].*?)\s?-\s?(?P<estado>[a-zA-Z]{2})/

This way it finds the city and state even when the spacing and typing vary. I deliberately messed up the test code, and it still managed to capture them in several different forms.

Here is the test with the messy code, where I put cities and states in various parts of the code.
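As a rough sketch of how the named groups from this expression could be read in PHP: the two sample lines below are assumptions for illustration (the second one is made up, with irregular spacing and a lowercase state), and the first character class is written as [a-zA-Z].

```php
<?php
// Sample lines: the first comes from the question, the second is hypothetical.
$samples = array(
    '<div class="font_15">Parque santo antonio - Sao Paulo - SP</div>',
    '<div class="font_15">Centro -Campinas-sp</div>',
);

// (?![^<>]*>) skips hyphens that sit inside a tag, such as class="custom-checkbox".
// (?P<name>...) creates named groups that show up as string keys in the result.
$regex = '/(?![^<>]*>)-\s?(?P<cidade>[a-zA-Z].*?)\s?-\s?(?P<estado>[a-zA-Z]{2})/';

$results = array();
foreach ($samples as $sample) {
    $m = array();
    if (preg_match($regex, $sample, $m) === 1) {
        $results[] = $m['cidade'] . '/' . $m['estado'];
    }
}

print_r($results);
```

Note that the state group accepts lowercase letters too, so any two letters after a second hyphen will match; that is the trade-off for tolerating inconsistent typing.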

But if you can change the HTML, I suggest putting IDs on each div. That would make it much easier to write a regular expression that targets the right div.
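To illustrate the idea, here is a minimal sketch that assumes the div holding the city and state were given an id. The id "cidade-uf" is hypothetical and does not exist in the original page; the rest of the pattern simply reuses the "neighborhood - city - state" layout from the question.

```php
<?php
// Hypothetical markup: the id "cidade-uf" is an assumption, not part of the original page.
$html = '<div class="font_15">CEP: 55555-555</div>
<div id="cidade-uf" class="font_15">Parque santo antonio - Sao Paulo - SP</div>';

// Anchor on the id first, then capture the city and the two-letter state.
$regex = '/<div id="cidade-uf"[^>]*>.* - (.*) - ([A-Z]{2})<\/div>/';

$m = array();
$ok = preg_match($regex, $html, $m);

if ($ok === 1) {
    echo $m[1], ' - ', $m[2], PHP_EOL;
}
```

With the id in place, other divs that happen to contain hyphens can no longer be matched by accident.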

Still, I hope the expression I created works for you.

    
08.11.2017 / 03:33