There are two possibilities:
1. Read the site and parse it with regular expressions.
2. Parse the HTML syntactically with DOM or SimpleXML.
The first option is the easiest but not the safest: if you do not build the regular expressions carefully, a single comma (literally) changed by the target site's developer can break your application.
It is also slower, because you are working almost by brute force: matching several different patterns and manipulating array structures, often multidimensional ones.
For this approach, file_get_contents() is often enough to fetch the page:
$html = file_get_contents( 'http://www.site.com' );
You then pass $html as the subject to preg_match(), preg_match_all(), preg_replace() ... whichever you think fits best, as many times as you need.
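As a rough illustration of the regular-expression route, here is a minimal sketch. To keep it runnable without network access, it uses a hypothetical inline HTML string in place of the downloaded page; the markup and the two patterns are assumptions for the example, not something taken from a real site.

```php
<?php
// Hypothetical sample markup standing in for the result of file_get_contents().
$html = '<html><head><title>Example Site</title></head>'
      . '<body><a href="/a">First</a> <a href="/b">Second</a></body></html>';

// Capture the page title (non-greedy, case-insensitive, dot also matches newlines).
$title = null;
if (preg_match('~<title>(.*?)</title>~is', $html, $m)) {
    $title = trim($m[1]);
}

// Collect every double-quoted href value found in an anchor tag.
preg_match_all('~<a\s[^>]*href="([^"]*)"~i', $html, $matches);
$links = $matches[1];

echo $title, PHP_EOL;
print_r($links);
```

Note how fragile this is: the link pattern already misses single-quoted or unquoted href attributes, which is exactly the kind of small change on the target site that breaks a regex-based scraper.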
The second possibility is more complicated if you choose DOM, but it is safer because you work with the HTML hierarchy, much as you would in JavaScript: you list nodes, iterate over collections of children, and so on.
It is tricky because DOM is a large and very detailed set of classes.
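For comparison, the DOM route might look roughly like the sketch below. It again uses a hypothetical inline HTML string rather than a real page, and it only touches a small corner of the DOM extension (DOMDocument::loadHTML() and getElementsByTagName()).

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<html><body><ul id="menu"><li><a href="/a">First</a></li>'
      . '<li><a href="/b">Second</a></li></ul></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and map its href attribute to its text content.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($links);
```

Because the lookup is by element and attribute rather than by character pattern, changes in quoting, whitespace, or attribute order on the target site do not affect the result.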
If the target site is simpler, you can choose SimpleXML, which is like DOM but much less powerful and consequently much simpler.
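A SimpleXML version could be sketched as follows. It assumes the markup is well-formed XML (as clean XHTML would be), since simplexml_load_string() returns false on input it cannot parse; the sample string and its structure are hypothetical.

```php
<?php
// Hypothetical, well-formed markup; SimpleXML cannot parse sloppy HTML.
$xhtml = '<html><body><ul><li><a href="/a">First</a></li>'
       . '<li><a href="/b">Second</a></li></ul></body></html>';

$xml = simplexml_load_string($xhtml);

$links = [];
if ($xml !== false) {
    // Elements are reached as object properties, attributes as array keys.
    foreach ($xml->body->ul->li as $li) {
        $links[(string) $li->a['href']] = (string) $li->a;
    }
}

print_r($links);
```

If the page is not well-formed, one option is to load it with DOMDocument::loadHTML() first and convert the result with simplexml_import_dom().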