Parsing HTML is, without a doubt, the most stupid act a developer can partake in.
It is so unbelievably fraught with errors and complications that most of the time it's not even worth doing. Really, if you've got a website whose pages show off "stuff" in a "data-like" format, would it kill you to just provide a real data format, straight up?
In my particular case, this foray into parsing involved, in the end, turning the HTML page into something more usable in a programming language. This time it was JSON.
Bleh. Now, as much as I want to go off on a tirade about how much I prefer JSON over XML, I won't, because there is no argument to be made.
Call me a polemicist, but I speak from plenty of experience in the area. The pile of pulled-out hair due to JSON nuances and bitch-fests is orders of magnitude smaller than the pile of pulled-out hair made at the hands of XML handling.
XML, as a data format, sucks. Stop using it.