CF11 issue
I’m trying to find an efficient way of pulling in a website’s metadata keywords (from the <meta> tag). The server is running CF11.
So far I’ve tried using the CFHTTP tag to pull in the data, but from what I’m reading online, people don’t recommend using regular expressions to extract the tag. The alternative seems to involve finding or building some sort of HTML parser, but I haven’t found any that work well, and I don’t have control over the server, so I’m not able to install anything on it. I looked into using ColdFusion’s XMLPARSE, but that doesn’t seem to be what I’m after either.
The websites I’m going to pull this data from are not standardized, so I can’t rely on the <meta name="keywords" {…} /> tag being in the same format every time. The tag could be missing, the name attribute could come first or last, and the tag could end with /> or with just >.
Any tips on how to do this without using too much processing power? The result should just be a string of the keywords found on the website I point it at.
You want to look at jsoup.
Add the jar to your CF server and you can very easily use it for parsing HTML.
It uses a selector syntax very similar to jQuery, which makes it really easy and powerful.
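Here is a minimal sketch of what that could look like in CFML, assuming the jsoup jar is already on the class path. The function name getMetaKeywords is made up for illustration; the jsoup calls used are Jsoup.parse, select and attr.

```cfml
<cfscript>
// Hypothetical helper: returns the content of the keywords meta tag,
// or an empty string when the page has no such tag.
// Assumes the jsoup jar is available to this application.
function getMetaKeywords(required string html) {
    // Static methods can be called on the Java class without init()
    var jsoup = createObject("java", "org.jsoup.Jsoup");
    var doc = jsoup.parse(arguments.html);

    // The selector matches on the parsed element, so attribute order
    // and ">" versus "/>" endings no longer matter.
    // attr() returns the value from the first match, or "" if none.
    return doc.select("meta[name=keywords]").attr("content");
}
</cfscript>
```

You would pass it the fileContent you already get back from CFHTTP. If you can't touch the server's lib folder, it may be worth checking whether this.javaSettings in Application.cfc (available since CF10) lets you load the jar from a folder inside your own application instead.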
Posted May 17, 2016 at 09:26AM. Labels: code