Parsing JSON output using Mechanize and Python Django View
I'm doing a site search (site:somedomain.com) on Bing using Python and mechanize.
The form submits to Bing fine and returns output that looks like JSON, but I can't figure out how to parse the results any further. Is it JSON?
I'm getting output like:
link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=478', text='somesite - professor rating of louis scerbo', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=478'), ('h', 'id=serp,5105.1')])
link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=527', text='somesite - professor rating of jahan \xe2\x80\xa6', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=527'), ('h', 'id=serp,5118.1')])
link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=645', text='somesite - professor rating of david kutzik', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=645'), ('h', 'id=serp,5131.1')])
I want to get the URLs, like:
http://www.somesite.com/prof.php?pid=478
http://www.somesite.com/prof.php?pid=527
http://www.somesite.com/prof.php?pid=645
and so on, i.e. the url attribute within each link.
How can I use mechanize further to do this within the code? Keep in mind that the URLs in the future might look like:
http://www.anothersite.com/dir/dir/dir/send.php?pid=100
Thank you!
Well, mechanize is more of a browser package for Python. For parsing HTML/XML I would recommend lxml: you can feed the data to lxml and get the URLs from it. Another option is to use regular expressions to pull out the URLs; this approach is more flexible.
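import re
# Match from "http:" up to (not including) the next single quote.
url_regex = re.compile('http:[^\']+')
# html_text is the string to search.
urls = re.findall(url_regex, html_text)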
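For the parser-based route, here is a minimal sketch. It uses the standard library's html.parser rather than lxml, simply so that it runs without extra installs; the idea (walk the a tags and keep the href values for one host) is the same. The LinkCollector class, the wanted_host parameter, and the sample html_text are all made up for illustration and are not part of mechanize or of the original post.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags for one host."""

    def __init__(self, wanted_host):
        super().__init__()
        self.wanted_host = wanted_host
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links whose host matches the site being searched.
        if href and urlparse(href).netloc == self.wanted_host:
            self.urls.append(href)


# Made-up stand-in for the HTML of a results page.
html_text = """
<ol>
  <li><a href="http://www.somesite.com/prof.php?pid=478">Result one</a></li>
  <li><a href="http://www.bing.com/settings">Settings</a></li>
  <li><a href="http://www.anothersite.com/dir/dir/dir/send.php?pid=100">Other</a></li>
</ol>
"""

collector = LinkCollector("www.somesite.com")
collector.feed(html_text)
print(collector.urls)
```

Filtering on the host rather than on the path means a URL shaped like /dir/dir/dir/send.php?pid=100 would be picked up just as well once wanted_host is changed.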
Edit:
Well, instead of printing the output, pass the output in place of html_text in re.findall(), and then print urls.
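As a rough sketch of that edit: the printed output in the question looks like the text form of link objects rather than JSON, so it can be searched as a plain string. The output_text below is a shortened, made-up stand-in for that output, and the pattern is a slightly narrower variant of the one above (an assumption on my part): it captures only the url='...' field, so the base_url and href copies are not returned as duplicates.

```python
import re

# Shortened stand-in for the printed link output shown in the question.
output_text = (
    "link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=478', text='...', tag='a')"
    "link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=527', text='...', tag='a')"
)

# \b keeps "url=" from matching inside "base_url=" (the underscore is a word character).
url_field = re.compile(r"\burl='(https?://[^']+)'")

# findall returns just the captured group, i.e. the URL itself.
urls = url_field.findall(output_text)
print(urls)
```

If the output is a list of link objects rather than a string, it would need converting with str() first, or the url attribute of each object could be read directly instead of using a regular expression.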