Parsing JSON output using Mechanize and Python Django View
I'm doing a site search (site:somedomain.com) on Bing using Python and mechanize.
It submits to Bing fine and returns output, but the output looks like JSON? I can't seem to figure out a way to parse the results further. Is it JSON?
I'm getting output like:
Link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=478', text='somesite - professor rating of louis scerbo', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=478'), ('h', 'id=serp,5105.1')])
Link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=527', text='somesite - professor rating of jahan \xe2\x80\xa6', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=527'), ('h', 'id=serp,5118.1')])
Link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=645', text='somesite - professor rating of david kutzik', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=645'), ('h', 'id=serp,5131.1')])

I want the URLs, i.e. the url attribute within each Link:

http://www.somesite.com/prof.php?pid=478
http://www.somesite.com/prof.php?pid=527
http://www.somesite.com/prof.php?pid=645

and so on.
How can I do this further with mechanize in my code? Keep in mind that URLs in the future might look like:
http://www.anothersite.com/dir/dir/dir/send.php?pid=100

Thanks!
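Well, mechanize is more of a browser package for Python. What you are seeing is not JSON; it is the printed representation of mechanize Link objects. For parsing HTML/XML I'd recommend lxml: you can feed the page data to lxml and pull out the URLs. Another option is to use regular expressions to get the URLs, though the lxml approach is more flexible.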
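To illustrate the parse-the-HTML idea, here is a minimal sketch that uses the standard library's html.parser in place of lxml, so it runs without a third-party install (with lxml installed, lxml.html.fromstring(page).xpath('//a/@href') returns the same kind of list). It assumes Python 3; the HrefCollector class, the extract_hrefs function and the sample HTML are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, much like mechanize's Link.attrs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return all <a href> values in document order."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


# Hypothetical sample page. In practice this would be the HTML mechanize
# fetched, e.g. br.response().read().decode("utf-8") (UTF-8 is an assumption).
sample_html = """
<ul>
  <li><a href="http://www.somesite.com/prof.php?pid=478" h="id=serp,5105.1">somesite - professor rating</a></li>
  <li><a href="http://www.anothersite.com/dir/dir/dir/send.php?pid=100">another</a></li>
</ul>
"""

print(extract_hrefs(sample_html))
```

Because it reads the href attribute rather than matching URL shapes, deeper paths such as /dir/dir/dir/send.php need no special handling.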
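import re

# Match everything from "http:" up to the next single quote
url_regex = re.compile(r"http:[^']+")
urls = re.findall(url_regex, html_text)

Edit: instead of printing the output, pass the output in place of html_text to re.findall() (converted to a string with str() if it is not one already), then print urls.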
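Applied to the printed output, the pattern above also picks up the Bing base_url and a duplicate of each URL from the href in attrs, because every Link repr contains its URL more than once. A minimal sketch of a tighter variant that anchors on the url= field, assuming Python 3; the output string is an abridged copy of the question's printout used as hypothetical input, and the variable names are illustrative.

```python
import re

# Abridged copy of the printed mechanize output from the question.
output = (
    "Link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=478', "
    "text='somesite - professor rating of louis scerbo', tag='a', "
    "attrs=[('href', 'http://www.somesite.com/prof.php?pid=478'), ('h', 'id=serp,5105.1')])"
    "Link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=645', "
    "text='somesite - professor rating of david kutzik', tag='a', "
    "attrs=[('href', 'http://www.somesite.com/prof.php?pid=645'), ('h', 'id=serp,5131.1')])"
)

# The broad pattern grabs every "http:" run up to the next quote, so it
# returns the Bing base_url plus the url and href copies of each link.
broad = re.compile(r"http:[^']+").findall(output)

# Anchoring on the url= field gives one result per Link. The \b keeps it
# from matching the tail of "base_url=", with or without the u prefix.
url_field = re.compile(r"\burl='([^']+)'")
urls = url_field.findall(output)

for url in urls:
    print(url)
```

If the Link objects themselves are still at hand, the regex step can be skipped: mechanize's Link exposes the url attribute seen in the printout, so something like urls = [link.url for link in br.links()], where br is the mechanize.Browser used for the search, should give the same list.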