Parsing JSON output using Mechanize and Python Django View


I'm doing a site search (site:somedomain.com) on Bing using Python and mechanize.

The query submits fine and Bing returns output that looks like JSON, but I can't figure out how to parse the results any further. Is it JSON?

I'm getting output like:

link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=478', text='somesite -  professor rating of louis scerbo', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=478'), ('h', 'id=serp,5105.1')])
link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=527', text='somesite -  professor rating of jahan \xe2\x80\xa6', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=527'), ('h', 'id=serp,5118.1')])
link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', url='http://www.somesite.com/prof.php?pid=645', text='somesite -  professor rating of david kutzik', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pid=645'), ('h', 'id=serp,5131.1')])

I want to get just the URLs, like:

http://www.somesite.com/prof.php?pid=478
http://www.somesite.com/prof.php?pid=527
http://www.somesite.com/prof.php?pid=645

and so on, i.e. the url attribute within each entry.

How can I process this further with mechanize in my code? Keep in mind that URLs in the future might look like:

http://www.anothersite.com/dir/dir/dir/send.php?pid=100 

Thank you!

Well, mechanize is more of a browser package for Python. For parsing HTML/XML I would recommend lxml: you can feed the data to lxml and get the URLs from it. Another option is to use regular expressions to get the URLs; this approach is more flexible.

import re

# Match everything from "http:" up to the next single quote
url_regex = re.compile(r"http:[^']+")
urls = re.findall(url_regex, html_text)
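To illustrate, here is a rough sketch that applies the pattern above to text shaped like the output in the question. The `sample_output` string is a shortened, made-up stand-in for the real output, and the second, narrower pattern is my own variation rather than part of the original answer. It anchors on `url='` so that the Bing `base_url` is skipped.

```python
import re

# Shortened stand-in for the output shown in the question (assumption:
# the real output has been turned into one string, e.g. with str()).
sample_output = (
    "link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=478', tag='a')"
    "link(base_url=u'http://www.bing.com/search?q=site%3asomesite.com', "
    "url='http://www.somesite.com/prof.php?pid=527', tag='a')"
)

# The broad pattern: every run starting at "http:" up to the next quote.
# This also picks up the Bing base_url of each entry.
all_urls = re.findall(r"http:[^']+", sample_output)

# A narrower variation: only the value of the url='...' field.
# \b keeps "base_url=" from matching, since "_" counts as a word character.
url_regex = re.compile(r"\burl='([^']+)'")
urls = url_regex.findall(sample_output)

for url in urls:
    print(url)
```

Neither pattern depends on the path, so a URL such as http://www.anothersite.com/dir/dir/dir/send.php?pid=100 would be picked up the same way.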

Edit:

Well, instead of printing the output, pass the output (as a string) in place of html_text in re.findall() and print the urls.
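As a side note, the output in the question looks less like JSON and more like the printed form of link objects, each carrying a url field. If that is the case, it may be simpler to read that attribute directly instead of parsing text. The sketch below assumes the objects expose a `url` attribute (as the output suggests); `FakeLink` and `extract_urls` are hypothetical names used only so the example runs on its own. With mechanize, the links would presumably come from the browser object's `links()` method.

```python
from collections import namedtuple
from urllib.parse import urlparse

# Hypothetical stand-in for the link objects in the question, so the
# sketch runs without mechanize or a network connection.
FakeLink = namedtuple("FakeLink", ["url", "text"])


def extract_urls(links, domain):
    """Return the url of every link whose host is `domain` or a subdomain."""
    wanted = []
    for link in links:
        host = urlparse(link.url).netloc
        if host == domain or host.endswith("." + domain):
            wanted.append(link.url)
    return wanted


links = [
    FakeLink("http://www.somesite.com/prof.php?pid=478", "result one"),
    FakeLink("http://www.bing.com/account", "search engine chrome"),
    FakeLink("http://www.somesite.com/prof.php?pid=527", "result two"),
]

for url in extract_urls(links, "somesite.com"):
    print(url)
```

Filtering on the host rather than on the path means a deeper URL layout, such as /dir/dir/dir/send.php?pid=100 on another domain, only requires passing a different `domain` value.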

