Python - Converting the part of an HTML page that is in JSON format
I have HTML code containing a JSON string generated by a program, and the whole JSON string is commented out in the HTML. There is vital information that has to be parsed out of that JSON. Is there a way to convert the commented-out JSON string into HTML format, so that it becomes proper HTML code that I can parse?
Here is an input sample. Because of the character limit, I have stripped out part of the code.
<!doctype html> <!--[if lt ie 7]> <html lang="en" class="ie ie6 lte9 lte8 lte7 os-win"> <![endif]--> <!--[if ie 7]> <html lang="en" class="ie ie7 lte9 lte8 lte7 os-win"> <![endif]--> <!--[if ie 8]> <html lang="en" class="ie ie8 lte9 lte8 os-win"> <![endif]--> <!--[if ie 9]> <html lang="en" class="ie ie9 lte9 os-win"> <![endif]--> <!--[if gt ie 9]> <html lang="en" class="os-win"> <![endif]--> <!--[if !ie]><!--> <html lang="en" class="os-win"> <!--<![endif]--> <head> <meta name="lnkd-track-json-lib" content="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=2jds9coeh4w78ed9wblscv68v-eo3jgzogk6v7maxgg86f4u27d&fc=2"> <meta name="lnkd-track-lib" content="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=eo3jgzogk6v7maxgg86f4u27d&fc=2"><meta name="treeid" content="yglqhfv7fxmqvjqjacsaaa=="> <meta name="appname" content="profile"> <meta name="lnkd-track-error" content="/lite/ua/error?csrftoken=ajax%3a1584468784299534813&goback=%2enpv_131506997_*1_*1_name*4search_9ikf_*1_en*4us_*1_*1_*1_123452511375704499972_1_63_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1"><script src="http://static.licdn.com:80/scds/common/u/lib/fizzy/fz-1.3.3-min.js" type="text/javascript"></script><script type="text/javascript">fs.config({"failureredirect":"http://www.linkedin.com/nhome/","uniescape":true,"xhrheaders":{"x-fs-origin-request":"/profile/view?id=131506997&authtype=name_search&authtoken=9ikf&locale=en_us&srchid=123452511375704499972&srchindex=1&srchtotal=63&trk=vsrp_people_res_name&trkinfo=vsrpsearchid%3a123452511375704499972%2cvsrptargetid%3a131506997%2cvsrpcmpt%3aprimary","x-fs-page-id":"nprofile-view"}});</script> <!--{"content":{"search_highlight":{},"message_exchanged":{"messagesonlytoviewee":true,"messagesonlytoviewer":true},"certifications":{"certsmpr":{},"empty":{}},"lix_treasury_callout":"b","network_overview":{"lix_deferload":"b","lix_showdetail":"control","distance":3,"lix_deferonload":"b","allow_pivot_search":false,"i18n_s_network":"xyz's 
network","facets":{"skill_explicit":{"data":[{"count":5,"name":"equity research","value":"2112"},{"count":5,"name":"equities","value":"462"},{"count":5,"name":"portfolio management","value":"480"},{"count":4,"name":"financial markets","value":"1371"},{"count":4,"name":"derivatives","value":"814"}]}} }}}}
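The sample has a line break inside the "i18n_s_network" value (between "xyz's" and "network"). That break may only be a copy-paste artifact. If the real page does contain it, json.loads rejects it by default, because strict mode forbids control characters inside strings. Passing strict=False allows them. Below is a minimal sketch on a hypothetical fragment modeled on the sample:

```python
import json

# Hypothetical fragment modeled on the sample: the "i18n_s_network" value
# contains a raw line break (the \n below becomes a real newline character).
raw = '{"network_overview": {"distance": 3, "i18n_s_network": "xyz\'s\nnetwork"}}'

strict_failed = False
try:
    json.loads(raw)  # default strict=True rejects control characters inside strings
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    strict_failed = True
    print("strict decode failed:", exc)

data = json.loads(raw, strict=False)  # strict=False lets the newline through
print(repr(data["network_overview"]["i18n_s_network"]))
```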
I tried taking out the JSON part and parsing it with:
>>> json1 = json.loads(f1)
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    json1 = json.loads(f1)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 383, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
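This error means the decoder found no JSON value at the start of the string. The contents of f1 are not shown, so this is only a guess, but a likely cause is that the extracted text still includes the <!-- and --> comment markers or other surrounding HTML. The sketch below uses a hypothetical f1 and a hypothetical helper, strip_comment_markers, to show the failure and the fix:

```python
import json

# Hypothetical stand-in for f1: the JSON is still wrapped in HTML comment markers.
f1 = '<!--{"content": {"network_overview": {"distance": 3}}}-->'

raw_failed = False
try:
    json.loads(f1)  # fails: the text starts with "<!--", not a JSON value
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    raw_failed = True
    print("decode failed:", exc)


def strip_comment_markers(text):
    """Remove a surrounding <!-- ... --> pair, if present, and return the inner text."""
    text = text.strip()
    if text.startswith("<!--") and text.endswith("-->"):
        text = text[len("<!--"):-len("-->")]
    return text.strip()


data = json.loads(strip_comment_markers(f1))
print(data["content"]["network_overview"]["distance"])
```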
You can find the comments in the HTML with BeautifulSoup, using lambda text: isinstance(text, Comment) as the filter, and then load the JSON string with the json module. Here's an example:
import json
from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup("""
<table>
  <tr>
    <td><table><tr><td>1</td></tr><tr><td>2</td></tr></table></td>
  </tr>
  <!-- {"test": [1,2,3]} -->
  <tr>
    <td><table><tr><td>3</td></tr><tr><td>4</td></tr></table></td>
  </tr>
</table>
""", "html.parser")

# Find the first string node that is an HTML comment
# (older bs4 releases use text= instead of string=).
comment = soup.find(string=lambda text: isinstance(text, Comment))
# The comment body is plain JSON, so json.loads can decode it directly.
data = json.loads(comment)
print(data['test'])
This prints:
[1, 2, 3]
Hope that helps.
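On the question's actual page, the first comment is a conditional comment such as <!--[if lt ie 7]> ... <![endif]-->, not the JSON. Taking only the first comment would therefore fail there. The sketch below is one alternative and assumes nothing beyond the standard library. It uses html.parser.HTMLParser, whose handle_comment hook receives every comment body, and keeps only the bodies that decode as JSON. The class name JSONCommentParser and the shortened page are illustrative, not from the original:

```python
import json
from html.parser import HTMLParser


class JSONCommentParser(HTMLParser):
    """Collect every HTML comment whose body decodes as JSON."""

    def __init__(self):
        super().__init__()
        self.json_blobs = []

    def handle_comment(self, data):
        body = data.strip()
        # Conditional comments like "[if lt ie 7]> ..." also arrive here;
        # only try bodies that look like JSON and skip any that fail to decode.
        if not body.startswith(("{", "[")):
            return
        try:
            self.json_blobs.append(json.loads(body))
        except ValueError:
            pass


# Shortened, hypothetical page modeled on the question's sample.
html = """<!doctype html>
<!--[if lt ie 7]> <html lang="en" class="ie ie6"> <![endif]-->
<html lang="en"><head>
<!--{"content": {"network_overview": {"distance": 3, "facets": {"skill_explicit": {"data": [{"count": 5, "name": "equity research", "value": "2112"}]}}}}}-->
</head><body></body></html>"""

parser = JSONCommentParser()
parser.feed(html)
parser.close()

data = parser.json_blobs[0]
skills = data["content"]["network_overview"]["facets"]["skill_explicit"]["data"]
print([s["name"] for s in skills])
```

Decoding the JSON directly like this also makes it unnecessary to turn the JSON back into HTML before parsing it.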