python - Conversion required for part of an HTML code that is in JSON format


I have HTML code that contains a JSON string generated by some program, and the whole JSON string is commented out in the HTML. Some vital information has to be parsed out of that JSON. Is there a way to convert the commented JSON string into HTML format, so that it becomes proper HTML code that I can parse?

Here is an input sample. Owing to the character limit, I have stripped out some of the code.

<!doctype html>   <!--[if lt ie 7]> <html lang="en" class="ie ie6 lte9 lte8 lte7 os-win"> <![endif]-->  <!--[if ie 7]> <html lang="en" class="ie ie7 lte9 lte8 lte7 os-win"> <![endif]-->  <!--[if ie 8]> <html lang="en" class="ie ie8 lte9 lte8 os-win"> <![endif]-->  <!--[if ie 9]> <html lang="en" class="ie ie9 lte9 os-win"> <![endif]-->  <!--[if gt ie 9]> <html lang="en" class="os-win"> <![endif]-->  <!--[if !ie]><!--> <html lang="en" class="os-win"> <!--<![endif]-->  <head>  <meta name="lnkd-track-json-lib" content="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=2jds9coeh4w78ed9wblscv68v-eo3jgzogk6v7maxgg86f4u27d&amp;fc=2">   <meta name="lnkd-track-lib" content="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=eo3jgzogk6v7maxgg86f4u27d&amp;fc=2"><meta name="treeid" content="yglqhfv7fxmqvjqjacsaaa==">   <meta name="appname" content="profile"> <meta name="lnkd-track-error" content="/lite/ua/error?csrftoken=ajax%3a1584468784299534813&amp;goback=%2enpv_131506997_*1_*1_name*4search_9ikf_*1_en*4us_*1_*1_*1_123452511375704499972_1_63_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1"><script src="http://static.licdn.com:80/scds/common/u/lib/fizzy/fz-1.3.3-min.js" type="text/javascript"></script><script type="text/javascript">fs.config({"failureredirect":"http://www.linkedin.com/nhome/","uniescape":true,"xhrheaders":{"x-fs-origin-request":"/profile/view?id=131506997&authtype=name_search&authtoken=9ikf&locale=en_us&srchid=123452511375704499972&srchindex=1&srchtotal=63&trk=vsrp_people_res_name&trkinfo=vsrpsearchid%3a123452511375704499972%2cvsrptargetid%3a131506997%2cvsrpcmpt%3aprimary","x-fs-page-id":"nprofile-view"}});</script> 
<!--{"content":{"search_highlight":{},"message_exchanged":{"messagesonlytoviewee":true,"messagesonlytoviewer":true},"certifications":{"certsmpr":{},"empty":{}},"lix_treasury_callout":"b","network_overview":{"lix_deferload":"b","lix_showdetail":"control","distance":3,"lix_deferonload":"b","allow_pivot_search":false,"i18n_s_network":"xyz's network","facets":{"skill_explicit":{"data":[{"count":5,"name":"equity research","value":"2112"},{"count":5,"name":"equities","value":"462"},{"count":5,"name":"portfolio management","value":"480"},{"count":4,"name":"financial markets","value":"1371"},{"count":4,"name":"derivatives","value":"814"}]}} }}}} 

I tried taking out the JSON part and parsing it with:

    >>> json1 = json.loads(f1)
    Traceback (most recent call last):
      File "<pyshell#26>", line 1, in <module>
        json1 = json.loads(f1)
      File "c:\python27\lib\json\__init__.py", line 338, in loads
        return _default_decoder.decode(s)
      File "c:\python27\lib\json\decoder.py", line 365, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "c:\python27\lib\json\decoder.py", line 383, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

You can find comments in the HTML with BeautifulSoup by passing lambda text: isinstance(text, Comment) as the text filter, then load the comment's JSON string with the json module. Here's an example:

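    import json
    from bs4 import BeautifulSoup, Comment

    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup("""
    <table>
    <tr>
       <td><table><tr><td>1</td></tr><tr><td>2</td></tr></table></td>
    </tr>
    <!--  {"test": [1,2,3]}  -->
    <tr>
       <td><table><tr><td>3</td></tr><tr><td>4</td></tr></table></td>
    </tr>
    </table>
    """, "html.parser")

    # find() returns the first text node that is a Comment.
    comments = soup.find(text=lambda text: isinstance(text, Comment))
    # json.loads() ignores the whitespace around the JSON text.
    comments = json.loads(comments)
    print(comments['test'])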

This prints:

    [1, 2, 3]
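One caveat for the page in the question: it has several IE conditional comments ahead of the JSON one, and find() returns only the first matching comment. A minimal sketch of one way around that, assuming Python 3 and using only the standard library's html.parser rather than BeautifulSoup, is to collect every comment and keep the ones json.loads accepts. The names JSONCommentCollector and extract_json_comments, and the trimmed sample page, are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JSONCommentCollector(HTMLParser):
    """Collect the decoded value of every HTML comment that is valid JSON."""

    def __init__(self):
        super().__init__()
        self.decoded = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> with the text between the markers.
        try:
            self.decoded.append(json.loads(data))
        except ValueError:
            # Not JSON (for example an IE conditional comment); skip it.
            pass


def extract_json_comments(html):
    """Hypothetical helper: return a list of decoded JSON comments."""
    collector = JSONCommentCollector()
    collector.feed(html)
    collector.close()
    return collector.decoded


# Trimmed-down stand-in for the page in the question (an assumption,
# not the real input).
sample_html = """<!doctype html>
<!--[if lt IE 7]> <html lang="en" class="ie ie6"> <![endif]-->
<html lang="en">
<head><meta name="appname" content="profile"></head>
<!--{"content": {"network_overview": {"distance": 3, "allow_pivot_search": false}}}-->
</html>
"""

results = extract_json_comments(sample_html)
print(results[0]["content"]["network_overview"]["distance"])
```

With BeautifulSoup the same idea should work by swapping find() for find_all() and wrapping json.loads in the same try/except.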

Hope that helps.

