python - how to extract all the data between &nbsp;

June 15, 2012

    <p align="justify"><a href="#abcd"> mr a </a></p>
    <p align="justify">i </p>
    <p align="justify"> have question </p>
    <p align="justify">&nbsp;</p>
    <p align="justify"><a href="#mnop"> mr b </a></p>
    <p align="justify">the </p>
    <p align="justify">answer is</p>
    <p align="justify">not there</p>
    <p align="justify">&nbsp;</p>
    <p align="justify"><a href="wxyz"> mr c </a></p>
    <p align="justify">please</p>
    <p align="justify">help</p>

I want to iterate over the data between the &nbsp; paragraphs and extract it group by group.

The first iteration should display: i have question
The second iteration should display: the answer is not there
The person names should be extracted into a separate list, for example ['mr a', 'mr b', 'mr c'].

If anyone has an idea how to do it, it would be useful, because I am trying to learn Python and got stuck on this problem. The code I tried is:

    for t in soup.findAll('p', text=re.compile('&nbsp;'), attrs={'align': 'justify'}):
        print(t)
        for item in t.parent.next_siblings:
            if isinstance(item, Tag):
                if 'p' in item.attrs and 'align' in item.attrs['p']:
                    break
                print(item)

It returns [], which is not what I want.
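One possible reason for the empty result, assuming the parser decodes HTML entities while it builds the tree (bs4 does this): the text node then holds the character U+00A0, not the literal string "&nbsp;", so a pattern looking for "&nbsp;" has nothing to match. This is only a guess at the cause, not a confirmed diagnosis. The small check below uses the standard library to show the same decoding:

```python
import re
from html import unescape

# "&nbsp;" as it appears in the raw markup
raw = "&nbsp;"

# What a parser that decodes entities hands back for that text node
decoded = unescape(raw)

print(repr(decoded))                        # '\xa0'
print(bool(re.search("&nbsp;", decoded)))   # False: the literal entity is gone
print(bool(re.search("\xa0", decoded)))     # True: match the decoded character instead
```

If that is what is happening, matching on "\xa0" rather than "&nbsp;" would be the thing to try.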

Here is one method using only regex:

    from re import sub

    html = ('<p align="justify">i </p>'
            '<p align="justify"> have question </p>'
            '<p align="justify">&nbsp;</p>'
            '<p align="justify">the </p>'
            '<p align="justify">answer is</p>'
            '<p align="justify">not there</p>'
            '<p align="justify">&nbsp;</p>'
            '<p align="justify">please</p>'
            '<p align="justify">help</p>')

    # Replace every tag with a space, split on the &nbsp; separators,
    # then collapse the runs of whitespace inside each piece.
    print([sub(r"\s+", " ", x).strip() for x in sub(r"<.*?>", " ", html).split("&nbsp;")])

Output:

    ['i have question', 'the answer is not there', 'please help']
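The regex above handles the message groups but not the list of names. As a rough sketch of how both could be collected in one pass, here is a different technique: the standard library's html.parser instead of regex or BeautifulSoup. The class name GroupParser and the function extract are made up for this illustration, and the sample is the question's markup with the first name written as "mr a" to match the expected list.

```python
from html.parser import HTMLParser

NBSP = "\xa0"  # what &nbsp; becomes once the parser decodes it


class GroupParser(HTMLParser):
    """Hypothetical helper: names come from <a> tags, messages from the
    paragraphs between the &nbsp; separator paragraphs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.names = []
        self.groups = [[]]
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Check for the separator first: str.split() would treat U+00A0
        # as ordinary whitespace and silently drop it.
        if NBSP in data:
            self.groups.append([])
            return
        text = " ".join(data.split())
        if not text:
            return
        if self._in_anchor:
            self.names.append(text)
        else:
            self.groups[-1].append(text)


def extract(html):
    """Return (names, messages) for markup shaped like the question's."""
    parser = GroupParser()
    parser.feed(html)
    parser.close()
    messages = [" ".join(group) for group in parser.groups if group]
    return parser.names, messages


sample = (
    '<p align="justify"><a href="#abcd"> mr a </a></p>'
    ' <p align="justify">i </p> <p align="justify"> have question </p>'
    ' <p align="justify">&nbsp;</p>'
    ' <p align="justify"><a href="#mnop"> mr b </a></p>'
    ' <p align="justify">the </p> <p align="justify">answer is</p>'
    ' <p align="justify">not there</p>'
    ' <p align="justify">&nbsp;</p>'
    ' <p align="justify"><a href="wxyz"> mr c </a></p>'
    ' <p align="justify">please</p> <p align="justify">help</p>'
)

names, messages = extract(sample)
print(names)     # ['mr a', 'mr b', 'mr c']
print(messages)  # ['i have question', 'the answer is not there', 'please help']
```

The sketch assumes every separator is a paragraph containing only &nbsp; and every name sits inside an <a> tag. Markup that breaks either assumption would need extra handling.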
