python - how to extract all the data between &nbsp;


    <p align="justify"><a href="#abcd"> mr a </a></p>
    <p align="justify">i </p>
    <p align="justify"> have question </p>
    <p align="justify">&nbsp;</p>
    <p align="justify"><a href="#mnop"> mr b </a></p>
    <p align="justify">the </p>
    <p align="justify">answer is</p>
    <p align="justify">not there</p>
    <p align="justify">&nbsp;</p>
    <p align="justify"><a href="wxyz"> mr c </a></p>
    <p align="justify">please</p>
    <p align="justify">help</p>

I want to iterate over the data that is separated by the &nbsp; paragraphs.

  • The first iteration should display "i have question".
  • The second iteration should display "the answer is not there".
  • The person names should be extracted into a separate list, for example ['mr a','mr b','mr c'].

If anyone has an idea how to do this, it would be useful, because I am trying to learn Python and got stuck on this problem. The code I tried is:

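    import re
    from bs4 import BeautifulSoup, Tag

    soup = BeautifulSoup(html, 'html.parser')  # html holds the snippet above
    for t in soup.findAll('p', text=re.compile('&nbsp;'), attrs={'align': 'justify'}):
        print(t)
        for item in t.parent.next_siblings:
            if isinstance(item, Tag):
                if 'p' in item.attrs and 'align' in item.attrs['p']:
                    break
                print(item)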

It returns [], which is not what I want.
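A likely reason the attempt above returns [] is that BeautifulSoup decodes `&nbsp;` into the single character U+00A0 (`'\xa0'`), so a pattern looking for the literal text `&nbsp;` never matches. The sketch below swaps BeautifulSoup for the standard library's `html.parser` (Python 3) so it runs without extra packages, and splits on paragraphs whose decoded text is `'\xa0'`. The class name `NbspSplitter` is hypothetical, and it assumes the flat `<p>`/`<a>` layout of the sample; nested or messier markup would need more care.

```python
from html.parser import HTMLParser


class NbspSplitter(HTMLParser):
    """Hypothetical helper: collects the <a> names and groups <p> texts
    into blocks separated by <p>&nbsp;</p> paragraphs."""

    def __init__(self):
        # convert_charrefs=True decodes "&nbsp;" to "\xa0" before handle_data
        super().__init__(convert_charrefs=True)
        self.names = []     # text of every <a> element
        self.groups = [[]]  # one list of paragraph texts per block
        self.in_p = False
        self.in_a = False
        self.p_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.p_text = []
        elif tag == "a":
            self.in_a = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self.p_text)
            if text.strip(" \t\r\n") == "\xa0":
                self.groups.append([])  # separator paragraph: start a new block
            elif text.strip():
                self.groups[-1].append(text.strip())

    def handle_data(self, data):
        if self.in_a:
            self.names.append(data.strip())  # anchor text is a person name
        elif self.in_p:
            self.p_text.append(data)         # ordinary paragraph text


html = (
    '<p align="justify"><a href="#abcd"> mr a </a></p> <p align="justify">i </p> '
    '<p align="justify"> have question </p> <p align="justify">&nbsp;</p> '
    '<p align="justify"><a href="#mnop"> mr b </a></p> <p align="justify">the </p> '
    '<p align="justify">answer is</p> <p align="justify">not there</p> '
    '<p align="justify">&nbsp;</p> <p align="justify"><a href="wxyz"> mr c </a></p> '
    '<p align="justify">please</p> <p align="justify">help</p>'
)

parser = NbspSplitter()
parser.feed(html)
parser.close()

names = parser.names
blocks = [" ".join(texts) for texts in parser.groups if texts]
for block in blocks:  # one iteration per &nbsp;-separated block
    print(block)
print(names)
```

This prints "i have question", "the answer is not there" and "please help" on separate lines, then `['mr a', 'mr b', 'mr c']`. Within BeautifulSoup itself, searching for `'\xa0'` instead of `'&nbsp;'` should be the analogous fix, though that variant is not shown here.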

A method using just a regex:

    from re import sub

    html = ('<p align="justify">i </p>'
            '<p align="justify"> have question </p>'
            '<p align="justify">&nbsp;</p>'
            '<p align="justify">the </p>'
            '<p align="justify">answer is</p>'
            '<p align="justify">not there</p>'
            '<p align="justify">&nbsp;</p>'
            '<p align="justify">please</p>'
            '<p align="justify">help</p>')

    # replace every tag with a space, split on the &nbsp; separators,
    # then collapse the whitespace inside each chunk
    print([sub(r"\s+", " ", x).strip() for x in sub(r"<.*?>", " ", html).split("&nbsp;")])

Output:

['i have question', 'the answer is not there', 'please help']
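The regex answer produces only the text blocks, and on the full sample from the question (which includes the `<a>` names) those names would leak into the blocks. A sketch extending the same regex idea to the names list, assuming anchors are never nested and contain plain text; regexes are fragile on real-world HTML, so a parser is the safer choice beyond simple snippets like this one.

```python
import re

html = (
    '<p align="justify"><a href="#abcd"> mr a </a></p> <p align="justify">i </p> '
    '<p align="justify"> have question </p> <p align="justify">&nbsp;</p> '
    '<p align="justify"><a href="#mnop"> mr b </a></p> <p align="justify">the </p> '
    '<p align="justify">answer is</p> <p align="justify">not there</p> '
    '<p align="justify">&nbsp;</p> <p align="justify"><a href="wxyz"> mr c </a></p> '
    '<p align="justify">please</p> <p align="justify">help</p>'
)

# Names: the text inside each <a ...>...</a> (assumes no nested anchors)
names = [name.strip() for name in re.findall(r"<a\b[^>]*>(.*?)</a>", html)]

# Drop the anchors first so the names stay out of the blocks,
# then replace the remaining tags with spaces and split on &nbsp;
text = re.sub(r"<.*?>", " ", re.sub(r"<a\b[^>]*>.*?</a>", " ", html))
blocks = [re.sub(r"\s+", " ", chunk).strip() for chunk in text.split("&nbsp;")]

print(names)   # ['mr a', 'mr b', 'mr c']
print(blocks)  # ['i have question', 'the answer is not there', 'please help']
```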
