C# - Parsing multiple groups


I have an HTML file (I can't use HTML Agility Pack) and want to extract the id of a div (if it has one).

<div id="div1">street ___________________ </div> <div id="div2">cap |__|__|__|__|__| number ______ </div> <div id="div3">city _____________________ state |__|__|</div> <div id="div4">city2 ____________________ state2 _____</div> 

I have a pattern for extracting the underscores __ : [\ _]{3,}

Now, if there is a div in front of the underscores, I want to extract it; if not, I'll take just the underscores.

So far I have built this pattern: (<div id(.+?)>(\w)([\ _]{3,}/*))([\ _]{3,})

The first part is built out of 3 groups: 1 - div tag, 2 - label, 3 - underscores

1 - <div id(.+?)>, 2 - (\w) , 3 - [\ _]{3,}/*

For the div with id div2, the id should not be taken, because its content contains non-alphanumeric chars.

Q: What is wrong with my pattern?

Desired matches for the 4 divs:

<div id="div1">street ___________________
______
<div id="div3">city _____________________
<div id="div4">city2 ____________________
_____

  • \w is a single character; you want 1 or more - \w+.

  • /* - 0 or more /'s? I don't see where that fits in.

  • One or more non-> characters (i.e. [^>]+) is a better idea than .+?. The lazy .+? will try to stop at the first >, but it will continue until it finds a string that matches, i.e.:

    <div id=1>this not valid</div><div id=2>this valid___</div> 

    will match the whole string, instead of just <div id=2>.

  • As far as I can tell from the question, everything before the underscores should be optional.
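To make the point about .+? concrete, here is a minimal C# sketch comparing the two forms. The input string and the class name LazyVsNegatedClass are assumptions made for illustration: the labels are single words so that \w+ can run straight into the underscores, and the optional group is left out so that only the tag-matching part differs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsNegatedClass
{
    // Hypothetical input: two divs, only the second one is followed by underscores.
    public const string Input =
        "<div id=1>not valid</div><div id=2>valid___</div>";

    // Returns the text of the first match (empty string if nothing matches).
    public static string FirstMatch(string pattern)
    {
        return Regex.Match(Input, pattern).Value;
    }

    public static void Main()
    {
        // Lazy dot: starts at the first <div id and keeps expanding past '>' characters
        // until the rest of the pattern succeeds, so both tags end up in the match.
        Console.WriteLine(FirstMatch(@"<div id.+?>\w+[\ _]{3,}"));
        // prints: <div id=1>not valid</div><div id=2>valid___

        // Negated class: cannot cross a '>', so the first div fails and the second matches.
        Console.WriteLine(FirstMatch(@"<div id[^>]+>\w+[\ _]{3,}"));
        // prints: <div id=2>valid___
    }
}
```

The lazy quantifier only prefers the shortest expansion; it does not forbid longer ones, which is why the negated class is the safer way to stay inside one tag.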

Pattern:

(?:(<div id[^>]+>)(\w+))?([\ _]{3,}) 

C# test.
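A minimal sketch of how the pattern could be exercised against the sample HTML from the question. The class name DivUnderscoreTest and the helper Describe are made up for this example, and trimming the underscore group is an extra step added here only to make the output easier to read.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DivUnderscoreTest
{
    // Sample HTML from the question, joined into one line.
    public const string Html =
        "<div id=\"div1\">street ___________________ </div> " +
        "<div id=\"div2\">cap |__|__|__|__|__| number ______ </div> " +
        "<div id=\"div3\">city _____________________ state |__|__|</div> " +
        "<div id=\"div4\">city2 ____________________ state2 _____</div>";

    // Group 1: div tag (optional), group 2: label (optional), group 3: underscores.
    public const string Pattern = @"(?:(<div id[^>]+>)(\w+))?([\ _]{3,})";

    // Builds one line per match: tag + label + underscores when the div part
    // matched, otherwise just the underscores.
    public static List<string> Describe()
    {
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(Html, Pattern))
        {
            string blanks = m.Groups[3].Value.Trim();
            lines.Add(m.Groups[1].Success
                ? m.Groups[1].Value + m.Groups[2].Value + " " + blanks
                : blanks);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe())
            Console.WriteLine(line);
    }
}
```

Checking Groups[1].Success is what distinguishes a run of underscores that directly follows a div tag and label from a bare run such as the one after "number" in div2.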

