C# - Parsing multiple groups
I have an HTML file (I can't use HTML Agility Pack) and want to extract the id of each div (if it has one).
<div id="div1">street ___________________ </div> <div id="div2">cap |__|__|__|__|__| number ______ </div> <div id="div3">city _____________________ state |__|__|</div> <div id="div4">city2 ____________________ state2 _____</div>
I have a pattern for extracting the underscores: [\ _]{3,}
Now, if there is a div in front of the underscores, I want to extract it as well; if not, I'll take just the underscores.
So far I have built this pattern: (<div id(.+?)>(\w)([\ _]{3,}/*))([\ _]{3,})
The first part is built out of 3 groups: 1 - div tag, 2 - label, 3 - underscores
1 - <div id(.+?)>
2 - (\w)
3 - [\ _]{3,}/*
For the div with id div2, the id is not captured because the content contains non-alphanumeric characters.
Q: What is wrong with the pattern?
Desired matches for the 4 divs:
<div id="div1">street ___________________
______
<div id="div3">city _____________________
<div id="div4">city2 ____________________
_____
\w matches a single character; you want one or more: \w+
/* means zero or more /'s? I don't see where that fits in.
One or more non-> characters (i.e. [^>]+) is a better idea than .+?
.+? will try to stop at the first >, but it will continue until it finds a string that matches, i.e.: <div id=1>this not valid</div><div id=2>this valid___</div>
can match from the first <div all the way to the underscores, instead of starting at <div id=2>.
As far as I can tell from the question, everything before the underscores should be optional.
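The difference between .+? and [^>]+ described above can be checked with a small sketch. The two patterns here are simplified stand-ins chosen for illustration, not the patterns from the question, and the class and method names (LazyVersusNegatedClass, CapturedTag) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusNegatedClass
{
    // Returns what group 1 (the div tag) captured, or "" when the pattern does not match.
    public static string CapturedTag(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        string input = "<div id=1>this not valid</div><div id=2>this valid___</div>";

        // Lazy dot: when the text after the first '>' does not fit, the engine
        // keeps extending .+? across tag boundaries until the rest matches.
        Console.WriteLine(CapturedTag(@"(<div id.+?>)[a-z ]+_{3,}", input));
        // prints: <div id=1>this not valid</div><div id=2>

        // Negated class: [^>]+ can never cross a '>', so the first div simply
        // fails and the match starts at the second div.
        Console.WriteLine(CapturedTag(@"(<div id[^>]+>)[a-z ]+_{3,}", input));
        // prints: <div id=2>
    }
}
```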
Pattern:
(?:(<div id[^>]+>)(\w+))?([\ _]{3,})
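As a rough sketch of how this pattern could be applied to the sample HTML from the question with System.Text.RegularExpressions: the class name DivUnderscoreDemo, the Extract helper, and the tuple field names are made up for illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DivUnderscoreDemo
{
    // Sample HTML copied from the question.
    public const string Sample =
        @"<div id=""div1"">street ___________________ </div> " +
        @"<div id=""div2"">cap |__|__|__|__|__| number ______ </div> " +
        @"<div id=""div3"">city _____________________ state |__|__|</div> " +
        @"<div id=""div4"">city2 ____________________ state2 _____</div>";

    // Optional "<div id...>" tag plus label, followed by a run of spaces/underscores.
    private static readonly Regex Pattern =
        new Regex(@"(?:(<div id[^>]+>)(\w+))?([\ _]{3,})");

    // Returns one (tag, label, blanks) entry per match. Tag and label are empty
    // strings when the run is not directly preceded by a div tag and a label.
    public static List<(string Tag, string Label, string Blanks)> Extract(string html)
    {
        var results = new List<(string Tag, string Label, string Blanks)>();
        foreach (Match m in Pattern.Matches(html))
        {
            results.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        }
        return results;
    }

    public static void Main()
    {
        foreach (var r in Extract(Sample))
        {
            Console.WriteLine($"tag='{r.Tag}' label='{r.Label}' blanks='{r.Blanks}'");
        }
    }
}
```

On the sample this gives five matches: div1 with label street, a bare underscore run from div2, div3 with label city, div4 with label city2, and a bare run after state2. For the bare runs, groups 1 and 2 did not participate in the match, so Value is an empty string; checking m.Groups[1].Success is a way to tell the two cases apart.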