<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Nested Constructs in Regular Expressions</title>
	<atom:link href="http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/</link>
	<description>Musings on just about everything.</description>
	<lastBuildDate>Tue, 16 Mar 2010 05:30:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Steve</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-5016</link>
		<dc:creator>Steve</dc:creator>
		<pubDate>Fri, 25 Jan 2008 23:14:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-5016</guid>
		<description>This post covers some interesting uses for balancing groups: &lt;a href=&quot;http://blog.stevenlevithan.com/archives/balancing-groups&quot; rel=&quot;nofollow&quot;&gt;Fun With .NET&#039;s Regex Balancing Groups&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>This post covers some interesting uses for balancing groups: <a href="http://blog.stevenlevithan.com/archives/balancing-groups" rel="nofollow">Fun With .NET&#8217;s Regex Balancing Groups</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Weaver</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-2733</link>
		<dc:creator>Scott Weaver</dc:creator>
		<pubDate>Fri, 14 Dec 2007 20:00:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-2733</guid>
		<description>Just to follow up on what I said yesterday, having read the post at &lt;a href=&quot;http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx&quot; rel=&quot;nofollow&quot;&gt;an MSDN blog on the subject:&lt;/a&gt; It is best to think of a Group as a Stack of captures. Where the top of the stack is the last capture made. (?\)) Matches to “)” and pops a capture off of the Open group’s capture stack. This match can only be successful if and only if the Open group’s capture stack is not empty. This is a fancy way of saying that for every match of this group there must be a match of the group Open. So a &quot;pop&quot; *should* fail on an empty stack and make the pattern fail, but in my recent experience this was just not the case.

sZweaver@gmail.com (86 the z)</description>
		<content:encoded><![CDATA[<p>Just to follow up on what I said yesterday, having read the post at <a href="http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx" rel="nofollow">an MSDN blog on the subject:</a> It is best to think of a Group as a Stack of captures. Where the top of the stack is the last capture made. (?\)) Matches to “)” and pops a capture off of the Open group’s capture stack. This match can only be successful if and only if the Open group’s capture stack is not empty. This is a fancy way of saying that for every match of this group there must be a match of the group Open. So a &#8220;pop&#8221; *should* fail on an empty stack and make the pattern fail, but in my recent experience this was just not the case.</p>
<p><a href="mailto:sZweaver@gmail.com">sZweaver@gmail.com</a> (86 the z)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Weaver</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-2662</link>
		<dc:creator>Scott Weaver</dc:creator>
		<pubDate>Thu, 13 Dec 2007 14:52:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-2662</guid>
		<description>(Moderator, please remove my previous comment)

The code as posted here (and everywhere I’ve seen it) for matching balancing constructs does not work.

The problem lies in the meat of the pattern:

(?&gt;
&lt;div (?&lt;DEPTH&gt;) &#124; &lt;/div (?&lt;-DEPTH&gt;) &#124; .? )* (?(DEPTH)(?!))
)

let’s say input is: “Beforedivs&lt;div&gt;.stuff&lt;/div&gt;more stuff&lt;/div&gt;Afterdivs”

The pattern matches:&lt;div&gt;.stuff&lt;/div&gt;more stuff&lt;/div&gt;

That’s not a balanced set of divs. Here’s why: the &lt;div&gt; is matched and the DEPTH stack is pushed. The first &lt;/div&gt; is matched and the stack is popped. Now the (?(DEPTH)(?!)) comes in and tests if there’s anything on the stack, and there isn’t, so the pattern continues greedily. The second &lt;/div&gt; is finally matched, the EMPTY stack is popped, and again the test sees nothing on the stack, so the pattern does not fail. The pattern is forced to backtrack and allow the last &lt;/div&gt; in the pattern to match, and we get the erroneous result. The pattern does not account for unmatched &lt;/div&gt;’s.

The good news is that the solution is easy. Make it:
(?&gt;
&lt;div (?&lt;DEPTH&gt;) &#124; &lt;/div (?&lt;-DEPTH&gt;) &#124; .? )*? (?(DEPTH)(?!)
)
I made the whole parenthesized subexpression NON-GREEDY with the ? mark. Now the pattern stops after every &lt;div&gt;, &lt;/div&gt;, or single &#039;anything&#039; character (.?) matched and sees if it can match that last &lt;/div&gt;, and if so it stops. This new pattern now matches:
&lt;div&gt;.stuff&lt;/div&gt;

sZweaver2112@gmail.com  (86 the z)</description>
		<content:encoded><![CDATA[<p>(Moderator, please remove my previous comment)</p>
<p>The code as posted here (and everywhere I’ve seen it) for matching balancing constructs does not work.</p>
<p>The problem lies in the meat of the pattern:</p>
<p>(?&gt;<br />
&lt;div (?&lt;DEPTH&gt;) | &lt;/div (?&lt;-DEPTH&gt;) | .? )* (?(DEPTH)(?!))<br />
)</p>
<p>let’s say input is: “Beforedivs&lt;div&gt;.stuff&lt;/div&gt;more stuff&lt;/div&gt;Afterdivs”</p>
<p>The pattern matches:&lt;div&gt;.stuff&lt;/div&gt;more stuff&lt;/div&gt;</p>
<p>That’s not a balanced set of divs. Here’s why: the &lt;div&gt; is matched and the DEPTH stack is pushed. The first &lt;/div&gt; is matched and the stack is popped. Now the (?(DEPTH)(?!)) comes in and tests if there’s anything on the stack, and there isn’t, so the pattern continues greedily. The second &lt;/div&gt; is finally matched, the EMPTY stack is popped, and again the test sees nothing on the stack, so the pattern does not fail. The pattern is forced to backtrack and allow the last &lt;/div&gt; in the pattern to match, and we get the erroneous result. The pattern does not account for unmatched &lt;/div&gt;’s.</p>
<p>The good news is that the solution is easy. Make it:<br />
(?&gt;<br />
&lt;div (?&lt;DEPTH&gt;) | &lt;/div (?&lt;-DEPTH&gt;) | .? )*? (?(DEPTH)(?!)<br />
)<br />
I made the whole parenthesized subexpression NON-GREEDY with the ? mark. Now the pattern stops after every &lt;div&gt;, &lt;/div&gt;, or single &#8216;anything&#8217; character (.?) matched and sees if it can match that last &lt;/div&gt;, and if so it stops. This new pattern now matches:<br />
&lt;div&gt;.stuff&lt;/div&gt;</p>
<p><a href="mailto:sZweaver2112@gmail.com">sZweaver2112@gmail.com</a>  (86 the z)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-545</link>
		<dc:creator>Steve</dc:creator>
		<pubDate>Sun, 04 Feb 2007 07:08:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-545</guid>
		<description>Note that you can do this even without the .NET-exclusive balancing group feature, as long as you know in advance the maximum levels of nesting you need to support. See http://badassery.blogspot.com/2006/03/regex-recursion-without-balancing.html for details.
</description>
		<content:encoded><![CDATA[<p>Note that you can do this even without the .NET-exclusive balancing group feature, as long as you know in advance the maximum levels of nesting you need to support. See <a href="http://badassery.blogspot.com/2006/03/regex-recursion-without-balancing.html" rel="nofollow">http://badassery.blogspot.com/2006/03/regex-recursion-without-balancing.html</a> for details.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Warner</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-544</link>
		<dc:creator>Paul Warner</dc:creator>
		<pubDate>Mon, 06 Jun 2005 06:08:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-544</guid>
		<description>Thanks for the post; this is working like a champ! Exactly what I needed. I am having trouble getting this to work with multiple lines of HTML, though. If there are any newlines between the tags, this seems to break. Am I doing something wrong? Thanks, Paul
</description>
		<content:encoded><![CDATA[<p>Thanks for the post; this is working like a champ! Exactly what I needed. I am having trouble getting this to work with multiple lines of HTML, though. If there are any newlines between the tags, this seems to break. Am I doing something wrong? Thanks, Paul</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todd Michael</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-543</link>
		<dc:creator>Todd Michael</dc:creator>
		<pubDate>Mon, 20 Dec 2004 15:58:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-543</guid>
		<description>Any idea if this can be used to easily get the data between non-nested items? For example, if I wanted to get all the data between a &lt;div&gt; and &lt;/div&gt; tag and I know there are no nested &lt;div&gt; tags... I tried this but it doesn&#039;t work.
</description>
		<content:encoded><![CDATA[<p>Any idea if this can be used to easily get the data between non-nested items? For example, if I wanted to get all the data between a &lt;div&gt; and &lt;/div&gt; tag and I know there are no nested &lt;div&gt; tags&#8230; I tried this but it doesn&#8217;t work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim Hollenhorst</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-542</link>
		<dc:creator>Jim Hollenhorst</dc:creator>
		<pubDate>Wed, 19 Nov 2003 05:30:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-542</guid>
		<description>Greg, &lt;br&gt;&lt;br&gt;Expresso (www.ultrapico.com) does use .NET regular expressions, so you might want to give it a try. Thanks for the reference on balancing groups and Friedl&#039;s book. There is no &quot;DEPTH&quot; construct, per se. That is simply the name that Friedl chose for his capture group; any other name works as well. The key concept is &quot;balancing groups&quot;.
</description>
		<content:encoded><![CDATA[<p>Greg, </p>
<p>Expresso (www.ultrapico.com) does use .NET regular expressions, so you might want to give it a try. Thanks for the reference on balancing groups and Friedl&#8217;s book. There is no &#8220;DEPTH&#8221; construct, per se. That is simply the name that Friedl chose for his capture group; any other name works as well. The key concept is &#8220;balancing groups&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Reinacker</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-541</link>
		<dc:creator>Greg Reinacker</dc:creator>
		<pubDate>Sun, 18 May 2003 22:16:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-541</guid>
		<description>Unless regex-coach uses the .NET regular expression parser, it wouldn&#039;t be of any use - the particular feature I was talking about is a MS extension.
</description>
		<content:encoded><![CDATA[<p>Unless regex-coach uses the .NET regular expression parser, it wouldn&#8217;t be of any use &#8211; the particular feature I was talking about is a MS extension.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Pirillo</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-540</link>
		<dc:creator>Chris Pirillo</dc:creator>
		<pubDate>Sun, 18 May 2003 17:55:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-540</guid>
		<description>http://weitz.de/regex-coach
</description>
		<content:encoded><![CDATA[<p><a href="http://weitz.de/regex-coach" rel="nofollow">http://weitz.de/regex-coach</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Reinacker</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/comment-page-1/#comment-539</link>
		<dc:creator>Greg Reinacker</dc:creator>
		<pubDate>Fri, 16 May 2003 02:41:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/05/15/nested-constructs-in-regular-expressions/#comment-539</guid>
		<description>Mmm...I stand corrected; it does talk about it. :-)
</description>
		<content:encoded><![CDATA[<p>Mmm&#8230;I stand corrected; it does talk about it. :-)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
