
Nested Constructs in Regular Expressions

Today I was looking for a way to do nested expression matching with regular expressions, and pretty much came up empty. Then after a trip to the bookstore to pick up Mastering Regular Expressions by Jeffrey Friedl, I finally found it.

Interestingly, even now that I know what to search for :-), I can’t find a single reference to this on the net or on MSDN.

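With the .NET regular expression engine, there are (?<DEPTH>) and (?<-DEPTH>) constructs that you can use to match nested expressions; for example, if you want to find matching parentheses, or matching HTML tags. The first pushes an empty capture onto a group named DEPTH, the second pops one off (and fails if there is nothing to pop), and a final (?(DEPTH)(?!)) fails the match if anything is still left on the stack. Here’s a “simple” example that will match nested <div> tags (it is written with spaces for readability, so it presumably needs RegexOptions.IgnorePatternWhitespace):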

<div> (?><div (?<DEPTH>) | </div (?<-DEPTH>) | .? )* (?(DEPTH)(?!))</div>

Which will match the part in red below:

<div>this is some <div>red</div>text</div></div></div>
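To make the DEPTH bookkeeping a bit more concrete, here is a rough Python sketch that does the same push/pop counting by hand. Python's re module has no balancing groups, so this is a stand-in for the idea and not the .NET pattern itself: the helper name match_nested_div and the TAG pattern are made up for illustration, and it makes no attempt to reproduce the regex engine's greediness or backtracking.

```python
import re

# Hypothetical helper pattern: finds opening and closing <div> tags.
# Group 1 is "/" for a closing tag and "" for an opening tag.
TAG = re.compile(r"<(/?)div\b[^>]*>")

def match_nested_div(text):
    """Return the first balanced <div>...</div> span in text, or None."""
    depth = 0      # plays the role of the DEPTH capture stack
    start = None   # index where the outermost <div> began
    for m in TAG.finditer(text):
        if m.group(1) == "":
            # Opening tag: the analogue of (?<DEPTH>), a push.
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:
            # Closing tag: the analogue of (?<-DEPTH>), a pop.
            depth -= 1
            if depth == 0:
                # Nothing left on the stack, so the span is balanced.
                return text[start:m.end()]
        # A closing tag seen at depth 0 is a stray and is skipped.
    return None

sample = "<div>this is some <div>red</div>text</div></div></div>"
print(match_nested_div(sample))
# prints: <div>this is some <div>red</div>text</div>
```

The counter goes 1, 2, 1, 0 over the first four tags, and the span is returned as soon as it gets back to zero; the two trailing closing tags are never reached.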

This is pretty cool, I’ve got to say. I really can’t do this justice; if you’re interested, I recommend you pick up the book!

This post is licensed under CC BY 4.0 by the author.