JavaScript Regular Expressions: Greedy vs. Lazy
We will understand the concept of greedy vs. lazy regular expressions with the help of
an example.
Consider part of an HTML page containing some words in bold. Here is an example
<p> This is an example page </p>
<b> First Bold </b>
Something here
<b> Second Bold </b>
Something else here
<p> Finish </p>
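Now we want to write a regular expression that matches only <b> First Bold
</b> in the above example. A first attempt is the following regular expression
/<b>.*<\/b>/
(the / inside </b> is escaped as \/ because / also delimits a JavaScript regular expression literal).
We hope it will work. However, with the subject written on a single line (in JavaScript, . does not
match a line break), this regular expression matches the following part of the
subject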
<b> First Bold </b>
Something here
<b> Second Bold </b>
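Why? To see it, we need to understand how the match proceeds. The
regular expression /<b>.*<\/b>/ first looks for <b> in the
subject. Once <b> is found, .* runs all the way to the end of the subject, because * is greedy
and . matches any character except a line break. In the process it eats
up the rest of the subject, right up to the > of the final </p>. Only when its stomach is full does it look
in the regular expression for what to match next. It is </b>, and nothing is left in the subject
to match it against. So what it does is backtracking.
It makes .* give back one character at a time and tries </b> again at each position. First it
tries to match < of </b> in the regular expression with > of </p> in the subject. The
match fails. Now it tries matching < of </b> with p of </p>. This also fails. It keeps
doing this until it matches < of </b> in the regular expression with < of </p> in the
subject. The / matches as well, but b does not match p, so it backtracks again. It continues
giving back characters until it reaches the </b> that follows Second Bold. There </b> matches
in full, so the overall match ends at the last </b> in the subject instead of the first.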
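The backtracking described above can be imitated with a few lines of plain JavaScript. This is only a rough sketch of the idea, not how a real regular expression engine is implemented: the helper name greedyEnd is hypothetical, and the subject is assumed to be a single line with no line breaks.

```javascript
// Hypothetical helper: imitates a greedy .* followed by a literal string.
// It illustrates the idea only; real regex engines work differently inside.
function greedyEnd(subject, start, literal) {
  // Step 1: the greedy .* consumes everything up to the end of the subject.
  let pos = subject.length;
  // Step 2: backtrack one character at a time until the literal fits.
  while (pos >= start) {
    if (subject.startsWith(literal, pos)) {
      return pos + literal.length; // index just past the matched literal
    }
    pos -= 1; // give one character back and try again
  }
  return -1; // the literal never matched
}

const subject =
  "<p> This is an example page </p><b> First Bold </b>Something here " +
  "<b> Second Bold </b> Something else here<p> Finish </p>";

const open = subject.indexOf("<b>");                // where <b> is first found
const end = greedyEnd(subject, open + 3, "</b>");   // where the greedy match stops
const simulated = subject.slice(open, end);

console.log(simulated);
```

Because the search for </b> starts at the end of the subject and moves left, the first </b> it meets is the last one in the subject, which is why the greedy match runs through Second Bold.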
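The following page runs the greedy pattern against a single-line version of the subject:
<html>
<body>
<script type="text/javascript">
<!--
/*
********************************************************
Javascript Regular Expression Example ch4 Ex 01
Understanding Greedy Vs. Lazy Match
********************************************************
*/
// Greedy pattern: .* grabs as much as it can before the closing tag is matched.
var pattern1=/<b>(.*)<\/b>/;
// A string literal cannot span lines, so the subject is joined with +.
var string1 = "<p> This is an example page </p><b> First Bold </b>Something here " +
"<b> Second Bold </b> Something else here<p> Finish </p>" ;
// match() returns an array: [0] is the whole match, [1] is the captured group.
var string2 = string1.match(pattern1);
document.write("string2[0] is : ", string2[0] , "<br />");
//-->
</script>
</body>
</html>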
If we run this code we get the following output
string2[0] is : First Bold Something here Second Bold
Notice that, since we are writing the output into an HTML page, we do not see <b> and </b>.
The browser renders the enclosed words in bold instead. The text actually matched is
<b> First Bold </b>Something here <b> Second Bold </b>. In other words, the
regular expression /<b>(.*)<\/b>/ matches everything from the first <b> all
the way to the last </b>.
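To see the tags themselves, the match can be printed to the console instead of being written into the page. The sketch below assumes the same pattern and single-line subject as the example; the helper escapeHtml is a hypothetical name, shown only as one possible way to display the tags literally inside a page.

```javascript
const pattern1 = /<b>(.*)<\/b>/;
const string1 =
  "<p> This is an example page </p><b> First Bold </b>Something here " +
  "<b> Second Bold </b> Something else here<p> Finish </p>";

const result = string1.match(pattern1);

console.log(result[0]); // the whole match, tags included
console.log(result[1]); // only the text captured by (.*)

// Hypothetical helper: escape markup so a browser shows the tags as text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const visible = escapeHtml(result[0]);
console.log(visible);
```

Here result[0] runs from the first <b> to the last </b>, while result[1] holds what the parentheses captured, that is, everything between those two tags.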
If we change the statement
var pattern1=/<b>(.*)<\/b>/;
to
var pattern1=/<b>(.*?)<\/b>/;
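we get the following output
string2[0] is : First Bold
The ? after * makes the quantifier lazy. Instead of eating the whole subject and backtracking,
.*? starts by matching nothing and takes one more character only when </b> cannot yet match.
It therefore stops at the first </b> it reaches.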
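The two patterns can also be compared side by side outside the browser. This is a minimal sketch assuming the same single-line subject as above; the variable names greedy, lazy and allBold are made up for illustration, and the last line adds the g flag, which the example page does not use, to collect every bold section.

```javascript
const string1 =
  "<p> This is an example page </p><b> First Bold </b>Something here " +
  "<b> Second Bold </b> Something else here<p> Finish </p>";

// Greedy: .* runs to the end and backtracks to the LAST closing tag.
const greedy = string1.match(/<b>(.*)<\/b>/)[0];

// Lazy: .*? grows one character at a time and stops at the FIRST closing tag.
const lazy = string1.match(/<b>(.*?)<\/b>/)[0];

// With the g flag, match() returns every non-overlapping lazy match.
const allBold = string1.match(/<b>.*?<\/b>/g);

console.log(greedy);
console.log(lazy);
console.log(allBold);
```

With the lazy quantifier each match ends at the nearest </b>, so the global search finds the two bold sections separately.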