<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jesal gadhia &#187; regex</title>
	<atom:link href="http://jesal.us/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://jesal.us</link>
	<description></description>
	<lastBuildDate>Fri, 19 Mar 2010 05:32:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0</title>
		<link>http://jesal.us/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/</link>
		<comments>http://jesal.us/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/#comments</comments>
		<pubDate>Wed, 06 May 2009 22:13:27 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/?p=132</guid>
		<description><![CDATA[Lately I&#8217;ve been working on my pet project called Twime. So as part of that project, I wanted to add the ability to parse URLs, usernames and hashtags from the user&#8217;s Twitter timeline. Here&#8217;s how I went about doing that:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Text.RegularExpressions;

public static class HTMLParser
{
    public static string Link(this [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been working on a pet project called <a href="http://jesal.us/twime/" target="_blank">Twime</a>. As part of that project, I wanted to turn the URLs, usernames and hashtags in a user&#8217;s Twitter timeline into links. Here&#8217;s how I went about it:</p>
<pre name="code" class="c-sharp:nogutter;">
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Text.RegularExpressions;

public static class HTMLParser
{
    public static string Link(this string s, string url)
    {
        return string.Format("&lt;a href=\"{0}\" target=\"_blank\"&gt;{1}&lt;/a&gt;", url, s);
    }
    public static string ParseURL(this string s)
    {
        return Regex.Replace(s, @"(http(s)?://)?([\w-]+\.)+[\w-]+(/\S\w[\w- ;,./?%&#038;=]\S*)?", new MatchEvaluator(HTMLParser.URL));
    }
    public static string ParseUsername(this string s)
    {
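        // "+" instead of "*": a lone "@" with no name after it is not a username
        return Regex.Replace(s, "(@)((?:[A-Za-z0-9-_]+))", new MatchEvaluator(HTMLParser.Username));
    }
    public static string ParseHashtag(this string s)
    {
        // "+" instead of "*": a lone "#" with no tag after it is not a hashtag
        return Regex.Replace(s, "(#)((?:[A-Za-z0-9-_]+))", new MatchEvaluator(HTMLParser.Hashtag));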
    }
    private static string Hashtag(Match m)
    {
        string x = m.ToString();
        string tag = x.Replace("#", "%23");
        return x.Link("http://search.twitter.com/search?q=" + tag);
    }
    private static string Username(Match m)
    {
        string x = m.ToString();
        string username = x.Replace("@", "");
        return x.Link("http://twitter.com/" + username);
    }
    private static string URL(Match m)
    {
        string x = m.ToString();
        return x.Link(x);
    }
}
</pre>
<p>As you can see, I&#8217;m using the new <a href="http://msdn.microsoft.com/en-us/library/bb383977.aspx" target="_blank">Extension Methods</a> feature in C# 3.0.</p>
<p>Now I can simply chain the extension methods like this:</p>
<pre name="code" class="c-sharp:nogutter;">
string tweet = "Just blogged about how to parse HTML from the @twitter timeline - http://jesal.us/blog/?p=132 #programming";
Response.Write(tweet.ParseURL().ParseUsername().ParseHashtag());
</pre>
<p>and the result should look something like this:</p>
<p>Just blogged about how to parse HTML from the <a href="http://twitter.com/twitter" target="_blank">@twitter</a> timeline - <a href="http://jesal.us/blog/?p=132" target="_blank">http://jesal.us/blog/?p=132</a> <a href="http://search.twitter.com/search?q=%23programming" target="_blank">#programming</a></p>
<p>Just be sure to call the ParseURL method before ParseUsername and ParseHashtag. The other two methods insert links of their own for the usernames and hashtags, and you don&#8217;t want ParseURL to mistake those generated links for URLs that were in the original text.</p>
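<p>The ordering caveat can also be sidestepped by matching all three token types in a single pass, so that text which has already been turned into a link is never scanned again. What follows is only a minimal sketch of that idea, not the code Twime uses: the <code>TweetLinker</code> class, the <code>Linkify</code> and <code>Anchor</code> names, and the deliberately simplified URL pattern (it requires an explicit http:// or https:// prefix) are all assumptions made for illustration.</p>

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original HTMLParser class.
public static class TweetLinker
{
    // One alternation: a URL, an @username or a #hashtag.
    // The URL pattern is a simplification and only accepts an explicit scheme.
    private static readonly Regex Token = new Regex(
        @"(?<url>https?://\S+)|@(?<user>[A-Za-z0-9_]+)|#(?<tag>[A-Za-z0-9_]+)");

    public static string Linkify(string text)
    {
        // Each match is replaced exactly once; the replacement text is not re-scanned,
        // so a "#" or "@" inside a URL cannot be linked a second time.
        return Token.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
                return Anchor(m.Value, m.Value);
            if (m.Groups["user"].Success)
                return Anchor("http://twitter.com/" + m.Groups["user"].Value, m.Value);
            return Anchor("http://search.twitter.com/search?q=%23" + m.Groups["tag"].Value, m.Value);
        });
    }

    private static string Anchor(string href, string text)
    {
        return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", href, text);
    }
}
```

<p>With this approach a call such as <code>TweetLinker.Linkify(tweet)</code> replaces the three chained calls, at the cost of a stricter URL pattern than the one used in ParseURL.</p>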
<p>This was inspired by <a href="http://www.simonwhatley.co.uk/" target="_blank">Simon Whatley</a>&#8217;s <a href="http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-javascript" target="_blank">post</a> about doing something similar with prototypes in JavaScript.</p>
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How to easily parse HTML without RegEx</title>
		<link>http://jesal.us/2008/05/how-to-easily-parse-html-without-regex/</link>
		<comments>http://jesal.us/2008/05/how-to-easily-parse-html-without-regex/#comments</comments>
		<pubDate>Tue, 06 May 2008 16:50:04 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/?p=91</guid>
		<description><![CDATA[I recently discovered an absolutely amazing HTML parsing library for .NET called HtmlAgilityPack. It completely takes away the pain of parsing complicated HTML with regular expressions.
Here&#8217;s a very simple example of what you can do with it &#8211; I&#8217;m just extracting the inner HTML of every element in an HTML file whose CSS class [...]]]></description>
			<content:encoded><![CDATA[<p>I recently discovered an absolutely amazing HTML parsing library for .NET called <a href="http://www.codeplex.com/htmlagilitypack" target="_blank">HtmlAgilityPack</a>. It completely takes away the pain of parsing complicated HTML with regular expressions.</p>
<p>Here&#8217;s a very simple example of what you can do with it &#8211; I&#8217;m just extracting the inner HTML of every element in an HTML file whose CSS class is &#8220;scrape&#8221;:<br />
<br />
<!-- code formatted by http://manoli.net/csharpformat/ --></p>
<pre class="csharpcode">
<span class="kwrd">using</span> System;
<span class="kwrd">using</span> HtmlAgilityPack;

<span class="kwrd">public</span> <span class="kwrd">partial</span> <span class="kwrd">class</span> _Default : System.Web.UI.Page
{
    <span class="kwrd">protected</span> <span class="kwrd">void</span> Page_Load(<span class="kwrd">object</span> sender, EventArgs e)
    {
        HtmlDocument doc = <span class="kwrd">new</span> HtmlDocument();
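        <span class="rem">// filePath is assumed to be defined elsewhere: the app-relative path of the HTML file</span>
        doc.Load(Server.MapPath(filePath));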
        Parse(doc.DocumentNode);
    }
    <span class="kwrd">private</span> <span class="kwrd">void</span> Parse(HtmlNode n)
    {
        <span class="kwrd">foreach</span> (HtmlAttribute atr <span class="kwrd">in</span> n.Attributes)
        {
            <span class="kwrd">if</span> (atr.Name == <span class="str">"class"</span> &amp;&amp; atr.Value == <span class="str">"scrape"</span>)
            {
                Response.Write(n.InnerHtml);
            }
        }

        <span class="kwrd">if</span> (n.HasChildNodes)
        {
            <span class="kwrd">foreach</span> (HtmlNode cn <span class="kwrd">in</span> n.ChildNodes)
            {
                Parse(cn);
            }
        }
    }
}
</pre>
<p>
That&#8217;s just a very small part of what it can do. I&#8217;ll expand on this and post a few more examples in the future showing other interesting things you can do with it. </p>
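<p>The same lookup can probably be written more compactly with the XPath support that HtmlAgilityPack exposes through <code>SelectNodes</code>. The sketch below is only an illustration under a few assumptions: the <code>ScrapeSketch</code> class and <code>InnerHtmlByClass</code> method are made-up names, the markup is passed in as a string rather than loaded from a file, and the class name is assumed to contain no quotes or spaces. Unlike the recursive version, it also matches elements that carry several classes (for example <code>class="note scrape"</code>).</p>

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; requires the HtmlAgilityPack assembly.
public static class ScrapeSketch
{
    // cssClass is assumed to be a plain class name without quotes or spaces,
    // because it is pasted straight into the XPath expression.
    public static List<string> InnerHtmlByClass(string html, string cssClass)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class attribute with spaces so the name is matched as a whole word,
        // which also catches elements that carry several classes.
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + cssClass + " ')]";

        List<string> results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                results.Add(node.InnerHtml);
            }
        }

        return results;
    }
}
```

<p>Letting the library walk the tree keeps the page code down to a single call, for example <code>ScrapeSketch.InnerHtmlByClass(html, "scrape")</code>.</p>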
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2008/05/how-to-easily-parse-html-without-regex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to extract URLs (href property) from HTML</title>
		<link>http://jesal.us/2008/01/how-to-extract-urls-href-property-from-html/</link>
		<comments>http://jesal.us/2008/01/how-to-extract-urls-href-property-from-html/#comments</comments>
		<pubDate>Sat, 05 Jan 2008 07:13:09 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/2008/01/05/how-to-extract-urls-href-property-from-html/</guid>
		<description><![CDATA[

protected ArrayList getURL(string txtIn)
{
    ArrayList outURL = new ArrayList();
    Regex r = new Regex("href\\s*=\\s*(?:(?:\\\"(?&#60;url&#62;[^\\\"]*)\\\")&#124;(?&#60;url&#62;[^\\s]* ))");
    MatchCollection mc1 = r.Matches(txtIn);

    foreach (Match m1 in mc1)
    {
        foreach (Group g in m1.Groups)
    [...]]]></description>
			<content:encoded><![CDATA[<p><!-- code formatted by http://manoli.net/csharpformat/ --></p>
<pre class="csharpcode" style="overflow-y:hidden">
<span class="kwrd">protected</span> ArrayList getURL(<span class="kwrd">string</span> txtIn)
{
    ArrayList outURL = <span class="kwrd">new</span> ArrayList();
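    <span class="rem">// quoted values are tried first; unquoted values run up to the next whitespace or "&gt;"</span>
    Regex r = <span class="kwrd">new</span> Regex(<span class="str">"href\\s*=\\s*(?:(?:\\\"(?&lt;url&gt;[^\\\"]*)\\\")|(?&lt;url&gt;[^\\s&gt;]+))"</span>);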
    MatchCollection mc1 = r.Matches(txtIn);

    <span class="kwrd">foreach</span> (Match m1 <span class="kwrd">in</span> mc1)
    {
        <span class="rem">// keep only the named "url" group; Groups[0] is the whole href=... match</span>
        outURL.Add(m1.Groups[<span class="str">"url"</span>].Value);
    }

    <span class="kwrd">return</span> outURL;
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2008/01/how-to-extract-urls-href-property-from-html/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to remove all special characters from a string</title>
		<link>http://jesal.us/2007/12/how-to-remove-all-special-characters-from-a-string/</link>
		<comments>http://jesal.us/2007/12/how-to-remove-all-special-characters-from-a-string/#comments</comments>
		<pubDate>Fri, 28 Dec 2007 03:55:55 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/2007/12/27/how-to-remove-all-special-characters-from-a-string/</guid>
		<description><![CDATA[

protected string StripSpecChars(string txtIn)
{
    string txtOut = Regex.Replace(txtIn, @"[^\w\.@-]", "").Trim();
    return txtOut;
}
]]></description>
			<content:encoded><![CDATA[<p><!-- code formatted by http://manoli.net/csharpformat/ --></p>
<pre class="csharpcode">
<span class="kwrd">protected</span> <span class="kwrd">string</span> StripSpecChars(<span class="kwrd">string</span> txtIn)
{
    <span class="kwrd">string</span> txtOut = Regex.Replace(txtIn, <span class="str">@"[^\w\.@-]"</span>, <span class="str">""</span>).Trim();
    <span class="kwrd">return</span> txtOut;
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2007/12/how-to-remove-all-special-characters-from-a-string/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How to remove commonly occurring English words from a string</title>
		<link>http://jesal.us/2007/12/how-to-remove-commonly-occuring-english-words-form-a-string/</link>
		<comments>http://jesal.us/2007/12/how-to-remove-commonly-occuring-english-words-form-a-string/#comments</comments>
		<pubDate>Thu, 20 Dec 2007 23:59:08 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/2007/12/20/how-to-remove-commonly-occuring-english-words-form-a-string/</guid>
		<description><![CDATA[I&#8217;m using this function to filter common words out of a search query.


protected string removeCommonWords(string sourceStr)
{
    string[] seperator = { " " };
    string[] ignoreWords = { "a", "all", "am", "an", "and", "any", "are", "as",
        "at", "be", "but", "can", "did", "do", "does", [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m using this function to filter common words out of a search query.</p>
<p><!-- code formatted by http://manoli.net/csharpformat/ --></p>
<pre class="csharpcode">
<span class="kwrd">protected</span> <span class="kwrd">string</span> removeCommonWords(<span class="kwrd">string</span> sourceStr)
{
    <span class="kwrd">string</span>[] seperator = { <span class="str">" "</span> };
    <span class="kwrd">string</span>[] ignoreWords = { <span class="str">"a"</span>, <span class="str">"all"</span>, <span class="str">"am"</span>, <span class="str">"an"</span>, <span class="str">"and"</span>, <span class="str">"any"</span>, <span class="str">"are"</span>, <span class="str">"as"</span>,
        <span class="str">"at"</span>, <span class="str">"be"</span>, <span class="str">"but"</span>, <span class="str">"can"</span>, <span class="str">"did"</span>, <span class="str">"do"</span>, <span class="str">"does"</span>, <span class="str">"for"</span>, <span class="str">"from"</span>, <span class="str">"had"</span>,
        <span class="str">"has"</span>, <span class="str">"have"</span>, <span class="str">"here"</span>, <span class="str">"how"</span>, <span class="str">"i"</span>, <span class="str">"if"</span>, <span class="str">"in"</span>, <span class="str">"is"</span>, <span class="str">"it"</span>, <span class="str">"no"</span>, <span class="str">"not"</span>,
        <span class="str">"of"</span>, <span class="str">"on"</span>, <span class="str">"or"</span>, <span class="str">"so"</span>, <span class="str">"that"</span>, <span class="str">"the"</span>, <span class="str">"then"</span>, <span class="str">"there"</span>, <span class="str">"this"</span>, <span class="str">"to"</span>,
        <span class="str">"too"</span>, <span class="str">"up"</span>, <span class="str">"use"</span>, <span class="str">"what"</span>, <span class="str">"when"</span>, <span class="str">"where"</span>, <span class="str">"who"</span>, <span class="str">"why"</span>, <span class="str">"you"</span> };
    <span class="kwrd">string</span>[] outputStr = { };

    outputStr = sourceStr.ToLower().Split(seperator, StringSplitOptions.RemoveEmptyEntries);

    <span class="kwrd">foreach</span> (<span class="kwrd">string</span> unwantedWord <span class="kwrd">in</span> ignoreWords)
    {
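        <span class="kwrd">int</span> index;

        <span class="rem">// loop so that every occurrence of the word is removed, not just the first</span>
        <span class="kwrd">while</span> ((index = Array.IndexOf(outputStr, unwantedWord)) != -1)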
        {
            <span class="kwrd">string</span>[] copyStrArr = <span class="kwrd">new</span> <span class="kwrd">string</span>[outputStr.Length - 1];

            <span class="rem">// copy the elements before the found index</span>
            <span class="kwrd">for</span> (<span class="kwrd">int</span> i = 0; i &lt; index; i++)
            {
                copyStrArr[i] = outputStr[i];
            }

            <span class="rem">// copy the elements after the found index</span>
            <span class="kwrd">for</span> (<span class="kwrd">int</span> i = index; i &lt; copyStrArr.Length; i++)
            {
                copyStrArr[i] = outputStr[i + 1];
            }

            outputStr = copyStrArr;
        }
    }

    sourceStr = <span class="kwrd">string</span>.Join(<span class="str">" "</span>, outputStr);

    <span class="kwrd">return</span> sourceStr;
}</pre>
<p>Let me know if you guys have a better solution. <img src='http://jesal.us/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
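<p>One possible shorter variant, offered as a sketch rather than a definitive answer, keeps the word list in a <code>HashSet</code> and filters the query with LINQ, so every occurrence of a common word is dropped in a single pass and no array copying is needed. The <code>CommonWordFilter</code> class and <code>RemoveCommonWords</code> method names are assumptions made for this example; the word list is the one from the function above.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; same word list as the original function.
public static class CommonWordFilter
{
    // A HashSet gives constant-time lookups instead of a linear array search.
    private static readonly HashSet<string> IgnoreWords = new HashSet<string>
    {
        "a", "all", "am", "an", "and", "any", "are", "as",
        "at", "be", "but", "can", "did", "do", "does", "for", "from", "had",
        "has", "have", "here", "how", "i", "if", "in", "is", "it", "no", "not",
        "of", "on", "or", "so", "that", "the", "then", "there", "this", "to",
        "too", "up", "use", "what", "when", "where", "who", "why", "you"
    };

    public static string RemoveCommonWords(string source)
    {
        // Lower-case, split on spaces, and keep only the words that are not in the set.
        string[] kept = source.ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !IgnoreWords.Contains(word))
            .ToArray();

        return string.Join(" ", kept);
    }
}
```

<p>For example, <code>CommonWordFilter.RemoveCommonWords("How to remove the common words from the query")</code> returns <code>remove common words query</code>.</p>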
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2007/12/how-to-remove-commonly-occuring-english-words-form-a-string/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Strip out HTML tags using RegEx</title>
		<link>http://jesal.us/2007/11/strip-out-html-tags-via-regex/</link>
		<comments>http://jesal.us/2007/11/strip-out-html-tags-via-regex/#comments</comments>
		<pubDate>Wed, 14 Nov 2007 04:16:51 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/2007/11/13/strip-out-html-tags-via-regex/</guid>
		<description><![CDATA[This code will strip out all the HTML tags and truncate the text to a given number of characters, cutting at the last space before the limit.


public static string TruncateText(string txtIn, int newLength)
{
    string txtOut = txtIn;
    string pattern = @"&#60;(.&#124;\n)*?&#62;";

    //Strip out HTML tags
    if (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
      [...]]]></description>
			<content:encoded><![CDATA[<p>This code will strip out all the HTML tags and truncate the text to a given number of characters, cutting at the last space before the limit.</p>
<p><!-- code formatted by http://manoli.net/csharpformat/ --></p>
<pre class="csharpcode">
<span class="kwrd">public</span> <span class="kwrd">static</span> <span class="kwrd">string</span> TruncateText(<span class="kwrd">string</span> txtIn, <span class="kwrd">int</span> newLength)
{
    <span class="kwrd">string</span> txtOut = txtIn;
    <span class="kwrd">string</span> pattern = <span class="str">@"&lt;(.|\n)*?&gt;"</span>;

    <span class="rem">//Strip out HTML tags</span>
    <span class="kwrd">if</span> (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
        txtOut = Regex.Replace(txtIn, pattern, <span class="kwrd">string</span>.Empty, RegexOptions.Multiline).Trim();

    <span class="kwrd">if</span> (txtOut.Length &gt; newLength)
    {
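        <span class="kwrd">int</span> endPos = txtOut.LastIndexOf(<span class="str">" "</span>, newLength);

        <span class="rem">// no space before the limit: fall back to a hard cut instead of Substring(0, -1)</span>
        <span class="kwrd">if</span> (endPos &lt; 0)
            endPos = newLength;

        txtOut = txtOut.Substring(0, endPos) + <span class="str">"..."</span>;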
    }

    <span class="kwrd">return</span> txtOut;
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2007/11/strip-out-html-tags-via-regex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RegEx Tutorials</title>
		<link>http://jesal.us/2007/05/regex-tutorials/</link>
		<comments>http://jesal.us/2007/05/regex-tutorials/#comments</comments>
		<pubDate>Mon, 07 May 2007 06:27:09 +0000</pubDate>
		<dc:creator>J</dc:creator>
				<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://jesal.us/blog/2007/05/07/regex-tutorials/</guid>
		<description><![CDATA[One of the most useful things to learn in programming is regular expressions. Knowing how to use regular expressions could save you a TON of time and make your applications more efficient.
I recently compiled a small list of sites/blogs which have excellent reference material on regular expressions. Here it is:

The absolute bare minimum every programmer [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most useful things to learn in programming is regular expressions. Knowing how to use them can save you a TON of time and make your applications more efficient.</p>
<p>I recently compiled a small list of sites/blogs which have excellent reference material on regular expressions. Here it is:</p>
<ul>
<li><a href="http://immike.net/blog/2007/04/06/the-absolute-bare-minimum-every-programmer-should-know-about-regular-expressions/">The absolute bare minimum every programmer should know about regular expressions</a></li>
<li><a href="http://en.kerouac3001.com/regex-tutorial-8.htm">Regex Tutorial</a></li>
<li><a href="http://www.ilovejackdaniels.com/cheat-sheets/regular-expressions-cheat-sheet/">Regular Expressions Cheat Sheet</a></li>
<li><a href="http://www.regular-expressions.info/">regular-expressions.info</a></li>
<li><a href="http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx">Learn Regular Expression (Regex) syntax with C# and .NET</a></li>
</ul>
<p>Hope that helps!!</p>
]]></content:encoded>
			<wfw:commentRss>http://jesal.us/2007/05/regex-tutorials/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
