Lately I’ve been working on a pet project called Twime. As part of that project, I wanted to parse the URLs, usernames and hashtags in the user’s Twitter timeline and turn them into links. Here’s how I went about it:
using System;
using System.Text.RegularExpressions;

public static class HTMLParser
{
    // Wraps the string in an anchor tag that points at the given URL.
    public static string Link(this string s, string url)
    {
        return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", url, s);
    }

    // Turns plain-text URLs into links. The scheme is optional, so "jesal.us" matches too.
    public static string ParseURL(this string s)
    {
        return Regex.Replace(s, @"(http(s)?://)?([\w-]+\.)+[\w-]+(/\S\w[\w- ;,./?%&=]\S*)?", new MatchEvaluator(HTMLParser.URL));
    }

    // Links @usernames to their Twitter profiles.
    // "+" instead of "*" so that a lone "@" is not turned into an empty link.
    public static string ParseUsername(this string s)
    {
        return Regex.Replace(s, "(@)((?:[A-Za-z0-9_-]+))", new MatchEvaluator(HTMLParser.Username));
    }

    // Links #hashtags to a Twitter search; a lone "#" is left alone for the same reason.
    public static string ParseHashtag(this string s)
    {
        return Regex.Replace(s, "(#)((?:[A-Za-z0-9_-]+))", new MatchEvaluator(HTMLParser.Hashtag));
    }

    private static string Hashtag(Match m)
    {
        string x = m.Value;
        // "#" has to be escaped as %23 inside a query string.
        string tag = x.Replace("#", "%23");
        return x.Link("http://search.twitter.com/search?q=" + tag);
    }

    private static string Username(Match m)
    {
        string x = m.Value;
        string username = x.Replace("@", "");
        return x.Link("http://twitter.com/" + username);
    }

    private static string URL(Match m)
    {
        string x = m.Value;
        // Group 1 is the optional scheme. Without it the href would be relative, so add one.
        string href = m.Groups[1].Success ? x : "http://" + x;
        return x.Link(href);
    }
}
As you can see, I’m using extension methods, a new feature in C# 3.0.
Now I can simply chain the extension methods like this:
string tweet = "Just blogged about how to parse HTML from the @twitter timeline - http://jesal.us/blog/?p=132 #programming";
Response.Write(tweet.ParseURL().ParseUsername().ParseHashtag());
and the result should look something like this:
Just blogged about how to parse HTML from the @twitter timeline - http://jesal.us/blog/?p=132 #programming
Just be sure to call ParseURL before ParseUsername and ParseHashtag. The other two methods insert links of their own for the usernames and hashtags, and you don’t want ParseURL to mistake those generated links for URLs that were present in the original text.
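If the ordering requirement is a worry, one alternative is to do everything in a single pass, so that generated markup is never scanned again. The following is only a rough sketch under some assumptions: the class name SinglePassLinker and the method Linkify are made up for illustration, and the URL pattern is deliberately simplified to require an explicit http:// or https:// scheme, so it is not equivalent to the pattern in ParseURL.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the HTMLParser class above.
public static class SinglePassLinker
{
    // One alternation with named groups. The URL branch is tried first, so a "#" or "@"
    // inside a URL is consumed as part of the URL rather than linked separately.
    // Assumption: URLs always carry an explicit scheme (simplified pattern).
    private static readonly Regex Token = new Regex(
        @"(?<url>https?://\S+)|@(?<user>[A-Za-z0-9_-]+)|#(?<tag>[A-Za-z0-9_-]+)");

    public static string Linkify(string s)
    {
        // Regex.Replace scans the input once; replacement text is never re-matched.
        return Token.Replace(s, m =>
        {
            string href;
            if (m.Groups["url"].Success)
                href = m.Value;
            else if (m.Groups["user"].Success)
                href = "http://twitter.com/" + m.Groups["user"].Value;
            else
                href = "http://search.twitter.com/search?q=%23" + m.Groups["tag"].Value;

            return string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", href, m.Value);
        });
    }
}
```

Calling SinglePassLinker.Linkify(tweet) would then replace the three chained calls. The trade-off is that the three patterns can no longer be used independently of each other.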
This was inspired by Simon Whatley’s post about doing something similar using prototyping with JavaScript.
Yes, that’s a good approach.
Change the Link method to return this and it works like a champ.
return string.Format(“{1}“, url, s);
Great post!
Ah, your blog is changing the return String.Format method. It just did it to my comment. Probably because it’s html… and it’s not encoding the comments.
It should be a href {0} for the url then {1} for the link name.
Thanks for pointing that out. I had to HTML encode those greater-than and less-than signs. It’s a pain working with code snippets in WordPress.