Posts Tagged ‘regex’

May 6, 2009 (35 comments)

How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0

Posted in Uncategorized

Lately I’ve been working on my pet project, Twime. As part of that project, I wanted to add the ability to parse URLs, usernames and hashtags from the user’s Twitter timeline. Here’s how I went about it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Text.RegularExpressions;

public static class HTMLParser
{
    [...]
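
The rest of the HTMLParser class is cut off in this excerpt. As a rough illustration only, here is a minimal sketch of one way to turn URLs, @usernames and #hashtags into links in a single Regex.Replace pass. The class name TwitterTextSketch, the patterns, and the twitter.com link formats are assumptions, not the code from the post. Twitter’s real rules for usernames and hashtags are more involved.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch: the patterns and link formats are assumptions, not the post's HTMLParser code.
// The input is assumed to be plain tweet text. It is not HTML-encoded here.
public static class TwitterTextSketch
{
    // One alternation, so each piece of text is linked at most once:
    //   url  - http(s) link that does not end in trailing punctuation
    //   user - @name of 1-15 word characters, not preceded by a word char or '@' (skips e-mail addresses)
    //   tag  - #word, not preceded by a word char, '#' or '&' (skips "C#" and "&#39;")
    private static readonly Regex TokenPattern = new Regex(
        @"(?<url>\bhttps?://[^\s<>""]*[^\s<>"".,;:!?)])" +
        @"|(?<![\w@])@(?<user>\w{1,15})(?!\w)" +
        @"|(?<![\w#&])#(?<tag>\w+)",
        RegexOptions.IgnoreCase);

    public static string Linkify(string text)
    {
        return TokenPattern.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
                return "<a href=\"" + m.Value + "\">" + m.Value + "</a>";
            if (m.Groups["user"].Success)
                return "<a href=\"https://twitter.com/" + m.Groups["user"].Value + "\">" + m.Value + "</a>";
            return "<a href=\"https://twitter.com/search?q=%23" + m.Groups["tag"].Value + "\">" + m.Value + "</a>";
        });
    }
}
```

Because all three token types are matched in one alternation, a # inside a URL (such as a page anchor) is consumed as part of the URL. It is not linked separately as a hashtag.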

May 6, 2008 (0 comments)

How to easily parse HTML without RegEx

Posted in Uncategorized

I recently discovered an absolutely amazing HTML parsing library for .NET called HtmlAgilityPack. It takes away the pain of parsing complicated HTML with regular expressions. Here’s a very simple example of what you can do with it: I’m just extracting the inner HTML of every element in an HTML file that has a CSS [...]
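
The example itself is cut off in this excerpt. Here is a minimal sketch of the idea, assuming the HtmlAgilityPack NuGet package is referenced. It loads an HTML string, selects every element whose class attribute contains a given class name, and returns each element’s InnerHtml. The names InnerHtmlSketch and GetInnerHtmlByClass are hypothetical. The class name is assumed not to contain quote characters, because it is pasted straight into the XPath expression.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical sketch, not the post's original example. Requires the HtmlAgilityPack package.
public static class InnerHtmlSketch
{
    // Returns the InnerHtml of every element whose class list contains cssClass.
    public static List<string> GetInnerHtmlByClass(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath 1.0 idiom for matching one class name inside a space-separated class list.
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + cssClass + " ')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var results = new List<string>();
        if (nodes == null) return results; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in nodes)
            results.Add(node.InnerHtml);
        return results;
    }
}
```

Padding the class list with spaces keeps class="postscript" from matching a search for "post".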

January 5, 2008 (3 comments)

How to extract URLs (href attribute) from HTML

Posted in Uncategorized

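using System.Collections.Generic;
using System.Text.RegularExpressions;

protected List<string> getURL(string txtIn)
{
    List<string> outURL = new List<string>();
    // Matches href="...", href='...' or an unquoted href=...; each form captures into the named group "url".
    Regex r = new Regex(@"href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))", RegexOptions.IgnoreCase);
    foreach (Match m in r.Matches(txtIn))
    {
        // Add only the captured URL. Looping over m.Groups would also add group 0 (the whole href=... match).
        outURL.Add(m.Groups["url"].Value);
    }
    return outURL;
}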
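
The HtmlAgilityPack post above recommends a real parser over regular expressions for HTML. For comparison, here is a minimal sketch of the same task done that way, assuming the HtmlAgilityPack NuGet package is referenced. HrefSketch and GetHrefs are hypothetical names. Unlike the regex, this only reads href attributes on <a> elements, so href values on tags such as <link> are ignored.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical sketch. Requires the HtmlAgilityPack package.
public static class HrefSketch
{
    // Collects the href value of every <a> element that has one.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return urls; // SelectNodes returns null when nothing matches

        foreach (HtmlNode link in links)
            urls.Add(link.GetAttributeValue("href", string.Empty));
        return urls;
    }
}
```

The parser handles double-quoted and single-quoted attributes itself, so the quoting cases never need their own patterns.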

December 27, 2007 (1 comment)

How to remove all special characters from a string

Posted in Uncategorized

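using System.Text.RegularExpressions;

// Removes every character that is not a word character (letter, digit, underscore), '.', '@' or '-'.
protected string StripSpecChars(string txtIn)
{
    string txtOut = Regex.Replace(txtIn, @"[^\w\.@-]", "").Trim();
    return txtOut;
}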
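
To check what the pattern keeps, here is a small self-contained sketch. In .NET, \w is Unicode-aware, so accented letters such as é are kept. If you only want ASCII, RegexOptions.ECMAScript makes \w mean [a-zA-Z0-9_]. SpecialCharSketch, Strip and StripAscii are hypothetical names. The Trim() call is dropped here because the pattern already removes all whitespace.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of the same whitelist as StripSpecChars.
public static class SpecialCharSketch
{
    // Keeps word characters, '.', '@' and '-'. .NET's \w also matches letters such as 'é'.
    public static string Strip(string txtIn)
    {
        return Regex.Replace(txtIn, @"[^\w\.@-]", "");
    }

    // ASCII-only variant: under RegexOptions.ECMAScript, \w means [a-zA-Z0-9_].
    public static string StripAscii(string txtIn)
    {
        return Regex.Replace(txtIn, @"[^\w\.@-]", "", RegexOptions.ECMAScript);
    }
}
```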

December 20, 2007 (1 comment)

How to remove commonly occurring English words from a string

Posted in Uncategorized

I’m using this function to filter common words out of a search query.

protected string removeCommonWords(string sourceStr)
{
    string[] seperator = { " " };
    string[] ignoreWords = { "a", "all", "am", "an", "and", "any", "are", "as", "at", "be", "but", "can", "did", "do", "does", "for", "from", "had", "has", "have", "here", "how", "i", "if", "in", "is", [...]
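
The rest of the function is cut off in this excerpt. As a hypothetical sketch, not the post’s implementation, the same idea can be written with a case-insensitive HashSet and LINQ. The word list below is only the part visible in the excerpt. StopWordSketch and RemoveCommonWords are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the original removeCommonWords implementation.
public static class StopWordSketch
{
    // Only the words visible in the excerpt; the post's full list is longer.
    private static readonly HashSet<string> IgnoreWords = new HashSet<string>(
        new[] { "a", "all", "am", "an", "and", "any", "are", "as", "at", "be",
                "but", "can", "did", "do", "does", "for", "from", "had", "has",
                "have", "here", "how", "i", "if", "in", "is" },
        StringComparer.OrdinalIgnoreCase); // "How" and "how" are treated alike

    public static string RemoveCommonWords(string sourceStr)
    {
        // Split on spaces, dropping the empty entries that repeated spaces would produce.
        string[] words = sourceStr.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        // Keep the words that are not in the ignore list and rejoin them with single spaces.
        return string.Join(" ", words.Where(w => !IgnoreWords.Contains(w)));
    }
}
```

A HashSet lookup is constant time on average, so the filter does not slow down as the word list grows.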

November 13, 2007 (0 comments)

Strip out HTML tags using RegEx

Posted in Uncategorized

This code strips out all the HTML tags. Then, if the remaining text is longer than newLength characters, it truncates the text (for example, to fit about 4 lines).

using System.Text.RegularExpressions;

public static string TruncateText(string txtIn, int newLength)
{
    string txtOut = txtIn;
    string pattern = @"<(.|\n)*?>";

    // Strip out HTML tags
    if (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
        txtOut = Regex.Replace(txtIn, pattern, string.Empty, RegexOptions.Multiline).Trim();

    if (txtOut.Length > newLength)
    {
        int endPos = txtOut.LastIndexOf(" [...]
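
The end of the function is cut off in this excerpt. Below is a hypothetical sketch of one way to finish the job; it is an assumption, not the original code. It strips tags with <[^>]*>. That pattern matches the same spans as <(.|\n)*?>, since both run from < to the first >. It then cuts at the last space within newLength characters and appends "...". TruncateSketch and StripAndTruncate are illustrative names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch, not the original TruncateText.
public static class TruncateSketch
{
    // Matches anything from '<' to the next '>', including tags that span lines.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    public static string StripAndTruncate(string txtIn, int newLength)
    {
        string txtOut = TagPattern.Replace(txtIn, string.Empty).Trim();
        if (txtOut.Length <= newLength)
            return txtOut;

        // Assumption: cut at the last space at or before newLength so words are not split.
        int endPos = txtOut.LastIndexOf(' ', newLength);
        if (endPos <= 0)
            endPos = newLength; // no usable space: hard cut at newLength
        return txtOut.Substring(0, endPos).TrimEnd() + "...";
    }
}
```

The separate IsMatch check is not needed because Regex.Replace returns the input unchanged when nothing matches. RegexOptions.Multiline only changes how ^ and $ behave, so it has no effect on a tag pattern like this one.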

May 7, 2007 (0 comments)

RegEx Tutorials

Posted in Uncategorized

One of the most useful things to learn in programming is regular expressions. Knowing how to use regular expressions could save you a TON of time and make your applications more efficient. I recently compiled a small list of sites/blogs which have excellent reference material on regular expressions. Here it is: The absolute bare minimum [...]
