Posts Tagged ‘regex’

May 6, 2009 4

How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0

By J in Uncategorized

Lately I’ve been working on a pet project called Twime. As part of it, I wanted to parse URLs, usernames and hashtags out of the user’s Twitter timeline. Here’s how I went about it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Text.RegularExpressions;

public static class HTMLParser
{
public static string Link(this [...]
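
The excerpt above is cut off, so here is a minimal sketch of one way to approach the same task; it is not the original code. The class name, the method name LinkEntities, the patterns and the twitter.com link targets are all assumptions made for illustration. It handles URLs, @usernames and #hashtags in a single pass so that a "#" inside a URL is not mistaken for a hashtag, and it assumes the input text is already HTML-safe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TweetLinkerSketch
{
    // Hypothetical patterns, for illustration only. The alternatives are tried
    // left to right, so a URL is consumed whole before "@" or "#" are considered.
    private static readonly Regex Entity = new Regex(
        @"(?<url>https?://[^\s<]+)" +
        @"|(?<![\w@])@(?<user>\w{1,15})" +
        @"|(?<![\w#&])#(?<tag>\w+)",
        RegexOptions.IgnoreCase);

    // Wraps every URL, @username and #hashtag in an anchor tag.
    public static string LinkEntities(this string text)
    {
        return Entity.Replace(text, delegate(Match m)
        {
            if (m.Groups["url"].Success)
                return Anchor(m.Value, m.Value);
            if (m.Groups["user"].Success)
                return Anchor("https://twitter.com/" + m.Groups["user"].Value, m.Value);
            // Assumed link target for hashtags.
            return Anchor("https://twitter.com/hashtag/" + m.Groups["tag"].Value, m.Value);
        });
    }

    private static string Anchor(string href, string label)
    {
        return "<a href=\"" + href + "\">" + label + "</a>";
    }
}
```

Because LinkEntities is an extension method, it can be called directly on a string, e.g. tweetText.LinkEntities(). The lookbehinds keep e-mail addresses such as a@b.com from being treated as usernames. Punctuation that directly follows a URL is captured as part of it, so real-world input would need a stricter URL pattern.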


May 6, 2008 0

How to easily parse HTML without RegEx

By J in Uncategorized

I recently discovered an absolutely amazing HTML parsing library for .NET called HtmlAgilityPack. It completely takes away the pain of parsing complicated HTML with regular expressions.
Here’s a very simple example of what you can do with it: I’m just extracting the inner HTML from any element in an HTML file that has a CSS class [...]


January 5, 2008 3

How to extract URLs (href property) from HTML

By J in Uncategorized

protected ArrayList getURL(string txtIn)
{
    ArrayList outURL = new ArrayList();
    Regex r = new Regex("href\\s*=\\s*(?:(?:\\\"(?<url>[^\\\"]*)\\\")|(?<url>[^\\s]* ))");
    MatchCollection mc1 = r.Matches(txtIn);

    foreach (Match m1 in mc1)
    {
        foreach (Group g in m1.Groups)
[...]
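
The rest of that function is cut off here, so the following is only a sketch of the same idea under my own assumptions, not the original code. The class and method names are hypothetical, it returns a generic List<string> instead of an ArrayList, and the pattern also accepts single-quoted values. It reads the named group "url" directly rather than looping over every group in the match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractorSketch
{
    // Three alternatives: double-quoted, single-quoted, or unquoted value.
    // .NET allows the same group name ("url") in each alternative.
    private static readonly Regex Href = new Regex(
        @"href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>""']+))",
        RegexOptions.IgnoreCase);

    // Returns the value of every href attribute found in the input, in order.
    public static List<string> GetUrls(string html)
    {
        List<string> urls = new List<string>();
        foreach (Match m in Href.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

This is still plain pattern matching, so it will also pick up text that merely looks like an href attribute, for example inside a comment or a script block.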


December 27, 2007 1

How to remove all special characters from a string

By J in Uncategorized

protected string StripSpecChars(string txtIn)
{
    string txtOut = Regex.Replace(txtIn, @"[^\w\.@-]", "").Trim();
    return txtOut;
}
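
To show what that pattern actually keeps, here is a small self-contained version of the same function. The wrapper class name is hypothetical and the method is made static only so it can be called outside a page class. The character class keeps word characters, dots, "@" and hyphens, and removes everything else, including spaces.

```csharp
using System.Text.RegularExpressions;

public static class SpecCharsSketch
{
    // Same pattern as above: remove anything that is not a word character,
    // a dot, an "@" or a hyphen. Whitespace is removed as well.
    public static string StripSpecChars(string txtIn)
    {
        return Regex.Replace(txtIn, @"[^\w\.@-]", "");
    }
}
```

For example, SpecCharsSketch.StripSpecChars("hello, world!") gives "helloworld". Since the pattern already removes all whitespace, the Trim() call in the original has nothing left to trim. Also note that \w in .NET matches Unicode letters and digits by default, so accented characters survive.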


December 20, 2007 1

How to remove commonly occurring English words from a string

By J in Uncategorized

I’m using this function to filter common words out of a search query.

protected string removeCommonWords(string sourceStr)
{
    string[] seperator = { " " };
    string[] ignoreWords = { "a", "all", "am", "an", "and", "any", "are", "as",
        "at", "be", "but", "can", "did", "do", "does", [...]
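
The word list and the rest of the function are cut off above, so this is only a sketch of the filtering step under my own assumptions, not the original code. The class and method names are hypothetical, the list contains only the words visible in the excerpt, and the comparison is made case-insensitive by using a HashSet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonWordFilterSketch
{
    // Only the words visible in the excerpt; the full list is not shown there.
    private static readonly HashSet<string> IgnoreWords = new HashSet<string>(
        new[] { "a", "all", "am", "an", "and", "any", "are", "as",
                "at", "be", "but", "can", "did", "do", "does" },
        StringComparer.OrdinalIgnoreCase);

    // Splits on spaces, drops the ignored words, and joins the rest back together.
    public static string RemoveCommonWords(string source)
    {
        string[] words = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Where(w => !IgnoreWords.Contains(w)).ToArray());
    }
}
```

With this short list, "How do I parse a URL and an email" becomes "How I parse URL email". A HashSet lookup avoids scanning the whole word array for every word in the query.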


November 13, 2007 0

Strip out HTML tags using RegEx

By J in Uncategorized

This code will strip out all the HTML tags and truncate the text to 4 lines.

public static string TruncateText(string txtIn, int newLength)
{
    string txtOut = txtIn;
    string pattern = @"<(.|\n)*?>";

    //Strip out HTML tags
    if (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
[...]
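
The remainder of the function is not shown, so here is a rough sketch of the two steps it describes, under my own assumptions rather than the original logic. The class and method names are hypothetical, and I am treating the length parameter as a maximum number of characters (assumed non-negative), since the excerpt does not show how "4 lines" is measured.

```csharp
using System.Text.RegularExpressions;

public static class TruncateSketch
{
    // Removes anything that looks like an HTML tag, then cuts the text to
    // at most maxChars characters, adding "..." when something was cut off.
    public static string StripAndTruncate(string html, int maxChars)
    {
        // Same pattern as in the excerpt above.
        string text = Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
        if (text.Length <= maxChars)
        {
            return text;
        }
        return text.Substring(0, maxChars) + "...";
    }
}
```

Regex.Replace simply returns the input unchanged when nothing matches, so a separate IsMatch check is not needed before calling it.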


May 7, 2007 0

RegEx Tutorials

By J in Uncategorized

One of the most useful things to learn in programming is regular expressions. Knowing how to use regular expressions could save you a TON of time and make your applications more efficient.
I recently compiled a small list of sites/blogs which have excellent reference material on regular expressions. Here it is:

The absolute bare minimum every programmer [...]
