I recently discovered HtmlAgilityPack, an excellent HTML parsing library for .NET. It takes away the pain of parsing complicated HTML with regular expressions.
Here’s a very simple example of what you can do with it: I’m just extracting the inner HTML from any element inside an HTML file that has a CSS class [...]
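Since the excerpt cuts off before the example, here is a minimal sketch of the kind of extraction it describes. It assumes the HtmlAgilityPack NuGet package is installed. The class name `CssClassExtractor`, the method name `InnerHtmlByClass`, and its parameters are hypothetical, chosen only for illustration; this is not the original post's code.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper, not taken from the original post.
public static class CssClassExtractor
{
    // Returns the inner HTML of every element whose class attribute
    // contains cssClass as a whole, space-separated token.
    // Assumption: cssClass contains no quote characters.
    public static List<string> InnerHtmlByClass(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Common XPath 1.0 idiom for matching one class among several.
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + cssClass + " ')]";

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // SelectNodes returns null when nothing matches.
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerHtml);
        }
        return results;
    }
}
```

Padding the class attribute with spaces before matching keeps a search for `note` from also matching an element whose class is `notes`.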
Posts Tagged ‘html’
May 6, 2008
How to easily parse HTML without RegEx
By J in Uncategorized
November 13, 2007
Strip out HTML tags using RegEx
By J in Uncategorized
This code will strip out all the HTML tags and truncate the text to 4 lines.
public static string TruncateText(string txtIn, int newLength)
{
    string txtOut = txtIn;
    string pattern = @"<(.|\n)*?>";
    //Strip out HTML tags
    if (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
[...]
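
The rest of the function is cut off in this excerpt. As a rough, self-contained sketch of the same idea, the version below strips tags with the same pattern and then truncates the result. The names `HtmlText` and `StripAndTruncate` are hypothetical, and treating `newLength` as a maximum character count is an assumption: the excerpt mentions truncating to 4 lines but does not show how that is done.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not the original post's implementation.
public static class HtmlText
{
    public static string StripAndTruncate(string txtIn, int newLength)
    {
        if (txtIn == null)
        {
            return string.Empty;
        }

        // Same pattern as in the excerpt: lazily match anything between < and >.
        string pattern = @"<(.|\n)*?>";

        // Remove every tag-like match.
        string txtOut = Regex.Replace(txtIn, pattern, string.Empty);

        // Assumption: newLength is a maximum number of characters.
        if (newLength >= 0 && txtOut.Length > newLength)
        {
            txtOut = txtOut.Substring(0, newLength);
        }
        return txtOut;
    }
}
```

For example, `HtmlText.StripAndTruncate("<p>Hello <b>world</b></p>", 5)` strips the markup down to `Hello world` and then keeps the first five characters. A regex like this does not understand comments, scripts, or `>` inside attribute values, which is the kind of case an HTML parser handles better.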