Posts Tagged ‘html’

December 21, 2010 1

Cross-Domain Communication with IFrames

In Uncategorized

Recently I had to communicate between an iframe hosted on a different domain and its parent page. The "same origin policy", a browser security mechanism that lets scripts access another page's content only when both pages come from the same origin, makes this impossible out of the box. [...]


May 6, 2008 0

How to easily parse HTML without RegEx

In Uncategorized

I recently discovered an excellent HTML parsing library for .NET called HtmlAgilityPack. It takes away the pain of parsing complicated HTML with regular expressions. Here's a very simple example of what you can do with it: I'm just extracting the inner HTML from any element in an HTML file that has a CSS [...]


January 5, 2008 3

How to extract URLs (href property) from HTML

In Uncategorized

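    // Requires: using System.Collections; using System.Text.RegularExpressions;
    protected ArrayList getURL(string txtIn)
    {
        ArrayList outURL = new ArrayList();

        // Match href="..." (double-quoted) or an unquoted href value.
        Regex r = new Regex(@"href\s*=\s*(?:""(?<url>[^""]*)""|(?<url>[^\s>""]+))");

        foreach (Match m1 in r.Matches(txtIn))
        {
            // Add only the named "url" group; looping over every group
            // would also add the full match text (group 0).
            outURL.Add(m1.Groups["url"].Value);
        }

        return outURL;
    }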


November 13, 2007 0

Strip out HTML tags using RegEx

In Uncategorized

This code will strip out all the HTML tags and truncate the text to 4 lines.

    public static string TruncateText(string txtIn, int newLength)
    {
        string txtOut = txtIn;
        string pattern = @"<(.|\n)*?>";

        // Strip out HTML tags
        if (Regex.IsMatch(txtIn, pattern, RegexOptions.None))
            txtOut = Regex.Replace(txtIn, pattern, string.Empty, RegexOptions.Multiline).Trim();

        if (txtOut.Length > newLength)
        {
            int endPos = txtOut.LastIndexOf(" [...]
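The excerpt is cut off partway through the function, so the rest of the original is not shown here. As a rough, self-contained sketch of the same idea (strip tags with a regex, then shorten the text), something like the following could work. The class name `HtmlText`, the method name `StripAndTruncate`, the cut at the last space, and the trailing "..." are all assumptions for illustration, not the post's actual code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper (not the original post's code): removes tags,
    // then shortens the result to at most maxLength characters plus "...".
    public static string StripAndTruncate(string html, int maxLength)
    {
        if (string.IsNullOrEmpty(html) || maxLength <= 0)
            return string.Empty;

        // Remove anything that looks like a tag. Singleline lets '.' match
        // newlines, so tags spanning several lines are removed too.
        string text = Regex.Replace(html, "<.*?>", string.Empty,
                                    RegexOptions.Singleline).Trim();

        if (text.Length <= maxLength)
            return text;

        // Assumption: cut at the last space at or before maxLength so that
        // a word is not split; fall back to a hard cut if there is no space.
        int endPos = text.LastIndexOf(' ', maxLength);
        if (endPos <= 0)
            endPos = maxLength;

        return text.Substring(0, endPos).TrimEnd() + "...";
    }
}
```

A call such as `HtmlText.StripAndTruncate("<p>Hello <b>big</b> world</p>", 9)` strips the markup to "Hello big world" and then cuts at the space in position 9. Regex-based stripping is only a rough approximation: it can misfire on comments, scripts, or a literal `>` inside an attribute value, so a real parser is the safer choice for untrusted HTML.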
