In Australia, weather information is published on the BOM (Bureau of Meteorology) website, and historical and forecast data sets are often sourced from this portal.
One usually has a few options around this:
If the data requirement is a one-off for a particular analysis:
- Look for a download option on the portal (usually CSV)
- Copy and paste the data from the web page
If the data needs to be fed into reporting systems, extracted and refreshed at regular intervals:
- Set up Excel with a web query against the URL; in my experience these have usually been unreliable, as web techniques move forward and the use of <table> tags in HTML is declining
- Write a C# application which extracts the data from the web URL (HTML parsing) at a regular interval, scheduled via any enterprise scheduler
And this brings me to the content of this post: it outlines a C# application which navigates to the following URL (http://www.bom.gov.au/nsw/forecasts/sydney.shtml), parses the relevant data set, and saves it back to the SQL tables:
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;

namespace BOMWeatherDataDownload
{
    class Program
    {
        static void Main(string[] args)
        {
            var weatherForecast = DownloadWeatherForecast("sydney",
                new Uri("http://www.bom.gov.au/nsw/forecasts/sydney.shtml"));
        }

        public static List<WeatherForecast> DownloadWeatherForecast(string locationName, Uri downloadUrl)
        {
            // Load and parse the forecast page with HtmlAgilityPack
            HtmlWeb web = new HtmlWeb();
            var htmlDoc = web.Load(downloadUrl.OriginalString);

            List<WeatherForecast> weatherForecasts = new List<WeatherForecast>();
            int iCount = 0;

            // Each forecast day is rendered as <div class="day"> or <div class="day main">
            foreach (HtmlNode dayNode in htmlDoc.DocumentNode.SelectNodes("//div[contains(@class,'day')]"))
            {
                if (dayNode.Attributes["class"].Value == "day" || dayNode.Attributes["class"].Value == "day main")
                {
                    // Derive a yyyyMMdd time key by offsetting today's date with the day index
                    WeatherForecast dayForecast = ParseHtmlNodeForInfo(dayNode,
                        int.Parse(DateTime.Now.AddDays(iCount).ToString("yyyyMMdd")));
                    weatherForecasts.Add(dayForecast);
                    iCount++;
                }
            }
            return weatherForecasts;
        }

        public static WeatherForecast ParseHtmlNodeForInfo(HtmlNode node, int timeId)
        {
            WeatherForecast weatherForecast = new WeatherForecast();
            weatherForecast.Time_ID = timeId;

            // Forecast icon: <dd class="image"><img src="..."></dd>; src is site-relative
            if (node.SelectSingleNode(".//dd[@class='image']") != null)
                if (node.SelectSingleNode(".//dd[@class='image']").SelectSingleNode(".//img") != null)
                    weatherForecast.ImageUri = "http://www.bom.gov.au" +
                        node.SelectSingleNode(".//dd[@class='image']").SelectSingleNode(".//img").Attributes["src"].Value;

            // Minimum and maximum temperatures
            if (node.SelectSingleNode(".//em[@class='min']") != null)
                weatherForecast.MinTemperature = decimal.Parse(node.SelectSingleNode(".//em[@class='min']").InnerText);
            if (node.SelectSingleNode(".//em[@class='max']") != null)
                weatherForecast.MaxTemperature = decimal.Parse(node.SelectSingleNode(".//em[@class='max']").InnerText);

            // Short text summary of the day's forecast
            if (node.SelectSingleNode(".//dd[@class='summary']") != null)
                weatherForecast.ShortInfo = node.SelectSingleNode(".//dd[@class='summary']").InnerText;

            // Chance of rain (probability of precipitation), e.g. "70%" -> 0.70
            if (node.SelectSingleNode(".//em[@class='pop']") != null)
                weatherForecast.ChancesOfRain = decimal.Parse(node.SelectSingleNode(".//em[@class='pop']").InnerText.Replace("%", "")) / 100;

            // Expected rainfall range, e.g. "1 to 5 mm"
            if (node.SelectSingleNode(".//em[@class='rain']") != null)
            {
                var tString = node.SelectSingleNode(".//em[@class='rain']").InnerText.Split(new string[] { "to" }, StringSplitOptions.None);
                weatherForecast.MinRainFall = decimal.Parse(tString[0]);
                weatherForecast.MaxRainFall = decimal.Parse(tString[1].Replace("mm", ""));
            }
            return weatherForecast;
        }
    }
}
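The code above references a WeatherForecast entity that ships with the downloadable project. As a reference, here is a minimal sketch of that class, with property types inferred purely from how the parser assigns them (the authoritative definition is in the project files):

namespace BOMWeatherDataDownload
{
    // Minimal sketch of the WeatherForecast entity, inferred from the
    // properties the parser assigns; see the downloadable project for
    // the actual definition.
    public class WeatherForecast
    {
        public int Time_ID { get; set; }            // yyyyMMdd surrogate date key
        public string ImageUri { get; set; }        // Absolute URL of the forecast icon
        public decimal MinTemperature { get; set; } // Degrees Celsius
        public decimal MaxTemperature { get; set; } // Degrees Celsius
        public string ShortInfo { get; set; }       // Short forecast summary text
        public decimal ChancesOfRain { get; set; }  // Probability of precipitation (0-1)
        public decimal MinRainFall { get; set; }    // Millimetres
        public decimal MaxRainFall { get; set; }    // Millimetres
    }
}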
The above application uses the HtmlAgilityPack NuGet package for HTML parsing (installable via Install-Package HtmlAgilityPack); leveraging it, the information can be extracted at regular intervals.
The reference code project files are available for download and review.
The resultant data set can finally be saved in the following formats:
Weather Forecast:
Weather Observations (History):
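As a sketch of the "save back to SQL tables" step, the parsed rows could be inserted with System.Data.SqlClient as below. Note that the dbo.WeatherForecast table name, its column names, and the connection string are assumptions for illustration, not the actual schema from the project:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

namespace BOMWeatherDataDownload
{
    static class WeatherForecastRepository
    {
        // Inserts each parsed forecast row into SQL Server.
        // The dbo.WeatherForecast table and its columns are assumed names for this sketch.
        public static void Save(IEnumerable<WeatherForecast> forecasts, string connectionString)
        {
            const string insertSql =
                "INSERT INTO dbo.WeatherForecast " +
                "(Time_ID, ImageUri, MinTemperature, MaxTemperature, ShortInfo, ChancesOfRain, MinRainFall, MaxRainFall) " +
                "VALUES (@TimeId, @ImageUri, @MinTemp, @MaxTemp, @ShortInfo, @ChancesOfRain, @MinRain, @MaxRain)";

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                foreach (var forecast in forecasts)
                {
                    using (var command = new SqlCommand(insertSql, connection))
                    {
                        command.Parameters.AddWithValue("@TimeId", forecast.Time_ID);
                        command.Parameters.AddWithValue("@ImageUri", (object)forecast.ImageUri ?? DBNull.Value);
                        command.Parameters.AddWithValue("@MinTemp", forecast.MinTemperature);
                        command.Parameters.AddWithValue("@MaxTemp", forecast.MaxTemperature);
                        command.Parameters.AddWithValue("@ShortInfo", (object)forecast.ShortInfo ?? DBNull.Value);
                        command.Parameters.AddWithValue("@ChancesOfRain", forecast.ChancesOfRain);
                        command.Parameters.AddWithValue("@MinRain", forecast.MinRainFall);
                        command.Parameters.AddWithValue("@MaxRain", forecast.MaxRainFall);
                        command.ExecuteNonQuery();
                    }
                }
            }
        }
    }
}

This could be called from Main after the download, e.g. WeatherForecastRepository.Save(weatherForecast, connectionString);. Parameterised single-row inserts keep the sketch simple; for larger observation histories, SqlBulkCopy would be the more efficient choice.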
Once we have this regularly updated data set, it can be leveraged by multiple streams of the organisation in their reporting and data processing systems.
Hope it helps !!
Enjoy the Weather …. data !!!