Being in the BI business and being happy with the facts presented via reports, friends, consultants and more…
is sometimes just not enough!!!
So what do I do…
I build my own tools and get my hands a little dirty to dig in and find out the facts for myself. Seems like over-analysis… but believe me, it's fun to find some really interesting facts, as below, in one of my quests to find out the rental suburb profiles of Sydney.
Task: Analyse the rental areas in Sydney based on average weekly rental price.
(because I am looking for a place to rent :D… so I built a cube)
1. Identify the data source: In this case, the most popular portal for real estate data in Australia that came to my mind was http://www.realestate.com.au/rent .
2. Knowing the dimensions of your data: This is one of the most critical points in any BI solution/project (and where most of the wars are fought as well). To me, this phase of any project is the most troubling, where the bonds between the developers and the business are tested to the max and the real personalities come to light. 😀
Well, back to the discussion (as I don't have anyone to fight in this one): considering my requirements, I decided that I would need only one additional dataset, which would actually define the key to my project, and that is the postal code/area metadata, based on which I would extract the rental search results into my database for further analysis. Going back to step 1 to identify a data source for it landed me on http://www.postcodes-australia.com/
3. Build the extraction tool: Now again, there is a call to make here: weighing the effort spent building a tool for extraction against doing the job manually, depending on the dataset.
3.1 Compiling the Postal Code Data
Since the postal code data had been decided as an important dimensional entity in my project, a quick look at it made me realize that writing a tool to automate a 15-minute manual copy-paste job just isn't worth it. So I laboriously copied and pasted the data off the website into Excel, cleaned it up, arranged it and pushed it into the database via the standard SQL Server import from Excel. Resulting as below:
3.2 Extracting the rental search results from the portal
Now for the main data source extraction, I revved up my C# (.NET 4.5/VS2012) with a reference to the HTML Agility Pack (https://htmlagilitypack.codeplex.com/) and started ploughing through the portal search results, shoving them into my SQL Server 2012 database. My extraction logic for the rental search data was based on reading only the first 4 pages of the RealEstate portal search results (any further result pages were ignored) for each of the postal codes I had obtained earlier.
(Note: the search was based on most relevant results; no ordering was in place.)
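The extraction loop described above can be sketched roughly as below. Note this is only an illustrative sketch, not my actual tool: the search URL pattern, the `//span[@class='price']` XPath selector and the sample postcodes are hypothetical placeholders that would need to be replaced after inspecting the real page markup, and the rows would go into the SQL table rather than the console.

```csharp
// Sketch of the extraction loop, assuming the HTML Agility Pack is referenced.
using System;
using HtmlAgilityPack;

class RentalScraper
{
    // Builds the portal search URL for a postcode and result page (1-based).
    // The URL pattern here is an assumption for illustration only.
    public static string BuildSearchUrl(string postcode, int page)
    {
        return string.Format(
            "http://www.realestate.com.au/rent/in-{0}/list-{1}", postcode, page);
    }

    static void Main()
    {
        // In the real tool these come from the postal code table built earlier.
        string[] postcodes = { "2000", "2010" };

        foreach (string postcode in postcodes)
        {
            // Only the first 4 result pages are read, as described above.
            for (int page = 1; page <= 4; page++)
            {
                HtmlDocument doc = new HtmlWeb().Load(BuildSearchUrl(postcode, page));

                // Hypothetical selector for each listing's weekly price node.
                var prices = doc.DocumentNode.SelectNodes("//span[@class='price']");
                if (prices == null) break; // no more results for this postcode

                foreach (HtmlNode node in prices)
                    Console.WriteLine("{0}\t{1}", postcode, node.InnerText.Trim());
                // An INSERT into the SQL table would replace the WriteLine here.
            }
        }
    }
}
```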
This resulted in my populated database table, as follows:
Now, once the data is in the database, you have the means to analyse it: via Excel, by building a cube, with a pivot table on top of it, what not!!!
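To give a flavour of the kind of question the data answers, here is a minimal sketch of the core measure (average weekly rental price per postcode) done in plain LINQ over in-memory rows. The `Listing` shape and the sample figures are made up for illustration; the cube computes the same aggregate over the real table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Listing
{
    public string Postcode;
    public decimal WeeklyRent;
}

class Program
{
    static void Main()
    {
        // Made-up sample rows standing in for the extracted table.
        var listings = new List<Listing>
        {
            new Listing { Postcode = "2000", WeeklyRent = 650m },
            new Listing { Postcode = "2000", WeeklyRent = 700m },
            new Listing { Postcode = "2010", WeeklyRent = 540m },
        };

        // Average weekly rental price per postcode, cheapest first.
        var byPostcode = listings
            .GroupBy(l => l.Postcode)
            .Select(g => new { Postcode = g.Key, AvgRent = g.Average(l => l.WeeklyRent) })
            .OrderBy(x => x.AvgRent);

        foreach (var row in byPostcode)
            Console.WriteLine("{0}: {1:F2}", row.Postcode, row.AvgRent);
        // 2010: 540.00
        // 2000: 675.00
    }
}
```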
And so I ended up building an analytical cube on top of it to get the fruits of my effort and search for my rental apartment…
It also features a navigation mechanism, where the likeable areas are just a click away from being visualised on Google Maps, complete with the boundaries of the area under consideration.
Attached to this post is the sample data for you to play around with (containing the NSW and VIC rental listing data), and the cube for over-analysing it.
And the extraction tool!!! ??
Sorry guys, not everything comes for free!! But feel free to contact me about it.
Happy renting !!!
Note: The above article is intended only as a technology exhibit; only a small sample subset of data was extracted, for demonstration purposes only.