Access Open Data with Open Source Software Tools
Sammy Fung, sammy@sammy.hk
Sammy Fung
● Developer
● Founder, JobFOL
● President of Open Source Hong Kong
Creating value for us and the community
Open Data
Open Data
● Discoverable – Available and searchable on the Internet.
● Structured – Open, machine-readable format.
● Unconditional – The legal framework allows anyone to reproduce and repurpose the data.
Open Source
Open Source
● Software Development Model
● Free Software (1985)
  – Free = Freedom
  – Run the program (Freedom 0)
  – Study the source code and change it (Freedom 1)
  – Redistribute copies (Freedom 2)
  – Distribute copies of your modified versions (Freedom 3)
● Open Source (1998)
Open Source Web Application Software Stack
● LAMP
  – Linux (1991): Operating System
  – Apache (1995): Web Server
  – MySQL (1995): Database Server
  – PHP (1995): Server-side Scripting Language
● Other Alternatives:
  – LNMP: Replacing Apache with Nginx
  – Another "M" for LAMP: MariaDB, MongoDB
Python
● Programming Language
  – Since 1991
  – Widely used, general-purpose
  – High-level
  – Open Source
● Another "P" for LAMP
My Open Data-related Projects
● TV Timetable of Live Football Matches (2004)
● Weather Information (2006)
● Public Transportation Information (2006)
● LegCo Vote Information (2013)
● Air Quality Information (2014)
● Restaurant Information (2014)
TCTrack
● Plots typhoon tracks from different observation agencies on a map
● Google Maps API
  – First typhoon map in HK using the Google API
  – Sammy.HK TCTrack → Weather Underground → Hong Kong Observatory
● Twitter API
  – Posts typhoon updates, starting from any potential formation of a tropical cyclone in the Northwest Pacific Ocean
● Data Sources: HKO, JTWC
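Before tracks from several agencies can be drawn on one map, the positions have to be grouped per agency and put in time order. The following is a minimal sketch of that step only, not TCTrack's actual code: the `TrackPoint` type, the `paths_by_agency` function, and the sample positions are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class TrackPoint(NamedTuple):
    """One reported cyclone position (hypothetical structure)."""
    agency: str   # e.g. "HKO" or "JTWC"
    time: str     # ISO 8601 timestamp, so string order equals time order
    lat: float    # degrees north
    lon: float    # degrees east


def paths_by_agency(points):
    """Group points by agency and return each agency's path in time order."""
    grouped = defaultdict(list)
    for point in points:
        grouped[point.agency].append(point)
    return {
        agency: [(p.lat, p.lon) for p in sorted(pts, key=lambda p: p.time)]
        for agency, pts in grouped.items()
    }


# Made-up positions, for illustration only (not real observations).
SAMPLE = [
    TrackPoint("HKO", "2014-09-15T06:00Z", 19.5, 116.0),
    TrackPoint("JTWC", "2014-09-15T00:00Z", 18.9, 117.6),
    TrackPoint("HKO", "2014-09-15T00:00Z", 18.8, 117.5),
    TrackPoint("JTWC", "2014-09-15T06:00Z", 19.4, 116.2),
]

if __name__ == "__main__":
    for agency, path in paths_by_agency(SAMPLE).items():
        print(agency, path)
```

Each resulting list of (lat, lon) pairs could then be handed to a mapping library as one line per agency.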
Interview by MetroPop in 2009
Open Data on Hong Kong Restaurant & Food Licenses
Licensed Restaurants in Hong Kong
● Open Data from Data.One PSI
● Open Source Software Tools
  – Python
  – Scrapy Web Scraping Framework
● Source code is released on GitHub
  – https://github.com/sammyfung/LP_Restaurants_Scrapy
Creating the environment for a Scrapy project
● Requirements
  – Python, Python-Dev, virtualenv, pip
● Creating a virtual environment for the Python project
  – virtualenv ~/env
  – source ~/env/bin/activate
  – pip install scrapy
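After running the setup commands above, a short check can confirm that the virtual environment is active and that Scrapy was installed into it. This is only a sketch using the Python standard library; the function name `environment_report` is made up for this example.

```python
import importlib.util
import sys


def environment_report():
    """Report whether a virtual environment is active and Scrapy is importable."""
    # Inside a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix points at the Python installation it was created from.
    # Older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base
    # find_spec returns None when the package cannot be found.
    scrapy_installed = importlib.util.find_spec("scrapy") is not None
    return {"in_virtualenv": in_virtualenv, "scrapy_installed": scrapy_installed}


if __name__ == "__main__":
    print(environment_report())
```

If `in_virtualenv` is false, the `source ~/env/bin/activate` step was probably skipped in the current shell.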
Creating a Scrapy project
● Creating a new Scrapy project with a spider
  – scrapy startproject LP_Restaurants_Scrapy
  – cd LP_Restaurants_Scrapy
  – scrapy genspider rlxml fehd.gov.hk
● Creating a Scrapy data model
● Running some tests with the Scrapy shell
  – scrapy shell <URL>
  – http://www.fehd.gov.hk/english/licensing/license/text/LP_Restaurants_EN.XML
● Writing the parse function of the Scrapy spider
● Trying and testing the spider
  – scrapy crawl rlxml -t json -o restaurant_licenses.json
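The data model and parse function come down to turning each XML record into a flat dictionary that can be exported as JSON. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree` instead of Scrapy, so it runs without the framework; it is not the code from the GitHub repository. The element names (`LP`, `LICNO`, `SS`, `ADR`), the field names, and the sample records are assumptions that should be checked against the real XML file.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample records. The element names are assumptions about the
# feed's layout, not taken from the real LP_Restaurants_EN.XML file.
SAMPLE_XML = """<DATA>
  <LP><LICNO>0000000001</LICNO><SS>Example Cafe</SS><ADR>1 Example Road</ADR></LP>
  <LP><LICNO>0000000002</LICNO><SS>Sample Noodles</SS><ADR>2 Sample Street</ADR></LP>
</DATA>"""

# Hypothetical data model: output field name -> XML element name.
FIELDS = {"license_no": "LICNO", "name": "SS", "address": "ADR"}


def parse_licenses(xml_text):
    """Yield one dictionary per <LP> record, like a spider's parse function."""
    root = ET.fromstring(xml_text)
    for record in root.iter("LP"):
        # A missing element becomes an empty string rather than None.
        yield {
            field: (record.findtext(tag) or "").strip()
            for field, tag in FIELDS.items()
        }


if __name__ == "__main__":
    items = list(parse_licenses(SAMPLE_XML))
    print(json.dumps(items, ensure_ascii=False, indent=2))
```

In a Scrapy spider, the same loop would run inside the spider's `parse` method over the downloaded response, and the framework would take care of fetching the URL and writing the JSON file.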
Open Data
Open Source
Creating value for us and the community
