IBM Watson Data Platform and Open Data 27 February 2017 Margriet Groenendijk | Developer Advocate | IBM Watson Data Platform @MargrietGr https://medium.com/ibm-watson-data-lab
About me: Developer Advocate and data scientist. Previously Research Fellow at the University of Exeter, UK; PhD at VU University Amsterdam, the Netherlands
IBM Watson Data Platform: Connect, Discover, Accelerate
IBM Watson Data Platform
IBM Bluemix https://console.ng.bluemix.net/
Bluemix: https://console.ng.bluemix.net/
https://github.com/snowch/movie-recommender-demo
https://movie-recommender-demo-margrietgroenendijk-1234.mybluemix.net/
APIs
https://github.com/MargrietGroenendijk/Bristol
Example: Twitter
Example: Watson Tone Analyzer
Emotion ▪ Language style ▪ Social propensities: analyze how you are coming across to others
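The three tone categories above come back from the Tone Analyzer API as one JSON response. The sketch below builds such a request with Python's standard library and picks the strongest tone per category. It assumes the 2017-era v3 endpoint, the `2016-05-19` version date and its `tone_categories` response layout; the service has since been retired, and the helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: 2017-era Tone Analyzer v3 endpoint and version date (service since retired).
TONE_URL = "https://gateway.watsonplatform.net/tone-analyzer/api/v3/tone"
VERSION = "2016-05-19"


def build_tone_request(text, username, password):
    """Build (without sending) a POST request that asks Tone Analyzer to score `text`."""
    url = TONE_URL + "?" + urllib.parse.urlencode({"version": VERSION})
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bluemix service credentials were passed as HTTP basic auth.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


def strongest_tones(response):
    """Map each tone category (emotion, language, social) to its highest-scoring tone."""
    best = {}
    for category in response["document_tone"]["tone_categories"]:
        top = max(category["tones"], key=lambda tone: tone["score"])
        best[category["category_name"]] = (top["tone_name"], top["score"])
    return best
```

To send it, pass the request to `urllib.request.urlopen` and hand the parsed JSON body to `strongest_tones`.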
Cloudant NoSQL
Cloudant is a database
id | firstname | lastname | dob
1 | John | Smith | 1970-01-01
2 | Kate | Jones | 1971-12-25
{ "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01" }
Cloudant is "schemaless" { "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com" }
Cloudant is "schemaless" { "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true }
Cloudant is "schemaless" { "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true, "tags": ["tall", "glasses"] }
Cloudant is "schemaless" { "_id": "1", "firstname": "John", "lastname": "Smith", "dob": "1970-01-01", "email": "john.smith@gmail.com", "confirmed": true, "tags": ["tall", "glasses"], "address" : { "number": 14, "street": "Front Street", "town": "Luton", "postcode": "LU1 1AB" } }
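The table-to-document mapping and the field-by-field growth above can be sketched in plain Python. This is a local illustration only: dictionaries stand in for documents, nothing is sent to Cloudant, and the names `COLUMNS`, `ROWS` and `row_to_doc` are hypothetical.

```python
import json

# The two rows from the table above, as (id, firstname, lastname, dob).
COLUMNS = ("id", "firstname", "lastname", "dob")
ROWS = [
    (1, "John", "Smith", "1970-01-01"),
    (2, "Kate", "Jones", "1971-12-25"),
]


def row_to_doc(row):
    """Turn a table row into a JSON-style document; the id becomes the string `_id`."""
    record = dict(zip(COLUMNS, row))
    doc = {"_id": str(record.pop("id"))}
    doc.update(record)
    return doc


docs = [row_to_doc(row) for row in ROWS]

# No schema to migrate: one document simply gains fields the other lacks,
# including a boolean, an array and a nested object.
docs[0].update({
    "email": "john.smith@gmail.com",
    "confirmed": True,
    "tags": ["tall", "glasses"],
    "address": {"number": 14, "street": "Front Street", "town": "Luton", "postcode": "LU1 1AB"},
})

print(json.dumps(docs[0]))
# Code reading the documents should tolerate missing fields:
print(docs[1].get("email", "no email on record"))
```

The flip side of having no schema is that every reader has to cope with absent fields, which is why the last line uses `dict.get` with a default.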
Cloudant is built for the web ▪ Stores JSON documents ▪ Speaks an HTTP API ▪ Lives on the web
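Because Cloudant speaks HTTP, storing and fetching a document comes down to a PUT and a GET on `/{database}/{document id}`, following the CouchDB document API. The sketch below builds those requests with Python's standard library without sending them. The account URL and the `people` database are placeholders, and credentials (for example a basic-auth header) would still need to be added.

```python
import json
import urllib.parse
import urllib.request

# Placeholder account URL and database name; use your own service credentials.
BASE_URL = "https://ACCOUNT.cloudant.com"
DB = "people"


def doc_url(db, doc_id):
    """URL of one document: /{database}/{document id}, both percent-encoded."""
    return f"{BASE_URL}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"


def put_doc_request(db, doc):
    """PUT /{db}/{_id} creates a document (or updates it when the body carries its current _rev)."""
    return urllib.request.Request(
        doc_url(db, doc["_id"]),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_doc_request(db, doc_id):
    """GET /{db}/{id} returns the stored JSON document, including its _rev."""
    return urllib.request.Request(doc_url(db, doc_id), method="GET")
```

Sending either request is one call to `urllib.request.urlopen(req)` once authentication is attached.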
Cloudant is fault tolerant
Cloudant is resilient: a "write" is answered with "ok"
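One way to picture why a write can still be answered with "ok" when a node is down is a quorum write. The sketch below is a toy model in plain Python, not Cloudant code, and it assumes CouchDB 2.x-style defaults of three copies per document and a write quorum of two; the replica dictionaries and function name are hypothetical.

```python
# Toy model, not Cloudant code. Assumed defaults: 3 copies per document, reply after 2 acks.
N_REPLICAS = 3
WRITE_QUORUM = 2


def replicated_write(doc, replicas, quorum=WRITE_QUORUM):
    """Write `doc` to every reachable replica and report the outcome the client would see."""
    acks = 0
    for replica in replicas:
        if replica["up"]:
            replica["docs"][doc["_id"]] = doc
            acks += 1
    if acks >= quorum:
        return "ok"        # enough copies exist: safe to acknowledge the write
    if acks > 0:
        return "accepted"  # stored, but on fewer copies than the quorum
    return "failed"        # no replica reachable


nodes = [{"up": True, "docs": {}} for _ in range(N_REPLICAS)]
nodes[2]["up"] = False                        # one node down
print(replicated_write({"_id": "1"}, nodes))  # ok
```

With one of three nodes unavailable, two copies are still written, so the client gets "ok"; the missing copy would be brought back in sync when the node returns.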
Cloudant is scalable
Cloudant replicates
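Replication is itself driven through the HTTP API: a POST to `/_replicate` with a source and a target database copies documents one way, and `continuous` keeps it running as new changes arrive. A minimal sketch, assuming placeholder account URLs that would need credentials in practice; two-way sync means starting a second replication in the opposite direction.

```python
import json
import urllib.request

ACCOUNT_URL = "https://ACCOUNT.cloudant.com"  # placeholder account


def replication_request(source, target, continuous=True, create_target=False):
    """Build (without sending) a POST /_replicate request copying `source` into `target`."""
    spec = {
        "source": source,
        "target": target,
        "continuous": continuous,        # keep following new changes
        "create_target": create_target,  # create the target database if missing
    }
    return urllib.request.Request(
        ACCOUNT_URL + "/_replicate",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two-way sync: one replication in each direction.
a = f"{ACCOUNT_URL}/poi-a"
b = f"{ACCOUNT_URL}/poi-b"
requests_to_send = [replication_request(a, b), replication_request(b, a)]
```

The database names `poi-a` and `poi-b` are hypothetical; each request would be sent with `urllib.request.urlopen`.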
Runkeeper
OpenStreetMap data in IBM Cloudant: use from anywhere! Daily updates from a cron job running a Python script on a VM keep it always up to date. Currently 12,467,460 POIs.
Several data sources: world, continent, country, city or a user-defined box. Several data formats for which free conversion tools exist: pbf, osm, json, shp. Example:
wget -c http://download.geofabrik.de/europe/netherlands-latest.osm.pbf
Extract the POIs with osmosis (easy to install with brew on Mac):
osmosis --read-pbf netherlands-latest.osm.pbf --tf accept-nodes aerialway=station aeroway=aerodrome,helipad,heliport amenity=* craft=* emergency=* highway=bus_stop,rest_area,services historic=* leisure=* office=* public_transport=stop_position,stop_area shop=* tourism=* --tf reject-ways --tf reject-relations --write-xml netherlands.nodes.osm
Some cleaning up with osmconvert:
osmconvert netherlands.nodes.osm --drop-ways --drop-author --drop-relations --drop-versions > netherlands.poi.osm
Convert from osm to json format with ogr2ogr:
ogr2ogr -f GeoJSON netherlands.poi.json netherlands.poi.osm points
Upload to IBM Cloudant with couchimport (https://github.com/glynnbird/couchimport):
export COUCH_URL="https://username:password@username.cloudant.com"
cat netherlands.poi.json | couchimport --db poi-netherlands --type json --jsonpath "features.*"
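The daily refresh mentioned above could wrap the download, osmosis, osmconvert, ogr2ogr and couchimport steps in one Python script. This is a sketch under assumptions, not the author's script: it hard-codes the Netherlands extract, uses `wget -N` (only fetch a newer file) instead of `-c`, because `-c` can append to a stale local file when the remote one has changed, and it does not deal with re-importing POIs that are already in the database. The function names are hypothetical.

```python
import contextlib
import os
import subprocess

# Tag filter copied from the osmosis command above.
ACCEPT_NODES = [
    "aerialway=station",
    "aeroway=aerodrome,helipad,heliport",
    "amenity=*", "craft=*", "emergency=*",
    "highway=bus_stop,rest_area,services",
    "historic=*", "leisure=*", "office=*",
    "public_transport=stop_position,stop_area",
    "shop=*", "tourism=*",
]


def pipeline(country="netherlands", region="europe"):
    """Return the refresh steps as (argv, stdin_path, stdout_path) tuples."""
    pbf = f"{country}-latest.osm.pbf"
    nodes, poi_osm, poi_json = f"{country}.nodes.osm", f"{country}.poi.osm", f"{country}.poi.json"
    return [
        # -N (timestamping) re-downloads only when Geofabrik has a newer file.
        (["wget", "-N", f"http://download.geofabrik.de/{region}/{pbf}"], None, None),
        (["osmosis", "--read-pbf", pbf, "--tf", "accept-nodes", *ACCEPT_NODES,
          "--tf", "reject-ways", "--tf", "reject-relations", "--write-xml", nodes], None, None),
        # Same as "osmconvert ... > country.poi.osm": stdout goes to a file.
        (["osmconvert", nodes, "--drop-ways", "--drop-author", "--drop-relations",
          "--drop-versions"], None, poi_osm),
        (["ogr2ogr", "-f", "GeoJSON", poi_json, poi_osm, "points"], None, None),
        # couchimport reads the GeoJSON on stdin and takes COUCH_URL from the environment.
        (["couchimport", "--db", f"poi-{country}", "--type", "json", "--jsonpath", "features.*"],
         poi_json, None),
    ]


def run(steps):
    """Run each step in order, wiring files to stdin/stdout; stop on the first failure."""
    for argv, stdin_path, stdout_path in steps:
        with contextlib.ExitStack() as stack:
            stdin = stack.enter_context(open(stdin_path, "rb")) if stdin_path else None
            stdout = stack.enter_context(open(stdout_path, "wb")) if stdout_path else None
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)


def refresh(country="netherlands", region="europe"):
    """Entry point for the daily cron job."""
    # ogr2ogr's GeoJSON driver will not overwrite an existing file, so remove yesterday's.
    with contextlib.suppress(FileNotFoundError):
        os.remove(f"{country}.poi.json")
    run(pipeline(country, region))
```

A crontab entry on the VM would then call a script whose only job is `refresh()`; `COUCH_URL` must be set in the cron environment, since cron does not load an interactive shell profile.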
Examples from https://console.ng.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#cloudant-geospatial
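Geospatial queries need a geo index, which Cloudant Geo defines in a design document under `st_indexes`, with a JavaScript function that calls `st_index` on each document's GeoJSON geometry. A minimal sketch, assuming the GeoJSON features imported with couchimport (each has a `geometry` field) and a placeholder account URL; the `spatial` and `newGeoIndex` names are borrowed from the crime-data query URL later in this deck.

```python
import json
import urllib.request

ACCOUNT_URL = "https://ACCOUNT.cloudant.com"  # placeholder account URL

# JavaScript index function, stored as a string:
# index every document that carries a GeoJSON geometry.
INDEX_FUNCTION = (
    "function (doc) { "
    "if (doc.geometry && doc.geometry.coordinates) { st_index(doc.geometry); } "
    "}"
)


def geo_design_doc(ddoc="spatial", index="newGeoIndex"):
    """Design document declaring one geospatial index under `st_indexes`."""
    return {"_id": f"_design/{ddoc}", "st_indexes": {index: {"index": INDEX_FUNCTION}}}


def put_design_doc_request(db, design_doc):
    """Build a PUT that saves the design document into `db`."""
    return urllib.request.Request(
        f"{ACCOUNT_URL}/{db}/{design_doc['_id']}",
        data=json.dumps(design_doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Once saved, the index is queried under `/{db}/_design/{ddoc}/_geo/{index}`, the same shape as the crime-data URL.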
UK Crime Data from https://data.police.uk/data/
https://opendata.cloudant.com/crimes-uk/_design/spatial/_geo/newGeoIndex?bbox=-2.600283622741699%2C51.44886539765683%2C-2.5962066650390625%2C51.4533851454499&limit=20&relation=contains
Python: requests
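The bounding-box query in the URL above can be assembled from its parts in Python. This sketch uses the standard library's `urllib` rather than `requests` so it runs without extra packages (with `requests` the equivalent is `requests.get(GEO_INDEX_URL, params=...)`). It assumes the default response lists matches under `rows`; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Geo index of the public UK crime database, as in the URL above.
GEO_INDEX_URL = "https://opendata.cloudant.com/crimes-uk/_design/spatial/_geo/newGeoIndex"


def bbox_query_url(min_lon, min_lat, max_lon, max_lat, limit=20, relation="contains"):
    """Build a bounding-box query; bbox is min lon, min lat, max lon, max lat."""
    params = {
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "limit": limit,
        "relation": relation,
    }
    return GEO_INDEX_URL + "?" + urllib.parse.urlencode(params)  # commas become %2C


def fetch_rows(url, timeout=30):
    """Send the GET and return the matches (assumed to be under the `rows` key)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("rows", [])
```

Calling `fetch_rows(bbox_query_url(-2.600283622741699, 51.44886539765683, -2.5962066650390625, 51.4533851454499))` requests the same area of Bristol as the URL on the slide.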
dashDB Data warehouse
Add the dashDB service in Bluemix: click Add a service and search for dashDB
Load tweets into dashDB ▪ Use an existing service ▪ Search for tweets ▪ Select a table. Example search: posted:2016-08-01,2016-10-01 followers_count:3000 friends_count:3000 (weather OR sun OR sunny OR rain OR hail OR storm OR rainy OR drought OR flood OR hurricane OR tornado OR cold OR snow OR drizzle OR cloudy OR thunder OR lightning OR wind OR windy OR heatwave). REST API docs: https://new-console.ng.bluemix.net/docs/services/Twitter/twitter_rest_apis.html#rest_apis
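The search string above is a set of filters followed by keywords joined with `OR`, so it is easy to generate rather than type. A small sketch that reproduces the example search; the function and list names are hypothetical, and the filter semantics are whatever the Twitter service documents.

```python
# Keyword list copied from the example search above.
WEATHER_TERMS = [
    "weather", "sun", "sunny", "rain", "hail", "storm", "rainy", "drought",
    "flood", "hurricane", "tornado", "cold", "snow", "drizzle", "cloudy",
    "thunder", "lightning", "wind", "windy", "heatwave",
]


def build_search(terms, posted_from, posted_to, followers, friends):
    """Assemble a search in the example's format: filters first, then OR-ed keywords in brackets."""
    return (f"posted:{posted_from},{posted_to} followers_count:{followers} "
            f"friends_count:{friends} ({' OR '.join(terms)})")


query = build_search(WEATHER_TERMS, "2016-08-01", "2016-10-01", 3000, 3000)
```

Keeping the keywords in a list makes it simple to widen or narrow the search without editing one long string by hand.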
Apache Spark
RDDs: Resilient Distributed Datasets ▪ Data does not have to fit on a single machine ▪ Data is separated into partitions ▪ Creation of RDDs: load an external dataset, or distribute a collection of objects ▪ Transformations construct a new RDD from a previous one (lazily!) ▪ Actions compute a result based on an RDD
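The split between lazy transformations and eager actions can be seen in a toy model. The class below is plain Python written for illustration, not Spark's implementation or API: it keeps data in partitions, records `map` and `filter` steps without running them, and only executes them, partition by partition, when an action such as `collect` or `count` is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitions, lazy transformations, eager actions."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions  # list of lists: the data split into chunks
        self.ops = list(ops)          # recorded transformations, not yet run

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        """Distribute a local collection over roughly equal partitions."""
        items = list(items)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls([items[i:i + size] for i in range(0, len(items), size)])

    # Transformations: return a new ToyRDD that only remembers the step.
    def map(self, f):
        return ToyRDD(self.partitions, self.ops + [("map", f)])

    def filter(self, pred):
        return ToyRDD(self.partitions, self.ops + [("filter", pred)])

    def _compute(self, partition):
        """Apply the recorded steps to one partition, element by element."""
        for x in partition:
            keep = True
            for kind, f in self.ops:
                if kind == "map":
                    x = f(x)
                elif not f(x):
                    keep = False
                    break
            if keep:
                yield x

    # Actions: actually run the recorded steps.
    def collect(self):
        return [x for p in self.partitions for x in self._compute(p)]

    def count(self):
        return sum(1 for p in self.partitions for _ in self._compute(p))


calls = []
rdd = ToyRDD.parallelize(range(10), num_partitions=3)
squares = rdd.map(lambda x: calls.append(x) or x * x)  # nothing computed yet: calls == []
evens = squares.filter(lambda x: x % 2 == 0)
print(evens.collect())  # [0, 4, 16, 36, 64]
```

In PySpark the same pipeline reads `sc.parallelize(range(10), 3).map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()`; Spark additionally runs the recorded steps on the machines that hold each partition.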
Load tweets from dashDB with Spark SQL
Clean data, summarise and load into a pandas DataFrame
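A sketch of those two notebook steps, assuming Spark's generic JDBC data source and a DB2-style dashDB connection: the host, credentials, table name, port 50000, database `BLUDB`, the driver class and the timestamp column `POSTING_TIME` are all placeholders or assumptions to be replaced with the values from the dashDB service credentials and the actual tweet table.

```python
def dashdb_jdbc_options(host, user, password, table, port=50000, database="BLUDB"):
    """Options for Spark's JDBC data source pointed at dashDB (assumed DB2 URL format)."""
    return {
        "url": f"jdbc:db2://{host}:{port}/{database}",
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_tweets(spark, options):
    """Read the table through Spark SQL's JDBC source; returns a lazy Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def tweets_per_day(df, time_col="POSTING_TIME"):  # hypothetical column name
    """Drop rows without a timestamp, count tweets per day, and return the small result as pandas."""
    from pyspark.sql import functions as F  # imported here so the helpers above need no Spark install

    daily = (
        df.dropna(subset=[time_col])
        .groupBy(F.to_date(F.col(time_col)).alias("day"))
        .count()
        .orderBy("day")
    )
    return daily.toPandas()
```

In a notebook this might be used as `tweets_per_day(load_tweets(spark, dashdb_jdbc_options(...)))`; on Spark 1.6 pass `sqlContext` in place of `spark`, since both expose `.read`. Aggregating in Spark before calling `toPandas()` keeps only a few rows on the driver instead of every tweet.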
IBM Data Science Experience
datascience.ibm.com
Getting started ▪ Go to datascience.ibm.com and sign in with your Bluemix account if you have one; otherwise, sign up for one at the top right of the screen
Create a project ▪ Click the Create New Project link at the top of the screen ▪ Or go to My Projects in the menu on the left of the screen and click Create New Project there
Create a project ▪ Name the project ▪ Choose a Spark service ▪ Choose an Object Storage instance ▪ Click Create
Add collaborators ▪ Click Add Collaborator ▪ Search for your project members ▪ Select their permission
Add a notebook ▪ Click Add Notebooks ▪ Pick your favourite language: ▪ Python 2 ▪ Scala ▪ R ▪ Choose Spark 1.6 or 2.0 ▪ Click Create Notebook
Let’s write some code ▪ Click the pen icon to start adding code (edit mode) ▪ When collaborating, only one person can edit at a time; others can add comments to the notebook in view mode
Example: Bristol open data
Object Storage
Python package PixieDust
Watson Machine Learning
IBM Watson Data Platform: Bluemix, data storage, apps, Watson APIs, Weather, Data Science Experience, Watson Machine Learning, Watson Analytics
Thanks! https://github.com/MargrietGroenendijk/Bristol http://www.slideshare.net/MargrietGroenendijk/presentations @MargrietGr https://medium.com/ibm-watson-data-lab

Introduction to the IBM Watson Data Platform