In this developer journey we will use PixieDust running on IBM Data Science Experience (DSX) to analyze traffic data from the City of San Francisco. DSX is an interactive, collaborative, cloud-based environment where data scientists, developers, and others interested in data science can use tools (e.g., RStudio, Jupyter Notebooks, Spark) to collaborate, share, and gather insight from their data.
When the reader has completed this journey, they will understand:
- How to use Jupyter Notebooks to load, visualize, and analyze data
- How to run Notebooks in IBM Data Science Experience
- PixieDust Open Source Python library
- How to build a dashboard using PixieApps
- City of San Francisco Open Data
- Mapbox GL JavaScript library for interactive maps
The intended audience for this journey is application developers and other stakeholders who wish to utilize the power of Data Science quickly and effectively.
Follow these steps to set up and run this developer journey. The steps are described in detail below.
- Sign up for the Data Science Experience
- Create the notebook
- Run the notebook
- Analyze the results
- Save and Share
Sign up for IBM's Data Science Experience. When you sign up, two services, DSX-Spark and DSX-ObjectStore, are created in your Bluemix account.
Create the Project:
- From the IBM Data Science Experience page either click the "Projects" tab at the top or scroll down to "Recently updated projects".
- Click on "+ Create Project" in the Projects tab view or "+ New Project" under Recently updated projects.
- Choose a "Name" and, optionally, a "Description". Accept the default "DSX-Spark" for Spark Service, "Object Storage (Swift API)" for Storage Type, and "DSX-ObjectStorage" for Target Object Storage Instance.
- Click "Create".
Create the Notebook:
- In your project, click "add notebooks".
- Click the tab for "From URL" and enter a "Name" and optional "Description".
- In the "Notebook URL" box put: https://github.com/IBM/pixiedust-traffic-analysis/blob/master/notebooks/pixiedust-traffic-analysis.ipynb
- Accept the default "DSX-Spark" for Spark Service and click "Create Notebook".
Use the menu on the left to select My Projects and then Default Project. Click on Add notebooks (upper right) to create a notebook.
- Select the From URL tab.
- Enter a name for the notebook.
- Optionally, enter a description for the notebook.
- Enter this Notebook URL: https://github.com/IBM/pixiedust-traffic-analysis/blob/master/notebooks/pixiedust-traffic-analysis.ipynb
- Use the Spark Service pulldown to select your DSX-Spark service.
- Click the Create Notebook button.
When a notebook is executed, each code cell in the notebook is executed, in order, from top to bottom.
Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:
- A blank, which indicates that the cell has never been executed.
- A number, which represents the relative order in which this code step was executed.
- A *, which indicates that the cell is currently executing.
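The blank and numbered states can also be read from a saved notebook, because an .ipynb file is JSON in which each code cell records an execution_count. The following is a minimal sketch under that assumption. The cell_tags helper and the sample notebook are hypothetical and exist only for illustration. The * state belongs to a running session, so this sketch does not model it.

```python
import json

def cell_tags(notebook_json):
    """Return the In [x]: tag for each code cell of a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    tags = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no In [x]: tag
        count = cell.get("execution_count")
        # A null execution_count means the cell has never been executed.
        tags.append("In [ ]:" if count is None else "In [{}]:".format(count))
    return tags

# Made-up sample: one markdown cell, one executed cell, one unexecuted cell.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 2, "source": ["import pixiedust"]},
        {"cell_type": "code", "execution_count": None, "source": ["display(accidents)"]},
    ]
})

print(cell_tags(sample))
```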
There are several ways to execute the code cells in your notebook:
- One cell at a time.
  - Select the cell, and then press the Play button in the toolbar.
- Batch mode, in sequential order.
  - From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues executing all cells that follow.
- At a scheduled time.
  - Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.
After running each cell of the notebook, the results will display. When we use PixieDust display() to create an interactive dataset, we are able to change the visualization using tables, graphs, and charts.
After running cell #3, display(accidents), we can click the Options button to change the keys and values for the fields used in the chart.
Following the instructions, we use DaysOfWeek and IncidntNum, but the user can change the keys and values to see how the chart will look with different inputs.
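To make the roles of a key field and a value field more concrete, the sketch below groups a few made-up records by a key field and counts the rows in each group. It assumes a COUNT-style aggregation purely for illustration. The count_by_key helper and the sample rows are hypothetical and are not taken from PixieDust or from the San Francisco dataset.

```python
from collections import Counter

def count_by_key(records, key_field):
    """Count how many records share each value of key_field."""
    return Counter(record[key_field] for record in records)

# Hypothetical rows that reuse the field names mentioned above.
records = [
    {"DaysOfWeek": "Monday", "IncidntNum": 1001},
    {"DaysOfWeek": "Friday", "IncidntNum": 1002},
    {"DaysOfWeek": "Friday", "IncidntNum": 1003},
]

counts = count_by_key(records, "DaysOfWeek")
for day, total in sorted(counts.items()):
    print(day, total)
```

Choosing a different key field changes how the rows are grouped, which is what switching keys in the Options dialog does to the chart.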
We use Spark SQL to isolate data to the Taraval district:

    accidents.registerTempTable("accidents")
    taraval = sqlContext.sql("SELECT * FROM accidents WHERE PdDistrict='TARAVAL'")

We then get an interactive map of the Taraval district.
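The WHERE clause in the Taraval query is ordinary SQL, so its effect can be tried without a Spark service. The sketch below runs the same filter against an in-memory SQLite table as a stand-in for Spark SQL. The table layout and rows are made up, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accidents (IncidntNum INTEGER, PdDistrict TEXT)")
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?)",
    [(1001, "TARAVAL"), (1002, "MISSION"), (1003, "TARAVAL")],  # made-up rows
)

# Same filter as the Taraval query, plus ORDER BY for a stable result.
taraval_rows = conn.execute(
    "SELECT * FROM accidents WHERE PdDistrict='TARAVAL' ORDER BY IncidntNum"
).fetchall()
conn.close()

print(taraval_rows)
```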
With PixieApps, we can create a dashboard with map layers that can be used to visualize various datasets (i.e. Speeding, Traffic Calming, Police Districts, and Crimes): 

    from pixiedust.display.app import *

    @PixieApp
    class SFDashboard():
        def mainScreen(self):
            return """
    <div class="well">
        <center><span style="font-size:x-large">Analyzing San Francisco Public Safety data with PixieDust</span></center>
        <center><span style="font-size:large"><a href="https://datasf.org/opendata" target="new">https://datasf.org/opendata</a></span></center>
    </div>
    <div class="row">
        <div class="form-group col-sm-2" style="padding-right:10px;">
            <div><strong>Layers</strong></div>
            {% for layer in this.layers %}
            <div class="rendererOpt checkbox checkbox-primary">
                <input type="checkbox" pd_refresh="map{{prefix}}" pd_script="self.toggleLayer({{loop.index0}})">
                <label>{{layer["name"]}}</label>
            </div>
            {%endfor%}
        </div>
        <div class="form-group col-sm-10">
            <div id="map{{prefix}}" pd_entity pd_options="{{this.formatOptions(this.mapJSONOptions)}}"/>
        </div>
    </div>
    """

    <div id="map{{prefix}}" pd_entity pd_options="{{this.formatOptions(this.mapJSONOptions)}}"/>

- pd_entity: Tell PixieDust which dataset to work on.
- pd_options: Contains the PixieDust options for the map.
The best way to generate the pd_options for a PixieDust visualization is to:
- Call display() in a new cell
- Graphically select the options for your chart
- Select the View/Cell Toolbar/Edit Metadata menu
- Click on the “Edit Metadata” button and copy the PixieDust metadata
To conform to the pd_options notation, we need to transform the PixieDust JSON metadata into an attribute string with the following format: “key1=value1;key2=value2;…”
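As a concrete illustration of that target format, here is a minimal standalone sketch that turns a small dictionary into such an attribute string. It assumes Python 3.7 or later, where dictionaries keep insertion order. The format_options name and the sample dictionary are illustrative only and are not part of PixieDust.

```python
def format_options(options):
    """Join a dict into a key1=value1;key2=value2 attribute string."""
    return ";".join("{}={}".format(key, value) for key, value in options.items())

# Illustrative options; real values come from the copied cell metadata.
sample_options = {"handlerId": "mapView", "rendererId": "mapbox", "rowCount": "500"}

print(format_options(sample_options))
```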
To make it easier, we use a simple Python transform function:

    def formatOptions(self, options):
        return ';'.join(["{}={}".format(k,v) for (k, v) in iteritems(options)])

The formatOptions function is then invoked using Jinja2 notation from within the HTML:

    pd_options="{{this.formatOptions(this.mapJSONOptions)}}"

Note: setup is a special method that will be called automatically when the PixieApp is initialized.

    def setup(self):
        self.mapJSONOptions = {
            "mapboxtoken": "pk.eyJ1IjoicmFqcnNpbmdoIiwiYSI6ImNqM2s4ZDg4djAwcGYyd3BwaGxwaDV3bWoifQ.d5Rklkdu5MeGAnXu1GMNYw",
            "chartsize": "90",
            "aggregation": "SUM",
            "rowCount": "500",
            "handlerId": "mapView",
            "rendererId": "mapbox",
            "valueFields": "IncidntNum",
            "keyFields": "X,Y",
            "basemap": "light-v9"
        }

    from pixiedust.display.app import *
    from pixiedust.apps.mapboxBase import MapboxBase

    @PixieApp
    class SFDashboard(MapboxBase):
        def setup(self):
            ...<snip>...
            self.setLayers([
                {
                    "name": "Speeding",
                    "url": "https://data.sfgov.org/api/geospatial/mfjz-pnye?method=export&format=GeoJSON"
                },
                {
                    "name": "Traffic calming",
                    "url": "https://data.sfgov.org/api/geospatial/ddye-rism?method=export&format=GeoJSON",
                    "type": "symbol",
                    "layout": {
                        "icon-image": "police-15",
                        "icon-size": 1.5
                    }
                },
                ...<snip>...

    ...<snip>...
    {% for layer in this.layers %}
    <div class="rendererOpt checkbox checkbox-primary">
        <input type="checkbox" pd_refresh="map{{prefix}}" pd_script="self.toggleLayer({{loop.index0}})">
        <label>{{layer["name"]}}</label>
    </div>
    {%endfor%}
    ...<snip>...

The user can now select layers and the map will dynamically add or remove them.
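Each checkbox in the template passes loop.index0, Jinja2's zero-based loop counter, to toggleLayer. One way to picture the resulting behavior is a list of layers that each carry a visibility flag. The sketch below is a toy model under that assumption. LayerModel, toggle_layer, and visible_names are hypothetical names, and the code does not reflect how MapboxBase is implemented.

```python
class LayerModel:
    """Toy model of a dashboard's layer list; not the MapboxBase implementation."""

    def __init__(self, layers):
        # Every layer starts hidden, matching unchecked checkboxes.
        self.layers = [dict(layer, visible=False) for layer in layers]

    def toggle_layer(self, index):
        """Flip the visibility of the layer at the given zero-based index."""
        layer = self.layers[index]
        layer["visible"] = not layer["visible"]

    def visible_names(self):
        """Return the names of the layers that are currently shown."""
        return [layer["name"] for layer in self.layers if layer["visible"]]

model = LayerModel([{"name": "Speeding"}, {"name": "Traffic calming"}])
model.toggle_layer(1)  # check the second box
model.toggle_layer(0)  # check the first box
model.toggle_layer(1)  # uncheck the second box
print(model.visible_names())
```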
Under the File menu, there are several ways to save your notebook:
- Save will simply save the current state of your notebook, without any version information.
- Save Version will save your current state of your notebook with a version tag that contains a date and time stamp. Up to 10 versions of your notebook can be saved, each one retrievable by selecting the Revert To Version menu item.
You can share your notebook by selecting the “Share” button located in the top right section of your notebook panel. The end result of this action will be a URL link that will display a “read-only” version of your notebook. You have several options to specify exactly what you want shared from your notebook:
- Only text and output: will remove all code cells from the notebook view.
- All content excluding sensitive code cells: will remove any code cells that contain a sensitive tag. For example, # @hidden_cell is used to protect your dashDB credentials from being shared.
- All content, including code: displays the notebook as is.
- A variety of download as options are also available in the menu.
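The effect of excluding sensitive code cells can be approximated on a notebook's JSON structure. The sketch below drops code cells whose source contains the # @hidden_cell marker. It is an illustration of the idea, not the sharing feature's actual implementation. The strip_hidden_cells helper and the sample notebook are hypothetical.

```python
HIDDEN_MARKER = "# @hidden_cell"

def strip_hidden_cells(notebook):
    """Return a copy of a notebook dict without code cells that carry the marker."""
    kept = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # Notebook files store source either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell.get("cell_type") == "code" and HIDDEN_MARKER in text:
            continue  # drop the sensitive cell
        kept.append(cell)
    result = dict(notebook)
    result["cells"] = kept
    return result

# Made-up notebook with one sensitive cell.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Traffic analysis"]},
        {"cell_type": "code", "source": ["# @hidden_cell\n", "credentials = {'password': 'example'}"]},
        {"cell_type": "code", "source": "display(accidents)"},
    ]
}

shared = strip_hidden_cells(sample_notebook)
print(len(shared["cells"]))
```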
There is a sample of the output in data/examples/pixiedust-traffic-analysis.html. This can be viewed in rawgit with this link: pixiedust-traffic-analysis.html
Note: Some interactive map functionality, like Options and Layers, will not work. To see these, you must run the notebook itself.


