JSON-stat: a simple light standard for all kinds of data disseminators
Xavier Badosa (@badosa), http://xavierbadosa.com, http://json-stat.org
December 2015
Who needs to disseminate data?
Nowadays? Everybody!
Of course: NSOs (National Statistical Offices), central banks, international organizations.
But also: companies, the media, NGOs, citizens…
How is data usually disseminated? In table form.
Wherever there’s data addressed to humans there is (usually) a table
Plain old tables
Why are tables so popular?
Tables are a display device with analytical features.
Tables are an abbreviation, a compressor, a metadata saver: a cube model that avoids repeating metadata for every cell.
Cubic thinking: describe data in dimension terms.
Simple, for everybody? How?
If you managed to disseminate data for humans in tables, you should be able to do it for machines with no effort!
In JSON. JSON is a data format used in most APIs. It can include data and metadata in a single document.
In JSON-stat. Using a very simple cube model that mimics a plain old table.
A Canadian Example
table
data
What’s the simplest way to express these data in JSON? An array (flat):
    [ … ]
Basic metadata?
    {
      "class" : "dataset",
      "label" : "Population by sex and age group. Canada. 2012",
      "source" : "Statistics Canada, CANSIM, table 051-0001",
      "updated" : "2012-09-27",
      "value" : [ … ]
    }
    {
      "class" : "dataset",
      "label" : "Population by sex and age group. Canada. 2012",
      "source" : "Statistics Canada, CANSIM, table 051-0001",
      "updated" : "2012-09-27",
      "id" : [ "country" , "year" , "age" , "concept" , "sex" ],
      "size" : [ 1 , 1 , 20 , 2 , 3 ],
      "dimension" : { … },
      "value" : [ … ]
    }
id and size are needed to “unflatten” the value array.
Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
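A minimal sketch of how size can be used to “unflatten” a row-major value array into nested arrays. The helper name unflatten and the toy 2 x 3 data are illustrative assumptions, not part of JSON-stat, and the sketch assumes value is a plain array.

```javascript
// Hypothetical helper: rebuild nested arrays from a flat, row-major
// value array, given the size of each dimension.
function unflatten(value, size) {
  // The number of values must equal the product of the dimension sizes.
  var expected = size.reduce(function (a, b) { return a * b; }, 1);
  if (value.length !== expected) {
    throw new Error("value has " + value.length + " items, expected " + expected);
  }
  // Recursively slice the array: the last dimension varies fastest.
  function build(offset, dim) {
    if (dim === size.length) {
      return value[offset];
    }
    // Number of values spanned by one step along dimension `dim`.
    var stride = 1;
    for (var k = dim + 1; k < size.length; k++) {
      stride *= size[k];
    }
    var out = [];
    for (var i = 0; i < size[dim]; i++) {
      out.push(build(offset + i * stride, dim + 1));
    }
    return out;
  }
  return build(0, 0);
}

// Toy data (not the Canadian dataset): 2 x 3 values.
var nested = unflatten([10, 11, 12, 20, 21, 22], [2, 3]);
console.log(JSON.stringify(nested)); // [[10,11,12],[20,21,22]]
```

With the Canadian sizes [1, 1, 20, 2, 3], the same function would expect exactly 120 values.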
value note/source/updated label
dimension   Size   Role
age         20     class
concept     2      metric
sex         3      class
country     1      geo
year        1      time
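The size and role columns can be derived from a dataset’s own metadata. This is a sketch only: it assumes the dataset carries id, size and a role object listing its time, geo and metric dimensions, and it labels every other dimension "class", as the slide does. The helper name roleOf and the trimmed dataset literal are illustrative.

```javascript
// Hypothetical helper: role of a dimension according to the "role" object,
// or "class" when the dimension is not listed under any role.
function roleOf(dataset, dimId) {
  var role = dataset.role || {};
  for (var name in role) {
    if (Object.prototype.hasOwnProperty.call(role, name) &&
        role[name].indexOf(dimId) !== -1) {
      return name;
    }
  }
  return "class";
}

// Metadata only; "dimension" and "value" are left out of this sketch.
var dataset = {
  id: ["country", "year", "age", "concept", "sex"],
  size: [1, 1, 20, 2, 3],
  role: { time: ["year"], geo: ["country"], metric: ["concept"] }
};

dataset.id.forEach(function (id, i) {
  console.log(id, dataset.size[i], roleOf(dataset, id));
});
// country 1 geo
// year 1 time
// age 20 class
// concept 2 metric
// sex 3 class
```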
Persons (thousands) 2012 Canada
To make sense of this array, dimensions must be ordered:
["country", "year", "age", "concept", "sex"]
Criterion: what does not change, first. (The position of dimensions of size 1 is irrelevant.)
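To see what this ordering means for the value array, the sketch below lists dimension positions in the order in which their values would appear: the last dimension changes fastest and the first ones do not change at all. The helper name firstPositions is an assumption made for illustration.

```javascript
// Hypothetical helper: list the first `limit` dimension positions in the
// order in which their values appear in a row-major value array.
function firstPositions(size, limit) {
  var total = size.reduce(function (a, b) { return a * b; }, 1);
  var n = Math.min(limit, total);
  var pos = size.map(function () { return 0; });
  var rows = [];
  for (var count = 0; count < n; count++) {
    rows.push(pos.slice());
    // Advance like an odometer: bump the last dimension, carry leftwards.
    for (var d = size.length - 1; d >= 0; d--) {
      pos[d]++;
      if (pos[d] < size[d]) {
        break;
      }
      pos[d] = 0;
    }
  }
  return rows;
}

// Dimensions: country, year, age, concept, sex
firstPositions([1, 1, 20, 2, 3], 6).forEach(function (p) {
  console.log(p.join(" "));
});
// 0 0 0 0 0
// 0 0 0 0 1
// 0 0 0 0 2
// 0 0 0 1 0
// 0 0 0 1 1
// 0 0 0 1 2
```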
What does not change, first.
country  year  age    concept  sex
CA       2012  Total  Persons  Total
CA       2012  Total  Persons  M
CA       2012  Total  Persons  F
CA       2012  Total  %        Total
CA       2012  Total  %        M
CA       2012  Total  %        F

    {
      "version" : "2.0",
      "class" : "dataset",
      "label" : "Population by sex and age group. Canada. 2012",
      "source" : "Statistics Canada, CANSIM, table 051-0001",
      "updated" : "2012-09-27",
      "id" : [ "country" , "year" , "age" , "concept" , "sex" ],
      "size" : [ 1 , 1 , 20 , 2 , 3 ],
      "role" : {
        "time" : ["year"] ,
        "geo" : ["country"] ,
        "metric" : ["concept"]
      },
      "dimension" : {
        "country" : { … },
        "year" : { … },
        "age" : { … },
        "concept" : { … },
        "sex" : { … }
      },
      "value" : [ … ]
    }

    "sex" : {
      "label" : "sex",
      "category" : {
        "index" : ["T", "M", "F"],
        "label" : { "T" : "Total", "M" : "Male", "F" : "Female" }
      }
    }
Also accepted for "index" (faster access)*: {"T" : 0, "M" : 1, "F" : 2}
* See “Arrays vs. Objects” http://bl.ocks.org/5708161
The “unflattening” problem: from dimension positions [0,0,7,0,2] to value position 44.
["country", "year", "age", "concept", "sex"]
Value positions: 0, 1, 2, 3, 4, 5, … 44, … 119 (120 values)
Persons (thousands), 2012, Canada
["country", "year", "age", "concept", "sex"]
0 is the first position (first category of the dimension).
country: 0
year: 0
age: 7 (counting 0, 1, 2, 3, 4, 5, 6, 7)
concept: 0 (counting 0, 1)
sex: 2 (counting 0, 1, 2)
Size: [1, 1, 20, 2, 3]
[0, 0, 7, 0, 2] → 44
The “unflattening” problem
["country", "year", "age", "concept", "sex"]
[1, 1, 20, 2, 3]
[0, 0, 7, 0, 2] → 44
It’s a simple mathematical problem: compute the value position using dimension positions and sizes.
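Written out, the row-major computation looks as follows. The symbols are notation introduced here for illustration: p_i is the position in dimension i, s_j the size of dimension j, and n the number of dimensions.

```latex
\[
\text{position} \;=\; \sum_{i=1}^{n} p_i \prod_{j=i+1}^{n} s_j
\]
% Worked example: positions [0, 0, 7, 0, 2], sizes [1, 1, 20, 2, 3]
\[
0\cdot(1\cdot 20\cdot 2\cdot 3) \;+\; 0\cdot(20\cdot 2\cdot 3) \;+\; 7\cdot(2\cdot 3) \;+\; 0\cdot 3 \;+\; 2 \;=\; 42 + 2 \;=\; 44
\]
```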
Lost in cells?
Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
There’s a JavaScript library that takes care of this.
Are you a coder? Do you want to develop your own library?
Here’s a simple solution to the “unflattening” problem:
    function arr2num( arr, size ){
      // Walk the dimensions from last to first; mult is the number of
      // values spanned by one step along the current dimension.
      for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){
        mult*=(i>0) ? size[ndims-i] : 1;
        num+=mult*arr[ndims-i-1];
      }
      return num;
    }
    arr2num( [0,0,7,0,2], [1,1,20,2,3] ) // 44
Or check the sample code section at http://json-stat.org/tools/
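The opposite direction, from a value position back to dimension positions, can be handled with repeated division. This is a sketch, not taken from the slides or from the JSON-stat tools; the name num2arr is made up here to mirror arr2num.

```javascript
// Hypothetical inverse of arr2num: value position -> dimension positions.
function num2arr(num, size) {
  var arr = new Array(size.length);
  // Peel off positions from the last dimension (fastest) to the first.
  for (var d = size.length - 1; d >= 0; d--) {
    arr[d] = num % size[d];
    num = Math.floor(num / size[d]);
  }
  return arr;
}

console.log(JSON.stringify(num2arr(44, [1, 1, 20, 2, 3]))); // [0,0,7,0,2]
```

Running arr2num on the result should return the original value position, which makes a convenient round-trip check when developing a library.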
The JSON-stat ecosystem: format, libs, conn., schema
thank you
all pictures from
Blocks picture in slide 1: Soma, by Dru! (CC BY-NC)
Cubic head in slide 13: Portrait by Thomas Leth-Olsen (CC BY)
Rubik’s Cube in slide 18: BW Rubik’s Cube, by Gerwin Sturm (CC BY-SA)
Shiny cube in slide 48: SONY DSC, by Javier Manso (CC BY-NC-SA)
Walking girl in slide 61: Sterile, by Lee Nachtigal (CC BY)
Atomium in slide 66: Fighting Gravity – Atomium, Brussels, by Jan Faborsky (CC BY-NC-ND)
Eggs in slide 77: Eggs n. 3, by Leonardo D’Amico (CC BY-SA-ND)
