OpenAIMode (notebook style)

This blog post has the “slides” of the presentation “OpenAIMode demo”, which demonstrates the creation and evaluation of OpenAI interaction cells in Mathematica notebooks. (Via a special notebook style.)


Setup

  • It is assumed that the paclet OpenAILink is installed
    • … and the required setup steps are completed.
  • Install the paclet OpenAIMode
PacletInstall["AntonAntonov/OpenAIMode"] 
Needs["AntonAntonov`OpenAIMode`"] 

Demo

Let us show how the notebook style works:

  • Needs
  • OpenAIMode
  • Text completion cell (shortcut: “Shift-|”)
  • Tweak invocation parameters with SetOptions
  • Image generation cell (shortcut: “Tab”)

Screenshots


How does it work?

Consider the following flowchart:


Concluding remarks

Similar work

Future plans (…maybe)

More documentation

Should these notebook style functions be part of OpenAILink?

Based on feedback:

  • Better default options
  • Additional OpenAI cells

Time series search engines over COVID-19 data

Introduction

In this article we proclaim the preparation and availability of interactive interfaces to two Time Series Search Engines (TSSEs) over COVID-19 data. One TSSE is based on Apple Mobility Trends data, [APPL1]; the other on The New York Times COVID-19 data, [NYT1].

Here are links to interactive interfaces of the TSSEs hosted (and publicly available) at shinyapps.io by RStudio:

Motivation: The primary motivation for making the TSSEs and their interactive interfaces is to use them as exploratory tools. Combined with relevant data analysis (e.g. [AA1, AA2]) the TSSEs should help to form better intuition about and feel for the spread of COVID-19 and the related data aggregation, public reactions, and government policies.

The rest of the article is structured as follows:

  1. Brief descriptions of the overall process and the data
  2. Brief descriptions of the search engines’ structure and implementation
  3. Discussions of a few search examples and their (possible) interpretations

The overall process

For both search engines the overall process has the same steps:

  1. Ingest the data
  2. Do basic (and advanced) data analysis
  3. Make (and publish) reports detailing the data ingestion and transformation steps
  4. Enhance the data with transformed versions of it or with additional related data
  5. Make a Time Series Sparse Matrix Recommender (TSSMR)
  6. Make a Time Series Search Engine Interactive Interface (TSSEII)
  7. Make the interactive interface easily accessible over the World Wide Web

Here is a flow chart that corresponds to the steps listed above:

TSSMRFlowChart

Data

The Apple data

The Apple Mobility Trends data is taken from Apple’s site, see [APPL1]. The data ingestion, basic data analysis, time series seasonality demonstration, and (graph) clusterings are given in [AA1]. (Here is a link to the corresponding R-notebook.)

The weather data was taken using the Mathematica function WeatherData, [WRI1].

(It was too much work to get the weather data using some of the well known weather data R packages.)

The New York Times data

The New York Times COVID-19 data is taken from GitHub, see [NYT1]. The data ingestion, basic data analysis, and visualizations are given in [AA2]. (Here is a link to the corresponding R-notebook.)

The search engines

The following sub-sections have screenshots of the TSSE interactive interfaces.

I did experiment with combining the data of the two engines, but it did not turn out to be particularly useful. It seems more interesting and useful to enhance the Apple data engine with temperature data, and to enhance The New York Times engine with the (consecutive) differences of the time series.

Structure

The interactive interfaces have three panels:

  • Nearest Neighbors
    • Gives the time series nearest neighbors for the time series of a selected entity.
    • Has interactive controls for entity selection and filtering.
  • Trend Finding
    • Gives the time series that adhere to a specified named trend.
    • Has interactive controls for trend curves selection and entity filtering.
  • Notes
    • Gives references and data objects summary.
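To make the first two panels concrete, here is a minimal sketch in Python of what they compute. (This is for illustration only: the actual engines are R/Shiny applications, and all function names, trend curves, and data below are made up.) The “Nearest Neighbors” panel ranks entities by cosine similarity to a selected entity’s time series; the “Trend Finding” panel ranks entities by similarity to a named trend curve.

```python
import math

def cosine(u, v):
    # Cosine similarity of two equal-length numeric sequences.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_neighbors(series_by_entity, entity, top_n=3):
    # "Nearest Neighbors" panel: rank the other entities by similarity
    # to the selected entity's time series.
    target = series_by_entity[entity]
    ranked = sorted(((cosine(target, s), e)
                     for e, s in series_by_entity.items() if e != entity),
                    reverse=True)
    return [e for _, e in ranked[:top_n]]

# A couple of named trend curves over n points (illustrative).
TRENDS = {
    "increasing": lambda n: [i / (n - 1) for i in range(n)],
    "decreasing": lambda n: [1 - i / (n - 1) for i in range(n)],
}

def find_trend(series_by_entity, trend_name, top_n=3):
    # "Trend Finding" panel: rank entities by adherence to a named trend.
    n = len(next(iter(series_by_entity.values())))
    curve = TRENDS[trend_name](n)
    ranked = sorted(((cosine(s, curve), e)
                     for e, s in series_by_entity.items()),
                    reverse=True)
    return [e for _, e in ranked[:top_n]]
```

The interactive controls of the panels essentially pick the arguments of such functions (the entity, the trend name, and the filters over the candidate set).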

Implementation

Both TSSEs are implemented using the R packages “SparseMatrixRecommender”, [AAp1], and “SparseMatrixRecommenderInterfaces”, [AAp2].

The package “SparseMatrixRecommender” provides functions to create and use Sparse Matrix Recommender (SMR) objects. Both TSSEs use underlying SMR objects.

The package “SparseMatrixRecommenderInterfaces” provides functions to generate the server and client functions for the Shiny framework by RStudio.

As it was mentioned above, both TSSEs are published at shinyapps.io. The corresponding source codes can be found in [AAr1].

The Apple Mobility Trends Data Search Engine

The Apple data TSSE has four types of time series (“entities”). The first three are normalized volumes of Apple maps requests while driving, transit transport use, and walking. (See [AA1] for more details.) The fourth is daily mean temperature at different geo-locations.

Here are screenshots of the panels “Nearest Neighbors” and “Trend Finding” (at interface launch):

AppleTSSENNs

AppleTSSETrends

The New York Times COVID-19 Data Search Engine

The New York Times TSSE has four types of time series: (aggregated) cases and deaths, and their corresponding time series differences.
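The “(consecutive) differences” enhancement is simple to produce; here is a sketch in Python (purely illustrative — the engines themselves are implemented in R). Differencing turns cumulative case and death counts into daily increments, which is what makes features such as infection waves visible.

```python
def differences(series):
    # Consecutive differences; the result is one element shorter.
    return [b - a for a, b in zip(series, series[1:])]

def enhance_with_differences(series_by_entity):
    # Add a "<entity>.diff" series for every entity, so the differences
    # become searchable items alongside the original time series.
    enhanced = dict(series_by_entity)
    for eid, s in series_by_entity.items():
        enhanced[eid + ".diff"] = differences(s)
    return enhanced
```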

Here are screenshots of the panels “Nearest Neighbors” and “Trend Finding” (at interface launch):

NYTTSSENNs

NYTTSSETrends

Examples

In this section we discuss in some detail several examples of using each of the TSSEs.

Apple data search engine examples

Here are a few observations from [AA1]:

  • The COVID-19 lockdowns are clearly reflected in the time series.
  • The time series from the Apple Mobility Trends data show strong weekly seasonality. Roughly speaking, people go to places they are not familiar with on Fridays and Saturdays; on the other work week days people take trips they are more familiar with. Since a much smaller number of requests is made on Sundays, we can conjecture that many people stay at home or visit very familiar locations.

Here are a few assumptions:

  • Where people frequently go (work, school, groceries shopping, etc.) they do not need directions that much.
  • People request directions when they have more free time and will for “leisure trips.”
  • During vacations people are more likely to be in places they are less familiar with.
  • People are more likely to take leisure trips when the weather is good. (Warm, not raining, etc.)

Nice, France vs Florida, USA

Consider the results of the Nearest Neighbors panel for Nice, France.

Since the French tend to go on vacation in July and August ([SS1, INSEE1]), we can see that driving, transit, and walking in Nice have pronounced peaks during that time:

Of course, we also observe the lockdown period in that geographical area.

Compare those time series with the time series from driving in Florida, USA:

We can see that people in Florida, USA have driving patterns unrelated to the typical weather seasons and vacation periods.

(Further TSSE queries show that there is a negative correlation between the temperature in south Florida and the volumes of Apple Maps directions requests.)
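Such a (negative) correlation claim reduces to a plain Pearson correlation coefficient. Here is a small Python sketch of the computation; the numbers below are synthetic and made up for illustration, not the actual Apple Maps or temperature measurements.

```python
import math

def pearson(xs, ys):
    # Sample Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic, made-up daily values: hotter days, fewer directions requests.
temperature = [20, 24, 28, 31, 33]
requests = [120, 110, 90, 80, 70]
```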

Italy and Balkan countries driving

We can see that, according to the data, people who have access to both iPhones and cars in Italy and in the Balkan countries Bulgaria, Greece, and Romania have similar patterns of directions requests:

(The similarities can be explained with at least a few “obvious” facts, but we are going to restrain ourselves.)

The New York Times data search engine examples

In Broward county, Florida, USA and Cook county, Illinois, USA we can see two waves of infections in the difference time series:

References

Data

[APPL1] Apple Inc., Mobility Trends Reports, (2020), apple.com.

[NYT1] The New York Times, Coronavirus (Covid-19) Data in the United States, (2020), GitHub.

[WRI1] Wolfram Research (2008), WeatherData, Wolfram Language function.

Articles

[AA1] Anton Antonov, “Apple mobility trends data visualization (for COVID-19)”, (2020), SystemModeling at GitHub/antononcube.

[AA2] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub/antononcube.

[INSEE1] Institut national de la statistique et des études économiques, “En 2010, les salariés ont pris en moyenne six semaines de congé”, (2012).

[SS1] Sam Schechner and Lee Harris, “What Happens When All of France Takes Vacation? 438 Miles of Traffic”, (2019), The Wall Street Journal.

Packages, repositories

[AAp1] Anton Antonov, Sparse Matrix Recommender framework functions, (2019), R-packages at GitHub/antononcube.

[AAp2] Anton Antonov, Sparse Matrix Recommender framework interface functions, (2019), R-packages at GitHub/antononcube.

[AAr1] Anton Antonov, Coronavirus propagation dynamics, (2020), SystemModeling at GitHub/antononcube.

Basic experiments workflow for simple epidemiological models

Introduction

The primary purpose of this document (notebook) is to give a “stencil workflow” for simulations using the packages in the project “Coronavirus simulation dynamics”, [AAr1].

The model in this notebook – SEI2R – differs from the classical SEIR model in the following ways:

  1. Two separate infected populations: one is “severely symptomatic”, the other is “normally symptomatic”.
  2. The monetary equivalent of lost productivity due to infected or deceased people is tracked.

Remark: We consider the coronavirus propagation models as instances of the more general System Dynamics (SD) models.

Remark: The SEI2R model is a modification of the classic epidemic model SEIR, [Wk1].

Remark: The interactive interfaces in the notebook can be used for attempts to calibrate SEI2R with real data. (For example, data for the 2019–20 coronavirus outbreak, [WRI1].)
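As a rough, language-agnostic illustration of an SEI2R-style system, here is a forward-Euler sketch in Python. The equations and parameter names below are simplified assumptions of mine, not the package's definitions; the authoritative model lives in “EpidemiologyModels.m”, [AAp1].

```python
def sei2r_step(state, p, dt):
    # One forward-Euler step. state = [S, E, I1, I2, R]:
    # susceptible, exposed, severely symptomatic infected (fraction spf),
    # normally symptomatic infected, recovered.
    S, E, I1, I2, R = state
    N = S + E + I1 + I2 + R
    new_exposed = (p["beta1"] * I1 + p["beta2"] * I2) * S / N
    new_infected = E / p["aincp"]        # leaving the incubation period
    return [
        S - dt * new_exposed,
        E + dt * (new_exposed - new_infected),
        I1 + dt * (p["spf"] * new_infected - I1 / p["aip"]),
        I2 + dt * ((1 - p["spf"]) * new_infected - I2 / p["aip"]),
        R + dt * (I1 + I2) / p["aip"],   # leaving the infectious period
    ]

def simulate(state, p, days, dt=0.1):
    for _ in range(round(days / dt)):
        state = sei2r_step(state, p, dt)
    return state
```

Note that the step conserves the total population by construction; the two infected populations differ only in their transmission rates and in the fraction spf of the newly infected routed to each.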

Workflow

  1. Get one of the classical epidemiology models.
  2. Extend the equations of model if needed or desired.
  3. Set relevant initial conditions for the populations.
  4. Pick model parameters to adjust and “play with.”
  5. Derive parametrized solutions of the model’s system of equations (ODE’s or DAE’s.)
    1. Using the parameters of the previous step.
  6. Using an interactive interface, experiment with different values of the parameters.
    1. In order to form “qualitative understanding.”
  7. Get real life data.
    1. Say, for the 2019-20 coronavirus outbreak.
  8. Attempt manual or automatic calibration of the model.
    1. This step will most likely require additional data transformations and programming.
    2. Only manual calibration is shown in this notebook.

Load packages of the framework

The epidemiological models framework used in this notebook is implemented in the packages [AAp1, AAp2]; the interactive plots functions are from the package [AAp3].

Getting the model code

Here we take the SEI2R model implemented in the package “EpidemiologyModels.m”, [AAp1]:

We can show a tabulated visualization of the model using the function ModelGridTableForm from [AAp1]:

0ce5juav8jq3j

Model extensions and new models

The framework implemented with the packages [AAp1, AAp2, AAp3] can be utilized using custom made data structures that follow the structure of the models in [AAp1].

Of course, we can also just extend the models from [AAp1]. In this section we show how SEI2R can be extended in two ways:

  1. By adding a births term to the Susceptible Population equation (a births term is not included by default)
  2. By adding a new equation for the infected deceased population
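Both kinds of extension amount to treating the model as data: append a term to an existing equation, or append a new stock with its own equation. Here is a toy Python sketch of that idea; the dictionaries and rate functions below are purely illustrative and much simpler than the package's actual model structure.

```python
def make_si_terms():
    # A model as data: per-stock lists of per-day rate terms (toy SIR core).
    return {
        "S": [lambda s, p: -p["beta"] * s["S"] * s["I"] / p["N"]],
        "I": [lambda s, p: p["beta"] * s["S"] * s["I"] / p["N"]
                           - s["I"] / p["aip"]],
        "R": [lambda s, p: s["I"] / p["aip"]],
    }

def add_births(terms):
    # Extension 1: a births term with birth rate equal to the death rate mu.
    terms["S"].append(lambda s, p: p["mu"] * p["N"])
    return terms

def add_infected_deceased(terms):
    # Extension 2: a new stock IDP with its own equation; the matching
    # outflow is appended to the infected population's terms.
    terms["I"].append(lambda s, p: -s["I"] * p["dip"])
    terms["IDP"] = [lambda s, p: s["I"] * p["dip"]]
    return terms

def step(stocks, terms, params, dt=1.0):
    # One Euler step over whatever equations the model currently holds.
    return {k: stocks.get(k, 0.0)
               + dt * sum(f(stocks, params) for f in terms.get(k, []))
            for k in terms}
```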

Adding births term

Here are the equations of SEI2R (from [AAp1]):

1s7f291uic6xd

Here we find the position of the equation that corresponds to “Susceptible Population”:

Here we make the births term using a birth rate that is the same as the death rate:

Here we add the births term to the equations of the new model:

Here we display the equations of the new model:

1o2fwon3gfhel

Adding infected deceased population equation

Here we add new population, equation, and initial condition that allow for tracking the deaths because of infection:

Here is how the model looks:

0qk5d8mdnhvu2

Parameters and actual simulation equations code

Here are the parameters we want to experiment with (or do calibration with):

Here we set custom rates and initial conditions:

Here is the system of ODE’s we use to do parametrized simulations:

0dz5k6hwx6os4

Simulation

Straightforward simulation for one year using ParametricNDSolve:

0d6wh46looawc

(The advantage of having parametrized solutions is that we can quickly compute simulation results with new parameter values without re-solving the model’s system of ODE’s; see the interfaces below.)
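The idea of a parametrized solution can be sketched generically: derive the stepping machinery once, and get back a function of the parameters that can be re-evaluated cheaply. Here is a Python illustration with a toy exponential-decay model; all names are made up, and ParametricNDSolve is of course far more sophisticated.

```python
def parametric_solution(step, state0, days, dt=0.01):
    # Derive once, reuse many times: returns a function params -> final state.
    def solve(params):
        state = list(state0)
        for _ in range(round(days / dt)):
            state = step(state, params, dt)
        return state
    return solve

# Toy model: exponential decay x' = -k x, Euler-discretized.
decay_step = lambda s, p, dt: [s[0] * (1 - dt * p["k"])]
solve = parametric_solution(decay_step, [100.0], days=10)
```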

Interactive interface

opts = {PlotRange -> All, PlotLegends -> None, PlotTheme -> "Detailed", 
   PerformanceGoal -> "Speed", ImageSize -> 300};
lsPopulationKeys = GetPopulationSymbols[modelSI2R, __ ~~ "Population"];
lsEconKeys = {MLP};
Manipulate[
 DynamicModule[{lsPopulationPlots, lsEconPlots, lsRestPlots},
  lsPopulationPlots =
   ParametricSolutionsPlots[
    modelSI2R["Stocks"],
    KeyTake[aSol, lsPopulationKeys],
    {aincp, aip, spf, crisp, criap}, ndays,
    "LogPlot" -> popLogPlotQ, "Together" -> popTogetherQ,
    "Derivatives" -> popDerivativesQ,
    "DerivativePrefix" -> "\[CapitalDelta]", opts];
  lsEconPlots =
   ParametricSolutionsPlots[
    modelSI2R["Stocks"],
    KeyTake[aSol, lsEconKeys],
    {aincp, aip, spf, crisp, criap}, ndays,
    "LogPlot" -> econLogPlotQ, "Together" -> econTogetherQ,
    "Derivatives" -> econDerivativesQ,
    "DerivativePrefix" -> "\[CapitalDelta]", opts];
  lsRestPlots =
   ParametricSolutionsPlots[
    modelSI2R["Stocks"],
    KeyDrop[aSol, Join[lsPopulationKeys, lsEconKeys]],
    {aincp, aip, spf, crisp, criap}, ndays,
    "LogPlot" -> econLogPlotQ, "Together" -> econTogetherQ,
    "Derivatives" -> econDerivativesQ,
    "DerivativePrefix" -> "\[CapitalDelta]", opts];
  Multicolumn[Join[lsPopulationPlots, lsEconPlots, lsRestPlots],
   nPlotColumns, Dividers -> All, FrameStyle -> GrayLevel[0.8]]
  ],
 {{aincp, 12., "Average incubation period (days)"}, 1, 60., 1, Appearance -> {"Open"}},
 {{aip, 21., "Average infectious period (days)"}, 1, 100., 1, Appearance -> {"Open"}},
 {{spf, 0.2, "Severely symptomatic population fraction"}, 0, 1, 0.025, Appearance -> {"Open"}},
 {{crisp, 6, "Contact rate of the infected severely symptomatic population"}, 0, 30, 0.1, Appearance -> {"Open"}},
 {{criap, 3, "Contact rate of the infected normally symptomatic population"}, 0, 30, 0.1, Appearance -> {"Open"}},
 {{ndays, 90, "Number of days"}, 1, 365, 1, Appearance -> {"Open"}},
 {{popTogetherQ, True, "Plot populations together"}, {False, True}},
 {{popDerivativesQ, False, "Plot populations derivatives"}, {False, True}},
 {{popLogPlotQ, False, "LogPlot populations"}, {False, True}},
 {{econTogetherQ, False, "Plot economics functions together"}, {False, True}},
 {{econDerivativesQ, False, "Plot economics functions derivatives"}, {False, True}},
 {{econLogPlotQ, False, "LogPlot economics functions"}, {False, True}},
 {{nPlotColumns, 1, "Number of plot columns"}, Range[5]},
 ControlPlacement -> Left, ContinuousAction -> False]
0uhcbh5jg8g3a

Calibration over real data

It is important to calibrate these kinds of models with real data, or at least to make a serious attempt at such a calibration. If the calibration is “too hard” or “impossible”, that would indicate that the model is not that adequate. (If adequate at all.)

The calibration efforts can be (semi-)automated using special model-to-data goodness-of-fit functions and a minimization procedure. (See, for example, [AA2].)

In this section we just attempt to calibrate SEI2R over real data taken from [WRI1] using a specialized interactive interface.

Real data

Here is COVID-19 data taken from [WRI1] for the Chinese province Hubei:

The total population in Hubei is

1kt1ikvs8tzqt
1cpkt5fvgh8hu

But we have to use a fraction of that population in order to produce good fits. That can be justified with the conjecture that the citizens of Hubei are spread out and the outbreak is mostly concentrated in one city (Wuhan).

The real data have to be padded with a certain number of 0’s related to the infectious and incubation periods in order to make good fits. Such padding is easy to justify: if we observe recovered people, that means they have passed through the incubation and infectious periods.
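The padding itself is simple; here is a hedged Python sketch (the signature below is my own simplification of the PadRealData call used in the interface code; the offset and names are illustrative):

```python
def pad_real_data(series, aincp, aip, pad_offset=0):
    # Prepend zeros covering the incubation plus infectious periods,
    # optionally shifted, so simulation and observations align at t = 0.
    n_zeros = max(0, round(aincp + aip + pad_offset))
    return [0] * n_zeros + list(series)
```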

Calibration interactive interface

In this interface we set the Infected Severely Symptomatic Population (ISSP) to zero. That way it is easier to compare the real data with the simulated results (and pick parameter values that give close fits). Also note that, since SEI2R is simple, in this interface the system is always solved.

opts = {PlotRange -> All, PlotLegends -> None, PlotTheme -> "Detailed", 
   PerformanceGoal -> "Speed", ImageSize -> 300};
Manipulate[
 DynamicModule[{modelSI2R = modelSI2R, lsActualEquations, aSol,
   lsPopulationPlots, lsEconPlots, lsRestPlots},
  modelSI2R = SetRateRules[modelSI2R, <|TP[t] -> population|>];
  modelSI2R =
   SetInitialConditions[
    modelSI2R, <|SP[0] -> population - 1, ISSP[0] -> 0, INSP[0] -> 1|>];
  lsActualEquations =
   Join[modelSI2R["Equations"] //.
     KeyDrop[modelSI2R["RateRules"], lsFocusParams],
    modelSI2R["InitialConditions"]];
  aSol =
   Association@Flatten@
     ParametricNDSolve[
      lsActualEquations, {SP, EP, INSP, RP, IDP}, {t, 0, 365},
      lsFocusParams];
  lsPopulationPlots =
   ParametricSolutionsPlots[
    modelSI2R["Stocks"],
    KeyTake[aSol, GetPopulationSymbols[modelSI2R, __ ~~ "Population"]],
    {aincp, aip, 0, criap, criap}, ndays, "Together" -> True, opts];
  Show[lsPopulationPlots[[1]],
   ListPlot[
    PadRealData[aRealData, Round[aincp + padOffset],
     Round[aip + padOffset]], PlotStyle -> {Blue, Black, Red}]]
  ],
 {{population, 58160000/600, "Population"}, 58160000/1000, 58160000, 10000, Appearance -> {"Open"}},
 {{padOffset, 0, "real data padding offset"}, -100, 100, 1, Appearance -> {"Open"}},
 {{aincp, 6, "Average incubation period (days)"}, 1, 60, 1, Appearance -> {"Open"}},
 {{aip, 32, "Average infectious period (days)"}, 1, 100, 1, Appearance -> {"Open"}},
 {{criap, 0.8, "Contact rate of the infected normally symptomatic population"}, 0, 30, 0.1, Appearance -> {"Open"}},
 {{ndays, 90, "Number of days"}, 1, 365, 1, Appearance -> {"Open"}},
 ControlPlacement -> Left, ContinuousAction -> False]
0s4dnliwjni2v

Maybe good enough parameters

1v43idv1zv24j

Basic reproduction number:

(*25.5966*)
0upbzla7bc2ok

Basic reproduction number:

(*59.7934*)

References

Articles

[Wk1] Wikipedia entry, “Compartmental models in epidemiology”.

[HH1] Herbert W. Hethcote (2000). “The Mathematics of Infectious Diseases”. SIAM Review. 42 (4): 599–653. Bibcode:2000SIAMR..42..599H. doi:10.1137/s0036144500371907.

[AA1] Anton Antonov, “Coronavirus propagation modeling considerations”, (2020), SystemModeling at GitHub.

[AA2] Anton Antonov, Answer to “Model calibration with phase space data”, (2019), Mathematica StackExchange.

Repositories & packages

[WRI1] Wolfram Research, Inc., “Epidemic Data for Novel Coronavirus COVID-19”, WolframCloud.

[AAr1] Anton Antonov, Coronavirus propagation dynamics project, (2020), SystemModeling at GitHub.

[AAp1] Anton Antonov, “Epidemiology models Mathematica package”, (2020), SystemsModeling at GitHub.

[AAp2] Anton Antonov, “Epidemiology models modifications Mathematica package”, (2020), SystemsModeling at GitHub.

[AAp3] Anton Antonov, “System dynamics interactive interfaces functions Mathematica package”, (2020), SystemsModeling at GitHub.

Phone dialing conversational agent

Introduction

This blog post proclaims the first committed project in the repository ConversationalAgents at GitHub. The project has designs and implementations of a phone calling conversational agent that aims at providing the following functionalities:

  • contacts retrieval (querying, filtering, selection),
  • contacts prioritization, and
  • phone call (work flow) handling.

The design is based on a Finite State Machine (FSM) and context-free grammar(s) for commands that switch between the states of the FSM. The grammar is designed as context-free grammar rules of a Domain Specific Language (DSL) in Extended Backus-Naur Form (EBNF). (For more details on DSL design and programming see [1].)

The (current) implementation is with Wolfram Language (WL) / Mathematica, using the functional parsers package [2, 3].

This movie gives an overview from an end user perspective.

General design

The design of the Phone Conversational Agent (PhCA) is derived in a straightforward manner from the typical work flow of calling a contact (using, say, a mobile phone).

The main goals for the conversational agent are the following:

1. contacts retrieval — search, filtering, selection — using both natural language commands and manual interaction,
2. intuitive integration with the usual work flow of phone calling.

An additional goal is to facilitate contacts retrieval by determining the most appropriate contacts in query responses. For example, while driving to work, by pressing the dial button we might prefer the contacts of an upcoming meeting to be placed on top of the prompting contacts list.

In this project we assume that the voice-to-text conversion is done with an external (reliable) component.

It is assumed that a user of PhCA can react to both visual and spoken query results.

The main algorithm is the following.

1) Parse and interpret a natural language command.

2) If the command is a contacts query that returns a single contact, then call that contact.

3) If the command is a contacts query that returns multiple contacts, then:

3.1) use natural language commands to refine and filter the query results,

3.2) until a single contact is obtained; then call that contact.

4) If another type of command is given, act accordingly.

PhCA has commands for system usage help and for canceling the current contact search and starting over.
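The algorithm above can be sketched as a handful of FSM transitions. The Python below is illustrative only: the state names, command shapes, and address book are made up, and the project's actual FSM is richer.

```python
def dialer_step(state, contacts, command):
    # One FSM transition: (state, candidate contacts) -> new pair.
    if command["type"] == "cancel":                 # start over
        return "WaitingForCallCommand", []
    if command["type"] == "query":
        matches = [c for c in contacts if command["filter"](c)]
        if len(matches) == 1:
            return "Dialing", matches               # single match: call it
        if not matches:
            return "WaitingForCallCommand", []      # nothing found
        return "Filtering", matches                 # several: refine further
    return state, contacts
```

Driving this step in a loop — parse a command, apply the transition, prompt again while in the "Filtering" state — reproduces points 1) through 4) above.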

The following FSM diagram gives the basic structure of PhCA:

"Phone-conversational-agent-FSM-and-DB"

This movie demonstrates how different natural language commands switch the FSM states.

Grammar design

The derived grammar describes sentences that: 1. fit end user expectations, and 2. are used to switch between the FSM states.

Because of the simplicity of the FSM and the natural language commands, only a few iterations were done with the parser-generation-by-grammars work flow.

The base grammar is given in the file "./Mathematica/PhoneCallingDialogsGrammarRules.m" in the EBNF used by [2].

Here are parsing results of a set of test natural language commands:

"PhCA-base-grammar-test-queries-125"

using the WL command:

ParsingTestTable[ParseJust[pCALLCONTACT\[CirclePlus]pCALLFILTER], ToLowerCase /@ queries]

(Note that according to PhCA’s FSM diagram the parsing of pCALLCONTACT is separated from pCALLFILTER, hence the need to combine the two parsers in the code line above.)
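The \[CirclePlus] combinator above is parser alternation. Here is a tiny Python sketch of the underlying idea (the real implementation is the FunctionalParsers package [2]; the parser functions below are made up for illustration): a parser maps a token list to a list of (remaining tokens, result) pairs, and alternation concatenates the outcomes of its operands.

```python
def symbol(s):
    # Parser for a single literal token; failure is the empty list.
    def parse(tokens):
        return [(tokens[1:], s)] if tokens and tokens[0] == s else []
    return parse

def alternation(p1, p2):
    # The CirclePlus-style combinator: try both parsers, keep all outcomes.
    return lambda tokens: p1(tokens) + p2(tokens)

p_call_contact = symbol("call")
p_call_filter = symbol("filter")
p_combined = alternation(p_call_contact, p_call_filter)
```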

PhCA’s FSM implementation provides interpretation and context for the functional programming expressions obtained by the parser.

In the running script "./Mathematica/PhoneDialingAgentRunScript.m" the grammar parsers are modified to do successful parsing using data elements of the provided fake address book.

The base grammar can be extended with the "Time specifications grammar" in order to include queries based on temporal commands.

Running

In order to experiment with the agent just run in Mathematica the command:

Import["https://raw.githubusercontent.com/antononcube/ConversationalAgents/master/Projects/PhoneDialingDialogsAgent/Mathematica/PhoneDialingAgentRunScript.m"]

The imported Wolfram Language file, "./Mathematica/PhoneDialingAgentRunScript.m", uses a fake address book based on movie creators metadata. The code structure of "./Mathematica/PhoneDialingAgentRunScript.m" allows easy experimentation and modification of the running steps.

Here are several screenshots illustrating a particular usage path (scan left-to-right):

"PhCA-1-call-someone-from-x-men" "PhCA-2-a-producer" "PhCA-3-the-third-one"

See this movie demonstrating a PhCA run.

References

[1] Anton Antonov, "Creating and programming domain specific languages", (2016), MathematicaForPrediction at WordPress blog.

[2] Anton Antonov, Functional parsers, Mathematica package, (2014), MathematicaForPrediction at GitHub.

[3] Anton Antonov, "Natural language processing with functional parsers", (2014), MathematicaForPrediction at WordPress blog.