Menu

Core Functionality

Relevant source files

This page provides an overview of the core features that enable asynchronous and parallel evaluation in mirai. It covers the primary API functions for creating asynchronous tasks, collecting results, managing daemon processes, and performing parallel operations.

For deployment across networks and clusters, see Distributed Computing. For integration with other packages and frameworks, see Integration and Interoperability. For advanced features like custom serialization and RNG, see Advanced Topics.


Overview of Core API

The mirai package provides a minimal set of core functions that work together to enable async and parallel computation:

FunctionPurposeReturn Value
mirai()Create async task'mirai' object (promise)
daemons()Set up daemon poolLogical TRUE/FALSE
call_mirai()Wait for and return miraimirai object with $data
collect_mirai()Wait for and return valueEvaluated result
unresolved()Check if still pendingLogical TRUE/FALSE
stop_mirai()Cancel pending taskLogical TRUE/FALSE
mirai_map()Parallel map operation'mirai_map' object
everywhere()Broadcast to all daemons'mirai_map' object

Core API Functions and Relationships

Sources: R/mirai.R1-702 R/daemons.R1-880 R/map.R1-332


Asynchronous Evaluation

The mirai() function creates an asynchronous task that evaluates an expression in a separate process. The function returns immediately with a 'mirai' object, which acts as a promise for the eventual result.

Execution Environment and Data Passing

Key Implementation Details:

  • Expression Handling: The expression .expr is captured using substitute() at R/mirai.R147 If it's a symbol that exists as a language object in the parent frame, the actual language object is used R/mirai.R165-167

  • Environment Scoping: Objects passed via ... are assigned to the global environment of the daemon process R/daemon.R228 while objects in .args remain local to the evaluation environment R/mirai.R32-36

  • Data Structure: Each mirai bundles the expression, globals, and OpenTelemetry context into a list structure R/mirai.R163-173

  • Random Seeds: If a seed is configured, each mirai receives a unique RNG stream via next_stream() R/mirai.R160-162

Sources: R/mirai.R143-198 R/daemon.R220-244


Result Collection Mechanisms

Multiple methods exist for retrieving results from mirai objects, each suited to different use cases:

MethodBehaviorReturn ValueBlocking
$dataDirect accessValue or 'unresolved' NANo
m[]Extract operatorEvaluated valueYes
call_mirai(m)Explicit callmirai with $dataYes
collect_mirai(m)Value extractionEvaluated valueYes
unresolved(m)Status checkTRUE/FALSENo
race_mirai(list)Wait for firstList of miraiYes (until one)

Result Collection Flow

Implementation Details:

  • Non-blocking Access: Accessing $data directly never blocks. Returns unresolved (logical NA with special class) if not yet complete R/mirai.R11-14

  • Blocking Collection: call_mirai() and collect_mirai() both wait for completion but differ in return value: call_mirai() returns the mirai object itself, while collect_mirai() returns the value directly R/mirai.R350 R/mirai.R443

  • Extract Operator: The <FileRef file-url="https://github.com/r-lib/mirai/blob/128718db/ method for mirai objects is equivalent to collect_mirai() [R/mirai.R#L608-L608" min=608 file-path="method for mirai objects is equivalent tocollect_mirai()` [R/mirai.R">Hii

  • Condition Variables: All blocking operations use condition variables (cv) for efficient waiting without polling R/daemons.R385-391

Sources: R/mirai.R295-452 R/mirai.R608-615


Daemon Management

The daemons() function configures persistent background processes that execute mirai tasks. It supports two primary modes: local daemons (launched on the same machine) and remote daemons (connecting from network locations).

Daemon Configuration Modes

Key Configuration Parameters:

  • n: Integer number of local daemons to launch. If zero, resets daemons for the compute profile R/daemons.R29

  • url: Character URL for remote daemons to connect to, e.g., 'tcp://hostname:5555' or 'tls+tcp://10.75.32.70:5555' R/daemons.R30-35

  • dispatcher: Logical value controlling whether to use a dispatcher process. Default is TRUE for optimal FIFO scheduling R/daemons.R38-41

  • sync: Logical value for synchronous mode, useful for debugging. Evaluates mirai in the current process R/daemons.R45-48

  • seed: Integer value to initialize reproducible L'Ecuyer-CMRG RNG streams, or NULL for non-reproducible R/daemons.R49-58

  • .compute: Character name for compute profile, allowing multiple independent daemon pools R/daemons.R244

Compute Profile Storage:

Each compute profile is stored as an environment in the .. namespace object R/daemons.R634 The environment contains:

  • sock: Socket connection
  • url: Connection URL
  • n: Number of daemons
  • dispatcher: Dispatcher URL (if applicable)
  • cv: Condition variable
  • seed: Seed configuration
  • stream: RNG stream state
  • dots: Additional daemon parameters

Sources: R/daemons.R233-333 R/daemons.R634-653


Dispatcher vs Direct Connection

The dispatcher architecture provides two modes of operation with different characteristics:

Dispatcher Mode (dispatcher=TRUE)

Direct Mode (dispatcher=FALSE)

Comparison:

FeatureDispatcher ModeDirect Mode
SchedulingFIFO (optimal)Round-robin (simple)
Queue LocationCentralized at dispatcherLocal at each daemon
CancellationSupported via stop_mirai()Not supported
TimeoutCan cancel after timeoutContinues execution
Resource UsageHigher (extra process)Lower (no dispatcher)
Custom SerializationSupportedNot supported
Socket Type (daemon)'poly' R/daemon.R95'rep' R/daemon.R95
Socket Type (host)'req' R/daemons.R669'req' R/daemons.R669

Dispatcher Process Implementation:

The dispatcher runs as a separate process launched via system2() R/daemons.R736-740 and manages:

Sources: R/daemons.R90-104 R/dispatcher.R30-207 R/daemon.R80-194


Daemon Process Lifecycle

Each daemon process follows a defined lifecycle from connection through evaluation to termination:

Daemon Process State Machine

Initialization Sequence:

  1. Socket Creation: Creates 'poly' (dispatcher) or 'rep' (direct) socket R/daemon.R95
  2. Notification Setup: Registers pipe notification with condition variable R/daemon.R102
  3. Connection: Dials into host/dispatcher URL R/daemon.R106
  4. TLS Configuration: Applies TLS config if provided R/daemon.R103-105
  5. Initial Data: Receives RNG seed and serialization config (dispatcher only) R/daemon.R124-130
  6. State Snapshot: Captures initial packages, options, and global environment R/daemon.R131

Task Evaluation:

The eval_mirai() function handles evaluation with error recovery R/daemon.R220-244:

Cleanup Process:

When cleanup=TRUE, do_cleanup() restores state R/daemon.R254-260:

Exit Conditions:

A daemon exits when:

Sources: R/daemon.R80-194 R/daemon.R220-275


Parallel Mapping

The mirai_map() function provides parallel map operations over lists, vectors, matrices, or dataframes by creating a separate mirai task for each element or row.

Map Operation Flow

Input Type Handling:

The function adapts based on input type R/map.R165-203:

  • List/Vector: Maps .f over each element R/map.R167-177
  • Dataframe: Maps over rows, passing columns as named arguments to .f R/map.R178-190
  • Matrix: Maps over rows, passing columns as positional arguments R/map.R191-203

Collection Options Implementation:

Each collection option is a compiled expression evaluated during collection R/map.R242-300:

  • .flat: Flattens results to vector, checking type consistency R/map.R242-265 Errors if types differ or error values present.

  • .progress: Shows CLI progress bar if available, else text indicator R/map.R270-295 Updates on each completed task.

  • .stop: Implements early stopping by checking each result for error values R/map.R300 Cancels remaining tasks via stop_mirai().

Multiple Options: Options can be combined: x[.stop, .progress] applies both behaviors R/map.R304-318

Nested Map Protection:

To prevent accidentally spawning excessive local daemons, calling daemons(n) within a map errors R/daemons.R300 The workaround for legitimate nested maps on remote machines is to split the call: daemons(url = local_url()); launch_local(n) R/map.R84-87

Sources: R/map.R156-214 R/map.R218-331


Broadcasting with everywhere()

The everywhere() function evaluates an expression on all connected daemons simultaneously, useful for loading packages, exporting common data, or setting up shared state.

Broadcasting Execution

Key Behaviors:

  • State Persistence: Changes to global environment, loaded packages, and options are persisted via snapshot() regardless of daemon cleanup setting R/mirai.R206-217 R/mirai.R266-267

  • Snapshot Mechanism: The expression .snapshot is prepended to evaluate on.exit(mirai:::snapshot(), add = TRUE) R/mirai.R701 R/mirai.R266-277

  • Synchronization: When using dispatcher, forces synchronization by requiring all daemons to complete before subsequent mirai evaluations R/mirai.R208-210

  • Minimum Daemons: The .min parameter ensures execution on at least N daemons, useful for waiting on remote daemon connections R/mirai.R220-223

  • RNG Independence: Does not affect the RNG stream for regular mirai calls when using a reproducible seed R/mirai.R212-217 R/mirai.R284-286

Daemon Count Determination:

For direct mode (no dispatcher), uses max(stat(sock, "pipes"), n) R/daemons.R280-281 For dispatcher mode, uses max(.min, info()[[1]]) R/daemons.R282 to count active connections.

Final Synchronization Mirai:

An empty mirai is created and stored in envir[["everywhere"]] along with the map results R/mirai.R290-291 This provides a synchronization point for dispatcher scheduling.

Sources: R/mirai.R200-293 R/mirai.R701 R/daemon.R262-264


Status and Monitoring

Functions for monitoring daemon pools and task execution:

Status Information Functions

Function Details:

  • status(): Returns a named list with connections and daemon URL R/daemons.R411-417 In dispatcher mode, includes mirai vector with awaiting/executing/completed counts R/daemons.R825-833

  • info(): Returns a named integer vector with connections, cumulative, awaiting, executing, and completed statistics R/daemons.R449-459 More succinct than status().

  • daemons_set(): Returns logical indicating whether daemons are configured for the compute profile R/daemons.R478

Dispatcher Query Protocol:

The query_dispatcher() function sends a command and receives statistics R/daemons.R720-724:

Sources: R/daemons.R379-503 R/daemons.R720-724 R/daemons.R825-833