Distributed Model Validation with Epsilon

Sina Madani – sina.madani@york.ac.uk
https://github.com/epsilonlabs/distributed-evl
https://eclipse.org/epsilon
https://git.eclipse.org/c/epsilon/org.eclipse.epsilon.git/
Motivation

• Scalability is one of the main challenges in model-driven engineering
• Large, complex projects are the main beneficiaries of the MDE approach
  • Such projects involve big models, many collaborators, complex workflows and model management programs
• Most MDE tools are not suitable for handling millions of model elements
  • Long execution times = lower productivity
• One of the main benefits of MDE is working at a higher level of abstraction to increase productivity
• Therefore, improving the performance of MDE tools is a good idea :)
Epsilon Validation Language (EVL)

• Built on top of the Epsilon Object Language (EOL)
  • Powerful imperative programming constructs
  • Independent of the underlying modelling technology
• A superset of the Object Constraint Language (OCL)
• Invariants may have dependencies on other invariants
• pre and post blocks
• Global variables
• Cached operations
• Fixes may be specified for unsatisfied invariants
• ...and more
Java hashCode and equals contract

@cached
operation AbstractTypeDeclaration getPublicMethods() : Collection {
  return self.bodyDeclarations.select(bd |
    bd.isKindOf(MethodDeclaration) and
    bd.modifier.isDefined() and
    bd.modifier.visibility == VisibilityKind#public);
}

context ClassDeclaration {
  constraint hasEquals {
    guard : self.satisfies("hasHashCode")
    check : self.getPublicMethods().exists(method |
      method.name == "equals" and
      method.parameters.size() == 1 and
      method.parameters.first().type.type.name == "Object" and
      method.returnType.type.isTypeOf(PrimitiveTypeBoolean))
  }

  @lazy
  constraint hasHashCode {
    check : self.getPublicMethods().exists(method |
      method.name == "hashCode" and
      method.parameters.isEmpty() and
      method.returnType.type.isTypeOf(PrimitiveTypeInt))
  }
}
Parallel execution

[Figure: the constraints of each context (Constraint_A1, Constraint_A2; Constraint_B1, Constraint_B2, Constraint_B3) are checked against their model elements concurrently, and unsatisfied constraint-element pairs are collected.]

context A {
  constraint A1 { check {…} }
  constraint A2 { check {…} }
}
context B {
  constraint B1 { check {…} }
  constraint B2 { check {…} }
  constraint B3 { check {…} }
}
Elements-based (data-parallel)

for each context:
  for each element of the context kind:
    submit to executor service: ({
      for each constraint in context:
        if constraint-element pair has not already been checked:
          if constraint is not lazy and constraint guard is satisfied:
            execute constraint check block;
            if check block returned false:
              add constraint-element pair to set of unsatisfied constraints;
    });
wait for jobs to complete;
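The pseudocode above can be sketched in plain Java with an ExecutorService. This is a simplified illustration, not the actual EvlModuleParallelElements implementation: the Constraint interface, names, and the omission of the already-checked/lazy bookkeeping are assumptions for brevity.

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal sketch of the elements-based strategy: one job per element,
// each job runs every guarded constraint check; unsatisfied pairs are
// recorded in a thread-safe set.
public class ElementsParallel {
    public interface Constraint {
        String name();
        boolean guard(Object element);
        boolean check(Object element);
    }

    public static Set<String> validate(List<Object> elements, List<Constraint> constraints) {
        Set<String> unsatisfied = ConcurrentHashMap.newKeySet();
        ExecutorService executor = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
        for (Object element : elements) {
            executor.submit(() -> {
                for (Constraint constraint : constraints) {
                    // Guarded check; record the constraint-element pair on failure.
                    if (constraint.guard(element) && !constraint.check(element)) {
                        unsatisfied.add(constraint.name() + ":" + element);
                    }
                }
            });
        }
        executor.shutdown();                        // wait for jobs to complete
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return unsatisfied;
    }
}
```

Because the jobs are independent, the order of evaluation is immaterial (see the editor's note on this slide).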
What could possibly go wrong?

• Concurrent access to mutable data structures
  • e.g. results, evaluated constraint-element pairs, caches at the modelling layer
• Variable scoping
  • How to deal with storage, retrieval and modification of local and global variables across different threads of execution?
• Exception handling and error reporting
  • How to inform the user where things went wrong with multiple threads?
• Dependencies and lazy invariants
  • Re-evaluation vs. synchronization
• Concurrency testing
Data Structures

• Read-only
  • e.g. the model, the EVL program
  • Immutable, so no need to do anything
• Write-only
  • e.g. the set of unsatisfied constraints
  • Can be thread-local and merged when needed
• Read and writable
  • e.g. the constraint trace, frame stack, execution controller, caches...
  • Use a concurrent data structure, or thread-locals with base delegation
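The write-only case ("thread-local and merged when needed") can be sketched as below. The class and method names are illustrative, not the actual Epsilon classes; the point is that each worker thread writes to its own uncontended set, and a merged view is built only once at the end.

```java
import java.util.*;

// Sketch of a write-only, thread-local result store: adds are uncontended,
// and the merge happens once on the master thread after all jobs finish.
public class ThreadLocalResults<T> {
    private final List<Set<T>> perThread =
        Collections.synchronizedList(new ArrayList<>());
    private final ThreadLocal<Set<T>> local = ThreadLocal.withInitial(() -> {
        Set<T> s = new HashSet<>();
        perThread.add(s);   // register this thread's set for later merging
        return s;
    });

    public void add(T result) { local.get().add(result); }  // uncontended write

    public Set<T> merge() {   // called once, after workers have finished
        Set<T> all = new HashSet<>();
        synchronized (perThread) {
            for (Set<T> s : perThread) all.addAll(s);
        }
        return all;
    }
}
```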
<switch to Eclipse>

• org.eclipse.epsilon.evl.concurrent.EvlModuleParallelElements
• org.eclipse.epsilon.evl.context.concurrent.EvlContextParallel
• org.eclipse.epsilon.eol.context.concurrent.EolContextParallel
• org.eclipse.epsilon.eol.models.CachedModel
• org.eclipse.epsilon.eol.execute.context.FrameStack
Atomic decomposition

• org.eclipse.epsilon.evl.concurrent.atomic.*
• org.eclipse.epsilon.evl.execute.atoms.*
• Every EVL program can be decomposed into a finite, deterministically ordered List of rule-element pairs
  • A rule can be a ConstraintContext or a Constraint
  • A ConstraintContext defines the model element types to be validated
• We can create a job for every model element
  • A ConstraintContextAtom is a Tuple<ConstraintContext, Object>
EVL + Model → List<ConstraintContextAtom>

context A {
  constraint invX { check {…} }
  constraint invY { check {…} }
}
context B {
  constraint invX { check {…} }
  constraint invY { check {…} }
  constraint invZ { check {…} }
}

[Figure: the EVL program and model decompose into a deterministically ordered List<ConstraintContextAtom>, e.g. (context A, 11), (context A, 22), (context A, 33), (context A, 44), (context A, 55), (context A, 66), (context A, 77), (context B, 18), (context B, 29), …]
Splitting the Jobs List

• The List can be split into sublists based on indices
• The number of sublists is how we define the granularity of jobs
  • More chunks = smaller sublists = higher granularity
• o.e.e.erl.execute.data.JobBatch
• Split the jobs into a List<JobBatch>
Splitting algorithm

• If we have 𝑛 jobs, we can split the list into 𝑐 chunks so long as 𝑐 ≤ 𝑛
• Each sublist then has 𝑛/𝑐 jobs (maybe +1)
• 𝑐 is the Batch Factor
Example with batch factor = 3

List<ConstraintContextAtom>: (context A, 11), (context A, 22), (context A, 33), (context A, 44), (context A, 55), (context A, 66), (context A, 77), (context B, 18), (context B, 29), …

List<JobBatch>: batch 1 {from = 1, to = 3}, batch 2 {from = 4, to = 6}, …
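The splitting algorithm above can be sketched as follows. This is a stand-in for o.e.e.erl.execute.data.JobBatch, not the actual class: here the indices are 0-based with an exclusive upper bound, whereas the slide's example uses inclusive 1-based ranges.

```java
import java.util.*;

// Split n jobs into c batches (c = the "Batch Factor"): each batch gets
// n/c jobs, and the first n % c batches get one extra ("maybe +1").
public class BatchSplitter {
    public static final class JobBatch {
        public final int from, to;   // indices into the job list, to-exclusive
        public JobBatch(int from, int to) { this.from = from; this.to = to; }
    }

    public static List<JobBatch> split(int jobs, int chunks) {
        if (chunks > jobs) throw new IllegalArgumentException("need c <= n");
        List<JobBatch> batches = new ArrayList<>(chunks);
        int base = jobs / chunks, remainder = jobs % chunks, from = 0;
        for (int i = 0; i < chunks; i++) {
            int size = base + (i < remainder ? 1 : 0);   // "maybe +1"
            batches.add(new JobBatch(from, from + size));
            from += size;
        }
        return batches;
    }
}
```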
Executing a batch (simplified)

[Code listing omitted in this export.]

• Note that each Constraint is executed sequentially
• The listing is a “flattened” version and is not how it’s actually implemented
• For the intrigued, see o.e.e.erl.execute.context.concurrent.ErlContextParallel#executeJob(Object)
Distribution parameters

• Shuffle the batches to ensure uniform distribution
  • Without static analysis, there is no way to know which jobs are demanding
• Some % of jobs are assigned directly to the master
  • They need not be serialized and sent to itself – they can be executed directly
  • Assuming similar performance / specs, this share is 1/(1+𝑤), where 𝑤 is the number of workers
• The Batch Factor should be equal to the maximum local parallelism
  • Local parallelism = Runtime.getRuntime().availableProcessors()
  • Any lower reduces throughput – we want to maximise parallelism per node
• Batches are lightweight and low-footprint
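The two parameters above (shuffling and the master's 1/(1+w) share) can be sketched together. The class and method names are illustrative; the fixed seed exists only to make the example deterministic.

```java
import java.util.*;

// Shuffle the batches (no static analysis, so assume uniform per-batch cost)
// and keep a 1/(1+w) share on the master, where w is the number of workers.
public class BatchDistributor {
    public static <T> List<T> masterShare(List<T> batches, int workers, long seed) {
        List<T> shuffled = new ArrayList<>(batches);
        Collections.shuffle(shuffled, new Random(seed));  // spread out costly jobs
        int keep = shuffled.size() / (1 + workers);       // master's direct share
        return shuffled.subList(0, keep);                 // rest goes to the jobs queue
    }
}
```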
Prerequisites for distribution

• All participating processes (“nodes”) need to have:
  • A full copy of the program (and its dependencies / imports)
  • Full access to the entirety of all model(s)
    • i.e. not partial models
  • The full codebase (e.g. a JAR file) with dependencies
  • The ability to send data to and receive data from the master
  • Sufficient resources (disk space, memory) to execute the entire program
    • as with the non-distributed implementation
• Bottom line: replicate the master node
Distribution Strategy

• Single-master, multiple-slave architecture
• Fully asynchronous to maximise efficiency
• The master sends a “configuration” to workers
  • Path to the EVL program
  • Model properties (key-value pairs)
  • Script parameters
  • Local parallelism (number of threads)
• Workers execute their assigned job batches and send back results
• Dependencies are re-executed on workers when needed
Results processing

• Only serializable UnsatisfiedConstraint instances are sent back
  • o.e.e.evl.distributed.execute.data.SerializableEvlResultPointer
  • Index of the model element in the job list
  • Name of the Constraint
• The master lazily adds these to a Set<UnsatisfiedConstraint>
  • “Deserialization” (resolving the element, message, constraint etc.) only occurs on demand for each individual UnsatisfiedConstraint
  • hashCode and equals are overridden to avoid unnecessary resolution
  • o.e.e.evl.distributed.execute.data.LazyUnsatisfiedConstraint
• Workers send back aggregate profiling info when finished
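A result pointer in the spirit of SerializableEvlResultPointer can be sketched as below. This is not the actual Epsilon class: the key idea is that only the job-list index and the constraint name cross the wire, and equals/hashCode use only those two fields, so deduplicating results on the master never forces the lazy resolution of the element or message.

```java
// Minimal sketch: the element is identified by its index in the
// deterministically ordered job list, so it can be resolved on demand.
public class ResultPointer implements java.io.Serializable {
    public final int index;            // position of the element in the job list
    public final String constraintName;

    public ResultPointer(int index, String constraintName) {
        this.index = index;
        this.constraintName = constraintName;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ResultPointer)) return false;
        ResultPointer p = (ResultPointer) o;
        // No model access needed: comparison is purely on the pointer fields.
        return index == p.index && constraintName.equals(p.constraintName);
    }

    @Override public int hashCode() {
        return 31 * index + constraintName.hashCode();
    }
}
```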
JMS 2.0 implementation

• Messaging API used to facilitate communication
Worker arguments

• “basePath” – used for locating resources
  • The configuration substitutes the master’s base path with a token when sending the config to workers
  • Workers substitute their own local absolute path when locating resources
• Broker URL
  • e.g. tcp://127.0.0.1:61616
• Session ID
  • To avoid conflicts between multiple running instances of distributed EVL on the same network
  • In practice, queue and topic names are appended with this ID
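The base-path token scheme described above amounts to a pair of string substitutions. The token name here is illustrative, not the one distributed EVL actually uses.

```java
// Sketch: the master replaces its own base path with a placeholder token
// before sending the configuration; each worker substitutes its local base
// path on receipt, so absolute paths survive crossing machines.
public class BasePath {
    public static final String TOKEN = "${BASE_PATH}";   // hypothetical token

    public static String toPortable(String path, String masterBase) {
        return path.replace(masterBase, TOKEN);
    }

    public static String toLocal(String path, String workerBase) {
        return path.replace(TOKEN, workerBase);
    }
}
```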
Asynchronous setup

Master (command-line arguments: base path, EVL script path, models and their properties (paths, flags etc.), script parameters, output file path):
1. Listen for workers on the registration queue; send the configuration to each worker
2. Load the configuration (script, models...)
3. Send jobs to the jobs queue; signal on a topic that all jobs have been sent; process results as they come in; wait for all jobs (master & worker) to finish
4. Execute the post block, report results etc.

Workers (command-line arguments: base path, broker URL, session ID):
1. Signal presence on the registration queue
2. Load the configuration (script, models...)
3. Process the next job from the jobs queue
4. Send results from the job to the results queue
5. Send the number of jobs processed and profiling info when finished
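The shape of this protocol can be sketched with in-memory queues. This is a stand-in for the JMS queues and topic, not the actual implementation: everything runs in one process, a BlockingQueue plays the role of each JMS queue, and a sentinel message stands in for the "all jobs sent" topic signal.

```java
import java.util.*;
import java.util.concurrent.*;

// In-memory simulation of the master/worker exchange: the master fills the
// jobs queue and signals completion; a worker thread drains jobs and posts
// results; the master collects one result per job.
public class QueueProtocolDemo {
    static final String DONE = "__ALL_JOBS_SENT__";   // stand-in for the topic signal

    public static List<String> run(List<String> jobs) throws InterruptedException {
        BlockingQueue<String> jobQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> resultQueue = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            try {
                String job;
                while (!(job = jobQueue.take()).equals(DONE)) {
                    resultQueue.put("result:" + job);   // "process next job"
                }
            } catch (InterruptedException ignored) { }
        });
        worker.start();

        for (String job : jobs) jobQueue.put(job);  // master sends jobs
        jobQueue.put(DONE);                         // signal: all jobs sent

        List<String> results = new ArrayList<>();
        for (int i = 0; i < jobs.size(); i++) results.add(resultQueue.take());
        worker.join();
        return results;
    }
}
```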
Performance

• Lab machines in CSE/231 (i5-8500, 16 GB RAM, Samsung SSD)
• Reverse-engineered Java models
• Data labels on bars show speedup relative to sequential EVL
• Ask if you want full details on specs, procedure etc.
findbugs, 16 workers

[Chart: execution-time comparisons lost their layout in this export; the values shown were 41 mins, 5 mins, 85 mins, 10 mins, 83 secs, 45 secs, 160 mins, 19 mins, 2 mins, 6 hrs 20 mins, 44 mins, 5 mins.]
87 workers + TR-1950X

[Chart omitted in this export.]
1 Constraint, 2 million elements (i5-8500)

[Chart: only the data labels survive this export: 86.5%, 95.7%, 78.9%, 78.1%, 78.2%, 89.5%, 75.3%, 75.1%, 72.9%, 100%, 78.4%, 94.3%.]
Single-threaded parallelism

• The Simulink driver is not thread-safe
  • Cannot use parallel EVL
• Distributed EVL with localParallelism = 1 can help!
  • Each worker executes part of the script, so in theory it should be faster
• Tried this with 15 workers (i5-8500 lab PCs only)
• Speedup was only 2.355
  • The pre block took up a lot of time
  • Model access dominates execution time
  • Random distribution of jobs minimises data locality
Future Work

• Build a UI for configuration in Eclipse (“DT plugin”)
• Intelligent assignment of jobs
  • Maximise data locality
• Potential for partial model / script loading
  • Requires static analysis
• More experiments with different modelling technologies
• On-the-fly / lazy model loading & element resolution
  • e.g. something like Hawk
• Fix the Flink and Crossflow implementations
Summary

• Experiments & resources available from https://github.com/epsilonlabs/parallel-erl
• Exploiting the finite and deterministic ordering of jobs can generalise to any other (read-only) model management task (in theory)
• When model access is relatively cheap, combining parallel and distributed execution multiplies their speedups
  • Assumes all participating nodes have full access to resources
Constraint Dependencies

• Dependencies are uncommon
  • It is inefficient to add and look up a constraint-element pair every time a constraint is checked
• Solution: a proxy
  • Check if the constraint is a known dependency target
  • If so, check the constraint trace for the specific constraint-element pair, and add the result if not present
  • Otherwise proceed as usual with the check
• Result: a dependency will be evaluated at most twice
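The trace-lookup part of the proxy above can be sketched as follows. The names are illustrative, not Epsilon's, and for simplicity this sketch always consults the trace (whereas the actual proxy only does so for known dependency targets) and is single-threaded.

```java
import java.util.*;

// Sketch: satisfies() consults the constraint trace for the
// (constraint, element) pair and only evaluates the target constraint on a
// miss, recording the result so a later satisfies() call is a pure lookup.
public class DependencyProxy {
    public interface Check { boolean run(Object element); }

    private final Map<String, Boolean> trace = new HashMap<>();  // constraint trace
    public int evaluations = 0;   // for demonstration only

    public boolean satisfies(String constraint, Object element, Check check) {
        String key = constraint + "#" + element;
        Boolean cached = trace.get(key);
        if (cached != null) return cached;       // known result: no re-evaluation
        evaluations++;
        boolean result = check.run(element);
        trace.put(key, result);                  // add the result if not present
        return result;
    }
}
```

In this sketch a dependency target is evaluated at most once per element via satisfies(); the slide's "at most twice" also counts the target's own normal (non-dependency) evaluation.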
[Sequence diagram: Constraint.check invoking SatisfiesOperation – omitted in this export.]
Constraint Dependencies

[Figure: validating hasEquals on a ClassDeclaration triggers self.satisfies("hasHashCode"); the result of hasHashCode is recorded against the element, feeding the checked-elements trace and the set of unsatisfied constraints.]

NOTE: hasHashCode is not lazy in this case
Alternative performance solutions

• The MDE community focuses extensively on:
  • Model-to-Model transformations
  • Incrementality
  • Laziness
• Incrementality and laziness avoid unnecessary work
• Incremental execution suits large models where only small changes are made to the program and/or model
  • Requires delta caching – overhead which reduces regular performance
• Does not improve performance when work cannot be avoided
  • e.g. absence of cache, no unnecessary code, large changes in model / program, first invocation…
Editor's Notes

  • #4 EVL is a hybrid language. It provides declarative structure like OCL but has general-purpose programming constructs.
  • #5 guard is equivalent to implies
  • #7 Execute the logic as before but with each element in a (potentially) different thread. Note that the order of evaluation is immaterial – they are independent.
  • #8 Lazy initialisation of data structures like caches can also be a problem.
  • #9 Cached operations and extended properties are another example of a read-and-write structure. “Thread-local base delegation” is basically a hybrid between using a thread-safe data structure and thread-locals. Each thread has its own non-concurrent structure but also a pointer to the main thread’s data structure, which is thread-safe. The thread-local structure takes priority over the master thread’s structure.
  • #31 See paper for details