
Reporting in Microservices. How to Optimize Performance?

June 25, 2019
10 min read
Technology
By Artem Barmin, Co-Founder

Business intelligence solutions use microservice systems for better scalability and flexibility. Optimizing their performance is a challenge for any development team. So I've decided to describe how to improve a microservice architecture with the help of a reporting module. The article includes its technical scheme, estimates, and the pros and cons of suitable technologies. It will be useful for both tech professionals and business owners.

Microservice architecture: Pros and Cons

Since 2010, the popularity of microservice design has grown with the rise of DevOps and Agile development. Nowadays, Airbnb, Netflix, Uber, LinkedIn, and other big companies benefit from microservices.

A monolithic system handles all the implemented logic in a single process. In contrast, a microservice architecture consists of several independent services. Each of them usually includes the common parts of an enterprise application:

  • user interface
  • database
  • server

Any change in the system leads to the deployment of a new version of the server part of the system. Let's consider the concept in detail.

What does microservices architecture really mean?

Microservice design is a set of services, but this definition is vague. I can single out four features that a microservice usually has:

  • meeting a specific business need
  • automatic deployment
  • usage of endpoints
  • the decentralized control of languages and data

In the image below, you can see microservice design compared to a monolith app.

What is scalability in microservices?

One of the main benefits of the microservice style of development is its scalability. You can scale individual services without changing the whole system, save resources, and keep the app less complex. Netflix is a prime example of this benefit: the company had to cope with a rapidly growing subscriber base, and the microservice design was an excellent solution for scaling it.

Microservice design speeds up app development and accelerates the product launch schedule. Each part can be rolled out separately, making the deployment of microservices quicker and easier.

However, each microservice needs a separate database. Otherwise, you can't use all the benefits of the modularization pattern. But the variety of databases leads to challenges in the reporting process. Keep reading to find out more about solving this problem.

What are the other advantages of microservices?

1. The ability to work in smaller teams and use an Agile approach
2. Flexibility in continuous integration and deployment
3. The possibility of convenient horizontal system scaling
4. Increased development team productivity
5. Simplification of the debugging and maintenance processes

What are the disadvantages of using microservices?

Despite all these benefits, microservice architecture has drawbacks, including the necessity of operating many systems and completing various tasks in a distributed environment. I believe the main pitfalls of using microservices are:

1. Management issues. The complexity of microservice design forces the developers to plan and act more carefully.
2. Security risks. The external API communication in microservice architecture leads to an increased risk of attacks.
3. Diversity of programming languages. It can be challenging to switch between them in the development and deployment processes.

BI project details: the issue of custom reports

A while back, the Freshcode team worked on a legacy EdTech project. The application consisted of over 10,000 files developed in ColdFusion. It was a 7-year-old US-built app running on an MS SQL database. The system was overly complex and included many microservices. Its main parts were:

  • sophisticated financial and billing system
  • multi-organization structure for large group entities
  • workflow management tool for business processes
  • integrated bulk email, SMS, and live chat
  • online system for surveys, quizzes, and examinations
  • flexible assessment and learning management system

Freshcode joined the project at the stage of migrating to a new interface. The product was preparing for a global launch, and the microservice system was supposed to process large amounts of data. As for the target audience, the app was developed for:

  • large education networks that manage hundreds of campuses
  • governments that have up to 200k schools, colleges and universities

At the same time, the EdTech app was designed to be convenient both for large-scale education networks and for small schools of around 100 students.

The Freshcode development team faced the problem of managing and improving the performance of the complex microservice architecture. Our client wanted to build both SaaS and self-hosted systems, and we chose the technical solutions while keeping this fact in mind.

How to improve microservices performance?

Report generation relied on engaging different services and caused performance issues. That's why the Freshcode team decided to optimize the app's architecture by creating a separate reporting microservice. It received data from all the databases, saved it, and transformed it into custom reports.

In the image below, you can see the scheme of the reporting microservice system and the technologies used for its implementation.

Yellow marks all the microservices in the system, each with an individual database. The reporting module tracks all changes via a messaging system. Then it stores the new data in a separate report database.

[Image: microservice data warehouse scheme]

Reporting module implementation in 6 steps

Let's look at the six chief components of the reporting system, the technologies that can be used, and the best solutions.

Step №1: Change Data Capture (CDC)

CDC tracks every single change (insert, update, delete) and applies logic to it. There were three technologies suitable for this first step of the microservice reporting system's implementation.

1. Apache NiFi
It can create a simple CDC pipeline with no coding. Apache NiFi has many built-in processors and supports data routing, transformation, and system mediation logic.

Pros:
 • Cluster mode support and easy scaling
 • Built-in PutToKafka and PutToKinesis activities
 • Implementation of custom activities on any JVM language
 • User-friendly UI

Cons:
 • No predefined data format for messaging between activities
 • Supports only JVM languages
 • The quality of default activities isn't perfect
 • No Oracle CDC activity
2. StreamSets Data Collector
A popular open-source solution for continuous big data ingestion in a microservice reporting system. Its main advantages are the simple creation of data pipelines and support for many widespread technologies.

Pros:
 • Built-in AWS S3, Kinesis, Kafka, Oracle, Postgres processors
 • Open-source software can be customized to fit your needs
 • Simple and convenient UI
 • Supports the most popular tools

Cons:
 • It's a new solution that is still under active development
 • StreamSets Data Collector has a steep learning curve
3. Matillion
An ELT tool with an innovative architecture and an easy-to-use interface. It is built specifically for Amazon Redshift, Google BigQuery, and Snowflake.

Pros:
 • A proprietary tool
 • Support for the development team
 • Well-tested solution

Cons:
 • Only specific databases can be used with this tool
 • ELT architecture is not suitable for all projects

Oracle was the central database of our microservice reporting system. Considering this fact, the Freshcode team chose StreamSets Data Collector, which offered Oracle CDC support out of the box.
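To make the idea concrete, here is a minimal Python sketch of the kind of logic a CDC stage applies to captured changes: each insert, update, or delete is normalized into a single event shape before it is passed downstream. In the real project this logic lived in StreamSets pipelines rather than in custom code, and the table and field names below are illustrative assumptions.

```python
import json

# A minimal sketch of CDC-style routing logic: each captured change
# (insert, update, delete) is normalized into one event shape before
# it is published downstream. Field and table names are illustrative
# assumptions, not the project's actual schema.

def normalize_change(record: dict) -> dict:
    """Convert a raw CDC record into a flat event for the messaging layer."""
    op = record["op"]          # "insert", "update", or "delete"
    table = record["table"]    # source table, e.g. "students"
    if op == "delete":
        payload = record["before"]  # last known row state
    else:
        payload = record["after"]   # new row state
    return {
        "source_table": table,
        "operation": op,
        "primary_key": payload["id"],
        "data": payload,
    }

if __name__ == "__main__":
    sample = {
        "op": "update",
        "table": "students",
        "before": {"id": 42, "status": "enrolled"},
        "after": {"id": 42, "status": "graduated"},
    }
    print(json.dumps(normalize_change(sample), indent=2))
```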

Step №2: Messaging System

A messaging system allows sending messages between services, as well as setting publishing standards for them.

1. Apache Kafka
As one of the most popular tools for real-time analytics, Apache Kafka has high throughput and reliability characteristics.

Pros:
 • High throughput, fault tolerance, durability
 • Great scalability, high concurrency
 • Batch mode, native computation over streams
 • An excellent choice for on-premise microservice reporting system

Cons:
 • Requires DevOps knowledge for correct setup
 • No built-in monitoring tool
2. AWS Kinesis
By simplifying the collection, processing, and analysis of streaming data, Amazon Kinesis offers key capabilities for cost-effective processing at any scale.

Pros:
 • Easy to manage and scale
 • Seamless integration with other AWS services
 • Almost no DevOps effort
 • Built-in monitoring and alert system

Cons:
 • Requires some cost optimization
 • No way to use for on-premise software

Although Apache Kafka required a bit more effort to deploy and set up, we used it as a cost-efficient on-premise solution.
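For illustration, here is a minimal sketch of publishing the normalized change events to a Kafka topic with the kafka-python client. The broker address, topic name, and event fields are assumptions, not the project's actual configuration.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# A minimal sketch of publishing normalized change events to Kafka.
# The broker address and topic name are illustrative assumptions.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    key_serializer=lambda k: str(k).encode("utf-8"),
)

def publish_change(event: dict) -> None:
    # Keying by primary key keeps all changes for one row in order
    # within a single partition.
    producer.send("reporting.changes", key=event["primary_key"], value=event)

publish_change({
    "source_table": "students",
    "operation": "update",
    "primary_key": 42,
    "data": {"id": 42, "status": "graduated"},
})
producer.flush()
```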

Step №3: Streaming Computation Systems

A streaming computation system analyzes multiple data streams from many sources. It helps to prepare data before ingestion, so it is possible to denormalize or join it and add any extra info if needed.

1. Spark Streaming
By bringing Apache Spark's language-integrated API for stream processing, it allows writing streaming jobs the same way we write batch jobs.

Pros:
 • Stateful exactly-once semantics out of the box
 • Fault-tolerance, scalability
 • In-memory computation

Cons:
 • Pretty expensive to use
 • Manual optimization
 • No built-in state management
2. Apache Flink
It is useful for stateful computations over unbounded and bounded data streams. Apache Flink is suitable for all common cluster environments and performs computations at in-memory speed.

Pros:
 • Exactly-once state consistency
 • SQL on Stream & Batch Data
 • Low latency, scalability, fault-tolerance
 • Support of very large state

Cons:
 • Requires in-depth programming expertise
 • Complicated architecture
 • Flink community is smaller than Spark's, but it is growing
3. Apache Samza
A scalable data processing engine for real-time analytics that can be used in a microservice reporting system.

Pros:
 • Can maintain a large state
 • Low latency, high throughput, mature and tested at scale
 • Fault-tolerant and high performance

Cons:
 • At-least-once processing guarantee
 • Lack of advanced streaming features (watermarks, sessions, triggers)
4. AWS Kinesis Services
The set of tools includes Data Firehose, Data Analytics, and Data Streams. It helps to build powerful stream processing without implementing any custom code.

Pros:
 • Pay only for what you use
 • The easiest way to process data streams in real time with SQL
 • Handle any amount of streaming data

Cons:
 • No way to use on-premise
 • The cost in a high-load environment will be higher compared to other solutions, but development and maintenance cost may be lower
 • Hard to customize

AWS provides a great set of tools for ETL and data processing. It's a good starting point, but there is no way to deploy it on custom servers, so it doesn't fit on-premise solutions. Apache Flink is the most feature-rich and high-performance solution; it allows storing a large (multi-terabyte) application state, but it requires a larger team for development.
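As a rough illustration of this step, the sketch below uses Spark Structured Streaming (one of the options above) to read the change topic from Kafka, parse the JSON events, and land them in the data lake. The topic, schema, and paths are assumptions, and the job needs the spark-sql-kafka connector on its classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Requires the spark-sql-kafka connector, e.g.
# --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0
spark = SparkSession.builder.appName("reporting-stream").getOrCreate()

# Simplified, assumed schema for the normalized CDC events; the real
# project schema would carry the full row payload as well.
schema = StructType([
    StructField("source_table", StringType()),
    StructField("operation", StringType()),
    StructField("primary_key", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "reporting.changes")
       .load())

# Parse the JSON payload from the Kafka value column.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Land the parsed events in the data lake, partitioned by source table.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://reporting-lake/changes/")
         .option("checkpointLocation", "s3a://reporting-lake/_checkpoints/changes/")
         .partitionBy("source_table")
         .start())
query.awaitTermination()
```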

Step №4: Data Lake

A data lake is a central repository of integrated data from one or more disparate sources; it stores current and historical data in one place. We can use data lakes for creating analytical reports, machine learning solutions, and more.

1. AWS S3
The object storage service offers industry-leading scalability, data availability, security, and performance.

Pros:
 • Easy to integrate with other AWS services
 • Designed for 99.999999999% (11 9's) of data durability
 • Cost-effective for rarely accessed data
 • Has an open source implementation with full API support

Cons:
 • High network pricing
 • S3 has had availability issues in the past, but that's not a problem for a data lake
2. Apache Hadoop
HDFS, the primary data storage system used by Hadoop applications, allows the storage and processing of large volumes of data.

Pros:
 • Efficiently works with huge amounts of data
 • Integration with many analytical and operational tools (Impala, Hive, HBase, Kudu, Kylin, and others)

Cons:
 • Complicated to deploy and manage
 • Needs to set up monitoring and high availability

We decided to start with AWS S3, as it has an API-compatible open-source implementation, so we could integrate it with the on-premise microservice reporting system.
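Here is a minimal boto3 sketch of landing a batch of raw change events in the data lake under a date-partitioned key layout. The bucket name and key prefix are illustrative assumptions.

```python
import datetime
import json

import boto3  # pip install boto3

# A minimal sketch of landing a batch of change events in the data lake
# under a date-partitioned key layout. Bucket and prefix are assumptions.
s3 = boto3.client("s3")

def land_batch(events: list, table: str) -> str:
    now = datetime.datetime.utcnow()
    key = (f"raw/{table}/"
           f"year={now:%Y}/month={now:%m}/day={now:%d}/"
           f"{now:%H%M%S}.json")
    # Newline-delimited JSON keeps the objects easy to read back in bulk.
    body = "\n".join(json.dumps(e) for e in events)
    s3.put_object(Bucket="reporting-lake", Key=key, Body=body.encode("utf-8"))
    return key

print(land_batch([{"operation": "insert", "primary_key": 1}], "students"))
```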

Step №5: Report Databases

1. AWS Aurora
It is up to 5 times faster than standard MySQL databases and 3 times faster than PostgreSQL databases.

Pros:
 • Pretty fast SQL database
 • High availability and durability
 • Fully managed
 • Easy to scale

Cons:
 • Poor performance for analytical reports in case of big data projects
 • The minimum available instance size is too big, but it is easily replaceable by plain PostgreSQL
2. AWS Redshift
Redshift delivers up to 10 times faster performance than other data warehouses. It uses machine learning, massively parallel query execution, and columnar storage on high-performance disks.

Pros:
 • May run queries on external S3 files
 • Easy to set up, use and manage
 • Columnar storage

Cons:
 • Doesn't enforce uniqueness
 • Can't be used as a live app database
 • It's mostly useful for running aggregation on a large amount of data
3. Kinetica
A vectorized, columnar, memory-first database designed for analytical (OLAP) workloads. Kinetica automatically distributes any workload across CPUs and GPUs for optimal results.

Pros:
 • High aggregation performance, run on GPU and CPU
 • Supports materialized join views and can update them incrementally

Cons:
 • GPU instances still cost a lot
 • No way to join data between different partitions
4. Apache Druid
It generally works well with any event-oriented, clickstream, time series, or telemetry data, especially streaming datasets from Apache Kafka. Druid provides exactly-once consumption semantics from Apache Kafka and is commonly used as a sink for event-oriented Kafka topics.

Pros:
 • Druid can be deployed in any *NIX environment on commodity hardware
 • Best for interactive dashboards with full drill-down capabilities
 • Stores only pre-aggregated data

Cons:
 • Isn't perfect for custom reports built by users
 • Works only on time series data
 • No full join support

All of these databases are amazing, but our client's goal was to create reports based on data from all microservices. The development team suggested AWS Aurora as the best choice for this task as it simplified the workflow a lot.
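Assuming the PostgreSQL-compatible flavor of Aurora (the on-premise replacement was plain PostgreSQL), report queries can be run with any standard Postgres client. The sketch below shows the kind of aggregation the report database serves; the table and column names are hypothetical, not the project's actual schema.

```python
import psycopg2  # pip install psycopg2-binary

# A hedged sketch of the kind of aggregation the report database serves:
# enrollments per campus per month. Table and column names are
# hypothetical; connection settings would come from configuration.
conn = psycopg2.connect(
    host="reporting-db.example.internal",
    dbname="reporting",
    user="report_reader",
    password="secret",
)

REPORT_SQL = """
    SELECT campus_id,
           date_trunc('month', enrolled_at) AS month,
           count(*)                         AS enrollments
    FROM   report_enrollments
    WHERE  organization_id = %s
    GROUP  BY campus_id, month
    ORDER  BY month, campus_id;
"""

with conn, conn.cursor() as cur:
    cur.execute(REPORT_SQL, (123,))
    for campus_id, month, enrollments in cur.fetchall():
        print(campus_id, month, enrollments)
```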

Step №6: Report Microservice

The report microservice was responsible for storing information about data objects and the relations between them. It was also responsible for managing security and for generating reports based on the chosen data objects.
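The actual reporting service was written in NodeJS (see the stack below); purely as an illustration of the idea, here is a Python sketch of how a user's report definition (chosen data objects, columns, and filters) could be translated into a safe, organization-scoped SQL query. All object, table, and column names are hypothetical.

```python
# A minimal sketch of turning a user's report definition into SQL.
# The real service was written in NodeJS; this Python version only
# illustrates the idea. Object and column names are hypothetical.

# Whitelisted data objects and their columns stand in for the metadata
# the service kept about data objects and relations.
DATA_OBJECTS = {
    "enrollments": {"table": "report_enrollments",
                    "columns": {"campus_id", "student_id", "enrolled_at"}},
    "payments":    {"table": "report_payments",
                    "columns": {"campus_id", "amount", "paid_at"}},
}

def build_report_query(definition: dict, organization_id: int):
    """Build a parameterized query from a report definition.

    Only whitelisted objects and columns are accepted, and every query
    is scoped to the caller's organization (the security part).
    """
    obj = DATA_OBJECTS[definition["object"]]
    columns = [c for c in definition["columns"] if c in obj["columns"]]
    if not columns:
        raise ValueError("no valid columns selected")
    sql = (f"SELECT {', '.join(columns)} "
           f"FROM {obj['table']} "
           f"WHERE organization_id = %s")
    return sql, (organization_id,)

sql, params = build_report_query(
    {"object": "payments", "columns": ["campus_id", "amount"]},
    organization_id=123,
)
print(sql, params)
```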

SaaS and self-hosted technological stacks

We prepared two technological stack options for the microservice reporting system. For the SaaS solution on AWS, we used:

  • StreamSets for CDC
  • Apache Kafka as a messaging system
  • AWS S3 as a data lake
  • AWS Aurora as a report database
  • AWS ElastiCache as an in-memory data store

The reporting microservice was written in NodeJS. You can see the rough estimates for the SaaS solution in the table below.

Freshcode Tip
These are calculations for production deployment. The development process required a much smaller infrastructure.

This infrastructure was the most appropriate for the client's requirements. Its main advantage was how easily AWS services could be replaced with self-hosted solutions, which allowed us to avoid code/logic duplication for different deployment schemas.

For the on-premise solution, we used Minio, PostgreSQL, and Redis, respectively. Their APIs were fully compatible, so we had no significant problems in the microservice reporting system.
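The sketch below illustrates how such a swap can be driven purely by configuration: boto3 accepts a custom endpoint for Minio, Aurora and plain PostgreSQL share the same wire protocol, and ElastiCache (Redis engine) speaks the Redis protocol. All endpoints and credentials are placeholders, not the project's actual configuration code.

```python
import os

import boto3     # S3 / Minio
import psycopg2  # Aurora (PostgreSQL-compatible) / plain PostgreSQL
import redis     # ElastiCache (Redis engine) / plain Redis

# A sketch of the deployment-time swap: the same code talks to AWS
# services or to their self-hosted counterparts depending only on
# configuration. All endpoints and credentials are placeholders.

# S3 in SaaS mode; set S3_ENDPOINT (e.g. http://minio:9000) on-premise.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ.get("S3_ENDPOINT"),  # None -> real AWS S3
)

# Aurora exposes the PostgreSQL wire protocol, so the same DSN format
# works for plain PostgreSQL on-premise.
reports_db = psycopg2.connect(
    os.environ.get("REPORT_DB_DSN",
                   "postgresql://report:secret@localhost:5432/reporting")
)

# ElastiCache (Redis engine) and plain Redis share the same protocol.
cache = redis.Redis.from_url(
    os.environ.get("CACHE_URL", "redis://localhost:6379/0")
)
```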

[Image: microservices data analytics estimates]

The bottom line: custom reporting in microservices

Our client received the improved microservice reporting system and achieved these goals:

  • to update the app's architecture and design
  • to improve the product by adding new features
  • to optimize performance, enhance flexibility and scalability

If you are interested in solving a similar problem or face other technical challenges, contact our team. We provide free expert advice for startups, small business owners, and enterprises. Check out the Freshcode portfolio to learn about other exciting projects.

Author
Artem Barmin
Co-Founder

12 years in software development. Co-founder of an IT agency and an EdTech company.
