Monitoring Content on InfoQ
-
Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability
Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.
-
Datadog Launches Monocle, a Unified Rust-Powered Real-Time Metrics Engine
Datadog has launched Monocle, a new real-time time series storage engine written in Rust. The system unifies the company’s metrics storage infrastructure, delivering higher ingestion throughput and lower query latency while reducing operational complexity. Monocle replaces several generations of storage backends, addressing concurrency challenges and scaling limits that accumulated over time.
-
Improved Application Insights Code Optimizations Identify .NET Performance Bottlenecks Automatically
Microsoft is expanding .NET developers’ toolset with enhancements to Code Optimizations. The feature is part of the Azure Monitor offering and now works with the .NET Profiler in Application Insights to automatically detect CPU, memory, and threading issues in production apps and provide code‑level recommendations for fixing them.
-
Honeycomb Hosted MCP Brings Observability Data into the IDE
Honeycomb has launched its hosted Model Context Protocol (MCP) server, giving developers real-time access to observability data inside IDEs and AI tools such as GitHub Copilot. Available as a managed service on AWS Marketplace, it removes the need for self-hosting and streamlines debugging by surfacing traces, metrics, and logs without context-switching.
-
Grafana 12.1 Brings Built-in Diagnostics and Enhanced Alerting
Grafana 12.1 has been released, focusing on system reliability and alert management. New features include Grafana Advisor for health checks, a revamped alerting interface, and trendline transformations for data visualization. The release also improves dashboard interactivity and variable handling to help teams scale. It is available on Grafana Cloud and for self-hosted deployments.
-
Microsoft Azure Enhances Observability with OpenTelemetry Support for Logic Apps and Functions
Microsoft has expanded OpenTelemetry support in Azure Logic Apps and Functions, improving observability and interoperability across platforms. OpenTelemetry, an open-source observability framework, lets applications generate and correlate telemetry data, enabling richer diagnostics than standard telemetry alone. With streamlined configuration and integration, Azure aims to standardize observability across its cloud services.
-
Grafana 12 Launches with Observability as Code and Dynamic Dashboard Features
Grafana Labs has launched Grafana 12, bringing significant updates to its visualisation and dashboarding platform. Several key features are now generally available, including Git Sync, dynamic dashboards, a Cloud Migration assistant, and improvements to Drilldown, which provides code-free, point-and-click insights into data.
-
Prometheus 3.0 Brings New UI, OpenTelemetry Support and More
Version 3.0 of the popular open-source monitoring system Prometheus has been released, marking the tool's first major update in seven years. The release adds a variety of new features, along with improvements aimed at enhancing the user experience and streamlining workflows.
-
Distributed Tracing Tool Jaeger Releases Version 2 with OpenTelemetry at the Core
Version 2 of the Jaeger project, a leading open-source distributed tracing platform, has been released. The release represents a significant architectural transformation, bringing Jaeger and its components into the OpenTelemetry framework.
-
Stripe Rearchitects Its Observability Platform with Managed Prometheus and Grafana on AWS
Stripe replaced its observability platform, which relied on a third-party vendor solution, with a new architecture built on managed Prometheus and Grafana services on AWS. The company made the move after encountering scalability limits, reliability issues, and rising costs during its transition to microservices. The migration involved dual-writing metrics, translating assets, validation, and user training.
-
Leveraging eBPF for Improved Infrastructure Observability
To investigate performance in multi-tenant systems more effectively, Netflix has been experimenting with eBPF to instrument the Linux kernel, gathering continuous, deeper insight into how processes are scheduled and detecting "noisy neighbors".
-
Meta Optimises AI Inference by Improving Tail Utilisation
Meta (formerly Facebook) has reported substantial improvements in the efficiency and reliability of its machine-learning model serving infrastructure by focusing on optimising tail utilisation.
-
Improving Mobile Test Automation with Continuous Integration, Central Logging, and Metrics Analysis
Continuous integration can enhance automated mobile testing. Test data from multiple mobile devices running tests in parallel can be consolidated to support monitoring. Jira tickets from manual testing can trigger the build process, ensuring that testers have the correct software version for manual testing.
-
Apache SkyWalking v10: Application Performance Monitoring Tool for Distributed Systems
The Apache Software Foundation has released version 10 of Apache SkyWalking, an open-source observability platform designed to provide comprehensive monitoring, tracing, and analytics for distributed systems. The release includes many new features and enhancements...
-
Cloudflare AI Gateway Now Generally Available
Cloudflare has recently announced that AI Gateway is now generally available. Described as a unified interface for managing and scaling generative AI workloads, AI Gateway allows developers to gain visibility and control over AI applications.