InfoQ Homepage Infrastructure Content on InfoQ
-
Disaggregation in Large Language Models: the Next Evolution in AI Infrastructure
Large Language Model (LLM) inference faces a fundamental challenge: the same hardware that excels at processing input prompts struggles with generating responses, and vice versa. Disaggregated serving architectures solve this by separating these distinct computational phases, delivering throughput improvements and better resource utilization while reducing costs.
-
Ransomware-Resilient Storage: the New Frontline Defense in a High-Stakes Cyber Battle
Cybersecurity has evolved, with ransomware now primarily targeting data storage and backups. To combat this, modern defense strategies focus on making storage systems more resilient. Key tactics include using immutable storage that prevents data from being altered or deleted, employing AI-powered detection, and implementing air-gapping to create isolated, tamper-proof recovery points.
-
Zero-Downtime Critical Cloud Infrastructure Upgrades at Scale
Engineers can avoid common pitfalls in large-scale infrastructure upgrades by studying others' experiences. The article provides lessons learned from big firms like eBay and Snowflake, offering solutions for legacy systems, performance validation, and rollback planning. It emphasizes systematic preparation and clear communication to handle challenges and ensure zero-downtime upgrades at scale.
-
One Network: Cloud-Agnostic Service and Policy-Oriented Network Architecture
Bringing together software infrastructure leads to faster development time and easy control of large, spread-out systems through clear rules. In this QCon SF 2024 presentation, Anna Berenberg shared learnings and achievements when building One Network, addressing complex infrastructure layers, open-source integration, and uniform policy enforcement for improved reliability and security.
-
Ceph RBD Turns 15: a Story of Open Source Creation
Fifteen years ago, Ceph RBD began as a community-driven idea that grew into essential infrastructure powering today's cloud platforms. This insider story from Yehuda Sadeh-Weinraub reveals how two developers started a distributed storage that now supports OpenStack and Kubernetes through transparent, collaborative development.
-
Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies
Proficient in analyzing the dynamics of Apache Kafka Stretch Clusters, I assess WAN disruptions and devise effective Disaster Recovery (DR) strategies. With deep expertise, I ensure high availability and data integrity across multi-region deployments. My insights optimize operational resilience, safeguarding vital services against service level agreement violations.
-
Designing Resilient Event-Driven Systems at Scale
Learn how to design resilient event-driven systems that scale. Explore key patterns like shuffle sharding and decoupling queues to handle load spikes and failures. Understand common pitfalls like over-relying on retries and neglecting observability for robust, scalable architectures.
-
Binary Size Matters: the Challenges of Fitting Complex Applications in Storage-Constrained Devices
This article explores developing software for microcontrollers in C or C++, where constraints are the limited amount of volatile memory and the embedded hardware platform on which the software runs. It shows how to adopt languages like C++ while optimizing for binary size due to stringent hardware constraints, and trade off between runtime efficiency and binary size in architecture decisions.
-
Legacy Modernization: Architecting Real-Time Systems around a Mainframe
At its heart, our transformation journey is about breaking dependencies at multiple levels. Many enterprises face similar challenges with legacy systems: tightly coupled architectures that are difficult to scale, change, or maintain. For us at National Grid, the solution came through four complementary paradigms that worked together to enable different forms of decoupling.
-
How to Compute without Looking: a Sneak Peek into Secure Multi-Party Computation
This article shows how you can compute a function across multiple parties that do not trust each other without forcing them to share their individual inputs. This technique can be used to split secrets among parties, perform logical operations, or count votes in a way that ensures data privacy is preserved.
-
Eclipse LMOS: Launching AI Agents across Europe at Breakneck Speed
In this talk, the authors share some of our company’s key learnings in developing customer-facing LLM-powered applications deployed across Europe. They used multi-agent architecture and systems design to create an open-source set of tools, a framework, and a full-fledged platform to accelerate the development of AI agents. This is a summary of a presentation from InfoQ Dev Summit Boston 2024.
-
Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture
Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.