Jose Lasa’s Post


Chief Data Officer - Chief Technology Officer - Chief Information Officer - Software Engineering - Software Development - Artificial Intelligence

How can a highly scalable, fault-tolerant system be designed to support both high-throughput ingestion and interactive query latencies?

A Meta team designed Logarithm, a logging engine for AI training workflows and services. It is a hosted, serverless, multitenant service, used only internally at Meta, that consumes and indexes these logs and provides an interactive query interface to retrieve and view them. It provides strong guarantees on availability, durability, freshness, completeness, and query latency.

At a high level, Logarithm comprises the following components:

- Application processes emit logs using logging APIs. The APIs support emitting unstructured log lines along with typed metadata key-value pairs (per line).
- A host-side agent discovers the format of lines and parses lines for common fields, such as timestamp, severity, process ID, and callsite.
- The resulting object is buffered and written to a distributed queue (for that log stream), which provides durability guarantees with days of object lifetime.
- Ingestion clusters read objects from queues and support additional parsing based on user-defined regex extraction rules; the extracted key-value pairs are written to the line's metadata.
- Query clusters support interactive and bulk queries on one or more log streams with predicate filters on log text and metadata.

The Logarithm design has centered on simplicity to support its scalability guarantees. The team continuously builds domain-specific and domain-agnostic log analytics capabilities within or layered on Logarithm, with appropriate pushdowns for performance optimizations. They also invest in storage and query-time improvements, such as lightweight disaggregated inverted indices for text search, storage layouts optimized for queries, and distributed debugging UI primitives for AI systems.

Complete post: https://lnkd.in/dtnTN-jb
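To make the flow above concrete, here is a minimal single-process sketch of three of the stages: host-side parsing of common fields, regex-based extraction at ingestion, and a predicate query over text and metadata. It is not Logarithm's API, which is internal to Meta. The line format, `LogRecord`, `parse_line`, `apply_extraction_rules`, and `query` are all hypothetical names chosen for illustration, and the distributed queue is replaced by a plain Python list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical line format (an assumption, not Logarithm's):
#   "<timestamp> <SEVERITY> <pid> <file:line> <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<pid>\d+)\s+"
    r"(?P<callsite>\S+:\d+)\s+(?P<text>.*)$"
)


@dataclass
class LogRecord:
    text: str
    metadata: dict = field(default_factory=dict)


def parse_line(raw):
    """Host-side step: pull common fields out of a raw line."""
    m = LINE_RE.match(raw)
    if m is None:
        # Unrecognized format: keep the line as unstructured text.
        return LogRecord(text=raw)
    fields = m.groupdict()
    text = fields.pop("text")
    fields["pid"] = int(fields["pid"])
    return LogRecord(text=text, metadata=fields)


def apply_extraction_rules(record, rules):
    """Ingestion step: named groups of each matching rule become metadata."""
    for rule in rules:
        m = rule.search(record.text)
        if m:
            record.metadata.update(m.groupdict())
    return record


def query(records, text_contains=None, **metadata_equals):
    """Query step: filter on a text substring and metadata equality."""
    out = []
    for r in records:
        if text_contains is not None and text_contains not in r.text:
            continue
        if all(r.metadata.get(k) == v for k, v in metadata_equals.items()):
            out.append(r)
    return out


raw_lines = [
    "2024-03-18T10:00:00Z INFO 4242 trainer.py:88 step=100 loss=0.52",
    "2024-03-18T10:00:05Z ERROR 4242 trainer.py:131 rank=3 NCCL timeout",
    "not a structured line",
]
# User-defined extraction rules (hypothetical examples).
rules = [re.compile(r"rank=(?P<rank>\d+)"), re.compile(r"step=(?P<step>\d+)")]

records = [apply_extraction_rules(parse_line(line), rules) for line in raw_lines]
errors = query(records, severity="ERROR")
```

In this sketch, `query(records, severity="ERROR")` selects only the second line, whose metadata also carries the `rank` value pulled out by the extraction rule. A real system would evaluate such predicates against indexed storage rather than scanning a list.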
