Discover the strategies and challenges Figma faced to scale Postgres and achieve nearly infinite scalability.
Theo - t3․gg, June 20, 2024. This article was AI-generated based on this episode.
Figma faced significant challenges with Postgres scalability due to rapid growth in their user base and data volume.
They experienced high CPU utilization, large table sizes, and I/O operation limits.
By 2020, Figma was running on a single Postgres database hosted on AWS's largest instances, but this setup wasn’t sustainable.
To handle millions of users and billions of rows, they needed both vertical partitioning and horizontal sharding.
Vertical partitioning provided quick, incremental scaling by splitting related tables into their own databases.
However, as data continued to grow, they also needed horizontal sharding to unlock nearly infinite scalability and avoid bottlenecks.
This combination of scaling techniques was crucial to maintain performance and reliability as Figma expanded.
Vertical partitioning in Postgres divides related tables into separate databases. This method helps manage large datasets more efficiently.
Figma used vertical partitioning to scale quickly and incrementally. They aimed to reduce CPU utilization and manage growing table sizes.
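Benefits of vertical partitioning: it could be rolled out quickly and incrementally, and moving groups of related tables into their own databases lowered CPU utilization on each instance.
Limitations of vertical partitioning: it cannot split a single table, so the largest, highest-write tables kept growing toward the limits of one database.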
This method gave Figma a much-needed runway but wasn't a final solution. They eventually had to adopt horizontal sharding for further growth.
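As a rough illustration of vertical partitioning, the sketch below routes each table to the database that owns it. The table and database names are hypothetical and are not taken from Figma's setup; the point is only that whole tables move, while any single table still lives in exactly one database.

```python
# Hypothetical table-to-database map; names are illustrative, not Figma's.
TABLE_TO_DATABASE = {
    "files": "files_db",
    "file_comments": "files_db",
    "organizations": "orgs_db",
    "organization_members": "orgs_db",
}
DEFAULT_DATABASE = "main_db"  # assumed home for tables that were not moved


def database_for(table: str) -> str:
    """Return the database that holds `table` after vertical partitioning."""
    return TABLE_TO_DATABASE.get(table, DEFAULT_DATABASE)


def can_join_locally(tables: list[str]) -> bool:
    """A join or transaction stays inside one Postgres only if
    every table involved lives in the same database."""
    return len({database_for(t) for t in tables}) == 1
```

Under these assumptions, `can_join_locally(["files", "file_comments"])` is true, while a query touching both `files` and `organizations` would have to be handled across two databases. No entry in the map can make the `files` table itself smaller, which is the limitation that horizontal sharding addresses.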
Figma's implementation of horizontal sharding was a meticulous nine-month journey. They employed several strategies to achieve this:
Logical Sharding: Figma started by logically sharding tables at the application layer, which allowed them to handle query routing without altering the physical database.
DBProxy Query Engine: They built a new Golang service called DBProxy. This service intercepted SQL queries and routed them dynamically to the correct physical shards. DBProxy also included load shedding and request hedging features.
Use of Postgres Views: To isolate logical shards, they used Postgres views. Each view exposed only the rows belonging to one logical shard, which let them simulate physical sharding without immediate risk. This setup enabled gradual, feature-flagged rollouts to de-risk the change.
These strategies ensured that Figma managed to shard their highest-traffic tables with minimal downtime and error.
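The relationship between logical shards, physical databases, and per-shard views can be sketched as follows. This is a minimal sketch under stated assumptions: the shard count, the hash function, the `db-a`/`db-b` names, and the SQL function `shard_hash` are all hypothetical, and the SQL function is assumed to compute the same value in the database as the Python function does in the application.

```python
import hashlib

NUM_LOGICAL_SHARDS = 4   # assumed; a real deployment would likely use more
HASH_SPACE = 2 ** 16     # hash values fall in [0, 65536)

# Several logical shards may share one physical database (hypothetical names).
LOGICAL_TO_PHYSICAL = {0: "db-a", 1: "db-a", 2: "db-b", 3: "db-b"}


def shard_hash(key: str) -> int:
    """Deterministic hash of the shard key into [0, HASH_SPACE)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def logical_shard(key: str) -> int:
    """Map a shard key to a logical shard by splitting the hash space evenly."""
    return shard_hash(key) * NUM_LOGICAL_SHARDS // HASH_SPACE


def physical_database(key: str) -> str:
    """Resolve the physical database that currently hosts the key's logical shard."""
    return LOGICAL_TO_PHYSICAL[logical_shard(key)]


def view_ddl(table: str, shard_key: str, shard: int) -> str:
    """Build a CREATE VIEW statement exposing one logical shard's rows.
    `shard_hash` here is a hypothetical SQL function, not a Postgres built-in."""
    lo = shard * HASH_SPACE // NUM_LOGICAL_SHARDS
    hi = (shard + 1) * HASH_SPACE // NUM_LOGICAL_SHARDS
    return (
        f"CREATE VIEW {table}_shard{shard} AS "
        f"SELECT * FROM {table} "
        f"WHERE shard_hash({shard_key}) >= {lo} AND shard_hash({shard_key}) < {hi};"
    )
```

Because the application only ever talks to logical shards, the views can live on a single unsharded database at first. Moving a logical shard later means changing one entry in `LOGICAL_TO_PHYSICAL`, not rewriting the routing logic.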
Figma encountered several specific challenges while scaling Postgres.
Maintaining data consistency was crucial. With data split across multiple shards, ensuring every piece of data was correctly placed and accessible was complex.
"Our highest write tables were growing so quickly that we would soon exceed the maximum IO operations per second supported by Amazon's RDS."
Handling cross-shard transactions was another major hurdle. Transactions needed to be consistent even when involving multiple shards.
"Transactions now span multiple shards, meaning Postgres can no longer be used to enforce transactionality."
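To make the problem concrete, the sketch below groups the rows a transaction wants to write by shard. It assumes integer shard keys and a simple modulo placement, which is a simplification chosen for readability and not a description of Figma's scheme.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count


def shard_for(key: int) -> int:
    """Toy placement rule: integer shard key modulo the shard count."""
    return key % NUM_SHARDS


def group_by_shard(keys: list[int]) -> dict[int, list[int]]:
    """Group the shard keys touched by one transaction by their shard."""
    groups: dict[int, list[int]] = defaultdict(list)
    for key in keys:
        groups[shard_for(key)].append(key)
    return dict(groups)


def is_single_shard(keys: list[int]) -> bool:
    """True if one ordinary Postgres transaction can cover every write."""
    return len(group_by_shard(keys)) <= 1
```

With these assumptions, writes to keys 2 and 6 land on the same shard and can share one Postgres transaction. Writes to keys 2 and 3 land on different shards, so atomicity has to be provided by something outside any single Postgres instance.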
Ensuring minimal developer impact was essential. Figma's goal was to keep application developers focused on new features, not database issues.
"We wanted to handle the majority of our complex relational data models supported by our application. Application devs could then focus on building exciting new features in Figma."
These challenges required innovative solutions like logical sharding and custom query engines to achieve scalability while maintaining performance and reliability.
Figma employed various tools and techniques to achieve Postgres scalability with minimal downtime:
DBProxy: A Golang service that intercepts SQL queries and dynamically routes them to the correct shards.
Shadow Application Readiness: Assesses how live production traffic would behave under different sharding keys.
Logical Sharding: Separates sharding logic from physical sharding.
Postgres Views: Simulate physical sharding by creating multiple views per table.
These tools and techniques enabled Figma to manage scalability challenges efficiently.
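The routing and readiness ideas above can be sketched in a few lines. This is not DBProxy's implementation: queries are represented as plain dictionaries instead of parsed SQL, and the shard key column, shard count, and modulo placement are assumptions made for the example.

```python
NUM_SHARDS = 4                      # assumed shard count
SHARD_KEYS = {"files": "org_id"}    # hypothetical shard key per table


def plan(query: dict) -> list[int]:
    """Return the shards a query must visit.

    A query that filters on the table's shard key goes to one shard;
    anything else falls back to scatter-gather across all shards."""
    key_column = SHARD_KEYS.get(query["table"])
    filters = query.get("filters", {})
    if key_column is not None and key_column in filters:
        return [filters[key_column] % NUM_SHARDS]
    return list(range(NUM_SHARDS))


def single_shard_ratio(queries: list[dict], candidate_key: str) -> float:
    """Readiness-style check: the fraction of observed queries that filter
    on `candidate_key` and could therefore be routed to a single shard."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if candidate_key in q.get("filters", {}))
    return hits / len(queries)
```

Running `single_shard_ratio` over a sample of production queries for several candidate columns gives a rough sense of which sharding key would leave the fewest queries needing scatter-gather.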
Figma plans to further improve their database scalability with several future steps.
They need to implement horizontally sharded schema updates, which can be complex.
Another critical step is generating globally unique IDs for horizontally sharded primary keys.
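One widely used way to get IDs that are unique across shards is to pack a timestamp, a shard number, and a per-shard sequence into a single integer. The source does not say which scheme Figma intends to use, so the sketch below is only an example of the general idea, and the bit widths are arbitrary assumptions.

```python
SHARD_BITS = 10      # assumed: room for 1024 shards
SEQUENCE_BITS = 12   # assumed: 4096 IDs per shard per millisecond


def make_id(timestamp_ms: int, shard_id: int, sequence: int) -> int:
    """Pack (timestamp, shard, sequence) into one integer.

    Two shards can never produce the same ID because the shard number
    is part of the ID itself."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | sequence
    )


def shard_of(generated_id: int) -> int:
    """Recover the shard number embedded in an ID."""
    return (generated_id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
```

A side benefit of this layout is that the owning shard can be read back from the ID with `shard_of`, without a lookup table.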
Cross-shard transactions are essential for business-critical use cases. Figma aims to ensure atomicity in these transactions.
Developing distributed globally unique indexes is also on their agenda. Currently, they can only ensure uniqueness within a shard.
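The gap between per-shard and global uniqueness can be shown with a toy model. Here each shard keeps its own unique index on `email`, while rows are placed by a different column, `org_id`. Both column names and the modulo placement are hypothetical.

```python
NUM_SHARDS = 2  # assumed shard count


def new_shards() -> list[dict]:
    """One per-shard unique index, modeled as a dict of email -> org_id."""
    return [dict() for _ in range(NUM_SHARDS)]


def insert_user(shards: list[dict], org_id: int, email: str) -> bool:
    """Insert into the shard chosen by org_id, enforcing uniqueness of
    email only within that shard. Returns False on a local conflict."""
    index = shards[org_id % NUM_SHARDS]
    if email in index:
        return False
    index[email] = org_id
    return True


def global_duplicates(shards: list[dict]) -> set[str]:
    """Emails that appear on more than one shard despite the local indexes."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for index in shards:
        for email in index:
            if email in seen:
                duplicates.add(email)
            seen.add(email)
    return duplicates
```

In this model, a second insert of the same email is rejected when it lands on the same shard but accepted when it lands on another one, which is why a globally unique index needs coordination beyond what each shard can check by itself.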
Additionally, they plan to create an ORM model that increases developer velocity and is compatible with horizontal sharding.
Lastly, they aim to automate re-shard operations so that a re-shard can be triggered with the click of a button.
These steps are crucial for maintaining Figma's performance and reliability as they continue to grow.
Anticipated challenges include maintaining data consistency and minimizing downtime. They may also reevaluate their current approach, considering open-source or managed solutions.