How Figma Scaled Postgres for Millions of Users

Discover the strategies and challenges Figma faced to scale Postgres and achieve nearly infinite scalability.

Theo - t3․gg, June 20, 2024

This article was AI-generated based on this episode

Why did Figma need to scale Postgres?

Figma faced significant challenges with Postgres scalability due to rapid growth in their user base and data volume.

They ran into high CPU utilization, very large tables, and limits on I/O operations per second.

By 2020, Figma was running on a single Postgres database hosted on AWS's largest instance, but this setup wasn’t sustainable.

To handle millions of users and billions of rows, they needed to implement both vertical partitioning and horizontal sharding.

Vertical partitioning provided quick, incremental scaling by splitting related tables into their own databases.

However, as data continued to grow, they also needed horizontal sharding to unlock nearly infinite scalability and avoid bottlenecks.

This combination of scaling techniques was crucial to maintain performance and reliability as Figma expanded.

What is vertical partitioning in Postgres?

Vertical partitioning, in the sense used here, moves groups of related tables out of one Postgres database and into databases of their own. Each database then carries a smaller share of the data and the load.

Figma used vertical partitioning to scale quickly and incrementally. They aimed to reduce CPU utilization and manage growing table sizes.

Benefits of vertical partitioning:

  • Quick and significant scaling gains.
  • Maintains performance by isolating high-traffic tables.
  • Easy initial implementation.

Limitations of vertical partitioning:

  • Only postpones eventual bottlenecks.
  • Queries that join tables now living in different databases can no longer run as a single SQL join.
  • Cannot fully support infinite scalability.

This method gave Figma a much-needed runway but wasn't a final solution. They eventually had to adopt horizontal sharding for further growth.
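The idea can be pictured as a lookup from table name to owning database. The following is a minimal sketch under stated assumptions: the table names, database names, and helper functions are hypothetical and are not taken from Figma's schema or code.

```python
# Hypothetical table groups; the names are illustrative only.
PARTITIONS = {
    "files_db": {"files", "file_versions"},
    "orgs_db": {"organizations", "org_members"},
}

# Invert the mapping so each table resolves to exactly one database.
TABLE_TO_DB = {
    table: db for db, tables in PARTITIONS.items() for table in tables
}


def database_for(table: str) -> str:
    """Return the database that owns a table after partitioning."""
    return TABLE_TO_DB[table]


def can_join(*tables: str) -> bool:
    """A single SQL join only works if all tables share one database."""
    return len({database_for(t) for t in tables}) == 1
```

Here `can_join("files", "file_versions")` is true, while `can_join("files", "organizations")` is false, which is the cross-table limitation listed above.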

How did Figma implement horizontal sharding?

Figma's implementation of horizontal sharding was a meticulous nine-month journey. They employed several strategies to achieve this:

  1. Logical Sharding: Figma started by logically sharding tables at the application layer, which allowed them to handle query routing without altering the physical database.

  2. DBProxy Query Engine: They built a new Golang service called DBProxy. This service intercepted SQL queries and routed them dynamically to the correct physical shards. DBProxy also included load shedding and request hedging features.

  3. Use of Postgres Views: To isolate logical shards, they used Postgres views. Each view exposed one slice of a table's rows, so reads and writes could behave as if the table were already physically sharded, without moving any data. This setup allowed gradual, feature-flagged rollouts that reduced the risk of the change.

These strategies ensured that Figma managed to shard their highest-traffic tables with minimal downtime and error.
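To make the routing step concrete, here is a minimal sketch of how a shard key might be resolved first to a logical shard and then to a physical database. It assumes a hash-based scheme; the shard count, the `pg-0` and `pg-1` database names, and both function names are hypothetical, and this is not DBProxy's actual logic.

```python
import hashlib

NUM_LOGICAL_SHARDS = 8  # assumed value for illustration

# Assumed mapping from logical shard to physical database.
# Four logical shards share each physical database in this sketch.
LOGICAL_TO_PHYSICAL = {
    shard: f"pg-{shard // 4}" for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(shard_key: str) -> int:
    """Hash the shard key to a stable logical shard number."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def route(shard_key: str) -> str:
    """Resolve a shard key to the physical database that holds it."""
    return LOGICAL_TO_PHYSICAL[logical_shard(shard_key)]
```

Because the application only ever computes a logical shard, moving that shard to another physical database would mean changing one entry in `LOGICAL_TO_PHYSICAL`. The keys themselves never need to be rehashed.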

What were the unique challenges Figma faced?

Figma encountered several specific challenges while scaling Postgres.

Maintaining data consistency was crucial. With data split across multiple shards, ensuring every piece of data was correctly placed and accessible was complex.

"Our highest write tables were growing so quickly that we would soon exceed the maximum IO operations per second supported by Amazon's RDS."

Handling cross-shard transactions was another major hurdle. A transaction had to remain atomic even when it touched rows on more than one shard.
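The sketch below illustrates the failure mode. It uses two in-memory SQLite databases from Python's standard library as stand-ins for two Postgres shards. The `items` table and the `write_to_both` helper are hypothetical. Each database commits independently, so a failure on the second one leaves the first already committed.

```python
import sqlite3


def make_shard() -> sqlite3.Connection:
    """Create an in-memory database standing in for one physical shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()
    return conn


shard_a = make_shard()
shard_b = make_shard()

# Shard B already holds id 1, so a second insert of id 1 there will fail.
shard_b.execute("INSERT INTO items VALUES (1, 'existing')")
shard_b.commit()


def write_to_both(item_id: int, name: str) -> None:
    """Naive cross-shard write: each shard commits on its own."""
    shard_a.execute("INSERT INTO items VALUES (?, ?)", (item_id, name))
    shard_a.commit()  # shard A is now durable, whatever happens next
    shard_b.execute("INSERT INTO items VALUES (?, ?)", (item_id, name))
    shard_b.commit()


try:
    write_to_both(1, "new")
except sqlite3.IntegrityError:
    # Shard B is unchanged, but shard A's commit cannot be undone here.
    shard_b.rollback()

rows_a = shard_a.execute("SELECT id, name FROM items").fetchall()
rows_b = shard_b.execute("SELECT id, name FROM items").fetchall()
```

After this runs, shard A contains the new row and shard B does not. Inside a single database, one transaction would have rolled back both writes together.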

"Transactions now span multiple shards, meaning Postgres can no longer be used to enforce transactionality."

Ensuring minimal developer impact was essential. Figma's goal was to keep application developers focused on new features, not database issues.

"We wanted to handle the majority of our complex relational data models supported by our application. Application devs could then focus on building exciting new features in Figma."

These challenges required innovative solutions like logical sharding and custom query engines to achieve scalability while maintaining performance and reliability.

What tools and techniques did Figma use?

Figma employed various tools and techniques to achieve Postgres scalability with minimal downtime:

  • DBProxy: A Golang service that intercepts SQL queries and dynamically routes them to the correct shards.

    • Features:
      • Load shedding
      • Request hedging
  • Shadow Application Readiness: Assesses how live production traffic would behave under different sharding keys.

    • Benefits:
      • Predicts necessary application logic changes
      • Ensures readiness for horizontal sharding
  • Logical Sharding: Separates sharding logic from physical sharding.

    • Advantages:
      • Low-risk, percentage-based rollout
      • Easy rollback if issues arise
  • Postgres Views: Simulate physical sharding by defining multiple views per table, each covering one slice of the rows.

    • Perks:
      • Gradual, feature-flagged rollouts
      • De-risking logical sharding before physical changes

These tools and techniques enabled Figma to manage scalability challenges efficiently.
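As an illustration of the views technique, the sketch below defines one view per logical shard over a single unsharded table and reads through either the view or the base table depending on a flag. It is a sketch under stated assumptions: SQLite stands in for Postgres so the example runs on its own, `id % NUM_VIEWS` stands in for a real hash of the shard key, and the table, view, and function names are hypothetical.

```python
import sqlite3

NUM_VIEWS = 4  # assumed number of logical shards

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(i, f"file-{i}") for i in range(1, 101)],
)

# One view per logical shard. No rows are copied or moved; each view
# only filters the base table down to its own slice.
for shard in range(NUM_VIEWS):
    conn.execute(
        f"CREATE VIEW files_shard{shard} AS "
        f"SELECT * FROM files WHERE id % {NUM_VIEWS} = {shard}"
    )


def view_for(file_id: int) -> str:
    """Name of the view that covers a given file id."""
    return f"files_shard{file_id % NUM_VIEWS}"


def fetch_name(file_id: int, use_views: bool):
    """Read through the shard view or the base table, as a flag decides."""
    source = view_for(file_id) if use_views else "files"
    row = conn.execute(
        f"SELECT name FROM {source} WHERE id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None
```

The `use_views` argument plays the role of the feature flag. Both paths return the same row when routing is correct. A query sent to the wrong view returns nothing, which exposes routing bugs before any data is physically moved.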

What are the future steps for Figma's database scaling?

Figma has identified several next steps for its database scaling work.

They need to support schema updates on horizontally sharded tables, where a change has to be applied consistently across every shard.

Another critical step is generating globally unique IDs for horizontally sharded primary keys.
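One common way to approach this, shown here only as a sketch and not as Figma's design, is to embed the shard number in the low bits of each ID and fill the remaining bits from a counter local to that shard. The bit width, class name, and helper below are assumptions. In a real system the counter would be something durable, such as a per-shard database sequence, rather than an in-process counter.

```python
import itertools

SHARD_BITS = 10  # assumed: room for 1024 shards
SHARD_MASK = (1 << SHARD_BITS) - 1


class ShardIdGenerator:
    """Hypothetical generator: local counter in high bits, shard id in low bits."""

    def __init__(self, shard_id: int) -> None:
        if not 0 <= shard_id <= SHARD_MASK:
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._local = itertools.count(1)  # stand-in for a durable sequence

    def next_id(self) -> int:
        """Return an ID that no other shard's generator can produce."""
        return (next(self._local) << SHARD_BITS) | self.shard_id


def shard_of(global_id: int) -> int:
    """Recover the shard number from an ID."""
    return global_id & SHARD_MASK
```

Two shards can hand out IDs independently without coordination, since their low bits always differ, and `shard_of` lets a router locate a row from its primary key alone.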

Cross-shard transactions are essential for business-critical use cases. Figma aims to ensure atomicity in these transactions.

Developing distributed globally unique indexes is also on their agenda. Currently, they can only ensure uniqueness within a shard.

Additionally, they plan to create an ORM model that increases developer velocity and is compatible with horizontal sharding.

Lastly, they aim to automate re-shard operations so that resharding can be triggered with the click of a button.

These steps are crucial for maintaining Figma's performance and reliability as they continue to grow.

Anticipated challenges include maintaining data consistency and minimizing downtime. They may also reevaluate their current approach, considering open-source or managed solutions.
