/ Shayon Mukherjee / blog

Do you really need Foreign Keys?

December 21, 2023
~6 mins

Before we dive in, let’s set the stage with a few pointers:

Foreign keys are a bit like that well-intentioned friend who insists on double-checking everything you do. They’re often recommended as a must-have for enforcing referential integrity checks in your database. But let’s chat about why you might not always need to invite this friend over, especially if the only reason is “well, everyone else does it.” Sure, foreign keys are good at keeping things in line, but I’d argue that you don’t always need them.

Performance Impact During Writes and Deletes

Foreign keys add overhead to every insert/update/delete operation. Every time a row is added, updated or deleted, PostgreSQL must check the foreign key constraints to ensure referential integrity. This process can slow down write operations, especially in systems with high transaction volumes. For example, if you’re inserting rows in the order of thousands into a table that has multiple foreign key constraints, PostgreSQL has to validate each of those rows against the corresponding foreign key constraints, which can lead to noticeable performance degradation.

Let’s walk through a simple scenario. Imagine you’ve set up your tables using this benchmark setup, which creates projects, employees, and employee_projects tables, all prepped with some dummy data. Now, observe the difference in delete performance with and without foreign keys:

Delete data with foreign key (Time: 490.068 ms)

--- Delete data with foreign key
BEGIN;
DELETE FROM employee_projects
  USING projects
  WHERE employee_projects.project_id = projects.id
    AND projects.name = 'special_task';

DELETE FROM projects
  WHERE projects.name = 'special_task';
COMMIT;

-- Time: 490.068 ms

Deleting data without foreign keys (Time: 96ms)

--- Delete data without foreign key
BEGIN;

ALTER TABLE employee_projects
  DROP CONSTRAINT employee_projects_project_id_fkey;
COMMIT;

-- Reanalyze tables
ANALYZE employee_projects;
ANALYZE projects;
ANALYZE employees;

BEGIN;

DELETE FROM employee_projects
  USING projects
  WHERE employee_projects.project_id = projects.id
    AND projects.name = 'special_task';

DELETE FROM projects
  WHERE projects.name = 'special_task';

COMMIT;

-- Time: 96 ms

Now, as you can see dropping the constraint made the operations up to 5x faster.

That said, if you still care about ensuring the relationship between the the tables are intact, but say just during writes. You can ensure them from a before_create hook in a Ruby on Rails application or equivalent in the framework of your choice. You can also look into DEFERRED constraints in PostgreSQL. However, I’d argue that if you are finding yourself using that every so often, might as well drop the Foreign Keys and explore other ways for validating references. Like a nightly job or something.

Unnecessary Lock Contention

Foreign key constraints can lead to lock contention or timeouts in a busy system. When a record in a referenced table is updated or deleted, PostgreSQL must wait for any locks on the corresponding rows in the referencing table to be released to validate the integrity. In high-traffic applications, these locks can lead to further contention, timeouts, and/or failed transactions, especially if there are long-running queries or updates happening simultaneously. This behavior can further degrade the performance of your application, leading to a poor user experience.

Complexity in Database Migrations and Scaling

As your application grows and your database schema evolves, you might find yourself needing to perform migrations — changes to the database structure — to support new features or improve performance. Foreign keys can add complexity to this process. For instance, when splitting a large table into multiple smaller ones (a common scaling technique), foreign keys can require additional steps to ensure referential integrity is maintained throughout the migration process, increasing the risk of errors and downtime.

A common way to handle this would be to drop or defer the constraint until the end of the migration and then introduce a CHECK CONSTRAINT. But I’d say if you are willing to accept some escaped writes w/o the validation, perhaps time to consider alternative design patterns.

Unwarranted indexes

When you introduce foreign keys to ensure referential integrity, they have a more direct impact on write operations than on reads - PostgreSQL must check that these operations don’t violate the referential integrity. Without an index on the foreign key column, this check can be slow because PostgreSQL might need to perform a sequential scan on the referenced table to ensure no referenced rows are left orphaned. This scan significantly slows down updates and deletes on large tables.

To speed up these referential integrity checks, you might add an index on the foreign key column. While this index improves the performance of updates and deletes by enabling quicker lookups, it also introduces additional overhead. Every insert, update, or delete on the referencing table now requires updating the index, which consumes more I/O and storage, and can slow down these operations.

In scenarios where you added a foreign key to a column that otherwise wouldn’t need indexing, you now face two compounding issues. First, the foreign key itself can slow down writes due to the need for referential integrity checks. Second, the index you added to mitigate this slowdown further impacts write performance and consumes additional resources.

While unlikely (in my experience) this type of setup only gets more aggravated with partitioned tables where each table with the Foreign keys require its own indexing. This can increase the time it takes to insert or update data because the database must check the foreign key constraints against potentially many partitions.

Practicing reflection

As you start on designing a new project, I think it’s perfectly fine to initially incorporate foreign keys. Yet, embracing a reflective approach is key I’d say. Plan to revisit this decision a few months later to evaluate their usefulness. This period of reflection allows you to assess whether the foreign keys are genuinely beneficial or if they’re adding unnecessary complexity, ensuring your design remains as effective and streamlined as possible.

Conclusion

All that said, I hope you found the post interesting and as it’s quite clear there is no one-size-fits-all solution. While Foreign Keys are great at upholding data integrity, their impact on performance, migrations, and system complexity is hard to ignore. So, take a moment to ponder your database schema. Are all those constraints pulling their weight, or could shedding a few lighten your system’s load?

last modified December 21, 2023