The Pitfalls of Polymorphism

November 28, 2021

In Rails, a polymorphic association allows a model to be connected to other models in a special way. It uses a polymorphic interface where the combination of a type column and an id column point to a row in another table.

For example, say I have an app with these models:

  • PhotoAlbum - Has many images
  • UserProfile - Has one image

And I want to be able to report on the number of photo album images created at certain points in time.

I create 3 tables:

create_table :images do |t|
  t.bigint :imageable_id, null: false
  t.string :imageable_type, null: false
  t.timestamps

  t.index [:imageable_type, :imageable_id]
  t.index :created_at
end

create_table :photo_albums do |t|
  t.timestamps
end

create_table :user_profiles do |t|
  t.timestamps
end

And I set up the models:

class Image < ApplicationRecord
  belongs_to :imageable, polymorphic: true
end

class PhotoAlbum < ApplicationRecord
  has_many :images, as: :imageable
end

class UserProfile < ApplicationRecord
  has_one :image, as: :imageable
end

At first glance, polymorphic relationships seem like a really clever idea. I only need to create a single images table. Then whenever I want to connect images to another model, I don't have to do anything in the database. I can just define a new polymorphic association on the model.

But there are a few pitfalls to this design pattern:

1. Referential Integrity Isn't Enforced

It isn't possible to define a foreign key constraint for the imageable_id column. That column can either point to a row in the photo_albums table or it can point to a row in the user_profiles table. Because we can't define a foreign key constraint, we run the risk of ending up with invalid data into the column at some point.

2. Unique Constraints Can't Differ

It's a good idea to set up a unique index in the database for has_one relationships because methods like find_each can fail when there are duplicates. But it isn't possible to add a unique index for the has_one on UserProfile without breaking the has_many on PhotoAlbum.

3. Class Names Affect Storage

Class names are stored in the imageable_type column as a VARCHAR:

SELECT id, imageable_id, imageable_type
FROM images;
+----+--------------+----------------+
| id | imageable_id | imageable_type |
+----+--------------+----------------+
|  1 |            1 | PhotoAlbum     |
|  2 |            1 | UserProfile    |
+----+--------------+----------------+

This means that longer class names will take up more data and indexing space than short class names.

4. Sharing Records Isn't Possible

Say that I want a way for a PhotoAlbum and a UserProfile to point to the same Image. This isn't currently possible because each Image is anchored to a single type and id combination.

5. The Database Can Make Strange Performance Decisions

Sometimes, presenting too many possible indexes to MySQL can cause it to make poor choices. The dual-key index associated with a polymorphic relationship can often get in the way.

Say that I have the following totals in my data:

  • Total Images: 50,000
  • PhotoAlbum Images: 25,000 (50%)
  • Images Created Before 2021-11-05 (12%)

When I run a query prior to 2021-11-04, MySQL is able to figure out that the created_at index is best:

EXPLAIN SELECT COUNT(*)
FROM images
WHERE imageable_type = 'PhotoAlbum'
  AND created_at < '2021-11-04';
+----+-------------+--------+------------+-------+----------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+
| id | select_type | table  | partitions | type  | possible_keys                                                              | key                        | key_len | ref  | rows | filtered | Extra                              |
+----+-------------+--------+------------+-------+----------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+
|  1 | SIMPLE      | images | NULL       | range | index_images_on_imageable_type_and_imageable_id,index_images_on_created_at | index_images_on_created_at | 8       | NULL | 4500 |    50.00 | Using index condition; Using where |
+----+-------------+--------+------------+-------+----------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+

But if I advance one more day, it will choose the imageable_type and imageable_id index, scanning 50% of the table rather than 12% of it:

EXPLAIN SELECT COUNT(*)
FROM images
WHERE imageable_type = 'PhotoAlbum'
  AND created_at < '2021-11-05';
+----+-------------+--------+------------+------+----------------------------------------------------------------------------+-------------------------------------------------+---------+-------+-------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys                                                              | key                                             | key_len | ref   | rows  | filtered | Extra       |
+----+-------------+--------+------------+------+----------------------------------------------------------------------------+-------------------------------------------------+---------+-------+-------+----------+-------------+
|  1 | SIMPLE      | images | NULL       | ref  | index_images_on_imageable_type_and_imageable_id,index_images_on_created_at | index_images_on_imageable_type_and_imageable_id | 768     | const | 25048 |    11.98 | Using where |
+----+-------------+--------+------------+------+----------------------------------------------------------------------------+-------------------------------------------------+---------+-------+-------+----------+-------------+

This can cause noticeable slowdowns on larger datasets.

So what would be an alternative approach?

One possible option is to use a shared images table but explicitly define 2 join tables that have different indexing and constraints:

create_table :images do |t|
  t.timestamps
end

create_table :photo_albums do |t|
  t.timestamps
end

create_table :photo_album_images do |t|
  t.bigint :photo_album_id, null: false
  t.bigint :image_id, null: false
  t.timestamps

  t.index [:photo_album_id, :image_id], unique: true
  t.index :image_id
  t.index :created_at
end
add_foreign_key :photo_album_images, :photo_albums
add_foreign_key :photo_album_images, :images

create_table :user_profiles do |t|
  t.timestamps
end

create_table :user_profile_images do |t|
  t.bigint :user_profile_id, null: false
  t.bigint :image_id, null: false
  t.timestamps

  t.index :user_profile_id, unique: true
  t.index :image_id
end
add_foreign_key :user_profile_images, :user_profiles
add_foreign_key :user_profile_images, :images

Then my models would use normal associations:

class Image < ApplicationRecord
end

class PhotoAlbumImage < ApplicationRecord
  belongs_to :image
  belongs_to :photo_album
end

class PhotoAlbum < ApplicationRecord
  has_many :photo_album_images
  has_many :images, through: :photo_album_images
end

class UserProfileImage < ApplicationRecord
  belongs_to :user_profile
  belongs_to :image
end

class UserProfile < ApplicationRecord
  has_one :user_profile_image
  has_one :image, through: :user_profile_image
end

References