Relational Database Table Structure RFC

Published On
Payload Relational Database Support - Drizzle ORM
Payload Relational Database Support - Drizzle ORM
We've made a lot of progress toward relational database support, and we'd like to take a second and showcase our current approach and gather feedback from our community.

If you've been following Payload over the past few months, you'll know that we're actively working toward adding additional database support. The team has been making excellent progress toward this goal, and most of the hard work is already complete.

We've recently streamlined the abstractions within our codebase to rely on an "adapter-style" database architecture, wherein additional database adapters can be built and supported over time both by Payload and our community.

The database adapter pattern itself is now done and functional using our existing MongoDB implementation.

In addition, we've completed work on establishing a fully-featured migrations workflow, and have also implemented database transactions natively within Payload itself.

Even if you're on MongoDB and intend to stay there, you're about to get a lot of new database goodies. MongoDB is certainly not going anywhere. Our support there is only getting better as a result of the work we're putting in here.

Now that the framework is laid and our intended abstractions are identified, we're working on actually mapping Payload fields and functionality over to a relational database model.

The proof-of-concept

Over the past few months, we've looked at a ton of different ORMs to use as we build out our first relational database adapter. But we've been most impressed with Drizzle, and today we're sharing a proof-of-concept that showcases how we plan to map our fields to tables within a given relational database. We've written out a "reference" Payload config, and then translated the Payload config itself over to its Drizzle schema equivalent.

We've started with SQLite so as to plan for the "lowest common denominator" and to get up and running quickly, as well as gather feedback as succinctly as possible. Once we have the low-level pieces in place, we plan to move to Postgres.

What we need out of an ORM

Here's a quick list of the concerns we tried to keep top-of-mind as we identified which ORM we want to start with.

Performance

While working through these database initiatives, we know that we want to remain as performant as possible. Right now we're significantly faster than other headless CMS and we want it to stay that way. There are lots of factors involved in retaining Payload's speed in the relational world, and so far what we've seen from Drizzle here has made us feel confident. We love Drizzle's ability to make a single query to the database and retrieve all of the required tables that we need in one shot. And we think it's going to work very well with Payload. Due to the highly dynamic nature of the data Payload stores, performance is far and away the most important factor for us as we choose an ORM.

No forced filesystem reliance

We want the Payload config to remain as the single source-of-truth for your schema design. We don't want you to have to maintain a Payload config, and then also deal with an auto-generated Prisma schema file or similar. Other headless CMS require you to design a schema within the CMS itself, but they turn around and dynamically generate a schema configuration for their used ORM which you also have to store in your repo, but NEVER touch. This is duplicative, messy, and error-prone.

We think that how we've established our approach with Mongoose so far has worked well here. We dynamically map an incoming Payload config to Mongoose models, and then leverage Mongoose models within our code. There is no reliance on the file system. Drizzle supports this quite well.

HTTP proxy support

Our longer-term goal is to be able to run fully on the edge, and traditional database connections are less than ideal for that goal. If we can build an ORM that supports this goal, I'm happy. And I think you will be too.

No duplicative migration tooling

We don't want you to have to learn the underlying ORM as well as learn Payload. We now give you a first-party migration architecture, and the migration files themselves are all in TypeScript. We want to make sure that whatever ORM we choose works with Payload logic, not Payload working with ORM logic. Drizzle appears to shine here as well, and the team is actually working on exporting a few functions for us to use which will make our integration seamless. Note - the PoC does not showcase Payload migrations (it only leverages straight-up Drizzle migrations). Payload + Drizzle migrations working in tandem will come soon.

Table structure

Moreso than the identification of which ORM we plan to adopt first, the biggest goal of our proof-of-concept was to determine how to map our dynamic and nested data structures to a relational database structure.

To accomplish this goal, our PoC is barebones and is a completely separate repo from Payload itself. It simply showcases the table structure that we're working toward. We have attempted to identify how to map fields, and then how to query the data efficiently.

Group / named tab fields

Right now, in MongoDB, group / named tab fields are super trivial to store:

1
{
2
myGroup: {
3
myNestedField: 'hello',
4
another: 123,
5
}
6
}

We don't want to jump straight to a JSON column if we can avoid it. Not only that, but we'd also like to support SQLite, which is lacking in this area. So what we plan to do instead is just create "path-like" column names out of nested fields, using an underscore to join field names. From there, we will "transform" the data on its way out of the database back into its intended shape. The above data structure will end up looking like this:

my_group_my_nested_fieldmy_group_another
hello123
Array fields

Arrays are fairly straightforward to store in a relational DB world. They're just a separate table which will maintain its own set of columns and locales.

In the PoC, we have demonstrated a posts collection, which has a myArray field. Payload will store that field's array data into a separate table called posts_my_array. The table name is created from the collection slug and the field path itself, separated by underscores. Each unique array field on a given collection will get its own table, based on the name / path of the array field.

Blocks fields

Blocks are a bit more complicated than arrays, though, because every block has different fields (columns). So what we've done here is created a table for each distinct block on a given collection type.

Take for example the following Payload collection config:

1
export const Posts: CollectionConfig = {
2
slug: 'posts',
3
fields: [
4
{
5
name: 'myBlocks',
6
type: 'blocks',
7
blocks: [
8
{
9
slug: 'block1',
10
fields: [
11
{
12
name: 'text',
13
type: 'text',
14
},
15
]
16
},
17
{
18
slug: 'block2',
19
fields: [
20
{
21
name: 'number',
22
type: 'number',
23
}
24
]
25
}
26
]
27
},
28
]
29
}

Two tables will be scaffolded for the above Payload config - posts_block1 and posts_block2.

Notice how we are creating table names based on the block slug - NOT the field name, like in the case of arrays. No matter how many times these unique blocks are used across a given collection config, the block data will go into the main block table.

We are currently thinking to key the block tables based on block slug, instead of field path as seen in the array field above because this will dramatically cut down on the amount of tables that you'd need, as well as the joins that need to be run.

This has some important ramifications, though. Whereas in MongoDB, you could have many blocks, with different schemas, with the same slug, in the relational world, you'll need to make sure your blocks have different slugs if they have different field schemas. This will probably affect 0.5% of developers building with Payload, though, and we have decided that the table simplicity outweighs the value of having different block schemas with the same block slug.

Here's what the posts_block1 table will look like:

idnon_localized_text_path_locale_order_parent_id
1hellomyBlocksen11

Notice a few extra columns? We will use the _path column to store the Payload field path that the blocks correspond to, and the _order column to store the order of the blocks so that we can maintain the integrity of the order that the blocks are returned in. Finally, the _locale field will store the locale of the row in cases where the myBlocks field has localized: true.

Localized content

Being that Payload stores locales on a field-level, we need a way to store many different locales for a single field. We've opted for a structure where every table of a given collection (the main table, each array table, each block table) will have a corresponding _locales table to store their localized field data.

Taking the most simple example, the posts collection will have a posts_locales table that will be joined in and merged.

So let's take the following Payload config:

1
export const Posts: CollectionConfig = {
2
slug: 'posts',
3
fields: [
4
{
5
name: 'nonLocalizedField',
6
type: 'text',
7
},
8
{
9
name: 'localizedField',
10
type: 'text',
11
localized: true,
12
},
13
{
14
name: 'myArray',
15
type: 'array',
16
fields: [
17
{
18
name: 'subLocalizedField',
19
type: 'text',
20
localized: true,
21
},
22
],
23
},
24
]
25
}
26
27
// In MongoDB, we store (and return) this as the following:
28
// {
29
// nonLocalizedField: 'whatever',
30
// localizedField: {
31
// en: 'hello in english',
32
// es: 'hola en espanol',
33
// },
34
// myArray: [
35
// {
36
// subLocalizedField: {
37
// en: 'hello',
38
// es: 'hola',
39
// }
40
// }
41
// ]
42
// }

But in relational DB land this is a bit more complex. This config would produce the following tables:

  • posts - this is the main table itself, with columns being id, created_at, updated_at, and non_localized_field
  • posts_locales - this table would contain id, _locale, localized_field, and _parent
  • posts_my_array - this is to store the myArray rows themselves, with columns id, _order, and _parent_id
  • posts_my_array_locales - finally, we store the locales for the array rows themselves as columns id, _locale, _parent_id, and finally the sub_localized_field itself

The nice part here is that there is parity between tables. If localization is enabled, every collection will have a corresponding locales table, and then every array / block table will have its own corresponding locales table as well.

We could opt for JSON columns here when we get to Postgres, but I think as of now we are in the camp of "model the data right". This is especially important when you get to more complex field types that fall outside of basic text / number fields - like localized arrays, localized relations, etc. You can nest relationships at any level within any number of array / blocks fields, and we should be using foreign keys / etc. even at those deep levels.

Also, it's great to have Drizzle literally run one query, and merge all of this in for us seamlessly.

Relationships

Phew, we're almost there. Saved the best for last. And....the most complex.

Payload supports four different relationship "types":

  1. Has one, ex. (relationTo: 'pages', hasMany: false)
  2. Has one - polymorphic, ex. (relationTo: ['pages', 'posts'], hasMany: false)
  3. Has many, ex. (relationTo: 'posts', hasMany: true)
  4. Has many - polymorphic, ex. (relationTo: ['pages', 'posts'], hasMany: true)

Performance is key here. Relationships are extremely central to everything that Payload does, and have the largest impact against API response time.

To add, supporting polymorphic relationships in specific is more involved than than simple "one to many" or "many to many" typical relationships. The typical way to do this would be a table set up for unions, using two columns - one, the related id, and the other, a relation_to column.

But unions are not widely supported in modern ORMs (even Prisma does not yet support them), and if we used unions, writing SQL manually / repurposing your own data would become more complex. We also wouldn't be able to make a single query to fetch all the data we need with either Drizzle or Prisma.

One of our goals with Payload in specific is to keep our mental model in general as concise as possible. Complexity is the enemy of good.

All of this is to say that unions might be difficult / less than ideal. And that leaves two ways to model all four types of Payload relationships:

  1. We could follow the *_locale table pattern above, where there would be a relationship join table for every table we have (main collection table, each locale table, each array table, each block table).
  2. We could simplify and create a single *_relationships table for each main collection, and store all relationships centrally in one table

In our PoC, we landed on option 2. Less joins, more mental clarity, easier bi-directional relationships, etc. Here's what a potential relationship join table, called posts_relationships, would look like for a posts collection:

idparentpathorderposts_idpages_id
12someField.nestedPath15NULL

The above table has a foreign key for the parent row as well as nullable columns (w/ foreign keys) for every other collection type that it could possibly have a relation to. This example shows a posts_id and pages_id column but there would be an id column for every necessary collection in the Payload config.

It also has an order column, and a path column, which will be used similarly to the way they are in the blocks table discussion above. This table would be the source of truth for all posts relationships. No relationship data would be stored on the posts collection itself, for clarity.

One nice part of this is that, thanks to Drizzle, when we go to query posts, we can join in all relationships in a single query, succinctly and efficiently. And then our database adapter will simply "transform" the data into the shape Payload's APIs (and you) expect.

There are many alternative ways to approach modeling our relationship data. Here are a few examples.

  • We could at least store "has one" relations directly on the parent table.
  • We could create separate join tables for each different combination of collections - i.e. posts_pages, posts_media, etc. - but then we'd need the inverse of those tables as well, so pages_posts, media_posts, etc. This would require many more tables, joins, and a bit more complexity, but no nullable columns in the posts_relationships table.
  • We could replicate the way that we approach locales (above) and have a join table for every array / block structure as well.

Recap

We're making a lot of progress, and we're pumped about it. We really admire our community and we've been consistently blown away by how smart you all are. We'd really appreciate your thoughts / feedback here, and together, we can knock this out of the park.

Go pull down the PoC and take a look:

https://github.com/payloadcms/drizzle-test/tree/master/sqlite

We'd love to know what you think! Join the GitHub discussion here:

https://github.com/payloadcms/payload/discussions/287