We noticed Payload was struggling to serve images from a media collection: it would take 1-2 minutes for an image to load. The AWS metrics for the Fargate cluster suggest there could be a memory leak somewhere. Restarting the tasks in Fargate resolved the issue.
Versions:
- payload@1.6.6
- @payloadcms/plugin-cloud-storage@1.0.12
- @aws-sdk/client-s3@3.266.1
- @aws-sdk/lib-storage@3.267.0
Interesting - we have never come across this ourselves but it's definitely something we need to look into
is this the only time you've seen it?
and can you trace it back to one action? or be able to reproduce it? like, did someone upload a large file?
This is the only time we have seen it, yeah, and we can't identify anything in particular that caused it. This is from our production instance, and actually no one can access the admin dashboard directly; everything is edited in our staging environment, then the db gets promoted to production through mongodump and mongorestore, and s3 objects get cloned from the staging to the prod buckets.
we've seen this again, and nothing happened on the Payload instances for ~2 days prior
The start of the slope increase is at ~9:30PM on Sunday, and the last action we took that affected the instances was Friday ~5PM, when we promoted our staging db to prod using mongodump/mongorestore.
Everything else works fine, and this seems to only impact the loading of images from our media collection
Hmmm, this is very interesting to me. Can you try and create a reproduction locally? We will get on this immediately but we'll likely need more in terms of reproduction to be able to assist
are you using any type of afterRead hooks or something that would run in production?
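(For context, a collection-level afterRead hook in Payload looks roughly like the sketch below; the collection, field, and logging here are placeholders, not anything from your config:)

import { CollectionConfig } from "payload/types";

// Hypothetical example of a collection-level afterRead hook.
// Hooks like this run on every read of the collection, so a slow or
// leaky one could degrade fetches in production without anything else changing.
const Example: CollectionConfig = {
  slug: "example",
  fields: [{ name: "title", type: "text" }],
  hooks: {
    afterRead: [
      async ({ doc, req }) => {
        // e.g. logging, resolving extra data, rewriting URLs, etc.
        req.payload.logger.info(`read ${doc.id}`);
        return doc;
      },
    ],
  },
};

export default Example;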
Not for media
what about for anything else?
we have some big projects in production on digitalocean droplets, but we have never seen this before
Not for our currently deployed instances; we're developing something that uses one but it's not on prod or staging yet
The really strange thing is that it affects nothing but the retrieval of media from the cms. Data returns are just as fast, but trying to fetch the media slows down dramatically
ok, well, at least we can narrow it down to media
I just found that this package (which we rely on) may be causing a memory leak
i released a fix in 1.6.28
can you try that version to see if this solves your issue?
and followup question for you: are you using the useTempFiles: true option?
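(For reference, that option sits on the root Payload config and is passed through to express-fileupload; a rough sketch, with the temp dir purely as an example:)

import { buildConfig } from "payload/config";

export default buildConfig({
  // ...collections, plugins, etc.
  // These options are handed to express-fileupload. With useTempFiles,
  // incoming uploads are streamed to disk instead of being buffered in memory.
  upload: {
    useTempFiles: true,
    tempFileDir: "/tmp/payload-uploads", // example path only
  },
});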
Awesome news, thanks for looking into it! We have a ticket for upgrades next week so will report back then. There is another issue slowing our upgrade down, though: it appears that some version of payload/plugin-cloud-storage newer than what we currently use no longer handles spaces in media filenames.
no we aren't using useTempFiles: true
Our media collection is very simple:
import { CollectionConfig } from "payload/types";

const Media: CollectionConfig = {
  slug: "media",
  access: {
    read: () => true,
  },
  admin: {
    useAsTitle: "alt",
  },
  fields: [
    {
      name: "alt",
      type: "text",
      required: true,
    },
  ],
  upload: {
    staticURL: "/media",
    staticDir: "media",
    adminThumbnail: "thumbnail",
  },
};

export default Media;
And the plugin config:
import { buildConfig } from "payload/config";
import { fromContainerMetadata } from "@aws-sdk/credential-providers";
import { cloudStorage } from "@payloadcms/plugin-cloud-storage";
import { s3Adapter as payloadS3Adapter } from "@payloadcms/plugin-cloud-storage/s3";

const s3Adapter = payloadS3Adapter({
  config: {
    credentialDefaultProvider: fromContainerMetadata,
  },
  bucket: process.env.PAYLOAD_CMS_S3_BUCKET,
});

export default buildConfig({
  ...
  plugins: [
    cloudStorage({
      collections: {
        // Media is the collection defined above
        [Media.slug]: {
          adapter: process.env.PAYLOAD_CMS_S3_BUCKET ? s3Adapter : null,
        },
      },
    }),
  ],
});
Hmm, unfortunately it seems we're still having this issue
Versions:
"dependencies": {
"@aws-sdk/client-s3": "^3.305.0",
"@aws-sdk/credential-providers": "^3.303.0",
"@aws-sdk/lib-storage": "^3.305.0",
"@payloadcms/plugin-cloud-storage": "^1.0.14",
"express": "^4.18.2",
"payload": "^1.6.30"
},
this still happens and it really is quite bizarre. It is only the image loading that degrades, and media is the only thing that actually "hits" the running Payload instances' API. For everything else, we use a local instance of payload inside our own API container to connect directly to the database for fetching data (roughly the pattern sketched after the list below). This also seems to be exclusive to our production environment. The differences between our envs:
- Staging
  - 1 Fargate container instance
  - 0.25 vCPU
  - 0.5 GB memory
- Prod
  - minimum 2, maximum 20 Fargate container instances
  - 1 vCPU
  - 4 GB memory
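Roughly, that local-instance pattern looks like this (simplified sketch; the env var names and collection are placeholders for whatever our API actually uses):

import payload from "payload";

// Initialize Payload's Local API without an HTTP server and read
// straight from the database. Reads like this never hit the running
// CMS containers, which is why only media fetches go through them.
const run = async () => {
  await payload.init({
    secret: process.env.PAYLOAD_SECRET || "",
    mongoURL: process.env.MONGODB_URI || "",
    local: true, // Local API only, no Express server
  });

  const pages = await payload.find({ collection: "pages", limit: 10 });
  console.log(pages.totalDocs);
};

run();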
You may already know this - so feel free to ignore, but there are ways to trigger heap snapshots for node.js in production. You can then bring them down for analysis in the standalone version of dev tools. Here's an excellent presentation from Matteo and Kent....
https://www.youtube.com/watch?v=vkys6Wk-jYk
https://kentcdodds.com/blog/fixing-a-memory-leak-in-a-production-node-js-app
Also here..
https://nodejs.org/en/docs/guides/diagnostics/memory/using-heap-snapshot
In Chrome...
I think what Kent did was create a custom route that he could 'hit' that would trigger heap snapshots and then he downloaded them and the two of them went through the complete analysis process (you can compare snapshots as well - a baseline, against the increased memory version)
Again - ignore all of this if I'm 'preaching to the choir' ;-)
Also again - I'm sure you know this - but where the snapshot gets written will depend on your Fargate instance config. We use EFS.
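(A minimal sketch of that kind of snapshot route, assuming Express and Node's built-in v8 module; the route path, output directory, and lack of auth are placeholders only:)

import { Router } from "express";
import { writeHeapSnapshot } from "v8";
import path from "path";

// Hit this route to capture a heap snapshot, then pull the file down and
// load it in the standalone DevTools for comparison against a baseline.
// Point SNAPSHOT_DIR at a persistent mount (e.g. EFS) on Fargate.
const SNAPSHOT_DIR = process.env.SNAPSHOT_DIR || "/tmp";

const router = Router();

router.get("/debug/heap-snapshot", (req, res) => {
  // Protect this route in a real deployment; writing a snapshot pauses the process.
  const file = path.join(SNAPSHOT_DIR, `payload-${Date.now()}.heapsnapshot`);
  writeHeapSnapshot(file);
  res.json({ written: file });
});

export default router;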
Awesome! I didn't actually know about the heap stuff, will look into how we can incorporate this so we can investigate the issue
Thank you
Matteo is a member of the Node.js Technical Steering Committee. He really knows his stuff.
Good luck!