This is terribly hard to write. If you flushed your cache right now, you would see all the newest posts without images. These are now 404s, even though the images exist. In 2 hours everyone will see this. Unfortunately there is no going back: the key store for all the “new” images cannot be recovered.
What happened?
After the picture migration from our local file store to our object storage, I made a configuration change so that our Docker containers no longer reference the internal file store. This resulted in the picture service starting from scratch with a completely empty internal database 😔
What makes this worse is that this database lived inside the ephemeral container. When the containers are recreated, that data is lost. This happened multiple times over the 2-day period.
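A minimal sketch of a startup guard that could catch this kind of failure, assuming the service keeps its database in a single directory. The function name, the exception, and the `expect_existing` flag are hypothetical and not part of the picture service; the idea is only that a service which should already hold data refuses to start when its data directory is missing or empty, instead of silently starting from scratch.

```python
import os


class EmptyStoreError(RuntimeError):
    """Raised when a data directory that should hold existing data is missing or empty."""


def ensure_store_not_empty(data_dir: str, expect_existing: bool) -> int:
    """Return the number of entries in data_dir.

    Fails loudly if the directory does not exist, or if it is empty while
    existing data is expected (for example after a migration).
    """
    if not os.path.isdir(data_dir):
        raise EmptyStoreError(f"{data_dir} does not exist; is the volume mounted?")

    entries = os.listdir(data_dir)
    if expect_existing and not entries:
        raise EmptyStoreError(
            f"{data_dir} is empty but existing data was expected; "
            "refusing to start with a fresh database"
        )
    return len(entries)
```

Called once before the service opens its database, with `expect_existing=True` on every deployment after the first, this turns a silent reset into an immediate, visible startup failure.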
What made this harder to debug was that our CDN caching was hiding the issue, as we had a long cache time to reduce the load on our server.
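One way to look past a cache while debugging is to request a URL the CDN has not seen before. The sketch below appends a throwaway query parameter to do that. It assumes the CDN includes the query string in its cache key, which is common but not universal; the `_cb` parameter name and the example URL are made up for illustration.

```python
import uuid
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_url(url: str, token: Optional[str] = None) -> str:
    """Return url with an extra `_cb` query parameter appended.

    A unique token makes the URL new to any cache that keys on the full
    query string, so the request is forwarded to the origin server.
    """
    parts = urlsplit(url)
    # Keep the existing query parameters, including blank ones, in order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_cb", token or uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example with a fixed token so the result is predictable:
# cache_busting_url("https://example.com/images/abc.png?format=webp", "t1")
# returns "https://example.com/images/abc.png?format=webp&_cb=t1"
```

Fetching the returned URL and comparing the status code with the cached one shows whether the origin is really serving the image or the CDN is masking a 404.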
The good news is that after you read this post, every picture will be correctly uploaded and added to the internal picture service database! 😊 The “better” news is that all the original images from the 28th of June and before will start working again instantly.
Timeframe
The issue existed from the 29th of June to the 1st of July.
Resolution
Right now. 1st of July 8:48 am UTC.
From now on, everything will work as expected.
Going forward
Our picture service migration has been fraught with issues, and I cannot express how annoyed and disheartened I am by the accidents that have occurred. I have yet to provide a service that I am happy with.
I am very sorry that this happened and I will strive to do better! I hope you all can accept this apology.
Tiff
It almost never does. But people have jumped on containers as a one-size-fixes-everything solution. For waves hands reasons.
Postgres-in-k8s is something someone legitimately wants to set up at $work. Whiskey tango foxtrot.
Middleware and logic layer as a container? Heck yeah. Front end? Sure! Stateful backend? Watchu talkin bout, Willis?