So for those of you who were refreshing the page and looking at our wonderful maintenance page: it took way longer than we planned! I’ll do a full write-up after I’ve dealt with a couple of timeout issues.

Here is a bonus meme.

So? How’d it go…

Exactly how we wanted it to go… except with a HUGE timeframe.
As part of the initial testing with object storage I tested using a backup of our files. I validated that the files were synced, and that our image service could retrieve them while on the object store.

What I did not account for was the latency to Backblaze from Australia, how our image service handled migrations, and the response times from Backblaze.

  • au-east -> us-west latency is about 150 to 160ms.
  • The image service was single-threaded.
  • Response times for adding files are around 700ms to 1500ms (inclusive of latency).

We had 43,000 files totalling ~15GB of image data. If each upload takes between 1 and 1.5 seconds, and we are only operating on one image at a time, yep, that is a best case scenario of 43,000 seconds, or just under 12 hours of transfer time, at an average of 1s per image. At 1.5s per image it is closer to 18 hours.
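As a rough sketch of that back-of-the-envelope maths, here is the same estimate in Python. The function name `migration_hours` and the `workers` parameter are made up for illustration. The image service only handled one file at a time during this migration, so anything above `workers=1` is a hypothetical "what if", and it assumes perfect parallelism with no rate limiting on the object store side.

```python
import math


def migration_hours(file_count: int, seconds_per_file: float, workers: int = 1) -> float:
    """Estimate wall-clock hours to upload file_count files.

    Assumes every upload takes the same time and that `workers` uploads
    run fully in parallel (an idealised assumption, not measured behaviour).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # With N workers, the files go up in ceil(file_count / N) rounds.
    rounds = math.ceil(file_count / workers)
    return rounds * seconds_per_file / 3600


if __name__ == "__main__":
    files = 43_000
    # One file at a time, at the 1s and 1.5s per-file figures from the post.
    for seconds in (1.0, 1.5):
        print(f"{seconds}s per file, 1 worker: {migration_hours(files, seconds):.1f} hours")
    # Hypothetical: what 8 parallel uploads might have looked like.
    print(f"1.0s per file, 8 workers: {migration_hours(files, 1.0, workers=8):.1f} hours")
```

Running the numbers before the maintenance window, with the real per-file response time plugged in, would have put the estimate somewhere between roughly 12 and 18 hours for a single-threaded migration.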

The total migration took around 19 hours as seen by our pretty transfer graph:

So, not good, but we are okay now?

That was the final migration we will need to do for the foreseeable future. We have enough storage to last over 1 year of current database growth, with the option to purchase more storage on a yearly basis.
I would really like to purchase a dedicated server before that happens and if we continue having more and more amazing people join our monthly donations on our Reddthat open collective, I believe that can happen.

Closing thoughts

I would like to take this opportunity to apologise for this miscalculation of downtime, as well as for not fully understanding the operational requirements of our usage of object storage.
I may have also been quite vocal on the Lemmy Admin matrix channel regarding the lack of a multi-threaded option for our image service. I hope my sleep-deprived ramblings were coherent enough to not rub anyone the wrong way.
A big final thank you to everyone who is still here, posting, commenting and enjoying our little community. Seeing our community thrive gives me great hope for our future.

As always. Cheers,
Tiff

PS.

Our bot defence in our last post was unfortunately not acting as we hoped it would and it didn’t protect us from a bot wave. So I’ve turned registration applications back on for the moment.

PPS. I see the people on Reddit talking about Reddthat. You rockstars!


Edit:

Instability and occasional timeouts

There seems to be a memory leak in Lemmy v0.18 and v0.18.1, which some other admins have reported as well, and it has been plaguing us since. Our server would be running completely fine, and then BAM, we’d be using more memory than was available and Lemmy would restart. These restarts would have lasted about 5-15 seconds, and if you saw one it would have meant super long page loads, or your mobile client saying “network error”.

Temporary Solution: Buy more RAM.
We now have double the amount of memory courtesy of our open collective contributors, and our friendly VPS host.

In the time I have been making this edit I have already seen it survive a memory spike, without crashing. So I’d count that as a win!

Picture Issues

This leaves us with the picture issues. It seems the picture migration had an error. A few of the pictures never made it across or the internal database was corrupted! Unfortunately there is no going back and the images… were lost or in limbo.

If you see something like the below, make sure you let the community/user know:

Also if you have uploaded a profile picture or background you can check to make sure it is still there! <3 Tiff

  • Ravener@reddthat.com

    no worries, it’s all fine. It’s good to learn from these type of mistakes early on before we grow too much, thank you for all your work, must be tough being a sysadmin.