What are the best practices around DataStore being down?

DataStoreService going down is arguably one of the most harmful problems that can happen. Especially when players are spending money, an outage can seriously damage the player experience and their trust. So I would like to know what a developer can do to minimize the damage when this service goes down.

The biggest problem I am currently struggling with is that there does not seem to be a way to know DataStoreService is having issues until you actually use it to load or save data. Likewise, there does not seem to be a way to know the service is back up until you try to load or save data again. To make matters worse, these requests are all budgeted. This raises a few questions:

  • When a player leaves the game and I fail to save the player’s data, what do I do? Do I keep retrying? And what if a player left the game to join another server instead?
  • If I want to disable Robux transactions when save and load calls fail, how would I distinguish between small hiccups and the service actually being down? Should I even distinguish the two?
  • How would I effectively detect when DataStoreService is back up after being down for a while? Especially if I implement an ‘offline mode’ of sorts.

Overall, DataStoreService going down temporarily seems like a problem that is tricky to work around no matter how you turn or twist it. So I would like to know, what are some effective solutions, or at least band-aids you could try to minimize the effects of this service going down?

For such an important feature, DataStores are lacking a lot.

Unless you go out of your way to build your own system on external server(s) or use a third-party service, solving data loss is (near?) impossible.

Personally, I think that as long as you do your best to ensure the loss is NOT due to an oversight on your part, you will be OK. However… that doesn’t mean you can just leave it be.

We may not be able to control the data loss that happens when Roblox’s services are down, but simply letting the player know “Hey, we couldn’t save your data because Roblox’s services are down” at least makes them aware of what is going on, instead of leaving them clueless and thinking your game is buggy.

Here is a prompt I show in my upcoming game for when such issues arise.
[Image: the in-game prompt shown to players when data store issues are detected]

To take it a step further, once my game is aware that such issues are happening, I lock any Robux purchases that would require saving, to best prevent players from unintentionally getting scammed out of their money.
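A minimal sketch of that lock, in Python for brevity. In Roblox, the same logic would sit in Luau inside the `MarketplaceService.ProcessReceipt` callback, with data store calls wrapped in `pcall`. `PurchaseGate`, the `store.save(key, value)` interface (assumed to raise on failure) and the `datastore_healthy` flag (assumed to be set by whatever outage detection you use) are all hypothetical. The idea it illustrates is to report a purchase as granted only after the save has actually succeeded. Otherwise, return the equivalent of `Enum.ProductPurchaseDecision.NotProcessedYet` so Roblox offers the receipt again later instead of the player paying for nothing.

```python
from enum import Enum


class Decision(Enum):
    # Mirrors Roblox's Enum.ProductPurchaseDecision values.
    PURCHASE_GRANTED = "PurchaseGranted"
    NOT_PROCESSED_YET = "NotProcessedYet"


class PurchaseGate:
    """Hypothetical gate that blocks Robux purchases while saving is unreliable."""

    def __init__(self, store):
        self.store = store              # assumed: save(key, value) raises on failure
        self.datastore_healthy = True   # assumed: flipped by your outage detection

    def can_prompt_purchase(self):
        # Hide or disable shop buttons while this returns False.
        return self.datastore_healthy

    def process_receipt(self, player_id, receipt_id, product_id, data):
        """data is this player's in-memory save: {"items": [...], "receipts": [...]}."""
        if receipt_id in data["receipts"]:
            # Granted and saved on an earlier call; tell Roblox it's done.
            return Decision.PURCHASE_GRANTED
        if not self.datastore_healthy:
            return Decision.NOT_PROCESSED_YET
        updated = {
            "items": data["items"] + [product_id],
            "receipts": data["receipts"] + [receipt_id],
        }
        try:
            self.store.save(f"player_{player_id}", updated)
        except Exception:
            # Save failed: don't grant; the receipt will be offered again later.
            return Decision.NOT_PROCESSED_YET
        data.update(updated)  # commit in memory only after the save succeeded
        return Decision.PURCHASE_GRANTED
```

Recording the receipt ID together with the granted item guards against granting the same purchase twice if the callback fires again for a receipt that was already saved.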

Really, by being up-front with your community and making sure they are aware of any situation that may affect them, you would be doing yourself and your game a huge favor.

Do you also have any mechanism to automatically unlock purchases at a later moment? And if so, what approach are you taking?

Yup!

Nothing too fancy, though. Since the game uses a custom-made DataStore handler, once DataStores go down it runs a check every 30 seconds to 1 minute while all other DataStore requests are paused.

Obviously I can’t pause gameplay for the player, so anything that needs to be saved is put into a cache, and once the service returns I have it push everything to DataStores.

It would be really nice to have the write cooldown removed, as it would help systems like this catch up on backlogs faster. :sweat_smile:

Again, this is not a 100% solution, as the server could shut down before it has time to clear the backlog. However, at that point there really isn’t anything you can do about it…
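A rough sketch of that pause, cache and flush pattern, again in Python rather than Luau, with a hypothetical `store.save(key, value)` that raises on failure. The 30-second probe interval comes from the post. `CachedSaver`, `tick` and the injectable `clock` are illustrative names. While the store is marked down, writes go into a cache keyed by data store key, so a newer value for the same key replaces an older one. Each probe tries to flush the backlog and stops at the first failure.

```python
import time


class CachedSaver:
    """Hypothetical write-behind cache: queue saves while the store is down,
    probe periodically, and flush the backlog once the store accepts writes."""

    def __init__(self, store, probe_interval=30.0, clock=time.monotonic):
        self.store = store                    # assumed: save(key, value), raises on failure
        self.probe_interval = probe_interval  # seconds between checks while down
        self.clock = clock
        self.down = False
        self.pending = {}                     # key -> latest value; newer writes replace older
        self.next_probe = 0.0

    def save(self, key, value):
        """Returns True if written now, False if queued in the cache."""
        if self.down:
            self.pending[key] = value         # gameplay continues; value waits in the cache
            return False
        try:
            self.store.save(key, value)
            return True
        except Exception:
            self.down = True                  # pause direct writes from now on
            self.pending[key] = value
            self.next_probe = self.clock() + self.probe_interval
            return False

    def tick(self):
        """Call periodically (e.g. once per second) from a game loop."""
        if not self.down or self.clock() < self.next_probe:
            return
        for key in list(self.pending):
            try:
                self.store.save(key, self.pending[key])
            except Exception:
                # Still down: keep the rest queued and wait for the next probe.
                self.next_probe = self.clock() + self.probe_interval
                return
            del self.pending[key]
        self.down = False                     # backlog cleared; resume direct writes
```

Coalescing by key means the backlog needs at most one write per key when the service returns, which helps with the per-key write cooldown mentioned above.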

  • When a player leaves the game and I fail to save the player’s data, what do I do? Do I keep retrying? And what if a player left the game to join another server instead?
    You should retry up to a maximum number of attempts. Spending your whole request budget trying to force one save through only hurts your recovery. When you detect downtime in your outgoing calls, clearly tell the player that there is an outage and that their data might not save correctly. It is important that players know they take a certain risk by continuing to play. Personally, I save data on the following occasions: when a player leaves, on transactions and other big events (for example a shop purchase or an achievement), and with an auto-save of all data every 10 minutes.
  • If I want to disable Robux transactions when save and load calls fail, how would I distinguish between small hiccups and the service actually being down? Should I even distinguish the two?
    Yes, you should. One way to measure it is to track, across the entire server, how many requests fail over the last X minutes and which types of request are failing. That tells you whether you are looking at small hiccups or a real outage. I would disable transactions once failures persist for a few minutes, since a big update to a big game (cough) might be causing downtime across the platform. Basically, write an error handler for all the different ways a request can fail. (There are quite a lot, each with a different meaning and a different recovery strategy.)
  • How would I effectively detect when DataStoreService is back up after being down for a while? Especially if I implement an ‘offline mode’ of sorts.
    By measuring the number of successful attempts. Once you detect that the game has an issue, you can make ‘fake’ requests to check whether the service is back online. If those requests return the expected value X times within a time window Y, you can assume the service is back and turn off the game’s ‘offline’ mode. Personally, I would stop trying to save the data of users who have already left and remove those requests from the pool. If they joined a different server, you can’t be sure their data hasn’t been overwritten in the meantime, so there is no point in saving it anymore. We still have the auto-save data captured a few minutes earlier, and the player was notified about the downtime, so data loss shouldn’t come as a surprise. (And really, put it in your terms of service that players can’t ask for data recovery if something like that happens.)
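The answers above can be sketched roughly as follows. This is Python for brevity; in Luau each request would be a `pcall`-wrapped data store call. `with_retries` caps the number of attempts so one save cannot drain the request budget. `HealthMonitor` counts failures and successes server-wide inside a sliding time window. A burst of failures switches to offline mode. While offline, you feed it the results of periodic probe (‘fake’) requests, and a run of successes switches it back. All names and default thresholds are hypothetical stand-ins for the post’s ‘X’ and ‘Y’, not recommended values.

```python
import time
from collections import deque


def with_retries(request, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Call request() up to max_attempts times and re-raise the last error.
    Capping attempts avoids burning the whole request budget on one save."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff between attempts


class HealthMonitor:
    """Server-wide view of data store health (hypothetical thresholds).

    - Marks the store down after fail_threshold failures within `window`
      seconds, so a single hiccup does not trigger offline mode.
    - While down, marks it back up after recover_threshold successes
      (e.g. from probe requests) within `window` seconds.
    """

    def __init__(self, fail_threshold=5, recover_threshold=3, window=120.0,
                 clock=time.monotonic):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.window = window
        self.clock = clock
        self.failures = deque()   # timestamps of recent failures
        self.successes = deque()  # timestamps of recent successes
        self.down = False

    def _trim(self, events):
        # Drop timestamps that have fallen out of the sliding window.
        cutoff = self.clock() - self.window
        while events and events[0] < cutoff:
            events.popleft()

    def record(self, ok):
        (self.successes if ok else self.failures).append(self.clock())
        self._trim(self.failures)
        self._trim(self.successes)
        if not self.down and len(self.failures) >= self.fail_threshold:
            self.down = True
            self.successes.clear()  # only count successes seen after the outage began
        elif self.down and len(self.successes) >= self.recover_threshold:
            self.down = False
            self.failures.clear()
```

Counting within a window, rather than reacting to any single error, is what separates a short hiccup from an outage. The thresholds are the knobs you would tune against your own traffic.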

My situation is uncommon and doesn’t apply to most DataStore usage, but it’s food for thought regardless.

Most games can continue to be played while data is missing or unsavable, but Lua Learning’s tutorial section can’t. Lua Learning has 200+ tutorials for users to read. It’s “global” data in the sense that it isn’t tied to any player, but it can be altered by multiple live servers (any server that has a moderator logged in and authenticated).

If DataStores are down, a massive portion of the game would just say “Failed to load, retry later” and I don’t find that acceptable.

I wrote a plugin (just a script, no UI) that, when I open Lua Learning in Studio, fetches the data, writes it into ModuleScripts¹, puts them into a Backups folder in ServerStorage, and then prompts me to save that to my PC as a backup in case something ever goes wrong.

If DataStores are failing, the server loads the data from these ModuleScripts instead. The tutorial data could be old², but that’s generally good enough. Viewing live stats, reviews, upvoting, etc. will still be down, but you can always read tutorials regardless of DataStore status.³
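A small sketch of that fallback in Python, using hypothetical names. `SNAPSHOT_CHUNKS` stands in for the child ModuleScripts (each returning a dictionary of tutorial data). `load_snapshot` plays the role of a top-level module that combines them. `fetch_live` is whatever function reads the live data store and raises on failure. The returned `is_live` flag is what you would check to switch off live-only features such as stats, reviews and upvoting.

```python
# Hypothetical snapshot baked into the place file (in Roblox: the backup
# ModuleScripts in ServerStorage), split into chunks that each hold a
# dictionary of tutorial data.
SNAPSHOT_CHUNKS = [
    {"intro": {"title": "Introduction", "body": "(snapshot text)"}},
    {"loops": {"title": "Loops", "body": "(snapshot text)"}},
]


def load_snapshot(chunks):
    """Combine every chunk into one dictionary, like a top-level module would."""
    merged = {}
    for chunk in chunks:
        merged.update(chunk)
    return merged


def load_tutorials(fetch_live, snapshot_chunks):
    """Return (tutorials, is_live). Falls back to the bundled snapshot when
    the live fetch raises, so tutorials stay readable during an outage."""
    try:
        return fetch_live(), True
    except Exception:
        return load_snapshot(snapshot_chunks), False
```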


¹ ModuleScripts is plural because plugins cannot write enough characters to Script.Source, so I have a top-layer module that requires, combines, and returns all its child modules, and each child module returns a dictionary of tutorial data. Annoying, but the result is the same.

² After I approve new or changed tutorials, I generally remember to recreate these backups, so they are almost always up to date. However, that’s not always the case, and my point is that the data is a snapshot that works regardless of how old or correct it is.

³ I just like footnotes. :stuck_out_tongue:

Two ideas off the top of my head that might help with detecting data store issues (for cases where the methods fail without erroring, which has happened a couple of times):

  • Save a test value to a key in a data store manually (in Studio). Then, when you need to see if retrieving values is working from in-game, try reading the value from that key and see if it matches the test value. If it doesn’t, getting data from data stores is probably broken.

  • When you need to see if saving is working in-game, write some test data to a key in a data store, then read it back and compare it with the original value. If they don’t match, saving data into data stores is probably broken.
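Both checks can be sketched in Python, assuming a hypothetical `store` with `get(key)` and `set(key, value)` (roughly `GetAsync`/`SetAsync` in Roblox, wrapped in `pcall`). The write check uses a fresh random token each time, so a stale value left over from an earlier check cannot make a broken write look successful. On caching: Roblox can serve `GetAsync` results from a short-lived server-side cache, so a readback on the same server right after a write may not prove the write reached the backend. Treat both checks as heuristics.

```python
import uuid


def read_check(store, canary_key, expected):
    """Read a key whose value was written once by hand (e.g. from Studio).
    Returns False if the read errors or returns an unexpected value."""
    try:
        return store.get(canary_key) == expected
    except Exception:
        return False


def write_check(store, probe_key):
    """Write a fresh random token, read it back, and compare.
    Catches calls that 'succeed' without actually storing the data."""
    token = uuid.uuid4().hex
    try:
        store.set(probe_key, token)
        return store.get(probe_key) == token
    except Exception:
        return False
```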

I haven’t considered how caching and other details might affect these checks, but in theory I think they should work.
