{SOLVED [Question]} Flat File saving methods

Discussion in 'Spigot Plugin Development' started by xxxCheeseproxxX, May 23, 2015.

Which way of saving Flat File is faster in performance?

Poll closed May 23, 2015.
  1. Save all the players' data in one file is faster!

    1 vote(s)
    50.0%
  2. Save every player's data in a unique file is faster!

    0 vote(s)
    0.0%
  3. Other? Please explain :).

    1 vote(s)
    50.0%
  1. So I am creating a plugin that saves many players' statuses. My plugin offers MySQL as a saving method, with flat-file saving as an alternative.

    I was wondering which type of flat-file saving I should use. For example, the one that I am using right now saves all the players' data in one file.

    Code (Text):
    players:
      2g79agr6-b545-42ee-99f3-d630c87ssadd:
        rank: creeper
        level: 1
        xp: 280
        xp-multiplier: 3
        achievements:
            test1: true
            test2: true
            test3: true
      9f79agr6-b545-42ee-99f3-d620c87asadd:
        rank: creeper
        level: 1
        xp: 280
        xp-multiplier: 3
        achievements:
            test1: true
            test2: true
            test3: true
      0s79agr6-b545-42ee-99f3-dak0c87ssadd:
        rank: creeper
        level: 1
        xp: 280
        xp-multiplier: 3
        achievements:
            test1: true
            test2: true
            test3: true
     
    *The amount of data saved per player will increase, as I add more features.

    As you might already know, Essentials saves players' data by creating a unique file for each player rather than saving all the players' info in one.

    My question is: which way of saving flat files is fastest, for both reads and writes?

    Thank you :)

     
    #1 xxxCheeseproxxX, May 23, 2015
    Last edited: May 23, 2015
  2. I really don't think there is one that is noticeably "faster", but looking at the amount of data you are storing for each player, individual files named after the UUID of the player would be much cleaner, and I'm sure plugin users would appreciate the ease of finding a small file for each player rather than sifting through one large file crammed with data.
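
A minimal sketch of the per-player layout, for illustration only. It uses plain Java with `java.util.Properties` instead of Bukkit's YAML classes so that it runs on its own; the class name `PlayerFileStore` and the `.properties` extension are assumptions, not anything from the plugin in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper: one small file per player, named after the UUID.
final class PlayerFileStore {
    private final Path directory;

    PlayerFileStore(Path directory) {
        this.directory = directory;
    }

    // Resolves <directory>/<uuid>.properties for the given player.
    Path fileFor(UUID playerId) {
        return directory.resolve(playerId + ".properties");
    }

    // Writes only this player's data; other players' files are untouched.
    void save(UUID playerId, Properties data) {
        try {
            Files.createDirectories(directory);
            try (OutputStream out = Files.newOutputStream(fileFor(playerId))) {
                data.store(out, "player data");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns empty Properties when the player has no file yet.
    Properties load(UUID playerId) {
        Properties data = new Properties();
        Path file = fileFor(playerId);
        if (!Files.exists(file)) {
            return data;
        }
        try (InputStream in = Files.newInputStream(file)) {
            data.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }
}
```

Each save touches a single small file, which is what makes this layout easy to browse by hand.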
     
    • Agree x 3
    • Like x 1
  3. Thank you for your advice. Anyone else? :D
     
  4. Unique player files:
    • Usually requires reading from the disk a lot more often
      • Each file is probably going to have to be read from the disk whenever it's needed
      • Doesn't benefit as much as large files do from using YamlConfigurations
    • Easier for humans to look at, it only contains the relevant data
    • Saving the smaller amount of data usually won't affect the thread it's saving on
      • The unique file will usually need to be saved after each change
      • Saving asynchronously becomes a lot harder to manage, and the benefit isn't worth the effort
      • Saving on the main thread usually doesn't take too large of a hit unless you're doing it very often
    • Making a backup can be very time consuming since it takes a long time to copy hundreds of files instead of one file

    One large file:
    • Read once (usually when a plugin enables)
      • Only requires opening once, so you won't need to read from the disk while the server is running with players on it
      • If the file was a yml and you're using a YamlConfiguration object, the data can be set and retrieved super fast as if it was a HashMap, even though the file contains a lot of data
    • A little more difficult for humans to read in a text editor
    • Saving the large file can hold up the thread it's saving on if there's a lot of data to write at once
      • If you are saving in the main thread, it can hold up the server for a few seconds every time it's saved depending on how large the file is
      • You'll probably save onDisable if you do it in the main thread since it takes longer
      • If you know how to manage saving asynchronously, this time vanishes to basically nothing
    • Making a backup is much easier because one large file copies faster than hundreds of small files

    I personally prefer using one large file instead of unique files for each player. If you plan to iterate through all of the data at any point, you won't need to open every small file, because the large file is already loaded and ready to be queried.



    tl;dr
    Unique files require reading/writing to the disk more often
    Large files do it a lot less often but at the cost of taking longer to do so

    Unique files are unnoticeable by default if you don't try to get/set information too often
    If you manage the large file correctly, you can make its disk read/write unnoticeable, and get/set as much as you wish
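
One way to picture "managing the large file correctly" is a cache that is loaded once, answers every get/set from memory, and only reports something to write when data actually changed. The sketch below is plain Java and assumes all calls come from a single thread (e.g. the server's main thread); `StatsCache` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory cache: loaded once, written back only when changed.
// Not thread-safe; intended to be used from one thread only.
final class StatsCache {
    private final Map<UUID, Integer> xpByPlayer = new HashMap<>();
    private boolean dirty = false;

    // Called once at startup with whatever was parsed from the file.
    void loadAll(Map<UUID, Integer> fromDisk) {
        xpByPlayer.clear();
        xpByPlayer.putAll(fromDisk);
        dirty = false;
    }

    // Served from memory; unknown players default to 0 xp.
    int getXp(UUID playerId) {
        return xpByPlayer.getOrDefault(playerId, 0);
    }

    // Marks the cache dirty only when the stored value really changes.
    void setXp(UUID playerId, int xp) {
        Integer previous = xpByPlayer.put(playerId, xp);
        if (previous == null || previous.intValue() != xp) {
            dirty = true;
        }
    }

    // Returns a copy to write if anything changed since the last call.
    Optional<Map<UUID, Integer>> takeSnapshotIfDirty() {
        if (!dirty) {
            return Optional.empty();
        }
        dirty = false;
        return Optional.of(new HashMap<>(xpByPlayer));
    }
}
```

A periodic task could call `takeSnapshotIfDirty()` and write the returned copy, so an idle server never touches the disk.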
     
    #4 iPyronic, May 23, 2015
    Last edited: May 23, 2015
    • Informative x 2
  5. Wow... Thank you for your extremely helpful and informative reply :eek:

    And can you please tell me "how to manage saving asynchronously"? I have never worked with thread stuff xP.

    FYI: right now I read data by caching everything from the file in onEnable() and then reading from HashMaps.
     
  6. I would tidy it up with a separate YamlConfiguration per function or purpose. For example, data.yml would contain all the data (UUIDs etc.), while config.yml or other YamlConfigurations would hold the settings that control how the data in data.yml changes.
     
    • Agree x 1
  7. I personally only load my large flat files onEnable.

    When I save, I use a boolean to check whether a save is already in progress. If it is, I retry a few seconds later in hopes that the old save is done. If the file wasn't already saving, I set the saving boolean to true, create a new async task, then run the saving code. Once the file is done saving, the task changes the boolean back to false.

    The following code is close to what I usually use when saving large (upwards of 600KB) yaml database files asynchronously. I literally took one of my plugins and tore out most of the irrelevant stuff, and I also tried to make it a little easier to read. You'll want to mostly focus on the saveMainYaml() method. The rest is just so you have an idea of how I set it up.

    Code (Text):

    package net.nulll.uso.MyPlugin;

    import java.io.FileNotFoundException;

    import org.bukkit.Bukkit;
    import org.bukkit.command.Command;
    import org.bukkit.command.CommandSender;
    import org.bukkit.configuration.file.YamlConfiguration;
    import org.bukkit.event.Listener;
    import org.bukkit.plugin.java.JavaPlugin;

    public final class MyPlugin extends JavaPlugin {
     
        private final JavaPlugin plugin = this;
     
        static YamlConfiguration mainYMLObject = new YamlConfiguration();
        static boolean savingMainYaml = false;
        static boolean queueSavingMainYaml = false;
     
        @Override
        public void onEnable(){
         
            mainYMLObject = loadYaml("plugins/MyPlugin/mainYaml.yml");
         
            Bukkit.getServer().getPluginManager().registerEvents(new Listener(){
                //Events go here
            },this);
        }
     
        @Override
        public void onDisable(){
            //Unregister event listener
            org.bukkit.event.HandlerList.unregisterAll(plugin);
            //Cancel tasks
            Bukkit.getScheduler().cancelTasks(plugin);
        }
     
        public boolean onCommand(CommandSender sender, Command cmd, String label, String[] args){
            //Commands go here
            return false;
        }
     
        public void saveMainYaml(){
            if(savingMainYaml==false){
                savingMainYaml = true;
                Bukkit.getScheduler().runTaskAsynchronously(plugin, new Runnable(){public void run(){
                    saveYaml(mainYMLObject, "plugins/MyPlugin/mainYaml.yml");
                    savingMainYaml = false;
                }});
            }else{
                if(queueSavingMainYaml==false){
                    queueSavingMainYaml = true;
                    Bukkit.getScheduler().runTaskLater(plugin, new Runnable(){public void run(){
                        saveMainYaml();//Recursion
                        queueSavingMainYaml = false;
                    }},20*5);
                }
            }
        }
     
        public static YamlConfiguration loadYaml(String ymlFile){  
            YamlConfiguration yml = new YamlConfiguration();
            try{
                yml.load(ymlFile);
            }catch(FileNotFoundException e){
                try{
                    yml.save(ymlFile);//Create the file if it didn't exist
                }catch(Exception e2){e2.printStackTrace();}//Print the save failure, not the original FileNotFoundException
            }catch(Exception e){e.printStackTrace();}
            return yml;
        }

        public static boolean saveYaml(YamlConfiguration yamlConfig, String ymlFile){
            try{
                yamlConfig.save(ymlFile);
                return true;
            }catch(Exception e){e.printStackTrace();}
            return false;
        }
    }
    If I use multiple large databases, I create another pair of booleans for saving and queuing for each one; I don't make them share the same pair.
     
    #7 iPyronic, May 23, 2015
    Last edited: Apr 25, 2017
    • Like x 2
    • Useful x 1
  8. Thank you so much for your help @iPyronic! I very much appreciate it :). Thank you again for explaining it to me :D.
     
    • Like x 1
  9. No problem, I hope you'll find my option useful~

    Edit: Just remember async isn't always the best option and my code will become unstable if loading occurs while the plugin is running (and at which point the large yaml is potentially saving). That's why I only load the yaml when the plugin enables.

    Edit2: Of course, you could optionally set up some isLoading booleans too I guess if it was needed xD
     
    • Like x 1
  10. @iPyronic From your replies I am getting the feeling that when saving a YamlConfiguration asynchronously, if you call for another save while it is still saving from the first save call, there is a possibility of bad things happening. Is that hunch correct?
     
  11. That is entirely correct. If doing it async, it must be managed to avoid saving/loading while it is already doing one of those.

    Edit: If you look above at post #7, you can see what I've tried to do in order to avoid saving again while it is already saving.
     
    • Like x 1
  12. @iPyronic @xxxCheeseproxxX do note that the code is not thread safe. Use atomic / volatile fields and perhaps wrap your gets and sets in synchronized blocks.
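
As a rough illustration of the "atomic fields" suggestion, here is a plain-Java sketch of a save guard built on `AtomicBoolean`. It is not the code from post #7: the class `CoalescingSaver` is hypothetical, and it takes a generic `java.util.concurrent.Executor` where a plugin would presumably hand the work to the Bukkit scheduler instead (that wiring is an assumption and is not shown).

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical coalescing saver: at most one write runs at a time, and
// requests that arrive during a write trigger one follow-up write.
final class CoalescingSaver {
    private final Executor executor;
    private final Runnable writeToDisk;
    private final AtomicBoolean saving = new AtomicBoolean(false);
    private final AtomicBoolean pending = new AtomicBoolean(false);

    CoalescingSaver(Executor executor, Runnable writeToDisk) {
        this.executor = executor;
        this.writeToDisk = writeToDisk;
    }

    // Safe to call from any thread, as often as needed.
    void requestSave() {
        pending.set(true);
        trySchedule();
    }

    private void trySchedule() {
        // compareAndSet makes "check the flag, then set it" one atomic step.
        if (!saving.compareAndSet(false, true)) {
            return; // a write is running; it will pick up the pending request
        }
        executor.execute(() -> {
            try {
                // Keep writing until no new request arrived during the write.
                while (pending.getAndSet(false)) {
                    writeToDisk.run();
                }
            } finally {
                saving.set(false);
            }
            // A request may have arrived between the loop exit and the
            // flag reset; re-check so it is not lost.
            if (pending.get()) {
                trySchedule();
            }
        });
    }
}
```

The guard only serializes the writes. The `writeToDisk` task still has to work on data that no other thread is modifying, for example a copy taken before the task is scheduled.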
     
    • Agree x 1
  13. You totally made me remember I made a YamlConfiguration cloner because of the getting/setting while saving, but I never copied that part over to my other plugins so I forgot about it D: Much thanks xD

    I just tried stress-testing without the cloner. The testing involved calling for a save every game tick, as well as setting 10 new keys every tick too. It occasionally gave me ConcurrentModificationExceptions, no surprises there >.< Adding the cloning code stopped the exceptions from occurring ;D

    The stress-test file contained over 21800 keys and was 1,653 KB. The test server (filled with 59 other plugins) averaged 19.8 TPS per minute. There was an occasional 1 second delay between commands during testing, although that didn't always happen. The server had 2,230MB RAM and 952 chunks loaded across 5 worlds, along with 400 entities and 2,200 tile entities listed in Essentials' /mem command.

    The cloner code iterates over the source YamlConfiguration keys and writes the values to a new YamlConfiguration object.

    Cloner:
    Code (Text):
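    public YamlConfiguration cloneYaml(YamlConfiguration yamlConfiguration){
        YamlConfiguration newYaml = new YamlConfiguration();
        //getKeys(true) also returns nested paths such as "players.<uuid>.xp"
        for(String s : yamlConfiguration.getKeys(true)){
            //Skip section nodes so the copy doesn't share them with the source;
            //set() recreates the sections from the leaf paths
            if(yamlConfiguration.isConfigurationSection(s)) continue;
            newYaml.set(s, yamlConfiguration.get(s));
        }
        return newYaml;
    }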
    Updated Save Queue Method:
    Code (Text):
    public void saveMainYaml(){
       if(savingMainYaml==false){
          savingMainYaml = true;
          //Clone before going async so the saving thread never touches the live object
          final YamlConfiguration finalYML = cloneYaml(mainYMLObject);
          Bukkit.getScheduler().runTaskAsynchronously(plugin, new Runnable(){public void run(){
             saveYaml(finalYML, "plugins/MyPlugin/mainYaml.yml");
             savingMainYaml = false;
          }});
       }else{
          if(queueSavingMainYaml==false){
             queueSavingMainYaml = true;
             Bukkit.getScheduler().runTaskLater(plugin, new Runnable(){public void run(){
                saveMainYaml();//Recursion
                queueSavingMainYaml = false;
             }},20*5);
          }
       }
    }

    Does cloning effectively remove the need for the synchronized setting/getting? (since the YamlConfiguration that is being written to the disk is never interacted with after beginning the asynchronous save)
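
A snapshot can only stand in for synchronization if it shares no mutable objects with the live data and is taken on the thread that does the modifying. The plain-Java sketch below illustrates the difference between a shallow and a deep copy using nested maps as a stand-in for a YAML tree; `Snapshots.deepCopy` is a hypothetical helper, and it assumes leaf values (strings, numbers, booleans) are immutable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for taking an independent snapshot of nested data.
final class Snapshots {
    private Snapshots() {}

    // Recursively copies nested maps and lists. Other values are assumed
    // to be immutable and are shared as-is.
    static Map<String, Object> deepCopy(Map<String, Object> source) {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            copy.put(entry.getKey(), copyValue(entry.getValue()));
        }
        return copy;
    }

    @SuppressWarnings("unchecked")
    private static Object copyValue(Object value) {
        if (value instanceof Map) {
            return deepCopy((Map<String, Object>) value);
        }
        if (value instanceof List) {
            List<Object> copied = new ArrayList<>();
            for (Object element : (List<?>) value) {
                copied.add(copyValue(element));
            }
            return copied;
        }
        return value;
    }
}
```

With a shallow copy such as `new LinkedHashMap<>(root)`, the nested maps are still the same objects as in the original, so a background writer could still observe changes made while it is iterating.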
     
    • Useful x 1
  14. A file for each player would be much better for multithreading. Say you have two threads running in parallel trying to read/write information. If it's one large file there would be concurrency issues, whereas with a file for each player the odds of that file being taken up by two threads at once (thus causing exceptions) are far lower.

    Alternatively, try SQLite. It's like a database in a file. If you're skilled with databases, this is a good alternative.