Solved Getting and deleting ConfigurationSection keys

Discussion in 'Spigot Plugin Development' started by Bobcat00, Jul 14, 2018.

  1. I'm pretty sure I can do what I want my plugin to do, but I figured I'd double check with you guys first.

    I want to iterate through the keys in a config file by using getConfigurationSection("path").getKeys(false). As part of my processing for each key, I may want to delete the key itself and any data in its subsection from the config, which would be done with set("path.key", null).

    My question is: If I delete a key, can I still continue to iterate over the keys that I got from getKeys? Obviously, I wouldn't do anything with the key I just deleted, but continue with the next key.

    I think I can do this, because the Java Set returned by getKeys is still valid and unchanged when I delete a key in the file. Right? I know I can't change the Set itself.
     
  2. Looking at the MemorySection class, getKeys() creates a new LinkedHashSet and returns it:
    Code (Java):
    public Set<String> getKeys(boolean deep) {
        Set<String> result = new LinkedHashSet<String>();
        Configuration root = this.getRoot();
        if (root != null && root.options().copyDefaults()) {
            ConfigurationSection defaults = this.getDefaultSection();
            if (defaults != null) {
                result.addAll(defaults.getKeys(deep));
            }
        }
        this.mapChildrenKeys(result, this, deep);
        return result;
    }
    So you should be fine removing the section while iterating, since it doesn't affect the key set.
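    For illustration, a quick sketch of what that means in practice (assumes this runs inside your JavaPlugin subclass, the config is loaded, and the "path" section exists and is non-empty):
    Code (Java):
    Set<String> keys = getConfig().getConfigurationSection("path").getKeys(false);
    int before = keys.size();

    // Delete one of those keys from the config...
    getConfig().set("path." + keys.iterator().next(), null);

    // ...the set we already got is a fresh LinkedHashSet, not a view,
    // so it is unchanged: keys.size() still equals before here.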
     
  3. Note that while the answer above is correct, this behaviour is not specified in the Javadocs. In a future version your code could break, so to be safe you might want to add the keys to a set of your own instead, and remove them after iterating.

    But in the current version, yes, that'll work just fine.
     
  4. I'm not sure what you mean. getKeys is already giving me a Set. Do you mean copy them to my own object?
     
  5. Well, you get a Set derived from a Map. Since the set() method could alter the underlying Map (which it does in the current implementation), if the Set were a view of the Map in any sense (e.g. a keySet), an Iterator over it would throw a ConcurrentModificationException (by the contract of Map).

    Hence, if Bukkit doesn't return a copy, iterating over the Set and modifying the same node in config is not a valid thing to do. The documentation doesn't say it's a copy, nor that it's safe to modify the node while iterating over the keys, so making a copy of the Set is the right thing to do.
     
  6. So when I make my own copy, do I make it Set<String> or LinkedHashSet? (A related question is how am I supposed to know it's a LinkedHashSet when that isn't even mentioned in the javadocs?)

     
  7. You don't need to know the Set implementation, as long as it's properly Iterable (how else would you copy it, after all?).

    You get a Set; you just need to copy it into any data type (an ArrayList, or just an array, is probably the best choice since the size is known and iteration is fast).
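    For instance (a quick sketch, using the same `config` section as the other posts):
    Code (Java):
    // One-time snapshot of the keys as an array; safe to iterate over
    // while deleting keys from the config afterwards.
    String[] keys = config.getKeys(false).toArray(new String[0]);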
     
  8. I think example code will make it clear for you:
    Code (Java):
    ConfigurationSection config = getConfigurationSection("path");
    Set<String> set = config.getKeys(false);
    for (String s : set) {
        if (...)  // your condition for deleting this key
            config.set(s, null);
    }
    I assume that is what you've done, and it should work just fine in the current version. However, you're deleting keys while iterating over them; the current implementation doesn't mind, but a future Spigot update could make this an issue, so for forwards-compatible code you might want to do this instead:
    Code (Java):
    ConfigurationSection config = getConfigurationSection("path");
    Set<String> set = config.getKeys(false);
    Set<String> toRemove = new HashSet<>(); // could also be TreeSet, or LinkedHashSet, or anything else, but of those HashSet performs the best
    for (String s : set) {
        if (...)
            toRemove.add(s);  // mark the key for removal
    }

    for (String s : toRemove) {
        config.set(s, null);  // actually remove the key
    }
    Neither of the two is wrong, but the second one is a bit more correct.


    The reason I'm saying that is that it's common among Java devs to create "backing" key sets, which don't need additional memory but will fail if the map is modified mid-iteration. For example, all of the JDK's own Map classes use backing key sets, with which the first snippet above wouldn't work; HashMap.keySet() will throw a ConcurrentModificationException if you modify the map during iteration.
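    To make that concrete, here's a small illustration of my own (not Bukkit code) that throws:
    Code (Java):
    Map<String, Integer> map = new HashMap<>();
    map.put("a", 1);
    map.put("b", 2);

    for (String key : map.keySet()) {  // keySet() is a live view, not a copy
        map.remove(key);               // structural modification during iteration
    }
    // -> throws ConcurrentModificationException on the next iteration step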
     
    #8 StillNoNumber, Jul 15, 2018
    Last edited: Jul 15, 2018
  9. Just:
    Code (Java):
    new HashSet<>(config.getKeys(false))
     
  10. Well, no, LinkedHashSet would perform better than HashSet when iterating over it right after this.

    What's more, Lists easily outperform Sets in speed for both adding and iterating, and they use less memory. An ArrayList with a preallocated size (or even better, the ArrayList(Collection<? extends E>) copy constructor) would be the best choice.
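    That is, something like this (sketch, same `config` as above):
    Code (Java):
    // The copy constructor allocates the backing array at exactly the
    // right size and preserves the key set's iteration order.
    List<String> keys = new ArrayList<>(config.getKeys(false));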
     
  11. Well, obviously you need to add the condition; you usually don't want to delete all the keys. I've edited the code to make that clearer.
     
  12. That's not true (and it's implementation- and hardware-specific)! When using a LinkedHashSet, literally every iteration step is a cache miss, while with a HashSet only buckets with contents are a cache miss (so 1 cache miss per iteration step at worst). This means that while the theoretical complexity of a HashMap is higher, in practice only filled buckets matter. On top of that, the O(h+n) complexity gets reduced to O(n) if we don't remove items. Only in very few scenarios does it make sense to iterate through a linked list of any kind, simply because hardware caching handles it so badly; usually it's even better to use an ArrayDeque pointing to the entries in the Map than a LinkedHashSet (if you're not removing items, that is).

    Code (Java):

    public static void main(String[] args) {
        int dummy = 0;


        long hashTotal = 0;   // long: accumulated nanosecond totals could overflow an int
        long linkedTotal = 0;
        long start = 0, end = 0;
        for (int i = 0; i < 100_000; i++) {
            Map<Integer, Integer> hash = new HashMap<>();
            start = System.nanoTime();
            dummy += doMapOperation(hash);
            end = System.nanoTime();
            hashTotal += end - start;

            Map<Integer, Integer> linked = new LinkedHashMap<>();
            start = System.nanoTime();
            dummy += doMapOperation(linked);
            end = System.nanoTime();
            linkedTotal += end - start;
        }

        System.out.println(hashTotal + " " + linkedTotal);
    }

    public static int doMapOperation(Map<Integer, Integer> map) {
        int dummy = 133769;
        for (int i = 0; i < 100; i++) {
            map.put(dummy += i, dummy ^= i);
        }
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            dummy += entry.getKey();
        }
        return dummy;
    }
    Output: 640331973 644336596

    Re-running the test gives me different results, from 550m to 650m, but the maps are consistently equally fast.

    And sure, you can use ArrayList, which would obviously beat all of them performance-wise - but the code I showed was about correctness, not speed. If it turns out to be an issue, go back and optimize. (And asymptotically it's all the same anyway.)
     
    #12 StillNoNumber, Jul 15, 2018
    Last edited: Jul 15, 2018
  13. Thanks for the suggestion of going through the keys and building a list of ones to remove and doing the actual removals later. That makes a lot of sense since the number I'll be removing will be very small (and usually zero).

    Why not just use:
    List<String> toRemove = new ArrayList<String>();
    That should work, right? Also, it's worth noting that due to the structure of my config file, I need to pull items out of the removal list in the same order I put them in.
     
  14. Yep! That works. As pointed out by @DarkSeraphim, that would in fact even be faster, and the item order would also be correct.
     
    #14 StillNoNumber, Jul 15, 2018
    Last edited: Jul 15, 2018
  15. I recall seeing different results in a benchmark comparing HashSet vs TreeSet vs LinkedHashSet. Run some JMH benchmarks rather than self-hacked benchmarks :p
    Not quite true. The logic would be exactly the same, except that you add another loop to go over the removals rather than doing them immediately.
    Code (Java):

    for (T item : items) {
        if (item.hasCondition()) {
            remaining.add(item);
        }
    }

    for (T item : remaining) {
        item.doAction();
    }
    Is the same as
    Code (Java):
    for (T item : items) {
        if (item.hasCondition()) {
            item.doAction();
        }
    }
    But with k more Collection::add(T) calls, and k more iterations in a second for loop :p
     
  16. I won't - but if you want to or can find that benchmark again, I'd be happy to see your results. This is a common case of theory vs. practice - in theory LinkedHashSets have an edge, but the fact that only cache misses really matter kills it. I've re-run the code quite a few times, so it doesn't seem like it's due to outside factors.

    On top of that, as I mentioned, it's implementation- and hardware-specific; I could imagine there's some radical difference on other Java implementations (of the allocator, VM, or HashMap class itself) or on processors other than my i5.
     
    #16 StillNoNumber, Jul 15, 2018
    Last edited: Jul 15, 2018
  17. Couldn't find it again, so I just ran a new benchmark. On my machine / OS / JVM combination, LinkedHashSet iteration is about twice as fast.
    Code (Text):

    # Run complete. Total time: 00:13:30

    Benchmark                            Mode  Cnt    Score   Error   Units
    MyBenchmark.testLinkedSetIteration  thrpt  200  135.035 ± 0.875  ops/ms
    MyBenchmark.testSetIteration        thrpt  200   63.137 ± 0.325  ops/ms
    The code: https://pastebin.com/iykPc7fH (just drop it into a default JMH template generated with their Maven archetype).

    It was run on a hypervisor machine, Ubuntu 18.04, Java 10.
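    For reference, a JMH benchmark along those lines looks roughly like this (my own sketch of such a benchmark, not a verbatim copy of the pastebin; the set size is an assumption):
    Code (Java):
    import java.util.HashSet;
    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    public class MyBenchmark {
        private Set<Integer> hashSet;
        private Set<Integer> linkedSet;

        @Setup
        public void setup() {
            hashSet = new HashSet<>();
            linkedSet = new LinkedHashSet<>();
            for (int i = 0; i < 1_000; i++) {  // assumed set size
                hashSet.add(i);
                linkedSet.add(i);
            }
        }

        @Benchmark
        public void testSetIteration(Blackhole bh) {
            for (Integer i : hashSet) {
                bh.consume(i);  // Blackhole prevents dead-code elimination
            }
        }

        @Benchmark
        public void testLinkedSetIteration(Blackhole bh) {
            for (Integer i : linkedSet) {
                bh.consume(i);
            }
        }
    }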
     
  18. Code (Text):
    Benchmark                            Mode  Cnt    Score    Error   Units
    MyBenchmark.testLinkedSetIteration  thrpt   20  210.846 ± 30.389  ops/ms
    MyBenchmark.testSetIteration        thrpt   20  159.995 ±  6.852  ops/ms
     
    This is what I get with your code. However, I benchmarked both put and iterate earlier - you've only benchmarked iteration. So it seems the LinkedHashSet is indeed faster to iterate, but the overhead of putting elements balances it out. Interesting!
     
  19. The only overhead you have is creating a new node, updating the tail pointer in the LinkedHashSet, and setting the next pointer in the previous node.
     
  20. Yeah, but in my case 'k' will be zero most of the time, and if it's non-zero, I doubt it would ever be more than 5. So a second loop is basically nothing. And I'm only doing this at plugin startup.

    In case you're wondering, I'm writing an Alt Detector plugin. It keeps track of players' IP addresses, UUIDs, player names, and the time they last joined. The questions I'm asking are in regards to purging old records (which is why I save the time). If a record is more than 60 days old, I'm deleting the data. I don't punish the players in any way, I just want to keep track of who's who.
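    For the curious, the purge logic will look roughly like this (a simplified sketch; "players" and "lastSeen" are placeholder names for my actual config layout, and timestamps are assumed to be milliseconds):
    Code (Java):
    // Inside the JavaPlugin subclass; needs java.util.* and
    // org.bukkit.configuration.ConfigurationSection imports.
    private static final long SIXTY_DAYS_MS = 60L * 24 * 60 * 60 * 1000;

    public void purgeOldRecords() {
        ConfigurationSection players = getConfig().getConfigurationSection("players");
        if (players == null) {
            return;
        }

        // Pass 1: collect stale keys; an ArrayList keeps them in insertion order.
        List<String> toRemove = new ArrayList<>();
        for (String key : players.getKeys(false)) {
            // getLong returns 0 for a missing field, so malformed entries are purged too.
            long lastSeen = players.getLong(key + ".lastSeen");
            if (System.currentTimeMillis() - lastSeen > SIXTY_DAYS_MS) {
                toRemove.add(key);
            }
        }

        // Pass 2: do the actual removals, then persist the config.
        for (String key : toRemove) {
            players.set(key, null);
        }
        saveConfig();
    }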