General efficiency question about HashMaps

Discussion in 'Spigot Plugin Development' started by SnowGears, Jun 2, 2016.

  1. Hello.

    I have a question about a specific case for efficiency vs storage on plugins (and in general).

    Lets say I have a HashMap with locations as keys and Machines as values.
    This allows me to have constant retrieval time from the map for Machines when a user clicks on a location.
    But if I want to get a Machine based on the owner, I can't have constant retrieval time from the HashMap unless I create another identical HashMap with player UUIDs as keys and Machines as values.

    Basically, it is a question on storage vs efficiency. Any thoughts? Maybe not just in this example, but just in general as well. Thanks!

    Is there a better data structure to use than HashMap for this case?
  2. Depends on your type of storage and is the data going to be saved? If you are storing the data temporarily then hash maps are the way to go. If you want permanent storage then use yml.
  3. You're using HashMaps wrong if your keys and your values are unique. In that case, I'd make an object called X, which simply contains a Location and a Machine, then I would use a HashSet (which is O(1)) and store all those values in there. Don't forget to override equals in X though.
  4. All keys are unique in HashMaps. Otherwise, it wouldn't be a unique mapping of values, which is the whole point of the data structure.

    And in what way would I be using the HashMap wrong? It seems to me that in this case a HashSet would make less sense, since I rely on a key to get its mapped value.
  5. My point was that your structure means a Machine only had 1 Location linked to it, which is kind of wasting the potential of a HashMap. As for UUID's, you could just add a UUID field to your Machine class, so that 1 UUID can own 0..* machines. However that would mean you'd have to parse the entire list O(n), so it's probably not the best option.
  6. I'm still confused how that is "wasting the potential of a HashMap" since the key is used for retrieval of a Machine object.
    The Machine object already does have a UUID field in it. I am saying that if I wanted to get a Machine from any other field than location (which I can get in O(1) time from the HashMap), I would have to loop through all of the Machines and scan their class variables for a matching UUID (O(n) time).

    I may not have been clear enough in my main post, but my question is this:
    I want O(1) time retrieval for both of these fields. (Location and UUID).
    Is there a way to do this without creating two HashMaps, one with locations as the keys and one with UUID as keys, but the exact same values in both. This seems like an incredible waste of memory (but also much more efficient), especially when Machines on servers can number in the tens of thousands.
  7. Damn, sorry I misunderstood the main post. Then I guess you don't have much of a choice but to create a second HashMap with the same values, but with the key set to UUIDs, at least that's what a quick google search proposes.

    This might be a little bit of overkill, but what would you think of a custom object as Key? You could store in there the UUID and the Location. To make the search faster, the equals method would look a bit like this:
    Code (Text):
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof YourClass)) return false;
            YourClass key = (YourClass) o;
            if(key.UUID != null)
              return this.UUID.equals(key.UUID);
              return this.location.equals(key.location);
    And you would search by creating a new object of YourClass, but giving it only the UUID or the location

    I just thought about it, but a proper usage of hashCode() is also very important in this case.
  8. No worries!

    That's a clever idea actually. It muddies up the code a bit but it definitely saves some memory as I wouldn't have to store duplicates of all the values in the HashMap. I will look into this more. Thanks!
  9. Or you can just make your own object that contains all 3 things, machine, location, and owner. Throw these objects in a Set. Nice and tidy.

    If speed and size are a huge concern (i'm talking hundreds of thousands of entries here) then you can simply have two maps, but not how suggested. There's no need to have two maps pointing to the same object (machine).

    Map<UUID, Location>
    Map<Location, Machine>

    But, this means that only ONE uuid can be assigned to a single location, and thus a single machine. This doesnt seem to be a concern, though, since your original idea is also subject to this same limitation.

    If this limitation is not acceptable, go back to the drawing board on design, or take up my original idea for a custom object containing all 3 variables you need held in a Set.
    #9 BillyGalbreath, Jun 2, 2016
    Last edited: Jun 2, 2016
    • Agree Agree x 1
  10. HashMaps are amazing for speed when working over large datasets because of the key-value pairing. Putting the data into a set instead and iterating over it (linear search or binary search, etc..) will be much slower when the amount of data is large. The way a hash map works (simplified a bit) is that an algorithm takes the data of the key and uses it to produce a number. This number is then used to find a memory location which contains the value being looked up (Yes it is a bit more complex than this, but that's the basics of it). This means you can find data in large sets much quicker.

    I would start by considering how much data you have. If you need to be able to lookup a 'machine' using either a location as a key or an owner UUID as a key then the best solution for speed in a large dataset would be to have a map of <Location, Machine> and a map of <UUID, Machine>. However you then run into the issue that both maps need to be kept up-to-date with one another. If the amount of data is smaller then it might be easier and just as good to have one map, and just search through it based on other criteria as needed.

    If the above solutions sound non-ideal that's because they are. If you can find a custom class written somewhere that allows adding a 'secondary key' that would be best.
    • Agree Agree x 1