Convert words in yml file to a list w/ no formatting?

Discussion in 'Spigot Plugin Development' started by devonzimmi, Nov 27, 2019.

  1. I have a list of words that is 370k words long.

    How do I take this yml file and put it into a list without reformatting the file or using a loop? Any ideas?
  2. Choco


    Okay... what? What's your goal here, and why do you want to load the entire English dictionary into memory?

  3. Idk if the API reads them as keys, but you can use a Scanner or BufferedReader.
  4. I'm sorry I'm a bit loopy atm.

    I have a dictionary yml file that I want to put in a list to check if a word typed by a player exists in the English language. I'm trying to put it into a list ( static List<String> dictionary; ). I'm unsure of how to grab all the words in the file to put them in that list.

    I'm not asking for spoonfeeding, but I would like help being put on the right track. If what I'm doing is inefficient, can you tell me a better way of checking the existence of a word?
    #4 devonzimmi, Nov 27, 2019
    Last edited: Nov 27, 2019
  5. First of all, checking if a List of 370k entries contains a given word is a really bad idea for performance reasons since it would go over each entry in sequential order until it finds something. You want to use a more specialized data structure for that, like a HashSet or Trie.
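
    For the Trie option, here is a minimal sketch of what such a structure could look like. The class name WordTrie and its methods are made up for illustration; nothing here comes from the Spigot API, and it assumes words are already normalized (e.g. lowercased) before being added.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie for exact word lookups (illustration only).
final class WordTrie {

    // One node per character; endOfWord marks that a full word stops here.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Walks down the trie, creating nodes as needed, one per character.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.endOfWord = true;
    }

    // Returns true only if the exact word was added (a mere prefix does not count).
    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.endOfWord;
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        trie.add("cat");
        trie.add("cater");
        System.out.println(trie.contains("cat"));
        System.out.println(trie.contains("cate"));
    }
}
```

    Lookup cost depends on the length of the word rather than on how many words are stored. For a plain "does this word exist" check a HashSet is probably the simpler choice; a trie mostly pays off if you also need prefix lookups.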

    Secondly, the screenshot you posted looks like a plaintext file, without YAML structure. And that may be a good thing, since I'm not sure how well the configuration loader performs with such big files. I'd recommend you just read it the vanilla Java way (new BufferedReader(new FileReader(filePath)), doing readLine until it's null).
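
    A rough sketch of the plain-Java route, assuming the file has one word per line and is UTF-8 encoded. It uses Files.newBufferedReader instead of new BufferedReader(new FileReader(filePath)) so the charset is explicit, and it puts the words straight into a HashSet. The class name WordList, the method names, and the lowercasing step are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; names are made up for illustration.
final class WordList {

    // Reads one word per line: readLine until it returns null.
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: lookups should ignore case and surrounding whitespace.
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Normalizes the input the same way as the stored words before checking.
    static boolean isWord(Set<String> words, String input) {
        return words.contains(input.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        // Tiny temp file standing in for the real 370k-word dictionary.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, List.of("apple", "Banana", "", "cherry"));
        Set<String> words = load(tmp);
        System.out.println(isWord(words, "banana"));
        System.out.println(isWord(words, "qwzx"));
        Files.delete(tmp);
    }
}
```

    In a plugin you would presumably call load once at startup and keep the resulting set around, rather than re-reading the file for every chat message.
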
    If you insist on YAML, just format the file as a YAML list.

    Code (YAML):
    key:
      - value1
      - value2
      - value3
    So you'd have to modify your existing file to fit that format. If you're using a decent editor this should be an easy task. From there on it should boil down to a simple config.getList("key").
    You'll still want to convert that list to a HashSet or something similarly efficient (new HashSet<String>(list) should do the trick).
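
    One wrinkle with the getList route: Bukkit's config.getList returns an untyped List<?>, so it can't be handed directly to new HashSet<String>(...) (getStringList is the typed variant). Below is a small sketch of the conversion step in plain Java. The raw list in main is a stand-in for whatever the config call returns, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the Bukkit API.
final class ConfigWords {

    // Copies the String entries of an untyped list into a HashSet,
    // skipping anything that is not a String.
    static Set<String> toWordSet(List<?> raw) {
        Set<String> words = new HashSet<>();
        if (raw == null) {
            // Assumption: treat a missing list as an empty dictionary.
            return words;
        }
        for (Object entry : raw) {
            if (entry instanceof String) {
                words.add((String) entry);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Stand-in for the list a config lookup would return.
        List<?> raw = Arrays.asList("value1", "value2", "value3", 42);
        Set<String> words = toWordSet(raw);
        System.out.println(words.contains("value2"));
        System.out.println(words.size());
    }
}
```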
  6. Choco


    I'm not sure that loading all words into memory at runtime is all that great of an idea (and in a List, of all collections; if you do this, at least use a HashSet, because its #contains() would be significantly faster). 370,000 unique objects is a significant number of instances in memory. I would suggest perhaps using an online dictionary with a RESTful service instead. Query the word and, if the response is valid, it's likely an English word. That brings its own set of issues, though: there may be a slight delay in response time, and it requires a connection to the internet.

    One note about using a HashSet: you may also run into collisions. I'm not sure that's much of an issue in this case, but it's still worth noting, as the HashSet may be negligibly slower than its best case.
  7. Thank you so much. I will definitely look into trying to have the words inside a hash set. It's probably my best chance of getting this to work.
    Yep. Already found that article... I use Google quite often actually, and all of those APIs have limits on how many times you can actually pull from them, which is not ideal when making a public plugin, especially one I intend to sell...