String filtering problem

Discussion in 'Spigot Plugin Development' started by Crypnotic, Jun 22, 2018.

  1. I need to check a string against each word in an array. I'll use cat as my example.

    My first attempt was to check whether the string contains cat.
    Code (Text):
    The cat jumped over the wall.
    In this case, it should match cat.

    However, when the search word only appears inside an excluded word, such as cattle or catastrophic, I need to ignore that match and keep looking for any others.

    The problem I have found is that a string containing the search word and an excluded word, such as:
    Code (Text):
    The cattle jumped over the wall to chase the cat.
    is reported as not containing the search word cat, when it in fact does.
     
  2. Not the most elegant solution, but you can do four checks on the string:
    1. Check if the string contains " " + word + " "
    2. Check if the string starts with word + " "
    3. Check if the string ends with " " + word
    4. Check if the string equals word
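
    The four checks above could be sketched roughly as follows. The class and method names (FourChecks, containsWord) are made up for illustration and are not from the thread. Note that this sketch only treats a single space as a word separator, so punctuation is not handled.

```java
// Hypothetical helper; names are illustrative only.
final class FourChecks {

    // Returns true when `word` appears in `text` delimited by single spaces
    // or by the start/end of the string. Punctuation is NOT treated as a delimiter.
    static boolean containsWord(String text, String word) {
        return text.contains(" " + word + " ")   // check 1: somewhere in the middle
            || text.startsWith(word + " ")       // check 2: at the start
            || text.endsWith(" " + word)         // check 3: at the end
            || text.equals(word);                // check 4: the whole string
    }
}
```

    With this sketch, "The cat jumped over the wall." matches, but "The cattle jumped over the wall to chase the cat." does not, because the final cat is followed by a full stop rather than a space or the end of the string.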
     
  3. Another solution is to split the string into a list of words separated by spaces, stream it, and check whether any value meets your two conditions: it contains cat and it isn't an excluded word such as cattle. Here is an example:

    Code (Java):

    import java.util.Arrays;
    import java.util.List;

    List<String> exempted = Arrays.asList("cattle", "catsup");
    String check = "The cat is in the bridge";
    // True if any space-separated word contains "cat" (or passes whatever other check you use) and is not exempted.
    if (Arrays.stream(check.split(" ")).anyMatch(word -> word.contains("cat") && !exempted.contains(word))) {
        // do what you want to do with this
    }
     
    You can also stream the exempted list to do a more thorough check on each word:
    Code (Java):

    // A word is rejected if it contains any exempted entry, not only if it equals one.
    boolean found = Arrays.stream(check.split(" ")).anyMatch(word -> word.contains("cat") && exempted.stream().noneMatch(word::contains));
     
     
  4. FrostedSnowman

    Resource Staff

    Use regex:

    Code (Java):
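    private static final Pattern PATTERN = Pattern.compile("\\bcat\\b", Pattern.CASE_INSENSITIVE);

    Matcher matcher = PATTERN.matcher("The cattle jumped over the wall to chase the cat.");
    // find() must succeed before group() is called; otherwise group() throws IllegalStateException.
    if (matcher.find()) {
        System.out.println(matcher.group()); // prints cat
    }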
     
     
  5. Regex is exactly what you'll want for this specific case! Right on @FrostedSnowman
    Here's a short regex tutorial: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
    You can also experiment with different patterns on this site and get live feedback: https://regexr.com/
     
  6. If you're making a profanity filter, be wary of the Scunthorpe problem. This is still an ongoing issue with string filters and word recognition, especially since kids keep coming up with cleverer ways of expressing things to get around filter/detection systems.

    Code (Text):
    "The cat tle jumped over the wall to chase the ca t"
     
  7. I understand how regex would be useful, but please explain how I can make a regex catch a potentially infinite number of variations of a word while also ignoring a potentially infinite number of exceptions.
     
  8. If you can figure that out, you'd be the first and make a lot of money. ^_^
     
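  9. For your specific request from the original post, you don't need an infinite system. Simply use while (matcher.find()) {} to visit each match in the string. Note that matcher.matches() only returns true when the entire string matches the pattern, so it only fits the case where the input is a single word.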
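
    A minimal sketch of the while (matcher.find()) loop, reusing the \bcat\b pattern from earlier in the thread. The class and method names (WholeWordFinder, matchOffsets) are hypothetical; the loop collects the start offset of every whole-word match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class WholeWordFinder {

    // \b is a word boundary, so "cat" inside "cattle" does not match.
    private static final Pattern CAT = Pattern.compile("\\bcat\\b", Pattern.CASE_INSENSITIVE);

    // Returns the start offset of every whole-word occurrence of "cat" in text.
    static List<Integer> matchOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher matcher = CAT.matcher(text);
        while (matcher.find()) {          // advances to the next match each time
            offsets.add(matcher.start());
        }
        return offsets;
    }
}
```

    For the sentence from the original post, this skips cattle and reports only the standalone cat near the end, since the full stop after it still counts as a word boundary.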
     
  10. The thing is that the variations and exceptions would all be explicitly defined; I just need to figure out how to fix the problem I outlined.
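
    One way this could look with explicitly defined lists, as a rough sketch rather than a definitive answer: test each word on its own, so an excluded word such as cattle is skipped without ruling out the rest of the string. The names here (ExplicitListFilter, containsFiltered) are hypothetical, and the sketch assumes both sets hold lowercase entries.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class ExplicitListFilter {

    // A "word" here is a run of letters or digits; punctuation and spaces separate words.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Assumes `variations` and `exceptions` contain lowercase entries.
    static boolean containsFiltered(String text, Set<String> variations, Set<String> exceptions) {
        Matcher words = WORD.matcher(text);
        while (words.find()) {
            String word = words.group().toLowerCase(Locale.ROOT);
            if (exceptions.contains(word)) {
                continue; // excluded word: skip it, but keep scanning the rest
            }
            for (String variation : variations) {
                if (word.contains(variation)) {
                    return true; // a non-excluded word contains a search word
                }
            }
        }
        return false;
    }
}
```

    With variations containing cat and exceptions containing cattle and catastrophic, the sentence "The cattle jumped over the wall to chase the cat." is still flagged because of the final cat, while "The cattle jumped over the wall." is not. Deliberately split words like those in the Scunthorpe example are outside what this sketch handles.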