regex for match words #7

jfilter · 2018-07-03T15:49:53Z

I am not happy with how match words currently work. My problem is that the German term "IFG" is not as omnipresent as the English one, and I don't want to annoy people by posting false positives [0]. It would be better to allow regex patterns instead of plain words; then I could prevent matches such as "LIFG". In the code, you would have to construct a regex [1] and use e.g. findall to get matches.

[0] https://twitter.com/IFG_IFG_IFG/status/1014061073559949312
[1] https://docs.python.org/3/library/re.html
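
A minimal sketch of the idea, assuming Python's `re` module as linked in [1]. The pattern name, the helper `find_matches`, and the sample sentences are illustrative and not taken from the bot's code. A word boundary `\b` on each side keeps "IFG" from matching inside "LIFG":

```python
import re

# Hypothetical pattern: match "IFG" only as a standalone word.
IFG_PATTERN = re.compile(r"\bIFG\b")

def find_matches(paragraph):
    """Return every non-overlapping match of the pattern in the paragraph."""
    return IFG_PATTERN.findall(paragraph)

print(find_matches("Die Anfrage nach dem IFG wurde abgelehnt."))  # ['IFG']
print(find_matches("Die LIFG wurde 1995 gegründet."))             # []
```

There is no word boundary between "L" and "I", so the second sentence yields no match.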

thisisparker · 2018-07-03T19:15:19Z

I've thought about this! When I was first building this for FOIA Feed I was pretty sure it would have to use regex, but then (in English, for my purposes) the results from simple string-matching were so effective that I didn't bother introducing the complexity of regex. (That complexity is mostly for end-users... I don't think it would be very difficult to do regex matches in the code itself.)

At the risk of proliferation, what do you think about a third matchwords file that has regular expressions?

jfilter · 2018-07-04T08:49:44Z

Sounds like a good idea. So people can start off with simple words and move on to more complex patterns as needed.

byeskille · 2018-09-06T20:02:43Z

Just want to register my interest in some regex support.

I have started a Norwegian FOIAbot [0]. Since we have no single term that most stories involving FOIA use, the way there is in the US, we have to track combinations of words in order to pick up as many stories as possible.

[0] https://twitter.com/InnsynBot
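
One way a single regex could express "a combination of words" is a pair of lookaheads, each of which must succeed somewhere in the paragraph. This is only a sketch: the two Norwegian words and the helper `mentions_both` are placeholders chosen for illustration, not the terms InnsynBot actually tracks.

```python
import re

# Hypothetical combination: both words must appear somewhere in the paragraph,
# in any order. Each lookahead scans ahead from the start of the string.
COMBINATION = re.compile(r"(?=.*\binnsyn\b)(?=.*\bdokumenter\b)", re.IGNORECASE)

def mentions_both(paragraph):
    """True if the paragraph contains both placeholder words."""
    return COMBINATION.search(paragraph) is not None

print(mentions_both("Avisen ba om innsyn i kommunens dokumenter."))  # True
print(mentions_both("Avisen ba om innsyn i saken."))                 # False
```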

thisisparker · 2018-09-06T20:19:51Z

OK, this convinces me, @byeskille, I'll get it into the next release. Do you have any objection to another file that has newline-separated regexes, like the format of the other matchwords files? That seems to me the most straightforward way of doing it.
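
A rough sketch of what reading such a file could look like, assuming one pattern per line and blank lines ignored. The function names and the sample patterns are hypothetical and not part of the existing code; stripping surrounding whitespace from each line is also an assumption, and it would drop any trailing space that was meant to be part of a pattern.

```python
import re

def load_patterns(lines):
    """Compile one regex per non-empty line; surrounding whitespace is stripped."""
    return [re.compile(line.strip()) for line in lines if line.strip()]

def matching_patterns(paragraph, patterns):
    """Return the source text of every pattern that matches the paragraph."""
    return [p.pattern for p in patterns if p.search(paragraph)]

# Stand-in for the contents of a hypothetical regex matchwords file.
sample_file = r"""\bIFG\b

Informationsfreiheit\w*
"""

patterns = load_patterns(sample_file.splitlines())
print(matching_patterns("Antrag nach dem IFG gestellt", patterns))
print(matching_patterns("Die LIFG hat getagt", patterns))
```

With a real file, `load_patterns(open(path, encoding="utf-8"))` would work the same way, since iterating over a file yields its lines.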

byeskille · 2018-09-06T20:20:56Z

That should work, I believe.

thisisparker · 2018-09-06T20:22:49Z

Off the top of my head I'm a little concerned with how you would match actual newlines, but I think that's edge-case-y enough that we don't need to worry about it. (Plus, like, newlines are used as paragraph breaks and paragraphs are the unit within which matches happen, so I'm not really sure what it would even mean to match newlines.)
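
To illustrate the point, here is a small sketch that assumes paragraphs are obtained by splitting the text on newlines (the helper name and sample text are made up). A one-line pattern can still contain the escape `\n`, but it can never match, because no paragraph contains a newline after the split:

```python
import re

def match_paragraphs(text, pattern):
    """Split text on newlines and return the paragraphs the pattern matches."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return [p for p in paragraphs if pattern.search(p)]

text = "First paragraph mentions FOIA.\nSecond paragraph does not."

# The pattern spans the paragraph break, so it finds nothing.
print(match_paragraphs(text, re.compile(r"FOIA\.\nSecond")))  # []

# An ordinary pattern matches within a single paragraph.
print(match_paragraphs(text, re.compile(r"\bFOIA\b")))
```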

thisisparker changed the title ~~match words~~ Regex for match words Jul 3, 2018

thisisparker changed the title ~~Regex for match words~~ regex for match words Jul 3, 2018
