RegexType reads a line for every byte in mutableOffset.offset #47

Open
craigpell opened this issue May 9, 2018 · 1 comment

@craigpell

I wanted to use my local machine’s /usr/share/file/magic/kml to detect KML and KMZ files, with this code:

ContentInfoUtil matcher =
    new ContentInfoUtil(new File("/usr/share/file/magic/kml"));

ContentInfo info = matcher.findMatch(new File(kmlFile));

But it always fails, because this line in the magic file never matches:

>>&0 regex ['"]http://earth.google.com/kml Google KML document

This appears to be because RegexType reads an entire line for every unit of mutableOffset.offset, so the matching content is skipped entirely. In other words, if mutableOffset.offset is ten, the code skips ten lines rather than ten bytes.
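
To illustrate the difference, here is a minimal, self-contained sketch. It is not the library's code: the class `OffsetSkipDemo`, both methods, and the sample document are hypothetical, and the pattern is only modeled on the magic entry above. It compares treating the offset as a count of lines with treating it as a count of bytes (characters stand in for bytes here).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

class OffsetSkipDemo {

    // Hypothetical pattern modeled on the magic entry quoted above.
    static final Pattern KML = Pattern.compile("['\"]http://earth.google.com/kml");

    /** Skips one whole line per unit of offset, then searches the remaining lines. */
    static boolean matchSkippingLines(String content, int offset) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(content));
        String line;
        for (int i = 0; (line = reader.readLine()) != null; i++) {
            if (i < offset) {
                continue; // one full line is consumed per offset unit
            }
            if (KML.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    /** Skips exactly offset characters, then searches the rest of the content. */
    static boolean matchSkippingBytes(String content, int offset) {
        int start = Math.min(offset, content.length());
        return KML.matcher(content.substring(start)).find();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input: a three-line KML-like document.
        String kml = "<?xml version=\"1.0\"?>\n"
                + "<kml xmlns=\"http://earth.google.com/kml/2.0\">\n"
                + "</kml>\n";
        int offset = 5;
        System.out.println(matchSkippingLines(kml, offset)); // prints false
        System.out.println(matchSkippingBytes(kml, offset)); // prints true
    }
}
```

With an offset of 5, the line-based version consumes all three lines before it starts matching, while the byte-based version still sees the namespace URL on the second line.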

I was able to get KML files detected correctly by changing these lines in RegexType.java from this:

if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
}

to this:

if (i < mutableOffset.offset) {
    bytesOffset += line.length() + 1;
    i += line.length();
}
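
The effect of that change can be sketched on a simplified loop. This is a hypothetical reconstruction, not the actual loop in RegexType.java: the class `SkipLoopSketch`, the method `linesSkipped`, and the `patched` flag are made up for illustration. It assumes the counter `i` is incremented once per line read, and counts how many lines are consumed before matching would start.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class SkipLoopSketch {

    /**
     * Hypothetical skip loop (not copied from RegexType.java).
     * Returns the number of lines consumed before matching would begin.
     */
    static int linesSkipped(String content, int offset, boolean patched) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(content));
        int skipped = 0;
        for (int i = 0; i < offset; i++) {
            String line = reader.readLine();
            if (line == null) {
                break; // ran out of input before reaching the offset
            }
            skipped++;
            if (patched) {
                // Advance the counter by the line's length; the loop's own
                // i++ then accounts for the newline character.
                i += line.length();
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        String content = "abcdefghij\nsecond\nthird\n";
        System.out.println(linesSkipped(content, 10, false)); // prints 3
        System.out.println(linesSkipped(content, 10, true));  // prints 1
    }
}
```

In this sketch the patched loop still consumes whole lines, so an offset that lands in the middle of a line skips the rest of that line as well. It only stops the counter from treating every line as a single byte.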

@j256 (Owner) commented Jul 11, 2018

According to my reading of the docs, the regex matching type is supposed to match on lines and not bytes. Maybe the pattern is wrong?
