Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for checksum lists #5138

Merged
merged 11 commits into from
Jun 8, 2022
Merged

Add support for checksum lists #5138

merged 11 commits into from
Jun 8, 2022

Conversation

theonlypwner
Copy link
Contributor

@theonlypwner theonlypwner commented Jan 12, 2021

EDIT:
I'm checking off each extension as I find the time to. I can't review everything at once, so I'm ticking off extensions/filenames as they're verified. Sorry for the mess…
@Alhadis
EDIT Ⅱ:
Never mind, this is @lildude's problem now. 😁
ibid

Description

Fixes #5130.

Checklist

  • I am adding a new language.
    • The language is in-use by hundreds of users on GitHub.
    • I have included real-world samples for all extensions added in this PR:
      • Source: Author (@theonlypwner)
      • License:

        If it isn't already in the public domain, I agree to license my samples under the MIT License, so it may be included in this repo without any issue.

    • I have included or enabled a syntax highlighting grammar.
    • I have added or updated the heuristics.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
@theonlypwner theonlypwner marked this pull request as ready for review January 16, 2021 21:39
@theonlypwner theonlypwner requested a review from a team as a code owner January 16, 2021 21:39
@lildude lildude requested review from Alhadis and pchaigno January 18, 2021 09:09
@Alhadis Alhadis changed the title Add support for checksum list files (Fixes #5130) Add support for checksum lists Jan 19, 2021
Copy link
Collaborator

@Alhadis Alhadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have yet to verify the usage of each addition, but here're some preliminary thoughts.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
samples/Checksums/single_hash.crc Outdated Show resolved Hide resolved
theonlypwner and others added 3 commits January 19, 2021 20:42
These names are too ubiquitous across GitHub, and code-searches yield no
actually valid files containing a CRC32 sum.

See http://github.com/github/linguist/pull/5138/files/09c642Ω#r559949817
for the original discussion justifying their removal.
Copy link
Collaborator

@Alhadis Alhadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lildude, I'm passing this ball to your court, as you're currently responsible for sussing out popularity metrics. Hopefully it's not too much of a pain. 😁

@Alhadis Alhadis requested review from lildude and removed request for pchaigno May 31, 2022 05:46
@lildude
Copy link
Member

lildude commented Jun 7, 2022

Hopefully it's not too much of a pain. 😁

With this many filenames and extensions, it's a right PITA!! 😞

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few file names that I couldn't find a single example of or aren't popular enough when I did.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
@theonlypwner
Copy link
Contributor Author

I agree with the suggestions and have made them.

I noticed a lot of results for extension .digest, so please check it.

Also, I noticed some files in the format name whitespace hash instead of hash whitespace name, which I hope either won't cause issues for the syntax highlighter or can be fixed later.

@theonlypwner theonlypwner requested a review from lildude June 8, 2022 02:44
@lildude
Copy link
Member

lildude commented Jun 8, 2022

I noticed a lot of results for extension .digest, so please check it.

Yes, there are quite a few, however the format isn't consistent from a quick look at various results. I'm not sure if the grammar can support this at this stage and I'd really like to build the next release today.

We can always add it later once @Alhadis confirms the grammar can support these various formats.

Accordingly, I've removed the .digest and the sample for now.

@lildude lildude merged commit 9703d62 into github-linguist:master Jun 8, 2022
@theonlypwner
Copy link
Contributor Author

We can always add it later

Yes, I agree. It's better to merge this first and open a new issue/PR for later changes.

@Alhadis
Copy link
Collaborator

Alhadis commented Jun 18, 2022

I'm not sure if the grammar can support [.digest files] at this stage and I'd really like to build the next release today.

Well, I can add support for patterns like this

digest1:dwLYXWqDkzaJhlo1P6sQqpxgU5cBRRYM9Sna/buqhHIoAX6/np9fYp9Zbgb3Rc7I

… however, I'm unfamiliar with whatever program generates this kind of output, and I'd ideally like some authoritative reference on what formats are accepted in such lines (the example looks to be using base64, but I'd rather not go by guesswork if I can help it).

Also, I noticed some files in the format name whitespace hash instead of hash whitespace name, which I hope either won't cause issues for the syntax highlighter or can be fixed later.

There's no foolproof way to tell name hash from a filename that contains spaces, followed by a word that happens to contain n alphanumeric characters. Conversely, a line that begins with precisely n characters is a little less ambiguous. Consider files that begin with arbitrary text (e.g., comments), or even files like this that aren't technically a checksum list, but are still highlighted non-intrusively.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for hash sum files (tools like sha256sum)
4 participants