Integrate from fileinfo.com list of file extensions #112
Comments
I've added a few filetypes in my personal branch. About 2500 of them, so I didn't want to destroy the work you've done with sorting files in the master branch, and sorting all of these would be way too tedious. Just a heads up. :) |
Holy ... what? Where did you get all this? |
I scraped data from the Wikipedia article on file extensions. Wanted to send you a PM about it but there's no such feature yet on GitHub, it seems. :) |
Hahaha. Holy crap. You don't get any performance issues or anything with that? When you type It might be possible to organize these (relatively) rapidly using a script that prints Wikipedia's description of a filetype and gives you buttons to hit for which category to put it in. But that's still 3000 bloody mouse clicks. In #109 I planned to make a |
Yeah, I don't know the best way to approach this either. If the wiki description for each extension read something like audio/mp3, image/jpeg or whatever, it would be possible to do this programmatically... However, I've made a somewhat clean dump of the extensions and descriptions if anyone's up for sorting all of these out somehow: https://github.com/trapd00r/LS_COLORS/blob/japh/wiki_fileext.txt |
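One way to approximate the MIME-style grouping mentioned above, without depending on the wording of the Wikipedia descriptions, is to ask Python's standard `mimetypes` module for each extension and keep only the major type. This is just a sketch: `major_type` and `group_by_major_type` are hypothetical helper names, and most obscure extensions would probably end up in the `unknown` bucket.

```python
import mimetypes
from collections import defaultdict


def major_type(extension):
    """Return the major MIME type for an extension such as 'mp3', or 'unknown'."""
    # guess_type() works on file names, so build a dummy name around the extension.
    mime, _encoding = mimetypes.guess_type("file." + extension.lstrip("."))
    return mime.split("/")[0] if mime else "unknown"


def group_by_major_type(extensions):
    """Bucket extensions by major MIME type (audio, image, text, ...)."""
    groups = defaultdict(list)
    for ext in extensions:
        groups[major_type(ext)].append(ext)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_major_type(["mp3", "jpg", "png", "zzzunknownext"]))
```

The major type could then be mapped onto whatever categories the color file already uses; anything left in `unknown` would still need manual sorting.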
I figured using libmagic would work wonders (file uses it: file foo.*). The issue, though, is that it guesses the filetype based on the first few bytes of a file, so you can't just touch all of these 3k file.extensions since they'll be empty. You'll have to actually create the files in question. Here's the database that libmagic uses: https://github.com/threatstack/libmagic/tree/master/magic/Magdir |
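To illustrate why empty placeholder files are useless for content-based detection, here is a minimal sketch of signature sniffing. It is not libmagic and does not use its database; `SIGNATURES`, `sniff`, and `sniff_bytes_as` are hypothetical names, and the table only holds three well-known magic numbers.

```python
import os
import tempfile

# A tiny, hand-picked signature table; libmagic's real database is far larger.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def sniff(path):
    """Guess a file type from its leading bytes, or return None if nothing matches."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def sniff_bytes_as(filename, content):
    """Write content to a temporary file called filename and sniff it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        with open(path, "wb") as handle:
            handle.write(content)
        return sniff(path)


if __name__ == "__main__":
    # The file name says PNG, but an empty file has no bytes to match against.
    print(sniff_bytes_as("empty.png", b""))
    print(sniff_bytes_as("real.bin", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

The extension plays no part in the result, which is why each of the scraped extensions would need a real sample file before this approach could classify it.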
Why days? I could cross-reference that super fast in sqlite. If this is a lot of work for you, stand back! I got this. Still not sure we want to do this though. |
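A cross-reference of this kind might look roughly like the sketch below, using Python's built-in `sqlite3` module. The thread does not say exactly which lists are being compared, so this assumes the goal is to find scraped extensions that are not already in the existing color file. The table names, the `new_extensions` helper, and the sample rows are all hypothetical.

```python
import sqlite3


def new_extensions(scraped, existing):
    """Return scraped (ext, description) rows whose ext is not already listed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scraped (ext TEXT PRIMARY KEY, description TEXT)")
    conn.execute("CREATE TABLE existing (ext TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO scraped VALUES (?, ?)", scraped)
    conn.executemany("INSERT INTO existing VALUES (?)", [(e,) for e in existing])
    # Anti-join: keep scraped rows that have no partner in the existing table.
    rows = conn.execute(
        """
        SELECT s.ext, s.description
        FROM scraped AS s
        LEFT JOIN existing AS e ON e.ext = s.ext
        WHERE e.ext IS NULL
        ORDER BY s.ext
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Illustrative sample rows only, not real scraped data.
    scraped = [("mp3", "audio"), ("vcf", "contact card"), ("flac", "audio")]
    print(new_extensions(scraped, ["mp3", "flac"]))
```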
Sure, go ahead. I added everything in a dictionary: https://gist.github.com/trapd00r/554f03450ed114fee191e794c87b0215 I am not sure either, but there are no performance issues so why not, really. :D |
Great, that will be super easy to parse. Some of these are kind of giving me lulz though.
I use direnv and environment variables for various purposes so I often do Anyway, the point is my use case is probably not the normal one, so if performance is really that much of a non-issue then there's little reason not to include these if we can automate the categorization. |
Also, would it be a goal to automate the scraping / categorization? That seems horrendously over-engineered but people will be updating the list on Wikipedia ... edit: a script to build the |
Yeah, forgot to tell you but the extensions in my dict above are scraped from fileinfo.com - their descriptions were a lot better (and also more extensions). And yeah, some of them are pretty bonkers... I'm all for automation, I'll tinker more with this tomorrow after a good night's sleep... |
We could cheat and scrape from their already defined categories but I'm not sure if every filetype is categorized. Maybe it's good enough anyway. Edit: If you're going to scrape anything, do note that only 500 results are shown by default - you'll have to scroll down and click view full list |
https://github.com/trapd00r/LS_COLORS/tree/motherofgod/bin/scrape_fileinfo
|
Btw. This wasn't an issue with the entries I scraped from Wikipedia (only +2500), but this is over 11k entries and, welp, we run into the 120KiB limit per env var.
It's kind of a big deal because:
|
Heh. Maybe some kind of shell extension could delegate highlighting to a subprocess. Zsh might be able to do that with a plugin, but I use bash. Regardless, even if it were workable to do that delegation, we'd actually really have to worry about performance now. Some directories have thousands of files in them. And only really weird people are going to want to install extensions like that just so that they can have a special color for What I'm saying is, I think we need two things:
[1]: Speaking of which, most of these file types have no significance in a *nix environment, which is where 99% of users of |
Given that all extensions use the ECMA-48 spec notation and each extension has 5 chars, we could do roughly 9k (13 chars per entry). And I agree, a curated list would work better; however, then this whole automation thing falls short. |
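The back-of-the-envelope estimate above can be checked with a few lines of Python. This sketch takes the 120 KiB limit and the 13 bytes per entry quoted in the thread at face value; `max_entries`, `build_ls_colors`, and `fits` are hypothetical helpers, and the `0;34` color code is only a placeholder.

```python
LIMIT_BYTES = 120 * 1024  # per-variable limit as quoted in the thread (assumption)
BYTES_PER_ENTRY = 13      # per-entry estimate as quoted in the thread


def max_entries(limit=LIMIT_BYTES, per_entry=BYTES_PER_ENTRY):
    """How many fixed-size entries fit under the limit."""
    return limit // per_entry


def build_ls_colors(mapping):
    """Join {'.ext': 'sgr;codes'} into an LS_COLORS-style string."""
    return ":".join(f"*{ext}={codes}" for ext, codes in mapping.items())


def fits(value, limit=LIMIT_BYTES):
    """True if the encoded value stays under the limit."""
    return len(value.encode()) < limit


if __name__ == "__main__":
    print(max_entries())  # 122880 // 13
    # Synthetic entries with a 5-character extension and a placeholder color code.
    for count in (9000, 11000):
        value = build_ls_colors({f".{i:05d}": "0;34" for i in range(count)})
        print(count, len(value), fits(value))
```

Under these assumptions the budget works out to 9452 entries, which matches the "roughly 9k" figure, while an 11k-entry list of the same shape overshoots the limit.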
I might be able to trim the list quite effectively; I happened to write a thing while playing around with this... |
That's cool. Let me look into this and get back to you. Btw, are we concerned about who holds the copyright for the descriptions of the file types at fileinfo.com? I haven't looked into that at all. |
Kind of small potatoes but if you have imagemagick installed, |
For folks concerned with blowing up their environment, try something like this:
|
On my way to Zanzibar right now but I stumbled upon this on Hacker News: Pretty comprehensive and with a lot of information on each type. |
Would it make sense to compile this file list into a YAML file, à la vivid's config? This could be done as part of #195. |
|
Should be some kind of metadata, but maybe not in the existing metadata group
Edit: This was originally about .vcf and .vcard extensions but it sprawled into something much more ambitious.