Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use EXIF taken date as sorting date for photos [$200] #221

Closed
1 task
jancborchardt opened this issue Jul 20, 2015 · 36 comments
Closed
1 task

Use EXIF taken date as sorting date for photos [$200] #221

jancborchardt opened this issue Jul 20, 2015 · 36 comments

Comments

@jancborchardt
Copy link
Member

jancborchardt commented Jul 20, 2015

Use-case: Doing a trip with friends and multiple people take photos. Afterwards when you move them all in the same folder from the different devices, the sorting is off. It’s not based off the EXIF date but instead the modified date.

We should sort photos where there is an EXIF date present by that date. @oparoz

Ref #101

Pre-requirements

There is a $200 open bounty on this issue. Add to the bounty at Bountysource.

@oparoz
Copy link
Contributor

oparoz commented Jul 20, 2015

Agreed, but for this to work well, data should be extracted and indexed in the DB at upload time, otherwise it will slow down all operations since each file would have to be opened before we're able to get the list of files.

@jancborchardt
Copy link
Member Author

Maybe this is something for core then? The sorting should be right in the Files app already as well. cc @schiesbn @DeepDiver1975

@oparoz
Copy link
Contributor

oparoz commented Jul 20, 2015

Yes, I think the upload mechanism could do with adding more data to the FileInfo object attached to each file (dimensions, exif data object), but I'm not sure the Files app would benefit from it, even in thumbnails view. That kind of sorting works best for media files.

Also, I want to add the possibility of letting users add the missing data to other files so that they can mix PSD and JPEG files per example, but I'm not sure it's something people are going to do often (both adding data and mixing content).

A first step could be to introduce another of those settings which lets users who need the feature enable it despite the slowdowns. That way, it wouldn't affect the app for all the other users.
The main downside is that there is no easy way of caching the information between sessions...

@patman15
Copy link

I just made in one evening a prototype implementation on v14.0.0 that adds the exif time to the file properties and allows sorting. Since I'm new to owncloud and galleryplus, I'm pretty sure it's not the best, but it works. Also I read using the file system might slow things down. Nevertheless, if there is interest, I could provide the patch.

@jancborchardt
Copy link
Member Author

@patman15 hey, cool! :) Of course there’s interest – it would be very nice if you can open a pull request here with the patch.

@oparoz
Copy link
Contributor

oparoz commented Sep 22, 2015

@patman15 - Thanks for your work :). Please submit a PR and we'll know more about what you've done.

The actual EXIF parsing code is not really the problem here, but the fact that not all images contain that information and that if we don't store it, then we need to parse the files every time and it will be very slow for external storages.

Without a new sorting button (!), the sort by date button will mix together different types of information. We need to think about the best solution for users here. The behaviour needs to be predictable.
Let's list available options and outcomes.

@patman15
Copy link

Ok, sure I'll submit a PR soon.

Button is drawn and added, logic is enhanced from two to three sort types. I'm still thinking about how to handle files without EXIF. Either use the modification time as fallback for those cases or for the complete album or disable the button if not all files are with EXIF.

Regarding speed: I read about the issue before I started. Since I'm new to OC programming I couldn't figure out yet where file (content) is actually accessed. Thumbnail generation might be one good point to add the parsing. Currently, I put it into the searchmediaserivce.php and there is still an enhancement by switching to stream access. Nevertheless, yes, it loads every time you open the folder.

@oparoz
Copy link
Contributor

oparoz commented Sep 22, 2015

Thumbnail generation might be one good point to add the parsing

Yes and that's done in core. There are 2 alternatives:

  • add a hook and to collect the information in our own DB table every time a file is uploaded, see Use image meta data programme [$115] #101
  • create an app just for that as the information could be useful for other apps

I'm in favour of the 2nd option as it offers much more flexibility and zero dependency on core.

Then in Gallery, we could have an ExifService (or use the one in the Exif app even if it's frown upon) which would collect the needed information.

@patman15
Copy link

@oparoz So, PR is there, if you want to have a look. I still have to fix a few issues to increase quality.

I just tested performance on a directory with only jpg images (worst case as other mime types do not impact performance). On the directory containing 1429 images my (very small home) server took 3.9s for scanning while the patch increases the time to 6.4s (+64%). I still have a few performance increasing options available, but it won't help too much. Nevertheless, I'm quite happy with it for my personal use case.

If you could give me some more infos on how your second option should look like or whether I should try to extend the core, I would be happy to invest some more time. Just let me know if I should finalize the current patch as an intermediate solution or directly focus on a more mature solution.

@oparoz
Copy link
Contributor

oparoz commented Sep 23, 2015

Regarding performance, it would be good to time a few test cases, so that we can use it in the documentation, but that's only relevant if we don't store the information in the DB, so let's look at that first :)


Exif app

Requirements

  • Generic EXIF parser, not just limited to date collection
  • Stores the information in its own table(s)
  • Adds the information at upload time (GUI, WebDAV)
  • CRUD operations

Implementation

  • AppFramework app created with ocdev
  • Don't trust the documentation on hooks, it's bogus

I can definitely help you with setting things up. We just have to think hard to decide if an Exif apps would be useful to others, because otherwise, there is little point in keeping the functionality outside of the app.

@patman15
Copy link

I checked for apps that could use it as well. I only found

So basically put photos on a map or different galleries. Also when thinking about which apps (e.g. on Android) use EXIF it is mostly for rotating or displaying on a map (which could be part of gallery as well) Not sure it's worth the effort to make a separate app in the first run, but for sure you have more experience. My guess is to go for a service and if it's really demanded by/interesting for others we could still move it to a separate app.

What do you think?

@oparoz
Copy link
Contributor

oparoz commented Sep 23, 2015

I was about to ask on the mailing list, but I agree. If there is demand for it, it will be straightforward to re-package the service. People will be looking for a migration path and you can decide then if you want to offer/sell one.
There is an issue open for showing pictures on a map. That can be done by invoking the maps app or by using a layer provided by a maps API. In any cases, it should be easy to implement then, once we have the pins' coordinates.

So, let's talk about architecture. I think it would be good to implement this as a part of #101

So maybe we should have a MetaDataService and a folder (like preview or config) containing various parsers/writers and we would start with the ExifParser Class?
Feel free to come up with your own proposal :)

The database classes are described in #101 and you can use Activity as an example of how to use hooks.
https://github.com/owncloud/activity/blob/master/lib/fileshooksstatic.php#L36-L38

It creates untestable code unfortunately, but that would be the quickest way to get started and we can improve on that later.

@patman15
Copy link

I did a bit of research how EXIF information is handled in other viewers. The most general approach -- which is used in some famous viewers -- is to sort files with EXIF and files without in separate blocks. Then there is sometimes an option to use modification date as fall back (default disabled). I like this quite a lot as we would not need an additional sorting button but, unlike in my pull request.
@jancborchardt Nevertheless, if we parse all EXIF information there are a number of sorting options we would need a GUI as soon as not only date is interesting as mentioned in #101.

@oparoz To summarize:

Exif app

Requirements

  • Generic EXIF parser, not just limited to date collection
  • Stores the information in its own table(s)
  • Adds the information at upload time (GUI, WebDAV)
  • CRUD operations

Implementation

Next steps

  • Keep PR as patch for people not willing to wait?
  • Ask other people about whether to make a separate app?
  • I'll make a prototype implementation for the EXIF service

Sounds like a (reasonable) plan?

@oparoz
Copy link
Contributor

oparoz commented Sep 24, 2015

is to sort files with EXIF and files without in separate blocks

I think this is going to be a barrier, because it requires making big changes to the photowall to make it usable. People won't know there are more images much lower in the wall, in another block.

an option to use modification date as fall back (default disabled)

That could be very messy, with big blocks of pictures with exif data, in the middle of the rest of the pictures, unless people carefully curate their collections.
I guess, if this is optional and comes with a list of caveats, people enabling it know what they're getting into, until we can come up with a better design.

Keep PR as patch for people not willing to wait?

Yes. That could help us get some feedback. I'm not sure Gallery wants patches, so maybe keep this in Gallery+? @jancborchardt @karlitschek

Ask other people about whether to make a separate app?

I'd say, let's use your fresh stock of energy to get the service built first and we'll see about the app when that's done :)

I'll make a prototype implementation for the EXIF service

Cool. I'll move the specs over to #101, since this is the way forward.

@oparoz
Copy link
Contributor

oparoz commented Sep 24, 2015

OK, so I've updated #101. @patman15 , please refer to that for the general guidelines and I'll update it if it needs be.
This issue here will be about the specifics of the Exif parser and how to make sorting by date work.

@deMattin
Copy link

I'm the same opinion as @oparoz not to give an automatic date sorting on files date or exif date.
This can be very confusing in some not very well maintained collections.
It's not only a matter of existence of Exif data because often Exif dates exist, but are wrong (empty battery of cam before useage and forgotten to set date and time).

So sorting on Exif data should always be an additional option.

@patman15
Copy link

Well, we can keep a separate button (I'm happy with that) nevertheless we need to define what should happen if you choose Exif sorting and the information is partly missing. Maybe @oparoz you got me wrong, the current tools do sort all files without Exif info by modification time first and then all with exif information by this date. It's not perfect but if you choose the fall back it's like you said somehow mixed.
Before reading @deMattin comment I would have assumed Exif date <= modification date but even that is not always true. :-( Anyhow all of this can be done just by minor adjustments to the javascript sorting algorithm, after the data is available. :-)

@oparoz
Copy link
Contributor

oparoz commented Sep 24, 2015

Anyhow all of this can be done just by minor adjustments to the javascript sorting algorithm, after the data is available. :-)

Agreed :)

@rhatguy
Copy link

rhatguy commented Sep 24, 2015

I'm more than thrilled to see the progress being made here. I wanted to throw out one use case in case you guys hadn't thought of it yet. I upload my pictures into a folder that is an external storage mountpoint in owncloud. Then sometime later (maybe years) I may go back and add further exif information (GPS coordinates...facetags..etc). It would be good for the implementation to have some way to trigger a re-scan of the exif data in case the files are updated outside of ownclouds control. I can understand though if the final answer is "the files should be under owncloud control" :)

I'm willing to help if testing is needed. I have a 27k file picture repository with pics and videos.

@oparoz
Copy link
Contributor

oparoz commented Sep 24, 2015

@rhatguy - That could be solved:

  • via a TTL, but people may not like the slowdown happening on a weekly basis and it may always be the wrong time span
  • via the command line or a button in user settings

I think solution 1 is not good enough and solution 2 will take a while since we still don't have settings in the app.

@patman15
Copy link

Trying to get familiar with owncloud concepts ...

@oparoz Just thinking (maybe too) loud: if we talk about "meta data support" we need parsers for different types. Looking through the thumbnail service in core I can find excellent patterns, we would need/should use as well:

  • configurable list of supported mime types
  • providers for different mime types and all the patterns around that
  • all necessary functions/hooks for create/update, etc.

I know we discussed two days ago that this is the ideal point. Looking at the requirements #101 it's more or less duplicating thumbnail generation, name it meta data generation, and replace a bit of logic in the individual providers for the mime types. Further, meta data might be relevant also for non images, e.g. PDFs where you have author, ...

Just want to clarify before I start which patterns to use. I can of course skip the provider concept and just use a switch statement for the types and more or less put all logic into one "metadataservice".

So shouldn't we somehow reduce the scope (not metadata but image properties or so) or really go for a core extension? If (both) not, can someone give me guidance on which patterns are worth the effort for the intended functionality?

@oparoz oparoz added the ready label Sep 24, 2015
@oparoz
Copy link
Contributor

oparoz commented Sep 24, 2015

@patman15 - I'm guessing you're making reference to the Preview Class?

If I understand you correctly, you'd rather add a metaData() method to each preview provider, collecting the information and returning it, ready to be added to the database by the Preview class?

One problem is that the preview system can be disabled. That would stop all meta data collection. core's solution is...an app! owncloud-archive/metadata#1

Attributes - No visible attributes as key-values that can be attached to a file/folder/object. This can be used to store exif data for a picture, id3 data for an mp3 or other attributes.

Using that app means that you won't be able to start development until next year, at the earliest, but it means that all the data collected would be made available via WebDAV.

Hacking the preview class is not a bad idea, but beware. It's a dinosaur pining to be refactored.
Also, the Image class contains a couple of EXIF methods which should be extracted.

If you think it makes more sense, go for it. Just open an issue in core describing what you want to achieve, so that others can voice their opinion. I know the preview class and its providers are not very popular and that the latter will eventually move out of core, so there might not be many people jumping in joy.
Once the solution is ready, we can always include a patch of the PR in Gallery+ as a preview of what's coming in 9.0. And all that data can later be exposed via the metadata app.

The smaller project is, of course, to start here, focusing on media meta data, probably duplicating some of the patterns put in place in core, but without the need to use providers (I think). But it would be good to think big when designing the DB schema, so that it doesn't need too many updates if it moves to core.

@patman15
Copy link

@oparoz Thanks a lot for the overview, I think I got an impression about the surrounding problems.

I'm guessing you're making reference to the Preview Class?

Yes, I was browsing around there. :-)

I'll do something that can be made available earlier than one year but with concepts that can survive longer. Hopefully I can spend enough time. Let's start.

Thanks again!

@jancborchardt
Copy link
Member Author

@oparoz @patman15 pull requests should always go to the original Gallery, never to a fork. Period. cc @karlitschek @DeepDiver1975

@oparoz
Copy link
Contributor

oparoz commented Sep 25, 2015

If the feature is intended for Gallery, yes. It makes things easier from a legal and organisational point of view.

@patman15
Copy link

I did some more research and also performance tests. Here is a short summary of the three options I found:

Method Effort Speed Dependency File Types example*
exif_read_data low fast (1x) none TIFF/JPG 15s
exiftool higher slower (2x) tool installed 65+ 35s
imagick low slow (20x) no additional ~5-10 348s

*) example: 1420 JPG files on local storage, initial scan

My suggestion would be to go with exiftool (can produce php, xml, or json output directly) and use exif_read_data as fall back. Exiftool can be run as a separate process in batch mode, so it has low overhead.
The only issue that still worries me at the moment are external storages as they would require to transfer all files locally first and then scan them. Thus, speed would go down to the one of imagick unless I find a way to pipe data in and read only the necessary (how much?) header of all files.
Database is rather easy as all tools use "[section]:[tag]" => "[value]" whereas [section] can be [section1]:[section2]:...

@oparoz, @jancborchardt are you ok with the additional dependency for exiftool? It would save a huge implementation effort regarding all the file types, etc. Any other thoughts?

@oparoz
Copy link
Contributor

oparoz commented Sep 27, 2015

I'm OK with using exiftool with a exif_*_data fallback.
As you've probably discovered, there is even a composer library to make things easier:
https://github.com/romainneutron/PHPExiftool

I'm not worried about the overhead since we should really only collect the data at upload time and provide a separate scanner for existing files.

Regarding external storage, the files are already transferred locally in order to create the preview. If the hook gives us access to that local representation, then there shouldn't be any problem. If it gives us a "link" to the external file, then it will be difficult to stream the data if I'm not mistaken as not all backends support it.

@oparoz
Copy link
Contributor

oparoz commented Sep 27, 2015

Also, for performance reasons, I would use $this->configService->getSupportedMediaTypes(true, false) to make sure we only generate meta data for something we support.

@patman15
Copy link

As you've probably discovered, there is even a composer library to make things easier:
https://github.com/romainneutron/PHPExiftool

Yes, I did, but then I thought of KISS (keep it simple stupid). The library does a lot of type conversion and is dependent on the version of exiftool. Most things we do not need (including tag writing). And:

This driver is not suitable for production

So I thought more about using something like https://github.com/tsmgeek/ExifTool_PHP_Stayopen or even better take the ideas and build my own class with the functionality we need here.

@oparoz
Copy link
Contributor

oparoz commented Sep 27, 2015

Fine by me :)

@oparoz
Copy link
Contributor

oparoz commented Sep 27, 2015

@cbxk1xg might be interested to collaborate with @patman15 on this as he's also experienced this area.

@patman15
Copy link

Sorry, currently I'm pretty busy with my normal job so implementation is still not progressing. :-(

@oparoz
Copy link
Contributor

oparoz commented Nov 19, 2015

No problem. One thing that would be great, if you can, is to have some sort of roadmap for your project so that we can know a bit more about it and where you're at. It could be that some part can be implemented earlier and beneficial to other items.

@oparoz oparoz changed the title Use EXIF taken date as sorting date for photos Use EXIF taken date as sorting date for photos [$200] Dec 17, 2015
@oparoz oparoz added the bounty label Dec 17, 2015
@oparoz
Copy link
Contributor

oparoz commented Dec 18, 2015

@patman15 There is now a decent bounty on this issue and I hope it will give your some extra hacking motivation this holiday season :)

@patman15
Copy link

Hi!

On 2015-12-18 11:12, Olivier Paroz wrote:

@patman15 https://github.com/patman15 There is now a decent bounty on
this issue and I hope it will give your some extra hacking motivation
this holiday season :)

Motivation is not the issue. Currently it's still time. But let's hope
for the holiday season. At least chances are higher.

Merry Christmas,
Patrick

@oparoz
Copy link
Contributor

oparoz commented Sep 4, 2016

Moved to nextcloud/gallery#51

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants