Split fingerprinting #87

Open
wants to merge 2 commits into
base: master
from

Conversation

thesunlover
Contributor

test_fingerprint_by_splitting.py creates a long file from the existing files in mp3/.

To fingerprint with a length check, use
djv.fingerprint_with_duration_check(long_song, song_name="Concatenates_test")
as shown in the split-test file.

@thesunlover
Contributor Author

This is the updated version of the previous pull request.
It should account for the wavio case in the verification checks,
but I have no way to test that myself.

@thesunlover thesunlover mentioned this pull request Aug 5, 2015
@thesunlover
Contributor Author

repost of PR #75
for issue #18

@wangzhengyi

You don't consider the offset_seconds.

@thesunlover
Contributor Author

@wangzhengyi 👍
I would welcome code comments and recommendations.
Edit 1:
The new function is built on the existing ones.
For me it would be enough for offset_seconds to be handled in there.
Got it. What is needed is to calculate the lengths of the preceding splits and add them to each split's offsets...
Any suitable suggestions are welcome.
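
A minimal sketch of the "add the previous lengths" idea, assuming the duration of each split is known in seconds. The function name `chunk_start_seconds` is hypothetical and is not part of dejavu or this PR; it only illustrates that the start of each split is the running total of the durations before it.

```python
def chunk_start_seconds(chunk_durations):
    """Return the start time (in seconds) of each split.

    Hypothetical helper: the start of a split is the sum of the
    durations of all splits that come before it.
    """
    starts = []
    elapsed = 0
    for duration in chunk_durations:
        starts.append(elapsed)   # this split begins where the previous ones end
        elapsed += duration      # advance the running total
    return starts
```

Each start time would then be added to the offset_seconds reported for a match inside the corresponding split.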

@sheffieldnikki

Any news on merging this into the master branch? dejavu is almost unusable on low-memory machines: even the example mp3 files give out-of-memory errors when fingerprinting on a 512MB machine :( (and relying on swap is a disaster on this machine, since its only storage is a memory card). Thanks

@NathanielCustom

The solution that worked for me to get the offsets correct is to (A) extract the start offset (in seconds) encoded in the split file name (e.g. start_sec60_end_sec120.mp3), (B) convert that seconds value to the equivalent sampling offset value, and (C) add the derived sampling offset to the offset that the fingerprinting process determines for the given file.

Note: I am using a different fork, so some of the smaller details may differ, e.g. the database.py naming.

(A) Extract Offset Data & (B) Convert to Sampling Offset

# __init__.py

def _fingerprint_worker(filename, limit=None, song_name=None):
    ...
    channel_amount = len(channels)

    # Get the split's start offset (in seconds) from the name,
    # e.g. "start_sec60_end_sec120" -> "60".
    try:
        first_split = song_name.split("start_sec", 1)
        select_second = first_split[1]
        second_split = select_second.split("_end_sec", 1)

        # Convert second_split[0] to sampling offset
        split_offset = round(
            int(second_split[0]) * fingerprint.DEFAULT_FS /
            fingerprint.DEFAULT_WINDOW_SIZE / fingerprint.DEFAULT_OVERLAP_RATIO,
            5
        )
    except (AttributeError, IndexError, ValueError):
        # No name given, or the name is not in split format: apply no offset.
        split_offset = 0
    ...
    return song_name, result, file_hash, split_offset

Iterate and Pass the Value

# __init__.py

while True:
            try:
                song_name, hashes, file_hash, split_offset = next(iterator)
            ...
            else:
                #sid = self.db.insert_song(song_name, file_hash) # REMOVE
                if treat_as_split and song_name_for_the_split:
                    sid = self.db.insert_song(song_name_for_the_split, file_hash)
                if not treat_as_split:
                    sid = self.db.insert_song(song_name, file_hash)               

                self.db.insert_hashes(sid, hashes, split_offset)
                ...

(C) Apply Offset

# database.py

    def insert_hashes(self, sid, hashes, split_offset=0):
        ...
        for hash, offset in set(hashes):
            fingerprints.append(
                Fingerprint(
                    hash=binascii.unhexlify(hash),
                    song_id=sid,
                    offset=int(offset+split_offset)
                )
            )
        self.session.bulk_save_objects(fingerprints)
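
Steps (A) to (C) can be sketched as one self-contained, runnable example. This is only an illustration under stated assumptions: the constant values below are assumed defaults (check the `fingerprint` module of your fork), and `split_offset_from_name` and `shift_hashes` are hypothetical helper names, not functions from dejavu.

```python
import re

# Assumed values. The snippets above read these names from the fingerprint
# module; the numbers here are assumptions and may differ in your fork.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5

# Matches names such as "start_sec60_end_sec120.mp3".
_SPLIT_NAME = re.compile(r"start_sec(\d+)_end_sec(\d+)")


def split_offset_from_name(song_name):
    """(A) + (B): read the start second from the name and convert it
    to the offset unit used by the fingerprints. Returns 0 if the name
    is missing or not in split format."""
    match = _SPLIT_NAME.search(song_name or "")
    if match is None:
        return 0
    start_seconds = int(match.group(1))
    return round(
        start_seconds * DEFAULT_FS / DEFAULT_WINDOW_SIZE / DEFAULT_OVERLAP_RATIO,
        5,
    )


def shift_hashes(hashes, split_offset):
    """(C): add the split offset to every (hash, offset) pair.
    int() truncates toward zero, as in insert_hashes above."""
    return [(h, int(offset + split_offset)) for h, offset in set(hashes)]
```

With the assumed constants, a split starting at second 60 is shifted by roughly 1292 offset units, so a hash found at local offset 10 would be stored at offset 1301 after truncation.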


4 participants