Inference possibilities #2

Open
amil-rp-work opened this issue Sep 13, 2020 · 1 comment

Comments

@amil-rp-work

Hey,
Thanks for providing such an amazing, easy-to-use web app.

I was wondering whether this pre-trained model could be used on an arbitrary non-YouTube file (audio or video)?

@jramcast
Owner

The models do not process video, only audio.

You can use any 16-bit PCM WAV file, which is then preprocessed by the VGGish network here (this is a prerequisite for models trained on AudioSet):

https://github.com/jramcast/mgr-service/blob/master/mgr/infrastructure/audioset/vggish/loader.py#L46
https://github.com/jramcast/mgr-service/blob/master/mgr/infrastructure/audioset/vggish/model/vggish_input.py#L75
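
Before handing a file to the preprocessing step, it can help to confirm that it really is 16-bit PCM. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_16bit_pcm_wav` is hypothetical and is not part of mgr-service.

```python
import wave


def is_16bit_pcm_wav(path):
    """Return True if `path` is a PCM WAV file with 16-bit (2-byte) samples.

    Hypothetical helper, not part of mgr-service.
    """
    try:
        with wave.open(path, "rb") as wav:
            # getsampwidth() returns the sample width in bytes; 2 bytes = 16 bits.
            return wav.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Raised when the file is not a WAV the wave module can parse
        # (for example, a non-PCM encoding or a file in another format).
        return False
```

A file that fails this check would need to be converted first.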

If you need to use other audio formats, e.g. MP3, you need to convert the audio to WAV before feeding it into the model. I used ffmpeg here to extract the WAV audio from YouTube videos, but you can also use it to convert MP3 to WAV.
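
As a rough sketch of that conversion, the snippet below calls the `ffmpeg` command-line tool from Python to write 16-bit PCM WAV output. It assumes `ffmpeg` is installed and on the `PATH`; the function names are hypothetical, and this is not necessarily the exact invocation used in mgr-service.

```python
import subprocess


def build_ffmpeg_command(src, dst):
    """Build an ffmpeg command that writes `src` as 16-bit PCM WAV to `dst`.

    Hypothetical helper, not part of mgr-service.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input file (e.g. an mp3 or a video file)
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian PCM audio
        dst,
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

For example, `convert_to_wav("song.mp3", "song.wav")` should produce a WAV file suitable as model input, assuming ffmpeg can decode the source.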
