Skip to content

Course project of Introduction to Visual and Audio System. Pytorch implementation of "The Sound of Pixels".

Notifications You must be signed in to change notification settings

IrisLi17/Sound_of_Pixels

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sound_of_Pixels

Course project of Introduction to Visual and Audio System.

Pytorch implementation of The Sound of Pixels.

Environment

  • python 3.5
  • pytorch 0.4.1
  • cuda 8.0
  • cudnn 6.0

Structure

  • models/
  • util/

Analysis for STFT

  • The wave data in dataset is sampled by a sample_rate of 44.1kHz
  • According to the paper, we can divide the wave into segments around 6s. And then down-sample these segments to 11kHz.
  • Then we will get around 66k samples for each segment.
  • With a 1202 long window, we can get 256 samples in time domain with a gap of 256 between each two windows.
  • 1022+256*255=66302, which means we need to cut the wave at least each 6.027 seconds, we can take it as 6.05 seconds.
  • In the frequency domain, we first get 512 samples via DTFT, and then resample it with log f scale into 256 samples.

About

Course project of Introduction to Visual and Audio System. Pytorch implementation of "The Sound of Pixels".

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages