video-transcription-generator

Generates transcripts with speaker diarization from video and audio files.

Speaker diarization uses the resemblyzer VoiceEncoder with spectral clustering. Speech-to-text uses a pretrained HuggingFace Speech2Text model.
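
The two stages named above can be sketched roughly as follows. This is an illustrative outline, not this repository's actual code: the function names `diarize`, `merge_turns`, and `transcribe`, the default of two speakers, the `rate=4` setting, and the `facebook/s2t-small-librispeech-asr` checkpoint are all assumptions made for the example. It also assumes the audio has already been extracted from the video and loaded as a 16 kHz mono waveform (for instance with resemblyzer's `preprocess_wav`).

```python
from itertools import groupby

SAMPLE_RATE = 16000  # resemblyzer works on 16 kHz audio


def diarize(wav, n_speakers=2):
    """Return one (time_in_seconds, speaker_label) pair per partial embedding.

    n_speakers is an assumed default; the real tool may estimate it differently.
    """
    # Imported lazily so merge_turns below is usable without these packages.
    from resemblyzer import VoiceEncoder
    from sklearn.cluster import SpectralClustering

    encoder = VoiceEncoder("cpu")
    # return_partials=True yields sliding-window embeddings plus their sample slices;
    # rate=4 asks for roughly four partial embeddings per second (assumed value).
    _, embeds, splits = encoder.embed_utterance(wav, return_partials=True, rate=4)
    labels = SpectralClustering(n_clusters=n_speakers).fit_predict(embeds)
    # Use the centre of each window as its timestamp.
    times = [(s.start + s.stop) / 2 / SAMPLE_RATE for s in splits]
    return list(zip(times, labels.tolist()))


def merge_turns(frames):
    """Collapse consecutive frames with the same label into (start, end, speaker) turns."""
    turns = []
    for speaker, group in groupby(frames, key=lambda frame: frame[1]):
        times = [t for t, _ in group]
        turns.append((times[0], times[-1], speaker))
    return turns


def transcribe(wav_chunk, model_name="facebook/s2t-small-librispeech-asr"):
    """Transcribe one waveform chunk; model_name is an assumed checkpoint."""
    from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

    # Loaded on every call for brevity; a real pipeline would load these once.
    processor = Speech2TextProcessor.from_pretrained(model_name)
    model = Speech2TextForConditionalGeneration.from_pretrained(model_name)
    inputs = processor(wav_chunk, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

A transcript would then come from calling `merge_turns(diarize(wav))`, slicing `wav` by each turn's start and end time, and passing each slice to `transcribe`. Clustering window-level embeddings and merging adjacent windows afterwards keeps the diarization step independent of the speech-to-text model.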