In doanet_parameters.py you can set the sampling rate through the fs parameter under FEATURE PARAMS, but when I changed it from 24000 to 16000, the results got much worse.
It turns out the code never resamples the audio to the configured rate, so the labels become misaligned with the audio.
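To illustrate the misalignment, here is a small arithmetic sketch. It is not code from the repository: the helper name `apparent_time_s` and the 60 s clip length are assumptions made only for this example. It shows what happens when samples recorded at 24000 Hz are interpreted as if they were sampled at 16000 Hz.

```python
def apparent_time_s(true_time_s, file_fs, configured_fs):
    """Time at which an event appears when samples recorded at file_fs
    are interpreted as if they had been sampled at configured_fs."""
    sample_index = true_time_s * file_fs
    return sample_index / configured_fs


file_fs = 24000        # actual rate of the wav files
configured_fs = 16000  # value set via the fs parameter
clip_len_s = 60        # hypothetical clip length, for illustration only

# An event annotated at 30 s shows up later on the feature time axis.
shifted = apparent_time_s(30.0, file_fs, configured_fs)

# Truncating to clip_len_s * configured_fs samples keeps only part of the real audio.
kept_samples = clip_len_s * configured_fs
real_seconds_kept = kept_samples / file_fs

print(shifted, real_seconds_kept)
```

Without resampling, the time axis is stretched by file_fs / configured_fs, so every label after the start of the clip points at the wrong part of the audio, and the end of the recording is cut off.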
I think _load_audio in cls_feature_class.py should look something like this:
def _load_audio(self, audio_path):
    # Note: this needs `import librosa` at the top of cls_feature_class.py
    fs, audio = wav.read(audio_path)
    # Keep the first self._nb_channels channels and scale 16-bit PCM to about [-1, 1)
    audio = audio[:, :self._nb_channels] / 32768.0 + self._eps
    # MISSING ROW: resample each channel from the file's rate to self._fs
    audio = np.stack([librosa.resample(audio_ch, orig_sr=fs, target_sr=self._fs) for audio_ch in audio.T]).T
    fs = self._fs
    # Pad with low-level noise or truncate to the expected number of samples
    if audio.shape[0] < self._audio_max_len_samples:
        zero_pad = np.random.rand(self._audio_max_len_samples - audio.shape[0], audio.shape[1]) * self._eps
        audio = np.vstack((audio, zero_pad))
    elif audio.shape[0] > self._audio_max_len_samples:
        audio = audio[:self._audio_max_len_samples, :]
    return audio, fs