Refer to the supporting webpage here: https://apps.ahlab.org/DroneAudioSet-code/
Dataset available here: https://huggingface.co/datasets/ahlab-drone-project/DroneAudioSet
Use the pandas or datasets libraries to download the dataset from HuggingFace.
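As a minimal sketch of the datasets route, the snippet below opens the dataset in streaming mode instead of downloading everything up front. The repository ID comes from the URL above. The helper name load_droneaudioset and the choice of streaming=True are assumptions made for illustration; check the dataset page for the configurations and splits that actually exist.

```python
# Repository ID taken from the dataset URL above.
REPO_ID = "ahlab-drone-project/DroneAudioSet"


def load_droneaudioset(streaming=True, **kwargs):
    """Open the dataset with the Hugging Face `datasets` library.

    Hypothetical helper (not part of this repository). streaming=True
    avoids downloading every file before the first example can be read.
    Extra keyword arguments, for example split=..., are passed through
    to datasets.load_dataset unchanged.
    """
    # Imported lazily so the module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, streaming=streaming, **kwargs)


# Usage (needs network access and `pip install datasets`):
#   ds = load_droneaudioset()
#   print(ds)  # shows the splits and features that are available
```

Streaming is only one option. Passing streaming=False downloads and caches the data locally, which may be preferable for repeated experiments.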
The ComputeResourcesCheck folder contains all sample audio files.
ComputeResourcesCheck/preprocessed-audio: contains 6 audio recordings with both source and drone sounds. The chosen setting was:
Volume: 80pc
Room: room1
Drone: drone1
Drone-Speaker Distance: speaker-dist-1m
Mic: mic3_8array-up
Drone-Mic Distance: mic-dist-25cm
ComputeResourcesCheck/beamforming: contains 6 audio files after the preprocessed-audio files are passed through the beamforming (MVDR) stage.
ComputeResourcesCheck/spectral-gating: contains 6 audio files after the beamforming files are passed through the spectral gating (noise-reduce) stage.
ComputeResourcesCheck/mpsenet: contains 6 audio files after the beamforming files are passed through the neural noise suppression (MPSENET) stage.
ComputeResourcesCheck/classification: contains 6 audio files after the mpsenet files are passed through the classification (SSLAM) stage.
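To confirm that a local copy of ComputeResourcesCheck is complete, a short check along the following lines may help. It is a sketch, not a script shipped with the repository: the function names are hypothetical, and it counts every regular file in a stage folder, so stray non-audio files (for example hidden system files) would be counted too.

```python
from pathlib import Path

# Stage sub-folders described above, in pipeline order.
STAGES = [
    "preprocessed-audio",
    "beamforming",
    "spectral-gating",
    "mpsenet",
    "classification",
]
EXPECTED_FILES = 6  # each stage folder is described as holding 6 files


def count_stage_files(root):
    """Return {stage: number of regular files} for each stage folder.

    A missing stage folder is reported as 0 rather than raising.
    """
    root = Path(root)
    counts = {}
    for stage in STAGES:
        folder = root / stage
        if folder.is_dir():
            counts[stage] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[stage] = 0
    return counts


def incomplete_stages(root):
    """List the stages whose file count differs from EXPECTED_FILES."""
    counts = count_stage_files(root)
    return [stage for stage in STAGES if counts[stage] != EXPECTED_FILES]
```

Calling incomplete_stages("ComputeResourcesCheck") returns an empty list when every stage folder holds exactly 6 files.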
Go to scripts/
conda create -n droneaudioset python=3.9
conda activate droneaudioset
pip install -r requirements.txt
For mpsenet and sslam, separate environments are required, as provided in requirements_mpsenet.txt and requirements_sslam.txt.
For more details, refer to:
MPSENet GitHub: https://github.com/yxlu-0102/MP-SENet/tree/main
SSLAM GitHub: https://github.com/ta012/SSLAM/tree/main
- create a sub-folder mpsenet in the scripts folder
- copy the folder models, and the python files dataset.py, env.py, inference.py, and utils.py from MPSENET repo (https://github.com/yxlu-0102/MP-SENet/tree/main) into the scripts/mpsenet folder
- also copy over best_ckpt from MPSENET repo to scripts folder
- create a sub-folder SSLAM in the scripts folder
- copy the folder SSLAM_inference and the files checkpoint_best.pt and label_descriptors.csv from SSLAM repo (https://github.com/ta012/SSLAM/tree/main) to scripts/SSLAM folder
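The copy steps for MPSENET and SSLAM can be scripted roughly as follows. This is a sketch under stated assumptions: both repositories have already been cloned locally, and every listed item sits directly under the path you pass in. The function names are hypothetical. Anything not found there is returned instead of raising, so it can be located (or downloaded) and copied by hand.

```python
import shutil
from pathlib import Path

# Items named in the steps above, grouped by destination inside scripts/.
MPSENET_ITEMS = ["models", "dataset.py", "env.py", "inference.py", "utils.py"]
SSLAM_ITEMS = ["SSLAM_inference", "checkpoint_best.pt", "label_descriptors.csv"]


def copy_items(src_root, names, dest):
    """Copy each named file or folder from src_root into dest.

    Returns the names that were not found directly under src_root.
    """
    src_root, dest = Path(src_root), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        src = src_root / name
        if src.is_dir():
            # dirs_exist_ok requires Python 3.8 or newer.
            shutil.copytree(src, dest / name, dirs_exist_ok=True)
        elif src.is_file():
            shutil.copy2(src, dest / name)
        else:
            missing.append(name)
    return missing


def set_up_model_folders(mpsenet_repo, sslam_repo, scripts_dir="scripts"):
    """Populate scripts/mpsenet, scripts/best_ckpt and scripts/SSLAM."""
    scripts_dir = Path(scripts_dir)
    missing = copy_items(mpsenet_repo, MPSENET_ITEMS, scripts_dir / "mpsenet")
    # best_ckpt goes into scripts/ itself, not scripts/mpsenet.
    missing += copy_items(mpsenet_repo, ["best_ckpt"], scripts_dir)
    missing += copy_items(sslam_repo, SSLAM_ITEMS, scripts_dir / "SSLAM")
    return missing
```

For example, set_up_model_folders("../MP-SENet", "../SSLAM") would be run from the repository root, where the two clone paths are placeholders for wherever the repositories were checked out.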
audioset_labelmapping.csv: has the mapping of the 527 audioset classes to the three categories HV (human vocals), HNV (human non-vocals), and NH (non-human) sounds
avg-snr-per-setting.csv: has the average SNR computed per setting
docs: contains all files for the webpage
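A label mapping of this kind can be read into a dictionary with the standard library, as sketched below. The column names label and category are assumptions, since the header of audioset_labelmapping.csv is not described here. Check the file's first line and pass the real names through label_col and category_col. The function names are hypothetical as well.

```python
import csv
from collections import Counter

# The three categories described above.
CATEGORIES = {"HV", "HNV", "NH"}


def load_label_mapping(lines, label_col="label", category_col="category"):
    """Build {AudioSet class label: category} from CSV lines.

    `lines` is any iterable of CSV lines, such as an open file.
    The default column names are assumptions; adjust to the real header.
    """
    mapping = {}
    for row in csv.DictReader(lines):
        category = row[category_col].strip()
        if category not in CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        mapping[row[label_col].strip()] = category
    return mapping


def category_counts(mapping):
    """Count how many classes fall into each category."""
    return Counter(mapping.values())


# Usage:
#   with open("audioset_labelmapping.csv", newline="") as f:
#       mapping = load_label_mapping(f)
#   print(category_counts(mapping))
```

With the full file, the counts across HV, HNV and NH should sum to the 527 AudioSet classes.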