24 commits
8849c87
add initial setup for testing the porting of the BeatNet model into C++
pasquale90 Jan 17, 2026
951c8f8
test:implement cpp project structure for testing; integrate AudioFile…
pasquale90 Jan 20, 2026
0cac0cd
test: implement cpp testing and store results into separate files wit…
pasquale90 Jan 20, 2026
0513dcb
test: implement Python testing and store results into separate files …
pasquale90 Jan 20, 2026
2857f02
beatNet gitignore additions
bmascaro Feb 28, 2026
f7b678f
framed signal
bmascaro Feb 26, 2026
9d4a686
filterbank processor
bmascaro Feb 26, 2026
05787ba
change parameter values
bmascaro Feb 28, 2026
d57e872
spectral diff
bmascaro Feb 26, 2026
405c1f2
changed file paths to std::filesystem paths
bmascaro Mar 2, 2026
b9b1c21
get sample rate from audiofile
bmascaro Mar 2, 2026
c7f9c70
add vectors time and output0, output1
bmascaro Mar 2, 2026
89ebc4c
changed the loop for audio block and frames processing
bmascaro Mar 2, 2026
a39f114
beatpositions condition
bmascaro Mar 2, 2026
836777f
plots of signal output0 output1
bmascaro Mar 2, 2026
dc3ec8a
separation of parameters buffersize, framelength, hopsize for BeatNet…
bmascaro Mar 3, 2026
6bcc917
separate beat and downbeat
bmascaro Mar 3, 2026
16f1e7e
filepaths changes
bmascaro Mar 3, 2026
4af2d0e
cleanup testCPP
bmascaro Mar 3, 2026
f2d026e
fix 128bpm filename
bmascaro Mar 3, 2026
af33fc6
fix results
bmascaro Mar 3, 2026
289c7d4
Cmakelists.txt : add copy dlls dependencies in the exe directory
bmascaro Mar 5, 2026
644ed2a
submit updated results
pasquale90 Mar 18, 2026
b5dadd8
fix minor bug in cmake/cpm.cmake : replaced LIB_DIR with CMAKE_CURREN…
pasquale90 Mar 18, 2026
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
onnx/out
onnx/.vs/
onnx/libs/
onnx/test/.vs/
onnx/test/out/
onnx/test/Audiofile/
src/BeatNet/__pycache__/
53 changes: 36 additions & 17 deletions onnx/BeatNet.cpp
@@ -70,9 +70,8 @@ BeatNet::BeatNet(
env(nullptr), session(nullptr), session_options(nullptr),
memory_info(nullptr), allocator(nullptr), run_options(nullptr),
input_name(nullptr), output_name(nullptr),
signal_processor(FRAME_LENGTH, HOP_SIZE),
fft_processor(FRAME_LENGTH, FFT_SIZE, FRAME_SIZE_POW2),
filterbank_processor(BANKS_PER_OCTAVE, FFT_SIZE, SR_BEATNET, 30.0f, 11025.0f, true, true),
fft_processor(FRAME_LENGTH, FFT_SIZE, FRAME_LENGTH),
filterbank_processor(BANKS_PER_OCTAVE, FFT_SIZE, SR_BEATNET, 30.0f, 17000.0f, true, true),
SR(0),bufferSize(0)
{

@@ -136,18 +135,38 @@ void BeatNet::setup(double sampleRate, int samplesPerBlock) {

bool BeatNet::preprocess(const std::vector<float>& raw_input, std::vector<float>& preprocessed_input) {

std::vector<float> resampled = resampler.resample(raw_input);
std::vector<float> frame;
bool valid_frame = signal_processor.process(resampled,frame);
if (!valid_frame) {
// std::cout<<"invalid frame and will be invalid for the first ~"<<FRAME_LENGTH/resampled.size()-1<<" frames"<<std::endl;
return false;
}
Comment on lines -142 to -145
Owner

You should probably keep this check. During the first calls of the function, while the first frame is still being assembled from the incoming buffers, it makes preprocess return false, which aborts the model inference. Simply put, in that case there is not yet a valid input signal to pass to the model.
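To make the warm-up period concrete, here is a small sketch. It assumes every call delivers the same number of (already resampled) samples and that a frame needs FRAME_LENGTH samples (1411 according to BeatNet.h). The helper name `callsUntilFirstFrame` is hypothetical and not part of the PR.

```cpp
#include <cstddef>

// Hypothetical helper (not in the PR): number of calls needed before enough
// samples have been collected to form one full frame.
// Assumes every call delivers exactly samplesPerCall samples (> 0).
constexpr std::size_t callsUntilFirstFrame(std::size_t frameLength,
                                           std::size_t samplesPerCall) {
    // Ceiling division: the call that completes the frame is counted too.
    return (frameLength + samplesPerCall - 1) / samplesPerCall;
}
```

Under these assumptions, with 1411 samples per frame and 512 samples per call, the first two calls would have to return false and the third would be the first one able to produce a frame.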


spectrum = fft_processor.compute_fft(frame);
filters = filterbank_processor.apply(spectrum);
log_fb = log_compress(filters);
diff = spectral_diff(log_fb, prev_log_fb);
std::vector<float> resampledSignal = resampler.resample(raw_input);

// slice original signal to Frames
const int nFrames = 4;
Owner

Prefer defining hyperparameters outside functions. Since this is a fixed value, you can define it with #define (e.g. #define FRAMED_SIGNAL_NFRAMES 4), either in BeatNet.h along with the rest of them or at the top of framedSignal.h. I believe that would make things clearer and more maintainable in case you need to experiment with its value in the future.
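As one possible variant of this suggestion, the value could also follow the constexpr style that BeatNet.h already uses for its other parameters. The sketch below is only an illustration: FRAMED_SIGNAL_NFRAMES is the name proposed in this comment, PADDED_LENGTH is a hypothetical name, and FRAME_LENGTH and HOP_SIZE are repeated here with the values noted in BeatNet.h so that the snippet stands alone.

```cpp
// Sketch only: FRAME_LENGTH and HOP_SIZE repeat the values noted in
// BeatNet.h (1411 and 441); in the real header they are derived constants.
constexpr int FRAME_LENGTH {1411};
constexpr int HOP_SIZE {441};
constexpr int FRAMED_SIGNAL_NFRAMES {4};  // name suggested in the review

// Hypothetical constant: samples needed to hold nFrames overlapping frames,
// using the formula from the FramedSignal constructor:
// (nFrames - 1) * hopSize + frameSize.
constexpr int PADDED_LENGTH {(FRAMED_SIGNAL_NFRAMES - 1) * HOP_SIZE + FRAME_LENGTH};

static_assert(PADDED_LENGTH == 2734, "3 * 441 + 1411");
```

A constexpr constant is typed and scoped, so a later experiment with a different frame count is checked by the compiler wherever the value is used.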

FramedSignal framedSignal{ resampledSignal , nFrames, FRAME_LENGTH, HOP_SIZE };
Owner
pasquale90 Mar 24, 2026

FramedSignal framedSignal should be declared in the header first. This makes it a member object of the BeatNet class. Declaring it here does not make sense, for several reasons.

First of all, the framedSignal object lives in the scope of the function, so it has automatic storage duration: at the end of each function call the object is destroyed, and it is constructed again on the next call. This defeats the idea of accumulating buffers to build the frames that feed the model.

You should declare the object in the header, initialize it in the BeatNet constructor, and then call the framedSignal.process() function from within the preprocess function.
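The lifetime difference can be seen in a stripped-down sketch. All names below (SampleCollector, preprocessWithLocal, OwnerSketch) are hypothetical stand-ins rather than code from the PR; the collector only counts samples so that the effect is easy to observe.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a framing helper: it only counts how many
// samples it has seen so far.
class SampleCollector {
public:
    std::size_t collect(const std::vector<float>& buffer) {
        collected_ += buffer.size();
        return collected_;
    }
private:
    std::size_t collected_ = 0;
};

// Local object: constructed and destroyed on every call, so nothing accumulates.
std::size_t preprocessWithLocal(const std::vector<float>& buffer) {
    SampleCollector collector;
    return collector.collect(buffer);
}

// Member object: lives as long as its owner, so state carries over between calls.
class OwnerSketch {
public:
    std::size_t preprocess(const std::vector<float>& buffer) {
        return collector_.collect(buffer);
    }
private:
    SampleCollector collector_;
};
```

Calling preprocessWithLocal twice with a 512-sample buffer reports 512 both times, while two calls on the same OwnerSketch report 512 and then 1024.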


// spectral difference
// last frame
auto frame_3 = framedSignal[3];
auto spectrum_3 = fft_processor.compute_fft(frame_3);
auto filters_3 = filterbank_processor.apply(spectrum_3);
auto log_compress_3 = log_compress(filters_3);
log_fb = std::move(log_compress_3);

// frame before
auto frame_2 = framedSignal[2];
auto spectrum_2 = fft_processor.compute_fft(frame_2);
auto filters_2 = filterbank_processor.apply(spectrum_2);
auto log_compress_2 = log_compress(filters_2);
prev_log_fb = std::move(log_compress_2);

// diff = log_fb3 - log_fb2
diff.assign(log_fb.size(), 0.0f);
std::transform(log_fb.begin(), log_fb.end(), prev_log_fb.begin(),
diff.begin(), std::minus());

// replace negative values with zero
std::replace_if(diff.begin(), diff.end(),
[](float x) {return x < 0.0f; },
0.0f);

// stack log spectrum and spectral difference
hstack(log_fb, diff, preprocessed_input);
return true;
}
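For reference, the subtract-then-clamp steps of the spectral difference can also be expressed in a single pass. This is a minimal sketch with a hypothetical function name (`positiveDiff`), assuming both inputs have the same length; it is not the code in this PR.

```cpp
#include <algorithm>
#include <vector>

// Half-wave rectified difference: max(current - previous, 0) per band.
// Assumes both vectors have the same length.
std::vector<float> positiveDiff(const std::vector<float>& current,
                                const std::vector<float>& previous) {
    std::vector<float> out(current.size());
    std::transform(current.begin(), current.end(), previous.begin(), out.begin(),
                   [](float c, float p) { return std::max(c - p, 0.0f); });
    return out;
}
```

For the inputs {1, 2, 3} and {0.5, 2.5, 1} this yields {0.5, 0, 2}: the negative middle difference is clamped to zero.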
@@ -191,7 +210,7 @@ void BeatNet::inference(std::vector<float>& output) {
output[i] = output_data[i];
}

printOutputShape(output_tensor);
// printOutputShape(output_tensor);

ReleaseValue(input_tensor);
ReleaseValue(output_tensor);
@@ -214,4 +233,4 @@ void BeatNet::printOutputShape(OrtValue* output_tensor) {
std::cout << "]" << std::endl;

ReleaseTensorTypeAndShapeInfo(shape_info);
}
}
13 changes: 7 additions & 6 deletions onnx/BeatNet.h
@@ -5,7 +5,7 @@
#include <string>
#include "onnxruntime_c_api.h"
#include "resampler.h"
#include "frameprocessor.h"
#include "framedSignal.h"
#include "fftprocessor.h"
#include "filterbankprocessor.h"
#include "logspecutils.h"
@@ -17,10 +17,10 @@ constexpr double MS_FR_GITHUB {0.064};
constexpr double MS_HOP_GITHUB {0.020};
constexpr int FRAME_LENGTH {static_cast<int>(SR_BEATNET*MS_FR_GITHUB)}; // 1411
constexpr int HOP_SIZE {static_cast<int>(SR_BEATNET*MS_HOP_GITHUB)}; // 441
constexpr int FFT_SIZE { FRAME_LENGTH / 2 + 1}; // 706
constexpr int FFT_SIZE {FRAME_LENGTH / 2}; // 705
constexpr int FRAME_SIZE_POW2 {2048}; // the smallest power of two that is not less than FRAME_LENGTH (1411)
constexpr int FBANK_SIZE {272};
constexpr int BANKS_PER_OCTAVE {16}; // {24};;
constexpr int BANKS_PER_OCTAVE {24};

using OrtGetApiBaseFn = const OrtApiBase* (*)();
using OrtCreateTensorWithDataAsOrtValueFn = OrtStatus* (*)
@@ -125,16 +125,17 @@ class BeatNet{

// Preprocessing
Resampler resampler;
FramedSignalProcessor signal_processor;

FFTProcessor fft_processor;
FilterBankProcessor filterbank_processor;
std::vector<float> preprocessed_input;
std::vector<int64_t> input_shape;
std::vector<float> spectrum;
std::vector<float> filters;
std::vector<float> log_fb;
std::vector<float> diff;
std::vector<float> prev_log_fb;
std::vector<float> diff;


// helper functions - preprocess for feature extraction and inference for model utilization
bool preprocess(const std::vector<float>& raw_input, std::vector<float>& preprocessed_input);
@@ -143,4 +144,4 @@ class BeatNet{

};

#endif
#endif
2 changes: 1 addition & 1 deletion onnx/CMakeLists.txt
@@ -56,7 +56,7 @@ endif()
set(LIB_SOURCE_FILES
BeatNet.cpp
resampler.cpp
frameprocessor.cpp
framedSignal.cpp
fftprocessor.cpp
filterbankprocessor.cpp
logspecutils.cpp
105 changes: 89 additions & 16 deletions onnx/filterbankprocessor.cpp
@@ -1,4 +1,5 @@
#include "filterbankprocessor.h"
#include <algorithm>

FilterBankProcessor::FilterBankProcessor(
int bands_per_octave,
@@ -22,24 +23,76 @@ FilterBankProcessor::FilterBankProcessor(
void FilterBankProcessor::buildFilters() {
filters.clear();
float num_octaves = std::log2(fmax / fmin);
int num_filters = static_cast<int>(std::floor(num_octaves * bands_per_octave));
std::vector<float> centers(num_filters + 2);
// centerfrequencies (219)
int num_filters = static_cast<int>(std::floor(num_octaves * bands_per_octave)); // (219)

for (int i = 0; i < centers.size(); ++i) {
centers[i] = fmin * std::pow(2.0, (float)i / (float)bands_per_octave);
}
/*
# get the range
left = np.floor(np.log2(float(fmin) / fref) * bands_per_octave)
right = np.ceil(np.log2(float(fmax) / fref) * bands_per_octave)
# generate frequencies
frequencies = fref * 2. ** (np.arange(left, right) /
float(bands_per_octave))
# filter frequencies
# needed, because range might be bigger because of the use of floor/ceil
frequencies = frequencies[np.searchsorted(frequencies, fmin):]
frequencies = frequencies[:np.searchsorted(frequencies, fmax, 'right')]

*/
const float fref = 440.0; // 440Hz reference value in madmom python code
float left = std::floor(std::log2(fmin / fref) * bands_per_octave);
float right = std::ceil(std::log2(fmax / fref) * bands_per_octave);

// centers
std::vector<float> centers(num_filters);
float val = left + 1.0f; // left + 1 to skip the first value which is < fmin
std::generate(centers.begin(), centers.end(),
[&val, fref, this]()
{
return fref * std::pow(2.0f, val++ / (float) bands_per_octave);
});

// bins
std::vector<int> bins = centersHzToBins(centers);
for (int i = 1; i < bins.size() - 1; ++i) {

std::vector<float> filt(fft_size, 0.0f); // std::vector<float> filt(fft_size / 2 + 1, 0.0);

int l = bins[i - 1]; // float l = hzToBin(centers[i - 1]);
int c = bins[i]; // float c = hzToBin(centers[i]);
int r = bins[i + 1]; // float r = hzToBin(centers[i + 1]);

int start = l;
int center = c - l; // relative to start
int stop = r - l; // relative to start

/*
data = np.zeros(stop)
# rising edge (without the center)
data[:center] = np.linspace(0, 1, center, endpoint=False)
# falling edge (including the center, but without the last bin)
data[center:] = np.linspace(1, 0, stop - center, endpoint=False)
*/

int n = stop;
std::vector<float> data(n, 0.0f);

float dx = 1.0f / center;

for (int i = 1; i < centers.size() - 1; ++i) {
std::vector<float> filt(fft_size / 2 + 1, 0.0);
float l = hzToBin(centers[i - 1]);
float c = hzToBin(centers[i]);
float r = hzToBin(centers[i + 1]);
// rising edge(without the center)
float x0 = 0.0f;
for (int i = 0; i < center; ++i) {
data[i] = x0 + (i * dx);
}

// falling edge (including the center, but without the last bin)
x0 = 1.0f;
for (int i = center; i < stop; ++i) {
data[i] = x0 - ( (i - center) * dx);
}

for (int j = (int)std::ceil(l); j < (int)std::ceil(c) && j < filt.size(); ++j)
filt[j] = (j - l) / (c - l);
std::copy(data.begin(), data.end(), filt.begin() + start);

for (int j = (int)std::ceil(c); j < (int)std::ceil(r) && j < filt.size(); ++j)
filt[j] = (r - j) / (r - c);

if (norm_filters) {
float sum = std::accumulate(filt.begin(), filt.end(), 0.0);
@@ -66,6 +119,26 @@ int FilterBankProcessor::numBands() const
return (int)filters.size();
}

float FilterBankProcessor::hzToBin(float f) const {
return (f / (float)sample_rate) * fft_size;
std::vector<int> FilterBankProcessor::centersHzToBins(const std::vector<float>& centers) const {

std::vector<int> bins(centers.size());
for (int i= 0; i < bins.size(); ++i)
{
const float value = std::round( centers[i] / ((float) sample_rate / 2.0f)* fft_size);
bins[i] = static_cast<int>(value);
}

// keep values unique
auto newend = std::unique(bins.begin(), bins.end());
bins.erase(newend, bins.end());

// remove values higher than fft_size
const int size_max = fft_size - 1;
newend = std::remove_if(bins.begin(), bins.end(), [&size_max](int x) {return x > size_max;});
bins.erase(newend, bins.end());

// add the size_max value at the end of the array
bins.push_back(size_max);

return bins;
}
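A note on the triangular filter shape: in the NumPy snippet quoted in the diff, the rising and falling edges each get their own step size (1 / center and 1 / (stop - center)), whereas buildFilters reuses dx = 1 / center for both edges. The two agree only when center == stop - center, so this may be worth double-checking. Below is a minimal sketch of the quoted NumPy behaviour; the function name `triangularFilter` is hypothetical, and it assumes 0 < center < stop.

```cpp
#include <vector>

// Sketch of the triangular shape quoted from madmom in the diff:
//   data[:center] = np.linspace(0, 1, center, endpoint=False)
//   data[center:] = np.linspace(1, 0, stop - center, endpoint=False)
// center and stop are bin offsets relative to the filter start.
// Assumes 0 < center < stop.
std::vector<float> triangularFilter(int center, int stop) {
    std::vector<float> data(stop, 0.0f);
    const float rise = 1.0f / static_cast<float>(center);         // rising step
    const float fall = 1.0f / static_cast<float>(stop - center);  // falling step
    for (int i = 0; i < center; ++i) {
        data[i] = i * rise;  // rising edge, without the center
    }
    for (int i = center; i < stop; ++i) {
        data[i] = 1.0f - (i - center) * fall;  // falling edge, center included
    }
    return data;
}
```

With center = 2 and stop = 6 the result is {0, 0.5, 1, 0.75, 0.5, 0.25}: the peak sits at the center bin and the falling edge stops short of zero, as with endpoint=False.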
2 changes: 1 addition & 1 deletion onnx/filterbankprocessor.h
@@ -29,7 +29,7 @@ class FilterBankProcessor {
bool unique_filters;
std::vector<std::vector<float>> filters;

float hzToBin(float f) const;
std::vector<int> centersHzToBins(const std::vector<float>& centers) const;

};
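To illustrate what the new centersHzToBins declaration is responsible for, here is a simplified free-function sketch of the frequency-to-bin mapping. The name `frequenciesToBins` and its parameters are hypothetical; it assumes the center frequencies are sorted in ascending order and, unlike the PR's version, it does not append the last bin index at the end.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical free-function version of the mapping: frequency in Hz to the
// nearest bin, where numBins bins span 0 Hz up to the Nyquist frequency.
// Assumes centersHz is sorted ascending, so equal bins end up adjacent.
std::vector<int> frequenciesToBins(const std::vector<float>& centersHz,
                                   float sampleRate, int numBins) {
    std::vector<int> bins;
    const float nyquist = sampleRate / 2.0f;
    for (float f : centersHz) {
        const int bin = static_cast<int>(std::round(f / nyquist * numBins));
        if (bin <= numBins - 1) {
            bins.push_back(bin);  // keep only bins inside the spectrum
        }
    }
    // std::unique only removes consecutive duplicates, hence the sorted input.
    bins.erase(std::unique(bins.begin(), bins.end()), bins.end());
    return bins;
}
```

Assuming a 22050 Hz sample rate and 705 bins, the frequencies 30 Hz and 31 Hz both round to bin 2 and collapse into one entry, 440 Hz maps to bin 28, and 11025 Hz maps to bin 705, which is out of range and dropped.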

56 changes: 56 additions & 0 deletions onnx/framedSignal.cpp
@@ -0,0 +1,56 @@
#include "framedSignal.h"
#include <algorithm>
#include <stdexcept>
#include <iostream>

FramedSignal::FramedSignal(const std::vector<float>& inputSignal, int nFrames, int frameSize, int hopSize)
: original_signal(inputSignal),
nFrames(nFrames),
frameSize(frameSize),
hopSize(hopSize)
{
int nMax = ((nFrames -1) * hopSize) + frameSize;
padded_signal.assign(nMax, 0.0f);

{
auto s0 = original_signal.begin();
auto sEnd = original_signal.end();
auto destination = padded_signal.begin() + frameSize / 2;

int i = frameSize / 2;

std::copy_if(s0, sEnd, destination,
[&i, nMax](float x)
{
return i++ < nMax;
});
}

for (int iFrame = 0, index = 0; iFrame < nFrames; iFrame++, index += hopSize)
{
auto i0 = padded_signal.begin() + index;

std::vector<float> signal(i0, i0 + frameSize);
frames.push_back(signal);
}
}
Comment on lines +11 to +36
Owner

Implementing the framing logic in the constructor means that the inputSignal data is lost: the object is destroyed as soon as the caller's scope (the audio callback) finishes executing. The only way to keep that information would be to declare inputSignal as a static object in the header, but that's not the best design, IMO, for this particular implementation.

This logic should move to a process function that collects the input buffer on each call and uses the collected buffers to build the frames that feed the model. Take a look here:

bool process(const std::vector<float>& input, std::vector<float>& frame_out);

An input buffer comes in on each call and an output frame comes out; the function returns true only if a valid frame was produced (the first buffers won't be long enough to form a full frame).
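A process function of this kind could look roughly like the sketch below. It is written only to illustrate the signature above: the class name StreamingFramer and its internals are hypothetical, not the repository's existing implementation. It assumes hopSize <= frameLength and emits at most one frame per call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical streaming framer illustrating the suggested process() signature.
class StreamingFramer {
public:
    StreamingFramer(std::size_t frameLength, std::size_t hopSize)
        : frameLength_(frameLength), hopSize_(hopSize) {}

    // Appends one input buffer. Returns true and fills frame_out once at
    // least frameLength samples are pending, then drops hopSize samples so
    // that consecutive frames overlap.
    bool process(const std::vector<float>& input, std::vector<float>& frame_out) {
        pending_.insert(pending_.end(), input.begin(), input.end());
        if (pending_.size() < frameLength_) {
            return false;  // not enough audio collected yet
        }
        frame_out.assign(pending_.begin(), pending_.begin() + frameLength_);
        pending_.erase(pending_.begin(), pending_.begin() + hopSize_);
        return true;
    }

private:
    std::size_t frameLength_;
    std::size_t hopSize_;
    std::vector<float> pending_;  // samples carried over between calls
};
```

Because the pending samples live in a member, the object has to outlive the individual audio callbacks, which is why it would be owned by the BeatNet class rather than created inside preprocess.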


FramedSignal::~FramedSignal()
{

}

std::vector<float> FramedSignal::operator[](int i)
{
return frames.at(i);
}

std::vector<float> FramedSignal::getOriginalSignal()
{
return original_signal;
}

int FramedSignal::get_nFrames()
{
return nFrames;
}
26 changes: 26 additions & 0 deletions onnx/framedSignal.h
@@ -0,0 +1,26 @@
#ifndef FRAMEDSIGNAL_H
#define FRAMEDSIGNAL_H

#include <vector>

class FramedSignal {
public:

FramedSignal(const std::vector<float>& inputSignal, int nFrames, int frameSize, int hopSize);
~FramedSignal();

std::vector<float> operator[](int i);
std::vector<float> getOriginalSignal();
int get_nFrames();
Comment on lines +13 to +14
Owner

These functions are not used. Should they be removed?


private:
std::vector<float> original_signal;
int nFrames;
int frameSize;
int hopSize;

std::vector<float> padded_signal;
std::vector<std::vector<float>> frames;
};

#endif