Log


2024-06-03T20:23:11+00:00 nicorare:

X-axis decorator feature for annotating data.


2024-05-28T22:23:26+00:00 nicorare:

Implementing onset + transient start detection for percussive sounds in music and doing some benchmarks; ended up taking the spectral flux approach with some specifics:

  1. Differentiation of the input to emphasize high-frequency changes, instead of weighting the difference of each frequency bin.
  2. Short-time Fourier transform with a size that depends on the input sample rate (128 samples at 44.1 kHz).
    A smaller size gives better time resolution (and the better low-end resolution of a larger window isn't necessary for transient detection).

Input (raw)

Differentiated
  3. No normalization between frames, as it caused changes in amplitude to be missed when the spectral distribution between frames was similar: sudden changes in the amplitude of a signal with "sustained" tonality (more common in synthesized percussion) would be ignored.
  4. Squared difference instead of the full L2 norm. This makes it easier to set a threshold value that determines the start of a percussive sound in a full mix, and to ignore elements with weaker transients.

Squared spectral flux

L2 norm
  5. No half-wave rectification (i.e. a ReLU that limits the difference vector to positive values, which has the effect of only reacting to increases in energy).

    After some benchmarks focused on the specific use case of detecting a single percussive element in a mix, opted for no half-wave rectification, which worked better with the rest of the pipeline that uses the generated envelope.
  6. Upsampling from the windowed STFT resolution back to the input sample rate using cubic interpolation, to get a smooth envelope and a better transient start estimate.

Half-wave rectification

Comparison between normal and half-wave rectified flux.
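The steps above could be sketched roughly like this in C++. It's a hypothetical reconstruction, not the actual code: a naive DFT stands in for a proper FFT, frames use a Hann window, the power-of-two rounding in `windowSizeFor` is an assumption, and Catmull-Rom is used as the cubic interpolant. All names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sizing rule: 128 samples at 44.1 kHz, scaled with the sample
// rate and rounded to the nearest power of two.
inline std::size_t windowSizeFor(double sampleRate) {
    assert(sampleRate > 0.0);
    const long exponent = std::max(0L, std::lround(std::log2(128.0 * sampleRate / 44100.0)));
    return std::size_t{1} << exponent;
}

// Magnitudes of bins 0..n/2 of one Hann-windowed frame, using a naive O(n^2)
// DFT for brevity (a real implementation would use an FFT).
inline std::vector<double> magnitudeSpectrum(const std::vector<double>& x,
                                             std::size_t start, std::size_t n) {
    const double pi = 3.14159265358979323846;
    std::vector<double> mags(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / n);  // Hann window
            const double s = (start + i < x.size()) ? x[start + i] * w : 0.0;  // zero-pad the tail
            re += s * std::cos(2.0 * pi * k * i / n);
            im -= s * std::sin(2.0 * pi * k * i / n);
        }
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}

// Squared spectral flux per frame: sum over bins of (|X_t| - |X_t-1|)^2, with
// no per-frame normalization. halfWave = true keeps only energy increases.
inline std::vector<double> spectralFlux(const std::vector<double>& input, std::size_t n,
                                        std::size_t hop, bool halfWave = false) {
    assert(n >= 2 && hop >= 1);
    // First-order difference of the input emphasizes high-frequency changes.
    std::vector<double> d(input.size(), 0.0);
    for (std::size_t i = 1; i < input.size(); ++i) d[i] = input[i] - input[i - 1];

    std::vector<double> flux;
    std::vector<double> prev(n / 2 + 1, 0.0);
    for (std::size_t start = 0; start < d.size(); start += hop) {
        const std::vector<double> cur = magnitudeSpectrum(d, start, n);
        double sum = 0.0;
        for (std::size_t k = 0; k < cur.size(); ++k) {
            double diff = cur[k] - prev[k];
            if (halfWave && diff < 0.0) diff = 0.0;
            sum += diff * diff;  // squared difference, no square root
        }
        flux.push_back(sum);
        prev = cur;
    }
    return flux;
}

// Upsample per-frame values to one value per input sample with Catmull-Rom
// cubic interpolation; frame t is placed at sample t * hop (its start).
inline std::vector<double> upsampleCubic(const std::vector<double>& frames,
                                         std::size_t hop, std::size_t length) {
    std::vector<double> out(length, 0.0);
    if (frames.empty()) return out;
    const long last = static_cast<long>(frames.size()) - 1;
    auto at = [&](long i) { return frames[static_cast<std::size_t>(std::clamp(i, 0L, last))]; };
    for (std::size_t s = 0; s < length; ++s) {
        const long i = static_cast<long>(s / hop);
        const double t = static_cast<double>(s % hop) / static_cast<double>(hop);
        const double p0 = at(i - 1), p1 = at(i), p2 = at(i + 1), p3 = at(i + 2);
        out[s] = 0.5 * (2.0 * p1 + (-p0 + p2) * t +
                        (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t +
                        (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t);
    }
    return out;
}
```

For example, `spectralFlux(x, windowSizeFor(44100.0), 64)` followed by `upsampleCubic(flux, 64, x.size())` gives one envelope value per input sample (a hop of half the window is itself an assumption). Passing `halfWave = true` produces the rectified variant. Placing frames at their centres (`t * hop + n / 2`) instead of their starts would be another reasonable choice.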

2024-05-24T09:27:25+00:00 nicorare:

Testing snare extraction + resynthesis model with noise tuning.

Input:

Output + variations:

Input:

Output + variations:


2024-05-16T13:47:34+00:00 nicorare:

"Many-in-one" envelope editor built by overlaying multiple linear segment components, so several parameters can be edited inline simultaneously.

Visibility can be toggled, and any envelope can be selected by focusing on one of its breakpoints.

Not sure why this approach of seeing all related envelopes in the same space isn't used by more applications, which instead tend to separate each time-based control into its own section or track.
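A minimal sketch of how focusing a breakpoint could select its envelope when several envelopes share the same space. The `Envelope`/`Hit` types and the pick radius are hypothetical; the function just returns the nearest breakpoint among the visible envelopes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

struct Point { float x, y; };  // breakpoint position in screen coordinates

struct Envelope {
    std::vector<Point> breakpoints;
    bool visible = true;  // hidden envelopes can't be picked
};

struct Hit { std::size_t envelope, breakpoint; };

// Closest breakpoint within `radius` pixels of the mouse across all visible
// envelopes; the envelope that owns it becomes the focused one.
std::optional<Hit> hitTest(const std::vector<Envelope>& envs, Point mouse, float radius) {
    assert(radius >= 0.0f);
    std::optional<Hit> best;
    float bestDist2 = radius * radius;
    for (std::size_t e = 0; e < envs.size(); ++e) {
        if (!envs[e].visible) continue;
        for (std::size_t b = 0; b < envs[e].breakpoints.size(); ++b) {
            const float dx = envs[e].breakpoints[b].x - mouse.x;
            const float dy = envs[e].breakpoints[b].y - mouse.y;
            const float d2 = dx * dx + dy * dy;
            if (d2 <= bestDist2) {  // ties go to the later envelope
                bestDist2 = d2;
                best = Hit{e, b};
            }
        }
    }
    return best;
}
```

Ties go to the envelope later in the list, which would be the one drawn on top if envelopes are painted in order.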


2024-05-06T19:48:47+00:00 nicorare:

Improved extraction for more complex inputs with reverb, noisy transients, or an undefined low-end tail; recreated with multiple variations in decay and transients.

Input:

Output + variations:


After searching for tools to smooth data with adaptive/dynamic parameters, ended up making a simple time-domain filter: a classic sliding window where both the weight distribution/curve (exponential, Gaussian, etc.) and the window size can be adjusted at each step.

The other solutions were tricky to use because they introduced non-linear or asymmetrical phase delay effects on the output, making similarity or correlation tasks less precise unless a correction such as bidirectional filtering was applied (which would at least require storing parameter changes over time and reading them in reverse along with the signal during the backward pass).

Adaptive filter application that adjusts smoothing depending on the input's rate of change and value.
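A minimal sketch of such a sliding-window smoother. The window radius and the weight curve are passed in as callbacks evaluated at every step (`RadiusPolicy` and `WeightCurve` are hypothetical names). Because each window is centred on the current sample, the output isn't shifted in time, so no backward pass is needed; near the edges the window is truncated and the weights renormalized.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Weight as a function of normalized distance u in [0, 1] from the window centre.
using WeightCurve = std::function<double(double)>;
// Window radius (in samples) to use at position i, decided from the input.
using RadiusPolicy = std::function<std::size_t(const std::vector<double>&, std::size_t)>;

std::vector<double> adaptiveSmooth(const std::vector<double>& x,
                                   const RadiusPolicy& radiusAt,
                                   const WeightCurve& weight) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t r = radiusAt(x, i);
        // Symmetric window [i - r, i + r], clipped to the signal bounds.
        const std::size_t lo = i >= r ? i - r : 0;
        const std::size_t hi = std::min(x.size() - 1, i + r);
        double acc = 0.0, wsum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) {
            const double dist = static_cast<double>(j > i ? j - i : i - j);
            const double w = weight(r == 0 ? 0.0 : dist / static_cast<double>(r));
            acc += w * x[j];
            wsum += w;
        }
        y[i] = wsum > 0.0 ? acc / wsum : x[i];  // renormalize the weights
    }
    return y;
}
```

An adaptive policy could, for example, return a radius of 0 where `std::abs(x[i] - x[i - 1])` exceeds a threshold (keeping sharp changes intact) and a larger radius elsewhere, combined with a Gaussian curve such as `[](double u) { return std::exp(-2.0 * u * u); }`.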


Segmented linear editor with support for panning and zoom behaviour, by syncing it with an underlying draggable-area component that draws wave data.

Combined with the other base components to build a scrollable waveform visualization and apply transformations inline.


Simple segmented linear editor for drawing wave data. A spline/cubic Bézier interpolation would provide more natural data for audio, but leaving it as linear for now, since it would also require managing control points to be useful.

Editable breakpoint positions with constraints:
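A sketch of linear evaluation between breakpoints plus one possible set of constraints. The actual constraints aren't described, so the ones here are assumptions: a breakpoint can't cross its neighbours in x, the first and last breakpoints keep their x position, and values are clamped to a range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Breakpoint { double x, y; };  // x: position in time, y: value

// Linear interpolation over breakpoints sorted by x; holds the end values
// outside the covered range.
double evaluate(const std::vector<Breakpoint>& pts, double x) {
    assert(!pts.empty());
    if (x <= pts.front().x) return pts.front().y;
    if (x >= pts.back().x) return pts.back().y;
    // First breakpoint strictly to the right of x; its predecessor is <= x.
    auto it = std::upper_bound(pts.begin(), pts.end(), x,
                               [](double v, const Breakpoint& p) { return v < p.x; });
    const Breakpoint& a = *(it - 1);
    const Breakpoint& b = *it;
    const double t = (x - a.x) / (b.x - a.x);
    return a.y + t * (b.y - a.y);
}

// Move breakpoint i, keeping the list sorted (hypothetical constraints).
void moveBreakpoint(std::vector<Breakpoint>& pts, std::size_t i, double x, double y,
                    double yMin, double yMax) {
    assert(i < pts.size() && yMin <= yMax);
    const bool endpoint = (i == 0 || i + 1 == pts.size());
    const double newX = endpoint ? pts[i].x : std::clamp(x, pts[i - 1].x, pts[i + 1].x);
    pts[i] = Breakpoint{newX, std::clamp(y, yMin, yMax)};
}
```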


Mini QOL improvement for dragging both selection markers at once.


Percussion extraction and resynthesis experiments with relatively simple electronic sources.

Inputs taken from user suggestions on Discord and Reddit, reconstructed into a clean, mostly sine-wave-based kick drum.

Input:

Output + variations:

Input:

Output + variations:

Input:

Output + variations:

Input:

Output + variations:


Adding sample-accurate start and end selection controls by building on top of the scrollable waveform.

Horizontally draggable marker components overlaid on top of the XY pad component and kept in sync with the panning/zooming of the graph, inline as part of the same visualization with minimal overlap.
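One way the markers could stay sample-accurate and in sync with pan/zoom: store them as sample indices rather than pixels, convert to x only when painting, and snap mouse positions to the nearest sample. The `View` fields (sample at the left edge, samples per pixel) are a hypothetical model of the graph's state.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct View {
    double firstSample;      // sample index at the left edge
    double samplesPerPixel;  // current zoom level
};

// Where a sample lands on screen for the current pan/zoom (used when painting).
inline double sampleToX(const View& v, std::int64_t sample) {
    return (static_cast<double>(sample) - v.firstSample) / v.samplesPerPixel;
}

// Mouse x -> nearest whole sample, clamped to [0, length].
inline std::int64_t xToSample(const View& v, double x, std::int64_t length) {
    const auto s = static_cast<std::int64_t>(std::llround(v.firstSample + x * v.samplesPerPixel));
    return std::clamp(s, std::int64_t{0}, length);
}

struct Selection { std::int64_t start, end; };

// Dragging a marker snaps it to a sample and keeps start <= end.
inline void dragStart(Selection& sel, const View& v, double mouseX, std::int64_t length) {
    sel.start = std::min(xToSample(v, mouseX, length), sel.end);
}
inline void dragEnd(Selection& sel, const View& v, double mouseX, std::int64_t length) {
    sel.end = std::max(xToSample(v, mouseX, length), sel.start);
}
```

Since the selection lives in sample space, panning or zooming only changes `View`, and the markers follow automatically on the next repaint.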


Some experiments with interpolation methods. I think these alternating rendering styles look quite pretty, besides the increased performance, but they can be less accurate (and possibly jarring).


More optimizations for waveform rendering: the visualization now falls back to a lower-fidelity representation during movement, to make scrolling through data faster.

There is a noticeable imbalance in the perceived grey tones when switching between these two algorithms.

(!) There is a clear jitter-like effect at the anchor point that needs to be fixed.


Optimized waveform rendering with more aggressive resampling: the signal is converted to a list of min-max ranges, one for each vertical position, with the neighbour's slope added (a plain bool value that indicates "increasing" or "decreasing") to lerp between adjacent verticals.

Some artifacts from the filtered samples are visible and make the waveform look somewhat unstable when panned, but it's still a great approximation given the performance gains.
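A sketch of the min-max reduction: one range per vertical, plus a bool slope flag towards the right neighbour. How the flag is derived is an assumption here (comparing the midpoints of adjacent ranges), and it assumes at least one sample per column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Column {
    float low, high;  // amplitude range covered by this vertical
    bool rising;      // hypothetical slope flag: true if the next column sits higher
};

std::vector<Column> buildColumns(const std::vector<float>& samples, std::size_t width) {
    assert(width > 0 && samples.size() >= width);
    const std::size_t n = samples.size();
    std::vector<Column> cols;
    cols.reserve(width);
    for (std::size_t c = 0; c < width; ++c) {
        // Column c covers samples [c * n / width, (c + 1) * n / width).
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(c * n / width);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>((c + 1) * n / width);
        const auto [mn, mx] = std::minmax_element(first, last);
        cols.push_back(Column{*mn, *mx, false});
    }
    // Slope towards the right neighbour, from the midpoints of the two ranges.
    for (std::size_t c = 0; c + 1 < width; ++c) {
        const float mid = 0.5f * (cols[c].low + cols[c].high);
        const float next = 0.5f * (cols[c + 1].low + cols[c + 1].high);
        cols[c].rising = next > mid;
    }
    return cols;
}
```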


Smoother waveform rendering algorithm with simple linear interpolation and some optimization tricks.

Currently overlapping each channel instead of the previous amplitude-range visualization; quite a bit slower at the moment, but more intuitive.


Explorable waveform visualization component whose data can be loaded from disk using JUCE's AudioThumbnail, which resolves sample data on a background thread, reads from disk in chunks as needed, and handles caching. This avoids having to load everything into memory when it isn't needed.

Demo with large WAV files read from disk. MP3 files are noticeably slower when panning into non-cached data.


Adding zoom behaviour: an XY dragging-area component as a primitive that can be added for mouse-anchored pan and zoom actions over the waveform.
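Mouse-anchored zoom boils down to keeping the sample under the cursor at the same x before and after the zoom change: with anchor sample s = first + x · spp, the new left edge is s − x · spp'. A minimal sketch, assuming a hypothetical view modelled as the sample at the left edge plus samples per pixel; the zoom limits are arbitrary.

```cpp
#include <algorithm>
#include <cassert>

struct View {
    double firstSample;      // sample index at the left edge of the component
    double samplesPerPixel;  // horizontal zoom
};

// Zoom by `factor` (> 1 zooms in) around the mouse position.
View zoomAround(View v, double mouseX, double factor,
                double minSpp = 1.0 / 64.0, double maxSpp = 65536.0) {
    assert(factor > 0.0);
    const double anchor = v.firstSample + mouseX * v.samplesPerPixel;  // sample under the mouse
    v.samplesPerPixel = std::clamp(v.samplesPerPixel / factor, minSpp, maxSpp);
    v.firstSample = anchor - mouseX * v.samplesPerPixel;  // keep it under the mouse
    return v;
}

// Dragging by dx pixels moves the content with the mouse.
View panBy(View v, double dxPixels) {
    v.firstSample -= dxPixels * v.samplesPerPixel;
    return v;
}
```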


Making some UI components for working with time-series data in a more "inlined" style. Aiming to improve the user experience of some sample-based tools I've been working on.

Scrollable multi-channel audio data in memory with drag control.

Instead of the common mono or overlapping-channel visualizations, the audio is rendered as a single solid shape that vertically covers the range of amplitudes at each sample position.

Currently building with C++ and JUCE.
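A minimal sketch of the data behind the single-solid rendering, reading "range of amplitudes" as the span between the lowest and highest channel value at each sample position (an interpretation; the log doesn't spell it out). The shape would then be filled between `lo[i]` and `hi[i]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Per-sample vertical span across all channels (assumes equal-length channels).
std::pair<std::vector<float>, std::vector<float>>
channelSpan(const std::vector<std::vector<float>>& channels) {
    assert(!channels.empty());
    const std::size_t n = channels.front().size();
    std::vector<float> lo(channels.front()), hi(channels.front());
    for (const auto& ch : channels) {
        assert(ch.size() == n);
        for (std::size_t i = 0; i < n; ++i) {
            lo[i] = std::min(lo[i], ch[i]);
            hi[i] = std::max(hi[i], ch[i]);
        }
    }
    return {lo, hi};
}
```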


© RARE DIGITAL SIGNAL PROCESSING