Audio Stuff

Hi,
I’m building some audio feature extraction networks. I plan on releasing them in some way, maybe on chopchop, maybe here. While building these I noticed a couple of things:

Audio Spectrum CHOP:

  • It would be great if we could get complex output (real and imaginary parts). This would make it easier to calculate some features (autocorrelation, for example).
  • If we switch it to magnitude and phase, we can’t adjust the frame length. I guess it just does a (windowed?) FFT on the input buffer. This is nice and clear, and I like its simplicity. But how would I go about re-framing the input if I need a different frame size? Would I need to window it manually? What about overlap?
  • The output frame length seems odd sometimes. If you plug an Audio File In CHOP (all defaults) into the Audio Spectrum CHOP, you get a buffer of length 735. The internal frame length seems to be 1024, with meaningful output of length 512 (the spectrum of a real signal is symmetric, so only half of it carries information). We have to trim the output manually to get the 512-sample buffer we wanted.
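
To make the re-framing question concrete, here is a minimal sketch in plain Python of what manual framing, windowing, and overlap would look like, with complex (real and imaginary) output trimmed to the meaningful half of each frame. It runs outside TouchDesigner and assumes nothing about how the Audio Spectrum CHOP works internally. The names `hann`, `fft`, and `framed_spectra` are made up for illustration, and the Hann window and 50% overlap are just common choices. As an aside, a length of 735 would match one 60 fps frame of 44.1 kHz audio (44100 / 60 = 735), but that is an assumption about the defaults, not something confirmed.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def framed_spectra(signal, frame_len=1024, hop=512):
    """Split signal into overlapping, Hann-windowed frames and return the
    first frame_len // 2 complex bins of each frame's spectrum.
    Frames that run past the end of the signal are zero-padded."""
    window = hann(frame_len)
    spectra = []
    start = 0
    while start < len(signal):
        chunk = list(signal[start:start + frame_len])
        chunk += [0.0] * (frame_len - len(chunk))  # zero-pad a short frame
        spectrum = fft([s * w for s, w in zip(chunk, window)])
        spectra.append(spectrum[:frame_len // 2])  # keep the non-mirrored half
        start += hop
    return spectra
```

With the default arguments, a 735-sample buffer yields two frames (starting at samples 0 and 512), each with 512 complex bins; `c.real` and `c.imag` give the real and imaginary parts of a bin `c`.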

Audio Filter CHOP:

  • It would be great if we could have a biquad, or a one-zero and a one-pole, with direct coefficient input. You obviously have built various filters already and have implemented the formulae that turn filter parameters into coefficients (frequency, Q → b and a coefficients). It would be great if we could specify the coefficients of the transfer function directly (in second-order-section format, to keep higher-order filters numerically stable). This would let us use coefficients from scientific publications, for example to implement perceptual models correctly. Also, a minimal version of what I want here doesn’t seem like a lot of work, since it just exposes some internal parameters of the existing Audio Filter CHOP.
  • It would be great if we had an efficient convolution CHOP. (We can do convolution via multiplication of spectra, but this is suboptimal for some applications.)
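
For clarity, this is roughly the behaviour being asked for in both bullets, sketched in plain Python rather than as anything the Audio Filter CHOP is known to do. `sos_filter` and `convolve` are hypothetical names. Each section is given as six numbers `(b0, b1, b2, a0, a1, a2)`, the same layout SciPy uses for second-order sections, and the biquad is run in transposed direct form II, which is one common structure among several. The convolution is the direct time-domain form, which is only sensible for short kernels.

```python
def sos_filter(sections, x):
    """Filter x through a cascade of biquads given as
    (b0, b1, b2, a0, a1, a2) tuples. A one-pole or one-zero
    filter is a section with the unused coefficients set to 0."""
    y = list(x)
    for b0, b1, b2, a0, a1, a2 in sections:
        # Normalise so the leading denominator coefficient is 1.
        b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
        z1 = z2 = 0.0  # state of this section (transposed direct form II)
        for n, xn in enumerate(y):
            yn = b0 * xn + z1
            z1 = b1 * xn - a1 * yn + z2
            z2 = b2 * xn - a2 * yn
            y[n] = yn  # output of this section feeds the next one
    return y


def convolve(x, h):
    """Direct time-domain convolution, O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y
```

For example, `sos_filter([(0.5, 0, 0, 1, -0.5, 0)], [1, 0, 0])` runs the one-pole low-pass y[n] = 0.5·x[n] + 0.5·y[n-1] on a unit impulse.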

Thanks a lot