Concentus

Apply window and compute the MDCT for all sub-frames and all channels in a frame

OPT: This is the kernel you really want to optimize. It gets used a lot by the prefilter and by the PLC.

non-pointer case

only needed in one place

Decoder state

Scratch space used by the decoder. It is actually a variable-sized field that resulted in a variable-sized struct. There are 6 distinct regions inside. I have laid them out into separate variables here, but these were the original definitions:
val32 decode_mem[], Size = channels*(DECODE_BUFFER_SIZE+mode.overlap)
val16 lpc[], Size = channels*LPC_ORDER
val16 oldEBands[], Size = 2*mode.nbEBands
val16 oldLogE[], Size = 2*mode.nbEBands
val16 oldLogE2[], Size = 2*mode.nbEBands
val16 backgroundLogE[], Size = 2*mode.nbEBands
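As a rough illustration of the size formulas listed above, the sketch below computes how many elements each region would occupy. The function name `decoder_region_sizes` and the sample values passed to it are hypothetical; only the formulas come from the text.

```python
def decoder_region_sizes(channels, decode_buffer_size, overlap, lpc_order, nb_ebands):
    """Return the element count of each decoder scratch region, in layout order."""
    return {
        "decode_mem": channels * (decode_buffer_size + overlap),
        "lpc": channels * lpc_order,
        "oldEBands": 2 * nb_ebands,
        "oldLogE": 2 * nb_ebands,
        "oldLogE2": 2 * nb_ebands,
        "backgroundLogE": 2 * nb_ebands,
    }

# Hypothetical values, chosen only to make the arithmetic visible.
sizes = decoder_region_sizes(channels=2, decode_buffer_size=2048, overlap=120,
                             lpc_order=24, nb_ebands=21)
total = sum(sizes.values())
```

Keeping the regions as separate named arrays, as the text describes, avoids having to compute offsets into one shared block.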

The original C++ defined in_mem as a single float[1] which was the "caboose" to the overall encoder struct, containing 5 separate variable-sized buffer spaces of heterogeneous datatypes. I have laid them out into separate variables here, but these were the original definitions:
val32 in_mem[], Size = channels*mode.overlap
val32 prefilter_mem[], Size = channels*COMBFILTER_MAXPERIOD
val16 oldBandE[], Size = channels*mode.nbEBands
val16 oldLogE[], Size = channels*mode.nbEBands
val16 oldLogE2[], Size = channels*mode.nbEBands

Definition for each "pseudo-critical band"

Number of lines in allocVectors

Number of bits in each band for several rates

Takes the pitch vector and the decoded residual vector, computes the gain that will give ||p+g*y||=1 and mixes the residual with the pitch. Decode pulse vector and combine the result with the pitch vector to produce the final normalised signal in the current band.

For performance reasons, do not use this generic class if possible

This simulates a C++ style pointer as far as can be implemented in C#. It represents a handle to an array of objects, along with a base offset that represents the address. When you are programming in debug mode, this class also enforces memory boundaries, tracks uninitialized values, and also records all statistics of accesses to its base array.

Returns the value currently under the pointer, and returns a new pointer with +1 offset. This method is not very efficient because it creates new pointers; this is because we must preserve the pass-by-value nature of C++ pointers when they are used as arguments to functions

Copies the contents of this pointer, starting at its current address, into the space of another pointer. !!! IMPORTANT !!! REMEMBER THAT C++ memcpy is (DEST, SOURCE, LENGTH) !!!! IN C# IT IS (SOURCE, DEST, LENGTH). DON'T GET SCOOPED LIKE I DID

Copies the contents of this pointer, starting at its current address, into an array. !!! IMPORTANT !!! REMEMBER THAT C++ memcpy is (DEST, SOURCE, LENGTH) !!!!

Loads N values from a source array into this pointer's space

Assigns a certain value to a range of spaces in this array

The value to set
The number of values to write

Assigns a certain value to a range of spaces in this array

The value to set
The number of values to write

Moves regions of memory within the bounds of this pointer's array. Extra checks are done to ensure that the data is not corrupted if the copy regions overlap

The offset to send this pointer's data to
The number of values to copy
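A minimal sketch of an overlap-safe move within one array, assuming list indices stand in for pointer offsets. The function `mem_move` and its explicit `src` argument are hypothetical and do not reproduce the library's signature.

```python
def mem_move(buf, src, dst, length):
    """Move `length` items within `buf` from index `src` to index `dst`, overlap-safe."""
    if dst == src or length <= 0:
        return
    if dst < src:
        # Destination is earlier: copy front to back so unread source items survive.
        for i in range(length):
            buf[dst + i] = buf[src + i]
    else:
        # Destination is later: copy back to front for the same reason.
        for i in range(length - 1, -1, -1):
            buf[dst + i] = buf[src + i]

data = [1, 2, 3, 4, 5, 0, 0]
mem_move(data, 0, 2, 5)   # regions [0, 5) and [2, 7) overlap
```

Choosing the copy direction from the relative position of the two regions is what keeps the source intact when they overlap.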

This is a helper class which contains static methods that involve pointers

Allocates a new array and returns a pointer to it

Creates a pointer to an existing array

The number of bits to use for the range-coded part of uint integers.

The resolution of fractional-precision bit usage measurements, i.e.,

Normalizes the contents of val and rng so that rng lies entirely in the high-order symbol.

The probability of having a "one" is 1/(1<<_logp).

Outputs a symbol, with a carry bit. If there is a potential to propagate a carry over several symbols, they are buffered until it can be determined whether or not an actual carry will occur. If the counter for the buffered symbols overflows, then the stream becomes undecodable. This gives a theoretical limit of a few billion symbols in a single packet on 32-bit systems. The alternative is to truncate the range in order to force a carry, but requires similar carry tracking in the decoder, needlessly slowing it down.

Returns the number of bits "used" by the encoded or decoded symbols so far. This same number can be computed in either the encoder or the decoder, and is suitable for making coding decisions. This will always be slightly larger than the exact value (i.e., all rounding error is in the positive direction).

The number of bits.

This is a faster version of ec_tell_frac() that takes advantage of the low (1/8 bit) resolution to use just a linear function followed by a lookup to determine the exact transition thresholds.

Integer log in base 2. Undefined for zero and negative numbers.

Integer log in base 2. Defined for zero, but not for negative numbers.

Multiplies two 16-bit fractional values. Bit-exactness of this macro is important

Compute floor(sqrt(_val)) with exact arithmetic. This has been tested on all possible 32-bit inputs.
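One way to compute an exact integer square root with only shifts, adds and compares is the bit-by-bit method sketched below. This is an illustrative sketch, not necessarily the routine the library uses; the name `isqrt32` and the explicit zero check are assumptions.

```python
def isqrt32(val):
    """Exact floor(sqrt(val)) for 0 <= val < 2**32."""
    if val == 0:
        return 0
    g = 0
    bshift = (val.bit_length() - 1) >> 1   # index of the highest possible result bit
    b = 1 << bshift
    while bshift >= 0:
        t = ((g << 1) + b) << bshift       # equals (g + b)**2 - g**2
        if t <= val:
            g += b                         # this bit belongs to the root
            val -= t                       # val now holds original - g**2
        b >>= 1
        bshift -= 1
    return g
```

Each iteration decides one bit of the result, so a 32-bit input needs at most 16 iterations.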

Sqrt approximation (QX input, QX/2 output)

Reciprocal approximation (Q15 input, Q16 output)

Reciprocal sqrt approximation in the range [0.25,1) (Q16 in, Q14 out)

Base-2 logarithm approximation (log2(x)). (Q14 input, Q10 output)

Base-2 exponential approximation (2^x). (Q10 input, Q16 output)

Rotate a32 right by 'rot' bits. Negative rot values result in rotating left. Output is 32bit int.

Rotate a32 right by 'rot' bits. Negative rot values result in rotating left. Output is 32bit uint.
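The rotation described above can be sketched as follows. Python integers are unbounded, so the sketch masks to 32 bits explicitly; `ror32` and `to_int32` are illustrative names, with `to_int32` showing how the same bit pattern reads as a signed value.

```python
MASK32 = 0xFFFFFFFF

def ror32(a32, rot):
    """Rotate the low 32 bits of a32 right by rot bits; a negative rot rotates left."""
    x = a32 & MASK32
    r = rot % 32            # Python's % maps a negative rot to the equivalent right rotation
    if r == 0:
        return x
    return ((x >> r) | (x << (32 - r))) & MASK32

def to_int32(x):
    """Reinterpret an unsigned 32-bit pattern as a signed two's complement value."""
    return x - (1 << 32) if x & 0x80000000 else x
```

For example, `ror32(0x12345678, 8)` moves the low byte to the top, and `ror32(0x12345678, -8)` moves the top byte to the bottom.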

((a32 >> 16) * (b32 >> 16))

Adds two signed 32-bit values in a way that can overflow, while not relying on undefined behaviour (just standard two's complement implementation-specific behaviour)

Subtracts two signed 32-bit values in a way that can overflow, while not relying on undefined behaviour (just standard two's complement implementation-specific behaviour)
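A small sketch of wrapping addition and subtraction, assuming two's complement 32-bit results. Python integers do not overflow, so the wrap is emulated with a mask; all three function names are hypothetical.

```python
def wrap_int32(x):
    """Reduce an arbitrary Python int to a signed 32-bit two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def add32_ovflw(a, b):
    """a + b, wrapping around on overflow."""
    return wrap_int32(a + b)

def sub32_ovflw(a, b):
    """a - b, wrapping around on overflow."""
    return wrap_int32(a - b)
```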

Multiply-accumulate macros that allow overflow in the addition (ie, no asserts in debug mode)

(a32 * (int)((short)(b32))) >> 16; the output has to be a 32-bit int
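The expression above multiplies a 32-bit value by the low 16 bits of another, read as a signed short, and keeps the result above bit 16. A sketch with the cast made explicit (`to_int16` and `smulwb` are illustrative names; the shift is arithmetic, so negative results round toward minus infinity):

```python
def to_int16(x):
    """Keep the low 16 bits of x and read them as a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def smulwb(a32, b32):
    """(a32 * (int16)b32) >> 16 with an arithmetic (flooring) shift."""
    return (a32 * to_int16(b32)) >> 16
```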

Add with saturation for positive input values

saturates before shifting

Macro to convert floating-point constants to fixed-point by applying a scalar factor. Because of limitations of the C# JIT, this macro is actually evaluated at runtime and therefore should not be used if you want to maximize performance.
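As a sketch of what such a conversion typically looks like, assuming the scale factor is a power of two (2^Q) and the result is rounded by adding 0.5 before truncation. The name `fixed_const` is hypothetical, and the rounding rule is an assumption that only rounds to nearest for non-negative constants.

```python
def fixed_const(c, q):
    """Convert a non-negative floating-point constant to Q-format: scale by 2**q, then round."""
    return int(c * (1 << q) + 0.5)

half_q15 = fixed_const(0.5, 15)    # 0.5 expressed with 15 fractional bits
one_q16 = fixed_const(1.0, 16)     # 1.0 expressed with 16 fractional bits
```

Precomputing such values once and storing them as integer constants sidesteps the runtime cost the text mentions.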

PSEUDO-RANDOM GENERATOR Make sure to store the result as the seed for the next call (also in between frames), otherwise result won't be random at all. When only using some of the bits, take the most significant bits by right-shifting.
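The usage pattern described above, sketched with a 32-bit linear congruential generator. The multiplier and increment are the values I recall from the reference SILK source and should be treated as assumptions; the point of the sketch is feeding each result back in as the next seed and taking the high bits.

```python
def silk_rand(seed):
    """One step of a 32-bit linear congruential generator with wraparound."""
    x = (907633515 + seed * 196314165) & 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x   # reinterpret as signed 32-bit

seed = 0
samples = []
for _ in range(3):
    seed = silk_rand(seed)                 # store the result as the next seed
    samples.append((seed >> 24) & 0xFF)    # keep only the 8 most significant bits
```

The low bits of a generator like this have short periods, which is why the text recommends right-shifting to take the most significant bits.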

silk_SMMUL: Signed top word multiply.

Divide two int32 values and return result as int32 in a given Q-domain

I numerator (Q0)
I denominator (Q0)
I Q-domain of result (>= 0)
O returns a good approximation of "(a32 << Qres) / b32"

Invert int32 value and return result as int32 in a given Q-domain

I denominator (Q0)
I Q-domain of result (> 0)
O returns a good approximation of "(1 << Qres) / b32"

a32 + ((b32 * (int)((short)(c32))) >> 16); the output has to be a 32-bit int

(a32 * (b32 >> 16)) >> 16

(int)((short)(a32)) * (b32 >> 16)

a32 + (int)((short)(b32)) * (c32 >> 16)

a64 + (b32 * c32)

(a32 * b32) >> 16

a32 + ((b32 * c32) >> 16)

Get number of leading zeros and fractional part (the bits right after the leading one)

input
number of leading zeros
the 7 bits right after the leading one
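A sketch of the split into leading-zero count and 7-bit fraction, assuming a positive 32-bit input. The names `clz32` and `clz_frac` are illustrative; a plain shift is used here because, after masking to 7 bits, it exposes the same bits a rotation would.

```python
def clz32(x):
    """Count leading zeros of a 32-bit value (32 for zero)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()

def clz_frac(x):
    """Return (leading zeros, the 7 bits right after the leading one) for 0 < x < 2**31."""
    lz = clz32(x)
    shift = 24 - lz     # moves the 7 bits after the leading one down to bits 0..6
    aligned = x >> shift if shift >= 0 else x << -shift
    return lz, aligned & 0x7F
```

Together the two outputs form a coarse floating-point view of the input: the leading-zero count acts as an exponent and the 7 bits as a mantissa.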

Approximation of square root. Accuracy:
+/- 10% for output values > 15
+/- 2.5% for output values > 120
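A sketch of how such an approximation can be built from a leading-zero count and a 7-bit fraction: pick a starting value from the parity of the leading-zero count, scale it, then apply a linear correction. The constants 46214 and 213 are the ones I recall from the reference SILK code and are assumptions here, as is the name `sqrt_approx`.

```python
def sqrt_approx(x):
    """Approximate sqrt(x) for a signed 32-bit x; returns 0 for x <= 0."""
    if x <= 0:
        return 0
    lz = 32 - x.bit_length()                  # leading zeros in a 32-bit word
    shift = 24 - lz
    frac_q7 = (x >> shift if shift >= 0 else x << -shift) & 0x7F
    y = 32768 if lz & 1 else 46214            # 46214 is about sqrt(2) * 32768
    y >>= lz >> 1                             # scale to the magnitude of the input
    y += (y * (213 * frac_q7)) >> 16          # linear correction from the fractional bits
    return y
```

For instance, an input of 10000 yields 98 against a true root of 100, which sits inside the tolerance quoted above.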

Approximation of 128 * log2() (very close inverse of silk_log2lin()). Converts the input to a log scale.

(I) input in linear scale

Approximation of 2^() (very close inverse of silk_lin2log()). Converts the input to a linear scale.

input on log scale
Linearized value

Interpolate two vectors

(O) interpolated vector [MAX_LPC_ORDER]
(I) first vector [MAX_LPC_ORDER]
(I) second vector [MAX_LPC_ORDER]
(I) interp. factor, weight on 2nd vector
(I) number of parameters
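A sketch of the interpolation, assuming the factor is a Q2 integer in the range 0 to 4 (so 4 means full weight on the second vector). That format is an assumption not stated in the text, and the sketch returns a new list rather than writing into an output vector.

```python
def interpolate(x0, x1, ifact_q2, d):
    """Blend the first d entries of x0 and x1; ifact_q2 in 0..4 is the weight on x1."""
    assert 0 <= ifact_q2 <= 4
    return [x0[i] + (((x1[i] - x0[i]) * ifact_q2) >> 2) for i in range(d)]

mid = interpolate([0, 100], [40, 0], 2, 2)   # halfway between the two vectors
```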

Inner product with bit-shift

I input vector 1
I input vector 2
I number of bits to shift
I vector lengths
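A sketch of an inner product with a bit-shift, assuming each product is shifted right before it is accumulated (shifting the final sum instead would give slightly different rounding). The name `inner_prod_scaled` is hypothetical.

```python
def inner_prod_scaled(v1, v2, scale, length):
    """Sum of (v1[i] * v2[i]) >> scale over the first `length` entries."""
    total = 0
    for i in range(length):
        total += (v1[i] * v2[i]) >> scale   # shift each product to limit accumulator growth
    return total
```

Shifting per product keeps the running sum inside 32 bits for long vectors, at the cost of a small truncation error per term.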

returns the value that has fewer higher-order bits, ignoring sign bit (? I think?)

Counts leading zeroes

returns inverse base-2 log of a value

Arbitrary-rate audio resampler originally implemented for the Speex codec.

typedef int (* resampler_basic_func)(SpeexResamplerState*, int , Pointer<short>, int *, Pointer<short>, Pointer<int>);

Create a new resampler with integer input and output rates (in hertz).

The number of channels to be processed
Input sampling rate, in hertz
Output sampling rate, in hertz
Resampling quality, from 0 to 10

Create a new resampler with fractional input/output rates. The sampling rate ratio is an arbitrary rational number with both the numerator and denominator being 32-bit integers.

The number of channels to be processed
Numerator of sampling rate ratio
Denominator of sampling rate ratio
Input sample rate rounded to the nearest integer (in Hz)
Output sample rate rounded to the nearest integer (in Hz)
Resampling quality, from 0 to 10
A newly created resampler

Make sure that the first samples to go out of the resamplers don't have leading zeros. This is only useful before starting to use a newly created resampler. It is recommended when resampling an audio file, as it will generate a file with the same length. For real-time processing, it is probably easier not to use this call (so that the output duration is the same for the first frame).

Clears the resampler buffers so a new (unrelated) stream can be processed.

Sets the input and output rates

Input sampling rate, in hertz
Output sampling rate, in hertz

Get the current input/output sampling rates (integer value).

(Output) Sampling rate of input
(Output) Sampling rate of output

Sets the input/output sampling rates and resampling ratio (fractional values in Hz supported)

Numerator of the sampling rate ratio
Denominator of the sampling rate ratio
Input sampling rate rounded to the nearest integer (in Hz)
Output sampling rate rounded to the nearest integer (in Hz)

Gets the current resampling ratio. The ratio is reduced to lowest terms (numerator and denominator divided by their greatest common divisor).

(Output) numerator of the sampling rate ratio
(Output) denominator of the sampling rate ratio
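Reducing a rate ratio amounts to dividing both parts by their greatest common divisor. A small sketch with a hypothetical helper, `reduce_ratio`:

```python
from math import gcd

def reduce_ratio(num, den):
    """Divide numerator and denominator by their greatest common divisor."""
    g = gcd(num, den)
    return num // g, den // g

ratio = reduce_ratio(48000, 44100)   # 48 kHz to 44.1 kHz
```

So a resampler configured for 48000 Hz in and 44100 Hz out would report 160 and 147 rather than the raw rates.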

Gets or sets the resampling quality between 0 and 10, where 0 has poor quality and 10 has very high quality.

Gets or sets the input stride

Gets or sets the output stride

Get the latency introduced by the resampler measured in input samples.

Gets the latency introduced by the resampler measured in output samples.

Gets the latency introduced by the resampler.

The Opus decoder structure. Opus is a stateful codec with overlapping blocks and as a result Opus packets are not coded independently of each other. Packets must be passed into the decoder serially and in the correct order for a correct decode. Lost packets can be replaced with loss concealment by calling the decoder with a null reference and zero length for the missing packet. A single codec state may only be accessed from a single thread at a time and any required locking must be performed by the caller. Separate streams must be decoded with separate decoder states and can be decoded in parallel.

Decodes an Opus packet, putting the decoded audio into a floating-point buffer.

The input payload. This may be empty if the previous packet was lost in transit (when PLC is enabled).
A buffer to put the output PCM. The output size is (# of samples) * (# of channels). You can use the OpusPacketInfo helpers to get a hint of the frame size before you decode the packet if you need exact sizing. Otherwise, the minimum safe buffer size is 5760 samples.
The number of samples (per channel) of available space in the output PCM buffer. If this is less than the maximum packet duration (120 ms; 5760 samples at 48 kHz), this function will not be capable of decoding some packets. In the case of PLC (data == NULL) or FEC (decode_fec == true), frame_size needs to be exactly the duration of the audio that is missing, otherwise the decoder will not be in an optimal state to decode the next incoming packet. For the PLC and FEC cases, frame_size *must* be a multiple of 10 ms.
Indicates that we want to recreate the PREVIOUS (lost) packet using FEC data from THIS packet. Using this packet recovery scheme, you will actually decode this packet twice, first with decode_fec TRUE and then again with FALSE. If FEC data is not available in this packet, the decoder will simply generate a best-effort recreation of the lost packet.
The number of decoded samples
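The buffer sizing above follows from the 120 ms maximum packet duration. A sketch of the arithmetic, with hypothetical helper names and the assumption that the 5760 figure is per channel:

```python
def max_frame_samples(sample_rate_hz, max_duration_ms=120):
    """Samples per channel in the longest packet the decoder may produce."""
    return sample_rate_hz * max_duration_ms // 1000

def min_safe_buffer_len(sample_rate_hz, channels):
    """Total interleaved samples needed to hold any single decoded packet."""
    return max_frame_samples(sample_rate_hz) * channels
```

At 48 kHz this gives 5760 samples per channel, so a stereo output buffer sized this way would hold 11520 values.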

Decodes an Opus packet, putting the decoded audio into an int16 buffer.

Resets all buffers and prepares this decoder to process a fresh (unrelated) stream

Gets the version string of the library backing this implementation.

An arbitrary version string.

Gets the encoded bandwidth of the last packet decoded. This may be lower than the actual decoding sample rate, and is only an indicator of the encoded audio's quality

Returns the final range of the entropy coder. If you need this then I also assume you know what it's for.

Gets or sets the gain (Q8) to use in decoding
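In the reference Opus API, to my knowledge, the decoder gain is expressed in Q8 decibels, meaning 256 units per dB. Assuming this property follows the same convention, the linear scale factor applied to the output can be sketched as below; `gain_q8_to_linear` is a hypothetical helper.

```python
def gain_q8_to_linear(gain_q8):
    """Convert a gain in Q8 decibels (256 units per dB) to a linear amplitude factor."""
    return 10.0 ** (gain_q8 / (20.0 * 256.0))

unity = gain_q8_to_linear(0)       # 0 dB leaves the signal unchanged
boost = gain_q8_to_linear(5120)    # 20 dB, a factor of about 10
```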

Gets the duration of the last packet, in PCM samples per channel

Gets the number of channels that this decoder decodes to. Always constant for the lifetime of the decoder.

Gets the last estimated pitch value of the decoded audio

Gets the sample rate that this decoder decodes to. Always constant for the lifetime of the decoder

Represents an Opus encoder for a 1- or 2-channel audio stream. May be backed either by managed code or a native adapter layer, depending on your platform and performance requirements.

Encodes an Opus frame.

Input signal (interleaved if stereo). Length should be at least frame_size * channels.
The number of samples per channel in the input signal. The frame size must be a valid Opus frame size for the given sample rate. For example, at 48 kHz the permitted values are 120, 240, 480, 960, 1920, and 2880. Passing in a duration of less than 10 ms (480 samples at 48 kHz) will prevent the encoder from using FEC, DTX, or hybrid modes.
Destination buffer for the output payload. This must contain at least max_data_bytes bytes.
The maximum amount of space allocated for the output payload. This may be used to impose an upper limit on the instant bitrate, but should not be used as the only bitrate control (use the Bitrate parameter for that).
The length of the encoded packet, in bytes. This value will always be less than or equal to 1275, the maximum Opus packet size.
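The permitted frame sizes listed for 48 kHz correspond to durations of 2.5, 5, 10, 20, 40 and 60 ms. Assuming the same set of durations applies at other sample rates, the valid sizes can be derived as in this sketch (the names are hypothetical):

```python
# Frame durations in microseconds, inferred from the 48 kHz sizes in the text.
FRAME_DURATIONS_US = (2500, 5000, 10000, 20000, 40000, 60000)

def valid_frame_sizes(sample_rate_hz):
    """Samples per channel for each permitted frame duration at the given rate."""
    return [sample_rate_hz * us // 1_000_000 for us in FRAME_DURATIONS_US]

def is_valid_frame_size(frame_size, sample_rate_hz):
    """True if frame_size matches one of the permitted durations."""
    return frame_size in valid_frame_sizes(sample_rate_hz)
```

A check like this before encoding catches the common mistake of passing a byte count or a total sample count instead of samples per channel.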

Encodes an Opus frame using floating point input.

Input signal in float format (interleaved if stereo). Length should be at least frame_size * channels. Values should be normalized to the +/- 1.0 range. Samples beyond +/- 1.0 will be clipped.
The number of samples per channel in the input signal. The frame size must be a valid Opus frame size for the given sample rate. For example, at 48 kHz the permitted values are 120, 240, 480, 960, 1920, and 2880. Passing in a duration of less than 10 ms (480 samples at 48 kHz) will prevent the encoder from using FEC, DTX, or hybrid modes.
Destination buffer for the output payload. This must contain at least max_data_bytes bytes.
The maximum amount of space allocated for the output payload. This may be used to impose an upper limit on the instant bitrate, but should not be used as the only bitrate control (use the Bitrate parameter for that).
The length of the encoded packet, in bytes. This value will always be less than or equal to 1275, the maximum Opus packet size.

Resets the state of this encoder, usually to prepare it for processing a new audio stream without reallocating.

Gets the version string of the library backing this implementation.

An arbitrary version string.

Gets or sets the application (or signal type) of the input signal. This hints to the encoder what type of details we want to preserve in the encoding. This cannot be changed after the encoder has started

Gets or sets the bitrate for encoder, in bits per second. Valid bitrates are between 6K (6144) and 510K (522240)

Gets or sets the maximum number of channels to be encoded. This can be used to force a downmix from stereo to mono if stereo separation is not important

Gets or sets the maximum bandwidth to be used by the encoder. This can be used if high-frequency audio is not important to your application (e.g. telephony)

Gets or sets the "preferred" encoded bandwidth. This does not affect the sample rate of the input audio, only the encoding cutoffs

Gets or sets a flag to enable Discontinuous Transmission mode. This mode is only available in the SILK encoder (Bitrate < 40Kbit/s and/or ForceMode == SILK). When enabled, the encoder detects silence and background noise and reduces the number of output packets, with up to 600ms in between separate packet transmissions.

Gets or sets the encoder complexity, between 0 and 10

Gets or sets a flag to enable Forward Error Correction. This mode is only available in the SILK encoder (Bitrate < 40Kbit/s and/or ForceMode == SILK). When enabled, lost packets can be partially recovered by decoding data stored in the following packet.

Gets or sets the expected amount of packet loss in the transmission medium, from 0 to 100. Only applies if UseInbandFEC is also enabled, and the encoder is in SILK mode.

Gets or sets a flag to enable Variable Bitrate encoding. This is recommended as it generally improves audio quality with little impact on average bitrate

Gets or sets a flag to enable constrained VBR. This only applies when the encoder is in CELT mode (i.e. high bitrates)

Gets or sets a hint to the encoder for what type of audio is being processed, voice or music. This is not set by the encoder itself i.e. it's not the result of any actual signal analysis.

Gets the number of samples of audio that are being stored in a buffer and are therefore contributing to latency.

Gets the encoder's input sample rate. This is fixed for the lifetime of the encoder.

Gets the number of channels that this encoder expects in its input. Always constant for the lifetime of the encoder.

Returns the final range of the entropy coder. If you need this then I also assume you know what it's for.

Gets or sets the bit resolution of the input audio signal. Though the encoder always uses 16-bit internally, this can help it make better decisions about bandwidth and cutoff values

Gets or sets a fixed length for each encoded frame. Typically, the encoder just chooses a frame duration based on the input length and the current internal mode. This can be used to enforce an exact length if it is required by your application (e.g. monotonous transmission)

Sets a user-forced mode for the encoder. There are three modes: SILK, HYBRID, and CELT. SILK can only encode below 40 kbit/s and is best suited for speech. SILK also has features such as FEC which may be desirable. CELT sounds better at higher bandwidth and is comparable to AAC. It also performs somewhat faster. HYBRID is used to create a smooth transition between the two modes. Note that this value may not always be honored due to other factors such as frame size and bitrate.

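Gets or sets a flag to disable prediction. In the reference Opus API, the corresponding setting disables almost all use of prediction between frames, making frames nearly independent of each other at some cost in quality.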

The Opus multistream decoder structure. Multistream decoding is an aggregate of several internal decoders and extra logic to parse multiple frames from single packets and map them to the correct channels. The behavior of a multistream decoder is functionally the same as a single decoder in most other respects.

Decodes a multichannel Opus packet, putting the decoded audio into a floating-point buffer.

The input payload. This may be empty if the previous packet was lost in transit (when PLC is enabled).
A buffer to put the output PCM. The output size is (# of samples) * (# of channels) for a given single frame size (maximum 120 ms).
The number of samples (per channel) of available space in the output PCM buffer. It should contain at least enough space to contain (# of samples) * (# of channels) for a given single frame size (maximum 120 ms). In the case of PLC (data == NULL) or FEC (decode_fec == true), frame_size needs to be exactly the duration of the audio that is missing, otherwise the decoder will not be in an optimal state to decode the next incoming packet. For the PLC and FEC cases, frame_size *must* be a multiple of 10 ms.
Indicates that we want to recreate the PREVIOUS (lost) packet using FEC data from THIS packet. Using this packet recovery scheme, you will actually decode this packet twice, first with decode_fec TRUE and then again with FALSE. If FEC data is not available in this packet, the decoder will simply generate a best-effort recreation of the lost packet.
The number of decoded samples

Decodes a multichannel Opus packet, putting the decoded audio into an int16 buffer.

Resets all buffers and prepares this decoder to process a fresh (unrelated) stream

Gets the version string of the library backing this implementation.

An arbitrary version string.

Gets the encoded bandwidth of the last packet decoded. This may be lower than the actual decoding sample rate, and is only an indicator of the encoded audio's quality

Returns the final range of the entropy coder. If you need this then I also assume you know what it's for.

Gets or sets the gain (Q8) to use in decoding

Gets the duration of the last packet, in PCM samples per channel

Gets the sample rate that this decoder decodes to. Always constant for the lifetime of the decoder

Gets the number of channels of the input data. Always constant for the lifetime of the decoder

The Opus multistream encoder structure. Multistream encoding is an aggregate of several internal encoders and extra logic to pack multiple frames into single packets and map them to the correct channels. The behavior of a multistream encoder is functionally the same as a single encoder in most other respects.

Encodes a multistream Opus frame.

Input signal, interleaved to the total number of surround channels, according to Vorbis channel layouts. Length should be at least (# of samples) * (# of channels) for a given single frame size (maximum 120 ms).
The number of samples per channel in the input signal. The frame size must be a valid Opus frame size for the given sample rate.
Destination buffer for the output payload. This must contain at least max_data_bytes bytes.
The maximum amount of space allocated for the output payload. This may be used to impose an upper limit on the instant bitrate, but should not be used as the only bitrate control (use the Bitrate parameter for that).
The length of the encoded packet, in bytes. This value will always be less than or equal to 1275, the maximum Opus packet size.

Encodes a multistream Opus frame.

Resets the state of this encoder, usually to prepare it for processing a new audio stream without reallocating.

Gets the version string of the library backing this implementation.

An arbitrary version string.

Gets or sets the "preferred" encoded bandwidth. This does not affect the sample rate of the input audio, only the encoding cutoffs

Gets or sets the bitrate for encoder, in bits per second. Valid bitrates are between 6K (6144) and 510K (522240)

Gets or sets the encoder complexity, between 0 and 10

Gets the number of channels that this encoder expects in its input. Always constant for the lifetime of the encoder.

Returns the final range of the entropy coder. If you need this then I also assume you know what it's for.

Gets or sets a user-forced mode for the encoder. There are three modes: SILK, HYBRID, and CELT. SILK can only encode below 40 kbit/s and is best suited for speech. SILK also has features such as FEC which may be desirable. CELT sounds better at higher bandwidth and is comparable to AAC. It also performs somewhat faster. HYBRID is used to create a smooth transition between the two modes. Note that this value may not always be honored due to other factors such as frame size and bitrate.