Algorithm Configuration

Sergiu Ciumac edited this page Dec 27, 2021 · 8 revisions

You don't need to change the default configuration options unless you want to fine-tune the algorithm for your specific requirements.

The algorithm has two configuration entry points: one for fingerprint generation and one for fingerprint retrieval. Both are set on the builder objects, through the WithFingerprintConfig and WithQueryConfig methods respectively. The retrieval parameters, defined while the query command is built, also contain the fingerprint generation parameters.

  • Fingerprinting Configuration
FingerprintCommandBuilder.Instance
            .BuildFingerprintCommand()
            .From("file.ogg")
            .WithFingerprintConfig(config =>
            {
                // audio configuration
                config.Audio = new DefaultFingerprintConfiguration();
                // video configuration
                config.Video = new DefaultVideoFingerprintConfiguration();
                return config;
            });
  • Query Configuration
QueryCommandBuilder.Instance.BuildQueryCommand()
            .From("file.ogg")
            .WithQueryConfig(config =>
            {
                // audio configuration
                config.Audio.FingerprintConfiguration = new DefaultFingerprintConfiguration();
                // video configuration
                config.Video.FingerprintConfiguration = new DefaultVideoFingerprintConfiguration();
                return config;
            });

The following properties must have the same values during fingerprinting and querying, otherwise the algorithm cannot cross-match the fingerprints:

  • FrequencyRange
  • HaarWaveletNorm
  • TopWavelets
  • SampleRate
  • HashingConfig
  • FrameNormalizationTransform
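
One way to keep these properties in sync is to build both configurations from a single helper. The fragment below is a minimal sketch that uses only the builder calls shown above. The helper name CreateAudioConfig is hypothetical, and the using directives assume the namespaces of the SoundFingerprinting NuGet package.

```csharp
using SoundFingerprinting.Builder;
using SoundFingerprinting.Configuration;

// Hypothetical helper: the single place where FrequencyRange, HaarWaveletNorm,
// TopWavelets, SampleRate, HashingConfig and FrameNormalizationTransform are decided.
static DefaultFingerprintConfiguration CreateAudioConfig()
{
    return new DefaultFingerprintConfiguration();
}

var fingerprintCommand = FingerprintCommandBuilder.Instance
    .BuildFingerprintCommand()
    .From("file.ogg")
    .WithFingerprintConfig(config =>
    {
        config.Audio = CreateAudioConfig();
        return config;
    });

var queryCommand = QueryCommandBuilder.Instance
    .BuildQueryCommand()
    .From("file.ogg")
    .WithQueryConfig(config =>
    {
        // Same helper, so the query hashes stay comparable with the stored ones.
        config.Audio.FingerprintConfiguration = CreateAudioConfig();
        return config;
    });
```

Replacing the query's FingerprintConfiguration this way presumably replaces its stride as well. Stride is not in the list above, so it can be set separately for queries.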

Fingerprinting Configuration options

  • Stride - the distance, measured in samples, between consecutive fingerprints. The smaller the stride, the better the recognition rate, as more fingerprints are generated per second of audio. Default: IncrementalStaticStride(512). Ignored by video fingerprinting.
  • FrequencyRange - defines the frequency range, measured in hertz, to consider during fingerprinting. Default: 318-2000 Hz, a range that has worked well in practice. Max has to be less than SampleRate/2, the Nyquist frequency. Setting Min below 300 Hz rarely makes sense, since you will start picking up low frequencies that are normally regarded as noise (e.g., airplane noise). Setting Max above 2,000 Hz makes sense if the audio you are fingerprinting contains a lot of unique harmonics (e.g., classical music). Ignored by video fingerprinting.
  • TopWavelets - each fingerprint is generated by min-hashing an image transformed through standard wavelet decomposition. The size of the resulting image is 128 * 32 pixels. In layman's terms, TopWavelets defines how many of those pixels (ordered by magnitude) the Min-Hash algorithm takes into account when hashing the image (default = 200, which is ~4.9%). Lowering this value instructs the algorithm to consider fewer top wavelets, potentially making it more robust to degraded audio. The same number of top wavelets has to be used when generating fingerprints for insert and for query.
  • SampleRate - the sample rate to which the input audio is downsampled before the log-spectrum of the configured FrequencyRange is extracted. Ignored by video fingerprinting.
  • HashingConfig - configuration of the min-hash hashing scheme. Since we are not operating in a geometric space, instead of making random projections (the standard LSH usage), we apply random set permutations, known as min-hashing, as our LSH scheme. The configuration options for min-hash are defined in this class.
  • FrameNormalizationTransform - image normalization strategy. Defaults:
    • Audio = LogSpectrumNormalization, which enhances small frequency peaks that would otherwise be disregarded by the min-hash algorithm.
    • Video = GaussianBlurNormalization(Kernel=5,Sigma=1.5), which works as a smoothing mechanism that combats pixelation.
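
As a sketch of how these options might be tuned together, the fragment below halves the stride and lowers TopWavelets. It assumes that Stride and TopWavelets are settable properties on config.Audio and that IncrementalStaticStride lives in the SoundFingerprinting.Strides namespace. The values 256 and 150 are illustrative, not recommendations.

```csharp
using SoundFingerprinting.Builder;
using SoundFingerprinting.Strides;

var fingerprintCommand = FingerprintCommandBuilder.Instance
    .BuildFingerprintCommand()
    .From("file.ogg")
    .WithFingerprintConfig(config =>
    {
        // Assumption: Stride has a public setter.
        // A 256-sample stride yields twice as many fingerprints per second
        // as the default IncrementalStaticStride(512).
        config.Audio.Stride = new IncrementalStaticStride(256);

        // Assumption: TopWavelets has a public setter.
        // 150 of 128 * 32 = 4096 wavelets is ~3.7% (the default 200 is ~4.9%).
        config.Audio.TopWavelets = 150;
        return config;
    });
```

TopWavelets is one of the properties that must match at query time, so the same value would also have to be applied to config.Audio.FingerprintConfiguration in WithQueryConfig. Stride carries no such constraint.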

Query Configuration options

By default, the DefaultQueryConfiguration class provides the parameters for fingerprint generation and matching. You can amend the default configuration with the following override:

QueryCommandBuilder.Instance
        .BuildQueryCommand()
        .From(new AudioSamples(match, "cnn", 5512))
        .WithQueryConfig(config =>
        {
            config.Audio.AllowMultipleMatchesOfTheSameTrackInQuery = true;
            config.Audio.PermittedGap = 3d;
            return config;
        })
        .UsingServices(modelService)
        .Query();
  • AllowMultipleMatchesOfTheSameTrackInQuery - a flag indicating whether the algorithm should search for multiple matches of the same track in the query. Useful when you have a long query containing the same track multiple times scattered across the query. Default is false.
  • PermittedGap - a gap is a stretch where the query and the target track differ. The ResultEntry object contains the Coverage.BestPath object, which describes in detail when the match occurred, including any gaps that happened along the way. PermittedGap defines the length, in seconds, of gaps to be ignored. Default is 2 seconds.
  • ThresholdVotes - each fingerprint is described by a predefined number of integers (25 by default, see HashedFingerprint.HashBins). ThresholdVotes controls how many of those integers have to match for a successful match to be reported. The higher the number, the more closely the content of the query and the track has to agree (default = 4). Setting ThresholdVotes to 3 makes the algorithm more lenient towards noise and disturbances.
  • YesMetaFieldsFilters and NoMetaFieldsFilters - at times you may want to have a secondary filter applied on the returned result entries, driven by the requirements of your application. For example, you may want to match only tracks annotated with Region = "USA". To do so, set the region identifier on TrackInfo object when it is inserted in the data store (use TrackInfo.MetaFields property). Then, at query time, pass YesMetaFieldsFilters with corresponding key-value pairs.
  • MaxTracksToReturn - sets the maximum number of tracks to return out of all the analyzed candidates (default = 25).
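
The options above could be combined in one override, roughly as follows. This is a sketch under assumptions: the property names come from the list above, but their exact types are not stated there. In particular, YesMetaFieldsFilters is assumed to accept a string-to-string dictionary. Setting the matching meta field on TrackInfo at insert time is not shown.

```csharp
using System.Collections.Generic;
using SoundFingerprinting.Builder;

var queryCommand = QueryCommandBuilder.Instance
    .BuildQueryCommand()
    .From("file.ogg")
    .WithQueryConfig(config =>
    {
        // Fewer required hash-bin matches: more tolerant of noise.
        config.Audio.ThresholdVotes = 3;

        // Return at most 10 tracks instead of the default 25.
        config.Audio.MaxTracksToReturn = 10;

        // Keep only tracks whose meta fields contain Region = "USA".
        // Assumption: the filter is a string-to-string dictionary.
        config.Audio.YesMetaFieldsFilters = new Dictionary<string, string>
        {
            { "Region", "USA" }
        };
        return config;
    });
```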

Query configuration options specific to fingerprint generation

  • Stride - as in the fingerprinting configuration, it defines the stride between two consecutive fingerprints. You don't have to use the same stride when inserting and querying. It is better to use a randomized query stride to minimize the probability of unlucky alignments between the query and the target track. Default query stride: IncrementalRandomStride(256, 512).
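
A query stride could be overridden along these lines. The sketch assumes that Stride is a settable property on the nested FingerprintConfiguration and that IncrementalRandomStride takes lower and upper bounds in samples, as the default IncrementalRandomStride(256, 512) suggests. The bounds 128 and 256 are illustrative.

```csharp
using SoundFingerprinting.Builder;
using SoundFingerprinting.Strides;

var queryCommand = QueryCommandBuilder.Instance
    .BuildQueryCommand()
    .From("file.ogg")
    .WithQueryConfig(config =>
    {
        // Assumption: the stride is reachable through the nested fingerprint configuration.
        // Each step is drawn at random between the two bounds, which makes an
        // unlucky fixed alignment with the stored fingerprints less likely.
        config.Audio.FingerprintConfiguration.Stride = new IncrementalRandomStride(128, 256);
        return config;
    });
```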