Candidate Scoring

After Brave returns a list of (artist, title, kind) triples, they are merged, ranked by match_score, and a canonical pair is picked from the top. This page explains the scoring algorithm and the special cases layered on top of it.

Source: src-tauri/src/resolver/mod.rs


match_score

fn match_score(query: &str, cand: &TrackCandidate) -> f32

Returns a value in [0, ~4.5]. Higher is better.

Components

Component           | Score                        | Condition
Artist overlap      | +1.0                         | part_matches(query_tokens, query_norm, &cand.artist)
Title overlap       | +1.0                         | part_matches(query_tokens, query_norm, &cand.title)
Exact artist match  | +1.0                         | norm(artist_stripped) == norm(query)
Exact title match   | +1.0                         | norm(title_stripped) == norm(query)
Multi-source bonus  | +0.5 × (sources.len() − 1)   | More sources reporting the same candidate = more confidence

The multi-source bonus never exceeds a genuine text-match: 3 sources add +1.0, which ties with one text-match signal but cannot beat two (artist + title = 2.0).
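The component table can be read as a plain sum. The sketch below is illustrative only: it assumes the four text signals have already been computed as booleans, and the `Signals` struct and `combine_score` function are hypothetical names that do not come from the resolver source.

```rust
/// Hypothetical container for the four text-match signals from the table.
struct Signals {
    artist_overlap: bool,
    title_overlap: bool,
    exact_artist: bool,
    exact_title: bool,
}

/// Sums the components: +1.0 per text signal, +0.5 per source beyond the first.
fn combine_score(sig: &Signals, source_count: usize) -> f32 {
    let text_hits = [
        sig.artist_overlap,
        sig.title_overlap,
        sig.exact_artist,
        sig.exact_title,
    ]
    .iter()
    .filter(|&&hit| hit)
    .count();
    // saturating_sub guards against an empty source list.
    let extra_sources = source_count.saturating_sub(1);
    text_hits as f32 + 0.5 * extra_sources as f32
}
```

With this arithmetic, a candidate with no text match and three sources scores 1.0, while a single-source candidate that matches on both artist and title scores 2.0.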

part_matches

fn part_matches(
    query_tokens: &HashSet<String>,
    query_norm: &str,
    candidate_part: &str,
) -> bool

Two signals, either one sufficient:

  1. Token overlap ≥ 50%: at least half of the candidate part's tokens appear in the query token set.
     - "Нон стоп" (2 tokens) against query "нон стоп молли": overlap = 2/2 = 100% → match
     - "Paranoid Android" (2 tokens) against "radiohead paranoid": overlap = 1/2 = 50% → match

  2. Substring match (normalized, either direction):
     - norm(query) contains norm(candidate_part), OR
     - norm(candidate_part) contains norm(query)

     This handles no-space variants like "нонстоп" matching "Нон стоп".
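The two signals can be sketched as follows. This is a minimal illustration, not the resolver's actual code: the bodies of `norm` and `tokenize` are assumptions (lowercasing and keeping only alphanumeric characters, and splitting on non-alphanumeric characters, respectively), and the real helpers may differ.

```rust
use std::collections::HashSet;

/// Assumed normalization: lowercase, alphanumeric characters only.
fn norm(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Assumed tokenizer: lowercase, split on anything non-alphanumeric.
fn tokenize(s: &str) -> Vec<String> {
    s.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

fn part_matches(
    query_tokens: &HashSet<String>,
    query_norm: &str,
    candidate_part: &str,
) -> bool {
    let tokens = tokenize(candidate_part);
    let part_norm = norm(candidate_part);
    // Empty inputs never match (an empty string is a substring of everything).
    if tokens.is_empty() || part_norm.is_empty() || query_norm.is_empty() {
        return false;
    }
    // Signal 1: at least half of the candidate tokens appear in the query.
    let hits = tokens.iter().filter(|t| query_tokens.contains(*t)).count();
    if hits * 2 >= tokens.len() {
        return true;
    }
    // Signal 2: normalized substring match in either direction.
    query_norm.contains(&part_norm) || part_norm.contains(query_norm)
}
```

The integer comparison `hits * 2 >= tokens.len()` expresses the 50% rule without floating-point division, so the 1/2 case counts as a match.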

Bracket annotation stripping

Before scoring, bracketed annotations are stripped from both artist and title:

fn strip_bracket_annotations(s: &str) -> String {
    // Removes everything inside (...) and [...]
}

This handles Genius titles like "Сукины дети (Sons of Bitches)", where the parenthetical is an English translation. Without stripping, the 5-token version "сукины дети sons of bitches" would dilute the token overlap ratio for the Cyrillic query "сукины дети": only 2 of 5 tokens (40%) would match, below the 50% bar.
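One possible body for the stub above is sketched here. It is an assumption about how the stripping could work, not the resolver's implementation: it tracks bracket depth, drops everything inside (...) and [...], and collapses the whitespace that is left behind.

```rust
/// Sketch: drops text inside (...) and [...] and collapses leftover whitespace.
fn strip_bracket_annotations(s: &str) -> String {
    let mut depth = 0usize;
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '(' | '[' => depth += 1,
            // saturating_sub tolerates a stray closing bracket.
            ')' | ']' => depth = depth.saturating_sub(1),
            // Keep characters only while outside every bracket pair.
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

This version treats round and square brackets as interchangeable when counting depth, which is a simplification; an unclosed bracket swallows the rest of the string.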


Canonicalization threshold

The minimum score to accept a candidate as canonical:

const CANONICAL_MIN_SCORE: f32 = 2.0;

Score ≥ 2.0 means both artist AND title matched the query (one text-match signal each = 2.0). A score of 1.0 means only one side matched — not enough to confidently canonicalize.

This threshold exists because Brave sorts by relevance, but not always correctly. Example of what the threshold prevents: for the query "первый класс", Brave's top track hit is "1.Kla$ - АКНЕ". The artist matches "первый класс" semantically but the title does not, so the candidate scores 1.0 and is correctly not canonicalized.


Artist-hub cross-boost

let artist_hub_norms: HashSet<String> = merged
    .iter()
    .filter(|c| c.kind == MatchKind::Artist)
    .map(|c| norm::norm(&c.artist))
    .collect();

// For each Track candidate:
if c.kind == MatchKind::Track && artist_hub_norms.contains(&norm::norm(&c.artist)) {
    s += 1.0;
}

When Brave returns both an Artist page and Track pages for the same artist, the track candidates get +1.0. This validates the artist name across two independent source types (artist hub + track page) and lets a correct track clear the 2.0 threshold even when the query doesn't mention the title explicitly.

Example: query "первый класс сукины дети" → Brave returns the artist page "1.Kla$" (+1.0 to all 1.Kla$ tracks) and the track page "1.Kla$ - Сукины дети" (artist overlap +1.0 + cross-boost +1.0 = 2.0 total, which clears the threshold).
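To make the boost concrete, here is a self-contained sketch that applies it to a list of base scores. The `TrackCandidate` fields, the `norm` body, and the `apply_cross_boost` helper are assumptions made for illustration; only the boost condition itself mirrors the snippet above.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MatchKind {
    Track,
    Album,
    Artist,
}

/// Hypothetical minimal candidate; the real struct has more fields.
#[derive(Clone, Debug)]
struct TrackCandidate {
    artist: String,
    title: String,
    kind: MatchKind,
}

/// Assumed normalization: lowercase, alphanumeric characters only.
fn norm(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Adds +1.0 to each Track whose normalized artist also appears as an Artist page.
fn apply_cross_boost(merged: &[TrackCandidate], base_scores: &[f32]) -> Vec<f32> {
    let artist_hub_norms: HashSet<String> = merged
        .iter()
        .filter(|c| c.kind == MatchKind::Artist)
        .map(|c| norm(&c.artist))
        .collect();
    merged
        .iter()
        .zip(base_scores)
        .map(|(c, &s)| {
            if c.kind == MatchKind::Track && artist_hub_norms.contains(&norm(&c.artist)) {
                s + 1.0
            } else {
                s
            }
        })
        .collect()
}
```

A track with a base score of 1.0 whose artist also has an Artist page ends at 2.0, while a track by an artist with no hub page keeps its base score.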


Album / Artist candidate acceptance

Track candidates require score >= 2.0. Album and Artist candidates use a lower bar:

MatchKind::Album | MatchKind::Artist => {
    if score >= 1.0 { return true; }
    // Script-mismatch escape: Cyrillic query + Latin-only candidate
    let q_cyrl = trimmed.chars().any(|ch| matches!(ch, 'а'..='я' | 'А'..='Я' | 'ё' | 'Ё'));
    let c_latin_only = full.chars().all(|ch| !matches!(ch, 'а'..='я' | ...));
    q_cyrl && c_latin_only
}

A score ≥ 1.0 is sufficient here: an Artist page for "Пошлая Молли" against the query "нон стоп молли" scores exactly 1.0 (artist match only), and that is still a strong signal that the artist is correct.

The script-mismatch escape accepts a candidate with score = 0 when the query is in Cyrillic and the candidate is entirely Latin. norm() reduces both to lowercase alphanumerics, but Cyrillic "молли" and Latin "Molly" (normalized to "molly") remain different strings, so pure string comparison cannot detect that they are semantically equivalent. The escape lets a Cyrillic query match a Latin-only Genius artist page anyway.
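The acceptance rules for all three kinds can be sketched as one function. Several parts are assumptions: the `accepts` signature and the `is_cyrillic` helper are hypothetical, `full` is taken to be the candidate's combined display text, and the elided candidate-side pattern in the snippet above is assumed to mirror the query-side pattern.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MatchKind {
    Track,
    Album,
    Artist,
}

const CANONICAL_MIN_SCORE: f32 = 2.0;

/// Hypothetical helper: basic Russian Cyrillic letters, including ё/Ё.
fn is_cyrillic(ch: char) -> bool {
    matches!(ch, 'а'..='я' | 'А'..='Я' | 'ё' | 'Ё')
}

/// `query` is the trimmed raw query; `full` is assumed to be the
/// candidate's combined display text.
fn accepts(kind: MatchKind, score: f32, query: &str, full: &str) -> bool {
    match kind {
        MatchKind::Track => score >= CANONICAL_MIN_SCORE,
        MatchKind::Album | MatchKind::Artist => {
            if score >= 1.0 {
                return true;
            }
            // Script-mismatch escape: Cyrillic query + Latin-only candidate.
            let q_cyrl = query.chars().any(is_cyrillic);
            let c_latin_only = full.chars().all(|ch| !is_cyrillic(ch));
            q_cyrl && c_latin_only
        }
    }
}
```

Under these assumptions, a zero-score Artist page is accepted only when the query contains Cyrillic and the candidate contains none; a Latin query against a Latin candidate still needs a score of at least 1.0.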


Pre-truncation album/artist candidates

Before the ranked list is truncated to 10, album and artist candidates are extracted separately:

let album_page_cand = ranked
    .iter()
    .find(|(_, c)| c.kind == MatchKind::Album && special_trust(c))
    .map(|(_, c)| c.clone());
let artist_page_cand = ranked
    .iter()
    .find(|(_, c)| c.kind == MatchKind::Artist && !c.artist.is_empty() && special_trust(c))
    .map(|(_, c)| c.clone());

These are added back to the ranked list after truncation. This prevents a strong Artist or Album match from being silently lost if it fell below position 10 because its text score was 0 (cross-script) while Track candidates with partial text-match scored higher numerically.
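The extract, truncate, and re-add sequence might look like the sketch below. How the real code re-inserts the saved candidates (position, deduplication) is not shown in this page, so appending them at the end when missing is an assumption, as are the `Cand` struct, the `truncate_keeping_pages` name, and passing `special_trust` as a closure.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MatchKind {
    Track,
    Album,
    Artist,
}

/// Hypothetical minimal candidate used only for this illustration.
#[derive(Clone, PartialEq, Debug)]
struct Cand {
    artist: String,
    title: String,
    kind: MatchKind,
}

fn truncate_keeping_pages(
    mut ranked: Vec<(f32, Cand)>,
    limit: usize,
    special_trust: impl Fn(&Cand) -> bool,
) -> Vec<(f32, Cand)> {
    // Pick the best-ranked trusted Album and Artist entries before truncating.
    let album = ranked
        .iter()
        .find(|(_, c)| c.kind == MatchKind::Album && special_trust(c))
        .cloned();
    let artist = ranked
        .iter()
        .find(|(_, c)| c.kind == MatchKind::Artist && !c.artist.is_empty() && special_trust(c))
        .cloned();
    ranked.truncate(limit);
    // Assumption: re-append only when truncation actually dropped the entry.
    for saved in vec![album, artist].into_iter().flatten() {
        if !ranked.iter().any(|(_, c)| *c == saved.1) {
            ranked.push(saved);
        }
    }
    ranked
}
```

With twelve tracks ranked ahead of a zero-score Artist page and a limit of 10, the result holds the ten best tracks plus the Artist page, so the list may briefly exceed the limit by up to two entries.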


Artist-only query detection

After scoring, a final check determines whether the raw query is entirely the artist's name:

let q_tokens_set: HashSet<String> = tokenize(trimmed).into_iter().collect();
let artist_only_query = artist_page_cand.as_ref().is_some_and(|c| {
    let stripped = strip_bracket_annotations(&c.artist);
    let a_tokens: HashSet<String> = tokenize(&stripped).into_iter().collect();
    q_tokens_set.iter().all(|t| a_tokens.contains(t))
});

If every token in the query appears in the top Artist candidate's name, the query is treated as an artist-only search. This overrides the normal scoring: the canonical is set to {artist: "...", title: ""} and the intent is forced to Artist.

Without this check, "Пошлая Молли" (a pure artist query) would pick whichever track Brave happened to list first (e.g. "АКНЕ"), because every track by that artist scores ~2.0 via the artist-hub cross-boost, and stable_sort falls back to Brave's ordering to break ties.
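The subset test can be tried in isolation with the sketch below. The `is_artist_only_query` wrapper is a hypothetical name, and the bodies of `tokenize` and `strip_bracket_annotations` are simplified assumptions standing in for the real helpers.

```rust
use std::collections::HashSet;

/// Assumed tokenizer: lowercase, split on anything non-alphanumeric.
fn tokenize(s: &str) -> Vec<String> {
    s.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Simplified stand-in: drops text inside (...) and [...].
fn strip_bracket_annotations(s: &str) -> String {
    let mut depth = 0usize;
    let mut out = String::new();
    for ch in s.chars() {
        match ch {
            '(' | '[' => depth += 1,
            ')' | ']' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(ch),
            _ => {}
        }
    }
    out
}

/// True when every query token appears in the top Artist candidate's name.
/// `top_artist` is None when no Artist page candidate was found.
fn is_artist_only_query(query: &str, top_artist: Option<&str>) -> bool {
    let q_tokens: HashSet<String> = tokenize(query.trim()).into_iter().collect();
    top_artist.map_or(false, |artist| {
        let stripped = strip_bracket_annotations(artist);
        let a_tokens: HashSet<String> = tokenize(&stripped).into_iter().collect();
        q_tokens.iter().all(|t| a_tokens.contains(t))
    })
}
```

Because the test is a subset check, a partial name such as "молли" also counts as artist-only against "Пошлая Молли", while "нон стоп молли" does not, since "нон" and "стоп" are not part of the artist's name.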