Two-Stage Resolution

The resolver converts raw user input into a structured (artist, title) pair before any search provider runs. This dramatically improves hit quality: instead of sending "нон стоп молли" to a BitTorrent tracker's full-text search, we send "Пошлая Молли Нон стоп".

Source: src-tauri/src/resolver/mod.rs, src-tauri/src/resolver/types.rs


Why it exists

File-based search (RuTracker, SoulSeek) works on exact strings inside filenames and torrent names. User queries are messy:

  • Lyric snippets: "смотрю в иллюминатор я вижу море огней"
  • Unordered, unpunctuated: "paranoid android radiohead" → artist "Radiohead", title "Paranoid Android"
  • Colloquial: "нон стоп молли" → artist "Пошлая Молли", title "Нон стоп"
  • Cross-script: "perviy klass sukiny deti" → artist "1.Kla$", title "Сукины дети"

The resolver bridges this gap by asking a web-scale index (Genius via Brave Search) what the canonical metadata is, then feeding the clean "Artist Title" string to the providers.


Output: ResolveResult

pub struct ResolveResult {
    pub query: String,                  // raw user input, echoed back
    pub canonical: Option<ArtistTitle>, // best-guess (artist, title); None (serialized as null) if the resolver failed
    pub candidates: Vec<TrackCandidate>,// up to ~10 alternatives, best-first
    pub intent: Intent,                 // Track / Artist / Album / Lyric / Raw
    pub elapsed_ms: u64,
}

pub struct ArtistTitle {
    pub artist: String,
    pub title: String,   // empty for Artist-intent results
}

pub enum Intent {
    Track,   // one clear song identified
    Artist,  // query ~= artist name, multiple songs returned
    Album,   // cluster of tracks from one album
    Lyric,   // LRCLIB matched + query looks like a lyric snippet
    Raw,     // nothing useful resolved — providers search the raw string
}

The frontend uses intent to decide which UI chip to show ("Трек", "Исполнитель", etc.) and whether to route to the artist page or the track list.


Architecture

Only one source runs: Brave Search with site:genius.com. iTunes and LRCLIB were both dropped:

  • iTunes ranked by global popularity and surfaced wrong tracks when the user asked for a different song by the same artist
  • LRCLIB was network-blocked on the primary development machine and always timed out after 3 s, adding 3 s to every query

The Brave path typically resolves in 700–1200 ms on a healthy connection. When Brave returns 429, the resolver solves an Argon2id proof-of-work challenge and retries (see Brave Search & PoW).

resolve_query(raw_query)
    ↓
sources::brave::lookup(client, query, 10, dlog)
    ↓ Vec<(artist, title, MatchKind)>
merge_candidates(...)
    ↓ Vec<TrackCandidate> (deduped by normalized key)
rank by match_score(query, candidate)
    ↓
pick canonical (score ≥ 2.0 OR lyric/script-mismatch override)
    ↓
classify_intent(query, candidates, has_canonical)
    ↓
ResolveResult { canonical, candidates, intent, elapsed_ms }

Candidate merging

merge_candidates deduplicates by (kind, norm(artist), norm(title)). norm() is lowercase + strip non-alphanumeric (see resolver/norm.rs):

pub fn norm(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

So "Пошлая Молли" and "пошлая молли" collapse to the same key (пошлаямолли), preventing duplicates when Brave returns the same artist from multiple page titles.

Each candidate tracks which sources reported it:

pub struct TrackCandidate {
    pub artist: String,
    pub title: String,
    pub kind: MatchKind,     // Track / Album / Artist
    pub sources: Vec<String>, // e.g. ["brave"]
}

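A candidate reported by more sources gets higher confidence, which is used as a tiebreaker in scoring. With Brave as the only active source, every candidate currently carries a single entry.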
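The deduplication described above can be sketched as follows. This is a minimal illustration, not the code in resolver/mod.rs: the merge_candidates signature, the "first occurrence wins" policy, and the source-name parameter are assumptions; only norm(), TrackCandidate, and the (kind, norm(artist), norm(title)) key come from the text.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum MatchKind {
    Track,
    Album,
    Artist,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TrackCandidate {
    pub artist: String,
    pub title: String,
    pub kind: MatchKind,
    pub sources: Vec<String>,
}

/// Lowercase and drop everything that is not alphanumeric (as in resolver/norm.rs).
pub fn norm(s: &str) -> String {
    s.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Hypothetical merge: the first spelling seen for a key is kept, input order
/// is preserved, and repeated hits only extend the `sources` list.
pub fn merge_candidates(
    hits: Vec<(String, String, MatchKind)>,
    source: &str,
) -> Vec<TrackCandidate> {
    let mut index: HashMap<(MatchKind, String, String), usize> = HashMap::new();
    let mut out: Vec<TrackCandidate> = Vec::new();

    for (artist, title, kind) in hits {
        let key = (kind, norm(&artist), norm(&title));
        if let Some(&i) = index.get(&key) {
            // Duplicate: record the source once, keep the original spelling.
            if !out[i].sources.iter().any(|s| s == source) {
                out[i].sources.push(source.to_string());
            }
            continue;
        }
        index.insert(key, out.len());
        out.push(TrackCandidate {
            artist,
            title,
            kind,
            sources: vec![source.to_string()],
        });
    }
    out
}
```

Because kind is part of the key, an artist hub page and a track page for the same artist stay separate candidates even when their normalized artist strings are identical.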


MatchKind

Brave returns pages of three types. The parser in brave.rs classifies each:

Kind     Example title                                                    Meaning
Track    "Пошлая Молли - Нон стоп Lyrics | Genius"                        A specific song
Album    "Пошлая Молли - Незваный гость Lyrics and Tracklist | Genius"    An album page
Artist   "Пошлая Молли Lyrics, Songs, and Albums | Genius"                An artist hub page

The kind carries through to Intent classification and to the UI chip shown in the search hint.
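As an illustration of how the three page-title shapes above could be told apart, here is a small sketch. It is not the parser in brave.rs: the function name classify_title and the suffix-matching approach are assumptions based only on the three example titles, and real Genius titles may vary in ways this does not handle.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchKind {
    Track,
    Album,
    Artist,
}

/// Hypothetical classifier: returns (artist, title, kind), with an empty
/// title for artist hub pages, or None if the title has an unknown shape.
pub fn classify_title(page_title: &str) -> Option<(String, String, MatchKind)> {
    // Every example ends with "| Genius"; anything else is rejected.
    let head = page_title.trim().strip_suffix("| Genius")?.trim_end();

    // Check the longest suffixes first, since all three end in or contain "Lyrics".
    if let Some(artist) = head.strip_suffix(" Lyrics, Songs, and Albums") {
        return Some((artist.trim().to_string(), String::new(), MatchKind::Artist));
    }
    let (body, kind) = if let Some(b) = head.strip_suffix(" Lyrics and Tracklist") {
        (b, MatchKind::Album)
    } else if let Some(b) = head.strip_suffix(" Lyrics") {
        (b, MatchKind::Track)
    } else {
        return None;
    };

    // Split "Artist - Title" on the first separator found (hyphen, en dash, em dash).
    let (artist, title) = [" - ", " – ", " — "]
        .iter()
        .find_map(|sep| body.split_once(*sep))?;
    Some((artist.trim().to_string(), title.trim().to_string(), kind))
}
```

Returning None for unrecognized titles lets the caller drop search results that are not song, album, or artist pages instead of guessing.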


Canonicalization rules

A candidate is accepted as canonical when:

score(query, candidate) >= 2.0

OR one of two overrides applies:

  1. Lyric override (trust_lyric_source): query is long (>20 chars), has no "Artist - Title" separator, and at least one candidate came from a lyric source (Brave). Used for lyric-snippet queries where the query text is deliberately not the track name.

  2. Script-mismatch override: query contains Cyrillic characters but the top candidate is Latin-only. match_score can't bridge scripts, so it reports 0 for e.g. "параноид андроид" ↔ "Radiohead - Paranoid Android". The override lets these through anyway.

If no candidate meets the threshold and neither override applies, canonical = None and intent = Raw. The providers then search the raw string.
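The acceptance rule can be condensed into one predicate. The sketch below is an assumption-laden illustration: accept_canonical, its parameters, and the helper names are hypothetical, "Latin-only" is simplified to "all letters are ASCII", and Cyrillic is detected with the basic U+0400..U+04FF block only. The score is passed in because match_score itself is not shown here.

```rust
pub const CANONICAL_THRESHOLD: f64 = 2.0;

/// True if the query contains an explicit "Artist - Title" separator.
fn has_separator(q: &str) -> bool {
    [" - ", " – ", " — "].iter().any(|sep| q.contains(*sep))
}

/// Assumption: only the basic Cyrillic block is checked.
fn contains_cyrillic(s: &str) -> bool {
    s.chars().any(|c| ('\u{0400}'..='\u{04FF}').contains(&c))
}

/// Assumption: "Latin-only" means at least one letter, and every letter is ASCII.
fn is_latin_only(s: &str) -> bool {
    let mut letters = s.chars().filter(|c| c.is_alphabetic()).peekable();
    letters.peek().is_some() && letters.all(|c| c.is_ascii_alphabetic())
}

/// Hypothetical acceptance check for the top-ranked candidate.
pub fn accept_canonical(
    query: &str,
    top_artist: &str,
    top_title: &str,
    score: f64,
    from_lyric_source: bool,
) -> bool {
    if score >= CANONICAL_THRESHOLD {
        return true;
    }
    // Override 1: long free-text query backed by a lyric source.
    let lyric_override =
        query.chars().count() > 20 && !has_separator(query) && from_lyric_source;
    // Override 2: Cyrillic query, Latin-only candidate; the score cannot bridge scripts.
    let script_override = contains_cyrillic(query)
        && is_latin_only(&format!("{} {}", top_artist, top_title));
    lyric_override || script_override
}
```

The length check uses chars().count() rather than len(), since Cyrillic text takes two bytes per character in UTF-8 and a byte length would cross the 20-character limit too early.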


Artist-only query detection

After picking the canonical candidate, the resolver checks whether the raw query is entirely contained within the top candidate's artist name (token-by-token):

let q_tokens_set: HashSet<String> = tokenize(trimmed).into_iter().collect();
let artist_only_query = artist_page_cand.as_ref().is_some_and(|c| {
    let stripped = strip_bracket_annotations(&c.artist);
    let a_tokens: HashSet<String> = tokenize(&stripped).into_iter().collect();
    q_tokens_set.iter().all(|t| a_tokens.contains(t))
});

If true, the canonical title is set to empty and intent is forced to Artist. This prevents "Пошлая Молли" (a pure artist query) from picking the first track Brave happened to list for that artist.
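For readers who want to try the containment check in isolation, here is a self-contained sketch. The bodies of tokenize and strip_bracket_annotations are guesses (split on non-alphanumerics and lowercase; drop anything inside round or square brackets), and the empty-query guard is an addition of this sketch, not something the snippet above does.

```rust
use std::collections::HashSet;

/// Assumed tokenizer: split on non-alphanumeric characters and lowercase.
pub fn tokenize(s: &str) -> Vec<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Assumed behavior: remove "(...)" and "[...]" annotations, including nested ones.
pub fn strip_bracket_annotations(s: &str) -> String {
    let mut depth = 0usize;
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '(' | '[' => depth += 1,
            ')' | ']' => depth = depth.saturating_sub(1),
            _ if depth == 0 => out.push(c),
            _ => {} // inside brackets: skip
        }
    }
    out.trim().to_string()
}

/// True when every query token also appears in the candidate's artist name.
pub fn is_artist_only_query(query: &str, artist: &str) -> bool {
    let q: HashSet<String> = tokenize(query.trim()).into_iter().collect();
    let a: HashSet<String> = tokenize(&strip_bracket_annotations(artist))
        .into_iter()
        .collect();
    // Guard added in this sketch: an empty query would otherwise pass `all`.
    !q.is_empty() && q.iter().all(|t| a.contains(t))
}
```

With these definitions, "пошлая молли" is an artist-only query for the artist "Пошлая Молли", while "нон стоп молли" is not, because "нон" and "стоп" do not appear in the artist name.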


Intent classification

After canonicalization:

canonical.kind == Album  → Intent::Album
canonical.kind == Artist → Intent::Artist
canonical.kind == Track  →
    if (query looks like lyric phrase)
    AND (query has words not in artist+title)
        → Intent::Lyric
    else
        → Intent::Track
canonical == None        → Intent::Raw

The "looks like lyric" check: char_count > 20 AND no "Artist - Title" separator (" - ", " – ", " — ").

The "has extra words" check: at least one query token is not in the union of tokenize(artist) ∪ tokenize(title). If every query word is already explained by the metadata, it is a plain track query, not a lyric query.
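The decision tree and its two checks can be put together in a short sketch. The signature below is hypothetical (the pipeline shows classify_intent(query, candidates, has_canonical)); here the canonical candidate is passed directly as an optional (artist, title, kind) tuple to keep the example small, and tokenize is an assumed implementation.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchKind {
    Track,
    Album,
    Artist,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    Track,
    Artist,
    Album,
    Lyric,
    Raw,
}

/// Assumed tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(s: &str) -> Vec<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// "Looks like lyric": more than 20 characters and no "Artist - Title" separator.
fn looks_like_lyric(query: &str) -> bool {
    let has_sep = [" - ", " – ", " — "].iter().any(|sep| query.contains(*sep));
    query.chars().count() > 20 && !has_sep
}

/// "Has extra words": some query token is absent from artist and title tokens.
fn has_extra_words(query: &str, artist: &str, title: &str) -> bool {
    let known: HashSet<String> = tokenize(artist)
        .into_iter()
        .chain(tokenize(title))
        .collect();
    tokenize(query).iter().any(|t| !known.contains(t))
}

/// Hypothetical simplified signature; see the lead-in.
pub fn classify_intent(query: &str, canonical: Option<(&str, &str, MatchKind)>) -> Intent {
    match canonical {
        None => Intent::Raw,
        Some((_, _, MatchKind::Album)) => Intent::Album,
        Some((_, _, MatchKind::Artist)) => Intent::Artist,
        Some((artist, title, MatchKind::Track)) => {
            if looks_like_lyric(query) && has_extra_words(query, artist, title) {
                Intent::Lyric
            } else {
                Intent::Track
            }
        }
    }
}
```

Both checks are needed: "radiohead paranoid android" is longer than 20 characters, but every word is explained by the metadata, so it stays a Track query rather than a Lyric one.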


Tauri command

The resolver is exposed as a single Tauri command:

#[tauri::command]
pub async fn resolve_query(
    state: State<'_, ResolverState>,
    query: String,
) -> Result<ResolveResult, String>

ResolverState holds a shared reqwest::Client (connection-pooled across invocations) and an Arc<AppDebugLog>.

The frontend calls it through the thin wrapper in src/search/resolver.ts:

const resolved = await resolveQuery(rawQuery);
// resolved: { query, canonical, candidates, intent, elapsed_ms }