Brave Search & Argon2id PoW

Brave Search is the only web source used by the resolver. It is queried with a site:genius.com constraint to keep results focused on lyrics pages with predictable title formats. Brave occasionally returns HTTP 429 with an Argon2id proof-of-work challenge — the resolver solves it locally and retries.

Source: src-tauri/src/resolver/sources/brave.rs

Query construction

The resolver appends a site filter to every query:

const LYRICS_SITES: &[&str] = &["genius.com"];

let sites_clause = LYRICS_SITES
    .iter()
    .map(|s| format!("site:{s}"))
    .collect::<Vec<_>>()
    .join(" OR ");
let q = format!("{query} ({sites_clause})");
// → "нон стоп молли (site:genius.com)"

Why client-side filtering too? Brave honours site:genius.com loosely: it still mixes in YouTube, Apple Music, Last.fm, and blog results when few genius.com pages match the query. A second check is therefore applied after results are received:

if !title.to_lowercase().contains("genius") {
    continue; // drop non-Genius results
}

Any result whose page title does not contain "genius" (compared case-insensitively) is silently dropped, regardless of the URL Brave reported for it.
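A minimal sketch of this post-filter as a standalone function, assuming results arrive as (title, url) pairs. The pair shape and the name keep_genius_results are hypothetical, not the resolver's actual types; only the title is inspected, as in the check above.

```rust
/// Keeps only results whose page title mentions Genius (case-insensitive).
/// Hypothetical helper: results are modelled as (title, url) pairs.
fn keep_genius_results<'a>(results: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    results
        .iter()
        .copied()
        // The URL is deliberately ignored: Brave's site: filter is not trusted.
        .filter(|&(title, _url)| title.to_lowercase().contains("genius"))
        .collect()
}
```

A result whose URL points at genius.com but whose title lacks "genius" is still dropped.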

Normal request

GET https://search.brave.com/search?q=<query>&source=web
Accept: application/json
User-Agent: Mozilla/5.0 (Macintosh; ...) Chrome/122.0.0.0 Safari/537.36

The User-Agent mirrors the one used in the original Python prototype, which established the request shape Brave accepts. Changing it risks different rate-limit behaviour.

On success (HTTP 200), the response is raw HTML.

Content-Type handling note: reqwest's .text() decodes with whatever charset the Content-Type header declares (falling back to UTF-8 only when none is declared), so an inaccurate header from Brave turns every two-byte Cyrillic character into mojibake. To avoid depending on the header, the code reads the raw bytes and decodes them as UTF-8 explicitly:

let bytes = res.bytes().await?;
let html = String::from_utf8_lossy(&bytes); // lossy: malformed bytes → U+FFFD
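To see why the choice of decoder matters, the sketch below compares a Latin-1 decode (ISO-8859-1 maps each byte to the code point with the same value) with String::from_utf8_lossy. Both helper names are hypothetical and exist only for this illustration.

```rust
/// ISO-8859-1 decoding: every byte becomes the code point U+0000..=U+00FF of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// UTF-8 decoding as in the resolver; invalid sequences become U+FFFD.
fn decode_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Each Cyrillic letter occupies two UTF-8 bytes, so a Latin-1 decode turns the five-letter "Молли" into ten unrelated characters, while the UTF-8 decode round-trips it.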

Title extraction

The HTML parser finds title elements (div, span, or a) whose class attribute contains search-snippet-title, using a regex:

const TITLE_PAT: &str =
    r#"(?s)<(?:div|span|a)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([^<]{3,200})</"#;

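This is intentionally not a real HTML parser: Brave tweaks its markup periodically, and a forgiving regex that only needs the class substring is easier to keep working. The capture group extracts the raw inner text of the title element. Because the capture is [^<]{3,200} and must be followed directly by a closing tag, titles containing nested tags, or shorter than 3 or longer than 200 characters, are skipped rather than partially captured.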
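For experimenting without the regex crate, here is a std-only approximation of what TITLE_PAT captures. extract_titles is hypothetical and is stricter than the real pattern in one respect: it compares the whole tag name, whereas the regex's (?:div|span|a) alternation followed by [^>]* also accepts tag names that merely begin with "a", such as article.

```rust
/// Std-only approximation of TITLE_PAT (hypothetical helper, not the resolver's code).
/// Returns the inner text of div/span/a elements whose class contains
/// "search-snippet-title", if that text is 3..=200 chars and directly followed by "</".
fn extract_titles(html: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(lt) = rest.find('<') {
        let after = &rest[lt + 1..];
        // End of this tag; an unterminated tag ends the scan.
        let Some(gt) = after.find('>') else { break };
        let tag = &after[..gt];
        let body = &after[gt + 1..];
        rest = body; // keep scanning after this tag either way
        let name = tag.split_whitespace().next().unwrap_or("");
        if !matches!(name, "div" | "span" | "a") {
            continue;
        }
        // The class attribute value must contain the marker substring.
        let Some(start) = tag.find("class=\"") else { continue };
        let value = &tag[start + "class=\"".len()..];
        let Some(end) = value.find('"') else { continue };
        if !value[..end].contains("search-snippet-title") {
            continue;
        }
        // Like [^<]{3,200}</ : text runs to the next '<', which must open a closing tag.
        let Some(stop) = body.find('<') else { continue };
        let text = &body[..stop];
        let len = text.chars().count();
        if body[stop..].starts_with("</") && (3..=200).contains(&len) {
            out.push(text.to_string());
        }
    }
    out
}
```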

Title parsing

Each extracted title is passed through parse_artist_title. It tries three kinds of patterns in order:

1. Album pages

"Пошлая Молли - Незваный гость Lyrics and Tracklist | Genius"
→ (artist="Пошлая Молли", title="Незваный гость", kind=Album)

make(r"^(.+?)\s*[-–—]\s*(.+?)\s+Lyrics\s+and\s+Tracklist\s*\|\s*Genius\s*$"),

Album patterns are checked first so that album pages, with their more specific "Lyrics and Tracklist" suffix, are classified as albums before the looser track patterns get a chance to match them.

2. Track pages (7 patterns)

"Пошлая Молли - Нон стоп Lyrics | Genius"
→ (artist="Пошлая Молли", title="Нон стоп", kind=Track)

"текст песни 1.Kla$ - Сукины дети | ..."
→ (artist="1.Kla$", title="Сукины дети", kind=Track)

Some patterns swap the capture order (marked by a false flag in the pattern table) for shapes like "Title - song and lyrics by Artist", where group 1 is the title and group 2 is the artist:

(make(r"^(.+?)\s*[-–—]\s*song\s+and\s+lyrics\s+by\s+(.+?)(?:\s*[|\-–—].*)?$"), false),

3. Artist pages

"Пошлая Молли Lyrics, Songs, and Albums | Genius"
→ (artist="Пошлая Молли", title="", kind=Artist)

"Тексты песен Земфира | ..."
→ (artist="Земфира", title="", kind=Artist)

Artist pages carry an empty title and lead to Intent::Artist downstream.
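The ordered, first-match-wins dispatch with a swap flag can be sketched without the regex crate. Below, plain string matchers stand in for the compiled patterns: they handle only the exact example shapes above (a spaced hyphen, fixed suffixes), not the seven real track regexes, which also accept en and em dashes. The artist_first flag plays the role of the bool in the pattern table, and all names here are hypothetical.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind { Album, Track, Artist }

/// Stand-in for a compiled regex: returns (group 1, group 2) on a match.
type Matcher = fn(&str) -> Option<(String, String)>;

fn pair(a: &str, b: &str) -> Option<(String, String)> {
    Some((a.trim().to_string(), b.trim().to_string()))
}
fn album(s: &str) -> Option<(String, String)> {
    let (a, t) = s.strip_suffix(" Lyrics and Tracklist | Genius")?.split_once(" - ")?;
    pair(a, t)
}
fn track(s: &str) -> Option<(String, String)> {
    let (a, t) = s.strip_suffix(" Lyrics | Genius")?.split_once(" - ")?;
    pair(a, t)
}
fn song_by(s: &str) -> Option<(String, String)> {
    // "Title - song and lyrics by Artist": group 1 is the TITLE here.
    let (t, a) = s.split_once(" - song and lyrics by ")?;
    pair(t, a)
}
fn artist_hub(s: &str) -> Option<(String, String)> {
    pair(s.strip_suffix(" Lyrics, Songs, and Albums | Genius")?, "")
}

fn parse_artist_title(s: &str) -> Option<(String, String, Kind)> {
    // (matcher, artist_first, kind), tried strictly in order; the first hit wins.
    let patterns: [(Matcher, bool, Kind); 4] = [
        (album, true, Kind::Album),
        (track, true, Kind::Track),
        (song_by, false, Kind::Track),
        (artist_hub, true, Kind::Artist),
    ];
    for (m, artist_first, kind) in patterns {
        if let Some((g1, g2)) = m(s) {
            let (artist, title) = if artist_first { (g1, g2) } else { (g2, g1) };
            return Some((artist, title, kind));
        }
    }
    None
}
```

Reordering the table changes the result whenever two matchers accept the same title, which is why the most specific shapes go first.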

Rejection filters

After parsing, each candidate is checked:

if junk.is_match(artist) || junk.is_match(title) { continue; }
if artist.chars().count() > 60 || title.chars().count() > 80 { continue; }
if artist.contains('|') || title.contains('|') { continue; }

The junk regex matches standalone words like "lyrics", "paroles", "letras", "тексты", "перевод". These appear in poorly-structured titles where the regex swallowed a suffix into the artist/title capture group.

The | check catches cases where the regex swallowed a "| Genius" suffix into the capture (e.g. title="Radiohead | Genius").

Translation accounts are also explicitly rejected:

let artist_lc = artist.to_lowercase();
if artist_lc.contains("translations") || artist_lc.contains("перевод") { continue; }

Genius translator accounts ("Genius Russian Translations") publish album-shaped pages whose title matches the album regex but whose "artist" is the translator, not the band. Dropping them lets the real artist-hub page from the same result set win.
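A std-only sketch of the same checks, for trying candidate titles by hand. has_junk_word approximates the junk regex by splitting on non-alphanumeric characters and comparing whole words case-insensitively; the word list holds only the five examples named above (the real regex may list more), and both helper names are hypothetical.

```rust
/// Standalone words treated as junk (only the examples named in the text).
const JUNK_WORDS: [&str; 5] = ["lyrics", "paroles", "letras", "тексты", "перевод"];

/// Std-only stand-in for the junk regex: does any standalone word match?
fn has_junk_word(s: &str) -> bool {
    s.split(|c: char| !c.is_alphanumeric())
        .map(|w| w.to_lowercase())
        .any(|w| JUNK_WORDS.iter().any(|j| *j == w))
}

/// Applies the rejection filters in order; true means the candidate is kept.
fn keep_candidate(artist: &str, title: &str) -> bool {
    if has_junk_word(artist) || has_junk_word(title) {
        return false;
    }
    // Limits are in chars, not bytes, so Cyrillic is not penalised.
    if artist.chars().count() > 60 || title.chars().count() > 80 {
        return false;
    }
    // A '|' means a "| Genius"-style suffix leaked into a capture group.
    if artist.contains('|') || title.contains('|') {
        return false;
    }
    // Translator accounts publish album-shaped pages under their own name.
    let artist_lc = artist.to_lowercase();
    !(artist_lc.contains("translations") || artist_lc.contains("перевод"))
}
```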

HTML entity decoding

html_unescape decodes standard entities (&amp;, &quot;, the apostrophe entity, and numeric &#NNNN; references). The key implementation detail is that the function iterates by char, not by byte:

// Correct: char iteration keeps multibyte UTF-8 codepoints intact
let mut chars = s.char_indices();
while let Some((pos, ch)) = chars.next() { ... }

An earlier byte-based version pushed b[i] as char for non-& bytes, which mangled Cyrillic characters (each encoded as 2 UTF-8 bytes) into garbage Latin-1 codepoints. The char-based version processes each Unicode code point as a unit.
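A minimal char-based decoder in the same spirit, assuming the entity set is &amp;, &quot;, &apos;, &lt;, &gt; plus decimal &#NNNN; references. The resolver's exact set may differ, and hex &#x...; references are not handled here. Unrecognised or unterminated sequences are copied through literally.

```rust
/// Minimal sketch of a char-based HTML entity decoder (entity set is an assumption).
fn html_unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.char_indices();
    while let Some((pos, ch)) = chars.next() {
        if ch != '&' {
            out.push(ch); // a whole code point, so Cyrillic survives intact
            continue;
        }
        // Look for a terminating ';' within a short window after the '&'.
        let rest = &s[pos..];
        let decoded = rest.find(';').filter(|&end| end <= 10).and_then(|end| {
            let name = &rest[1..end];
            let c = match name {
                "amp" => Some('&'),
                "quot" => Some('"'),
                "apos" => Some('\''),
                "lt" => Some('<'),
                "gt" => Some('>'),
                _ => name
                    .strip_prefix('#')
                    .and_then(|n| n.parse::<u32>().ok())
                    .and_then(char::from_u32),
            };
            c.map(|c| (c, end))
        });
        match decoded {
            Some((c, end)) => {
                out.push(c);
                // A recognised entity is pure ASCII, so `end` bytes == `end` chars to skip.
                for _ in 0..end {
                    chars.next();
                }
            }
            None => out.push('&'),
        }
    }
    out
}
```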

429 and PoW challenge

When Brave suspects a bot, it returns HTTP 429 with Content-Type: application/json and a challenge body:

{
  "tokens": ["abc123...", "def456..."],
  "zero_count": 4,
  "hash_function_params": {
    "iterations": 2,
    "memory_size": 19456,
    "parallelism": 1,
    "hash_length": 32
  },
  "solution_limit": 5000,
  "set_token": "eyJ..."
}

For each token, the client must find a random 16-byte salt such that the hex-encoded Argon2id hash of the token with that salt begins with zero_count '0' characters (that is, zero_count leading zero nibbles).
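Because the difficulty is counted in hex characters, the check can also be done directly on the digest's nibbles without building a hex string. The sketch below (hypothetical helper names) shows that check together with the expected cost: each required zero nibble holds with probability 1/16, so a token needs about 16^zero_count Argon2id evaluations on average.

```rust
/// True if hex(digest) would start with `zero_count` '0' characters.
fn has_leading_zero_nibbles(digest: &[u8], zero_count: usize) -> bool {
    zero_count <= digest.len() * 2
        && (0..zero_count).all(|i| {
            let byte = digest[i / 2];
            // Even index = high nibble, odd index = low nibble.
            let nibble = if i % 2 == 0 { byte >> 4 } else { byte & 0x0f };
            nibble == 0
        })
}

/// Expected number of salts tried per token: 16^zero_count.
fn expected_attempts(zero_count: u32) -> u64 {
    16u64.pow(zero_count)
}
```

With the example challenge's zero_count of 4 that is 65,536 Argon2id evaluations per token on average, so the zero_count Brave actually sends largely determines whether a solve fits inside the time budget.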

Solving

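The grind is CPU- and memory-heavy, so it runs on tokio's blocking thread pool via tokio::task::spawn_blocking instead of stalling an async worker. Awaiting the JoinHandle yields Result<Option<OwnedSolution>, JoinError>; .ok() maps a failed task (for example one that panicked) to None, and .flatten() collapses the nested Option: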

async fn solve_pow(challenge: Challenge) -> Option<OwnedSolution> {
    tokio::task::spawn_blocking(move || solve_pow_sync(&challenge))
        .await.ok().flatten()
}

solve_pow_sync grinds Argon2id for each token:

for _ in 0..limit {
    rng.fill_bytes(&mut salt_bytes);
    // Brave expects the hex STRING's UTF-8 bytes as the Argon2 salt input —
    // not the raw 16 bytes. This matches the Python prototype.
    let salt_hex = hex::encode(salt_bytes);
    // hash_password_into returns a Result; .ok()? converts it because this fn returns Option.
    argon.hash_password_into(tok.as_bytes(), salt_hex.as_bytes(), &mut digest).ok()?;
    let digest_hex = hex::encode(&digest);
    if digest_hex.bytes().take(zeros).all(|b| b == b'0') {
        found = Some(salt_hex);
        break;
    }
}

Argon2 salt input format

Brave expects the Argon2 salt input to be the UTF-8 bytes of the salt's hex string, not the raw 16 bytes: hash_password_into(token_bytes, salt_hex_utf8_bytes, ...). This is non-obvious; it was discovered by matching the known-working Python prototype.

The solver has a hard time budget: POW_MAX_MS = 3000. If any token cannot be solved within the budget, solve_pow_sync returns None and the resolver gives up on Brave rather than stalling indefinitely.
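The grind-and-budget logic can be exercised without the argon2, hex, and rand crates by injecting the hash and the salt source as closures. This is a sketch under those assumptions: grind_token and solve_all are hypothetical names, the budget is assumed to be shared by all tokens and checked before each attempt, and the real solver plugs in Argon2id and a random source where the closures are.

```rust
use std::time::{Duration, Instant};

/// Lowercase hex encoding (stand-in for hex::encode).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Tries up to `limit` salts for one token; `hash(password, salt)` stands in for Argon2id.
fn grind_token(
    token: &str,
    zeros: usize,
    limit: usize,
    deadline: Instant,
    mut next_salt: impl FnMut() -> [u8; 16],
    hash: impl Fn(&[u8], &[u8]) -> Vec<u8>,
) -> Option<String> {
    for _ in 0..limit {
        if Instant::now() >= deadline {
            return None; // budget exhausted: give up instead of stalling
        }
        // The Argon2 salt input is the UTF-8 bytes of the hex string, not the raw bytes.
        let salt_hex = to_hex(&next_salt());
        let digest_hex = to_hex(&hash(token.as_bytes(), salt_hex.as_bytes()));
        if digest_hex.len() >= zeros && digest_hex.bytes().take(zeros).all(|b| b == b'0') {
            return Some(salt_hex);
        }
    }
    None
}

/// Solves every token under one shared budget; any unsolved token fails the whole challenge.
fn solve_all(
    tokens: &[&str],
    zeros: usize,
    limit: usize,
    budget: Duration,
    mut next_salt: impl FnMut() -> [u8; 16],
    hash: impl Fn(&[u8], &[u8]) -> Vec<u8>,
) -> Option<Vec<(String, String)>> {
    let deadline = Instant::now() + budget;
    tokens
        .iter()
        .map(|tok| {
            grind_token(tok, zeros, limit, deadline, &mut next_salt, &hash)
                .map(|salt| (tok.to_string(), salt))
        })
        .collect() // Option<Vec<_>> short-circuits on the first None
}
```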

Submitting the solution

client
    .post("https://search.brave.com/api/captcha/pow?brave=0")
    .header("Content-Type", "application/json")
    .json(&solution)  // { set_token, solutions: {token: salt_hex}, taken_time }
    .send()
    .await?;
// After POST, the search is retried — Brave sets a cookie server-side
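For reference, the request body's shape can be rendered by hand. This std-only sketch is illustrative only: the real code serializes its OwnedSolution with reqwest's .json(), solution_body and json_str are hypothetical names, and taken_time is assumed to be an integer (its unit is not documented here).

```rust
/// Escapes a string as a JSON string literal (quotes, backslashes, control chars).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders { set_token, solutions: { token: salt_hex, ... }, taken_time }.
fn solution_body(set_token: &str, solutions: &[(&str, &str)], taken_time: u64) -> String {
    let pairs: Vec<String> = solutions
        .iter()
        .map(|(tok, salt)| format!("{}:{}", json_str(tok), json_str(salt)))
        .collect();
    format!(
        "{{\"set_token\":{},\"solutions\":{{{}}},\"taken_time\":{}}}",
        json_str(set_token),
        pairs.join(","),
        taken_time
    )
}
```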

Retry loop

The resolver tries up to 3 times total. Each iteration may get a fresh 429 with a new challenge, which is solved independently:

for _ in 0..3 {
    let res = client.get(base_url).query(...).send().await.map_err(|e| e.to_string())?;
    if res.status().is_success() {
        // parse HTML and return
    }
    if res.status() != StatusCode::TOO_MANY_REQUESTS {
        return Err(format!("brave: http {}", res.status()));
    }
    // 429: decode challenge, solve, POST solution, loop
    let challenge: Challenge = res.json().await.map_err(|e| e.to_string())?;
    let solution = solve_pow(challenge).await.ok_or("PoW solve failed")?;
    client.post(".../pow").json(&solution).send().await.map_err(|e| e.to_string())?;
}
// Reached only if all three attempts were answered with 429 (illustrative message).
Err("brave: still rate-limited after 3 attempts".into())
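The control flow of the loop can be tested without a network by modelling each search as a closure that returns one of three outcomes. Everything here (the Attempt enum, search_with_retries, the error strings) is a hypothetical sketch, not the resolver's API. Note that, as in the loop above, a 429 on the final attempt is still solved and submitted even though no further search follows.

```rust
/// Simplified outcome of one search request (hypothetical type for this sketch).
enum Attempt {
    Html(String),      // HTTP 200 with the result page
    Challenge(String), // HTTP 429 with a PoW challenge body
    Status(u16),       // any other HTTP status
}

/// Up to `max_attempts` searches; every 429 is answered with a solved PoW before retrying.
fn search_with_retries(
    max_attempts: usize,
    mut search: impl FnMut() -> Attempt,
    mut solve_and_submit: impl FnMut(&str) -> Result<(), String>,
) -> Result<String, String> {
    for _ in 0..max_attempts {
        match search() {
            Attempt::Html(html) => return Ok(html),
            Attempt::Status(code) => return Err(format!("brave: http {code}")),
            // A failed solve or submission aborts the whole lookup.
            Attempt::Challenge(body) => solve_and_submit(body.as_str())?,
        }
    }
    Err(format!("brave: still 429 after {max_attempts} attempts"))
}
```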

Timeouts

| Constant | Value | Purpose |
|---|---|---|
| `BRAVE_TIMEOUT` | 4000 ms | Wall-clock limit applied to the lookup() future (includes PoW solve time) |
| `POW_MAX_MS` | 3000 ms | Hard budget for the Argon2id grind |
| `connect_timeout` (reqwest) | 5 s | TCP connection establishment |
| `timeout` (reqwest) | 12 s | Full request/response cycle |

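BRAVE_TIMEOUT wraps the entire lookup() future in tokio::time::timeout. Because 4 s is shorter than both reqwest limits, this outer timeout is what bounds a Brave lookup in practice, and a PoW solve that uses its full 3 s budget leaves about one second for the network round trips. If Brave is slow or the PoW is hard, the resolver gives up and returns a Raw intent rather than making the user wait.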