Brave Search & Argon2id PoW
Brave Search is the only web source used by the resolver. It is queried with a site:genius.com constraint to keep results focused on lyrics pages with predictable title formats. Brave occasionally answers with HTTP 429 and an Argon2id proof-of-work challenge; the resolver solves it locally and retries.
Source: src-tauri/src/resolver/sources/brave.rs
Query construction
The resolver appends a site filter to every query:
```rust
const LYRICS_SITES: &[&str] = &["genius.com"];
let sites_clause = LYRICS_SITES
    .iter()
    .map(|s| format!("site:{s}"))
    .collect::<Vec<_>>()
    .join(" OR ");
let q = format!("{query} ({sites_clause})");
// → "нон стоп молли (site:genius.com)"
```
Why client-side filtering too? Brave honours site:genius.com only loosely. When few genius.com pages match, it still mixes in YouTube, Apple Music, Last.fm, and blog results. A second check is therefore applied after the results arrive:
Any result whose page title does not contain "genius" is silently dropped, regardless of the URL Brave reports for it.
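Here is a minimal sketch of that post-filter, assuming titles arrive as plain strings. The function name keep_genius_titles is hypothetical. The case-insensitive comparison is also an assumption: Genius titles end in "| Genius" with a capital G, so a case-sensitive check for "genius" would reject them.

```rust
/// Hypothetical post-filter: keep only results whose page title mentions
/// Genius. Case-insensitive matching is an assumption, not confirmed code.
fn keep_genius_titles<'a>(titles: &[&'a str]) -> Vec<&'a str> {
    titles
        .iter()
        .copied()
        .filter(|t| t.to_lowercase().contains("genius"))
        .collect()
}
```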
Normal request
```http
GET https://search.brave.com/search?q=<query>&source=web
Accept: application/json
User-Agent: Mozilla/5.0 (Macintosh; ...) Chrome/122.0.0.0 Safari/537.36
```
The User-Agent matches the one sent by the original Python prototype, which established the request shape Brave accepts. Changing it risks different rate-limit behaviour.
On success (HTTP 200) the response is raw HTML, despite the Accept header. Content-Type handling note: Brave's Content-Type header does not reliably carry charset=utf-8. reqwest's .text() chooses its decoder from that header. A response labelled with the wrong charset therefore comes out as mojibake, with every two-byte Cyrillic character turned into two Latin-1 characters. To avoid depending on the header, the code reads raw bytes and decodes them as UTF-8 explicitly:
```rust
let bytes = res.bytes().await?;
let html = String::from_utf8_lossy(&bytes); // lossy: malformed bytes → U+FFFD
```
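The failure mode can be reproduced with the standard library alone. In this sketch, decode_latin1 and decode_utf8 are hypothetical helpers:

- decode_latin1 maps each byte to the code point of the same value, which is what decoding under a wrong single-byte charset does. It splits each two-byte Cyrillic letter into two Latin-1 characters.
- decode_utf8 mirrors the explicit decoding described above.

```rust
/// Hypothetical helper: decode each byte as the Latin-1 (ISO-8859-1) code
/// point with the same value, as a mislabelled charset would.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode as UTF-8 regardless of headers; invalid sequences become U+FFFD.
fn decode_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```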
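Title extraction
The HTML parser finds snippet titles with a regex that matches any div, span, or a element whose class attribute contains search-snippet-title:
```rust
const TITLE_PAT: &str =
    r#"(?s)<(?:div|span|a)[^>]*class="[^"]*search-snippet-title[^"]*"[^>]*>([^<]{3,200})</"#;
```
This is intentionally not a real HTML parser. Brave tweaks its class names periodically, so the pattern stays forgiving: it accepts several element types and extra classes around search-snippet-title. The capture group extracts the raw inner text of the title element (3 to 200 characters containing no nested tags).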
Title parsing
Each extracted title is passed through parse_artist_title. It tries three kinds of patterns in order:
1. Album pages
```text
"Пошлая Молли - Незваный гость Lyrics and Tracklist | Genius"
→ (artist="Пошлая Молли", title="Незваный гость", kind=Album)
```
Album patterns are checked first. Their suffix is more specific, and testing them before the broader track patterns keeps an album page from being misclassified as a track.
2. Track pages (7 patterns)
```text
"Пошлая Молли - Нон стоп Lyrics | Genius"
→ (artist="Пошлая Молли", title="Нон стоп", kind=Track)
"текст песни 1.Kla$ - Сукины дети | ..."
→ (artist="1.Kla$", title="Сукины дети", kind=Track)
```
Some patterns use swapped capture order (bool = false) for shapes like "song lyrics by Artist", where group 1 is the title and group 2 is the artist.
3. Artist pages
```text
"Пошлая Молли Lyrics, Songs, and Albums | Genius"
→ (artist="Пошлая Молли", title="", kind=Artist)
"Тексты песен Земфира | ..."
→ (artist="Земфира", title="", kind=Artist)
```
Artist pages carry an empty title and lead to Intent::Artist downstream.
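Here is a simplified sketch of the three-tier classification. It assumes only the English Genius title suffixes shown above. It uses str::strip_suffix and split_once instead of the resolver's regex table, so it is a hypothetical stand-in for parse_artist_title, not its implementation. The Russian-language and swapped-order shapes are left out.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Album,
    Track,
    Artist,
}

/// Hypothetical simplified stand-in for parse_artist_title.
/// Returns (artist, title, kind); artist pages get an empty title.
fn parse_artist_title(raw: &str) -> Option<(String, String, Kind)> {
    let t = raw.trim();
    // "Artist - Title" split shared by the album and track shapes.
    let split = |head: &str| -> Option<(String, String)> {
        let (artist, title) = head.split_once(" - ")?;
        Some((artist.trim().to_string(), title.trim().to_string()))
    };
    // 1. Album pages: most specific suffix, tested first.
    if let Some(head) = t.strip_suffix(" Lyrics and Tracklist | Genius") {
        let (artist, title) = split(head)?;
        return Some((artist, title, Kind::Album));
    }
    // 2. Track pages.
    if let Some(head) = t.strip_suffix(" Lyrics | Genius") {
        let (artist, title) = split(head)?;
        return Some((artist, title, Kind::Track));
    }
    // 3. Artist pages: no title.
    if let Some(artist) = t.strip_suffix(" Lyrics, Songs, and Albums | Genius") {
        return Some((artist.trim().to_string(), String::new(), Kind::Artist));
    }
    None
}
```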
Rejection filters
After parsing, each candidate is checked:
```rust
if junk.is_match(artist) || junk.is_match(title) { continue; }
if artist.chars().count() > 60 || title.chars().count() > 80 { continue; }
if artist.contains('|') || title.contains('|') { continue; }
```
The junk regex matches standalone words such as "lyrics", "paroles", "letras", "тексты", and "перевод". These words show up in poorly structured titles, where a pattern swallowed part of the suffix into the artist or title capture group.
The | check catches cases where the regex swallowed a "| Genius" suffix into the capture (e.g. title="Radiohead | Genius").
Translation accounts are also explicitly rejected:
```rust
let artist_lc = artist.to_lowercase();
if artist_lc.contains("translations") || artist_lc.contains("перевод") { continue; }
```
Genius translator accounts ("Genius Russian Translations") publish album-shaped pages whose title matches the album regex but whose "artist" is the translator, not the band. Dropping them lets the real artist-hub page from the same result set win.
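The rejection rules can be sketched as a single predicate. keep_candidate and has_junk_word are hypothetical names. The junk regex is approximated by splitting on non-alphanumeric characters and comparing whole lowercased words against the example list above; the real word list may be longer.

```rust
/// Example junk words from the text above; the real list may be longer.
const JUNK_WORDS: &[&str] = &["lyrics", "paroles", "letras", "тексты", "перевод"];

/// Hypothetical whole-word approximation of the junk regex.
fn has_junk_word(s: &str) -> bool {
    s.split(|c: char| !c.is_alphanumeric())
        .any(|w| JUNK_WORDS.contains(&w.to_lowercase().as_str()))
}

/// Hypothetical predicate applying the rejection rules in order.
fn keep_candidate(artist: &str, title: &str) -> bool {
    // Leftover suffix words such as "Lyrics".
    if has_junk_word(artist) || has_junk_word(title) {
        return false;
    }
    // Length caps, counted in chars so Cyrillic is not counted twice.
    if artist.chars().count() > 60 || title.chars().count() > 80 {
        return false;
    }
    // A swallowed "| Genius" suffix.
    if artist.contains('|') || title.contains('|') {
        return false;
    }
    // Translator accounts posing as the artist.
    let artist_lc = artist.to_lowercase();
    !(artist_lc.contains("translations") || artist_lc.contains("перевод"))
}
```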
HTML entity decoding
html_unescape decodes the standard entities for &, " and ' (such as `&amp;` and `&quot;`) as well as numeric `&#NNNN;` references. The key implementation note is that the function iterates by char, not by byte:
```rust
// Correct: char iteration keeps multibyte UTF-8 code points intact
let mut chars = s.char_indices();
while let Some((pos, ch)) = chars.next() { /* ... */ }
```
An earlier byte-based version pushed b[i] as char for non-& bytes, which mangled Cyrillic characters (each encoded as 2 UTF-8 bytes) into garbage Latin-1 codepoints. The char-based version processes each Unicode code point as a unit.
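Here is a self-contained sketch of a char-based decoder in the same spirit. It assumes only a handful of named entities (amp, quot, apos, lt, gt) plus decimal and hexadecimal numeric references. It is not the project's html_unescape, and unknown or malformed entities pass through unchanged.

```rust
/// Hypothetical char-based entity decoder (not the project's code).
fn html_unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.char_indices();
    while let Some((pos, ch)) = chars.next() {
        if ch != '&' {
            out.push(ch); // a whole code point, never a lone byte
            continue;
        }
        // '&' is one byte, so pos + 1 is a char boundary.
        let tail = &s[pos + 1..];
        let decoded = tail
            .find(';')
            .filter(|&end| end <= 10) // entities are short
            .and_then(|end| {
                let name = &tail[..end];
                let c = match name {
                    "amp" => Some('&'),
                    "quot" => Some('"'),
                    "apos" => Some('\''),
                    "lt" => Some('<'),
                    "gt" => Some('>'),
                    _ => {
                        let num = name.strip_prefix('#')?;
                        let code = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
                            Some(hex) => u32::from_str_radix(hex, 16).ok()?,
                            None => num.parse::<u32>().ok()?,
                        };
                        char::from_u32(code)
                    }
                };
                c.map(|c| (c, end))
            });
        match decoded {
            Some((c, end)) => {
                out.push(c);
                // A recognised entity body plus ';' is ASCII, so skipping
                // end + 1 chars consumes exactly end + 1 bytes.
                for _ in 0..=end {
                    chars.next();
                }
            }
            None => out.push('&'), // not an entity: keep the '&' literally
        }
    }
    out
}
```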
429 and PoW challenge
When Brave suspects a bot, it returns HTTP 429 with Content-Type: application/json and a challenge body:
```json
{
  "tokens": ["abc123...", "def456..."],
  "zero_count": 4,
  "hash_function_params": {
    "iterations": 2,
    "memory_size": 19456,
    "parallelism": 1,
    "hash_length": 32
  },
  "solution_limit": 5000,
  "set_token": "eyJ..."
}
```
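For each token, the client must find a random 16-byte salt such that the Argon2id hash of the token (computed with hash_function_params) starts with zero_count zero nibbles. In other words, the hash's hex encoding must begin with zero_count "0" characters.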
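The success test can be stated without allocating a hex string. Hex encoding writes the high nibble of each byte first, so the first zero_count hex digits are exactly the first zero_count nibbles. The hypothetical has_leading_zero_nibbles below is equivalent to the hex-prefix check used in the solver loop, except that it returns false when zero_count exceeds the number of nibbles in the digest.

Each nibble of a uniformly distributed digest is zero with probability 1/16. The expected number of Argon2id evaluations per token is therefore 16^zero_count, which the hypothetical expected_attempts computes.

```rust
/// Hypothetical helper: true if the hex encoding of `digest` begins with
/// `zero_count` '0' digits. Hex writes each byte's high nibble first.
fn has_leading_zero_nibbles(digest: &[u8], zero_count: usize) -> bool {
    if zero_count > digest.len() * 2 {
        return false; // more digits requested than the digest has
    }
    (0..zero_count).all(|i| {
        let byte = digest[i / 2];
        let nibble = if i % 2 == 0 { byte >> 4 } else { byte & 0x0f };
        nibble == 0
    })
}

/// Expected Argon2id evaluations per token: each attempt succeeds with
/// probability 16^-zero_count, so the geometric mean is 16^zero_count
/// (saturating instead of overflowing for very large inputs).
fn expected_attempts(zero_count: u32) -> u64 {
    16u64.saturating_pow(zero_count)
}
```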
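Solving
The solver runs on a blocking thread via tokio::task::spawn_blocking, so the CPU- and memory-heavy Argon2id grind does not stall the async runtime's worker threads:
```rust
async fn solve_pow(challenge: Challenge) -> Option<OwnedSolution> {
    // A JoinError (panic or cancellation) becomes None via .ok();
    // a solver failure is already None, and .flatten() merges the two.
    tokio::task::spawn_blocking(move || solve_pow_sync(&challenge))
        .await
        .ok()
        .flatten()
}
```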
solve_pow_sync grinds Argon2id for each token:
```rust
for _ in 0..limit {
    rng.fill_bytes(&mut salt_bytes);
    // Brave expects the hex STRING's UTF-8 bytes as the Argon2 salt input,
    // not the raw 16 bytes. This matches the Python prototype.
    let salt_hex = hex::encode(salt_bytes);
    // solve_pow_sync returns Option (see solve_pow above), so argon2's
    // Result is converted with .ok() before `?`.
    argon
        .hash_password_into(tok.as_bytes(), salt_hex.as_bytes(), &mut digest)
        .ok()?;
    let digest_hex = hex::encode(&digest);
    // Solved when the first `zeros` hex digits are all '0'.
    if digest_hex.bytes().take(zeros).all(|b| b == b'0') {
        found = Some(salt_hex);
        break;
    }
}
```
Argon2 salt input format
Brave uses the hex string of the salt as the Argon2 salt input, not the raw bytes. The call is hash_password_into(token_bytes, salt_hex_utf8_bytes, ...), so the salt Argon2 actually sees is the 32-character ASCII encoding of the 16 random bytes. This is non-obvious; it was discovered by matching the Python prototype, which was known to work.
The solver has a hard time budget: POW_MAX_MS = 3000. If any token cannot be solved within the budget, solve_pow_sync returns None and the resolver gives up on Brave rather than stalling indefinitely.
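The per-token grind, with both stopping conditions (an attempt limit and a wall-clock budget), can be sketched with the standard library alone. To keep it runnable without the argon2, hex, and rand crates, the hash and the salt source are passed in as closures. grind_token and to_hex are hypothetical names; production code would call Argon2::hash_password_into with the challenge's parameters in place of `hash`.

```rust
use std::time::{Duration, Instant};

/// Lower-case hex, the same output format as hex::encode.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical budgeted grind for a single token.
/// `next_salt` stands in for the RNG and `hash` for Argon2id(password, salt).
fn grind_token(
    token: &str,
    zero_count: usize,
    limit: usize,
    budget: Duration,
    mut next_salt: impl FnMut() -> [u8; 16],
    hash: impl Fn(&[u8], &[u8]) -> Vec<u8>,
) -> Option<String> {
    let start = Instant::now();
    for _ in 0..limit {
        // Give up instead of stalling once the wall-clock budget is spent.
        if start.elapsed() > budget {
            return None;
        }
        // As in the resolver: the hex string's bytes are the salt input.
        let salt_hex = to_hex(&next_salt());
        let digest_hex = to_hex(&hash(token.as_bytes(), salt_hex.as_bytes()));
        if digest_hex.bytes().take(zero_count).all(|b| b == b'0') {
            return Some(salt_hex);
        }
    }
    None // attempt limit reached without a solution
}
```

Because the clock is checked before each hash, the grind can overrun the budget by roughly one hash evaluation at most.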
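Submitting the solution
```rust
client
    .post("https://search.brave.com/api/captcha/pow?brave=0")
    .header("Content-Type", "application/json") // redundant with .json(), harmless
    .json(&solution) // { set_token, solutions: {token: salt_hex}, taken_time }
    .send()
    .await?;
// After the POST the search is retried. Brave sets a cookie server-side; for
// the retry to send it back, the reqwest Client needs a cookie store
// (ClientBuilder::cookie_store(true)).
```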
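Retry loop
The resolver tries up to 3 times total. Each iteration may get a fresh 429 with a new challenge, which is solved independently:
```rust
use reqwest::StatusCode;

// The error type is String (see Err(format!(..)) below), so reqwest
// errors are stringified before `?`.
for _ in 0..3 {
    let res = client.get(base_url).query(...).send().await.map_err(|e| e.to_string())?;
    if res.status().is_success() {
        // parse HTML and return
    }
    if res.status() != StatusCode::TOO_MANY_REQUESTS {
        return Err(format!("brave: http {}", res.status()));
    }
    // 429: decode challenge, solve, POST solution, loop
    let challenge: Challenge = res.json().await.map_err(|e| e.to_string())?;
    let solution = solve_pow(challenge).await.ok_or("PoW solve failed")?;
    client.post(".../pow").json(&solution).send().await.map_err(|e| e.to_string())?;
}
// Every attempt was answered with 429 (error message illustrative).
Err("brave: still rate-limited after 3 attempts".to_string())
```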
Timeouts
| Constant | Value | Purpose |
|---|---|---|
| BRAVE_TIMEOUT | 4000 ms | Wall-clock limit on the whole lookup() call (includes PoW solve time) |
| POW_MAX_MS | 3000 ms | Hard budget for the Argon2id grind |
| connect_timeout | 5 s | TCP connection establishment |
| timeout (reqwest) | 12 s | Full request/response cycle |
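BRAVE_TIMEOUT wraps the entire lookup() future in tokio::time::timeout. If Brave is slow or the PoW is hard, the resolver gives up and falls back to the Raw intent rather than making the user wait. BRAVE_TIMEOUT (4 s) is shorter than both reqwest limits (5 s connect, 12 s total), so it is normally the limit that fires first for a Brave lookup.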