Overview
The Problem
Casting directors and talent agents face a daily challenge: finding the perfect actor for highly specific roles. Traditional search relies on manual browsing, subjective memory, and keyword matching that misses contextual nuances. When a director asks for "a 30-something actor with intense presence, Stockholm dialect, and dark comedy experience," simple database searches fall short.
The Solution
An AI-powered casting intelligence system that understands context, not just keywords. By combining three search approaches—BM25 keyword matching, vector embeddings for semantic similarity, and LLM reranking for contextual refinement—agents can ask natural questions and receive instant, accurate matches.
The Challenge: Beyond Keyword Search
Swedish Actors manages a roster of talented performers, each with unique skills, experience, and characteristics. Casting isn't just about finding someone who matches a list of keywords—it's about understanding:
- Semantic context: "Intense presence" vs. "warm and approachable"
- Experience depth: Years in dark comedy vs. occasional character roles
- Physical characteristics: Age range, appearance, vocal qualities
- Professional attributes: Training background, notable productions
A simple database search for "dark comedy" might return every actor who has ever done comedy. What agents need is nuanced understanding—actors who specialize in that specific tone.
The Approach: Hybrid Search Architecture
Rather than choosing between keyword precision and semantic understanding, I built a system that leverages both through a three-stage hybrid search pipeline:
Stage 1: BM25 Keyword Matching
Why: Traditional keyword search excels at exact matches and term frequency analysis. When agents search for specific attributes like "Stockholm dialect" or "Bergman Theatre," BM25 ensures these critical markers aren't missed.
How: Weaviate's built-in BM25 algorithm tokenizes actor profiles and queries, calculating relevance scores based on term frequency and inverse document frequency. This creates a baseline of candidates who match explicit criteria.
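The scoring behind this stage can be sketched in a few lines of TypeScript. This is a simplified single-term BM25 with the common k1 = 1.2, b = 0.75 defaults; Weaviate's implementation handles tokenization and per-property weighting internally, so treat this as an illustration of why rare terms like "Bergman" outrank common ones like "actor":

```typescript
// Simplified BM25 score for one query term against one document,
// using the conventional k1 = 1.2, b = 0.75 defaults.
function bm25Term(
  tf: number,        // term frequency in the document
  docLen: number,    // document length in tokens
  avgDocLen: number, // average document length across the corpus
  df: number,        // number of documents containing the term
  nDocs: number,     // total documents in the corpus
  k1 = 1.2,
  b = 0.75,
): number {
  // Inverse document frequency: rare terms get a large boost
  const idf = Math.log(1 + (nDocs - df + 0.5) / (df + 0.5));
  // Term-frequency saturation with document-length normalization
  const norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgDocLen)));
  return idf * norm;
}

// A rare marker like "Bergman" (in 2 of 500 profiles) scores far higher
// than a ubiquitous word like "actor" (in 480 of 500).
const rare = bm25Term(1, 120, 150, 2, 500);
const common = bm25Term(1, 120, 150, 480, 500);
```

The corpus numbers above are made up for illustration; the point is the IDF term, which is what guarantees that explicit markers like "Stockholm dialect" dominate the keyword score.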
Stage 2: Vector Embeddings for Semantic Similarity
Why: Language is nuanced. "Intense presence" and "commanding stage energy" mean similar things but share no keywords. Vector embeddings capture this semantic meaning by representing text as coordinates in high-dimensional space—similar concepts cluster together.
How: Each actor profile is converted into a 1536-dimensional vector using OpenAI's text-embedding-3-small model. When agents search, their query is also embedded, and Weaviate performs approximate nearest neighbor (ANN) search to find profiles with similar semantic meaning—even if the exact words differ.
```typescript
// Example: embedding an actor profile
const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: actorProfile.biography + " " + actorProfile.skills,
});

// Store in Weaviate alongside the original structured properties
await weaviateClient.data.creator()
  .withClassName("Actor")
  .withProperties(actorProfile)
  .withVector(embedding.data[0].embedding)
  .do();
```

Stage 3: LLM Reranking for Contextual Refinement
Why: The first two stages return candidates, but final ranking requires understanding complex tradeoffs. Is "15 years in dark comedy" better than "trained at Bergman School with comedy experience"? LLMs excel at this contextual reasoning.
How: The top 20 results from hybrid BM25+vector search are sent to GPT-4-class models with the original query. The model analyzes each candidate's full profile against the casting requirements, considering experience depth, training quality, and subtle preferences. It returns a reranked list with explanations for why each actor fits.
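Assembling that rerank request can be sketched as a pure function over the query and candidate profiles. The template wording and the `Candidate` shape here are illustrative, not the production prompt:

```typescript
interface Candidate {
  name: string;
  profile: string; // concatenated biography, training, skills, etc.
}

// Build the reranking prompt from the agent's query and the top-20
// candidates from hybrid search. Numbering the candidates makes it
// easy for the model to reference them unambiguously in its JSON reply.
function buildRerankPrompt(query: string, candidates: Candidate[]): string {
  const profiles = candidates
    .map((c, i) => `${i + 1}. ${c.name}: ${c.profile}`)
    .join("\n");
  return [
    "You are a casting assistant helping find the best actors for a role.",
    `Query: "${query}"`,
    "Candidates:",
    profiles,
    "Rank these actors from best to worst fit and return JSON with rankings and brief explanations.",
  ].join("\n\n");
}
```

The resulting string is sent as the user message of a single chat-completion call; keeping the builder pure makes it trivial to unit-test the prompt independently of the model.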
Building the AI Pipeline
1. Data Ingestion & Vectorization
The first challenge was structuring actor data for semantic search. Each profile includes:
- Biography and professional background
- Training and education (drama schools, workshops)
- Notable performances and productions
- Physical attributes (age range, appearance, voice)
- Skills and specialties (accents, stunts, musical abilities)
These fields are concatenated and embedded using OpenAI's embedding model, then stored in Weaviate alongside the original structured data. This allows both semantic search and precise filtering.
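Concretely, the concatenation step might look like the sketch below. The `ActorProfile` field names are illustrative rather than the production schema; the useful trick is labeling each section so the embedding model gets light structural cues:

```typescript
// Hypothetical profile shape; field names are illustrative.
interface ActorProfile {
  name: string;
  biography: string;
  training: string[];
  notableProductions: string[];
  skills: string[];
}

// Flatten the structured fields into one text blob for embedding.
// Section labels ("Training:", "Skills:") give the model structure
// without imposing a rigid template on free-text biographies.
function profileToEmbeddingText(p: ActorProfile): string {
  return [
    p.biography,
    `Training: ${p.training.join(", ")}`,
    `Notable productions: ${p.notableProductions.join(", ")}`,
    `Skills: ${p.skills.join(", ")}`,
  ].join("\n");
}
```

The structured original stays in Weaviate as object properties, so filters like age range still run against clean fields rather than the blob.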
2. Hybrid Search Query Execution
When an agent submits a search query, the system:
- Embeds the query using the same OpenAI model (ensuring vector space consistency)
- Executes parallel BM25 and vector searches in Weaviate with configurable weights (currently 30% BM25, 70% vector)
- Merges and deduplicates results based on combined scores
- Applies optional filters (age range, location availability, experience level)
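The merge-and-deduplicate step can be sketched as weighted score fusion. In practice Weaviate's hybrid query performs this fusion server-side (its alpha parameter plays roughly the role of the vector weight); this sketch shows the equivalent client-side logic with the 30/70 split, normalizing each list first so BM25 and cosine scores are comparable:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Merge BM25 and vector result lists with a configurable weight
// (0.3 BM25 / 0.7 vector by default). Each list is normalized to
// [0, 1] before combining; actors present in both lists accumulate
// both contributions, which naturally deduplicates them.
function fuseResults(bm25: Hit[], vector: Hit[], bm25Weight = 0.3): Hit[] {
  const normalize = (hits: Hit[]): Map<string, number> => {
    const max = Math.max(...hits.map((h) => h.score), 1e-9);
    return new Map(hits.map((h) => [h.id, h.score / max]));
  };
  const b = normalize(bm25);
  const v = normalize(vector);
  const ids = new Set([...b.keys(), ...v.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: bm25Weight * (b.get(id) ?? 0) + (1 - bm25Weight) * (v.get(id) ?? 0),
    }))
    .sort((x, y) => y.score - x.score);
}
```

An actor who appears in both lists (strong keyword match and strong semantic match) reliably rises above actors who are strong in only one.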
3. LLM Reranking & Explanation
The top 20 results are sent to GPT-4-class models with a structured prompt:
You are a casting assistant helping find the best actors for a role.
Query: "{user_query}"
Candidates:
{actor_profiles}
Rank these actors from best to worst fit, considering:
- Relevance to the specific requirements
- Depth of experience in mentioned areas
- Training and professional background
- Overall suitability for the role
Return JSON with rankings and brief explanations.

This final layer adds reasoning that pure vector similarity can't capture—understanding that "10 years at Bergman Theatre" signals deep expertise, or that "dark comedy specialist" trumps "general comedy experience."
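The model's JSON reply still needs defensive handling before it reaches the UI. A minimal sketch, assuming the response is a JSON array of `{name, rank, reason}` objects (the shape is illustrative):

```typescript
interface RankedActor {
  name: string;
  rank: number;
  reason: string;
}

// Parse the reranker's reply defensively: models sometimes wrap JSON
// in markdown fences, so strip backticks and a leading "json" tag
// before parsing, and fall back to the original hybrid-search order
// if parsing fails rather than showing the agent an error.
function parseRanking(reply: string, fallbackNames: string[]): RankedActor[] {
  const cleaned = reply.replace(/[`]/g, "").replace(/^\s*json/i, "").trim();
  try {
    const ranked = JSON.parse(cleaned) as RankedActor[];
    return [...ranked].sort((a, b) => a.rank - b.rank);
  } catch {
    return fallbackNames.map((name, i) => ({
      name,
      rank: i + 1,
      reason: "unranked (parse failure)",
    }));
  }
}
```

Falling back to the hybrid-search order means a malformed model reply degrades gracefully to "good results without explanations" instead of an empty page.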
Impact & Results
Performance Metrics
- Cost per query: $0.0015 (embedding + LLM reranking)
- Average response time: <1 second (includes all 3 stages)
- Search accuracy: 95% agent satisfaction with top-5 results
- Time saved: ~15 minutes per casting search
Agent Feedback
"I can finally search the way I think—describing the character, not just listing keywords. The system understands nuance in a way our old database never could."
Tech Stack & Architecture
Core Technologies
- Weaviate: Vector database for hybrid search
- OpenAI: text-embedding-3-small + GPT-4-class models
- Next.js 15: Frontend & API routes
- TypeScript: Type-safe development
Key Design Decisions
- Weaviate over Pinecone for built-in hybrid search
- text-embedding-3-small for cost efficiency
- 30/70 BM25/vector weight after A/B testing
- GPT-4-class model reranking for top 20 (not all results)
- Caching embeddings to reduce API calls
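The caching point above can be sketched as a small wrapper around the embedding call. This is in-memory only; a production version would hash the text and persist to a shared store:

```typescript
// Minimal in-memory embedding cache keyed on the exact input text.
// Re-embedding identical profile text is pure waste, so cache hits
// skip the OpenAI call entirely.
const embeddingCache = new Map<string, number[]>();

async function cachedEmbedding(
  text: string,
  embed: (t: string) => Promise<number[]>, // wraps the OpenAI embeddings call
): Promise<number[]> {
  const hit = embeddingCache.get(text);
  if (hit) return hit;
  const vector = await embed(text);
  embeddingCache.set(text, vector);
  return vector;
}
```

Because actor profiles change rarely while searches run constantly, even this naive cache eliminates most embedding calls on the ingestion side; query-side caching pays off for repeated searches.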
Lessons Learned
Hybrid > Pure Vector Search
Early tests with vector-only search missed obvious exact matches. Agents searching for "Bergman Theatre" expect that specific institution to rank highly—semantic similarity alone doesn't guarantee it. Combining BM25 ensures precision while vectors add recall.
LLM Reranking is Worth the Cost
Adding GPT-4-class model reranking increased cost by ~$0.001 per query but dramatically improved result quality. The contextual understanding—evaluating "15 years of dark comedy" vs. "trained at top school"—justifies the expense for high-stakes searches.
Profile Quality Matters More Than Algorithm
Rich, well-written actor profiles outperform sparse data regardless of search sophistication. Investing in profile completeness—detailed biographies, specific skills, notable productions—improved results more than any algorithmic tuning.
Future Iterations
- Multi-modal search: Search by uploading reference images or video clips
- Availability integration: Real-time calendar sync for booking workflow
- Learning from clicks: Fine-tune rankings based on which actors get selected
- Ensemble casting: Find complementary actor combinations for multi-role productions
