The €1.2M in your archive — what 20,000 unsearched CVs are actually worth

Most boutique recruitment firms are sitting on between 5,000 and 50,000 CVs scattered across Outlook attachments, OneDrive folders, exports from the ATS they used in 2019, and personal Dropbox accounts. Nobody can search across it. So functionally it doesn't exist.

The headline math is uncomfortable, so I'll show the workings and the caveats. The number is deliberately conservative on three independent variables. The point isn't to over-promise — the point is to demonstrate that even on cautious assumptions, the value is real and big.

Why a ten-year archive is bigger than you think

The recruitment industry doesn't publish a "CVs received per desk per year" benchmark. On the firms I've audited, the realistic range is 200 to 400 CVs per desk per year, across active and passive channels: applications, LinkedIn outreach replies, referrals, candidate-of-candidate emails.

A ten-year-old, ten-desk firm at the low end of that range has 10 desks × 10 years × 200 CVs = 20,000 CVs in its loose-file archive. At the high end, 40,000. Most owners I talk to underestimate the volume by half; the data accumulates quietly, in places nobody thinks to count.

Why most of it is genuinely worthless

Before the upside, the honest counterweight: most of a long-running archive isn't placeable, and pretending otherwise is the fastest way to lose credibility.

B2B contact data decays at 22.5 to 70.3% per year depending on industry and seniority; 70.8% of business contacts change something within 12 months — title, phone, email, employer (Landbase data-decay statistics). Compounded over five years, that's most of your old contacts unreachable through the channels you originally had for them.

Recruitment-specific factors compound the decay: candidates retire, leave the country, change careers, move into roles you don't fill, accept counter-offers and stay put, get headhunted by your competitor and stop responding to anyone. By year five of an archive, a sober estimate is that 60–70% of the file is genuinely stale — wrong number, wrong email, wrong life situation.

That leaves 30 to 40% reachable; I'll work with 40% across a ten-year archive. On 20,000 CVs, that's 8,000 viable contacts. Already smaller than the headline. We're not done shrinking.
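As a rough cross-check on the reachable share, here is a minimal sketch that compounds an annual decay rate over an archive. It assumes CVs arrived in equal yearly batches and that each batch was received mid-year. That even-arrival assumption, the function names, and the choice of the 22.5% low-end decay rate are mine for illustration, not part of the cited statistics.

```python
def survival_after(years: float, annual_decay: float) -> float:
    """Share of contacts still reachable after `years` of compounding decay."""
    return (1.0 - annual_decay) ** years


def average_survival(archive_years: int, annual_decay: float) -> float:
    """Average reachable share across an archive with one equal batch per year.

    Batch k (k = 0 .. archive_years - 1) is treated as k + 0.5 years old,
    i.e. received mid-year. This is an illustrative assumption.
    """
    ages = [k + 0.5 for k in range(archive_years)]
    return sum(survival_after(age, annual_decay) for age in ages) / archive_years


if __name__ == "__main__":
    low_decay = 0.225  # low end of the quoted 22.5 to 70.3% annual range
    print(f"Single 5-year-old contact: {survival_after(5, low_decay):.0%}")
    print(f"10-year archive, average:  {average_survival(10, low_decay):.0%}")
```

At the low-end decay rate this gives roughly 28% for a single five-year-old contact and roughly 36% averaged over a ten-year archive, which is in the same region as the 40% figure used in this article. At higher decay rates the share falls quickly, so the result is sensitive to which end of the decay range applies to your market.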

Why the rest is worth more than your top biller

The 40% who are reachable are a particular kind of pool: people who, at some point in the past, demonstrated interest in being recruited by you. They opted in. They sent a CV. They took a call. They're warmer than a cold LinkedIn outreach by a meaningful margin.

The Recruiterflow source-of-hire data finds that 63% of placements at firms with a working CRM come from candidates already in the system before the job order opens; top-quartile firms hit 71% (Recruiterflow analysis). Greg Savage's independent figure, drawn from twenty years of consulting boutique firms, is that 50%+ of placements made through LinkedIn or job-board searches involve a candidate who was already in the agency's own database (Greg Savage, "Is your database a 'candidate graveyard'?").

Two independent sources converging: between half and two-thirds of placements were already in the firm's own data. In a spreadsheet shop, that data is unsearchable. The placements happen anyway — but you pay LinkedIn Recruiter to rediscover the candidate you already paid to source two years ago.

Realistic reactivation-campaign benchmarks for well-segmented outreach to former candidates: 10–30% genuine re-engagement, with 3–10% booking/meeting rates (Adonis Media database-reactivation benchmarks). I'll take 15%, in the lower half of that range, as the defensible number.

The conservative math, shown

For a 10-desk firm · 10-year archive

Step	Factor	Running result
CVs in the archive	start	20,000
Currently reachable (not too stale)	× 40%	8,000
Genuine reactivation interest	× 15%	1,200
Placement conversion (warm pipeline)	× 4%	48 placements
Average permanent-placement fee	× €25k	€1,200,000

Total reactivation revenue €1.2M

Three independent conservatism levers: 40% reachable (the share left by the 60–70% staleness estimate), 15% reactivation (lower half of the 10–30% industry benchmark), 4% conversion (a standard sales-pipeline number for warm B2B leads, not the optimistic recruitment-specific figure).
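The funnel above is a chain of multiplications, so it is easy to re-run with your own numbers. The sketch below reproduces it; the class and function names are illustrative, and the defaults are simply the figures from this article.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FunnelAssumptions:
    """Defaults are the article's figures for a 10-desk, 10-year archive."""
    archive_size: int = 20_000
    reachable_rate: float = 0.40      # share of CVs still contactable
    reactivation_rate: float = 0.15   # share of reachable who re-engage
    placement_rate: float = 0.04      # share of re-engaged who get placed
    average_fee_eur: float = 25_000.0


def reactivation_funnel(a: FunnelAssumptions) -> Dict[str, float]:
    """Apply each rate in turn and return every intermediate stage."""
    reachable = a.archive_size * a.reachable_rate
    interested = reachable * a.reactivation_rate
    placements = interested * a.placement_rate
    return {
        "reachable": reachable,
        "interested": interested,
        "placements": placements,
        "revenue_eur": placements * a.average_fee_eur,
    }


if __name__ == "__main__":
    for stage, value in reactivation_funnel(FunnelAssumptions()).items():
        print(f"{stage:>12}: {value:,.0f}")
```

Because every stage is a plain multiplication, changing any single rate scales the total in proportion: halving the reachable share or the average fee halves the revenue figure.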

The €25,000 fee assumes a mid-level placement at around €110,000 salary at 23%, which sits within boutique permanent-placement benchmarks for the region: 15–30% of annual salary in Germany, 15–33% in Switzerland (IT and specialist roles trending 20–30%), 15–25% in Croatia (Search X Recruitment; Headcount.ch on Swiss recruitment costs). On industrial trades and lower-salary placements, drop the fee to €12–15k and the total falls to roughly €580–720k (48 placements at each fee). Either way, the order of magnitude holds.

The single-placement test: at €25k average fee, you need one placement from a reactivated candidate to pay for an entire archive extraction project. The other 47 placements over 12–24 months are the actual return.

Why off-the-shelf parsing isn't enough

Commercial CV-parsing services exist and are well-priced. Affinda parses at roughly $0.08 per CV at volume (~$800/month base plan). RChilli starts at $75/month for 500 credits. Sovren/Textkernel starts at $99/month for 500 documents, scaling to $500+/month at 5,000 documents. Daxtra quotes custom enterprise pricing (Affinda pricing).

On raw parsing cost alone, a custom build doesn't beat them. A standalone parsing run on 20,000 CVs lands between roughly $1,600 (Affinda volume pricing) and $15,000+ (Daxtra enterprise tier).

The gap isn't parsing; it's integration and ownership. A parser gives you structured JSON. Structured JSON sitting unused in a folder is the same problem as PDFs sitting unread: you've moved the data, you haven't moved the value. Archive Unlock as a custom build extracts into your existing CRM, with the schema your team already searches, on infrastructure you already own. The cost premium over raw parsing buys you the integration work that turns extracted data into usable candidate records.
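To make the integration step concrete, here is a minimal sketch of mapping one parser result onto a CRM candidate record. Both the JSON shape (name, emails, phones, skills, work_history) and the CandidateRecord fields are hypothetical; real parsers and CRMs each define their own, and this is not a description of how Archive Unlock is implemented.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateRecord:
    """Hypothetical CRM schema; real field names depend on the destination CRM."""
    full_name: str
    email: Optional[str]
    phone: Optional[str]
    current_title: Optional[str]
    skills: List[str] = field(default_factory=list)
    source_file: str = ""
    needs_review: bool = False


def to_candidate_record(parsed: dict, source_file: str) -> CandidateRecord:
    """Map one parser result (assumed JSON shape) onto the CRM schema."""
    name = (parsed.get("name") or "").strip()
    emails = parsed.get("emails") or []
    phones = parsed.get("phones") or []
    history = parsed.get("work_history") or []
    # Normalise skills so "Python" and "python " become one searchable value.
    skills = sorted({s.strip().lower() for s in parsed.get("skills") or [] if s.strip()})
    return CandidateRecord(
        full_name=name,
        email=emails[0].lower() if emails else None,
        phone=phones[0] if phones else None,
        current_title=history[0].get("title") if history else None,
        skills=skills,
        source_file=source_file,
        # Without a name or any contact channel a consultant cannot act on it.
        needs_review=not name or not (emails or phones),
    )


if __name__ == "__main__":
    raw = ('{"name": " Ana Example ", "emails": ["Ana@Example.com"], '
           '"skills": ["Python", "python ", "SQL"], '
           '"work_history": [{"title": "Data Engineer"}]}')
    print(to_candidate_record(json.loads(raw), "archive/2019/ana_example.pdf"))
```

The point of the sketch is the shape of the work: normalising values, keeping a pointer back to the source file, and flagging incomplete records for a human rather than silently loading them.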

When Archive Unlock fails

Honest counter-cases. Archive reactivation produces bad outcomes when one of the following is true:

The archive is too young. Under two years old, the candidates are mostly still in the same roles you originally pitched them. Reactivation rates collapse.
There's no follow-up capacity. If you can't call 1,200 people across 12 months — or build the outreach sequences to engage them — extracting the data produces a list nobody works.
There's no destination CRM. Extracting into a folder is half the project. The data needs to land somewhere searchable, where consultants will actually use it.
The original consent is genuinely gone. Under GDPR Article 6(1)(f), past business relationships generally give you legitimate-interest grounds to re-contact for similar opportunities, with a clear opt-out. But if the candidates were sourced through scraping, bought lists, or unconsented third-party referrals, that defence weakens.
The fee math doesn't work. For a firm doing €5k average placements in volume staffing (rather than €25k permanent), the absolute revenue at 48 placements is closer to €240k — still good ROI, but the headline drops accordingly.

On GDPR and re-contact. A buyer in Germany or Austria will immediately ask whether you can legally email 1,200 dormant candidates. The short answer: yes, under Article 6(1)(f) legitimate interests for past business relationships, with a clear unsubscribe and respect for any prior opt-out. The longer answer involves a Legitimate Interests Assessment (LIA) for the campaign. Archive Unlock includes the data structure to support this; it does not replace the legal review your DPO or counsel will want.

The honest bottom line

Most boutique recruitment firms over five years old are sitting on a six-figure asset that's invisible because it's not searchable. The reactivation work is unglamorous — extract, structure, segment, re-engage, log — and most firms never get around to it because it sits below the urgency line of new business.

The €1.2M number above is for a ten-desk firm with a ten-year archive on conservative assumptions. For a five-desk firm with a five-year archive, the math is roughly a quarter of that — still €300k of latent placement value. For a thirty-desk firm, multiple millions. The order of magnitude moves with your size; the underlying logic is the same.
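The scaling in the paragraph above is linear in desks × years, which can be sketched in a few lines. The function name and constants are illustrative, and the linear model is a simplification: it ignores the fact that a younger archive is fresher but also reactivates less well.

```python
BASELINE_DESKS = 10
BASELINE_YEARS = 10
BASELINE_REVENUE_EUR = 1_200_000  # the article's 10-desk, 10-year estimate


def scaled_estimate(desks: int, archive_years: int) -> float:
    """Scale the baseline estimate by archive volume (desks x years)."""
    scale = (desks * archive_years) / (BASELINE_DESKS * BASELINE_YEARS)
    return BASELINE_REVENUE_EUR * scale


if __name__ == "__main__":
    for desks, years in [(5, 5), (10, 10), (30, 10)]:
        estimate = scaled_estimate(desks, years)
        print(f"{desks:>2} desks, {years:>2} years: EUR {estimate:,.0f}")
```

A five-desk, five-year firm comes out at a quarter of the baseline and a thirty-desk, ten-year firm at three times it, matching the figures quoted in the text.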

If the math even roughly applies to you, the next question is whether the archive is genuinely too young, the follow-up capacity is genuinely absent, or the destination CRM genuinely isn't there. If none of those caveats applies, the work pays for itself in the first placement.

Sorapis · Archive Unlock

Extract, structure, integrate. Three to five days.

Archive Unlock takes PDFs, Word files, scanned images and ZIP exports from your old ATS, extracts them into structured candidate records inside your CRM (or a clean export if you don't have one yet), with confidence-scoring and low-confidence flagging for review. Swiss-hosted processing available for regulated data. From €3,000 standalone, or bundled into a full CRM build.

Anto Andrijanic · Sorapis · Custom CRM and operations systems for boutique recruitment firms on Microsoft 365.

Sources

Recruiterflow — Source of hire data — 63% from existing CRM
Greg Savage — "Is your database a 'candidate graveyard'?"
Greg Savage — "Your database is cheating on you"
Landbase / SignalHire — B2B data decay statistics
Adonis Media — Database reactivation benchmarks
Search X Recruitment — Recruitment agency fees by country
Headcount.ch — Swiss recruitment cost benchmarks
Affinda — CV parsing pricing reference
Treegarden — Candidate database management guide