The 2–6 Hours Hiding in Every Credit File

May 2026 · Financial Services

Walk the credit floor of any regional bank or credit union and you'll find some of the most expensive talent in the building doing some of the least valuable work. Skilled analysts — people hired to read a business, weigh its risks, and reach a credit judgment — sit rekeying borrower tax returns, compiled financial statements, and rent rolls into spreading templates, then assembling credit memos paragraph by paragraph. The figure most often cited for a single commercial file is two to six hours of analyst time, depending on entity complexity and the number of guarantors. Multiply that across a year of new and renewing credits and you're spending thousands of analyst-hours on transcription instead of underwriting.

I spend a lot of time with mid-market lenders deciding where to point their first serious AI effort, and I keep landing on the same answer: commercial loan spreading. Not because it's the flashiest use case — it is decidedly not — but because it's the one where the value is easiest to prove, the technology is genuinely ready, and the regulatory risk stays low if you scope it correctly. Let me make that case.

Why spreading, and why now

Three things make spreading the right first workflow rather than the fashionable one.

The baseline is already measured. Every Chief Credit Officer already tracks hours per file, time-to-decision, and analyst capacity. That matters more than it sounds. The reason most AI pilots can't prove their worth is that nobody measured the "before"; in spreading, the before is on a dashboard the bank has watched for years. ROI here is arithmetic, not anecdote.
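To make that arithmetic concrete, here is a minimal sketch. The two-to-six-hour range is the per-file figure cited earlier; the 600 files a year is a made-up volume for illustration, not a benchmark, and the function name is my own.

```python
def annual_spreading_hours(files_per_year: int, hours_per_file: float) -> float:
    """Analyst hours spent on spreading and memo assembly in one year."""
    return files_per_year * hours_per_file


# Hypothetical volume; substitute the file count from your own dashboard.
FILES_PER_YEAR = 600

low = annual_spreading_hours(FILES_PER_YEAR, 2.0)   # simple files
high = annual_spreading_hours(FILES_PER_YEAR, 6.0)  # complex, multi-guarantor files
print(f"Baseline: {low:,.0f} to {high:,.0f} analyst-hours per year")
```

The point of the exercise is that both inputs are numbers the credit department already tracks, so the "before" side of the comparison needs no estimation.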

Speed has quietly become a competitive weapon. A traditional bank still takes weeks to turn a commercial credit around; digital and fintech lenders decision in hours to days and fund inside a week. Borrowers notice, and the relationship doesn't always survive the wait. The gap shows up in approvals, too — large banks green-light only something like 13–15% of small-business applications, while digital lenders underwriting on alternative data approve two to three times that share. Some of that difference is risk appetite, but a meaningful chunk is simply that a borrower who gets a "yes" in two days doesn't wait around three weeks for a "maybe." This isn't a slow market you can take your time in, either — FDIC data showed industry loan growth accelerating to 5.9% year over year in late 2025, the fastest in nearly three years, led by commercial real estate and C&I. Credit unions crossed $1.7 trillion in loans outstanding over the same stretch. More files are arriving, and the institutions that turn them faster are taking the deals.

The market is enormous and under-automated. There are still roughly 3,900 community banks and more than 4,000 credit unions in the U.S., and the overwhelming majority spread credit by hand. This is not a solved problem that a few laggards haven't caught up on. It's an open one.

The work is more interesting than it looks

The naive version of this project — "point an OCR tool at a tax return and dump the numbers into a template" — is exactly why so many spreading pilots disappoint. The reason spreading resisted automation for so long isn't that the documents are hard to read. It's that the judgment lives in the edge cases, and a tool that handles only the clean 80% while quietly fumbling the messy 20% is worse than useless, because the 20% is where the credit risk hides.

A pipeline worth deploying has to recognize and surface the things an experienced analyst catches almost reflexively:

Officer compensation and discretionary add-backs. Owner pay above a reasonable market wage, depreciation, amortization, and genuinely one-time expenses get added back to reach available cash flow — but only the defensible ones, and a model needs to flag the call rather than make it silently.
Related-party rent. When a borrower pays rent to an affiliated entity, or will hold the property going forward, that figure has to be normalized or the cash flow picture is fiction.
Global cash flow and guarantor roll-ups. Banks, SBA lenders, and credit unions underwrite the whole web — business plus guarantors plus related entities — not a single return in isolation.
Covenants and structure buried in the fine print. A 1.25x DSCR covenant, a balloon note disclosed only in a footnote, a line of credit maturing mid-year, a subsequent-event capital commitment in Note 5. These are precisely the details that don't live in a tidy field, and precisely the ones an examiner will ask about.

Conventional debt-service-coverage thresholds cluster around 1.20x to 1.35x depending on facility and collateral, which means a tenth of a turn is the difference between a clean approval and a structured one. Extraction that's "mostly right" isn't good enough when the load-bearing numbers carry that much weight. The differentiator of a real system isn't raw accuracy on the easy fields — it's whether it knows what it doesn't know and routes that to a human.
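A rough sketch shows how far a single judgment call can move the ratio. It assumes a deliberately simplified cash-flow definition (net income plus add-backs, divided by annual debt service); real credit policies define available cash flow in more detail. Every name and figure below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AddBack:
    label: str
    amount: float
    needs_review: bool  # True when the add-back is a judgment call, not a mechanical one


def dscr(net_income, add_backs, annual_debt_service, include_flagged=True):
    """Simplified DSCR: (net income + add-backs) / annual debt service."""
    if annual_debt_service <= 0:
        raise ValueError("annual_debt_service must be positive")
    cash_flow = net_income + sum(
        a.amount for a in add_backs if include_flagged or not a.needs_review
    )
    return cash_flow / annual_debt_service


# Hypothetical borrower figures.
add_backs = [
    AddBack("depreciation", 40_000, needs_review=False),
    AddBack("officer compensation above market wage", 30_000, needs_review=True),
]

with_flagged = dscr(180_000, add_backs, 200_000, include_flagged=True)
without_flagged = dscr(180_000, add_backs, 200_000, include_flagged=False)
print(f"DSCR with the flagged add-back:    {with_flagged:.2f}x")
print(f"DSCR without the flagged add-back: {without_flagged:.2f}x")
```

In this made-up case the flagged officer-compensation add-back alone separates 1.25x from 1.10x, which is why the system should surface that call for an analyst rather than settle it silently.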

What a production pipeline actually looks like

It is not a chatbot bolted onto a document folder. It's a pipeline with four honest stages, each playing to a real strength of modern document AI:

Extraction with calibrated confidence scoring. Vendors love to quote 99%-plus field accuracy, and on clean, structured documents they're not lying. Real tax returns and scanned compiled statements are neither clean nor structured, so the number that matters isn't headline accuracy — it's calibration. A well-built system that says it's 95% confident should be right 95% of the time, so that low-confidence fields get auto-routed to a reviewer instead of flowing through unnoticed. Industry straight-through-processing rates sit around 40–60% for typical document workflows and above 85% for the best; the goal isn't to chase 100%, it's to put a human exactly where the model is unsure.
Validation that catches when the statements don't tie. Retained-earnings roll-forwards, balance sheets that balance, totals that foot. These deterministic checks are unglamorous and they catch the errors that erode trust fastest.
A retrieval-grounded first-draft memo. This is where retrieval-augmented generation earns its place. The draft memo is grounded in the borrower's own documents and the bank's own credit policy, with enforced citations so every statement traces back to a source — a policy clause or a line on a return. Standard RAG optimizes for fluent prose; what a regulated lender needs is auditability, where a reviewer can click any claim and see where it came from. That distinction is the whole game.
Integration and visibility. Output lands in the loan origination system where work actually happens, and a dashboard tracks hours per file, time-to-decision, and capacity so the program keeps proving itself.
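The calibration and routing idea in the extraction stage can be sketched in a few lines. This is an illustration, not any vendor's API: the field names, the 0.95 threshold, and the binning scheme are all assumptions.

```python
from collections import defaultdict


def route_fields(fields, threshold=0.95):
    """Split extracted fields into auto-accepted and human-review lists."""
    accepted, review = [], []
    for field in fields:
        target = accepted if field["confidence"] >= threshold else review
        target.append(field["name"])
    return accepted, review


def calibration_table(predictions, n_bins=10):
    """Compare stated confidence with observed accuracy, bin by bin.

    predictions: iterable of (confidence, was_correct) pairs, where
    was_correct comes from checking the field against a human-verified spread.
    Returns {bin_index: (mean_confidence, accuracy, count)}.
    """
    bins = defaultdict(list)
    for confidence, was_correct in predictions:
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))
    table = {}
    for index, rows in sorted(bins.items()):
        mean_confidence = sum(c for c, _ in rows) / len(rows)
        accuracy = sum(1 for _, ok in rows if ok) / len(rows)
        table[index] = (mean_confidence, accuracy, len(rows))
    return table


# Hypothetical extraction output for one return.
fields = [
    {"name": "total_assets", "confidence": 0.99},
    {"name": "officer_compensation", "confidence": 0.72},
]
accepted, review = route_fields(fields)
print("auto-accepted:", accepted)
print("sent to reviewer:", review)
```

A bin whose observed accuracy sits well below its mean confidence is the warning sign: the system is overconfident there, and the routing threshold cannot be trusted until that is fixed.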

Done this way, the analyst's day inverts. Instead of starting with a stack of PDFs and a blank template, they start with a populated spread, a flagged set of exceptions, and a draft memo to edit and own. The tool does the boring 80%; the human does the 20% that was always the actual job.
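The "flagged set of exceptions" starts with deterministic tie-out checks like the ones in the validation stage. Here is a minimal sketch, assuming a flat dictionary of spread values; the key names and the one-dollar rounding tolerance are hypothetical, not a standard schema.

```python
def tie_out_checks(stmt, tolerance=1.0):
    """Return a list of human-readable failures; an empty list means the statements tie."""
    failures = []

    # The balance sheet balances: assets = liabilities + equity.
    if abs(stmt["total_assets"]
           - (stmt["total_liabilities"] + stmt["total_equity"])) > tolerance:
        failures.append("balance sheet does not balance")

    # Retained earnings roll forward: beginning + net income - distributions = ending.
    expected_ending = (stmt["beginning_retained_earnings"]
                       + stmt["net_income"]
                       - stmt["distributions"])
    if abs(expected_ending - stmt["ending_retained_earnings"]) > tolerance:
        failures.append("retained earnings do not roll forward")

    # Totals foot: the individual asset lines sum to the reported total.
    if abs(sum(stmt["asset_lines"]) - stmt["total_assets"]) > tolerance:
        failures.append("asset lines do not foot to total assets")

    return failures


# Hypothetical spread (in thousands) that ties out cleanly.
clean = {
    "total_assets": 500.0,
    "total_liabilities": 300.0,
    "total_equity": 200.0,
    "beginning_retained_earnings": 100.0,
    "net_income": 50.0,
    "distributions": 20.0,
    "ending_retained_earnings": 130.0,
    "asset_lines": [200.0, 300.0],
}
print(tie_out_checks(clean))
```

Because these checks are plain arithmetic, a failure is unambiguous: either the source document has an error or the extraction does, and both are worth an analyst's attention.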

Why this is a data project before it's an AI project

The single most common reason a spreading pilot stalls isn't the model. It's that the borrower documents arrive as scanned PDFs of wildly varying quality, the credit policy lives in a binder and three veterans' heads, and the loan origination system was never designed to accept structured input from anywhere but a human typing into it. Industry research on AI deployments is blunt on this point: the data foundation is the number-one precondition, and projects fail on poor or siloed data far more often than on inadequate algorithms.

For a mid-market lender, that means the first weeks of any real project are deeply unglamorous. You standardize how documents come in. You get the credit policy into a form a retrieval system can actually cite, clause by clause. You build a clean, validated write-path into the LOS so the spread and the draft memo land where the analyst already works. None of that demos well, and it's exactly the part vendors skip in a sales cycle — which is why so many pilots look brilliant on a curated set of files and fall apart on the fifty-first.
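Getting policy and documents into citable form pays off in a check like the following, which rejects any memo sentence that lacks a resolvable source. It is a sketch under assumptions: the bracketed citation format and the source IDs are invented for illustration, and splitting the memo into sentences is taken as already done.

```python
import re

# Hypothetical citation format: a source ID in square brackets, e.g. [POLICY:4.2].
CITATION = re.compile(r"\[([A-Za-z0-9_.:-]+)\]")


def citation_problems(memo_sentences, known_sources):
    """Return (sentence_index, problem) pairs for uncited or unresolvable claims."""
    problems = []
    for index, sentence in enumerate(memo_sentences):
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append((index, "no citation"))
            continue
        missing = [source_id for source_id in cited if source_id not in known_sources]
        if missing:
            problems.append((index, "unknown source: " + ", ".join(missing)))
    return problems


# Hypothetical index of policy clauses and borrower document locations.
known_sources = {"POLICY:4.2", "RETURN_2024:P1"}
memo = [
    "The minimum DSCR covenant is 1.25x [POLICY:4.2].",
    "Reported net income was 180,000 [RETURN_2024:P1].",
    "Management is experienced and capable.",
    "Leverage is within policy limits [POLICY:9.9].",
]
for index, problem in citation_problems(memo, known_sources):
    print(f"sentence {index}: {problem}")
```

This only proves that a citation exists and points somewhere real. Whether the cited clause actually supports the sentence is still a reviewer's call, which is the point of making every claim traceable.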

There's a strategic reading of this too. The same body of research that finds most pilots failing also finds that externally built, well-scoped solutions reach production roughly twice as often as internal "let's experiment with AI" efforts — not because outside teams have better models, but because they're disciplined about scope and plumbing in a way internal experiments rarely are. The model is a commodity everyone can buy. The integration into your documents, your policy, and your systems is the part that's actually hard, and it's where the value is defensible.

The line you don't cross

Here's the part that separates a project that survives an exam from one that invites an enforcement action: automate the spreading and the memo, never the decision.

Credit approvals sit squarely inside fair-lending law, and the regulators have been explicit. The CFPB's 2023 guidance made clear that the Equal Credit Opportunity Act and Regulation B require lenders to give applicants the specific, accurate principal reasons for an adverse action, and that a creditor cannot escape that obligation by claiming its model is too complex or opaque to explain. Reg B generally requires that notice, with real reasons, within 30 days of receiving a completed application. A black-box system making the actual decision is a compliance problem wearing an efficiency costume.

Model risk management compounds it. Anything that drives a credit decision falls under longstanding supervisory expectations — documented validation, monitoring, change control — and while recent guidance has emphasized that those expectations should scale to a bank's size and risk, the safe and frankly smarter play for a mid-market lender is to keep the human as the decision-maker of record. Let the AI produce auditable, source-linked data and a defensible draft. Let your credit officer decide. That framing isn't just lower-risk; it's an easier sale to your own credit committee, and adoption is where these projects live or die.

Why most pilots never make it

The uncomfortable backdrop to any AI conversation in 2026 is that most pilots fail. MIT's NANDA initiative found that roughly 95% of enterprise generative-AI pilots delivered no measurable impact on the bottom line — and crucially, the failures were rarely about the technology. Gartner expects a large share of generative-AI projects to be abandoned after proof-of-concept. The cause is almost always organizational: bad data plumbing, no clear owner, and analysts who don't trust the output enough to rely on it.

That's actually good news for a lender willing to be disciplined, because it means the differentiator isn't a smarter model; everyone has access to the same models. It's the unglamorous work: clean data integration, validation rules tuned to your credit policy, and a rollout that earns analyst trust by being transparent about its own uncertainty. Community bankers are warming to AI, with the share describing themselves as highly concerned about it roughly halving between 2024 and 2025, but their remaining anxiety is about governance, which is exactly the part a careful spreading deployment gets right.

The 90-day test

My advice to lending teams is always the same, and it's deliberately low-drama. Pull fifty historical files you've already decided. Run them through a pilot pipeline. Then score the output against what your analysts actually produced — not just field-level extraction accuracy, but accuracy on the load-bearing fields, how often the draft memo needed material correction, and how many hours the draft saved when it was right.
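The scoring itself can stay simple. Here is a minimal sketch, assuming each historical file carries the analyst's spread, the pipeline's spread, a reviewer's yes/no on whether the draft memo needed material correction, and two hour counts. The record layout, field names, and matching tolerance are all hypothetical.

```python
def score_pilot(files, load_bearing_fields, tolerance=1.0):
    """Score pipeline output against what analysts actually produced."""
    if not files:
        raise ValueError("need at least one scored file")

    total = matched = lb_total = lb_matched = corrected = 0
    saved_when_right = []

    for f in files:
        for name, analyst_value in f["analyst_spread"].items():
            pipeline_value = f["pipeline_spread"].get(name)
            ok = (pipeline_value is not None
                  and abs(pipeline_value - analyst_value) <= tolerance)
            total += 1
            matched += ok
            if name in load_bearing_fields:
                lb_total += 1
                lb_matched += ok
        if f["memo_needed_material_correction"]:
            corrected += 1
        else:
            # Hours saved are counted only when the draft was usable as written.
            saved_when_right.append(f["hours_manual"] - f["hours_with_draft"])

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "field_accuracy": ratio(matched, total),
        "load_bearing_accuracy": ratio(lb_matched, lb_total),
        "memo_correction_rate": corrected / len(files),
        "avg_hours_saved_when_right": ratio(sum(saved_when_right), len(saved_when_right)),
    }


# Two hypothetical files; a real test would use the fifty historical ones.
analyst = {"revenue": 1_000_000, "ebitda": 150_000, "debt_service": 100_000}
files = [
    {"analyst_spread": analyst, "pipeline_spread": dict(analyst),
     "memo_needed_material_correction": False,
     "hours_manual": 4.0, "hours_with_draft": 1.0},
    {"analyst_spread": analyst, "pipeline_spread": dict(analyst, ebitda=140_000),
     "memo_needed_material_correction": True,
     "hours_manual": 5.0, "hours_with_draft": 3.0},
]
scores = score_pilot(files, load_bearing_fields={"ebitda", "debt_service"})
print(scores)
```

Reporting load-bearing accuracy separately matters because a single miss on EBITDA or debt service can hide behind a high overall field score.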

If the numbers don't make the case, stop. You'll have spent very little and learned something real. But in a document-heavy workflow with a baseline you already measure and edge cases your analysts already know cold, they usually do make the case — and when they do, you have a business case written in your own data instead of a vendor's slide deck. That's the version of AI that gets funded, gets adopted, and gets through the exam.