Why You Can’t Ask Your Favorite LLM for an R&D Tax Credit

Neo.Tax
April 16, 2026

Large Language Models (LLMs) are improving constantly. And it feels like we’re told about the massive gains every single day. What was once the province of mimicry and chatbots can now write books, plan vacations, or build sophisticated apps.

So, of course, we’ve been asked: why can’t I just ask my [insert your favorite LLM here] to file my R&D Tax Credit? The answer gets to the heart of what we’re building at Neo.Tax.

See, the thing is: when it comes to tax, perfect is the only “good enough” that counts. LLMs will get impressively close to an answer, but for the details they can’t find, they’ll often “hallucinate”—filling in the blanks with guesstimates or full-on fabrications. Obviously, in something as exacting as taxes, that can’t happen. But there’s actually more to the issue than the threat of hallucination.

Mind the Context Window 

LLMs are ingenious at solving problems within a narrow context. Unfortunately, as the context window expands, the LLM begins to struggle. To get to an answer, an LLM has to understand the connective tissue and patterns within a data set. That’s why managing the context window of an underlying LLM is key.

As we explained earlier, perfect is the only "good enough" in tax, and as the data set scales, managing a context window to the specificity necessary for tax is a massive challenge.
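One common way to keep each LLM call's context manageable is to batch the data before any prompt is built. The sketch below is purely illustrative (the ticket names and batch size are hypothetical, not Neo.Tax's actual pipeline); it just shows the basic move of splitting a large ticket set into small, fixed-size chunks so no single call has to reason over everything at once:

```python
from typing import Iterator

def batch_tickets(tickets: list[str], batch_size: int = 50) -> Iterator[list[str]]:
    """Yield fixed-size batches so each prompt stays within a small context."""
    for i in range(0, len(tickets), batch_size):
        yield tickets[i:i + batch_size]

# Hypothetical data: 120 tickets split into chunks of at most 50.
tickets = [f"TICKET-{n}" for n in range(120)]
batches = list(batch_tickets(tickets, batch_size=50))
# -> three batches, of sizes 50, 50, and 20
```

The trade-off, of course, is that batching alone loses cross-batch relationships, which is part of why a naive approach falls short for project grouping.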

Here are some of the keys to understanding why a complex project like R&D filing requires a set of specialized algorithms like the ones we’ve built (and continue to improve upon) at Neo.Tax:

1. As Data Load Increases, Hallucinations Increase: If you ask an LLM to divide a set of tickets into groups, it will sometimes forget tickets. These “forgotten” tickets are a problem because that can directly translate into missing Qualified Research Expenses (QREs). Most troublingly, LLMs are more likely to forget tickets as the number of tickets increases—which is precisely when the value of using AI is highest.

2. Quality Decreases as Scale Increases: LLM output quality degrades with context length, which means that extra information does not help the LLM reason better. Instead, it distracts the LLM. You're much more likely to get high-quality output when you give your favorite LLM a single focused task. A major component in designing and engineering AI systems is therefore figuring out how to curate the right set of relevant information for the LLM to process.

3. Project Management Data is Rarely Consistent and Easy to Ingest: The kind of information that's relevant for determining projects varies a lot from one company to another. This means that the AI system needs to be adaptive and flexible in fetching the relevant context. Even within the same company, different teams will leverage their project management systems differently. That means that you need an intelligent harness around the LLM, which knows when to focus within teams and when to look across teams to determine project boundaries.

4. Tickets Sometimes Can Only Be Categorized in Context: This may be the most challenging aspect of grouping tickets, which off-the-shelf LLMs have not been trained for. Tickets have relationships between them, so in most cases the relevant context for understanding a ticket is not the ticket itself, but the tickets it's related to. That means that for each ticket, the amount of potentially relevant information explodes, and you need to have a specialized strategy for figuring out which related tickets are relevant for determining R&D projects.
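The "forgotten tickets" problem in point 1 is one place where a deterministic check around the LLM helps. The sketch below (ticket IDs and group names are hypothetical, and this is not Neo.Tax's actual code) reconciles a model's grouping output against the original input, so a dropped ticket surfaces as an error instead of a silently missing QRE:

```python
def check_coverage(input_ids: set[str], groups: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected): tickets the model dropped or invented."""
    grouped = {tid for members in groups.values() for tid in members}
    return input_ids - grouped, grouped - input_ids

# Hypothetical example: four tickets in, but the model only placed three.
input_ids = {"T-1", "T-2", "T-3", "T-4"}
groups = {"auth-refactor": ["T-1", "T-2"], "ml-pipeline": ["T-4"]}
missing, unexpected = check_coverage(input_ids, groups)
# missing contains "T-3": a candidate expense the model silently dropped
```

A harness can then re-prompt with just the missing tickets, rather than trusting a single pass over the whole data set.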

A Complex Problem Calls for a Specialized Approach

As you start to understand the challenge of digesting and sorting thousands to hundreds of thousands of tickets into relevant projects (which then must be determined as qualified or not qualified), you realize why a specialized approach is necessary for R&D tax credits. 

Our engineers built an intelligent harness that integrates with the CS teams' onboarding process. When a customer onboards, we learn how their ticketing system is organized, and use that information to adapt our algorithms to their data.
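To make the adaptation idea concrete: one common pattern is to map each customer's ticketing schema onto a single normalized record before any model sees the data. The snippet below is a simplified illustration of that pattern only (the system names, field mappings, and helper are hypothetical, not Neo.Tax's implementation):

```python
# Normalize differently-shaped ticketing exports into one common record.
COMMON_FIELDS = ("id", "title", "team")

# Per-system field mappings, learned or configured during onboarding.
FIELD_MAPS = {
    "jira": {"id": "key", "title": "summary", "team": "project"},
    "linear": {"id": "identifier", "title": "title", "team": "team_name"},
}

def normalize(raw: dict, system: str) -> dict:
    """Translate one raw ticket into the common schema."""
    mapping = FIELD_MAPS[system]
    return {field: raw.get(mapping[field]) for field in COMMON_FIELDS}

jira_ticket = {"key": "ENG-42", "summary": "Cache rewrite", "project": "Platform"}
normalize(jira_ticket, "jira")
# -> {"id": "ENG-42", "title": "Cache rewrite", "team": "Platform"}
```

Once every ticket looks the same downstream, the grouping and qualification logic only has to be built once, no matter which tool the customer uses.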

By taking the time to understand the data structure and then having the ability to train and adapt our algorithms to solve for a specific company’s data, Neo.Tax can solve R&D at scale without hallucination. 

Our AI harness was not built to write books or plan travel or build sophisticated apps. It was built specifically for R&D tax credits. (And we have another dedicated harness for software capitalization!) So, that’s what it does better than any general-purpose LLM on the market.
