Team NYAI

What ‘Purpose-Built’ Actually Means in Legal AI

Purpose-built. The phrase has entered the vocabulary of legal AI without entering its definitions. For senior advocates, General Counsels, and legal-technology buyers in India, that creates a problem the marketing material does not solve. Two products described in identical language can be architecturally separated by an order of magnitude in accuracy, in defensibility, in compliance posture. The difference does not show up in the demo. It shows up in court.

This piece is an attempt to give the term its precise weight. Purpose-built is not a description. It is an architectural commitment, and that commitment now carries judicial, regulatory, and empirical consequences in India that no responsible legal team can ignore. What follows is a layered explanation of what those commitments are, what they cost to honour, and what is at stake when they are skipped.

The empirical baseline

The starting point is not architecture. It is evidence.

In 2024, researchers at Stanford’s RegLab and Institute for Human-Centered AI ran the first large-scale empirical study of legal AI hallucination. Their findings, published in the Journal of Legal Analysis, established that general-purpose large language models hallucinate on between 69 and 88 percent of specific legal queries. On questions about a court’s actual holding, the error rate exceeded 75 percent. The study found that the models did not just get the law wrong; they did so confidently, often reinforcing rather than correcting incorrect legal premises supplied in the prompt.

A follow-up study by the same team, peer-reviewed in the Journal of Empirical Legal Studies in 2025, tested commercial legal-AI products grounded in retrieval-augmented generation, the architectural pattern that connects a language model to a curated corpus rather than relying on its training data alone. The hallucination rate dropped sharply, but it did not vanish. Leading commercial legal research tools still produced hallucinated answers on between 17 and 33 percent of queries.
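
To make the pattern concrete, here is a minimal sketch of retrieval-augmented generation in Python. The corpus.search and llm.generate calls are hypothetical stand-ins for whatever retrieval store and model a given product uses; what matters is the order of operations: retrieve from a curated corpus first, then generate only from what was retrieved.

```python
from dataclasses import dataclass

@dataclass
class SourcePassage:
    citation: str   # e.g. "(1972) 3 SCC 850, para 12"
    text: str       # the paragraph as it appears in the report

def answer(query: str, corpus, llm) -> str:
    # Step 1: retrieve candidate passages from a curated legal corpus,
    # rather than relying on whatever the model memorised in training.
    passages: list[SourcePassage] = corpus.search(query, top_k=5)

    # Step 2: constrain generation to the retrieved material only.
    context = "\n\n".join(f"[{p.citation}] {p.text}" for p in passages)
    prompt = (
        "Answer using only the sources below, citing the bracketed "
        "citations. If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return llm.generate(prompt)
```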

Two conclusions follow from that data, and they are the foundation of every argument that comes after.

The first is that grounding architecture matters more than model size. Moving from a general-purpose chatbot to a properly retrieval-grounded legal system reduces hallucination roughly four- to five-fold. The second is that retrieval grounding alone is insufficient. The corpus, the citator, the evaluation discipline, and the engineering choices that surround the retrieval layer determine whether the system sits at the lower end of that band or the upper.

For Indian practice specifically, academic work has begun to confirm the same direction of travel. The IL-TUR benchmark, released at ACL 2024 by researchers at IIT Kanpur and collaborators, evaluates language models against a corpus of 34,000 Supreme Court and High Court documents across English, Hindi, and nine Indian languages. On the court-judgement-prediction task, the best baseline language model scored an F1 of 0.76, against a human benchmark of 0.94. In a separate study presented at ACL’s NLLP workshop in 2025, frontier models tested against the Supreme Court Advocate-on-Record examination underperformed the human topper on long-form reasoning, with reliability failures clustering on three Indian-specific dimensions: procedural and format compliance, citation discipline, and forum-appropriate voice and structure.

These are the precise dimensions on which fabricated citations enter Indian filings. The architecture has to answer them.

What Indian courts have already established

The empirical case is now reinforced by judicial direction, and the trajectory in India has been swift.

In August 2023, the Delhi High Court in Christian Louboutin SAS v. M/s The Shoe Boutique [CS(COMM) 583/2023] became the first Indian court to address the use of generative AI in litigation. Justice Prathiba M. Singh held that the tool could at best aid preliminary research, and could not substitute the human element of adjudication. The Court warned of the risk of imaginative data and fictional case law in AI-generated output. That early warning has aged well.

In December 2024, the Income Tax Appellate Tribunal at Bengaluru passed an order in Buckeye Trust v. PCIT citing four authorities that did not exist: three Supreme Court decisions and one Madras High Court ruling. The Tribunal recalled the order in January 2025 under Section 254(2) of the Income Tax Act, 1961. The fabricated citations were traced to the use of a general-purpose AI tool.

In September 2025, the Delhi High Court dismissed a petition in Greenopolis Welfare Association v. Narender Singh after senior counsel demonstrated that the petitioner had relied on quoted paragraphs of Raj Narain v. Indira Nehru Gandhi, (1972) 3 SCC 850, that did not appear in the judgment, and on a second authority, Chitra Narain v. DDA, that did not exist. The Court, presided over by Justice Girish Kathpalia, warned of contempt and perjury consequences.

In January 2026, the Bombay High Court in Deepak v. Heart & Soul Entertainment Ltd. [2026 SCC OnLine Bom 209] imposed costs of ₹50,000 on a litigant whose written submissions cited a non-existent authority. Justice M.M. Sathaye observed that the formatting features of the submission (bullet markers, repetitive structure, and formulaic phrasing) indicated the work of a generative AI tool.

On 27 February 2026, the Supreme Court in Gummadi Usha Rani v. Sure Mallikarjuna Rao [SLP(C) No. 7575/2026] took suo motu cognisance of an Andhra Pradesh trial court order that had been built on four fabricated AI-generated judgements. The Bench of Justices P.S. Narasimha and Alok Aradhe framed the matter in language that should now be read as the operating standard for legal practice in India:

“A decision based on such non-existent and fake alleged judgements is not an error in the decision-making. It would be a misconduct, and legal consequences shall follow.”

The Court issued notice to the Attorney General, the Solicitor General, and the Bar Council of India, and appointed Senior Advocate Shyam Divan as amicus curiae. On 5 May 2026, it directed the Bar Council to constitute an expert committee to examine the use of artificial intelligence in court proceedings.

The judicial response has been complemented by formal court policy. The Kerala High Court in July 2025 issued a binding policy for the district judiciary that prohibits the use of public generative AI tools in case work, mandates verification of every AI output against primary sources, and ties violations to disciplinary proceedings. The Gujarat High Court in April 2026 issued its own policy under Articles 225 and 227 of the Constitution, prohibiting the use of AI in judicial reasoning, order drafting, and bail or sentencing considerations, mandating compliance with the Digital Personal Data Protection Act, 2023, and barring the upload of party names, witness identities, or privileged communications into AI systems.

The progression in less than three years has been from cautious commentary to disciplinary categorisation. Indian courts will no longer treat AI hallucination as a technology problem. They will treat it as a conduct problem.

Indian regulators are converging on the same architectural template

A second pattern, less visible but no less consequential, has emerged from India’s sectoral regulators. Without a unifying AI statute, the regulators have begun to converge independently on the same architectural demands.

The Reserve Bank of India’s Framework for Responsible and Ethical Enablement of Artificial Intelligence, released on 13 August 2025, is built around seven principles: trust, people first, innovation, fairness, accountability, explainability, and resilience. These are operationalised through twenty-six recommendations across six pillars. The framework treats AI risk in regulated financial entities as comparable to credit risk and cyber risk, and demands governance at the same level. The Department of Supervision’s underlying survey records that only twenty percent of the 612 surveyed entities were either using or developing AI, meaning four out of five Indian regulated financial entities are still in the pre-deployment phase. That is an unusually large window in which to set standards, and the RBI has used it.

The Securities and Exchange Board of India’s consultation paper on responsible AI and machine learning in Indian securities markets, issued in June 2025, requires regulated entities to document the logic of their AI models so that outcomes are explainable, traceable, and repeatable. It mandates shadow testing with live traffic, retention of model inputs and outputs for at least five years, fairness and bias prevention, and plain-language investor disclosure. Critically, it assigns sole responsibility for AI outcomes to the regulated entity, even where the model is procured from a third party.

The Digital Personal Data Protection Act, 2023, and the DPDP Rules notified on 13 November 2025, apply to any AI system that processes digital personal data. Significant Data Fiduciaries are required to conduct annual data protection impact assessments and audits, and to verify through algorithmic due diligence that the algorithmic software they deploy does not pose a risk to the rights of data principals. Non-compliance attracts penalties up to ₹250 crore. The Rules are in phased rollout, with most substantive obligations coming into force by May 2027: a runway, not an extension.

The Central Drugs Standard Control Organisation’s draft guidance on medical device software, released in October 2025, introduced a risk-based classification system for software in and as a medical device, and, most relevantly, a requirement for an Algorithm Change Protocol governing adaptive AI updates. That last point matters, because it is the first Indian regulatory recognition that adaptive AI requires lifecycle controls rather than one-time approvals.

The pattern across these instruments is unmistakable. Five demands recur: explainability of how the output was produced, traceability of the data that produced it, accountability of the entity that deployed it, demonstrable confidentiality, and human oversight by design. A general-purpose chatbot run over a public API meets none of these by default. A purpose-built system is architected to evidence each.

What ‘purpose-built’ means, technically

With the empirical and regulatory ground established, the architectural definition follows. Purpose-built legal AI rests on five engineering choices, each of which has direct evidentiary support and each of which a buyer can verify.

1. Retrieval-augmented generation over a verified, jurisdiction-specific corpus.

The Stanford studies establish empirically that retrieval grounding is the dominant factor in legal accuracy. Generation quality matters less than retrieval quality. A purpose-built Indian system therefore begins with a continuously updated corpus of Central and State Acts, Rules, Notifications, Government Orders, Circulars, Gazettes, Supreme Court and High Court judgements, and tribunal orders, indexed against authoritative repositories. The test for the buyer is not whether retrieval exists. It is whether the retrieval is grounded in authoritative primary sources and whether the corpus is continuously updated against the day’s notifications.
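
What “continuously updated” can mean in verifiable terms is sketched below, assuming a hypothetical per-source sync log; the source classes, dates, and two-day lag threshold are illustrative, not any vendor’s actual pipeline.

```python
from datetime import date, timedelta

# Hypothetical per-source sync log: when each source class was last
# synchronised with its authoritative repository. Dates are illustrative.
SYNC_LOG = {
    "central_acts": date(2026, 5, 4),
    "gazette_notifications": date(2026, 5, 5),
    "sc_judgements": date(2026, 5, 5),
    "hc_judgements": date(2026, 5, 4),
}

def staleness_report(today: date, max_lag_days: int = 2) -> list[str]:
    """Flag every source class not refreshed within the permitted lag."""
    return [
        source
        for source, last_sync in SYNC_LOG.items()
        if (today - last_sync) > timedelta(days=max_lag_days)
    ]

# On 7 May 2026 this flags the two classes last synced on 4 May:
print(staleness_report(date(2026, 5, 7)))  # ['central_acts', 'hc_judgements']
```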

2. Structured legal data layers, not flat document indexes.

Legal text is not undifferentiated prose. Statutes are graphs of provisions, amendments, repeals, and savings clauses. Judgements are graphs of citations, distinguishings, and overrulings. Without a structured statute tree and a structured citator, even a retrieval-augmented system will surface stale authority. The Bharatiya Nyaya Sanhita, 2023 will be confused with the Indian Penal Code, 1860. A judgement read down by a Constitution Bench will be cited as good law. The architectural answer is a knowledge layer, separate from the language model, that holds the legal ecosystem in structured form. This is the engineering work that distinguishes a search-and-summarise tool from a legally aware system.
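
A minimal sketch of such a knowledge layer follows; the field names are assumptions for illustration, not a published schema. Provisions carry their amendment and succession history, and judgements carry a treatment signal that a flat document index has no way to represent.

```python
from dataclasses import dataclass, field

@dataclass
class Provision:
    act: str                          # e.g. "Bharatiya Nyaya Sanhita, 2023"
    section: str
    text: str
    amended_by: list[str] = field(default_factory=list)
    successor: str | None = None      # e.g. an IPC section's BNS counterpart

@dataclass
class Judgement:
    citation: str
    cites: list[str] = field(default_factory=list)
    treatment: str = "good law"       # or "overruled", "read down", "distinguished"

def is_safe_to_cite(judgement: Judgement) -> bool:
    # The citator check a flat document index cannot perform: a judgement
    # that reads fluently may nonetheless no longer be good law.
    return judgement.treatment == "good law"
```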

3. Citation traceability to primary sources.

The Gujarat High Court policy, the Kerala High Court policy, and the Supreme Court’s direction in Gummadi Usha Rani converge on a single non-negotiable: every citation must be verifiable against an authoritative primary source. This is an engineering choice, not a user-interface afterthought. Each cited paragraph in an output must link to the underlying judgement, surface the actual paragraph as it appears in the source, and log the retrieval. A purpose-built system does not just say where the law comes from. It shows the user the law, in its source.
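
A hypothetical traceability record might look like the following sketch: every cited paragraph is bound to its source text, its link into the authoritative repository, and a logged, timestamped retrieval, with a hash that lets a reviewer confirm the quoted text was not altered.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CitedParagraph:
    case_citation: str   # e.g. "(1972) 3 SCC 850"
    para_number: int
    source_text: str     # the paragraph exactly as it appears in the report
    source_url: str      # link into the authoritative repository
    retrieved_at: str    # logged timestamp of the retrieval
    digest: str          # hash a reviewer can use to verify the quote

def make_trace(case_citation: str, para_number: int,
               source_text: str, source_url: str) -> CitedParagraph:
    return CitedParagraph(
        case_citation=case_citation,
        para_number=para_number,
        source_text=source_text,
        source_url=source_url,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        digest=hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
    )
```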

4. Explainability and human oversight by design.

The OECD AI Principles, endorsed by forty-seven governments and by the G20, mandate transparency, explainability, and accountability anchored in traceability of datasets, processes, and decisions. The European Union’s AI Act classifies AI used by judicial authorities in researching and interpreting facts and law as a high-risk system, with effect from August 2026, requiring human oversight, documentation, and risk management. The Indian regulatory instruments (RBI FREE-AI, SEBI’s 2025 paper, the Kerala and Gujarat High Court policies) demand the same. A purpose-built system surfaces its working, not just its output. The lawyer using it can see what was searched, why a result was retrieved, and on what basis the answer was constructed.
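
One way to picture “surfacing its working” is a reasoning trace returned alongside every answer. The structure below is an illustrative assumption, not a standard; the point is that each element a regulator or court could ask about has a named, inspectable field.

```python
from dataclasses import dataclass, field

@dataclass
class AnswerTrace:
    query: str
    searches_run: list[str] = field(default_factory=list)        # what was searched
    retrieval_reasons: dict[str, str] = field(default_factory=dict)  # why each source surfaced
    sources_relied_on: list[str] = field(default_factory=list)   # basis of the answer

trace = AnswerTrace(
    query="Can an ITAT order founded on non-existent authority be recalled?",
    searches_run=["Income Tax Act 1961 s.254(2)", "ITAT recall mistake apparent"],
    retrieval_reasons={"s.254(2)": "statutory power to rectify a mistake apparent from the record"},
    sources_relied_on=["Income Tax Act, 1961, s.254(2)"],
)
```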

5. Domain-grounded evaluation.

Generic benchmarks tell us nothing about legal accuracy. A model that excels on standardised reasoning benchmarks may still hallucinate three quarters of its citations on Indian law. The Stanford team’s call for the legal profession to turn to public benchmarking and rigorous evaluation is now beginning to be answered. LegalBench, with its 162 hand-crafted legal reasoning tasks contributed by forty practising lawyers, is the public benchmark for general legal reasoning. IL-TUR provides the Indian equivalent. A purpose-built system is one whose accuracy claims are testable against these benchmarks, with results that hold up to publication.
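
A minimal harness in the spirit of that evaluation discipline, assuming a labelled query set whose correct citations have been verified against primary sources in advance, and a hypothetical system.answer interface that returns its cited authorities:

```python
def hallucination_rate(system, labelled_queries) -> float:
    """labelled_queries: list of (query, set_of_verified_citations) pairs."""
    errors = 0
    for query, verified_citations in labelled_queries:
        result = system.answer(query)
        # Any cited authority outside the verified set counts as a hallucination.
        if any(c not in verified_citations for c in result.citations):
            errors += 1
    return errors / len(labelled_queries)
```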

These five choices are not independent features. They are layers of a single architecture. Removing any one of them collapses the others. A retrieval system without a structured citator surfaces stale law. A citator without traceability cannot be defended in court. Traceability without evaluation discipline cannot be improved. Evaluation discipline without explainability cannot be communicated to the lawyer who has to rely on the output.

What purpose-built is not

It is worth being explicit about three categories of legal AI that are sometimes described as purpose-built but architecturally are not.

A general-purpose model with a legal vocabulary is not purpose-built. Vocabulary is not knowledge. A model that can produce a paragraph in the cadence of the Supreme Court has not, for that reason, read the Supreme Court. The Stanford findings on hallucination on specific legal queries apply with full force to such systems.

A fine-tuned model without a retrieval layer is not purpose-built. Fine-tuning adjusts the weights of a model to a domain. It does not give the model access to current statutes, this week’s notifications, or yesterday’s order of the Supreme Court. The legal landscape moves faster than any training run. Without a live retrieval layer, the system will speak law fluently and cite law inaccurately.

A general-purpose chatbot wrapped in a legal interface is not purpose-built. The wrapper does not change the underlying behaviour. Confidentiality, citation traceability, statutory currency, evaluation discipline: none of these is added by the user interface.

The architectural test is whether the system can answer five questions, structurally:

1. Where does the data come from, and when was it last updated?

2. What is the structured knowledge layer that connects statutes to amendments and judgements to overrulings?

3. How is each citation in the output traced to the source paragraph?

4. How is the system evaluated, against what benchmark, and with what published result?

5. How is confidentiality preserved through every step of the lawyer’s use?

A system that can answer these architecturally is purpose-built. A system that cannot is not.

What this means for the Indian buyer

For the senior advocate, the General Counsel, the law firm partner, or the legal-technology buyer in India, the implications are practical and immediate.

The first is that the responsibility under the Supreme Court’s direction in Gummadi Usha Rani is non-delegable. Citing fabricated AI-generated judgements is misconduct, and the consequences fall on the lawyer who signs the filing, not on the platform that produced the citation. The choice of legal AI is therefore not a technology procurement decision. It is a professional risk decision, and it sits at the same level of seniority as the choice of which precedent to lead with.

The second is that compliance with India’s emerging architectural standard (the Kerala and Gujarat High Court policies, the DPDP Rules, the RBI and SEBI frameworks where applicable) cannot be retrofitted. A system that was not architected for explainability does not become explainable through configuration. A system that was not architected for citation traceability does not become traceable through training. The architecture is the compliance posture.

The third is that the market is now sufficiently mature that the architectural questions can be asked at the procurement stage, and answered. Grand View Research projects the Indian legal AI market to reach US$106.3 million by 2030, growing faster than any other regional market. The vendors that deserve serious consideration are the ones whose architectural choices are documented, whose accuracy claims are testable, and whose compliance with the Indian regulatory perimeter is evidenced rather than asserted.

A practical procurement test follows from this. Evaluate vendors against the five architectural requirements identified above. Run a controlled pilot in which the vendor’s system is tested against a representative set of Indian legal queries, with primary-source verification of every cited authority. Establish, through the pilot, where the system sits relative to the seventeen-to-thirty-three percent hallucination band that the Stanford evaluation measured for leading retrieval-grounded tools, and treat any system that cannot demonstrate measurement at all as architecturally unverified.
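
The pilot’s decision rule can be stated mechanically. The sketch below simply encodes the Stanford band discussed above; the function and its thresholds are illustrative, not a procurement standard.

```python
def procurement_verdict(measured_rate: float | None) -> str:
    # Thresholds encode the 17-33 percent band from the Stanford evaluation
    # of retrieval-grounded commercial tools, discussed above.
    if measured_rate is None:
        return "architecturally unverified: no measurement was possible"
    if measured_rate <= 0.17:
        return "at or below the lower end of the published band"
    if measured_rate <= 0.33:
        return "within the band: ask for the vendor's improvement evidence"
    return "outside any credible retrieval-grounded baseline"
```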

A precise word, used precisely

Purpose-built has been used loosely in legal AI marketing because the cost of using it precisely is high. Building an Indian legal corpus end-to-end, structuring it into statute trees and citator graphs, grounding generation in retrieval, surfacing every citation to its source, evaluating against published Indian benchmarks, and architecting for the DPDP regime and the Kerala and Gujarat policies: each of these is a multi-year engineering investment, and together they are the work that the term names.

The Stanford findings, the IL-TUR benchmark, the Supreme Court’s direction in Gummadi Usha Rani, the Kerala and Gujarat High Court policies, the RBI FREE-AI framework, the SEBI 2025 paper, and the DPDP Rules, read in sequence, form a single argument. India has decided what it expects from legal AI. The architecture has to answer.

Purpose-built is not a feature. It is the precondition for everything else.

