Zahed Ashkara
AI & Legal Expert
The rise of artificial intelligence (AI) has rapidly accelerated developments in the legal world. Today, large language models (LLMs) such as Lexis+ AI, Claude, Copilot, ChatGPT 3.5, and Gemini are deployed for a wide range of legal tasks. A recent paper [1] investigates in depth to what extent these AI systems can perform legal reasoning according to the well-known IRAC methodology, a framework that is essential in legal education and practice.
The IRAC framework (Issue, Rule, Application, Conclusion) forms the core of legal analysis and has long been a standard method in the legal profession and in legal education. The paper [1] explains how lawyers and students use IRAC to first identify the legal issue, then name the relevant laws and regulations, subsequently apply the rule to the facts, and finally draw a well-considered conclusion. This model ensures that complex legal questions can be approached in a structured and systematic manner. The research presents a series of scenarios, ranging from simple rule analyses to complex cases in which both analogical and statutory reasoning are central, and examines whether LLMs can adequately process the nuances of legal thinking and the critical judgment it requires.
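To make the four steps concrete, here is a minimal sketch of how an IRAC-structured prompt for an LLM might be assembled. The template wording and the example facts are our own illustration, not taken from the paper:

```python
# Illustrative sketch (not from the paper): one way to assemble an
# IRAC-structured prompt for a language model. The template wording
# and the example facts are hypothetical.

IRAC_TEMPLATE = """You are a legal assistant. Analyse the case below using IRAC.

Facts:
{facts}

Respond in four labelled sections:
1. Issue       - the precise legal question raised by the facts.
2. Rule        - the statute or case law governing the issue.
3. Application - apply the rule to these specific facts.
4. Conclusion  - a reasoned answer to the issue."""

facts = (
    "A tenant withheld rent after the landlord failed to repair "
    "a broken heating system during winter."
)

print(IRAC_TEMPLATE.format(facts=facts))
```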
One of the most striking findings is that all tested LLMs can perform a basic IRAC analysis, but the quality and depth of their answers vary considerably. The models scored as follows on the individual IRAC tasks:
| Criterion | Lexis+ AI | Claude | Copilot | GPT 3.5 | Gemini |
|---|---|---|---|---|---|
| Relied on Sources as Instructed | 10.5 | 12.6 | 12.0 | 11.9 | 11.2 |
| Issue Identification | 11.2 | 13.3 | 12.6 | 11.9 | 11.2 |
| Stating the Rule | 11.2 | 12.6 | 12.6 | 12.6 | 10.6 |
| Applying the Rule | 7.9 | 12.6 | 9.2 | 8.5 | 8.5 |
| Reaching Correct Conclusion | 10.6 | 12.0 | 12.0 | 10.0 | 11.3 |
| Conclusion Stated with Certainty | 11.2 | 13.3 | 11.2 | 11.9 | 11.2 |
| Chain of Thought Prompt | 3.429 | 6.858 | 6.858 | 4.572 | 5.715 |
| Hallucination | 3.429 | 8.001 | 8.001 | 6.858 | 6.858 |
| Total score (/100) | 69.46 | 91.26 | 84.46 | 78.23 | 76.57 |
Claude scored highest with an impressive 91.26 out of 100, while Lexis+ AI reached only 69.46; each total is simply the sum of the eight criterion scores. This difference suggests that models not specifically trained on legal data can sometimes outperform models developed specifically for legal purposes.
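For readers who want to verify the totals, a short snippet (our own, purely for illustration) confirms the arithmetic for the highest- and lowest-scoring models:

```python
# Quick arithmetic check, with scores transcribed from the table above:
# each model's total is the sum of its eight criterion scores.
scores = {
    "Lexis+ AI": [10.5, 11.2, 11.2, 7.9, 10.6, 11.2, 3.429, 3.429],
    "Claude":    [12.6, 13.3, 12.6, 12.6, 12.0, 13.3, 6.858, 8.001],
}

for model, criteria in scores.items():
    print(model, round(sum(criteria), 2))
# Lexis+ AI 69.46
# Claude 91.26
```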
The paper further discusses how the models differ not only in how well they master the basic structure of an IRAC analysis, but also in how they handle key elements such as issue identification, stating the rule, applying the rule, and reaching the correct conclusion. For example, some models, such as ChatGPT and Gemini, showed a hallucination rate of approximately 14%: they drew conclusions that were not fully supported by the given facts, as in an exercise where it was concluded that an untrained animal would still meet the requirements of the Americans with Disabilities Act (ADA). This stands in stark contrast to models such as Claude and Copilot, which generally provided more stable and consistent answers.
The study further emphasizes that a significant obstacle to the legal applicability of LLMs lies in their inherent inconsistency. When the same question is posed to a model repeatedly, the answers can vary considerably. This non-deterministic output poses a serious problem under the rule of law, where stability and repeatability are crucial for the reliability of legal sources (see paragraphs 91-94). Moreover, some models exhibit remarkable "false confidence": they present an answer with great certainty even when that answer is incorrect on the facts. This can mislead, especially when a lawyer or student relies on the apparent certainty of an AI-generated answer.
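The repeatability problem is easy to probe yourself: ask a model the same question several times and compare the answers. The sketch below assumes a hypothetical query_model() wrapper around whichever LLM API is in use; the function name and setup are ours, not from the study:

```python
# Minimal sketch of a repeatability check. query_model() is a
# hypothetical placeholder; swap in a real LLM client call.
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API call; replace with a
    real client request in practice."""
    raise NotImplementedError

def consistency_check(prompt: str, runs: int = 5) -> Counter:
    """Pose the same prompt `runs` times and tally the distinct
    answers. A reliable legal tool should yield one dominant answer;
    a wide spread illustrates the non-determinism described above."""
    return Counter(query_model(prompt) for _ in range(runs))
```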
An interesting aspect of the research is the use of the "think step by step" (chain-of-thought) prompt. This technique improved the output of some models, particularly Claude, Copilot, and Gemini, by eliciting additional detail and deeper analysis. Although the strategy had less effect on ChatGPT and Lexis+ AI, it shows that there is room to optimize the reasoning processes of AI. A fundamental limitation remains, however: AI models lack the ability to make moral and ethical judgments, an aspect that is crucial in the legal profession.
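In practice the technique amounts to a small change in the prompt. A hedged sketch, reusing the ADA example from the study; the exact wording is illustrative, not taken from the paper:

```python
# Chain-of-thought prompting: the only change is an added instruction
# to reason step by step before answering. Prompt wording is ours.
question = (
    "Does an untrained animal qualify as a service animal under the ADA?"
)

plain_prompt = question

cot_prompt = (
    question
    + "\n\nThink step by step: first identify the issue, then state the "
    "governing rule, apply it to the facts, and only then conclude."
)
```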
The findings have far-reaching consequences for both legal education and professional practice. On the one hand, AI offers enormous efficiency gains: think of automated document analysis, case law research, and drafting preliminary arguments. On the other hand, the authors warn that overreliance on AI carries the risk that future lawyers will not fully develop crucial skills such as critical thinking, logical reasoning, and ethical judgment.
In summary, the study clearly shows that although LLMs can perform legal analyses at a fundamental level via the IRAC method, they do not yet master the full spectrum of "thinking like a lawyer." The problems of hallucination, inconsistency, false confidence, and the absence of moral and ethical reasoning underline that human lawyers, with their capacity for deep critical thinking and moral deliberation, remain irreplaceable for the time being.
For those who want to delve deeper into the methodology, case studies, and extensive analyses of the different AI models, reading the full paper is highly recommended. This blog is based on the paper "Artificial intelligence and legal analysis: Implications for legal education and the profession" [1].