A new pre-print study conducted by researchers from the University of Chicago has demonstrated that large language models (LLMs) like GPT-4 can successfully perform financial statement analysis, often surpassing human analysts in predicting future earnings.
This finding could have significant implications for the future of financial analysis and decision-making in financial markets.
The Study
The pre-print study, which has not yet been peer-reviewed, was published on SSRN on May 21, 2024.
It investigated whether an LLM, specifically GPT-4, could analyze standardized and anonymous financial statements to determine the direction of future earnings.
Remarkably, even without any accompanying narrative or industry-specific information, the LLM outperformed traditional financial analysts in its ability to predict earnings changes.
This advantage was most pronounced in situations where human analysts typically struggle.
The researchers compared the LLM’s performance to that of human analysts and a state-of-the-art ML (machine learning) model.
They found that the LLM’s prediction accuracy was on par with the narrowly trained ML model, suggesting that LLMs could take a central role in decision-making processes.
Methodology
The research team provided GPT-4 with standardized financial statements, including balance sheets and income statements, and instructed the model to analyze them to predict future earnings.
The study used the Compustat annual financial data from 1968 to 2021, setting aside 2022 data to predict 2023 earnings and test the model’s performance beyond its training window, which ends in April 2023.
Data filters ensured completeness and consistency, leaving 150,678 observations from 15,401 firms.
Financial statements were anonymized and standardized.
For analyst forecasts, the sample included data from 1983 onward, requiring at least three forecasts per observation, resulting in 39,533 firm-year observations.
The LLM was able to generate useful narrative insights about a company’s future performance.
This capability allowed the LLM to exhibit a relative advantage over human analysts, particularly in complex situations requiring in-depth analysis and critical thinking.
Outperforming Human Analysts
The GPT-4 predictions were more accurate than those of financial analysts.
The LLM achieved an accuracy rate of 60%, compared to the 53% accuracy rate of human analysts.
More specifically, the human analysts were right 52.71% of the time when making predictions one month after financial statements were released.
This improved to 55.95% and 56.58% after three and six months, respectively.
GPT-4, when given basic instructions, had a similar accuracy of 52.33%.
But when GPT-4 was guided to think step-by-step like a human, its accuracy jumped to 60.35%, which is much better than the analysts’ one-month predictions.
These results show that GPT-4 can do a better job than human analysts at predicting future earnings, especially when it uses a human-like, step-by-step approach.
Simply telling the model to analyze financial statements without this guidance doesn’t work as well.
The study also showed that the LLM’s predictions were not based on any memorized data from its training.
Instead, GPT-4 generated narrative insights by analyzing the financial data, similar to the thought processes of a human analyst.
This ability to synthesize information and provide meaningful predictions based on purely numeric data sets the LLM apart from traditional analytical methods.
Interestingly, GPT-4 excelled most in situations where the human analysts were either likely to be biased, and in cases where the disagreement level among the human analysts was higher than average
Economic Impact
The researchers also demonstrated a profitable trading strategy based on GPT-4’s predictions, which outperformed other machine learning-based strategies.
The trading strategies based on GPT-4’s predictions yielded higher Sharpe ratios and alphas than those based on other models.
The Sharpe ratio measures the performance of an investment compared to a risk-free asset, after adjusting for its risk.
A higher Sharpe ratio indicates better risk-adjusted returns.
Alpha represents the excess return of an investment relative to the return of a benchmark index.
Higher alphas indicate that an investment has performed better than the market average.
This suggests that the LLM’s insights could be valuable for investment strategies and financial decision-making, offering a potential edge in the market.
Discussion
The implications of this study are far-reaching.
If LLMs can consistently outperform human analysts in financial statement analysis, they could revolutionize the field of financial analysis and decision-making.
The ability of LLMs to quickly process large amounts of data and generate accurate predictions could lead to more efficient and informed financial markets.
But the study also emphasizes the importance of narrative and contextual analysis in financial predictions.
While LLMs can excel in numeric analysis, the integration of qualitative information could further enhance their predictive capabilities.
Conclusion
The study concludes that LLMs such as GPT-4 have the potential to significantly improve financial statement analysis and earnings prediction.
By outperforming human analysts and specialized ML models, LLMs could take a central role in financial decision-making processes.
“The ability to perform tasks across domains points towards the emergence of Artificial General Intelligence,” the authors write. “Broadly, our analysis suggests that LLMs can take a more central place in decision-making than previously thought.”
Study Details
- Title: “Financial Statement Analysis with Large Language Models“
- Authors: Alex G. Kim, Maximilian Muhn, Valeri V. Nikolaev
- Published online: May 20, 2024