Large Language Models

FinText is a suite of specialised LLMs designed for accounting, finance, and related fields. By pre-training on high-quality, domain-specific historical data, FinText aims to mitigate critical issues such as look-ahead bias and information leakage. The training corpus draws on a diverse range of textual datasets covering the period from 2007 to 2023: news articles, regulatory filings, IP records, key corporate information, speeches from the ECB and the Fed, transcripts of corporate events, board member information, and Wikipedia for general knowledge. Notably, a separate model has been pre-trained for each year within this timeframe.
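One way to see why per-year models help with look-ahead bias: when analysing a document as of a given date, pick the most recent model whose training data ends before that date. The snippet below is a minimal sketch of that selection rule, not FinText's own tooling. It assumes that the model for year Y has seen text up to the end of year Y, and the function names and the `fintext-<year>` identifier format are hypothetical, chosen only for illustration.

```python
from datetime import date

# Coverage period stated in the description above.
FIRST_YEAR, LAST_YEAR = 2007, 2023


def latest_safe_model_year(as_of: date) -> int:
    """Return the most recent model year that cannot have seen `as_of`.

    Assumption (not confirmed by the source): the model for year Y was
    trained on text up to 31 December of year Y, so any model from a
    year strictly before `as_of.year` is free of look-ahead bias.
    """
    year = min(as_of.year - 1, LAST_YEAR)
    if year < FIRST_YEAR:
        raise ValueError(
            f"No model available before {as_of.isoformat()}: "
            f"the earliest model year is {FIRST_YEAR}."
        )
    return year


def model_name(year: int) -> str:
    """Build a model identifier (hypothetical naming scheme)."""
    return f"fintext-{year}"


if __name__ == "__main__":
    # A filing dated mid-2015 is analysed with the 2014 model.
    chosen = latest_safe_model_year(date(2015, 6, 30))
    print(model_name(chosen))
```

Under this rule, a backtest over 2008 to 2023 would switch models at each year boundary, so no prediction relies on text published after the date being analysed.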

Listen to the 6-minute podcast summarising the paper here