Cutting corners: Spreadsheets are a staple of so many jobs and business fields that it is almost impossible to quantify their economic value. People adept with this tool are in high demand and often well-compensated. Microsoft has now created a feature that has the potential to kill some jobs while simultaneously freeing people from tedious data analysis.
Microsoft has developed a framework called SpreadsheetLLM that uses large language models to analyze and interpret spreadsheet data. The tool is designed to address the challenges traditional LLMs face when dealing with spreadsheets, such as handling extensive two-dimensional grids, flexible layouts, and varied formatting options. It does this by serializing the sheet into a text stream that carries each cell’s address, value, and format.
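To make the serialization idea concrete, here is a minimal Python sketch. It is not Microsoft’s encoder: the function names, the comma and pipe separators, and the format labels are all assumptions chosen for illustration. It only shows what "address, value, and format in one stream" can look like.

```python
def column_letter(index):
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def serialize_sheet(rows, formats=None):
    """Flatten a grid into text, one line per row.

    rows    -- list of lists holding cell values (None marks an empty cell)
    formats -- optional dict mapping an address such as "B2" to a format
               label; the labels are hypothetical, not a real Excel API
    """
    formats = formats or {}
    lines = []
    for r, row in enumerate(rows):
        parts = []
        for c, value in enumerate(row):
            if value is None:
                continue  # empty cells add nothing to the stream
            address = f"{column_letter(c)}{r + 1}"
            cell = f"{address},{value}"
            if address in formats:
                cell += f",{formats[address]}"
            parts.append(cell)
        lines.append("|".join(parts))
    return "\n".join(lines)
```

Calling serialize_sheet([["Region", "Sales"], ["North", 1200]], {"B2": "currency"}) yields two lines of text, "A1,Region|B1,Sales" and "A2,North|B2,1200,currency", which an LLM can read as ordinary input.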
The tool includes a component called SheetCompressor that compresses spreadsheets before they are passed to an LLM. It consists of three modules: one analyzes the spreadsheet’s structure and discards non-table content; another translates the data into a more efficient representation; and a third aggregates the data.
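The article does not say what the "more efficient representation" looks like. One plausible option, assumed here purely for illustration, is an inverted index that lists each distinct value once along with the cells that contain it, so empty cells and repeated values stop costing tokens. The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def invert_cells(cells):
    """Group cell addresses by value.

    cells -- dict mapping an address ("A2") to its value.
    Returns a dict mapping each distinct non-empty value (as a string)
    to the list of addresses holding it, in first-seen order.
    """
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue  # empty cells are dropped entirely
        index[str(value)].append(address)
    return dict(index)
```

For a status column such as {"A1": "Status", "A2": "open", "A3": "open", "A4": "", "A5": "closed"}, the result stores "open" once with the addresses A2 and A3 and omits the empty cell A4.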
SpreadsheetLLM does have some limitations in its current form. For example, it ignores cell background colors, which can carry meaning in a sheet. It also lacks semantic-based compression for cells containing natural language.
Weaknesses aside, it appears to be highly effective. In tests, it outperformed traditional approaches by 25.6 percent in GPT-4’s in-context learning setting. Additionally, SheetCompressor reduces token usage for spreadsheet encoding by 96 percent, significantly decreasing computational costs. SpreadsheetLLM is also very good at spreadsheet table detection, which is fundamental to spreadsheet understanding. This tool could streamline data processing in several industries.
If this sounds like a tool that can replace some jobs, you are probably correct, especially in accounting and data analysis positions. SpreadsheetLLM could allow non-technical users to query and manipulate spreadsheet data using natural language prompts with little knowledge of how the sheets work.
It is impossible to estimate the job impact since the AI/LLM industry is still burgeoning. Positions in many sectors rely heavily on spreadsheets, particularly Microsoft Excel. These roles are highly valued, with commensurate compensation. A study of nearly 27 million job listings found that Microsoft Excel was the most sought-after software skill.
It also stands to reason that SpreadsheetLLM could augment, rather than replace, some tasks in finance, accounting, and other data-intensive fields. For example, the model introduces a “Chain of Spreadsheet” (CoS) framework, which decomposes spreadsheet reasoning into a table “detection-match-reasoning” pipeline.
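The staged pipeline can be pictured with a short Python sketch. This is only an outline under stated assumptions: detect_table and reason are hypothetical callables standing in for LLM calls, and the dictionary of cells is an invented input shape, not the framework’s actual interface.

```python
def chain_of_spreadsheet(cells, question, detect_table, reason):
    """Answer a question in stages instead of one large prompt.

    cells        -- dict mapping an address ("A1") to its value
    question     -- the user's natural-language query
    detect_table -- hypothetical callable returning the set of addresses
                    that form the table matching the question
    reason       -- hypothetical callable that answers the question
                    from the selected cells only
    """
    # Detection and match: find the table relevant to the question.
    table_range = detect_table(cells, question)
    # Keep only that table so the reasoning step sees less noise.
    table_cells = {a: v for a, v in cells.items() if a in table_range}
    # Reasoning: answer from the narrowed-down table.
    return reason(table_cells, question)
```

The point of the split is that the reasoning step receives only the matched table rather than the whole sheet, which keeps the prompt small and focused.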
Perhaps more intriguing is the model’s ability to work with both structured and unstructured spreadsheet data. According to the researchers, this could reduce hallucinations in AI-generated outputs, with the spreadsheet serving as a “source of truth” to improve the reliability of AI-assisted analysis.
However, SpreadsheetLLM is not ready for a public launch. It’s still in the research phase and too raw to be incorporated into commercial products like Microsoft Excel.