New AI system extracts numerical data from academic texts, freeing researchers from routine tasks

The Quinex framework automatically structures quantitative data and is designed to help manage the growing flood of data

21-Apr-2026

Numbers are the language of science—yet in research articles, they are often buried within the text and difficult to analyze. Researchers at Jülich have developed an AI system that automatically identifies these numbers, categorizes them, and converts them into structured data. The Quinex framework thus eliminates the need for time-consuming manual work.

Whether in energy, climate, or materials research, scientific papers are full of numbers, or more precisely, quantitative data: efficiencies, temperatures, costs, and emissions. These values are often crucial for improving models or identifying trends. At the same time, the number of scientific publications is growing rapidly. For many research questions, it is now virtually impossible to evaluate all relevant publications manually, because the time and resources required would be enormous.

The Quinex (“Quantitative Information Extraction”) framework, developed by researchers at Jülich, is based on language models and automates this process: Artificial intelligence identifies numerical values, assigns them to appropriate units, and recognizes what was measured, when, where, and how. Thus, a sentence like “Efficiency levels of 63 to 71 percent are assumed for 2025” is transformed into a structured dataset containing all relevant contextual information—from the year and measurement method to the source.
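To make the idea of a "structured dataset" more concrete, here is a minimal sketch of what one extracted record might look like for the example sentence. The class and field names (`QuantityRecord`, `value_low`, `measured_property`, `temporal_scope`, `doc-001`) are hypothetical and are not Quinex's actual output schema. The regular expressions are a toy stand-in used only for illustration: Quinex itself relies on trained language models, not hand-written patterns, and also captures context such as the measurement method, which this sketch omits.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantityRecord:
    """Hypothetical schema for one extracted quantity; not Quinex's actual output format."""
    value_low: float
    value_high: float
    unit: str
    measured_property: Optional[str]  # what was measured, e.g. "efficiency"
    temporal_scope: Optional[str]     # when it applies, e.g. "2025"
    source: str                       # identifier of the source document

# Toy patterns standing in for a trained model: a "<number> to <number> percent" range and a four-digit year.
RANGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?) to (\d+(?:\.\d+)?) (percent)")
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")
PROPERTY_KEYWORDS = ("efficiency", "temperature", "cost", "emission")

def extract_range(sentence: str, source: str) -> Optional[QuantityRecord]:
    """Extract a single percentage range from a sentence, or return None if there is none."""
    match = RANGE_PATTERN.search(sentence)
    if match is None:
        return None
    year = YEAR_PATTERN.search(sentence)
    lowered = sentence.lower()
    # Pick the first keyword from PROPERTY_KEYWORDS that occurs in the sentence.
    prop = next((kw for kw in PROPERTY_KEYWORDS if kw in lowered), None)
    return QuantityRecord(
        value_low=float(match.group(1)),
        value_high=float(match.group(2)),
        unit=match.group(3),
        measured_property=prop,
        temporal_scope=year.group(0) if year else None,
        source=source,
    )

record = extract_range("Efficiency levels of 63 to 71 percent are assumed for 2025", source="doc-001")
print(record)
# QuantityRecord(value_low=63.0, value_high=71.0, unit='percent', measured_property='efficiency', temporal_scope='2025', source='doc-001')
```

Storing the range as two bounds rather than a single midpoint keeps the record faithful to what the source actually states.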

Open and Efficient AI

Unlike many proprietary AI solutions, Quinex is based entirely on open, relatively small, and thus efficient language models. These have been specifically trained to recognize and classify quantitative information in scientific texts. Compared to similar systems, Quinex delivers more precise results, captures contextual information in a more nuanced way, and also takes implicit characteristics into account.

Despite its compact size, Quinex achieves an F1 score (the harmonic mean of precision and recall) of around 98 percent for recognizing numbers and their associated units, and of approximately 87 and 82 percent, respectively, for classifying the quantified properties and entities. These high scores were achieved through specially created training datasets and methodological improvements.
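For readers unfamiliar with the metric, the sketch below shows how an F1 score is computed from counts of true positives (correct extractions), false positives (spurious extractions), and false negatives (missed items). The counts in the usage line are invented for illustration and are not taken from Quinex's evaluation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of extracted items that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of expected items that were found
    # F1 is the harmonic mean of precision and recall; 2TP / (2TP + FP + FN) is an equivalent form.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts: 90 correct extractions, 10 spurious ones, 30 missed items.
p, r, f = precision_recall_f1(90, 10, 30)
print(p, r, round(f, 3))  # 0.9 0.75 0.818
```

Because F1 is a harmonic mean, it stays closer to the weaker of the two components (here recall) than a simple average would, so a high F1 requires both few false extractions and few misses.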

“We wanted to develop a tool that is powerful, yet also transparent and resource-efficient,” explains Dr. Jann Weinand, head of the Integrated Scenarios Department at Jülich System Analysis. “Quinex makes artificial intelligence more accessible for data analysis in science.”

Successful Practical Test

To test Quinex’s practical suitability, the system was applied to thousands of scientific abstracts from various fields. It successfully extracted data on electricity production costs for different energy technologies, on maximum oxygen uptake in humans, on earthquake magnitudes and locations, and on the band gaps of photovoltaic materials.

The automatically derived values closely matched the respective reference data. This demonstrates that Quinex is well-suited for analyzing large volumes of academic literature across a wide range of research fields and deriving reliable trends from it.
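Once values are available in structured form, deriving a trend largely reduces to grouping and aggregating. The following is a minimal sketch, assuming each extracted record has already been reduced to a (year, value) pair; the records shown are hypothetical and are not results from the Quinex study. Taking the median per year is one possible design choice, since it is less sensitive than the mean to an occasional misextracted value.

```python
from collections import defaultdict
from statistics import median

def yearly_medians(records: list[tuple[int, float]]) -> dict[int, float]:
    """Group (year, value) pairs by year and return the median value per year, ordered by year."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return {year: median(values) for year, values in sorted(by_year.items())}

# Hypothetical cost values extracted from abstracts; 300.0 plays the role of an outlier.
records = [(2015, 80.0), (2015, 95.0), (2015, 300.0), (2020, 50.0), (2020, 60.0)]
print(yearly_medians(records))  # {2015: 95.0, 2020: 55.0}
```

The resulting per-year series can then be compared against reference data of the kind mentioned above.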

New Perspectives for Research

“Language models open up new perspectives for science and help maintain an overview of entire research fields,” says lead author Jan Göpfert. “They enable automated literature searches, the creation of uniformly structured research databases, and trend analyses that reveal developments in science and technology at an early stage.”

“Our goal is to relieve researchers of routine work,” says Dr. Patrick Kuckertz, head of the Research Data Management Group. “Quinex is designed to help them arrive at insights more quickly and manage the growing flood of data in science.”

Limitations and Future Improvements

Quinex is not entirely error-free, but transparency is part of its design. “The system recognizes numbers and units very reliably,” says Jan Göpfert. “Since they are taken directly from the text, they cannot be ‘hallucinated.’ However, misinterpretations sometimes occur, for example when important references are scattered throughout the text.”
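Numbers taken directly from the text cannot be invented because an extractive system marks spans that already exist in the input rather than generating new text. A minimal sketch of the corresponding check, assuming each extraction is reported as character offsets plus the surface text it claims to cover (the `Span` type and `is_grounded` function are hypothetical, not part of Quinex): a value is accepted only if the source text at those offsets is exactly what was reported.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """Hypothetical extraction result: character offsets into the source plus the reported text."""
    start: int
    end: int
    text: str

def is_grounded(source: str, span: Span) -> bool:
    """Return True if the span lies inside the source and its text matches the source exactly."""
    if not (0 <= span.start < span.end <= len(source)):
        return False
    return source[span.start:span.end] == span.text

sentence = "Efficiency levels of 63 to 71 percent are assumed for 2025"
start = sentence.index("63")
print(is_grounded(sentence, Span(start, start + 2, "63")))  # True
print(is_grounded(sentence, Span(start, start + 2, "68")))  # False: "68" does not occur at these offsets
```

Note that such a check only confirms that a number was copied faithfully. It says nothing about whether the number was attached to the right property or entity, which is exactly where the misinterpretations described above can occur.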

Thus, Quinex remains a tool that supports people but does not replace them. “We recommend using Quinex where it informs and relieves researchers—but the responsibility for interpreting the results remains with them,” says Göpfert. Every recognized number can be traced back to its source and, where possible, is highlighted in the original text.
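When each extracted value is stored together with its character offsets in the source text, highlighting it in the original is straightforward. The sketch below is an illustration under that assumption; the `highlight` and `span_of` helpers and the `[[...]]` markers are hypothetical choices, and a real interface would more likely use HTML or PDF annotations.

```python
def highlight(source: str, spans: list[tuple[int, int]], left: str = "[[", right: str = "]]") -> str:
    """Wrap each (start, end) character span of source in markers; spans must not overlap."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        pieces.append(source[cursor:start])                  # untouched text before the span
        pieces.append(left + source[start:end] + right)      # the highlighted span itself
        cursor = end
    pieces.append(source[cursor:])                           # remaining text after the last span
    return "".join(pieces)

def span_of(text: str, phrase: str) -> tuple[int, int]:
    """Return the offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return start, start + len(phrase)

sentence = "Efficiency levels of 63 to 71 percent are assumed for 2025"
spans = [span_of(sentence, "63 to 71 percent"), span_of(sentence, "2025")]
print(highlight(sentence, spans))
# Efficiency levels of [[63 to 71 percent]] are assumed for [[2025]]
```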

The team is working to further develop Quinex with additional domain-specific datasets and models, to make it even more efficient and more flexible in adapting to different research requirements.

Open Collaboration Welcome

Forschungszentrum Jülich is making Quinex available as an open-source project.

This is intended to give researchers worldwide the opportunity to test, expand, and adapt the system to their own fields—from energy research to chemistry and biomedicine.

Original publication
