13-Jun-2022 - Westfälische Wilhelms-Universität Münster

More Data in Chemistry

Clearer reporting of negative experimental results would improve reaction planning in chemistry

Databases containing huge amounts of experimental data are available to researchers across a wide variety of chemical disciplines. However, a team of researchers has found that artificial intelligence (AI) and machine learning models trained on this data fail to predict the yields of new syntheses reliably. Their study, published in the journal Angewandte Chemie, suggests that this is largely because scientists tend not to report failed experiments.

Although AI-based models have been particularly successful in predicting molecular structures and material properties, they make rather inaccurate predictions of product yields in synthesis, as Frank Glorius and his team of researchers at Westfälische Wilhelms-Universität Münster, Germany, have discovered.

The researchers attribute this failure to the data used to train AI systems. “Interestingly, the prediction of reaction yields (reactivity) is much more challenging than the prediction of molecular properties. Reactants, reagents, quantities, conditions, the experimental execution—all determine the yield, and thus, the problem of yield prediction becomes very data-intensive,” explains Glorius. Despite the huge amount of published literature and results, the researchers concluded that the available data is not suitable for accurate yield prediction.

The problem is not simply a lack of experiments. Instead, the team identified three possible sources of bias in the data. Firstly, the results of chemical syntheses may be flawed due to experimental error. Secondly, when chemists plan their experiments, they may, consciously or unconsciously, introduce bias based on personal experience and reliance on well-established methods. Finally, because only reactions with a positive outcome are believed to contribute to progress, failed reactions are reported less frequently.

To find out which of these three factors had the greatest influence, Glorius and the team deliberately modified the datasets for four commonly used (and therefore data-rich) organic reactions. They artificially increased the experimental error, reduced the size of the datasets, or removed negative results from the data. Their investigations showed that experimental error had the smallest influence on the model, whereas the lack of negative results had a fundamental effect.
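The three kinds of data manipulation described above can be illustrated with a short Python sketch. It is not the team's code: the article does not describe their datasets, models or exact procedure. The `Reaction` record, the function names, the noise level `sigma`, and the yield `threshold` below which a reaction counts as "failed" are all hypothetical choices for illustration, and the model training and evaluation step is omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical dataset entry: a reaction identifier and its yield in percent."""
    reaction_id: str
    yield_pct: float


def add_noise(data, sigma, rng):
    """Simulate extra experimental error: add Gaussian noise, clipped to 0-100 %."""
    return [
        Reaction(r.reaction_id, min(100.0, max(0.0, r.yield_pct + rng.gauss(0.0, sigma))))
        for r in data
    ]


def subsample(data, fraction, rng):
    """Simulate fewer experiments: keep a random fraction of the entries."""
    k = round(len(data) * fraction)
    return rng.sample(data, k)


def drop_negatives(data, threshold):
    """Simulate reporting bias: discard 'failed' reactions below a yield threshold.

    The threshold is an assumption; the article does not define a negative result.
    """
    return [r for r in data if r.yield_pct >= threshold]


# Toy, made-up yields purely for illustration.
toy = [Reaction(f"rxn{i}", y) for i, y in enumerate([0.0, 3.0, 12.0, 45.0, 78.0, 91.0])]

print(len(drop_negatives(toy, threshold=5.0)))  # 4
print(len(subsample(toy, 0.5, random.Random(0))))  # 3
```

Each function returns a modified copy of the dataset, so the same model could be trained on the original and on each perturbed version and the resulting prediction errors compared; that comparison, not shown here, is what would reveal which bias matters most.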

The group hopes that these findings will encourage scientists to report their failed experiments as well as their successes. This would improve the data available for training AI, ultimately speeding up reaction planning and making experimentation more efficient. Glorius adds: “machine learning in (molecular) chemistry will increase efficiency dramatically and fewer reactions will have to be run to achieve a certain goal, for example, an optimization. This will empower chemists and will help them to make chemical processes—and the world—more sustainable.”
