Progress in the Field of Data Resourcing of Adsorbent Materials for Water Treatment
Precise removal of characteristic pollutants from water is an urgent need. Yet the efficient development of environmental functional materials has long been held back by two bottlenecks: the key mechanisms of action are poorly understood, and high-quality data resources are lacking. In recent years, Professor Zhang Weiming's group at Nanjing University has proposed a research paradigm driven by “data resources for water treatment adsorbent materials”. The group systematically built a multi-source, heterogeneous materials database and developed advanced machine-learning analysis tools. With these, it achieved closed-loop optimization of adsorbent materials along the chain “performance evaluation–mechanism analysis–targeted design”, which opens a new path for the intelligent development of environmental functional materials. The group has carried out a series of studies on how to apply adsorbent data resources, including data-driven adsorption model development (Separation and Purification Technology 368 (2025) 133019), data-driven high-throughput material screening (Separation and Purification Technology 339 (2024) 126732), and data-driven material reverse engineering (Environmental Science & Technology 2024, 58, 15298-15310).

In this context, the group took its previously constructed adsorbent dataset for target pollutants and expanded it to cover a variety of oxygen-containing anions. It found that existing literature datasets generally suffer from serious data bias:

1. About 80% of the data were collected at neutral pH (pH = 7), while samples under acidic and alkaline conditions are severely scarce. As a result, models misjudge how pH regulates adsorption.
2. About 65% of the initial pollutant concentrations fall in the 10-100 ppm range, and data for high- and low-concentration wastewater treatment scenarios are missing. As a result, models misrepresent the negative correlation between concentration and adsorption.

This “survivorship bias”, a product of “publication bias”, has seriously misled researchers analyzing adsorption mechanisms. For example, models trained on such data wrongly identify steric hindrance as the dominant mechanism, contrary to the actual adsorption behavior. To address these challenges, the group proposed an equalization strategy based on an “experimental-literature composite dataset”. It supplemented the literature data with 697 sets of experimental data for real-world application scenarios (e.g., typical wastewater conditions: pH 1-6, concentration 1-1000 ppm). This strategy significantly improved model performance: prediction accuracy rose by 4.49%, and the confidence interval narrowed by 50%. Feature importance analysis on the equalized dataset clearly showed that electrostatic interaction dominates the adsorption process, with a contribution of 48.4%. Experimental validation agrees with this inference. Owing to its higher charge density, a quaternary amine resin showed a threefold enhancement in CrO₄²⁻ adsorption compared with a primary amine resin. DFT calculations likewise supported electrostatic interaction, rather than steric hindrance, as the dominant factor in the adsorption process.
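The two bias statistics above (the share of samples at neutral pH and the share in the 10-100 ppm band) are simple to audit before any model is trained. The sketch below is a minimal illustration of such an audit under stated assumptions. Each record is assumed to store only pH and initial concentration, and "neutral" is assumed to mean within ±0.5 of pH 7. The `Sample` class, the `bias_report` function, and every data value are hypothetical and are not taken from the group's dataset. The sketch shows how appending experimental records that target acidic pH and extreme concentrations dilutes the over-represented region.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One adsorption record; only the two audited conditions are kept (hypothetical schema)."""
    ph: float
    c0_ppm: float  # initial pollutant concentration in ppm


def bias_report(samples, neutral_tol=0.5, band=(10.0, 100.0)):
    """Return the share of records near pH 7 and the share inside the concentration band."""
    n = len(samples)
    if n == 0:
        raise ValueError("dataset is empty")
    neutral = sum(1 for s in samples if abs(s.ph - 7.0) < neutral_tol)
    in_band = sum(1 for s in samples if band[0] <= s.c0_ppm <= band[1])
    return {
        "neutral_pH_share": round(neutral / n, 3),
        "band_share": round(in_band / n, 3),
    }


# Toy "literature" set, crowded at pH 7 and 10-100 ppm (invented values).
literature = [Sample(7.0, c) for c in (20, 30, 40, 50, 60, 70, 80, 90)]
literature += [Sample(4.0, 5), Sample(9.0, 300)]

# Toy "experimental" supplement covering pH 1-6 at 1 and 1000 ppm (invented values).
experimental = [Sample(float(ph), c) for ph in range(1, 7) for c in (1, 1000)]

composite = literature + experimental

print(bias_report(literature))  # {'neutral_pH_share': 0.8, 'band_share': 0.8}
print(bias_report(composite))   # {'neutral_pH_share': 0.364, 'band_share': 0.364}
```

Adding records lowers these shares only by dilution. The source does not say what degree of balance counts as "equalized", so any target share would be a separate modelling decision.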
On this basis, the group carried out targeted design of the covalent organic polymer COP-A. By increasing its charge density, COP-A raised adsorption capacity by 56% compared with the conventional material. The study establishes a new paradigm of “data equalization–feature classification–mechanism quantification”. It offers an effective remedy for a common problem in water treatment adsorbent research, namely that “distorted data lead to distorted conclusions”, and provides scientific guidance for the precise separation of characteristic pollutants from water.
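The “feature classification–mechanism quantification” step can be pictured as follows: sum the per-feature importances within each mechanism category, then report each category's share. This is one plausible reading of how a figure such as the 48.4% electrostatic contribution is expressed. The sketch below assumes the importances have already been computed by some method, for example permutation importance or SHAP; the source does not say which. The feature names, the `FEATURE_TO_MECHANISM` mapping, the `mechanism_shares` function, and all numbers are hypothetical and do not reproduce the study's results.

```python
from collections import defaultdict

# Hypothetical assignment of model input features to mechanism categories.
FEATURE_TO_MECHANISM = {
    "charge_density": "electrostatic",
    "zeta_potential": "electrostatic",
    "pore_size": "steric",
    "functional_group_size": "steric",
    "specific_surface_area": "surface",
    "initial_concentration": "operating",
}


def mechanism_shares(importances, mapping):
    """Sum feature importances per mechanism and return percentage shares, largest first."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        # Features missing from the mapping are pooled rather than silently dropped.
        totals[mapping.get(feature, "unassigned")] += value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {mech: round(100 * value / grand_total, 1) for mech, value in ranked}


# Invented per-feature importances, for illustration only.
importances = {
    "charge_density": 0.30,
    "zeta_potential": 0.12,
    "pore_size": 0.10,
    "functional_group_size": 0.08,
    "specific_surface_area": 0.25,
    "initial_concentration": 0.15,
}

shares = mechanism_shares(importances, FEATURE_TO_MECHANISM)
print(shares)  # {'electrostatic': 42.0, 'surface': 25.0, 'steric': 18.0, 'operating': 15.0}
```

Because features are grouped before normalizing, the resulting ranking depends on how each feature is assigned to a mechanism. The mapping itself therefore needs a chemical justification.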