IRIS - Res&Arch Institutional Research Information System - Research & Archive

The recent increase of bioactivity data freely available to the scientific community and stored as activity data points in chemogenomic repositories provides a huge amount of ready-to-use information to support the development of predictive models. However, the benefits provided by the availability of such a vast amount of accessible information are strongly counteracted by the lack of uniformity and consistency of data from multiple sources, requiring a process of integration and harmonization. While different automated pipelines for processing and assessing chemical data have emerged in the last years, the curation of bioactivity data points is a less investigated topic, with useful concepts provided but no tangible tools available. In this context, the present work represents a first step toward the filling of this gap, by providing a tool to meet the needs of end-user in building proprietary high-quality data sets for further studies. Specifically, we herein describe Q-raKtion, a systematic, semiautomated, flexible, and, above all, customizable KNIME workflow that effectively aggregates information on biological activities of compounds retrieved by two of the most comprehensive and widely used repositories, PubChem and ChEMBL.

Q-raKtion: A Semiautomated KNIME Workflow for Bioactivity Data Points Curation

Palazzotti D.;Fiorelli M.;Sabatini S.;Massari S.;Barreca M. L.;Astolfi A.

2022

Abstract

The recent increase of bioactivity data freely available to the scientific community and stored as activity data points in chemogenomic repositories provides a huge amount of ready-to-use information to support the development of predictive models. However, the benefits provided by the availability of such a vast amount of accessible information are strongly counteracted by the lack of uniformity and consistency of data from multiple sources, requiring a process of integration and harmonization. While different automated pipelines for processing and assessing chemical data have emerged in the last years, the curation of bioactivity data points is a less investigated topic, with useful concepts provided but no tangible tools available. In this context, the present work represents a first step toward the filling of this gap, by providing a tool to meet the needs of end-user in building proprietary high-quality data sets for further studies. Specifically, we herein describe Q-raKtion, a systematic, semiautomated, flexible, and, above all, customizable KNIME workflow that effectively aggregates information on biological activities of compounds retrieved by two of the most comprehensive and widely used repositories, PubChem and ChEMBL.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2022
			
	Rivista su cui è pubblicata l'opera
	
				JOURNAL OF CHEMICAL INFORMATION AND MODELING
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
acs.jcim.2c01199.pdf accesso aperto Tipologia di allegato: PDF-editoriale Licenza: Creative commons Dimensione 2.8 MB Formato Adobe PDF Visualizza/Apri	2.8 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11391/1538255

Citazioni

5

11

10

social impact