Discovering hidden relationships in crystallographic databases

Crystallographic data, like all other data in the modern world, is growing at a tremendous pace. The question is how to turn that data into new knowledge. A number of resources, such as PDBe and the PDBe Knowledge Base (PDBe-KB), allow users to access and interpret the vast amounts of data available for molecular structures. These resources help users extract knowledge from the underlying data by providing tools that make 3D structure data easier to visualize and interpret. Furthermore, integrating data from multiple resources gives access to enriched information and increases the knowledge that can be derived from these structures.

 

The availability of reliable and freely accessible 3D structural data from sources such as the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) has allowed researchers to explore this knowledge and apply it across many disciplines of science and technology. This crystallographic information enables in-depth analysis of the structural aspects of proteins, small molecules, and cocrystals. It also supports the development of tools to access, use, and integrate the data for purposes such as data analysis, drug discovery, formulation development, and structural bioinformatics. Information from the CSD and PDB knowledgebases is actively used to identify new targets for drug discovery, to understand the mechanisms of action of drug molecules, to advance biologicals, and to study agrochemicals.

The tools available at the Cambridge Crystallographic Data Centre (CCDC) are actively used for the discovery of novel drug molecules, drug repositioning, and the development of novel cocrystals for pharmaceutical formulations. The workshop sessions will take the form of interactive lectures and hands-on training sessions. Participants will be able to learn and try out discovery tools such as GOLD, CSD-CrossMiner, SuperStar, and IsoStar, and to apply them to their own research questions.

For more bespoke and in-depth analysis, programmatic access skills allow researchers to do much more, and at larger scales. Major crystallographic databases such as the PDB and the CSD now expose their data through APIs, which opens up entirely new ways to exploit the information they contain. One example is applying machine learning algorithms to the phenomenon of allostery in enzymes, which can lead to the discovery of hidden allosteric pathways. The lecture will be delivered as an interactive presentation in which participants can try out and tweak the presented programs and algorithms themselves.
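
As a rough illustration of what programmatic access can look like, the Python sketch below retrieves a single PDB entry as JSON and reduces it to a few fields. It is not taken from the workshop material. It assumes the public RCSB PDB Data API entry endpoint and the field names `struct.title`, `exptl[].method`, and `rcsb_entry_info.resolution_combined`; the helper names `entry_url`, `summarize_entry`, and `fetch_entry` are hypothetical, and the sample payload is made up for demonstration rather than copied from the database.

```python
import json
from urllib.request import urlopen

# Assumption: entry endpoint of the public RCSB PDB Data API.
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def entry_url(pdb_id: str) -> str:
    """Build the request URL for one PDB entry (IDs are normalized to upper case)."""
    return RCSB_ENTRY_URL.format(pdb_id=pdb_id.strip().upper())


def summarize_entry(payload: dict) -> dict:
    """Reduce an entry payload to title, experimental methods, and best resolution.

    The field names used here are assumptions about the JSON layout;
    missing fields simply yield None or an empty list.
    """
    resolutions = payload.get("rcsb_entry_info", {}).get("resolution_combined") or []
    methods = [e["method"] for e in payload.get("exptl", []) if e.get("method")]
    return {
        "title": payload.get("struct", {}).get("title"),
        "methods": methods,
        "resolution": min(resolutions) if resolutions else None,
    }


def fetch_entry(pdb_id: str, timeout: float = 10.0) -> dict:
    """Download one entry as a dict (requires network access)."""
    with urlopen(entry_url(pdb_id), timeout=timeout) as response:
        return json.load(response)


# Offline demonstration with an illustrative (made-up) payload.
sample_payload = {
    "struct": {"title": "Example enzyme structure"},
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [1.9]},
}
print(summarize_entry(sample_payload))

# With network access, the same summary could be obtained for a real entry:
# print(summarize_entry(fetch_entry("4hhb")))
```

Keeping the parsing step separate from the download means the summary logic can be tried on saved or hand-written payloads first, and then looped over many entry IDs to build a table for larger-scale analysis.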