Comput Biol Chem. 2022 May 10;98:107693. doi: 10.1016/j.compbiolchem.2022.107693. Online ahead of print.
Accurately identifying protein-metal ion ligand binding residues is key to studying protein function. Because binding residues are far outnumbered by non-binding residues, false positives are hard to eliminate from binding residue predictions, so identifying protein-metal ion ligand binding residues remains challenging. In this paper, the binding sites of seven metal ions (Ca2+, Mg2+, Zn2+, Fe3+, Mn2+, Cu2+ and Co2+) were studied. In addition to the commonly adopted parameters, amino acids and predicted secondary structure information, we introduced ten orthogonal properties as a parameter. These orthogonal properties are obtained by clustering 188 physical and chemical characteristics and can be used to describe three-dimensional structural information. With the optimized parameters, we used the Random Forest algorithm to predict ion ligand binding residues. The proposed method obtained good prediction results, with MCC values for Mg2+, Ca2+ and Zn2+ reaching 0.255, 0.254 and 0.540, respectively. Compared with the IonSeq method, the method developed in this paper has advantages in predicting the binding residues of some ions.
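The results above are reported as MCC (Matthews correlation coefficient) values. As a minimal illustrative sketch of why MCC suits this imbalanced setting, the following Python computes MCC and plain accuracy from confusion-matrix counts. The counts are hypothetical and are not taken from the paper; the function names are ours, and returning 0 when the denominator vanishes is a common convention that we assume here.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: MCC is taken as 0 when any margin is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical example: 1000 residues, 50 of which are binding residues.
# Predictor A labels every residue as non-binding.
all_negative = dict(tp=0, fp=0, tn=950, fn=50)
# Predictor B finds 20 of the 50 binding residues with 40 false positives.
some_signal = dict(tp=20, fp=40, tn=910, fn=30)

for name, counts in [("all-negative", all_negative), ("some-signal", some_signal)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
```

In this hypothetical case the all-negative predictor has the higher accuracy but an MCC of 0, whereas the predictor that actually recovers some binding residues has a clearly positive MCC. This is why MCC is generally a more informative measure than accuracy when non-binding residues dominate.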