We have developed an optimization-based method to predict the secondary structure of a target protein without the use of profile information. Hence, this method can be applied to proteins for which sequence alignment tools do not produce reliable profile information. The method combines two models: an α-helix prediction model, HELIOS (HELical prediction using Integer Optimization approacheS), and a β-strand prediction model, BEST-PRED (BEta STrand PREDiction). For α-helix prediction, a two-stage infeasibility minimization problem has been introduced. The first stage is a linear programming (LP) model for parameter estimation, while the second stage is an integer linear programming (ILP) model for helix prediction. The residues of a target protein are divided into four regions according to their putative proximity to the helix termini, and each residue's propensity to be in a helix, evaluated using overlapping nonapeptides centered on that residue, is compared with a pre-evaluated, residue-dependent threshold propensity. BEST-PRED for β-strand prediction has been introduced as an ILP model that maximizes a residue's propensity to be in a β strand. The protein is divided into overlapping pentapeptides. The β-strand propensity weight for the central residue is evaluated with a novel combination of Naïve Bayesian and first-order Markov models, which represent the physical nature of a β strand. In both models, mathematical constraints are introduced to ensure that the results are biologically meaningful. These constraints reflect the physical nature of the residues [5], along with the minimum and maximum secondary structure content [6]. A formulation of this kind not only provides the secondary structure prediction corresponding to the evaluated global minimum, but can also provide a rank-ordered list of the best solutions. 
Such a rank-ordered list can help identify which class is predicted most frequently for a given residue. Further, the formulation allows the user to add any form of prior knowledge about the secondary structure easily. This method was tested on a set of α, β and mixed α/β proteins, and the preliminary results are very encouraging. A Qα accuracy of 82% was obtained for purely α-helical proteins using HELIOS, while a Qβ accuracy of 78.9% was obtained for purely β proteins using BEST-PRED. These results compare very favorably with those of some of the standard secondary structure prediction servers.
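The windowing and rank-ordering ideas described above can be illustrated with a minimal sketch. It is not the HELIOS or BEST-PRED formulation: the function names, the per-residue weights, and the minimum segment length are hypothetical, and brute-force enumeration stands in for an ILP solver, which is only practical for very short sequences.

```python
from itertools import product


def overlapping_windows(sequence, width):
    """Return (center_index, window) pairs for every full window of odd width."""
    half = width // 2
    return [(i, sequence[i - half:i + half + 1])
            for i in range(half, len(sequence) - half)]


def segments_ok(assignment, min_len):
    """True if every run of 1s (one predicted segment) has at least min_len residues."""
    run = 0
    for bit in list(assignment) + [0]:  # trailing 0 closes a final run
        if bit:
            run += 1
        else:
            if 0 < run < min_len:
                return False
            run = 0
    return True


def ranked_assignments(weights, min_len=2, top=3):
    """Enumerate binary assignments, keep the feasible ones, rank by total weight.

    Assumption: exhaustive search replaces the integer program; weights[i] is a
    hypothetical propensity weight for residue i being in the structure.
    """
    scored = []
    for assignment in product((0, 1), repeat=len(weights)):
        if segments_ok(assignment, min_len):
            score = sum(w * y for w, y in zip(weights, assignment))
            scored.append((score, assignment))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Pentapeptide windows over a toy sequence (width 9 would give nonapeptides).
windows = overlapping_windows("ABCDEFG", 5)

# Hypothetical propensity weights for a five-residue stretch.
best = ranked_assignments([1.0, 2.0, -0.5, 1.5, 1.0], min_len=2, top=2)
```

Here `best` holds the two highest-scoring feasible assignments in rank order. In an ILP setting, a comparable ranked list is commonly obtained by re-solving the model after adding a constraint that excludes each solution already found.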
[1] Klepeis J.L. and Floudas C.A., 2003, ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three dimensional structures of proteins from the amino acid sequence, Biophysical Journal, 85, 2119-2146.
[2] Altschul S.F., Gish W., Miller W., Myers E.W. and Lipman D.J., 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, 25, 3389-3402.
[3] McGuffin L.J., Bryson K. and Jones D.T., 2000, The PSIPRED protein structure prediction server, Bioinformatics, 16, 404-405.
[4] Gassend B., O'Donnell C.W., Thiel W., Lee A., van Dijk M. and Devadas S., 2007, BMC Bioinformatics, 8, S3.
[5] Aurora R. and Rose G.D., 1998, Helix capping, Protein Science, 7, 21-38.
[6] Homaeian L., Kurgan L.A., Ruan J., Cios K.J. and Chen K., 2007, Prediction of protein secondary structure content for twilight zone sequences, Proteins, 69, 486-498.