Predicting α-helix and β-strand segments of globular proteins

Victor V. Solovyev, Asaf A. Salamov

Research output: Contribution to journal, Article, peer-review

Abstract

All current methods of protein secondary structure prediction are based on evaluating the state of a single residue. Although the best of them reach an accuracy of ~60-70%, for reliable prediction of tertiary structure it is more useful to predict the approximate locations of α-helix and β-strand segments, especially long ones. We have developed a simple method of protein secondary structure prediction that is oriented toward locating secondary structure segments. The method uses linear discriminant analysis to assign segments of a given amino acid sequence to a particular type of secondary structure, taking into account the amino acid composition of the internal parts of segments as well as of their terminal and adjacent regions. Four linear discriminant functions were constructed to recognize short and long α-helix and β-strand segments, respectively. These functions combine three characteristics: the hydrophobic moment, and the singlet and pair preferences of a segment for an α-helix or β-strand. The last two characteristics are calculated by summing the preference parameters of single residues and of residue pairs located in a segment and its adjacent regions. The final program, SSP, predicts all potential α-helices and β-strands and resolves overlaps between them. Overall three-state (α, β, c) prediction gives ~65.1% correctly predicted residues on 126 non-homologous proteins using the jackknife test procedure. Analysis of the results shows high prediction accuracy for long secondary structure segments: ~89% of α-helices longer than 8 residues and 71% of β-strands longer than 6 residues are correctly located, with probabilities of correct prediction of 0.82 and 0.78, respectively. This is important because the long segments mainly determine protein folding.
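The segment-scoring idea described in the abstract can be sketched in Python. This is a minimal sketch under several stated assumptions, not SSP itself: the Kyte-Doolittle scale stands in for whatever hydrophobicity values the paper used, the pair term counts only adjacent residues, and the preference tables, weights, flank width and length ranges in each `model` dictionary are hypothetical inputs. Overlaps are resolved greedily by descending score with a zero threshold, which is likewise an assumption. Each `model` plays the role of one of the four discriminant functions (short or long helix or strand).

```python
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: a stand-in only; the
# hydrophobicity scale used by SSP may differ.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg):
    """Eisenberg-style hydrophobic moment for a periodicity of angle_deg
    degrees per residue (about 100 for an alpha-helix, 160-180 for a strand)."""
    delta = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    c = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(s, c)


def singlet_preference(window, table):
    """Sum of per-residue preference parameters over a window."""
    return sum(table.get(aa, 0.0) for aa in window)


def pair_preference(window, table):
    """Sum of pair preference parameters. HYPOTHETICAL: only adjacent
    pairs (i, i+1) are counted here."""
    return sum(table.get(window[i:i + 2], 0.0) for i in range(len(window) - 1))


def ldf_score(seq, start, end, model):
    """Linear discriminant score of seq[start:end] for one segment type.
    Preference terms use a window extended by `flank` residues on each side
    so that terminal and adjacent regions contribute."""
    lo = max(0, start - model["flank"])
    hi = min(len(seq), end + model["flank"])
    window = seq[lo:hi]
    w0, w_mom, w_sing, w_pair = model["weights"]
    return (w0
            + w_mom * hydrophobic_moment(seq[start:end], model["angle"])
            + w_sing * singlet_preference(window, model["singlet"])
            + w_pair * pair_preference(window, model["pair"]))


def predict(seq, models):
    """Score every candidate segment for every (state, model) pair, keep
    positive scores, then accept segments greedily by descending score if
    they do not overlap an accepted one. Returns a string over H, E, C.
    HYPOTHETICAL: the zero threshold and greedy resolution are assumptions."""
    candidates = []
    for state, model in models:
        min_len, max_len = model["lengths"]
        for start in range(len(seq)):
            for length in range(min_len, max_len + 1):
                end = start + length
                if end > len(seq):
                    break
                score = ldf_score(seq, start, end, model)
                if score > 0:
                    candidates.append((score, start, end, state))
    labels = ["C"] * len(seq)
    for score, start, end, state in sorted(candidates, reverse=True):
        if all(lab == "C" for lab in labels[start:end]):
            labels[start:end] = [state] * (end - start)
    return "".join(labels)


def q3(predicted, observed):
    """Percentage of residues whose three-state label is correct."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)
```

An alternating hydrophobic/polar pattern such as `"IE"` scored at 180° gives a large moment, while a uniform segment such as `"AA"` gives a moment near zero; this periodicity is what the hydrophobic-moment term is meant to capture.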
Using the mean values of discriminant functions over the aligned sequences of homologous proteins, we achieved a prediction accuracy of 68.2%.
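Averaging discriminant values over homologous sequences can be sketched the same way. In this hypothetical version, `score_fn(seq, start, end)` is any per-sequence segment scorer (for example, a discriminant function), gaps are written as `-`, and a segment defined on alignment columns receives the mean score over the sequences that have at least one residue in those columns. How the paper mapped segments across the alignment is not stated in the abstract, so this mapping is an assumption.

```python
from statistics import mean


def column_to_residue(aligned):
    """Map each alignment column to the residue index in the ungapped
    sequence, or None where the sequence has a gap ('-')."""
    mapping, idx = [], 0
    for ch in aligned:
        if ch == "-":
            mapping.append(None)
        else:
            mapping.append(idx)
            idx += 1
    return mapping


def mean_alignment_score(alignment, col_start, col_end, score_fn):
    """Mean of score_fn over all aligned sequences for the segment spanning
    alignment columns [col_start, col_end). Sequences with only gaps in that
    span are skipped; -inf is returned if every sequence is skipped.
    ASSUMPTION: one plausible averaging scheme, not the paper's exact one."""
    scores = []
    for aligned in alignment:
        seq = aligned.replace("-", "")
        residues = [r for r in column_to_residue(aligned)[col_start:col_end]
                    if r is not None]
        if residues:
            # Score the ungapped stretch covered by these columns.
            scores.append(score_fn(seq, residues[0], residues[-1] + 1))
    return mean(scores) if scores else float("-inf")
```

For example, with a scorer that simply returns segment length, the alignment `["AAAA", "A-AA", "----"]` over columns 0-4 yields the mean of 4 and 3, since the all-gap sequence is skipped.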
Original language: English
Pages (from-to): 661-669
Number of pages: 9
Journal: Bioinformatics
Volume: 10
Issue number: 6
Publication status: Published - 1 Dec 1994
