ISSN: 0974-276X
William F Porto, Simone Maria-Neto, Diego O Nolasco and Octavio L Franco
Protein structures can provide functional evidence. Structural genomics efforts to identify the functions of hypothetical proteins have therefore advanced our understanding of biological systems. To this end, a new strategy for mining protein databases in search of candidates for function prediction is described here. The strategy was applied to Escherichia coli proteins deposited in NCBI's non-redundant database. Briefly, the data mining step selects small conserved hypothetical proteins that have no significant templates in the Protein Data Bank, have no transmembrane regions, and show similarity to eukaryotic proteins. Using this strategy, 12 protein sequences were selected for molecular modelling from a total of 13,306 conserved hypothetical E. coli sequences. Of these 12, only three could be modelled. The GI 488361128 model was similar to cupredoxins, the GI 281178323 model was similar to β-barrel proteins, and the GI 227886634 model showed structural similarities to lipid-binding proteins. However, only GI 227886634 seems to have a function related to that of the similar structures, since it was the only structure that kept its fold during the molecular dynamics simulation. The method described here may be useful for selecting hypothetical sequences as targets for in vitro and/or in vivo functional characterization.
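The selection criteria listed in the abstract can be read as a chain of filters over annotated sequence records. The following is a minimal sketch of that idea, not the authors' pipeline: the `Candidate` record, its field names, the `select_candidates` function and the length cutoff `MAX_LENGTH` are all assumptions made for illustration, since the abstract does not state how "small" was defined or which tools produced each annotation. The toy records are invented and do not correspond to real database entries.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical cutoff: the abstract says "small" but gives no length threshold.
MAX_LENGTH = 100


@dataclass(frozen=True)
class Candidate:
    """One conserved hypothetical protein with precomputed annotations (assumed fields)."""
    identifier: str              # sequence identifier
    length: int                  # number of residues
    has_pdb_template: bool       # True if a significant PDB template was found
    transmembrane_regions: int   # number of predicted transmembrane segments
    has_eukaryote_hit: bool      # True if similar to at least one eukaryotic protein


def select_candidates(records: Iterable[Candidate],
                      max_length: int = MAX_LENGTH) -> List[Candidate]:
    """Keep records that pass all four criteria named in the abstract."""
    return [
        r for r in records
        if r.length <= max_length            # small protein
        and not r.has_pdb_template           # no significant structural template
        and r.transmembrane_regions == 0     # no transmembrane regions
        and r.has_eukaryote_hit              # similarity to eukaryotic proteins
    ]


# Invented toy records, used only to exercise the filter.
toy_records = [
    Candidate("toy_1", 80, False, 0, True),    # passes every filter
    Candidate("toy_2", 350, False, 0, True),   # too long
    Candidate("toy_3", 90, True, 0, True),     # already has a PDB template
    Candidate("toy_4", 75, False, 2, True),    # has transmembrane regions
    Candidate("toy_5", 60, False, 0, False),   # no eukaryotic similarity
]

selected = select_candidates(toy_records)
```

In this sketch each criterion is a boolean or count computed upstream, so the filter itself stays trivial. In practice the annotations would come from separate similarity searches and transmembrane predictions, which are outside the scope of this example.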