Prediction and characterization of human ageing-related proteins by using machine learning

    Ageing has a major impact on human health and the economy, but its molecular basis (both regulation and mechanism) is still poorly understood. To date, more than three hundred genes, almost all of them protein-coding, have been linked to human ageing. Although individual ageing-related genes and small subsets of them have been studied intensively, analysis of these genes as a whole has been very limited. To fill this gap, we extracted 21,000 protein features for each human protein from various databases and used these data as input to state-of-the-art machine learning methods to classify human proteins as ageing-related or non-ageing-related. We found a simple classification model based on only 36 protein features, such as the “number of ageing-related interaction partners”, “response to oxidative stress”, “damaged DNA binding”, “rhythmic process” and “extracellular region”. The model's predicted values quantify the relevance of a given protein to the regulation or mechanisms of the human ageing process. Furthermore, we identified new candidate proteins with strong computational evidence for an important role in ageing. Some of them, such as Cytochrome b-245 light chain (CY24A) and Endoribonuclease ZC3H12A (ZC12A), have no previous ageing-associated annotations.

    Csaba Kerepesi, Bálint Daróczy, Ádám Sturm, Tibor Vellai, András Benczúr
    Scientific Reports, Vol 8, 4094 (2018).
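
The general idea in the abstract (many binary protein features, a small selected subset, and a per-protein relevance score) can be illustrated with a minimal sketch. This is not the method used in the paper: the smoothed log-odds weighting is a simple stand-in chosen for illustration, the function names `feature_weights`, `select_top` and `score` are hypothetical, and the toy proteins and labels are invented. Only the feature names are taken from the abstract, and how they are assigned to the toy proteins is arbitrary.

```python
# Toy illustration only: the rows and labels are invented, and the scoring
# rule below is a stand-in, not the model described in the paper.
from math import log


def feature_weights(rows, labels, smoothing=1.0):
    """Smoothed log-odds weight for each binary feature.

    A positive weight means the feature appears more often in rows
    labelled 1 (ageing-related) than in rows labelled 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    weights = {}
    for feature in {f for row in rows for f in row}:
        pos = sum(1 for row, y in zip(rows, labels) if y and feature in row)
        neg = sum(1 for row, y in zip(rows, labels) if not y and feature in row)
        p = (pos + smoothing) / (n_pos + 2 * smoothing)
        q = (neg + smoothing) / (n_neg + 2 * smoothing)
        weights[feature] = log(p / q)
    return weights


def select_top(weights, k):
    """Keep the k features with the largest absolute weight (ties by name)."""
    ranked = sorted(weights, key=lambda f: (-abs(weights[f]), f))
    return {f: weights[f] for f in ranked[:k]}


def score(row, model):
    """Sum the weights of the selected features present in one protein."""
    return sum(w for f, w in model.items() if f in row)


# Each row is the set of features present for one hypothetical protein.
rows = [
    {"response to oxidative stress", "damaged DNA binding"},
    {"response to oxidative stress", "rhythmic process"},
    {"extracellular region"},
    {"extracellular region", "rhythmic process"},
]
labels = [1, 1, 0, 0]  # invented labels: 1 = ageing-related, 0 = not

model = select_top(feature_weights(rows, labels), k=2)
new_protein = {"response to oxidative stress", "rhythmic process"}
print(sorted(model), score(new_protein, model))
```

In this toy setting, a higher score plays the role of the predicted value described in the abstract: a rough ranking of how strongly a protein's selected features resemble those of the labelled ageing-related examples.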