Haemophilia is an X-linked genetic disorder whose most common types, A and B, result from the absence or deficiency of protein factors VIII and IX, respectively. The severity of the disease depends on the underlying mutation. Available Machine Learning (ML) methods for predicting mutational severity have high time complexity and limited accuracy. In this study, a Haemophilia A patient mutation dataset containing 7784 mutations was encoded with the proposed Position-Specific Mutation (PSM) technique and with One-Hot Encoding (OHE) to predict disease severity. The encoded data were analyzed and used to train various ML algorithms to classify the mutation severity level. PSM outperformed OHE in both time efficiency and accuracy, improving training and prediction time by approximately 91 to 98% and 80 to 99%, respectively. Accuracy also improved when PSM was used with the different ML algorithms.
Singh Vikalp Kumar, Maurya Neha Shree, Mani Ashutosh, Yadav Rama Shankar
Factor VIII, Haemophilia, Machine learning, Mutation, One-hot encoding, Position specific mutation
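
As an illustration of the OHE baseline named in the abstract, the following is a minimal sketch of one-hot encoding a single amino-acid substitution. The residue alphabet, the function names, and the example substitution are assumptions made for illustration and are not taken from the study's dataset or code. The PSM encoding is not specified in the abstract, so it is not reproduced here.

```python
# Minimal sketch of one-hot encoding for one amino-acid substitution.
# All names and the example mutation are hypothetical, for illustration only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(residue):
    """Return a 20-element 0/1 list with a single 1 at the residue's index."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec


def encode_substitution_ohe(wild_type, mutant):
    """Concatenate the one-hot vectors of the wild-type and mutant residues."""
    return one_hot(wild_type) + one_hot(mutant)


# Hypothetical substitution (arginine to cysteine), not drawn from the dataset.
features = encode_substitution_ohe("R", "C")
print(len(features), sum(features))  # 40 features, only 2 of them non-zero
```

Each categorical value becomes a mostly zero vector, so the feature matrix widens quickly as more categorical fields are encoded. This may be one reason a more compact encoding trains and predicts faster, although the abstract does not state the cause.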