Abstract
Background: CRISPR/Cas9, a new generation of targeted gene-editing technology with low cost and simple operation, has been widely adopted in the field of gene editing. Erroneous cleavage at unintended sites, known as the off-target effect, is the biggest complication CRISPR/Cas9 faces in practical applications, because it can lead to unexpected gene-editing results. Accurately predicting CRISPR/Cas9 off-target effects is therefore an important task. Predicting off-target effects with machine learning is feasible, but most existing off-target prediction tools pay little attention to how the encoding of gene sequences affects prediction.
Methods: We compared three One-Hot-based encoding methods and combined the encoded gene sequence with four CRISPR/Cas9 off-target prediction tools to build an XGBoost ensemble model, designated XGBCRISPR. Grid search was used to find the parameters that give the best performance.
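As an illustration of the feature construction described above, the following is a minimal sketch rather than the XGBCRISPR implementation. The abstract does not specify the three encoding variants, so the plain One-Hot scheme, the function names (`one_hot`, `encode_pair`, `build_features`), and the choice to concatenate the guide and target encodings with four tool scores are all assumptions made for illustration.

```python
# Hypothetical sketch: plain One-Hot encoding of a guide/target pair,
# concatenated with scores from existing off-target tools.
# None of these names come from the paper.

BASES = "ACGT"


def one_hot(seq):
    """Return a flat list with four 0/1 values per base, in A, C, G, T order."""
    encoded = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.extend(1 if base == b else 0 for b in BASES)
    return encoded


def encode_pair(guide, target):
    """Concatenate the One-Hot vectors of a guide and a candidate off-target site."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return one_hot(guide) + one_hot(target)


def build_features(guide, target, tool_scores):
    """Append the scores of existing tools (assumed numeric) to the sequence encoding."""
    return encode_pair(guide, target) + list(tool_scores)


# Example with made-up 4-nt sequences and made-up tool scores.
row = build_features("ACGT", "ACGA", [0.1, 0.2, 0.3, 0.4])
```

Rows built this way could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`, with its parameters tuned by a grid search such as scikit-learn's `GridSearchCV`; the parameter grid itself is not given in the abstract.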
Results: Performance was compared with that of existing tools using ROC and PRC values. The experimental results show that the XGBCRISPR model outperforms the existing tools.
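For readers unfamiliar with these metrics, here is a small sketch of how such values are commonly computed, assuming that "ROC value" and "PRC value" refer to the areas under the ROC and precision-recall curves (the abstract does not say so explicitly). The function names and the toy labels and scores are hypothetical, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
# Hypothetical sketch: ROC AUC via pairwise comparison and
# average precision as a summary of the precision-recall curve.

def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy data: 1 marks a true off-target site, 0 a non-cleaved site.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(toy_labels, toy_scores)
ap = average_precision(toy_labels, toy_scores)
```

Off-target datasets are typically dominated by negatives, which is one reason a precision-recall summary is often reported alongside the ROC value.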
Conclusion: The new model achieves better prediction results than existing tools, and its accuracy could be improved further as more off-target scores become available.
Keywords: CRISPR/Cas9, off-target effects, machine learning, ensemble learning, XGBoost, XGBCRISPR.