ArXiv Preprint
Incident management for cloud services is a complex process involving several
steps and has a huge impact on both service health and developer productivity.
On-call engineers require a significant amount of domain knowledge and manual
effort for root causing and mitigation of production incidents. Recent advances
in artificial intelligence have resulted in state-of-the-art large language
models like GPT-3.x (both GPT-3.0 and GPT-3.5), which have been used to solve a
variety of problems ranging from question answering to text summarization. In
this work, we conduct the first large-scale study to evaluate the effectiveness of
these models for helping engineers root cause and mitigate production
incidents. We conduct a rigorous study at Microsoft on more than 40,000 incidents
and compare several large language models in zero-shot, fine-tuned, and
multi-task settings using semantic and lexical metrics. Lastly, our human
evaluation with actual incident owners shows the efficacy and future potential
of using artificial intelligence for resolving cloud incidents.
Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, Saravan Rajmohan
2023-01-10