Diffusion-based Time Series Data Imputation for Microsoft 365

Yang, Fangkai; Yin, Wenjie; Wang, Lu; Li, Tianci; Zhao, Pu; Liu, Bo; Wang, Paul; Qiao, Bo; Liu, Yudong; Björkman, Mårten; Rajmohan, Saravan; Lin, Qingwei; Zhang, Dongmei

arXiv:2309.02564 (cs)

[Submitted on 3 Aug 2023]

Abstract: Reliability is critical for large-scale cloud systems such as Microsoft 365. Cloud failures, such as disk and node failures, threaten service reliability and result in online service interruptions and economic loss. Existing work focuses on predicting cloud failures and taking proactive action before they happen. However, these approaches suffer from poor data quality, such as missing data during model training and prediction, which limits their performance. In this paper, we focus on improving data quality through data imputation. We propose Diffusion+, a sample-efficient diffusion model that efficiently imputes missing data based on the observed data. Our experiments and application practice show that the model improves the performance of the downstream failure prediction task.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.02564 [cs.DC]
	(or arXiv:2309.02564v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2309.02564

Submission history

From: Wenjie Yin
[v1] Thu, 3 Aug 2023 10:25:17 UTC (2,645 KB)
