Journal of Beijing University of Posts and Telecommunications ›› 2019, Vol. 42 ›› Issue (5): 62-68. doi: 10.13190/j.jbupt.2018-308
An Online Cluster Anomaly Job Prediction Method
XIE Li-xia, WANG Zi-ying
- School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
- Received: 2018-12-23
- Online: 2019-10-28
- Published: 2019-11-25
- About the author: XIE Li-xia (born 1974), female, professor, E-mail: lxxie@126.com
- Supported by: Civil Aviation Joint Research Fund of the National Natural Science Foundation of China (U1833107); National Science and Technology Major Project (2012ZX03002002); Fundamental Research Funds for the Central Universities (ZYGX2018028)
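Abstract: A method for computing dynamic features of a job's subtasks is designed first. Second, an improved gated recurrent unit (IGRU) neural network is proposed on the basis of these dynamic features. IGRU is then used to predict in real time, from the dynamic features, whether a task's termination state will be abnormal. Finally, abnormal jobs are retrieved according to the state correlation between a job and the running states of its subtasks, which completes the prediction of abnormal jobs. Experimental results show that online cluster anomaly job prediction clearly improves on other prediction methods in prediction sensitivity, error rate, precision, and prediction time, and that it has some applicability to safeguarding the security of cluster platforms.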
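The abstract gives neither the IGRU modifications nor the exact job/subtask correlation rule, so the following is only a minimal sketch under stated assumptions. It uses a standard gated recurrent unit cell (not IGRU) with random, untrained weights to turn a subtask's feature sequence into an abnormality score. It then assumes, purely for illustration, that a job is flagged when at least one of its subtasks scores at or above a threshold. All names (`GRUCell`, `predict_abnormal_prob`, `flag_abnormal_jobs`, `threshold`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


class GRUCell:
    """Standard GRU cell, NOT the paper's IGRU; weights are random and untrained."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        cols = input_dim + hidden_dim
        self.W_z = init(hidden_dim, cols)  # update gate weights
        self.W_r = init(hidden_dim, cols)  # reset gate weights
        self.W_h = init(hidden_dim, cols)  # candidate state weights
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]  # readout
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """One recurrence step: h_t = (1 - z) * h_{t-1} + z * h_candidate."""
        xh = list(x) + list(h)
        z = [sigmoid(a) for a in matvec(self.W_z, xh)]
        r = [sigmoid(a) for a in matvec(self.W_r, xh)]
        x_rh = list(x) + [ri * hi for ri, hi in zip(r, h)]
        h_cand = [math.tanh(a) for a in matvec(self.W_h, x_rh)]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def predict_abnormal_prob(self, feature_seq):
        """Run the cell over a subtask's feature sequence and return a score in (0, 1)."""
        h = [0.0] * self.hidden_dim
        for x in feature_seq:
            h = self.step(x, h)
        return sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)))


def flag_abnormal_jobs(task_probs, job_of_task, threshold=0.5):
    """Assumed rule: a job is flagged if any of its subtasks scores >= threshold."""
    return {job_of_task[t] for t, p in task_probs.items() if p >= threshold}


# Example with made-up feature values and task/job identifiers.
cell = GRUCell(input_dim=3, hidden_dim=4)
score = cell.predict_abnormal_prob([[0.1, 0.5, 0.2], [0.3, 0.7, 0.1]])
jobs = flag_abnormal_jobs(
    {"t1": 0.9, "t2": 0.2, "t3": 0.1},
    {"t1": "jobA", "t2": "jobA", "t3": "jobB"},
)
print(jobs)  # → {'jobA'}
```

In a real pipeline the weights would be trained on labeled task traces, and the any-subtask rule would be replaced by whatever job/subtask state correlation the paper defines.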
Cite this article
XIE Li-xia, WANG Zi-ying. An Online Cluster Anomaly Job Prediction Method[J]. Journal of Beijing University of Posts and Telecommunications, 2019, 42(5): 62-68.