[1]石 磊,李 天,高宇飞,等.基于机器学习的数据库系统参数优化方法综述[J].郑州大学学报(工学版),2024,45(01):1-11.[doi:10.13705/j.issn.1671-6833.2024.01.008]
 SHI Lei,LI Tian,GAO Yufei,et al.A Review of Machine Learning-Based Methods for Database Tuning[J].Journal of Zhengzhou University (Engineering Science),2024,45(01):1-11.[doi:10.13705/j.issn.1671-6833.2024.01.008]
点击复制

基于机器学习的数据库系统参数优化方法综述()
分享到:

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

卷:
45
期数:
2024年01期
页码:
1-11
栏目:
出版日期:
2024-01-19

文章信息/Info

Title:
A Review of Machine Learning-Based Methods for Database Tuning
作者:
石 磊 李 天 高宇飞 卫 琳 李翠霞 陶永才
1.郑州大学 网络空间安全学院,河南 郑州 450002;2. 郑州大学 计算机与人工智能学院,河南 郑州 450001;3. 嵩 山实验室,河南 郑州 450046
Author(s):
SHI Lei LI Tian GAO Yufei WEI Lin LI Cuixia TAO Yongcai
1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China; 2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China; 3. Songshan Laboratory, Zhengzhou 450046, China
关键词:
数据库系统 参数优化 性能优化 机器学习 强化学习 数据库运维
Keywords:
database system knobs tuning performance optimization machine learning reinforcement learning database maintenance
DOI:
10.13705/j.issn.1671-6833.2024.01.008
文献标志码:
A
摘要:
参数优化是影响数据库性能和适应性的关键技术,合理的参数配置对于保障数据库系统的高效运行至 关重要,但由于参数较多且参数间具有强关联性,传统参数优化方法难以在高维连续的参数空间中寻找最优配 置,机器学习的发展为解决这一难题带来新的机遇。 通过总结和分析相关工作,将已有工作按照发展时间和特 性分为专家决策、静态规则、启发式算法、传统机器学习方法和深度强化学习方法。 对数据库参数优化问题进行 定义,并说明启发式算法在参数优化问题上的局限性。 介绍基于传统机器学习的参数优化方法,包括随机森林、 支持向量机、决策树等,描述机器学习方法解决参数优化问题的一般流程并给出一般实现。 由于需要大量带标 注的数据,传统机器学习模型在适应性和调优能力等方面存在不足。 侧重介绍深度强化学习模型的工作原理, 定义参数优化问题与深度强化学习模型的映射关系,比较基于深度强化学习的相关工作对数据库性能提升、模 型训练时间和涉及的技术,描述基于深度神经网络构建和训练智能体的具体流程。 最后,总结已有工作的特点, 对当前机器学习在数据库参数优化方面的研究热点和发展方向进行展望,指出多粒度调优、自适应算法和自运 维是未来的研究趋势。
Abstract:
Knobs tuning is a key technology that affects the performance and adaptability of databases. However, traditional tuning methods have difficulty in finding the optimal configuration in high-dimensional continuous parameter spaces. The development of machine learning could bring new opportunities to solve this problem. By summarizing and analyzing relevant work, existing work was classified according to development time and characteristics, including expert decision-making, static rules, heuristic algorithms, traditional machine learning methods, and deep reinforcement learning methods. The database tuning problem was defined, and the limitations of heuristic algorithms in tuning problems were discussed. Traditional machine learning-based tuning methods were introduced, including random forest, support vector machine, decision tree, etc. The general process of using machine learning methods to solve tuning problems was described, and specific implementations were provided. The shortcomings of traditional machine learning models in adaptability and tuning capabilities were also discussed. The principles of deep reinforcement learning models were emphasized, and the mapping relationship between tuning problems and deep reinforcement learning models was defined. Recent relevant work on improving database performance, time consumption and model characteristics was introduced, and the process of building and training agents based on deep neural networks was described. Finally, the characteristics of existing work were summarized, and the research hotspots and development directions of machine learning in database tuning were outlined. Distributed scenarios, multi-granularity tuning, adaptive algorithms and self-maintenance capabilities were identified as future research trends

参考文献/References:

[1] 李国良, 周煊赫, 孙佶, 等. 基于机器学习的数据库技术综述[J]. 计算机学报, 2020, 43(11): 2019-2049.LI G L, ZHOU X H, SUN J, et al. A survey of machine learning based database techniques[J]. Chinese Journal of Computers, 2020, 43(11): 2019-2049.

[2] FRANÇOIS-LAVET V, HENDERSON P, ISLAM R, et al. An introduction to deep reinforcement learning[J]. Foundations and Trends in Machine Learning, 2018, 11(3/4): 219-354.
[3] 黄万伟, 郑向雨, 张超钦, 等. 基于深度强化学习的智能路由技术研究[J]. 郑州大学学报(工学版), 2023, 44(1): 44-51.HUANG W W, ZHENG X Y, ZHANG C Q, et al. Research on intelligent routing technology based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(1): 44-51.
[4] ZHENG C H, DING Z H, HU J L. Self-tuning perfor-mance of database systems with neural network[C]∥10th International Conference on Intelligent Computing. Piscataway: IEEE, 2014: 1-12.
[5] ZHANG X Y, WU H, LI Y, et al. Towards dynamic and safe configuration tuning for cloud databases[C]∥Proceedings of the 2022 International Conference on Management of Data. New York: ACM, 2022: 631-645.
[6] KUNJIR M, BABU S. Black or white how to develop an AutoTuner for memory-based analytics[C]∥Procee-dings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2020: 1667-1683.
[7] SCHNAITTER K, POLYZOTIS N. Semi-automatic index tuning: keeping DBAs in the loop[J]. Proceedings of the VLDB Endowment, 2012, 5(5): 478-489.
[8] FEKRY A, CARATA L, PASQUIER T, et al. To tune or not to tune in search of optimal configurations for data analytics[C]∥Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery &Data Mining. New York: ACM, 2020: 2494-2504.
[9] FEKRY A, CARATA L, PASQUIER T, et al. Tuneful: an online significance-aware configuration tuner for big data analytics[EB/OL]. (2020-01-22)[2023-08-01]. https:∥arxiv.org/abs/2001.08002.
[10] MARCO A, BERKENKAMP F, HENNIG P, et al. Virtual vs. real: trading off simulations and physical experiments in reinforcement learning with Bayesian optimization[C]∥2017 IEEE International Conference on Robo-tics and Automation (ICRA). Piscataway: IEEE, 2017: 1557-1563.
[11] ZHANG J, ZHOU K, LI G L, et al. CDBTune+: an efficient deep reinforcement learning-based automatic cloud database tuning system[J]. The VLDB Journal, 2021, 30(6): 959-987.
[12] TRUMMER I. DB-BERT: a database tuning tool that “reads the manual”[C]∥Proceedings of the 2022 International Conference on Management of Data. New York: ACM, 2022: 190-203.
[13] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24)[2023-08-01]. https:∥arxiv.org/abs/1810.04805.
[14] HAYDEN M. MySQLTuner needs you[EB/OL]. [2023-08-01]. https:∥github.com/major/MySQLTuner-perl.
[15] XU T Y, JIN L, FAN X P, et al. Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software[C]∥Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2015: 307-319.
[16] ZHU Y Q, LIU J X, GUO M Y, et al. BestConfig: tapping the performance potential of systems via automatic configuration tuning[C]∥Proceedings of the 2017 Symposium on Cloud Computing. New York: ACM, 2017: 338-350.
[17] Oracle. Oracle database online documentation 11g release2(11.2)[EB/OL]. [2023-08-01]. https:∥docs.oracle.com/cd/E11882_01/index.html.
[18] IBM. DB2 tuning overview[EB/OL]. [2023-08-01]. https:∥www.ibm.com/docs/en/sdse/6.4.0?topic=overview-db2-tuning.
[19] Microsoft. Tune applications and databases for perfor-mance in Azure SQL Database and Azure SQL Managed Instance[EB/OL]. [2023-08-01]. https:∥learn.microsoft.com/en-us/azure/azure-sql/database/performance-guidance?view=azuresql-mi.
[20] DUAN S Y, THUMMALA V, BABU S. Tuning database configuration parameters with iTuned[J]. Proceedings of the VLDB Endowment, 2009, 2(1): 1246-1257.
[21] VAN AKEN D, PAVLO A, GORDON G J, et al. Automatic database management system tuning through large-scale machine learning[C]∥Proceedings of the 2017 ACM International Conference on Management of Data. New York: ACM, 2017: 1009-1024.
[22] TAN J, ZHANG T Y, LI F F, et al. iBTune: individua-lized buffer tuning for large-scale cloud databases[J]. Proceedings of the VLDB Endowment, 2019, 12(10): 1221-1234.
[23] MAHGOUB A, WOOD P, GANESH S, et al. Rafiki: a middleware for parameter tuning of NoSQL datastores for dynamic metagenomics workloads[C]∥Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference. New York: ACM, 2017: 28-40.
[24] KANELLIS K, DING C, KROTH B, et al. LlamaTune: sample-efficient DBMS configuration tuning[EB/OL]. (2022-05-10)[2023-08-01]. https:∥arxiv.org/abs/2203.05128v1.[25] CEREDA S, VALLADARES S, CREMONESI P, et al. CGPTuner[J]. Proceedings of the VLDB Endowment, 2021, 14(8): 1401-1413.
[26] MAHGOUB A, MEDOFF A, KUMAR R, et al. OPTIMUSCLOUD: heterogeneous configuration optimization for distributed databases in the cloud[C]∥ 2020 USENIX Annual Technical Conference. Berkeley: USENIX Association, 2020: 189-203.
[27] LIMA M I V, DE FARIAS V A E, PRACIANO F D B S, et al. Workload-aware parameter selection and perfor-mance prediction for in-memory databases[C]∥ Brazilian Symposium on Bioinformatics. Brazil: SBC, 2018: 169-180.
[28] SUI Y N, GOTOVOS A, BURDICK J, et al. Safe exploration for optimization with Gaussian processes[J] Proceedings of Machine Learning Research, 2015, 37: 997-1005.
[29] GUNASEKARAN K P, TIWARI K, ACHARYA R. Deep learning based auto tuning for database management system[EB/OL]. (2023-05-24)[2023-08-01]. https:∥arxiv.org/abs/2304.12747.
[30] 沈忱, 邰凌翔, 彭煜玮.面向自动参数调优的动态负载匹配方法[J].计算机应用, 2021, 41(3): 657-661.SHEN C, TAI L X, PENG Y W. Dynamic workload matching method for automatic parameter tuning[J]. Journal of Computer Applications, 2021, 41(3): 657-661.
[31] ISHIHARA Y, SHIBA M. Dynamic configuration tuning of working database management systems[C]∥2020 IEEE 2nd Global Conference on Life Sciences and Technologies (LifeTech). Piscataway: IEEE, 2020: 393-397.
[32] SIEGMUND N, GREBHAHN A, APEL S, et al. Performance-influence models for highly configurable systems[C]∥Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2015: 284-294.
[33] NAIR V, MENZIES T, SIEGMUND N, et al. Using bad learners to find good configurations[C]∥Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017: 257-267.
[34] RODD S F, KULKARNI U P. Adaptive self-tuning techniques for performance tuning of database systems: a fuzzy-based approach[C]∥2013 2nd International Conference on Advanced Computing, Networking and Security. Piscataway: IEEE, 2013: 124-129.
[35] TAFT R, EL-SAYED N, SERAFINI M, et al. P-store: an elastic database system with predictive provisioning[C]∥ Proceedings of the 2018 International Conference on Management of Data. New York: ACM, 2018: 205-219.
[36] BAO L, LIU X, WANG F Z, et al. ACTGAN: automatic configuration tuning for software systems with generative adversarial networks[C]∥2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). Piscataway: IEEE, 2019: 465-476.
[37] HA H, ZHANG H Y. DeepPerf: performance prediction for configurable software with deep sparse neural network[C]∥2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). Piscataway: IEEE, 2019: 1095-1106.
[38] FANG X, ZOU Y, FANG Y G, et al. A query-level distributed database tuning system with machine learning[C]∥2022 IEEE International Conference on Joint Cloud Computing (JCC). Piscataway: IEEE, 2022: 29-36.
[39] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19)[2023-08-01]. https:∥arxiv.org/abs/1312.5602.
[40] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[41] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications[EB/OL].(2018-12-13)[2023-08-01]. https:∥arxiv.org/abs/1812.05905.
[42] LI G L, ZHOU X H, LI S F, et al. QTune: a query-aware database tuning system with deep reinforcement learning[J]. Proceedings of the VLDB Endowment, 2019, 12: 2118-2130.
[43] CAI B Q, LIU Y, ZHANG C, et al. HUNTER: an online cloud database hybrid tuning system for personalized requirements[C]∥Proceedings of the 2022 International Conference on Management of Data. New York: ACM, 2022: 646-659.
[44] ZHANG X Y, WU H, CHANG Z, et al. ResTune: resource oriented tuning boosted by meta-learning for cloud databases[C]∥Proceedings of the 2021 International Conference on Management of Data. New York: ACM, 2021: 2102-2114.
[45] 李琳, 李玉泽, 张钰嘉, 等. 基于多估计器平均值的深度确定性策略梯度算法[J]. 郑州大学学报(工学版), 2022, 43(2): 15-21.LI L, LI Y Z, ZHANG Y J, et al. Deep deterministic policy gradient algorithm based on mean of multiple estimators[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(2): 15-21.
[46] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]∥Proceedings of the 31st International Conference on International Conference on Machine Learning. New York: ACM, 2014: 1-9.
[47] LEE J, CHOI J, SEO S, et al. K2vTune: automatic database tuning with knob vector representation[EB/OL]. (2022-09-21)[2023-08-01]. https:∥ssrn.com/abstract=4225456.[48] JAMSHIDI P, SIEGMUND N, VELEZ M, et al. Transfer learning for performance modeling of configurable systems: an exploratory analysis[C]∥2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). Piscataway: IEEE, 2017: 497-508.
[49] ZHANG X Y, CHANG Z, LI Y, et al. Facilitating database tuning with hyper-parameter optimization[J]. Proceedings of the VLDB Endowment, 2022, 15(9): 1808-1821.
[50] PAVLO A, ANGULO G, ARULRAJ J, et al. Self-dri-ving database management systems[C]∥ Conference on Innovative Data Systems Research. Chaminade: CIDR, 2017: 1-6.
[51] LI G L, ZHOU X H, SUN J, et al. openGauss[J]. Proceedings of the VLDB Endowment, 2021, 14(12): 3028-3042.
[52] 李国良, 周煊赫. 轩辕: AI原生数据库系统[J]. 软件学报, 2020, 31(3): 831-844.LI G L, ZHOU X H. XuanYuan: an AI-native database systems[J]. Journal of Software, 2020, 31(3): 831-844.
[53] LI G L, ZHOU X H, CAO L. Machine learning for databases[C]∥Proceedings of the First International Confe-rence on AI-ML Systems. New York: ACM, 2021: 1-2.
[54] LI G L, ZHOU X H, CAO L. AI meets database: AI4DB and DB4AI[C]∥Proceedings of the 2021 International Conference on Management of Data. New York: ACM, 2021: 2859-2866.
[55] SHI J C, CONG G, LI X L. Learned index benefits: machine learning based index performance estimation[J]. Proceedings of the VLDB Endowment, 2022, 15: 3950-3962.[56] ZHAO X Y, ZHOU X H, LI G L. Automatic database knob tuning: a survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(12): 12470-12490.

相似文献/References:

[1]徐龙琴,刘双印..基于PSO-WSVR的短期水质预测模型研究[J].郑州大学学报(工学版),2013,34(03):112.[doi:10.3969/j.issn.1671-6833.2013.03.027]
 XU Long-qin,LIU Shuang-vin.Study of Short-term Water Quality Prediction Model Based on PSo-wSVR[J].Journal of Zhengzhou University (Engineering Science),2013,34(01):112.[doi:10.3969/j.issn.1671-6833.2013.03.027]
[2]张军,申俊敏,牛玺荣..加筋地基承载特性与参数优化研究[J].郑州大学学报(工学版),2013,34(05):12.[doi:10.3969/j.issn.1671-6833.2013.05.003]
 ZHANGJun,SHENJunmin,NIU Xi·rong.Study on Bearing Characteristics and Parameter Optimization of Reinforced Ground Soil[J].Journal of Zhengzhou University (Engineering Science),2013,34(01):12.[doi:10.3969/j.issn.1671-6833.2013.05.003]
[3]周荣敏,雷延峰..基于遗传算法的最小生成树的参数优化研究[J].郑州大学学报(工学版),2002,23(02):9.[doi:10.3969/j.issn.1671-6833.2002.02.003]

更新日期/Last Update: 2024-01-23