2025, Vol. 56, No. 1

[J/OL]. ArXiv, (2023-05-25)[2024-02-01]. https://arxiv.org/abs/2212.10560.
                [24] CHEN L, LI S, YAN J, et al. AlpaGasus: Training a better Alpaca with fewer data[J/OL]. ArXiv,
                     (2024-02-13)[2024-02-14]. https://arxiv.org/abs/2307.08701.
                [25] WU T, TERRY M, CAI C J. AI chains: Transparent and controllable human-AI interaction by chaining large
                     language model prompts[C]//Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems.
                     New Orleans, 2022.
                [26] LI S, LIU H, BIAN Z, et al. Colossal-AI: A unified deep learning system for large-scale parallel training[C]//
                     Proceedings of the 52nd International Conference on Parallel Processing. Salt Lake City, 2023.
                [27] BAI J Z, BAI S, CHU Y F, et al. Qwen technical report[J/OL]. ArXiv, (2023-09-28)[2023-11-26].
                     https://arxiv.org/abs/2309.16609.
                [28] WANG Y X, SUN Q X, HE S C. M3E: Moka massive mixed embedding model[EB/OL]. (2023-06-24)[2023-11-23].
                     https://huggingface.co/moka-ai/m3e-base.
                [29] JOHNSON J, DOUZE M, JEGOU H. Billion-scale similarity search with GPUs[J]. IEEE Transactions on Big
                     Data, 2019, 7(3): 535-547.
                [30] DU Z, QIAN Y, LIU X, et al. GLM: General language model pretraining with autoregressive blank infilling
                     [J/OL]. ArXiv, (2022-05-17)[2023-11-23]. https://arxiv.dosf.top/abs/2103.10360.
                [31] YANG A Y, XIAO B, WANG B N, et al. Baichuan 2: Open large-scale language models[J/OL]. ArXiv,
                     (2023-09-19)[2023-11-27]. https://arxiv.org/abs/2309.10305.




                          Grouting works knowledge service system based on large language model
                   ZHANG Tianhong¹, WANG Xiaoling¹, YU Hongling², WANG Jiajun¹, SU Zhe¹, ZHANG Jun¹
                 (1. State Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin  300072, China;
                         2. College of Water Resources and Civil Engineering, China Agricultural University, Beijing  100091, China)


                  Abstract: Acquiring professional information in grouting engineering currently relies on manual processing of
                  diverse texts, which incurs high labor costs and low efficiency. A knowledge service system based on a large
                  language model (LLM) is therefore needed. Building such a system hinges on two requirements: supplying
                  high-quality text for fine-tuning the LLM, and managing the timeliness and information-security risks inherent
                  in project-specific engineering texts. For the first requirement, a hybrid-strategy dataset construction method
                  is proposed on the basis of general grouting construction specifications. By incorporating a self-checking
                  chain of thought and scoring strategies, it overcomes the quality limitations of traditional data generation and
                  supplies the high-quality data needed to fine-tune the LLM. For the second requirement, LangChain is employed
                  to build a retrieval-augmented generation framework for grouting engineering. An embedded local knowledge base
                  keeps project-specific grouting texts isolated from the model, safeguarding information security, and a staged
                  update approach meets the timeliness requirements of these texts. In domain-specific tests, the Qwen-7B-Grout
                  model, fine-tuned with the proposed method, achieves 100% accuracy on grouting-specific true/false (judgment)
                  questions and 80% accuracy on fill-in-the-blank questions. The proposed system supports both general grouting
                  knowledge Q&A and efficient retrieval-augmented generation over engineering documents. In conclusion, it
                  significantly improves production efficiency, reduces labor costs, and offers new intelligent assistance for
                  grouting design and construction management.
                  Keywords: grouting works knowledge service; mixed-strategy data generation; large language model;
                  retrieval-augmented generation; grouting works
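The abstract names self-checking and scoring as the quality controls of the hybrid-strategy dataset, but it does not give the rubric or the cut-off. As a hedged illustration only, the sketch below assumes each generated question-answer pair carries a boolean self-check outcome and a grader score on a 0-5 scale, and keeps a pair only if it passed the self-check and scored at or above a threshold (4.5 here, an assumed value in the spirit of the score-threshold filtering of AlpaGasus [24]). The `QAPair` schema, its fields, and `filter_pairs` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """One generated question-answer pair (hypothetical schema)."""
    question: str
    answer: str
    score: float             # grader score, assumed to lie on a 0-5 scale
    self_check_passed: bool  # assumed outcome of a self-checking chain-of-thought pass


def filter_pairs(pairs: list[QAPair], threshold: float = 4.5) -> list[QAPair]:
    """Keep pairs that passed the self-check AND scored at or above the threshold."""
    return [p for p in pairs if p.self_check_passed and p.score >= threshold]


if __name__ == "__main__":
    candidates = [
        QAPair("q1", "a1", 4.8, True),   # kept
        QAPair("q2", "a2", 3.0, True),   # dropped: low score
        QAPair("q3", "a3", 4.9, False),  # dropped: failed self-check
    ]
    print([p.question for p in filter_pairs(candidates)])
```

Applying both filters conjunctively means a high score cannot rescue a pair whose self-check failed; a real pipeline might instead weight the two signals.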
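The abstract lists the components of the retrieval-augmented generation framework (LangChain, an embedded local knowledge base, staged updates) without showing how they fit together. The following framework-free sketch of the retrieval step rests on stated assumptions: a toy character-bigram embedding stands in for a real embedding model such as M3E [28], and exhaustive cosine-similarity search stands in for a vector index such as Faiss [29]. `LocalKnowledgeBase`, `add_documents`, `retrieve`, and `build_prompt` are hypothetical names, not LangChain APIs. Staged updating is modeled, as one plausible reading, by appending new batches without re-embedding earlier chunks.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: character-bigram counts."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LocalKnowledgeBase:
    """In-memory index: project texts stay local and are never trained into the model."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[Counter] = []

    def add_documents(self, docs: list[str]) -> None:
        """Staged update: append a new batch without re-embedding existing chunks."""
        for d in docs:
            self.chunks.append(d)
            self.vectors.append(embed(d))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: cosine(q, self.vectors[i]),
                        reverse=True)
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the fine-tuned model."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{ctx}\nQuestion: {query}"


if __name__ == "__main__":
    kb = LocalKnowledgeBase()
    kb.add_documents(["cement grout mix ratio", "drilling log for borehole"])  # stage 1
    kb.add_documents(["curtain grouting pressure record"])                     # stage 2
    print(build_prompt("grout mix ratio", kb.retrieve("grout mix ratio", k=1)))
```

Keeping the texts in a retrievable index rather than in model weights is what lets project documents be updated or withdrawn without re-fine-tuning.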


                                                                            (Editors in charge: YIN Jing, HAN Kun)




