With the development of natural language processing, information extraction techniques have advanced considerably. In practical applications, however, high annotation requirements, large training costs, and the difficulty of context understanding have kept the barrier to deployment in private domains high. This paper proposes a prompt-enhancement-based LLM information extraction algorithm (LLM-IE Based on Prompt Enhancement), which reformulates text information extraction as a text generation task and then applies structured parsing to the generated text to produce the extraction results. The method was tested and validated on three self-built datasets covering entities, relations, and events. Under few-shot conditions, prompt enhancement elicits the model's information extraction capability and approximates the effect of model fine-tuning, while also improving accuracy and recall over other mainstream information extraction models.
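The generate-then-parse pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the prompt template, field names, and the `parse_generation` helper are all hypothetical, and a real system would send the assembled prompt to an LLM instead of using the mock output shown here.

```python
import json

def build_prompt(text, schema, examples):
    """Assemble an extraction prompt: a task instruction, a few in-context
    examples (the 'prompt enhancement'), and the target text.
    Hypothetical template, not the paper's exact one."""
    lines = ["Extract the following fields as JSON: " + ", ".join(schema)]
    for ex_text, ex_json in examples:
        lines.append(f"Text: {ex_text}\nJSON: {json.dumps(ex_json, ensure_ascii=False)}")
    lines.append(f"Text: {text}\nJSON:")
    return "\n\n".join(lines)

def parse_generation(generated, schema):
    """Structured parsing of the model's generated text back into a dict;
    unparseable output or missing fields default to None."""
    try:
        obj = json.loads(generated)
    except json.JSONDecodeError:
        return {k: None for k in schema}
    return {k: obj.get(k) for k in schema}

# Demo with a mock generation (a real system would call an LLM here).
schema = ["person", "organization", "event"]
examples = [("Alice joined Acme in 2020.",
             {"person": "Alice", "organization": "Acme", "event": "joined"})]
prompt = build_prompt("Bob left Globex last year.", schema, examples)
mock_output = '{"person": "Bob", "organization": "Globex", "event": "left"}'
result = parse_generation(mock_output, schema)
print(result)
```

Falling back to `None` on malformed generations keeps the pipeline robust, since generative models do not always emit valid JSON.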
Current information extraction tasks rely mainly on large language models (LLMs), yet bidding documents contain extensive domain-specific terminology for which such models lack prior knowledge, leading to inefficient fine-tuning and poor extraction performance. Moreover, the model's extraction and generalization performance depends heavily on the quality of the prompt and the way the prompt template is constructed. To address these problems, a prompt-learning-based tender information extraction method (TIEPL) is proposed. First, a prompt-learning approach to generative information extraction is used to inject domain knowledge into the LLM, unifying optimization across the pre-training and fine-tuning stages. Second, with the LoRA (Low-Rank Adaptation) fine-tuning method as the framework, a separate prompt-training bypass is designed, together with a keyword prompt template for the bidding scenario, thereby strengthening the bidirectional association between information extraction and prompting. Experimental results on a self-built tendering-and-bidding dataset show that, compared with the second-best UIE (Universal Information Extraction) method, TIEPL improves ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation) and BLEU-4 (BiLingual Evaluation Understudy) by 1.05 and 4.71 percentage points respectively, generating more accurate and complete extraction results and validating the proposed method's effectiveness in improving the accuracy and generalization of tender information extraction.
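The LoRA mechanism that TIEPL builds on adds a trainable low-rank bypass to a frozen weight matrix. The sketch below shows only that core idea, y = xWᵀ + x·AᵀBᵀ, with illustrative dimensions; it is not the TIEPL prompt-training bypass itself, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, scale=1.0):
    # LoRA forward pass: frozen path plus low-rank bypass.
    # Only A and B (d*r + r*d parameters) would be updated in training,
    # instead of the full d*d matrix W.
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapted model starts out identical to
# the frozen model, so fine-tuning begins from the pretrained behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

Zero-initializing `B` is the standard LoRA choice: the bypass contributes nothing at step zero and is learned incrementally during fine-tuning.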