Large language models (LLMs) have shown outstanding performance in language understanding, text generation, machine translation, and related areas, and have become a focal point of natural language processing research. However, their enormous parameter counts and complex structures make training and deployment extremely costly in computation and time, so compressing LLMs has become a key research direction. Within model compression, model pruning is a core technique and plays an increasingly important role in compressing LLMs. To examine in detail how pruning algorithms perform on LLMs, this paper classifies them according to whether they adopt a structured or an unstructured pruning strategy, analyzes and categorizes the mainstream algorithms of recent years, summarizes the research ideas and characteristics of each class, and finally discusses future directions for LLM pruning methods.
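To make the structured/unstructured taxonomy concrete, the minimal PyTorch-style sketch below contrasts the two strategies on a single weight matrix. All tensor names and the 50% sparsity ratio are illustrative assumptions, not details from any surveyed algorithm.

import torch

# Hypothetical weight matrix of one linear layer (names are illustrative only).
W = torch.randn(8, 8)

# Unstructured pruning: zero out individual weights with the smallest magnitudes.
# The sparsity pattern is irregular, so dense hardware gains little speedup.
threshold = W.abs().flatten().kthvalue(int(0.5 * W.numel())).values
W_unstructured = W * (W.abs() > threshold).float()

# Structured pruning: remove whole rows (e.g., entire neurons), ranked by
# their L2 norm. The resulting matrix is genuinely smaller and stays dense.
row_norms = W.norm(dim=1)                      # importance score per row
keep = row_norms.topk(W.size(0) // 2).indices  # keep the top half of rows
W_structured = W[keep]

print(W_unstructured.shape, W_structured.shape)  # (8, 8) sparse vs. (4, 8) dense

Unstructured pruning typically preserves accuracy better at a given sparsity, while structured pruning yields direct memory and latency savings; this trade-off underlies the classification used in the survey.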
End-to-end object detection with Transformers (DETR) successfully established the Transformer architecture as a paradigm in object detection. Its end-to-end detection pipeline and set-prediction formulation have made it one of the most popular network architectures in recent years, and a wealth of work has built on and improved DETR. However, DETR and its variants require substantial memory and computational resources, and their vast parameter counts hinder model deployment. To address this issue, a greedy pruning (GP) algorithm is proposed and applied to the denoising variant DN-DETR, eliminating redundant parameters in its Transformer architecture. Considering the different roles of the multi-head attention (MHA) module and the feed-forward network (FFN) module in the Transformer architecture, a modular greedy pruning (MGP) algorithm is further proposed; it treats the two modules separately and applies the optimal strategy and parameters to each. The effectiveness of the proposed algorithms is validated on the COCO 2017 dataset: the model obtained with MGP reduces parameters by 49% and floating point operations (FLOPs) by 44% compared to the Transformer architecture of DN-DETR, while its mean average precision (mAP) rises from 44.1% to 45.3%.
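The abstract does not give the GP/MGP implementation, so the following is only a generic greedy-pruning sketch under stated assumptions: each candidate is a function that removes one prunable unit (such as an MHA head or an FFN channel) from a model copy, and each step commits the single removal that least degrades a validation score. The mha_head_candidates, ffn_channel_candidates, and eval_fn names in the usage comments are hypothetical.

import copy

def greedy_prune(model, candidates, evaluate, budget):
    """Generic greedy pruning sketch (not the paper's released code).

    candidates: list of functions; each takes a model and returns it with
                one prunable unit (e.g. an MHA head or FFN channel) removed.
    evaluate:   callable scoring a model on validation data (higher is better).
    budget:     number of units to remove.
    """
    for _ in range(budget):
        # Tentatively remove each remaining candidate and keep the removal
        # that hurts the validation score the least.
        scores = [evaluate(remove(copy.deepcopy(model))) for remove in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        model = candidates.pop(best)(model)  # commit the least harmful removal
    return model

# Modular usage in the spirit of MGP: prune MHA heads and FFN channels
# separately, each with its own budget (candidate lists are hypothetical).
# model = greedy_prune(model, mha_head_candidates, eval_fn, budget_mha)
# model = greedy_prune(model, ffn_channel_candidates, eval_fn, budget_ffn)

Running the loop once per module, as in the commented usage, reflects the abstract's key idea that MHA and FFN tolerate different amounts of pruning and therefore deserve separate strategies and budgets.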