This paper proposes an online learning algorithm for concept drift in streaming data. To improve prediction speed and accuracy, it introduces a multi-step prediction regression ensemble model and describes in detail a sample resampling procedure, combined with a clustering algorithm, that addresses the high dimensionality and large scale of streaming data. The resampled samples are fed into a sliding-window-based online adaptive framework, which together with the multi-step prediction regression model forms the proposed online learning algorithm; the algorithm can identify and handle concept drift in a timely manner. The paper also gives a statistical basis for concept drift, which supports the accuracy of the algorithm. For intersection traffic flow data and website pageview data, the paper identifies the types of concept drift involved and introduces a Boolean factor for sudden drift, which effectively reduces the adverse effects of sudden drift. In the case-study evaluation, the proposed method performs well in both accuracy and stability.
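As a rough illustration of how a sliding-window framework can flag drift, the following minimal Python sketch compares recent prediction errors against a reference window. It is an assumption-laden stand-in, not the paper's method: the class name `SlidingWindowDriftMonitor`, the z-score test, and the `window_size` and `threshold` parameters are all hypothetical, and the abstract does not specify the actual statistical test or the form of the Boolean factor.

```python
from collections import deque
from statistics import mean, stdev


class SlidingWindowDriftMonitor:
    """Hypothetical monitor: flags drift when recent errors rise above a reference window."""

    def __init__(self, window_size=20, threshold=3.0):
        self.window_size = window_size
        self.threshold = threshold  # assumed z-score cutoff, not taken from the paper
        self.reference = deque(maxlen=window_size)  # errors under the current concept
        self.recent = deque(maxlen=window_size)     # most recent errors

    def update(self, error):
        """Add one absolute prediction error; return True if drift is flagged."""
        # Fill the reference window first.
        if len(self.reference) < self.window_size:
            self.reference.append(error)
            return False
        # Then slide the recent window over incoming errors.
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        ref_mean = mean(self.reference)
        ref_std = stdev(self.reference) or 1e-12  # guard against zero spread
        # z-score of the recent mean error relative to the reference window
        z = (mean(self.recent) - ref_mean) / (ref_std / self.window_size ** 0.5)
        if z > self.threshold:
            # Adapt: the recent window becomes the new reference.
            self.reference = deque(self.recent, maxlen=self.window_size)
            self.recent.clear()
            return True
        return False
```

In use, a caller would pass each new absolute prediction error to `update` and retrain or reweight the regression model whenever it returns `True`. A one-sided test is used here because only a rise in error is treated as evidence of drift in this sketch.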
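In online learning with Support Vector Data Description (SVDD), the number of support vectors grows linearly with the sample size, which causes the model update time to grow nonlinearly. To address this problem, this paper proposes an online SVDD learning method based on support vector reduction (R-SVDD). By performing support vector reduction, the algorithm controls the number of support vectors during online learning, which gives it faster and more stable model update times than other SVDD algorithms and makes it suitable for classifying large-scale data. The paper first explains the principle of support vector reduction and then presents the online R-SVDD algorithm. Experimental results on one-class and multi-class datasets show that, compared with SVDD, R-SVDD achieves faster learning while maintaining classification accuracy.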
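For context, the standard SVDD formulation shows why the number of support vectors drives the cost. The block below is the textbook formulation, not the R-SVDD reduction step, which the abstract does not specify; the symbols ($R$, $a$, $\xi_i$, $C$, $\alpha_i$, $\phi$, $K$, $S$) are conventional notation assumed here rather than taken from the abstract.

```latex
\[
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^2 + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n
\end{aligned}
\]

\[
a = \sum_{i \in S} \alpha_i\,\phi(x_i), \qquad S = \{\, i : \alpha_i > 0 \,\}
\]

\[
d^2(z) = K(z,z) - 2\sum_{i \in S}\alpha_i K(x_i, z) + \sum_{i \in S}\sum_{j \in S}\alpha_i\alpha_j K(x_i, x_j)
\]
```

Here $S$ is the set of support vectors. Evaluating the squared distance $d^2(z)$ of a new sample to the center takes a number of kernel evaluations linear in $|S|$, and the constant double sum is quadratic in $|S|$ whenever it has to be recomputed after an update. Under this formulation, bounding $|S|$ bounds the per-update cost.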