Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model

Abstract

Short-video platforms have gained immense popularity, captivating millions, if not billions, of users globally. Recently, researchers have highlighted the significance of analyzing short-video propagation, which typically involves discovering commercial value, public opinion, and user behavior. This paper proposes a new Short-video Propagation Influence Rating (SPIR) task and aims to advance SPIR from both the dataset and method perspectives. First, we propose a new Cross-platform Short-Video (XS-Video) dataset, which aims to provide a large-scale, real-world short-video propagation network across various platforms to facilitate research on short-video propagation. Our XS-Video dataset includes 117,720 videos, 381,926 samples, and 535 topics across the 5 biggest Chinese platforms, annotated with propagation influence levels from 0 to 9. To the best of our knowledge, this is the first large-scale short-video dataset that contains cross-platform data or provides all of the following: views, likes, shares, collects, fans, comments, and comment content. Second, we propose a Large Graph Model (LGM) named NetGPT, based on a novel three-stage training mechanism, to bridge heterogeneous graph-structured data with the powerful reasoning ability and knowledge of Large Language Models (LLMs). NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short videos. Comprehensive experimental results on our XS-Video dataset, evaluated by both classification and regression metrics, indicate the superiority of our method for SPIR.