RECALL-MM: A Multimodal Dataset of Consumer Product Recalls for Risk Analysis using Computational Methods and Large Language Models
Abstract
Product recalls provide valuable insights into potential risks and hazards within the engineering design process, yet their full potential remains underutilized. In this study, we curate data from the United States Consumer Product Safety Commission (CPSC) recalls database to develop a multimodal dataset, RECALL-MM, that informs data-driven risk assessment using historical information, and augment it using generative methods. Patterns in the dataset highlight specific areas where improved safety measures could have significant impact. We extend our analysis by demonstrating interactive clustering maps that embed all recalls into a shared latent space based on recall descriptions and product names. Leveraging these data-driven tools, we explore three case studies to demonstrate the dataset's utility in identifying product risks and guiding safer design decisions. The first two case studies illustrate how designers can visualize patterns across recalled products and situate new product ideas within the broader recall landscape to proactively anticipate hazards. In the third case study, we extend our approach by employing a large language model (LLM) to predict potential hazards based solely on product images. This demonstrates the model's ability to leverage visual context to identify risk factors, revealing strong alignment with historical recall data across many hazard categories. However, the analysis also highlights areas where hazard prediction remains challenging, underscoring the importance of risk awareness throughout the design process. Collectively, this work aims to bridge the gap between historical recall data and future product safety, presenting a scalable, data-driven approach to safer engineering design.
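The idea of situating a new product idea within the recall landscape can be illustrated with a minimal sketch. The abstract does not state which embedding model or clustering method is used, so this example substitutes a plain bag-of-words vector with cosine similarity as a stand-in for a learned text embedding. The records in `RECALLS` and the helper names `embed`, `cosine`, and `nearest_recalls` are hypothetical and are not drawn from the CPSC database or the RECALL-MM dataset.

```python
import math
from collections import Counter

# Hypothetical toy records (product name, recall description).
# These are illustrative only and are NOT taken from the CPSC database.
RECALLS = [
    ("space heater", "heater can overheat posing fire hazard"),
    ("toy rattle", "small parts can detach posing choking hazard"),
    ("hoverboard", "battery pack can overheat posing fire hazard"),
]


def embed(text):
    """Bag-of-words term counts; an assumed stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_recalls(idea, k=2):
    """Return the names of the k recalled products most similar to a new product idea."""
    query = embed(idea)
    scored = [
        (cosine(query, embed(name + " " + description)), name)
        for name, description in RECALLS
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(nearest_recalls("electric scooter with battery pack", k=1))
```

In this toy setting, a designer's free-text product idea is compared against the combined name and description of each recall, and the closest recalls suggest which hazards may be worth anticipating. A real pipeline would presumably replace `embed` with a sentence-embedding model and add a dimensionality-reduction step to produce the interactive maps.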