arXiv

HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment

Zhichao Liao, Xiaokun Liu, Wenyu Qin, Qingyu Li, Qiulin Wang, Pengfei Wan, Di Zhang, Long Zeng, Pingfa Feng

Abstract

Image Aesthetic Assessment (IAA) is a long-standing and challenging research task. However, one of its subtasks, Human Image Aesthetic Assessment (HIAA), has been scarcely explored, even though HIAA is widely used in social media, AI workflows, and related domains. To bridge this research gap, our work pioneers a holistic framework tailored for HIAA. Specifically, we introduce HumanBeauty, the first dataset purpose-built for HIAA, which comprises 108K high-quality human images with manual annotations. To support comprehensive and fine-grained HIAA, 50K of these images were manually collected through a rigorous curation process and annotated using our novel 12-dimensional aesthetic standard, while the remaining 58K, which carry overall aesthetic labels, were systematically filtered from public datasets. Building on the HumanBeauty dataset, we propose HumanAesExpert, a powerful Vision Language Model for the aesthetic evaluation of human images. We design an Expert head that incorporates human knowledge of aesthetic sub-dimensions and works jointly with the Language Modeling (LM) and Regression heads. This design enables our model to excel at both overall and fine-grained HIAA. Furthermore, we introduce MetaVoter, which aggregates the scores from all three heads to balance the strengths of each head and thereby improve assessment precision. Extensive experiments demonstrate that our HumanAesExpert models deliver significantly better HIAA performance than other state-of-the-art models. Our datasets, models, and code are publicly released to advance the HIAA community. Project webpage: https://humanaesexpert.github.io/HumanAesExpert/
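The abstract states that MetaVoter aggregates the overall scores produced by the LM, Regression, and Expert heads, but it does not say how. As a minimal sketch under that gap, the example below assumes the simplest plausible aggregator: a softmax-weighted average of the three scores, where the per-head logits stand in for learned voter parameters. The names `HeadScores`, `softmax`, `meta_vote`, and `logits` are hypothetical and are not taken from the HumanAesExpert code; the actual MetaVoter may use a different, learned architecture.

```python
import math
from dataclasses import dataclass


@dataclass
class HeadScores:
    """Overall aesthetic scores from the three heads for one image (hypothetical structure)."""
    lm: float          # score parsed from the Language Modeling head's text output
    regression: float  # score predicted by the Regression head
    expert: float      # overall score predicted by the Expert head


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_vote(scores: HeadScores, logits=(0.0, 0.0, 0.0)) -> float:
    """Combine the three head scores with softmax-normalized weights.

    ASSUMPTION: `logits` stand in for learned voter parameters; this is an
    illustrative weighted-average voter, not the paper's MetaVoter.
    With all-zero logits the result is the plain mean of the three scores.
    """
    weights = softmax(list(logits))
    values = [scores.lm, scores.regression, scores.expert]
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    s = HeadScores(lm=0.6, regression=0.7, expert=0.8)
    print(meta_vote(s))                    # equal weights: mean of the three scores
    print(meta_vote(s, (0.0, 2.0, 1.0)))   # weights tilted toward the Regression head
```

Equal logits reduce the voter to a plain mean, while one dominant logit recovers that single head's score; a learned aggregator would sit between these extremes, which is one way to read the abstract's claim that MetaVoter balances the capabilities of each head.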