Using Machine Learning for Lunar Mineralogy-I: Hyperspectral Imaging of Volcanic Samples

Abstract

This study examines the mineral composition of volcanic samples analogous to lunar materials, focusing on olivine and pyroxene. Using hyperspectral imaging from 400 to 1000 nm, we created data cubes to analyze the reflectance characteristics of samples from Vulcano, a volcanically active island in the Aeolian Archipelago, north of Sicily, Italy. We divided the imaged samples into nine regions of interest and analyzed the spectral data of each region. We applied several unsupervised clustering algorithms to classify the spectral profiles: K-Means, Hierarchical Clustering, Gaussian Mixture Models (GMM), and Spectral Clustering. Principal Component Analysis (PCA) revealed distinct spectral signatures associated with specific minerals, which made those minerals identifiable. Clustering performance varied by region. K-Means achieved the highest silhouette score, 0.47, whereas GMM performed poorly, with a score of only 0.25. Non-negative Matrix Factorization (NMF) helped identify similarities between the clusters produced by the different methods and the reference spectra for olivine and pyroxene. Hierarchical clustering proved to be one of the most reliable techniques, reaching 94% similarity with the olivine spectrum in one sample, whereas GMM showed notable variability. Overall, hierarchical clustering and K-Means both yielded lower errors across the full set of measurements, and K-Means performed best in estimated dispersion and clustering. GMM, by contrast, showed a higher root mean square error (RMSE) than the other models. The RMSE analysis confirmed K-Means as the most consistent algorithm across all samples. It also suggests that olivine predominates over pyroxene in the Vulcano region. This predominance may be linked to formation conditions similar to volcanic processes on the Moon, where olivine-rich compositions are common in ancient lava flows and impact melt rocks.
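The abstract does not spell out the processing chain, so the following is only a minimal sketch of one way the steps could fit together. It assumes scikit-learn and NumPy. The sketch unfolds a hyperspectral cube into a pixels × bands matrix and reduces it with PCA. It then clusters the PCA scores with the four algorithms named above and scores each partition with the silhouette coefficient. Finally, it compares each cluster's mean spectrum with olivine and pyroxene reference spectra using RMSE. Everything concrete here is a hypothetical stand-in, not the authors' data or settings. That includes the synthetic cube, the Gaussian-band "reference" spectra, the 121-band grid, the use of three PCA components and two clusters, and the helper names (`absorption_spectrum`, `synthetic_cube`, `cluster_cube`, `match_references`). The sketch also uses cosine similarity as an assumed proxy for the "similarity" percentage. NMF is not shown.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Band grid over the 400-1000 nm imaging range; 121 bands (5 nm step) is an assumption.
WAVELENGTHS = np.linspace(400, 1000, 121)


def absorption_spectrum(center_nm, depth, width_nm, continuum=0.35):
    """Hypothetical toy reflectance: flat continuum minus one Gaussian absorption band."""
    band = depth * np.exp(-0.5 * ((WAVELENGTHS - center_nm) / width_nm) ** 2)
    return continuum - band


# Hypothetical placeholder references, NOT laboratory library spectra: a broad band near
# 1050 nm stands in for olivine and a narrower band near 920 nm stands in for pyroxene.
REFERENCES = {
    "olivine": absorption_spectrum(center_nm=1050, depth=0.15, width_nm=120),
    "pyroxene": absorption_spectrum(center_nm=920, depth=0.12, width_nm=60),
}


def synthetic_cube(rows=20, cols=20, noise=0.01, seed=0):
    """Hypothetical cube (rows x cols x bands): left half olivine-like, right half pyroxene-like."""
    rng = np.random.default_rng(seed)
    cube = np.empty((rows, cols, WAVELENGTHS.size))
    cube[:, : cols // 2] = REFERENCES["olivine"]
    cube[:, cols // 2 :] = REFERENCES["pyroxene"]
    return cube + rng.normal(0.0, noise, cube.shape)


def cluster_cube(cube, n_clusters=2, n_components=3, seed=0):
    """Unfold the cube to pixels x bands, reduce with PCA, then run four clustering algorithms."""
    pixels = cube.reshape(-1, cube.shape[-1])
    scores = PCA(n_components=n_components, random_state=seed).fit_transform(pixels)
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Hierarchical": AgglomerativeClustering(n_clusters=n_clusters),  # Ward linkage by default
        "GMM": GaussianMixture(n_components=n_clusters, random_state=seed),
        "Spectral": SpectralClustering(n_clusters=n_clusters, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(scores)
        # Silhouette coefficient in PCA space, in [-1, 1]; higher = tighter, better-separated clusters.
        results[name] = (labels, float(silhouette_score(scores, labels)))
    return pixels, results


def match_references(pixels, labels, references):
    """Score each cluster's mean spectrum against each reference; best match = lowest RMSE."""
    matches = {}
    for k in np.unique(labels):
        mean_spectrum = pixels[labels == k].mean(axis=0)
        scores = {}
        for mineral, ref in references.items():
            rmse = float(np.sqrt(np.mean((mean_spectrum - ref) ** 2)))
            cosine = float(mean_spectrum @ ref / (np.linalg.norm(mean_spectrum) * np.linalg.norm(ref)))
            scores[mineral] = (rmse, cosine)
        best = min(scores, key=lambda m: scores[m][0])
        matches[int(k)] = (best, scores)
    return matches


if __name__ == "__main__":
    cube = synthetic_cube()
    pixels, results = cluster_cube(cube)
    for name, (labels, sil) in results.items():
        matches = match_references(pixels, labels, REFERENCES)
        olivine_px = sum(np.sum(labels == k) for k, (best, _) in matches.items() if best == "olivine")
        print(f"{name:12s} silhouette={sil:.2f} olivine pixel fraction={olivine_px / labels.size:.2f}")
```

The best match is selected by RMSE rather than cosine similarity. With a shared reflectance continuum, cosine similarity stays high for both minerals and separates them poorly, while RMSE tracks the band-depth differences. The pixel fraction assigned to olivine-matched clusters is one simple way to express a relative "predominance". It is an illustrative choice, not necessarily the metric used in the study.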