Abstract
Multimodal models can suffer from multimodal collapse, which leads to sub-optimal performance on tasks such as fine-grained e-commerce product classification. To address this, we introduce an approach that uses multimodal Shapley values (MM-SHAP) to quantify the contribution of each modality to the model's predictions. We then build weighted stacked ensembles of unimodal and multimodal models, with the ensemble weights derived from these MM-SHAP values, to improve overall performance and mitigate the effects of multimodal collapse. This approach improves the previously reported F1-score from 0.67 to 0.79.
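The idea of weighting an ensemble by modality contribution can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `modality_shares` and `weighted_ensemble`, the toy Shapley values, and the rule that maps modality shares to model weights are all assumptions made for illustration. The share computation follows the usual MM-SHAP idea of taking each modality's fraction of the total absolute Shapley mass.

```python
import numpy as np

def modality_shares(shap_values, modality_of_feature):
    """Fraction of total absolute Shapley mass attributed to each modality.

    Hypothetical helper: shap_values holds one value per input feature
    (e.g. token or image patch), modality_of_feature names its modality.
    """
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    labels = np.asarray(modality_of_feature)
    totals = {m: abs_vals[labels == m].sum() for m in sorted(set(modality_of_feature))}
    grand_total = sum(totals.values())
    return {m: total / grand_total for m, total in totals.items()}

def weighted_ensemble(probs_by_model, weights):
    """Weighted average of per-model class probabilities.

    probs_by_model maps a model name to an array of shape (samples, classes);
    weights maps the same names to non-negative scalars (normalised here).
    """
    names = list(probs_by_model)
    w = np.array([weights[n] for n in names], dtype=float)
    w = w / w.sum()
    stacked = np.stack([np.asarray(probs_by_model[n], dtype=float) for n in names])
    return np.tensordot(w, stacked, axes=1)  # shape: (samples, classes)

# Toy Shapley values for two text features and two image features (made up).
shares = modality_shares([0.2, -0.1, 0.3, 0.4], ["text", "text", "image", "image"])

# Assumed weighting rule: each unimodal model is weighted by its modality share.
probs = {
    "text": np.array([[0.8, 0.2]]),
    "image": np.array([[0.4, 0.6]]),
}
ensemble_probs = weighted_ensemble(probs, shares)
predicted_class = ensemble_probs.argmax(axis=1)
```

In this sketch a modality that contributes little to the multimodal model's predictions gives its unimodal model a small ensemble weight. Other mappings from shares to weights are possible, and the abstract does not specify which one is used.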
Original language | English |
---|---|
Title of host publication | 2024 International Conference on Machine Learning and Applications (ICMLA) |
Publisher | IEEE |
ISBN (Electronic) | 979-8-3503-7488-9 |
Publication status | Published - 4 Mar 2025 |