Abstract
Multimodal models can suffer from multimodal collapse, leading to sub-optimal performance on tasks such as fine-grained e-commerce product classification. To address this, we introduce an approach that uses multimodal Shapley values (MM-SHAP) to quantify the contribution of each modality to the model's predictions. By employing weighted stacked ensembles of unimodal and multimodal models, with weights derived from the MM-SHAP values, we enhance overall performance and mitigate the effects of multimodal collapse. With this approach, we improve on previous results, raising the F1-score from 0.67 to 0.79.
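
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are derived from MM-SHAP or how the stacking is done, so everything below is an assumption made for illustration. The hypothetical helper `modality_shares` computes each modality's share of the total absolute Shapley mass. The hypothetical helper `weighted_ensemble` uses a plain weighted average of class probabilities as a stand-in for the stacking step.

```python
import numpy as np


def modality_shares(text_shap, image_shap):
    """Return (text_share, image_share), the fraction of total absolute
    Shapley mass attributed to each modality. Assumed formulation."""
    text_mass = np.abs(np.asarray(text_shap, dtype=float)).sum()
    image_mass = np.abs(np.asarray(image_shap, dtype=float)).sum()
    total = text_mass + image_mass
    if total == 0.0:
        # No attribution signal at all: fall back to an even split.
        return 0.5, 0.5
    return text_mass / total, image_mass / total


def weighted_ensemble(probs, weights):
    """Weighted average of per-model class probabilities.

    probs:   dict of model name -> array of shape (n_samples, n_classes)
    weights: dict of model name -> non-negative weight
    A simple stand-in for stacking, not the paper's method.
    """
    names = list(probs)
    w = np.array([weights[name] for name in names], dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    stacked = np.stack([np.asarray(probs[name], dtype=float) for name in names])
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(w, stacked, axes=1)


# Toy usage with made-up numbers (not data from the paper).
text_share, image_share = modality_shares([3.0, -1.0], [1.0])
toy_probs = {
    "text": np.array([[1.0, 0.0]]),
    "image": np.array([[0.0, 1.0]]),
}
toy_weights = {"text": text_share, "image": image_share}
ensemble_probs = weighted_ensemble(toy_probs, toy_weights)
predicted_class = ensemble_probs.argmax(axis=1)
```

In this toy case the text model receives the larger weight because text tokens carry most of the attribution mass. A multimodal model could be added as a third entry in both dictionaries, with a weight chosen by whatever rule the practitioner adopts.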
| Original language | English |
|---|---|
| Title of host publication | 2024 International Conference on Machine Learning and Applications (ICMLA) |
| Publisher | IEEE |
| ISBN (Electronic) | 979-8-3503-7488-9 |
| DOIs | |
| Publication status | Published - 4 Mar 2025 |