Abstract
Leveraging limited data to synthesize an additional training set is essential for robotic vision, particularly in dynamic environments where collecting large datasets is impractical. Traditional robotic vision systems rely on extensive training data for object recognition and scene understanding but struggle to generalize to real-world variations such as lighting conditions, occlusions, and sensor noise. This article proposes the causal diffuse variational autoencoder (causal DiffuseVAE), a novel method that integrates causal inference with high-fidelity image synthesis to generate counterfactual images. By combining the disentanglement properties of variational autoencoders (VAEs) with the generative capabilities of diffusion models, causal DiffuseVAE produces realistic, interpretable simulations of variations such as shadows and occlusions. This combination enables data-efficient generative modeling by learning from small subsets and synthesizing missing or unseen samples. In addition, causal inference ensures that the generated data follow real-world dependencies, making the approach robust and interpretable for deployment in unpredictable environments. Causal DiffuseVAE is evaluated against four baseline approaches across six datasets and consistently outperforms all of them.
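To make the VAE-plus-diffusion idea concrete, the sketch below shows one plausible way such a pipeline could be wired up in PyTorch: a small VAE produces a coarse reconstruction, a diffusion-style denoiser refines it, and a counterfactual is obtained by intervening on one disentangled latent dimension before decoding. This is a minimal illustrative sketch only; the module names, network sizes, the crude denoising update, and the choice of latent dimension 0 as a "lighting" factor are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of a DiffuseVAE-style pipeline:
# a VAE reconstruction conditions a diffusion denoiser, and counterfactuals are
# produced by intervening on a disentangled latent before decoding.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, img_dim=32 * 32, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        recon = self.dec(z).view_as(x)
        return recon, z, mu, logvar

class ConditionalDenoiser(nn.Module):
    """Predicts the noise in x_t, conditioned on the VAE reconstruction and timestep."""
    def __init__(self, img_dim=32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * img_dim + 1, 512), nn.ReLU(),
                                 nn.Linear(512, img_dim))

    def forward(self, x_t, recon, t):
        inp = torch.cat([x_t.flatten(1), recon.flatten(1), t.float().unsqueeze(1)], dim=1)
        return self.net(inp).view_as(x_t)

def diffusion_refine(denoiser, recon, steps=50):
    """Toy reverse process: start from noise and iteratively denoise, guided by the
    VAE reconstruction; a real implementation would use a proper DDPM/DDIM schedule."""
    x = torch.randn_like(recon)
    for t in reversed(range(steps)):
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        eps = denoiser(x, recon, t_batch)
        x = x - eps / steps  # crude update standing in for the DDPM posterior step
    return x

# Counterfactual generation: intervene on one latent dimension (a hypothetical
# "lighting" factor) before decoding, then refine with the diffusion stage.
vae, denoiser = TinyVAE(), ConditionalDenoiser()
x = torch.rand(4, 32, 32)                      # dummy batch of images
_, z, mu, _ = vae(x)
z_cf = mu.clone()
z_cf[:, 0] = 2.0                               # do(lighting := bright) on latent dim 0 (assumed)
recon_cf = vae.dec(z_cf).view_as(x)            # coarse counterfactual from the VAE decoder
x_cf = diffusion_refine(denoiser, recon_cf)    # diffusion stage refines it into a sharper sample
print(x_cf.shape)                              # torch.Size([4, 32, 32])
```

In this kind of two-stage design, the VAE supplies a disentangled latent space where causal interventions are easy to express, while the diffusion stage is responsible for restoring photographic detail that the VAE decoder alone would blur away.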
| Original language | English |
|---|---|
| Pages (from-to) | 1-12 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Industrial Informatics |
| Early online date | 27 Jan 2026 |
| DOIs | |
| Publication status | E-pub ahead of print - 27 Jan 2026 |