Classical explanations for the modality effect (superior short-term serial recall of auditory compared with visual sequences) typically appeal to privileged processing of information derived from auditory sources. Here we critically appraise such accounts and re-evaluate the nature of the canonical empirical phenomena that have motivated them. Three experiments show that the standard account of modality in memory is untenable, since auditory superiority in recency is often accompanied by visual superiority at mid-list serial positions. We explain this simultaneous auditory and visual superiority by reference to the way in which perceptual objects are formed in the two modalities, and to how those objects are mapped to speech motor forms to support sequence maintenance and reproduction. Specifically, the obligatory object formation that operates in the standard auditory form of sequence presentation is stronger than that for visual sequences. This leads both to enhanced addressability of information at the object boundaries and to reduced addressability of information in the object's interior. Because standard visual presentation does not lead to such object formation, visual sequences do not show the boundary advantage observed for auditory presentation. Neither, however, do they suffer the loss of addressability associated with object formation, so their interior items can be mapped more readily into a rehearsal cohort to support recall. We show that a range of factors that impede this perceptual-motor mapping eliminate visual superiority while leaving auditory superiority unaffected. We make a general case for viewing short-term memory as an embodied, perceptual-motor process.