Detection of deviant speech syllables embedded in continuous noise was investigated in an oddball paradigm. Behavioral results showed that detection and identification of the syllables improved when congruent visual speech accompanied the utterances. A centrally maximal negative ERP difference wave peaking at approximately 290 ms post-stimulus was elicited by audiovisual, but not by auditory-only or visual-only, task-irrelevant deviant syllables. Whereas the circumstances under which this ERP response is elicited are similar to those of the mismatch negativity component (MMN and its visual counterpart, vMMN), its scalp distribution differs from that of both unimodal MMNs. Elicitation of an MMN-like ERP response (termed here the audiovisual MMN, avMMN) suggests that detection of the audiovisual deviants involved integrated audiovisual memory representations. The pattern of behavioral and ERP results suggests that the formation of such cross-modal memory representations does not require voluntary operations and may even proceed for stimuli outside the focus of attention.