Suppressing unintended invocation of the device, whether caused by speech that sounds like the wake word or by accidental button presses, is critical for a good user experience, and is referred to as False-Trigger-Mitigation (FTM). When multiple invocation options are available, the traditional approach to FTM is to use either invocation-specific models or a single model for all invocations. Both approaches are sub-optimal: the memory cost of the former grows linearly with the number of invocation options, which is prohibitive for on-device deployment, and it does not take advantage of shared training data; the latter is unable to accurately capture acoustic differences across invocation types. To this end, we propose a Unified Acoustic Detector (UAD) for FTM when multiple invocation options are available on device. The proposed UAD is trained using a multi-task learning framework, where a jointly trained acoustic encoder model is augmented with invocation-specific classification layers. In the context of the FTM task, we show for the first time that by using a shared model architecture across invocations (thus keeping the model size similar to that of a monolithic model used for a single invocation type), we can not only match but largely improve the accuracy of the invocation-specific models. In particular, in the challenging case of touch-based invocation, we obtain 50% and 35% relative improvement in false positive rate at 99% true positive rate, compared to a single-output model for both invocations and separate models per invocation, respectively. Additionally, we propose streaming and non-streaming variants of the UAD, and show that they both outperform a traditional ASR-based approach to FTM.
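
To make the shared-encoder idea concrete, the following is a minimal sketch of a jointly used acoustic encoder with one classification layer per invocation type, and of why its size stays close to that of a single model. It is an illustration only, not the architecture evaluated in this work: the single tanh layer, mean pooling, feature and hidden dimensions, the class names, and the labels `wake_word` and `touch` are all assumptions, and the weights are random rather than trained.

```python
import math
import random


class SharedEncoder:
    """Toy acoustic encoder (assumed): one tanh layer per frame, then mean pooling."""

    def __init__(self, feat_dim, hidden_dim, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
                        for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim

    def num_params(self):
        return len(self.weights) * len(self.weights[0]) + len(self.bias)

    def encode(self, frames):
        hidden_dim = len(self.weights)
        pooled = [0.0] * hidden_dim
        for frame in frames:
            for j in range(hidden_dim):
                act = self.bias[j] + sum(w * x for w, x in zip(self.weights[j], frame))
                pooled[j] += math.tanh(act)
        # Average over time to get one fixed-size embedding per utterance.
        return [v / len(frames) for v in pooled]


class InvocationHead:
    """Invocation-specific binary classifier on top of the shared embedding."""

    def __init__(self, hidden_dim, rng):
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.bias = 0.0

    def num_params(self):
        return len(self.weights) + 1

    def score(self, embedding):
        logit = self.bias + sum(w * e for w, e in zip(self.weights, embedding))
        return 1.0 / (1.0 + math.exp(-logit))  # probability of an intended invocation


class UnifiedDetector:
    """One shared encoder, one small head per invocation type (hypothetical layout)."""

    def __init__(self, feat_dim, hidden_dim, invocation_types, seed=0):
        rng = random.Random(seed)
        self.encoder = SharedEncoder(feat_dim, hidden_dim, rng)
        self.heads = {name: InvocationHead(hidden_dim, rng) for name in invocation_types}

    def num_params(self):
        return self.encoder.num_params() + sum(h.num_params() for h in self.heads.values())

    def score(self, frames, invocation_type):
        # The encoder is shared; only the final layer depends on the invocation type.
        return self.heads[invocation_type].score(self.encoder.encode(frames))


detector = UnifiedDetector(feat_dim=40, hidden_dim=64,
                           invocation_types=["wake_word", "touch"])
# Synthetic 20-frame input with 40 features per frame, standing in for real audio features.
frames = [[0.01 * (t + d) for d in range(40)] for t in range(20)]
p_touch = detector.score(frames, "touch")

# Size comparison: unified model versus two fully separate encoder-plus-head models.
unified_params = detector.num_params()
separate_params = 2 * (detector.encoder.num_params()
                       + detector.heads["touch"].num_params())
print(unified_params, separate_params)
```

In this toy configuration the encoder dominates the parameter count, so adding a second invocation type costs only one extra head, whereas separate models duplicate the encoder. In a multi-task training setup, each example would update the shared encoder and only the head matching its invocation type; that training loop is omitted here.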