I know that audio is out of the question since no legacy device could handle the encryption. So it would by definition be video only.

It wouldn't have to be preprogrammed (although that wouldn't hurt); it just has to be able to map learned commands to inputs.
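
Roughly what I mean by mapping learned commands to inputs, as a minimal Python sketch. The class name, method names, and the example code value are all made up for illustration; a real device would do this in firmware with whatever codes its receiver actually reports.

```python
class InputSwitcher:
    """Hypothetical switcher that learns remote codes and maps them to inputs."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.command_to_input = {}  # learned code -> input number

    def learn(self, code, input_number):
        """Remember that this remote code should select this input."""
        if not 1 <= input_number <= self.num_inputs:
            raise ValueError(f"input must be between 1 and {self.num_inputs}")
        self.command_to_input[code] = input_number

    def handle(self, code):
        """Return the input to switch to, or None if the code was never learned."""
        return self.command_to_input.get(code)


# Example: teach the box that an arbitrary (made-up) code selects input 2.
switcher = InputSwitcher(num_inputs=4)
switcher.learn(0x20DF10EF, 2)
print(switcher.handle(0x20DF10EF))  # prints 2
```

Preprogrammed codes would just be entries already in the table at startup; learning adds or overwrites entries on top of that.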