In remote-via-satellite live television news reports, lip-sync is always something audio techs have to watch. Even when everything seems OK, lip-sync is still checked. Almost without fail, a correspondent and crew will have no special test signal for lip-sync, so the audio tech adjusts based on the correspondent’s speech while viewing the image. If it is especially difficult to judge because of loud background noise or some other circumstance, the audio tech might ask an on-camera person to do individual hand claps a couple of seconds apart for half a minute. If you like to take things to an extreme, read on; otherwise, skip to the end.
If you’re really concerned about it, and you have a video camera, record a bit of yourself clapping and play that back. Don’t curl your fingers and hide the part of your hands that meet from the camera. You should hear the sound of the clap at the same moment the hands meet. Another good diagnostic tool in this amateur analysis is a clock whose second hand ticks forward once per second. Does a direct recording of the ticking clock play back with the hand movement and the ticking sound in unison?
If you want an even more critical adjustment, and you have the means to play back video with sound in slow motion, record a view of your video monitor playing back an event with a definite sound-to-image link. Play back your recorded video in slow motion or one frame at a time to determine if the sound hits in the same video frame as the visual event.
As to a choice of material for your test, an explosion in a movie may not be the best choice. Sometimes, for theatrical effect, there will be anticipatory sound before the visual event, or the sound will be delayed as in reality, where you see a sound-causing event in the distance before the sound arrives. Also, in a ‘ka-boom’ sound, the ‘ka’ portion may not move the woofer much on its own, and the ‘boom’ portion may occur a frame or two after the visual event begins. You have to know that the source you use in the test has both the visual and aural event in the same frame, as in speech or a near-camera hand clap.
If you can’t hear sound when you play back in slow or still-frame mode, there is another thing to try, but it is a bit more involved and needs care. Depending on the way you do this, it may be misleading just as often as it is accurate. Place a loudspeaker with the grille off at almost, but not quite, a 90° angle to the viewing screen, then video record a visual event that happens on the screen while keeping the speaker cone in sharp focus. Adjust the playback volume so that the speaker cone moves obviously when the on-screen event occurs. (Don’t harm your speakers doing this!) When you play back your recording in still-frame mode, the cone should start to move in the same frame in which the visual event is first seen. If you don’t see the earliest cone movement in the first frame of the visual event, you may see larger cone movement in the frame or two afterward. If the cone moves too soon, you need more audio delay.
Is this even necessary? Maybe the monitor introduces no noticeable video delay that would require a complementary audio delay. Back to the video camera as a diagnostic tool: place the camera so that both you and your monitor are in view. Take the output of the camera and send it to the monitor. There will be visual feedback, so keep the monitor at the far left or right of the camera’s view, with only half the monitor seen by the camera. Stand at the other edge of the camera’s view and move your arm up and down. As seen on the monitor, does your arm move up and down in unison with reality, or is there a delay? If there is a delay, the faster you move your arm, the more pronounced the delay will look. If you can later play back your recording in still-frame mode, how many frames after your arm reaches a certain point does the image in the monitor reach the same position? If you’re going to try calculating the time delay, NTSC video has 59.94 interlaced fields per second, for a total of 29.97 frames per second.
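If you do want to turn a frame count into a time, here is a rough sketch of the arithmetic in Python. The function names are just ones I made up for illustration, and I'm assuming the usual exact NTSC rate of 30000/1001 frames per second, which is where the rounded 29.97 comes from. It only converts what you counted; it can't tell you whether you counted the right frames.

```python
from fractions import Fraction

# Assumption: NTSC frame rate taken as exactly 30000/1001 fps (about 29.97).
NTSC_FPS = Fraction(30000, 1001)


def frames_to_ms(frames, fps=NTSC_FPS):
    """Convert a count of video frames to milliseconds (hypothetical helper)."""
    return float(frames * 1000 / fps)


def fields_to_ms(fields, fps=NTSC_FPS):
    """Convert a count of interlaced fields (two per frame) to milliseconds."""
    return frames_to_ms(Fraction(fields, 2), fps)


if __name__ == "__main__":
    # Print the delay for a few frame counts you might see when stepping
    # through the recording one frame at a time.
    for n in (1, 2, 3, 5):
        print(f"{n} frame(s) behind = about {frames_to_ms(n):.1f} ms")
```

So if the image in the monitor trails your arm by three frames, that is roughly 100 ms of video delay, which is about the amount of audio delay you would dial in to match.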
All in all, the simple do-the-lips-match method is all the pros have time to do for live events. In an editing situation there is the jog-the-footage check, but more often than not home equipment doesn’t give the consumer that option. Unless you are really bothered or you are in the mood to experiment, simple is best: ignore my verbose ramblings and go with what PodBoy said.