The military is working on voice-to-voice translation, but I don't know how far they've gotten lately. The problem is difficult.
In principle, you could write a program that would capture the audio output of the streaming video and feed it as input to Dragon NaturallySpeaking; capture Dragon's text output and feed it as input to a text-to-text translation program; capture that program's text output and either display it in a window or feed it as input to a text-to-speech program.
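Here's a minimal sketch of that chain, just to show the shape of the glue code. It assumes each stage can be wrapped as a plain function. `recognize`, `translate` and `synthesize` are hypothetical placeholders, not Dragon's API or any real translation engine's, and the toy stubs at the bottom exist only so the sketch runs.

```python
from typing import Callable, Iterable, Iterator, Optional, Union

# Hypothetical stage signatures. These are NOT real Dragon NaturallySpeaking
# or translation-engine APIs; they stand in for whatever programs you glue together.
SpeechToText = Callable[[bytes], str]
TextToText = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def translate_stream(
    audio_chunks: Iterable[bytes],
    recognize: SpeechToText,
    translate: TextToText,
    synthesize: Optional[TextToSpeech] = None,
) -> Iterator[Union[str, bytes]]:
    """Push each captured audio chunk through recognize -> translate -> (optionally) synthesize."""
    for chunk in audio_chunks:
        text = recognize(chunk)            # stage 1: speech to source-language text
        if not text.strip():
            continue                       # nothing recognized in this chunk; skip it
        translated = translate(text)       # stage 2: text-to-text translation
        if synthesize is None:
            yield translated               # display the translated text in a window
        else:
            yield synthesize(translated)   # stage 3: speak the translation


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def fake_recognize(chunk: bytes) -> str:
    return chunk.decode("utf-8")


GLOSSARY = {"hello": "hola", "world": "mundo"}


def fake_translate(text: str) -> str:
    # Word-by-word lookup; real translation is nothing like this simple.
    return " ".join(GLOSSARY.get(w, w) for w in text.lower().split())


captions = list(translate_stream([b"Hello world", b"   "], fake_recognize, fake_translate))
spoken = list(translate_stream([b"world"], fake_recognize, fake_translate, lambda t: t.encode("utf-8")))
```

The plumbing is the easy part. All the hard work (and all the errors) live inside the three stage functions.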
Good luck. This is not trivial, and the results are likely to be marginal at best. That's a very optimistic estimate, actually. Note, for example, that most of the time Fox News can't even get its English speech-to-text subtitles right.
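A rough way to see why chaining makes things worse: if you assume (for illustration only) that each stage's errors are independent and that each stage gets about 90% of the words right, the accuracies multiply. The 90% figure is a made-up round number, not a measurement of Dragon or any translator.

```latex
A_{\text{pipeline}} \approx A_{\text{speech-to-text}} \cdot A_{\text{translation}} \cdot A_{\text{text-to-speech}} = 0.9 \times 0.9 \times 0.9 = 0.729
```

So three individually decent stages can still garble roughly one word in four by the end.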
Of course, if you have a spare $100K to fund the work, let me know. It isn't impossible, just non-trivial. My wife doesn't want me doing stuff like this anymore, but for enough money....