Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

pelespirit@sh.itjust.works · 7 months ago

simple · 7 months ago

That lip sync is scary good. It's still a little off (the teeth are weirdly stretchy), but nobody would notice it's a deepfake at first glance.

Seems very similar to Nvidia's idea of sending only a single animated photo during video calls to reduce the bandwidth needed. Very nice.

Aatube@kbin.melroy.org · 7 months ago

We'd need better optimization and more processing power on the average laptop for that to happen.