Just like Mr. Pop said: you have to create each frame in real life and photograph it. You can reuse some frames, like at 00:14 in the example you gave, which is just two frames alternating. The speaking is a single frame, and they add or remove a black shape on the mouth to make it open and close. Making such a (little) movie takes a lot of time, work and patience. Good luck if you try to make one!