EbSynth is a style transfer utility put out by Secret Weapons. It can take one frame and apply its visual style to a video clip. Most of Secret Weapons' demos for the software have shown live-action video being made to look like a moving painting, and that seems to be the intended use case. I propose, however, that this could be an incredible tool for CG animation, or really animation of any kind.
I tried using EbSynth as a shortcut for rendering my animations at very high quality. To do this I rendered one frame out of Maya using the Arnold renderer, cranked up to give a beautiful frame. On my machine (a good Alienware gaming laptop that's starting to show its age) a 2k x 2k image took about two hours. On a better machine it might only take twenty minutes, but that's why I think this technique might be worthwhile: it could be a viable method for getting better results from cheaper machines, especially useful for the small-time freelancer or hobbyist.
I believe EbSynth uses texture tracking combined with some other tricks to paste the intended style onto the input video. It doesn't know anything about the input video or the objects in the scene; it just has to make guesses about which pixels are supposed to move together. In my experience working with Nuke's and Mocha's texture trackers, these things tend to base their guesses on points of high contrast in the image. If a clump of high-contrast pixels seems to move together across the screen, that's probably "an object," the tracker thinks, and anything that gets tracked onto the object should move with it. There are also some edge-detection algorithms and convolution filters that get applied to help the computer track the same bunch of pixels across frames, but that's basically tracking in a nutshell. (Actually, that's a simplified version of point tracking, but texture tracking is kind of the same idea.)
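To make that concrete, here's a toy sketch of the core idea in Python with NumPy. To be clear, this is not EbSynth's actual algorithm (as far as I know that isn't public); it's just a minimal sum-of-squared-differences block matcher, the classic primitive behind point tracking: grab a high-contrast patch in one frame, then search a small window in the next frame for wherever that patch landed.

```python
import numpy as np

def track_patch(prev_frame, next_frame, center, patch=15, search=8):
    """Follow one patch from prev_frame to next_frame using
    sum-of-squared-differences (SSD) block matching. Both frames are
    2D grayscale arrays; center is (row, col) and should sit at least
    patch//2 + search pixels away from every border."""
    h = patch // 2
    y, x = center
    template = prev_frame[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    best_score, best_pos = np.inf, center
    # Score every offset in the search window; the lowest SSD is our
    # guess for where the patch moved. A high-contrast patch produces a
    # sharp minimum, while a flat low-contrast patch produces an
    # ambiguous one, which is exactly why trackers latch onto contrast.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            candidate = next_frame[yy - h:yy + h + 1, xx - h:xx + h + 1].astype(np.float64)
            score = np.sum((template - candidate) ** 2)
            if score < best_score:
                best_score, best_pos = score, (yy, xx)
    return best_pos
```

Run that on a grid of high-contrast points and you get a crude motion field, which is why the high-detail input clips later in this post give the tracker so much more to grab onto.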
So! My theory is that if you give EbSynth an animation rendered out really fast with a lot of high-contrast detail, you can track the high-res render onto it and dramatically cut down on render times. Going into this, I already knew of some limitations. 1) EbSynth doesn't know about any background objects occluded by the foreground. It makes no assumptions about the 3D space of my scene; it just takes the image as-is and tries to intelligently smear it around to match the input clip. 2) The longer the clip, or the more motion in the clip, the more artifacting EbSynth will produce. As time goes on and the pixels get tracked farther away, the computer's guesses get less accurate and the image quality suffers. In the above example, Secret Weapons uses multiple keyframes, which produce a bunch of clips that can be blended together for a cleaner result. Each time the actor turns his head, they need a new clip to blend in, because the part of his face that was away from camera becomes occluded. But I'm trying to speed things up here, so I'm just going to use one keyframe and see what I can get away with.
With those limitations in mind, I rendered a short sequence with a checkerboard pattern and no shadow information. I rendered this with Maya's hardware renderer, which only took seconds. EbSynth then took only around two minutes to process the footage.
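For reference, both passes can be kicked off from Maya's command-line renderer, which is handy if you want to batch this. A minimal sketch, assuming the Render executable is on your PATH and a scene file called sculpt.mb (the file name and frame range here are placeholders, not my actual project):

```python
import subprocess

# Fast pass: every frame through the Hardware 2.0 renderer ("hw2").
# This is the high-contrast guide clip that EbSynth will track.
subprocess.run(["Render", "-r", "hw2", "-s", "1", "-e", "250", "sculpt.mb"], check=True)

# Slow pass: a single beautiful Arnold frame to use as the keyframe.
subprocess.run(["Render", "-r", "arnold", "-s", "1", "-e", "1", "sculpt.mb"], check=True)
```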
We can see the results are promising, but not super great yet. There are lots of creepy crawlies along the checker lines of the input clip, and the shadow edges get very soft and mushy. I think EbSynth is doing a reasonably good job tracking the corners of the checker pattern, but it's clear that it needs denser detail in the input clip to get the texture to track better.
For the next test I tried using a fractal noise texture instead of the checker pattern, and I added back the shadow information with a simple diffuse shader. Again, it renders 250 frames in seconds; at the high-quality settings that would have taken two hours per frame (roughly 500 machine-hours for the whole sequence).
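If you want to reproduce that guide pass, here's roughly how the fractal-over-diffuse setup looks in Maya's Python API. This is a sketch, not my exact scene, and 'sculptMesh' is a placeholder for whatever geometry you're shading:

```python
import maya.cmds as cmds

# Build a fractal noise texture with its standard 2D placement node.
fractal = cmds.shadingNode('fractal', asTexture=True)
place2d = cmds.shadingNode('place2dTexture', asUtility=True)
cmds.connectAttr(place2d + '.outUV', fractal + '.uvCoord')
cmds.connectAttr(place2d + '.outUvFilterSize', fractal + '.uvFilterSize')

# A plain Lambert supplies the diffuse shadow information, and the
# fractal drives its color so the tracker gets dense detail everywhere.
shader = cmds.shadingNode('lambert', asShader=True)
cmds.connectAttr(fractal + '.outColor', shader + '.color')

# Wire the shader into a shading group and assign it to the mesh.
sg = cmds.sets(renderable=True, noSurfaceShader=True, empty=True, name=shader + 'SG')
cmds.connectAttr(shader + '.outColor', sg + '.surfaceShader')
cmds.sets('sculptMesh', edit=True, forceElement=sg)  # hypothetical mesh name
```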
And here’s the result:
OK! Way better. There's still some artifacting, but there's some impressive fidelity on the sculpt. This isn't the quality you'd get from a full-on HQ render out of Maya, but it's way better than you could normally get for a ten-second hardware render plus two minutes of processing time from EbSynth. Interestingly, there's still some creepy-crawliness in the render, and this is with Synthesis Detail turned to high in EbSynth's advanced settings. Let's see how far this technique can be pushed.
With a bigger camera move we can see the occlusion problem. The algorithm is making a guess about what exists behind the sculpt, and it gets it kind of wrong. On their website, Secret Weapons recommends compositing each element on its own layer to avoid this issue, and while that would work, this is all about expediency and seeing what we can get away with. For this render I did add a fractal noise pattern to the background, and as a result it isn't suffering from the same artifacting we saw before. Also note how the algorithm is handling the highlight on the right cheek and neck: it's making some impressively good guesses about how the light would roll around that edge as the camera swings into place. Remember, this is all generated from one keyframe! The screen-right earring does suffer a bit in the exchange, however.
All told, I think this technique has promise. It doesn't handle big movements or occluded elements very well, but there might be quick ways of getting around those problems without having to render out more high-quality keyframes. We might be able to generate new keyframes from the output clip by cleaning them up in Photoshop and running EbSynth again. Expect more experiments on this in the future. It also seems like other folks are already applying this style-transfer idea to 2D animation: https://twitter.com/Rafikisland/status/1202327515689299968 and https://twitter.com/zmillerTV/status/1357116424339132417
Here’s hoping this leads to some dope stuff.
-Tim