The LLM/deterministic split is the smart call here. You can iterate on a script without the rest of the pipeline drifting under you. Curious how far the vowel-per-word heuristic holds before you wish you had Rhubarb, but "regenerates instantly" sounds like the right tradeoff for a studio loop.
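For anyone curious what a vowel-per-word timing heuristic might look like in practice, here's a minimal sketch (my own guess at the approach, not the author's actual code; the function name, frame rate, and speaking rate are all illustrative assumptions):

```python
# Sketch of a naive vowel-per-word lip-sync heuristic: no audio analysis,
# so the mouth track regenerates instantly whenever the script changes.
# All parameters (fps, wpm) are assumed defaults, not from the post.

VOWELS = set("aeiou")

def mouth_frames(script: str, fps: int = 12, wpm: int = 150) -> list[bool]:
    """Return a per-frame open/closed mouth track for a line of dialogue.

    Each word gets a duration from an assumed speaking rate (wpm); within
    that window the mouth opens once per vowel cluster.
    """
    frames: list[bool] = []
    seconds_per_word = 60.0 / wpm
    frames_per_word = max(1, round(seconds_per_word * fps))
    for word in script.split():
        # Count vowel clusters ("ee" in "speech" is one open, not two).
        opens, prev_vowel = 0, False
        for ch in word.lower():
            is_vowel = ch in VOWELS
            if is_vowel and not prev_vowel:
                opens += 1
            prev_vowel = is_vowel
        opens = max(1, opens)
        # Alternate open/closed evenly across the word's frame budget.
        for i in range(frames_per_word):
            phase = (i * opens * 2) // frames_per_word
            frames.append(phase % 2 == 0)
        frames.append(False)  # brief closed mouth between words
    return frames
```

Something like Rhubarb would replace this with actual phoneme timing from the audio, at the cost of a re-analysis step on every script change.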
This looks great. Curious about the lip-sync — viseme set or just open/closed mouths? The South Park style is super forgiving but HyperFrames quality seems like it'd need more.
Very cool! I will definitely try this out - cartoons are something I have been interested in for a while.
Static video with text-to-speech audio and two moving circles representing the mouths: "OMG, I might have a show on my hands."
I went into this imagining something like Synfig Studio (https://www.synfig.org/) or Moho (https://moho.lostmarble.com/). "Studio" here is quite far from what it actually is: lip-syncing on static characters.
Also, Moho offers far more comprehensive (and comprehensible!) lip-sync: https://lostmarble.com/papagayo/
I get that you're using AI to boost capability with less effort, but at the moment, I think the more specialized tools are still a better avenue for this.
Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.
Now, if this was an extension to Synfig (also open source!), it would be a much more interesting venture...