In the previous part we created a website where users can generate GIF animations using Emoji, a domain-specific language (DSL), and a Canvas. In this post we'll upgrade our animations to talkies!
Intro
I thought that it'd be funny to create animations where Emoji can talk. I already had Emoji moving around and displaying phrases as text. Obviously it was missing sound. In this article I'll show you how I added it!
tl;dr: try this animation
⚠️ warning: contains sound!
Text-to-Speech
I accidentally stumbled upon the "Text To Speech In 3 Lines Of JavaScript" article (thanks, @asaoluelijah!) and those "3 lines" quickly migrated to my project.
```js
const msg = new SpeechSynthesisUtterance();
msg.text = 'Hello World';
speechSynthesis.speak(msg);
// ☝️ You can run this in the console, BTW
```
Sure enough, the "3 lines" turned out to be 80. But I'll get to that later.
Text-to-Speech is part of the browser's Web Speech API, which allows us to read text out loud and recognize speech.
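The utterance is also configurable: you can tweak its rate and pitch, or pick one of the voices the browser provides. Here's a quick sketch (not from the original project, just the standard API knobs):

```js
// a sketch (not part of the original app): tuning the utterance
const msg = new SpeechSynthesisUtterance('Hello World');
msg.rate = 1.2;  // speaking rate, 0.1–10 (1 is the default)
msg.pitch = 0.8; // pitch, 0–2 (1 is the default)

// pick a voice the browser provides; note that getVoices()
// may return an empty list until 'voiceschanged' has fired
const voice = speechSynthesis.getVoices()
  .find(v => v.lang.startsWith('en'));
if (voice) msg.voice = voice;

speechSynthesis.speak(msg);
```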
But before we can go further with adding Text-to-Speech to the animation, I need to show you how I rendered the animation in the first place.
Animation and RxJS
After parsing the DSL and rendering it to the canvas (see Part I), I had an array of frames:
```js
[
  { image: 'http://.../0.png', phrases: ['Hello!'], duration: 1000 },
  { image: 'http://.../1.png', phrases: ['Hi!'],    duration: 1000 }
]
```
Each frame had a rendered `image`, the `phrases` within it, and the frame `duration`.
To show the animation I used a React component with RxJS stream inside:
```js
import React, { useState, useEffect } from 'react';
import { from, timer } from 'rxjs';
import { delayWhen, map } from 'rxjs/operators';

function Animation({ frames }) {
  // state for the current frame
  const [frame, setFrame] = useState(null);

  useEffect(() => {
    // turn the array into a stream of frames
    const sub = from(frames)
      .pipe(
        // with each frame delayed by frame.duration
        delayWhen(frame => timer(frame.duration)),
        // mapped to an Image
        map(frame => <img src={frame.image} />)
      )
      .subscribe(setFrame);

    // teardown logic
    return () => sub.unsubscribe();
  }, [frames]);

  return frame;
}
```
Here I use the `useEffect` hook to create an RxJS Observable and a subscription to it. The `from` function will iterate over the rendered `frames` array, `delayWhen` will delay each frame by `frame.duration`, and `map` will turn each frame into a new `<img />` element. And I can easily loop the animation by simply adding a `repeat()` operator, as sketched below.
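For illustration, here's roughly where `repeat()` would go (a sketch based on the pipe above, assuming RxJS 6-style operator imports):

```js
// a sketch: looping the animation endlessly with repeat()
// (repeat is imported from 'rxjs/operators' in RxJS 6)
const sub = from(frames)
  .pipe(
    delayWhen(frame => timer(frame.duration)),
    map(frame => <img src={frame.image} />),
    // re-subscribe to the completed frame stream, forever
    repeat()
  )
  .subscribe(setFrame);
```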
Note that the subscription has to be cancelled at some point (especially with the endless `repeat()`): the component might be destroyed, or the `frames` might change. So the function passed to the `useEffect` hook needs to return a teardown callback. In this case I unsubscribe from the animation Observable, effectively terminating the flow.
With that covered, we can now discuss Text-to-Speech!
Text-to-Speech and RxJS
Now I needed to pronounce the text using the Speech API, but the `frame.duration` delay I used wouldn't work: I had to wait until the phrase is spoken and only then switch to the next frame. Also, if the user edits the scenario or navigates away, I need to stop the current synthesis. Happily, RxJS is ideal for such things!
First I needed to create an Observable wrapper around Speech Synthesis API:
```js
import { Observable } from 'rxjs';

export function speak(text) {
  return new Observable((observer) => {
    // create and configure the utterance
    const utterance = new SpeechSynthesisUtterance();
    utterance.text = text;

    // subscribe our observer to utterance events
    utterance.onend = () => observer.complete();
    utterance.onerror = (err) => observer.error(err);

    // start the synthesis
    speechSynthesis.speak(utterance);

    // teardown: stop the synthesis on unsubscribe
    return () => {
      speechSynthesis.cancel();
    };
  });
}
```
When the utterance ends, the Observable completes, thus letting us chain syntheses. And if we unsubscribe from the Observable, the synthesis is stopped.
I've actually decided to publish this Observable wrapper as an npm package. There's a link in the footer 👇!
Now we can safely compose our phrases and be notified when they end:
```js
concat(
  speak('Hello'),
  speak('World')
)
.subscribe({
  complete() { console.log('done'); }
});
```
Try this code online at https://stackblitz.com/edit/rxjs-tts?file=index.ts
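And since the wrapper's teardown calls `speechSynthesis.cancel()`, stopping speech is just a matter of unsubscribing. A tiny sketch:

```js
// a sketch: cutting speech off by unsubscribing
const sub = speak('This sentence will be interrupted').subscribe();

// e.g. when the user edits the scenario or navigates away
setTimeout(() => sub.unsubscribe(), 1000);
```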
And to integrate the Text-to-Speech back into our Animation component:
```js
from(frames).pipe(
  concatMap(frame => {
    // concat all phrases into a chain
    const phrases$ = concat(
      EMPTY,
      ...frame.phrases.map(text => speak(text))
    );

    // we'll wait for the phrases to end,
    // even if the duration is shorter
    const duration$ = merge(
      phrases$,
      timer(frame.duration)
    );

    // to acknowledge the duration we need to merge it
    // while ignoring its values
    return merge(
      of(<img src={frame.image} />),
      duration$.pipe(ignoreElements())
    );
  })
)
```
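Putting it all together, the earlier Animation component might look roughly like this (a sketch; the `./speak` import path is an assumption, pointing at the wrapper above):

```js
import React, { useState, useEffect } from 'react';
import { EMPTY, concat, from, merge, of, timer } from 'rxjs';
import { concatMap, ignoreElements } from 'rxjs/operators';
import { speak } from './speak'; // assumed path to the wrapper above

function Animation({ frames }) {
  const [frame, setFrame] = useState(null);

  useEffect(() => {
    const sub = from(frames)
      .pipe(
        concatMap(frame => {
          // chain this frame's phrases
          const phrases$ = concat(EMPTY, ...frame.phrases.map(text => speak(text)));
          // wait for both the phrases and the minimum duration
          const duration$ = merge(phrases$, timer(frame.duration));
          return merge(
            of(<img src={frame.image} />),
            duration$.pipe(ignoreElements())
          );
        })
      )
      .subscribe(setFrame);

    // unsubscribing also cancels any in-flight speech
    return () => sub.unsubscribe();
  }, [frames]);

  return frame;
}
```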
That's it! Now our Emoji can walk and talk!
Turn the volume up and try this "Dancing" animation
And surely try creating your own 🙂
Outro
It was pretty simple, huh?
But there was a hidden trick: previously the web app was hosted on GitHub Pages and users shared their animations as downloaded GIFs. But a GIF cannot contain sound, you know... so I needed another way for users to share animations.
In the next article I'll share details on how I migrated the app from create-react-app to the NextJS/Vercel platform and added MongoDB to it.
Have a question or idea? Please, share your thoughts in the comments!
Thanks for reading this and see you next time!
❤️ 🦄 📖
Links
- Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
- RxJS Text-to-Speech wrapper npm package: `npm i rxjs-tts`
- My twitter (in case you want to follow 🙂)