React / Vanilla Speech Highlight


Text-to-speech for React and vanilla JS that highlights the words and sentences being spoken, using audio files, a text-to-speech API, or the Web Speech Synthesis API.

Try the demo React Speech Highlight

Other Version

Vanilla JS (Native Javascript)

A vanilla JS build is also supported. The bundle size is 86 KB, so you can easily add this library to an existing website, for example one that uses jQuery.

Read API_VANILLA.md to see the differences.

Try the demo Vanilla Speech Highlight

Watch the YouTube video about implementing Vanilla Speech Highlight for JavaScript text-to-speech tasks.

Want another implementation? Just ask me on Discord: albirrkarim

Features:

Precise highlight
Human-like sound (you can use your own audio file)
Generates visemes for the TTS currently being spoken
Better pronunciation: reads Roman numerals, document IDs, date ranges, custom abbreviations, etc.
Highlight animation without React re-renders, so performance stays fast
Supports unlimited string length
Automatically finds the best voices for a specific language
Works in all environments, see TEST.md
Solves the Speech Synthesis problems, see PROBLEMS.md

This is the documentation for the private React Speech Highlight package repo and the demo website source code.

Docs for v4.9.7

Table Of Contents

A. Introduction
B. Todo
C. API & Example Code
D. Changelog
E. Disclaimer & Warranty
F. FAQ
G. Payment

A. Introduction

What I wanted

Recently, I wanted to add text-to-speech to my website, with highlighting of the word and sentence being spoken.

I searched the internet, but I couldn't find an npm package that solves all the TTS problems.

I just wanted a powerful package that works on all platforms and has good voice quality.

Here is what I found:

Using Web SpeechSynthesis

It comes with problems (see PROBLEMS.md): robot-like sound, inconsistent support across devices, and so on.
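For context, the browser's built-in API can report word positions through the utterance's `boundary` event, which is one way word highlighting can be driven. The sketch below is only an illustration of that browser API and is not this package's implementation; `wordAtCharIndex` and `speakWithHighlight` are hypothetical helper names. Some browsers and voices reportedly never fire `boundary` events, which is part of the device-support problem.

```typescript
// Returns the word that starts at charIndex (the offset reported by a boundary event).
function wordAtCharIndex(text: string, charIndex: number): string {
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : "";
}

// Speaks the text with the Web Speech API and reports each word as it starts.
// Returns false when speech synthesis is unavailable (for example outside a browser).
function speakWithHighlight(
  text: string,
  onWord: (word: string, charIndex: number) => void
): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;

  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.onboundary = (event: any) => {
    // event.name is "word" or "sentence"; charIndex is an offset into the text.
    if (event.name === "word") {
      onWord(wordAtCharIndex(text, event.charIndex), event.charIndex);
    }
  };
  g.speechSynthesis.speak(utterance);
  return true;
}
```

In a browser, `speakWithHighlight("Hello brave world", (w) => console.log(w))` would log each word as the voice reaches it, provided the selected voice emits boundary events.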

Using a paid subscription text-to-speech synthesis API

Good, human-like voices require AI model inference, so it doesn't make sense to do that on the client side.

This is where speech synthesis API providers like ElevenLabs, Google Cloud, Amazon Polly, and OpenAI come in.

But they don't provide an npm package for highlighting.

Then I found Speechify, but I couldn't find any docs about an npm package that integrates with their service (if someone knows, please tell me). It is also a paid subscription service.

Searching again, I found ElevenLabs. It is free for up to 10,000 characters per month, and the quota resets the next month. Cool, right? So I decided to use it as the speech synthesis API in my project. But this platform also doesn't provide a React npm package to highlight their audio.

Solutions

So I decided to make this npm package, which combines the methods above to keep all the good parts and throw away the bad ones.

My package runs on a combination of the built-in Web SpeechSynthesis and an (optional) audio file.

When you prefer or fall back to an audio file, you get high-quality sound and avoid all the compatibility problems of the built-in Web SpeechSynthesis. How can you get an audio file for some text automatically? You can use ElevenLabs, Google Cloud, Amazon Polly, OpenAI, or any other TTS API, as long as it can produce an audio file (mp3, mp4, wav, etc.). For details, see AUDIO_FILE.md. The demo website includes an example using ElevenLabs, and you can even try your own audio file there.

If this package only takes text and an audio file as input, how does it know when each word or sentence is spoken in the played audio? The package detects the spoken words and sentences itself.
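To make the idea concrete: once word timings exist, highlighting comes down to looking up which word covers the current playback time. The sketch below assumes a hypothetical `WordTiming` shape and a hypothetical `activeWordIndex` helper. It does not show how this package detects the timings, only how such a table could drive a highlight.

```typescript
// Hypothetical timing record: start and end are seconds into the audio file.
interface WordTiming {
  word: string;
  start: number;
  end: number;
}

// Binary search for the word being spoken at time t.
// Assumes timings are sorted by start time and do not overlap.
// Returns -1 during silence or outside the audio range.
function activeWordIndex(timings: WordTiming[], t: number): number {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid].start <= t) {
      found = mid; // candidate: the last word that started at or before t
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found >= 0 && t < timings[found].end ? found : -1;
}

const sampleTimings: WordTiming[] = [
  { word: "Hello", start: 0.0, end: 0.4 },
  { word: "world", start: 0.5, end: 0.9 },
];
```

In a page, a function like this could be called from the audio element's `timeupdate` event with `audio.currentTime`, and the returned index used to toggle a CSS class on the matching word.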

This package is also a one-time payment. No subscription. Who likes subscriptions? I don't either.

Use Cases

Interactive Blog

Imagine you have a long article with a TTS button. When the text-to-speech plays, users can see how far the article has been read. Your article will also be SEO-ready, because this package supports Server Side Rendering (SSR).

Web AI Avatar / NPC

In the demo, you can see a 3D avatar from readyplayer.me come alive: it plays an idle animation and its mouth is synchronized with the highlighted text-to-speech. This works because the package exposes a React state that represents the viseme currently being spoken. The viseme list used in the demo is Oculus OVR LipSync.
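As a rough illustration of how a current-viseme value can drive an avatar's mouth, the sketch below turns one viseme into a set of blend-shape weights, easing toward the target so the mouth does not snap. Everything here is an assumption for illustration: the `viseme_` prefixed target names, the exact spelling of the 15 viseme identifiers (tools differ on the vowel names), and the `visemeWeights` helper itself. The package's actual state name and shape are documented in API.md, not here.

```typescript
// The 15 visemes of the Oculus lip-sync set (identifier spelling is an assumption).
const OCULUS_VISEMES = [
  "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
  "nn", "RR", "aa", "E", "I", "O", "U",
] as const;
type Viseme = (typeof OCULUS_VISEMES)[number];

// Computes the next blend-shape weights: the current viseme eases toward 1,
// all others ease toward 0. smoothing = 1 jumps straight to the target.
function visemeWeights(
  current: Viseme,
  prev: Record<string, number> = {},
  smoothing = 0.5
): Record<string, number> {
  const next: Record<string, number> = {};
  for (const v of OCULUS_VISEMES) {
    const key = "viseme_" + v; // hypothetical morph-target naming
    const target = v === current ? 1 : 0;
    const old = key in prev ? prev[key] : 0;
    next[key] = old + (target - old) * smoothing;
  }
  return next;
}
```

Calling `visemeWeights` once per animation frame with the previous result as `prev` gives a simple exponential ease between mouth shapes.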

B. TODO

(Still in progress) Add support for React Native
Add viseme support for Chinese characters
Let me know what you want from this package. The package architecture is scalable enough to support various features. Please write your request in the issues tab, or send me a Discord message @albirrkarim

C. API & Example Code

See API.md and EXAMPLE_CODE.md, which contain simple example code.

The full example code and implementation example come from the demo website's source code, which is included when you buy this package.

This package is written in TypeScript. You don't have to read all the docs here, because the package supports JSDoc and VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a mini tutorial) describing what the function does, with examples, params, etc.

Just work from the demo website's source code and you will quickly understand the package.


D. Changelog

The changelog contains information about new features, accuracy improvements, bug fixes, and what you should do when the version is updated.

See CHANGELOG.md

E. Disclaimer & Warranty

There's no refund.

I love feedback from my customers. You can write in the issues tab, and when I have time I will try to solve it and deliver the fix in the next update.

F. FAQ

Why is it expensive? Why is it not an open-source package?

Well, I need money to fund the research. Making a package costs a lot of time and, of course, money.

The pronunciation engine combines prompt engineering with an efficient algorithm to save on OpenAI API costs. It needs to be tested repeatedly, and every test run costs API calls.

Making the transcript time detection engine also costs money, because it requires generating audio files with a TTS API (ElevenLabs).

Just try to make this package yourself. You will be grateful that I am selling it cheap.

Can you give me some discount?

Yes, if you are a student.

Is it well documented and well crafted?

You can see the docs in this repo. The package is written in TypeScript and tested with Jest to ensure quality.

You don't have to read all the docs here, because the package supports VS Code IntelliSense. What is that? Simply put, when you hover your mouse over a variable or function, VS Code shows a popup (a mini tutorial) describing what the function does, with examples, params, etc.

Just work from the demo website's source code and you will quickly understand the package.


This package is written in TypeScript. Can it be mixed with a JSX or plain JS project?

Yes, it can. Just ask ChatGPT and explain your problem.
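For a webpack project, one common way to let `.ts`/`.tsx` files live next to `.js`/`.jsx` files can be sketched as below. This is a generic setup and not something taken from this package's docs: it assumes `ts-loader` and `babel-loader` are installed, and the entry path is a placeholder. The object would normally be exported from your webpack config file.

```typescript
// Sketch of a webpack configuration that handles TSX and JSX side by side.
const webpackConfig = {
  entry: "./src/index.jsx", // placeholder entry point
  resolve: {
    // Lets imports omit the extension for both TypeScript and JavaScript files.
    extensions: [".tsx", ".ts", ".jsx", ".js"],
  },
  module: {
    rules: [
      // TypeScript files (.ts and .tsx) are compiled by ts-loader.
      { test: /\.tsx?$/, exclude: /node_modules/, use: "ts-loader" },
      // Existing JavaScript files (.js and .jsx) keep going through Babel.
      { test: /\.jsx?$/, exclude: /node_modules/, use: "babel-loader" },
    ],
  },
};
```

A `tsconfig.json` with the `jsx` and `allowJs` compiler options set is usually needed alongside this, so that TypeScript understands JSX syntax and can import the existing JS modules.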

Example prompt:

"My project uses webpack and the code is written in JSX. I want to use TSX code alongside the JSX. How can I do that?"

How accurate is the viseme generation?

Go to the Vanilla Speech Highlight demo.

I made a demo that outputs the visemes to console.log. Just open the browser console and play the prefer audio example (English). You will see the word and viseme at the current timing of the played TTS.

How accurate is the highlight capability?

Just see the demo.

Why are there no voices available on my device?

Try using Prefer or Fallback to Audio File, see AUDIO_FILE.md

or

Try adjusting the speech synthesis or language settings on your device.

If you use a smartphone (Android):

1. Make sure you have installed Speech Recognition & Synthesis.
2. If step 1 doesn't work, try downloading Google Keyboard and then set the Dictation language. Wait a few minutes (your device will automatically download the voice), then restart your smartphone.

Why doesn't speech work the first time a voice is played?

Your device has to download that voice first. After that, the voice is available locally.

Try using Prefer or Fallback to Audio File, see AUDIO_FILE.md

Can I use this text-to-speech without showing the highlight?

Yes, see

Can I use it without the OpenAI API?

Yes, but you will get this problem

What dependencies does this package use?

See the package.json in this repo, specifically the peerDependencies. Once you build this package, you only need the npm packages listed in peerDependencies, which is just React.

This package requires the OpenAI API to do the text-to-speech task better (solve the problem).

Does it support various browsers and devices?

Yes, see the details in TEST.md

or you can try using Prefer or Fallback to Audio File, see AUDIO_FILE.md

How does it work? Is the package architecture scalable?

It just works. A simple explanation is in the introduction above.

The architecture is scalable. Just ask me what feature you want.

What about the cost of using the OpenAI API for the pronunciation engine?

I try to optimize the cost while maintaining accuracy by making new versions of the engine: v2, v3, etc.

For now, here is the test report for the v2 pronunciation engine in version 4.9.7 of this library.

const v2_pronoun_engine_reports = {
  overallResults: {
    Name: "v2",
    Detail: "GPT3",
    AvgAcc: "90.50%",
    AvgScore: "92.05%",
    AvgTime: "81.62s",
    AvgCost: "869.53",
    TotalTime: "652.94 s",
    TotalCost: "Rp. 6956.27", // IDR 6956.27 is about USD $0.42 in OpenAI chat completion API cost
    TotalRecords: 87, // 87 sentences containing equations or terms whose pronunciation should be corrected
    CreatedAt: "29-04-2024 19:07",
  },
  testResults: {
    romanNumberPronounTestCase: {
      AvgAcc: "100.00%",
      AvgScore: "95.83%",
      AvgTime: "5.19s",
      AvgCost: "53.41",
      TotalCost: "320.44",
    },
    mathEquations: {
      AvgAcc: "100.00%",
      AvgScore: "95.62%",
      AvgTime: "5.87s",
      AvgCost: "54.80",
      TotalCost: "273.98",
    },
    demoTestCase: {
      AvgAcc: "95.00%",
      AvgScore: "95.83%",
      AvgTime: "4.71s",
      AvgCost: "32.20",
      TotalCost: "644.00",
    },
    physicalEquations: {
      AvgAcc: "100.00%",
      AvgScore: "97.29%",
      AvgTime: "6.76s",
      AvgCost: "58.16",
      TotalCost: "581.62",
    },
    computerScienceTestCase: {
      AvgAcc: "90.00%",
      AvgScore: "97.58%",
      AvgTime: "7.73s",
      AvgCost: "85.52",
      TotalCost: "855.17",
    },
    machineLeaningTestCase: {
      AvgAcc: "73.68%",
      AvgScore: "80.13%",
      AvgTime: "9.99s",
      AvgCost: "109.85",
      TotalCost: "2087.12",
    },
    biologyTestCase: {
      AvgAcc: "87.50%",
      AvgScore: "96.09%",
      AvgTime: "9.79s",
      AvgCost: "119.12",
      TotalCost: "952.95",
    },
    chemistryTestCase: {
      AvgAcc: "77.78%",
      AvgScore: "78.05%",
      AvgTime: "9.47s",
      AvgCost: "137.89",
      TotalCost: "1240.99",
    },
  },
};
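The figures in the report can be cross-checked with a little arithmetic. The sketch below copies the per-suite AvgCost and TotalCost values (in IDR), sums the totals, and infers the number of sentences in each suite as TotalCost / AvgCost. That last relationship is an assumption, since the report does not state it, and the variable names are only illustrative.

```typescript
// Per-suite costs in IDR, copied from the report above.
const suiteCosts = [
  { name: "romanNumberPronounTestCase", avgCost: 53.41, totalCost: 320.44 },
  { name: "mathEquations", avgCost: 54.8, totalCost: 273.98 },
  { name: "demoTestCase", avgCost: 32.2, totalCost: 644.0 },
  { name: "physicalEquations", avgCost: 58.16, totalCost: 581.62 },
  { name: "computerScienceTestCase", avgCost: 85.52, totalCost: 855.17 },
  { name: "machineLeaningTestCase", avgCost: 109.85, totalCost: 2087.12 },
  { name: "biologyTestCase", avgCost: 119.12, totalCost: 952.95 },
  { name: "chemistryTestCase", avgCost: 137.89, totalCost: 1240.99 },
];

// Sum in whole cents to avoid floating-point drift.
let totalCents = 0;
let totalRecords = 0;
for (const suite of suiteCosts) {
  totalCents += Math.round(suite.totalCost * 100);
  // Assumed: AvgCost is the cost per sentence, so the ratio gives the sentence count.
  totalRecords += Math.round(suite.totalCost / suite.avgCost);
}

const totalIdr = totalCents / 100;
const costPerSentenceIdr = totalIdr / totalRecords;
const avgCostPerSuiteIdr = totalIdr / suiteCosts.length;

console.log({ totalIdr, totalRecords, costPerSentenceIdr, avgCostPerSuiteIdr });
```

Under that assumption the per-suite numbers add up to Rp. 6956.27 across 87 sentences, matching TotalCost and TotalRecords in overallResults, which works out to roughly Rp. 80 per corrected sentence. The overall AvgCost of 869.53 appears to be the average per test suite (total divided by 8) rather than per sentence.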

G. Payment

The Web Version (React and Vanilla js)

The current price is $94 USD (previously $70)

What you get

After you pay, you will be invited to my private repo and stay there for 1 year to receive all updates.

The Mobile App Version (React Native) (Coming Soon)

The web version is easier to make. In React Native, the code and the flow are different, so I have to rewrite the entire library before it can be used there.

I think a fair price will be $200.

What you get

The Demo App source code (coming soon)

Payment method

I accept various payment methods:

GitHub Sponsors

Choose the One Time tab, select the option, and follow the instructions from GitHub.

If your country doesn't have access to GitHub Sponsors, you can use wise.com. You can convert the price into your currency and send it directly in your currency using Wise.

If you are in Indonesia (my country), you can easily pay by bank transfer or e-wallet (GoPay, ShopeePay, Jenius).

Keywords

So this package is the answer for those of you looking for:

Best Text to Speech Software
text to speech with viseme lipsync javascript
javascript text to speech highlight words
How to text to speech with highlight the sentence and words like speechify
How to text to speech with highlight the sentence and words using elevenlabs
How to text to speech with highlight the sentence and words using open ai
How to text to speech with highlight the sentence and words using google speech synthesis
Text to speech react js
Text to speech javascript
Typescript text to speech
Highlighted Text to Speech
Speech Highlighting in TTS
TTS with Sentence Highlight
Word Highlight in Text-to-Speech.
Elevenlabs TTS
Highlighted TTS Elevenlabs
OpenAI Text to Speech
Highlighted Text OpenAI TTS
React Text to Speech Highlight
React TTS with Highlight
React Speech Synthesis
Highlighted TTS in React
Google Speech Synthesis in React
Text to Speech React JS
React JS TTS
React Text-to-Speech
TTS in React JS
React JS Speech Synthesis
Text to Speech JavaScript
JavaScript TTS
Text-to-Speech in JS
JS Speech Synthesis
Highlighted TTS JavaScript
