Joshua Lochner 5229a254b7 Kokoro.js v1.2.0: Streaming support (#92)

2025-02-15 11:06:33 -08:00

Kokoro TTS

Kokoro is a text-to-speech (TTS) model with 82 million parameters (text in, audio out) that delivers frontier quality for its size. This JavaScript library lets you run the model 100% locally in the browser, thanks to 🤗 Transformers.js. Try it out using our online demo!

Usage

First, install the kokoro-js library from NPM using:

npm i kokoro-js

You can then generate speech as follows:

import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  device: "wasm", // Options: "wasm", "webgpu" (web) or "cpu" (node). If using "webgpu", we recommend using dtype="fp32".
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_heart",
});
audio.save("audio.wav");
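
The `dtype` and `device` comments above suggest a simple rule for picking options per environment. As a minimal sketch, the hypothetical helper below (`pickLoadOptions` is not part of kokoro-js) encodes that rule: `"cpu"` in Node.js, `"webgpu"` with `"fp32"` when WebGPU is available, and `"wasm"` otherwise. Defaulting to `"q8"` in the non-WebGPU cases is an assumption borrowed from the example, not a documented recommendation.

```javascript
// Hypothetical helper (not part of kokoro-js): choose `from_pretrained`
// options from a description of the runtime, following the comments above.
function pickLoadOptions({ isNode = false, hasWebGPU = false } = {}) {
  if (isNode) {
    // Node.js uses the "cpu" device; "q8" mirrors the example above (assumption).
    return { dtype: "q8", device: "cpu" };
  }
  if (hasWebGPU) {
    // The README recommends dtype "fp32" when using "webgpu".
    return { dtype: "fp32", device: "webgpu" };
  }
  // Fallback for browsers without WebGPU.
  return { dtype: "q8", device: "wasm" };
}

// Runtime detection: `navigator.gpu` is the standard WebGPU entry point in browsers.
const isNode =
  typeof process !== "undefined" &&
  process.versions != null &&
  process.versions.node != null;
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;

const options = pickLoadOptions({ isNode, hasWebGPU });
console.log(options);
```

The resulting object can be passed as the second argument to `KokoroTTS.from_pretrained(model_id, options)`.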

If you'd prefer to stream the output, generating audio as text arrives, use:

import { KokoroTTS, TextSplitterStream } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
});

// First, set up the stream
const splitter = new TextSplitterStream();
const stream = tts.stream(splitter);
(async () => {
  let i = 0;
  for await (const { text, phonemes, audio } of stream) {
    console.log({ text, phonemes });
    audio.save(`audio-${i++}.wav`);
  }
})();

// Next, add text to the stream. Note that the text can be added at different times.
// For this example, let's pretend we're consuming text from an LLM, one word at a time.
const text = "Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. It can even run 100% locally in your browser, powered by Transformers.js!";
const tokens = text.match(/\s*\S+/g);
for (const token of tokens) {
  splitter.push(token);
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Finally, close the stream to signal that no more text will be added.
splitter.close();

// Alternatively, if you'd like to keep the stream open, but flush any remaining text, you can use the `flush` method.
// splitter.flush();
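
To build intuition for what the splitter does with incrementally pushed text, here is a minimal, self-contained sketch of a sentence buffer. It is illustrative only and is not the library's `TextSplitterStream` implementation: the class name `SentenceBuffer` and its boundary rule (split after `.`, `!` or `?` followed by whitespace, or at a newline) are assumptions made for this example.

```javascript
// Illustrative only: a tiny sentence buffer, NOT kokoro-js's TextSplitterStream.
class SentenceBuffer {
  constructor() {
    this.buffer = "";
  }

  // Append a chunk of text and return every sentence completed so far.
  push(chunk) {
    this.buffer += chunk;
    const sentences = [];
    // Assumed rule: a sentence ends at ., ! or ? followed by whitespace, or at a newline.
    const boundary = /[.!?]+(?=\s)|\n/;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(0, end).trim();
      if (sentence) sentences.push(sentence);
      this.buffer = this.buffer.slice(end);
    }
    return sentences;
  }

  // Return whatever text is left, even without closing punctuation.
  flush() {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}

// Feed text one word at a time, as in the streaming example above.
const buf = new SentenceBuffer();
const out = [];
for (const token of "Hello there. How are you? Fine".match(/\s*\S+/g)) {
  out.push(...buf.push(token));
}
out.push(...buf.flush());
console.log(out); // [ 'Hello there.', 'How are you?', 'Fine' ]
```

A sentence is only emitted once the text after its closing punctuation has arrived, which is why the trailing "Fine" appears only after `flush()`. This naive rule also splits on abbreviations such as "e.g. ", so treat it as a sketch rather than a replacement for the exported splitter.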

Voices/Samples

Tip

You can find samples for each of the voices in the model card on Hugging Face.

American English

Name	Traits	Target Quality	Training Duration	Overall Grade
af_heart	🚺❤️			A
af_alloy	🚺	B	MM minutes	C
af_aoede	🚺	B	H hours	C+
af_bella	🚺🔥	A	HH hours	A-
af_jessica	🚺	C	MM minutes	D
af_kore	🚺	B	H hours	C+
af_nicole	🚺🎧	B	HH hours	B-
af_nova	🚺	B	MM minutes	C
af_river	🚺	C	MM minutes	D
af_sarah	🚺	B	H hours	C+
af_sky	🚺	B	M minutes 🤏	C-
am_adam	🚹	D	H hours	F+
am_echo	🚹	C	MM minutes	D
am_eric	🚹	C	MM minutes	D
am_fenrir	🚹	B	H hours	C+
am_liam	🚹	C	MM minutes	D
am_michael	🚹	B	H hours	C+
am_onyx	🚹	C	MM minutes	D
am_puck	🚹	B	H hours	C+
am_santa	🚹	C	M minutes 🤏	D-

British English

Name	Traits	Target Quality	Training Duration	Overall Grade
bf_alice	🚺	C	MM minutes	D
bf_emma	🚺	B	HH hours	B-
bf_isabella	🚺	B	MM minutes	C
bf_lily	🚺	C	MM minutes	D
bm_daniel	🚹	C	MM minutes	D
bm_fable	🚹	B	MM minutes	C
bm_george	🚹	B	MM minutes	C
bm_lewis	🚹	C	H hours	D+
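
The voice IDs in the tables above appear to follow a `<accent><gender>_<name>` pattern (`a`/`b` for American/British English, `f`/`m` for female/male). This convention is inferred from the tables, not stated by the library, so the hypothetical `describeVoice` helper below is only a sketch for grouping or filtering voice IDs in your own code:

```javascript
// Assumption inferred from the tables: "<accent><gender>_<name>",
// where a/b = American/British English and f/m = female/male.
const ACCENTS = { a: "American English", b: "British English" };
const GENDERS = { f: "female", m: "male" };

// Hypothetical helper (not part of kokoro-js): decode a voice ID,
// returning null when the ID does not fit the assumed pattern.
function describeVoice(voiceId) {
  const match = /^([a-z])([a-z])_(\w+)$/.exec(voiceId);
  if (!match) return null;
  const [, accentCode, genderCode, name] = match;
  const accent = ACCENTS[accentCode];
  const gender = GENDERS[genderCode];
  if (!accent || !gender) return null;
  return { accent, gender, name };
}

console.log(describeVoice("af_heart"));
// { accent: 'American English', gender: 'female', name: 'heart' }
console.log(describeVoice("bm_george"));
// { accent: 'British English', gender: 'male', name: 'george' }
```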
