kokoro/kokoro.js/README.md
Joshua Lochner 5229a254b7 Kokoro.js v1.2.0: Streaming support (#92)
2025-02-15 11:06:33 -08:00


Kokoro TTS


Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out). This JavaScript library allows the model to be run 100% locally in the browser thanks to 🤗 Transformers.js. Try it out using our online demo!

Usage

First, install the kokoro-js library from NPM using:

```bash
npm i kokoro-js
```

You can then generate speech as follows:

```js
import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  device: "wasm", // Options: "wasm", "webgpu" (web) or "cpu" (node). If using "webgpu", we recommend using dtype="fp32".
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_heart",
});
// Wait for the file to be written before continuing.
await audio.save("audio.wav");
```
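The `device` and `dtype` comments above can be turned into a small selection helper. The sketch below is illustrative only: `pickOptions` is a hypothetical function, not part of kokoro-js, and defaulting to `"q8"` outside WebGPU is an assumption borrowed from the example above rather than an official recommendation.

```javascript
// Hypothetical helper (not part of kokoro-js): choose `device` and `dtype`
// based on the options listed in the comments of the example above.
function pickOptions({ isBrowser, hasWebGPU }) {
  if (!isBrowser) {
    // Node.js: "cpu" is the device listed for node.
    return { device: "cpu", dtype: "q8" }; // "q8" is an assumed default
  }
  if (hasWebGPU) {
    // fp32 is the dtype recommended for WebGPU.
    return { device: "webgpu", dtype: "fp32" };
  }
  return { device: "wasm", dtype: "q8" }; // "q8" is an assumed default
}

// Simple feature detection. `"gpu" in navigator` only shows that the WebGPU
// API is exposed; it does not guarantee that a usable adapter exists.
const isBrowser = typeof window !== "undefined";
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;

const options = pickOptions({ isBrowser, hasWebGPU });
console.log(options);
```

The resulting object could then be passed as the second argument to `KokoroTTS.from_pretrained`.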

Or if you'd prefer to stream the output, you can do that with:

```js
import { KokoroTTS, TextSplitterStream } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
});

// First, set up the stream
const splitter = new TextSplitterStream();
const stream = tts.stream(splitter);
(async () => {
  let i = 0;
  for await (const { text, phonemes, audio } of stream) {
    console.log({ text, phonemes });
    // Wait for each chunk to be written before requesting the next one.
    await audio.save(`audio-${i++}.wav`);
  }
})();

// Next, add text to the stream. Note that the text can be added at different times.
// For this example, let's pretend we're consuming text from an LLM, one word at a time.
const text = "Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. It can even run 100% locally in your browser, powered by Transformers.js!";
const tokens = text.match(/\s*\S+/g);
for (const token of tokens) {
  splitter.push(token);
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Finally, close the stream to signal that no more text will be added.
splitter.close();

// Alternatively, if you'd like to keep the stream open, but flush any remaining text, you can use the `flush` method.
// splitter.flush();
```
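To see why the stream needs an explicit `close()` or `flush()`, it can help to look at the buffering problem in isolation. The sketch below runs without kokoro-js. `SimpleSentenceBuffer` is a hypothetical, deliberately naive stand-in and is not how `TextSplitterStream` is implemented; it only illustrates that a sentence cannot be emitted until the text after its final punctuation mark has arrived.

```javascript
// 1. Word-level tokens, as in the example above: each token keeps its
//    leading whitespace, so joining the tokens reproduces the text
//    (provided the text has no trailing whitespace).
const sample = "Hello there. How are you?\nFine!";
const sampleTokens = sample.match(/\s*\S+/g);
console.log(sampleTokens.join("") === sample); // true

// 2. Hypothetical, simplified sentence buffer (NOT the kokoro-js splitter).
class SimpleSentenceBuffer {
  constructor() {
    this.buffer = ""; // text received but not yet emitted
    this.sentences = []; // completed sentences, in order
  }

  push(chunk) {
    this.buffer += chunk;
    // Assumed rule: a sentence ends at ., ! or ? followed by whitespace,
    // or at a newline. Abbreviations such as "e.g." are not handled.
    const boundary = /[.!?]+(?=\s)|\n/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) this.sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished remainder for the next push.
    this.buffer = this.buffer.slice(start);
  }

  flush() {
    // Emit whatever is left, even without a confirmed sentence boundary.
    const rest = this.buffer.trim();
    if (rest) this.sentences.push(rest);
    this.buffer = "";
  }
}

const sentenceBuffer = new SimpleSentenceBuffer();
for (const token of sampleTokens) sentenceBuffer.push(token);
console.log(sentenceBuffer.sentences); // [ 'Hello there.', 'How are you?' ]

sentenceBuffer.flush();
console.log(sentenceBuffer.sentences); // [ 'Hello there.', 'How are you?', 'Fine!' ]
```

Until `flush()` is called, the trailing `Fine!` stays in the buffer because nothing has arrived to confirm that the sentence is complete.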

Voices/Samples

Tip

You can find samples for each of the voices in the model card on Hugging Face.

American English

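| Name | Traits | Target Quality | Training Duration | Overall Grade |
| --- | --- | --- | --- | --- |
| af_heart | 🚺❤️ | | | A |
| af_alloy | 🚺 | B | MM minutes | C |
| af_aoede | 🚺 | B | H hours | C+ |
| af_bella | 🚺🔥 | A | HH hours | A- |
| af_jessica | 🚺 | C | MM minutes | D |
| af_kore | 🚺 | B | H hours | C+ |
| af_nicole | 🚺🎧 | B | HH hours | B- |
| af_nova | 🚺 | B | MM minutes | C |
| af_river | 🚺 | C | MM minutes | D |
| af_sarah | 🚺 | B | H hours | C+ |
| af_sky | 🚺 | B | M minutes 🤏 | C- |
| am_adam | 🚹 | D | H hours | F+ |
| am_echo | 🚹 | C | MM minutes | D |
| am_eric | 🚹 | C | MM minutes | D |
| am_fenrir | 🚹 | B | H hours | C+ |
| am_liam | 🚹 | C | MM minutes | D |
| am_michael | 🚹 | B | H hours | C+ |
| am_onyx | 🚹 | C | MM minutes | D |
| am_puck | 🚹 | B | H hours | C+ |
| am_santa | 🚹 | C | M minutes 🤏 | D- |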

British English

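| Name | Traits | Target Quality | Training Duration | Overall Grade |
| --- | --- | --- | --- | --- |
| bf_alice | 🚺 | C | MM minutes | D |
| bf_emma | 🚺 | B | HH hours | B- |
| bf_isabella | 🚺 | B | MM minutes | C |
| bf_lily | 🚺 | C | MM minutes | D |
| bm_daniel | 🚹 | C | MM minutes | D |
| bm_fable | 🚹 | B | MM minutes | C |
| bm_george | 🚹 | B | MM minutes | C |
| bm_lewis | 🚹 | C | H hours | D+ |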
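
The voice identifiers in the tables above appear to follow a two-letter prefix pattern: the first letter matches the accent group (`a` for American English, `b` for British English) and the second matches the 🚺/🚹 trait (`f` or `m`). The sketch below decodes an identifier on that assumption. `describeVoice` and both lookup tables are hypothetical, inferred from the table groupings rather than taken from the library.

```javascript
// Assumed mappings, inferred from the table groupings; not an official API.
const ACCENTS = { a: "American English", b: "British English" };
const GENDERS = { f: "female", m: "male" };

// Hypothetical helper: split an id such as "af_heart" into its parts.
function describeVoice(id) {
  const match = /^([a-z])([a-z])_(\w+)$/.exec(id);
  if (!match) {
    throw new Error(`Unrecognised voice id: ${id}`);
  }
  const [, accent, gender, name] = match;
  if (!(accent in ACCENTS) || !(gender in GENDERS)) {
    throw new Error(`Unknown prefix in voice id: ${id}`);
  }
  return { accent: ACCENTS[accent], gender: GENDERS[gender], name };
}

console.log(describeVoice("af_heart"));
// { accent: 'American English', gender: 'female', name: 'heart' }
console.log(describeVoice("bm_george"));
// { accent: 'British English', gender: 'male', name: 'george' }
```

A helper like this could be used to group the output of `tts.list_voices()` in a voice picker, provided the identifiers it returns follow the same pattern.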