Kokoro.js v1.2.0: Streaming support (#92)

* Set up JS project

* Finalise JS library

* Update README

* Fix package.json repository url

* Rename package -> `kokoro-js`

* Fix samples in README

* Cleanup README

* Bump `phonemizer` version

* Create web demo

* Run prettier

* Link to model used in demo

* Enable multithreading in HF space demo (~40% faster)

* Add link to demo in README

* Bump to v1.0.1

* Update voices

* Update versions

* Update phonemize JSDoc

* Use updated voice pack

* Update versions

* Update demo (v1.0 & WebGPU support)

* Update README

* Enforce maximum number of tokens

* Update README

* [version] Update to 1.1.1

* Create simple sentence splitter

* Update `npm run test`

* Update API to use sync and async iterators

* Add support for streamed generation in kokoro.js

* Always split on newlines

* Remove debug line

* Improvements

* Add more matching punctuation marks

* Update comments

* nits

* Export TextSplitterStream too

* Update splitter.js

* Update README

* [version] Update to 1.2.0

Author: Joshua Lochner
Date: 2025-02-15 21:06:33 +02:00
Committed by: GitHub
Parent: 93abff8795
Commit: 5229a254b7
6 changed files with 1109 additions and 17 deletions

@@ -37,6 +37,44 @@ const audio = await tts.generate(text, {
audio.save("audio.wav");
```
Or if you'd prefer to stream the output, you can do that with:
```js
import { KokoroTTS, TextSplitterStream } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
});

// First, set up the stream
const splitter = new TextSplitterStream();
const stream = tts.stream(splitter);
(async () => {
  let i = 0;
  for await (const { text, phonemes, audio } of stream) {
    console.log({ text, phonemes });
    audio.save(`audio-${i++}.wav`);
  }
})();

// Next, add text to the stream. Note that the text can be added at different times.
// For this example, let's pretend we're consuming text from an LLM, one word at a time.
const text = "Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. It can even run 100% locally in your browser, powered by Transformers.js!";
const tokens = text.match(/\s*\S+/g);
for (const token of tokens) {
  splitter.push(token);
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Finally, close the stream to signal that no more text will be added.
splitter.close();

// Alternatively, if you'd like to keep the stream open, but flush any remaining text, you can use the `flush` method.
// splitter.flush();
```
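The streaming example relies on a push-based splitter: text arrives in small pieces, is buffered until a sentence is complete, and each finished sentence is handed to an async iterator. The sketch below illustrates that pattern with a hypothetical `SimpleSplitterStream` class. It is not kokoro-js's `TextSplitterStream`, and its splitting rule (break after `.`, `!` or `?` when followed by whitespace, and always on newlines) is an assumption chosen for brevity.

```javascript
// Hypothetical sketch of a push-based sentence splitter exposed as an async iterator.
// NOT the kokoro-js TextSplitterStream implementation; the splitting rule is an assumption.
class SimpleSplitterStream {
  constructor() {
    this._buffer = ""; // text received but not yet emitted
    this._queue = []; // complete sentences waiting for a consumer
    this._waiters = []; // resolvers of next() calls waiting for a sentence
    this._closed = false;
  }

  // Append text (e.g. one LLM token) and emit any sentences it completes.
  push(text) {
    if (this._closed) throw new Error("Cannot push to a closed stream");
    this._buffer += text;
    // Assumed boundary: ".", "!" or "?" followed by whitespace, or any newline.
    let match;
    while ((match = /[.!?](?=\s)|\n/.exec(this._buffer)) !== null) {
      const end = match.index + match[0].length;
      this._emit(this._buffer.slice(0, end));
      this._buffer = this._buffer.slice(end);
    }
  }

  // Emit whatever is buffered as a chunk, but keep the stream open.
  flush() {
    this._emit(this._buffer);
    this._buffer = "";
  }

  // Flush the remaining text, then end iteration for any waiting consumers.
  close() {
    this.flush();
    this._closed = true;
    for (const resolve of this._waiters.splice(0)) {
      resolve({ value: undefined, done: true });
    }
  }

  // Hand a trimmed chunk to a waiting consumer, or queue it for later.
  _emit(text) {
    const sentence = text.trim();
    if (sentence.length === 0) return; // skip whitespace-only chunks
    const waiter = this._waiters.shift();
    if (waiter) waiter({ value: sentence, done: false });
    else this._queue.push(sentence);
  }

  [Symbol.asyncIterator]() {
    return {
      next: () => {
        if (this._queue.length > 0) {
          return Promise.resolve({ value: this._queue.shift(), done: false });
        }
        if (this._closed) return Promise.resolve({ value: undefined, done: true });
        return new Promise((resolve) => this._waiters.push(resolve));
      },
    };
  }
}

// Usage: feed word-sized tokens, as in the README example, then close the stream.
const splitter = new SimpleSplitterStream();
const collected = (async () => {
  const sentences = [];
  for await (const sentence of splitter) sentences.push(sentence);
  return sentences;
})();
for (const token of "Hello there. How are you?\nFine!".match(/\s*\S+/g)) {
  splitter.push(token);
}
splitter.close();
collected.then((sentences) => console.log(sentences));
// [ 'Hello there.', 'How are you?', 'Fine!' ]
```

In this sketch, `flush()` emits the buffered text while still accepting input, whereas `close()` additionally ends the `for await` loop, mirroring the distinction drawn in the README's final comments. Requiring whitespace after the punctuation delays a split until the next token arrives, but avoids breaking numbers such as `3.14`.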
## Voices/Samples
> [!TIP]