Kokoro.js v1.2.0: Streaming support (#92)
* Set up JS project
* Finalise JS library
* Update README
* Fix package.json repository url
* Rename package -> `kokoro-js`
* Fix samples in README
* Cleanup README
* Bump `phonemizer` version
* Create web demo
* Run prettier
* Link to model used in demo
* Enable multithreading in HF space demo (~40% faster)
* Add link to demo in README
* Bump to v1.0.1
* Update voices
* Update versions
* Update phonemize JSDoc
* Use updated voice pack
* Update versions
* Update demo (v1.0 & WebGPU support)
* Update README
* Enforce maximum number of tokens
* Update README
* [version] Update to 1.1.1
* Create simple sentence splitter
* Update `npm run test`
* Update API to use sync and async iterators
* Add support for streamed generation in kokoro.js
* Always split on newlines
* Remove debug line
* Improvements
* Add more matching punctuation marks
* Update comments
* nits
* Export TextSplitterStream too
* Update splitter.js
* Update README
* [version] Update to 1.2.0
@@ -37,6 +37,44 @@ const audio = await tts.generate(text, {
audio.save("audio.wav");
```

Or, if you'd prefer to stream the output, you can do so by passing a `TextSplitterStream` to `tts.stream()`:

```js
import { KokoroTTS, TextSplitterStream } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-v1.0-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "fp32", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
  // device: "webgpu", // Options: "wasm", "webgpu" (web) or "cpu" (node).
});

// First, set up the stream
const splitter = new TextSplitterStream();
const stream = tts.stream(splitter);
// Consume the stream in the background while text is still being added below.
(async () => {
  let i = 0;
  for await (const { text, phonemes, audio } of stream) {
    console.log({ text, phonemes });
    audio.save(`audio-${i++}.wav`);
  }
})();

// Next, add text to the stream. Note that the text can be added at different times.
// For this example, let's pretend we're consuming text from an LLM, one word at a time.
const text = "Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects. It can even run 100% locally in your browser, powered by Transformers.js!";
const tokens = text.match(/\s*\S+/g);
for (const token of tokens) {
  splitter.push(token);
  // Simulate a short delay between tokens.
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Finally, close the stream to signal that no more text will be added.
splitter.close();

// Alternatively, to keep the stream open but flush any remaining text, use the `flush` method.
// splitter.flush();
```
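The `push()`/`flush()`/`close()` flow above can be illustrated with a small, self-contained sketch of a push-based sentence splitter that a consumer drains with `for await`. This is an illustration under stated assumptions, not the kokoro-js `TextSplitterStream` source: the class `SimpleSentenceSplitter` and the `demo` helper are hypothetical, no TTS model is involved, and the boundary rule (`.`, `!` or `?` followed by whitespace, or any newline) is deliberately naive, so abbreviations such as "Dr." would also be split.

```javascript
// Hypothetical sketch: a push-based sentence splitter consumable with
// `for await`. This is NOT the kokoro-js TextSplitterStream implementation.
class SimpleSentenceSplitter {
  constructor() {
    this.buffer = ""; // text pushed but not yet emitted
    this.queue = []; // complete sentences no consumer has read yet
    this.waiting = []; // resolvers of next() calls waiting for a sentence
    this.closed = false;
  }

  // Append text and emit every sentence that is now complete.
  push(text) {
    if (this.closed) throw new Error("Cannot push() after close()");
    this.buffer += text;
    // Naive boundary: . ! or ? followed by whitespace, or any newline.
    let match;
    while ((match = /[.!?](?=\s)|\n/.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      this.#emit(this.buffer.slice(0, end));
      this.buffer = this.buffer.slice(end);
    }
  }

  // Emit whatever is buffered, but keep the splitter open for more text.
  flush() {
    this.#emit(this.buffer);
    this.buffer = "";
  }

  // Flush, then signal that no more text will be added.
  close() {
    if (this.closed) return;
    this.flush();
    this.closed = true;
    // Any consumer still waiting now learns that iteration is over.
    for (const resolve of this.waiting.splice(0)) {
      resolve({ value: undefined, done: true });
    }
  }

  #emit(chunk) {
    const sentence = chunk.trim();
    if (!sentence) return; // skip whitespace-only pieces (e.g. a lone "\n")
    const resolve = this.waiting.shift();
    if (resolve) resolve({ value: sentence, done: false });
    else this.queue.push(sentence);
  }

  [Symbol.asyncIterator]() {
    return {
      next: () => {
        if (this.queue.length > 0) {
          return Promise.resolve({ value: this.queue.shift(), done: false });
        }
        if (this.closed) {
          return Promise.resolve({ value: undefined, done: true });
        }
        return new Promise((resolve) => this.waiting.push(resolve));
      },
    };
  }
}

// Usage: mirror the README flow, with a consumer running concurrently.
async function demo() {
  const splitter = new SimpleSentenceSplitter();
  const sentences = [];
  const consumer = (async () => {
    for await (const sentence of splitter) sentences.push(sentence);
  })();

  const text = "Kokoro is small. It is also fast!\nIt runs in the browser";
  for (const token of text.match(/\s*\S+/g)) {
    splitter.push(token);
    await new Promise((resolve) => setTimeout(resolve, 1));
  }
  splitter.close(); // emits the trailing sentence that has no terminator
  await consumer;
  return sentences;
}

// Logs the three sentences: "Kokoro is small.", "It is also fast!", "It runs in the browser"
demo().then((sentences) => console.log(sentences));
```

Because a boundary is only confirmed once the whitespace after `.`, `!` or `?` has arrived, tokens such as `"3."` followed by `"14"` are not split mid-number. `flush()` pushes out whatever is buffered while leaving the splitter open, whereas `close()` also ends the `for await` loop.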
## Voices/Samples
> [!TIP]