Create Kokoro TTS JavaScript library (#3)
* Set up JS project
* Finalise JS library
* Update README
* Fix package.json repository url
* Rename package -> `kokoro-js`
* Fix samples in README
* Cleanup README
* Bump `phonemizer` version
* Create web demo
* Run prettier
* Link to model used in demo
* Enable multithreading in HF space demo (~40% faster)
* Add link to demo in README
* Bump to v1.0.1

kokoro.js/.gitignore (4 lines, vendored, Normal file)
@@ -0,0 +1,4 @@
node_modules/
dist
types
LICENSE

kokoro.js/.prettierignore (2 lines, Normal file)
@@ -0,0 +1,2 @@
dist
types

kokoro.js/README.md (55 lines, Normal file)
@@ -0,0 +1,55 @@
# Kokoro TTS

<p align="center">
<a href="https://www.npmjs.com/package/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/kokoro-js"></a>
<a href="https://www.npmjs.com/package/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/kokoro-js"></a>
<a href="https://www.jsdelivr.com/package/npm/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/kokoro-js"></a>
<a href="https://github.com/hexgrad/kokoro/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/hexgrad/kokoro?color=blue"></a>
<a href="https://huggingface.co/spaces/webml-community/kokoro-web"><img alt="Demo" src="https://img.shields.io/badge/Hugging_Face-demo-green"></a>
</p>

Kokoro is a frontier text-to-speech (TTS) model for its size: 82 million parameters, text in and audio out. This JavaScript library lets you run the model 100% locally in the browser, thanks to [🤗 Transformers.js](https://huggingface.co/docs/transformers.js). Try it out using our [online demo](https://huggingface.co/spaces/webml-community/kokoro-web)!

## Usage

First, install the `kokoro-js` library from [NPM](https://npmjs.com/package/kokoro-js) using:

```bash
npm i kokoro-js
```

You can then generate speech as follows:

```js
import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text, {
  // Use `tts.list_voices()` to list all available voices
  voice: "af_bella",
});
audio.save("audio.wav");
```

## Voices/Samples

> Life is like a box of chocolates. You never know what you're gonna get.

| Voice | Nationality | Gender | Sample |
| ------------------------ | ----------- | ------ | -------------------------------------------------------------------------------------------------------- |
| Default (`af`) | American | Female | <video controls src="https://github.com/user-attachments/assets/c183df83-58a9-4aea-8fdf-225092acec57" /> |
| Bella (`af_bella`) | American | Female | <video controls src="https://github.com/user-attachments/assets/0730fff0-22b3-458f-9675-36d313d872d6" /> |
| Nicole (`af_nicole`) | American | Female | <video controls src="https://github.com/user-attachments/assets/4ce0b3f6-eaec-4e47-901c-9d29e2b60c86" /> |
| Sarah (`af_sarah`) | American | Female | <video controls src="https://github.com/user-attachments/assets/d37dba3f-de59-44c4-bc3d-da91ea1b5a4a" /> |
| Sky (`af_sky`) | American | Female | <video controls src="https://github.com/user-attachments/assets/38230be5-881c-4407-81e6-a0b1e4101565" /> |
| Adam (`am_adam`) | American | Male | <video controls src="https://github.com/user-attachments/assets/66a4c439-e80b-4c91-8a27-ae094486a2d8" /> |
| Michael (`am_michael`) | American | Male | <video controls src="https://github.com/user-attachments/assets/79a8879d-b564-4222-b2d5-a97f783ae897" /> |
| Emma (`bf_emma`) | British | Female | <video controls src="https://github.com/user-attachments/assets/ad5eb254-1d84-4282-9d23-371d5765d820" /> |
| Isabella (`bf_isabella`) | British | Female | <video controls src="https://github.com/user-attachments/assets/ea7e6825-dad0-403c-9ece-680af04f5a25" /> |
| George (`bm_george`) | British | Male | <video controls src="https://github.com/user-attachments/assets/e09040aa-578f-40a6-b7fd-76a5b005346c" /> |
| Lewis (`bm_lewis`) | British | Male | <video controls src="https://github.com/user-attachments/assets/5d7b26bf-8900-4a9a-8ee5-a16c39bb834c" /> |
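
As a rough guide to the `dtype` options in the README's usage example, the sketch below estimates raw weight storage from the stated 82 million parameters. It assumes every weight is stored at the nominal width of its dtype (4 bytes for fp32, 2 for fp16, 1 for q8, half a byte for q4) and treats `q4f16` like `q4`; both are assumptions. Real ONNX exports keep some tensors unquantized and add graph overhead, so actual download sizes will differ.

```javascript
// Nominal bytes per weight for each dtype option.
// Assumption: uniform storage, ignoring unquantized layers, scales/zero-points and file overhead.
const BYTES_PER_WEIGHT = {
  fp32: 4,
  fp16: 2,
  q8: 1,
  q4: 0.5,
  q4f16: 0.5, // assumed: 4-bit weights; fp16 only affects the non-quantized parts
};

const NUM_PARAMS = 82_000_000; // "82 million parameters" from the README

/**
 * Rough estimate of weight storage in megabytes (10^6 bytes).
 * @param {keyof typeof BYTES_PER_WEIGHT} dtype
 * @returns {number}
 */
function estimateWeightMB(dtype) {
  const bytes = BYTES_PER_WEIGHT[dtype];
  if (bytes === undefined) {
    throw new Error(`Unknown dtype "${dtype}"`);
  }
  return (NUM_PARAMS * bytes) / 1e6;
}

for (const dtype of Object.keys(BYTES_PER_WEIGHT)) {
  console.log(`${dtype}: ~${estimateWeightMB(dtype)} MB`);
}
```

Under these assumptions `q8` works out to roughly 82 MB versus roughly 328 MB for `fp32`, which gives a sense of why a quantized dtype is attractive for an in-browser download.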

kokoro.js/demo/.gitignore (24 lines, vendored, Normal file)
@@ -0,0 +1,24 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

kokoro.js/demo/README.md (59 lines, Normal file)
@@ -0,0 +1,59 @@
---
title: Kokoro Text-to-Speech
emoji: 🗣️
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: apache-2.0
short_description: High-quality speech synthesis powered by Kokoro TTS
header: mini
models:
  - onnx-community/Kokoro-82M-ONNX
custom_headers:
  cross-origin-embedder-policy: require-corp
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# Kokoro Text-to-Speech

A simple React + Vite application for running [Kokoro](https://github.com/hexgrad/kokoro), a frontier text-to-speech model for its size. The model runs 100% locally in the browser using [kokoro-js](https://www.npmjs.com/package/kokoro-js) and [🤗 Transformers.js](https://www.npmjs.com/package/@huggingface/transformers)!

## Getting Started

Follow the steps below to set up and run the application.

### 1. Clone the Repository

Clone the Kokoro repository from GitHub:

```sh
git clone https://github.com/hexgrad/kokoro.git
```

### 2. Navigate to the Project Directory

Change your working directory to the `demo` folder:

```sh
cd kokoro/kokoro.js/demo
```

### 3. Install Dependencies

Install the necessary dependencies using npm:

```sh
npm i
```

### 4. Run the Development Server

Start the development server:

```sh
npm run dev
```

The application should now be running locally. Open your browser and go to `http://localhost:5173` to see it in action.
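
The `custom_headers` in the Space metadata (`cross-origin-embedder-policy: require-corp` and `cross-origin-opener-policy: same-origin`) make the page cross-origin isolated. Browsers only expose `SharedArrayBuffer` to cross-origin-isolated pages, and multithreaded WebAssembly depends on it, which is presumably what the commit's "Enable multithreading in HF space demo" item relies on. A minimal check you could run in the page or worker; `canUseWasmThreads` is a hypothetical helper, not part of `kokoro-js` or Transformers.js:

```javascript
/**
 * Hypothetical helper: reports whether the current context can use
 * multithreaded WebAssembly. That needs SharedArrayBuffer, which browsers
 * only expose when the page is cross-origin isolated (COOP + COEP headers).
 * @param {object} [scope=globalThis] - the global object to inspect
 * @returns {boolean}
 */
function canUseWasmThreads(scope = globalThis) {
  const hasSAB = typeof scope.SharedArrayBuffer === "function";
  // `crossOriginIsolated` is a browser global; outside browsers it is usually undefined.
  const isolated = scope.crossOriginIsolated === true;
  return hasSAB && isolated;
}

console.log("WASM threads available:", canUseWasmThreads());
```

The `vite.config.js` in this commit does not set these headers, so a local `npm run dev` session would likely report `false` here even though the deployed Space reports `true`.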

kokoro.js/demo/eslint.config.js (35 lines, Normal file)
@@ -0,0 +1,35 @@
import js from "@eslint/js";
import globals from "globals";
import react from "eslint-plugin-react";
import reactHooks from "eslint-plugin-react-hooks";
import reactRefresh from "eslint-plugin-react-refresh";

export default [
  { ignores: ["dist"] },
  {
    files: ["**/*.{js,jsx}"],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
      parserOptions: {
        ecmaVersion: "latest",
        ecmaFeatures: { jsx: true },
        sourceType: "module",
      },
    },
    settings: { react: { version: "18.3" } },
    plugins: {
      react,
      "react-hooks": reactHooks,
      "react-refresh": reactRefresh,
    },
    rules: {
      ...js.configs.recommended.rules,
      ...react.configs.recommended.rules,
      ...react.configs["jsx-runtime"].rules,
      ...reactHooks.configs.recommended.rules,
      "react/jsx-no-target-blank": "off",
      "react-refresh/only-export-components": ["warn", { allowConstantExport: true }],
    },
  },
];

kokoro.js/demo/index.html (13 lines, Normal file)
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/hf-logo.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Kokoro Text-to-Speech</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>

kokoro.js/demo/package-lock.json (4680 lines, generated, Normal file)
File diff suppressed because it is too large

kokoro.js/demo/package.json (33 lines, Normal file)
@@ -0,0 +1,33 @@
{
  "name": "kokoro-web",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "kokoro-js": "file:..",
    "motion": "^11.12.0",
    "react": "^18.3.1",
    "react-dom": "^18.3.1"
  },
  "devDependencies": {
    "@eslint/js": "^9.15.0",
    "@types/react": "^18.3.12",
    "@types/react-dom": "^18.3.1",
    "@vitejs/plugin-react": "^4.3.4",
    "autoprefixer": "^10.4.20",
    "eslint": "^9.15.0",
    "eslint-plugin-react": "^7.37.2",
    "eslint-plugin-react-hooks": "^5.0.0",
    "eslint-plugin-react-refresh": "^0.4.14",
    "globals": "^15.12.0",
    "postcss": "^8.4.49",
    "tailwindcss": "^3.4.15",
    "vite": "^6.0.1"
  }
}

kokoro.js/demo/postcss.config.js (6 lines, Normal file)
@@ -0,0 +1,6 @@
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
};

kokoro.js/demo/public/hf-logo.svg (8 lines, Normal file)
File diff suppressed because one or more lines are too long
After: Size 34 KiB

kokoro.js/demo/public/wave.svg (9 lines, Normal file)
@@ -0,0 +1,9 @@
<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="198">
  <defs>
    <linearGradient id="a" x1="50%" x2="50%" y1="-10.959%" y2="100%">
      <stop stop-color="#57BBC1" stop-opacity=".25" offset="0%"/>
      <stop stop-color="#015871" offset="100%"/>
    </linearGradient>
  </defs>
  <path fill="url(#a)" fill-rule="evenodd" d="M.005 121C311 121 409.898-.25 811 0c400 0 500 121 789 121v77H0s.005-48 .005-77z" transform="matrix(-1 0 0 1 1600 0)"/>
</svg>
After: Size 465 B

kokoro.js/demo/src/App.jsx (144 lines, Normal file)
@@ -0,0 +1,144 @@
import { useRef, useState, useEffect } from "react";
import { motion } from "motion/react";

export default function App() {
  // Create a reference to the worker object.
  const worker = useRef(null);

  const [inputText, setInputText] = useState("Life is like a box of chocolates. You never know what you're gonna get.");
  const [selectedSpeaker, setSelectedSpeaker] = useState("af");

  const [status, setStatus] = useState(null);
  const [error, setError] = useState(null);
  const [loadingMessage, setLoadingMessage] = useState("Loading model (only downloaded once)...");

  const [results, setResults] = useState([]);

  // We use the `useEffect` hook to set up the worker as soon as the `App` component is mounted.
  useEffect(() => {
    // Create the worker if it does not yet exist.
    worker.current ??= new Worker(new URL("./worker.js", import.meta.url), {
      type: "module",
    });

    // Create a callback function for messages from the worker thread.
    const onMessageReceived = (e) => {
      switch (e.data.status) {
        // TODO: WebGPU feature checking
        // case "feature-success":
        //   break;

        // case "feature-error":
        //   setError(e.data.data);
        //   break;

        case "ready":
          setStatus("ready");
          break;

        case "complete": {
          // Braces scope the declaration to this case (ESLint `no-case-declarations`).
          const { audio, text } = e.data;
          // Generation complete: re-enable the "Generate" button
          setResults((prev) => [{ text, src: audio }, ...prev]);
          setStatus("ready");
          break;
        }
      }
    };

    const onErrorReceived = (e) => {
      console.error("Worker error:", e);
    };

    // Attach the callback function as an event listener.
    worker.current.addEventListener("message", onMessageReceived);
    worker.current.addEventListener("error", onErrorReceived);

    // Define a cleanup function for when the component is unmounted.
    return () => {
      worker.current.removeEventListener("message", onMessageReceived);
      worker.current.removeEventListener("error", onErrorReceived);
    };
  }, []);

  const handleSubmit = (e) => {
    e.preventDefault();
    setStatus("running");

    worker.current.postMessage({
      type: "generate",
      text: inputText.trim(),
      voice: selectedSpeaker,
    });
  };

  return (
    <div className="relative w-full min-h-screen bg-gradient-to-br from-gray-900 to-gray-700 flex flex-col items-center justify-center p-4 relative overflow-hidden font-sans">
      <motion.div initial={{ opacity: 1 }} animate={{ opacity: status === null ? 1 : 0 }} transition={{ duration: 0.5 }} className="absolute w-screen h-screen justify-center flex flex-col items-center z-10 bg-gray-800/95 backdrop-blur-md" style={{ pointerEvents: status === null ? "auto" : "none" }}>
        <div className="w-[250px] h-[250px] border-4 border-white shadow-[0_0_0_5px_#4973ff] rounded-full overflow-hidden">
          <div className="loading-wave"></div>
        </div>
        <p className={`text-3xl my-5 text-center ${error ? "text-red-500" : "text-white"}`}>{error ?? loadingMessage}</p>
      </motion.div>

      <div className="max-w-3xl w-full space-y-8 relative z-[2]">
        <div className="text-center">
          <h1 className="text-5xl font-extrabold text-gray-100 mb-2 drop-shadow-lg font-heading">Kokoro Text-to-Speech</h1>
          <p className="text-2xl text-gray-300 font-semibold font-subheading">
            Powered by{" "}
            <a href="https://github.com/hexgrad/kokoro" target="_blank" rel="noreferrer" className="underline">
              Kokoro
            </a>{" "}
            and{" "}
            <a href="https://huggingface.co/docs/transformers.js" target="_blank" rel="noreferrer" className="underline">
              <img width="40" src="hf-logo.svg" className="inline translate-y-[-2px] me-1"></img>Transformers.js
            </a>
          </p>
        </div>
        <div className="bg-gray-800/50 backdrop-blur-sm border border-gray-700 rounded-lg p-6">
          <form onSubmit={handleSubmit} className="space-y-4">
            <textarea placeholder="Enter text..." value={inputText} onChange={(e) => setInputText(e.target.value)} className="w-full min-h-[100px] max-h-[300px] bg-gray-700/50 backdrop-blur-sm border-2 border-gray-600 rounded-xl resize-y text-gray-100 placeholder-gray-400 px-3 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent" rows={Math.min(8, inputText.split("\n").length)} />
            <div className="flex flex-col items-center space-y-4">
              <select value={selectedSpeaker} onChange={(e) => setSelectedSpeaker(e.target.value)} className="w-full bg-gray-700/50 backdrop-blur-sm border-2 border-gray-600 rounded-xl text-gray-100 px-3 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent">
                <option value="af">Default (American Female)</option>
                <option value="af_bella">Bella (American Female)</option>
                <option value="af_nicole">Nicole (American Female)</option>
                <option value="af_sarah">Sarah (American Female)</option>
                <option value="af_sky">Sky (American Female)</option>
                <option value="am_adam">Adam (American Male)</option>
                <option value="am_michael">Michael (American Male)</option>
                <option value="bf_emma">Emma (British Female)</option>
                <option value="bf_isabella">Isabella (British Female)</option>
                <option value="bm_george">George (British Male)</option>
                <option value="bm_lewis">Lewis (British Male)</option>
              </select>
              <button type="submit" className="inline-flex justify-center items-center px-6 py-2 text-lg font-semibold bg-gradient-to-t from-blue-600 to-purple-600 hover:from-blue-700 hover:to-purple-700 transition-colors duration-300 rounded-xl text-white disabled:opacity-50" disabled={status === "running" || inputText.trim() === ""}>
                {status === "running" ? "Generating..." : "Generate"}
              </button>
            </div>
          </form>
        </div>

        {results.length > 0 && (
          <motion.div initial={{ y: 50, opacity: 0 }} animate={{ y: 0, opacity: 1 }} transition={{ duration: 0.5 }} className="max-h-[250px] overflow-y-auto px-2 mt-4 space-y-6 relative z-[2]">
            {results.map((result, i) => (
              <div key={i}>
                <div className="text-white bg-gray-800/70 backdrop-blur-sm border border-gray-700 rounded-lg p-4 z-10">
                  <span className="absolute right-5 font-bold">#{results.length - i}</span>
                  <p className="mb-3 max-w-[95%]">{result.text}</p>
                  <audio controls src={result.src} className="w-full">
                    Your browser does not support the audio element.
                  </audio>
                </div>
              </div>
            ))}
          </motion.div>
        )}
      </div>

      <div className="bg-[#015871] pointer-events-none absolute left-0 w-full h-[5%] bottom-[-50px]">
        <div className="wave"></div>
        <div className="wave"></div>
      </div>
    </div>
  );
}
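
The `<option>` labels in `App.jsx` follow the naming convention visible in the README's voice table: the first letter of a voice ID encodes the accent (`a` American, `b` British), the second the gender (`f` female, `m` male), the part after `_` is the name, and bare `af` is the default voice. A sketch that derives the labels from the IDs instead of hard-coding them; `voiceLabel` is a hypothetical helper, not part of `kokoro-js`, and it assumes the convention holds for every voice:

```javascript
const ACCENTS = { a: "American", b: "British" };
const GENDERS = { f: "Female", m: "Male" };

/**
 * Hypothetical helper: turn a voice ID such as "bf_emma" into
 * "Emma (British Female)", assuming the README's naming convention.
 * @param {string} id
 * @returns {string}
 */
function voiceLabel(id) {
  const [prefix, name] = id.split("_");
  const accent = ACCENTS[prefix[0]];
  const gender = GENDERS[prefix[1]];
  if (prefix.length !== 2 || !accent || !gender) {
    throw new Error(`Unrecognized voice ID "${id}"`);
  }
  // A bare prefix such as "af" is the default voice for that accent/gender.
  const display = name ? name[0].toUpperCase() + name.slice(1) : "Default";
  return `${display} (${accent} ${gender})`;
}

const ids = ["af", "af_bella", "am_adam", "bf_emma", "bm_lewis"];
for (const id of ids) console.log(id, "->", voiceLabel(id));
```

In JSX this could become `ids.map((id) => <option key={id} value={id}>{voiceLabel(id)}</option>)`. The hard-coded list in the commit has the advantage of not breaking if a future voice does not follow the convention.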

kokoro.js/demo/src/index.css (100 lines, Normal file)
@@ -0,0 +1,100 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

/*
 * Wave animations adapted from the following two demos:
 * - https://codepen.io/upasanaasopa/pen/poObEWZ
 * - https://codepen.io/breakstorm00/pen/qBJZQNB
 */

*,
*:before,
*:after {
  margin: 0;
  padding: 0;
  box-sizing: border-box;
}

.loading-wave {
  position: relative;
  top: 0;
  width: 100%;
  height: 100%;
  background: #2c74b3;
  border-radius: 50%;
  box-shadow: inset 0 0 50px 0 rgba(0, 0, 0, 0.5);
}

.loading-wave:before,
.loading-wave:after {
  content: "";
  position: absolute;
  top: 0;
  left: 50%;
  width: 200%;
  height: 200%;
  background: black;
  transform: translate(-50%, -75%);
}

.loading-wave:before {
  border-radius: 45%;
  background: rgba(255, 255, 255, 1);
  animation: animate 5s linear infinite;
}

.loading-wave:after {
  border-radius: 40%;
  background: rgba(255, 255, 255, 0.5);
  animation: animate 10s linear infinite;
}

.wave {
  background: url(/wave.svg) repeat-x;
  position: absolute;
  top: -198px;
  width: 6400px;
  height: 198px;
  animation: wave 7s cubic-bezier(0.36, 0.45, 0.63, 0.53) infinite;
  transform: translate3d(0, 0, 0);
}

.wave:nth-of-type(2) {
  top: -175px;
  animation:
    wave 7s cubic-bezier(0.36, 0.45, 0.63, 0.53) -0.125s infinite,
    swell 7s ease -1.25s infinite;
  opacity: 1;
}

@keyframes wave {
  0% {
    margin-left: 0;
  }

  100% {
    margin-left: -1600px;
  }
}

@keyframes swell {
  0%,
  100% {
    transform: translate3d(0, -25px, 0);
  }

  50% {
    transform: translate3d(0, 5px, 0);
  }
}

@keyframes animate {
  0% {
    transform: translate(-50%, -75%) rotate(0deg);
  }

  100% {
    transform: translate(-50%, -75%) rotate(360deg);
  }
}

kokoro.js/demo/src/main.jsx (10 lines, Normal file)
@@ -0,0 +1,10 @@
import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import "./index.css";
import App from "./App.jsx";

createRoot(document.getElementById("root")).render(
  <StrictMode>
    <App />
  </StrictMode>,
);
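
`main.jsx` renders `<App />` inside `<StrictMode>`. In development, React 18's Strict Mode mounts a component, unmounts it and mounts it again while preserving its state and refs, so the `useEffect` in `App.jsx` runs its setup twice. The `worker.current ??= new Worker(...)` line keeps this from spawning a second worker, because `??=` only assigns when the ref still holds `null` or `undefined`. The plain-JavaScript simulation below illustrates the difference without React; `makeWorker` is a hypothetical stand-in for `new Worker(...)`:

```javascript
let created = 0;
// Hypothetical stand-in for `new Worker(...)` that counts instantiations.
const makeWorker = () => ({ id: ++created });

// Effect setup using nullish assignment, as in App.jsx.
function setupWithNullish(ref) {
  ref.current ??= makeWorker(); // assigns only if ref.current is null/undefined
}

// Effect setup using plain assignment, for comparison.
function setupWithPlainAssign(ref) {
  ref.current = makeWorker(); // always creates a new instance
}

// Strict Mode in development runs setup, cleanup, setup against the same ref.
function simulateStrictMount(setup) {
  const ref = { current: null };
  setup(ref); // mount
  // (cleanup would run here; App.jsx's cleanup only removes listeners)
  setup(ref); // re-mount
  return ref.current;
}

created = 0;
simulateStrictMount(setupWithNullish);
console.log("with ??=:", created); // 1

created = 0;
simulateStrictMount(setupWithPlainAssign);
console.log("with =:", created); // 2
```

Because the cleanup in `App.jsx` removes the listeners but does not call `terminate()`, the same worker, together with the model it has already loaded, is reused after the re-mount rather than reloaded.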

kokoro.js/demo/src/worker.js (20 lines, Normal file)
@@ -0,0 +1,20 @@
import { KokoroTTS } from "kokoro-js";

const model_id = "onnx-community/Kokoro-82M-ONNX";
const tts = await KokoroTTS.from_pretrained(model_id, {
  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
});

self.postMessage({ status: "ready" });

// Listen for messages from the main thread
self.addEventListener("message", async (e) => {
  const { text, voice } = e.data;

  // Generate speech
  const audio = await tts.generate(text, { voice });

  // Send the audio file back to the main thread
  const blob = audio.toBlob();
  self.postMessage({ status: "complete", audio: URL.createObjectURL(blob), text });
});
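
The demo's main thread and worker exchange a small message protocol: the worker posts `{ status: "ready" }` once the model has loaded and `{ status: "complete", audio, text }` after each generation, while `App.jsx` posts `{ type: "generate", text, voice }`. As a sketch, the UI state transitions handled in `onMessageReceived` can be written as a pure reducer, which is easy to test outside React. `reduceWorkerMessage` is a hypothetical name and not code from this commit, and the `blob:` strings below are placeholders for real object URLs:

```javascript
/**
 * @typedef {{ status: null | "ready" | "running", results: { text: string, src: string }[] }} UIState
 */

/** Initial state: `status === null` means the model is still loading. */
const initialState = { status: null, results: [] };

/**
 * Hypothetical pure version of the state updates in App.jsx's `onMessageReceived`.
 * @param {UIState} state
 * @param {{ status: string, audio?: string, text?: string }} message
 * @returns {UIState}
 */
function reduceWorkerMessage(state, message) {
  switch (message.status) {
    case "ready":
      return { ...state, status: "ready" };
    case "complete":
      // Newest result first, and re-enable the "Generate" button.
      return {
        status: "ready",
        results: [{ text: message.text, src: message.audio }, ...state.results],
      };
    default:
      return state; // unknown messages leave the state unchanged
  }
}

let state = reduceWorkerMessage(initialState, { status: "ready" });
state = { ...state, status: "running" }; // what handleSubmit does before posting
state = reduceWorkerMessage(state, { status: "complete", audio: "blob:demo-1", text: "Hello" });
console.log(state.status, state.results.length); // ready 1
```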

kokoro.js/demo/tailwind.config.js (8 lines, Normal file)
@@ -0,0 +1,8 @@
/** @type {import('tailwindcss').Config} */
export default {
  content: ["./index.html", "./src/**/*.{js,ts,jsx,tsx}"],
  theme: {
    extend: {},
  },
  plugins: [],
};

kokoro.js/demo/vite.config.js (12 lines, Normal file)
@@ -0,0 +1,12 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

// https://vite.dev/config/
export default defineConfig({
  plugins: [react()],
  worker: { format: "es" },
  build: {
    target: "esnext",
  },
  logLevel: process.env.NODE_ENV === "development" ? "error" : "info",
});

kokoro.js/package-lock.json (2972 lines, generated, Normal file)
File diff suppressed because it is too large

kokoro.js/package.json (65 lines, Normal file)
@@ -0,0 +1,65 @@
{
  "name": "kokoro-js",
  "version": "1.0.1",
  "type": "module",
  "exports": {
    "types": "./types/kokoro.d.ts",
    "node": {
      "import": "./dist/kokoro.js",
      "require": "./dist/kokoro.cjs"
    },
    "default": "./dist/kokoro.web.js"
  },
  "scripts": {
    "build": "rm -rf dist types && rollup -c && tsc && cp ../LICENSE LICENSE",
    "format": "prettier --write . --print-width 1000",
    "test": "vitest"
  },
  "keywords": [
    "kokoro",
    "tts",
    "text-to-speech"
  ],
  "author": {
    "name": "hexgrad",
    "email": "hello@hexgrad.com"
  },
  "browser": {
    "path": false,
    "fs/promises": false
  },
  "contributors": [
    "Xenova"
  ],
  "license": "Apache-2.0",
  "description": "High-quality text-to-speech for the web",
  "dependencies": {
    "@huggingface/transformers": "^3.3.1",
    "phonemizer": "^1.2.1"
  },
  "devDependencies": {
    "@rollup/plugin-node-resolve": "^16.0.0",
    "@rollup/plugin-terser": "^0.4.4",
    "prettier": "3.4.2",
    "rollup": "^4.30.1",
    "typescript": "^5.7.3",
    "vitest": "^2.1.8"
  },
  "files": [
    "types",
    "dist",
    "voices",
    "README.md",
    "LICENSE"
  ],
  "homepage": "https://github.com/hexgrad/kokoro",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/hexgrad/kokoro.git"
  },
  "publishConfig": {
    "access": "public"
  },
  "jsdelivr": "./dist/kokoro.web.js",
  "unpkg": "./dist/kokoro.web.js"
}
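
The `exports` field in `kokoro.js/package.json` uses conditional exports: the resolver walks the object's keys in order, takes the first key that is `"default"` or one of the active conditions, and recurses into nested objects, falling through if a nested object yields nothing. The sketch below is a simplified model of that lookup (it ignores subpaths, arrays and the rest of Node's algorithm, and `resolveConditional` is a hypothetical helper) showing which build each environment would receive:

```javascript
/**
 * Hypothetical, simplified model of conditional-exports resolution
 * (not Node's full algorithm: no subpaths, patterns or fallback arrays).
 * @param {string | Record<string, any>} target
 * @param {Set<string>} conditions - active conditions, e.g. {"node", "import"}
 * @returns {string | undefined}
 */
function resolveConditional(target, conditions) {
  if (typeof target === "string") return target;
  // Object key order matters: the first matching condition wins.
  for (const [key, value] of Object.entries(target)) {
    if (key === "default" || conditions.has(key)) {
      const resolved = resolveConditional(value, conditions);
      if (resolved !== undefined) return resolved;
    }
  }
  return undefined;
}

// The "exports" map from kokoro.js/package.json in this commit.
const exportsMap = {
  types: "./types/kokoro.d.ts",
  node: {
    import: "./dist/kokoro.js",
    require: "./dist/kokoro.cjs",
  },
  default: "./dist/kokoro.web.js",
};

console.log(resolveConditional(exportsMap, new Set(["node", "import"]))); // ./dist/kokoro.js
console.log(resolveConditional(exportsMap, new Set(["node", "require"]))); // ./dist/kokoro.cjs
console.log(resolveConditional(exportsMap, new Set(["browser", "import"]))); // ./dist/kokoro.web.js
```

This is also why `"types"` is listed first: a resolver with the `types` condition active stops there before reaching `"node"` or `"default"`.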

kokoro.js/rollup.config.js (42 lines, Normal file)
@@ -0,0 +1,42 @@
import terser from "@rollup/plugin-terser";
import { nodeResolve } from "@rollup/plugin-node-resolve";

const plugins = (browser) => [nodeResolve({ browser }), terser({ format: { comments: false } })];

const OUTPUT_CONFIGS = [
  // Node versions
  {
    file: "./dist/kokoro.cjs",
    format: "cjs",
  },
  {
    file: "./dist/kokoro.js",
    format: "esm",
  },

  // Web version
  {
    file: "./dist/kokoro.web.js",
    format: "esm",
  },
];

const WEB_SPECIFIC_CONFIG = {
  onwarn: (warning, warn) => {
    if (!warning.message.includes("@huggingface/transformers")) warn(warning);
  },
};

const NODE_SPECIFIC_CONFIG = {
  external: ["@huggingface/transformers", "phonemizer"],
};

export default OUTPUT_CONFIGS.map((output) => {
  const web = output.file.endsWith(".web.js");
  return {
    input: "./src/kokoro.js",
    output,
    plugins: plugins(web),
    ...(web ? WEB_SPECIFIC_CONFIG : NODE_SPECIFIC_CONFIG),
  };
});
90
kokoro.js/src/kokoro.js
Normal file
90
kokoro.js/src/kokoro.js
Normal file
@@ -0,0 +1,90 @@
|
|||||||
|
import { StyleTextToSpeech2Model, AutoTokenizer, Tensor, RawAudio } from "@huggingface/transformers";
|
||||||
|
import { phonemize } from "./phonemize.js";
|
||||||
|
import { getVoiceData, VOICES } from "./voices.js";
|
||||||
|
|
||||||
|
const STYLE_DIM = 256;
|
||||||
|
const SAMPLE_RATE = 24000;
|
||||||
|
|
||||||
|
export class KokoroTTS {
|
||||||
|
/**
|
||||||
|
* Create a new KokoroTTS instance.
|
||||||
|
* @param {import('@huggingface/transformers').StyleTextToSpeech2Model} model The model
|
||||||
|
* @param {import('@huggingface/transformers').PreTrainedTokenizer} tokenizer The tokenizer
|
||||||
|
*/
|
||||||
|
constructor(model, tokenizer) {
|
||||||
|
this.model = model;
|
||||||
|
this.tokenizer = tokenizer;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Load a KokoroTTS model from the Hugging Face Hub.
|
||||||
|
* @param {string} model_id The model id
|
||||||
|
* @param {Object} options Additional options
|
||||||
|
* @param {"fp32"|"fp16"|"q8"|"q4"|"q4f16"} [options.dtype="fp32"] The data type to use.
|
||||||
|
   * @param {"wasm"|"webgpu"|"cpu"|null} [options.device=null] The device to run the model on.
   * @param {import("@huggingface/transformers").ProgressCallback} [options.progress_callback=null] A callback function that is called with progress information.
   * @returns {Promise<KokoroTTS>} The loaded model
   */
  static async from_pretrained(model_id, { dtype = "fp32", device = null, progress_callback = null } = {}) {
    const model = StyleTextToSpeech2Model.from_pretrained(model_id, { progress_callback, dtype, device });
    const tokenizer = AutoTokenizer.from_pretrained(model_id, { progress_callback });

    const info = await Promise.all([model, tokenizer]);
    return new KokoroTTS(...info);
  }

  get voices() {
    return VOICES;
  }

  list_voices() {
    console.table(VOICES);
  }

  /**
   * Generate audio from text.
   *
   * Note: The style data for each voice is loaded on its first use and cached in memory, so later calls with the same voice reuse it.
   * @param {string} text The input text
   * @param {Object} [options] Additional options
   * @param {keyof typeof VOICES} [options.voice="af"] The voice style to use
   * @param {number} [options.speed=1] The speaking speed (1 is the default rate)
   * @returns {Promise<RawAudio>} The generated audio
   */
  async generate(text, { voice = "af", speed = 1 } = {}) {
    if (!VOICES.hasOwnProperty(voice)) {
      console.error(`Voice "${voice}" not found. Available voices:`);
      console.table(VOICES);
      throw new Error(`Voice "${voice}" not found. Should be one of: ${Object.keys(VOICES).join(", ")}.`);
    }

    const language = voice.at(0); // "a" or "b"
    const phonemes = await phonemize(text, language);
    const { input_ids } = this.tokenizer(phonemes, {
      truncation: true,
    });

    // Select voice style based on number of input tokens
    const num_tokens = Math.max(
      input_ids.dims.at(-1) - 2, // Exclude the 2 padding tokens
      0,
    );

    // Load voice style
    const data = await getVoiceData(voice);
    const offset = num_tokens * STYLE_DIM;
    const voiceData = data.slice(offset, offset + STYLE_DIM);

    // Prepare model inputs
    const inputs = {
      input_ids,
      style: new Tensor("float32", voiceData, [1, STYLE_DIM]),
      speed: new Tensor("float32", [speed], [1]),
    };

    // Generate audio
    const { waveform } = await this.model(inputs);

    return new RawAudio(waveform.data, SAMPLE_RATE);
  }
}
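In `generate`, the voice file is read as a flat `Float32Array` and a single style vector is chosen according to the input length: the vector for `num_tokens` tokens starts at `num_tokens * STYLE_DIM`. The sketch below isolates that lookup as a hypothetical `selectStyle` helper, assuming the data is a row-major matrix with one row per token count. The clamping to the last row is an addition of this sketch; `generate` itself does not clamp.

```javascript
// Hypothetical helper mirroring the style-row lookup in `generate`.
// Assumes `data` is a row-major [numRows, styleDim] matrix flattened into a Float32Array.
function selectStyle(data, numTokens, styleDim) {
  const numRows = Math.floor(data.length / styleDim);
  if (numRows === 0) {
    throw new Error("Voice data holds no complete style vector");
  }
  // Clamp the row index (an addition of this sketch; `generate` does not clamp).
  const row = Math.min(Math.max(numTokens, 0), numRows - 1);
  const offset = row * styleDim;
  // `slice` copies the row, like `data.slice(offset, offset + STYLE_DIM)` in `generate`.
  return data.slice(offset, offset + styleDim);
}

// A tiny fake voice file: 3 rows of width 4, filled with 0..11.
const fake = Float32Array.from({ length: 12 }, (_, i) => i);
console.log(Array.from(selectStyle(fake, 1, 4))); // [4, 5, 6, 7]
console.log(Array.from(selectStyle(fake, 99, 4))); // [8, 9, 10, 11] (clamped to the last row)
```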
197
kokoro.js/src/phonemize.js
Normal file
@@ -0,0 +1,197 @@
import { phonemize as espeakng } from "phonemizer";

/**
 * Helper function to split a string on a regex, but keep the delimiters.
 * This is required because the JavaScript `.split()` method only keeps the parts of a delimiter captured by a group,
 * and wrapping the regex in a capturing group causes issues with existing capturing groups (due to nesting).
 * @param {string} text The text to split.
 * @param {RegExp} regex The regex to split on. It must have the global (`g`) flag, as required by `matchAll`.
 * @returns {{match: boolean; text: string}[]} The split string.
 */
function split(text, regex) {
  const result = [];
  let prev = 0;
  for (const match of text.matchAll(regex)) {
    const fullMatch = match[0];
    if (prev < match.index) {
      result.push({ match: false, text: text.slice(prev, match.index) });
    }
    if (fullMatch.length > 0) {
      result.push({ match: true, text: fullMatch });
    }
    prev = match.index + fullMatch.length;
  }
  if (prev < text.length) {
    result.push({ match: false, text: text.slice(prev) });
  }
  return result;
}
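A small demonstration of the problem the `split` helper avoids: when the delimiter regex contains a repeated capturing group, `String.prototype.split` keeps only what the group captured on its last repetition, so part of the delimiter silently disappears. `splitKeep` below is a condensed copy of the `split` helper for comparison, not a separate API, and the sample string is made up.

```javascript
const text = "a, !b";
const pattern = /(\s*[,!]+\s*)+/g;

// Native split keeps only the LAST repetition of the capturing group,
// so the ", " part of the delimiter is lost.
const native = text.split(pattern);
console.log(native); // ["a", "!", "b"]

// Condensed copy of the matchAll-based `split` helper: keeps the whole match.
function splitKeep(str, regex) {
  const out = [];
  let prev = 0;
  for (const m of str.matchAll(regex)) {
    if (prev < m.index) out.push({ match: false, text: str.slice(prev, m.index) });
    if (m[0].length > 0) out.push({ match: true, text: m[0] });
    prev = m.index + m[0].length;
  }
  if (prev < str.length) out.push({ match: false, text: str.slice(prev) });
  return out;
}

console.log(splitKeep(text, pattern).map((p) => p.text)); // ["a", ", !", "b"]
```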

/**
 * Helper function to rewrite years and clock times as speakable groups of numbers
 * (e.g. "1990" -> "19 90", "12:05" -> "12 oh 5"). Decimal numbers are returned unchanged and handled later by `point_num`.
 * @param {string} match The matched number
 * @returns {string} The speakable form
 */
function split_num(match) {
  if (match.includes(".")) {
    return match;
  } else if (match.includes(":")) {
    let [h, m] = match.split(":").map(Number);
    if (m === 0) {
      return `${h} o'clock`;
    } else if (m < 10) {
      return `${h} oh ${m}`;
    }
    return `${h} ${m}`;
  }
  let year = parseInt(match.slice(0, 4), 10);
  if (year < 1100 || year % 1000 < 10) {
    return match;
  }
  let left = match.slice(0, 2);
  let right = parseInt(match.slice(2, 4), 10);
  let suffix = match.endsWith("s") ? "s" : "";
  if (year % 1000 >= 100 && year % 1000 <= 999) {
    if (right === 0) {
      return `${left} hundred${suffix}`;
    } else if (right < 10) {
      return `${left} oh ${right}${suffix}`;
    }
  }
  return `${left} ${right}${suffix}`;
}
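To see which substrings `normalize_text` routes to `split_num`, the pattern can be run on its own. `NUMBER_PATTERN` below is a verbatim copy of the regex used in the numbers step; the sample sentence is illustrative. Following the code of `split_num`, the matches become "19 90", "12 oh 5", "3.50" (left for `point_num`) and "20 22s". A 24-hour time such as "13:05" is not matched, because the hour alternative only accepts 1 to 12.

```javascript
// Verbatim copy of the pattern passed to `.replace(..., split_num)` in normalize_text.
const NUMBER_PATTERN = /\d*\.\d+|\b\d{4}s?\b|(?<!:)\b(?:[1-9]|1[0-2]):[0-5]\d\b(?!:)/g;

const sample = "In 1990 at 12:05, pay 3.50 for 2022s tickets";
console.log(sample.match(NUMBER_PATTERN)); // ["1990", "12:05", "3.50", "2022s"]

// Hours outside 1-12 are not treated as clock times.
console.log("13:05".match(NUMBER_PATTERN)); // null
```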

/**
 * Helper function to spell out monetary values (e.g. "$1.50" -> "1 dollar and 50 cents")
 * @param {string} match The matched currency
 * @returns {string} The formatted currency
 */
function flip_money(match) {
  const bill = match[0] === "$" ? "dollar" : "pound";
  if (isNaN(Number(match.slice(1)))) {
    return `${match.slice(1)} ${bill}s`;
  } else if (!match.includes(".")) {
    let suffix = match.slice(1) === "1" ? "" : "s";
    return `${match.slice(1)} ${bill}${suffix}`;
  }
  const [b, c] = match.slice(1).split(".");
  const d = parseInt(c.padEnd(2, "0"), 10);
  let coins = match[0] === "$" ? (d === 1 ? "cent" : "cents") : d === 1 ? "penny" : "pence";
  return `${b} ${bill}${b === "1" ? "" : "s"} and ${d} ${coins}`;
}
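The currency step hands `flip_money` the whole amount, including an optional multiplier word. `MONEY_PATTERN` below is a verbatim copy of that regex, run on an illustrative sentence. Per the code of `flip_money`, the three matches become "5 million dollars" (the non-numeric branch), "1 pound and 50 pence" and "1 dollar".

```javascript
// Verbatim copy of the pattern passed to `.replace(..., flip_money)` in normalize_text.
const MONEY_PATTERN = /[$£]\d+(?:\.\d+)?(?: hundred| thousand| (?:[bm]|tr)illion)*\b|[$£]\d+\.\d\d?\b/gi;

const sample = "It costs $5 million, or £1.50 each, or $1.";
// The trailing "." after "$1" is not part of the amount.
console.log(sample.match(MONEY_PATTERN)); // ["$5 million", "£1.50", "$1"]
```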
/**
 * Helper function to read the digits after a decimal point one by one (e.g. "12.34" -> "12 point 3 4")
 * @param {string} match The matched number
 * @returns {string} The formatted number
 */
function point_num(match) {
  let [a, b] = match.split(".");
  return `${a} point ${b.split("").join(" ")}`;
}

/**
 * Normalize text for phonemization
 * @param {string} text The text to normalize
 * @returns {string} The normalized text
 */
function normalize_text(text) {
  return (
    text
      // 1. Handle quotes and brackets
      .replace(/[‘’]/g, "'")
      .replace(/«/g, "“")
      .replace(/»/g, "”")
      .replace(/[“”]/g, '"')
      .replace(/\(/g, "«")
      .replace(/\)/g, "»")

      // 2. Replace uncommon (full-width) punctuation marks
      .replace(/、/g, ", ")
      .replace(/。/g, ". ")
      .replace(/！/g, "! ")
      .replace(/，/g, ", ")
      .replace(/：/g, ": ")
      .replace(/；/g, "; ")
      .replace(/？/g, "? ")

      // 3. Whitespace normalization
      .replace(/[^\S \n]/g, " ")
      .replace(/ +/g, " ")
      .replace(/(?<=\n) +(?=\n)/g, "")

      // 4. Abbreviations
      .replace(/\bD[Rr]\.(?= [A-Z])/g, "Doctor")
      .replace(/\b(?:Mr\.|MR\.(?= [A-Z]))/g, "Mister")
      .replace(/\b(?:Ms\.|MS\.(?= [A-Z]))/g, "Miss")
      .replace(/\b(?:Mrs\.|MRS\.(?= [A-Z]))/g, "Mrs")
      .replace(/\betc\.(?! [A-Z])/gi, "etc")

      // 5. Normalize casual words
      .replace(/\b(y)eah?\b/gi, "$1e'a")

      // 6. Handle numbers and currencies
      .replace(/\d*\.\d+|\b\d{4}s?\b|(?<!:)\b(?:[1-9]|1[0-2]):[0-5]\d\b(?!:)/g, split_num)
      .replace(/(?<=\d),(?=\d)/g, "")
      .replace(/[$£]\d+(?:\.\d+)?(?: hundred| thousand| (?:[bm]|tr)illion)*\b|[$£]\d+\.\d\d?\b/gi, flip_money)
      .replace(/\d*\.\d+/g, point_num)
      .replace(/(?<=\d)-(?=\d)/g, " to ")
      .replace(/(?<=\d)S/g, " S")

      // 7. Handle possessives
      .replace(/(?<=[BCDFGHJ-NP-TV-Z])'?s\b/g, "'S")
      .replace(/(?<=X')S\b/g, "s")

      // 8. Handle hyphenated words/letters
      .replace(/(?:[A-Za-z]\.){2,} [a-z]/g, (m) => m.replace(/\./g, "-"))
      .replace(/(?<=[A-Z])\.(?=[A-Z])/gi, "-")

      // 9. Strip leading and trailing whitespace
      .trim()
  );
}
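Two regex details carry much of `normalize_text`. Without the `g` flag, `replace` rewrites only the first match, so a whitespace-collapsing step needs `g`. Lookarounds such as `(?<=\d)` test neighbouring characters without consuming them, which lets the number steps delete a comma or replace a hyphen only when it sits between two digits. The sample strings below are illustrative.

```javascript
const spaced = "Hello   big   world";
const firstOnly = spaced.replace(/ +/, " "); // "Hello big   world" (only the first run collapsed)
const everyRun = spaced.replace(/ +/g, " "); // "Hello big world"

// Thousands separators are removed only between digits.
const noThousands = "12,345,678".replace(/(?<=\d),(?=\d)/g, ""); // "12345678"
const listComma = "apples, 3 pears".replace(/(?<=\d),(?=\d)/g, ""); // unchanged

// A hyphen between digits is read as a range.
const range = "pages 10-20".replace(/(?<=\d)-(?=\d)/g, " to "); // "pages 10 to 20"

console.log(firstOnly, "|", everyRun, "|", noThousands, "|", listComma, "|", range);
```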

/**
 * Escapes regular expression special characters in a string by replacing them with their escaped counterparts.
 *
 * @param {string} string The string to escape.
 * @returns {string} The escaped string.
 */
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
}

const PUNCTUATION = ';:,.!?¡¿—…"«»“”(){}[]';
// Matches a run of punctuation marks together with any surrounding whitespace
const PUNCTUATION_PATTERN = new RegExp(`(\\s*[${escapeRegExp(PUNCTUATION)}]+\\s*)+`, "g");
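`PUNCTUATION_PATTERN` treats a whole run of punctuation, including the whitespace around it, as one delimiter. The snippet rebuilds the pattern from verbatim copies of `escapeRegExp` and `PUNCTUATION` and lists the runs it finds in an illustrative string. Inside `phonemize`, each such run becomes one `match: true` section that is passed through unchanged.

```javascript
// Verbatim copies of the helpers defined above.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const PUNCTUATION = ';:,.!?¡¿—…"«»“”(){}[]';
const PUNCTUATION_PATTERN = new RegExp(`(\\s*[${escapeRegExp(PUNCTUATION)}]+\\s*)+`, "g");

// With the `g` flag, `match` returns full matches only (no capture groups).
const runs = "Hello, world! (yes)".match(PUNCTUATION_PATTERN);
console.log(runs); // [", ", "! (", ")"]
```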
/**
 * Convert text to phonemes with espeak-ng, preserving punctuation.
 * @param {string} text The input text
 * @param {string} [language="a"] "a" for American English (`en-us`); any other value uses `en` (British English)
 * @param {boolean} [norm=true] Whether to normalize the text before phonemization
 * @returns {Promise<string>} The phonemized text
 */
export async function phonemize(text, language = "a", norm = true) {
  // 1. Normalize text
  if (norm) {
    text = normalize_text(text);
  }

  // 2. Split into chunks, to ensure we preserve punctuation
  const sections = split(text, PUNCTUATION_PATTERN);

  // 3. Convert each section to phonemes
  const lang = language === "a" ? "en-us" : "en";
  const ps = (await Promise.all(sections.map(async ({ match, text }) => (match ? text : (await espeakng(text, lang)).join(" "))))).join("");

  // 4. Post-process phonemes
  let processed = ps
    // https://en.wiktionary.org/wiki/kokoro#English
    .replace(/kəkˈoːɹoʊ/g, "kˈoʊkəɹoʊ")
    .replace(/kəkˈɔːɹəʊ/g, "kˈəʊkəɹəʊ")
    .replace(/ʲ/g, "j")
    .replace(/r/g, "ɹ")
    .replace(/x/g, "k")
    .replace(/ɬ/g, "l")
    .replace(/(?<=[a-zɹː])(?=hˈʌndɹɪd)/g, " ")
    .replace(/ z(?=[;:,.!?¡¿—…"«»“” ]|$)/g, "z");

  // 5. Additional post-processing for American English
  if (language === "a") {
    processed = processed.replace(/(?<=nˈaɪn)ti(?!ː)/g, "di");
  }
  return processed.trim();
}
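`phonemize` converts all non-punctuation sections concurrently and then joins the results. This is safe because `Promise.all` returns results in input order, not completion order. The sketch below demonstrates that with a hypothetical `fakeG2P` stand-in for espeak-ng whose later calls finish first; it does not call the real `phonemizer` package, and `convert` is an illustrative name.

```javascript
// Hypothetical stand-in for espeak-ng: resolves with an array of strings after `delayMs`.
function fakeG2P(text, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve([text.toUpperCase()]), delayMs));
}

// Same shape as step 3 of `phonemize`: punctuation passes through, other text is converted.
async function convert(sections) {
  const parts = await Promise.all(
    sections.map(async ({ match, text }, i) => (match ? text : (await fakeG2P(text, 30 - i * 10)).join(" "))),
  );
  return parts.join("");
}

const sections = [
  { match: false, text: "hello" }, // finishes last (30 ms)
  { match: true, text: ", " },
  { match: false, text: "world" }, // finishes first (10 ms)
];

convert(sections).then((out) => console.log(out)); // "HELLO, WORLD"
```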
121
kokoro.js/src/voices.js
Normal file
@@ -0,0 +1,121 @@
import path from "path";
import fs from "fs/promises";

export const VOICES = Object.freeze({
  af: {
    // Default voice is a 50-50 mix of Bella & Sarah
    name: "Default",
    language: "en-us",
    gender: "Female",
  },
  af_bella: {
    name: "Bella",
    language: "en-us",
    gender: "Female",
  },
  af_nicole: {
    name: "Nicole",
    language: "en-us",
    gender: "Female",
  },
  af_sarah: {
    name: "Sarah",
    language: "en-us",
    gender: "Female",
  },
  af_sky: {
    name: "Sky",
    language: "en-us",
    gender: "Female",
  },
  am_adam: {
    name: "Adam",
    language: "en-us",
    gender: "Male",
  },
  am_michael: {
    name: "Michael",
    language: "en-us",
    gender: "Male",
  },

  bf_emma: {
    name: "Emma",
    language: "en-gb",
    gender: "Female",
  },
  bf_isabella: {
    name: "Isabella",
    language: "en-gb",
    gender: "Female",
  },
  bm_george: {
    name: "George",
    language: "en-gb",
    gender: "Male",
  },
  bm_lewis: {
    name: "Lewis",
    language: "en-gb",
    gender: "Male",
  },
});

const VOICE_DATA_URL = "https://huggingface.co/onnx-community/Kokoro-82M-ONNX/resolve/main/voices";
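
/**
 * Load the raw style data for a voice. When `fs` is available (Node.js), the file is read from the local `voices/` directory;
 * otherwise it is fetched from the Hugging Face Hub and stored with the Cache API when possible.
 * @param {keyof typeof VOICES} id The voice identifier
 * @returns {Promise<ArrayBufferLike>} The raw bytes of the voice file
 */
async function getVoiceFile(id) {
  if (fs?.readFile) {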
    const file = path.resolve(import.meta.dirname ?? __dirname, `../voices/${id}.bin`);
    const data = await fs.readFile(file);
    // Copy exactly this Buffer's bytes: a Buffer may be a view into a larger shared ArrayBuffer
    return data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength);
  }

  const url = `${VOICE_DATA_URL}/${id}.bin`;

  let cache;
  try {
    cache = await caches.open("kokoro-voices");
    const cachedResponse = await cache.match(url);
    if (cachedResponse) {
      return await cachedResponse.arrayBuffer();
    }
  } catch (e) {
    console.warn("Unable to open cache", e);
  }

  // No cache, or cache failed to open. Fetch the file.
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch voice "${id}": ${response.status} ${response.statusText}`);
  }
  const buffer = await response.arrayBuffer();

  if (cache) {
    try {
      // NOTE: We use `new Response(buffer, ...)` instead of `response.clone()` to handle LFS files
      await cache.put(
        url,
        new Response(buffer, {
          headers: response.headers,
        }),
      );
    } catch (e) {
      console.warn("Unable to cache file", e);
    }
  }

  return buffer;
}
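In the Node.js branch of `getVoiceFile`, the bytes come from a `Buffer`. A `Buffer` can be a view into a larger `ArrayBuffer` (Node.js allocates small Buffers from a shared pool), so reading `.buffer` directly may include bytes that do not belong to the file. The snippet shows the effect on a small in-memory Buffer and the `byteOffset`/`byteLength` slice that copies only the Buffer's own bytes. Whether `fs.readFile` returns such a view for large voice files is an implementation detail this sketch does not rely on.

```javascript
// Copying a Uint8Array into a small Buffer typically uses Node's shared pool.
const bytes = Buffer.from(new Uint8Array(new Float32Array([1.5, -2]).buffer));

console.log(bytes.byteLength); // 8
console.log(bytes.buffer.byteLength); // often larger than 8 (the whole pool)

// Copy exactly the Buffer's own bytes before reinterpreting them as float32.
const exact = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
console.log(Array.from(new Float32Array(exact))); // [1.5, -2]
```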
const VOICE_CACHE = new Map();

/**
 * Get the style data for a voice as a Float32Array, caching it in memory after the first load.
 * @param {keyof typeof VOICES} voice The voice identifier
 * @returns {Promise<Float32Array>} The voice's style data
 */
export async function getVoiceData(voice) {
  if (VOICE_CACHE.has(voice)) {
    return VOICE_CACHE.get(voice);
  }

  const buffer = new Float32Array(await getVoiceFile(voice));
  VOICE_CACHE.set(voice, buffer);
  return buffer;
}
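`VOICE_CACHE` stores the decoded `Float32Array` only after loading finishes, so two calls for the same voice issued before the first one resolves will both load the file. One possible refinement, sketched here under that assumption, is to cache the pending Promise instead. `memoizeAsync` and the stand-in loader are hypothetical, not part of the library.

```javascript
// Hypothetical variant of VOICE_CACHE that stores the in-flight Promise
// rather than the resolved value, so concurrent callers share one load.
function memoizeAsync(load) {
  const cache = new Map();
  return function get(key) {
    if (!cache.has(key)) {
      const pending = load(key).catch((err) => {
        cache.delete(key); // allow a retry after a failed load
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

let calls = 0;
const getVoice = memoizeAsync(async () => {
  calls++; // runs synchronously on the first call, before any await
  return new Float32Array(4); // stand-in for the decoded voice file
});

const p1 = getVoice("af");
const p2 = getVoice("af"); // issued before p1 settles
console.log(p1 === p2, calls); // true 1
```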
95
kokoro.js/tests/phonemize.test.js
Normal file
@@ -0,0 +1,95 @@
import { describe, test, expect } from "vitest";
import { phonemize } from "../src/phonemize.js";

const A_TEST_CASES = new Map([
  ["‘Hello’", "həlˈoʊ"],
  ["‘Test’ and ‘Example’", "tˈɛst ænd ɛɡzˈæmpəl"],
  ["«Bonjour»", '"bɔːnʒˈʊɹ"'],
  ["«Test «nested» quotes»", '"tˈɛst "nˈɛstᵻd" kwˈoʊts"'],
  ["(Hello)", "«həlˈoʊ»"],
  ["(Nested (Parentheses))", "«nˈɛstᵻd «pɚɹˈɛnθəsˌiːz»»"],
  ["こんにちは、世界！", "dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ, tʃˈaɪniːzlˌɛɾɚ tʃˈaɪniːzlˌɛɾɚ!"],
  ["これはテストです：はい？", "dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ: dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ?"],
  ["Hello World", "həlˈoʊ wˈɜːld"],
  ["Hello World", "həlˈoʊ wˈɜːld"],
  ["Hello\n \nWorld", "həlˈoʊ wˈɜːld"],
  ["Dr. Smith", "dˈɑːktɚ smˈɪθ"],
  ["DR. Brown", "dˈɑːktɚ bɹˈaʊn"],
  ["Mr. Smith", "mˈɪstɚ smˈɪθ"],
  ["MR. Anderson", "mˈɪstɚɹ ˈændɚsən"],
  ["Ms. Taylor", "mˈɪs tˈeɪlɚ"],
  ["MS. Carter", "mˈɪs kˈɑːɹɾɚ"],
  ["Mrs. Johnson", "mˈɪsɪz dʒˈɑːnsən"],
  ["MRS. Wilson", "mˈɪsɪz wˈɪlsən"],
  ["Apples, oranges, etc.", "ˈæpəlz, ˈɔɹɪndʒᵻz, ɛtsˈɛtɹə"],
  ["Apples, etc. Pears.", "ˈæpəlz, ɛtsˈɛtɹə. pˈɛɹz."],
  ["Yeah", "jˈɛə"],
  ["yeah", "jˈɛə"],
  ["1990", "nˈaɪntiːn nˈaɪndi"],
  ["12:34", "twˈɛlv θˈɜːɾi fˈoːɹ"],
  ["2022s", "twˈɛnti twˈɛnti tˈuːz"],
  ["1,000", "wˈʌn θˈaʊzənd"],
  ["12,345,678", "twˈɛlv mˈɪliən θɹˈiː hˈʌndɹɪd fˈoːɹɾi fˈaɪv θˈaʊzənd sˈɪks hˈʌndɹɪd sˈɛvənti ˈeɪt"],
  ["$100", "wˈʌn hˈʌndɹɪd dˈɑːlɚz"],
  ["£1.50", "wˈʌn pˈaʊnd ænd fˈɪfti pˈɛns"],
  ["12.34", "twˈɛlv pˈɔɪnt θɹˈiː fˈoːɹ"],
  ["0.01", "zˈiəɹoʊ pˈɔɪnt zˈiəɹoʊ wˈʌn"],
  ["10-20", "tˈɛn tə twˈɛnti"],
  ["5-10", "fˈaɪv tə tˈɛn"],
  ["10S", "tˈɛn ˈɛs"],
  ["5S", "fˈaɪv ˈɛs"],
  ["Cat's tail", "kˈæts tˈeɪl"],
  ["X's mark", "ˈɛksᵻz mˈɑːɹk"],
  ["U.S.A.", "jˈuːˈɛsˈeɪ."],
  ["A.B.C", "ˈeɪbˈiːsˈiː"],
]);

const B_TEST_CASES = new Map([
  ["‘Hello’", "həlˈəʊ"],
  ["‘Test’ and ‘Example’", "tˈɛst and ɛɡzˈampəl"],
  ["«Bonjour»", '"bɔːnʒˈʊə"'],
  ["«Test «nested» quotes»", '"tˈɛst "nˈɛstɪd" kwˈəʊts"'],
  ["(Hello)", "«həlˈəʊ»"],
  ["(Nested (Parentheses))", "«nˈɛstɪd «pəɹˈɛnθəsˌiːz»»"],
  ["こんにちは、世界！", "dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə, tʃˈaɪniːzlˌɛtə tʃˈaɪniːzlˌɛtə!"],
  ["これはテストです：はい？", "dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə: dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə?"],
  ["Hello World", "həlˈəʊ wˈɜːld"],
  ["Hello World", "həlˈəʊ wˈɜːld"],
  ["Hello\n \nWorld", "həlˈəʊ wˈɜːld"],
  ["Dr. Smith", "dˈɒktə smˈɪθ"],
  ["DR. Brown", "dˈɒktə bɹˈaʊn"],
  ["Mr. Smith", "mˈɪstə smˈɪθ"],
  ["MR. Anderson", "mˈɪstəɹ ˈandəsən"],
  ["Ms. Taylor", "mˈɪs tˈeɪlə"],
  ["MS. Carter", "mˈɪs kˈɑːtə"],
  ["Mrs. Johnson", "mˈɪsɪz dʒˈɒnsən"],
  ["Apples, oranges, etc.", "ˈapəlz, ˈɒɹɪndʒɪz, ɛtsˈɛtɹə"],
  ["Apples, etc. Pears.", "ˈapəlz, ɛtsˈɛtɹə. pˈeəz."],
  ["1990", "nˈaɪntiːn nˈaɪnti"],
  ["12:34", "twˈɛlv θˈɜːti fˈɔː"],
  ["1,000", "wˈɒn θˈaʊzənd"],
  ["12,345,678", "twˈɛlv mˈɪliən θɹˈiː hˈʌndɹɪdən fˈɔːti fˈaɪv θˈaʊzənd sˈɪks hˈʌndɹɪdən sˈɛvənti ˈeɪt"],
  ["$100", "wˈɒn hˈʌndɹɪd dˈɒləz"],
  ["£1.50", "wˈɒn pˈaʊnd and fˈɪfti pˈɛns"],
  ["12.34", "twˈɛlv pˈɔɪnt θɹˈiː fˈɔː"],
  ["0.01", "zˈiəɹəʊ pˈɔɪnt zˈiəɹəʊ wˈɒn"],
  ["Cat's tail", "kˈats tˈeɪl"],
  ["X's mark", "ˈɛksɪz mˈɑːk"],
]);

describe("phonemize", () => {
  describe("en-us", () => {
    for (const [input, expected] of A_TEST_CASES) {
      test(`phonemize("${input}")`, async () => {
        expect(await phonemize(input)).toEqual(expected);
      });
    }
  });
  describe("en-gb", () => {
    for (const [input, expected] of B_TEST_CASES) {
      test(`phonemize("${input}")`, async () => {
        expect(await phonemize(input, "b")).toEqual(expected);
      });
    }
  });
});
16
kokoro.js/tsconfig.json
Normal file
@@ -0,0 +1,16 @@
{
  "include": ["src/**/*"],
  "compilerOptions": {
    "checkJs": true,
    "target": "esnext",
    "module": "nodenext",
    "moduleResolution": "nodenext",
    "outDir": "types",
    "strict": false,
    "skipLibCheck": true,
    "declaration": true,
    "declarationMap": true,
    "noEmit": false,
    "emitDeclarationOnly": true
  }
}
BIN
kokoro.js/voices/af.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/af_bella.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/af_nicole.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/af_sarah.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/af_sky.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/am_adam.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/am_michael.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/bf_emma.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/bf_isabella.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/bm_george.bin
Normal file
Binary file not shown.
BIN
kokoro.js/voices/bm_lewis.bin
Normal file
Binary file not shown.