Create Kokoro TTS JavaScript library (#3)

* Set up JS project
* Finalise JS library
* Update README
* Fix package.json repository url
* Rename package -> `kokoro-js`
* Fix samples in README
* Cleanup README
* Bump `phonemizer` version
* Create web demo
* Run prettier
* Link to model used in demo
* Enable multithreading in HF space demo (~40% faster)
* Add link to demo in README
* Bump to v1.0.1
2025-01-16 19:50:34 +02:00
parent 757c80cc5b
commit 0a1dc5750c
37 changed files with 8820 additions and 0 deletions
--- a/kokoro.js/.gitignore
+++ b/kokoro.js/.gitignore
@@ -0,0 +1,4 @@
+node_modules/
+dist
+types
+LICENSE
--- a/kokoro.js/.prettierignore
+++ b/kokoro.js/.prettierignore
@@ -0,0 +1,2 @@
+dist
+types
--- a/kokoro.js/README.md
+++ b/kokoro.js/README.md
@@ -0,0 +1,55 @@
+# Kokoro TTS
+
+<p align="center">
+    <a href="https://www.npmjs.com/package/kokoro-js"><img alt="NPM" src="https://img.shields.io/npm/v/kokoro-js"></a>
+    <a href="https://www.npmjs.com/package/kokoro-js"><img alt="NPM Downloads" src="https://img.shields.io/npm/dw/kokoro-js"></a>
+    <a href="https://www.jsdelivr.com/package/npm/kokoro-js"><img alt="jsDelivr Hits" src="https://img.shields.io/jsdelivr/npm/hw/kokoro-js"></a>
+    <a href="https://github.com/hexgrad/kokoro/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/hexgrad/kokoro?color=blue"></a>
+    <a href="https://huggingface.co/spaces/webml-community/kokoro-web"><img alt="Demo" src="https://img.shields.io/badge/Hugging_Face-demo-green"></a>
+</p>
+
+Kokoro is a frontier TTS model for its size of 82 million parameters (text in/audio out). This JavaScript library allows the model to be run 100% locally in the browser thanks to [🤗 Transformers.js](https://huggingface.co/docs/transformers.js). Try it out using our [online demo](https://huggingface.co/spaces/webml-community/kokoro-web)!
+
+## Usage
+
+First, install the `kokoro-js` library from [NPM](https://npmjs.com/package/kokoro-js) using:
+
+```bash
+npm i kokoro-js
+```
+
+You can then generate speech as follows:
+
+```js
+import { KokoroTTS } from "kokoro-js";
+
+const model_id = "onnx-community/Kokoro-82M-ONNX";
+const tts = await KokoroTTS.from_pretrained(model_id, {
+  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
+});
+
+const text = "Life is like a box of chocolates. You never know what you're gonna get.";
+const audio = await tts.generate(text, {
+  // Use `tts.list_voices()` to list all available voices
+  voice: "af_bella",
+});
+audio.save("audio.wav");
+```
+
+## Voices/Samples
+
+> Life is like a box of chocolates. You never know what you're gonna get.
+
+| Voice                    | Nationality | Gender | Sample                                                                                                   |
+| ------------------------ | ----------- | ------ | -------------------------------------------------------------------------------------------------------- |
+| Default (`af`)           | American    | Female | <video controls src="https://github.com/user-attachments/assets/c183df83-58a9-4aea-8fdf-225092acec57" /> |
+| Bella (`af_bella`)       | American    | Female | <video controls src="https://github.com/user-attachments/assets/0730fff0-22b3-458f-9675-36d313d872d6" /> |
+| Nicole (`af_nicole`)     | American    | Female | <video controls src="https://github.com/user-attachments/assets/4ce0b3f6-eaec-4e47-901c-9d29e2b60c86" /> |
+| Sarah (`af_sarah`)       | American    | Female | <video controls src="https://github.com/user-attachments/assets/d37dba3f-de59-44c4-bc3d-da91ea1b5a4a" /> |
+| Sky (`af_sky`)           | American    | Female | <video controls src="https://github.com/user-attachments/assets/38230be5-881c-4407-81e6-a0b1e4101565" /> |
+| Adam (`am_adam`)         | American    | Male   | <video controls src="https://github.com/user-attachments/assets/66a4c439-e80b-4c91-8a27-ae094486a2d8" /> |
+| Michael (`am_michael`)   | American    | Male   | <video controls src="https://github.com/user-attachments/assets/79a8879d-b564-4222-b2d5-a97f783ae897" /> |
+| Emma (`bf_emma`)         | British     | Female | <video controls src="https://github.com/user-attachments/assets/ad5eb254-1d84-4282-9d23-371d5765d820" /> |
+| Isabella (`bf_isabella`) | British     | Female | <video controls src="https://github.com/user-attachments/assets/ea7e6825-dad0-403c-9ece-680af04f5a25" /> |
+| George (`bm_george`)     | British     | Male   | <video controls src="https://github.com/user-attachments/assets/e09040aa-578f-40a6-b7fd-76a5b005346c" /> |
+| Lewis (`bm_lewis`)       | British     | Male   | <video controls src="https://github.com/user-attachments/assets/5d7b26bf-8900-4a9a-8ee5-a16c39bb834c" /> |
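
Reviewer note: the voice IDs in the table above appear to encode accent and gender in their two-letter prefix (`af`, `bm`, and so on). Below is a minimal sketch of decoding that prefix, assuming the convention holds for every voice. `describeVoice`, `ACCENTS`, and `GENDERS` are hypothetical names used for illustration and are not part of `kokoro-js`.

```javascript
// Hypothetical helper, not part of kokoro-js: decode the two-letter prefix of a voice ID.
// Assumption (inferred from the README table): 1st letter = accent, 2nd letter = gender.
const ACCENTS = { a: "American", b: "British" };
const GENDERS = { f: "Female", m: "Male" };

function describeVoice(voiceId) {
  // "af_bella" -> prefix "af", name "bella"; the bare default voice "af" has no name part.
  const [prefix, name] = voiceId.split("_");
  const nationality = ACCENTS[prefix[0]];
  const gender = GENDERS[prefix[1]];
  if (!nationality || !gender) {
    throw new Error(`Unrecognised voice prefix in "${voiceId}"`);
  }
  return { id: voiceId, name: name ?? "default", nationality, gender };
}

console.log(describeVoice("bm_george"));
```

A lookup like this could drive the demo's `<select>` labels instead of hard-coding them, though the library's own `VOICES` table remains the authoritative source.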
--- a/kokoro.js/demo/.gitignore
+++ b/kokoro.js/demo/.gitignore
@@ -0,0 +1,24 @@
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+pnpm-debug.log*
+lerna-debug.log*
+
+node_modules
+dist
+dist-ssr
+*.local
+
+# Editor directories and files
+.vscode/*
+!.vscode/extensions.json
+.idea
+.DS_Store
+*.suo
+*.ntvs*
+*.njsproj
+*.sln
+*.sw?
--- a/kokoro.js/demo/README.md
+++ b/kokoro.js/demo/README.md
@@ -0,0 +1,59 @@
+---
+title: Kokoro Text-to-Speech
+emoji: 🗣️
+colorFrom: indigo
+colorTo: purple
+sdk: static
+pinned: false
+license: apache-2.0
+short_description: High-quality speech synthesis powered by Kokoro TTS
+header: mini
+models:
+  - onnx-community/Kokoro-82M-ONNX
+custom_headers:
+  cross-origin-embedder-policy: require-corp
+  cross-origin-opener-policy: same-origin
+  cross-origin-resource-policy: cross-origin
+---
+
+# Kokoro Text-to-Speech
+
+A simple React + Vite application for running [Kokoro](https://github.com/hexgrad/kokoro), a frontier text-to-speech model for its size. The model runs 100% locally in the browser using [kokoro-js](https://www.npmjs.com/package/kokoro-js) and [🤗 Transformers.js](https://www.npmjs.com/package/@huggingface/transformers)!
+
+## Getting Started
+
+Follow the steps below to set up and run the application.
+
+### 1. Clone the Repository
+
+Clone the repository from GitHub:
+
+```sh
+git clone https://github.com/hexgrad/kokoro.git
+```
+
+### 2. Navigate to the Project Directory
+
+Change your working directory to the `demo` folder:
+
+```sh
+cd kokoro/kokoro.js/demo
+```
+
+### 3. Install Dependencies
+
+Install the necessary dependencies using npm:
+
+```sh
+npm i
+```
+
+### 4. Run the Development Server
+
+Start the development server:
+
+```sh
+npm run dev
+```
+
+The application should now be running locally. Open your browser and go to `http://localhost:5173` to see it in action.
--- a/kokoro.js/demo/eslint.config.js
+++ b/kokoro.js/demo/eslint.config.js
@@ -0,0 +1,35 @@
+import js from "@eslint/js";
+import globals from "globals";
+import react from "eslint-plugin-react";
+import reactHooks from "eslint-plugin-react-hooks";
+import reactRefresh from "eslint-plugin-react-refresh";
+
+export default [
+  { ignores: ["dist"] },
+  {
+    files: ["**/*.{js,jsx}"],
+    languageOptions: {
+      ecmaVersion: 2020,
+      globals: globals.browser,
+      parserOptions: {
+        ecmaVersion: "latest",
+        ecmaFeatures: { jsx: true },
+        sourceType: "module",
+      },
+    },
+    settings: { react: { version: "18.3" } },
+    plugins: {
+      react,
+      "react-hooks": reactHooks,
+      "react-refresh": reactRefresh,
+    },
+    rules: {
+      ...js.configs.recommended.rules,
+      ...react.configs.recommended.rules,
+      ...react.configs["jsx-runtime"].rules,
+      ...reactHooks.configs.recommended.rules,
+      "react/jsx-no-target-blank": "off",
+      "react-refresh/only-export-components": ["warn", { allowConstantExport: true }],
+    },
+  },
+];
--- a/kokoro.js/demo/index.html
+++ b/kokoro.js/demo/index.html
@@ -0,0 +1,13 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <link rel="icon" type="image/svg+xml" href="/hf-logo.svg" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Kokoro Text-to-Speech</title>
+  </head>
+  <body>
+    <div id="root"></div>
+    <script type="module" src="/src/main.jsx"></script>
+  </body>
+</html>
--- a/kokoro.js/demo/package-lock.json
+++ b/kokoro.js/demo/package-lock.json
--- a/kokoro.js/demo/package.json
+++ b/kokoro.js/demo/package.json
@@ -0,0 +1,33 @@
+{
+  "name": "kokoro-web",
+  "private": true,
+  "version": "0.0.0",
+  "type": "module",
+  "scripts": {
+    "dev": "vite",
+    "build": "vite build",
+    "lint": "eslint .",
+    "preview": "vite preview"
+  },
+  "dependencies": {
+    "kokoro-js": "file:..",
+    "motion": "^11.12.0",
+    "react": "^18.3.1",
+    "react-dom": "^18.3.1"
+  },
+  "devDependencies": {
+    "@eslint/js": "^9.15.0",
+    "@types/react": "^18.3.12",
+    "@types/react-dom": "^18.3.1",
+    "@vitejs/plugin-react": "^4.3.4",
+    "autoprefixer": "^10.4.20",
+    "eslint": "^9.15.0",
+    "eslint-plugin-react": "^7.37.2",
+    "eslint-plugin-react-hooks": "^5.0.0",
+    "eslint-plugin-react-refresh": "^0.4.14",
+    "globals": "^15.12.0",
+    "postcss": "^8.4.49",
+    "tailwindcss": "^3.4.15",
+    "vite": "^6.0.1"
+  }
+}
--- a/kokoro.js/demo/postcss.config.js
+++ b/kokoro.js/demo/postcss.config.js
@@ -0,0 +1,6 @@
+export default {
+  plugins: {
+    tailwindcss: {},
+    autoprefixer: {},
+  },
+};
--- a/kokoro.js/demo/public/hf-logo.svg
+++ b/kokoro.js/demo/public/hf-logo.svg
--- a/kokoro.js/demo/public/wave.svg
+++ b/kokoro.js/demo/public/wave.svg
@@ -0,0 +1,9 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="198">
+  <defs>
+    <linearGradient id="a" x1="50%" x2="50%" y1="-10.959%" y2="100%">
+      <stop stop-color="#57BBC1" stop-opacity=".25" offset="0%"/>
+      <stop stop-color="#015871" offset="100%"/>
+    </linearGradient>
+  </defs>
+  <path fill="url(#a)" fill-rule="evenodd" d="M.005 121C311 121 409.898-.25 811 0c400 0 500 121 789 121v77H0s.005-48 .005-77z" transform="matrix(-1 0 0 1 1600 0)"/>
+</svg>
--- a/kokoro.js/demo/src/App.jsx
+++ b/kokoro.js/demo/src/App.jsx
@@ -0,0 +1,144 @@
+import { useRef, useState, useEffect } from "react";
+import { motion } from "motion/react";
+
+export default function App() {
+  // Create a reference to the worker object.
+  const worker = useRef(null);
+
+  const [inputText, setInputText] = useState("Life is like a box of chocolates. You never know what you're gonna get.");
+  const [selectedSpeaker, setSelectedSpeaker] = useState("af");
+
+  const [status, setStatus] = useState(null);
+  const [error, setError] = useState(null);
+  const [loadingMessage, setLoadingMessage] = useState("Loading model (only downloaded once)...");
+
+  const [results, setResults] = useState([]);
+
+  // We use the `useEffect` hook to setup the worker as soon as the `App` component is mounted.
+  useEffect(() => {
+    // Create the worker if it does not yet exist.
+    worker.current ??= new Worker(new URL("./worker.js", import.meta.url), {
+      type: "module",
+    });
+
+    // Create a callback function for messages from the worker thread.
+    const onMessageReceived = (e) => {
+      switch (e.data.status) {
+        // TODO: WebGPU feature checking
+        // case "feature-success":
+        //   break;
+
+        // case "feature-error":
+        //   setError(e.data.data);
+        //   break;
+
+        case "ready":
+          setStatus("ready");
+          break;
+
+        case "complete": {
+          const { audio, text } = e.data;
+          setResults((prev) => [{ text, src: audio }, ...prev]);
+          setStatus("ready"); // Generation complete: re-enable the "Generate" button
+          break;
+        }
+      }
+    };
+
+    const onErrorReceived = (e) => {
+      console.error("Worker error:", e);
+    };
+
+    // Attach the callback function as an event listener.
+    worker.current.addEventListener("message", onMessageReceived);
+    worker.current.addEventListener("error", onErrorReceived);
+
+    // Define a cleanup function for when the component is unmounted.
+    return () => {
+      worker.current.removeEventListener("message", onMessageReceived);
+      worker.current.removeEventListener("error", onErrorReceived);
+    };
+  }, []);
+
+  const handleSubmit = (e) => {
+    e.preventDefault();
+    setStatus("running");
+
+    worker.current.postMessage({
+      type: "generate",
+      text: inputText.trim(),
+      voice: selectedSpeaker,
+    });
+  };
+
+  return (
+    <div className="relative w-full min-h-screen bg-gradient-to-br from-gray-900 to-gray-700 flex flex-col items-center justify-center p-4 overflow-hidden font-sans">
+      <motion.div initial={{ opacity: 1 }} animate={{ opacity: status === null ? 1 : 0 }} transition={{ duration: 0.5 }} className="absolute w-screen h-screen justify-center flex flex-col items-center z-10 bg-gray-800/95 backdrop-blur-md" style={{ pointerEvents: status === null ? "auto" : "none" }}>
+        <div className="w-[250px] h-[250px] border-4 border-white shadow-[0_0_0_5px_#4973ff] rounded-full overflow-hidden">
+          <div className="loading-wave"></div>
+        </div>
+        <p className={`text-3xl my-5 text-center ${error ? "text-red-500" : "text-white"}`}>{error ?? loadingMessage}</p>
+      </motion.div>
+
+      <div className="max-w-3xl w-full space-y-8 relative z-[2]">
+        <div className="text-center">
+          <h1 className="text-5xl font-extrabold text-gray-100 mb-2 drop-shadow-lg font-heading">Kokoro Text-to-Speech</h1>
+          <p className="text-2xl text-gray-300 font-semibold font-subheading">
+            Powered by&nbsp;
+            <a href="https://github.com/hexgrad/kokoro" target="_blank" rel="noreferrer" className="underline">
+              Kokoro
+            </a>
+            &nbsp;and&nbsp;
+            <a href="https://huggingface.co/docs/transformers.js" target="_blank" rel="noreferrer" className="underline">
+              <img width="40" src="hf-logo.svg" alt="" className="inline translate-y-[-2px] me-1" />Transformers.js
+            </a>
+          </p>
+        </div>
+        <div className="bg-gray-800/50 backdrop-blur-sm border border-gray-700 rounded-lg p-6">
+          <form onSubmit={handleSubmit} className="space-y-4">
+            <textarea placeholder="Enter text..." value={inputText} onChange={(e) => setInputText(e.target.value)} className="w-full min-h-[100px] max-h-[300px] bg-gray-700/50 backdrop-blur-sm border-2 border-gray-600 rounded-xl resize-y text-gray-100 placeholder-gray-400 px-3 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent" rows={Math.min(8, inputText.split("\n").length)} />
+            <div className="flex flex-col items-center space-y-4">
+              <select value={selectedSpeaker} onChange={(e) => setSelectedSpeaker(e.target.value)} className="w-full bg-gray-700/50 backdrop-blur-sm border-2 border-gray-600 rounded-xl text-gray-100 px-3 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent">
+                <option value="af">Default (American Female)</option>
+                <option value="af_bella">Bella (American Female)</option>
+                <option value="af_nicole">Nicole (American Female)</option>
+                <option value="af_sarah">Sarah (American Female)</option>
+                <option value="af_sky">Sky (American Female)</option>
+                <option value="am_adam">Adam (American Male)</option>
+                <option value="am_michael">Michael (American Male)</option>
+                <option value="bf_emma">Emma (British Female)</option>
+                <option value="bf_isabella">Isabella (British Female)</option>
+                <option value="bm_george">George (British Male)</option>
+                <option value="bm_lewis">Lewis (British Male)</option>
+              </select>
+              <button type="submit" className="inline-flex justify-center items-center px-6 py-2 text-lg font-semibold bg-gradient-to-t from-blue-600 to-purple-600 hover:from-blue-700 hover:to-purple-700 transition-colors duration-300 rounded-xl text-white disabled:opacity-50" disabled={status === "running" || inputText.trim() === ""}>
+                {status === "running" ? "Generating..." : "Generate"}
+              </button>
+            </div>
+          </form>
+        </div>
+
+        {results.length > 0 && (
+          <motion.div initial={{ y: 50, opacity: 0 }} animate={{ y: 0, opacity: 1 }} transition={{ duration: 0.5 }} className="max-h-[250px] overflow-y-auto px-2 mt-4 space-y-6 relative z-[2]">
+            {results.map((result, i) => (
+              <div key={i}>
+                <div className="text-white bg-gray-800/70 backdrop-blur-sm border border-gray-700 rounded-lg p-4 z-10">
+                  <span className="absolute right-5 font-bold">#{results.length - i}</span>
+                  <p className="mb-3 max-w-[95%]">{result.text}</p>
+                  <audio controls src={result.src} className="w-full">
+                    Your browser does not support the audio element.
+                  </audio>
+                </div>
+              </div>
+            ))}
+          </motion.div>
+        )}
+      </div>
+
+      <div className="bg-[#015871] pointer-events-none absolute left-0 w-full h-[5%] bottom-[-50px]">
+        <div className="wave"></div>
+        <div className="wave"></div>
+      </div>
+    </div>
+  );
+}
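
Reviewer note: the message handling in `App.jsx` can be read as a small state machine driven by the worker's `status` field. The sketch below restates it as a pure function so the transitions are easy to check in isolation. `applyWorkerMessage` is a hypothetical name, and the `"error"` branch is an assumed extension that the worker in this commit never sends.

```javascript
// Sketch of the worker -> UI message protocol as a pure state transition.
// "ready" and "complete" come from the diff; the "error" branch is a hypothetical extension.
function applyWorkerMessage(state, message) {
  switch (message.status) {
    case "ready":
      return { ...state, status: "ready" };
    case "complete": {
      // Braces give the `const` its own block scope inside the switch.
      const { audio, text } = message;
      // Newest result first, matching the order used by the demo.
      return { ...state, status: "ready", results: [{ text, src: audio }, ...state.results] };
    }
    case "error": // hypothetical: not posted by the worker in this commit
      return { ...state, status: "ready", error: String(message.error) };
    default:
      return state;
  }
}

const initial = { status: null, error: null, results: [] };
const afterReady = applyWorkerMessage(initial, { status: "ready" });
const afterDone = applyWorkerMessage(afterReady, { status: "complete", audio: "blob:demo", text: "Hi" });
console.log(afterDone.status, afterDone.results);
```

Keeping the transition pure would also make it straightforward to move into `useReducer`, should the demo grow more statuses.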
--- a/kokoro.js/demo/src/index.css
+++ b/kokoro.js/demo/src/index.css
@@ -0,0 +1,100 @@
+@tailwind base;
+@tailwind components;
+@tailwind utilities;
+
+/*
+ * Wave animations adapted from the following two demos:
+ * - https://codepen.io/upasanaasopa/pen/poObEWZ
+ * - https://codepen.io/breakstorm00/pen/qBJZQNB
+ */
+
+*,
+*:before,
+*:after {
+  margin: 0;
+  padding: 0;
+  box-sizing: border-box;
+}
+
+.loading-wave {
+  position: relative;
+  top: 0;
+  width: 100%;
+  height: 100%;
+  background: #2c74b3;
+  border-radius: 50%;
+  box-shadow: inset 0 0 50px 0 rgba(0, 0, 0, 0.5);
+}
+
+.loading-wave:before,
+.loading-wave:after {
+  content: "";
+  position: absolute;
+  top: 0;
+  left: 50%;
+  width: 200%;
+  height: 200%;
+  background: black;
+  transform: translate(-50%, -75%);
+}
+
+.loading-wave:before {
+  border-radius: 45%;
+  background: rgba(255, 255, 255, 1);
+  animation: animate 5s linear infinite;
+}
+
+.loading-wave:after {
+  border-radius: 40%;
+  background: rgba(255, 255, 255, 0.5);
+  animation: animate 10s linear infinite;
+}
+
+.wave {
+  background: url(/wave.svg) repeat-x;
+  position: absolute;
+  top: -198px;
+  width: 6400px;
+  height: 198px;
+  animation: wave 7s cubic-bezier(0.36, 0.45, 0.63, 0.53) infinite;
+  transform: translate3d(0, 0, 0);
+}
+
+.wave:nth-of-type(2) {
+  top: -175px;
+  animation:
+    wave 7s cubic-bezier(0.36, 0.45, 0.63, 0.53) -0.125s infinite,
+    swell 7s ease -1.25s infinite;
+  opacity: 1;
+}
+
+@keyframes wave {
+  0% {
+    margin-left: 0;
+  }
+
+  100% {
+    margin-left: -1600px;
+  }
+}
+
+@keyframes swell {
+  0%,
+  100% {
+    transform: translate3d(0, -25px, 0);
+  }
+
+  50% {
+    transform: translate3d(0, 5px, 0);
+  }
+}
+
+@keyframes animate {
+  0% {
+    transform: translate(-50%, -75%) rotate(0deg);
+  }
+
+  100% {
+    transform: translate(-50%, -75%) rotate(360deg);
+  }
+}
--- a/kokoro.js/demo/src/main.jsx
+++ b/kokoro.js/demo/src/main.jsx
@@ -0,0 +1,10 @@
+import { StrictMode } from "react";
+import { createRoot } from "react-dom/client";
+import "./index.css";
+import App from "./App.jsx";
+
+createRoot(document.getElementById("root")).render(
+  <StrictMode>
+    <App />
+  </StrictMode>,
+);
--- a/kokoro.js/demo/src/worker.js
+++ b/kokoro.js/demo/src/worker.js
@@ -0,0 +1,20 @@
+import { KokoroTTS } from "kokoro-js";
+
+const model_id = "onnx-community/Kokoro-82M-ONNX";
+const tts = await KokoroTTS.from_pretrained(model_id, {
+  dtype: "q8", // Options: "fp32", "fp16", "q8", "q4", "q4f16"
+});
+
+self.postMessage({ status: "ready" });
+
+// Listen for messages from the main thread
+self.addEventListener("message", async (e) => {
+  const { text, voice } = e.data;
+
+  // Generate speech
+  const audio = await tts.generate(text, { voice });
+
+  // Send the audio file back to the main thread
+  const blob = audio.toBlob();
+  self.postMessage({ status: "complete", audio: URL.createObjectURL(blob), text });
+});
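
Reviewer note: if `tts.generate` rejects (for example, on an unknown voice), the worker above posts nothing back, so the UI would stay in the `"running"` state. One possible shape for a handler that always replies is sketched below. `handleGenerate`, `fakeTts`, and the `"error"` status are assumptions for illustration. The real worker also converts the audio to a Blob URL before posting, which is omitted here.

```javascript
// Hypothetical handler: turns a generate request into a reply message and reports
// failures instead of leaving the promise rejection unhandled.
// `tts` is any object with an async `generate(text, { voice })` method.
async function handleGenerate(tts, { text, voice }) {
  try {
    const audio = await tts.generate(text, { voice });
    return { status: "complete", audio, text };
  } catch (err) {
    // The "error" status is an assumption; the diff only defines "ready" and "complete".
    return { status: "error", error: err instanceof Error ? err.message : String(err), text };
  }
}

// Stub standing in for KokoroTTS so the sketch runs without downloading a model.
const fakeTts = {
  async generate(text, { voice }) {
    if (voice !== "af") throw new Error(`Voice "${voice}" not found.`);
    return `audio for: ${text}`;
  },
};

handleGenerate(fakeTts, { text: "Hello", voice: "nope" }).then((reply) => console.log(reply.status));
```

In the worker itself, the returned object would be passed to `self.postMessage`, and the main thread would need a matching `case` for the new status.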
--- a/kokoro.js/demo/tailwind.config.js
+++ b/kokoro.js/demo/tailwind.config.js
@@ -0,0 +1,8 @@
+/** @type {import('tailwindcss').Config} */
+export default {
+  content: ["./index.html", "./src/**/*.{js,ts,jsx,tsx}"],
+  theme: {
+    extend: {},
+  },
+  plugins: [],
+};
--- a/kokoro.js/demo/vite.config.js
+++ b/kokoro.js/demo/vite.config.js
@@ -0,0 +1,12 @@
+import { defineConfig } from "vite";
+import react from "@vitejs/plugin-react";
+
+// https://vite.dev/config/
+export default defineConfig({
+  plugins: [react()],
+  worker: { format: "es" },
+  build: {
+    target: "esnext",
+  },
+  logLevel: process.env.NODE_ENV === "development" ? "error" : "info",
+});
--- a/kokoro.js/package-lock.json
+++ b/kokoro.js/package-lock.json
--- a/kokoro.js/package.json
+++ b/kokoro.js/package.json
@@ -0,0 +1,65 @@
+{
+  "name": "kokoro-js",
+  "version": "1.0.1",
+  "type": "module",
+  "exports": {
+    "types": "./types/kokoro.d.ts",
+    "node": {
+      "import": "./dist/kokoro.js",
+      "require": "./dist/kokoro.cjs"
+    },
+    "default": "./dist/kokoro.web.js"
+  },
+  "scripts": {
+    "build": "rm -rf dist types && rollup -c && tsc && cp ../LICENSE LICENSE",
+    "format": "prettier --write . --print-width 1000",
+    "test": "vitest"
+  },
+  "keywords": [
+    "kokoro",
+    "tts",
+    "text-to-speech"
+  ],
+  "author": {
+    "name": "hexgrad",
+    "email": "hello@hexgrad.com"
+  },
+  "browser": {
+    "path": false,
+    "fs/promises": false
+  },
+  "contributors": [
+    "Xenova"
+  ],
+  "license": "Apache-2.0",
+  "description": "High-quality text-to-speech for the web",
+  "dependencies": {
+    "@huggingface/transformers": "^3.3.1",
+    "phonemizer": "^1.2.1"
+  },
+  "devDependencies": {
+    "@rollup/plugin-node-resolve": "^16.0.0",
+    "@rollup/plugin-terser": "^0.4.4",
+    "prettier": "3.4.2",
+    "rollup": "^4.30.1",
+    "typescript": "^5.7.3",
+    "vitest": "^2.1.8"
+  },
+  "files": [
+    "types",
+    "dist",
+    "voices",
+    "README.md",
+    "LICENSE"
+  ],
+  "homepage": "https://github.com/hexgrad/kokoro",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/hexgrad/kokoro.git"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "jsdelivr": "./dist/kokoro.web.js",
+  "unpkg": "./dist/kokoro.web.js"
+}
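
Reviewer note: the `exports` map above sends Node consumers to `dist/kokoro.js` or `dist/kokoro.cjs` and everything else to the web bundle. The sketch below is a simplified model of how conditional exports are matched (first applicable key wins, in declaration order). It is not Node's full resolution algorithm, and the condition lists passed in are assumptions about what a given runtime or bundler would supply.

```javascript
// Simplified model of conditional-exports resolution: walk the keys in order and take the
// first one that is an active condition (or "default"). Not Node's complete algorithm.
function resolveExport(target, conditions) {
  if (typeof target === "string") return target;
  for (const [key, value] of Object.entries(target)) {
    if (key === "default" || conditions.includes(key)) {
      const resolved = resolveExport(value, conditions);
      if (resolved !== undefined) return resolved;
    }
  }
  return undefined;
}

// The "exports" field from kokoro.js/package.json in this commit.
const exportsField = {
  types: "./types/kokoro.d.ts",
  node: { import: "./dist/kokoro.js", require: "./dist/kokoro.cjs" },
  default: "./dist/kokoro.web.js",
};

console.log(resolveExport(exportsField, ["node", "import"]));
console.log(resolveExport(exportsField, ["node", "require"]));
console.log(resolveExport(exportsField, ["browser", "import"]));
```

Because `default` is listed last, it only applies when neither `types` nor `node` is an active condition, which is why key order in the map matters.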
--- a/kokoro.js/rollup.config.js
+++ b/kokoro.js/rollup.config.js
@@ -0,0 +1,42 @@
+import terser from "@rollup/plugin-terser";
+import { nodeResolve } from "@rollup/plugin-node-resolve";
+
+const plugins = (browser) => [nodeResolve({ browser }), terser({ format: { comments: false } })];
+
+const OUTPUT_CONFIGS = [
+  // Node versions
+  {
+    file: "./dist/kokoro.cjs",
+    format: "cjs",
+  },
+  {
+    file: "./dist/kokoro.js",
+    format: "esm",
+  },
+
+  // Web version
+  {
+    file: "./dist/kokoro.web.js",
+    format: "esm",
+  },
+];
+
+const WEB_SPECIFIC_CONFIG = {
+  onwarn: (warning, warn) => {
+    if (!warning.message.includes("@huggingface/transformers")) warn(warning);
+  },
+};
+
+const NODE_SPECIFIC_CONFIG = {
+  external: ["@huggingface/transformers", "phonemizer"],
+};
+
+export default OUTPUT_CONFIGS.map((output) => {
+  const web = output.file.endsWith(".web.js");
+  return {
+    input: "./src/kokoro.js",
+    output,
+    plugins: plugins(web),
+    ...(web ? WEB_SPECIFIC_CONFIG : NODE_SPECIFIC_CONFIG),
+  };
+});
--- a/kokoro.js/src/kokoro.js
+++ b/kokoro.js/src/kokoro.js
@@ -0,0 +1,90 @@
+import { StyleTextToSpeech2Model, AutoTokenizer, Tensor, RawAudio } from "@huggingface/transformers";
+import { phonemize } from "./phonemize.js";
+import { getVoiceData, VOICES } from "./voices.js";
+
+const STYLE_DIM = 256;
+const SAMPLE_RATE = 24000;
+
+export class KokoroTTS {
+  /**
+   * Create a new KokoroTTS instance.
+   * @param {import('@huggingface/transformers').StyleTextToSpeech2Model} model The model
+   * @param {import('@huggingface/transformers').PreTrainedTokenizer} tokenizer The tokenizer
+   */
+  constructor(model, tokenizer) {
+    this.model = model;
+    this.tokenizer = tokenizer;
+  }
+
+  /**
+   * Load a KokoroTTS model from the Hugging Face Hub.
+   * @param {string} model_id The model id
+   * @param {Object} options Additional options
+   * @param {"fp32"|"fp16"|"q8"|"q4"|"q4f16"} [options.dtype="fp32"] The data type to use.
+   * @param {"wasm"|"webgpu"|"cpu"|null} [options.device=null] The device to run the model on.
+   * @param {import("@huggingface/transformers").ProgressCallback} [options.progress_callback=null] A callback function that is called with progress information.
+   * @returns {Promise<KokoroTTS>} The loaded model
+   */
+  static async from_pretrained(model_id, { dtype = "fp32", device = null, progress_callback = null } = {}) {
+    const model = StyleTextToSpeech2Model.from_pretrained(model_id, { progress_callback, dtype, device });
+    const tokenizer = AutoTokenizer.from_pretrained(model_id, { progress_callback });
+
+    const info = await Promise.all([model, tokenizer]);
+    return new KokoroTTS(...info);
+  }
+
+  get voices() {
+    return VOICES;
+  }
+
+  list_voices() {
+    console.table(VOICES);
+  }
+
+  /**
+   * Generate audio from text.
+   *
+   * Note: The model is loaded once by `from_pretrained` and reused for every call.
+   * @param {string} text The input text
+   * @param {Object} options Additional options
+   * @param {keyof typeof VOICES} [options.voice="af"] The voice style to use
+   * @param {number} [options.speed=1] The speaking speed
+   * @returns {Promise<RawAudio>} The generated audio
+   */
+  async generate(text, { voice = "af", speed = 1 } = {}) {
+    if (!Object.hasOwn(VOICES, voice)) {
+      console.error(`Voice "${voice}" not found. Available voices:`);
+      console.table(VOICES);
+      throw new Error(`Voice "${voice}" not found. Should be one of: ${Object.keys(VOICES).join(", ")}.`);
+    }
+
+    const language = voice.at(0); // "a" or "b"
+    const phonemes = await phonemize(text, language);
+    const { input_ids } = this.tokenizer(phonemes, {
+      truncation: true,
+    });
+
+    // Select voice style based on number of input tokens
+    const num_tokens = Math.max(
+      input_ids.dims.at(-1) - 2, // Exclude the 2 padding tokens
+      0,
+    );
+
+    // Load voice style
+    const data = await getVoiceData(voice);
+    const offset = num_tokens * STYLE_DIM;
+    const voiceData = data.slice(offset, offset + STYLE_DIM);
+
+    // Prepare model inputs
+    const inputs = {
+      input_ids,
+      style: new Tensor("float32", voiceData, [1, STYLE_DIM]),
+      speed: new Tensor("float32", [speed], [1]),
+    };
+
+    // Generate audio
+    const { waveform } = await this.model(inputs);
+
+    return new RawAudio(waveform.data, SAMPLE_RATE);
+  }
+}
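
Reviewer note: `generate` picks a style vector by token count, treating the voice data as a flat array of rows with `STYLE_DIM` floats each. The toy version below uses a 4-wide row so the indexing is visible. `selectStyle` and the bounds check are hypothetical additions, and the sample data is made up purely for illustration.

```javascript
// Toy version of the style lookup in `generate`: the voice data is a flat Float32Array of
// rows, one row of STYLE_DIM floats per token count.
// STYLE_DIM is 4 here for readability; the library uses 256.
const STYLE_DIM = 4;

function selectStyle(voiceData, numInputIds) {
  // The source subtracts 2 from the sequence length to exclude padding tokens.
  const numTokens = Math.max(numInputIds - 2, 0);
  const offset = numTokens * STYLE_DIM;
  if (offset + STYLE_DIM > voiceData.length) {
    // Hypothetical guard, not in the diff: slice() would otherwise return a short row.
    throw new RangeError(`No style row for ${numTokens} tokens`);
  }
  return voiceData.slice(offset, offset + STYLE_DIM);
}

// Three made-up rows: row r holds the values r*10, r*10+1, r*10+2, r*10+3.
const voiceData = Float32Array.from(
  { length: 3 * STYLE_DIM },
  (_, i) => Math.floor(i / STYLE_DIM) * 10 + (i % STYLE_DIM),
);

console.log(Array.from(selectStyle(voiceData, 3))); // row 1 -> [10, 11, 12, 13]
```

Without a guard like the one above, an input longer than the stored table would yield a slice shorter than `STYLE_DIM`, which would then fail when the `[1, STYLE_DIM]` tensor is constructed.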
--- a/kokoro.js/src/phonemize.js
+++ b/kokoro.js/src/phonemize.js
@@ -0,0 +1,197 @@
+import { phonemize as espeakng } from "phonemizer";
+
+/**
+ * Helper function to split a string on a regex, but keep the delimiters.
+ * This is required, because the JavaScript `.split()` method does not keep the delimiters,
+ * and wrapping in a capturing group causes issues with existing capturing groups (due to nesting).
+ * @param {string} text The text to split.
+ * @param {RegExp} regex The regex to split on.
+ * @returns {{match: boolean; text: string}[]} The split string.
+ */
+function split(text, regex) {
+  const result = [];
+  let prev = 0;
+  for (const match of text.matchAll(regex)) {
+    const fullMatch = match[0];
+    if (prev < match.index) {
+      result.push({ match: false, text: text.slice(prev, match.index) });
+    }
+    if (fullMatch.length > 0) {
+      result.push({ match: true, text: fullMatch });
+    }
+    prev = match.index + fullMatch.length;
+  }
+  if (prev < text.length) {
+    result.push({ match: false, text: text.slice(prev) });
+  }
+  return result;
+}
+
+/**
+ * Helper function to split numbers into phonetic equivalents
+ * @param {string} match The matched number
+ * @returns {string} The phonetic equivalent
+ */
+function split_num(match) {
+  if (match.includes(".")) {
+    return match;
+  } else if (match.includes(":")) {
+    let [h, m] = match.split(":").map(Number);
+    if (m === 0) {
+      return `${h} o'clock`;
+    } else if (m < 10) {
+      return `${h} oh ${m}`;
+    }
+    return `${h} ${m}`;
+  }
+  let year = parseInt(match.slice(0, 4), 10);
+  if (year < 1100 || year % 1000 < 10) {
+    return match;
+  }
+  let left = match.slice(0, 2);
+  let right = parseInt(match.slice(2, 4), 10);
+  let suffix = match.endsWith("s") ? "s" : "";
+  if (year % 1000 >= 100 && year % 1000 <= 999) {
+    if (right === 0) {
+      return `${left} hundred${suffix}`;
+    } else if (right < 10) {
+      return `${left} oh ${right}${suffix}`;
+    }
+  }
+  return `${left} ${right}${suffix}`;
+}
+
+/**
+ * Helper function to format monetary values
+ * @param {string} match The matched currency
+ * @returns {string} The formatted currency
+ */
+function flip_money(match) {
+  const bill = match[0] === "$" ? "dollar" : "pound";
+  if (isNaN(Number(match.slice(1)))) {
+    return `${match.slice(1)} ${bill}s`;
+  } else if (!match.includes(".")) {
+    let suffix = match.slice(1) === "1" ? "" : "s";
+    return `${match.slice(1)} ${bill}${suffix}`;
+  }
+  const [b, c] = match.slice(1).split(".");
+  const d = parseInt(c.padEnd(2, "0"), 10);
+  let coins = match[0] === "$" ? (d === 1 ? "cent" : "cents") : d === 1 ? "penny" : "pence";
+  return `${b} ${bill}${b === "1" ? "" : "s"} and ${d} ${coins}`;
+}
+
+/**
+ * Helper function to process decimal numbers
+ * @param {string} match The matched number
+ * @returns {string} The formatted number
+ */
+function point_num(match) {
+  let [a, b] = match.split(".");
+  return `${a} point ${b.split("").join(" ")}`;
+}
+
+/**
+ * Normalize text for phonemization
+ * @param {string} text The text to normalize
+ * @returns {string} The normalized text
+ */
+function normalize_text(text) {
+  return (
+    text
+      // 1. Handle quotes and brackets
+      .replace(/[‘’]/g, "'")
+      .replace(/«/g, "“")
+      .replace(/»/g, "”")
+      .replace(/[“”]/g, '"')
+      .replace(/\(/g, "«")
+      .replace(/\)/g, "»")
+
+      // 2. Replace uncommon punctuation marks
+      .replace(/、/g, ", ")
+      .replace(/。/g, ". ")
+      .replace(/！/g, "! ")
+      .replace(/，/g, ", ")
+      .replace(/：/g, ": ")
+      .replace(/；/g, "; ")
+      .replace(/？/g, "? ")
+
+      // 3. Whitespace normalization
+      .replace(/[^\S \n]/g, " ")
+      .replace(/  +/g, " ")
+      .replace(/(?<=\n) +(?=\n)/g, "")
+
+      // 4. Abbreviations
+      .replace(/\bD[Rr]\.(?= [A-Z])/g, "Doctor")
+      .replace(/\b(?:Mr\.|MR\.(?= [A-Z]))/g, "Mister")
+      .replace(/\b(?:Ms\.|MS\.(?= [A-Z]))/g, "Miss")
+      .replace(/\b(?:Mrs\.|MRS\.(?= [A-Z]))/g, "Mrs")
+      .replace(/\betc\.(?! [A-Z])/gi, "etc")
+
+      // 5. Normalize casual words
+      .replace(/\b(y)eah?\b/gi, "$1e'a")
+
+      // 6. Handle numbers and currencies
+      .replace(/\d*\.\d+|\b\d{4}s?\b|(?<!:)\b(?:[1-9]|1[0-2]):[0-5]\d\b(?!:)/g, split_num)
+      .replace(/(?<=\d),(?=\d)/g, "")
+      .replace(/[$£]\d+(?:\.\d+)?(?: hundred| thousand| (?:[bm]|tr)illion)*\b|[$£]\d+\.\d\d?\b/gi, flip_money)
+      .replace(/\d*\.\d+/g, point_num)
+      .replace(/(?<=\d)-(?=\d)/g, " to ")
+      .replace(/(?<=\d)S/g, " S")
+
+      // 7. Handle possessives
+      .replace(/(?<=[BCDFGHJ-NP-TV-Z])'?s\b/g, "'S")
+      .replace(/(?<=X')S\b/g, "s")
+
+      // 8. Handle hyphenated words/letters
+      .replace(/(?:[A-Za-z]\.){2,} [a-z]/g, (m) => m.replace(/\./g, "-"))
+      .replace(/(?<=[A-Z])\.(?=[A-Z])/gi, "-")
+
+      // 9. Strip leading and trailing whitespace
+      .trim()
+  );
+}
+
+/**
+ * Escapes regular expression special characters from a string by replacing them with their escaped counterparts.
+ *
+ * @param {string} string The string to escape.
+ * @returns {string} The escaped string.
+ */
+function escapeRegExp(string) {
+  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
+}
+
+const PUNCTUATION = ';:,.!?¡¿—…"«»“”(){}[]';
+const PUNCTUATION_PATTERN = new RegExp(`(\\s*[${escapeRegExp(PUNCTUATION)}]+\\s*)+`, "g");
+
+export async function phonemize(text, language = "a", norm = true) {
+  // 1. Normalize text
+  if (norm) {
+    text = normalize_text(text);
+  }
+
+  // 2. Split into chunks, to ensure we preserve punctuation
+  const sections = split(text, PUNCTUATION_PATTERN);
+
+  // 3. Convert each section to phonemes
+  const lang = language === "a" ? "en-us" : "en";
+  const ps = (await Promise.all(sections.map(async ({ match, text }) => (match ? text : (await espeakng(text, lang)).join(" "))))).join("");
+
+  // 4. Post-process phonemes
+  let processed = ps
+    // https://en.wiktionary.org/wiki/kokoro#English
+    .replace(/kəkˈoːɹoʊ/g, "kˈoʊkəɹoʊ")
+    .replace(/kəkˈɔːɹəʊ/g, "kˈəʊkəɹəʊ")
+    .replace(/ʲ/g, "j")
+    .replace(/r/g, "ɹ")
+    .replace(/x/g, "k")
+    .replace(/ɬ/g, "l")
+    .replace(/(?<=[a-zɹː])(?=hˈʌndɹɪd)/g, " ")
+    .replace(/ z(?=[;:,.!?¡¿—…"«»“” ]|$)/g, "z");
+
+  // 5. Additional post-processing for American English
+  if (language === "a") {
+    processed = processed.replace(/(?<=nˈaɪn)ti(?!ː)/g, "di");
+  }
+  return processed.trim();
+}
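Step 2 of `phonemize` splits the text on `PUNCTUATION_PATTERN` so that punctuation runs pass through untouched while only the text in between is sent to the phonemizer. The `split` helper itself is defined earlier in the file and is not shown in this part of the diff, so the `splitSections` function below is a hypothetical stand-in that only illustrates the `{ match, text }` shape the caller relies on; the real helper may differ.

```javascript
// Hypothetical stand-in for the `split` helper used by phonemize().
// It returns sections in order, flagging the ones that matched the regex.
function splitSections(text, regex) {
  const sections = [];
  let prev = 0;
  for (const m of text.matchAll(regex)) {
    if (m.index > prev) {
      // Plain text between two punctuation runs
      sections.push({ match: false, text: text.slice(prev, m.index) });
    }
    if (m[0].length > 0) {
      // The punctuation run itself (including surrounding whitespace)
      sections.push({ match: true, text: m[0] });
    }
    prev = m.index + m[0].length;
  }
  if (prev < text.length) {
    sections.push({ match: false, text: text.slice(prev) });
  }
  return sections;
}

// Same definitions as in the diff above
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const PUNCTUATION = ';:,.!?¡¿—…"«»“”(){}[]';
const PUNCTUATION_PATTERN = new RegExp(`(\\s*[${escapeRegExp(PUNCTUATION)}]+\\s*)+`, "g");

const sections = splitSections("Hello, world!", PUNCTUATION_PATTERN);
console.log(sections);
```

Joining the `text` fields back together reproduces the input exactly, which is why step 3 can replace only the non-matching sections and concatenate the result with `join("")`.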
--- a/kokoro.js/src/voices.js
+++ b/kokoro.js/src/voices.js
@@ -0,0 +1,121 @@
+import path from "path";
+import fs from "fs/promises";
+
+export const VOICES = Object.freeze({
+  af: {
+    // Default voice is a 50-50 mix of Bella & Sarah
+    name: "Default",
+    language: "en-us",
+    gender: "Female",
+  },
+  af_bella: {
+    name: "Bella",
+    language: "en-us",
+    gender: "Female",
+  },
+  af_nicole: {
+    name: "Nicole",
+    language: "en-us",
+    gender: "Female",
+  },
+  af_sarah: {
+    name: "Sarah",
+    language: "en-us",
+    gender: "Female",
+  },
+  af_sky: {
+    name: "Sky",
+    language: "en-us",
+    gender: "Female",
+  },
+  am_adam: {
+    name: "Adam",
+    language: "en-us",
+    gender: "Male",
+  },
+  am_michael: {
+    name: "Michael",
+    language: "en-us",
+    gender: "Male",
+  },
+
+  bf_emma: {
+    name: "Emma",
+    language: "en-gb",
+    gender: "Female",
+  },
+  bf_isabella: {
+    name: "Isabella",
+    language: "en-gb",
+    gender: "Female",
+  },
+  bm_george: {
+    name: "George",
+    language: "en-gb",
+    gender: "Male",
+  },
+  bm_lewis: {
+    name: "Lewis",
+    language: "en-gb",
+    gender: "Male",
+  },
+});
+
+const VOICE_DATA_URL = "https://huggingface.co/onnx-community/Kokoro-82M-ONNX/resolve/main/voices";
+
+/**
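+ * Loads the raw voice data (`.bin`) for a voice: from the local `voices/` directory when `fs` is available, otherwise over HTTP (cached via the Cache API).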
+ * @param {keyof typeof VOICES} id
+ * @returns {Promise<ArrayBufferLike>}
+ */
+async function getVoiceFile(id) {
+  if (fs?.readFile) {
+    const file = path.resolve(import.meta.dirname ?? __dirname, `../voices/${id}.bin`);
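+    const { buffer, byteOffset, byteLength } = await fs.readFile(file);
+    return buffer.slice(byteOffset, byteOffset + byteLength); // a Buffer may be a view into a larger ArrayBuffer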
+  }
+
+  const url = `${VOICE_DATA_URL}/${id}.bin`;
+
+  let cache;
+  try {
+    cache = await caches.open("kokoro-voices");
+    const cachedResponse = await cache.match(url);
+    if (cachedResponse) {
+      return await cachedResponse.arrayBuffer();
+    }
+  } catch (e) {
+    console.warn("Unable to open cache", e);
+  }
+
+  // No cache, or cache failed to open. Fetch the file.
+  const response = await fetch(url);
+  const buffer = await response.arrayBuffer();
+
+  if (cache) {
+    try {
+      // NOTE: We use `new Response(buffer, ...)` instead of `response.clone()` to handle LFS files
+      await cache.put(
+        url,
+        new Response(buffer, {
+          headers: response.headers,
+        }),
+      );
+    } catch (e) {
+      console.warn("Unable to cache file", e);
+    }
+  }
+
+  return buffer;
+}
+
+const VOICE_CACHE = new Map();
+export async function getVoiceData(voice) {
+  if (VOICE_CACHE.has(voice)) {
+    return VOICE_CACHE.get(voice);
+  }
+
+  const buffer = new Float32Array(await getVoiceFile(voice));
+  VOICE_CACHE.set(voice, buffer);
+  return buffer;
+}
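`getVoiceData` stores the decoded `Float32Array` only after the file has finished loading, so two calls for the same voice that overlap in time will each trigger a load. One possible variant, sketched below, memoizes the promise instead of the result. The names `fakeGetVoiceFile`, `VOICE_PROMISES`, and `getVoiceDataOnce` are hypothetical and exist only for this illustration; the loader is a stub that counts calls rather than reading a real voice file.

```javascript
// Hypothetical loader standing in for getVoiceFile(); it only counts calls.
let loads = 0;
async function fakeGetVoiceFile(id) {
  loads += 1;
  return new Float32Array([1, 2, 3, 4]).buffer;
}

// Cache the in-flight promise so overlapping calls share a single load.
const VOICE_PROMISES = new Map();
function getVoiceDataOnce(voice) {
  let promise = VOICE_PROMISES.get(voice);
  if (!promise) {
    promise = fakeGetVoiceFile(voice).then((buffer) => new Float32Array(buffer));
    // Forget failed loads so that a later call can retry
    promise.catch(() => VOICE_PROMISES.delete(voice));
    VOICE_PROMISES.set(voice, promise);
  }
  return promise;
}

const first = getVoiceDataOnce("af");
const second = getVoiceDataOnce("af");
console.log(first === second, loads);
```

Whether this matters in practice depends on how callers use the library; if voices are always requested one at a time, the simpler result cache in the diff behaves the same.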
--- a/kokoro.js/tests/phonemize.test.js
+++ b/kokoro.js/tests/phonemize.test.js
@@ -0,0 +1,95 @@
+import { describe, test, expect } from "vitest";
+import { phonemize } from "../src/phonemize.js";
+
+const A_TEST_CASES = new Map([
+  ["‘Hello’", "həlˈoʊ"],
+  ["‘Test’ and ‘Example’", "tˈɛst ænd ɛɡzˈæmpəl"],
+  ["«Bonjour»", '"bɔːnʒˈʊɹ"'],
+  ["«Test «nested» quotes»", '"tˈɛst "nˈɛstᵻd" kwˈoʊts"'],
+  ["(Hello)", "«həlˈoʊ»"],
+  ["(Nested (Parentheses))", "«nˈɛstᵻd «pɚɹˈɛnθəsˌiːz»»"],
+  ["こんにちは、世界！", "dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ, tʃˈaɪniːzlˌɛɾɚ tʃˈaɪniːzlˌɛɾɚ!"],
+  ["これはテストです：はい？", "dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ: dʒˈæpəniːzlˌɛɾɚ dʒˈæpəniːzlˌɛɾɚ?"],
+  ["Hello World", "həlˈoʊ wˈɜːld"],
+  ["Hello   World", "həlˈoʊ wˈɜːld"],
+  ["Hello\n   \nWorld", "həlˈoʊ wˈɜːld"],
+  ["Dr. Smith", "dˈɑːktɚ smˈɪθ"],
+  ["DR. Brown", "dˈɑːktɚ bɹˈaʊn"],
+  ["Mr. Smith", "mˈɪstɚ smˈɪθ"],
+  ["MR. Anderson", "mˈɪstɚɹ ˈændɚsən"],
+  ["Ms. Taylor", "mˈɪs tˈeɪlɚ"],
+  ["MS. Carter", "mˈɪs kˈɑːɹɾɚ"],
+  ["Mrs. Johnson", "mˈɪsɪz dʒˈɑːnsən"],
+  ["MRS. Wilson", "mˈɪsɪz wˈɪlsən"],
+  ["Apples, oranges, etc.", "ˈæpəlz, ˈɔɹɪndʒᵻz, ɛtsˈɛtɹə"],
+  ["Apples, etc. Pears.", "ˈæpəlz, ɛtsˈɛtɹə. pˈɛɹz."],
+  ["Yeah", "jˈɛə"],
+  ["yeah", "jˈɛə"],
+  ["1990", "nˈaɪntiːn nˈaɪndi"],
+  ["12:34", "twˈɛlv θˈɜːɾi fˈoːɹ"],
+  ["2022s", "twˈɛnti twˈɛnti tˈuːz"],
+  ["1,000", "wˈʌn θˈaʊzənd"],
+  ["12,345,678", "twˈɛlv mˈɪliən θɹˈiː hˈʌndɹɪd fˈoːɹɾi fˈaɪv θˈaʊzənd sˈɪks hˈʌndɹɪd sˈɛvənti ˈeɪt"],
+  ["$100", "wˈʌn hˈʌndɹɪd dˈɑːlɚz"],
+  ["£1.50", "wˈʌn pˈaʊnd ænd fˈɪfti pˈɛns"],
+  ["12.34", "twˈɛlv pˈɔɪnt θɹˈiː fˈoːɹ"],
+  ["0.01", "zˈiəɹoʊ pˈɔɪnt zˈiəɹoʊ wˈʌn"],
+  ["10-20", "tˈɛn tə twˈɛnti"],
+  ["5-10", "fˈaɪv tə tˈɛn"],
+  ["10S", "tˈɛn ˈɛs"],
+  ["5S", "fˈaɪv ˈɛs"],
+  ["Cat's tail", "kˈæts tˈeɪl"],
+  ["X's mark", "ˈɛksᵻz mˈɑːɹk"],
+  ["U.S.A.", "jˈuːˈɛsˈeɪ."],
+  ["A.B.C", "ˈeɪbˈiːsˈiː"],
+]);
+
+const B_TEST_CASES = new Map([
+  ["‘Hello’", "həlˈəʊ"],
+  ["‘Test’ and ‘Example’", "tˈɛst and ɛɡzˈampəl"],
+  ["«Bonjour»", '"bɔːnʒˈʊə"'],
+  ["«Test «nested» quotes»", '"tˈɛst "nˈɛstɪd" kwˈəʊts"'],
+  ["(Hello)", "«həlˈəʊ»"],
+  ["(Nested (Parentheses))", "«nˈɛstɪd «pəɹˈɛnθəsˌiːz»»"],
+  ["こんにちは、世界！", "dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə, tʃˈaɪniːzlˌɛtə tʃˈaɪniːzlˌɛtə!"],
+  ["これはテストです：はい？", "dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə: dʒˈapəniːzlˌɛtə dʒˈapəniːzlˌɛtə?"],
+  ["Hello World", "həlˈəʊ wˈɜːld"],
+  ["Hello   World", "həlˈəʊ wˈɜːld"],
+  ["Hello\n   \nWorld", "həlˈəʊ wˈɜːld"],
+  ["Dr. Smith", "dˈɒktə smˈɪθ"],
+  ["DR. Brown", "dˈɒktə bɹˈaʊn"],
+  ["Mr. Smith", "mˈɪstə smˈɪθ"],
+  ["MR. Anderson", "mˈɪstəɹ ˈandəsən"],
+  ["Ms. Taylor", "mˈɪs tˈeɪlə"],
+  ["MS. Carter", "mˈɪs kˈɑːtə"],
+  ["Mrs. Johnson", "mˈɪsɪz dʒˈɒnsən"],
+  ["Apples, oranges, etc.", "ˈapəlz, ˈɒɹɪndʒɪz, ɛtsˈɛtɹə"],
+  ["Apples, etc. Pears.", "ˈapəlz, ɛtsˈɛtɹə. pˈeəz."],
+  ["1990", "nˈaɪntiːn nˈaɪnti"],
+  ["12:34", "twˈɛlv θˈɜːti fˈɔː"],
+  ["1,000", "wˈɒn θˈaʊzənd"],
+  ["12,345,678", "twˈɛlv mˈɪliən θɹˈiː hˈʌndɹɪdən fˈɔːti fˈaɪv θˈaʊzənd sˈɪks hˈʌndɹɪdən sˈɛvənti ˈeɪt"],
+  ["$100", "wˈɒn hˈʌndɹɪd dˈɒləz"],
+  ["£1.50", "wˈɒn pˈaʊnd and fˈɪfti pˈɛns"],
+  ["12.34", "twˈɛlv pˈɔɪnt θɹˈiː fˈɔː"],
+  ["0.01", "zˈiəɹəʊ pˈɔɪnt zˈiəɹəʊ wˈɒn"],
+  ["Cat's tail", "kˈats tˈeɪl"],
+  ["X's mark", "ˈɛksɪz mˈɑːk"],
+]);
+
+describe("phonemize", () => {
+  describe("en-us", () => {
+    for (const [input, expected] of A_TEST_CASES) {
+      test(`phonemize("${input}")`, async () => {
+        expect(await phonemize(input)).toEqual(expected);
+      });
+    }
+  });
+  describe("en-gb", () => {
+    for (const [input, expected] of B_TEST_CASES) {
+      test(`phonemize("${input}")`, async () => {
+        expect(await phonemize(input, "b")).toEqual(expected);
+      });
+    }
+  });
+});
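The `12.34`, `0.01`, `10-20`, and `5-10` cases above depend on two rewrites inside `normalize_text` that happen before any phonemization: decimals are spelled out digit by digit after the point, and a hyphen between digits becomes "to". The snippet below isolates just those two steps, copying `point_num` and the two regular expressions from the diff. The wrapper name `normalizeNumbers` is hypothetical, and the snippet skips every other normalization step.

```javascript
// Copied from the diff: "12.34" -> "12 point 3 4"
function point_num(match) {
  const [a, b] = match.split(".");
  return `${a} point ${b.split("").join(" ")}`;
}

// Hypothetical wrapper applying only the decimal and range rules
function normalizeNumbers(text) {
  return text
    .replace(/\d*\.\d+/g, point_num) // spell out decimals
    .replace(/(?<=\d)-(?=\d)/g, " to "); // "10-20" -> "10 to 20"
}

console.log(normalizeNumbers("12.34")); // "12 point 3 4"
console.log(normalizeNumbers("0.01")); // "0 point 0 1"
console.log(normalizeNumbers("10-20")); // "10 to 20"
```

The phoneme strings expected by the tests are then whatever the phonemizer produces for these rewritten inputs.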
--- a/kokoro.js/tsconfig.json
+++ b/kokoro.js/tsconfig.json
@@ -0,0 +1,16 @@
+{
+  "include": ["src/**/*"],
+  "compilerOptions": {
+    "checkJs": true,
+    "target": "esnext",
+    "module": "nodenext",
+    "moduleResolution": "nodenext",
+    "outDir": "types",
+    "strict": false,
+    "skipLibCheck": true,
+    "declaration": true,
+    "declarationMap": true,
+    "noEmit": false,
+    "emitDeclarationOnly": true
+  }
+}
--- a/kokoro.js/voices/af.bin
+++ b/kokoro.js/voices/af.bin
--- a/kokoro.js/voices/af_bella.bin
+++ b/kokoro.js/voices/af_bella.bin
--- a/kokoro.js/voices/af_nicole.bin
+++ b/kokoro.js/voices/af_nicole.bin
--- a/kokoro.js/voices/af_sarah.bin
+++ b/kokoro.js/voices/af_sarah.bin
--- a/kokoro.js/voices/af_sky.bin
+++ b/kokoro.js/voices/af_sky.bin
--- a/kokoro.js/voices/am_adam.bin
+++ b/kokoro.js/voices/am_adam.bin
--- a/kokoro.js/voices/am_michael.bin
+++ b/kokoro.js/voices/am_michael.bin
--- a/kokoro.js/voices/bf_emma.bin
+++ b/kokoro.js/voices/bf_emma.bin
--- a/kokoro.js/voices/bf_isabella.bin
+++ b/kokoro.js/voices/bf_isabella.bin
--- a/kokoro.js/voices/bm_george.bin
+++ b/kokoro.js/voices/bm_george.bin
--- a/kokoro.js/voices/bm_lewis.bin
+++ b/kokoro.js/voices/bm_lewis.bin