install

npm i wandler

# or
yarn add wandler

# or
pnpm add wandler

hello, world!

let's get a simple text generation example running. this will demonstrate the core functionality of wandler.

import { loadModel, generateText } from "wandler";

// 1. Load a model
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");

// 2. Generate text
const result = await generateText({
	model,
	messages: [{ role: "user", content: "Hello, Wandler!" }],
});

// 3. Output the result
console.log(result.text);
  1. loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX"): this line loads the model from the hugging face hub. wandler automatically downloads and prepares the model for browser-based inference.
  2. generateText({...}): this function takes a model and a set of messages (representing a conversation) and generates a text response. it's the simplest way to get a single text output from a model.
  3. console.log(result.text): this line prints the generated text to the console.

core functions

wandler revolves around three core functions: loadModel, generateText, and streamText.

let's take a look at each of them.

loadModel

loadModel(modelPath: string, options?: ModelOptions)

  • purpose: loads a pre-trained model from the hugging face model hub and caches it in your browser, so loading the same model again later is much faster.
  • modelPath: the path to the model. this is typically a string like "onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX" (for a hugging face hub model) or a local file path.
  • options: an optional object to configure model loading. key options include:
    • device: "webgpu" (default if available), "wasm", "cpu". specifies the device to use for inference. "webgpu" is generally the fastest.
    • dtype: "q4f16" (default), "q4", "fp16". specifies the data type to use for the model. "q4f16" is a good balance of speed and accuracy.
    • onProgress: a callback function to track the model loading progress.
    • useWorker: true (default), false. whether to load the model in a web worker.
  • returns: a BaseModel object, which contains the loaded model, tokenizer, and configuration.
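
example: choosing device & dtype

the option names and values below come straight from the list above; treat this as a minimal sketch of pinning the device, dtype, and worker behavior explicitly instead of relying on the defaults.

import { loadModel } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX", {
	device: "webgpu", // generally the fastest; "wasm" and "cpu" are the other listed options
	dtype: "q4f16", // the default, a good balance of speed and accuracy
	useWorker: true, // load the model in a web worker (the default)
});
console.log("Model loaded!");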

example: progress tracking

import { loadModel } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX", {
	onProgress: info => {
		if (info.status === "progress") {
			console.log(`Loading: ${info.loaded}/${info.total} bytes`);
		}
	},
});
console.log("Model loaded!");

generateText

generateText(options: NonStreamingGenerationOptions)

  • purpose: generates a single text response from a language model.
  • options: an object containing the generation configuration. key options include:
    • model: the BaseModel object returned by loadModel.
    • messages: an array of Message objects representing the conversation history. each Message has a role ("user", "assistant", "system") and content.
    • prompt: a single prompt string (alternative to messages).
    • maxTokens: the maximum number of tokens to generate.
    • temperature: sampling temperature (higher values = more creative, lower values = more predictable).
    • topP: nucleus sampling parameter.
  • returns: a GenerationResult object, which contains the generated text, reasoning (if available), and usage statistics.

example: system message & temperature

import { loadModel, generateText } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const result = await generateText({
	model,
	system: "You are a helpful assistant.",
	messages: [{ role: "user", content: "What is the capital of France?" }],
	temperature: 0.7,
});
console.log(result.text);
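
example: single prompt & token limits

the prompt, maxTokens, and topP options below are the ones described in the list above; result.usage is an assumed property name for the usage statistics mentioned there, so treat this as a sketch rather than a guaranteed shape.

import { loadModel, generateText } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const result = await generateText({
	model,
	prompt: "Write a haiku about the sea.", // a single prompt string instead of messages
	maxTokens: 64, // cap the number of generated tokens
	topP: 0.9, // nucleus sampling
});
console.log(result.text);
console.log(result.usage); // usage statistics (property name assumed, see note above)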

streamText

streamText(options: StreamingGenerationOptions)

  • purpose: generates text in streaming mode, allowing for real-time updates. this is ideal for chat applications or scenarios where you want to display the output as it's being generated.
  • options: an object containing the streaming generation configuration. it accepts the same model and messages options as generateText, plus:
    • onChunk: a callback function that is called with each chunk of generated text.
  • returns: a StreamTextResult object, which contains:
    • textStream: a ReadableStream that yields text chunks.
    • fullStream: a ReadableStream that yields structured events (text deltas, reasoning, sources).
    • result: a Promise that resolves to the final GenerationResult.

example: streaming output

import { loadModel, streamText } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { textStream } = await streamText({
	model,
	messages: [{ role: "user", content: "Tell me a short story." }],
});

for await (const chunk of textStream) {
	console.log("Chunk:", chunk); // textStream yields text chunks directly, so no decoding is needed
}

example: using onChunk with fullStream

import { loadModel, streamText } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { fullStream } = await streamText({
	model,
	messages: [{ role: "user", content: "Explain the theory of relativity." }],
	onChunk: chunk => {
		if (chunk.type === "text-delta") {
			console.log("Text Chunk (onChunk):", chunk.text);
		} else if (chunk.type === "reasoning") {
			console.log("Reasoning (onChunk):", chunk.text);
		}
	},
});

for await (const event of fullStream) {
	if (event.type === "text-delta") {
		console.log("Text Chunk (fullStream):", event.textDelta);
	} else if (event.type === "reasoning") {
		console.log("Reasoning (fullStream):", event.textDelta);
	}
}
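
example: awaiting the final result

as described above, the object returned by streamText also exposes result, a promise that resolves to the final GenerationResult once generation finishes. this is a minimal sketch, assuming the resolved value carries the same text property used in the generateText examples.

import { loadModel, streamText } from "wandler";

const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { textStream, result } = await streamText({
	model,
	messages: [{ role: "user", content: "Summarize the plot of Hamlet in two sentences." }],
});

// consume the stream first so generation runs to completion
for await (const chunk of textStream) {
	console.log("Chunk:", chunk);
}

// result resolves to the final GenerationResult (see above)
const final = await result;
console.log("Final text:", final.text);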