install
npm i wandler
# or
yarn add wandler
# or
pnpm add wandler
hello, world!
let's get a simple text generation example running. this will demonstrate the core functionality of wandler.
import { loadModel, generateText } from "wandler";
// 1. Load a model
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
// 2. Generate text
const result = await generateText({
  model,
  messages: [{ role: "user", content: "Hello, Wandler!" }],
});
// 3. Output the result
console.log(result.text);
loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX")
: this line loads the onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX model from the hugging face. wandler automatically downloads and prepares the model for browser-based inference.generateText({...})
: this function takes a model and a set of messages (representing a conversation) and generates a text response. it's the simplest way to get a single text output from a model.console.log(result.text)
: this line prints the generated text to the console.
core functions
wandler revolves around three core functions: loadModel, generateText, and streamText. let's take a look at each of them.
loadModel
loadModel(modelPath: string, options?: ModelOptions)
- purpose: loads a pre-trained model from the hugging face model hub and caches it locally in your browser, so subsequent loads of the same model are much faster.
modelPath
: the path to the model. this is typically a string like "onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX" (for a hugging face hub model) or a local file path.
options
: an optional object to configure model loading. key options include:
  - device: "webgpu" (default if available), "wasm", "cpu". specifies the device to use for inference. "webgpu" is generally the fastest.
  - dtype: "q4f16" (default), "q4", "fp16". specifies the data type to use for the model. "q4f16" is a good balance of speed and accuracy (see the example below).
  - onProgress: a callback function to track the model loading progress.
  - useWorker: true (default), false. whether to load the model in a web worker.
- returns: a BaseModel object, which contains the loaded model, tokenizer, and configuration.
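example: choosing device & dtype
a minimal sketch that sets the device, dtype, and useWorker options described above explicitly (these values are the documented defaults, spelled out here for illustration):
import { loadModel } from "wandler";
// request WebGPU inference with q4f16 quantized weights, loaded in a web worker
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX", {
  device: "webgpu",
  dtype: "q4f16",
  useWorker: true,
});
console.log("Model ready on WebGPU!");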
example: progress tracking
import { loadModel } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX", {
  onProgress: info => {
    if (info.status === "progress") {
      console.log(`Loading: ${info.loaded}/${info.total} bytes`);
    }
  },
});
console.log("Model loaded!");
generateText
generateText(options: NonStreamingGenerationOptions)
- purpose: generates a single text response from a language model.
options
: an object containing the generation configuration. key options include:
  - model: the BaseModel object returned by loadModel.
  - messages: an array of Message objects representing the conversation history. each Message has a role ("user", "assistant", "system") and content.
  - prompt: a single prompt string (an alternative to messages; see the example below).
  - maxTokens: the maximum number of tokens to generate.
  - temperature: sampling temperature (higher values = more creative, lower values = more predictable).
  - topP: nucleus sampling parameter.
- returns: a GenerationResult object, which contains the generated text, reasoning (if available), and usage statistics.
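example: prompt & maxTokens
a minimal sketch using the prompt and maxTokens options listed above; prompt is the single-string alternative to messages:
import { loadModel, generateText } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
// a plain prompt string instead of a messages array, capped at 128 generated tokens
const result = await generateText({
  model,
  prompt: "Write a haiku about the ocean.",
  maxTokens: 128,
});
console.log(result.text);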
example: system message & temperature
import { loadModel, generateText } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const result = await generateText({
  model,
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  temperature: 0.7,
});
console.log(result.text);
streamText
streamText(options: StreamingGenerationOptions)
- purpose: generates text in streaming mode, allowing for real-time updates. this is ideal for chat applications or scenarios where you want to display the output as it's being generated.
options
: an object containing the streaming generation configuration. key options include:
  - onChunk: a callback function that is called with each chunk of generated text.
- returns: a StreamTextResult object, which contains:
  - textStream: a ReadableStream that yields text chunks.
  - fullStream: a ReadableStream that yields structured events (text deltas, reasoning, sources).
  - result: a Promise that resolves to the final GenerationResult (see the last example below).
example: streaming output
import { loadModel, streamText } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { textStream } = await streamText({
  model,
  messages: [{ role: "user", content: "Tell me a short story." }],
});
const decoder = new TextDecoder();
for await (const chunk of textStream) {
  console.log("Chunk:", decoder.decode(chunk));
}
example: using onChunk with fullStream
import { loadModel, streamText } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { fullStream } = await streamText({
  model,
  messages: [{ role: "user", content: "Explain the theory of relativity." }],
  onChunk: chunk => {
    if (chunk.type === "text-delta") {
      console.log("Text Chunk (onChunk):", chunk.text);
    } else if (chunk.type === "reasoning") {
      console.log("Reasoning (onChunk):", chunk.text);
    }
  },
});
for await (const event of fullStream) {
  if (event.type === "text-delta") {
    console.log("Text Chunk (fullStream):", event.textDelta);
  } else if (event.type === "reasoning") {
    console.log("Reasoning (fullStream):", event.textDelta);
  }
}
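example: awaiting the final result
the object returned by streamText also exposes the result promise described above. a minimal sketch, assuming the stream is consumed before (or while) awaiting it:
import { loadModel, streamText } from "wandler";
const model = await loadModel("onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX");
const { textStream, result } = await streamText({
  model,
  messages: [{ role: "user", content: "Summarize the plot of Hamlet." }],
});
// drain the stream (in a real app you would render each chunk as it arrives)
for await (const chunk of textStream) {
  // render chunk incrementally here
}
// result resolves to the final GenerationResult once generation completes
const final = await result;
console.log("Full text:", final.text);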