What chunk size should I use for RAG?

A common starting point is 200–500 tokens per chunk with a 10–20% overlap, then tune to your embedding model and documents. Smaller chunks give more precise retrieval but less surrounding context; larger chunks keep context but dilute the relevant signal.

Why split by tokens instead of characters?

Embedding and chat models bill and limit by tokens, not characters. Splitting by tokens lets you size chunks to a model's context window exactly. This tool measures tokens with the real tiktoken o200k_base encoding used by GPT-4o.

What does overlap do?

Overlap repeats the end of one chunk at the start of the next so an idea split across a boundary still appears whole in at least one chunk, improving retrieval at the cost of storing some text twice.

Is my text uploaded?

No. The splitter and the tokenizer run as JavaScript in your browser, so your documents are never sent to a server.

Text Chunk Splitter for RAG

How to split text into chunks

Paste your text into the box and the tool splits it immediately; there is no button to press. Pick the unit you want to size chunks by (tokens, characters or words), set the chunk size and how much each chunk should overlap with the next, and the chunks appear below with an exact token count for each. Token sizing and the per-chunk counts use the real tiktoken o200k_base encoding, the same one GPT-4o uses, so the numbers match what a model using that encoding will actually see; a model with a different tokenizer may count the same text differently. When you are happy with the result, copy the whole set as a JSON array or download it to feed straight into an ingestion pipeline.
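The fixed-window behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: `split_fixed` is a hypothetical helper, and whitespace-separated words stand in for tokenizer output so the example has no dependencies.

```python
import json

def split_fixed(units, size, overlap):
    """Cut a sequence of units into windows of `size` units, each sharing
    `overlap` units with the window before it. (Hypothetical helper.)"""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(units):
        chunks.append(units[start:start + size])
        if start + size >= len(units):
            break  # this window already reached the end of the input
        start += step
    return chunks

text = "one two three four five six seven eight nine ten"
words = text.split()  # assumption: words stand in for real tokens
chunks = [" ".join(window) for window in split_fixed(words, size=4, overlap=1)]

# Export the chunks as a JSON array, ready for an ingestion pipeline.
as_json = json.dumps(chunks, indent=2)
print(as_json)
```

To size by real tokens instead, the same function could be given the list of token ids from `tiktoken.get_encoding("o200k_base").encode(text)`, with each window turned back into text by the encoding's `decode` method.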

Why chunking matters for RAG

Retrieval-augmented generation grounds a model in your own documents. Before a model can use a long document, the text is split into smaller passages — chunks — each of which is embedded into a vector and stored in a vector database. At query time the most relevant chunks are retrieved and placed in the prompt. The way you chunk directly controls retrieval quality: it decides how much surrounding context travels with each idea and how precisely a search can target the right passage. Chunk well and the model gets exactly the context it needs; chunk badly and relevant facts get buried or cut in half.
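As a rough illustration of the retrieval step, the sketch below ranks chunks against a query by cosine similarity. It makes a loud assumption: `embed` is a toy bag-of-words counter standing in for a real embedding model, and the chunk texts and function names are invented for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. A real pipeline would call an
    embedding model and get back a dense vector instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical chunks, as they might come out of a splitter.
chunks = [
    "Refunds are issued within 14 days of a return.",
    "Shipping takes three to five business days.",
    "Our office is closed on public holidays.",
]
top = retrieve("how long do refunds take", chunks, k=1)
print(top)
```

Even in this toy version the effect of chunking is visible: the query can only match a chunk that actually contains the relevant words, so a fact cut in half at a chunk boundary would score lower in both halves.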

Choosing chunk size and overlap

Chunk size is a trade-off. Chunks that are too large dilute the relevant signal and waste context-window budget; chunks that are too small lose the surrounding meaning a passage needs to stand on its own. A common starting point is a few hundred tokens per chunk — often 200 to 500 — with an overlap of roughly 10–20% so a sentence split across a boundary still appears whole somewhere. Size in tokens rather than characters when you care about fitting an embedding model's input limit or a chat model's context window, because that is the unit those models actually count. Use the live max- and average-token stats to confirm no chunk overruns your model's limit.
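The arithmetic behind these settings is easy to check by hand. The sketch below assumes a fixed window that advances by the chunk size minus the overlap; the helper names, the 10,000-token document, and the 512-token limit are all hypothetical values chosen for illustration.

```python
import math

def overlap_tokens(size, ratio):
    """Overlap in tokens for a chunk size and an overlap ratio such as 0.15."""
    return round(size * ratio)

def chunk_count(total, size, overlap):
    """Number of fixed windows needed to cover `total` tokens."""
    if total <= 0:
        return 0
    if total <= size:
        return 1
    step = size - overlap  # each later chunk adds this many unseen tokens
    return 1 + math.ceil((total - size) / step)

def chunk_stats(token_counts, limit):
    """Max and average chunk length for a non-empty list of counts,
    plus the indexes of any chunks longer than `limit`."""
    return {
        "max": max(token_counts),
        "avg": sum(token_counts) / len(token_counts),
        "over_limit": [i for i, n in enumerate(token_counts) if n > limit],
    }

size = 400
overlap = overlap_tokens(size, 0.15)           # 15% of a 400-token chunk
n_chunks = chunk_count(10_000, size, overlap)  # hypothetical 10,000-token document
stats = chunk_stats([400, 400, 140], limit=512)  # 512 is an assumed model limit
print(overlap, n_chunks, stats)
```

With these inputs the overlap is 60 tokens and the document needs 30 chunks, because every chunk after the first contributes only 340 new tokens. A larger overlap shrinks that step, so the chunk count and the amount of duplicated text both grow.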

Boundary-aware packing

The default fixed-window mode cuts exactly at the size limit, which is predictable but can slice through a sentence. Turn on "keep sentences whole" to switch to greedy packing: the tool splits the text into paragraphs and sentences, then fills each chunk up to the size limit without breaking a sentence, carrying a short overlap from the previous chunk. This produces slightly uneven chunk sizes but cleaner, more readable passages, which is usually better for retrieval because each chunk is a complete thought rather than an arbitrary slice.
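Greedy packing of this kind might look like the following sketch. It is one possible reading of the description, not the tool's implementation: the regex sentence splitter is deliberately naive, chunk size is measured in words as a stand-in for tokens, the overlap is carried as whole sentences, and all function names are hypothetical.

```python
import re

def split_sentences(text):
    """Naive splitter: paragraphs on blank lines, sentences after . ! or ?"""
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences

def word_count(sentence):
    """Assumed size measure; a token counter could be swapped in."""
    return len(sentence.split())

def pack_sentences(sentences, size, overlap_sentences=1, measure=word_count):
    """Fill each chunk up to `size` without breaking a sentence, carrying
    the last `overlap_sentences` sentences into the next chunk."""
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = measure(sentence)
        if current and current_len + n > size:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
            current_len = sum(measure(s) for s in current)
            if current_len + n > size:
                current, current_len = [], 0  # overlap would not fit; drop it
        # A single sentence longer than `size` still becomes its own chunk.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Cats sleep a lot. Dogs like to run. Birds can fly south. Fish swim in schools."
packed = pack_sentences(split_sentences(text), size=8, overlap_sentences=1)
for chunk in packed:
    print(chunk)
```

Each chunk here ends on a sentence boundary, and the last sentence of one chunk reappears as the first sentence of the next. The sketch lets an oversized sentence through rather than cutting it, which is a design choice; a stricter version could fall back to fixed-window splitting for that sentence.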

Why split text locally

The documents you most want to embed are often the ones you least want to upload: internal wikis, contracts, support transcripts, unreleased material. Because the splitter and the tokenizer run entirely in your browser, the text is processed and its token counts are computed on your device, and nothing is logged. This matches the gitime.dev default that your data stays local.

Token-aware — size and count by real tiktoken tokens.
Flexible — tokens, characters or words; tunable overlap.
Boundary mode — keep sentences whole when you want.
Exportable — copy or download chunks as JSON.
Local — documents never uploaded.

Frequently asked questions

What chunk size should I use for RAG? A common start is 200–500 tokens with 10–20% overlap, then tune to your embedding model and documents.
Why split by tokens instead of characters? Models bill and limit by tokens, so token sizing fits a context window exactly. Counts use tiktoken o200k_base.
What does overlap do? It repeats the end of one chunk at the start of the next so ideas spanning a boundary stay retrievable.
Is my text uploaded? No. The splitter and tokenizer run locally, so your documents are never sent to a server.