How to split text into chunks
Paste your text into the box and the tool splits it immediately; there is no button to press. Pick the unit you want to size chunks by (tokens, characters or words), set the chunk size and how much each chunk should overlap with the next, and the chunks appear below with an exact token count for each. Token sizing and the per-chunk counts use the real tiktoken o200k_base encoding, the same one GPT-4o uses, so the numbers match what GPT-4o and other o200k_base models will actually see; a model with a different tokenizer will count the same text somewhat differently. When you are happy, copy the whole set as a JSON array or download it to feed straight into an ingestion pipeline.
Why chunking matters for RAG
Retrieval-augmented generation grounds a model in your own documents. Before a model can use a long document, the text is split into smaller passages — chunks — each of which is embedded into a vector and stored in a vector database. At query time the most relevant chunks are retrieved and placed in the prompt. The way you chunk directly controls retrieval quality: it decides how much surrounding context travels with each idea and how precisely a search can target the right passage. Chunk well and the model gets exactly the context it needs; chunk badly and relevant facts get buried or cut in half.
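The retrieve-then-prompt flow described above can be sketched in a few lines. This is only a toy illustration: the `embed` function here is a bag-of-words counter standing in for a real embedding model, the in-memory list stands in for a vector database, and all names (`embed`, `cosine`, `retrieve`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # the "vector store"
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "Refunds are issued within 14 days of a return.",
    "Shipping takes three to five business days.",
    "Contact support by email for account issues.",
]
top = retrieve("When are refunds issued after a return?", chunks, k=1)
print(top)
```

Because each chunk is scored as a unit, a fact that was cut in half at a chunk boundary would match the query less strongly in either half, which is the failure mode good chunking tries to avoid.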
Choosing chunk size and overlap
Chunk size is a trade-off. Chunks that are too large dilute the relevant signal and waste context-window budget; chunks that are too small lose the surrounding meaning a passage needs to stand on its own. A common starting point is a few hundred tokens per chunk — often 200 to 500 — with an overlap of roughly 10–20% so a sentence split across a boundary still appears whole somewhere. Size in tokens rather than characters when you care about fitting an embedding model's input limit or a chat model's context window, because that is the unit those models actually count. Use the live max- and average-token stats to confirm no chunk overruns your model's limit.
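To make the size and overlap arithmetic concrete, here is a minimal sketch of fixed-window chunking with overlap, plus the max and average statistics mentioned above. It is not the tool's actual code: the function names are hypothetical, and the input is any pre-tokenized sequence (placeholder strings here; in practice the token IDs from your tokenizer). Each window advances by `size - overlap` items.

```python
from typing import Dict, List, Sequence


def fixed_window_chunks(
    tokens: Sequence[str], size: int, overlap: int
) -> List[List[str]]:
    """Cut tokens into windows of at most `size` items, each sharing
    `overlap` items with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


def chunk_stats(chunks: List[List[str]]) -> Dict[str, float]:
    """Count, maximum and average chunk length, for checking a model limit."""
    lengths = [len(c) for c in chunks]
    avg = sum(lengths) / len(lengths) if lengths else 0.0
    return {"count": len(lengths), "max": max(lengths, default=0), "avg": avg}


tokens = [f"t{i}" for i in range(10)]
chunks = fixed_window_chunks(tokens, size=4, overlap=1)
stats = chunk_stats(chunks)
print(chunks)
print(stats)
```

With 10 tokens, a size of 4 and an overlap of 1, the windows start at positions 0, 3 and 6, so the last item of each chunk reappears as the first item of the next.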
Boundary-aware packing
The default fixed-window mode cuts exactly at the size limit, which is predictable but can slice through a sentence. Turn on "keep sentences whole" to switch to greedy packing: the tool splits the text into paragraphs and sentences, then fills each chunk up to the size limit without breaking a sentence, carrying a short overlap from the previous chunk. This produces slightly uneven chunk sizes but cleaner, more readable passages, which is usually better for retrieval because each chunk is a complete thought rather than an arbitrary slice.
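Greedy sentence packing can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: sentences are detected with a naive regular expression, size is measured in whitespace-separated words instead of tokens, the overlap is carried as whole sentences, and the names `split_sentences` and `pack_sentences` are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: paragraphs on blank lines, sentences after . ! or ?"""
    sentences: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences


def word_count(sentences: List[str]) -> int:
    return sum(len(s.split()) for s in sentences)


def pack_sentences(text: str, size: int, overlap_sentences: int = 1) -> List[str]:
    """Fill each chunk with whole sentences up to `size` words, starting
    every new chunk with the last `overlap_sentences` of the previous one.
    A single sentence longer than `size` is kept whole as its own chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and word_count(current) + n > size:
            chunks.append(current)
            carry = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop carried sentences from the front until the new one fits.
            while carry and word_count(carry) + n > size:
                carry = carry[1:]
            current = carry
        current.append(sentence)
    if current:
        chunks.append(current)
    return [" ".join(c) for c in chunks]


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
packed = pack_sentences(text, size=6, overlap_sentences=1)
for chunk in packed:
    print(chunk)
```

Every chunk ends on a sentence boundary, and the carried sentence gives neighbouring chunks shared context at the cost of some repeated text.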
Why split text locally
The documents you most want to embed are often the ones you least want to upload — internal wikis, contracts, support transcripts, unreleased material. Running the splitter and the tokenizer entirely in your browser means the text and its token counts are computed on your device and nothing is logged, matching the gitime.dev default that your data stays local.
- Token-aware — size and count by real tiktoken tokens.
- Flexible — tokens, characters or words; tunable overlap.
- Boundary mode — keep sentences whole when you want.
- Exportable — copy or download chunks as JSON.
- Local — documents never uploaded.
Frequently asked questions
- What chunk size should I use for RAG?
- A common starting point is 200–500 tokens with 10–20% overlap; then tune both to your embedding model and documents.
- Why split by tokens instead of characters?
- Models bill and enforce limits in tokens, so sizing by tokens lets you fit a context window or input limit predictably. Counts use tiktoken o200k_base, so they are exact for models that use that encoding.
- What does overlap do?
- It repeats the end of one chunk at the start of the next so ideas spanning a boundary stay retrievable.
- Is my text uploaded?
- No. The splitter and tokenizer run locally, so your documents are never sent to a server.