A transformer-based reading environment that tokenizes, parses, annotates, and assigns dictionary entries to text in 30+ modern and historical languages. Reads a variety of input formats — PDF, HTML snapshots, Word documents, EPUBs.
Built for serious learners and professionals who want research-grade linguistic understanding.
Documents are ingested with their original page geometry preserved — page numbers, running headers, centred text blocks, line wrapping, and book typography. Dependency arcs are drawn directly over the live text, so syntax is visible on the page.
Save an HTML snapshot of any web page with the companion Chrome extension — Language Capture — and load it into the reader as a frozen, fully-analysable text. Infoboxes, hyperlinks, and visual structure stay intact.
Use the selector tool to highlight specific chunks of prose for lookup.
Arabic clitics are split off from their host words. Sanskrit sandhi chains are decomposed into their constituent morphemes. Scriptio continua languages — Thai, Japanese, Chinese — are segmented by a powerful multilingual transformer pretrained on 2.6 terabytes of text from over 100 languages.
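One common way to frame segmentation of unspaced text is per-character boundary labelling: a model tags each character as beginning a word (B) or continuing one (I), and the tags are then decoded into tokens. The sketch below shows only that decoding step. The B/I scheme, the name `decode_segments`, and the hand-written labels are illustrative assumptions, not a description of the reader's actual model output.

```python
def decode_segments(text: str, labels: list[str]) -> list[str]:
    """Turn per-character B/I labels into a list of word tokens."""
    if len(text) != len(labels):
        raise ValueError("need exactly one label per character")
    tokens: list[str] = []
    for char, label in zip(text, labels):
        if label == "B" or not tokens:
            # "B" opens a new token; the first character always opens one.
            tokens.append(char)
        else:
            # Any other label extends the token currently being built.
            tokens[-1] += char
    return tokens


# Hand-written labels standing in for model predictions (hypothetical).
labels = ["B", "B", "B", "I", "B", "I"]
print(decode_segments("私は学生です", labels))  # ['私', 'は', '学生', 'です']
```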
Hovering over a token surfaces a stack of curated dictionaries assembled per language: Wiktionary alongside specialist sources like JMdict (Japanese), CC-CEDICT (Chinese), KRDICT (Korean), and historical lexica for classical languages.
Entries are filtered by lemma and POS, so you only see context-suitable candidates.
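As a rough illustration of lemma-and-POS filtering, the sketch below keeps only the entries whose headword and part of speech both match the analysed token. The `Entry` fields, the `filter_entries` helper, and the sample entries are hypothetical and stand in for whatever schema the reader actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str  # dictionary lemma the entry is filed under
    pos: str       # coarse POS tag, e.g. "NOUN" or "VERB"
    gloss: str
    source: str    # dictionary the entry came from


def filter_entries(entries: list[Entry], lemma: str, pos: str) -> list[Entry]:
    """Keep entries whose headword and POS both match the token's analysis."""
    return [e for e in entries if e.headword == lemma and e.pos == pos]


# Candidate entries for the surface form "saw" (illustrative data only).
candidates = [
    Entry("see", "VERB", "to perceive with the eyes", "Wiktionary"),
    Entry("saw", "NOUN", "a toothed cutting tool", "Wiktionary"),
    Entry("saw", "VERB", "to cut with a saw", "Wiktionary"),
]

# In "She saw the bird", the token "saw" would be analysed as lemma "see", POS "VERB".
for entry in filter_entries(candidates, lemma="see", pos="VERB"):
    print(entry.headword, entry.pos, entry.gloss)
```

Only the "see" entry survives here; the tool and the cutting verb are dropped because their lemma does not match the analysis.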
Each entry is annotated with a POS tag, a language-specific fine-grained POS tag, a grapheme-level script/romanization breakdown, and a full morphological feature bundle: aspect, mood, tense, voice, person, number, case, gender and more. Inflected forms are also resolved back to their dictionary lemma.
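A morphological feature bundle of this kind is often serialised as `Key=Value` pairs joined by `|`, the convention used in Universal Dependencies treebanks. The sketch below assumes that encoding, which the text above does not confirm; `parse_feats` and the sample analysis are illustrative.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a 'Key=Value|Key=Value' feature string into a dictionary."""
    if feats in ("", "_"):
        # "_" is the conventional placeholder for "no features".
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Illustrative analysis of Latin "amabantur" ("they were being loved").
token = {
    "form": "amabantur",
    "lemma": "amo",
    "pos": "VERB",
    "feats": parse_feats(
        "Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Pass"
    ),
}

print(token["lemma"], token["feats"]["Voice"], token["feats"]["Number"])
```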
Sentence structure is rendered through dependency arcs drawn over the live text, with POS shading that gives shape to phrase and clause boundaries. Each word’s grammatical role is flagged, marking out subjects, objects, verbs, and function words.
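To draw arcs over a line of text without overlaps, a renderer needs to decide how high each arc sits. One simple approach, sketched below, stacks an arc one level above the tallest arc nested inside it. The head-index input format and the `arc_levels` helper are assumptions for illustration, not the reader's documented layout algorithm.

```python
def arc_levels(heads: list[int]) -> dict[tuple[int, int], int]:
    """Assign a drawing level to each dependency arc.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    Each arc is keyed by its (left, right) token span.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1) if h != 0]
    # Process shorter arcs first so anything nested inside an arc is levelled before it.
    arcs.sort(key=lambda span: span[1] - span[0])
    levels: dict[tuple[int, int], int] = {}
    for left, right in arcs:
        inner = [lvl for (l, r), lvl in levels.items() if left <= l and r <= right]
        levels[(left, right)] = 1 + max(inner, default=0)
    return levels


# "She saw the bird": She -> saw, saw = root, the -> bird, bird -> saw.
print(arc_levels([2, 0, 4, 2]))  # {(1, 2): 1, (3, 4): 1, (2, 4): 2}
```

The arc from "saw" to "bird" spans the arc from "the" to "bird", so it is lifted to level 2 while the two short arcs stay at level 1.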
Named entities are tagged with colour-coded chips so you can scan a passage at a glance for the who, what, where and when.
Per-token glosses for word-sense disambiguation in context. On-demand synthetic dictionary entries for forms not covered by the existing sources. Sentence-level translations anchored to the source text. Free-form notes attached to dictionary entries. All of the above are community-editable — once one user generates an annotation, every other reader can see, alter, and benefit from it.
A built-in LLM chatbot always has the current page loaded as context and can answer questions about it.
The world’s most widely spoken modern languages, plus the classical languages that open up the heritage of major civilisations.