If you've been using an AI for collaborative fiction, you've run into the wall. The model forgets what happened three scenes ago. It breaks character. It writes "His eyes widened in shock." for the fourth time in a row. The context fills up and suddenly your character doesn't know her own backstory.
Lagoon was built to fix all of that. It's not a chat app with a system prompt — it's a writing environment where the system is actively working to keep your story coherent, your character in voice, and your prose quality high. Every turn, automatically.
Bring your own models. Venice.ai · Ollama (local) · Any OpenAI-compatible endpoint. Paste your API keys and run. No subscriptions, no cloud accounts, no data leaving your machine.
01 Your character, persistent across hundreds of turns
In Lagoon, a character is more than a system prompt. It's a complete profile — voice, personality, backstory, world context, tone rules — that travels with every conversation. Your character doesn't reset between sessions. The same voice, the same rules, every time you open a chat.
System Prompt
Who they are, how they speak, what they never do. Permanent — it never gets pruned from context no matter how long the story gets.
World Context
The setting, the rules of your world, background the model needs to know. Also permanent in context. Separate from the character voice.
Opening Line
Set an intro statement — the character's first words before you've said anything. Establishes voice and scene from turn zero.
Avatar
Generate a portrait from your character description with one click. Stored locally. Persists across every chat with that character.
One config, many stories. Character configs are reusable. Open a new chat with the same character and everything carries over — voice, rules, lore, tone. The story history is separate; the character is not.
02 Memory — the story remembers itself
After every exchange, Lagoon chunks the conversation and builds a searchable memory of what happened. Before each new turn, it finds the chunks most relevant to what you just said and quietly injects them into context. The story recalls its own past — without you having to remind it.
Without memory
You: "What did she say about her brother?"
AI: "I'm not sure — could you remind me what was discussed earlier in our conversation?"
With Lagoon RAG memory
You: "What did she say about her brother?"
AI: (retrieves the exchange from turn 23 automatically) "She mentioned he left for the coast before the war. Never came back."
How it works — without the jargon
Every few turns, Lagoon takes your conversation and breaks it into chunks. Each chunk gets a fingerprint. When you write something new, Lagoon finds the chunks whose fingerprints most closely match — and quietly drops them into context, right before your turn. You never see it. The model just knows.
100%
Local — runs on your machine, no API calls for memory
~8–10
Turns before memory starts contributing meaningfully
∞
No session limit — memory grows with the story
Configurable. Adjust how many past chunks to retrieve, how closely they need to match, how many tokens they're allowed to use — all live in Memory Settings, no restart needed. Or turn it off entirely.
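The retrieve-and-inject step can be sketched in a few lines. Bag-of-words cosine similarity stands in here for the unspecified fingerprinting; the function names, `top_k`, and `min_score` threshold are illustrative, not Lagoon's internals:

```python
import re
from collections import Counter
from math import sqrt

def fingerprint(text: str) -> Counter:
    # Stand-in "fingerprint": a bag of words. A local embedding model
    # would slot into the same flow with better semantic matching.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(chunks, query, top_k=2, min_score=0.1):
    # Score every stored chunk against the new message, keep the closest
    # matches, and hand them back for silent injection into context.
    q = fingerprint(query)
    scored = [(cosine(fingerprint(c), q), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

A question about "her brother" pulls back the turn-23 chunk because they share vocabulary, while an unrelated chunk scores below the cutoff and costs nothing.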
03 Anchors — world knowledge that doesn't waste tokens
You've got world-building facts, character histories, faction details, location descriptions. If you put them all in the system prompt, they eat your context budget on every single turn — even when they're irrelevant. Anchors only fire when you need them.
Write a lore entry. Assign it keywords. When those keywords appear in the recent conversation, the entry injects silently as context for that turn. When they don't appear, it costs nothing.
Without Anchors
Option A: cram everything into the system prompt, blowing 2,000 tokens on backstory the model doesn't need this turn.
Option B: leave it out, and watch the model invent history on the fly.
With Anchors
"The Kael family history" entry has keywords: Kael, family, parents, Mira's father.
It only appears in context when someone mentions the Kaels. Otherwise: 0 tokens used.
- ⚡ Priority order. Multiple entries can fire at once. Higher-priority entries inject first. Put "who she is at her core" above "that obscure noble house she mentioned once."
- 💰 Token budget. Set a cap on how many tokens Anchors can use per turn. If you hit the limit, lower-priority entries get dropped. Your context budget stays predictable.
- 🔍 Scan depth. Controls how far back in the conversation Lagoon looks for keywords. Default: last 15 messages. Adjust if your story has long gaps between references.
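The keyword trigger, priority ordering, and token budget described above can be sketched as follows. The entry schema (`text`, `keywords`, `priority`) and the crude word-count token estimate are assumptions for illustration, not Lagoon's actual data model:

```python
def fire_anchors(entries, recent_messages, scan_depth=15, token_budget=500):
    # Scan the last `scan_depth` messages for each entry's keywords.
    window = " ".join(recent_messages[-scan_depth:]).lower()
    matched = [e for e in entries if any(k.lower() in window for k in e["keywords"])]
    # Higher-priority entries inject first.
    matched.sort(key=lambda e: e["priority"], reverse=True)
    injected, used = [], 0
    for e in matched:
        cost = len(e["text"].split())  # crude stand-in for a real token count
        if used + cost > token_budget:
            break  # lower-priority entries are dropped once over budget
        injected.append(e["text"])
        used += cost
    return injected
```

When nothing in the window mentions a keyword, the function returns an empty list: zero tokens spent on lore the turn doesn't need.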
04 Character Awareness — automatic reveal tracking Unique
This is the feature that doesn't exist anywhere else. Mark a lore entry as "character not yet aware." From that point on, the model is told: this is true, but your character doesn't know it yet. When the reveal finally happens in the story — when the model writes it — Lagoon detects it automatically and marks the entry as known. No manual tracking. No forgetting to update it. It just works.
You write: "Her brother is still alive — she doesn't know yet"
→
Mark as: "character not yet aware"
Lagoon injects: "This is true, but Elena doesn't know it. If she learns this turn, signal it."
→
Model writes the reveal. Lagoon catches the signal.
Entry flips to: "aware"
→
Toast notification: "Elena now knows: her brother is alive."
Why this matters: In a long story with multiple reveals, tracking "does she know this yet?" manually is exhausting and error-prone. Lagoon handles it automatically. The model knows what the character knows. The moment that changes, so does the system.
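One plausible way to implement this loop, assuming a hypothetical `[REVEAL:id]` signal token (the actual prompt wording and detection mechanism are not documented here):

```python
import re

def awareness_note(entry):
    # Hedged sketch: both the wording and the signal token are assumptions.
    return (f"Hidden truth: {entry['text']} The character does not know this yet. "
            f"If the character learns it this turn, append [REVEAL:{entry['id']}].")

def detect_reveals(response, entries):
    # Catch any signal tokens the model emitted and flip those entries.
    revealed = set(re.findall(r"\[REVEAL:(\w+)\]", response))
    for entry in entries:
        if entry["id"] in revealed:
            entry["aware"] = True  # entry flips; a toast would fire here
    # Strip the signal tokens before the prose reaches the reader.
    return re.sub(r"\s*\[REVEAL:\w+\]", "", response)
```

The model never has to be asked "does she know yet?": the injected note changes automatically the moment the flag flips.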
05 Style Overseer — fixing AI's bad prose habits Unique
AI models have consistent, repeatable bad habits. They double-space after sentences. They write single dramatic lines as standalone paragraphs. They slip out of third person. They open responses by restating what you just said. They write unattributed dialogue that floats with no action anchor. Over a long session, these patterns compound and the prose degrades.
The Style Overseer reviews every response after it streams, flags each violation, and lets you fix it in place.
What you get without intervention
She looked at him.

Her eyes widened in shock.

She couldn't believe what she was hearing.
Three sentences. Three paragraphs. The middle one is a cliché standing alone for dramatic effect.
After the Overseer flags and you accept
She looked at him. She couldn't believe what she was hearing.
Isolated dramatic sentence merged into the paragraph. Cliché gone. One click.
Six built-in rules (on by default, optional)
- ✗ Isolated dramatic sentences. "She was already gone." standing alone as its own paragraph. The Overseer flags it. You decide if it stays.
- ✗ Double spacing. Two spaces after a period. Small thing, constant thing. Caught automatically.
- ✗ POV slippage. A third-person story suddenly reads "I felt a chill run down my spine." The model broke perspective. Flagged.
- ✗ Correction acknowledgment. "You're right, I apologize for the confusion earlier—" The model stepped outside the story to respond to your direction. Stripped.
- ✗ Verbatim echo. The AI opens by paraphrasing what you just said. "You reached for the door — and as your hand closed around the handle…"
- ✗ Unattributed dialogue. A line of dialogue with no action, tag, or anchor to show who's speaking or moving in the scene.
And your own rules, in plain English
Add rules like: "Do not use the phrase 'a mix of'" or "Never open a paragraph with a gerund" or "This character does not use contractions." Written in plain English. Applied every turn.
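A naive sketch of two of the built-in checks plus a banned-phrase custom rule; the real Overseer's detection is presumably more robust than these regexes:

```python
import re

def check_style(text: str, banned_phrases=()) -> list[str]:
    """Flag style violations. Illustrative only: regex stand-ins for
    a few of the checks described above."""
    flags = []
    # Double spacing after a sentence-ending punctuation mark.
    if re.search(r"[.!?]  +\S", text):
        flags.append("double spacing")
    # Isolated dramatic sentence: a paragraph that is one short sentence.
    for para in text.split("\n\n"):
        sentences = re.findall(r"[^.!?]+[.!?]", para)
        if len(sentences) == 1 and len(para.split()) <= 6:
            flags.append(f"isolated dramatic sentence: {para.strip()!r}")
    # Custom plain-English rules, reduced here to banned phrases.
    for phrase in banned_phrases:
        if phrase.lower() in text.lower():
            flags.append(f"banned phrase: {phrase!r}")
    return flags
```

Each flag carries an excerpt, which is what makes the one-click accept-and-replace step possible.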
How accepting a correction works
Overseer flags a violation with an excerpt
→
You hit Accept
Offending text replaced in-place in the message bubble
→
Rule added to Author's Note: "DO NOT: [excerpt]"
Author's Note injects near generation point next turn
→
Model steered away from the pattern, turn after turn. Corrections compound.
Auto-accept mode: Enable it and the Overseer runs silently on every response — no badge, no prompt. Every violation is fixed automatically. Corrections still accumulate in the Author's Note. The prose quality just steadily improves on its own.
Model presets: GLM, DeepSeek, Llama, and Qwen each have known bad habits. Load the preset for your model and those model-specific rules are added automatically.
06 Author's Note — your persistent nudge
Some instructions need to be close to where the model generates, not buried at the top of a long context. The Author's Note re-injects at a configurable depth from the bottom on every single turn — right before the model writes. Perfect for tone reminders, current scene context, or things you don't want the model to forget mid-session.
Without it
You put tone guidance in the system prompt 200 messages ago. The model has long since drifted. You're now getting warm, gentle responses from a character who should be cold and controlled.
Author's Note
[Tone] She is controlled, not warm. Every kindness she shows costs her something. Do not let her soften.
Re-injected 4 messages from the bottom. Every turn. She doesn't soften.
Depth is configurable. Default is 4 messages from the bottom — close enough to matter, not so close it crowds the model. For fast-paced scenes, move it to 2. For more stable characters, push it out to 8.
Session-only override: The sidebar Author's Note panel is session-only. Changes there don't touch the saved character config. Close the chat, they're gone. Use it for temporary scene-specific notes — the config version handles the permanent voice rules.
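The re-injection mechanic is simple to sketch. Message dicts follow the OpenAI chat format here, which is an assumption about Lagoon's internals; `depth=4` is the documented default:

```python
def inject_authors_note(messages, note, depth=4):
    # Insert the note `depth` messages from the bottom, close to where
    # the model generates, without mutating the stored history.
    msgs = list(messages)
    pos = max(0, len(msgs) - depth)
    msgs.insert(pos, {"role": "system", "content": f"[Author's Note] {note}"})
    return msgs
```

Because the note is rebuilt every turn, it never drifts toward the top of context the way a one-time instruction does.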
07 Long stories don't die
Every AI has a context limit — a cap on how much text it can hold in its attention at once. When you hit it, something has to go. Lagoon handles this gracefully, with your approval, so you never lose the thread of a long story.
What happens when the context fills up
Context hits 75% of the model's limit (configurable)
→
Lagoon generates a summary of the conversation so far
Summary is flagged as pending. A banner appears.
→
Nothing is deleted until you approve
You approve. Older messages are pruned.
→
Summary injects as context. Story continues.
Summaries stack. Each new one is told what's already been summarized so it doesn't repeat. The character's permanent system prompt and world context are never touched — only the conversation history gets pruned.
Manual trigger available. Don't wait for the threshold. Prompt Monitor tab → Summarize Now. Use it to create a chapter break, clean up a sprawling scene, or just keep the context tidy on your own schedule.
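The threshold check and approval-gated pruning might look like this, assuming OpenAI-style message dicts and treating every `system` message as permanent (a simplification of Lagoon's actual rules):

```python
def needs_summary(used_tokens, context_limit, threshold=0.75):
    # 75% is the documented default trigger; the threshold is configurable.
    return used_tokens >= context_limit * threshold

def prune_after_approval(messages, summary, keep_recent=10):
    # Permanent blocks (system prompt, world context) are never pruned;
    # older history is replaced by the approved summary.
    permanent = [m for m in messages if m["role"] == "system"]
    history = [m for m in messages if m["role"] != "system"]
    return (permanent
            + [{"role": "system", "content": f"[Summary] {summary}"}]
            + history[-keep_recent:])
```

Nothing runs until the user approves, so a bad auto-summary can never silently eat the story.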
08 Dual Model — characters talking to each other
Set up two characters — each with their own model, system prompt, and temperature — and let them run an automated conversation. You write the opening line. They do the rest until you stop them.
Each character brings its full setup. Model A responds; Model B receives that response as its prompt and replies; they alternate. You control the pace: pause at any moment, resume, add a turn limit, or stop and redirect. Manual intervention — editing or regenerating any message — auto-pauses the exchange so you don't lose your place.
Where it shines: generating a scene between two characters where you want both voices to be reactive to each other, not scripted. Two models, each playing their role, each responding to what the other actually said. The dialogue stays organic.
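The alternation loop itself is easy to sketch. `model_a` and `model_b` are hypothetical callables standing in for two fully configured model sessions, each wrapping its own model, system prompt, and temperature:

```python
def dual_model_run(model_a, model_b, opening_line, max_turns=4):
    # You write the opening line; the two sides then alternate,
    # each receiving the other's last line as its prompt.
    transcript = [("user", opening_line)]
    models = [("A", model_a), ("B", model_b)]
    last = opening_line
    for turn in range(max_turns):
        name, model = models[turn % 2]
        last = model(last)  # the previous line becomes the next prompt
        transcript.append((name, last))
    return transcript
```

The `max_turns` cap plays the role of the turn limit; a pause button would simply break out of the loop and hold `last` for resumption.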
09 Model freedom
Your character config stores the model. Switch mid-story by editing the config — the conversation history carries over unchanged. Venice today, a local Ollama model when you're offline. One writing environment, all your models.
Venice.ai
Ollama (local)
KoboldCpp
llama.cpp
LM Studio
Any OpenAI-compatible endpoint
BYOK — paste your API keys in settings. No accounts, no Lagoon subscription. Lagoon is a local application, not a service.
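A hedged sketch of what BYOK endpoint configs might look like. The schema is an assumption, and the base URLs follow each provider's commonly documented OpenAI-compatible paths (verify against your own setup):

```python
# Illustrative endpoint table — not Lagoon's actual config format.
ENDPOINTS = {
    "venice": {"base_url": "https://api.venice.ai/api/v1", "api_key": "YOUR_VENICE_KEY"},
    "ollama": {"base_url": "http://localhost:11434/v1", "api_key": "ollama"},
    "koboldcpp": {"base_url": "http://localhost:5001/v1", "api_key": "not-needed"},
}
```

Any client that speaks the OpenAI chat format can point its `base_url` at whichever entry is active, which is why switching models mid-story leaves the conversation history untouched.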
Venice E2EE: For Venice's TEE models, Lagoon sets up a full end-to-end encrypted session — your messages are encrypted before they leave your machine and decrypted only inside the secure enclave. Venice cannot see the content.
10 Getting started
Requirements: Python 3.10+. No Node.js, no Docker, no Electron. Runs entirely on your machine.
Windows
1. Download LagoonSetup.exe from the releases page and run it.
2. A console window will open and install Python dependencies. This takes a few minutes — don't close it.
3. Click the Start Menu shortcut — Lagoon starts in the background and opens in your browser automatically.
4. Enter your Venice API key in Settings and create your first character.
Mac / Linux
1. Clone the repo.
2. Run python setup.py — creates a virtual environment, installs dependencies, walks you through API key setup.
3. Run: python app.py
4. Open https://localhost:5007 — create your first character and start writing.
First character setup: Start with a system prompt that describes who they are in their own voice, not what they should do. Add a world context block for setting details. Drop an Author's Note with your current tone requirements. You're writing within two messages.
LAGOON v1.3 · LOCAL-FIRST · BYOK · NO CLOUD
Built by a solo contractor who writes fiction and got tired of the context wall.
Every feature here exists because it was needed, not because it was impressive.