Every automated Codex run starts at zero
If you're using Codex or similar coding agent CLIs for automated tasks (nightly fixes, scheduled refactors, CI-triggered changes), you're re-priming the model with codebase context on every single run.
Architecture decisions. Non-obvious constraints. Prior attempt history. The reason a particular approach was tried and abandoned. None of it carries over.
This is different from the interactive assistant problem. When you're typing in Cursor or Claude.dev, you're there to re-explain things. When a Codex agent runs at 2 AM, nobody's there to fill in the gaps.
Why CLI coding agents are a different problem
Interactive coding assistants (Copilot, Cursor, Windsurf) have you present to course-correct. You notice when the assistant misses context and you supply it.
Automated CLI agents run unattended. A Codex job triggered by a CI event or a cron schedule has to work from whatever context is in the prompt. If that context is stale or thin, the agent makes suboptimal decisions with no one watching.
The typical workaround is a giant AGENTS.md or CODEX.md file that gets prepended to every run. That helps, but it's static. It doesn't accumulate knowledge from prior runs. It doesn't record why a particular approach failed last Tuesday. It's a prompt template, not memory.
What you actually need to carry forward
The high-value items for automated coding runs:
What was tried and didn't work. If a Codex agent attempted a particular fix and it broke tests, that failure should be stored. Next run, the agent shouldn't attempt the same broken path.
Codebase conventions the agent discovered. The first time an agent learns that all external API calls go through a specific wrapper, that's worth keeping. It shouldn't have to rediscover it from reading files on every run.
Checkpoint state for multi-step tasks. A refactor job that spans multiple runs needs to know where it left off. Storing phase completion state lets runs be resumable.
Task-specific outcomes. Which services got updated, which were skipped and why, what the diff looked like. Useful context for subsequent runs on the same codebase.
```
# Example memories worth storing after a Codex run

- Attempted to migrate from axios to native fetch in payments-service.
  Blocked: payments-service uses axios interceptors for auth token injection.
  Wrapper at /lib/http/client.ts is the right target, not individual call sites.

- billing-worker uses a custom test harness in /test/billing-harness.ts.
  Standard jest mocks don't work for its queue integration tests.
```
Implementation pattern
The basic pattern: before each Codex run, pull relevant memories and inject them. After each run, store what was learned.
```bash
#!/bin/bash
# codex-with-memory.sh

TASK="$1"

# Pull relevant context from memory store
CONTEXT=$(mc recall --query "$TASK" --limit 8 --format text)

# Build augmented prompt
PROMPT=$(cat <<EOF
Project context from prior runs:
$CONTEXT
---
Task: $TASK
EOF
)

# Run Codex with augmented prompt
codex exec --full-auto "$PROMPT"

# After run completes, store outcome
# (your post-run hook goes here)
```
For the post-run hook, you have a few options depending on your setup. Simplest is a fixed-format output block that your wrapper script parses and saves to memory. More sophisticated is a second lightweight LLM call that extracts lessons learned from the agent's output.
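The fixed-format option can be sketched in a few lines of shell. The `### LESSONS` / `### END LESSONS` markers and the file paths here are conventions invented for this sketch (you would instruct the agent to emit them in its final message), not anything Codex produces on its own:

```bash
# Pull a fixed-format "lessons" block out of an agent transcript.
# Any unambiguous delimiter pair works; these markers are arbitrary.
extract_lessons() {
  # Print lines between the markers, then drop the marker lines themselves.
  sed -n '/^### LESSONS$/,/^### END LESSONS$/p' "$1" | sed '1d;$d'
}

# Demo against a sample transcript:
cat > /tmp/sample_output.md <<'EOF'
Refactor complete. All tests pass.
### LESSONS
- jest mocks fail for billing-worker queue tests; use /test/billing-harness.ts
### END LESSONS
EOF

LESSONS=$(extract_lessons /tmp/sample_output.md)
echo "$LESSONS"

# Hand the extracted block to your memory store, e.g.:
# mc remember --store episodic --category event --content "$LESSONS" ...
```

The marker convention is crude but robust: it survives model changes, needs no second LLM call, and fails loudly (empty output) when the agent forgets to emit the block.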
Using Memory Crystal with CLI agents
Memory Crystal's CLI and HTTP API both work well for this pattern. The workflow is:
```bash
# At run start: inject project memory
mc recall \
  --query "payment service refactor context" \
  --tags "payments-service" \
  --limit 6 \
  --format markdown > /tmp/mc_context.md

# Compose prompt and run
codex exec --full-auto "$(cat /tmp/mc_context.md task.md)"

# After run: save what was learned
mc remember \
  --store episodic \
  --category event \
  --title "Payments refactor run $(date +%Y-%m-%d)" \
  --content "$(cat /tmp/codex_output.md)" \
  --tags "payments-service,refactor"
```
If you're running Codex via the Node SDK instead of CLI, the same pattern works with the JavaScript client:
```typescript
import MemoryCrystal from "@memory-crystal/sdk";
import { execFileSync } from "child_process";

const mc = new MemoryCrystal({ apiKey: process.env.MC_API_KEY });

async function runWithMemory(task: string, tags: string[]) {
  // Pull context
  const memories = await mc.recall({ query: task, tags, limit: 8 });
  const contextBlock = memories.map(m => `- ${m.title}: ${m.content}`).join("\n");

  // Build augmented task
  const augmentedTask = `Prior context:\n${contextBlock}\n\nTask: ${task}`;

  // Run agent; execFileSync avoids shell-quoting issues with multi-line prompts
  const result = execFileSync("codex", ["exec", "--full-auto", augmentedTask]).toString();

  // Store outcome
  await mc.remember({
    store: "episodic",
    category: "event",
    title: `${task} — ${new Date().toISOString().split("T")[0]}`,
    content: result,
    tags,
  });

  return result;
}
```
What a static AGENTS.md can't do
The static context file approach hits a ceiling once your automated agents are running regularly. It can't:
- Record why something was tried and rejected
- Track which parts of the codebase are "dangerous" based on prior failures
- Maintain checkpoint state across a multi-session task
- Surface only relevant context for the current task (instead of dumping everything)
Dynamic memory retrieval gives you targeted injection. The agent working on payments-service gets memories tagged to payments-service, not the full codebase history.
Failure modes to avoid
Storing raw agent output verbatim. Agent traces are verbose and noisy. Extract the meaningful signal (the decisions, the failures, the discoveries) before storing.
No TTL on volatile observations. "This endpoint is down" is not a durable fact. Use short TTLs for ephemeral state and explicit supersession when a prior observation gets invalidated.
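One lightweight way to handle expiry without any special memory-store support: stamp the observation with an explicit staleness deadline at write time, and filter or garbage-collect on the retrieval side. The `expires:` tag convention below is an assumption of this sketch, not a Memory Crystal feature; the `mc remember` flags mirror the ones used earlier.

```bash
# Compute a 24-hour staleness deadline (GNU date first, BSD/macOS fallback).
EXPIRES=$(date -u -d "+24 hours" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
  || date -u -v+24H +%Y-%m-%dT%H:%M:%SZ)
echo "treat as stale after: $EXPIRES"

# Store the volatile observation with the deadline encoded in a tag,
# so a retrieval filter or cleanup cron can drop it once it expires.
if command -v mc >/dev/null 2>&1; then
  mc remember \
    --store episodic \
    --category event \
    --title "staging payments endpoint returning 503" \
    --content "Observed $(date -u +%F). Treat as stale after $EXPIRES." \
    --tags "payments-service,ephemeral,expires:$EXPIRES"
fi
```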
Re-summarizing the same codebase facts every run. If a convention got stored after run 1, don't store it again on run 7. Deduplication matters for retrieval quality.
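A minimal dedup gate, assuming nothing beyond a local file: log the title of every stored memory and skip exact repeats before writing. A real setup might instead `mc recall` the candidate and compare similarity; this sketch only shows the gate itself.

```bash
# Track titles already stored; skip exact repeats.
SEEN_LOG=$(mktemp)

remember_once() {
  title="$1"
  if grep -Fxq "$title" "$SEEN_LOG"; then
    echo "skip: $title"
    return 0
  fi
  echo "$title" >> "$SEEN_LOG"
  echo "store: $title"
  # mc remember --store episodic --category event --title "$title" ...
}

remember_once "external API calls go through /lib/http/client.ts"   # run 1: stored
remember_once "external API calls go through /lib/http/client.ts"   # run 7: skipped
```

Exact-title matching is deliberately dumb; it catches the common case (the agent rediscovering the same convention every night) without another model call.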
The pattern in short
- Before each run: recall relevant memories, inject as context prefix
- During the run: let the agent work
- After the run: extract lessons, failures, and discoveries, store with appropriate tags and TTL
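Wired into a scheduler, the whole loop is one crontab line around the wrapper script. The install path and log location below are illustrative:

```bash
# crontab entry: unattended nightly run at 2 AM, using the
# codex-with-memory.sh wrapper shown earlier (path is illustrative)
0 2 * * * /opt/agents/codex-with-memory.sh "fix failing nightly tests" >> /var/log/codex-nightly.log 2>&1
```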
An automated Codex agent with this pattern behaves more like a developer who has done this work before. It knows what didn't work. It knows the local conventions. It knows where to be careful.
That's a meaningfully different agent than one that starts blank every time.