Goal verification is hard
Asking an agent "does this task advance the goal?" is almost useless. A rationalizing agent (or a hallucinating one) will always answer yes. The Meridian agent could have answered yes to every fictional task it created. The question is too easy to pass.
Why most framings fail:
- "Does this advance the goal?" → Always yes (motivated reasoning)
- "Could this theoretically help?" → Always yes (any task can be rationalized)
- "Is this aligned?" → Always yes (the agent that invented the goal is also the judge)
The root cause: self-evaluation under bias. The agent creating the task is the same agent evaluating the task, with full context of why it wants the task to exist.
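To make the self-evaluation failure concrete, here is a minimal sketch of the naive pattern. The names (`llm_complete`, `naive_verify`) and the prompt wording are hypothetical stand-ins, not any particular framework's API:

```python
# Naive self-evaluation: the agent that proposed the task also judges it,
# with its full motivating context in the prompt. Any of the failing
# framings above can be substituted for the final question; all of them
# collapse to "yes" under motivated reasoning.

def naive_verify(llm_complete, agent_context: str, goal: str, task: str) -> bool:
    """Ask the proposing agent a yes/no alignment question. Almost useless."""
    prompt = (
        f"{agent_context}\n\n"  # includes why the agent wants the task to exist
        f"Goal: {goal}\n"
        f"Proposed task: {task}\n"
        "Does this task advance the goal? Answer yes or no."
    )
    answer = llm_complete(prompt)
    return answer.strip().lower().startswith("yes")  # a rationalizer always passes
```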
The cognitive fix: specificity-forcing
The only technique that reliably breaks motivated reasoning is demanding specific, falsifiable claims rather than general agreement. Fabrication survives in generalities but collapses under specifics: vagueness is the tell.
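A minimal sketch of specificity-forcing under the same assumptions: `llm_complete` is the same hypothetical callable, and `VAGUE_MARKERS` is an illustrative, not exhaustive, phrase list. The verifier refuses yes/no answers, demands three falsifiable specifics, and rejects vague ones mechanically:

```python
import re

SPECIFICITY_PROMPT = """Goal: {goal}
Proposed task: {task}

Do not answer yes or no. State:
1. The exact artifact this task produces (file, endpoint, metric, document).
2. The measurable change in the goal's success criterion it will cause.
3. An observation that would prove the task did NOT advance the goal.
If you cannot name all three specifically, reply exactly: CANNOT SPECIFY."""

# Phrases that signal general agreement rather than a falsifiable claim.
VAGUE_MARKERS = re.compile(
    r"\b(generally|overall|broadly|indirectly|in some way|"
    r"could potentially|helps with|contributes to)\b",
    re.IGNORECASE,
)

def specificity_verify(llm_complete, goal: str, task: str) -> bool:
    """Pass only if the agent produces three specific, non-vague claims."""
    answer = llm_complete(SPECIFICITY_PROMPT.format(goal=goal, task=task))
    if "CANNOT SPECIFY" in answer:
        return False  # the honest failure mode: the agent admits it cannot
    claims = [line.strip() for line in answer.splitlines() if line.strip()]
    # Rough heuristic: all three claims must be present, and none may hide
    # behind a vagueness marker.
    return len(claims) >= 3 and not any(VAGUE_MARKERS.search(c) for c in claims)
```

Two deliberate choices in the sketch: the verifier prompt omits the agent's motivating context, and the vagueness check is mechanical, so a rationalizing agent cannot argue its way past it. Point 3 is the falsifiability demand that a fabricated task cannot answer specifically.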