The Recursion That Actually Works
Most people think of AI as a tool you give instructions to. I built a system where the AI writes its own instructions. Not in a theoretical research lab sense. In a production business operations sense. The skill-creator skill is an AI agent whose sole job is to observe what works in real sessions, extract the patterns, and codify them into new skills that other agents can use.
A skill, in my system, is a structured set of instructions that tells an AI agent how to perform a specific task. It includes the trigger conditions, the step-by-step procedure, the quality gates, the error handling, and the expected outputs. Writing a good skill takes deep domain knowledge and careful iteration. It used to take me hours per skill. Now the AI writes them in minutes, and the quality is often better than what I produce manually.
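The components listed above can be sketched as a small data structure. This is a minimal illustration, not the author's actual schema; the field names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a skill's components: triggers, procedure,
# quality gates, error handling, and expected outputs.
@dataclass
class Skill:
    name: str
    triggers: list[str]                # conditions that activate the skill
    steps: list[str]                   # step-by-step procedure
    quality_gates: list[str]           # checks an output must pass
    on_error: dict[str, str] = field(default_factory=dict)   # error -> recovery
    expected_outputs: list[str] = field(default_factory=list)

publish = Skill(
    name="wordpress-publish",
    triggers=["publish post", "push to wordpress"],
    steps=["draft excerpt", "assign category", "run SEO pass", "publish"],
    quality_gates=["excerpt under 160 chars", "category assigned"],
)
```

Keeping every component explicit in one record is what lets a later agent reason about the skill instead of re-reading free-form notes.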
How Skill Self-Creation Works
The process starts with observation. During every working session, the AI tracks which actions it takes, which tools it uses, which decisions require my input, and which outcomes are successful. This creates a session log — a structured record of the entire workflow from start to finish.
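A session log of this kind can be as simple as an append-only list of timestamped events. The field names below are illustrative assumptions, not the system's real format.

```python
import json
import time

# Minimal sketch of a session log: an append-only record of actions,
# tool calls, decisions, and outcomes, in order.
def log_event(log: list, kind: str, detail: str, success: bool = True) -> None:
    log.append({"ts": time.time(), "kind": kind, "detail": detail, "success": success})

session_log = []
log_event(session_log, "tool", "wordpress.create_draft")
log_event(session_log, "decision", "chose FAQ schema over HowTo")
log_event(session_log, "outcome", "post published")

print(json.dumps(session_log[-1]["detail"]))
```

The structure matters more than the storage: as long as every action, decision, and outcome is recorded in sequence, a downstream agent can replay and analyze the workflow.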
After the session, the skill-creator agent analyzes the log. It identifies repeatable patterns: sequences of actions that were performed multiple times with consistent success. It extracts the decision logic: the conditions under which the AI chose one path over another. And it captures the quality gates: the checks that determined whether an output was acceptable.
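One simple way to surface repeatable patterns, offered here as an assumption rather than the author's actual algorithm, is to count recurring action sequences (n-grams) in the log and keep those that repeat:

```python
from collections import Counter

# Find consecutive action sequences of length n that occur at least
# min_count times -- candidates for codification into a skill step.
def repeated_sequences(actions: list[str], n: int = 2, min_count: int = 2) -> list[tuple]:
    grams = Counter(tuple(actions[i:i + n]) for i in range(len(actions) - n + 1))
    return [gram for gram, count in grams.items() if count >= min_count]

actions = ["fetch", "edit", "seo_pass", "publish",
           "fetch", "edit", "seo_pass", "publish"]
print(repeated_sequences(actions))
# -> [('fetch', 'edit'), ('edit', 'seo_pass'), ('seo_pass', 'publish')]
```

A real extractor would also condition on the success flag and capture the decision logic between steps, but the core idea is the same: repetition plus consistent success signals a codifiable pattern.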
From this analysis, the agent drafts a new skill. The skill follows a standardized format — YAML frontmatter with metadata, followed by markdown instructions with step-by-step procedures. The agent writes the description that determines when the skill triggers, the instructions that determine how it executes, and the validation criteria that determine whether it succeeded.
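The described format, YAML frontmatter followed by markdown procedures, can be rendered with a few lines of Python. The keys and section names here are illustrative guesses at the schema, not the real one.

```python
# Render a skill file: YAML frontmatter (metadata) followed by
# markdown sections for the procedure and validation criteria.
def render_skill(name: str, description: str, steps: list[str], checks: list[str]) -> str:
    front = f"---\nname: {name}\ndescription: {description}\n---\n"
    body = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    gates = "\n".join(f"- [ ] {c}" for c in checks)
    return f"{front}\n## Procedure\n{body}\n\n## Validation\n{gates}\n"

doc = render_skill(
    "wordpress-publish",
    "Publish an optimized post to WordPress",
    ["Check excerpt length", "Verify category", "Run SEO pass", "Publish"],
    ["excerpt <= 160 chars", "schema present"],
)
print(doc)
```

The description field in the frontmatter is the part that determines triggering; the markdown body determines execution.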
The Quality Problem and How We Solved It
Early versions of skill self-creation produced mediocre skills. They captured the surface-level actions but missed the contextual judgment that made the workflow actually work. The agent would write a skill that said “publish to WordPress” but miss the nuance of checking excerpt length, verifying category assignment, or running the SEO optimization pass before publishing.
The fix was adding a refinement loop. After the agent drafts a skill, it runs a simulated execution against a test case. If the simulated execution misses steps that the original session included, the agent revises the skill. This loop runs until the simulated execution matches the original session’s quality within a defined tolerance.
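The refinement loop can be sketched as follows. `refine` is a stand-in for the agent's real revise-and-resimulate behavior; here "simulation" is reduced to checking which session steps the draft covers, and "revision" to inserting a missed step in order.

```python
# Revise a draft skill until its (simulated) coverage of the original
# session's steps meets the tolerance, or the round budget runs out.
def refine(skill_steps: list[str], session_steps: list[str],
           tolerance: float = 0.95, max_rounds: int = 10) -> list[str]:
    steps = list(skill_steps)
    for _ in range(max_rounds):
        covered = [s for s in session_steps if s in steps]
        if len(covered) / len(session_steps) >= tolerance:
            break
        # Revision: add the first step the simulated run missed,
        # in the position it held in the original session.
        missing = next(s for s in session_steps if s not in steps)
        steps.insert(session_steps.index(missing), missing)
    return steps

session = ["check excerpt", "verify category", "seo pass", "publish"]
draft = ["publish"]
print(refine(draft, session))
# -> ['check excerpt', 'verify category', 'seo pass', 'publish']
```

The surface-level draft ("publish") converges to the full workflow, which is exactly the failure mode the loop was added to fix.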
The second fix was adding a description optimization pass. A skill is useless if it never triggers. The agent now analyzes the trigger conditions — the keywords, phrases, and contexts that should activate the skill — and optimizes the description for maximum recall while minimizing false positives. This is essentially SEO for AI skills.
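The recall-versus-false-positive tradeoff can be scored directly. The keyword matching below is a deliberate simplification of whatever matching the real skill router uses; the prompts and keywords are made up for illustration.

```python
# Score a set of trigger keywords: reward firing on prompts that should
# activate the skill, penalize firing on prompts that should not.
def trigger_score(keywords: set[str], should_fire: list[str], should_not: list[str]) -> float:
    fires = lambda text: any(k in text.lower() for k in keywords)
    recall = sum(fires(t) for t in should_fire) / len(should_fire)
    false_pos = sum(fires(t) for t in should_not) / len(should_not)
    return recall - false_pos

positives = ["publish this post to wordpress", "push the draft live"]
negatives = ["summarize this post", "draft an email"]
print(trigger_score({"publish", "push", "live"}, positives, negatives))
# -> 1.0
```

An optimization pass would try candidate keyword sets against a corpus of real session prompts and keep the highest-scoring description.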
Skills That Write Better Skills
The most recursive part of the system is that the skill-creator skill itself was partially written by an earlier version of itself. I wrote the first version manually. That version observed me creating skills by hand, extracted the patterns, and produced a second version that was more comprehensive. The second version then refined itself into the third version, which is what runs in production today.
Each generation captures more nuance. The first version knew to include trigger conditions. The second version learned to include negative triggers — conditions that should explicitly not activate the skill. The third version added variance analysis — testing whether a skill performs consistently across different invocation contexts or only works in the specific scenario where it was created.
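Variance analysis of the kind described can be approximated by scoring a skill's quality checks across several invocation contexts and flagging it when the scores spread too widely. The threshold and scores below are illustrative assumptions.

```python
from statistics import pstdev

# Flag a skill as inconsistent if its quality scores vary too much
# across invocation contexts.
def is_consistent(scores_by_context: dict[str, float], max_stdev: float = 0.1) -> bool:
    return pstdev(scores_by_context.values()) <= max_stdev

robust = {"blog": 0.92, "landing page": 0.90, "product page": 0.91}
brittle = {"blog": 0.95, "landing page": 0.40, "product page": 0.55}
print(is_consistent(robust), is_consistent(brittle))
# -> True False
```

A skill that only scores well in the scenario where it was created would fail this check and go back through the refinement loop.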
This is not artificial general intelligence. It is not sentient. It is a well-designed feedback loop that improves operational documentation through structured iteration. But the output is remarkable: a library of over 80 production skills, many of which were created or significantly refined by the system itself.
What This Means for Business Operations
The traditional way to scale operations is to hire people, train them, and hope they follow the procedures consistently. The skill self-creation model inverts this. The AI observes the best version of a procedure, codifies it perfectly, and then executes it identically every time. No training decay. No interpretation drift. No Monday morning inconsistency.
When I discover a better way to optimize a WordPress post — a new schema type, a better FAQ structure, a more effective interlink pattern — I do it once in a live session. The skill-creator agent watches, extracts the improvement, and updates the relevant skill. From that moment forward, every post optimization across every site includes the improvement. One session, permanent upgrade, portfolio-wide deployment.
The Limits of Self-Creation
The system cannot create skills for tasks it has never observed. It cannot invent new optimization techniques or discover new strategies. It can only codify and refine what it has seen work in practice. The creative direction, the strategic decisions, the judgment calls — those still come from me.
It also cannot evaluate business impact. It knows whether a skill executed correctly, but it does not know whether the output moved a meaningful metric. That evaluation layer requires human judgment and time — traffic data, conversion data, client feedback. The system optimizes execution quality, not business outcomes. The gap between those two things is where human expertise remains irreplaceable.
FAQ
How many skills has the system created autonomously?
Approximately 30 skills were created entirely by the skill-creator agent. Another 50 were human-created but significantly refined by the agent through the optimization loop.
Can the system create skills for any domain?
It can create skills for any domain where it has observed successful sessions. The more sessions it observes in a domain, the better the skills it produces.
What prevents the system from creating bad skills?
The simulated execution loop catches most quality issues. Skills that fail simulation are flagged for human review rather than deployed to production.