Commercial Photography and Requirement Translation

Hi, I am Kent. I work mainly in commercial advertising photography in Taiwan, and the company I founded is now entering its 10th year. Over the years, I have built my own way of understanding photography, image processing, and maintaining client relationships.

The most natural extension of my AI learning would have been to bring generative image tools deeper into commercial photography. But the friction and pain points I ran into while using early LLMs pulled my attention down a different path.

LLM Problems Beyond Image Generation

I started getting into AI around the GPT-3 to GPT-4 period. Since mainstream AI was not yet centered on multimodal models, I also tried image-generation models that were popular at the time, such as Stable Diffusion, and built image workflows with ComfyUI.

But I did not stay focused only on image-generation applications. Early LLMs often forgot or lost long context, became too agreeable in conversation, hallucinated, and treated every new chat as a fresh context, which made it hard to accumulate progress and keep solving problems in depth. Those symptoms bothered me, so I started exploring ways to work around them myself.

Structural Symptoms of Early LLMs

The recurring issues included long-context forgetting and Lost in the Middle, being too willing to agree with the user, hallucinations around unverified information, and the feeling that every new chat started from zero instead of becoming a stable working state.

What I wanted was for the model to maintain a stable reasoning policy / task posture: not becoming overly agreeable, not rushing to finish, not pretending to be certain, and preserving reflection and state updates before and after tool use.

Structured Text and Context Replay

My earliest approach was almost like a natural-language notebook: write down the important state, rules, and judgments from the previous conversation, then inject them into a new chat to reproduce the prior context.
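A minimal sketch of this kind of context replay, assuming the notes are kept as three plain lists. The names `SessionNotes` and `build_replay_preamble` and the section titles are hypothetical and used only for illustration; they are not the original notebook format.

```python
from dataclasses import dataclass, field


@dataclass
class SessionNotes:
    """Plain-text notes carried from one chat into the next (hypothetical structure)."""
    state: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)
    judgments: list[str] = field(default_factory=list)


def build_replay_preamble(notes: SessionNotes) -> str:
    """Render the notes as a text block to paste at the top of a new chat."""
    sections = [
        ("Current state", notes.state),
        ("Rules to keep following", notes.rules),
        ("Judgments already made", notes.judgments),
    ]
    lines = ["Context carried over from the previous conversation:"]
    for title, items in sections:
        if not items:
            continue  # skip empty sections so the preamble stays short
        lines.append("")
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

The output is ordinary text, so it can be pasted into any chat interface without tooling.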

As the amount of content grew, I gradually moved toward Markdown and more structured natural language. Looking back, this path feels quite close to today’s Skill, AGENTS.md, and CLAUDE.md style of working.

Content Annotation Rules and XML Hybrid Format

As the re-injected context kept growing, I found that Markdown alone was reaching its limits. So I started building my own systematic content annotation rules to strengthen the structure of the text.

Later, I added systematic XML tags and combined them with Markdown into a hybrid format. I once tried injecting a 128K personal behavior-rule text in one pass, and even in very long conversations that came close to filling the context window, the model's consistency and stability were still reasonably good.
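To make the idea of a hybrid format concrete, here is a small sketch that wraps Markdown fragments in XML-style tags. The tag name `rule`, the `priority` attribute, and the helper `wrap_section` are assumptions made for this example; the actual annotation rules are not described in this text.

```python
from xml.sax.saxutils import quoteattr


def wrap_section(tag: str, markdown_body: str, **attrs: str) -> str:
    """Wrap a Markdown fragment in one XML-style tag with optional attributes.

    The body is left unescaped on purpose: the result is meant to be read
    by a language model as structured text, not parsed as strict XML.
    """
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<{tag}{attr_text}>\n{markdown_body.strip()}\n</{tag}>"


def build_document(sections: list[str]) -> str:
    """Join wrapped sections with blank lines between them."""
    return "\n\n".join(sections)
```

For example, `wrap_section("rule", "- Verify before agreeing", priority="high")` produces a `<rule priority="high">` block whose content is still plain Markdown. The tags mark boundaries and roles, while Markdown carries the readable detail inside each block.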

Memory Metabolism and Multi-Agent CLI Collaboration

After Agent CLI tools started expanding quickly in 2026, I began using VS Code with CLI agents for small coding experiments, and also started building a few agent-related tools of my own.

One example is IPL, an agent memory and learning system that does not rely on RAG, but instead uses text and memory distillation. Another is a file-based multi-agent CLI interaction system based on the idea that "Everything is a file", allowing agents such as Claude Code and Codex to communicate and collaborate asynchronously.
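The file-based collaboration idea can be sketched as a per-agent inbox directory, where sending a message means writing a file and reading means listing the directory. This is only an illustration under stated assumptions: the directory layout (`inbox`, `archive`), the JSON fields, and the function names are hypothetical and not the actual system.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def send_message(root: Path, sender: str, recipient: str, body: str) -> Path:
    """Drop one message file into the recipient's inbox directory."""
    inbox = root / recipient / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    payload = {"from": sender, "to": recipient, "body": body, "sent_at": time.time()}
    final_path = inbox / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a half-written message.
    fd, tmp_name = tempfile.mkstemp(dir=inbox, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_name, final_path)
    return final_path


def read_inbox(root: Path, recipient: str) -> list[dict]:
    """Return pending messages in filename order and move them to an archive."""
    inbox = root / recipient / "inbox"
    if not inbox.is_dir():
        return []
    archive = root / recipient / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    messages = []
    for path in sorted(inbox.glob("*.json")):
        messages.append(json.loads(path.read_text(encoding="utf-8")))
        os.replace(path, archive / path.name)  # mark as handled
    return messages
```

Because each agent only reads and writes files, the two sides do not need to run at the same time, which is what makes the exchange asynchronous.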

[ IPL ] In Project Learning

IPL extends in-context learning from a single conversation to the whole project. It does not train model weights; it uses text, distilled memory, and guardrails to carry learning forward inside a project.

Approaching an Idempotent State Machine

What I find more worth mentioning is that the agent memory metabolism and collaboration described earlier grew out of the systematic content annotation rules I had built during the chatbot period. Later, I tried applying that method to agents. The goal was not to hard-control agents externally with hooks or harnesses, but to rely on natural language, structured text, and process constraints.

Even across complex multi-turn tool use, including moments where the agent has to make its own judgments, I still want it to follow the same set of rules and produce consistent, reproducible results. In simple terms, this is an attempt to bring the agent closer to an idempotent state machine. It is not formal research, and there is still a lot of room for improvement. It is more like a long-term personal practice built out of interest.
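For readers unfamiliar with the term, the sketch below shows what idempotent means for a state machine: applying the same event twice leaves the state exactly where it was after the first application. It is ordinary code, not the natural-language approach described above, and the phases, actions, and names (`TaskState`, `apply_event`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskState:
    phase: str = "planned"
    applied_event_ids: frozenset[str] = frozenset()


# Allowed transitions: (current phase, action) -> next phase (hypothetical phases).
TRANSITIONS = {
    ("planned", "start"): "in_progress",
    ("in_progress", "verify"): "verified",
    ("verified", "finish"): "done",
}


def apply_event(state: TaskState, event_id: str, action: str) -> TaskState:
    """Apply one event; replayed or disallowed events leave the state unchanged."""
    if event_id in state.applied_event_ids:
        return state  # same event seen again: no effect
    next_phase = TRANSITIONS.get((state.phase, action))
    if next_phase is None:
        return state  # action not allowed from this phase
    return TaskState(next_phase, state.applied_event_ids | {event_id})
```

In this toy version, a task cannot jump from "in_progress" to "done" without passing through "verified", and replaying an event never advances the task a second time.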
