Building your AI agent "toolkit" is becoming the equivalent of the perfect "productivity" setup where you spend your time reading blog posts, watching YouTube videos telling you how to be productive and creating habits and rituals...only to be overtaken by a person with a simple paper list of tasks that they work through.
Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Lots of money being made by luring people into this trap.
The reality is that if you actually know what you want, and can communicate it well (where the productivity app can be helpful), then you can do a lot with AI.
My experience is that most people don't actually know what they want. Or they don't understand what goes into what they want. Asking for a plan is a shortcut to gaining that understanding.
I asked Claude whether these elaborate words like "walk down the design tree" actually mean anything to the LLM and make a difference. The answer confirmed my gut feeling: You can just tell me to "be critical" and get mostly the same results.
Matt did incredible work teaching people TS, but this feels more like trying to create FOMO to sell snake oil and AI courses.
It feels to me that "walk down the design tree" has a specific meaning with respect to treating the design as a hierarchy (although whether that means BFS or DFS is still ambiguous). "Be critical" lacks that specificity.
Yes, but then it’s better to spell those instructions out explicitly, e.g. state facts, state ambiguities/assumptions, inspect the codebase, challenge assumptions, etc.
Problem is they don’t know how to express themselves and many people, especially those interested in tech, don’t want to learn.
I can’t tell you how many times I have a CS student in my office for advising and they tell me they only want to take technical courses, because anything reading or writing or psychology or history based is “soft”, unrelated to their major, and a waste of their time.
I’ve spent years telling them critical reading and expressive writing skills are very important to being a functioning adult, but they insist what they need to know can only be found in the Engineering college.
Much of my time at work is reading through quickly typed messages from my boss and understanding exactly what questions I need to ask in order to make it easy for him to answer clearly.
Engineers who lack soft skills cannot be effective in team environments.
Or, as I like to put it: I need to activate my personal transformers on my inner embedding space to figure out what it is I really want. And still, quite often, I think in terms of the programming language I'm used to and the library I'm familiar with.
So, to really create something new that I care about, LLMs don't help much.
Agree. For what it’s worth, in interviews Cherny (Claude Code creator) and Steinberger (OpenClaw creator) say they keep things simple and use none of the workflow frameworks. The latter even said he doesn’t even use plan mode, but I find that very useful: exiting plan mode starts clean with compressed context.
It's not, though, if you're working in a massive codebase or on a distributed system that has many interconnected parts.
Skills that teach the agent how to pipe data, build requests, trace them through a system and data sources, then update code based on those results are a step-function improvement in development.
AI has fundamentally changed how productive I am working on a 10M-line codebase, and I'd guess less than 5% of that is due to code gen that's intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what I'm doing.
But... plain Claude does that. At least for my codebase, which is nowhere close to your 10m line. But we do processing on lots of data (~100TB) and Claude definitely builds one-off tools and scripts to analyze it, which works pretty great in my experience.
I think people are looking at skills the wrong way. It's not like it gives it some kind of superpowers it couldn't do otherwise. Ideally you'll have Claude write the skills anyway. It's just a shortcut so you don't have to keep rewriting a prompt all over again and/or have Claude keep figuring out how to do the same thing repeatedly. You can save lots of time, tokens and manual guidance by having well thought skills.
Some people use these to "larp" some kind of different job roles etc and I don't think that's productive use of skills unless the prompts are truly exceptional.
At work I use skills to maintain code consistency. We instrumented a solid "model view viewmodel" architecture for a front-end app, because without any guard rails it was doing redundant data fetching and type casts and just messy overall. Having a "mvvm" rule and skill that defines the boundaries keeps the llm from writing a bunch of nonsense code that happens to work.
I have sometimes found "LARPing job roles" to be useful for expectations for the codebase.
Claude is kind of decent at doing "when in Rome" sort of stuff with your codebase, but it's nice to reinforce, and remind it how to deploy, what testing should be done before a PR, etc.
Even the most complex distributed systems can be understood with the context windows we have. Short of 1M+ loc, and even then you could use documentation to get a more succinct view of the whole thing.
This really doesn’t pan out in practice if you work a lot with these models
And we also know why: effective context depends on input and task complexity. Our best guess right now is that we often get between 100k and 200k of effective context length from frontier models, even ones marketed as 1M-token needle-in-a-haystack performers.
Let me give you a counterexample. I'm working on a product for the national market, and I need to do all financial tasks, invoicing, submission to the national fiscal database, etc. through a local accounting firm. So I integrate their API in the backend; this is a 100% custom API developed by this small European firm, with a few dozen RESTful endpoints supporting various accounting operations, and I need to use it programmatically to maintain sync for legal compliance. No LLM has ever heard of it. It has a few hundred KB of HTML documentation that Claude can ingest perfectly fine and generate a curl command for, but I don't want to blow my token use and context on every interaction.
So I naturally felt the need to (tell Claude to) build an MCP for this accounting API, and now I ask it to do accounting tasks, and it just does them. It's really ducking sweet.
Another thing I did: after a particularly grueling accounting month close-out, I told Claude to extract the general tasks that we accomplished and build a skill that does them at the end of the month, and now it's like having a junior accountant at my disposal; it just DOES the things a professional would charge me thousands for.
So both custom project MCPs and skills are super useful in my experience.
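The pattern behind such an MCP wrapper can be sketched in plain Python. Everything concrete here is invented for illustration (the base URL, the endpoint name, the bearer-token auth scheme); the point is that each documented endpoint gets wrapped once, so the agent never needs the HTML docs in context again:

```python
import json
import urllib.request

# Placeholder base URL; the real firm's API is private and undocumented publicly.
BASE = "https://api.example-accounting.test/v1"

def build_request(endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but don't send) an authenticated POST to the accounting API.

    An MCP tool would call urllib.request.urlopen() on the result and
    return the parsed response to the agent.
    """
    return urllib.request.Request(
        f"{BASE}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With one such helper per endpoint, the MCP tool descriptions carry the semantics and the docs never have to be re-read.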
this is exactly how i use it too. i have a few custom MCP servers running on a mac mini homelab, one for permission management, one for infra gateway stuff. the key thing i learned is keeping CLAUDE.md updated with what each MCP server actually does and what inputs it expects. otherwise claude code will either not use the tool when it should, or call it with wrong params and waste a bunch of back and forth. once you document it properly it really does feel like having a team member who just knows how your stack works. the accounting use case is a great example because nobody else's generic tooling would ever cover that.
That's what you should be doing. Start from plain Claude, then add on to it for your specific use cases where needed. Skills are fantastic if used this way. The problem is people adding hundreds or thousands of skills that they download and will never use, but just bloat the entire system and drown out a useful system.
Your use is maybe more vanilla than you think. I think you are just getting shit done. Which is good.
Claude and an mcp and skill is plain to me. Writing your own agent connecting to LLMs to try to be better than Claude code, using Ralph loops and so on is the rabbit hole.
The basic problem is that the reporting and accounting rules are double plus bureaucratic and you need to have on hand multiple registers that show the financial situation at any time, submit them to the tax authority etc.
To give you a small taste: you need to issue an electronic invoice for each unique customer and submit it on the fly to the tax authority - but these need to be correlated monthly with the money in your business bank account. The paid invoices don't just go into your bank account; they are disbursed from time to time by the payment processor, on random dates that don't sync with the accounting month, so at the end of the month you have to correlate precisely which invoices are paid and which are not. But wait, the card processor won't just send you the money in a lump sum. It will deduct from each payment some random fee determined by their internal formula, then, at the end of each month, add up all those deducted fees (even for payments that have not yet been paid out to you) and issue another invoice to you, which you need to account for in your books as being partially paid each month (from the fees deducted from payments already disbursed). You also have other payment channels, each with their own fees, etc. So I need to balance this whole overlapping-intervals mess with all sorts of edge cases, chargebacks, and manual interventions I refuse to think about again.
This is one example, but there are also issues with wages and their taxation, random tax law changes in the middle of the month, etc. The accountant can of course solve all this for you, but once you get to a few hundred invoices per month (if you sell relatively cheap services) you are considered a "medium" business, so instead of paying less than 100€ per month for basic accounting services (having the certified accountant look over your books and sign them, as required by law), you will need more expensive packages, which definitely add up to thousands in a few months.
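The core of that month-end correlation is a matching problem. A toy sketch, with the caveat that the field names, the flat 2% fee, and the tolerance are all made up (real processor fees are per-payment and opaque, which is exactly what makes this painful):

```python
from dataclasses import dataclass, field

@dataclass
class Invoice:
    number: str
    gross: float           # amount billed to the customer
    paid_out: float = 0.0  # portion already disbursed by the processor

@dataclass
class Payout:
    date: str
    items: dict = field(default_factory=dict)  # invoice number -> net amount received

def reconcile(invoices, payouts, fee_rate=0.02):
    """Mark each invoice settled once its net (gross minus fee) has arrived.

    Real reconciliation has to handle partial disbursements, chargebacks,
    and the processor's own fee invoice; this only shows the matching core.
    """
    by_number = {inv.number: inv for inv in invoices}
    for payout in payouts:
        for number, net in payout.items.items():
            by_number[number].paid_out += net
    settled, pending = [], []
    for inv in invoices:
        expected_net = inv.gross * (1 - fee_rate)
        # Small tolerance for rounding in per-payment fee deductions.
        (settled if inv.paid_out >= expected_net - 0.005 else pending).append(inv.number)
    return settled, pending
```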
This resonates with me. Sometimes I build up some artifacts within the context of a task, but these almost always get thrown away. There are primarily three reasons I prefer a vanilla setup.
1. I have many and sometimes contradictory workflows: exploration, prototyping, bug fixing/debugging, feature work, PR management, etc. When I'm prototyping, I want reward hacking; I don't care about tests or lints. It's the exact opposite when I manage PRs.
2. I see hard-to-explain and hard-to-quantify problems with over-configuration. The quality goes down, it loses track faster, it gets caught in loops. This is totally anecdotal, but I've seen it across a number of projects. My hypothesis is that it's related to attention: since these get added to the system prompt, they pull the distribution by constantly being attended to.
3. The models keep getting better. Similar to 2, sometimes model gains are canceled out by previously necessary instructions. I hear the Anthropic folks clear their claude.md every 30 days or so to alleviate this.
All I want is for my agent to save me time, and to become a _compounding_ multiplier for my output. As a PM, I mostly want to use it for demos and prototypes and ideation. And I need it to work with my fractured attention span and saturated meeting schedule, so compounding is critical.
I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.
How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/unstable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.
Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?
It's too early. People are trying all of the above. I use all of the above, specifically:
- A well-structured folder of markdown files that I constantly garden. Every sub-folder has a README. Every file has metadata in its front matter. I point new sessions at the entry point to this documentation, constantly run agents that clean up dead references and update out-of-date information, and build scripts that deterministically find broken links. It's an ongoing battle.
- A "continuation prompt" skill, that prompts the agent to collect all relevant context for another agent to continue
- Judicious usage of "memory"
- Structured systems made out of skills like GSD (Get Shit Done)
- Systems of "quality gate" hooks and test harnesses
For all of these, I have the agent set them up and manage them, but I've yet to find a context-management system that just works. I don't think we understand the "physics" of context management yet.
On your first point, one unexpected side effect I’ve noticed is that in an effort to offload my thinking to an agent, I often end up just doing the thinking myself. It’s a surprisingly effective antidote to writer’s block… a similar effect to journaling, and a good reason why people feel weird about sharing their prompts.
I’ve been thinking about this a lot. It’s obviously the ideal state of things. The challenge is that we’ve got existing docs frameworks and teams and inertia and unreleased features… and I don’t have time to wait for that when I’m trying to get something done today. Not to mention the trade off of writing in public vs. private.
One quick win I’ve thought could bridge this is updating our docs site to respond to `Accept: text/markdown` requests with the markdown version of the docs.
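That quick win reduces to content negotiation. A minimal helper, sketched under the assumption that checking for a literal `text/markdown` entry is enough (a production server would also honor q-values and wildcards per RFC 9110):

```python
def negotiate(accept_header: str, md_path: str, html_path: str) -> str:
    """Pick which representation of a docs page to serve.

    Splits the Accept header into media types (ignoring ;q= parameters)
    and serves the markdown source whenever text/markdown is requested.
    """
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return md_path if "text/markdown" in accepted else html_path
```

An agent can then fetch clean markdown with `curl -H "Accept: text/markdown"` while browsers keep getting HTML.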
> Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Working on an unspecified codebase of unknown size using unconfigured tooling with unstated goals found that less configuration worked better than more.
This is what I do; frankly I can't be arsed to take the time to write all these commands and skills and whatnot. I did use /init to get Claude to create a CLAUDE.md file, and I occasionally -- very occasionally -- go through it and correct anything that's no longer valid due to code changes (and then ask Claude to do the same).
But beyond that, I just ask it for what I want, and that's it. I'm not convinced that putting more time into building the "toolbox" will actually give me significant returns on that time.
I do think that some of this (commands, skills, breaking up CLAUDE.md into separate rules files) can be useful, but it's highly context-dependent, and I think YAGNI applies here: don't front-load this work. Only set those up if you run into specific problems or situations where you think doing this work will make Claude work better.
My init.el file went from some 300 lines to under 50 with Claude's assistance. Some of that had to do with updating Emacs, but I really only use Emacs for Org mode so that contribution was minimal.
At work I've spent some time setting up our claude.md files and curating the .claude directory with relevant tools such as Linear, Figma, Sentry, LSP, and browser testing. Sensible stuff anyone using these tools would want; it all works pretty well.
My only machine-specific config is overriding Haiku usage with Sonnet in Claude Code. I outline what I want in Linear, have Claude synthesize it into a plan, we iterate until we're both happy, then I let it rip. Works great.
Then one of my juniors goes and loads up things like "superpowers" and all sorts of stuff that's started littering his PRs. I'm just not convinced this ricing of agents materially improves anything.
Understandable - I find skills for odd duck things and a simple set of rules you routinely prune work the best for me. Went from crappy code in niche projects to it nailing things first prompt almost every time now.
This. At work I have described this phenomenon as the equivalent of tinkering with the margins and fonts in your word processor instead of just writing your paper.
I've had the same thought recently and this definitely is a thing that you can do - but there are also cases where you get dramatically better results if you put some more effort into your setup.
e.g. spend time creating a skill about how to query production logs
if you work on platforms, frameworks, tools that are public knowledge, then yeah. If there’s nothing unique to your project or how to write code in it, build it, deploy it, operate it, yeah.
But for some projects there will be things Claude doesn’t know about, or things that you repeatedly want done a specific way and don’t want to type it in every prompt.
I’m seeing this more and more, where people build this artificial wall you supposedly need to climb to try agentic coding. That’s not the right way to start at all. You should start with a fresh .claude, empty AGENTS.md, zero skills and MCP and learn to operate the thing first.
I'd also go even further and say that you likely should never install ANY skill that you didn't create yourself (guiding Claude to create it for you works too), or "fork" an existing one and pull in only what you need.
Everyone's workflow is different and nobody knows which workflow is the right one. If you turn your harness into a junk drawer of random skills that get auto updated, you introduce yet another layer of nondeterminism into it, and also blow up your context window.
The only skill you should probably install instead of maintaining it yourself is playwright-cli, but that's pretty much it.
what? non techies are most at risk. There are a huge number of malicious skills. Not knowing or caring how to spot malicious behavior doesn’t mean someone shouldn’t be concerned about it, no matter how much they can’t or don’t want to do it.
I am an administrator of this stuff at my company, and it’s an absolute effing nightmare devising policies that protect people from themselves. If I heard this come out of someone’s mouth underneath me, I’d tell them to leave the room before I have a stroke.
And this is stuff like, if so and so’s machine is compromised, it could cost the company massive sums of money. for your personal use, fine, but hearing this cavalier attitude like it doesn’t matter is horrifying, because it absolutely does in a lot of contexts.
I run a small local non-profit which is essentially a security hardening guide with some helper tooling that simplifies some concepts for non-techies (FDE, MFA, password managers, etc.).
LLMs have completely killed my motivation to continue running it. None of the standard practices apply anymore.
I had an issue with playwright MCP where only one Claude Code instance could be using it at a time, so I switched to Claude's built-in /chrome MCP.
In practice, I also find it more useful that the Chrome MCP uses my current profile since I might want Claude to look at some page I'm already logged in to.
I'm not very sophisticated here though. I mainly use the browser MCP to get around the fact that 30% of sites block agent traffic, like Apple's documentation.
Would love it if there were a way to parallelize the Playwright MCP using multiple agents and such, but it seems it's a fundamental limitation of that MCP that only one instance/tab can be controlled.
Chrome MCP is much slower and by default pretty much unusable because Claude seems to prefer to read state from screenshots. Also, no Firefox/Safari support means no cross-browser testing.
I was using the built-in chrome skill but it was too unreliable for me. So I switched to playwright cli and I can also have it use firefox to get help debugging browser-specific issues.
I use them for repeated problems or workflows I encounter when running with the defaults. If I find myself needing to repeat myself about a certain thing a lot, I put it into claude.md. When that gets too big, or I want detailed token-heavy instructions that are only occasionally needed, I create a skill.
I also import skills or groups of skills like Superpowers (https://github.com/obra/superpowers) when I want to try out someone else's approach to claude code for a while.
You observe what it does to accomplish a particular task, and note any instances where it:
1. Had to consume context and turns by reading files, searching web, running several commands for what was otherwise a straightforward task
2. Whatever tool it used wasn't designed with agent usage in mind, which most of the time means the agent has to run tail, head, or grep on the output after re-running the same command.
Then you create a skill that teaches how to do this in fewer turns, possibly even adding custom scripts it can use as part of that skill.
You almost never need a skill per se, most models will figure things out themselves eventually, skill is usually just an optimization technique.
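For concreteness, such an optimization skill is often just a short markdown file with front matter. Everything in this example is hypothetical (the skill name, the wrapper script, its flags); it only illustrates the shape:

```markdown
---
name: query-prod-logs
description: Query production logs without paging raw output through context
---

# Querying production logs

1. Run `scripts/logs.sh <service> <since>` (wraps the log backend; prints at most 50 lines).
2. If more detail is needed, pass `--grep <pattern>` instead of re-running and paging manually.
3. Never dump full log files into context; summarize and cite line ranges.
```

The bundled script does the token-heavy work deterministically; the markdown just teaches the agent when and how to reach for it.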
Apart from this, you can also use it to teach your own protocols and conventions. For example, I have skills that teach Claude, Codex, and Gemini how to communicate between themselves using tmux with some helper scripts. And then another skill that tells it to do a code review using two models from two providers, synthesize findings from both, and flag anything that both reported.
Although I have abandoned the built-in skill system completely, instead using my own tmux wrapper that injects skills via predefined triggers, that is stepping into more advanced territory. The built-in skill system will serve you well initially, and since skills are nothing but markdown files plus maybe some scripts, you can migrate them easily into whatever you want later.
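The synthesize-and-flag step of that two-model review is essentially a set intersection once each model's findings are normalized. A sketch, assuming findings have already been parsed into `{file: {issue, ...}}` dicts (the data shape is an assumption, not part of any tool's API):

```python
def cross_flag(findings_a, findings_b):
    """Combine code-review findings from two independent reviewers.

    Returns (both, union): issues reported by both models (worth flagging
    loudly, since independent agreement raises confidence) and the full
    union of everything either model found.
    """
    both, union = {}, {}
    for path in set(findings_a) | set(findings_b):
        a = findings_a.get(path, set())
        b = findings_b.get(path, set())
        union[path] = a | b
        if a & b:
            both[path] = a & b
    return both, union
```

The hard part in practice is normalizing free-text findings so that "off-by-one in loop" from two models actually compares equal; a third model pass or fuzzy matching usually handles that.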
Yes this is the path I’m taking. Experiment, build your own toolbox whether it’s hand rolled skills or particular skills you pull out from other public repos. Then maintain your own set.
You do not want to log in one day to find your favorite workflow has changed via updates.
Then again this is all personal preference as well.
This matters for big engineering teams who want to put _some_ kind of guardrails around Claude that they can scale out.
For example, I have a rule [^0] that instructs Claude to never start work until some pre-conditions are met. This works well, as it always seems to check these conditions before doing anything, every turn.
I can see security teams wanting to use this approach to feel more comfortable about devs doing things with agentic tools without worrying _as much_ about them wreaking havoc (or what they consider "havoc").
As well, as someone who's just _really_ getting started with agentic dev, spending time dumping how I work into rules helped Claude not do things I disapprove of, like not signing off commits with my GPG key.
That said, these rules will never be set in stone, at least not at first.
I'm also thinking on how we can put guardrails on Claude - but more around context changes. For example, if you go and change AGENTS.md, that affects every dev in the repo. How do we make sure that the change they made is actually beneficial? and thinking further, how do we check that it works on every tool/model used by devs in the repo? does the change stay stable over time?
Given the scope that AGENTS has, I would use PRs to test those changes and discuss them like any other large-impact area of the codebase (like configs).
If you wanted to be more “corporate” about it, then assuming that devs are using some enterprise wrapper around Claude or whatever, I would bake an instruction into the system prompt that ensures that AGENTS is only read from the main branch to force this convention.
This is harder to guarantee since these tools are non-deterministic.
This article isn't saying you must set up a big .claude folder before you start. It repeats several times that it's important to start small and keep it short.
It's also not targeted at first-timers getting their first taste of AI coding. It's a guide for how to use these tools to deal with frustrations you will inevitably encounter with AI coding.
Though really, many of the complaints about AI coding on HN are written by beginners who would also benefit from a simple .claude configuration that includes their preferences and some guidelines. A frequent complaint from people who do drive-by tests of AI coding tools before giving up is that the tools aren't reading their mind or the tools keep doing things the user doesn't want. Putting a couple lines into AGENTS.md or the .claude folder can fix many of those problems quickly.
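Those "couple of lines" really can be that small. A hypothetical example of a minimal AGENTS.md that heads off the most common drive-by complaints:

```markdown
# AGENTS.md
- Ask before adding new dependencies.
- Run `make test` before declaring a task done.
- Never edit files under vendor/; they are generated.
- Match the existing code style; do not reformat untouched lines.
```

Every line here is illustrative; the point is that each one replaces a correction the user would otherwise type into every session.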
Yes, but as soon as you start checking in and sharing access to a project with other developers these things become shared.
Working out how to work on code on your own with agentic support is one thing. Working out how to work on it as a team where each developer is employing agentic tools is a whole different ballgame.
1. Provision of optional tools: I may use an AI agent differently from all the other devs on a team, but it seems useful for me to have access to the same set of project-specific commands, skills, and MCP configs that my colleagues do. I'm not forced to use them, but I can choose to on a case-by-case basis.
2. Guardrails: it seems sensible to define a small subset of things you want to dissuade everyone's agents from doing to your code. This is like the agentic extension of coding standards.
Most people do, most people don’t have wildly different setups do they? I’d bet there’s a lot in common between how you write code and how your coworkers do.
In my own group, agentic coding made sharing and collaboration go out the window because Claude will happily duplicate a bunch of code in a custom framework
I have two lines in almost every single one of my AGENTS.md files:
- Under no condition should you use emojis.
- Before adding a new function, method, or class, scan the project codebase and attached frameworks to verify that nothing existing can be modified to fit the need.
I'm curious about the token usage when it scans across multiple repositories to find similar methods. As our codebase grows so fast, is it sustainable?
I think the idea is that by creating these shared .claude files, you tell the agent how to develop for everyone and set shared standards for design patterns/architecture so that each user's agents aren't doing different things or duplicating effort.
Modern "skills" and Markdown formats of the day are no different than "save the kittens". All of these practices are promoted by influencers and adopted based on wishful thinking and anecdata.
Uh, this couldn't be more false. I've implemented these from scratch at my company and rolled them out org-wide, and I've yet to watch a YouTube video and don't consume any influencer content. Mostly just by using the tools and reading documentation, as with any other technical tool.
Perhaps your blanket statement could be wrong, and I would encourage you to let your mind be a bit more open. The landscape here is not what it was 6 months ago. This is an undeniable fact that people are going to have to come to terms with pretty soon. I did not want to be in this spot, I was forced to out of necessity, because the stuff does work.
Great, so how do you know this stuff works? Did you evaluate it against other approaches? How do you know it's actually reliable?
The Vercel team had some interesting findings[1]:
> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it.
Others had different findings for commonly accepted practices[2], some you may have adopted from reading documentation, which surely didn't come from influencers.
And yet others swear by magical Markdown documents[3].
So... who is the ultimate authority on what actually works, and who is just cargo culting the trendy practice of the week? And how is any of this different from what was being done a few years ago?
Sorry, but from your first comment, I don’t particularly feel inclined to help you figure this out. I was just noting that I’ve already deployed these things at scale, with success, using many of the configuration options documented in the OP here. This stuff isn’t some mystical black box, although you seem to think it is.
I measure the tooling’s success with a suite of small prompt tests performing repeatable tasks, measuring the success rate over time, educating the broader team, and providing my own tried-and-tested-in-the-field skills that I’ve shared with the broader teams to similar success. We’ve seen a huge increase in velocity and a lower bug rate, which are also very easily measurable (and long-evaluated) stats, enough to put me in the position I am in, which was not a reluctant one. You’re perfectly free to view my long history on this topic on this forum to see I am a complete skeptic here, and wouldn’t be in this position unless I had to be.
Everyone is still figuring this out. There is no authority; I am my own authority on what I have seen work and what hasn’t. Feel free to take from that what you will. I just wanted to provide a counterpoint to your initial claim. I’m certainly not going to expose in fine detail what has worked for my org and what hasn’t, for obvious reasons.
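A prompt-test suite of the kind described above can start as a tiny harness. Nothing here is tied to a particular agent; `runner` (which would shell out to the agent CLI) and `check` (which validates one output) are injected, and all names are illustrative:

```python
def success_rate(task: str, runner, check, n: int = 10) -> float:
    """Run the same repeatable task n times and report the pass fraction.

    Because agent output is non-deterministic, a single pass/fail is noise;
    tracking the rate over time is what makes config changes measurable.
    """
    passes = sum(1 for _ in range(n) if check(runner(task)))
    return passes / n
```

Running the same suite before and after an AGENTS.md or skill change turns "it feels better" into a number you can defend.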
2 months ago I built (with Claude) a quite advanced Python CLI script and Claude Skill that searches and filters the Claude logs to access information from other sessions or from the same session before context compaction. But today Claude Code has a builtin feature to search its logs and will readily do it when needed.
My point is, these custom things are often short lived band-aids, and may not be needed with better default harnesses or smarter future models.
I’ve been developing and working on dev tools for more than 15 years. I’ve never seen things evolve so rapidly.
Experiment, have fun and get things done, but don’t get too sure or attached to your patches.
It’s very likely the models and harnesses will keep improving around the gaps you see.
I’ve seen most of my AGENTS.md directives and custom tools fade away too, as the agents get better and better at reading the code and running the tests and feeding back on themselves.
.claude has become the new dotfiles. And what do people do when they want to start using dotfiles? They copy others’ dotfiles, and the same is happening here :)
I totally agree with you that this not the right way to start. But, in my experience, the more you use the tool the more of a "feel" you get for it, and knowing how all these different pieces work and line up can be quite useful (though certainly not mandatory). It's been immensely frustrating to me how difficult it is to find all this info with all the low-quality junk that is out there on the internet.
> all the low-quality junk that is out there on the internet.
Isn't this article just another one in that same drawer?
> What actually belongs in CLAUDE.md - Write: - Import conventions, naming patterns, error handling styles
Then just a few lines below:
> Don’t write: - Anything that belongs in a linter or formatter config
The article overall seems filled with internal inconsistencies, so I'm not sure this article is adding much beyond "This is what an LLM generated after I put the article title with some edits".
I agree with most of this, with one important exception: you should have some form of sandboxing in place before running any local AI agent. The easiest way to do that is with .claude/settings.json[0].
This is important no matter how experienced you are, but arguably most important when you don't know what you're doing.
0: or if you don't want to learn about that, you can use Claude Code Web
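A minimal sketch of what that might look like (the specific rule patterns here are illustrative, not a complete policy; check Anthropic's settings documentation for the exact syntax your version supports):

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./secrets/**)",
      "Bash(curl:*)"
    ],
    "allow": [
      "Bash(npm run test:*)"
    ]
  }
}
```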
The part about permissions with settings.json [0] is laughable. Are we really supposed to list all potential variations of harmful commands? In addition to the `Bash(cat ./.env)`, we would also need to add `Bash(cat .env)`, Bash(tail ./.env)`, Bash(tail .env)`, `Bash(head ./.env)`, `Bash(sed '' ./.env)`, and countless others... while at the same time we allow something like `npm` to run?
I know the deny list is only for automatically denying, and that any non-explicitly allowed command will pause and wait for user confirmation. But it still reminds me of the rationale the author of the Pi harness [1] gave to explain why there will be no built-in permission feature in Pi (emphasis mine):
> If you look at the security measures in other coding agents, *they're mostly security theater*. As soon as your agent can write code and run code, it's pretty much game over. [...] If you're uncomfortable with full access, run pi inside a container or use a different tool if you need (faux) guardrails.
As you mentioned, this is a big feature of Claude Code Web (or Codex/Antigravity or whatever equivalent of other companies): they handle the sand-boxing.
Yes. I don't bother with that. I feel like the risk of Claude Code running amok is pretty low, and I don't have it do long-running tasks that exceed my desire to monitor it. (Not because I'm worried about it breaking things; I just don't use the tool in that way.)
I'm sure most folks run Claude without isolation or sandboxing. It's a terrible idea, but even most professional software developers don't think much about security.
There are many decent options (cloud VMs, local VMs, Docker, the built-in sandboxing). My point is just that folks should research and set up at least one of them before running an agent.
Let's not fool ourselves here. If a security feature adds any amount of friction at all, and there's a simple way to disable it, users will choose to do so.
How did you contain Claude Code? Did you virtualize it? I just set up a simple firejail script for it. Not completely sure if it's enough but it's at least something.
You can download the devcontainer CLI and use it to start a Docker container with a working Claude Code install, simple firewall, etc. out of the box. (I believe this is how the VSCode extension works: It uses this repo to bootstrap the devcontainer).
this is true, but i think people are best off starting with SOME project that gives users an idea of how to organize and think about stuff. for me, this is gastown, and i now have what has gotta be the most custom gastown install out there. could not agree more that your ai experience must be that which you build for yourself, not a productized version that purports to magically agentize your life. i think this is the real genius of gastown— not how it works, but that it does work and yegge built it from his own mind. so i’ve taken the same lesson and run very, very far with it, while also going in a totally different direction in many ways. but it is a work of genius, and i respect the hell out of him for putting it out there.
It's not as idyllic as this when trying to get an org on board. We're currently very open to using Claude, but the unknowns are still the unknowns, so the guardrails the `.claude` folder provides give us comfort while we gain familiarity with the tool.
Who is building an artificial wall? Maybe I skimmed the post too fast, but it doesn't seem like this information is being presented as "you have to know/do this before you start agentic engineering", just "this is some stuff to know."
with Anthropic already starting to sell "Claude Certified Architect" exams and a "Partner Network Program", I think a lot of this stuff is around building a side industry on top of it unfortunately
>If you tell Claude to always write tests before implementation, it will. If you say “never use console.log for error handling, always use the custom logger module,” it will respect that every time.
Feels a little like this is generated and not based on experience. CLAUDE.md should be short. TypeScript strict mode isn't a gotcha; it'll figure that out on its own easily, so imo omit things like that. People put far too much stuff in CLAUDE.md; just a few lines and links to docs is all it needs. You can also @AGENTS.md and put everything there instead. Don't skills supersede commands? Subagents are good, especially if you specify model, forked memory, linked skills, etc. Always ask what you can optimize after you see Claude thrashing, then figure out how to encode that (or refactor your scripts or code choices).
Always separate planning from implementation and clear context between them; it's the build-up of context that makes it bad ime.
The intro paragraph sounds exactly like Claude’s phrasing. So much so that I couldn’t read the rest of the article because I assumed I could just ask Claude about the topic.
Exactly this. If there is some nuance in the article vs what Claude can tell you, then that's worthwhile. This article is just generated with a specific prompt on style but very little content editing. What's the point? It's like posting the results of a Google search. The prompt would have been more interesting.
It's not against the rules to post AI slop here, and I don't necessarily think it should be. But I do wonder how we value written content going forward. There's value to taste and style and editing and all the other human things... there's very little value in the actual words themselves. We'll figure it out.
I keep seeing these posts, and here's the most interesting thing, for me.
I get the best results with the least number of skills and unnecessary configuration in place.
People are spending way too much time over-prescribing these documents, but AI is like a competent but nervous adult. The more you give it, the dumber it gets.
I wish all model providers would converge on a standard set of files, so I could switch easily from Claude to Codex to Cursor to Opencode depending on the situation
The issue is that both the harness and the specific model matter a lot in what type of instruction works best. If you were to use Anthropic's models with the prompting style that works best for Codex and GPT models, you'd get much worse results than using GPT models with Codex, prompted the way GPTs respond best.
I don't think people realize exactly how important the specific prompts are, with the same prompt you'd get wildly different results for different models, and when you're iterating on a prompt (say for some processing), you'd do different changes depending on what model is being used.
Having experimented with soft-linking AGENTS.md into CLAUDE.md and GEMINI.md, this lines up well with my experience. I now just let each tool maintain its own files and don't try to combine them. If it's something like my custom "## Agent Instructions" section, I just copy-paste it, and that's not been hard; since that section is mostly identical, I treat AGENTS.md as the canonical version and copy any changes over to the others.
I think one of the main examples that i saw in a swyx article a while back is that using the sort of ALL CAPS and *IMPORTANT* language that works decently with claude will actually detune the codex models and make them perform worse. I will see if I can find the post
Because that just does it for you, it doesn't help me understand how to write better prompts.
Actually, I can just read the skill with my own eyes and then I can also learn. So, thank you for sharing. It's interesting to read through what it suggests for different models - it fits for the ones I work with regularly, but there are many I don't know the strengths and weaknesses of.
Cursor supports all the Claude file patterns, including plugins and marketplaces. We leverage that to support both Claude and Cursor with same instructions and skills
Claude Fast has very good alternate documentation for this. [0] I don't understand the hate for defining .claude/. It is quite easy to have the main agent write the files. Then, rather than one-shot coding, iterate quickly by updating .claude/. I'm at the point where .claude/ makes copies of itself, performs the task, evaluates, and updates itself. I'm not writing code; I'm coding .claude/, which does everything else. This is also a mechanism for testing .claude, agents, and instructions, which would be useful for sharing and reuse in an organization.
Great link, thanks for sharing! Read and bookmarked it.
TLDR "CLAUDE.md isn't documentation for Claude to read - it's an operating system for Claude to run. Define behavior, delegate knowledge to skills, and build a system that improves itself over time."
The real wall I never see people talking about: yes, you can tell Claude to update whatever file you want, but if it's .claude/INSTRUCTIONS.md or CLAUDE.md, you have to tell Claude to re-read those files. It wrote the contents, but it's not treating them as fresh instructions; it will run off whatever version it last read, and if the file never existed before, it won't know about it. I believe Claude puts those instructions in a very specific part of its context window.
So when Anthropic releases a new model that "breaks compatibility" with some Markdown files, do we call it "refactoring" to find (guess) the required changes to have the desired outcome again? Don't we create brittle specifications to fit a version of a model?
Nice! Article didn't mention but ~/.claude/plans is where it stores plan md file when running in plan mode. I find it useful to open or backup plans from the directory.
I've been going heavily in the direction of globally configured MCP servers and composite agents with copilot, and just making my own MCP servers in most cases.
Then all I have to do is let the agents actually figure out how to accomplish what I ask of them, with the highly scoped set of tools and sub agents I give them.
I find this works phenomenally, because all the .agent.md file is, is a description of what the tools available are. Nothing more complex, no LARP instructions. Just a straightforward 'here's what you've got'.
And with agents able to delegate to sub agents, the workflow is self-directing.
Working with a specific build system? Vibe code an MCP server for it.
Making a tool of my own? MCP server for dev testing and later use by agents.
On the flipside, I find it very questionable what value skills and reusable prompts give. I would compare it to an architect playing a recording of themselves from weeks ago when talking to their developers. The models encode a lot of knowledge, they just need orientation, not badgering, at this point.
The best thing I’ve done so far is put GitHub behind an API proxy and reject pushes and pull requests that don’t meet certain criteria, with a descriptive error.
I find it forgets to read or follow skills a lot of the time, but it does always try to route around HTTP 400s when pushing up its work.
The claim that "whatever you write in CLAUDE.md, Claude will follow" is doing a lot of heavy lifting. In practice CLAUDE.md is a suggestion, not a contract. Complex tasks and compaction will dilute the use of CLAUDE.md, especially once the context window runs out.
This is correct. All of these .md files are just blobs of text that the LLM matches against. They might increase the likelihood of something happening or not happening.
They look to me like people actually want to build deterministic workflows, but blobs of text are the wrong approach for that. The right tool is code that controls the agent through specific states and validates the tool calls step by step.
The .claude folder structure reminds me of how Terraform organizes state files. Smart move putting conversation history in JSON rather than some proprietary format; it makes it trivial to grep through old conversations or build custom analysis tools.
Ha yeah, that makes sense. Having the AI read its own conversation history in a format it already understands is a nice side effect of keeping it in plain JSON.
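Mining plain JSON logs really is trivial. A tiny sketch of the idea (the field names here are hypothetical; the real per-line schema of the session files may differ):

```python
import json


def grep_sessions(jsonl_text: str, needle: str) -> list[str]:
    """Return the log lines whose message content mentions `needle`.

    Assumes one JSON object per line with the message text somewhere
    under a "message" key; the actual session-log schema may differ.
    """
    hits = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Serialize the message sub-object so nested content is searchable.
        text = json.dumps(entry.get("message", ""))
        if needle.lower() in text.lower():
            hits.append(line)
    return hits


# Hypothetical sample in the same spirit as JSONL conversation logs:
sample = "\n".join([
    '{"message": {"role": "user", "content": "refactor the parser"}}',
    '{"message": {"role": "assistant", "content": "done"}}',
])
print(len(grep_sessions(sample, "parser")))  # prints 1
```

The same thing works from the shell with `grep` or `jq`; a small function like this is just easier to extend into a custom analysis tool.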
So that's what "software engineering" has become nowadays? Some cargo cult, basically. Seriously, all of this raises red flags. No statements here are provable. It's just like LangChain, which was praised until everyone realized it's absolute dog water. Just like MCP too. The job in 2026 is really sad.
I think I'm finding a pretty good niche for myself, honestly. IMO, software engineering is splitting into different professions based on the work it produces.
This sort of "prompt and pray" flow really works for people, as in they can make products and money; however, I do think the people who succeed today would also have reached for no-code tools 5 years ago and seen similar success. It's just faster and more comprehensive now. I think the general theme of the products remains the same, though: not unimportant or worthless, but it tends to be software whose effects stay INSIDE the realm of software. I feel like there's always been a market for that, as it IS important; it's just not WORTH the time and money to the right people to "engineer" those tools. A lot of SaaS products filled that niche for many years.
While it's not a way I want to work, I am also becoming comfortable with respecting that as a different profession for producing a certain brand of software that does have value, and that I wasn't making before. The intersection of that is opportunity I'm missing out on; no fault to anyone taking it!
The software engineer that writes the air traffic avoidance system for a plane better take their job seriously, understand every change they make, and be able to maintain software indefinitely. People might not care a ton about how their sales tracking software is engineered, but they really care about the engineering of the airplane software.
I think this is mostly right. The primary difference is that with no code you had to change platforms, but the Prompt and Pray method can be brought to bear on any software easily even the air traffic avoidance system.
It shouldn’t be, but it’s going to take some catastrophic events to convince people that we have to work to make sure we understand the systems we’re building and keep everything from devolving into vibe coded slop.
> the Prompt and Pray method can be brought to bear on any software easily even the air traffic avoidance system.
I guess that's why I see it as a separate profession, as in we have to actually profess a standard for how a professional in our field acts and believes. I think it's OK for it to bifurcate into two different fields, but Software Engineering would need to specifically reject prompt-and-pray on a principled and rational basis.
Sadly yes, that might require real cost to life in order to find out the "why" side of that rational basis. If you meet anyone that went to an engineering school in Québec, ask them about the ceremony they did and the ring they received. [0] It's not like that ceremony fixes anything, but it's a solemn declaration of responsibility which to me at least, sets a contract with society that says "we won't make things that harm you".
>Claude Code users typically treat the .claude folder like a black box. They know it exists. They’ve seen it appear in their project root. But they’ve never opened it, let alone understood what every file inside it does.
I know we are living in a post-engineering world now, but you can't tell me that people don't look at PRs anymore, or their own diffs, at least until/if they decide to .gitignore .claude.
I don't. I have Claude do all my PR reviews, running in a daily loop in the morning. The truth is an LLM is better at code review than the average programmer.
I'm a senior engineer who has been shipping code since before GitHub and PR reviews was a thing. Thankfully LLMs have freed me from being asked to read other people's shit code for hours every day.
Is there a completely free coding assistant agent that doesn't require you to give a credit card to use it?
I recently tried IntelliJ for Kotlin development and it wanted me to give a credit card for a 30 day trial. I just want something that scans my repo and I tell it the changes I want and it does it. If possible, it would also run the existing tests to make sure its changes don't break anything.
There are lots! Too many to cover in a single HN comment, and this space is evolving rapidly so I encourage you to look around.
While the coding assistants are pretty much universally free, you still need to connect them to a model. The model tokens generally cost something once you've gone past a certain quota.
I'm not sure if this is still true, but if you have a Google account, Gemini Code Assist had a quite generous "free tier" that I used for a while and found to be pretty decent.
It's shocking how shitty the Claude Code CLI app is: the config is brittle (setting up a plugin LSP means searching through GitHub issues and guessing which parameters you messed up), hooks render errors in the app when there are none, the permission harness is barely documented, and there are zero customization options (would you like the agent config to come from a different folder than the source root? nope). Going through GitHub issues, the same issue you hit has been open since the beginning of 2025 and ignored; their issue tracker is /dev/null, basically a user forum.
I think this does a great job of explaining the .claude directories in a beginner friendly way. And I don’t necessarily read it as “you have to do all this, before you start”.
It has a few issues with outdated advice (e.g. commands have been merged with skills), but overall I might share it with co-workers who need an introduction to the concept.
Completely tangential, but can we please stop putting one million files at the root of the project which have nothing to do with the project? Can we land on a convention like, idk, a `.meta` folder (not the meta company, the actual word), or whatever, in which all of these Claude.md, .swift-version, Code-of-Conduct.md, Codeowners, Contributing.md, .rubocop.yml, .editorconfig, etc. files would go??
Here's a question that I hope is not too off-topic.
Do people find the nano-banana cartoon infographics to be helpful, or distracting? Personally, I'm starting to tire seeing all the little cartoon people and the faux-hand-drawn images.
I haven't come around any AI generated imagery in documents / slides that adds any value. It's more the opposite, they stand out like a sore thumb and often even reduce usability since text cannot be copied. Oh and don't get me started on leadership adding random AI generated images to their emails just to show that they use AI.
The problems are not visual but epistemic. If the author didn't specify enough to produce a useful chart, then it's going to be the diagram equivalent of stock images thrown on a finished presentation by a lazy intern. You can't rejection-sample away this kind of systemic fault.
The simple truth we're about to realize is that there is no free lunch: a tool cannot inject more intent into a piece than its author put in. It might smooth out some blemishes or highlight some alternative choices, but it can't transform the input "make me a video game" into anything greater than a statistical mish-mash of the concept. And traditional tools of automation give you a much better, more precise interface for intent than natural language, which allows these vagaries.
> Clutter is the disease of American writing. We are a society strangling in unnecessary words, circular constructions, pompous frills and meaningless jargon.
> Look for the clutter in your writing and prune it ruthlessly. Be grateful for everything you can throw away. Reexamine each sentence you put on paper. Is every word doing new work? Can any thought be expressed with more economy?
Most of the time I find them distracting, and sometimes a huge negative on the article. In this particular article though, they're well done and relevant, and I think they add quite a bit. It's a highly personal opinion kind of thing though for sure.
Some of the others didn't feel like they added value, but I agree these are some of the best examples of a practice that typically doesn't add much.
I am a victim of AI-documentation-slop at work, and the result is that I've become far more "Tuftian" in my preferences than ever before. In the past, I was a fan of beautiful design and sometimes liked nice colors and ornaments. Now, though, I'm a fan of sparse design and relevant data (not information -- lots of information is useless slop). I want content that's useful and actionable, and the majority of the documents many of my peers create using Claude, Gemini or ChatGPT are fluffy broadsheets of irrelevant filler, rarely containing insights and calls-to-action.
Bad infographics existed long before image models.
If the graphic still needs paragraphs to decode and doesn't let the reader pull out the key facts faster than plain text, it's not an infographic so much as cargo-cult design pasted on top of an explanation.
But they had already lost me with all the links, and the fact that there's no common thread running through the entire article.
The first thing my eyes skimmed was:
> CLAUDE.md: Claude’s instruction manual
> This is the most important file in the entire system. When you start a Claude Code session, the first thing it reads is CLAUDE.md. It loads it straight into the system prompt and keeps it in mind for the entire conversation.
No it's not. Claude does not read this until it is relevant. And if it does, it's not SOT. So no, it's arguably not the most important file.
Are you certain? My understanding was that this is automatically injected in the context, and in my experience that's how it worked. I never see 'ReadFile(claude.md)', and yet claude is aware of some conventions I put in there.
Maybe. But I kind of view LinkedIn as a social network for people who only by the grace of a couple better decisions are talking about real business and not multilevel marketing schemes… but otherwise use the same themes and terminologies.
Like mostly people who have confused luck and success, or business acumen for religion.
So I wouldn’t use LinkedIn as a positive data point of what’s hot.
Off topic but earlier today I asked Gemini to read this article and advise how to do the same things for OpenCode. I am fascinated with trying to get good performance from small local models.
If these different agents could agree on a standard location that would be great. The specs are almost the same for .github and Claude but Claude won't even look at the .github location.
There already is, it's ~/.agents and you use symlinks for .claude, and the dir structure is pretty similar and anything you want to reuse across models is pretty standardized, just not formalized.
Are agents/ still relevant now that we have skills? I'm genuinely confused about why I would need custom system prompts for specific agents; what should I use them for?
nice writeup. if you have good claude.md, .md files, .skills or mcp cli you want to monetize I built mog.md to let people/agents buy and sell these things.
The fuck? What's next, configuring maven and pom.xml? At least XML is unambiguous, well specified, and doesn't randomly refuse to compile 2% of the time..
Yea I went through my global claude skills and /context yesterday because claude was performing terribly. I deleted a bunch of stuff including memory and anecdotally got better results later on in the day.
AI agents like Claude are slowly moving toward config hell, as we often see in deployment pipelines, project setup, etc. This is always a never-ending timesink, and because of AI it can/will probably need to be altered very frequently.
In the end it will still produce slop you need to review line by line.
The question is: do you want to write code you know and have verified works, or review unverified, junior-dev-quality code written by AI?
> "The project-level folder holds team configuration. You commit it to git. Everyone on the team gets the same rules, the same custom commands, the same permission policies."
> "Most people either write too much or too little. Here’s what works."
It feels like I've been teleported into a recent LinkedIn feed. Do real people actually already write like AI or is it AI generated?
This particular skill is not great.
Engineers who lack soft skills cannot be effective in team environments.
So, to really create something new that I care about, LLMs don't help much.
They are still useful for plenty of other tasks.
We used to have the very difficult task of producing working scalable maintainable code describing complex systems which do what we need them to do.
Now on top of it we have the difficult task of producing this code using constantly mutating complex nondeterministic systems.
We are the circus bear riding a bicycle on a high wire now being asked to also spin plates and juggle chainsaws.
Maybe singularity means that time sunk into managing LLMs is equal to time needed to manually code similar output in assembly or punch cards.
skills that teach the agent how to pipe data, build requests, trace them through a system and datasources, then update code based on those results are a step function improvement in development.
ai has fundamentally changed how productive i am working on a 10m line codebase, and i'd guess less than 5% of that is due to code gen thats intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what i'm doing.
What sort of skills are you referring to?
Claude is kind of decent at doing "when in Rome" sort of stuff with your codebase, but it's nice to reinforce, and remind it how to deploy, what testing should be done before a PR, etc.
Skills are crazy useful to tell Claude how to debug your particular project, especially when you have a library of useful scripts for doing so.
And we also know why: effective context depends on input and task complexity. Our best guess right now is that effective context length is often between 100k and 200k for frontier, 1M needle-in-a-haystack-type models.
So I naturally felt the need to (tell Claude to) build a MCP for this accounting API, and now I ask it to do accounting tasks, and then it just does them. It's really ducking sweet.
Another thing I did: after a particularly grueling accounting month close-out, I told Claude to extract the general tasks we accomplished and build a skill that does them at the end of the month, and now it's like having a junior accountant at my disposal; it just DOES the things a professional would charge me thousands for.
So both custom project MCPs and skills are super useful in my experience.
Though, you get such a huge bang from customizing your config that I can easily see how you could go down that slippery slope.
Claude and an mcp and skill is plain to me. Writing your own agent connecting to LLMs to try to be better than Claude code, using Ralph loops and so on is the rabbit hole.
(I'm genuinely asking)
To give you a small taste: you need to issue an electronic invoice for each unique customer and submit it on the fly to the tax authority, but these need to be correlated monthly with the money in your business bank account. The paid invoices don't just go into your bank account; they are disbursed from time to time by the payment processor, on random dates that don't sync with the accounting month, so at end of month you have to correlate precisely which invoices are paid or not. But wait: the card processor won't just send you the money in a lump sum, it will deduct from each payment some random fee determined by their internal formula, then, at the end of each month, add up all those deducted fees (even for payments that have not yet been paid out to you) and issue another invoice to you, which you need to account for in your books as being partially paid each month (from the fees deducted from payments already disbursed). You also have other payment channels, each with their own fees, etc. So I need to balance this whole overlapping-intervals mess with all sorts of edge cases, chargebacks, and manual interventions I refuse to think about again.
This is one example, but there are also issues with wages and their taxation, random tax law changes in the middle of the month etc. The accountant can of course solve all this for you, but once you go a few hundred invoices per month (if you sell relatively cheap services) you are considered a "medium" business, so instead of paying for basic accounting services less than 100€ per month (have the certified accountant look over your books and sign them, as required by law), you will need more expensive packages which definitely add up to thousands in a few months.
Go be an entrepreneur, they said.
1. I have many and sometimes contradictory workflows: exploration, prototyping, bug fixing, debugging, feature work, PR management, etc. When I'm prototyping, I want reward hacking; I don't care about tests or lints, and it's the exact opposite when I manage PRs.
2. I see hard-to-explain, hard-to-quantify problems with over-configuration. The quality goes down, it loses track faster, it gets caught in loops. This is totally anecdotal, but I've seen it across a number of projects. My hypothesis is that it's related to attention: since these get added to the system prompt, they pull the distribution by constantly being attended to.
3. The models keep getting better. Similar to 2, sometimes model gains are canceled out by previously necessary instructions. I hear the Anthropic folks clear their CLAUDE.md every 30 days or so to alleviate this.
I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.
How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/unstable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.
Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?
- A well-structured folder of markdown files that I constantly garden. Every sub-folder has a README. Every file has metadata in front-matter. I point new sessions at the entry point to this documentation. Constantly run agents that clean up dead references, update out-of-date information, etc. Build scripts that deterministically find broken links. It's an ongoing battle.
- A "continuation prompt" skill, that prompts the agent to collect all relevant context for another agent to continue
- Judicious usage of "memory"
- Structured systems made out of skills like GSD (Get Shit Done)
- Systems of "quality gate" hooks and test harnesses
For all of these, I have the agent set them up and manage them, but I've yet to find a context-management system that just works. I don't think we understand the "physics" of context management yet.
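As a concrete example of the "scripts that deterministically find broken links" idea above, the checker can be a few dozen lines; a minimal sketch in Python (it assumes relative markdown links, and ignores anchors, query strings, and external URLs):

```python
import os
import re

# Matches the target of markdown links like [text](target);
# stops at ')', '#', '?' or whitespace so anchors and queries are dropped.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#?\s]+)")

def find_broken_links(root):
    """Return (source_file, target) pairs for relative links that don't resolve."""
    broken = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            for target in LINK_RE.findall(text):
                if target.startswith(("http://", "https://", "mailto:")):
                    continue
                if not os.path.exists(os.path.join(dirpath, target)):
                    broken.append((path, target))
    return broken
```

Run something like this from CI or a hook so dead references get flagged deterministically, rather than relying on the agent to notice them.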
Great docs help you, your agents, your team and your customers.
If you’re confused and the agent can’t figure it out reliably how can anyone?
Easier said than done of course. And harder now than ever if the products are rapidly changing from agentic coding too.
One of my only universal AGENTS.md rules is:
> Write the pull request title and description as customer facing release notes.
One quick win I’ve thought could bridge this is updating our docs site to respond to `Accept: text/markdown` requests with the markdown version of the docs.
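The content-negotiation part of that is tiny; a sketch of the Accept-header check (a deliberately simplified parser, not RFC-complete, so treat it as illustrative):

```python
def negotiate(accept_header):
    """Pick 'markdown' when the client prefers text/markdown, else 'html'.

    Honors q-values but ignores wildcard subtleties beyond defaulting to html.
    """
    best, best_q = "html", 0.0
    for part in (accept_header or "*/*").split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip().lower()
        q = 1.0
        for f in fields[1:]:
            f = f.strip()
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        variant = {"text/markdown": "markdown", "text/html": "html"}.get(mtype)
        if variant and q > best_q:
            best, best_q = variant, q
    return best
```

Wire that into whatever serves the docs: if it returns "markdown", respond with the .md source and `Content-Type: text/markdown`; remember to emit `Vary: Accept` so caches don't serve the wrong variant.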
* Claude trying to install packages into my Python system interpreter - (always use uv and venvs)
* Claude pushing to main - (don't push to main ever)
* When creating a PR, completely ignoring how to contribute (always read CONTRIBUTING.md when creating a PR)
* Yellow ANSI text in console output - (Color choices must be visible on both dark and light backgrounds)
Because I got sick of repeating myself about the basics.
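For what it's worth, rules like those fit in a handful of lines; a hypothetical AGENTS.md fragment (the exact wording is mine, not a standard):

```markdown
## Ground rules

- Python: always use `uv` and a project venv; never install into the system interpreter.
- Git: never push to `main`; work on a branch and open a PR.
- PRs: read `CONTRIBUTING.md` before creating a pull request.
- Console output: color choices must be legible on both dark and light backgrounds (no bare yellow).
```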
Working on an unspecified codebase of unknown size using unconfigured tooling with unstated goals found that less configuration worked better than more.
But beyond that, I just ask it what I want it to ask, and that's it. I'm not convinced that putting more time into building the "toolbox" will actually give me significant returns on that time.
I do think that some of this (commands, skills, breaking up CLAUDE.md into separate rules files) can be useful, but it's highly context-dependent, and I think YAGNI applies here: don't front-load this work. Only set those up if you run into specific problems or situations where you think doing this work will make Claude work better.
my only machine-specific config is overriding haiku usage with sonnet in claude code. i outline what i want in linear, have claude synthesize into a plan and we iterate until we're both happy, then i let it rip. works great.
then one of my juniors goes and loads up things like "superpowers" and all sorts of stuff that's started littering his PRs. i'm just not convinced this ricing of agents materially improves anything.
e.g. spend time creating a skill about how to query production logs
All the fancy frameworks are vibe coded, so why would they do better than something you build yourself?
At most get playwright MCP in so the agent can see the rendered output
But for some projects there will be things Claude doesn’t know about, or things that you repeatedly want done a specific way and don’t want to type it in every prompt.
Everyone's workflow is different and nobody knows which workflow is the right one. If you turn your harness into a junk drawer of random skills that get auto updated, you introduce yet another layer of nondeterminism into it, and also blow up your context window.
The only skill you should probably install instead of maintaining it yourself is playwright-cli, but that's pretty much it.
Ignore my original comment below; since the post is technical, so is the parent comment: it's for techies.
---
That applies to tech users only.
Non-tech users are starting to use Claude Code and just want to get the job done.
Claude introduced skills to bring more non-tech users to the CLI, as a good way to get your feet wet.
Not everyone will go for such minute tweaks.
I am an administrator of this stuff at my company and it’s an absolute effing nightmare devising policies that protect people from themselves. If I heard this come out of someone’s mouth underneath me I’d tell them to leave the room before I have a stroke.
And this is stuff like: if so-and-so’s machine is compromised, it could cost the company massive sums of money. For your personal use, fine, but hearing this cavalier attitude, like it doesn’t matter, is horrifying, because it absolutely does in a lot of contexts.
LLMs have completely killed my motivation to continue running it. None of the standard practices apply anymore.
In practice, I also find it more useful that the Chrome MCP uses my current profile since I might want Claude to look at some page I'm already logged in to.
I'm not very sophisticated here though. I mainly use browser MCP to get around the fact that 30% of servers, like Apple's documentation, block agent traffic.
Chrome MCP is much slower and by default pretty much unusable because Claude seems to prefer to read state from screenshots. Also, no Firefox/Safari support means no cross-browser testing.
There appears to be https://github.com/sumyapp/playwright-parallel-mcp which may be worth trying.
I also import skills or groups of skills like Superpowers (https://github.com/obra/superpowers) when I want to try out someone else's approach to claude code for a while.
1. Had to consume context and turns by reading files, searching the web, and running several commands for what was otherwise a straightforward task
2. Whatever tool it used wasn't designed with agent usage in mind, which most of the time means the agent has to run tail, head, and grep on the output by re-running the same command.
Then you create a skill that teaches how to do this in fewer turns, possibly even adding custom scripts it can use as part of that skill.
You almost never need a skill per se, most models will figure things out themselves eventually, skill is usually just an optimization technique.
Apart from this, you can also use it to teach your own protocols and conventions. For example, I have skills that teach Claude, Codex, and Gemini how to communicate between themselves using tmux with some helper scripts. And then another skill that tells it to do a code review using two models from two providers, synthesize findings from both, and flag anything that both reported.
Although I have abandoned the built-in skill system completely, instead using my own tmux wrapper that injects them using predefined triggers, this is stepping into more advanced territory. The built-in skill system will serve you well initially, and since skills are nothing but markdown files plus maybe some scripts, you can migrate them easily into whatever you want later.
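To make the "fewer turns" point concrete: a skill is just a folder containing a SKILL.md whose front-matter tells the agent when to load it. A hypothetical sketch for the log-querying case (the script name and commands are made up):

```markdown
---
name: query-prod-logs
description: Query production logs efficiently. Use when investigating incidents or errors in production services.
---

# Querying production logs

Use `scripts/logs.sh <service> <since>` instead of raw log commands; it
returns pre-filtered, truncated output so you don't have to re-run the
same command through tail/head/grep.

Example:
- `scripts/logs.sh checkout 30m` — errors from the checkout service, last 30 minutes
```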
You do not want to log in one day to find your favorite workflow has changed via updates.
Then again this is all personal preference as well.
For example, I have a rule [^0] that instructs Claude to never start work until some pre-conditions are met. This works well, as it always seems to check these conditions before doing anything, every turn.
I can see security teams wanting to use this approach to feel more comfortable about devs doing things with agentic tools without worrying _as much_ about them wreaking havoc (or what they consider "havoc").
As well, as someone who's just _really_ getting started with agentic dev, spending time dumping how I work into rules helped Claude not do things I disapprove of, like not signing off commits with my GPG key.
That said, these rules will never be set in stone, at least not at first.
[^0]: https://github.com/carlosonunez/bash-dotfiles/blob/main/ai/c...
If you wanted to be more “corporate” about it, then assuming that devs are using some enterprise wrapper around Claude or whatever, I would bake an instruction into the system prompt that ensures that AGENTS is only read from the main branch to force this convention.
This is harder to guarantee since these tools are non-deterministic.
cute that you think Claude gives a rat's ass about this.
It's also not targeted at first-timers getting their first taste of AI coding. It's a guide for how to use these tools to deal with frustrations you will inevitably encounter with AI coding.
Though really, many of the complaints about AI coding on HN are written by beginners who would also benefit from a simple .claude configuration that includes their preferences and some guidelines. A frequent complaint from people who do drive-by tests of AI coding tools before giving up is that the tools aren't reading their mind or the tools keep doing things the user doesn't want. Putting a couple lines into AGENTS.md or the .claude folder can fix many of those problems quickly.
Working out how to work on code on your own with agentic support is one thing. Working out how to work on it as a team where each developer is employing agentic tools is a whole different ballgame.
Is this a hangover from when the tools were not as good?
1. Provision of optional tools: I may use an AI agent differently to all other devs on a team, but it seems useful for me to have access to the same set of project-specific commands, skills & MCP configs that my colleagues do. I'm not forced to use them, but I can choose to on a case-by-case basis.
2. Guardrails: it seems sensible to define a small subset of things you want to dissuade everyone's agents from doing to your code. This is like the agentic extension of coding standards.
Most people do, most people don’t have wildly different setups do they? I’d bet there’s a lot in common between how you write code and how your coworkers do.
IMHO most of this “customize your config to be more productive” stuff will go away within a year, obsoleted by improved models and harnesses.
Just like how all the lessons for how to use LLMs in code from 1-2 years ago are already long forgotten.
Perhaps your blanket statement could be wrong, and I would encourage you to let your mind be a bit more open. The landscape here is not what it was 6 months ago. This is an undeniable fact that people are going to have to come to terms with pretty soon. I did not want to be in this spot, I was forced to out of necessity, because the stuff does work.
The Vercel team had some interesting findings[1]:
> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it.
Others had different findings for commonly accepted practices[2], some you may have adopted from reading documentation, which surely didn't come from influencers.
And yet others swear by magical Markdown documents[3].
So... who is the ultimate authority on what actually works, and who is just cargo culting the trendy practice of the week? And how is any of this different from what was being done a few years ago?
[1]: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
[2]: https://arxiv.org/abs/2602.11988
[3]: https://soul.md/
I measure the tooling's success with a suite of small prompt tests performing repeatable tasks, measuring the success rate over time, educating the broader team, and providing my own tried-and-tested-in-the-field skills, which I've shared with the broader teams to similar success. We've seen a huge increase in velocity and a lower bug rate, which are also very easily measurable (and long-evaluated) stats, enough to put me in the position I am in, which was not a reluctant one. You're perfectly free to view my long history on this topic on this forum to see I am a complete skeptic on this topic, and wouldn't be here unless I had to be.
everyone is figuring this out still. There is no authority, I am my own authority on what I have seen work and what hasn’t. Feel free to take of that what you will. I just wanted to provide a counterpoint to your initial claim. I’m certainly not going to expose to a fine degree what has worked for my org and what hasn’t due to obvious reasons.
have a good day!
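For anyone wondering what a "suite of small prompt tests" can look like mechanically: a minimal sketch, assuming you supply `run_prompt` (however you invoke your agent) and a predicate that grades each output — both names are mine, not from any framework:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # did the output meet the bar?

def success_rate(cases, run_prompt, trials=5):
    """Run each case several times (agents are nondeterministic) and
    report the fraction of passing runs per case."""
    results = {}
    for case in cases:
        passed = sum(1 for _ in range(trials) if case.check(run_prompt(case.prompt)))
        results[case.name] = passed / trials
    return results
```

Tracking these fractions over time is what lets you tell whether a new skill, rule, or model actually moved the needle.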
My point is, these custom things are often short lived band-aids, and may not be needed with better default harnesses or smarter future models.
I’ve been developing and working on dev tools for more than 15 years. I’ve never seen things evolve so rapidly.
Experiment, have fun and get things done, but don’t get too sure or attached to your patches.
It’s very likely the models and harnesses will keep improving around the gaps you see.
I’ve seen most of my AGENTS.md directives and custom tools fade away too, as the agents get better and better at reading the code and running the tests and feeding back on themselves.
Isn't this article just another one in that same drawer?
> What actually belongs in CLAUDE.md - Write: - Import conventions, naming patterns, error handling styles
Then just a few lines below:
> Don’t write: - Anything that belongs in a linter or formatter config
The article overall seems filled with internal inconsistencies, so I'm not sure this article is adding much beyond "This is what an LLM generated after I put the article title with some edits".
This is important no matter how experienced you are, but arguably most important when you don't know what you're doing.
0: or if you don't want to learn about that, you can use Claude Code Web
I know the deny list is only for automatically denying, and that non-explicitly allowed command will pause, waiting for user input confirmation. But still it reminds me of the rationale the author of the Pi harness [1] gave to explain why there will be no permission feature built-in in Pi (emphasis mine):
> If you look at the security measures in other coding agents, *they're mostly security theater*. As soon as your agent can write code and run code, it's pretty much game over. [...] If you're uncomfortable with full access, run pi inside a container or use a different tool if you need (faux) guardrails.
As you mentioned, this is a big feature of Claude Code Web (or Codex/Antigravity or whatever equivalent of other companies): they handle the sand-boxing.
[0] https://blog.dailydoseofds.com/i/191853914/settingsjson-perm...
[1] https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
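For context, the deny list being discussed lives under `permissions` in Claude Code's settings.json; a sketch (the patterns are illustrative, check the docs for the exact matcher syntax):

```json
{
  "permissions": {
    "allow": ["Bash(npm test:*)"],
    "deny": ["Bash(git push:*)", "Read(./.env)"]
  }
}
```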
I never said "permissions", I said "sandboxing". You can configure that in settings.json.
https://code.claude.com/docs/en/sandboxing#configure-sandbox...
There are many decent options (cloud VMs, local VMs, Docker, the built-in sandboxing). My point is just that folks should research and set up at least one of them before running an agent.
https://github.com/anthropics/claude-code
You can download the devcontainer CLI and use it to start a Docker container with a working Claude Code install, simple firewall, etc. out of the box. (I believe this is how the VSCode extension works: It uses this repo to bootstrap the devcontainer).
Basic instructions:
- Install the devcontainer CLI: `https://github.com/devcontainers/cli#install-script`
- Clone the Claude Code repo: `https://github.com/anthropics/claude-code`
- Navigate to the top-level repo directory and bring up the container: `devcontainer --workspace-folder . up`
- Start Claude in the container: `devcontainer exec --workspace-folder . bash -c "exec claude"`
P.S. It's all just Docker containers under the hood.
Better isolation than running it in a container.
which is basically every setup, because Claude sucks at calling skills and forgets everything in CLAUDE.md within a few seconds.
>If you tell Claude to always write tests before implementation, it will. If you say “never use console.log for error handling, always use the custom logger module,” it will respect that every time.
It just isn't true lol
Always separate plan from implementation and clear context between; it's the build-up of context that makes it bad ime.
It's not against the rules to post AI slop here, and I don't necessarily think it should be. But I do wonder how we value written content going forward. There's value to taste and style and editing and all the other human things... there's very little value in the actual words themselves. We'll figure it out.
Don't skills sit in context while custom slash commands are only manually invoked?
The difference isn't clear to me, especially since, upon googling it right now, I see that skills can also be invoked with a /slash.
I get the best results with the least number of skills and unnecessary configuration in place.
People are spending way too much time over-prescribing these documents, but AI is like a competent but nervous adult. The more you give it, the dumber it gets.
I don't think people realize exactly how important the specific prompts are, with the same prompt you'd get wildly different results for different models, and when you're iterating on a prompt (say for some processing), you'd do different changes depending on what model is being used.
Would also be interested in examples of a CLAUDE.md file that works well in Claude, but works poorly with Codex.
https://github.com/nidhinjs/prompt-master
Actually, I can just read the skill with my own eyes and then I can also learn. So, thank you for sharing. It's interesting to read through what it suggests for different models - it fits for the ones I work with regularly, but there are many I don't know the strengths and weaknesses of.
[0] https://claudefa.st/blog/guide/mechanics/claude-md-mastery
TLDR "CLAUDE.md isn't documentation for Claude to read - it's an operating system for Claude to run. Define behavior, delegate knowledge to skills, and build a system that improves itself over time."
Just read the official Claude documentation:
https://code.claude.com/docs/
Coming soon, unit, behavioural and regression tests for your prompts and skills :P
You’ll have:
* Claude model version
* Claude Code prompts and tools
* Your own prompts and skills and whatnot
* Your repository’s source code (= the input)
All of those change constantly, it’s not like it’s some kind of SWE benchmark.
Then all I have to do is let the agents actually figure out how to accomplish what I ask of them, with the highly scoped set of tools and sub agents I give them.
I find this works phenomenally, because all the .agent.md file is, is a description of what tools are available. Nothing more complex, no LARP instructions. Just a straightforward 'here's what you've got'.
And with agents able to delegate to sub agents, the workflow is self-directing.
Working with a specific build system? Vibe code an MCP server for it.
Making a tool of my own? MCP server for dev testing and later use by agents.
On the flipside, I find it very questionable what value skills and reusable prompts give. I would compare it to an architect playing a recording of themselves from weeks ago when talking to their developers. The models encode a lot of knowledge, they just need orientation, not badgering, at this point.
The best thing I’ve done so far is put GitHub behind an API proxy and reject pushes and pull requests that don’t meet a criteria, plus a descriptive error.
I find it forgets to read or follow skills a lot of the time, but it does always try to route around HTTP 400s when pushing up its work.
They look to me like people actually want to build deterministic workflows, but blobs of text are the wrong approach for that. The right tool is code that controls the agent through specific states and validates the tool calls step by step.
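A toy sketch of that idea: the harness code, not a blob of prompt text, decides which tool calls are legal in each state (the states and tool names here are made up for illustration):

```python
# Toy harness: a plan -> implement -> review pipeline where code,
# not the prompt, gates which tool calls are allowed at each step.
ALLOWED = {
    "plan":      {"read_file", "search"},
    "implement": {"read_file", "write_file", "run_tests"},
    "review":    {"read_file", "run_tests"},
}

class Workflow:
    def __init__(self):
        self.state = "plan"

    def validate(self, tool_call):
        """Reject tool calls the current state doesn't permit."""
        if tool_call not in ALLOWED[self.state]:
            raise PermissionError(f"{tool_call!r} not allowed in state {self.state!r}")
        return tool_call

    def advance(self):
        """Move to the next state; stays at 'review' once reached."""
        order = ["plan", "implement", "review"]
        self.state = order[min(order.index(self.state) + 1, len(order) - 1)]
```

The agent can still be creative inside each state, but it can never, say, write files while it is supposed to be planning, and that guarantee is deterministic.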
https://claudefa.st/blog/guide/mechanics/claude-md-mastery
Ultimately you can't force claude to solve any problem but you could make it so constraints are kept.
A simple way is a git hook that runs all the deterministic things you care about.
Getting claude to follow your guidance files consistently is a bit maddening.
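Such a hook can be a few lines of shell; a sketch of a pre-commit hook (the check commands are placeholders for your project's own):

```shell
#!/bin/sh
# Sketch of a pre-commit hook (install as .git/hooks/pre-commit).

refuse_main() {
  # Refuse direct commits to main, regardless of what the agent was told.
  if [ "$1" = "main" ]; then
    echo "refusing to commit directly to main" >&2
    return 1
  fi
}

run_checks() {
  # Replace with your deterministic checks, e.g.:
  #   npm run lint && npm test
  true
}

# Hook entry point (uncomment when installing):
#   branch=$(git rev-parse --abbrev-ref HEAD)
#   refuse_main "$branch" && run_checks
```

Unlike guidance files, the hook fires every time, so Claude can ignore your CLAUDE.md all it wants and the constraint still holds.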
This sort of "prompt and pray" flow really works for people, as in they can make products and money; however, I do think the people that succeed today also would've reached for no-code tools 5 years ago and seen similar success. It's just faster and more comprehensive now. I think the general theme of the products remains the same though; not un-important or worthless, but it tends to be software whose effects stay INSIDE the realm of software. I feel like there's always been a market for that, as it IS important, it's just not WORTH the time and money to the right people to "engineer" those tools. A lot of SaaS products filled that niche for many years.
While it's not a way I want to work, I am also becoming comfortable with respecting that as a different profession for producing a certain brand of software that does have value, and that I wasn't making before. The intersection of that is opportunity I'm missing out on; no fault to anyone taking it!
The software engineer that writes the air traffic avoidance system for a plane better take their job seriously, understand every change they make, and be able to maintain software indefinitely. People might not care a ton about how their sales tracking software is engineered, but they really care about the engineering of the airplane software.
It shouldn’t be, but it’s going to take some catastrophic events to convince people that we have to work to make sure we understand the systems we’re building and keep everything from devolving into vibe coded slop.
I guess that's why I see it as a separate profession, as in we have to actually profess a standard for how a professional in our field acts and believes. I think it's OK for it to bifurcate into two different fields, but Software Engineering would need to specifically reject prompt-and-pray on a principled and rational basis.
Sadly yes, that might require real cost to life in order to find out the "why" side of that rational basis. If you meet anyone that went to an engineering school in Québec, ask them about the ceremony they did and the ring they received. [0] It's not like that ceremony fixes anything, but it's a solemn declaration of responsibility which to me at least, sets a contract with society that says "we won't make things that harm you".
[0] https://ironring.ca/home-en/
This is a brilliant reimagining of the old and trusted PnP acronym.
>Claude Code users typically treat the .claude folder like a black box. They know it exists. They’ve seen it appear in their project root. But they’ve never opened it, let alone understood what every file inside it does.
I know we are living in a post-engineering world now, but you can't tell me that people don't look at PRs anymore, or their own diffs, at least until/if they decide to .gitignore .claude.
I'm a senior engineer who has been shipping code since before GitHub and PR reviews was a thing. Thankfully LLMs have freed me from being asked to read other people's shit code for hours every day.
I recently tried IntelliJ for Kotlin development and it wanted me to give a credit card for a 30 day trial. I just want something that scans my repo and I tell it the changes I want and it does it. If possible, it would also run the existing tests to make sure its changes don't break anything.
While the coding assistants are pretty much universally free, you still need to connect them to a model. The model tokens generally cost something once you've gone past a certain quota.
I'm not sure if this is still true, but if you have a Google account, Gemini Code Assist had a quite generous "free tier" that I used for a while and found it do be pretty decent.
It is fun to use.
https://www.youtube.com/watch?v=0RLIlNWv1xo
You log in with your Google account.
Opencoder is bring your own model.
You get what you pay for so good luck.
It has a few issues with outdated advice (e.g. commands have been merged with skills), but overall I might share it with co-workers who need an introduction to the concept.
[1]: https://www.npmjs.com/package/claude-code-types
Do people find the nano-banana cartoon infographics to be helpful, or distracting? Personally, I'm starting to tire seeing all the little cartoon people and the faux-hand-drawn images.
Wouldn't Tufte call this chartjunk?
Feels like generated AI art like this is modern clipart
The simple truth we're about to realize is there is no free lunch: a tool cannot inject more intent into a piece than its author put in. It might smooth out some blemishes or highlight some alternative choices, but it can't transform the input "make me a video game" into something greater than a statistical mix-mash of the concept. And traditional tools of automation give you a much better, more precise interface for intent than natural language, which allows these vagaries.
Let’s say it loses those and the emojis-as-bullet-points. It’s going to be a lot harder to detect.
In this case, I'd say helpful because I didn't have to read the article at all to understand what was being communicated.
> Clutter is the disease of American writing. We are a society strangling in unnecessary words, circular constructions, pompous frills and meaningless jargon.
> Look for the clutter in your writing and prune it ruthlessly. Be grateful for everything you can throw away. Reexamine each sentence you put on paper. Is every word doing new work? Can any thought be expressed with more economy?
On Writing Well (Zinsser)
Some of the others didn't feel like they added value, but I agree these are some of the best examples of a practice that typically doesn't add a ton of value.
So yes, it's chartjunk.
If the graphic still needs paragraphs to decode and doesn't let the reader pull out the key facts faster than plain text, it's not an infographic so much as cargo-cult design pasted on top of an explanation.
But they had already lost me at all the links, and the fact that there's no common thread running through the entire article.
The first thing my eyes skimmed was:
> CLAUDE.md: Claude’s instruction manual
> This is the most important file in the entire system. When you start a Claude Code session, the first thing it reads is CLAUDE.md. It loads it straight into the system prompt and keeps it in mind for the entire conversation.
No it's not. Claude does not read this until it is relevant. And if it does, it's not SOT. So no, it's arguably not the most important file.
https://code.claude.com/docs/en/memory
“CLAUDE.md files are loaded into the context window at the start of every session”
Like mostly people who have confused luck with success, or business acumen with religion.
So I wouldn’t use LinkedIn as a positive data point of what’s hot.
I think the problem is that they're uninformative slop often enough that I've subconsciously determined they aren't worth risking attention time on.
No.
CLAUDE.md is just prompt text. Compaction rewrites prompt text.
If it matters, enforce it in other ways.
When you have this performative folder of skills the AI wastes a bunch of tool calls, gets confused, doesn't get to the meat of the problem.
beware!
In the end it will still produce slop you need to review line by line.
The question is: do you want to write code you know and have verified works, or review AI-written code of junior-dev quality that hasn't been verified?
> Two folders, not one
Why post AI slop here?
> "The project-level folder holds team configuration. You commit it to git. Everyone on the team gets the same rules, the same custom commands, the same permission policies."
> "Most people either write too much or too little. Here’s what works."
It feels like I've been teleported into a recent LinkedIn feed. Do real people actually already write like AI or is it AI generated?