Agentic Research
· 5 min read
In my previous post about Effective Agentic Coding, I listed five distinct workflows that I find myself falling into. The first of these was researching topics with agents such as Claude Code. For many folks, this may be surprising, as agents are exceptionally confident and will tell you what you'd like to hear. However, they are also exceptional at searching the web, analyzing text, and synthesizing it into something usable.
First must come the problem #
At Grafana Labs, I’m working on the Synthetic Monitoring product, where browser test usage has been growing rapidly. With any period of rapid growth come growing pains, and we’re learning to operate browser checks at scale. So I’ve been dealing with escalations and learning to operate these checks myself, and I’ve found it challenging to search the web around this topic. Only a handful of companies operate large browser farms, and not many of them write publicly about their platforms. This is my problem:
- chromium docs are rather dense
- discovering written posts about scaling chromium on k8s is hard
- no tools exist to help measure performance
Enter agents!
Researching with agents #
Here was my prompt for Claude:
This is purely a planning and research session. The end result should be a written document that I can refer to in the next phase. I’m looking to learn more about how to collect metrics on running chromium instances. The context is I run a chromium wrapper in grafana/crocochrome that is invoked from k6 runners, and I run these at scale in kubernetes. I’d like to figure out a way of gathering more runtime diagnostic data on chromium sessions that are spawned and execute k6 browser tests.
What happened from this point is:
- Multiple agents were spawned
- Each agent was tasked with a specific part of the problem:
- Chrome DevTools Protocol (CDP)
- OS-level chromium metrics
- crocochrome architecture
- chromium monitoring in k8s
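One of those research threads, OS-level chromium metrics, is easy to make concrete. Here's a rough sketch (my own illustration, not something the agents produced) of pulling resident memory for Chromium processes out of `/proc` on Linux:

```python
import os
import re

def parse_vm_rss_kib(status_text):
    """Extract VmRSS (resident set size, in KiB) from a /proc/<pid>/status blob."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def chromium_rss_by_pid(proc_root="/proc"):
    """Map PID -> resident memory (KiB) for processes whose name contains 'chrom'."""
    rss = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(f"{proc_root}/{entry}/status") as fh:
                status = fh.read()
        except OSError:
            continue  # process exited between listdir and open
        name = re.search(r"^Name:\s+(\S+)", status, re.MULTILINE)
        if name and "chrom" in name.group(1).lower():
            rss[int(entry)] = parse_vm_rss_kib(status)
    return rss
```

In a Kubernetes pod this only sees processes in the same PID namespace, which is exactly what you want for per-session diagnostics.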
As I watched the agents, each one was effectively doing:
- Web search tool invocations
- piping results to python scripts to extract content
- aggregating content into summaries
At the end, the main agent collected all of the results into a final markdown file with its findings.
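The extraction step (pulling readable text out of fetched pages) can be sketched with nothing but the standard library. This is a toy illustration of the idea, not the actual scripts the agents wrote:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Reduce an HTML document to its visible text, joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

The agents pipe something like this over search results, then summarize the text that comes out the other side.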
Summary #
Of course, agents and LLMs are exceptionally confident and eager to please. So the first thing I always do is read through the summary as carefully as I can and try to pressure-test it. The first draft brought together 11 sections of content plus references. That’s overwhelming for someone who’s just trying to dip their toes in and learn more. A few of the sections stood out as more effective than the rest.
From there, I added one more prompt because I was overwhelmed, asking Claude to simplify things as much as possible and provide learning resources. The result was a guide to learning CDP, which was honestly more effective than anything else.
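To give a flavor of what CDP offers here: the protocol's Performance domain has a `getMetrics` call that returns a list of name/value pairs for a page. A minimal sketch of flattening that response into something you could ship to a metrics pipeline (the sample payload is made up, though its shape matches the real response):

```python
def flatten_cdp_metrics(result):
    """Turn a CDP Performance.getMetrics result into a flat name -> value dict."""
    return {m["name"]: m["value"] for m in result["metrics"]}

# Sample shaped like a real Performance.getMetrics response; values are invented.
sample = {
    "metrics": [
        {"name": "JSHeapUsedSize", "value": 12_582_912},
        {"name": "Documents", "value": 4},
        {"name": "Nodes", "value": 1523},
    ]
}
metrics = flatten_cdp_metrics(sample)
```

Actually issuing the call requires a WebSocket connection to a Chromium instance started with `--remote-debugging-port`, which is where a CDP client library comes in; the flattening is the easy part.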
Was it useful? #
This exercise got me to material much faster than I could have managed without Claude. The prompt I provided was too broad, and I ended up with about 500 lines of text that took me another hour or so to read through. The links it provided were the most valuable part, as they got me closer to where I wanted to be. It’s almost like I created a Wikipedia article on the subject that’s complete vibes. So overall: yes, it was useful in getting me material faster. No, I still don’t have a confident answer on how to collect better performance metrics from Chromium. What I do have is guided material I can use to spin up experiments and dig deeper, which leads well into how I use Claude to experiment and learn!
What’s changed #
My workflow for researching topics has completely shifted.
For my whole life, it’s generally been: have a problem => search the internet with an appropriate query => read through the top 5 results => repeat until satisfied.
Early search engines weren’t great, but were helpful enough.
Google reshaped the search engine market and held a dominant market share for almost 20 years.
Wikipedia and StackOverflow were my mainstays for anything science + computer science related.
HowStuffWorks was a great surface level resource for learning about general things.
Reddit was a place to find communities around niche topics.
Basically, I learned not just how to find things, but where reliable information could be found.
It now makes more sense why Google effectively ruined their search page with AI placement. ChatGPT was an existential threat, but in hindsight it wasn’t the one Google was worried about. Harnesses like Claude were. A majority of my day is now spent in the terminal, where I do most of my research without ever opening a web browser or going to Google directly.
Where to go from here #
So now I’m spending a considerable amount of time reformulating how I research and discover topics. LLMs and agents aren’t perfect, but neither are search engines or the internet. These are all tools that help you discover, but the onus is still on the human to decide what to do with that information. What’s changed is how fast I can gather information. My wetware is still the bottleneck, so I must figure out ways of keeping the two in sync.
What I’ve found that works for me though is:
- have a problem
- launch a new session in some harness (Claude, for example)
- stick to planning mode and work to gather as much information as possible
- distill information into some written markdown file
- iterate
The artifact produced from this step is then used as the basis for further work. It’s far from perfect, but that’s fine. Search engines and the internet are far from perfect too, but you use the tools the best way you can.