Hands-on with W&B and Coding Agents

Agent Auto Improving

Learn to develop, evaluate, and continuously improve AI agents using coding agents, W&B Weave, and W&B Skills. We call this learning content Agentforge because it is about forging stronger AI agents.


What is Auto Research?

Give an AI agent a real setup and let it experiment autonomously — modify, evaluate, keep or discard, repeat. You wake up to a log of experiments and a better system. Any metric you can efficiently evaluate can be autoresearched.
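The modify, evaluate, keep-or-discard, repeat cycle described above can be sketched as a small greedy loop. This is only a minimal illustration under stated assumptions: `auto_research`, `propose`, and `evaluate` are hypothetical names, not part of karpathy/autoresearch or any W&B API, and the toy metric stands in for whatever evaluation a real setup would run.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def auto_research(
    initial: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    n_rounds: int,
) -> Tuple[Config, float, List[dict]]:
    """Greedy modify/evaluate/keep-or-discard loop (hypothetical helper)."""
    best, best_score = initial, evaluate(initial)
    log: List[dict] = []
    for round_idx in range(n_rounds):
        candidate = propose(best)        # modify the current best setup
        score = evaluate(candidate)      # evaluate the modified setup
        kept = score > best_score        # keep only strict improvements
        if kept:
            best, best_score = candidate, score
        log.append(
            {"round": round_idx, "candidate": candidate, "score": score, "kept": kept}
        )
    return best, best_score, log


# Toy stand-ins (assumptions for illustration only): the "system" is a single
# temperature value, and the metric peaks when that value is 0.3.
rng = random.Random(0)


def evaluate(config: Config) -> float:
    return -((config["temperature"] - 0.3) ** 2)


def propose(config: Config) -> Config:
    return {"temperature": config["temperature"] + rng.uniform(-0.2, 0.2)}


initial = {"temperature": 1.0}
best, best_score, log = auto_research(initial, propose, evaluate, n_rounds=20)
print(f"kept {sum(e['kept'] for e in log)} of {len(log)} experiments")
```

In a real setup, `propose` would be a coding agent editing prompts or code and `evaluate` would be an evaluation run; the log of kept and discarded experiments is what you would read the next morning.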

"All LLM frontier labs will do this. It's the final boss battle."
— Andrej Karpathy

Sources: karpathy/autoresearch; original post on X

Courses

Choose a course to begin your learning journey.

AI Agent Quality Evaluation & Improvement with Agents

Use W&B Weave as the foundation for tracing, evaluation, and monitoring of AI Agents, and learn to improve them with Coding Agents and W&B Skills.

15 chapters

AI Agent Quality Evaluation & Improvement: Hands-on (coming soon)

Hands-on exercises and practical assignments for the AI Agent Quality course.

Who is this for?

Agent Optimization with Coding Agents

You use coding agents to build and optimize AI agents, and want a structured workflow for it.

From Human Eval to Automated Systems

You want to learn how to move from manual human evaluation to a scalable, automated evaluation framework.

Continuous Agent Improvement Teams

Your team wants a repeatable process for continuously improving agents in production.

W&B Weave Power Users

You want hands-on experience using Weave for evaluation, monitoring, and labeling of AI agents.

Auto Research Best Practices

You want to learn implementation and evaluation best practices for Auto Research agents.

Powered by W&B

This course uses W&B Weave and W&B Skills as core tools in every chapter.

W&B Weave

The observability and evaluation platform for AI applications.

Tracing — Record every agent action with full context
Evaluations — Structured quality assessments with datasets and scorers
Labeling — Human review workflows that feed into automated evaluation
Monitoring — Production dashboards, alerts, and regression detection
Feedback — Capture user signals and route them to improvement pipelines


W&B Skills

Reusable agent workflows for optimization and improvement.

Agent Optimization — Improve prompts, retrieval, and knowledge bases from evaluation data
Evaluation Schemes — Create scorers and evaluation pipelines
Weave Integration — Add tracing and monitoring to existing agents

