Event: Agent Camp Boston

I helped organize Agent Camp Boston which was an all-day in-person event where attendees showed up with laptops for a day of learning to build AI agents from a series of talks and labs. We had around 60 participants. We stayed caffeinated, hydrated, and fed from generous sponsorship from Daymark which is also celebrating their 25th birthday (happy bday Daymark!) – and thank you for supporting the Boston technology community.

This event was jointly offered by Global AI and Boston Azure AI communities.

The topics and other details of the day can be viewed on the Meetup signup page (https://www.meetup.com/bostonazureai/events/314980519/).

I wore two hats. One hat is as an event organizer – and the content above is me as an event organizer

In the remainder of this post I will switch to my speaker hat add some detail to two areas where I was speaking and running a lab. This was two pieces: (a) I was part of the AI stack and observations panel and (b) I created an MCP & x402 lab.

DESIGN.md and LEGIBLE DESIGN INSTRUCTIONS

I participated in a panel (with Jason and Ashwin). During the panel we were asked to describe our day-to-day tool stack for building with AI. Since it wasn’t so different from what many others are using, I won’t repeat most of my stack here, but will call out the one point that I think was most different: part of my stack is DESIGN.md which is a new way specify and encode a visual design system.

I was a speaker at the Boston UXPA 2026 conference last month, co-presenting the talk “Beyond ChatGPT: Your New Coworker is an AI Agent” (see my short writeup on LinkedIn here) and the audience was mostly product managers, product designers, and UX researchers. That audience all knew about design systems so the idea of DESIGN.md landed.

However at Agent Camp, it didn’t seem to land so clearly. So I asked the audience if they knew about design systems and got only 1 or 2 hands to go up out of around 50 people in the room.

First let me say what it is not. We can informally decide that architecture design will go into ARCHITECTURE.md and we know our agent will be able to read this and make sense of it. Let’s call this the “informal convention” pattern. Other examples of the informal convention pattern might be ADR.md (architecture decision records – have a look if you don’t know!), BUGS.md, and BACKLOG.md. These “informal conventions” are that there’s a bunch of related stuff in these files.

Even the ubiquitous AGENTS.md is mostly an informal convention since the content structure is not specified, but at least the name is officially agreed upon by the many AI tools.

You familiar with SKILL.md? This is different. The structure of an AI SKILL is formally specified and includes filesystem layout and the contents of the SKILL.md file itself. Front matter is specified, for example. This is definitely in the “formal specification” department.

A recent specification proposal from Google would add DESIGN.md to the “formal specification” category for AI Agents. This is what I mentioned at Agent Camp. DESIGN.md has a repo, a spec, and a structure (including YAML). Quoting from the spec:

A DESIGN.md file has two layers. The YAML front matter contains machine-readable design tokens — the precise values agents use to enforce consistency. The markdown body provides human-readable design rationale organized into ## sections. Prose may use descriptive color names (e.g., “Midnight Forest Green”) that correspond to systematic token names (e.g., primary). The tokens are the normative values; the prose provides context for how to apply them.

There is also a linter tool in the repo.

I should also point out that DESIGN.md has no doubt been used frequent as an informal convention where we write down our design thoughts, decisions, or guiding principles. The DESIGN.md I am talking about is different. It is definitely in the formal specification realm, with a recently open source’s spec. The DESIGN.md is structured and well defined. So don’t confuse “yeah, I’ve been using DESIGN.md for a year now!” from this new spec-defined approach.

In the language of the day: You can make your design system (which colors, fonts, and other visual elements are used where – and WHY) legible (legible means understandable to AI) so your AI tools can follow your standards when you ask it to create visual surfaces (surfaces are web apps, mobile apps, APIs, CLIs, MCP servers).

MCP / x402 LAB

I designed and ran a lab focused on Model Context Protocol (MCP) and the x402 payment protocol.

The MCP lab is described below. The repo for the lab is on GitHub at CrankingAI/ipfacts-lab and contains all the instructions, so the following is offered for additional illumination.

EXERCISE 1 (demonstrate first failure mode)

The lab comes preconfigured with an AI Agent written in python using Microsoft’s open source Agent Framework library. Your first task was to run the lab code “as is” to see what happens – asking the agent to research and provide some facts about an IP address you provide. If you run the exercise a few times with the same IP address you will find that the agent is not returning consistent results – or “hallucinating” as the AI community calls it. The remedy comes in the next step.

EXERCISE 2 (fix first failure mode)

The second task is to give the agent access to tools. This helps “ground” the agent with reliable data so it can consistently and accurately answer the questions. Rerunning the agent with the IP address from the first exercise now yields consistent, accurate results.

When we say “give the agent access to tools” we mean tools from an MCP server. MCP servers can do more than make tools available, but that’s the most common function and the one focused on in the lab.

The MCP tools and data are provided by IP Facts which is an experimental service designed to provide some basic information about IP addresses. Some example IP facts are:

  • IP version: IPv4 or IPv6
  • Public or Private (some IP ranges are intended for public use, others for intranet or other specialized uses)
  • It is a Tor Exit Node (Tor is a publicly available privacy-protecting tool – it is common for network security tools to care whether a connection is coming from Tor)
  • Is it hosted by AWS or Azure (Amazon and Azure and other public cloud providers publish their IP ranges)
  • Country of origin (two character ISO-3166 country codes – like “CL” for Chile, “IN” for India, or “US” for United States)

You can check out IP Facts interactively at ipfacts.com but the lab used the MCP server at https://mcp.sandbox.ipfacts.com/mcp – we’ll come back to why “sandbox” appears in that URL.

EXERCISE 3 (demonstrate second failure mode)

The third exercise asks the user to expand the job of the agent to give more details – specifically adding in the country code for that IP address.

Most of the above IP facts listed above are available in a single tool from the MCP server, but one particular fact – from which country does the IP address originate – comes from a second tool. And the lab sets a bit of a trap because it uses the x402 protocol to CHARGE THE AGENT FOR THE DATA.

In the lab the amount charged was 1/100th of one cent – which was expressed as 100 atomic = 0.0001 – where “atomic” is a term from the cryptocurrency/token world.

Non-payment flow: The flow when calling the tool that DOES NOT REQUIRE PAYMENT is very simple – the lab agent calls the tool over HTTP and useful results are returned with HTTP 200 status. The HTTP 200 status means SUCCESS. This was the tool used in the above step, but is not sufficient here since it does not include the country code for the IP.

Payment flow: The flow when calling the tool that DOES REQUIRE PAYMENT is a little more of a dance – the agent calls the tool over HTTP and no data results are returned but rather some payment requirements are returned with HTTP 402 status. The HTTP 402 status means PAYMENT REQUIRED. The included “payment requirements” include how much does this cost (1/100th of a USD cent in our case) and what are the accepted types of payment (USDC in our case).

USDC is “USD Coin” (where USD means US dollar) – a digital currency created and maintained by Circle. USDC is known as a “stable coin” because 1 USDC is essentially equal in value to a dollar in regular US (fiat) dollar. This makes it easy to transfer known amounts of money digitally – such as between AI agents. If instead we used Bitcoin or other “traditional” cryptocurrency, we’d have to calculate how much Bitcoin is needed to match our price. Bitcoin values vary a lot over time. It is definitely possible to exchange money with non-stable coins, but less complex with a stable coin.

Since the agent is not yet equipped to make payments, it fails (the country code is not produced). It sees the HTTP 402 status code and doesn’t know how to handle it.

An accurate nit might be that the flow in the Lab is not exactly as described above. I describe the conceptual flow here. See the lab code for exact details. One of the points made during the lab was that we made the choice for the 402 processing to be done in deterministic code (meaning not in the LLM). There’s a mechanism in the lab using a middleware “hook” provided by Agent Framework to handle 402 processing (according to the x402 standard) using deterministic code. Enabling this deterministic code is what EXERCISE 4 does.

EXERCISE 4 (fix second failure mode)

And remember the we’ll come back to why “sandbox” appears in that URL comment from above? A “sandbox” is a term from the cryptocurrency world and it means (technically) we are on a testnet – a shadow blockchain with no real value. In plain English, this means the lab did not use real money. Of course, this is what makes it ideal for a lab – it is not real money. But rest assured – the mechanisms used in the lab can also process mainnet (real money version) blockchains. Other than the safeguards in the lab to require the quote MUST be the Base Sepolia testnet, the changes to support real money are config values.

We fix the second failure mode by enabling support to handle the HTTP 402 status.

We can now see payments flowing from the agent’s wallet to the IP Facts wallet such as the publicly viewable wallet address for the IP Facts sandbox. The “0.0001” entries below match the 1/100th of a penny price put on the MCP tool that returns country code for the IP address. So this is a great historical record of how the lab participants exercised that MCP tool path. This is all testnet USDC (aka “fake” money) for testing. As stated elsewhere in this post, the CODE CAN ALSO WORK WITH REAL USDC; just that would have been a more complicated lab exercise.

Each of the “0.0001” entries above is from a lab participant buying data from the https://mcp.sandbox.ipfacts.com/mcp MCP server.

EXERCISE 5 and BEYOND

Further lab exercises are included in the CrankingAI/ipfacts-lab repo. We won’t cover them here, but one point worth making:

The agent and the x402 payment middleware never hardcodes MCP tool names or tool behaviors.

This is a big deal. Compare that to an API where names and params and details are all burned in tightly.

The labs do offer the possibility to explore this further – and I demonstrated it at one point by having the two tools SWAP NAMES but otherwise keep their functionality. And the lab agents still worked fine.

CONCEPTS & TERMINOLOGY

Background reading or elaboration on the ideas and terms referenced above.

Design & agent files

Design systems (Nielsen Norman Group): https://www.nngroup.com/articles/design-systems-101/
DESIGN.md: https://github.com/google-labs-code/design.md
AGENTS.md: https://agents.md
Agent Skills / SKILL.md: https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview
Architecture Decision Records (ADR): https://adr.github.io

Agents, MCP & the framework

Model Context Protocol (MCP): https://modelcontextprotocol.io
Microsoft Agent Framework: https://github.com/microsoft/agent-framework

x402 & payments

x402 payment protocol: https://x402.org
x402 (deeper / spec): https://github.com/coinbase/x402
HTTP 402 Payment Required (MDN):
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402
USDC & stablecoins (Circle): https://www.circle.com/usdc
Base Sepolia testnet (Base docs): https://docs.base.org

Add value to your USDC testnet wallet: https://faucet.circle.com/
View your USDC balance using the explorer: https://sepolia.basescan.org (example link to transactions view of https://mcp.sandbox.ipfacts.com wallet)

Networking terms

Tor / Tor exit: https://www.torproject.org
ISO 3166 country codes (Wikipedia): https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

AI Terminology

Legible: machine-readable / understandable to an AI agent. Our lab makes certain facts about IP addresses “legible to our AI agent” so it can reason about them.

Surface: a realization of product functionality such as via web app, mobile app, API, CLI, or MCP server – here is the Web App surface for IP Facts (at https://ipfacts.com) in action:

Hallucination: https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)
Grounding: https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview
AI agent: https://en.wikipedia.org/wiki/Intelligent_agent
Tool use / tool calling: https://docs.claude.com/en/docs/agents-and-tools/tool-use/overview
Large language model (LLM): https://en.wikipedia.org/wiki/Large_language_model
Embedding model: https://en.wikipedia.org/wiki/Sentence_embedding
Vector database: https://en.wikipedia.org/wiki/Vector_database
Semantic similarity: https://en.wikipedia.org/wiki/Semantic_similarity

SOURCE CODE

  • The lab exercises are in the CrankingAI/ipfacts-lab repo.
  • I briefly showed the Vector Playground app – a tool for comparing text strings for semantic similarity using an embedding model which is an AI model, but is not a general purpose large language model (LLM). This is the key technology behind vector databases. Run it at vectorplayground.com and source code is in the CrankingAI/vectorplayground repo.

PRESENTATION

My lab was mostly demos, but here is the deck I used to set it up.

CONNECT

Connect with Bill

Connect with Boston Azure AI

Connect with Agent Framework Dev Project

OTel Traces for the Win

This post was inspired by my live presentation of Making Agents Work talk at Boston Code Camp #40, which included an OTel demo snafu.

Image stolen from Bala Subra – https://x.com/bsubra/status/2037887079804248504?s=20

Quoting from https://opentelemetry.io/ – “OpenTelemetry is an open source observability framework for cloud native software. It provides a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application.”

Here we focus on Open Telemetry – or OTel for short – Traces.

The Anemic OTel Trace Antipattern

Due to an error in my demo prep, I ended up showing sparse OTel Traces – definitely not producing meaningful telemetry so observability will be subpar (or terrible). I am calling this the Anemic OTel Trace Antipattern. This antipattern comes through in the four screenshots that follow. The first screenshot shows the overall traces view listing one trace-per row. This is actually fine and normal as these are reasonable top-level traces. But drilling into any of these individual traces revealed no nesting and no context.

Top-level OTel Traces – shown using Aspire on local machine (note the localhost URL)
Click on “functions: RunJob” trace

Click on “functions: POST api/jobs” trace – this is the detail after clicking on the one trace row
Click on “functions: GET api/jobs” trace – this is the detail after clicking on the one trace row

The Flat Trace OTel Trace Antipattern

Consider the traces below. If GetJob is triggered by an HTTP GET to the jobs endpoint, then my suggestion is they should be nested – GetJob under the corresponding HTTP GET /jobs/guid. As shown below they are flat, appearing as siblings rather than hierarchical. This is another OTel Trace Antipattern – let’s call it the Flat Trace OTel Trace Antipattern. We have this great “Trace”/”Span” nesting support, but still our signals look like old-school flat log entries. Definitely not optimal!

Properly Nested Traces and Spans

Let’s tighten up terminology. An OTel Trace represents the complete journey of a request through a system, and it is made up of one or more Spans that form a (logically nested) tree where each span is a unit of work. Within a trace it can make sense that some spans are siblings and others nested – it should mimic the actual flow through the system. The tree is reconstructed by following parent_span_id references. A trace can span multiple services (distributed tracing for the win!). Each service creates its own spans, propagating the Trace ID and parent Span ID via context propagation headers (e.g., traceparent in W3C Trace Context). Each Span in a Trace will share the same trace_id but have its own unique span_id.

So, using our vocabulary from above, the remedy for Anemic is to add more spans, and the remedy for Flat is to reuse span parents – passing them down to child processes rather than creating new spans.

With proper span structure, here is the SAME application again, except with OTel Traces and Spans more thoughtfully configured.

Now I can click on any of these and there will be spans nested within. You can tell the number and types of spans from the Spans column. The following span is from when the job was submitted: starts with an HTTP POST, stores some stuff in an Azure Storage Blob, creates a message in an Azure Storage Queue, then returns an HTTP 202 STATUS (“Accepted”) with a JobId.

Note above that movie we are requesting to assess is “best picture winner from 1988” – which is not a movie name you’ll find on IMDB. But a human will at least know what you mean. As will an LLM.

Now let’s double-click on the “RunJob” trace for the same movie – this is also around 20 seconds after the job was created since processing is asynchronous – and starts when our movie makes it to the front of the Azure Storage Queue queue:

Since we have an AI Agent, the movie request we made earlier (via HTTP POST) was asking to assess “best picture winner from 1988” and the name of the movie actually assessed was “Rain Man” as you can see. AI is working for us. For visibility in our monitoring and debugging, we added those details as properties in the OTel span. The helps us know exactly which business operation we are looking at when we review the telemetry.

Here’s one more span from the RunJob trace, this one showing some OTel Semantic Conventions for GenAI – the gen_ai.request_model and other span properties – but see also the next section for more on this:

OTel GenAI Semantic Conventions in action ☝

Finally, here’s the trace for that same movie request being retrieved after processing has completed:

OTel GenAI Semantic Conventions

OTel drives consistency across solutions and vendors by specifying semantic conventions. Specifically for GenAI they specify many identifiers (see example above – two screenshots ago). In this screenshot you can see a bunch of them in action. For more information, check out these resources:

Grabbed from Traces view in Aspire (click on a row, this appears in the right-most pane)

Source Code

Presentation

  • PowerPoint Deck is here:

Connect

Connect with Bill

Connect with Boston Azure AI

Connect with Agent Framework Dev Project

Talk: Making Agents Work – Boston Code Camp #40

I had the opportunity (28-Mar-2026) to present at the 40th running of Boston Code Camp. Thank you to the incredible pros running these events, twice yearly, making it happen for a grateful greater-Boston tech community.

Image stolen from Bala Subra – https://x.com/bsubra/status/2037887079804248504?s=20

Thank You to the Speakers, Sponsors, and Organizers

Thank you to all the speakers:

Anirban Tarafder · Bala Subra · Bill Wilder · Bob German · Bryan Hogan · Chris Seferlis · Cole Flenniken · Dave Davis · Dave Finn · Dekel Cohen Sharon · Fnu Tarana · Gleb Bahmutov · Harry Kimpel · Jason Haley · Jeff Blanchard · Jesse Liberty · Jim Wilcox · John Miner · Joseph Parzel · Josh Goldberg · Juan Pablo Garcia Gonzalez · Keith Fitts · Matt Ferguson · Matthew Norberg · Michael Mintz · Pavan Kumar Kasani · Richard Crane · Sunil Kadimdiwan · Taiob Ali · Ty Augustine · Udaiappa Ramachandran · Varsham Papikian · Vijaya Vishwanath · Viswa Mohanty

And thank you sponsors:

Hosting: Microsoft · Gold: MILL5 · Silver: Pulsar Security · Progress Telerik · Triverus · Brightstar · In-kind: Sessionize

Making Agents Work

My session was Making Agents Work which highlighted some of “the boring side” of building an AI Agent – but these boring details can be super-valuable. The talk was inspired by work I did in my day job as CTO at Open Admissions. I am using an AI Agent to scale a 30 year-old methodology that can be used to help people understand themselves better and use those insights to choose a more aligned college, major, job, or other consequential life decision. Doing this with an AI Agent is a huge responsibility and, as I shared, putting together the initial agent was the easy part – being confident it is consistent, accurate, well behaved, robust if attacked or misused – but still easy to use – that was the hard and boring part!

Image stolen from Bala Subra – https://x.com/bsubra/status/2037887079804248504?s=20

The talk uses a different AI Agent – a simple one that accepts a movie and returns a rating summary – to illuminate some of the points. For example, it uses Agent Framework and has a fan-out/fan-in workflow in the internal agent architecture, uses Microsoft Foundry, a modern tech stack, and Azure Monitor for OTel-aligned Observability.

The full description, link to github repo, and slides follow.

But first, please find some elaboration on OTel Traces, inspired by my OTel demo snafu at the live event. That blog post is here: https://blog.codingoutloud.com/2026/03/30/otel-traces-for-the-win/

OTel Traces for the Win

Speaking of OTel… Due 100% to user error (that would be me!), the demo I had prepared to show the incredible power of OTel had a technical glitch. So I have attempted to remedy that with a blog post I’m calling OTel Traces for the Win. So please hop over there if you are interested.

Making Agents Work – the official talk description

Building more powerful AI Agents seems to be getting easier by the day. They are powered by incredible models, have access to tools, and can work in teams. But how can we have confidence in non-deterministic systems that make consequential decisions?

This talk explores four approaches for building that confidence.

1. Observability platforms – You can’t improve what you can’t see. We’ll explore tools that make the hard-to-see stuff visible.

2. Evals (evaluations) – Moving beyond LGTM (looks good to me), evals wrap agents in formal testing structures to measure accuracy, consistency, and edge case handling – both before and after your Agent goes live.

3. Safety guardrails – Content filtering, PII detection, and hallucination detection from both platform vendors and standalone models. Let’s see how they fit into your agent stack.

4. Selective determinism – Sometimes we make better AI solutions by knowing when NOT to use AI. We will discuss mixing in deterministic logic with our non-deterministic behaviors.

Concepts are platform-agnostic, but demos will use Microsoft Foundry and the Agent Framework (currently in preview). (In case you haven’t been following along, Microsoft Foundry was previously know as known as Azure AI Foundry, and before that was Azure AI Studio. And Agent Framework is the next generation of both Semantic Kernel and AutoGen.)

Target audience: Those new to building production agent systems seeking approaches beyond the “hello world” tutorials – which described me not too long ago.

Source Code

Presentation

  • The slides I presented are here:

Connect

Connect with Bill

Connect with Boston Azure AI

Connect with Agent Framework Dev Project

Talk: Making Agents Work – Memphis AgentCamp

I had the opportunity (16-Mar-2026) to present at Memphis AgentCamp. Thank you Doug Starnes for a great event!

The description, link to github repo, and slides follow.

Making Agents Work

Building more powerful AI Agents seems to be getting easier by the day. They are powered by incredible models, have access to tools, and can work in teams. But how can we have confidence in non-deterministic systems that make consequential decisions?

This talk explores four approaches for building that confidence.

1. Observability platforms – You can’t improve what you can’t see. We’ll explore tools that make the hard-to-see stuff visible.

2. Evals (evaluations) – Moving beyond LGTM (looks good to me), evals wrap agents in formal testing structures to measure accuracy, consistency, and edge case handling – both before and after your Agent goes live.

3. Safety guardrails – Content filtering, PII detection, and hallucination detection from both platform vendors and standalone models. Let’s see how they fit into your agent stack.

4. Selective determinism – Sometimes we make better AI solutions by knowing when NOT to use AI. We will discuss mixing in deterministic logic with our non-deterministic behaviors.

Concepts are platform-agnostic, but demos will use Microsoft Foundry and the Agent Framework (currently in preview). (In case you haven’t been following along, Microsoft Foundry was previously know as known as Azure AI Foundry, and before that was Azure AI Studio. And Agent Framework is the next generation of both Semantic Kernel and AutoGen.)

Target audience: Those new to building production agent systems seeking approaches beyond the “hello world” tutorials – which described me not too long ago.

Connect with Bill and Boston Azure AI

Talk: AI Chatbot → Agent with Model Context Protocol

I had the opportunity (22-Nov-2025) to present at the 39th running of Boston Code Camp since started in 2003. Some links and notes and comments below.

First, thank you to the organizers, sponsors, and speakers who have been making this possible since 2003!

MCP – Model Context Protocol – is coming up on its first birthday and adoption is currently on 🔥 fire 🔥 accelerating the creation and adoption of new MCP servers.

Photo above from Robert Hurlbut’s LinkedIn post.

Anthropic’s original MCP specification:

Tools and Libraries for building, testing, and consuming MCP servers:

Registries of MCP Servers (these are a couple of examples of reputable ones, but be cautious about any registries, especially rando registries out there!):

Photo above courtesy of Udaiappa Ramachandran (who runs https://www.meetup.com/nashuaug/).

Talk description:

Agency is the capacity to act autonomously, make choices, and shape outcomes. The Model Context Protocol (MCP) brings this agency to AI systems at scale.

In this session, we’ll explain the gap MCP fills, highlight key use cases, and explore the rapidly growing ecosystem of tools and marketplaces. We’ll demonstrate MCP in action and walk through how an MCP tool is built and deployed.

You’ll leave knowing what MCP is, why it matters, and how it connects systems and data to make AI more effective – and more agentic. And as Spider-Man reminds us, with great power comes great responsibility: we’ll close by looking at the risks and governance challenges.

Above photo from Veronika Kolesnikova’s post.

I had the opportunity (22-Nov-2025) to present at the 39th running of Boston Code Camp since started in 2003.

And the deck is here:

Connect with Bill and Boston Azure AI

Talk: Human Language is the New UI. How this is possible?

I had the opportunity (15-Aug-2025) to talk to Azure Tech Group Bangladesh about how human language has become the new UI as part of their ML Summer School BD program. The talk was recorded and posted to YouTube.

The tool used in demos to illustrate an embedding model in action can be found at:

funwithvectors.com.

And the deck is here:

Connect with Bill and Boston Azure AI

GitHub Copilot Agent Mode for the Win: I added a new Tool to MCP Server with Single Prompt

Along with fellow panelists Jason Haley, Veronika Kolesnikova (the three of us run Boston Azure AI), and Udaiappa Ramachandran (he runs Nashua Cloud .NET & DevBoston), I was part of a Boston Azure AI event to discuss highlights from Microsoft’s 2025 Build conference. I knew a couple of the things I wanted to show off were GitHub Copilot Agent mode and hosting Model Context Protocol (MCP) tools in Azure Functions.

What I didn’t realize at first was that these would be the same demo.

I started with a solid sample C#/.NET MCP server ready to be deployed as an Azure Function (one of several languages offered). The sample implemented a couple of tools and my goal was to implement an additional tool that would accept an IP address and return the country where that IP address is registered. The IP to country code mapping functionality if available as part of Azure Maps.

I started to hand-implement it, then… I decided to see how far GitHub Copilot Agent mode would get me. I’ve used it many times before and it can be helpful, but this ask was tricky. One challenge being that there was IaC in the mix: Bicep files to support the azd up deployment, AVM modules, and many code files implementing the feature set. And MCP is still new. And the MCP support within Azure Functions was newer still.

Give GitHub Copilot Agent a Goal

The first step was to give the GitHub Copilot Agent a goal that matches my needs. In my case, I gave Agent mode this prompt:

The .NET project implements a couple of Model Context Protocol (MCP) tools – a couple for snippets and one that says hello. Add a new MCP tool that accepts an IPv4 IP address and returns the country where that IP address is registered. For example, passing in 8.8.8.8, which is Google’s well-known DNS server address, would return “us” because it is based in the USA. To look up the country of registration, use the Azure Maps API.

And here’s what happened – as told through some screenshots from what scrolled by in the Agent chat pane – in a sequence that took around 12 minutes:

I can see some coding progress along the way:

A couple of times the Agent paused to see if I wanted to continue:

It noticed an error and didn’t stop – it just got busy overcoming it:

It routinely asked for permissions before certain actions:

Again, error identification – then overcoming errors, sometimes by getting more up-to-date information:

Second check to make sure I was comfortable with it continuing – this one around 10 minutes after starting work on the goal:

In total 9 files were changed and 11 edit locations were identified:

Deploy to Azure

Using azd up, get it deployed into Azure.

Add MCP Reference to VS Code

Once up and running, then I installed it in VS Code as a new Tool – first click on the wrench/screwdriver:

Then from the pop-up, scroll the the bottom, then choose + Add More Tools…

Then follow the prompts (and see also instructions in the GitHub repo):

Exercise in VS Code

Now that you’ve added the MCP server (running from an Azure Function) into the MCP host (which is VS Code), you can invoke the MCT tool that accepts an IP and returns a country code:

domain-availability-checker% dig A en.kremlin.ru +short
95.173.136.70
95.173.136.72
95.173.136.71
domain-availability-checker%

Using the first of the three returned IP addresses, I ask within the Agent chat area “where is 95.173.136.70 located?” – assuming that the LLM used by the chat parser will recognize the IP address – and the need for a location – and figure out the right MCT tool to invoke:

I give it one-time permission and it does its thing:

Victory!

Check Code Changes into GitHub

Of course, using GitHub Copilot to generate a commit message:

Done!

Connect with Bill and Boston Azure AI

Talk: Empowering AI Agents with Tools using MCP

Last night I had the pleasure of speaking to two simultaneous audiences: Nashua Cloud .NET & DevBoston community tech groups. The talk was on Model Context Protocol (MCP) which, in a nutshell, is the rising star for answering the following question: What’s the best way to allow my LLM to call my code in a standard way?

There is a lot in that statement, so let me elaborate.

First, what do you mean by “the best way to allow my LLM to call my code” — why is the LLM calling my code at all? Don’t we invoke the LLM via its API, not the other way around? Good question, but LLMs can actually invoke your code. Because this is how LLMs are empowered to do more as AI Agents. Think about an AI Agent as an LLM + a Goal (prompts) + Tools (code, such as provided by MCP servers). The LLM uses the totality of the prompt (system prompt + user prompt + RAG data + any other context channeled in via prompt) to understand the goal you’ve given it then it figures out which tools to call to get that done.

In the simple Azure AI Agent I presented, its goal is to deliver an HTML snippet that follows HTML Accessibility best practices in linking to a logo it tracks down for us. One of the tools is web search to find the link to the logo. Another tool validates that the proposed link to the logo actually resolves to a legit image. And another tool could have been to create a text description of the image, but I made the design choice to leave that up to the Agent’s LLM since it was multimodel. (My older version had a separate tool for this that used a different LLM than the one driving the agent. This was an LLM with vision capabilities – which is still a reasonable idea here for multiple reasons, but kept it simple here.)

Second, what do you mean by “in a standard way” – aren’t all LLMs different? It is actually the differences between LLMs that drives the benefits of a standard way. It has been possible for a while to allow your LLM to call out to tools, but there were many ways to do this. Now doing so according to a cross-vendor agreed-upon standard, which MCP represents, lowers the bar for creating reusable and independently testable tools. And marketplaces!

Remember many challenges remain ahead. There are a few others in the deck, but here are two:

First screenshot reminds that there are limits to how many MCP tools an LLM (or host) can juggle; here, GitHub Copilot currently is capping at 128 tools, but you can get there quickly!

Second screenshot reminds that these are complex operational systems. This “major outage” (using Anthropic’s terminology) was shortly before this talk so complicated my planned preparation timel. But it recovered before the talk timeslot. Phew.

Connect with Bill and Boston Azure AI

Links from the talk

  1. Assorted Cranking AI resources ➞ https://github.com/crankingai
  2. Code for the Agent ➞ https://github.com/crankingai/logo-agent
  3. Code for the Logo Validator MCP tool ➞ https://github.com/crankingai/logo-validator-mcp
  4. Code for the Brave Web Search MCP tool ➞ https://github.com/crankingai/brave-search-mcp
  5. Images I used in the example ➞ https://github.com/crankingai/bad-images (https://raw.githubusercontent.com/crankingai/bad-images/refs/heads/main/JPEG_example_flower-jpg.png)

Anthropic status page ➞ https://status.anthropic.com/ (see screenshot above).

Model Context Protocol (MCP) Resources

Standards & Cross-vendor Cooperation

SDKs & Samples

MCP Servers & Implementations

Popular MCP Servers

  • GitHub MCP Server – GitHub’s official MCP server that provides seamless integration with GitHub APIs for automating workflows, extracting data, and building AI-powered tools. In case you’d like to create a Personal Access Token to allow your GitHub MCP tools to access github.com on your behalf ➞ https://github.com/settings/personal-access-tokens
  • Playwright MCP Server – Microsoft’s MCP server that provides browser automation capabilities using Playwright, enabling LLMs to interact with web pages through structured accessibility snapshots.
  • MCP Servers Repository – Collection of official reference implementations of MCP servers.
  • Popular MCP Servers Directory – Curated list of popular MCP server implementations.

MCP Inspector Tool ➞ Check this out for sure

Download the deck from the talk ➞

Talk: Human Language is the new UI. How does this work? at the AI Community Conference – AICO Boston event! #aicoevents

The organizers of the AI Community Conference – AICO Boston event did an incredible job. The conference was first-rate and I really enjoyed engaging with attendees and speakers, while learning from everyone.

I delivered a new iteration of my talk on how it is possible to have Human Language as the new UI, thanks to LLMs and Embedding models. There was an engaged and inquisitive group! The resources I used during the presentation, including my deck, are all included below.

Connect with Bill or other related resources:

Links from the talk:

  1. Assorted Cranking AI resources ➞ https://github.com/crankingai
  2. The funwithvectors.com app used in the talk ➞ https://funwithvectors.com and OSS repo
  3. The repo with code for the “next-token” project that I used to show how tokens have probabilities and how they are selected (and can be influenced by Temperature and Top-P which is also known as nucleus sampling) ➞ https://github.com/crankingai/next-token
  4. The OpenAI Tokenizer shown in the talk ➞ https://platform.openai.com/tokenizer/

The deck from the talk:

  1. The deck from the talk ➞

Talk: Human Language is the new UI. How is this possible? at Memphis Global AI Community Bootcamp event!

Earlier today I spoke at the Memphis edition of the Global AI Bootcamp 2025 hosted by the Memphis Technology User Groups. My talk was “Human Language is the new UI. How is this possible?” and resources and a few notes follow. Thank you Douglas Starnes for organizing! It was similar to, but not identical to, the recent talk I gave. And next time it will be different again. 😉

This is from the https://funwithvectors.com app I used to show vectors in action:

┃┃┃┃┃┃┃┃┃┃┃┃┃······· ⟪0.64⟫ → ‘doctor’ vs ‘physician’
┃┃┃┃┃┃┃┃┃┃┃┃┃······· ⟪0.67⟫ → ‘doctor’ vs ‘dr.’
┃┃┃┃┃┃┃┃┃┃·········· ⟪0.48⟫ → ‘physician’ vs ‘dr.’

The above is intended to illustrate the non-transitive nature of the “nearness” of two vectors. Just because “doctor” & “physician” are close and “doctor” & “dr.” are close does NOT mean “dr.” & “physician” are as close.

Connect with Bill or other related resources:

Links from the talk:

  1. Cranking AI resources (including source to funwithvectors.com app) ➞ https://github.com/crankingai
  2. The funwithvectors.com app used in the talk ➞ https://funwithvectors.com
  3. The OpenAI Tokenizer shown in the talk ➞ https://platform.openai.com/tokenizer/

The deck from the talk:

  1. The deck from the talk ➞ https://blog.codingoutloud.com/wp-content/uploads/2025/04/memphisglobalai-humanlanguageisnewui-25-apr-2025_pub.pptx