Anthropic's Agent-on-Agent Commerce Marketplace Experiment

1. Anthropic tests AI-driven commerce marketplace dynamics
2. Overview of Anthropic’s Marketplace Experiment
3. Details of Project Deal
3.1 Participant Structure
3.2 Budget Allocation
4. Marketplace Dynamics and Transactions
4.1 What this pilot can (and can’t) tell us
4.2 Types of Marketplaces
4.3 Transaction Volume and Value
5. Impact of Agent Quality on Outcomes
5.1 Comparison of Agent Models
5.2 Perceived Fairness Among Users
6. User Experience and Satisfaction Levels
7. Implications for Future Commerce

Anthropic tests AI-driven commerce marketplace dynamics

Internal Pilot Deal Outcomes
– Participants: 69 Anthropic employees (internal pilot)
– Budget: $100 per participant (gift cards)
– Activity: 186 completed deals
– Value: >$4,000 total deal value
– Setup: 4 parallel marketplaces (1 “real” + 3 study markets)
(Reported by Anthropic’s Project Deal write-up and TechCrunch’s coverage.)

Overview of Anthropic’s Marketplace Experiment

Anthropic’s “Project Deal” is a small but telling preview of what happens when AI systems stop being mere shopping assistants and start acting as economic actors. In the experiment, Anthropic created a classified-style marketplace in which AI agents represented humans on both sides of a transaction—buyers and sellers—negotiating in natural language and closing deals involving real goods and real money.

The setup was intentionally grounded in everyday commerce rather than abstract simulations. Participants listed personal items, negotiated prices, and ultimately honored agreements after the experiment ended. In other words, this wasn’t a toy model where nothing mattered; it was a controlled environment with real incentives and real follow-through.

Anthropic described the test as a pilot with a self-selected pool. That caveat matters: a workplace environment is relatively high-trust, and the stakes are modest compared with open internet marketplaces. Still, the company said it was “struck by how well Project Deal worked,” suggesting that the basic mechanics—matching, bargaining, agreement, and settlement—can be handled by agents without rigid protocols.

The experiment also aimed at a deeper question: if agents become the interface to commerce, what happens when some people have better agents than others? Project Deal’s early results point to a future where “agent quality” becomes a new axis of advantage—possibly invisible to the people losing out.

Internal Pilot Transaction Context
What this was: an internal, time-boxed pilot where Claude-based agents negotiated real secondhand transactions between coworkers, with deals honored afterward.
What this wasn’t: a public-market launch, a fraud-resistance test, or proof that the same outcomes will hold in open, adversarial marketplaces.
Freshness note: The results discussed here were publicly reported in late April 2026, based on Anthropic’s write-up and contemporaneous coverage.

Details of Project Deal

Project Deal was structured to test agent-mediated commerce end-to-end: listing, discovery, negotiation, agreement, and execution. Anthropic ran the experiment internally, using employees as participants and Claude-based agents as their representatives.

The company ran four separate marketplaces with different model configurations. One was described as “real,” meaning deals were honored and exchanges happened after the experiment. The other three were run for study, allowing Anthropic to compare outcomes under different agent conditions.

The marketplace itself functioned like a classifieds board, hosted in a controlled environment (Slack-based, per reporting and summaries). Where this article cites specific counts, ratings, or model comparisons, those figures are attributed to Anthropic’s write-up and TechCrunch’s reporting rather than inferred from the experiment’s existence alone. Items ranged widely—examples cited include snowboards and ping-pong balls—mirroring the messy variety of real secondhand markets where value is subjective and information is incomplete.

Blinded Multi-Market Negotiation Flow
1) Intake: participants share what they’re selling/buying, target prices, and negotiation preferences.
2) Agent setup: each participant is represented by a Claude-based agent; in some runs, model assignment is blinded to the participant.
3) Market structure: four parallel markets run (one “real” market where deals are honored; three study markets to compare conditions).
4) Live negotiation: agents post listings, discover items, message counterpart agents, and negotiate in natural language.
5) Agreement + settlement: once agents agree, deals are recorded; after the experiment, humans complete the exchange and payment using the provided budget.
Checkpoints that matter: blinding (reduces expectation effects), honoring deals (keeps incentives real), and parallel markets (enables model-to-model comparisons).
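The live-negotiation and agreement steps above can be pictured with a very small numeric stand-in. The sketch below is not Anthropic's implementation: the real agents were Claude models bargaining in natural language, whereas the `Agent` fields, the fixed concession `step`, and the split-the-difference settlement rule here are all hypothetical simplifications chosen only to make the loop concrete.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical rule-based stand-in for an LLM negotiating agent."""
    name: str
    limit: float    # seller: lowest acceptable price; buyer: highest acceptable price
    opening: float  # first ask (seller) or first bid (buyer)
    step: float     # how much the agent concedes per round


def negotiate(seller: Agent, buyer: Agent, max_rounds: int = 10) -> dict:
    """Alternate concessions until the bid meets the ask or rounds run out."""
    ask, bid = seller.opening, buyer.opening
    for round_no in range(1, max_rounds + 1):
        # Agreement check happens at the start of each round.
        if bid >= ask:
            price = round((ask + bid) / 2, 2)  # assumed rule: split the difference
            return {"status": "agreed", "price": price, "rounds": round_no}
        # Each side concedes by its step, but never past its own limit.
        ask = max(seller.limit, ask - seller.step)
        bid = min(buyer.limit, bid + buyer.step)
    return {"status": "no_deal", "price": None, "rounds": max_rounds}


if __name__ == "__main__":
    seller = Agent("seller-agent", limit=40.0, opening=60.0, step=5.0)
    buyer = Agent("buyer-agent", limit=50.0, opening=30.0, step=5.0)
    print(negotiate(seller, buyer))
```

A deal only closes when the two limits overlap. If the buyer's ceiling sits below the seller's floor, the loop ends in `no_deal`, which is the numeric analogue of a listing that never sells.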

Participant Structure

Project Deal involved 69 Anthropic employees. Each participant was represented by an AI agent in the marketplace, acting on their behalf as both buyer and seller. That dual role is important: it forces agents to handle not just “getting a bargain,” but also maximizing value when selling—two objectives that often require different negotiation tactics.

Participants were self-selected, which likely skewed the pool toward people curious about AI and willing to tolerate some friction or odd behavior from an experimental system. That makes the results less generalizable to the public, but it also provides a realistic early-adopter lens—often the first group to shape how new marketplace mechanics evolve.

Anthropic’s design also included blinding around agent capability in at least some runs: participants did not know which model represented them. This matters because it reduces expectation effects—people can’t easily attribute outcomes to “my agent is weaker” if they don’t know what they have.

Agents were given instructions based on participant input, including preferences such as desired prices and negotiation style. Negotiations were conducted in natural language rather than through a fixed bidding protocol, pushing agents into the ambiguity that defines most real bargaining.

Budget Allocation

Each of the 69 participants received a budget of $100, paid out via gift cards, to buy items from coworkers. This created a hard constraint that mimics real consumer limits: agents couldn’t simply “win” every negotiation by paying more, because funds were finite.

The budget also served as a built-in safety rail. With relatively small amounts at stake, the experiment could test real-money dynamics without exposing participants to large losses. At the same time, the money was real enough to make trade-offs meaningful: overpaying for an item reduces what you can buy later; underselling leaves value on the table.

Anthropic reported that the marketplace produced more than $4,000 in total deal value. Given the $100-per-person budget, that aggregate figure signals that participants actively used the system rather than treating it as a novelty. It also suggests that agents were able to find matches and close transactions at a pace that made the market feel “liquid” enough to function.
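For a sense of scale, the reported figures can be put side by side. This is back-of-the-envelope arithmetic only. It assumes the participant count, budget, and deal value all describe the same market run, and it treats ">$4,000" as a lower bound, so the outputs are floors rather than exact values.

```python
# Figures as reported in the article; DEAL_VALUE_FLOOR is a lower bound (">$4,000").
PARTICIPANTS = 69
BUDGET_PER_PARTICIPANT = 100  # USD, paid out as gift cards
DEAL_VALUE_FLOOR = 4_000      # USD

total_budget = PARTICIPANTS * BUDGET_PER_PARTICIPANT
value_per_participant = DEAL_VALUE_FLOOR / PARTICIPANTS
share_of_budget = DEAL_VALUE_FLOOR / total_budget

print(f"Total budget issued: ${total_budget:,}")
print(f"Deal value per participant: at least ${value_per_participant:.2f}")
print(f"Deal value relative to budget issued: at least {share_of_budget:.0%}")
```

The ratio is a rough indicator of engagement, not a measure of how much of each individual budget was spent.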

Notably, Anthropic also observed that the initial instructions given to agents did not appear to affect sale likelihood or negotiated prices. If that finding holds up, it implies that model capability and situational dynamics may dominate over prompt-level “style” tweaks—an important lesson for anyone assuming that better prompting alone can equalize outcomes.

Marketplace Dynamics and Transactions

What this pilot can (and can’t) tell us

Project Deal was a one-week, internal, self-selected pilot among coworkers, with modest budgets and a relatively high-trust setting. That makes it a strong feasibility signal for agent negotiation and settlement mechanics, but a weak proxy for open marketplaces where fraud, adversarial bargaining, identity risk, and higher stakes dominate outcomes.

Project Deal’s most concrete proof point is simple: deals happened, repeatedly, and at non-trivial volume. For a one-week internal pilot, that’s enough activity to reveal patterns—especially around negotiation performance and user perception.

The experiment also highlights a subtle shift in what a “marketplace” is. In traditional classifieds, humans do the searching, messaging, bargaining, and scheduling. Here, agents did much of that work, compressing the time and attention required from participants. If that model scales, marketplaces may increasingly optimize for agent-to-agent interoperability rather than human browsing.

At the same time, Project Deal took place in a controlled, relatively cooperative environment. That likely reduced adversarial behavior, fraud attempts, and strategic manipulation—factors that dominate many public marketplaces. The results therefore demonstrate feasibility, not full readiness for open deployment.

Pilot Strengths and Limits
What the pilot supports with confidence:
– Agents can complete the negotiation loop (discover → message → bargain → agree) with real incentives.
– Parallel markets can surface measurable outcome differences by model capability.
What likely won’t generalize without additional design:
– Fraud resistance (scams, chargebacks, fake listings) in open networks.
– Adversarial negotiation tactics and manipulation at higher stakes.
– Identity, reputation, and dispute resolution when counterparties aren’t coworkers.
How to read the numbers responsibly: treat them as a feasibility and “directional inequality” signal, not as a forecast of public-market conversion rates or safety.

Types of Marketplaces

Of the four marketplaces Anthropic ran, one was the “real” market, where participants were represented by the company’s most advanced model and deals were actually honored after the experiment. The other three markets were used for study, enabling comparisons across conditions.

This multi-market design matters because it treats marketplaces as experimental systems rather than monoliths. By varying which models represented users, Anthropic could observe how agent capability changes outcomes while keeping the overall environment similar.

The marketplace format was classified-style: participants listed items and negotiated directly. That’s a useful testbed because it requires agents to handle unstructured listings, subjective valuations, and back-and-forth bargaining—tasks that are harder than fixed-price checkout flows.

It also hints at where agent commerce may first take hold: environments where negotiation is common, inventory is heterogeneous, and the “cost” of human time is high relative to the value of the item. Secondhand markets fit that profile well.

Transaction Volume and Value

Anthropic reported 186 deals made, totaling more than $4,000 in value. Those numbers provide a baseline for what “working” looks like in an agent-mediated market: not just a handful of demo transactions, but sustained deal flow.

The experiment also involved more than 500 listed items (per summaries of Anthropic’s write-up). That gap—many listings, fewer completed deals—is typical of classifieds, where discovery and negotiation friction prevent many items from selling. The key question is whether agents reduce that friction enough to raise conversion rates over time.
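The listing-to-deal gap can be bounded with simple arithmetic. The sketch below assumes each completed deal corresponds to one listed item and that the deal and listing counts cover the same markets; neither is confirmed by the reporting. Because listings are given as "more than 500" and value as "more than $4,000," the results are a ceiling and a floor respectively.

```python
# Reported figures; the two *_FLOOR values are lower bounds from the reporting.
DEALS = 186
LISTINGS_FLOOR = 500      # "more than 500 listed items"
DEAL_VALUE_FLOOR = 4_000  # USD, "more than $4,000"

# More listings than 500 would only lower the conversion rate, so this is a ceiling.
conversion_ceiling = DEALS / LISTINGS_FLOOR

# More value than $4,000 would only raise the average, so this is a floor.
avg_deal_value_floor = DEAL_VALUE_FLOOR / DEALS

print(f"Listing-to-deal conversion: at most {conversion_ceiling:.1%}")
print(f"Average deal value: at least ${avg_deal_value_floor:.2f}")
```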

Anthropic said it was “struck by how well Project Deal worked,” phrasing that suggests the company expected more breakdowns: miscommunication, failure to close, or inability to coordinate. Instead, agents were able to identify matches, propose prices, respond to counteroffers, and finalize agreements in natural language.

A useful way to visualize the results would be a simple bar chart comparing (1) total listings, (2) completed deals, and (3) total value—paired with a split by agent model where applicable. Even without the chart, the headline is clear: agent-to-agent negotiation produced real economic activity quickly.

Impact of Agent Quality on Outcomes

Project Deal’s most consequential finding isn’t that agents can negotiate—it’s that better agents negotiate better, and the people represented by weaker agents may not realize they’re losing.

Anthropic reported that when users were represented by more advanced models, they achieved “objectively better outcomes.” In the experiment’s comparisons, participants represented by Claude Opus 4.5 outperformed those represented by Claude Haiku 4.5 on key metrics like sale price and deal volume.

This introduces a new kind of marketplace inequality. In human-to-human markets, skill differences are visible: some people bargain better, write better listings, or respond faster. In agent-mediated markets, the skill gap may be hidden behind a uniform interface—especially if users don’t know what model they’re running.

Comparison of Agent Models

In Anthropic’s comparisons, Claude Opus 4.5 (a more advanced model) delivered better economic outcomes than Claude Haiku 4.5 (a smaller model). Reported differences included:

Opus agents secured $3.64 more per item sold on average.
Opus users completed about two more deals per participant than Haiku users.

Those are not massive deltas per transaction, but they compound—particularly in markets where people transact frequently or where margins are thin. In a scaled setting (say, small businesses using agents to source supplies or manage resale inventory), small per-deal advantages can become meaningful competitive edges.
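To illustrate the compounding point, the reported per-item gap can be multiplied across sales volumes. The volumes below are hypothetical, and the calculation assumes the $3.64 gap would stay constant at larger scale, which the pilot does not establish.

```python
PRICE_DELTA_PER_ITEM = 3.64  # USD, reported Opus-vs-Haiku gap per item sold


def cumulative_gap(items_sold: int) -> float:
    """Total revenue difference if the per-item gap held for every sale."""
    return round(PRICE_DELTA_PER_ITEM * items_sold, 2)


# Hypothetical seller volumes, chosen only for illustration.
for items in (10, 100, 1_000):
    print(f"{items:>5} items sold -> ${cumulative_gap(items):,.2f} difference")
```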

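Anthropic’s observation that the initial instructions given to agents did not appear to affect sale likelihood or negotiated prices, paired with the Opus-vs-Haiku gap, points toward capability (reasoning, negotiation strategy, and adaptability) being more decisive than prompt-level tuning.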

The experiment’s design, which ran multiple marketplaces and varied the models, also suggests a future where marketplaces may need to standardize agent capabilities or at least disclose them, much as financial regulators require disclosures to limit information asymmetry.

Opus Outperforms Haiku on Deals

Metric (reported)	Claude Opus 4.5	Claude Haiku 4.5	What it implies
Avg. sale price per item	Higher	Lower	Opus captured more value per sale
Avg. price delta	—	—	+$3.64 per item sold (Opus vs. Haiku)
Deals per participant	Higher	Lower	~+2 deals per participant (Opus vs. Haiku)
Fairness rating	4/7 (neutral)	4/7 (neutral)	Users didn’t perceive the outcome gap
(As summarized from Anthropic’s write-up and reporting on Project Deal.)

Perceived Fairness Among Users

Despite measurable performance differences, participants did not seem to notice the disparity. Fairness ratings clustered around neutral—reported as 4 out of 7—regardless of which model represented them.

That mismatch between objective outcomes and subjective perception is the red flag. If people can’t tell when their agent is underperforming, they may not know when to switch providers, upgrade, or challenge a marketplace’s rules. Anthropic explicitly raised the possibility of “agent quality gaps” where “people on the losing end might not realize they’re worse off.”

In consumer markets, perceived fairness often matters as much as actual fairness. But in agent-mediated commerce, perception may lag reality because the user experience is smoothed: the agent speaks confidently, negotiates quickly, and presents outcomes as reasonable—even if they’re systematically worse than what a stronger agent would achieve.

This is also where blinding becomes relevant. If participants didn’t know which model they had, they couldn’t easily attribute outcomes to agent quality. In the real world, users may also be unaware—because agent capability could be bundled, dynamic, or obscured behind branding.

User Experience and Satisfaction Levels

Project Deal wasn’t just a performance benchmark; it was also a test of whether people would accept agents as their commercial representatives. On that front, the signals were broadly positive.

Participants were generally satisfied with their agents’ performance, and nearly half—46%—said they would be willing to pay for such an agent service in the future (per Anthropic’s write-up summaries). That willingness-to-pay figure is notable because it suggests users saw value beyond novelty: reduced time spent messaging, negotiating, and coordinating.

The marketplace also appears to have functioned without requiring a rigid transaction protocol. Agents negotiated in natural language, handling ambiguity in listings and preferences. That matters because most real commerce is messy: items have quirks, conditions vary, and people care about convenience, timing, and trust—not just price.

At the same time, satisfaction coexisted with neutrality on fairness. Users can feel the system is “fine” even when outcomes differ by model quality. That combination—high convenience, low visibility into performance—could accelerate adoption while masking structural disadvantages.

The internal setting likely boosted satisfaction: coworkers are less likely to scam each other, and disputes are easier to resolve socially. In open marketplaces, user experience will depend heavily on guardrails, identity, dispute resolution, and how well agents handle adversarial tactics. Anthropic’s earlier work on agent limitations in other contexts (such as “Project Vend,” referenced in external summaries) underscores that capable agents can still be manipulated if constraints and verification steps are weak.

Convenience Drives Adoption Signals
Signals from participant feedback (as reported):
– Willingness to pay: 46% said they’d pay for an agent service like this
– Perceived fairness: 4/7 (neutral), even when outcomes differed by model
How to interpret that combo: high convenience can drive adoption even when performance gaps are hard for users to detect.

Implications for Future Commerce

Project Deal reads like an early prototype of “agentic commerce,” where software negotiators become the default interface between humans and markets. If that shift happens, it won’t just change how people shop—it will change what marketplaces optimize for, how competition works, and how fairness is defined.

Anthropic’s experiment suggests the infrastructure is viable: agents can negotiate and close deals with real incentives. The harder questions are economic and regulatory: who gets the best agents, how that advantage compounds, and what protections users need when their representative is an opaque model.

Marketplace Shifts Under Project Deal
A practical way to translate Project Deal into “real world” implications:
1) Who benefits first?
– Frequent transactors (resellers, small merchants, procurement teams)
– Anyone for whom time/attention is the bottleneck, not product availability
2) What changes inside marketplaces?
– Interfaces shift from browsing to preference-setting (“my max price,” “my minimum acceptable condition”)
– Competition shifts toward agent capability + interoperability (how well agents negotiate with other agents)
3) What new risks emerge?
– Hidden performance gaps (users can’t easily see they’re underperforming)
– New forms of information asymmetry (model tier, tools, and data access)
– Disputes become “agent behavior” problems, not just user behavior problems

Economic Inequality Concerns

The experiment’s clearest societal implication is the emergence of hidden inequality driven by agent capability. If more advanced agents consistently secure better prices and more deals, then access to those agents becomes a lever of economic advantage.

In Project Deal, the gap was measurable: higher average sale prices and more deals for Opus-represented users. Participants largely didn’t notice. At scale, that dynamic could reinforce existing inequalities if better agents are priced as premium services or bundled into higher-cost subscriptions.

What makes this different from traditional “pay for better tools” dynamics is the opacity. A seller might not know they’re leaving money on the table; a buyer might not know they’re consistently overpaying. If the interface feels equally competent, users may not detect systematic underperformance.

This also raises a marketplace design question: do platforms allow heterogeneous agent quality to compete freely, or do they enforce minimum standards to prevent a race where only the best-represented participants thrive? Project Deal doesn’t answer that—but it makes the question unavoidable.

“People on the losing end might not realize they’re worse off.”
—Anthropic, describing potential “agent quality” gaps in Project Deal findings

Regulatory Considerations

Agent-mediated commerce blurs lines that regulators typically rely on: who is making the decision, who is responsible for misrepresentation, and what disclosures are required. Project Deal surfaces several issues that would become sharper in public deployment:

Transparency: Should users be told what model represents them, and how it compares to others?
Accountability: If an agent negotiates a bad deal or makes a misleading claim, who bears responsibility—the user, the agent provider, or the marketplace?
Fair competition: If agent quality materially affects outcomes, regulators may treat undisclosed disparities as a form of information asymmetry.

External summaries of the broader discussion around Project Deal note calls for regulatory frameworks to ensure fair competition and consumer protection in AI-driven marketplaces. Even without new laws, marketplaces may face pressure to implement disclosure, auditability, and dispute mechanisms tailored to agent behavior.

The experiment also hints at a practical enforcement challenge: if users can’t perceive disadvantage, complaints may not surface until inequality is entrenched. That suggests a role for proactive auditing—measuring outcome disparities across agent tiers—rather than relying solely on user reports.
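One way such a proactive audit could work is a permutation test on sale prices grouped by agent tier. The sketch below is an assumption about how an auditor might approach it, not a method described by Anthropic, and the price lists are made-up illustrative data rather than Project Deal results.

```python
import random
from statistics import mean


def mean_gap(a: list[float], b: list[float]) -> float:
    """Difference in average sale price between two agent tiers."""
    return mean(a) - mean(b)


def permutation_p_value(a: list[float], b: list[float],
                        n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test: how often does random relabeling of
    tiers produce a gap at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(mean_gap(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        gap = abs(mean_gap(pooled[:len(a)], pooled[len(a):]))
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Hypothetical sale prices (USD) for two agent tiers.
    tier_a = [24.0, 27.5, 22.0, 30.0, 26.0, 25.5]
    tier_b = [20.0, 23.0, 19.5, 24.0, 21.0, 22.5]
    print(f"Observed gap: ${mean_gap(tier_a, tier_b):.2f}")
    print(f"Permutation p-value: {permutation_p_value(tier_a, tier_b):.4f}")
```

A small p-value would indicate that the gap is unlikely to come from chance alone, which is the kind of signal an audit could flag even when no user has complained.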

The Future of Agent-on-Agent Commerce

Navigating the New Digital Marketplace

Project Deal shows that agent-on-agent commerce is no longer speculative. In a constrained environment, agents negotiated real transactions at meaningful volume, and users found enough value to express willingness to pay for the service.

The next phase won’t be about proving agents can bargain—it will be about integrating them into real marketplaces with all the complexity that entails: identity, fraud, disputes, delivery, and adversarial behavior. It will also be about interoperability, as different agent providers represent different users, each optimizing for their own objectives.

If marketplaces become “agent-first,” the user experience may shift from browsing and messaging to setting preferences and constraints—then letting agents execute. That could reduce friction dramatically, but it also concentrates power in the systems that interpret those preferences and decide what “best outcome” means.

Operational Requirements for Agent Marketplaces
If you’re building or operating an agent-mediated marketplace, the “next-step” requirements tend to cluster into a few concrete rails:
– Identity & authorization: prove the agent is allowed to act for the user (and for which actions)
– Reputation signals: carry trust across transactions without leaking sensitive user data
– Fraud & abuse controls: detect scams, collusion, and manipulation aimed at agents
– Dispute resolution: clear escalation paths when agents miscommunicate or counterparties disagree
– Transparency: disclose agent capability/tier in a way users can understand and compare
– Outcome monitoring: routinely check for systematic performance gaps across agent tiers
– Interoperability: define how agents from different providers message, negotiate, and confirm agreements
– Settlement & receipts: unambiguous records of what was agreed, when, and by whom
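As one illustration of the settlement and receipts item, a marketplace could store each agreement as an immutable record plus a content hash, so that any later change to the terms is detectable. The `DealReceipt` fields and identifiers below are hypothetical and do not reflect how Project Deal recorded deals.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DealReceipt:
    """Hypothetical record of what was agreed, when, and by whom."""
    deal_id: str
    item: str
    price_cents: int   # integer cents avoid floating-point rounding
    seller_agent: str
    buyer_agent: str
    agreed_at: str     # ISO 8601 timestamp, UTC

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of the receipt."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    receipt = DealReceipt(
        deal_id="deal-0001",
        item="snowboard",
        price_cents=4500,
        seller_agent="agent-seller-17",
        buyer_agent="agent-buyer-42",
        agreed_at="2026-01-01T12:00:00Z",
    )
    print(receipt.digest())
```

Sorting keys and fixing separators makes the serialization deterministic, so both parties computing the digest over the same terms get the same value.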

Ensuring Fairness and Transparency

Project Deal’s most important warning is that unequal outcomes can coexist with user satisfaction and neutral fairness perceptions. That combination is precisely how structural disadvantages persist: they’re hard to see, easy to rationalize, and difficult to challenge.

If agent commerce expands, fairness may require explicit design choices: disclosure of agent capability, standardized performance baselines, and mechanisms for users to audit or compare outcomes. Without that, “agent quality” could become a quiet determinant of who wins in digital markets—less like a visible skill gap and more like an invisible infrastructure advantage.

This analysis is written from a digital-transformation and payments/marketplace-operations perspective shaped by Martin Weidemann’s work building and scaling technology-driven businesses in regulated, multi-stakeholder environments.

This article reflects publicly available information at the time of writing about Anthropic’s internal pilot marketplace (“Project Deal”) and contemporaneous coverage. Because it occurred in a high-trust workplace setting with modest budgets, the findings should be treated as an early feasibility signal rather than a predictor of open internet marketplace behavior. Some details remain uncertain and may change as additional documentation or follow-up results are published.

Martin Weidemann

I am Martín Weidemann, a digital transformation consultant and founder of Weidemann.tech. I help businesses adapt to the digital age by optimizing processes and implementing innovative technologies. My goal is to transform businesses to be more efficient and competitive in today’s market.