AI product recommendations are becoming a new layer of product discovery. This article explains how different AI systems decide what to recommend, which sources they appear to trust, and what e-commerce brands operating in the EU should audit now.
Based on Lex Agentica’s exploratory research, conducted in late 2025 across categories such as treadmills and baby face cream, this article looks at how five AI systems behave when a customer asks them to recommend a product. The findings are directional, grounded in observed behaviour, and bounded to what was tested.
What commerce leaders need to know
- AI product recommendations are not traditional search rankings. They are generated answers that interpret, compare and filter products before the customer reaches a brand’s website.
- Different AI systems appear to rely on different evidence mixes. ChatGPT, ChatGPT shopping experiences, Google and Gemini, Perplexity and Claude do not behave like one channel.
- Product data is becoming commercial evidence. Attributes, specifications, price, availability, reviews, feeds, schema, third-party sources and safety signals all shape whether a product can be confidently recommended.
- Generic AI visibility advice often underweights product data because much of it is written for citations, not transactions. For e-commerce brands, visibility depends on whether the product can be understood, compared, trusted and recommended in a buying context.
- Visibility alone is not enough. A product can appear in an AI answer and still be framed weakly, cautiously or less favourably than a competitor.
- Language changes the picture. For EU brands, Italian, Spanish, German and English queries may produce different recommendations, sources and competitor sets.
- A serious AI visibility audit is not a one-prompt exercise. It compares systems, languages, query types, source quality, framing and competitor substitution.
AI recommendations are becoming a new shelf
E-commerce teams are used to thinking about visibility through channels they already know: search results, Shopping ads, marketplaces, retail media, category pages, social discovery, influencers, email and loyalty. Each of those channels has a commercial logic. Search has rankings. Marketplaces have listings. Paid media has targeting and cost. Retail media has placement. Social has attention and intent signals.
AI product recommendations introduce a different type of shelf. It is not one the customer scrolls through. It is a generated shortlist, a comparison or a single direct recommendation. The customer sees the market after the AI system has already interpreted the request, filtered the options and decided which products are worth mentioning.
A customer may ask “What is the best baby face cream for sensitive skin in Germany?” or “Which treadmill is best for fast running at home?” The answer may mention several products, one product, or none from your brand. It may recommend a competitor, describe your product inaccurately, add a caveat or raise a concern. It may decide another product is easier to justify.
That is the shift. The customer is no longer only searching for products. They are asking an AI system to reduce the market for them, often before they have visited a single brand website. The AI system becomes an interpretation layer between customer intent and product selection. For brands, the question changes. Not only: can customers find us? But: can AI systems understand, verify and recommend us?
That difference is where AI commerce visibility starts.
What we tested and what we found
This article is based on Lex Agentica’s exploratory research into how large language models and AI search systems recommend products. The research focused on practical buying queries rather than abstract tests, because commercial visibility only becomes meaningful when the query reflects a real customer need.
We looked at categories such as treadmills and baby face cream because they expose different types of recommendation behaviour. A treadmill is attribute-heavy: to recommend one, the system needs to understand speed, motor power, running surface, stability, folding design, delivery, warranty, price and country context. A baby face cream is trust-sensitive: the system needs to weigh ingredients, suitability for babies, safety language, dermatological claims, reviews, certifications and local-market evidence.
The point was not to create a universal ranking of brands. It was to understand which sources and signals appear to influence AI product selection.
Two patterns stood out. First, AI systems do not simply retrieve products. They construct recommendations from evidence. That evidence ranges from familiar e-commerce signals such as product pages, price, availability, reviews, marketplace listings, Merchant Center data and structured data to more strategic signals such as third-party reviews, comparison articles, safety tests, certifications, brand authority, entity clarity and local-language sources.
Second, the systems did not behave as one channel. The same type of query could produce different product sets, different sources, different levels of caution and different competitor substitutions depending on the platform. A product can be indexed and still fail to appear. It can appear and still be framed with hesitation. A competitor can be selected not because it is objectively better, but because it is easier for a specific system to understand, verify and justify.
This is why AI visibility should not be treated as only an SEO issue. It is also a product data issue, a trust issue, a source-quality issue, a language issue and a governance issue.
AI commerce visibility is not generic AI visibility
A lot of AI visibility advice is written for broad brand visibility or for B2B software. In that context, the advice often focuses on off-site authority: Reddit, YouTube, review sites, comparison articles, industry publications and community conversations. That advice is not wrong. If a buyer asks an AI system to compare CRM tools or analytics platforms, external sources can strongly influence the answer.
For product-led commerce, the picture is more complicated. When a system recommends a baby cream, treadmill, handbag, coffee machine or skincare item, it needs more than general brand mentions. It needs to understand what the product is, who it is for, what it costs, whether it is available, which variant matters, whether the claims are supportable, whether the reviews are credible, whether the product is safe for the intended use and whether it can be purchased in the relevant market.
That adds another layer to the conversation: product evidence. AI visibility asks whether the brand is mentioned. AI commerce visibility asks whether the system can understand, trust, compare and recommend the product in a buying context. For products, AI visibility is not only a citation problem. It is a product evidence problem.
The product evidence layer
Across the tests, AI product recommendations appeared to depend on several layers of evidence. Some were familiar to e-commerce teams: product pages, titles, descriptions, images, specifications, reviews, price, availability, marketplace listings, Merchant Center data and structured data. Others were more strategic: third-party reviews, comparison articles, safety tests, certifications, brand authority, entity clarity, local-language sources, user context, category risk and claim credibility.
The important point is not that every system uses every signal in the same way. It is that different systems appeared to reward different combinations of evidence. A brand may have excellent product pages but weak merchant feeds. It may have strong reviews but poor structured data. It may have brand authority but vague attributes. It may have good English content but weak Italian, Spanish or German evidence. The gaps do not live in one score. They live across systems, languages, queries and source types.
The source-priority map
The table below summarises what each platform appears to prioritise based on Lex Agentica’s findings. It is based on observed behaviour, source analysis and public platform documentation. It is not based on access to the internal ranking systems of any AI platform.
| Platform | Signals it appears to prioritise | What our research suggests | Commercial implication |
|---|---|---|---|
| ChatGPT reasoning | Public web knowledge, product pages, reviews, comparison content, user context, specifications | Strong at constructing a reasoned shortlist. It appears to reward products with clear attributes and well-explained use-case fit. | Optimise for clarity: product attributes, use cases, comparison copy, FAQs, reviews and evidence-rich pages. |
| ChatGPT shopping experiences | Product relevance to intent, price, reviews, availability, shopping and product data, merchant context | More sensitive to buyability and market availability than pure brand reputation. Results appear to depend on product detail, price, reviews and availability. | Product feed quality, price accuracy, availability, merchant presence and reviews matter heavily. |
| Google and Gemini shopping surfaces | Google Shopping Graph, Merchant Center data, product schema, structured data, reviews, authority signals, safety and certification signals | The baby-care findings suggest Gemini may be more sensitive to trust, safety and certification signals. Google’s commerce infrastructure points to Merchant Center, attributes and structured data as core discovery infrastructure. | For EU brands: Merchant Center, schema, reviews, claims, safety evidence, certifications and policy consistency are critical. |
| Perplexity | Cited web sources, editorial content, comparison articles, technical reviews, community and review content | Appeared more dependent on citation-worthy content and less on classic brand authority. Attribute precision and source-rich third-party material seemed to matter. | Brands need credible content outside their own site: reviews, comparison pages, expert content, technical explainers and third-party mentions. |
| Claude | Long-form reasoning, specifications, brand and entity clarity, safety signals, retrieved search results when web search is enabled | Appeared cautious and analytical. Weak authority, unclear entities or vague specifications may reduce confidence. | Invest in precise specifications, clear entity structure, trustworthy external mentions and low-risk claims. |
This table reflects Lex Agentica exploratory research and is directional, not a universal ranking.
The practical lesson is simple. There is no single “AI channel”. Each system appears to trust a different mix of sources and apply its own logic when deciding what is useful, safe, relevant or easy to justify. This does not mean chasing every system with disconnected tactics. It means building a stronger product evidence layer that can travel across multiple AI environments.
The practical signal matrix
This matrix shows how important different source types appear to be across AI product recommendation systems.
| Source type | ChatGPT | ChatGPT shopping | Google / Gemini | Perplexity | Claude |
|---|---|---|---|---|---|
| Product page clarity | | | | | |
| Product feed / merchant data | | | | | |
| Structured data / schema | | | | | |
| Price and availability | | | | | |
| Reviews | | | | | |
| Editorial / comparison content | | | | | |
| Certifications / safety tests | | | | | |
| Brand authority / entity clarity | | | | | |
| Attribute precision | | | | | |
| User context | | | | | |
This matrix reflects Lex Agentica exploratory research and is directional, not a universal ranking.
The matrix shows the strategic problem. A brand may be strong in one evidence layer and weak in another. It may perform well in a research-style answer but poorly in a shopping-led answer. It may appear in ChatGPT but not in Gemini. It may be cited by Perplexity but lose in a transaction-oriented surface because the feed, availability or structured data is weaker than a competitor’s.
Read the matrix by column and a broader pattern appears. The systems moving closest to transactions (Google and Gemini shopping surfaces and ChatGPT shopping experiences) appear to weigh product feeds, structured data, price, availability and merchant data more heavily. The research-oriented systems (Perplexity and Claude) appear to lean more on editorial content, precise attributes and third-party sources.
A note on the prevailing advice
Much of the published guidance on AI visibility tells brands to focus on third-party sources: Reddit, YouTube, reviews, comparison sites and community discussions. That advice has value, especially for research-oriented systems where cited sources shape whether a brand enters the answer at all. Perplexity, in particular, appeared to reward citation-worthy third-party evidence in our testing.
But e-commerce brands should be careful with one-size-fits-all advice. For product recommendations, the transaction-oriented systems appeared to depend heavily on product-level evidence: product pages, feeds, structured data, price, availability, specifications, merchant listings, reviews and certification signals. This does not mean off-site evidence is irrelevant. It means off-site evidence cannot compensate for weak product data in a buying context. That is the difference between general AI visibility and AI commerce visibility. One is about being mentioned. The other is about being selected.
ChatGPT reasoning: useful evidence it can explain
ChatGPT-style product answers often behave like a shopping adviser. They take the customer’s intent, interpret the constraints and produce a reasoned shortlist. That means the system needs material it can reason with: for a treadmill, maximum speed, motor power, running deck, stability, delivery and warranty; for beauty, ingredients, skin type, claims, suitability, reviews and safety language.
The recurring pattern is that clearer product evidence makes the product easier to recommend with confidence. Marketing language alone is weak evidence. “Premium quality” does not give the system much to work with. “Dermatologically tested for sensitive skin, fragrance-free, suitable from birth, with ingredient list and certification evidence” gives it a clearer basis for comparison. The system cannot confidently recommend what it cannot clearly explain.
ChatGPT shopping: buyability, price, availability and product data
ChatGPT shopping experiences should be treated separately from general ChatGPT answers. A general answer explains options. A shopping-led answer is closer to a buying moment, and that makes commercial data more important: price, availability, variants, merchant options, reviews, images, descriptions, delivery and return conditions.
For brands, the lesson is practical. A strong product page is useful, but a strong product feed is essential. If the feed is incomplete, outdated or inconsistent, AI shopping experiences may struggle to represent the product accurately. The risk is not only lower visibility. It is wrong visibility: an old price, a missing variant, or a competing retailer listing with cleaner data. The catalogue is no longer only a database for the website. It is part of the brand’s AI discovery infrastructure.
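A feed consistency check of the kind implied above can be sketched in a few lines. This is a minimal illustration, assuming the team can export offers from the product pages and from the feed keyed by SKU. The `Offer` fields, the `find_mismatches` helper and the SKUs are hypothetical and do not represent any platform's feed format.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Offer:
    """One sellable variant as seen on the product page or in the feed."""
    sku: str
    price: Decimal
    currency: str
    availability: str  # e.g. "in_stock", "out_of_stock"


def find_mismatches(page_offers, feed_offers):
    """Compare offers keyed by SKU and list missing or conflicting data."""
    issues = []
    for sku, page in page_offers.items():
        feed = feed_offers.get(sku)
        if feed is None:
            # The variant exists on the site but not in the feed.
            issues.append((sku, "missing_from_feed"))
            continue
        for field in ("price", "currency", "availability"):
            if getattr(page, field) != getattr(feed, field):
                issues.append((sku, f"{field}_mismatch"))
    return issues


# Hypothetical data: the feed carries an old price and lacks one variant.
page_offers = {
    "TM-100": Offer("TM-100", Decimal("499.00"), "EUR", "in_stock"),
    "TM-100-XL": Offer("TM-100-XL", Decimal("599.00"), "EUR", "in_stock"),
}
feed_offers = {
    "TM-100": Offer("TM-100", Decimal("549.00"), "EUR", "in_stock"),
}

for sku, issue in find_mismatches(page_offers, feed_offers):
    print(sku, issue)
```

Prices are held as `Decimal` values so that "499.00" and "499.0" compare as equal and rounding does not create false mismatches.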
Google and Gemini: structured commerce data, local context and trust signals
Google and Gemini require particular attention because Google already connects search, shopping, Merchant Center, product data, reviews, structured data and AI experiences. For these surfaces, the strongest source layers appear to include Merchant Center product data, structured data, attributes, price and availability, reviews, identifiers, shipping and return information, web authority, and safety or certification signals.
This is why Merchant Center is becoming more strategic. For years it was seen as a feed tool for Shopping ads and free listings. In AI-mediated commerce it becomes part of the product evidence layer. For trust-sensitive products the bar is higher. If a brand says “safe for sensitive skin”, the system appears to look for ingredient clarity, reviews, dermatological signals or certifications. If the evidence is weak, the recommendation may become cautious, and caution can be expensive.
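To make the structured data layer concrete, the sketch below builds a small product description using the public schema.org `Product` and `Offer` vocabulary and prints it as JSON-LD. All product values (name, SKU, GTIN, URL, price) are placeholders invented for illustration. Which properties a given AI system actually reads is not something our research can confirm.

```python
import json

# Hypothetical product; every value below is a placeholder.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Baby Face Cream 50 ml",
    "description": (
        "Fragrance-free face cream for sensitive skin, suitable from birth."
    ),
    "sku": "EX-BFC-50",
    "gtin13": "0000000000000",  # placeholder, not a real GTIN
    "brand": {"@type": "Brand", "name": "Example Brand"},
    "offers": {
        "@type": "Offer",
        "url": "https://www.example.com/de/baby-face-cream",
        "price": "12.90",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}

# The printed JSON is what would sit inside a
# <script type="application/ld+json"> element on the product page.
print(json.dumps(product_jsonld, indent=2, ensure_ascii=False))
```

The useful habit is less the markup itself than keeping its price, availability and identifiers identical to the feed and the visible page.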
Perplexity: citation-worthy third-party evidence
Perplexity often behaves more like a source-led answer engine. Its recommendations depend heavily on what can be cited, compared and supported. A good product page may not be enough if the broader source environment is weak. Perplexity-style answers appear to benefit from editorial reviews, expert guides, comparison articles, technical reviews, credible third-party mentions and well-structured buying guides.
This matters most for mid-market and specialist brands. If the brand is not yet part of the external conversation around the category, the system has less material to cite. That does not mean chasing low-quality “best product” listicles. It means credible third-party corroboration matters. For Perplexity, authority is not only what the brand says about itself. It is what trustworthy sources say around it.
Claude: specifications, clarity and risk-sensitive reasoning
Claude often appears more cautious and analytical, which makes it useful for finding ambiguity. If specifications are vague, it may be less confident. If the entity is unclear, it may hesitate. If claims are broad, it may qualify the answer. If safety evidence is weak, it may become cautious.
This matters in categories where recommendation involves risk: beauty, baby care, food, wellness, health-adjacent products, electrical goods and home fitness. Claude may not be the highest-converting AI surface for shopping today, but it is a useful diagnostic surface. If Claude struggles to explain why your product is suitable, the issue is usually not Claude. It is that the product data, claim evidence or external sources are not strong enough.
Why category and language change the recommendation
AI product recommendations do not happen in a vacuum. The same system can behave differently depending on what kind of product is discussed, what risk is implied, what evidence is available and which language the user uses. Category and language are not secondary details in an audit. They shape the evidence environment the model works with.
Category changes the evidence requirement
A treadmill query forces performance comparison: speed, motor strength, running surface, stability, user weight capacity. A baby face cream query forces trust comparison: ingredients, suitability, dermatological claims, certifications, reviews and safety language. A fashion query forces fit, style, material and occasion. A food and beverage query forces origin, ingredients, allergens and serving context. A health-adjacent query forces caution around claims and safety.
This is why AI visibility audits should not be generic. The right audit for a skincare brand is not the right audit for a furniture brand. The underlying method can be consistent. The evidence requirements change by category.
Language changes the source environment
For EU brands, language is not a cosmetic layer. It changes the source environment the system can draw from. In our testing, when the same system was asked the same question in different languages, it returned substantially different product sets. A German query can surface different retailers, reviews and safety expectations from an English one. An Italian query can reward different provenance and materials language. A Spanish query can surface different comparison sites and retailers.
AI systems do not simply translate one universal answer. They construct answers from the evidence available around each query, and that evidence is local, linguistic and commercial. A brand can be strong in English and almost invisible in Italian. For EU e-commerce leaders, this is one of the biggest opportunities, because many competitors still treat AI visibility as an English-language problem. In Europe, AI commerce visibility is multilingual by default.
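One way to quantify how far recommendations diverge between languages is to compare the sets of products named for the same question. The sketch below uses Jaccard overlap (shared products divided by all products named). The brand names and result sets are placeholders, not findings from our research.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of products two answers have in common (1.0 = identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Placeholder results: products named for the same question per language.
results = {
    "en": ["Brand A", "Brand B", "Brand C"],
    "de": ["Brand B", "Brand D"],
    "it": ["Brand E"],
}

for (lang_1, set_1), (lang_2, set_2) in combinations(results.items(), 2):
    print(lang_1, lang_2, round(jaccard(set_1, set_2), 2))
```

A low overlap between two languages is a prompt to look at which local sources and retailers the answer drew on, not a verdict in itself.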
What this changes for marketing and e-commerce teams
AI product visibility does not sit neatly inside one team. Marketing owns positioning and authority. E-commerce owns product experience and catalogue performance. SEO owns discoverability. Data and operations own feeds, identifiers and attributes. Product owns specifications and claims. Legal owns risk boundaries. Leadership owns prioritisation. AI product visibility cuts across all of them, which is why it becomes messy quickly.
If the product page is clear but the feed is weak, the AI shopping result may fail. If the feed is strong but the claims are unsupported, the system may hesitate. If reviews are strong in English but absent in German, the brand may lose visibility in Germany. The work is not only to be found. It is to be understood, trusted, selected and represented accurately.
Visibility is not the same as selection
Many AI visibility conversations stop at one question: did the brand appear? That is too shallow. A product can appear in the answer and still lose the sale. The system may mention it but recommend a competitor more strongly. It may describe the product with hesitation, repeat an outdated claim, or raise a safety concern.
An answer that says “this is a popular option, but customers with sensitive skin may prefer alternatives with clearer ingredient information” is not a visibility win. It is a trust gap. In traditional search analytics, that nuance is hard to see. In AI product recommendations, it is commercially decisive. Brands need to measure not only whether they appear, but how they appear.
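Measuring how a product appears, not only whether it appears, needs a consistent way to record each answer. The sketch below is one possible rubric, assuming a reviewer reads each answer and fills in the fields by hand. The field names and the four labels are hypothetical and are not a scoring method used by any AI platform.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """A reviewer's reading of one AI answer for one product."""
    mentioned: bool
    has_caveat: bool = False            # answer hedges or raises a concern
    outdated_info: bool = False         # old price, old claim, missing variant
    competitor_preferred: bool = False  # a rival is recommended more strongly


def framing_label(obs):
    """Reduce an observation to one of four illustrative labels."""
    if not obs.mentioned:
        return "absent"
    if obs.outdated_info:
        return "misrepresented"
    if obs.has_caveat or obs.competitor_preferred:
        return "weak"
    return "strong"


# The "popular option, but..." answer quoted above would be recorded as:
popular_but = Observation(mentioned=True, has_caveat=True)
print(framing_label(popular_but))
```

Recording the same fields for every answer makes it possible to compare framing across systems and over time.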
What brands should audit first
Most brands should not start by auditing the entire catalogue. That is too broad, too slow and too expensive for a first baseline. Start with the products that matter commercially: bestsellers, margin leaders, strategic categories, new launches, products with strong claims, products with known data issues, and products in trust-sensitive categories where competitors are already visible. The first goal is not to test everything. It is to understand where AI systems already interpret your market, where competitors are stronger, and which evidence gaps are fixable.
Start with realistic buyer prompts
Generic prompts produce generic answers. Commercial prompts reveal commercial weaknesses. Do not only test “Best [category]?” Test prompts that include use case, country, comparison, risk and budget: “Best [category] for [use case] in [country]?”, “Compare [brand] and [competitor]”, “Which [category] brands should I avoid?” and “What should I buy if I care about [ingredients, sustainability, durability, safety]?”
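The prompt patterns above can be kept as templates so that every product is tested with the same wording. This is a minimal sketch. The template keys, the `build_prompts` helper and the example values are illustrative assumptions.

```python
# Templates mirror the prompt patterns quoted above.
PROMPT_TEMPLATES = {
    "use_case": "Best {category} for {use_case} in {country}?",
    "comparison": "Compare {brand} and {competitor}",
    "avoidance": "Which {category} brands should I avoid?",
    "priority": "What should I buy if I care about {priority}?",
}


def build_prompts(**values):
    """Fill every template; a missing value raises KeyError."""
    return {
        name: template.format(**values)
        for name, template in PROMPT_TEMPLATES.items()
    }


# Hypothetical product context.
prompts = build_prompts(
    category="treadmill",
    use_case="fast running at home",
    country="Germany",
    brand="Brand A",
    competitor="Brand B",
    priority="durability",
)

for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Translating the templates, instead of translating finished prompts, keeps the wording comparable when the same test is run in Italian, Spanish or German.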
This is where the real findings appear. A brand may perform well in broad discovery prompts and poorly in high-intent comparison prompts. It may appear for its own brand name but disappear when the user asks for the best product in a category. It may be mentioned in English but absent in Italian or Spanish. Those differences show where the evidence layer is weak.
Compare systems, languages and query types
A serious AI visibility audit should compare systems, not rely on one platform. ChatGPT, ChatGPT shopping experiences, Google and Gemini, Perplexity and Claude may reveal different issues. One system may reward your reviews. Another may care more about structured data. Another may need third-party sources. Another may hesitate because specifications or claims are unclear.
It should also compare languages. For EU brands, Italian, Spanish, German and English may produce different product sets, sources and competitor substitutions. And it should compare query types: informational, comparison, risk, buyer-intent and avoidance prompts can all reveal different weaknesses. This is why a one-prompt test is not enough for a commercial decision. It captures one cell of a much larger grid.
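The size of that grid is easy to underestimate. As a rough sketch, assuming the five systems, four languages and five query types named in this article, the test plan for a single product can be enumerated like this (the identifiers are illustrative):

```python
from itertools import product

SYSTEMS = [
    "ChatGPT",
    "ChatGPT shopping",
    "Google / Gemini",
    "Perplexity",
    "Claude",
]
LANGUAGES = ["it", "es", "de", "en"]
QUERY_TYPES = [
    "informational",
    "comparison",
    "risk",
    "buyer_intent",
    "avoidance",
]

# One cell per combination of system, language and query type.
grid = [
    {"system": system, "language": language, "query_type": query_type}
    for system, language, query_type in product(
        SYSTEMS, LANGUAGES, QUERY_TYPES
    )
]

print(len(grid))  # 5 systems x 4 languages x 5 query types = 100 cells
print(grid[0])
```

A single prompt in a single system fills one of those 100 cells for one product, which is why it cannot support a commercial decision on its own.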
Fix the evidence layer in order
Knowing that every evidence layer matters is not the same as knowing what to fix first. For most e-commerce brands, the highest-leverage sequence is practical:
1. Product page clarity
2. Product feed and merchant data quality
3. Structured data and schema
4. Attribute precision and claim clarity
5. Reviews and third-party evidence
6. Language-market coverage
7. Source monitoring and governance
The exact priority will change by category, market and business model. A skincare brand with sensitive claims may need to prioritise claim evidence and safety signals earlier. A fashion brand may need to prioritise materials, sizing, imagery, returns and local-language vocabulary. But the principle is consistent: fix the evidence that helps the system understand, trust, compare and recommend the product.
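The default sequence and its category-specific exceptions can be expressed as a small ordering rule. The sketch below assumes an audit has already produced a set of gaps per product. The layer identifiers and the `order_fixes` helper are hypothetical names for the layers listed above.

```python
# Evidence layers in the default order listed above.
FIX_ORDER = [
    "product_page_clarity",
    "feed_and_merchant_data",
    "structured_data",
    "attribute_and_claim_clarity",
    "reviews_and_third_party",
    "language_market_coverage",
    "monitoring_and_governance",
]


def order_fixes(gaps, priority_overrides=None):
    """Sort gaps by the default order, with overrides moved to the front."""
    order = list(FIX_ORDER)
    # Insert in reverse so the first override ends up first.
    for layer in reversed(priority_overrides or []):
        order.remove(layer)
        order.insert(0, layer)
    rank = {layer: index for index, layer in enumerate(order)}
    return sorted(gaps, key=lambda layer: rank[layer])


gaps = {
    "structured_data",
    "reviews_and_third_party",
    "attribute_and_claim_clarity",
}

# Default sequence.
print(order_fixes(gaps))

# A skincare brand with sensitive claims pulls claim evidence forward.
print(order_fixes(gaps, priority_overrides=["attribute_and_claim_clarity"]))
```

The override list is where category, market and business model enter: the method stays the same while the order changes.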
From AI recommendations to agentic commerce
Today, most AI product discovery still sits in the recommendation and comparison stage. The final commercial action usually happens elsewhere: on a website, in a marketplace or through a merchant link. But the direction of travel is clear. Shopping research, AI shopping surfaces, merchant integrations and emerging protocols are all part of the same shift towards AI taking a larger role in how customers discover, compare and act.
This does not mean every EU brand needs to rebuild its commerce stack tomorrow. It does mean the foundations matter now. If product data is inconsistent, systems misread the offer. If claims are unsupported, trust-sensitive recommendations weaken. If language-market evidence is thin, the brand disappears from local prompts.
AI recommendation is the early signal. Agentic commerce is the larger infrastructure question. For a broader view of where this is heading, read our guide to Universal Cart and UCP for EU commerce leaders.
The commercial bottom line
AI product recommendations are becoming a new layer of product discovery. Not a replacement for search, brand or e-commerce fundamentals, but a new intermediary between customer intent and product selection.
The brands that win will not be the ones that simply publish more content about AI. They will be the ones whose products are easier to understand, easier to verify, safer to recommend and more reliable to transact with. That work starts with evidence: clear product pages, complete attributes, reliable feeds, structured data, accurate price and availability, useful reviews, credible third-party sources, supported claims, language-market coverage and governance.
The question for e-commerce leaders is no longer only how to get more traffic. It is whether, when AI systems recommend products in your category, they have enough evidence to choose you. If the answer is unclear, the brand has an AI visibility gap. And that gap is now measurable.
Assess your AI commerce visibility
Lex Agentica helps EU commerce teams understand how AI systems interpret their products across visibility, data integrity, commerce readiness and governance. Our AI Visibility Audit examines how your brand and products appear across AI systems, languages and commercial query types. It identifies where you are visible, where competitors are stronger, which sources appear to influence recommendations, and which product data or trust signals need attention.
FAQ
What is AI product visibility?
AI product visibility is the ability of a product or brand to appear, be understood and be recommended inside AI-generated answers, shopping results or product comparison experiences. It is different from traditional SEO because the system may summarise, compare and select products before the customer reaches a website.
Is AI product visibility the same as SEO?
No. SEO remains important, but it is not the whole picture. Traditional SEO focuses on search visibility. AI product visibility focuses on whether AI systems can understand, verify and recommend a product using the evidence available across product pages, feeds, structured data, reviews, third-party sources and user context.
Why is AI commerce visibility different from general AI visibility?
General AI visibility often focuses on whether a brand is mentioned or cited. AI commerce visibility goes further. It asks whether the system can understand, trust, compare and recommend a specific product in a buying context. For e-commerce brands, product data, feed quality, structured data, price, availability, attributes, reviews and claims all become part of the evidence layer.
Which AI systems should brands test?
At minimum, ChatGPT, ChatGPT shopping experiences where available, Google and Gemini, Perplexity and Claude. Each system may reveal different issues because each appears to rely on a different evidence mix.
Does Merchant Center solve AI visibility?
No. Merchant Center is important, especially for Google surfaces, but it does not solve everything. Brands also need strong product pages, complete attributes, structured data, reviews, third-party evidence, language-market coverage, supported claims and governance.
Why do reviews matter for AI recommendations?
Reviews provide real customer language. They often describe use cases, strengths, weaknesses, fit, quality, safety, sizing and suitability in ways product pages do not. That language helps AI systems understand when a product fits a specific buyer need.
Why does language matter in Europe?
AI product recommendations are shaped by the available evidence in each language and market. Italian, Spanish, German and English queries may surface different retailers, reviews, competitors, sources and trust expectations. For EU brands, AI visibility is language-market-specific.
Can a brand appear in AI recommendations and still have a problem?
Yes. A product may appear but be framed cautiously, compared unfavourably, shown with outdated information or mentioned only as a secondary option. This is why AI visibility audits should measure framing quality, not only presence.
What should brands audit first?
Start with your most commercially important products and categories. Test realistic buyer prompts across AI systems and languages. Then review product pages, feeds, structured data, reviews, claims, third-party sources and competitor substitution.
How often should AI visibility be checked?
AI visibility should be treated as a baseline and monitoring discipline. The first audit identifies structural gaps. Follow-up checks help track whether improvements are changing how AI systems interpret and recommend the brand.