How we figure out where a product is really from
A plain-English guide to how Product Origin Checker scores products.
Every score we publish answers three questions about an Amazon product: where it was made, where the unit you'll receive ships from, and where the company selling it is headquartered. This page explains how we arrive at the numbers — and how honest we try to be about uncertainty.
The signals we read from the product page
Our browser extension reads only the publicly visible information on the Amazon product page:
- The product title, brand, and description bullets.
- The "Country of Origin" field, when Amazon lists one in the product details table. This is the strongest single signal we have.
- The "Ships from" and "Sold by" lines in the buybox.
- The seller's name and storefront link.
- Up to 15 customer reviews and 5 Q&A entries, scanned for "Made in X" claims, customs mentions, and shipping comments.
- Manufacturer, Date First Available, and Best Sellers Rank.
- The ratings distribution (a lopsided 5-star pattern is a known correlate of opaque drop-shippers).
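Taken together, these signals form one evidence bundle per product. As a rough sketch (field names here are illustrative, not the extension's actual schema), the bundle might look like this:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical container for the page signals listed above.
# Names and types are illustrative, not the real data model.
@dataclass
class PageSignals:
    title: str
    brand: str
    bullets: list[str]
    country_of_origin: Optional[str]   # Amazon's "Country of Origin" field, if present
    ships_from: Optional[str]          # buybox "Ships from" line
    sold_by: Optional[str]             # buybox "Sold by" line
    seller_name: Optional[str]
    review_snippets: list[str] = field(default_factory=list)  # up to 15 reviews
    qa_snippets: list[str] = field(default_factory=list)      # up to 5 Q&A entries
    manufacturer: Optional[str] = None
    date_first_available: Optional[str] = None
    best_sellers_rank: Optional[int] = None
    star_distribution: dict[int, float] = field(default_factory=dict)  # e.g. {5: 0.92}
```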
The pattern cues we extract before any AI
Before we involve an AI model, we extract a handful of cheap pattern signals — designed especially for China detection, which is our primary calibrated use case:
- Whether the seller name contains a known Chinese city (Shenzhen, Hangzhou, Yiwu, Guangzhou, …).
- Whether the seller name ends with "Co., Ltd." — a common PRC business-registration suffix.
- Whether the brand name is a random alphanumeric string of the kind common among drop-shippers.
- Whether the listing was created within the last twelve months.
- Whether free text on the page literally says "Made in X" or "Manufactured in X" for any X.
The AI step
The assembled evidence — every field above, plus the pattern cues — is sent to Anthropic's Claude AI. Claude combines this with its training-data knowledge of brands and companies (it knows TCL is Chinese, that Hydro Flask is American, that Bosch is German), and returns a structured score: top country, ISO code, percent, confidence, and a probability distribution of alternative countries.
For products where signals are thin or contradictory, Claude can perform live web searches against business registries, brand websites, LinkedIn, and news archives — citing the URLs it found. Every score includes the methodology and citations Claude returned.
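When the structured score comes back, it has to be validated before we show it. A minimal sketch of that validation, assuming a JSON reply with the fields named above (the actual schema may differ):

```python
import json

def parse_score(raw: str) -> dict:
    """Parse the model's JSON reply and sanity-check it.
    Field names are illustrative; the real schema may differ."""
    score = json.loads(raw)
    required = {"country", "iso_code", "percent", "confidence", "alternatives"}
    missing = required - score.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {sorted(missing)}")
    # The top-country percent plus the alternatives should form a
    # probability distribution summing to roughly 100.
    total = score["percent"] + sum(score["alternatives"].values())
    if not 95 <= total <= 105:
        raise ValueError(f"distribution sums to {total}, expected ~100")
    return score
```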
How confidence works
Each indicator has a percent (our certainty for the top country) and a separate confidence (how much trust to place in the estimate as a whole). These are not the same number — and the distinction matters.
- Percent close to 100, confidence close to 100 → we have strong, multiple corroborating sources. You can trust this.
- Percent 80, confidence 50 → we have a reasonable lean, but the evidence is sparse. Treat it as a working hypothesis, not a fact.
- Percent 50, confidence 20 → we genuinely cannot tell. The number is a placeholder; the meaningful signal is the low confidence.
We will not raise confidence above 85 unless we have either a literal Amazon-supplied origin field or multiple independent web sources confirming the same answer. Confidence honesty is the design principle.
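That cap rule is simple enough to state as code. A minimal sketch under the stated rule (the function name and parameters are hypothetical):

```python
def cap_confidence(confidence: int, has_amazon_origin_field: bool,
                   independent_sources: int) -> int:
    """Enforce the 85-point cap described above (illustrative helper).
    Confidence may exceed 85 only with a literal Amazon-supplied origin
    field, or with multiple independent web sources agreeing."""
    if has_amazon_origin_field or independent_sources >= 2:
        return min(confidence, 100)
    return min(confidence, 85)
```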
Hong Kong, Macau, and Taiwan
For all three indicators, we score Hong Kong (HK), Macau (MO), and Taiwan (TW) as their own countries — never folded into mainland China. This is a hard rule in our system.
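One way to enforce that rule is a guard on the model's output: labels that conflate these regions with mainland China get corrected back to their own codes. This is a hypothetical sketch, not our actual normalizer:

```python
# Regions scored as their own countries, never folded into "CN".
SEPARATE = {
    "hong kong": "HK",
    "macau": "MO",
    "macao": "MO",
    "taiwan": "TW",
}

def enforce_separate_scoring(country_name: str, iso_code: str) -> tuple[str, str]:
    """Correct conflated labels like 'Hong Kong, China' / 'CN' back to
    the region's own name and ISO code (illustrative guard)."""
    key = country_name.lower().replace(", china", "").strip()
    if key in SEPARATE:
        return country_name.replace(", China", ""), SEPARATE[key]
    return country_name, iso_code
```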
Manual verification (when humans override the AI)
Every score has a Contest this finding button. Reporters fill in a form telling us what they think is wrong, optionally proposing the correct value, and explaining their evidence and credentials. We verify the reporter's email and review the claim.
When our review team validates a correction, we install a manual override for that product. Future users see the verified value with a ✓ Verified mark — and the citations list includes the override as the top entry. Manual overrides can be retired or replaced as new information comes in; the full history is preserved for audit.
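The mechanics of an override can be sketched in a few lines: replace the value, flag it verified, put the override at the top of the citations, and archive what it replaced. A hypothetical sketch (field names and structure are illustrative):

```python
from datetime import datetime, timezone

def apply_override(score: dict, override: dict) -> dict:
    """Illustrative sketch: a validated manual override replaces the
    AI's value, flags it verified, and becomes the top citation; the
    replaced value is archived so the full history survives for audit."""
    history = score.get("history", []) + [{
        "country": score["country"],
        "iso_code": score["iso_code"],
        "replaced_at": datetime.now(timezone.utc).isoformat(),
    }]
    return {
        **score,
        "country": override["country"],
        "iso_code": override["iso_code"],
        "verified": True,                                   # rendered as "✓ Verified"
        "citations": [override["citation"]] + score.get("citations", []),
        "history": history,                                 # audit trail preserved
    }
```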
What we don't currently use
For transparency, here's what's NOT yet in our pipeline:
- US Customs import records. Panjiva, ImportGenius, and similar services have shipment-level data for who imports what from where. This is the gold-standard evidence layer; we plan to integrate it after launch.
- OCR on customer-uploaded product photos. "Made in X" is often printed on packaging visible in review photos; computer vision could read it.
- USPTO and WIPO trademark databases. Brand registrations + registrant addresses are strong public signals.
- OpenCorporates. Cross-jurisdiction company registry data.
Bottom line
Our scores are well-supported probabilistic estimates, not factual claims. When evidence is strong, we're confident. When evidence is thin, we say so loudly. When someone with first-hand knowledge tells us we're wrong, we listen — and the dataset gets better.
If you want to dig deeper, the full technical specification of every data source we consult is published at DATA_SOURCES.md in our repository.