Free playbooks in your inbox
Hands-on tutorials for people who want to build with AI.

How to Track Your AI Citations in 20 Minutes

Track your AI citations free by hand: build a 10-query, 3-engine scorecard, read your real citation share, and check it before you trust any paid tool.

From the youcanbuildthings catalog ▸ Build-tested 8 min read

Summary:

  1. How to track AI citations by hand: 10 buyer-intent queries across 3 engines, 30 cells, 20 minutes.
  2. A copy-paste buyer-intent query set so you measure the questions that actually convert.
  3. The five ways a first audit lies to you, and how to collect a number you can defend.
  4. The share math that turns “am I invisible?” into “13% versus 73%, and here’s the gap.”

Knowing how to track AI citations comes down to one unglamorous habit: 20 minutes a month, a spreadsheet, and your own eyes. Not a $100-a-month dashboard. The people who do this for a living keep landing on the same conclusion, which is that the manual method is still the most trustworthy thing going. Here is the exact protocol, the traps that corrupt a first run, and how to read the number it hands you.

A hand-built AI citation scorecard: 10 buyer-intent queries run across ChatGPT, Perplexity, and Google AI Mode for 30 cells, showing a brand cited in 13% of cells (4 of 30) versus a top competitor at 73% (22 of 30)

How do you track your AI citations?

You build a scorecard: 10 buyer-intent queries down the side, 3 engines across the top, and in each of the 30 cells you log which domains the answer cited. Then you count the cells that name you and divide by 30. That percentage is your citation share, collected by hand, and nobody can argue you out of it because you watched the answer with your own eyes.

The three engines that matter for most buyers are ChatGPT, Perplexity, and Google AI Mode. That trio is what practitioners converged on, and it covers where the questions get asked. Ten queries across three engines is 30 cells, and 30 cells is 20 minutes once you have the rhythm.

SCORECARD (one row per query, one column per engine)

Query                                  | ChatGPT        | Perplexity     | Google AI Mode
best invoicing app for freelancers     | FreshBooks,    | FreshBooks,    | FreshBooks,
                                       | Bonsai, Wave   | Wave, you      | Bonsai
Northwind vs FreshBooks                | FreshBooks     | you, FreshBooks| FreshBooks
invoicing tool that chases overdue...  | Bonsai, Fresh- | you, Bonsai    | (no AI answer)
                                       | Books          |                |
... (10 queries total = 30 cells)

In each cell: who got cited (domains, not just you) + a note on framing.

The discipline that makes the number trustworthy is sameness: same queries, same engines, same way of searching, every month. Change the inputs and you have broken your own before-and-after.

Build your buyer-intent query set

Start with the right questions, because the wrong ones hand you a number that feels great and means nothing. The split that matters is buyer-intent versus informational. “What is an invoice” is informational, someone learning. “Best invoicing app for freelancers” is buyer-intent, someone choosing. Getting cited on the first reaches a student. Getting cited on the second reaches a buyer with a credit card out. One is a vanity citation. The other is a customer.

Weight your set hard toward three buyer-intent shapes:

THE 10-QUERY SET — three shapes, weighted to buyer intent

"best X for Y"
  - best invoicing app for freelancers
  - best invoicing software with recurring billing
  - cheapest invoicing app for solo freelancers

"X vs competitor"
  - Northwind vs FreshBooks
  - FreshBooks vs Bonsai

"X alternative"
  - QuickBooks alternative for freelancers
  - cheaper FreshBooks alternative

+ a couple of your own buyer phrasings:
  - invoicing tool that auto-chases overdue clients
  - best way to accept ACH payments as a freelancer
  - invoicing software for small design agencies

Pull the phrasing from how your buyers actually talk, from support tickets, sales calls, and the subreddits where they complain. Aim for eight to ten where the buyer is clearly shopping. Then keep the set stable. The value comes from running the same 10 every month so the numbers are comparable. Ten you re-run beats fifty you abandon by month two.

Run the queries and read who got cited

Run each query on each engine and make sure the engine actually searched the web, because an answer from memory cites no one. Perplexity searches by default. In ChatGPT, click the web-search globe or type / and choose Search so it retrieves instead of answering from training. Google AI Mode lives at the AI Mode tab or google.com/ai, and AI Overviews show up at the top of a normal results page.

Work engine by engine, not query by query, so you are not constantly switching context. Here is the rhythm that gets 30 cells logged in 20 minutes:

THE 20-MINUTE RHYTHM (engine by engine, not query by query)

  5 min  Perplexity   -> paste all 10 queries, log cited domains
  5 min  ChatGPT      -> same 10, web search on
  5 min  Google       -> same 10, AI Mode or AI Overviews
  3 min  tally your share + your top competitor's
  2 min  screenshot the few answers that matter, dated folder
 -------
 20 min  done, and nothing could disagree with you

For each answer, write down two things. One: which domains got cited, all of them, not just whether you showed up, because your competitors’ presence is the context that makes your number mean something. Two: how each source was framed. Being named in “options include FreshBooks, Bonsai, and Northwind, though Northwind is the newcomer” is a weak citation. Being named in “for freelancers, Northwind is the standout” is a different result. Note which one you got. And take the screenshot. It is your receipt when the number moves next month.

What broke: why the tools disagree, and the manual fix

The reason to start by hand is that the automated trackers cannot agree on your number, and once you see why, you stop trusting any single dashboard. No AI engine, not OpenAI, Google, Anthropic, or Perplexity, publishes an analytics API for brand mentions. So every “AI visibility tracker” is inferring, by running a sample of prompts and extrapolating. Two tools run different samples and report different numbers for the same brand in the same week. Neither is the truth, because the truth, an exact count, is something neither can access.

Practitioners keep saying the same thing about this. One put it cleanly in a thread on tracking AI citations:

The manual approach is still the most reliable for now. Twenty minutes a month running your brand and your top three competitors through ChatGPT, Perplexity, and Google AI Mode, screenshotting the responses, and tracking who appears and how they are described. That gives you a baseline the tools cannot match because you are testing the actual queries your customers use.

— u/SuccessfulCoyote1800, r/SEO

Five traps quietly corrupt a first audit, and dodging them is the whole game. Personalization: run in a logged-out or private window so you measure what a stranger sees, not what the engine thinks you want. Run-to-run wobble: ask twice for your top queries and note whether you showed up both times. Name confusion: count the domain, not the word, so a namesake doesn’t pad your column. Mention is not recommendation: log whether you were recommended or merely listed. Checking only your favorite engine: one engine is one-third of the picture and a confident illusion.

Get your number: 13% versus 73%

Now count, and get the number. Go through all 30 cells, tally how many name you, divide by 30. Here is the math on a worked example:

YOUR SHARE
  cells that name you      = 4
  total cells (10 x 3)     = 30
  citation share           = 4 / 30  = ~13%

TOP COMPETITOR
  cells that name them     = 22
  citation share           = 22 / 30 = ~73%

The gap: 13% vs 73%. That is your starting line, on your own site,
for the queries your buyers actually type.

There it is. That gap, 13 versus 73, might sting. Good. A number that stings is a number you can move, and you now have what almost nobody in your category has: the real one.

Two refinements make it sharper. First, an intent-weighted share: count only your three or four highest-intent queries, because a citation on “best invoicing app for freelancers” is worth more than one on a soft query. Second, a recommended-share: split your cited cells into “recommended” (the engine puts you forward) and “merely listed” (your name appears but it steers elsewhere). A brand can be mentioned in 40% of answers and recommended in 5%, and that gap is often the most actionable line on the sheet, because moving from listed to recommended is usually easier than getting mentioned at all. Keep raw share as your headline, track recommended-share right beneath it.

What should you actually do?

  • If you have never measured → do one 20-minute run this week. A low number is a baseline, not a verdict. You can only prove improvement against a starting point.
  • If you are paying for a tracker → collect your hand baseline anyway, then cross-check the tool against it. If the tool says you are invisible on a query where you can plainly see yourself cited, its sample is missing your queries and its number is suspect for you.
  • If you got one good run → do not make a strategy call off a single snapshot. AI answers wobble. Run it monthly, same day, and read the trend, not the moment.
  • If the gap is brutal → start with your highest-intent queries where a competitor is cited and you are not. Those are winnable, because the engine is clearly willing to quote someone for that question.

The bottom line

  • The manual scorecard beats every paid dashboard for your first number, because you collected it, you can reproduce it, and no vendor’s sampling sits between you and the truth.
  • Measure buyer-intent queries, not informational ones. Being cited on “best invoicing app for freelancers” is a customer. Being cited on “what is an invoice” is a vanity stat that changes your revenue by zero.
  • One run is a snapshot, not a trend. The number only proves a tactic worked when you run the same 10 queries the same way every month and watch the line move.
Why trust this? Every youcanbuildthings guide is pulled from a build-tested book: code that ran in production before it was written down.

Frequently Asked Questions

How do you track AI citations for free?+

Run your top 10 buyer-intent queries through ChatGPT, Perplexity, and Google AI Mode once a month and log which domains each answer cites. Ten queries across three engines is 30 cells and about 20 minutes. It costs nothing and you can reproduce every number.

Are AI citation tracking tools accurate?+

They disagree, because no AI engine publishes a brand-mention API, so every tool samples prompts and extrapolates. Two tools can report different numbers for the same brand in the same week. Collect a hand baseline first and use it to check any tool you pay for.

What is a good AI citation share?+

There is no universal good number. What matters is your share versus your top competitor on buyer-intent queries, and whether it is rising. A brand cited in 4 of 30 cells (13%) against a competitor's 22 of 30 (73%) has a clear, measurable gap to close.