A technical SEO audit without a structured checklist is how small issues get missed and compound into ranking problems over months. This is the actual list I work through when auditing a new client's site: 53 checkpoints across crawlability, Core Web Vitals, structured data, site architecture, mobile-first indexing, log file analysis, and AI readiness. I've grouped them into categories so you can prioritise by impact and work through them systematically.
Crawlability and Indexation
If Googlebot can't crawl your pages efficiently, nothing else matters. Start here.
- Robots.txt is accurate and not blocking critical resources. Fetch your robots.txt directly (domain.com/robots.txt) and verify that JS, CSS, and key page paths aren't being blocked. This is a shockingly common mistake on sites that have gone through CMS migrations.
- XML sitemap is submitted and validated in Search Console. The sitemap should only include canonical, indexable URLs. Remove noindex pages, parameter-based duplicates, and any paginated URLs that don't carry a self-referencing canonical.
- Canonical tags are consistent and pointing to the right URL. Check that every indexable page has a self-referencing canonical, and that canonicals aren't creating loops or pointing to redirected, noindexed, or error URLs.
- No important pages are returning 4xx or 5xx errors. Run a crawl with Screaming Frog or Sitebulb and filter for non-200 status codes. Cross-reference with Search Console's Page indexing report (formerly the Coverage report).
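- Crawl budget is not being wasted on low-value pages. Session-based URLs, infinite scroll parameters, and filter combinations can bloat crawl budget. Block these patterns in robots.txt. A noindex tag won't save crawl budget, because Googlebot still has to fetch a page to see the tag, and it can never see a noindex on a URL that robots.txt blocks.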
- Pagination is handled correctly. For paginated content, use self-referencing canonicals on each page and ensure that page 2+ isn't noindexed accidentally.
- Hreflang is implemented correctly for multilingual sites. Hreflang errors are common. Each language version must list itself and every alternate, and each alternate must carry a return tag pointing back; if two pages don't reference each other, Google ignores the annotations.
- No orphan pages exist. Pages with no internal links pointing to them receive no internal link equity and are hard for crawlers to discover; a sitemap entry alone is a weak signal. Run a crawl and cross-reference the crawled URLs with your sitemap to find them.
- Google Search Console shows no manual actions or security issues. Check the Manual actions and Security issues reports; an active problem in either overrides any ranking work you do.
- JavaScript rendering is not hiding critical content. If your site is heavily JS-rendered, use Search Console's URL Inspection tool to see the rendered HTML. Content that only appears after JS execution may not be indexed.
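The robots.txt check can be scripted as a first pass with Python's standard-library urllib.robotparser. This is a minimal sketch under stated assumptions: the robots.txt content and the list of critical URLs are hypothetical, and blocked_urls is an illustrative helper. Python's parser matches rules by simple prefix and applies the first matching rule, whereas Google supports * and $ wildcards and applies the most specific matching rule, so treat the output as an approximation and confirm anything borderline with URL Inspection in Search Console.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt left over from a CMS migration.
# In practice, fetch the live file from https://<your-domain>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /assets/js/
Disallow: /search
"""

# Hypothetical URLs that must stay crawlable: key pages plus the JS/CSS needed to render them.
CRITICAL_URLS = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/main.css",
]


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots.txt disallows for the given user agent.

    Approximation: urllib.robotparser uses prefix matching and first-match
    rule order, not Google's wildcard support and most-specific-rule logic.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    print(blocked_urls(ROBOTS_TXT, CRITICAL_URLS))
    # → ['https://example.com/assets/js/app.js']
```

The blocked JS bundle is exactly the kind of blanket asset rule that stops Googlebot rendering a page properly.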
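The orphan check itself is a set difference: URLs listed in the sitemap minus URLs the crawler reached by following internal links. A minimal sketch, assuming you've already exported the internally linked URLs from your crawler into a Python set (the sample sitemap and URL set are hypothetical) and that both sources normalise URLs the same way; trailing-slash or protocol mismatches would otherwise surface as false positives.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services/audit/</loc></url>
  <url><loc>https://example.com/old-landing-page/</loc></url>
</urlset>
"""

# Hypothetical set of URLs the crawler reached by following internal links.
LINKED_URLS = {
    "https://example.com/",
    "https://example.com/services/audit/",
    "https://example.com/blog/",
}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract every <loc> value from a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def find_orphans(sitemap_xml: str, linked_urls: set[str]) -> set[str]:
    """Sitemap URLs that no internal link points to."""
    return sitemap_urls(sitemap_xml) - linked_urls


if __name__ == "__main__":
    print(find_orphans(SITEMAP_XML, LINKED_URLS))
    # → {'https://example.com/old-landing-page/'}
```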
Core Web Vitals
Core Web Vitals are Google's user experience signals and a confirmed ranking factor. The current metrics that matter are LCP, CLS, and INP.
- LCP (Largest Contentful Paint) is under 2.5 seconds. The most common LCP elements are hero images and large above-the-fold headings or text blocks. Optimise by preloading the LCP image, serving it from a CDN, and eliminating the render-blocking CSS and JS that delay the first render.
- CLS (Cumulative Layout Shift) is under 0.1. CLS is usually caused by images without explicit dimensions, late-loading ads, or web fonts causing text shifts. Add width and height attributes to all images and reserve space for ad slots.
- INP (Interaction to Next Paint) is under 200ms. INP replaced FID in 2024 and measures responsiveness across all interactions on the page. Heavy main-thread JS is the primary culprit. Defer non-critical scripts and break up long tasks.
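- TTFB (Time to First Byte) is under 800ms. TTFB isn't a Core Web Vital itself, but it's a diagnostic metric that feeds directly into LCP, since nothing can render until the first byte arrives. Improve it with server-side caching, a CDN, and less server processing time for dynamic pages.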
- Images are served in WebP or AVIF format. Both formats deliver significantly smaller file sizes than JPEG or PNG with no visible quality difference for most use cases.
- Images are lazy-loaded below the fold. Use the native loading="lazy" attribute on all images that aren't in the initial viewport, but never on the LCP image, since lazy-loading it delays LCP.
- Render-blocking resources are eliminated or deferred. Inline the critical CSS needed for above-the-fold content, load the remaining CSS asynchronously, and add defer or async to non-critical scripts.
- Font loading is optimised. Use font-display: swap, preload your primary font file, and subset fonts to include only the characters you use.
- Third-party scripts are audited and non-essential ones removed. Every third-party tag adds latency. Run PageSpeed Insights and review the third-party summary — eliminate any tag that isn't actively driving value.
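- Field data in CrUX is checked against lab data in PageSpeed Insights. PageSpeed Insights shows both: CrUX field data from real Chrome users and Lighthouse lab data from a single simulated load. Lab data shows potential; field data shows reality. Gaps between the two indicate conditions (devices, network speeds) you're not testing for locally. INP has no lab equivalent; Lighthouse reports Total Blocking Time as a proxy.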
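Google judges each metric on the 75th-percentile value across real page loads, so a single fast test run says little about whether a page passes. This sketch rates a list of field samples against Google's published good/poor boundaries (LCP 2.5s/4s, INP 200ms/500ms, CLS 0.1/0.25, and TTFB 800ms/1800ms). The samples are hypothetical, and the nearest-rank percentile used here is a simplification, so expect small differences from CrUX's own aggregation.

```python
import math

# Google's published boundaries: (good upper bound, poor lower bound).
# value <= good -> "good"; value > poor -> "poor"; otherwise "needs improvement".
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
    "TTFB": (800, 1800),  # milliseconds; diagnostic metric, not a Core Web Vital
}


def p75(samples: list[float]) -> float:
    """75th percentile by the nearest-rank method (a simplification of CrUX's aggregation)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric: str, samples: list[float]) -> str:
    """Rate a metric's 75th-percentile value against its thresholds."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    # Hypothetical field samples for one URL.
    print(rate("LCP", [1800, 2100, 2300, 2600, 3900, 2400, 2200, 5200]))  # → needs improvement
    print(rate("CLS", [0.02, 0.05, 0.0, 0.08, 0.31]))                       # → good
    print(rate("INP", [150, 320, 540, 610, 700]))                            # → poor
```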
Structured Data and Schema
- Article or BlogPosting schema is on all blog posts. Include author, datePublished, dateModified, headline, and image fields at minimum.
- Organization schema is on the homepage. Include name, url, logo, contactPoint, and sameAs (linking to all social profiles) to help search engines understand your entity.
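- FAQ schema is used on appropriate pages. Only apply FAQ schema where genuine Q&A content is visible on the page; Google can take manual action against markup that misrepresents page content. Since August 2023, Google shows FAQ rich results only for well-known, authoritative government and health websites, so most sites shouldn't expect a rich result from it.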
- BreadcrumbList schema is implemented site-wide. This helps search engines understand site hierarchy and often produces breadcrumb rich results in the SERP.
- LocalBusiness schema is implemented for location-based businesses. Include address, geo coordinates, opening hours (Google's documentation uses openingHoursSpecification), and telephone.
- All schema is implemented in JSON-LD format. JSON-LD is Google's preferred format — it's easier to manage than microdata and doesn't require changes to the HTML structure.
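- All schema is validated in Google's Rich Results Test. Test the code snippet before deployment and the live URL after it, and fix all errors before the markup reaches production. For types Google doesn't use for rich results, check the syntax in the Schema Markup Validator (validator.schema.org).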
- No schema is applied to content that isn't visible on the page. Schema that marks up hidden or non-existent content violates Google's guidelines and can result in a manual action.
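- dateModified is updated when content is updated. Freshness signals matter for news and informational content, and outdated modification dates can suppress rankings on time-sensitive queries. Only change the date for substantive edits; bumping it without changing the content is misleading.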
- HowTo schema is implemented where step-by-step content exists. Note that Google deprecated HowTo rich results in 2023, so this markup no longer puts steps in the SERP; it remains valid schema.org vocabulary for describing step-by-step content.
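If your CMS doesn't output BlogPosting markup, it can be generated from post metadata. A minimal sketch with a hypothetical blogposting_jsonld helper and sample values; the property names come from schema.org's BlogPosting type, and the output still needs to pass the Rich Results Test. It also escapes "</" so a headline can't close the script element early, and rejects a dateModified earlier than datePublished.

```python
import json
from datetime import date


def blogposting_jsonld(headline: str, author_name: str, author_url: str,
                       image_url: str, published: date, modified: date) -> str:
    """Build a BlogPosting JSON-LD <script> tag (hypothetical helper)."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "image": [image_url],
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }
    # Escape "</" so text inside the JSON can't terminate the <script> element early.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Hypothetical post metadata.
    print(blogposting_jsonld(
        headline="Technical SEO audit checklist",
        author_name="Jane Example",
        author_url="https://example.com/authors/jane-example/",
        image_url="https://example.com/images/audit-checklist.jpg",
        published=date(2024, 11, 5),
        modified=date(2025, 3, 2),
    ))
```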
Site Architecture and Internal Linking
- No important page is more than 3 clicks from the homepage. Deep pages receive less crawl budget and less PageRank. If a page is important, link to it from high in the site hierarchy.
- Internal links use descriptive anchor text. "Click here" and "read more" are wasted opportunities. Anchor text sends relevance signals — use natural, descriptive phrases that include your target keyword where appropriate.
- Redirect chains are eliminated. A → B → C should be cleaned to A → C, and internal links should point straight at C. Every extra hop adds latency and costs a crawl request, and Googlebot stops following a chain after 10 hops.
- No redirect loops exist. Page A redirecting to Page B which redirects back to Page A causes crawl errors. Run a crawl to detect these.
- All internal links point to canonical URLs. Linking to non-canonical versions of pages creates crawl confusion. Audit internal links and update them to point directly to the canonical destination.
- Pillar pages are well-linked from cluster content. Topic clusters work when the supporting pages consistently link back to the pillar, passing authority and signalling topical depth.
- Navigation includes keyword-relevant anchor text. Global navigation links are crawled on every page — the anchor text in navigation carries significant weight for the linked pages.
- Broken internal links are fixed. Internal 404s are both a user experience problem and a crawl efficiency problem. Run a monthly crawl to catch these.
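Click depth is a breadth-first search from the homepage over the internal link graph: the first time the search reaches a page, the hop count is the shortest click path to it. This sketch assumes a hypothetical link graph held as a dict from each URL to the URLs it links to, for example built from a crawler's link export. Pages missing from the result aren't reachable from the homepage at all, so the same function doubles as a rough orphan check.

```python
from collections import deque

# Hypothetical internal link graph: each URL maps to the URLs it links to.
LINKS = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/page/2/"],
    "/blog/page/2/": ["/blog/page/3/"],
    "/blog/page/3/": ["/blog/old-guide/"],
    "/services/": ["/services/audit/"],
}


def click_depths(links: dict[str, list[str]], start: str = "/") -> dict[str, int]:
    """Minimum number of clicks from start to every reachable page (breadth-first search)."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS order is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links: dict[str, list[str]], max_depth: int = 3) -> list[str]:
    """Pages reachable from the homepage but more than max_depth clicks away."""
    return sorted(page for page, depth in click_depths(links).items() if depth > max_depth)


if __name__ == "__main__":
    print(too_deep(LINKS))  # → ['/blog/old-guide/']
```

Here the old guide is only reachable through the pagination chain, which is a common way useful content ends up buried.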
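Redirect chains and loops can both be found from a source-to-target redirect map exported from a crawl. In this sketch the map is hypothetical sample data and flatten_redirects is an illustrative helper rather than any crawler's API: it follows each redirect to its final non-redirecting URL, so every rule can be rewritten as a single hop, and it reports any source that ends up in a loop.

```python
# Hypothetical redirect map exported from a crawl: source URL -> redirect target.
REDIRECTS = {
    "/old-pricing/": "/pricing-2023/",
    "/pricing-2023/": "/pricing/",
    "/promo/": "/offer/",
    "/offer/": "/promo/",
}


def flatten_redirects(redirects: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map each redirecting URL to its final destination and list sources caught in loops."""
    flattened: dict[str, str] = {}
    loops: list[str] = []
    for source, target in redirects.items():
        seen = {source}
        current = target
        while current in redirects:  # keep following while the target itself redirects
            if current in seen:
                loops.append(source)
                break
            seen.add(current)
            current = redirects[current]
        else:  # no break: current is a final, non-redirecting URL
            flattened[source] = current
    return flattened, loops


if __name__ == "__main__":
    flattened, loops = flatten_redirects(REDIRECTS)
    print(flattened)  # → {'/old-pricing/': '/pricing/', '/pricing-2023/': '/pricing/'}
    print(loops)      # → ['/promo/', '/offer/']
```

The flattened entries can be rewritten as single-hop rules directly; the loop entries need a manual decision about where they should resolve.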
Mobile-First Indexing
- The mobile version of the site contains the same content as desktop. Google indexes the mobile version of your site, so any content removed from the mobile version won't be indexed. Content collapsed behind tabs or accordions is fine as long as it's present in the mobile HTML.
- Font sizes are readable without zooming on mobile. Body text should be at least 16px. Smaller text forces users to zoom, which is a negative UX signal.
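- Tap targets are large enough and spaced appropriately. Google's minimum recommended tap target size is 48x48px, with enough spacing that neighbouring targets don't overlap. Search Console's Mobile Usability report, which used to flag these problems, was retired in December 2023, so check tap targets with your browser's device emulation instead.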
- Mobile-specific structured data matches desktop. If you serve different HTML to mobile users, the schema markup must be equivalent across both versions.
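Where separate mobile HTML is served (dynamic serving or a separate m. site), schema parity can be checked by extracting the JSON-LD from both versions and comparing it. A minimal sketch using only Python's standard library; the two HTML snippets are hypothetical, and the comparison ignores key order and block order but treats any other difference as a mismatch. It also assumes every JSON-LD block is valid JSON, since json.loads raises on malformed markup.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and parse every <script type="application/ld+json"> block in a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list[object]:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def schema_matches(mobile_html: str, desktop_html: str) -> bool:
    """True if both versions carry the same JSON-LD, ignoring key order and block order."""
    def normalise(html: str) -> list[str]:
        return sorted(json.dumps(block, sort_keys=True) for block in extract_jsonld(html))
    return normalise(mobile_html) == normalise(desktop_html)


# Hypothetical pages: the mobile template dropped the breadcrumb markup.
DESKTOP_HTML = (
    '<script type="application/ld+json">{"@type": "BlogPosting", "headline": "Audit"}</script>'
    '<script type="application/ld+json">{"@type": "BreadcrumbList", "itemListElement": []}</script>'
)
MOBILE_HTML = '<script type="application/ld+json">{"headline": "Audit", "@type": "BlogPosting"}</script>'

if __name__ == "__main__":
    print(schema_matches(MOBILE_HTML, DESKTOP_HTML))  # → False
```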
Log File Analysis
- Log files confirm Googlebot is crawling priority pages frequently. Log file analysis shows you exactly which pages Googlebot visits, how often, and with what status codes. Pages that are frequently crawled but not ranking often have content quality issues. Pages that are rarely crawled may have crawl budget problems.
- Non-priority pages are not receiving disproportionate crawl budget. If Googlebot spends 60% of its crawl budget on paginated category pages with no unique value, your important pages are getting starved. Use log files to identify and address this.
- Crawl frequency matches content update frequency. Pages you update regularly should be crawled more often. If they're not, check that the sitemap lastmod dates are being updated correctly.
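A first pass at log file analysis needs only the request path, status code, and user agent from each line. This sketch assumes the Apache/Nginx combined log format and uses hypothetical sample lines; it groups Googlebot requests by path with the query string stripped, which makes parameterised URLs such as paginated categories stand out. User-agent strings are easy to spoof, so verify that the requesting IPs really belong to Google before acting on the numbers; Google documents a reverse DNS lookup followed by a forward lookup for this.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical access-log lines.
LOG_LINES = [
    f'66.249.66.1 - - [10/Mar/2025:06:25:14 +0000] "GET /products/widget HTTP/1.1" 200 5123 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Mar/2025:06:25:20 +0000] "GET /category/shoes?page=7 HTTP/1.1" 200 8811 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Mar/2025:06:26:02 +0000] "GET /category/shoes?page=8 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Mar/2025:06:27:45 +0000] "GET /products/widget HTTP/1.1" 200 5123 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def googlebot_hits(lines):
    """Count Googlebot requests per path (query string stripped) and per status code."""
    by_path = Counter()
    by_status = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue  # skip malformed lines and non-Googlebot traffic
        by_path[urlsplit(match["path"]).path] += 1
        by_status[match["status"]] += 1
    return by_path, by_status


if __name__ == "__main__":
    by_path, by_status = googlebot_hits(LOG_LINES)
    print(by_path.most_common())  # → [('/category/shoes', 2), ('/products/widget', 1)]
    print(dict(by_status))        # → {'200': 2, '404': 1}
```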
AEO and AI Search Readiness
As AI Overviews and LLM-based search engines become a larger share of how content is discovered, technical SEO increasingly overlaps with Answer Engine Optimization. These checks prepare your site for that environment.
- Content is structured with clear H2 and H3 headings that match question intent. LLMs parse heading structure to identify what a page covers. Headings phrased as questions or direct statements are easier for AI systems to extract and cite.
- Author information is visible and linked to an author page with credentials. E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) is the framework Google's Search Quality Rater Guidelines use to judge content quality. Visible, credentialled authorship signals to both Google and LLMs that the content is trustworthy.
- Publication date and last modified date are displayed on content pages. Freshness is a ranking signal for informational queries. Displaying dates also helps LLMs assess whether your content is current.
- External links point to authoritative sources. Citing reputable sources improves E-E-A-T and provides AI systems with context about the reliability of your claims.
- FAQ sections exist on informational pages where natural Q&A applies. AI Overviews frequently pull from clearly structured Q&A content. FAQ sections with concise, direct answers are a strong signal.
- Site speed on mobile passes Core Web Vitals thresholds in the field data. Field data from CrUX (Chrome User Experience Report) is what Google uses for ranking, not lab scores. Check your origin's field data in PageSpeed Insights or in Search Console's Core Web Vitals report, both of which draw on CrUX.
- HTTPS is enforced site-wide with no mixed content errors. HTTPS is a baseline ranking signal and a trust indicator. Mixed content (HTTP resources on HTTPS pages) is flagged by browsers and creates security warnings.
- Canonical tags, OG tags, and meta descriptions are present and accurate on all key pages. These are basic on-page hygiene items that affect how your pages appear in SERPs and when shared socially.
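The last few hygiene items (canonical, Open Graph, and meta description tags, plus mixed content) lend themselves to a per-page check. A minimal sketch over one hypothetical HTML document using Python's html.parser; it treats og:title, og:type, og:image, and og:url as required because those are the four properties the Open Graph protocol marks as required, and it only inspects src attributes and stylesheet links for http:// URLs, so it's a partial mixed-content check, not a complete one.

```python
from html.parser import HTMLParser

# The four properties the Open Graph protocol lists as required on every page.
REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class HeadAuditor(HTMLParser):
    """Record canonical, meta description, Open Graph tags, and plain-HTTP subresources."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = ""
        self.description = ""
        self.og: dict[str, str] = {}
        self.insecure: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = {name: value or "" for name, value in attrs}
        rel = a.get("rel", "").lower()
        if tag == "link" and rel == "canonical":
            self.canonical = a.get("href", "")
        elif tag == "meta" and a.get("name", "").lower() == "description":
            self.description = a.get("content", "")
        elif tag == "meta" and a.get("property", "").startswith("og:"):
            self.og[a["property"]] = a.get("content", "")
        # Partial mixed-content check: only src attributes and stylesheet links.
        resource = a.get("src", "") or (a.get("href", "") if tag == "link" and rel == "stylesheet" else "")
        if resource.startswith("http://"):
            self.insecure.append(resource)


def audit_head(html: str) -> list[str]:
    """Return a list of hygiene problems found in one HTML document."""
    auditor = HeadAuditor()
    auditor.feed(html)
    auditor.close()
    problems = []
    if not auditor.canonical:
        problems.append("missing canonical")
    if not auditor.description:
        problems.append("missing meta description")
    problems.extend(f"missing {prop}" for prop in REQUIRED_OG if not auditor.og.get(prop))
    problems.extend(f"mixed content: {url}" for url in auditor.insecure)
    return problems


# Hypothetical page.
PAGE = """\
<html><head>
<link rel="canonical" href="https://example.com/guide/">
<meta name="description" content="A technical SEO checklist.">
<meta property="og:title" content="Technical SEO checklist">
<meta property="og:url" content="https://example.com/guide/">
<link rel="stylesheet" href="http://cdn.example.com/site.css">
</head><body><img src="http://example.com/hero.jpg" alt=""></body></html>
"""

if __name__ == "__main__":
    for problem in audit_head(PAGE):
        print(problem)
```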