Brand Audit Scrape
Programmatically crawl and analyze a website's brand presentation, including messaging, visual consistency, and conversion paths
Instructions
Brand Audit Scrape
Crawl a target website to extract and catalog all brand-facing elements: headlines, CTAs, value propositions, navigation structure, color palette, font usage, and conversion paths. This fundamental produces the raw data that brand audit drills consume.
Option A: Playwright (headless browser)
Crawl top-level pages
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  // Crawl the sitemap or top-level nav
  await page.goto('https://target-site.com/sitemap.xml');
  const sitemapXml = await page.content();
  // Parse sitemapXml for URLs, filter to marketing pages (exclude /docs, /blog, /api)

  for (const url of marketingUrls) {
    await page.goto(url);
    const audit = {
      url,
      title: await page.title(),
      metaDescription: await page.$eval('meta[name="description"]', el => el.content).catch(() => null),
      h1s: await page.$$eval('h1', els => els.map(e => e.textContent.trim())),
      h2s: await page.$$eval('h2', els => els.map(e => e.textContent.trim())),
      h3s: await page.$$eval('h3', els => els.map(e => e.textContent.trim())),
      ctas: await page.$$eval('a[href], button', els => els.map(e => ({
        text: e.textContent.trim(),
        href: e.href || null,
        tag: e.tagName
      }))),
      images: await page.$$eval('img', els => els.map(e => ({
        src: e.src,
        alt: e.alt
      }))),
      bodyText: await page.$eval('body', el => el.innerText.substring(0, 5000)),
      computedStyles: await page.evaluate(() => {
        const styles = getComputedStyle(document.body);
        return {
          fontFamily: styles.fontFamily,
          color: styles.color,
          backgroundColor: styles.backgroundColor
        };
      })
    };
    // Store the audit object per page (e.g. write to a JSON file)
  }

  await browser.close();
})();
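The sitemap-parsing step above is left as a comment. A minimal sketch that pulls `<loc>` entries out of the XML with a regex (assuming a flat sitemap rather than a sitemap index, and that the excluded path prefixes match the target site's structure):

```javascript
// Extract <loc> URLs from sitemap XML and drop non-marketing sections.
// The excluded prefixes are assumptions; adjust them per target site.
function extractMarketingUrls(sitemapXml) {
  const excluded = ['/docs', '/blog', '/api'];
  const locs = [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)]
    .map(m => m[1].trim());
  return locs.filter(url => {
    const path = new URL(url).pathname;
    return !excluded.some(prefix => path.startsWith(prefix));
  });
}
```

For sites without a sitemap, the same filter can be applied to `href` values scraped from the top-level nav instead.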
Extract color palette
const palette = await page.evaluate(() => {
  const colors = new Set();
  document.querySelectorAll('*').forEach(el => {
    const style = getComputedStyle(el);
    colors.add(style.color);
    colors.add(style.backgroundColor);
  });
  return [...colors].filter(c => c !== 'rgba(0, 0, 0, 0)');
});
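getComputedStyle returns colors as rgb()/rgba() strings; normalizing them to hex makes deduplication and comparison against brand guidelines easier. A minimal helper (assumes well-formed rgb()/rgba() input; other formats pass through unchanged):

```javascript
// Convert 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to '#rrggbb' (alpha dropped).
function rgbToHex(rgbString) {
  const nums = rgbString.match(/[\d.]+/g);
  if (!nums || nums.length < 3) return rgbString; // pass through unknown formats
  return '#' + nums.slice(0, 3)
    .map(n => Math.round(Number(n)).toString(16).padStart(2, '0'))
    .join('');
}
```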
Screenshot key pages
await page.screenshot({ path: `audit/${slug}-full.png`, fullPage: true });
await page.screenshot({ path: `audit/${slug}-above-fold.png`, clip: { x: 0, y: 0, width: 1440, height: 900 } });
Option B: Screaming Frog SEO Spider (CLI)
# Headless crawl and export (apply a crawl limit, e.g. 500 pages, via a saved configuration passed with --config)
/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher \
--crawl https://target-site.com \
--headless \
--output-folder ./audit-export \
--export-tabs "Internal:All,Page Titles:All,H1:All,H2:All,Meta Description:All,Images:All"
Exports CSVs with all on-page elements.
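The exported CSVs quote any field containing commas, so a naive split(',') will mangle page titles and meta descriptions. A minimal quote-aware line parser (a sketch; for production use, a real CSV library is safer):

```javascript
// Parse one CSV line, honoring double-quoted fields and "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let cur = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                      // closing quote
      else cur += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(cur);
      cur = '';
    } else {
      cur += ch;
    }
  }
  fields.push(cur);
  return fields;
}
```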
Option C: Clay Website Scraper
Use Clay's built-in website scraper for quick extraction:
- Create a Clay table with target URLs
- Add "Scrape Website" enrichment
- Extract: page title, H1, meta description, body text
- Export results as CSV or push to Attio via Clay-Attio integration
Output
The audit scrape produces a structured JSON or CSV per page containing:
- All headlines (H1, H2, H3)
- All CTAs with destination URLs
- Meta descriptions and page titles
- Body text (first 5000 chars)
- Image sources and alt text
- Color values used
- Full-page screenshots
This data feeds into the brand-audit-analysis drill for competitive positioning, messaging consistency, and conversion path analysis.
Error Handling
- Navigation timeout: the page took more than 30s to load. Increase the timeout or skip the page.
- Element not found: the selector returned null. Use .catch(() => null) for optional elements.
- Blocked by WAF: the site blocks headless browsers. Add a realistic user-agent header and consider using a residential proxy.
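A small retry wrapper also keeps one slow or blocked page from aborting the whole crawl; a sketch (the backoff values are arbitrary):

```javascript
// Retry a promise-returning function with linear backoff; give up and
// return null after maxAttempts so the crawl loop can move on.
async function withRetries(fn, maxAttempts = 3, delayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) return null;
      await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

Inside the crawl loop this would wrap the per-page extraction, e.g. `const audit = await withRetries(() => collectAudit(page, url));` (where collectAudit is a hypothetical name for the extraction step above).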