Why Your Scraper Gets Nothing: SSR vs CSR Explained

Whether you need a headless browser or a plain fetch call depends on when the framework builds the HTML — server-side or client-side.

web-scraping · playwright · cheerio · next.js · typescript

I was building a job scraper for two sites, LinkedIn and Seek, and expected to write the same code twice. LinkedIn worked with four lines of fetch. Seek returned nothing. Same approach, completely different result.

The one DevTools check that tells you everything

Before writing a single line of scraping code, open DevTools on the target site. Go to the Network tab, reload the page, and click the first HTML document request. Open the Response tab.

If you can read the job titles in the raw response, a plain fetch() call is all you need.

If you see something like <div id="app"></div> — an empty shell and nothing else — you need a real browser.

This check takes ten seconds and saves hours.
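
The same check can be scripted. The sketch below assumes Node 18+ (for the built-in fetch); the function names and marker strings are made up for illustration and are not part of the scraper. It downloads the raw HTML, the same thing the Response tab shows, and reports whether text you expect to see is already in it.

```typescript
// Returns true when every marker string appears in the raw HTML,
// i.e. the content was already in the response the server sent.
function htmlContainsMarkers(html: string, markers: string[]): boolean {
  const haystack = html.toLowerCase()
  return markers.every(marker => haystack.includes(marker.toLowerCase()))
}

// Fetches a page without running any of its JavaScript and applies the check.
// Node 18+ ships a global fetch; older versions need a polyfill.
async function isServerRendered(url: string, markers: string[]): Promise<boolean> {
  const res = await fetch(url)
  if (!res.ok) throw new Error(`Request failed: ${res.status}`)
  return htmlContainsMarkers(await res.text(), markers)
}
```

Pick markers you can see on the rendered page, such as a job title from the first result. A false result points to client-side rendering or a block page, so it is worth confirming in DevTools before reaching for a browser.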

Server-side rendering: HTML arrives fully built

In a server-side rendered app, the server does the work. It fetches the data, runs the template, and sends back a complete HTML page. By the time the response arrives, the content is already there.

fetch() + a parsing library like Cheerio is all you need. Call fetch(), hand the response text to Cheerio, and query the DOM exactly like you would in a browser.

In my code

LinkedIn's jobs API is SSR. fetch() returns a page full of .job-search-card elements — readable immediately with Cheerio.

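import * as cheerio from 'cheerio'

// HEADERS: request headers defined elsewhere in the scraper
const res = await fetch(url, { headers: HEADERS })
const html = await res.text()
const $ = cheerio.load(html)

const jobs: { title: string; company: string | null; location: string }[] = []

// Each result card carries the title, company, and location as plain text
$('.job-search-card').each((_, el) => {
  const title = $(el).find('.base-search-card__title').text().trim()
  const company = $(el).find('.base-search-card__subtitle').text().trim() || null
  const location = $(el).find('.job-search-card__location').text().trim()
  jobs.push({ title, company, location })
})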

Client-side rendering: you get an empty box

In a client-side rendered app, the server sends an empty HTML shell and a JavaScript bundle. Your browser downloads the bundle, executes it, and the JavaScript builds the DOM — filling in the content after the initial response arrives.

fetch() only sees what the server sent: the empty shell. The job listings, the prices, the content — none of it exists in the response. It's built later, inside a browser, in memory.

SSR: HTML arrives built
1. Your script calls fetch()
2. Server fetches data and builds HTML
3. Response arrives with the content in the HTML
4. Cheerio reads it directly

CSR: HTML is an empty shell
1. Your script calls fetch()
2. Server sends empty <div> + JS bundle
3. JS runs in the browser and builds the DOM (not in your script)
4. fetch() response has no content

To scrape a CSR site you need to do what the browser does: download the JS, run it, wait for the DOM to populate. That's what Playwright (or Puppeteer) does — it launches a real browser, navigates to the page, and lets the JavaScript finish before you read anything.

In my code

Seek is CSR. Playwright navigates to the page and waits for job cards to appear before reading them.

await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 })
 
await page.waitForSelector('[data-testid="job-card"], [data-testid="no-results"]', {
  timeout: 15000,
}).catch(() => {})
 
const pageJobs = await page.evaluate(() => {
  const cards = document.querySelectorAll('[data-testid="job-card"]')
  return Array.from(cards).map(card => {
    const titleEl = card.querySelector<HTMLAnchorElement>('[data-testid="job-card-title"]')
    const companyEl = card.querySelector('[data-automation="jobCompany"]')
    // ...
  })
})

waitForSelector blocks until the JS has run and the cards exist in the DOM. Without it, evaluate() runs too early and returns an empty array.
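
One caveat with the snippet above: .catch(() => {}) swallows the timeout, so a blocked or slow page looks the same as a search with no results. Here is a hedged sketch of one way to keep those cases apart. PageLike is a hypothetical minimal interface covering only the waitForSelector call used here (a Playwright Page has a compatible method), and the outcome names are invented for illustration.

```typescript
// Minimal slice of the page API this helper needs. A Playwright `Page`
// has a compatible waitForSelector, so it can be passed in directly.
interface PageLike {
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>
}

type WaitOutcome = 'rendered' | 'timed-out'

// Waits for the selector and reports what happened instead of hiding a failure.
async function waitForRender(
  page: PageLike,
  selector: string,
  timeout = 15000,
): Promise<WaitOutcome> {
  try {
    await page.waitForSelector(selector, { timeout })
    return 'rendered'
  } catch {
    // Playwright rejects when the timeout elapses; other errors land here too.
    return 'timed-out'
  }
}
```

On 'timed-out' the caller can log the URL, retry, or save a screenshot rather than silently returning an empty list.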

Which frameworks use which model

Most frameworks commit to one side.

| Framework | Rendering | fetch() works? |
| --- | --- | --- |
| PHP, Rails, Django | SSR | Yes |
| WordPress | SSR | Yes |
| Next.js (getServerSideProps) | SSR | Yes |
| Next.js (getStaticProps) | SSG (pre-built at deploy) | Yes |
| React (Vite / CRA) | CSR | No |
| Vue (default) | CSR | No |
| Angular | CSR | No |

SSG (static site generation) is a third option: the HTML is built once, during the build step before deploy, instead of per request. From a scraping perspective it behaves exactly like SSR: the response already contains the content.

Next.js: the framework that can do both

Next.js is not purely SSR or CSR — each page chooses its own rendering mode independently.

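| Rendering mode | Signal | fetch() works? |
| --- | --- | --- |
| getServerSideProps | Exported from the page file | Yes |
| getStaticProps | Exported from the page file | Yes (HTML pre-built at deploy) |
| No data fetching export | Default in Pages Router | Static markup only; data fetched in the browser is missing |
| React Server Components | App Router default | Yes |
| "use client" component | App Router | Initial render only; data fetched in the browser is missing |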

getServerSideProps: on every request
1. User visits the page (one run per page load)
2. Server calls getServerSideProps
3. Function fetches fresh data (DB, API, etc.)
4. Next.js builds HTML with that data
5. Complete HTML sent to user (the next page load repeats from the top)

getStaticProps: once at build time
1. npm run build runs before deploy (before the server starts, part of CI/CD)
2. Next.js calls getStaticProps once
3. Function fetches data (never called again after this)
4. Next.js writes static HTML files to disk
5. Server starts and serves those files (data is frozen until next build)

This means the same domain can behave completely differently across pages. The search results page might be CSR while the job detail page is SSR. The DevTools check is the only reliable way to know — you have to run it per page, not per site.
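
For Pages Router sites there is a scripted variant of the per-page check. Next.js embeds a <script id="__NEXT_DATA__"> JSON blob in the HTML of those pages. The sketch below pulls it out of a fetched response and guesses which mode produced the page. The function name and return labels are hypothetical, and the gssp / gsp flags are internal Next.js fields rather than a documented API, so treat them as an assumption that may change between versions.

```typescript
type NextPageHint =
  | 'getServerSideProps'
  | 'getStaticProps'
  | 'no-data-export'
  | 'no-next-data' // App Router page, or not a Next.js site at all

// Inspects raw HTML for the __NEXT_DATA__ script that Pages Router pages embed.
function inspectNextData(html: string): NextPageHint {
  const match = html.match(/<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/)
  if (!match) return 'no-next-data'

  let data: { gssp?: boolean; gsp?: boolean }
  try {
    data = JSON.parse(match[1]) ?? {}
  } catch {
    return 'no-next-data' // script tag present but not valid JSON
  }

  // Assumption: internal flags Next.js sets for each data-fetching export.
  if (data.gssp) return 'getServerSideProps'
  if (data.gsp) return 'getStaticProps'
  return 'no-data-export'
}
```

When the blob is present, its props.pageProps field often holds the same data the page renders, already as JSON, which can be easier to read than the markup.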

Bot detection on top of CSR

Playwright launches a real browser — same Chromium engine, same JavaScript runtime. But an automated browser is still subtly different from a human's, and those differences are measurable. Seek checks for five of them.

Seek doesn't write this detection code themselves — they use a third-party service like DataDome or PerimeterX that injects a script into every page. Simplified, it looks like this:

// injected by bot detection service, runs before any content renders
let score = 0

if (navigator.webdriver === true)               score += 100
if (/HeadlessChrome/.test(navigator.userAgent)) score += 100
if (window.innerWidth === 0)                    score += 50
if (!navigator.language.startsWith('en-AU'))    score += 30
if (requestsAreTooFast())                       score += 50

if (score >= 100) {
  renderCaptcha()  // or silently show empty results
} else {
  renderJobListings()
}

The flow from page load to content rendering looks like this:

Without stealth: bot caught
1. Browser receives empty HTML shell
2. JS bundle downloads and runs
3. Detection script checks navigator.webdriver (finds true, bot score hits 100)
4. Captcha or empty results rendered
5. Your scraper gets nothing

With stealth: checks pass
1. Browser receives empty HTML shell
2. addInitScript injects our code (Playwright API, runs before any page script)
3. Our code overrides navigator.webdriver → undefined (property redefined before Seek can read it)
4. JS bundle downloads and runs
5. Detection script checks navigator.webdriver (finds undefined, score stays low)
6. Job listings rendered

navigator.webdriver

Every modern browser exposes a JavaScript property called navigator.webdriver. In a normal, human-driven browser it is false (older browser versions leave it undefined). In an automated browser such as Playwright, Puppeteer, or Selenium, it is set to true.

Seek's page checks this in JavaScript before showing content. If it's true, you get nothing.

addInitScript runs our code before the page's own JavaScript executes — so by the time Seek checks, the flag is already overridden.

await context.addInitScript(() => {
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined })
})
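
To see why the override works, here is a browser-free sketch. FakeNavigator is a hypothetical stand-in for the real Navigator, where webdriver is a getter on the prototype. Defining a property with the same name on the instance shadows that getter, which is the mechanism the init script relies on.

```typescript
// Stand-in for the browser's Navigator: `webdriver` is a prototype getter.
class FakeNavigator {
  get webdriver(): boolean {
    return true // what an automated browser reports
  }
}

const fakeNavigator = new FakeNavigator()
const before = fakeNavigator.webdriver // true

// Same call the init script makes: an own property shadows the prototype getter.
Object.defineProperty(fakeNavigator, 'webdriver', { get: () => undefined })
const after = fakeNavigator.webdriver as boolean | undefined // undefined
```

A stricter detector could still notice that webdriver is now an own property of navigator instead of living on the prototype, which is one reason this trick is not foolproof.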

User-Agent string

Every browser sends a User-Agent header identifying its type and OS. In headless mode, Playwright's default Chromium user agent contains the word HeadlessChrome, an immediate giveaway. Setting userAgent in the browser context options replaces it.

userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'

Viewport size

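A headless browser has no physical screen, and some setups report a 0×0 or unusually small window. That's detectable. Playwright itself defaults to 1280×720, so setting the viewport explicitly is mostly about guaranteeing a realistic, consistent size.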

viewport: { width: 1280, height: 800 }

Language and locale headers

Real Australian users send Accept-Language: en-AU. A generic bot sends nothing or en-US. Seek can serve different content — or block entirely — based on locale mismatch.

extraHTTPHeaders: {
  'Accept-Language': 'en-AU,en;q=0.9',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
}
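
The userAgent, viewport, and extraHTTPHeaders fragments are all browser context options, so they can live in one object passed to browser.newContext(). The sketch below collects them. The locale entry is an addition beyond the fragments above: Playwright's locale option affects navigator.language as well as the Accept-Language header, and the assumption here is that this matters because the simplified detection script reads navigator.language rather than the header.

```typescript
// Options for browser.newContext(). Left as a plain object so the sketch
// compiles without Playwright installed.
const contextOptions = {
  userAgent:
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  viewport: { width: 1280, height: 800 },
  // Assumption: needed so navigator.language matches, not only the request header.
  locale: 'en-AU',
  extraHTTPHeaders: {
    'Accept-Language': 'en-AU,en;q=0.9',
  },
}

// const context = await browser.newContext(contextOptions)
```

Keeping the options in one place also makes it harder for the user agent, locale, and headers to drift out of sync with each other.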

Request timing

Bots hit pages as fast as possible — zero delay between requests. Seek tracks timing. A short pause between page loads is enough to look human.

await new Promise(r => setTimeout(r, 1500)) // after each page.goto
await new Promise(r => setTimeout(r, 1000)) // between categories
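
Fixed pauses are themselves a regular pattern. A small variation, sketched here as an assumption rather than something Seek is known to check, is to randomize each delay within a range. The helper names are hypothetical.

```typescript
// Picks a delay in [minMs, maxMs]. `random` is injectable so the
// function can be tested deterministically.
function jitteredDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  return Math.round(minMs + random() * (maxMs - minMs))
}

// Resolves after a randomized pause within the given range.
function pause(minMs: number, maxMs: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, jitteredDelayMs(minMs, maxMs)))
}

// await pause(1000, 2000) // after each page.goto
```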

| What Seek checks | What gives bots away | The fix |
| --- | --- | --- |
| navigator.webdriver | true in automation | Override to undefined before page loads |
| User-Agent header | Contains "HeadlessChrome" | Replace with real Mac Chrome string |
| Viewport size | 0×0 or unusually small | Set to 1280×800 |
| Accept-Language header | Missing or en-US | Set to en-AU |
| Request timing | Instant, no delays | Add 1–2s pauses between pages |

None of these individually is foolproof — Seek could add more checks at any time. Together they're enough to pass Seek's current detection.
