
What this is

@searchablehq/middleware is a small Node package that captures inbound requests in your application and fires a non-blocking POST to Searchable’s ingest endpoint. The Searchable backend then runs the same AI-bot classifier the rest of our connectors use, so only GPTBot, ClaudeBot, PerplexityBot, and other AI agents end up in your dashboard.
Zero added latency. The SDK sends events fire-and-forget after your response is built, so your users never wait on Searchable. If the network is down or Searchable is unreachable, the request still completes.
The package ships with a first-class withSearchable wrapper for Next.js middleware. The same core primitives (buildEventPayload, sendEvent) can be reused to wire any other Node framework — examples for Express and Fastify are below.
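The fire-and-forget behaviour described above comes down to two things: the send is not awaited, and its errors are swallowed. The following is a minimal sketch, not the SDK's source. `respondThenTrack` and the injected `send` function are hypothetical names standing in for the real transport.

```typescript
// Hypothetical sender signature: resolves once the POST completes.
type SendFn = (event: Record<string, unknown>) => Promise<void>;

// Build the response first, then schedule the event without awaiting it.
// Any failure in `send`, sync or async, is swallowed so it can never
// delay or break the response.
function respondThenTrack<T>(
  buildResponse: () => T,
  event: Record<string, unknown>,
  send: SendFn,
): T {
  const response = buildResponse();
  try {
    // Fire-and-forget: the promise is intentionally not awaited.
    void send(event).catch(() => {
      // Network errors are ignored; they must not affect the response.
    });
  } catch {
    // A synchronous throw from `send` is ignored as well.
  }
  return response;
}
```

Even when `send` rejects (for example, because the network is down), the caller gets its response back immediately and no unhandled rejection escapes.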

Prerequisites

A Node 18+ application (Next.js 13+, Express, Fastify, etc.)
A Searchable project with your domain confirmed
The two credentials from the common prerequisites: a project site token (st_…) and a workspace API key (sk_live_…)

Install

# pnpm
pnpm add @searchablehq/middleware

# npm
npm install @searchablehq/middleware

# yarn
yarn add @searchablehq/middleware
The package has no required runtime dependencies. next is an optional peer dependency — only loaded when you import from @searchablehq/middleware/nextjs.

Next.js setup

1. Add the credentials to your environment

In .env.local (or your hosting platform’s env-var settings):
.env.local
SEARCHABLE_SITE_TOKEN=st_your_token_here
SEARCHABLE_API_KEY=sk_live_your_key_here
Both values come from LLM Analytics → Setup → Custom in your Searchable dashboard.
2. Create or update middleware.ts

In your Next.js project root (or inside src/ if your project uses a src directory):
middleware.ts
import { withSearchable } from "@searchablehq/middleware/nextjs";

export default withSearchable({
  siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
  apiKey: process.env.SEARCHABLE_API_KEY!,
});

export const config = {
  matcher: ["/((?!_next/static|_next/image|favicon.ico).*)"],
};
The matcher keeps the middleware off Next’s static asset routes — those almost never come from AI crawlers, and there’s no point burning a middleware invocation on each.
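To check which paths this matcher admits, you can approximate it with an anchored JavaScript RegExp. This sketch is only an aid for reasoning about the pattern. Next.js compiles matchers itself, so it is not an exact replica of Next's matching, and `middlewareRunsOn` is a hypothetical helper.

```typescript
// Approximation of the Next.js matcher
//   "/((?!_next/static|_next/image|favicon.ico).*)"
// as an anchored RegExp. Next.js compiles matchers itself, so treat
// this as a way to reason about the pattern, not an exact replica.
const MATCHER = /^\/((?!_next\/static|_next\/image|favicon.ico).*)$/;

// True when the middleware would be invoked for this pathname.
function middlewareRunsOn(pathname: string): boolean {
  return MATCHER.test(pathname);
}
```

For example, `/pricing` and `/_next/data/` paths match, while `/_next/static/chunks/main.js` and `/favicon.ico` do not. Matching `/_next/data/` costs a middleware invocation but no event, because the SDK auto-skips `/_next/*` paths (see Skipping paths).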
3. Already have a middleware.ts?

Pass your existing middleware as the second argument to withSearchable. Searchable runs first, then yields to your logic:
middleware.ts
import { withSearchable } from "@searchablehq/middleware/nextjs";
import { NextResponse } from "next/server";

export default withSearchable(
  {
    siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
    apiKey: process.env.SEARCHABLE_API_KEY!,
  },
  async (request) => {
    // Your existing middleware logic here.
    return NextResponse.next();
  },
);
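The ordering in this step (Searchable first, then your logic) is ordinary function composition. The sketch below is hypothetical. `withTracking`, `Handler`, and the `passThrough` fallback are illustrative names, and the sketch says nothing about how withSearchable is implemented beyond the documented order. Generic type parameters stand in for NextRequest and NextResponse, so the example runs without Next.js.

```typescript
// Illustrative handler type. In Next.js the request would be a NextRequest
// and the result a NextResponse; plain type parameters keep this standalone.
type Handler<Req, Res> = (req: Req) => Res | Promise<Res>;

// Hypothetical wrapper showing the documented ordering: the tracking step
// runs first, then control passes to your middleware (or a pass-through).
function withTracking<Req, Res>(
  track: (req: Req) => void,
  passThrough: Handler<Req, Res>,
  inner?: Handler<Req, Res>,
): Handler<Req, Res> {
  return async (req: Req): Promise<Res> => {
    try {
      track(req);
    } catch {
      // A tracking failure must never block the request.
    }
    // Your middleware's return value becomes the overall result.
    return inner ? inner(req) : passThrough(req);
  };
}
```

Whatever your inner function returns (a redirect, a rewrite, or `NextResponse.next()`) is what the wrapped middleware returns.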
4. Deploy

Ship the change to your hosting platform. The middleware runs on every non-static request and starts forwarding events immediately.
Open LLM Analytics → Setup in Searchable. The status strip flips to Connected within a few minutes of the next AI crawler hitting your site.

Config reference

interface SearchableConfig {
  /** Site token (st_*). Required. */
  siteToken: string;

  /** Workspace API key (sk_live_*). Required — sent as `Authorization: Bearer …`. */
  apiKey: string;

  /** Collector endpoint URL. Default: Searchable's tracker worker. */
  endpoint?: string;

  /** Zero the last IP octet before sending. Default: true. */
  anonymizeIp?: boolean;

  /** Log every captured event to stdout. Default: false. */
  debug?: boolean;

  /** Skip capture for matching paths. Return true to skip. */
  ignore?: (path: string) => boolean;

  /** Inject custom properties into the event. */
  custom?: (request: Request) => Record<string, string>;
}
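For IPv4, zeroing the last octet turns 203.0.113.42 into 203.0.113.0. The sketch below is a hypothetical illustration of that rule, not the package's anonymizeIp. These docs don't describe IPv6 handling, so this version simply drops addresses it can't truncate. It also unwraps IPv4-mapped addresses such as ::ffff:203.0.113.42, which Node servers commonly report for IPv4 clients on dual-stack sockets.

```typescript
// Hypothetical illustration of "zero the last IP octet" (not the package's
// anonymizeIp). Handles plain IPv4 and IPv4-mapped IPv6 such as
// "::ffff:203.0.113.42". Other input, including native IPv6, is dropped
// (empty string) because this sketch does not truncate it.
function zeroLastOctet(ip: string): string {
  const v4 = ip.trim().replace(/^::ffff:/i, "");
  const parts = v4.split(".");
  const isIpv4 =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!isIpv4) return "";
  parts[3] = "0";
  return parts.join(".");
}
```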

Skipping paths

By default, the SDK auto-skips /_next/* and common static-asset extensions (.js, .css, .png, .jpg, .svg, .ico, .woff, .woff2, .ttf, .map). Add custom skips for health checks, internal APIs, or anything else you don’t want to count:
withSearchable({
  siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
  apiKey: process.env.SEARCHABLE_API_KEY!,
  ignore: (path) =>
    path.startsWith("/api/health") || path.startsWith("/api/internal"),
});
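The built-in skip rule can be approximated in a few lines. This is a sketch, not the SDK's code. `shouldSkip` is a hypothetical name, and the sketch assumes a case-insensitive suffix match on the extensions listed above, with your `ignore` callback consulted only after the built-in rules.

```typescript
// Extensions listed in the docs as auto-skipped by the SDK.
const STATIC_EXTENSIONS = [
  ".js", ".css", ".png", ".jpg", ".svg", ".ico",
  ".woff", ".woff2", ".ttf", ".map",
];

// Hypothetical approximation of the skip decision:
// built-in rules first, then the user-supplied `ignore` callback.
function shouldSkip(path: string, ignore?: (path: string) => boolean): boolean {
  if (path.startsWith("/_next/")) return true;
  const lower = path.toLowerCase();
  if (STATIC_EXTENSIONS.some((ext) => lower.endsWith(ext))) return true;
  return ignore ? ignore(path) : false;
}
```

If the SDK matches extensions by suffix like this, a page whose own path ends in a listed extension (for example /guides/next.js) would be skipped too. Confirm such routes with debug: true.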

Custom properties

Anything you return from custom(request) is attached to the event under parameters and is queryable from the dashboard:
withSearchable({
  siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
  apiKey: process.env.SEARCHABLE_API_KEY!,
  custom: (request) => ({
    tenant: request.headers.get("x-tenant-id") ?? "unknown",
    abVariant: request.headers.get("x-experiment-bucket") ?? "control",
  }),
});
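Because custom is a plain function of the request, you can define it separately and unit-test it. In this sketch `tenantProperties` and `fakeRequest` are hypothetical names. The argument is typed structurally (anything with `headers.get`), and the standard Fetch `Request` already satisfies that shape.

```typescript
// Minimal structural type: a Fetch API Request satisfies this shape.
type HasHeaders = { headers: { get(name: string): string | null } };

// Same logic as the `custom` option above, extracted so it can be tested.
function tenantProperties(request: HasHeaders): Record<string, string> {
  return {
    tenant: request.headers.get("x-tenant-id") ?? "unknown",
    abVariant: request.headers.get("x-experiment-bucket") ?? "control",
  };
}

// Stand-in request for tests. Keys must be lowercase, since lookups are
// lowercased the way real Headers treat names case-insensitively.
const fakeRequest = (headers: Record<string, string>): HasHeaders => ({
  headers: { get: (name) => headers[name.toLowerCase()] ?? null },
});
```

Pass it as `custom: tenantProperties` in the withSearchable config. TypeScript accepts this because a `Request` is assignable to the narrower parameter type.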

Debug mode

In development, set debug: true to log every captured event to your terminal:
withSearchable({
  siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
  apiKey: process.env.SEARCHABLE_API_KEY!,
  debug: process.env.NODE_ENV === "development",
});
You’ll see lines like:
[searchable/middleware] GET /pricing → 200 (45ms)
Turn it off in production. Otherwise every captured event is also written to stdout.

Express, Fastify, other Node frameworks

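The Next.js helper is a thin convenience wrapper around two exported primitives, buildEventPayload and sendEvent. The package also exports the CapturedRequest type and the anonymizeIp helper used in the examples below:
import {
  buildEventPayload,
  sendEvent,
  anonymizeIp,
  type CapturedRequest,
} from "@searchablehq/middleware";
Wire them into any framework that offers an after-response hook.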

Express

server.ts
import express from "express";
import {
  buildEventPayload,
  sendEvent,
  anonymizeIp,
  type CapturedRequest,
} from "@searchablehq/middleware";

const app = express();

app.use((req, res, next) => {
  const start = Date.now();

  res.on("finish", () => {
    const captured = {
      domain: req.headers.host ?? "",
      method: req.method,
      url: `${req.protocol}://${req.headers.host}${req.originalUrl}`,
      path: req.path,
      status_code: res.statusCode,
      response_time_ms: Date.now() - start,
      user_agent: req.headers["user-agent"] ?? "",
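      // Left-most X-Forwarded-For entry, else req.ip. `||` (not `??`) also
      // falls back when the header is present but empty. The header is
      // client-supplied unless your proxy overwrites it.
      ip_address: anonymizeIp(
        (req.headers["x-forwarded-for"] as string | undefined)?.split(",")[0]?.trim() ||
          req.ip ||
          "",
      ),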
      referrer: (req.headers.referer as string) ?? "",
      referrer_domain: "",
      headers: {},
      query_parameters: {},
      utm_source: "",
      utm_medium: "",
      utm_campaign: "",
      utm_term: "",
      utm_content: "",
    } satisfies CapturedRequest;

    const payload = buildEventPayload(captured, process.env.SEARCHABLE_SITE_TOKEN!);
    // Fire-and-forget — don't await
    void sendEvent(payload, {
      siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
      apiKey: process.env.SEARCHABLE_API_KEY!,
    });
  });

  next();
});
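Both framework examples on this page leave `referrer_domain`, `query_parameters`, and the `utm_*` fields empty. If you want them populated as described under What gets captured, you could use a sketch built on the WHATWG `URL` class. `urlFields` is a hypothetical helper. The exact semantics the SDK applies (for example, whether `referrer_domain` keeps a `www.` prefix) are assumptions here.

```typescript
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"] as const;
type UtmKey = (typeof UTM_KEYS)[number];

type UrlFields = Record<UtmKey, string> & {
  referrer_domain: string;
  query_parameters: Record<string, string>;
};

// Hypothetical helper: split the request URL's query string into UTM fields
// and other parameters, and take the hostname of the Referer header.
function urlFields(requestUrl: string, referrer: string): UrlFields {
  const fields: UrlFields = {
    referrer_domain: "",
    query_parameters: {},
    utm_source: "", utm_medium: "", utm_campaign: "", utm_term: "", utm_content: "",
  };
  // Never throw from an after-response hook: bad input leaves fields empty.
  try {
    new URL(requestUrl).searchParams.forEach((value, key) => {
      if ((UTM_KEYS as readonly string[]).includes(key)) {
        fields[key as UtmKey] = value;
      } else {
        fields.query_parameters[key] = value; // repeated keys: last value wins
      }
    });
  } catch {
    // Unparseable request URL.
  }
  try {
    if (referrer) fields.referrer_domain = new URL(referrer).hostname;
  } catch {
    // Malformed Referer header.
  }
  return fields;
}
```

In the Express hook you could spread `...urlFields(url, referrer)` into `captured` in place of the empty placeholders. This assumes CapturedRequest types these fields as plain strings and a string record, as the examples suggest.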

Fastify

server.ts
import Fastify from "fastify";
import {
  buildEventPayload,
  sendEvent,
  anonymizeIp,
} from "@searchablehq/middleware";

const app = Fastify();

app.addHook("onResponse", async (request, reply) => {
  const captured = {
    domain: request.headers.host ?? "",
    method: request.method,
    url: `${request.protocol}://${request.headers.host}${request.url}`,
    path: request.url.split("?")[0],
    status_code: reply.statusCode,
    response_time_ms: reply.elapsedTime,
    user_agent: request.headers["user-agent"] ?? "",
    ip_address: anonymizeIp(request.ip),
    referrer: (request.headers.referer as string) ?? "",
    referrer_domain: "",
    headers: {},
    query_parameters: {},
    utm_source: "",
    utm_medium: "",
    utm_campaign: "",
    utm_term: "",
    utm_content: "",
  };

  const payload = buildEventPayload(
    captured,
    process.env.SEARCHABLE_SITE_TOKEN!,
  );
  void sendEvent(payload, {
    siteToken: process.env.SEARCHABLE_SITE_TOKEN!,
    apiKey: process.env.SEARCHABLE_API_KEY!,
  });
});
Always call sendEvent after the response is finalised (res.on("finish"), onResponse, …). Doing it earlier slows down the response, and status_code / response_time_ms won’t be accurate.

What gets captured

For every non-static request the SDK records:
  • HTTP method, path, URL, status code, response time
  • User agent (used by Searchable to classify the AI bot)
  • Anonymised IP (zero last octet by default — toggle via anonymizeIp: false)
  • Referrer + parsed referrer domain
  • UTM parameters extracted from the URL
  • Geolocation (country, region, city) when your edge runtime exposes it (e.g. Next.js Edge Runtime)
  • Filtered request headers — only an allowlist of safe ones (accept-language, host, sec-ch-ua*, etc.)
  • Non-UTM query parameters
  • Anything you return from custom(request)
Cookies, request/response bodies, and full IP addresses are never sent. Sensitive headers (authorization, cookie, set-cookie, x-api-key, proxy-authorization, x-forwarded-for, x-real-ip) are stripped at the edge before the worker forwards events to ingest.
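The following is a sketch of the allowlist approach described above. It contains only the headers this page names (`accept-language`, `host`, and the `sec-ch-ua*` family). The SDK's full list is longer and isn't reproduced here, and `filterHeaders` is a hypothetical name. Because it is an allowlist, sensitive headers such as `cookie` or `authorization` are dropped without a separate denylist.

```typescript
// Headers explicitly named on this page; the SDK's real allowlist is longer.
const ALLOWED = new Set(["accept-language", "host"]);
const ALLOWED_PREFIXES = ["sec-ch-ua"];

// Keep only allowlisted headers, normalising names to lowercase.
// Anything not allowlisted (e.g. cookie, authorization, x-forwarded-for)
// is dropped. Multi-value headers are joined with ", ".
function filterHeaders(
  headers: Record<string, string | string[] | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [rawName, rawValue] of Object.entries(headers)) {
    if (rawValue === undefined) continue;
    const name = rawName.toLowerCase();
    const allowed =
      ALLOWED.has(name) || ALLOWED_PREFIXES.some((p) => name.startsWith(p));
    if (allowed) {
      out[name] = Array.isArray(rawValue) ? rawValue.join(", ") : rawValue;
    }
  }
  return out;
}
```

The parameter type accepts Node's `req.headers` object, so the same function works inside the Express or Fastify hooks.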

Verifying the connection

In Searchable:
  1. Go to LLM Analytics → Setup
  2. The Custom card’s status indicator should flip to Connected once the first event arrives
  3. Hit your site with curl using a known AI user agent to force one:
curl -H "User-Agent: GPTBot/1.0 (+https://openai.com/gptbot)" \
  https://yourdomain.com/
| Status | What it means |
| --- | --- |
| Waiting for first event | The middleware is deployed but no AI bot has been seen yet. Curl an AI UA to force one. |
| Connected | Events are arriving. The count from the last 24 hours is shown alongside. |
Add debug: true locally to confirm events are firing without leaving your dev environment.

Troubleshooting

If the status stays on Waiting for first event, the request most likely never reaches Searchable because the middleware isn’t running on the routes AI bots hit.
  • Check that config.matcher actually covers your live URLs. The recommended matcher skips only _next/static, _next/image, and favicon.ico; a custom matcher might exclude more than intended
  • Confirm SEARCHABLE_SITE_TOKEN and SEARCHABLE_API_KEY are set in the deployed environment (not just locally)
  • Set debug: true, redeploy, and check your logs — the SDK logs every event it sends
  • Curl your live domain with User-Agent: GPTBot/1.0 and re-check the status
Searchable filters non-bot user agents server-side. If your test request used a normal browser UA, it’s discarded silently. Use a known AI UA:
curl -H "User-Agent: GPTBot/1.0 (+https://openai.com/gptbot)" https://yourdomain.com/
Other supported test UAs: ClaudeBot/1.0, PerplexityBot/1.0, Google-Extended/1.0.
The API key is missing or wrong:
  • Confirm SEARCHABLE_API_KEY starts with sk_live_ and has no leading/trailing whitespace
  • If you’ve recently revoked the key in Searchable, generate a new one and update your env
  • The key must have the Log Events permission. Re-create it from the Custom connector dialog if unsure — that’s the default permission for keys generated there.
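You can script the format checks in the list above, along with the earlier reminder to confirm both variables are set in the deployed environment. The following is a sketch, assuming you run it as a deploy or CI step. `checkSearchableEnv` is a hypothetical helper, and it only checks format, so it cannot detect a revoked key or a missing Log Events permission.

```typescript
// Hypothetical pre-flight check (not part of @searchablehq/middleware).
// Returns human-readable problems; an empty array means the values look
// well-formed. It never contacts Searchable.
type Env = Record<string, string | undefined>;

const EXPECTED_PREFIX: Record<string, string> = {
  SEARCHABLE_SITE_TOKEN: "st_",
  SEARCHABLE_API_KEY: "sk_live_",
};

function checkSearchableEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const [name, prefix] of Object.entries(EXPECTED_PREFIX)) {
    const value = env[name];
    if (value === undefined || value === "") {
      problems.push(`${name} is not set`);
      continue;
    }
    if (value !== value.trim()) {
      problems.push(`${name} has leading or trailing whitespace`);
    }
    if (!value.trim().startsWith(prefix)) {
      problems.push(`${name} should start with ${prefix}`);
    }
  }
  return problems;
}
```

For example, call `checkSearchableEnv(process.env)` in a pre-deploy script and exit non-zero if the returned array is non-empty. Keeping the check out of the middleware itself avoids turning a configuration mistake into failed user requests.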
Set different values for SEARCHABLE_SITE_TOKEN in your staging and production environments. Site tokens are per-project, so sending staging traffic to a staging project keeps your production data clean.
The same workspace SEARCHABLE_API_KEY works across all projects in a workspace as long as it isn’t project-scoped. To scope a key to a single project, generate it from inside that project’s Custom connector dialog.
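If your CDN or a proxy strips the Authorization header (which carries the API key) from the SDK’s POST, override the collector URL via endpoint and point it at a path your CDN forwards untouched, or route the SDK’s POST through a server-side proxy that re-adds the header.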

Removing the integration

  1. Delete the withSearchable(...) call (or, if you wrapped existing middleware, export that inner function directly)
  2. Remove SEARCHABLE_SITE_TOKEN and SEARCHABLE_API_KEY from your env
  3. In Searchable → Settings → API Keys → revoke the API key
Revoking the key alone stops ingestion immediately, even if the SDK is still deployed: every subsequent POST fails with 403.

Next steps

REST API reference

Need the raw HTTP shape, or want to instrument a non-Node stack?

See the data

Open LLM Analytics to see which assistants are crawling your site.