OG Image Generation: I Thought It Was Done, Until x.com Failed to Fetch It

2025-12-28


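The OG images I built earlier generated fine, but x.com could not fetch them. After another round of debugging I wrote down the complete workflow here for reference.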

Preparation

Install the required dependencies:

pnpm add satori sharp @fontsource/inter

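Notes:

satori: converts JSX to SVG
sharp: image processing (SVG to PNG)
@fontsource/inter: Latin font (optional)

The font files have to be placed manually:

Download the Noto Sans SC TTF files from Google Fonts
Put them under public/fonts/ as NotoSansSC-Regular.ttf and NotoSansSC-Bold.ttf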

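Font optimization: npm package for the web, TTF for OG images

Committing large WOFF2 font files to the project is not recommended, because it bloats the repository. A better setup:

Web fonts: use the WOFF2 files shipped by the @fontsource/noto-sans-sc npm package. Its CSS splits the font into unicode-range subsets, so the browser downloads only the subsets a page actually uses and the transfer size stays small.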

pnpm add @fontsource/noto-sans-sc

Import it in the Astro layout:

---
// src/layouts/Base.astro
import "@fontsource/noto-sans-sc/400.css";
import "@fontsource/noto-sans-sc/700.css";
---

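OG image generation: keep the TTF files in public/fonts/ and read them at build time to render the OG images. No page references them, so visitors never download them. Note, however, that Astro copies everything under public/ into the build output; if the files should not be deployed at all, keep them in a directory outside public/ and adjust the path used when loading them.

Benefits of this setup:

The site no longer ships large self-hosted WOFF2 files
Web fonts are loaded from the npm package in small subsets
OG image generation keeps working
The repository shrinks by about 13.6 MB (the two WOFF2 files)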

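Step 1: Generate the OG image

Use satori to turn JSX into SVG, then use sharp to output a PNG (1200×630).

The Chinese font has to be loaded from a TTF file. Satori's documentation lists TTF, OTF, and WOFF as supported formats and states that WOFF2 is not supported, so the WOFF2 files used for the web cannot be reused here. Put NotoSansSC-Regular.ttf and NotoSansSC-Bold.ttf in public/fonts/.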
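Because the font format matters at this step, a small guard can catch a wrong file before rendering starts. The following is only a sketch: `detectFontFormat` is a hypothetical helper (not part of satori) that inspects the first four bytes of a font file, which identify the container format.

```typescript
// Hypothetical helper, not part of satori: identify a font container by its
// 4-byte signature so a WOFF2 file placed by mistake is caught early.
export type FontFormat = 'ttf' | 'otf' | 'woff' | 'woff2' | 'unknown';

export function detectFontFormat(data: Uint8Array): FontFormat {
  if (data.length < 4) return 'unknown';
  // Turn the first four bytes into a string tag for easy comparison.
  const tag = Array.from(data.subarray(0, 4), (byte) => String.fromCharCode(byte)).join('');
  if (tag === 'wOF2') return 'woff2';
  if (tag === 'wOFF') return 'woff';
  if (tag === 'OTTO') return 'otf';
  // TrueType files start with version 0x00010000 (or the legacy tag 'true').
  if (tag === '\u0000\u0001\u0000\u0000' || tag === 'true') return 'ttf';
  return 'unknown';
}
```

A Node `Buffer` is a `Uint8Array`, so the result of `readFile` can be passed in directly, for example `detectFontFormat(await readFile(fontPath)) === 'ttf'`.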

src/utils/og-image.ts:

import satori from 'satori';
import sharp from 'sharp';
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';

// Cache the font buffers so they are read from disk only once per build.
let cachedFonts: { regular: Buffer; bold: Buffer } | null = null;

async function loadFonts(): Promise<{ regular: Buffer; bold: Buffer }> {
  if (!cachedFonts) {
    const fontsPath = join(process.cwd(), 'public/fonts');
    // Let a missing file throw: satori cannot lay out text without a font,
    // so continuing after a failed read would only fail later.
    cachedFonts = {
      regular: await readFile(join(fontsPath, 'NotoSansSC-Regular.ttf')),
      bold: await readFile(join(fontsPath, 'NotoSansSC-Bold.ttf')),
    };
    console.log('Loaded Noto Sans SC fonts (TTF for OG image generation)');
  }
  return cachedFonts;
}

export async function generateOGImage(options: {
  title: string;
  description?: string;
}): Promise<Buffer> {
  const fonts = await loadFonts();

  // satori accepts React-element-like objects ({ type, props }), so neither a
  // JSX runtime nor an `h` helper is needed. Only the title is rendered here.
  const element = {
    type: 'div',
    props: {
      style: { fontSize: '72px', color: '#fff' },
      children: options.title,
    },
  };

  const svg = await satori(element as unknown as Parameters<typeof satori>[0], {
    width: 1200,
    height: 630,
    // Always pass the fonts: satori needs at least one font to lay out text,
    // and Noto Sans SC also covers Latin characters.
    fonts: [
      { name: 'Noto Sans SC', data: fonts.regular, weight: 400, style: 'normal' },
      { name: 'Noto Sans SC', data: fonts.bold, weight: 700, style: 'normal' },
    ],
  });

  return sharp(Buffer.from(svg)).png().toBuffer();
}
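The options type above accepts a description, but the snippet renders only the title. Below is a minimal sketch of an element tree that shows both, written as the plain `{ type, props }` objects satori accepts in place of JSX. `buildCardElement`, the `CardElement` type, and all sizes and colors are assumptions for illustration, not the layout actually used on the site.

```typescript
// Minimal element shape accepted by satori in place of JSX.
// The name CardElement is an assumption made for this sketch.
interface CardElement {
  type: string;
  props: {
    style?: Record<string, string | number>;
    children?: string | CardElement | CardElement[];
  };
}

export function buildCardElement(title: string, description?: string): CardElement {
  const children: CardElement[] = [
    { type: 'div', props: { style: { fontSize: 72, fontWeight: 700 }, children: title } },
  ];
  if (description) {
    children.push({
      type: 'div',
      props: { style: { fontSize: 32, marginTop: 24 }, children: description },
    });
  }
  return {
    type: 'div',
    props: {
      style: {
        // satori requires an explicit display value on a div with several children.
        display: 'flex',
        flexDirection: 'column',
        justifyContent: 'center',
        width: '100%',
        height: '100%',
        padding: 64,
        backgroundColor: '#111111', // assumed colors
        color: '#ffffff',
      },
      children,
    },
  };
}
```

The returned object can be passed to `satori` as its first argument, in the same position as the element in the snippet above.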

Step 2: Expose the image through a route

Create a dynamic route that returns the image buffer. The key is setting the correct headers.

src/pages/og/[...slug].png.ts:

import type { APIRoute } from 'astro';
import { generateOGImage } from '../../utils/og-image';

export const GET: APIRoute = async ({ params }) => {
  const buffer = await generateOGImage({
    title: 'Your Page Title',
    description: 'Your Page Description'
  });
  
  return new Response(buffer, {
    headers: {
      'Content-Type': 'image/png',
      'Cache-Control': 'public, max-age=31536000, immutable'
    }
  });
};

Key points: return a 200 status code with the correct Content-Type, and do not redirect.
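The route above hard-codes the title as a placeholder. In a statically built Astro site, a dynamic route also exports `getStaticPaths`, which returns one `{ params, props }` entry per page to generate. The sketch below shows only that mapping as a pure function; `PostSummary` and `buildOgStaticPaths` are hypothetical names, and where the post list comes from (a content collection, a glob import, and so on) is left open.

```typescript
// Hypothetical shape of whatever the site uses to describe a page.
interface PostSummary {
  slug: string;
  title: string;
  description?: string;
}

// Map each post to the { params, props } entry Astro expects from
// getStaticPaths: params fill the [...slug] part of the route, and
// props carry the text to draw on the image.
export function buildOgStaticPaths(posts: PostSummary[]) {
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: { title: post.title, description: post.description },
  }));
}
```

In the route file this could be wired up as `export const getStaticPaths = () => buildOgStaticPaths(posts);`, with the `GET` handler reading `props.title` and `props.description` from its context instead of the placeholder strings.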

Step 3: Reference the image by absolute URL in the page

Build the absolute URL (https://yourdomain.com/og/...png) and inject it into the page head.

src/utils/og-url.ts:

export function getOGImageUrl(slug?: string): string {
  const baseUrl = import.meta.env.SITE || 'https://yourdomain.com';
  return `${baseUrl}/og/${slug || 'index'}.png`;
}
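One detail to watch with plain string concatenation: if the configured site URL ends with a slash, or the slug starts with one, the result contains a double slash. A slightly more defensive variant is sketched below. `buildOGImageUrl` is a hypothetical name, and it takes the site URL as a parameter so it can be tested outside Astro.

```typescript
// Hypothetical variant of getOGImageUrl that normalizes slashes and
// refuses to return anything that is not an absolute URL.
export function buildOGImageUrl(site: string, slug?: string): string {
  const base = site.replace(/\/+$/, '');                 // drop trailing slashes
  const path = (slug ?? '').replace(/^\/+|\/+$/g, '') || 'index'; // trim slug slashes
  // new URL() throws a TypeError when the string is not an absolute URL,
  // which surfaces a missing or relative site value at build time.
  return new URL(`${base}/og/${path}.png`).href;
}
```

Inside Astro it could be called as `buildOGImageUrl(import.meta.env.SITE || 'https://yourdomain.com', slug)`.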

The head section of src/layouts/App.astro:

---
import { getOGImageUrl } from '$utils/og-url';

const ogImageUrl = getOGImageUrl(/* slug */);
---

<html>
  <head>
    <meta property="og:image" content={ogImageUrl} />
    <meta property="og:image:width" content="1200" />
    <meta property="og:image:height" content="630" />
    <meta name="twitter:card" content="summary_large_image" />
    <meta name="twitter:image" content={ogImageUrl} />
  </head>
  <!-- ... -->
</html>

Key point: og:image must be a complete absolute URL, not a relative path.

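Step 4: Allow crawlers

This is the key step. robots.txt must not block the social crawlers; listing them explicitly with Allow leaves no ambiguity.

src/pages/robots.txt.ts: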

import type { APIRoute } from 'astro';

export const GET: APIRoute = ({ site }) => {
  const text = `
# Search engines
User-agent: Googlebot
User-agent: Bingbot
Allow: /

# Social platforms
User-agent: Twitterbot
User-agent: facebookexternalhit
User-agent: LinkedInBot
User-agent: Pinterestbot
Allow: /

# China platforms
User-agent: WeChatBot
User-agent: Weibo
User-agent: DouyinBot
User-agent: XiaoHongShuBot
Allow: /

# Default: allow all, but block CDN internals
User-agent: *
Allow: /
Disallow: /cdn-cgi

Sitemap: ${new URL('sitemap-index.xml', site)}
  `;
  return new Response(text);
};

Step 5: Verify that crawlers can reach the image

Simulate crawler requests to verify.

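# 1. Check whether robots.txt allows the crawler
#    (-s fetches the body quietly; -I would return headers only, leaving grep nothing to match)
curl -s https://yourdomain.com/robots.txt | grep -A 20 Twitterbot

# 2. Request the image with the crawler's User-Agent
curl -I -A "Twitterbot/1.0" https://yourdomain.com/og/index.png

# Expected:
# HTTP/1.1 200 OK   (or HTTP/2 200)
# Content-Type: image/png
# No redirect (no 301/302 status, no Location header)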
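The same three expectations (status 200, image/png, no redirect) can also be checked from a script, for example in CI after a deploy. The following is a sketch, assuming Node 18+ where `fetch` is available globally; `findOgImageProblems` and `checkOgImage` are hypothetical names, and the User-Agent string simply mirrors the one used with curl above.

```typescript
// Summary of the response fields that matter for an OG image.
interface FetchSummary {
  status: number;
  contentType: string | null;
  redirected: boolean;
}

// Pure check: returns a list of human-readable problems, empty when all is well.
export function findOgImageProblems(res: FetchSummary): string[] {
  const problems: string[] = [];
  if (res.redirected) problems.push('response was redirected');
  if (res.status !== 200) problems.push(`expected status 200, got ${res.status}`);
  if (!res.contentType?.toLowerCase().startsWith('image/png')) {
    problems.push(`expected Content-Type image/png, got ${res.contentType ?? 'none'}`);
  }
  return problems;
}

// Fetch the image the way a crawler would and run the checks on the response.
export async function checkOgImage(url: string): Promise<string[]> {
  const res = await fetch(url, { headers: { 'User-Agent': 'Twitterbot/1.0' } });
  return findOgImageProblems({
    status: res.status,
    contentType: res.headers.get('content-type'),
    redirected: res.redirected,
  });
}
```

Calling `checkOgImage('https://yourdomain.com/og/index.png')` resolves to an empty array when the image is served as expected.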

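Step 6: Verify with the social platforms' tools

Run the URL through each platform's card validation tool (for example, X's Card Validator).

Clear the cache, trigger a re-fetch, and check whether the image displays. If it still fails, check:

Whether the image URL can be accessed directly (browser, curl)
Whether og:image is an absolute URL
Whether robots.txt contains an Allow for the relevant crawler
If you use Cloudflare, whether any rule blocks or restricts crawlers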