# Building a Claude-Powered Blog Image Generation System
How I built an automated system that generates social images for blog posts using multiple AI providers, with Claude analyzing results and a PR comment interface for human-in-the-loop selection. A case study in agent-orchestrated workflows.
Every blog post needs a social image. It’s the visual that appears when someone shares your article on Twitter, LinkedIn, or Slack. A good social image can mean the difference between a click and a scroll-past. But creating them is tedious work—fire up an image editor, find the right visual metaphor, resize to the right dimensions, export, upload, update frontmatter. Repeat for every post.
I wanted to automate this. Not just the generation part, but the entire workflow from “new blog post detected” to “image selected and applied.” What emerged is a system that orchestrates multiple AI providers, uses Claude as an intelligent curator, and provides a simple PR comment interface for human-in-the-loop decision making.
## The Architecture
The system consists of two GitHub Actions workflows that work together:
### Workflow 1: Image Generation
- Triggers when blog MDX files change in a PR
- Extracts post metadata (title, description, tags)
- Generates images from multiple providers in parallel
- Uses Claude to analyze all generated images
- Creates a PR comment with image previews and selection commands
### Workflow 2: Image Selection
- Triggers when someone comments `/select-image` on a PR
- Parses the command to identify which image was chosen
- Moves the selected image to the final location
- Updates the blog post’s frontmatter with the image path
- Cleans up unused staging images
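The frontmatter update in the steps above reduces to a small string transformation: find the frontmatter block and set (or insert) its `image:` field. A sketch with an assumed function name, not the repo's actual code:

```typescript
// Rewrite (or add) the `image:` field in an MDX file's frontmatter.
// Assumes standard ----delimited frontmatter at the top of the file.
function applyImageToFrontmatter(mdx: string, imagePath: string): string {
  const fm = mdx.match(/^---\n[\s\S]*?\n---/);
  if (!fm) return mdx; // no frontmatter; leave the file untouched

  const block = fm[0];
  const updated = /^image:.*$/m.test(block)
    ? block.replace(/^image:.*$/m, `image: ${imagePath}`)   // replace existing field
    : block.replace(/\n---$/, `\nimage: ${imagePath}\n---`); // insert before closing ---
  return mdx.replace(block, updated);
}
```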
This two-workflow approach keeps concerns separated: generation is expensive and slow, selection is cheap and fast. You don’t want to regenerate images every time someone makes a typo fix in a post.
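The separation falls out of the triggers themselves. A minimal sketch of the two, assuming the posts live under a `src/content/blog` directory (the repo's actual paths may differ):

```yaml
# generate-blog-images.yml: run only when blog posts change in a PR
on:
  pull_request:
    paths:
      - 'src/content/blog/**/*.mdx'   # assumed content location

# select-blog-image.yml would instead listen for PR comments,
# then filter for ones starting with /select-image inside the job:
# on:
#   issue_comment:
#     types: [created]
```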
## Multi-Provider Strategy
Rather than betting on a single AI image provider, the system queries multiple providers and lets Claude compare results. Currently it uses:
- OpenAI’s gpt-image-1.5 - Tends toward photorealistic 3D renders
- Google Gemini - Produces more stylized, editorial illustrations
Each provider generates two variations per post, giving four candidate images to choose from. The prompts are crafted from the post’s metadata:
```typescript
const prompt = `Create a social media image for a blog post.
Title: ${post.title}
Description: ${post.description}
Tags: ${post.tags.join(', ')}
Style: Abstract, conceptual visualization. No text overlay.
Format: Landscape orientation for social sharing.`;
```

The “no text overlay” instruction is intentional—AI-generated text in images is notoriously bad, and the image will be displayed alongside the post title anyway.
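Pulling `post.title`, `post.description`, and `post.tags` out of an MDX file is mostly frontmatter parsing. A self-contained sketch that handles simple `key: value` and `tags: [a, b]` frontmatter; a real implementation would likely use a library like gray-matter:

```typescript
interface PostMeta {
  title: string;
  description: string;
  tags: string[];
}

// Parse the ----delimited frontmatter block at the top of an MDX file.
// Only handles the flat key/value shapes used in the prompt above.
function extractMeta(mdx: string): PostMeta {
  const meta: PostMeta = { title: '', description: '', tags: [] };
  const match = mdx.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return meta;

  for (const line of match[1].split('\n')) {
    const [key, ...rest] = line.split(':');
    const value = rest.join(':').trim().replace(/^["']|["']$/g, '');
    if (key === 'title') meta.title = value;
    if (key === 'description') meta.description = value;
    if (key === 'tags') {
      meta.tags = value
        .replace(/^\[|\]$/g, '')
        .split(',')
        .map(t => t.trim())
        .filter(Boolean);
    }
  }
  return meta;
}
```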
## Claude as Curator
Here’s where it gets interesting. After all images are generated, the system sends them to Claude’s vision API with this prompt:
```
Analyze these images for a blog post titled "${title}".
Consider: visual appeal, concept relevance, social sharing impact.
Recommend which image best represents the post's themes.
```

Claude examines each image and provides:
- A description of what it sees
- How well it matches the post’s themes
- A recommendation with reasoning
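In the Anthropic Messages API, such a request is a single user message with the images as base64 content blocks followed by the text prompt. A sketch of assembling that payload; function and variable names are illustrative, not the repo's actual code:

```typescript
type ImageBlock = {
  type: 'image';
  source: { type: 'base64'; media_type: string; data: string };
};
type TextBlock = { type: 'text'; text: string };

// Build the user message for Claude's vision analysis:
// all candidate images first, then the analysis prompt.
function buildAnalysisMessage(
  images: { mediaType: string; base64: string }[],
  title: string
): { role: 'user'; content: (ImageBlock | TextBlock)[] } {
  const content: (ImageBlock | TextBlock)[] = images.map(img => ({
    type: 'image' as const,
    source: { type: 'base64' as const, media_type: img.mediaType, data: img.base64 },
  }));
  content.push({
    type: 'text',
    text:
      `Analyze these images for a blog post titled "${title}".\n` +
      `Consider: visual appeal, concept relevance, social sharing impact.\n` +
      `Recommend which image best represents the post's themes.`,
  });
  return { role: 'user', content };
}
```

The resulting message would then be passed to the SDK's `messages.create` call along with a model name and token limit.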
This analysis appears in the PR comment, helping humans make informed decisions even if they disagree with Claude’s recommendation.
## The PR Comment Interface
The generated PR comment looks something like this:
```markdown
## Blog Images Generated

### post-slug-name

#### OpenAI Image 1
(image preview)

`/select-image post-slug openai-1.png`

#### Gemini Image 2 (Recommended)
(image preview)

`/select-image post-slug gemini-2.jpg`

### Claude's Analysis
The Gemini image better captures the abstract nature of...
```

To select an image, you simply comment `/select-image post-slug filename`. The selection workflow picks it up, applies the image, and commits the change. No manual file moving, no frontmatter editing.
This command-based interface has several advantages:
- Audit trail - Every selection is recorded in PR history
- Reversible - You can select a different image by commenting again
- Collaborative - Team members can weigh in before selection
- Bot-friendly - Automated systems could make selections based on criteria
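Parsing the command itself is a one-line regex match. A hypothetical version (the actual workflow's parser may differ):

```typescript
// Parse a `/select-image <slug> <filename>` PR comment.
// Returns null for comments that aren't selection commands.
function parseSelectImage(
  comment: string
): { slug: string; filename: string } | null {
  const match = comment.trim().match(/^\/select-image\s+(\S+)\s+(\S+)$/);
  return match ? { slug: match[1], filename: match[2] } : null;
}
```

Because non-matching comments return `null`, the workflow can safely run on every PR comment and exit early for ordinary discussion.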
## What I Learned
After running this system on a batch of posts, some patterns emerged:
Gemini produces more distinctive images. For blog social cards, the stylized editorial illustration style stands out more in social feeds than photorealistic renders. The organic, whimsical aesthetic catches the eye in a sea of corporate stock imagery.
File sizes vary dramatically. Gemini’s JPEGs average around 600-800KB while OpenAI’s PNGs can hit 2-3MB. For social images that get compressed anyway, smaller is better.
Claude’s recommendations are solid but not infallible. About 80% of the time I agreed with Claude’s pick. The other 20%, I had context Claude didn’t—like knowing my audience prefers a certain aesthetic, or that a particular visual metaphor wouldn’t land.
The PR interface changes the workflow. Having image selection happen in the PR rather than a separate tool keeps everything in context. You’re reviewing code, reviewing images, and making decisions in one place.
## Concurrency Control Matters
One gotcha: GitHub Actions can get into trouble if multiple selection commands fire simultaneously. The system uses concurrency groups to ensure only one selection runs at a time per PR:
```yaml
concurrency:
  group: blog-image-selection-${{ github.event.issue.number }}
  cancel-in-progress: false
```

The `cancel-in-progress: false` is crucial—you don’t want a second selection to cancel the first before it completes.
## The Broader Pattern
This image generation system is an instance of a broader pattern I’m finding useful: agent-orchestrated workflows with human-in-the-loop checkpoints.
The workflow:
- Trigger - Something happens (PR opened, file changed)
- Generate - AI agents do expensive/creative work
- Present - Results shown to humans with context
- Decide - Human makes final call via simple interface
- Apply - System executes the decision
The human stays in control of consequential decisions while offloading the tedious generation work to AI. The PR comment interface serves as both presentation layer and decision capture mechanism.
I’m seeing this pattern apply beyond images—code review suggestions, documentation updates, dependency upgrades. Any workflow where AI can generate candidates and humans should pick winners.
## Running It Yourself
The system is open source as part of my blog’s repository. The key files:
- `.github/workflows/generate-blog-images.yml` - Generation workflow
- `.github/workflows/select-blog-image.yml` - Selection workflow
- `.github/scripts/generate-blog-images.ts` - Core generation logic
- `.github/scripts/apply-image-selection.ts` - Selection processor
You’ll need API keys for OpenAI, Google AI (Gemini), and Anthropic (Claude). The workflows expect these as repository secrets.
The prompts and provider configuration are easily customizable. Swap in different models, adjust the style instructions, change the number of variations—the architecture supports experimentation.
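As an illustration of the knobs involved, a hypothetical configuration shape; this is not the repository's actual schema:

```typescript
// Illustrative provider configuration: one entry per image provider,
// with the per-post variation count and a style suffix for the prompt.
interface ProviderConfig {
  name: string;
  variations: number;   // images generated per post
  stylePrompt: string;  // appended to the metadata-derived prompt
}

const providers: ProviderConfig[] = [
  {
    name: 'openai',
    variations: 2,
    stylePrompt: 'Abstract, conceptual visualization. No text overlay.',
  },
  {
    name: 'gemini',
    variations: 2,
    stylePrompt: 'Stylized editorial illustration. No text overlay.',
  },
];

// Total candidates per post: sum of variations across providers.
const totalCandidates = providers.reduce((n, p) => n + p.variations, 0);
```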
## What’s Next
A few improvements I’m considering:
Style consistency - Right now each post gets independent images. A system that maintains visual consistency across the blog (color palette, illustration style) would create a more cohesive brand.
Automatic selection - For posts where Claude’s confidence is high and the recommendation clearly outperforms alternatives, skip the human checkpoint entirely.
Cost tracking - Image generation isn’t free. Adding cost estimates to the PR comment would help with budgeting and identifying when to regenerate vs. accept.
For now, the system handles the tedious work while keeping humans in control of the creative decisions. That feels like the right balance for something as visible as social images.
The social image for this very post was generated by this system. Meta, I know.