# Building a Claude-Powered Blog Image Generation System
How I built an automated system that generates social images for blog posts using multiple AI providers, with Claude analyzing results and a PR comment interface for human-in-the-loop selection. A case study in agent-orchestrated workflows.
Every blog post needs a social image. It’s the visual that appears when someone shares your article on Twitter, LinkedIn, or Slack. A good social image can mean the difference between a click and a scroll-past. But creating them is tedious work—fire up an image editor, find the right visual metaphor, resize to the right dimensions, export, upload, update frontmatter. Repeat for every post.
I wanted to automate this. Not just the generation part, but the entire workflow from “new blog post detected” to “image selected and applied.” What emerged is a system that orchestrates multiple AI providers, uses Claude as an intelligent curator, and provides a simple PR comment interface for human-in-the-loop decision making.
## The Architecture
The system consists of two GitHub Actions workflows that work together:
### Workflow 1: Image Generation
- Triggers when blog MDX files change in a PR
- Extracts post metadata (title, description, tags)
- Generates images from multiple providers in parallel
- Uses Claude to analyze all generated images
- Creates a PR comment with image previews and selection commands
### Workflow 2: Image Selection
- Triggers when someone comments `/select-image` on a PR
- Parses the command to identify which image was chosen
- Moves the selected image to the final location
- Updates the blog post’s frontmatter with the image path
- Cleans up unused staging images
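The frontmatter update in the steps above reduces to a small string transformation: find the frontmatter block and set (or insert) its `image:` field. A sketch with an assumed function name, not the repo's actual code:

```typescript
// Rewrite (or add) the `image:` field in an MDX file's frontmatter.
// Assumes standard ----delimited frontmatter at the top of the file.
function applyImageToFrontmatter(mdx: string, imagePath: string): string {
  const fm = mdx.match(/^---\n[\s\S]*?\n---/);
  if (!fm) return mdx; // no frontmatter; leave the file untouched

  const block = fm[0];
  const updated = /^image:.*$/m.test(block)
    ? block.replace(/^image:.*$/m, `image: ${imagePath}`)   // replace existing field
    : block.replace(/\n---$/, `\nimage: ${imagePath}\n---`); // insert before closing ---
  return mdx.replace(block, updated);
}
```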
This two-workflow approach keeps concerns separated: generation is expensive and slow, selection is cheap and fast. You don’t want to regenerate images every time someone makes a typo fix in a post.
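The separation falls out of the triggers themselves. A minimal sketch of the two, assuming the posts live under a `src/content/blog` directory (the repo's actual paths may differ):

```yaml
# generate-blog-images.yml: run only when blog posts change in a PR
on:
  pull_request:
    paths:
      - 'src/content/blog/**/*.mdx'   # assumed content location

# select-blog-image.yml would instead listen for PR comments,
# then filter for ones starting with /select-image inside the job:
# on:
#   issue_comment:
#     types: [created]
```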
## Multi-Provider Strategy
Rather than betting on a single AI image provider, the system queries multiple providers and lets Claude compare results. Currently it uses:
- OpenAI’s gpt-image-1.5 - Tends toward photorealistic 3D renders
- Google Gemini - Produces more stylized, editorial illustrations
Each provider generates two variations per post, giving four candidate images to choose from. The prompts are crafted from the post’s metadata:
```typescript
const prompt = `Create a social media image for a blog post.
Title: ${post.title}
Description: ${post.description}
Tags: ${post.tags.join(', ')}
Style: Abstract, conceptual visualization. No text overlay.
Format: Landscape orientation for social sharing.`;
```

The “no text overlay” instruction is intentional—AI-generated text in images is notoriously bad, and the image will be displayed alongside the post title anyway.
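Pulling `post.title`, `post.description`, and `post.tags` out of an MDX file is mostly frontmatter parsing. A self-contained sketch that handles simple `key: value` and `tags: [a, b]` frontmatter; a real implementation would likely use a library like gray-matter:

```typescript
interface PostMeta {
  title: string;
  description: string;
  tags: string[];
}

// Parse the ----delimited frontmatter block at the top of an MDX file.
// Only handles the flat key/value shapes used in the prompt above.
function extractMeta(mdx: string): PostMeta {
  const meta: PostMeta = { title: '', description: '', tags: [] };
  const match = mdx.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return meta;

  for (const line of match[1].split('\n')) {
    const [key, ...rest] = line.split(':');
    const value = rest.join(':').trim().replace(/^["']|["']$/g, '');
    if (key === 'title') meta.title = value;
    if (key === 'description') meta.description = value;
    if (key === 'tags') {
      meta.tags = value
        .replace(/^\[|\]$/g, '')
        .split(',')
        .map(t => t.trim())
        .filter(Boolean);
    }
  }
  return meta;
}
```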
## Claude as Curator
Here’s where it gets interesting. After all images are generated, the system sends them to Claude’s vision API with this prompt:
```
Analyze these images for a blog post titled "${title}".
Consider: visual appeal, concept relevance, social sharing impact.
Recommend which image best represents the post's themes.
```

Claude examines each image and provides:
- A description of what it sees
- How well it matches the post’s themes
- A recommendation with reasoning
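In the Anthropic Messages API, such a request is a single user message with the images as base64 content blocks followed by the text prompt. A sketch of assembling that payload; function and variable names are illustrative, not the repo's actual code:

```typescript
type ImageBlock = {
  type: 'image';
  source: { type: 'base64'; media_type: string; data: string };
};
type TextBlock = { type: 'text'; text: string };

// Build the user message for Claude's vision analysis:
// all candidate images first, then the analysis prompt.
function buildAnalysisMessage(
  images: { mediaType: string; base64: string }[],
  title: string
): { role: 'user'; content: (ImageBlock | TextBlock)[] } {
  const content: (ImageBlock | TextBlock)[] = images.map(img => ({
    type: 'image' as const,
    source: { type: 'base64' as const, media_type: img.mediaType, data: img.base64 },
  }));
  content.push({
    type: 'text',
    text:
      `Analyze these images for a blog post titled "${title}".\n` +
      `Consider: visual appeal, concept relevance, social sharing impact.\n` +
      `Recommend which image best represents the post's themes.`,
  });
  return { role: 'user', content };
}
```

The resulting message would then be passed to the SDK's `messages.create` call along with a model name and token limit.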
This analysis appears in the PR comment, helping humans make informed decisions even if they disagree with Claude’s recommendation.
## The PR Comment Interface
The generated PR comment looks something like this:
```markdown
## Blog Images Generated

### post-slug-name

#### OpenAI Image 1
(image preview)

`/select-image post-slug openai-1.png`

#### Gemini Image 2 (Recommended)
(image preview)

`/select-image post-slug gemini-2.jpg`

### Claude's Analysis
The Gemini image better captures the abstract nature of...
```

To select an image, you simply comment `/select-image post-slug filename`. The selection workflow picks it up, applies the image, and commits the change. No manual file moving, no frontmatter editing.
This command-based interface has several advantages:
- Audit trail - Every selection is recorded in PR history
- Reversible - You can select a different image by commenting again
- Collaborative - Team members can weigh in before selection
- Bot-friendly - Automated systems could make selections based on criteria
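Parsing the command itself is a one-line regex match. A hypothetical version (the actual workflow's parser may differ):

```typescript
// Parse a `/select-image <slug> <filename>` PR comment.
// Returns null for comments that aren't selection commands.
function parseSelectImage(
  comment: string
): { slug: string; filename: string } | null {
  const match = comment.trim().match(/^\/select-image\s+(\S+)\s+(\S+)$/);
  return match ? { slug: match[1], filename: match[2] } : null;
}
```

Because non-matching comments return `null`, the workflow can safely run on every PR comment and exit early for ordinary discussion.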
## What I Learned
After running this system on a batch of posts, some patterns emerged:
Gemini produces more distinctive images. For blog social cards, the stylized editorial illustration style stands out more in social feeds than photorealistic renders. The organic, whimsical aesthetic catches the eye in a sea of corporate stock imagery.
File sizes vary dramatically. Gemini’s JPEGs average around 600-800KB while OpenAI’s PNGs can hit 2-3MB. For social images that get compressed anyway, smaller is better.
Claude’s recommendations are solid but not infallible. About 80% of the time I agreed with Claude’s pick. The other 20%, I had context Claude didn’t—like knowing my audience prefers a certain aesthetic, or that a particular visual metaphor wouldn’t land.
The PR interface changes the workflow. Having image selection happen in the PR rather than a separate tool keeps everything in context. You’re reviewing code, reviewing images, and making decisions in one place.
## Concurrency Control Matters
One gotcha: GitHub Actions can get into trouble if multiple selection commands fire simultaneously. The system uses concurrency groups to ensure only one selection runs at a time per PR:
```yaml
concurrency:
  group: blog-image-selection-${{ github.event.issue.number }}
  cancel-in-progress: false
```

The `cancel-in-progress: false` is crucial—you don’t want a second selection to cancel the first before it completes.
## The Broader Pattern
This image generation system is an instance of a broader pattern I’m finding useful: agent-orchestrated workflows with human-in-the-loop checkpoints.
The workflow:
- Trigger - Something happens (PR opened, file changed)
- Generate - AI agents do expensive/creative work
- Present - Results shown to humans with context
- Decide - Human makes final call via simple interface
- Apply - System executes the decision
The human stays in control of consequential decisions while offloading the tedious generation work to AI. The PR comment interface serves as both presentation layer and decision capture mechanism.
I’m seeing this pattern apply beyond images—code review suggestions, documentation updates, dependency upgrades. Any workflow where AI can generate candidates and humans should pick winners.
## Running It Yourself
The system is open source as part of my blog’s repository. The key files:
- `.github/workflows/generate-blog-images.yml` - Generation workflow
- `.github/workflows/select-blog-image.yml` - Selection workflow
- `.github/scripts/generate-blog-images.ts` - Core generation logic
- `.github/scripts/apply-image-selection.ts` - Selection processor
You’ll need API keys for OpenAI, Google AI (Gemini), and Anthropic (Claude). The workflows expect these as repository secrets.
The prompts and provider configuration are easily customizable. Swap in different models, adjust the style instructions, change the number of variations—the architecture supports experimentation.
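As an illustration of the knobs involved, a hypothetical configuration shape; this is not the repository's actual schema:

```typescript
// Illustrative provider configuration: one entry per image provider,
// with the per-post variation count and a style suffix for the prompt.
interface ProviderConfig {
  name: string;
  variations: number;   // images generated per post
  stylePrompt: string;  // appended to the metadata-derived prompt
}

const providers: ProviderConfig[] = [
  {
    name: 'openai',
    variations: 2,
    stylePrompt: 'Abstract, conceptual visualization. No text overlay.',
  },
  {
    name: 'gemini',
    variations: 2,
    stylePrompt: 'Stylized editorial illustration. No text overlay.',
  },
];

// Total candidates per post: sum of variations across providers.
const totalCandidates = providers.reduce((n, p) => n + p.variations, 0);
```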
## What’s Next
A few improvements I’m considering:
Style consistency - Right now each post gets independent images. A system that maintains visual consistency across the blog (color palette, illustration style) would create a more cohesive brand.
Automatic selection - For posts where Claude’s confidence is high and the recommendation clearly outperforms alternatives, skip the human checkpoint entirely.
Cost tracking - Image generation isn’t free. Adding cost estimates to the PR comment would help with budgeting and identifying when to regenerate vs. accept.
For now, the system handles the tedious work while keeping humans in control of the creative decisions. That feels like the right balance for something as visible as social images.
The social image for this very post was generated by this system. Meta, I know.