Claude Vision: Image Analysis and Document Processing

Claude's vision capabilities let you send images alongside text prompts and get back structured, reasoned analysis. Every current Claude model — from Haiku to Opus — processes images through the same reasoning architecture used for text, which means Claude doesn't just label objects in photos. It interprets charts, reads handwritten notes, extracts data from screenshots, explains diagrams, and answers questions about visual content with the same depth it brings to written analysis.

What Claude Can See

Claude accepts images in four formats: PNG, JPEG, GIF, and WebP. You can include up to 20 images per request on claude.ai, or up to 600 via the API (100 for models with a 200k-token context window). Maximum image size is 8000x8000 pixels.

Strengths

Claude's vision is strongest where interpretation and reasoning matter more than raw perception:

Document OCR — reading printed and handwritten text from photos, scans, and screenshots
Chart and graph analysis — extracting data points, identifying trends, explaining what a visualization shows
Technical diagrams — interpreting UML diagrams, architecture drawings, flowcharts, and wireframes
Screenshot analysis — understanding UI layouts, identifying elements, reading text from application screenshots
Multi-image comparison — analyzing differences between images, comparing versions, or synthesizing information across multiple visuals
Contextual description — describing images with nuance, including spatial relationships, inferred purpose, and relevant context

Limitations

Claude's vision is not optimized for:

Precise spatial measurements — it can't reliably count pixels or measure exact distances
Small text in large images — text below roughly 12px in a full-resolution screenshot may be missed
Medical or scientific imaging — while it can describe what it sees, it's not validated for clinical diagnosis
Real-time video — Claude processes static images, not video streams
Image generation — Claude analyzes images but cannot create raster images (PNG, JPEG). It can generate SVG code and describe images for generation tools.

Using Vision in the API

Send images to Claude by including them in the message content array alongside text:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6-20260319",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64_encoded_image,
                    },
                },
                {
                    "type": "text",
                    "text": "Extract all text from this receipt and return it as structured JSON with fields: store_name, date, items (array of {name, quantity, price}), subtotal, tax, total."
                }
            ],
        }
    ],
)

URL-Based Images

You can also pass images by URL instead of base64:

{
    "type": "image",
    "source": {
        "type": "url",
        "url": "https://example.com/chart.png",
    },
}

Multiple Images

Send multiple images for comparison or batch analysis:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": before_image}},
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": after_image}},
            {"type": "text", "text": "Compare these two UI screenshots. List every visual difference you can identify, including layout changes, color changes, and text differences."}
        ],
    }
]

Using Vision in Claude Code

Claude Code can read images directly from your filesystem:

> Look at the screenshot at screenshots/bug-report.png and identify
  what UI issue the user is reporting

> Read the architecture diagram at docs/system-design.png and explain
  how data flows from the API gateway to the database

> Compare the mockup at design/v2-mockup.png with the current page
  at screenshots/current-page.png and list what needs to change

Practical Applications

Invoice and Receipt Processing

Extract structured data from financial documents at scale:

Extract the following from this invoice image:
- Invoice number
- Vendor name and address
- Line items with descriptions, quantities, and amounts
- Subtotal, tax, and total
- Payment terms and due date

Return as JSON.

Claude reads both printed and handwritten text, handles multi-column layouts, and interprets tables — making it effective for automating data entry from scanned documents.

Screenshot-Based Bug Reports

Turn screenshots into actionable bug reports:

Analyze this screenshot of our checkout page. Identify:
1. Any visual bugs (misaligned elements, overlapping text, broken layouts)
2. Any text that appears truncated or incorrect
3. The state of the page (loading, error, success)
4. What the user was likely trying to do

Chart Data Extraction

Pull quantitative data from visualizations:

This is a bar chart showing monthly revenue for 2025. Extract the
approximate value for each month and return as a CSV with columns:
month, revenue. Also describe the overall trend.

Diagram Interpretation

Make technical diagrams searchable and actionable:

This is a sequence diagram for our authentication flow. Walk through
each step and identify: which services are involved, what data is
passed at each step, and where the flow could fail.

Content Moderation

Screen user-uploaded images for policy compliance:

Review this user-uploaded image. Check for:
- Text overlay (potential spam or advertising)
- Inappropriate content
- Low quality (blurry, corrupted, mostly blank)
Return a JSON object with: {safe: boolean, flags: string[], confidence: string}

Performance and Cost

Token Costs

Images consume tokens based on their resolution. A typical image costs between 200-1,600 tokens depending on size and detail. Higher resolution images cost more tokens but provide better analysis of fine details.

Optimization Tips

Resize before sending — if you're analyzing a 4000x3000 photo for general content, resize to 1000x750 first. You'll save tokens without losing useful information.
Crop to the relevant area — if you only care about one section of a screenshot, crop it before sending rather than asking Claude to focus on a specific region.
Use the right model — Haiku is faster and cheaper for simple classification tasks (is this a receipt? does this contain text?). Opus or Sonnet is better for detailed analysis and extraction.
Batch with the Batch API — for processing hundreds of images, use the Message Batches API for 50% cost savings.

Related Resources

Claude API Integration Guide — authentication, rate limits, and API setup
Cost Optimization: Batch Processing and Prompt Caching — reducing API costs at scale
Data Processing and Analysis with Claude — combining vision with data workflows

Claude Vision: Image Analysis and Document Processing

Claude Vision: Image Analysis and Document Processing

What Claude Can See

Strengths

Limitations

Using Vision in the API

URL-Based Images

Multiple Images

Using Vision in Claude Code

Practical Applications

Invoice and Receipt Processing

Screenshot-Based Bug Reports

Chart Data Extraction

Diagram Interpretation

Content Moderation

Performance and Cost

Token Costs

Optimization Tips

Related Resources

Related Topics in AI Development