
AWS Lambda + AI: Build Serverless AI APIs Without Managing Servers

Why Serverless AI Makes Sense

Most AI workloads are bursty. A DevOps team runs 10 queries during an incident, zero queries overnight, and a few during code reviews. Paying for a GPU instance running 24/7 to serve these sporadic requests is wasteful.

AWS Lambda + Bedrock changes the math: you pay only when a request executes. No idle costs. No server management. No capacity planning. The infrastructure scales from zero to thousands of concurrent requests automatically.

For teams that need AI capabilities without the infrastructure overhead, this is the simplest production-ready architecture available.

Serverless AI eliminates idle costs: pay only for the requests you actually make

Architecture

Client Request
    |
    v
API Gateway (REST or HTTP API)
    |
    v
AWS Lambda (Python runtime)
    |
    v
Amazon Bedrock (Claude / Llama / Mistral)
    |
    v
Response back to client

No EC2 instances. No containers. No GPU management. The entire stack is managed by AWS.

Step 1: Create the Lambda Function

Project Structure

lambda-ai-api/
  |- handler.py
  |- requirements.txt
  |- template.yaml (SAM)
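One note on requirements.txt: the Lambda Python runtime already bundles boto3, so the file can stay minimal. Pinning a version (the exact floor below is an assumption, pick one new enough to include the bedrock-runtime client) only matters if you rely on newer client features:

```text
boto3>=1.34
```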

The Lambda Handler

# handler.py
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model configurations
MODELS = {
    "fast": {
        "id": "anthropic.claude-haiku-4-5-20251001-v1:0",
        "max_tokens": 1024
    },
    "standard": {
        "id": "anthropic.claude-sonnet-4-20250514-v1:0",
        "max_tokens": 4096
    },
    "reasoning": {
        "id": "meta.llama3-1-70b-instruct-v1:0",
        "max_tokens": 4096
    }
}


def lambda_handler(event, context):
    try:
        body = json.loads(event.get("body") or "{}")
        prompt = body.get("prompt")
        model_tier = body.get("model", "standard")
        system_prompt = body.get("system", "You are a helpful DevOps assistant.")

        if not prompt:
            return response(400, {"error": "prompt is required"})

        model_config = MODELS.get(model_tier, MODELS["standard"])
        model_id = model_config["id"]

        # Anthropic and Meta models expect different request schemas
        if model_id.startswith("anthropic."):
            request_body = {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": model_config["max_tokens"],
                "system": system_prompt,
                "messages": [
                    {"role": "user", "content": prompt}
                ]
            }
        else:
            # Llama models take a single prompt string
            request_body = {
                "prompt": f"{system_prompt}\n\n{prompt}",
                "max_gen_len": model_config["max_tokens"]
            }

        # Call Bedrock
        result = bedrock.invoke_model(
            modelId=model_id,
            contentType="application/json",
            body=json.dumps(request_body)
        )

        response_body = json.loads(result["body"].read())

        # The response schema also differs per provider
        if model_id.startswith("anthropic."):
            answer = response_body["content"][0]["text"]
            usage = response_body.get("usage", {})
            tokens = {
                "input": usage.get("input_tokens", 0),
                "output": usage.get("output_tokens", 0)
            }
        else:
            answer = response_body["generation"]
            tokens = {
                "input": response_body.get("prompt_token_count", 0),
                "output": response_body.get("generation_token_count", 0)
            }

        return response(200, {
            "answer": answer,
            "model": model_tier,
            "tokens": tokens
        })

    except Exception as e:
        return response(500, {"error": str(e)})


def response(status_code, body):
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*"
        },
        "body": json.dumps(body)
    }

Step 2: IAM Role for Lambda

The Lambda function needs permission to invoke Bedrock models:

resource "aws_iam_role" "lambda_ai" {
  name = "lambda-ai-api-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "bedrock_access" {
  name = "bedrock-invoke"
  role = aws_iam_role.lambda_ai.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = [
          "arn:aws:bedrock:us-east-1::foundation-model/anthropic.*",
          "arn:aws:bedrock:us-east-1::foundation-model/meta.*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      }
    ]
  })
}

Step 3: Deploy with SAM

# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Timeout: 120
    MemorySize: 512
    Runtime: python3.12

Resources:
  AIFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: handler.lambda_handler
      Role: !GetAtt LambdaRole.Arn
      Events:
        ApiEvent:
          Type: HttpApi
          Properties:
            Path: /ai
            Method: post

  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: bedrock-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - bedrock:InvokeModel
                Resource: '*'
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: '*'

Outputs:
  ApiEndpoint:
    Value: !Sub "https://${ServerlessHttpApi}.execute-api.${AWS::Region}.amazonaws.com/ai"

Deploy:

sam build
sam deploy --guided

Step 4: Test the API

# Standard query (Claude Sonnet)
curl -X POST https://your-api-id.execute-api.us-east-1.amazonaws.com/ai \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a Terraform module for an S3 bucket with versioning and encryption",
    "model": "standard"
  }'

# Fast query (Claude Haiku)
curl -X POST https://your-api-id.execute-api.us-east-1.amazonaws.com/ai \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the kubectl command to check pod logs?",
    "model": "fast"
  }'
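The same calls work from any HTTP client. Here is a minimal Python sketch using only the standard library (the endpoint URL is a placeholder for your deployed API, and build_payload simply mirrors the fields the Lambda handler reads):

```python
import json
import urllib.request

API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/ai"  # placeholder


def build_payload(prompt, model="standard"):
    # Mirrors the fields the Lambda handler reads from the request body
    return {"prompt": prompt, "model": model}


def ask(prompt, model="standard"):
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# usage: ask("What is the kubectl command to check pod logs?", model="fast")
```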

The serverless AI API handles everything from quick lookups to complex code generation

Step 5: Add Authentication

API Key Authentication

API Gateway HTTP APIs do not support API keys, so this option switches the event to a REST API (Type: Api), which supports ApiKeyRequired together with a usage plan:

# Add to template.yaml
Resources:
  AIFunction:
    Properties:
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /ai
            Method: post
            Auth:
              ApiKeyRequired: true

Cognito Authentication (for User-Based Access)

resource "aws_cognito_user_pool" "ai_api" {
  name = "ai-api-users"

  password_policy {
    minimum_length    = 12
    require_lowercase = true
    require_numbers   = true
    require_symbols   = true
    require_uppercase = true
  }
}

resource "aws_apigatewayv2_authorizer" "cognito" {
  api_id           = aws_apigatewayv2_api.ai_api.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "cognito-auth"

  jwt_configuration {
    audience = [aws_cognito_user_pool_client.ai_api.id]
    issuer   = "https://${aws_cognito_user_pool.ai_api.endpoint}"
  }
}
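With the JWT authorizer in place, clients must send a Cognito ID token in the Authorization header. A sketch of fetching one with boto3 (the username, password, and app client ID are placeholders, and this assumes USER_PASSWORD_AUTH is enabled on the app client):

```python
def auth_header(id_token):
    # The HTTP API JWT authorizer reads this header ($request.header.Authorization)
    return {"Authorization": f"Bearer {id_token}"}


def get_id_token(username, password, client_id, region="us-east-1"):
    # boto3 imported lazily so auth_header works without it installed
    import boto3

    cognito = boto3.client("cognito-idp", region_name=region)
    resp = cognito.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
        ClientId=client_id,
    )
    # The ID token carries the app client ID as its audience,
    # matching the authorizer's jwt_configuration above
    return resp["AuthenticationResult"]["IdToken"]

# usage: add auth_header(get_id_token(...)) to the POST request headers
```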

Cost Analysis

Lambda + Bedrock pricing for a team of 5 engineers:

Per-Request Cost Breakdown

| Component | Cost per Request |
|---|---|
| Lambda (512 MB, 10 sec avg) | $0.000083 |
| API Gateway | $0.000001 |
| Bedrock (Claude Sonnet, avg 3K in / 2K out) | $0.039 |
| Total per request | ~$0.04 |

Monthly Cost by Usage

| Monthly Requests | Lambda | API Gateway | Bedrock (Sonnet) | Total |
|---|---|---|---|---|
| 100 | $0.01 | $0.00 | $3.90 | $3.91 |
| 500 | $0.04 | $0.01 | $19.50 | $19.55 |
| 1,000 | $0.08 | $0.01 | $39.00 | $39.09 |
| 5,000 | $0.42 | $0.05 | $195.00 | $195.47 |

Key insight: Lambda and API Gateway costs are negligible. Bedrock token usage dominates the bill. The serverless architecture means you pay exactly proportional to usage — zero requests costs $0.

Comparison with EC2 GPU

| Metric | Lambda + Bedrock | EC2 g5.xlarge + Ollama |
|---|---|---|
| 100 requests/month | $3.91 | $734 |
| 500 requests/month | $19.55 | $734 |
| 1,000 requests/month | $39.09 | $734 |
| Break-even | ~18,800 requests | Fixed |

Lambda + Bedrock is cheaper until approximately 18,800 requests per month. For most DevOps teams, that is far beyond normal usage.
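The break-even figure falls straight out of the per-request numbers above (about $0.039 per Bedrock call versus a roughly $734/month fixed instance):

```python
BEDROCK_PER_REQUEST = 0.039  # Claude Sonnet, ~3K tokens in / 2K out
EC2_MONTHLY = 734.0          # g5.xlarge, on-demand, running 24/7

# Below this request volume, pay-per-request beats the fixed instance
break_even = EC2_MONTHLY / BEDROCK_PER_REQUEST
print(round(break_even))  # ~18,800 requests/month
```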

Advanced Patterns

Streaming Responses

For long AI responses, stream tokens instead of waiting for completion:

def lambda_handler_stream(event, context):
    body = json.loads(event.get("body") or "{}")

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-sonnet-4-20250514-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "messages": [{"role": "user", "content": body["prompt"]}]
        })
    )

    # Relay the stream. This requires a Lambda Function URL with response
    # streaming enabled; note the managed Python runtime does not stream
    # natively, so this generator pattern needs a runtime that supports it.
    for stream_event in response["body"]:
        chunk = json.loads(stream_event["chunk"]["bytes"])
        if chunk["type"] == "content_block_delta":
            yield chunk["delta"]["text"]

Caching Frequent Queries

Add DynamoDB caching to avoid repeat Bedrock calls:

import hashlib
import time

dynamodb = boto3.resource("dynamodb")
cache_table = dynamodb.Table("ai-cache")

def get_cached_or_query(prompt, model):
    # The same prompt + model always maps to the same cache key
    cache_key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    # Check cache
    cached = cache_table.get_item(Key={"id": cache_key}).get("Item")
    if cached:
        return cached["answer"]

    # Query Bedrock
    answer = query_bedrock(prompt, model)

    # Cache for 24 hours (requires TTL enabled on the "ttl" attribute)
    cache_table.put_item(Item={
        "id": cache_key,
        "answer": answer,
        "ttl": int(time.time()) + 86400
    })

    return answer
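The ai-cache table itself needs TTL enabled on the ttl attribute for expiry to work. A sketch of creating it with boto3 (on-demand billing and the table name from the snippet above are assumptions):

```python
def cache_table_spec(table_name="ai-cache"):
    # Keyword arguments for dynamodb.create_table: string hash key "id"
    return {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_cache_table(table_name="ai-cache", region="us-east-1"):
    # boto3 imported lazily so the spec helper works without it installed
    import boto3

    client = boto3.client("dynamodb", region_name=region)
    client.create_table(**cache_table_spec(table_name))
    client.get_waiter("table_exists").wait(TableName=table_name)
    # DynamoDB deletes expired items based on the epoch value in "ttl"
    client.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "ttl"},
    )
```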

Rate Limiting

Prevent runaway costs with API Gateway throttling:

Resources:
  AIApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      DefaultRouteSettings:
        ThrottlingBurstLimit: 10
        ThrottlingRateLimit: 5

This limits the API to 5 requests per second with a burst of 10. Adjust based on your team size and budget.
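Clients that hit the throttle receive HTTP 429. A simple retry-with-backoff sketch keeps bursty scripts working without hammering the API (the retry count, delays, and ThrottledError exception are illustrative choices, not part of the stack above):

```python
import time


class ThrottledError(Exception):
    """Raised by the caller when the API responds with HTTP 429."""


def with_backoff(call, max_retries=4, base_delay=1.0):
    # Retry `call` on throttling, doubling the delay each attempt: 1s, 2s, 4s, ...
    for attempt in range(max_retries):
        try:
            return call()
        except ThrottledError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```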

Key Takeaways

  • Lambda + Bedrock is the cheapest AI architecture for teams under roughly 18,800 requests/month
  • Zero idle costs — pay only when requests execute
  • Auto-scales from 0 to thousands of concurrent requests
  • Lambda costs are negligible — Bedrock tokens dominate the bill
  • Add DynamoDB caching to reduce repeat queries and costs
  • Use model tiers (Haiku for fast, Sonnet for standard) to optimize spend
  • Set API Gateway throttling to prevent cost overruns
  • The entire stack deploys in 10 minutes with SAM

FAQ

What is the Lambda timeout for AI requests?

Lambda supports up to 15 minutes timeout. Most Bedrock requests complete in 5-30 seconds depending on model and output length. Set timeout to 120 seconds for safety. If responses consistently approach the timeout, the prompt is too complex or the output too long.

Can Lambda handle streaming AI responses?

Yes, using Lambda Function URLs with response streaming. Standard API Gateway does not support streaming — it buffers the full response. For streaming, use Lambda Function URLs directly or CloudFront + Lambda@Edge.

Is Lambda cold start a problem for AI APIs?

Lambda cold starts add 1-3 seconds on the first request after idle. Since Bedrock API calls themselves take 3-15 seconds, the cold start is a small percentage of total latency. For latency-sensitive production APIs, use Provisioned Concurrency ($0.015/GB-hour) to keep functions warm.

Can I use my own models instead of Bedrock?

Not directly with this architecture. Lambda does not have GPU access. For custom models, use EC2 GPU instances or SageMaker endpoints. You can replace the Bedrock call with a SageMaker endpoint invocation using the same Lambda architecture.

How do I monitor costs?

Enable Cost Explorer tags on Lambda and Bedrock. Set AWS Budget alerts at your monthly limit. The API handler returns token counts in the response — log these to CloudWatch for usage tracking. Review weekly alongside your AWS cost optimization process.

Conclusion

Serverless AI is the simplest way to add AI capabilities to your tools and workflows. No GPUs to manage, no instances to monitor, no capacity to plan. Deploy the Lambda function, hit the API endpoint, get AI responses.

For teams that need AI occasionally — during incidents, code reviews, or documentation — this architecture costs single-digit dollars per month. It scales to thousands of requests without changes. And when nobody is using it, it costs exactly zero.

Need help building serverless AI APIs on AWS? View our AWS Infrastructure Setup service

Read next: AWS Bedrock vs Self-Hosted Ollama: When to Use Each

Written by
SysOpX
Battle-tested DevOps & AWS engineering guides