API Reference
All endpoints, parameters, and response formats for the FowyldAI Engine API.
The base URL is http://localhost:8000 by default. Configure the port in Configuration.
Authentication
FowyldAI supports optional API key authentication when configured. By default, authentication is disabled for local deployments.
curl -H "Authorization: Bearer YOUR_API_KEY" \
http://localhost:8000/ask
Enable authentication in your config.yaml:
security:
api_key_required: true
api_keys:
- name: "my-app"
key: "fai_xxxxxxxxxxxxxxxxxxxx"
scopes: ["ask", "models", "embeddings"]
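For clients written in Python, the Authorization header shown above can be assembled with a small helper; `auth_headers` is our own illustrative name, not part of any SDK:

```python
def auth_headers(api_key=None):
    """Build request headers; attach a Bearer token only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

When authentication is disabled (the default for local deployments), calling it with no arguments returns only the content-type header.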
POST /ask
Send a natural language query and receive a structured, reasoned response. This is the primary endpoint for interacting with FowyldAI.
Request
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"query": "What is the OWASP Top 10?",
"context": "We are a healthcare SaaS company",
"max_tokens": 500,
"temperature": 0.3
}'
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The question or instruction to process |
| context | string | No | Additional context to ground the response |
| max_tokens | integer | No | Maximum response length in tokens (default: 1024) |
| temperature | float | No | Creativity control, 0.0-1.0 (default: 0.4) |
| model | string | No | Specific model to use (default: auto-routed) |
| stream | boolean | No | Enable streaming response (default: false) |
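As a sketch of calling /ask from Python using only the standard library; the `build_ask_payload` and `ask` helpers are hypothetical names, not part of the API, and simply apply the documented defaults and bounds client-side:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default base URL

def build_ask_payload(query, context=None, max_tokens=1024, temperature=0.4,
                      model=None, stream=False):
    """Assemble a /ask request body, enforcing the documented parameter rules."""
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    payload = {"query": query, "max_tokens": max_tokens,
               "temperature": temperature, "stream": stream}
    if context is not None:
        payload["context"] = context
    if model is not None:
        payload["model"] = model
    return payload

def ask(query, **opts):
    """POST the payload to /ask and decode the JSON response."""
    data = json.dumps(build_ask_payload(query, **opts)).encode()
    req = urllib.request.Request(f"{BASE_URL}/ask", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Optional parameters are omitted from the body rather than sent as null, so the engine's own defaults apply.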
Response
{
"answer": "The OWASP Top 10 is a standard awareness document...",
"model_used": "fowyld-general",
"tokens_used": 342,
"latency_ms": 1247,
"sovereign": true,
"confidence": 0.92
}
GET /health
Returns engine status, loaded models, and uptime. Use this for monitoring and readiness probes.
Request
curl http://localhost:8000/health
Response
{
"status": "healthy",
"version": "1.4.0",
"models_loaded": 3,
"sovereign": true,
"uptime_seconds": 86412,
"gpu_available": true,
"memory_used_mb": 4821
}
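A readiness probe only needs to interpret the fields above. A minimal sketch, assuming the response has already been decoded to a dict (`is_ready` is our name, not part of any SDK):

```python
def is_ready(health, require_gpu=False):
    """Interpret a decoded /health response for a readiness probe."""
    if health.get("status") != "healthy":
        return False
    if health.get("models_loaded", 0) < 1:
        return False  # healthy but nothing to serve with yet
    if require_gpu and not health.get("gpu_available", False):
        return False
    return True
```

`require_gpu` lets GPU-dependent deployments stay out of rotation until acceleration is available.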
GET /models
List all loaded models with their capabilities and resource usage.
Request
curl http://localhost:8000/models
Response
{
"models": [
{
"id": "fowyld-general",
"name": "FowyldAI General 3.8B",
"parameters": "3.8B",
"quantization": "Q4_K_M",
"capabilities": ["general", "code", "reasoning"],
"memory_mb": 2400,
"status": "loaded"
},
{
"id": "fowyld-embed",
"name": "FowyldAI Embed v1.5",
"parameters": "137M",
"capabilities": ["embeddings"],
"memory_mb": 280,
"status": "loaded"
}
]
}
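The listing above can be aggregated client-side, for example to watch memory headroom; `summarize_models` is an illustrative helper, not an API call:

```python
def summarize_models(listing):
    """Return the ids and total resident memory of loaded models from a /models response."""
    loaded = [m for m in listing["models"] if m.get("status") == "loaded"]
    return {
        "loaded_ids": [m["id"] for m in loaded],
        "total_memory_mb": sum(m.get("memory_mb", 0) for m in loaded),
    }
```

For the sample response above, this reports 2680 MB across the two loaded models.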
OpenAI-Compatible Endpoints
FowyldAI implements the OpenAI API specification. Point any OpenAI-compatible client library to your local instance:
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed" # local deployment
)
response = client.chat.completions.create(
model="fowyld-default",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain container isolation."}
]
)
print(response.choices[0].message.content)
POST /v1/chat/completions
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "fowyld-default",
"messages": [
{"role": "user", "content": "Hello"}
],
"temperature": 0.7,
"max_tokens": 256,
"stream": false
}'
POST /v1/embeddings
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "fowyld-embed",
"input": "Sovereign AI for enterprise workloads"
}'
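Assuming the response follows the OpenAI embeddings shape (a list of floats at `data[0]["embedding"]`), a common next step is comparing two embeddings by cosine similarity; this is a generic sketch, not a FowyldAI API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0.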
GET /sovereignty/status
Verify that the engine is operating in full sovereign mode with zero external connections.
Response
{
"sovereign": true,
"external_connections": 0,
"telemetry_enabled": false,
"cloud_models_active": false,
"network_guard": "active",
"last_audit": "2026-04-23T08:00:00Z"
}
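For a deployment gate, the fields above can be checked programmatically; `assert_sovereign` is a hypothetical helper that fails loudly on any non-sovereign signal:

```python
def assert_sovereign(status):
    """Raise if any field in a decoded /sovereignty/status response indicates leakage."""
    problems = []
    if not status.get("sovereign"):
        problems.append("sovereign flag is false")
    if status.get("external_connections", 0) != 0:
        problems.append("external connections detected")
    if status.get("telemetry_enabled"):
        problems.append("telemetry is enabled")
    if status.get("cloud_models_active"):
        problems.append("cloud models are active")
    if problems:
        raise RuntimeError("; ".join(problems))
```

Collecting all violations before raising makes the failure message actionable in one pass.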
Error Handling
All errors return a consistent JSON format:
{
"error": {
"code": "model_not_found",
"message": "The requested model 'gpt-4' is not loaded.",
"status": 404
}
}
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or malformed parameters |
| 401 | unauthorized | Invalid or missing API key |
| 404 | model_not_found | Requested model is not loaded |
| 429 | rate_limited | Too many requests (if rate limiting configured) |
| 503 | engine_loading | Engine is still loading models |
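Client code can surface these envelopes as exceptions. A minimal sketch (`FowyldAPIError` and `raise_for_error` are our names, not part of any SDK):

```python
class FowyldAPIError(Exception):
    """Raised for structured error responses from the engine."""
    def __init__(self, code, message, status):
        super().__init__(f"{status} {code}: {message}")
        self.code = code
        self.message = message
        self.status = status

def raise_for_error(body):
    """Convert an error envelope into an exception; pass success bodies through."""
    if isinstance(body, dict) and "error" in body:
        err = body["error"]
        raise FowyldAPIError(err["code"], err["message"], err["status"])
    return body
```

Keeping `code` and `status` as attributes lets callers branch, e.g. retrying only on `rate_limited`.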
Rate Limits
By default, no rate limits are applied. Configure limits in config.yaml:
rate_limiting:
enabled: true
requests_per_minute: 60
burst: 10
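When rate limiting is enabled, clients should back off on 429 responses. A sketch of exponential backoff with full jitter, producing only the delay schedule (wiring it to actual retries is left to the caller):

```python
import random

def backoff_delays(retries=5, base=0.5, cap=30.0):
    """Yield one jittered delay (in seconds) per retry attempt, capped at `cap`."""
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

Full jitter spreads retries from many clients across the window instead of synchronizing them into bursts.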