API Reference
All endpoints, parameters, and response formats for the FowyldAI Engine API.
The base URL is http://localhost:8000 by default. Configure the port in Configuration.
Authentication
FowyldAI supports optional API key authentication when configured. By default, authentication is disabled for local deployments.
curl -H "Authorization: Bearer YOUR_API_KEY" \
http://localhost:8000/ask
Enable authentication in your config.yaml:
security:
api_key_required: true
api_keys:
- name: "my-app"
key: "fai_xxxxxxxxxxxxxxxxxxxx"
scopes: ["ask", "models", "embeddings"]
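For clients written in Python, the Authorization header shown above can be assembled with a small helper; `auth_headers` is our own illustrative name, not part of any SDK:

```python
def auth_headers(api_key=None):
    """Build request headers; attach a Bearer token only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

When authentication is disabled (the default for local deployments), calling it with no arguments returns only the content-type header.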
POST /ask
Send a natural language query and receive a structured, reasoned response. This is the primary endpoint for interacting with FowyldAI.
Request
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"query": "What is the OWASP Top 10?",
"context": "We are a healthcare SaaS company",
"max_tokens": 500,
"temperature": 0.3
}'
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The question or instruction to process |
| context | string | No | Additional context to ground the response |
| max_tokens | integer | No | Maximum response length in tokens (default: 1024) |
| temperature | float | No | Creativity control, 0.0-1.0 (default: 0.4) |
| model | string | No | Specific model to use (default: auto-routed) |
| stream | boolean | No | Enable streaming response (default: false) |
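As a sketch of calling /ask from Python using only the standard library; the `build_ask_payload` and `ask` helpers are hypothetical names, not part of the API, and simply apply the documented defaults and bounds client-side:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default base URL

def build_ask_payload(query, context=None, max_tokens=1024, temperature=0.4,
                      model=None, stream=False):
    """Assemble a /ask request body, enforcing the documented parameter rules."""
    if not query:
        raise ValueError("query is required")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    payload = {"query": query, "max_tokens": max_tokens,
               "temperature": temperature, "stream": stream}
    if context is not None:
        payload["context"] = context
    if model is not None:
        payload["model"] = model
    return payload

def ask(query, **opts):
    """POST the payload to /ask and decode the JSON response."""
    data = json.dumps(build_ask_payload(query, **opts)).encode()
    req = urllib.request.Request(f"{BASE_URL}/ask", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Optional parameters are omitted from the body rather than sent as null, so the engine's own defaults apply.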
Response
{
"answer": "The OWASP Top 10 is a standard awareness document...",
"model_used": "fowyld-general",
"tokens_used": 342,
"latency_ms": 1247,
"sovereign": true,
"confidence": 0.92
}
GET /health
Returns engine status, loaded models, and uptime. Use this for monitoring and readiness probes.
Request
curl http://localhost:8000/health
Response
{
"status": "healthy",
"version": "1.4.0",
"models_loaded": 3,
"sovereign": true,
"uptime_seconds": 86412,
"gpu_available": true,
"memory_used_mb": 4821
}
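A readiness probe only needs to interpret the fields above. A minimal sketch, assuming the response has already been decoded to a dict (`is_ready` is our name, not part of any SDK):

```python
def is_ready(health, require_gpu=False):
    """Interpret a decoded /health response for a readiness probe."""
    if health.get("status") != "healthy":
        return False
    if health.get("models_loaded", 0) < 1:
        return False  # healthy but nothing to serve with yet
    if require_gpu and not health.get("gpu_available", False):
        return False
    return True
```

`require_gpu` lets GPU-dependent deployments stay out of rotation until acceleration is available.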
GET /models
List all loaded models with their capabilities and resource usage.
Request
curl http://localhost:8000/models
Response
{
"models": [
{
"id": "fowyld-general",
"name": "FowyldAI General 3.8B",
"parameters": "3.8B",
"quantization": "Q4_K_M",
"capabilities": ["general", "code", "reasoning"],
"memory_mb": 2400,
"status": "loaded"
},
{
"id": "fowyld-embed",
"name": "FowyldAI Embed v1.5",
"parameters": "137M",
"capabilities": ["embeddings"],
"memory_mb": 280,
"status": "loaded"
}
]
}
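The listing above can be aggregated client-side, for example to watch memory headroom; `summarize_models` is an illustrative helper, not an API call:

```python
def summarize_models(listing):
    """Return the ids and total resident memory of loaded models from a /models response."""
    loaded = [m for m in listing["models"] if m.get("status") == "loaded"]
    return {
        "loaded_ids": [m["id"] for m in loaded],
        "total_memory_mb": sum(m.get("memory_mb", 0) for m in loaded),
    }
```

For the sample response above, this reports 2680 MB across the two loaded models.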
OpenAI-Compatible Endpoints
FowyldAI implements the OpenAI API specification. Point any OpenAI-compatible client library to your local instance:
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed" # local deployment
)
response = client.chat.completions.create(
model="fowyld-default",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain container isolation."}
]
)
print(response.choices[0].message.content)
POST /v1/chat/completions
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "fowyld-default",
"messages": [
{"role": "user", "content": "Hello"}
],
"temperature": 0.7,
"max_tokens": 256,
"stream": false
}'
POST /v1/embeddings
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "fowyld-embed",
"input": "Sovereign AI for enterprise workloads"
}'
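Assuming the response follows the OpenAI embeddings shape (a list of floats at `data[0]["embedding"]`), a common next step is comparing two embeddings by cosine similarity; this is a generic sketch, not a FowyldAI API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0.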
GET /sovereignty/status
Verify that the engine is operating in full sovereign mode with zero external connections.
Response
{
"sovereign": true,
"external_connections": 0,
"telemetry_enabled": false,
"cloud_models_active": false,
"network_guard": "active",
"last_audit": "2026-04-23T08:00:00Z"
}
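For a deployment gate, the fields above can be checked programmatically; `assert_sovereign` is a hypothetical helper that fails loudly on any non-sovereign signal:

```python
def assert_sovereign(status):
    """Raise if any field in a decoded /sovereignty/status response indicates leakage."""
    problems = []
    if not status.get("sovereign"):
        problems.append("sovereign flag is false")
    if status.get("external_connections", 0) != 0:
        problems.append("external connections detected")
    if status.get("telemetry_enabled"):
        problems.append("telemetry is enabled")
    if status.get("cloud_models_active"):
        problems.append("cloud models are active")
    if problems:
        raise RuntimeError("; ".join(problems))
```

Collecting all violations before raising makes the failure message actionable in one pass.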
Error Handling
All errors return a consistent JSON format:
{
"error": {
"code": "model_not_found",
"message": "The requested model 'gpt-4' is not loaded.",
"status": 404
}
}
| Status | Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or malformed parameters |
| 401 | unauthorized | Invalid or missing API key |
| 404 | model_not_found | Requested model is not loaded |
| 429 | rate_limited | Too many requests (if rate limiting configured) |
| 503 | engine_loading | Engine is still loading models |
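Client code can surface these envelopes as exceptions. A minimal sketch (`FowyldAPIError` and `raise_for_error` are our names, not part of any SDK):

```python
class FowyldAPIError(Exception):
    """Raised for structured error responses from the engine."""
    def __init__(self, code, message, status):
        super().__init__(f"{status} {code}: {message}")
        self.code = code
        self.message = message
        self.status = status

def raise_for_error(body):
    """Convert an error envelope into an exception; pass success bodies through."""
    if isinstance(body, dict) and "error" in body:
        err = body["error"]
        raise FowyldAPIError(err["code"], err["message"], err["status"])
    return body
```

Keeping `code` and `status` as attributes lets callers branch, e.g. retrying only on `rate_limited`.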
Rate Limits
By default, no rate limits are applied. Configure limits in config.yaml:
rate_limiting:
enabled: true
requests_per_minute: 60
burst: 10
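When rate limiting is enabled, clients should back off on 429 responses. A sketch of exponential backoff with full jitter, producing only the delay schedule (wiring it to actual retries is left to the caller):

```python
import random

def backoff_delays(retries=5, base=0.5, cap=30.0):
    """Yield one jittered delay (in seconds) per retry attempt, capped at `cap`."""
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

Full jitter spreads retries from many clients across the window instead of synchronizing them into bursts.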