API Reference
Access 40+ AI models through a single, OpenAI-compatible API. Chat completions, image generation, video, TTS, search, and data extraction — all on one key.
Base URL: https://api.vincony.com/v1
Why Use the Vincony API?
One integration, every AI model — with the controls you need to ship confidently.
One Key, 40+ Models
Access every major AI provider through a single API key and billing system.
OpenAI-Compatible
Drop-in replacement that works with any OpenAI SDK — switch models by changing a string.
Built-in Cost Control
Per-key credit budgets, real-time usage analytics, and overspend alerts.
Popular Use Cases
SaaS Products
Add AI chat, summarization, or content generation to your app without managing multiple provider accounts.
Internal Tools
Build Slack bots, email drafters, or data analyzers powered by the best model for each task.
Content Pipelines
Automate blog posts, social media copy, and product descriptions at scale with batch generation.
Code Assistants
Integrate code generation, review, and debugging into CI/CD pipelines or IDE extensions.
Customer Support Bots
Deploy AI chatbots on any website using the chatbot embed endpoint and a single script tag.
Research & Analysis
Run queries across multiple models simultaneously for comparison and consensus.
Quickstart
Get up and running in under 2 minutes:
Get your API key
Sign up and generate an API key from the Developer Portal. Keys start with vncy_.
Make your first request
The API is OpenAI-compatible — use any HTTP client or the OpenAI SDK:
```bash
curl -X POST https://api.vincony.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vncy_YOUR_API_KEY" \
  -d '{"model":"openai/gpt-5-mini","messages":[{"role":"user","content":"Say hello!"}]}'
```
Parse the response
The response follows the OpenAI Chat Completions format:
```json
{
  "choices": [
    { "message": { "content": "Hello! How can I help you?" } }
  ]
}
```
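In any client language, the reply text lives at `choices[0].message.content`. A minimal Python helper for pulling it out of the parsed JSON:

```python
def first_message(response: dict) -> str:
    """Extract the assistant's reply from a Chat Completions response."""
    return response["choices"][0]["message"]["content"]
```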
API Overview
| Property | Value |
|---|---|
| Base URL | https://api.vincony.com/v1 |
| Native Gateway URL | https://api.vincony.com/api-gateway |
| Content-Type | application/json |
| Authentication | Bearer vncy_... |
| Response Format | JSON (except TTS, which returns audio/mpeg) |
| API Version | v1 (stable, no breaking changes planned) |
Authentication
All API requests require a Bearer token in the Authorization header. API keys start with the vncy_ prefix.
Authorization: Bearer vncy_YOUR_API_KEY
Generate keys from the Developer Portal.
Each key has configurable scopes that limit which endpoints it can access:
| Scope | Grants Access To |
|---|---|
| chat | Chat Completions (/v1/chat/completions) |
| image | Image Generation (generate-image-aiml) |
| video | Video Generation (generate-video) |
| tts | Text-to-Speech (text-to-speech) |
| search | AI Search (ai-search) |
| extract | Data Extraction (extract-data) |
Endpoints
The API provides two interfaces: an OpenAI-compatible endpoint at /v1/chat/completions and a native gateway at /api-gateway for image, video, TTS, search, and extraction.
| Endpoint | Method | Path | Scope | Cost |
|---|---|---|---|---|
| Chat Completions | POST | /v1/chat/completions | chat | 1–10 credits (varies by model) |
| List Models | GET | /v1/models | — | 0 credits |
| Image Generation | POST | /api-gateway | image | 3–5 credits (varies by model) |
| Video Generation | POST | /api-gateway | video | 15–200 credits (varies by model) |
| Text-to-Speech | POST | /api-gateway | tts | 2 credits |
| AI Search | POST | /api-gateway | search | 3 credits |
| Data Extraction | POST | /api-gateway | extract | 2 credits |
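Since listing models costs 0 credits, it is a cheap way to check availability before sending a paid request. A minimal Python sketch using only the standard library; it assumes the response follows the OpenAI list-models schema (`{"data": [{"id": ...}, ...]}`), which is consistent with the endpoint being OpenAI-compatible but not confirmed above:

```python
import json
from urllib import request

API_KEY = "vncy_YOUR_API_KEY"  # placeholder key

def extract_model_ids(models_response: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style list-models payload.

    Assumed shape: {"data": [{"id": "openai/gpt-5"}, ...]}.
    """
    return [m["id"] for m in models_response.get("data", [])]

def list_models() -> list[str]:
    """GET /v1/models — 0 credits per the endpoint table."""
    req = request.Request(
        "https://api.vincony.com/v1/models",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with request.urlopen(req) as resp:
        return extract_model_ids(json.load(resp))
```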
Models & Pricing
| Model ID | Provider | Category | Credits / req |
|---|---|---|---|
| google/gemini-3-flash-preview | Google | General | 1 |
| google/gemini-3-pro-preview | Google | General | 4 |
| google/gemini-2.5-flash | Google | General | 1 |
| google/gemini-2.5-flash-lite | Google | General | 1 |
| google/gemini-2.5-pro | Google | General | 3 |
| google/gemini-2.5-flash-image | Google | General | 2 |
| google/gemini-3-pro-image-preview | Google | General | 4 |
| openai/gpt-5 | OpenAI | General | 3 |
| openai/gpt-5-mini | OpenAI | General | 2 |
| openai/gpt-5-nano | OpenAI | General | 1 |
| openai/gpt-5.2 | OpenAI | General | 5 |
| openai/o4-mini | OpenAI | General | 2 |
| anthropic/claude-sonnet-4.5 | Anthropic | General | 3 |
| anthropic/claude-sonnet-4 | Anthropic | General | 3 |
| anthropic/claude-opus-4.5 | Anthropic | General | 8 |
| anthropic/claude-opus-4 | Anthropic | General | 10 |
| x-ai/grok-4 | xAI | General | 3 |
| deepseek/deepseek-v3.2 | DeepSeek | General | 1 |
| meta/llama-4-maverick | Meta | General | 2 |
| mistral/mistral-large | Mistral | General | 2 |
| openai/gpt-5.2-codex | OpenAI | Code | 4 |
| openai/gpt-5.1-codex-max | OpenAI | Code | 4 |
| openai/gpt-5-codex | OpenAI | Code | 3 |
| openai/gpt-5.1-codex | OpenAI | Code | 3 |
| openai/gpt-5.1-codex-mini | OpenAI | Code | 1 |
| openai/codex-mini | OpenAI | Code | 1 |
| mistral/devstral-2 | Mistral | Code | 3 |
| mistral/devstral-small-2 | Mistral | Code | 1 |
| mistral/devstral-small | Mistral | Code | 1 |
| mistral/codestral | Mistral | Code | 2 |
| alibaba/qwen3-coder-plus | Alibaba | Code | 3 |
| alibaba/qwen3-coder | Alibaba | Code | 2 |
| openai/o3-pro | OpenAI | Reasoning | 10 |
| openai/o3 | OpenAI | Reasoning | 4 |
| openai/o3-mini | OpenAI | Reasoning | 2 |
| openai/o1 | OpenAI | Reasoning | 7 |
| openai/gpt-5.1-thinking | OpenAI | Reasoning | 4 |
| deepseek/deepseek-r1 | DeepSeek | Reasoning | 4 |
| mistral/magistral-medium | Mistral | Reasoning | 2 |
| mistral/magistral-small | Mistral | Reasoning | 1 |
| x-ai/grok-4.1-fast-r | xAI | Reasoning | 2 |
| alibaba/qwen3-max-thinking | Alibaba | Reasoning | 5 |
Rate Limits
| Plan | Requests / Minute | Requests / Day |
|---|---|---|
| Power | 60 | 5,000 |
| Business | 120 | 20,000 |
Rate limit status is returned in response headers:
- X-RateLimit-Remaining-RPM — remaining requests this minute
- X-RateLimit-Remaining-RPD — remaining requests today
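Clients can read these headers to throttle proactively instead of waiting for a 429. A small sketch (header names are case-insensitive in HTTP, so the helper normalizes them first):

```python
def remaining_quota(headers: dict) -> tuple[int, int]:
    """Return (remaining RPM, remaining RPD) from response headers.

    Returns -1 for a limit whose header is absent.
    """
    h = {k.lower(): v for k, v in headers.items()}
    rpm = int(h.get("x-ratelimit-remaining-rpm", -1))
    rpd = int(h.get("x-ratelimit-remaining-rpd", -1))
    return rpm, rpd
```

For example, pausing when RPM drops near zero keeps bursty workloads under the per-minute cap.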
Per-Key Credit Budgets
Each API key can optionally have a monthly credit budget. When the budget is exhausted, all requests return 429 until the budget resets at the start of the next calendar month.
At 80% consumption, a webhook alert is fired to the key's configured budget_webhook_url. Configure budgets in the Developer Portal.
Error Codes
| Code | Status | Description |
|---|---|---|
| 400 | Bad Request | Invalid JSON body, missing required fields, or malformed parameters. |
| 401 | Unauthorized | Missing or invalid API key. Ensure your Authorization header uses a valid vncy_ key. |
| 402 | Payment Required | Insufficient credits. Upgrade your plan or purchase a top-up. |
| 403 | Forbidden | API key revoked, insufficient scope for the requested endpoint, or plan upgrade required. |
| 429 | Rate Limited | Exceeded RPM or RPD limits, or monthly per-key credit budget exhausted. Check the error message for details. |
| 500 | Internal Server Error | Unexpected server error. Contact support if this persists. |
| 502 | Bad Gateway | Upstream AI provider is temporarily unavailable. Retry with exponential back-off. |
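The 502 row above recommends retrying with exponential back-off; 429 and 500 are also reasonable retry candidates. A minimal sketch (note that a 429 caused by an exhausted monthly budget only resets next month, so check the error message before retrying those):

```python
import random
import time

RETRYABLE = {429, 500, 502}

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential back-off with jitter: ~1s, ~2s, ~4s, ... capped at 30s."""
    return min(cap, base * 2 ** attempt) * (0.5 + random.random() / 2)

def call_with_retries(send, max_attempts: int = 5):
    """Call `send()` (which returns (status, body)) until it succeeds.

    Retries only the transient statuses listed in RETRYABLE.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        time.sleep(backoff_delay(attempt))
    return status, body  # give up, surface the last response
```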
OpenAI-Compatible Error Format
Returned by /v1/chat/completions:
```json
{
  "error": {
    "message": "Rate limit exceeded (60 requests/minute).",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
Native Gateway Error Format
Returned by /api-gateway:
```json
{ "error": "Rate limit exceeded (60 requests/minute). Please slow down." }
```

A 402 status indicates credit or budget exhaustion, distinct from 429 rate limiting; check the error message to tell the two apart.

Webhooks
Receive real-time HTTP POST notifications when events occur in your account. Configure webhook endpoints from the Webhooks settings.
Setup
- Add an HTTPS endpoint URL in the Webhooks settings page.
- Select which events to subscribe to.
- A signing secret is generated automatically — use it to verify payloads.
Event Types
| Event | Description |
|---|---|
| generation_complete | Fired when any content generation finishes (text, image, audio, video, 3D). |
| credit_threshold | Fired when your credit balance drops below a warning threshold. |
| scheduled_complete | Fired when a scheduled generation job completes. |
| batch_complete | Fired when a batch generation run finishes. |
| api.request.completed | Fired after every successful API gateway request (includes endpoint, latency, credits). |
Payload Format
Every webhook delivery is a POST request with these headers:
- Content-Type: application/json
- X-Webhook-Event — the event name
- X-Webhook-Signature — HMAC-SHA256 hex digest of the body
```json
{
  "event": "generation_complete",
  "timestamp": "2026-02-24T12:00:00.000Z",
  "data": {
    "model_id": "openai/gpt-5",
    "model_category": "text",
    "prompt_preview": "Write a blog post about...",
    "credits_consumed": 3
  }
}
```
Verifying Signatures (HMAC-SHA256)
```js
import crypto from "crypto";

function verifyWebhook(body, secret, signature) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(body)
    .digest("hex");
  return crypto.timingSafeEqual(
    Buffer.from(expected),
    Buffer.from(signature)
  );
}
```
Webhooks time out after 10 seconds. Return a 2xx status to acknowledge receipt. Failed deliveries are not retried.
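The same verification in Python, for receivers not running on Node: recompute the HMAC-SHA256 hex digest of the raw request body and compare it to the X-Webhook-Signature header in constant time.

```python
import hashlib
import hmac

def verify_webhook(body: bytes, secret: str, signature: str) -> bool:
    """Return True if `signature` matches the HMAC-SHA256 hex digest of `body`.

    Uses hmac.compare_digest to avoid timing side channels, mirroring
    crypto.timingSafeEqual in the Node example.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Verify against the raw bytes as received; re-serializing the parsed JSON can change whitespace and break the digest.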
Streaming (SSE)
Set "stream": true in your chat completions request to receive Server-Sent Events. Each event contains a JSON chunk with a delta object. The stream terminates with data: [DONE].
```bash
curl -N -X POST https://api.vincony.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vncy_YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      { "role": "user", "content": "Write a haiku about AI." }
    ],
    "stream": true
  }'
```
```text
# Each SSE event is a JSON object prefixed with "data: "
# The stream ends with "data: [DONE]"
#
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Silicon"}}]}
# data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" dreams"}}]}
# ...
# data: [DONE]
```
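Those delta chunks can be stitched back together on the client. A minimal Python sketch covering only the framing shown above; a production client should use a proper SSE library:

```python
import json

def accumulate_sse(lines) -> str:
    """Rebuild the full completion text from raw SSE lines.

    Each data line carries a chunk whose delta.content holds incremental
    text; the stream ends with the literal "[DONE]" sentinel.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and keep-alive comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```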
Each event line is data: followed by JSON. The delta.content field contains the incremental text; accumulate these to build the full response.

Code Examples
```bash
curl -X POST https://api.vincony.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer vncy_YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      { "role": "user", "content": "Explain quantum computing in one sentence." }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
SDKs & Compatibility
The /v1/chat/completions endpoint is fully compatible with the OpenAI SDK. Simply point the base URL to Vincony:
```python
from openai import OpenAI

client = OpenAI(api_key="vncy_...", base_url="https://api.vincony.com/v1")

chat = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)
```
Compatible with: OpenAI Python SDK, OpenAI Node.js SDK, LangChain, LlamaIndex, Vercel AI SDK, and any tool targeting the OpenAI Chat Completions schema.
The native gateway endpoints (/api-gateway) use a Vincony-specific format and are accessed directly via HTTP.
API Changelog
The Vincony API is currently v1 (stable).

- All content generations (chat, image, video, audio, 3D) now fire the generation_complete webhook event.
- about 1 month ago: Full endpoint reference with response schemas, multi-language examples, and quickstart guide.
- about 1 month ago: New video generation endpoint with text-to-video and image-to-video models.
- about 1 month ago: API keys can now have monthly credit budgets with 80% threshold webhook alerts.
- about 2 months ago: Chat completions, image generation, TTS, AI search, and data extraction endpoints.
about 2 months agoVersioning policy: URL-based versioning (/v1/). New version only for breaking changes. Additive changes ship without a version bump.
Deprecation notices: Deprecated features include a Sunset response header with the removal date.
Ready to integrate?
Get your API key and start building in minutes.