API Documentation


Overview

This document provides a comprehensive guide to using the Sahara API. It walks you through discovering available models and compute providers, querying model metadata, and making inference requests using both raw HTTP and OpenAI-compatible Python clients. The API is especially useful for developers integrating multiple model providers into their workflow while maintaining a unified interface. You will learn how to:

  • Query all available models and compute providers

  • Filter models or providers using specific criteria

  • Access model usage details

  • Send inference requests through LangChain, the OpenAI SDK, or raw HTTP

  • Implement multi-agent logic with routing

Preparation

API Setup

To access the Sahara Model Hub API, you need a valid API key. This key is required to authenticate every API request.

How to Get Your API Key

1. Go to the Developer Portal

Open: https://portal.saharalabs.ai

2. Log In and Access API Keys

Click your profile icon (top-right) → select "API Key".

3. Create a New Key

Click "Create API Key", assign it a name like "dev-client", and generate it.

4. Copy and Store Securely

You can only view the key once. Save it securely in an environment variable, config file, or secret manager.

Note: Never expose your API key in public code or repositories. Treat it as a secret credential.

Once you have your API key, configure it in your script. This will be required in all requests sent to the Sahara Model Hub API.

Configure the HTTP request headers with your API key:

  API_KEY = "your-api-key"
    HEADERS = {
        "Accept": "application/json",
        "x-api-key": API_KEY,
    }

Replace "your-api-key" with the key you obtained from the Developer Portal.

Discover Available Models & Providers

The Sahara API allows you to dynamically explore available models and compute providers.

Get All Models

This command fetches all registered models across providers:

curl -s 'https://portal.saharalabs.ai/api/compute/models' \
  -H 'Accept: application/json' \
  -H 'x-api-key: your-api-key' | jq

Sample Response

[
  "llama-3-8b",
  "gpt-4o",
  "deepseek-ai/DeepSeek-V3",
  "deepseek-ai/DeepSeek-R1",
  "llama3-3-70b",
  "Qwen/Qwen2.5-72B-Instruct-Turbo",
  "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "Qwen/Qwen2.5-7B-Instruct-Turbo",
  "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  "llama3-1-8b",
  "deepseek-ai/DeepSeek-V3-0324"
]
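
The same endpoint can be queried from Python using the HEADERS dict configured in the API Setup section (a minimal sketch with the requests library):

import requests

MODEL_BASE_URL = "https://portal.saharalabs.ai/api/compute"

# Fetch the list of all registered models; HEADERS was configured above.
response = requests.get(f"{MODEL_BASE_URL}/models", headers=HEADERS)
response.raise_for_status()
print(response.json())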

Get All Providers

This endpoint lists all compute providers (e.g., OpenAI, Lepton, Together):

curl -s 'https://portal.saharalabs.ai/api/compute/providers' \
  -H 'Accept: application/json' \
  -H 'x-api-key: your-api-key'

Sample Response

["lepton","predibase","sagemaker","bedrock","openai","together"]

Get Models by Provider

Query the models served by a specific provider:

curl -s 'https://portal.saharalabs.ai/api/compute/models?provider=predibase' \
  -H 'Accept: application/json' \
  -H 'x-api-key: your-api-key' | jq

Sample Response

[
  "llama-3-8b"
]

Get Providers by Model

Find which providers serve a specific model. For example, to find the provider serving deepseek-ai/DeepSeek-V3:

curl -s 'https://portal.saharalabs.ai/api/compute/providers?model=deepseek-ai/DeepSeek-V3' \
  -H 'Accept: application/json' \
  -H 'x-api-key: your-api-key' | jq

Sample Response

[
  "together"
]

Get Model Details

Fetch metadata and detailed usage requirements for a specific model-provider pair:

curl -s 'https://portal.saharalabs.ai/api/compute/modelDetail?model=deepseek-ai/DeepSeek-V3&provider=together'   -H 'Accept: application/json'   -H 'x-api-key: your-api-key' | jq

Sample Response

{
  "id": "1beec936-672e-4e63-9ef9-af721d0ed3e2",
  "name": "deepseek-ai/DeepSeek-V3",
  "description": "together AI deepseek-ai/DeepSeek-V3",
  "is_public": null,
  "license": null,
  "model_size": 0,
  "tags": null,
  "tensor_type": null
}
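
In Python, the two discovery endpoints above can be chained to resolve a model's provider and fetch its detail record in one step. A minimal sketch (get_model_detail is an illustrative helper, not part of the API; HEADERS is the dict from the API Setup section):

import requests

MODEL_BASE_URL = "https://portal.saharalabs.ai/api/compute"

def get_model_detail(model):
    # Find a provider that serves this model, then fetch its detail record.
    providers = requests.get(
        f"{MODEL_BASE_URL}/providers", headers=HEADERS, params={"model": model}
    ).json()
    if not providers:
        raise ValueError(f"No provider serves {model}")
    return requests.get(
        f"{MODEL_BASE_URL}/modelDetail",
        headers=HEADERS,
        params={"model": model, "provider": providers[0]},
    ).json()

print(get_model_detail("deepseek-ai/DeepSeek-V3"))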

Model Inference by Raw HTTP Request

The /chat/completions endpoint accepts OpenAI-style chat requests over plain HTTP:

import requests

SAHARA_DEVPORTAL_API_KEY = "your-api-key"
MODEL_BASE_URL = "https://portal.saharalabs.ai/api/compute"

model_name = "gpt-4o"
model_provider = "openai"

url = f"{MODEL_BASE_URL}/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {SAHARA_DEVPORTAL_API_KEY}",
    # The compute provider is selected via the OpenAI-Organization header.
    "OpenAI-Organization": model_provider,
}
data = {
    "model": model_name,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Sample Response

{'id': 'chatcmpl-BHOQRlHqSrfSMi3wFtOYzxVOZWufb', 'choices': [{'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'content': 'Hello! How can I assist you today?', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': None, 'annotations': []}}], 'created': 1743485167, 'model': 'gpt-4o-2024-08-06', 'object': 'chat.completion', 'service_tier': 'default', 'system_fingerprint': 'fp_898ac29719', 'usage': {'completion_tokens': 10, 'prompt_tokens': 19, 'total_tokens': 29, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}}

Model Inference by OpenAI SDK

If you prefer to use OpenAI's SDK, the Sahara endpoint fully supports OpenAI-compatible APIs.

Non-Streaming Response

from openai import OpenAI

client = OpenAI(
    base_url=MODEL_BASE_URL,
    api_key=SAHARA_DEVPORTAL_API_KEY,
    organization="openai",
)
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello! Who are you?"},
    ],
)
print(completion.choices[0].message)

Sample Output

ChatCompletionMessage(content="Hello! I'm an AI assistant here to help you with any questions or information you need. How can I assist you today?", refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)

Streaming Response

This streaming example uses LangChain's ChatOpenAI client (introduced in the next section) against the same OpenAI-compatible endpoint:

import asyncio
import json

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Model/provider pairs to test; extend this list as needed.
model_provider_combinations = [
    {"model_name": "gpt-4o", "model_provider": "openai"},
]


async def generate(model_name, model_provider):
    print(f"Testing Streaming Output of {model_name} on {model_provider}")
    chat = ChatOpenAI(
        model=model_name,
        api_key=SAHARA_DEVPORTAL_API_KEY,
        openai_api_base=MODEL_BASE_URL,
        organization=model_provider,
        streaming=True,
    )

    messages = [
        HumanMessage(content="Hello! How are you?")
    ]

    try:
        # Accumulate chunks and print the partial message as it grows.
        full_content = ""
        async for chunk in chat.astream(messages):
            if chunk.content:
                full_content += chunk.content
                print(full_content)

        print(full_content)

    except Exception as e:
        print(f"Streaming error: {e}")
        error_data = {"type": "error", "message": str(e)}
        print(f"data: {json.dumps(error_data)}\n\n")


async def main():
    for combination in model_provider_combinations[:1]:
        await generate(combination["model_name"], combination["model_provider"])


if __name__ == "__main__":
    asyncio.run(main())

Sample Response

Testing Streaming Output of gpt-4o on openai
Hello
Hello!
Hello! I'm
Hello! I'm here
Hello! I'm here and
Hello! I'm here and ready
Hello! I'm here and ready to
Hello! I'm here and ready to help
Hello! I'm here and ready to help.
Hello! I'm here and ready to help. What
Hello! I'm here and ready to help. What can
Hello! I'm here and ready to help. What can I
Hello! I'm here and ready to help. What can I do
Hello! I'm here and ready to help. What can I do for
Hello! I'm here and ready to help. What can I do for you
Hello! I'm here and ready to help. What can I do for you today
Hello! I'm here and ready to help. What can I do for you today?
Hello! I'm here and ready to help. What can I do for you today?

Model Inference Using LangChain

Prerequisites

Ensure the following tools and packages are installed before continuing:

pip install langchain_openai

langchain_openai is a Python library that provides integration between LangChain and OpenAI’s API.

You can interact with Sahara models using the `langchain` interface. This is useful for testing streaming outputs (as in the previous example) and experimenting with conversational flows. Below is a minimal non-streaming example using a single model:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
import json

model_name = "gpt-4o"
model_provider = "openai"

chat = ChatOpenAI(
    model=model_name,
    api_key=SAHARA_DEVPORTAL_API_KEY,
    openai_api_base=MODEL_BASE_URL,
    organization=model_provider,
    streaming=False,
)

messages = [
    HumanMessage(content="Hello! How are you?")
]


def generate():
    try:
        res = chat.invoke(messages)
        print(res)
    except Exception as e:
        print(f"Inference error: {e}")
        error_data = {"type": "error", "message": str(e)}
        print(f"data: {json.dumps(error_data)}\n\n")


if __name__ == "__main__":
    generate()

Sample Response

content="Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 13, 'total_tokens': 43, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_eb9dce56a8', 'finish_reason': 'stop', 'logprobs': None} id='run-427fd56e-853e-4cb4-9c29-8f48cccab9d6-0' usage_metadata={'input_tokens': 13, 'output_tokens': 30, 'total_tokens': 43, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}

Multi-Agent Integration (OpenAI Agents SDK)

The Sahara API supports OpenAI's Agents SDK (the openai-agents package). This example sets up three agents:

  1. A Spanish-speaking agent

  2. An English-speaking agent

  3. A triage agent that routes input based on language

Prerequisites

Ensure the following tools and packages are installed before continuing:

pip install nest_asyncio
pip install "openai-agents @ git+https://github.com/openai/openai-agents-python.git"

  • openai-agents is a Python SDK that provides an agent framework for building intelligent agents.

  • nest_asyncio lets you run asyncio code inside an already-running event loop (e.g., in Jupyter notebooks).

import os
import asyncio

import nest_asyncio
from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel

nest_asyncio.apply()

SAHARA_DEVPORTAL_API_KEY = "your-api-key"
MODEL_BASE_URL = "https://portal.saharalabs.ai/api/compute"
os.environ["OPENAI_BASE_URL"] = MODEL_BASE_URL
os.environ["OPENAI_API_KEY"] = SAHARA_DEVPORTAL_API_KEY

# One client per compute provider; the provider is selected via `organization`.
client_openai = AsyncOpenAI(
    api_key=SAHARA_DEVPORTAL_API_KEY,
    base_url=MODEL_BASE_URL,
    organization="openai",
)

client_together = AsyncOpenAI(
    api_key=SAHARA_DEVPORTAL_API_KEY,
    base_url=MODEL_BASE_URL,
    organization="together",
)

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish. Your name is James",
    model=OpenAIChatCompletionsModel(
        model="deepseek-ai/DeepSeek-V3",
        openai_client=client_together,
    ),
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English. Your name is Jesse",
    model=OpenAIChatCompletionsModel(
        model="deepseek-ai/DeepSeek-V3",
        openai_client=client_together,
    ),
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Handoff to the appropriate agent based on the language of the request.",
    handoffs=[spanish_agent, english_agent],
    model=OpenAIChatCompletionsModel(
        model="gpt-4o",
        openai_client=client_openai,
    ),
)


async def main():
    result = await Runner.run(triage_agent, input="Hola, ¿Cómo te llamas?")
    print(result.final_output)


asyncio.run(main())

Sample Response

¡Hola! Me llamo James. ¿En qué puedo ayudarte hoy?

This example demonstrates complex routing logic using OpenAI-compatible models served from Sahara.

Error Handling and Best Practices

Error Codes

  • 400 Bad Request: Check request formatting.

  • 404 Not Found: Verify pipeline or model IDs.

  • 500 Internal Server Error: Retry or contact support.

Best Practices

  1. Secure Keys: Use environment variables to store API keys securely.

  2. Monitor Usage: Regularly review metrics to optimize performance.

  3. Retry Logic: Implement retry logic for transient errors (e.g., 500 Internal Server Error), as sketched below.
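
As an illustration of the guidance above, here is a minimal sketch of a retry helper, reusing the url, headers, and data from the raw HTTP example (the function name post_with_retries and the backoff schedule are illustrative, not part of the Sahara API). It retries on 5xx responses and fails fast on 4xx client errors:

import time

import requests

def post_with_retries(url, headers, payload, max_retries=3):
    # POST with exponential backoff on transient 5xx errors.
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code < 500:
            # 2xx succeeds; 4xx is a client error that retries won't fix.
            response.raise_for_status()
            return response.json()
        # 5xx is transient: back off (1s, 2s, 4s, ...) and retry.
        time.sleep(2 ** attempt)
    response.raise_for_status()

result = post_with_retries(f"{MODEL_BASE_URL}/chat/completions", headers, data)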