
iOS API Spec

Leap

The entry point of the LEAP SDK for iOS. It provides static methods for model loading and holds no data.

public struct Leap {
  public static func load(url: URL) async throws -> ModelRunner
}

load

This function loads a model from a local file URL. The url must point to a model bundle file. The app must retain the model runner returned by this function for as long as it needs to interact with the model. See ModelRunner for more details.

The function will throw LeapError.modelLoadingFailure if LEAP fails to load the model.
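As a sketch (the "my-model" resource name is a placeholder), loading a bundled model inside an async context might look like this:

// Inside an async context; "my-model" is a placeholder resource name
guard let modelURL = Bundle.main.url(forResource: "my-model", withExtension: "bundle") else {
  fatalError("Model bundle not found in the app bundle")
}

do {
  let modelRunner = try await Leap.load(url: modelURL)
  // Retain modelRunner for as long as the model is needed
} catch {
  print("Model loading failed: \(error)")
}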

Conversation

An instance of a conversation, which stores the message history and the state the model runner needs for generation.

While this Conversation instance holds the data necessary for the model runner to perform generation, the app still needs to maintain the UI state of the message history representation.

public class Conversation {
  public let modelRunner: ModelRunner
  public private(set) var history: [ChatMessage]

  public init(modelRunner: ModelRunner, history: [ChatMessage])

  // Generate a response from a plain text message
  public func generateResponse(userTextMessage: String) -> AsyncStream<MessageResponse>

  // Generate a response from a chat message
  public func generateResponse(message: ChatMessage) -> AsyncStream<MessageResponse>
}

generateResponse

This method adds the message to the conversation history, generates a response, and returns an AsyncStream<MessageResponse>. It can be called from the main thread.

The return value is a Swift AsyncStream. The generation will start immediately when the stream is created. Use for await loops or other async iteration patterns to consume the stream.

MessageResponse instances will be emitted from this stream, which contain chunks of data generated from the model.

Errors can be thrown within the async stream. Use do-catch blocks around the async iteration to capture errors from the generation.

If there is already a running generation, a new request will return an empty stream that finishes immediately.

Cancellation of the generation

Generation will be stopped when the task that iterates the AsyncStream is canceled. We highly recommend that generation be started within a Task associated with a lifecycle-aware component so that the generation can be stopped if the component is destroyed. Here is an example:

task = Task {
  let userMessage = ChatMessage(role: .user, content: [.text("Hello")])
  for await response in conversation.generateResponse(message: userMessage) {
    switch response {
    case .chunk(let text):
      print("Chunk: \(text)")
    case .reasoningChunk(let text):
      print("Reasoning: \(text)")
    case .complete(let usage, let completeInfo):
      print("Generation complete")
      print("Usage: \(usage)")
      print("Finish reason: \(completeInfo.finishReason)")
      if let stats = completeInfo.stats {
        print("Stats: \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
      }
    }
  }
}

// Stop the generation by canceling the task
task.cancel()

history

The history property returns the current chat message history. This is a read-only property that provides access to all messages in the conversation. If there is an ongoing generation, the partial message may not be available in the history until the generation completes. However, it is guaranteed that when MessageResponse.complete is received, the history will be updated to include the latest message.
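As a sketch (userMessage is assumed to be an existing ChatMessage), this guarantee makes it safe to read the history from the .complete case:

for await response in conversation.generateResponse(message: userMessage) {
  if case .complete = response {
    // The assistant's reply is guaranteed to be in the history at this point
    if let lastMessage = conversation.history.last {
      print("History now ends with a message from: \(lastMessage.role)")
    }
  }
}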

Creation

Instances of this class are created directly using the initializer with a ModelRunner and initial message history.

Lifetime

While a Conversation stores the history and state that is needed by the model runner to generate content, its generation function relies on the model runner that creates it. As a result, if that model runner instance has been destroyed, the Conversation instance will fail to run subsequent generations.

ModelRunner

An instance of a model loaded in memory. This is returned by Leap.load(url:) and is used to create Conversation instances. The application needs to own the model runner object. If the model runner object is destroyed, ongoing generations may fail.

If you need your model runner to survive after view controllers are destroyed, you may need to manage it at the app level or in a service-like object.
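One way to do this is a simple app-level holder; the ModelProvider type below is an illustrative sketch, not part of the SDK:

import LeapSDK

// Illustrative app-level owner of the model runner (not part of the SDK)
@MainActor
final class ModelProvider {
  static let shared = ModelProvider()
  private(set) var modelRunner: ModelRunner?

  // Load once and keep the runner alive for the app's lifetime
  func loadIfNeeded(from url: URL) async throws {
    guard modelRunner == nil else { return }
    modelRunner = try await Leap.load(url: url)
  }
}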

Creating Conversations

Conversations are created directly using the Conversation initializer:

// Create a new conversation
let conversation = Conversation(modelRunner: modelRunner, history: [])

// Create a conversation with a system prompt
let systemMessage = ChatMessage(role: .system, content: [.text("You are a helpful assistant.")])
let conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])

The ModelRunner protocol is implemented internally by the SDK and should not be implemented by application code.

ChatMessage

Data structure that is compatible with the message object in OpenAI chat completion API.

public enum ChatMessageRole: String, Codable {
  case user = "user"
  case system = "system"
  case assistant = "assistant"
}

public struct ChatMessage: Codable {
  public var role: ChatMessageRole
  public var content: [ChatMessageContent]
  public var reasoningContent: String?

  public init(role: ChatMessageRole, content: [ChatMessageContent], reasoningContent: String? = nil)
}

Properties

  • role: The role of the message sender (user, system, or assistant)
  • content: An array of ChatMessageContent items (currently only text content is supported)
  • reasoningContent: Optional field for models that support chain-of-thought reasoning

The structure is compatible with OpenAI API message format.
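For illustration, constructing user and assistant messages with the documented initializer:

let userMessage = ChatMessage(
  role: .user,
  content: [.text("What is the capital of France?")]
)

let assistantMessage = ChatMessage(
  role: .assistant,
  content: [.text("The capital of France is Paris.")],
  reasoningContent: "The user asked a simple factual question."
)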

ChatMessageContent

Data structure that is compatible with the content object in OpenAI chat completion API. It is implemented as an enum.

public enum ChatMessageContent {
  case text(String)
}

Currently, only text content is supported. Future versions may add support for other content types like images or audio.
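Since there is currently a single case, extracting the text of a message is a small pattern match; a sketch, assuming message is a ChatMessage:

// Concatenate all text parts of a message
let text = message.content
  .compactMap { part -> String? in
    if case .text(let value) = part { return value }
    return nil
  }
  .joined()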

MessageResponse

The response generated from models. Generation may take a long time to finish, so generated text is sent out as “chunks”. When generation completes, a complete response object is sent out. This is an enum with the following cases:

public enum GenerationFinishReason {
  case stop
  case exceed_context
}

public struct GenerationStats {
  public var promptTokens: UInt64      // Tokens in the prompt
  public var completionTokens: UInt64  // Tokens in the completion
  public var totalTokens: UInt64       // Total tokens used
  public var tokenPerSecond: Float     // Average generation speed
}

public struct GenerationCompleteInfo {
  public let finishReason: GenerationFinishReason
  public let stats: GenerationStats?   // Optional generation statistics
}

public enum MessageResponse {
  case chunk(String)
  case reasoningChunk(String)
  case complete(String, GenerationCompleteInfo)
}
  • chunk(String): Contains a piece of generated text
  • reasoningChunk(String): Contains reasoning text for models that support chain-of-thought
  • complete(String, GenerationCompleteInfo): Indicates generation completion. The String contains usage information, and the GenerationCompleteInfo provides the finish reason and optional generation statistics including token counts and speed.

Error Handling

All errors are thrown as LeapError. Currently defined cases include:

public enum LeapError: Error {
  case loadModelFailure
  case modelLoadingFailure(String, Error?)
  case generationFailure(String, Error?)
  case serializationFailure(String, Error?)
}
  • loadModelFailure: Generic model loading failure
  • modelLoadingFailure: Model loading failure with error details
  • generationFailure: Generation failure with error details
  • serializationFailure: JSON serialization/deserialization failure
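
A sketch of handling these cases around model loading (modelURL is assumed to be a valid bundle URL, inside an async context):

do {
  let modelRunner = try await Leap.load(url: modelURL)
  // Use modelRunner...
} catch let error as LeapError {
  switch error {
  case .loadModelFailure:
    print("Model loading failed")
  case .modelLoadingFailure(let message, let underlying):
    print("Model loading failed: \(message) (\(String(describing: underlying)))")
  case .generationFailure(let message, _):
    print("Generation failed: \(message)")
  case .serializationFailure(let message, _):
    print("Serialization failed: \(message)")
  }
} catch {
  print("Unexpected error: \(error)")
}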

LeapModelDownloader

The model downloader module allows downloading model bundles on-demand directly in your app, rather than bundling them with your app. This is useful for apps that need to support multiple models or want to reduce initial app size.

The model downloader is currently intended for prototyping and development use cases. For production applications, we recommend bundling models directly with your app or using your own secure download infrastructure with proper authentication, error handling, and retry logic.

import LeapModelDownloader

public class LeapModelDownloader {
  // Download status tracking
  public enum ModelDownloadStatus: Equatable {
    case notOnLocal
    case downloadInProgress(progress: Double)
    case downloaded
  }

  public init(notificationConfig: LeapModelDownloaderNotificationConfig? = nil)

  // Get the local file URL for a model
  public func getModelFile(_ model: DownloadableModel) -> URL

  // Non-blocking download request
  public func requestDownloadModel(_ model: DownloadableModel, forceDownload: Bool = false)

  // Await download completion
  public func downloadModel(_ model: DownloadableModel, forceDownload: Bool = false) async -> Result<URL, Error>

  // Check download status
  public func queryStatus(_ model: DownloadableModel) async -> ModelDownloadStatus

  // Cancel a download
  public func requestStopDownload(_ model: DownloadableModel)

  // Remove a downloaded model
  public func removeModel(_ model: DownloadableModel) async throws

  // Utility methods
  public func getModelFileSize(_ model: DownloadableModel) -> Int64?
  public func getAvailableDiskSpace() -> Int64?
  public func requestNotificationPermissions() async -> Bool
}
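As a minimal sketch (assuming model is any DownloadableModel, inside an async context), awaiting a download looks like this:

let downloader = LeapModelDownloader()

switch await downloader.downloadModel(model) {
case .success(let fileURL):
  print("Model available at \(fileURL)")
case .failure(let error):
  print("Download failed: \(error)")
}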

DownloadableModel Protocol

Models available for download implement the DownloadableModel protocol:

public protocol DownloadableModel: Sendable {
  /// The remote URI where the model can be downloaded from
  var uri: URL { get }

  /// Human-readable name of the model
  var name: String { get }

  /// Local filename to use when storing the model on device
  var localFilename: String { get }
}
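If you host bundles on your own server, a custom conformance is a few lines; a sketch with a placeholder URL:

// Illustrative conformance for a self-hosted bundle (URL and names are placeholders)
struct SelfHostedModel: DownloadableModel {
  let uri = URL(string: "https://example.com/models/my-model.bundle")!
  let name = "My Model"
  let localFilename = "my-model.bundle"
}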

LeapDownloadableModel

The easiest way to download models from the Leap Model Library is to use LeapDownloadableModel, which automatically resolves download URLs from the Leap API:

public struct LeapDownloadableModel: DownloadableModel {
  /// Model slug on Leap (e.g., "qwen-0.6b")
  public let modelSlug: String

  /// Quantization slug (e.g., "qwen-0.6b-20250610-8da4w")
  public let quantizationSlug: String

  /// Resolve a model from the Leap API
  public static func resolve(
    modelSlug: String,
    quantizationSlug: String
  ) async -> LeapDownloadableModel?
}

Usage:

// Resolve a model from the Leap API
let model = await LeapDownloadableModel.resolve(
  modelSlug: "qwen-0.6b",
  quantizationSlug: "qwen-0.6b-20250610-8da4w"
)

if let model = model {
  let downloader = LeapModelDownloader()
  downloader.requestDownloadModel(model)
}

HuggingFaceDownloadableModel

For HuggingFace models, you can use the HuggingFaceDownloadableModel:

public struct HuggingFaceDownloadableModel: DownloadableModel {
  /// Owner name on HuggingFace (e.g., "LiquidAI")
  public let ownerName: String

  /// Repository name on HuggingFace (e.g., "LeapBundles")
  public let repoName: String

  /// Filename of the model in the repository
  public let filename: String

  public init(ownerName: String, repoName: String, filename: String)
}

Usage:

let hfModel = HuggingFaceDownloadableModel(
  ownerName: "LiquidAI",
  repoName: "LeapBundles",
  filename: "qwen3-0_6b_8da4w_4096.bundle"
)

let downloader = LeapModelDownloader()
downloader.requestDownloadModel(hfModel)

Example Usage

Here’s a complete example showing how to resolve, download, and load a model from the Leap API:

import LeapSDK
import LeapModelDownloader
import SwiftUI

@MainActor
class ModelDownloadManager: ObservableObject {
  private let downloader = LeapModelDownloader()

  @Published var downloadStatus: LeapModelDownloader.ModelDownloadStatus = .notOnLocal
  @Published var isResolvingModel = false

  func downloadAndLoadLeapModel(modelSlug: String, quantizationSlug: String) async {
    isResolvingModel = true

    // First, resolve the model from the Leap API
    guard let model = await LeapDownloadableModel.resolve(
      modelSlug: modelSlug,
      quantizationSlug: quantizationSlug
    ) else {
      print("Failed to resolve model: \(modelSlug) with quantization: \(quantizationSlug)")
      isResolvingModel = false
      return
    }
    isResolvingModel = false

    print("Resolved model: \(model.name)")
    print("Download URL: \(model.uri)")

    // Request notification permissions first
    _ = await downloader.requestNotificationPermissions()

    // Check the current status
    downloadStatus = await downloader.queryStatus(model)

    if case .notOnLocal = downloadStatus {
      // Download the model
      let result = await downloader.downloadModel(model)
      switch result {
      case .success(let fileURL):
        print("Model downloaded to: \(fileURL)")
        // Load the model with LeapSDK
        do {
          let modelRunner = try await Leap.load(url: fileURL)
          // Use modelRunner...
        } catch {
          print("Failed to load model: \(error)")
        }
      case .failure(let error):
        print("Download failed: \(error)")
      }
    }
  }

  func trackDownloadProgress(_ model: DownloadableModel) async {
    // Poll the download status until it leaves the in-progress state
    while true {
      let status = await downloader.queryStatus(model)
      downloadStatus = status
      guard case .downloadInProgress = status else { break }
      try? await Task.sleep(nanoseconds: 500_000_000) // 0.5 seconds
    }
  }
}

// Usage in a SwiftUI view
struct ContentView: View {
  @StateObject private var downloadManager = ModelDownloadManager()

  var body: some View {
    VStack {
      if downloadManager.isResolvingModel {
        ProgressView("Resolving model...")
      } else {
        Button("Download Qwen-0.6B Model") {
          Task {
            await downloadManager.downloadAndLoadLeapModel(
              modelSlug: "qwen-0.6b",
              quantizationSlug: "qwen-0.6b-20250610-8da4w"
            )
          }
        }
      }

      switch downloadManager.downloadStatus {
      case .notOnLocal:
        Text("Model not downloaded")
      case .downloadInProgress(let progress):
        ProgressView("Downloading...", value: progress, total: 1.0)
      case .downloaded:
        Text("Model ready!")
      }
    }
  }
}

GenerationOptions

Control generation parameters using GenerationOptions:

public struct GenerationOptions {
  public var temperature: Float?        // Sampling temperature (higher = more random)
  public var topP: Float?               // Nucleus sampling parameter
  public var minP: Float?               // Minimum probability threshold
  public var repetitionPenalty: Float?  // Repetition penalty (values above 1.0 reduce repetition)

  public init(
    temperature: Float? = nil,
    topP: Float? = nil,
    minP: Float? = nil,
    repetitionPenalty: Float? = nil
  )
}

Usage with Conversation

let options = GenerationOptions(
  temperature: 0.7,
  topP: 0.9,
  repetitionPenalty: 1.1
)

// Pass options to generateResponse
for await response in conversation.generateResponse(message: userMessage, options: options) {
  // Handle response...
}

Complete Example

Here’s a complete example showing how to use the iOS SDK with the latest APIs:

import LeapSDK
import SwiftUI

@MainActor
class ChatStore: ObservableObject {
  @Published var conversation: Conversation?
  @Published var modelRunner: ModelRunner?
  @Published var isModelLoading = true
  @Published var outputText = ""

  private var generationTask: Task<Void, Never>?

  func setupModel() async {
    guard modelRunner == nil else { return }
    isModelLoading = true
    do {
      guard let modelURL = Bundle.main.url(
        forResource: "qwen3-0_6b",
        withExtension: "bundle"
      ) else {
        print("❗️ Could not find model bundle")
        isModelLoading = false
        return
      }

      let modelRunner = try await Leap.load(url: modelURL)
      self.modelRunner = modelRunner

      // Create a conversation with an optional system message
      let systemMessage = ChatMessage(
        role: .system,
        content: [.text("You are a helpful assistant.")]
      )
      conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])

      print("✅ Model loaded successfully!")
    } catch {
      print("🚨 Failed to load model: \(error)")
    }
    isModelLoading = false
  }

  func sendMessage(_ text: String) {
    guard let conversation = conversation else { return }

    generationTask?.cancel()
    outputText = ""

    generationTask = Task {
      let userMessage = ChatMessage(role: .user, content: [.text(text)])

      // Optional: configure generation options
      let options = GenerationOptions(temperature: 0.7, topP: 0.9)

      for await response in conversation.generateResponse(message: userMessage, options: options) {
        if Task.isCancelled { break }

        switch response {
        case .chunk(let chunk):
          outputText += chunk
        case .reasoningChunk(let reasoning):
          // Handle reasoning if needed
          print("Reasoning: \(reasoning)")
        case .complete(let usage, let completeInfo):
          print("Complete. Usage: \(usage)")
          print("Finish reason: \(completeInfo.finishReason)")
          if let stats = completeInfo.stats {
            print("Generation stats: \(stats.totalTokens) tokens, \(stats.tokenPerSecond) tok/s")
          }
        }
      }
    }
  }

  func stopGeneration() {
    generationTask?.cancel()
  }
}