iOS API Spec
Leap
The entrypoint of the LEAP SDK for iOS. It provides static methods for model loading and doesn't hold any data.
public struct Leap {
public static func load(url: URL) async throws -> ModelRunner
}
load
This function loads a model from a local file URL. The url should point to a model bundle file. The app needs to hold the model runner object returned by this function until there is no need to interact with the model anymore. See ModelRunner for more details.
The function will throw LeapError.modelLoadingFailure if LEAP fails to load the model.
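For example, loading a bundle shipped in the app's resources might look like the following sketch. The resource name "my-model" is hypothetical; substitute the actual bundle file name.
// Must run in an async context; "my-model" is a hypothetical resource name.
guard let modelURL = Bundle.main.url(forResource: "my-model", withExtension: "bundle") else {
    fatalError("Model bundle not found in app resources")
}
do {
    let modelRunner = try await Leap.load(url: modelURL)
    // Keep a strong reference to modelRunner for as long as the model is needed.
    print("Model loaded: \(modelRunner)")
} catch {
    print("Failed to load model: \(error)")
}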
Conversation
A conversation instance, which stores the message history and state needed by the model runner for generation.
While this Conversation instance holds the data necessary for the model runner to perform generation, the app still needs to maintain the UI state of the message history representation.
public class Conversation {
public let modelRunner: ModelRunner
public private(set) var history: [ChatMessage]
public init(modelRunner: ModelRunner, history: [ChatMessage])
// Generating response from a plain text message
public func generateResponse(userTextMessage: String) -> AsyncStream<MessageResponse>
// Generating response from a chat message
public func generateResponse(message: ChatMessage) -> AsyncStream<MessageResponse>
}
generateResponse
This method adds the message to the conversation history, generates a response, and returns an AsyncStream&lt;MessageResponse&gt;. It can be called from the main thread.
The return value is a Swift AsyncStream. Generation starts immediately when the stream is created. Use for await loops or other async iteration patterns to consume the stream.
MessageResponse instances emitted from this stream contain chunks of data generated by the model.
Errors can be thrown within the async stream. Use do-catch blocks around the async iteration to capture errors from the generation.
If there is already a running generation, a new request will return an empty stream that finishes immediately.
Cancellation of the generation
Generation will be stopped when the task that iterates the AsyncStream is canceled. We highly recommend that generation be started within a Task associated with a lifecycle-aware component so that the generation can be stopped if the component is destroyed. Here is an example:
task = Task {
let userMessage = ChatMessage(role: .user, content: [.text("Hello")])
for await response in conversation.generateResponse(message: userMessage) {
switch response {
case .chunk(let text):
print("Chunk: \(text)")
case .reasoningChunk(let text):
print("Reasoning: \(text)")
case .complete(let usage, let reason):
print("Generation complete")
print("Usage: \(usage)")
print("Reason: \(reason)")
}
}
}
// Stop the generation by canceling the task
task.cancel()
history
The history property returns the current chat message history. This is a read-only property that provides access to all messages in the conversation. If there is an ongoing generation, the partial message may not be available in the history until the generation completes. However, it is guaranteed that when MessageResponse.complete is received, the history will be updated to include the latest message.
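For example, once MessageResponse.complete has been received, the full transcript, including the newly generated assistant message, can be read back:
// Print the conversation transcript after generation completes.
for message in conversation.history {
    print("\(message.role.rawValue): \(message.content)")
}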
Creation
Instances of this class are created directly using the initializer with a ModelRunner and an initial message history.
Lifetime
While a Conversation stores the history and state needed by the model runner to generate content, its generation function relies on the model runner that created it. As a result, if that model runner instance has been destroyed, the Conversation instance will fail to run subsequent generations.
ModelRunner
An instance of a model loaded in memory. This is returned by Leap.load(url:) and is used to create Conversation instances. The application needs to own the model runner object. If the model runner object is destroyed, ongoing generations may fail.
If you need your model runner to survive after view controllers are destroyed, you may need to manage it at the app level or in a service-like object.
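One way to do this, shown here as a sketch rather than a prescribed pattern, is a small app-level service that loads the model once and holds the only strong reference:
import LeapSDK

// Hypothetical app-level holder; the SDK does not require this shape.
@MainActor
final class ModelRunnerService {
    static let shared = ModelRunnerService()
    private(set) var modelRunner: ModelRunner?

    // Load the model once and reuse it across view controllers.
    func loadIfNeeded(url: URL) async throws -> ModelRunner {
        if let runner = modelRunner { return runner }
        let runner = try await Leap.load(url: url)
        modelRunner = runner
        return runner
    }
}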
Creating Conversations
Conversations are created directly using the Conversation initializer:
// Create a new conversation
let conversation = Conversation(modelRunner: modelRunner, history: [])
// Create a conversation with a system prompt
let systemMessage = ChatMessage(role: .system, content: [.text("You are a helpful assistant.")])
let promptedConversation = Conversation(modelRunner: modelRunner, history: [systemMessage])
The ModelRunner protocol is implemented internally by the SDK and should not be implemented by application code.
ChatMessage
Data structure that is compatible with the message object in the OpenAI chat completion API.
public enum ChatMessageRole: String, Codable {
case user = "user"
case system = "system"
case assistant = "assistant"
}
public struct ChatMessage: Codable {
public var role: ChatMessageRole
public var content: [ChatMessageContent]
public var reasoningContent: String?
public init(role: ChatMessageRole, content: [ChatMessageContent], reasoningContent: String? = nil)
}
Properties
role: The role of the message sender (user, system, or assistant)
content: An array of ChatMessageContent items (currently only text content is supported)
reasoningContent: Optional field for models that support chain-of-thought reasoning
The structure is compatible with OpenAI API message format.
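For example, a user message carrying plain text is constructed like this:
let userMessage = ChatMessage(role: .user, content: [.text("What can you do?")])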
ChatMessageContent
Data structure that is compatible with the content object in the OpenAI chat completion API. It is implemented as an enum.
public enum ChatMessageContent {
case text(String)
}
Currently, only text content is supported. Future versions may add support for other content types like images or audio.
MessageResponse
The response generated from models. Generation may take a long time to finish, so generated text is sent out as “chunks”. When generation completes, a complete response object is sent out. This is an enum with the following cases:
public enum GenerationFinishReason {
case stop
case exceed_context
}
public enum MessageResponse {
case chunk(String)
case reasoningChunk(String)
case complete(String, GenerationFinishReason)
}
chunk(String): Contains a piece of generated text
reasoningChunk(String): Contains reasoning text for models that support chain-of-thought
complete(String, GenerationFinishReason): Indicates generation completion. The String contains usage information, and the reason indicates why generation finished
Error Handling
All errors are thrown as LeapError. Currently defined cases include:
public enum LeapError: Error {
case loadModelFailure
case modelLoadingFailure(String, Error?)
case generationFailure(String, Error?)
case serializationFailure(String, Error?)
}
loadModelFailure: Generic model loading failure
modelLoadingFailure: Model loading failure with error details
generationFailure: Generation failure with error details
serializationFailure: JSON serialization/deserialization failure
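For example, a load failure can be inspected as sketched below, assuming a modelURL like the one from the loading example above:
do {
    let modelRunner = try await Leap.load(url: modelURL)
    print("Loaded: \(modelRunner)")
} catch let error as LeapError {
    switch error {
    case .modelLoadingFailure(let message, let underlying):
        print("Model loading failed: \(message) (\(String(describing: underlying)))")
    default:
        print("Other LEAP error: \(error)")
    }
} catch {
    print("Unexpected error: \(error)")
}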
Complete Example
Here’s a complete example showing how to use the iOS SDK:
import LeapSDK
import SwiftUI
@MainActor
class ChatStore: ObservableObject {
@Published var conversation: Conversation?
@Published var modelRunner: ModelRunner?
@Published var isModelLoading = true
@Published var outputText = ""
private var generationTask: Task<Void, Never>?
func setupModel() async {
guard modelRunner == nil else { return }
isModelLoading = true
do {
guard let modelURL = Bundle.main.url(
forResource: "qwen3-0_6b",
withExtension: "bundle"
) else {
print("❗️ Could not find model bundle")
isModelLoading = false
return
}
let modelRunner = try await Leap.load(url: modelURL)
self.modelRunner = modelRunner
// Create conversation with optional system message
let systemMessage = ChatMessage(
role: .system,
content: [.text("You are a helpful assistant.")]
)
conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])
print("✅ Model loaded successfully!")
} catch {
print("🚨 Failed to load model: \(error)")
}
isModelLoading = false
}
func sendMessage(_ text: String) {
guard let conversation = conversation else { return }
generationTask?.cancel()
outputText = ""
generationTask = Task {
let userMessage = ChatMessage(role: .user, content: [.text(text)])
for await response in conversation.generateResponse(message: userMessage) {
if Task.isCancelled { break }
switch response {
case .chunk(let chunk):
outputText += chunk
case .reasoningChunk(let reasoning):
// Handle reasoning if needed
print("Reasoning: \(reasoning)")
case .complete(let usage, let reason):
print("Complete. Usage: \(usage)")
print("Finish reason: \(reason)")
}
}
}
}
func stopGeneration() {
generationTask?.cancel()
}
}