
iOS API Spec

Leap

The entry point of the LEAP SDK for iOS. It provides static methods for model loading and holds no data.

public struct Leap {
  public static func load(url: URL) async throws -> ModelRunner
}

load

This function loads a model from a local file URL. The url must point to a model bundle file. The app must retain the model runner returned by this function for as long as it needs to interact with the model. See ModelRunner for more details.

The function will throw LeapError.modelLoadingFailure if LEAP fails to load the model.
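As a sketch (the "my-model" resource name is a placeholder), loading a bundled model inside an async context might look like this:

// Inside an async context; "my-model" is a placeholder resource name
guard let modelURL = Bundle.main.url(forResource: "my-model", withExtension: "bundle") else {
  fatalError("Model bundle not found in the app bundle")
}

do {
  let modelRunner = try await Leap.load(url: modelURL)
  // Retain modelRunner for as long as the model is needed
} catch {
  print("Model loading failed: \(error)")
}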

Conversation

An instance of a conversation, which stores the message history and the state the model runner needs for generation.

While this Conversation instance holds the data necessary for the model runner to perform generation, the app still needs to maintain the UI state of the message history representation.

public class Conversation {
  public let modelRunner: ModelRunner
  public private(set) var history: [ChatMessage]

  public init(modelRunner: ModelRunner, history: [ChatMessage])

  // Generate a response from a plain text message
  public func generateResponse(userTextMessage: String) -> AsyncStream<MessageResponse>

  // Generate a response from a chat message
  public func generateResponse(message: ChatMessage) -> AsyncStream<MessageResponse>
}

generateResponse

This method adds the message to the conversation history, generates a response, and returns an AsyncStream<MessageResponse>. It can be called from the main thread.

The return value is a Swift AsyncStream. The generation will start immediately when the stream is created. Use for await loops or other async iteration patterns to consume the stream.

MessageResponse instances will be emitted from this stream, which contain chunks of data generated from the model.

Errors can be thrown within the async stream. Use do-catch blocks around the async iteration to capture errors from the generation.

If there is already a running generation, a new request will return an empty stream that finishes immediately.

Cancellation of the generation

Generation will be stopped when the task that iterates the AsyncStream is canceled. We highly recommend that generation be started within a Task associated with a lifecycle-aware component so that the generation can be stopped if the component is destroyed. Here is an example:

task = Task {
  let userMessage = ChatMessage(role: .user, content: [.text("Hello")])
  for await response in conversation.generateResponse(message: userMessage) {
    switch response {
    case .chunk(let text):
      print("Chunk: \(text)")
    case .reasoningChunk(let text):
      print("Reasoning: \(text)")
    case .complete(let usage, let completeInfo):
      print("Generation complete")
      print("Usage: \(usage)")
      print("Finish reason: \(completeInfo.finishReason)")
      if let stats = completeInfo.stats {
        print("Stats: \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
      }
    }
  }
}

// Stop the generation by canceling the task
task.cancel()

history

The history property returns the current chat message history. This is a read-only property that provides access to all messages in the conversation. If there is an ongoing generation, the partial message may not be available in the history until the generation completes. However, it is guaranteed that when MessageResponse.complete is received, the history will be updated to include the latest message.
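As a sketch (userMessage is assumed to be an existing ChatMessage), this guarantee makes it safe to read the history from the .complete case:

for await response in conversation.generateResponse(message: userMessage) {
  if case .complete = response {
    // The assistant's reply is guaranteed to be in the history at this point
    if let lastMessage = conversation.history.last {
      print("History now ends with a message from: \(lastMessage.role)")
    }
  }
}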

Creation

Instances of this class are created directly using the initializer with a ModelRunner and initial message history.

Lifetime

While a Conversation stores the history and state that is needed by the model runner to generate content, its generation function relies on the model runner that creates it. As a result, if that model runner instance has been destroyed, the Conversation instance will fail to run subsequent generations.

ModelRunner

An instance of a model loaded in memory. This is returned by Leap.load(url:) and is used to create Conversation instances. The application needs to own the model runner object. If the model runner object is destroyed, ongoing generations may fail.

If you need your model runner to survive after view controllers are destroyed, you may need to manage it at the app level or in a service-like object.
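One way to do this is a simple app-level holder; the ModelProvider type below is an illustrative sketch, not part of the SDK:

import LeapSDK

// Illustrative app-level owner of the model runner (not part of the SDK)
@MainActor
final class ModelProvider {
  static let shared = ModelProvider()
  private(set) var modelRunner: ModelRunner?

  // Load once and keep the runner alive for the app's lifetime
  func loadIfNeeded(from url: URL) async throws {
    guard modelRunner == nil else { return }
    modelRunner = try await Leap.load(url: url)
  }
}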

Creating Conversations

Conversations are created directly using the Conversation initializer:

// Create a new conversation
let conversation = Conversation(modelRunner: modelRunner, history: [])

// Create a conversation with a system prompt
let systemMessage = ChatMessage(role: .system, content: [.text("You are a helpful assistant.")])
let conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])

The ModelRunner protocol is implemented internally by the SDK and should not be implemented by application code.

ChatMessage

Data structure that is compatible with the message object in OpenAI chat completion API.

public enum ChatMessageRole: String, Codable {
  case user = "user"
  case system = "system"
  case assistant = "assistant"
}

public struct ChatMessage: Codable {
  public var role: ChatMessageRole
  public var content: [ChatMessageContent]
  public var reasoningContent: String?

  public init(role: ChatMessageRole, content: [ChatMessageContent], reasoningContent: String? = nil)
}

Properties

  • role: The role of the message sender (user, system, or assistant)
  • content: An array of ChatMessageContent items (currently only text content is supported)
  • reasoningContent: Optional field for models that support chain-of-thought reasoning

The structure is compatible with OpenAI API message format.
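For illustration, constructing user and assistant messages with the documented initializer:

let userMessage = ChatMessage(
  role: .user,
  content: [.text("What is the capital of France?")]
)

let assistantMessage = ChatMessage(
  role: .assistant,
  content: [.text("The capital of France is Paris.")],
  reasoningContent: "The user asked a simple factual question."
)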

ChatMessageContent

Data structure that is compatible with the content object in OpenAI chat completion API. It is implemented as an enum.

public enum ChatMessageContent {
  case text(String)
}

Currently, only text content is supported. Future versions may add support for other content types like images or audio.
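Since there is currently a single case, extracting the text of a message is a small pattern match; a sketch, assuming message is a ChatMessage:

// Concatenate all text parts of a message
let text = message.content
  .compactMap { part -> String? in
    if case .text(let value) = part { return value }
    return nil
  }
  .joined()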

MessageResponse

The response generated from models. Generation may take a long time to finish, so generated text is sent out as “chunks”. When generation completes, a complete response object is sent out. This is an enum with the following cases:

public enum GenerationFinishReason {
  case stop
  case exceed_context
}

public struct GenerationStats {
  public var promptTokens: UInt64      // Tokens in the prompt
  public var completionTokens: UInt64  // Tokens in the completion
  public var totalTokens: UInt64       // Total tokens used
  public var tokenPerSecond: Float     // Average generation speed
}

public struct GenerationCompleteInfo {
  public let finishReason: GenerationFinishReason
  public let stats: GenerationStats?   // Optional generation statistics
}

public enum MessageResponse {
  case chunk(String)
  case reasoningChunk(String)
  case complete(String, GenerationCompleteInfo)
}
  • chunk(String): Contains a piece of generated text
  • reasoningChunk(String): Contains reasoning text for models that support chain-of-thought
  • complete(String, GenerationCompleteInfo): Indicates generation completion. The String contains usage information, and the GenerationCompleteInfo provides the finish reason and optional generation statistics including token counts and speed.

Error Handling

All errors are thrown as LeapError. Currently defined cases include:

public enum LeapError: Error {
  case loadModelFailure
  case modelLoadingFailure(String, Error?)
  case generationFailure(String, Error?)
  case serializationFailure(String, Error?)
}
  • loadModelFailure: Generic model loading failure
  • modelLoadingFailure: Model loading failure with error details
  • generationFailure: Generation failure with error details
  • serializationFailure: JSON serialization/deserialization failure
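
A sketch of handling these cases around model loading (modelURL is assumed to be a valid bundle URL, inside an async context):

do {
  let modelRunner = try await Leap.load(url: modelURL)
  // Use modelRunner...
} catch let error as LeapError {
  switch error {
  case .loadModelFailure:
    print("Model loading failed")
  case .modelLoadingFailure(let message, let underlying):
    print("Model loading failed: \(message) (\(String(describing: underlying)))")
  case .generationFailure(let message, _):
    print("Generation failed: \(message)")
  case .serializationFailure(let message, _):
    print("Serialization failed: \(message)")
  }
} catch {
  print("Unexpected error: \(error)")
}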

LeapModelDownloader

The model downloader module allows downloading model bundles on-demand directly in your app, rather than bundling them with your app. This is useful for apps that need to support multiple models or want to reduce initial app size.

The model downloader is currently intended for prototyping and development use cases. For production applications, we recommend bundling models directly with your app or using your own secure download infrastructure with proper authentication, error handling, and retry logic.

import LeapModelDownloader

public class LeapModelDownloader {
  // Download status tracking
  public enum ModelDownloadStatus: Equatable {
    case notOnLocal
    case downloadInProgress(progress: Double)
    case downloaded
  }

  public init(notificationConfig: LeapModelDownloaderNotificationConfig? = nil)

  // Get the local file URL for a model
  public func getModelFile(_ model: DownloadableModel) -> URL

  // Non-blocking download request
  public func requestDownloadModel(_ model: DownloadableModel, forceDownload: Bool = false)

  // Await download completion
  public func downloadModel(_ model: DownloadableModel, forceDownload: Bool = false) async -> Result<URL, Error>

  // Check download status
  public func queryStatus(_ model: DownloadableModel) async -> ModelDownloadStatus

  // Cancel a download
  public func requestStopDownload(_ model: DownloadableModel)

  // Remove a downloaded model
  public func removeModel(_ model: DownloadableModel) async throws

  // Utility methods
  public func getModelFileSize(_ model: DownloadableModel) -> Int64?
  public func getAvailableDiskSpace() -> Int64?
  public func requestNotificationPermissions() async -> Bool
}
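As a minimal sketch (assuming model is any DownloadableModel, inside an async context), awaiting a download looks like this:

let downloader = LeapModelDownloader()

switch await downloader.downloadModel(model) {
case .success(let fileURL):
  print("Model available at \(fileURL)")
case .failure(let error):
  print("Download failed: \(error)")
}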

DownloadableModel Protocol

Models available for download implement the DownloadableModel protocol:

public protocol DownloadableModel: Sendable {
  /// The remote URI where the model can be downloaded from
  var uri: URL { get }

  /// Human-readable name of the model
  var name: String { get }

  /// Local filename to use when storing the model on device
  var localFilename: String { get }
}
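If you host bundles on your own server, a custom conformance is a few lines; a sketch with a placeholder URL:

// Illustrative conformance for a self-hosted bundle (URL and names are placeholders)
struct SelfHostedModel: DownloadableModel {
  let uri = URL(string: "https://example.com/models/my-model.bundle")!
  let name = "My Model"
  let localFilename = "my-model.bundle"
}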

LeapDownloadableModel

The easiest way to download models from the Leap Model Library is to use LeapDownloadableModel, which automatically resolves download URLs from the Leap API:

public struct LeapDownloadableModel: DownloadableModel {
  /// Model slug on Leap (e.g., "qwen-0.6b")
  public let modelSlug: String

  /// Quantization slug (e.g., "qwen-0.6b-20250610-8da4w")
  public let quantizationSlug: String

  /// Resolve a model from the Leap API
  public static func resolve(
    modelSlug: String,
    quantizationSlug: String
  ) async -> LeapDownloadableModel?
}

Usage:

// Resolve a model from the Leap API
let model = await LeapDownloadableModel.resolve(
  modelSlug: "qwen-0.6b",
  quantizationSlug: "qwen-0.6b-20250610-8da4w"
)

if let model = model {
  let downloader = LeapModelDownloader()
  downloader.requestDownloadModel(model)
}

HuggingFaceDownloadableModel

For HuggingFace models, you can use the HuggingFaceDownloadableModel:

public struct HuggingFaceDownloadableModel: DownloadableModel {
  /// Owner name on HuggingFace (e.g., "LiquidAI")
  public let ownerName: String

  /// Repository name on HuggingFace (e.g., "LeapBundles")
  public let repoName: String

  /// Filename of the model in the repository
  public let filename: String

  public init(ownerName: String, repoName: String, filename: String)
}

Usage:

let hfModel = HuggingFaceDownloadableModel(
  ownerName: "LiquidAI",
  repoName: "LeapBundles",
  filename: "qwen3-0_6b_8da4w_4096.bundle"
)

let downloader = LeapModelDownloader()
downloader.requestDownloadModel(hfModel)

Example Usage

Here’s a complete example showing how to resolve, download, and load a model from the Leap API:

import LeapSDK
import LeapModelDownloader
import SwiftUI

@MainActor
class ModelDownloadManager: ObservableObject {
  private let downloader = LeapModelDownloader()

  @Published var downloadStatus: LeapModelDownloader.ModelDownloadStatus = .notOnLocal
  @Published var isResolvingModel = false

  func downloadAndLoadLeapModel(modelSlug: String, quantizationSlug: String) async {
    isResolvingModel = true

    // First, resolve the model from the Leap API
    guard let model = await LeapDownloadableModel.resolve(
      modelSlug: modelSlug,
      quantizationSlug: quantizationSlug
    ) else {
      print("Failed to resolve model: \(modelSlug) with quantization: \(quantizationSlug)")
      isResolvingModel = false
      return
    }
    isResolvingModel = false

    print("Resolved model: \(model.name)")
    print("Download URL: \(model.uri)")

    // Request notification permissions first
    _ = await downloader.requestNotificationPermissions()

    // Check the current status
    downloadStatus = await downloader.queryStatus(model)

    if case .notOnLocal = downloadStatus {
      // Download the model
      let result = await downloader.downloadModel(model)
      switch result {
      case .success(let fileURL):
        print("Model downloaded to: \(fileURL)")
        // Load the model with LeapSDK
        do {
          let modelRunner = try await Leap.load(url: fileURL)
          // Use modelRunner...
        } catch {
          print("Failed to load model: \(error)")
        }
      case .failure(let error):
        print("Download failed: \(error)")
      }
    }
  }

  func trackDownloadProgress(_ model: DownloadableModel) async {
    // Poll the download status until it leaves the in-progress state
    while true {
      let status = await downloader.queryStatus(model)
      downloadStatus = status
      guard case .downloadInProgress = status else { break }
      try? await Task.sleep(nanoseconds: 500_000_000) // 0.5 seconds
    }
  }
}

// Usage in a SwiftUI view
struct ContentView: View {
  @StateObject private var downloadManager = ModelDownloadManager()

  var body: some View {
    VStack {
      if downloadManager.isResolvingModel {
        ProgressView("Resolving model...")
      } else {
        Button("Download Qwen-0.6B Model") {
          Task {
            await downloadManager.downloadAndLoadLeapModel(
              modelSlug: "qwen-0.6b",
              quantizationSlug: "qwen-0.6b-20250610-8da4w"
            )
          }
        }
      }

      switch downloadManager.downloadStatus {
      case .notOnLocal:
        Text("Model not downloaded")
      case .downloadInProgress(let progress):
        ProgressView("Downloading...", value: progress, total: 1.0)
      case .downloaded:
        Text("Model ready!")
      }
    }
  }
}

GenerationOptions

Control generation parameters using GenerationOptions:

public struct GenerationOptions {
  public var temperature: Float?        // Sampling temperature (higher = more random)
  public var topP: Float?               // Nucleus sampling parameter
  public var minP: Float?               // Minimum probability threshold
  public var repetitionPenalty: Float?  // Repetition penalty (values above 1.0 reduce repetition)

  public init(
    temperature: Float? = nil,
    topP: Float? = nil,
    minP: Float? = nil,
    repetitionPenalty: Float? = nil
  )
}

Usage with Conversation

let options = GenerationOptions(
  temperature: 0.7,
  topP: 0.9,
  repetitionPenalty: 1.1
)

// Pass options to generateResponse
for await response in conversation.generateResponse(message: userMessage, options: options) {
  // Handle response...
}

Complete Example

Here’s a complete example showing how to use the iOS SDK with the latest APIs:

import LeapSDK
import SwiftUI

@MainActor
class ChatStore: ObservableObject {
  @Published var conversation: Conversation?
  @Published var modelRunner: ModelRunner?
  @Published var isModelLoading = true
  @Published var outputText = ""

  private var generationTask: Task<Void, Never>?

  func setupModel() async {
    guard modelRunner == nil else { return }
    isModelLoading = true
    do {
      guard let modelURL = Bundle.main.url(
        forResource: "qwen3-0_6b",
        withExtension: "bundle"
      ) else {
        print("❗️ Could not find model bundle")
        isModelLoading = false
        return
      }

      let modelRunner = try await Leap.load(url: modelURL)
      self.modelRunner = modelRunner

      // Create a conversation with an optional system message
      let systemMessage = ChatMessage(
        role: .system,
        content: [.text("You are a helpful assistant.")]
      )
      conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])

      print("✅ Model loaded successfully!")
    } catch {
      print("🚨 Failed to load model: \(error)")
    }
    isModelLoading = false
  }

  func sendMessage(_ text: String) {
    guard let conversation = conversation else { return }

    generationTask?.cancel()
    outputText = ""

    generationTask = Task {
      let userMessage = ChatMessage(role: .user, content: [.text(text)])

      // Optional: configure generation options
      let options = GenerationOptions(temperature: 0.7, topP: 0.9)

      for await response in conversation.generateResponse(message: userMessage, options: options) {
        if Task.isCancelled { break }

        switch response {
        case .chunk(let chunk):
          outputText += chunk
        case .reasoningChunk(let reasoning):
          // Handle reasoning if needed
          print("Reasoning: \(reasoning)")
        case .complete(let usage, let completeInfo):
          print("Complete. Usage: \(usage)")
          print("Finish reason: \(completeInfo.finishReason)")
          if let stats = completeInfo.stats {
            print("Generation stats: \(stats.totalTokens) tokens, \(stats.tokenPerSecond) tok/s")
          }
        }
      }
    }
  }

  func stopGeneration() {
    generationTask?.cancel()
  }
}