iOS API Spec
Leap
The entrypoint of the LEAP SDK for iOS. It provides static methods for model loading and doesn't hold any data.
public struct Leap {
public static func load(url: URL) async throws -> ModelRunner
}
load
This function loads a model from a local file URL. The url should point to a model bundle file. The app needs to hold the model runner object returned by this function until there is no need to interact with the model anymore. See ModelRunner for more details.
The function will throw LeapError.modelLoadingFailure if LEAP fails to load the model.
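For example, loading a bundle shipped in the app's resources might look like the following sketch. The resource name "my-model" is hypothetical; substitute the actual bundle file name.
// Must run in an async context; "my-model" is a hypothetical resource name.
guard let modelURL = Bundle.main.url(forResource: "my-model", withExtension: "bundle") else {
    fatalError("Model bundle not found in app resources")
}
do {
    let modelRunner = try await Leap.load(url: modelURL)
    // Keep a strong reference to modelRunner for as long as the model is needed.
    print("Model loaded: \(modelRunner)")
} catch {
    print("Failed to load model: \(error)")
}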
Conversation
A conversation instance, which stores the message history and state needed by the model runner for generation.
While this Conversation instance holds the data necessary for the model runner to perform generation, the app still needs to maintain the UI state of the message history representation.
public class Conversation {
public let modelRunner: ModelRunner
public private(set) var history: [ChatMessage]
public init(modelRunner: ModelRunner, history: [ChatMessage])
// Generating response from a plain text message
public func generateResponse(userTextMessage: String) -> AsyncStream<MessageResponse>
// Generating response from a chat message
public func generateResponse(message: ChatMessage) -> AsyncStream<MessageResponse>
}
generateResponse
This method adds the message to the conversation history, generates a response, and returns an AsyncStream&lt;MessageResponse&gt;. It can be called from the main thread.
The return value is a Swift AsyncStream. Generation starts immediately when the stream is created. Use for await loops or other async iteration patterns to consume the stream.
MessageResponse instances emitted from this stream contain chunks of data generated by the model.
Errors can be thrown within the async stream. Use do-catch blocks around the async iteration to capture errors from the generation.
If there is already a running generation, a new request will return an empty stream that finishes immediately.
Cancellation of the generation
Generation will be stopped when the task that iterates the AsyncStream is canceled. We highly recommend that generation be started within a Task associated with a lifecycle-aware component so that the generation can be stopped if the component is destroyed. Here is an example:
task = Task {
let userMessage = ChatMessage(role: .user, content: [.text("Hello")])
for await response in conversation.generateResponse(message: userMessage) {
switch response {
case .chunk(let text):
print("Chunk: \(text)")
case .reasoningChunk(let text):
print("Reasoning: \(text)")
case .complete(let usage, let reason):
print("Generation complete")
print("Usage: \(usage)")
print("Reason: \(reason)")
}
}
}
// Stop the generation by canceling the task
task.cancel()
history
The history property returns the current chat message history. This is a read-only property that provides access to all messages in the conversation. If there is an ongoing generation, the partial message may not be available in the history until the generation completes. However, it is guaranteed that when MessageResponse.complete is received, the history will be updated to include the latest message.
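For example, once MessageResponse.complete has been received, the full transcript, including the newly generated assistant message, can be read back:
// Print the conversation transcript after generation completes.
for message in conversation.history {
    print("\(message.role.rawValue): \(message.content)")
}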
Creation
Instances of this class are created directly using the initializer with a ModelRunner and an initial message history.
Lifetime
While a Conversation stores the history and state needed by the model runner to generate content, its generation function relies on the model runner that created it. As a result, if that model runner instance has been destroyed, the Conversation instance will fail to run subsequent generations.
ModelRunner
An instance of a model loaded in memory. This is returned by Leap.load(url:) and is used to create Conversation instances. The application needs to own the model runner object. If the model runner object is destroyed, ongoing generations may fail.
If you need your model runner to survive after view controllers are destroyed, you may need to manage it at the app level or in a service-like object.
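One way to do this, shown here as a sketch rather than a prescribed pattern, is a small app-level service that loads the model once and holds the only strong reference:
import LeapSDK

// Hypothetical app-level holder; the SDK does not require this shape.
@MainActor
final class ModelRunnerService {
    static let shared = ModelRunnerService()
    private(set) var modelRunner: ModelRunner?

    // Load the model once and reuse it across view controllers.
    func loadIfNeeded(url: URL) async throws -> ModelRunner {
        if let runner = modelRunner { return runner }
        let runner = try await Leap.load(url: url)
        modelRunner = runner
        return runner
    }
}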
Creating Conversations
Conversations are created directly using the Conversation initializer:
// Create a new conversation
let conversation = Conversation(modelRunner: modelRunner, history: [])
// Create a conversation with a system prompt
let systemMessage = ChatMessage(role: .system, content: [.text("You are a helpful assistant.")])
let promptedConversation = Conversation(modelRunner: modelRunner, history: [systemMessage])
The ModelRunner protocol is implemented internally by the SDK and should not be implemented by application code.
ChatMessage
Data structure that is compatible with the message object in the OpenAI chat completion API.
public enum ChatMessageRole: String, Codable {
case user = "user"
case system = "system"
case assistant = "assistant"
}
public struct ChatMessage: Codable {
public var role: ChatMessageRole
public var content: [ChatMessageContent]
public var reasoningContent: String?
public init(role: ChatMessageRole, content: [ChatMessageContent], reasoningContent: String? = nil)
}
Properties
role: The role of the message sender (user, system, or assistant)
content: An array of ChatMessageContent items (currently only text content is supported)
reasoningContent: Optional field for models that support chain-of-thought reasoning
The structure is compatible with OpenAI API message format.
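For example, a user message carrying plain text is constructed like this:
let userMessage = ChatMessage(role: .user, content: [.text("What can you do?")])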
ChatMessageContent
Data structure that is compatible with the content object in the OpenAI chat completion API. It is implemented as an enum.
public enum ChatMessageContent {
case text(String)
}
Currently, only text content is supported. Future versions may add support for other content types like images or audio.
MessageResponse
The response generated from models. Generation may take a long time to finish, so generated text is sent out as “chunks”. When generation completes, a complete response object is sent out. This is an enum with the following cases:
public enum GenerationFinishReason {
case stop
case exceed_context
}
public enum MessageResponse {
case chunk(String)
case reasoningChunk(String)
case complete(String, GenerationFinishReason)
}
chunk(String): Contains a piece of generated text
reasoningChunk(String): Contains reasoning text for models that support chain-of-thought
complete(String, GenerationFinishReason): Indicates generation completion. The String contains usage information, and the reason indicates why generation finished
Error Handling
All errors are thrown as LeapError. Currently defined cases include:
public enum LeapError: Error {
case loadModelFailure
case modelLoadingFailure(String, Error?)
case generationFailure(String, Error?)
case serializationFailure(String, Error?)
}
loadModelFailure: Generic model loading failure
modelLoadingFailure: Model loading failure with error details
generationFailure: Generation failure with error details
serializationFailure: JSON serialization/deserialization failure
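For example, a load failure can be inspected as sketched below, assuming a modelURL like the one from the loading example above:
do {
    let modelRunner = try await Leap.load(url: modelURL)
    print("Loaded: \(modelRunner)")
} catch let error as LeapError {
    switch error {
    case .modelLoadingFailure(let message, let underlying):
        print("Model loading failed: \(message) (\(String(describing: underlying)))")
    default:
        print("Other LEAP error: \(error)")
    }
} catch {
    print("Unexpected error: \(error)")
}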
Complete Example
Here’s a complete example showing how to use the iOS SDK:
import LeapSDK
import SwiftUI
@MainActor
class ChatStore: ObservableObject {
@Published var conversation: Conversation?
@Published var modelRunner: ModelRunner?
@Published var isModelLoading = true
@Published var outputText = ""
private var generationTask: Task<Void, Never>?
func setupModel() async {
guard modelRunner == nil else { return }
isModelLoading = true
do {
guard let modelURL = Bundle.main.url(
forResource: "qwen3-0_6b",
withExtension: "bundle"
) else {
print("❗️ Could not find model bundle")
isModelLoading = false
return
}
let modelRunner = try await Leap.load(url: modelURL)
self.modelRunner = modelRunner
// Create conversation with optional system message
let systemMessage = ChatMessage(
role: .system,
content: [.text("You are a helpful assistant.")]
)
conversation = Conversation(modelRunner: modelRunner, history: [systemMessage])
print("✅ Model loaded successfully!")
} catch {
print("🚨 Failed to load model: \(error)")
}
isModelLoading = false
}
func sendMessage(_ text: String) {
guard let conversation = conversation else { return }
generationTask?.cancel()
outputText = ""
generationTask = Task {
let userMessage = ChatMessage(role: .user, content: [.text(text)])
for await response in conversation.generateResponse(message: userMessage) {
if Task.isCancelled { break }
switch response {
case .chunk(let chunk):
outputText += chunk
case .reasoningChunk(let reasoning):
// Handle reasoning if needed
print("Reasoning: \(reasoning)")
case .complete(let usage, let reason):
print("Complete. Usage: \(usage)")
print("Finish reason: \(reason)")
}
}
}
}
func stopGeneration() {
generationTask?.cancel()
}
}