
Comparison with Cloud AI API

If you are familiar with cloud-based AI APIs (e.g. the OpenAI API), this document shows the similarities and differences between those cloud APIs and Leap.

We will walk through the following Python-based OpenAI API chat completion request and show how to migrate it to LeapSDK. The example code is adapted from the OpenAI API documentation.

from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)
print("")
print("Generation done!")

Loading the model

While cloud-based APIs let you use a model as soon as the API client is created, LeapSDK requires you to explicitly load the model before requesting generation. This is necessary because the model runs locally. Loading generally takes a few seconds, depending on the model size and the device performance.

With the cloud API, you create an API client:

client = OpenAI()

In LeapSDK, you load the model to create a model runner:

let modelRunner = try await Leap.load(url: modelURL)

The parameter is the URL of the model bundle file; this URL must be local. The return value is a “model runner”, which plays a role similar to the client object in the cloud API, except that it carries the model weights. If the model runner is released, the app has to reload the model before requesting new generations.
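
Since reloading is expensive, a common pattern is to keep the model runner alive for as long as the app needs it. Here is a minimal sketch, assuming the runner type is named ModelRunner (inferred from the modelRunner: parameter label used later in this document) and that the module is imported as LeapSDK; the ModelProvider class is our own, not part of the SDK:

import LeapSDK

// A hypothetical holder class (not part of LeapSDK) that keeps the
// model runner alive so the weights stay loaded between requests.
final class ModelProvider {
    // Assumed type name, inferred from the `modelRunner:` label below.
    private var modelRunner: ModelRunner?

    func runner(for modelURL: URL) async throws -> ModelRunner {
        // Reuse the already-loaded runner when possible; reloading
        // would pay the multi-second startup cost again.
        if let modelRunner {
            return modelRunner
        }
        let loaded = try await Leap.load(url: modelURL)
        modelRunner = loaded
        return loaded
    }
}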

Requesting generation

In the cloud API, client.chat.completions.create returns a stream object from which the caller fetches the generated content.

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

In LeapSDK for iOS, we call generateResponse on a conversation object to obtain a Swift AsyncStream (the counterpart of the Python stream) for generation. Since the model runner already carries all information about the model, we don’t need to specify the model name in the call.

let conversation = Conversation(modelRunner: modelRunner, history: [])
let stream = conversation.generateResponse(
    message: ChatMessage(
        role: .user,
        content: [.text("Say 'double bubble bath' ten times fast.")]
    )
)

// This simplified call has exactly the same effect as the call above:
// let stream = conversation.generateResponse(userTextMessage: "Say 'double bubble bath' ten times fast.")
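
Unlike the cloud call, which takes the whole messages array on every request, the conversation object carries the chat history. The following is a sketch of a multi-turn setup, assuming the history parameter accepts earlier ChatMessage values in chronological order and that an .assistant role exists alongside the .user role shown above (the .assistant role is our assumption, not confirmed by this document):

// Hypothetical multi-turn setup; only the .user role appears in the
// snippets above, so .assistant is an assumption here.
let history = [
    ChatMessage(
        role: .user,
        content: [.text("Say 'double bubble bath' ten times fast.")]
    ),
    ChatMessage(
        role: .assistant,
        content: [.text("double bubble bath, double bubble bath, ...")]
    )
]
let followUp = Conversation(modelRunner: modelRunner, history: history)
let followUpStream = followUp.generateResponse(userTextMessage: "Now say it five times.")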

Processing generated content

In the cloud API Python code, a for loop over the stream object retrieves the content.

for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)
print("")
print("Generation done!")

In LeapSDK, we use a for await loop over the Swift AsyncStream to process the content. When generation finishes, a MessageResponse.complete case is received.

for await response in stream {
    switch response {
    case .chunk(let text):
        print(text, terminator: "")
    case .reasoningChunk:
        // Handle reasoning content if needed
        break
    case .complete(let usage, let reason):
        print("")
        print("Generation done!")
        print("Usage: \(usage)")
        print("Finish reason: \(reason)")
    }
}
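
In practice you often want the complete message as well as the streamed fragments, for example to update a UI. Here is a minimal sketch that accumulates the chunks, using only the cases shown above:

// Accumulate streamed fragments into the complete message text.
var fullText = ""
for await response in stream {
    switch response {
    case .chunk(let text):
        fullText += text              // append each streamed fragment
        print(text, terminator: "")   // still show incremental output
    case .reasoningChunk:
        break                         // ignore reasoning content here
    case .complete:
        print("\nFull message: \(fullText)")
    }
}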

Task and async/await

The LeapSDK API is similar to the cloud-based API, but it is worth noting that most LeapSDK iOS APIs are based on Swift async/await. You will need an async context to execute these functions. On iOS, we recommend using Task, or async functions within SwiftUI views with lifecycle-aware components.

func sendMessage(_ text: String) {
    Task {
        guard let modelURL = Bundle.main.url(
            forResource: "qwen3-0_6b",
            withExtension: "bundle"
        ) else {
            print("Could not find model bundle")
            return
        }

        let modelRunner = try await Leap.load(url: modelURL)
        let conversation = Conversation(modelRunner: modelRunner, history: [])
        let stream = conversation.generateResponse(
            message: ChatMessage(
                role: .user,
                content: [.text(text)]
            )
        )
        for await response in stream {
            switch response {
            case .chunk(let text):
                print(text, terminator: "")
            case .reasoningChunk:
                // Handle reasoning content if needed
                break
            case .complete(let usage, let reason):
                print("")
                print("Generation done!")
                print("Usage: \(usage)")
                print("Finish reason: \(reason)")
            }
        }
    }
}
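
As an alternative to creating a Task by hand, SwiftUI's lifecycle-aware .task modifier provides an async context that is automatically cancelled when the view disappears. Below is a sketch under the same assumptions as above; DemoView and its output state are hypothetical, not part of LeapSDK:

import SwiftUI
import LeapSDK

struct DemoView: View {
    @State private var output = ""

    var body: some View {
        Text(output)
            // .task runs when the view appears and is cancelled
            // automatically when the view disappears.
            .task {
                guard let modelURL = Bundle.main.url(
                    forResource: "qwen3-0_6b",
                    withExtension: "bundle"
                ) else { return }
                do {
                    let modelRunner = try await Leap.load(url: modelURL)
                    let conversation = Conversation(modelRunner: modelRunner, history: [])
                    let stream = conversation.generateResponse(
                        userTextMessage: "Say 'double bubble bath' ten times fast."
                    )
                    for await response in stream {
                        if case .chunk(let text) = response {
                            output += text   // stream the reply into the view
                        }
                    }
                } catch {
                    print("Model load failed: \(error)")
                }
            }
    }
}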

Next steps

For more information, please refer to the quick start guide and API reference.
