Comparison with Cloud AI APIs
If you are familiar with cloud-based AI APIs (e.g. the OpenAI API), this document shows the similarities and differences between those cloud APIs and Leap.
We will walk through the following Python-based OpenAI API chat completion request to figure out how to migrate it to LeapSDK. The example code is adapted from the OpenAI API documentation.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)
print("")
print("Generation done!")
Loading the model
While cloud-based APIs can be used as soon as the API client is created, LeapSDK requires developers to explicitly load the model before requesting a generation. This step is necessary because the model runs locally, and it generally takes a few seconds depending on the model size and the device performance.
On the cloud API, you need to create an API client:
client = OpenAI()
In LeapSDK, you need to load the model to create a model runner:
val modelRunner = LeapClient.loadModel(MODEL_BUNDLE_PATH)
The parameter is the path to the model bundle file; this path must be local. The return value is a “model runner”, which plays a role similar to the client object in the cloud API, except that it carries the model weights. If the model runner is released, the app has to reload the model before requesting new generations.
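Loading can also fail at run time, for example when the bundle file is missing from the device. The SDK's exact error behavior is documented in the API reference; the sketch below simply assumes loadModel throws an exception on failure and wraps the call in a try/catch. The logging tag and the null fallback are illustrative, not part of LeapSDK.
// Illustrative sketch only: assumes loadModel throws on failure.
// Like all LeapSDK calls, this must run inside a coroutine (see "Coroutine scope" below).
val modelRunner = try {
    LeapClient.loadModel(MODEL_BUNDLE_PATH) // may take a few seconds
} catch (e: Exception) {
    Log.e("Leap", "Could not load model bundle at $MODEL_BUNDLE_PATH", e)
    null
}
if (modelRunner == null) {
    // Surface an error to the user and skip generation.
}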
Requesting a generation
In the cloud API, client.chat.completions.create returns a stream object from which the caller can fetch the generated contents.
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)
In LeapSDK Android, we call generateResponse on the conversation object to obtain a Kotlin flow (the equivalent of the Python stream) for generation. Since the model runner object carries all information about the model, we don't need to specify the model name in the call again.
val conversation = modelRunner.createConversation()
val stream = conversation.generateResponse(
    ChatMessage(
        ChatMessage.Role.User,
        listOf(ChatMessageContent.Text("Say 'double bubble bath' ten times fast."))
    )
)

// This simplified call has exactly the same effect as the call above
val stream = conversation.generateResponse("Say 'double bubble bath' ten times fast.")
Processing generated contents
In the cloud API Python code, a for-loop over the stream object retrieves the contents.
for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)
print("")
print("Generation done!")
In LeapSDK, we call the onEach function on the Kotlin flow to process the content. When the generation completes, the callback passed to onCompletion is invoked. Finally, a call to collect() is necessary to actually start the generation.
stream.onEach { chunk ->
    when (chunk) {
        is MessageResponse.Chunk -> {
            print(chunk.text)
        }
        else -> {}
    }
}.onCompletion {
    println()
    println("Generation done!")
}.collect()
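If you want the complete response as a single string rather than printing chunks as they arrive, the flow can also be collected directly. This is plain Kotlin Flow usage, not a dedicated LeapSDK API; it assumes the same MessageResponse.Chunk type used above.
// Collect the streamed chunks into a single string (standard Flow collection).
val fullResponse = StringBuilder()
stream.collect { chunk ->
    when (chunk) {
        is MessageResponse.Chunk -> fullResponse.append(chunk.text)
        else -> {}
    }
}
println("Full response: $fullResponse")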
Coroutine scope
The LeapSDK API is similar to the cloud-based API, but it is worth noting that most LeapSDK Android APIs are based on Kotlin coroutines. You will need a coroutine scope to execute these functions. In Android, we recommend using the lifecycle scope defined on lifecycle-aware components.
lifecycleScope.launch {
    val modelRunner = LeapClient.loadModel(MODEL_BUNDLE_PATH)
    val conversation = modelRunner.createConversation()
    val stream = conversation.generateResponse(
        ChatMessage(
            ChatMessage.Role.User,
            listOf(ChatMessageContent.Text("Say 'double bubble bath' ten times fast."))
        )
    )
    stream.onEach { chunk ->
        when (chunk) {
            is MessageResponse.Chunk -> {
                print(chunk.text)
            }
            else -> {}
        }
    }.onCompletion {
        println()
        println("Generation done!")
    }.collect()
}
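If the generation is driven from a ViewModel rather than directly from an Activity or Fragment, viewModelScope from androidx.lifecycle plays the same role as lifecycleScope. The ChatViewModel class below is only a hypothetical sketch of where the calls could live; the LeapSDK calls themselves are unchanged.
// Hypothetical ViewModel wiring; only the surrounding scaffolding is new.
class ChatViewModel : ViewModel() {
    fun generate(bundlePath: String, prompt: String) {
        viewModelScope.launch {
            val modelRunner = LeapClient.loadModel(bundlePath)
            val conversation = modelRunner.createConversation()
            conversation.generateResponse(prompt).collect { chunk ->
                if (chunk is MessageResponse.Chunk) print(chunk.text)
            }
        }
    }
}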
Next steps
For more information, please refer to the quick start guide and API reference.