Local AI Models with LM Studio and Spring AI

This article explains how to use LM Studio to run AI models locally and use them in a Java application with Spring AI and Spring Boot. LM Studio is one of the most popular alternatives to Ollama, making it easy to run LLMs directly on your laptop. With a clean UI and a built-in OpenAI-compatible API server, you can easily integrate it with your applications. For those of you experimenting with LLMs and Java applications, the Spring AI framework seems like the natural choice. Today, you’ll learn how to use models launched via LM Studio in your application using Spring AI. For Mac users, it’s even more interesting because LM Studio supports MLX models, which can bring significant memory and performance improvements on macOS.

Spring AI is a framework that helps developers easily integrate artificial intelligence features, such as large language models, embeddings, and vector databases, into Spring Boot applications. On my blog, you will find a collection of articles that guide you step by step through working with Spring AI. I cover both simple examples that help you get started quickly and more advanced, complex use cases. I suggest starting with this article.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, clone my sample GitHub repository and follow the instructions in this article. The repository contains several sample applications; the one for this article is in the spring-ai-openai-compatibility directory.

Run AI Models with LM Studio

Using LM Studio, we can run many different models locally in GGUF or MLX formats. In this case, we need a model for a typical chat application interaction to demonstrate how to integrate the Spring AI application with a model on LM Studio via an OpenAI-compatible API server. To select the optimal model for my hardware in this category, I will use the llmfit tool. This is a simple command-line tool that, when run, displays a list of models recommended for a given hardware configuration, organized by category and with an overall score. The model I’ve chosen for today’s experiments is shown in the figure below. This is DeepSeek-V2-Lite-Chat. It may not be the highest-rated model in the Chat category, but it strikes a reasonable balance between token-processing speed and overall rating.

lm-studio-spring-ai-llmfit

Then, let’s look for our model in LM Studio. Here I found the model mlx-community/DeepSeek-V2-Lite-Chat-4bit-mlx. It’s even better because I want to run a model converted to the MLX format, which is optimized for macOS. Let’s download that model.

lm-studio-spring-ai-install-model

Finally, we can run the model locally. In the Local Server section, click the Load Model button and select the DeepSeek-V2-Lite-Chat-4bit-mlx model to run it with LM Studio. The local server must be enabled; by default, it listens on port 1234. You can load and run several AI models this way, and each one is then identified by its name. In our case, the model name is deepseek-v2-lite-chat-mlx. Note both the server address and the model identifier, since you will need them later in your Spring AI application settings.

lm-studio-spring-ai-run-model
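Before wiring the Spring application, it may be worth checking that the local server responds. As a sketch, assuming the default port 1234 (adjust the host and port to your setup), the OpenAI-compatible /v1/models endpoint lists the currently loaded models:

```shell
# List models currently loaded in LM Studio (default local server port: 1234)
curl http://localhost:1234/v1/models
```

If everything is set up correctly, the JSON response should contain an entry whose id matches the model identifier, in our case deepseek-v2-lite-chat-mlx.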

Integrate Spring AI with the Model on LM Studio

Using OpenAI-Compatible API

As I mentioned before, LM Studio provides an OpenAI-compatible API server. So, we need to include the Spring AI OpenAI starter in our Spring Boot app dependencies.

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
XML

Here’s a list of properties in the Spring Boot application.properties file. The API server is not protected with any API key, but we must set something to avoid an empty value. Then, we must set the API URL and the model name using the values we noted earlier in LM Studio.

spring.ai.openai.api-key = ${OPENAI_API_KEY:lm-studio}
spring.ai.openai.chat.base-url = http://192.168.0.16:1234
spring.ai.openai.chat.options.model = deepseek-v2-lite-chat-mlx

logging.level.org.springframework.ai.chat.client.advisor = DEBUG
Plaintext

Our application exposes one REST endpoint for demo purposes. The GET /simple/{country} endpoint allows us to ask for the capital of a given country. The LLM should briefly describe the city’s history.

@RestController
@RequestMapping("/simple")
public class SimpleController {

    private final ChatClient chatClient;

    public SimpleController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(SimpleLoggerAdvisor.builder().build())
                .build();
    }

    @GetMapping("/{country}")
    public String ping(@PathVariable String country) {
        PromptTemplate pt = new PromptTemplate("""
                What's the capital of {country} ?
                Describe the history of that city briefly.
        """);

        return chatClient.prompt(pt.create(Map.of("country", country)))
                .call()
                .content();
    }
}
Java

Then, go to the repository root directory and run the following command to start the app:

mvn spring-boot:run
ShellSession

Once the app starts successfully, you can make the following request by entering the name of the country whose capital you want to learn more about.

curl http://localhost:9080/simple/Poland 
ShellSession

Now for something a bit unexpected. Even though the app doesn’t throw any errors, it simply can’t reach the model running on LM Studio. The requests were sent successfully from the app’s perspective, but the connection just hung indefinitely. The root cause is that LM Studio currently doesn’t support HTTP/2, which the Spring AI RestClient uses by default in Spring Boot apps. To resolve this issue, we must enforce the HTTP/1.1 protocol. To do so, override the default RestClient builder in the following way:

@SpringBootApplication
public class SpringAIOpenAICompatibility {
    public static void main(String[] args) {
        SpringApplication.run(SpringAIOpenAICompatibility.class, args);
    }

    @Bean
    public RestClient.Builder restClientBuilder() {
        HttpClient httpClient = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // force HTTP/1.1
                .build();

        return RestClient.builder()
                .requestFactory(new JdkClientHttpRequestFactory(httpClient));
    }
}
Java

Now, you can restart the app and repeat a test call. My model works quite well. It responds fairly quickly and accurately describes the history of the selected capital.

lm-studio-spring-ai-response

Using Anthropic-Compatible API

LM Studio also provides an endpoint compatible with the Anthropic API. In that case, you can include a different starter than before. Here’s the starter that supports the Anthropic API.

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
XML

Then, you can keep the same values in the configuration properties, but under the spring.ai.anthropic.* keys.

spring.ai.anthropic.api-key = ${OPENAI_API_KEY:lm-studio}
spring.ai.anthropic.chat.base-url = http://192.168.0.16:1234
spring.ai.anthropic.chat.options.model = deepseek-v2-lite-chat-mlx

logging.level.org.springframework.ai.chat.client.advisor = DEBUG
Plaintext

Finally, you can restart the app and call the same app endpoints. This time, the app will communicate with the exact same model, but via the Anthropic API.

You can enable several features specific to the Anthropic API. Of course, the selected feature must be supported by the target AI model. Below is how to create a Spring AI request that enables the “thinking” mechanism using the AnthropicChatOptions object.

@GetMapping("/{country}")
public String ping(@PathVariable String country) {
   PromptTemplate pt = new PromptTemplate("""
          What's the capital of {country} ?
          Describe the history of that city briefly.
   """);

   return chatClient.prompt(pt.create(Map.of("country", country)))
          .options(AnthropicChatOptions.builder()
                  .temperature(1.0)
                  .thinking(AnthropicApi.ThinkingType.ENABLED, 2048)
                  .build())
          .call()
          .content();
}
Java

LM Studio also lets you inspect the API server logs. This makes it easy to verify, for example, that a message is sent in a format compatible with the Anthropic API (the POST /v1/messages endpoint).
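To see what such a message looks like, here is a minimal sketch of an Anthropic-style request sent directly to the LM Studio server, assuming the default port 1234 and the model name used throughout this article (the x-api-key value is arbitrary, since the local server doesn’t validate it):

```shell
# Send an Anthropic-format message to the LM Studio local server
curl http://localhost:1234/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: lm-studio" \
  -d '{
    "model": "deepseek-v2-lite-chat-mlx",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the capital of Poland?"}]
  }'
```

The request body follows the Anthropic Messages API schema: a model identifier, a required max_tokens limit, and a messages array with role/content pairs, which is exactly what you should see in the LM Studio server logs when the Spring AI Anthropic starter sends its requests.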

Conclusion

LM Studio offers more features for working with local models than Ollama. I find it particularly useful that I can easily run a model in MLX format and view the logs of messages sent to the API server. While experimenting with Spring AI and LM Studio, I ran into an unexpected issue with HTTP/2 support, which was quite confusing. However, I was able to solve it quickly using standard Spring features.
