Ollama cheatsheet

Ollama is a platform designed to easily download, run, and manage various LLMs (including Llama models) locally on your computer. Here is a simple Ollama cheat sheet for everyday basic usage:

Ollama Cheatsheet 🚀

Ollama simplifies running large language models locally. This cheatsheet covers common commands and concepts.


Installation

  • Download & Install: Go to the Ollama website and download the appropriate installer for your operating system (macOS, Windows, Linux).

Basic Commands

  • Run a Model:
      ollama run <model_name>

Example: ollama run llama2 (If the model isn’t downloaded, it will download it first.)

  • List Downloaded Models:
      ollama list
  • Pull/Download a Model:
      ollama pull <model_name>

Example: ollama pull mistral

  • Remove a Model:
      ollama rm <model_name>

Example: ollama rm phi3

  • Show Model Information:
      ollama show <model_name>

This displays details like parameters, license, and system prompt.

  • Serve Ollama (API Mode):
      ollama serve

Starts the Ollama server, exposing its REST API (default: http://localhost:11434). Other applications need this server running to interact with Ollama. (On macOS and Windows, the desktop app starts the server for you.)
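
A quick way to confirm the server is up is to hit the API from Python. A minimal sketch, assuming the requests package is installed and the default port:

import requests

# List locally available models via the REST API; a successful
# response confirms the server is reachable.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json()["models"]:
    print(model["name"])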


Interacting with Models

Once ollama run <model_name> is executed, you’ll enter an interactive chat session with the model.

  • Exit Chat: Type /bye or press Ctrl + D.
  • Load another Model (within chat): Type /load <another_model_name>
  • See current model info (within chat): Type /show info
  • Multiturn Conversations: Ollama remembers context within the same run session. Over the REST API, context is carried by resending the message history (see the sketch after this list).
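
A minimal multi-turn sketch against the chat endpoint, assuming ollama serve is running and llama2 has been pulled. The API itself is stateless, so each request resends the accumulated messages list:

import requests

messages = [{"role": "user", "content": "What is the capital of France?"}]

# First turn
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama2", "messages": messages, "stream": False},
)
messages.append(resp.json()["message"])

# Follow-up turn; the earlier messages provide the context
messages.append({"role": "user", "content": "How large is its population?"})
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama2", "messages": messages, "stream": False},
)
print(resp.json()["message"]["content"])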

Modelfiles (Customizing Models)

Modelfiles allow you to create, modify, or extend models. They’re similar to Dockerfiles.

  • Basic Structure:
      # Required: specifies the base model
      FROM <base_model_name>
      # Optional: adjusts generation randomness
      PARAMETER temperature 0.7
      # Optional: sets a system prompt
      SYSTEM """You are a helpful AI assistant."""
      # Optional: adds an example message
      MESSAGE user What is the capital of France?
  • Create a Modelfile:

    1. Create a new file, e.g., MyAssistant.Modelfile.
    2. Add your desired instructions.
    3. Create the custom model:
      ollama create <new_model_name> -f <path/to/MyAssistant.Modelfile>

Example: ollama create my-assistant -f ./MyAssistant.Modelfile

  • Run your Custom Model:
      ollama run my-assistant
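
A slightly fuller Modelfile sketch, using a few more documented parameters (the model name and values here are just illustrative):

      # Hypothetical example: a terse coding assistant built on llama2
      FROM llama2
      PARAMETER temperature 0.2
      PARAMETER num_ctx 4096
      PARAMETER top_p 0.9
      SYSTEM """You are a concise coding assistant. Prefer short code examples."""

Build and run it the same way: ollama create code-helper -f ./Modelfile, then ollama run code-helper.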

Advanced Commands & Concepts

  • Copy a Model:
      ollama cp <source_model> <destination_model>

Useful for creating a base for a new Modelfile.

  • Push a Model (to a registry):
      ollama push <model_name>

For sharing your custom models (requires an ollama.com account, and the model must be named <your_username>/<model_name>).

  • REST API: When ollama serve is running, you can interact with models programmatically.

  • Generate Completion: POST /api/generate
  • Chat Completion: POST /api/chat
  • List Models: GET /api/tags

Refer to the Ollama API documentation for full details.
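
These endpoints stream newline-delimited JSON chunks by default. A minimal streaming sketch against /api/generate, assuming llama2 has been pulled:

import json

import requests

# Stream a completion; each line is a JSON chunk carrying a "response" fragment
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()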


Common Issues & Tips

  • “Error: connection refused”: Ensure ollama serve is running in the background.
  • Model Size: LLMs are large! Ensure you have enough disk space and RAM.

Choosing models from Hugging Face and integrating them into Ollama

To choose and integrate models from the Hugging Face website into Ollama, follow these steps:

Choosing a model

  1. Browse the Hugging Face Model Hub: Visit the Hugging Face Model Hub to explore available models. You can search by keyword, filter by task (e.g., text generation), or filter by language. For Ollama, you will generally want a text-generation model that is available in GGUF format (the Hub lets you filter for repositories with GGUF files).
  2. Evaluate the model: Read the model card, including reported benchmark results and any community feedback, to judge its suitability for your use case.
  3. Select a model that matches your needs: the task you want to perform (e.g., conversation generation), the languages it must support, and a parameter count and quantization level that fit your hardware.

Integrating a model into OLLAMA

  1. Check the format: Ollama runs models in GGUF format. Look for a GGUF version of your chosen model on the Hub (many popular models have community-provided GGUF quantizations), or convert the weights to GGUF yourself, e.g., with the conversion scripts that ship with llama.cpp.
  2. Download the GGUF file: Use the Hub web interface or the huggingface_hub Python library to fetch the .gguf file you want, picking a quantization that fits your RAM (e.g., Q4_K_M).
  3. Write a Modelfile: Point it at the downloaded weights with FROM /path/to/model.gguf, plus any PARAMETER, SYSTEM, or TEMPLATE lines you need.
  4. Create and run the model: ollama create <model_name> -f Modelfile, then ollama run <model_name>. Recent Ollama versions can also pull GGUF repositories directly with ollama run hf.co/<username>/<repository>.

Here is a minimal sketch of this workflow in Python (the repository, file, and model names below are examples; substitute your own):

import subprocess
from pathlib import Path

import requests
from huggingface_hub import hf_hub_download

# Download a GGUF quantization from the Hugging Face Hub
# (repo and filename are examples; pick any GGUF model that fits your hardware)
gguf_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# Write a Modelfile pointing at the downloaded weights
Path("Modelfile").write_text(f"FROM {gguf_path}\n")

# Register the model with Ollama
subprocess.run(["ollama", "create", "mistral-hf", "-f", "Modelfile"], check=True)

# Query it through the REST API (requires `ollama serve` to be running)
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral-hf", "prompt": "Hello, how are you?", "stream": False},
)
print(response.json()["response"])

Note that this is a simplified example and you may need to adapt it to your use case (model choice, quantization level, prompt format). Consult the Ollama documentation and the Hugging Face Hub documentation for more details on importing GGUF models.

This post is licensed under CC BY 4.0 by the author.