🦙 Deploying your own model with Ollama
Setting Up Your Model
For this guide, we'll use Llama 3.1 8B as our example model, and we'll set it up using Ollama, a platform that makes local development with open-source large language models super easy.
Step 1: Install Ollama
Visit the Ollama website or their GitHub repo.
Download the installer for your operating system (macOS, Windows, or Linux).
If you're using Linux, you can run this command in your terminal:
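```bash
# Official Ollama install script for Linux
curl -fsSL https://ollama.com/install.sh | sh
```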
The installation usually takes a few minutes. If you have an NVIDIA or AMD GPU, Ollama will detect it automatically (make sure the drivers are installed). Don't worry if you don't have a GPU: CPU-only mode works fine too, just a bit slower.
Step 2: Download a Model
Check out the Ollama model library to see all the supported models.
For this guide, we'll use Llama 3.1. To download it, open your terminal and run:
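```bash
# The default llama3.1 tag pulls the 8B model
ollama pull llama3.1
```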
Step 3: Try Your Model Locally
In your terminal, run:
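```bash
# Opens an interactive chat session with the model
ollama run llama3.1
```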
Start chatting with your model! You now have a capable model, comparable in quality to GPT-3.5, running entirely on your own machine.
Making Your Model Accessible
To use your model with the DecentAI mobile app when you're away from home, you'll need to make it accessible over the internet. We'll use ngrok for this.
Step 1: Set Up ngrok
Visit ngrok.com/download and download ngrok for your operating system.
Install ngrok following the provided instructions.
Step 2: Authenticate ngrok
Go to https://dashboard.ngrok.com/get-started/setup and follow the instructions to get your auth token.
In your terminal, run:
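```bash
# Replace <YOUR_AUTHTOKEN> with the token from your ngrok dashboard
ngrok config add-authtoken <YOUR_AUTHTOKEN>
```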
Step 3: Deploy Your Model Online
Ollama listens on port 11434 by default. To make it accessible, run this command in your terminal:
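```bash
# Tunnel to Ollama's default port; rewriting the host header keeps Ollama
# from rejecting requests that arrive with the ngrok hostname
ngrok http 11434 --host-header="localhost:11434"
```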
ngrok will provide you with a public URL. Keep this URL handy – you'll need it for the next step.
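To confirm the tunnel works, you can send a test request to Ollama's generate endpoint. The URL below is a placeholder; substitute the forwarding URL ngrok printed:

```bash
# Placeholder URL: replace with your own ngrok forwarding URL
curl https://your-subdomain.ngrok-free.app/api/generate \
  -d '{"model": "llama3.1", "prompt": "Hello!", "stream": false}'
```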