Managing Models
LME supports both local models (running on your server) and cloud models (via API). You can switch between them at any time through the dashboard or by editing configuration files.
Default Models
LME ships with two models, downloaded automatically during installation:
| Model | File | Size | Purpose |
|---|---|---|---|
| LFM2.5-1.2B-Instruct | LFM2.5-1.2B-Instruct-Q4_K_M.gguf | ~0.8 GB | Chat and analysis |
| nomic-embed-text-v1.5 | nomic-embed-text-v1.5.Q4_K_M.gguf | ~0.3 GB | Text embeddings (RAG) |
Models are stored in /opt/lme/llama-models/ on the LME server.
Settings Navigation
All model management is under the Settings tab in the dashboard. The Settings page has a left sidebar with five sections:
- AI Models: add, remove, and switch between configured models (local and cloud)
- Local Models: manage downloaded .gguf files, download new ones, switch the active local model
- KEV Configuration: CISA KEV sync settings (see KEV Integration)
- Documents: RAG documentation ingestion status and re-ingestion
- General: auto-refresh and default RAG mode toggles
Switching Local Models
Via the Dashboard
- Open the dashboard at https://<your-lme-server-ip>:8502
- Go to Settings > Local Models
- You will see all downloaded .gguf files listed, with file sizes. The currently active model has a green border and an "Active" badge.
- Click the "Switch" button next to the model you want to activate
- A confirmation dialog appears: "Switch llama.cpp to 'filename'? This will restart the llama.cpp container. AI chat will be briefly unavailable."
- Click OK
- The status badge changes to "Restarting..." (yellow)
- Wait for the status to change to "Running" (green) — this typically takes 10-30 seconds
- The model pill in the header updates to the new model name
Via the Command Line
On the LME server:
- Edit the model config file:
  sudo nano /opt/lme/config/llama-cpp-model.json
- Set the model filename:
  {"model": "your-model-filename.gguf"}
- Touch the trigger file to activate the switch:
  sudo touch /opt/lme/config/.llama-model-updated
- A systemd path watcher detects the change and runs the switch script automatically. Check the status:
  cat /opt/lme/config/llama-cpp-status.json
  It will show "switching" then "ready" when complete.
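The command-line steps above can also be scripted. The following is a minimal sketch, not part of LME itself: the function names are hypothetical, and because the exact JSON layout of the status file is not described here, the poller only looks for the word "ready" in the raw file contents.

```python
import json
import time
from pathlib import Path

CONFIG_DIR = Path("/opt/lme/config")  # directory named in the steps above


def request_model_switch(model_filename: str, config_dir: Path = CONFIG_DIR) -> None:
    """Write the model config file, then touch the trigger file."""
    config_path = config_dir / "llama-cpp-model.json"
    config_path.write_text(json.dumps({"model": model_filename}) + "\n")
    (config_dir / ".llama-model-updated").touch()


def wait_until_ready(config_dir: Path = CONFIG_DIR, timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll the status file until it mentions "ready" or the timeout expires.

    Assumption: the status file contains the word "ready" once the switch
    is complete; its exact structure is not relied on.
    """
    status_path = config_dir / "llama-cpp-status.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if status_path.exists() and "ready" in status_path.read_text():
            return True
        time.sleep(interval)
    return False
```

Run it with enough privileges to write to /opt/lme/config. A status file left over from an earlier switch may already say "ready", so a more careful script would first wait to see "switching".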
What Happens During a Model Switch
- The model filename is validated (must exist in /opt/lme/llama-models/)
- The --model argument in the llama.cpp container quadlet file is updated
- systemctl daemon-reload runs to pick up the change
- The lme-llama-cpp service restarts with the new model
- Status is written to /opt/lme/config/llama-cpp-status.json
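To make the first two steps concrete, here is an illustrative sketch of filename validation and of rewriting a --model argument in a block of text. It is not the actual LME switch script. The function names are hypothetical, and the quadlet layout in the usage note (an Exec= line with a /models/ path) is an assumption.

```python
import re
from pathlib import Path


def validate_model(filename: str, models_dir: Path) -> Path:
    """Accept only plain .gguf filenames that exist inside models_dir."""
    if Path(filename).name != filename or not filename.endswith(".gguf"):
        raise ValueError(f"not a plain .gguf filename: {filename!r}")
    path = models_dir / filename
    if not path.is_file():
        raise FileNotFoundError(path)
    return path


def set_model_argument(quadlet_text: str, new_model_path: str) -> str:
    """Replace the value that follows --model (separated by a space or '=')."""
    updated, count = re.subn(
        r"(--model[ =])\S+",
        lambda match: match.group(1) + new_model_path,
        quadlet_text,
    )
    if count == 0:
        raise ValueError("no --model argument found")
    return updated
```

For example, set_model_argument("Exec=--model /models/old.gguf --port 8080", "/models/new.gguf") swaps only the model path and leaves the other arguments untouched.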
Downloading New Local Models
Via the Dashboard
- Go to Settings > Local Models
- Find the search box below the installed models list
- Enter a search term and press Enter or click "Search". You can search by:
  - Repository path: google/gemma-3-1b-it, meta-llama/Llama-3.2-1B-Instruct
  - General terms: mistral 7b, phi-3-mini, gemma 1b
- The dashboard shows "Searching..." then displays result cards
- Each result card shows a HuggingFace repository with a list of .gguf files, including:
  - Filename
  - File size (MB or GB)
  - Quantization type (e.g., Q4_K_M, Q5_K_M, Q8_0)
- Hover over a file to reveal the "Download" button
- Click "Download"
- A progress card appears showing the download with a spinning icon, filename, and downloaded size
- When complete, the model appears in the Installed Models list above and can be switched to immediately
The search also looks for pre-quantized GGUF repositories. If you search for a model such as google/gemma-3-1b-it, it automatically checks for -GGUF suffix variants and for repositories from popular quantizers (bartowski, mradermacher, QuantFactory).
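As an illustration of that lookup, the sketch below builds a list of candidate repository names for a search term. It is a guess at the general idea, not the dashboard's actual search code. The function name is hypothetical, and the exact naming patterns tried by LME (and used by each quantizer) may differ.

```python
# Quantizer accounts named in the paragraph above.
QUANTIZERS = ("bartowski", "mradermacher", "QuantFactory")


def candidate_gguf_repos(repo_id: str) -> list[str]:
    """Return GGUF repository names worth checking for a given model repo."""
    owner, _, name = repo_id.partition("/")
    if not name:  # no owner given, e.g. "gemma-3-1b-it"
        owner, name = "", owner
    if name.lower().endswith("-gguf"):
        return [repo_id]  # already a GGUF repository
    gguf_name = f"{name}-GGUF"
    candidates = [f"{owner}/{gguf_name}"] if owner else []
    candidates += [f"{quantizer}/{gguf_name}" for quantizer in QUANTIZERS]
    return candidates
```

Calling candidate_gguf_repos("google/gemma-3-1b-it") yields the owner's own -GGUF variant first, followed by one candidate per quantizer account.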
Via the Command Line
You can download GGUF models directly from HuggingFace:
cd /opt/lme/llama-models/
# Example: download a Mistral 7B model
sudo wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
After downloading, switch to the model using either the dashboard or command line method above.
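The wget URL above follows the pattern https://huggingface.co/<repo>/resolve/main/<file>. A small sketch of building that URL and downloading with the Python standard library follows. The function names are hypothetical, and writing to a temporary .part file is a design choice of this sketch, not something LME is known to do.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve


def gguf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a HuggingFace repository."""
    return (f"https://huggingface.co/{quote(repo_id)}"
            f"/resolve/{quote(revision)}/{quote(filename)}")


def download_model(repo_id: str, filename: str,
                   models_dir: Path = Path("/opt/lme/llama-models")) -> Path:
    """Download to a .part file first so an interrupted transfer never
    leaves a truncated .gguf in the models directory."""
    target = models_dir / filename
    partial = target.with_name(filename + ".part")
    urlretrieve(gguf_download_url(repo_id, filename), partial)
    partial.replace(target)
    return target
```

As with the wget command, the script needs write access to the models directory.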
Choosing a Model
| Model Size | RAM Needed | Quality | Speed | Good For |
|---|---|---|---|---|
| 1-3B parameters | 2-4 GB | Basic | Fast | Quick summaries, simple Q&A |
| 7B parameters | 6-8 GB | Good | Moderate | Most security analysis tasks |
| 13B parameters | 12-16 GB | Very good | Slower | Complex analysis, detailed reports |
| 70B+ parameters | 48+ GB | Great | Slow | Most complex analysis (needs powerful hardware) |
Look for models with Q4_K_M quantization — this is a good balance of quality and size. Avoid Q2 (too low quality) and f16 (too large for most hardware).
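The table and the quantization advice can be turned into two small helpers, sketched below. The thresholds use the lower bound of each RAM range in the table, which is an optimistic reading. The function names and the filename pattern are assumptions and cover only common quantization labels.

```python
import re
from typing import Optional

# (minimum RAM in GB, parameter class), taken from the table above, largest first.
SIZE_TIERS = [(48, "70B+"), (12, "13B"), (6, "7B"), (2, "1-3B")]


def largest_model_class(ram_gb: float) -> Optional[str]:
    """Return the largest parameter class whose minimum RAM fits, if any."""
    for min_ram, label in SIZE_TIERS:
        if ram_gb >= min_ram:
            return label
    return None


def quantization_of(filename: str) -> Optional[str]:
    """Extract a label such as Q4_K_M, Q8_0, or f16 from a GGUF filename."""
    match = re.search(
        r"(?<![A-Za-z0-9])(Q\d_(?:K(?:_[SML])?|\d)|BF16|F16|F32)(?![A-Za-z0-9])",
        filename,
        re.IGNORECASE,
    )
    return match.group(1) if match else None
```

For instance, a server with 8 GB of RAM maps to the 7B class, and quantization_of("LFM2.5-1.2B-Instruct-Q4_K_M.gguf") returns "Q4_K_M".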
Deleting Local Models
Via the Dashboard
- Go to Settings > Local Models
- Click the trash icon next to the model you want to remove
- A confirmation dialog appears: "Delete 'filename'? This cannot be undone."
- Click OK to confirm
You cannot delete the model that is currently active. The dashboard will reject the request (HTTP 409). Switch to a different model first.
Via the Command Line
sudo rm /opt/lme/llama-models/<model-filename>.gguf
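Plain rm does not check whether the file is the active model. The sketch below adds that guard, in the spirit of the dashboard's rejection. The function and exception names are hypothetical, and the caller is assumed to already know the active model's filename (for example from /opt/lme/config/llama-cpp-model.json).

```python
from pathlib import Path


class ActiveModelError(RuntimeError):
    """Raised when asked to delete the model that is currently in use."""


def delete_model(filename: str, active_model: str,
                 models_dir: Path = Path("/opt/lme/llama-models")) -> None:
    """Delete a .gguf file from models_dir unless it is the active model."""
    if filename == active_model:
        raise ActiveModelError(f"{filename} is active; switch models first")
    path = models_dir / filename
    # Refuse anything that is not a .gguf file directly inside models_dir.
    if path.suffix != ".gguf" or path.parent != models_dir:
        raise ValueError(f"refusing to delete {filename!r}")
    path.unlink()
```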
Adding Cloud Models
Cloud models offer higher quality analysis but require internet access and an API key from the provider.
Via the Dashboard
- Go to Settings > AI Models
- Scroll down to the Add Model form
- Select a provider by clicking one of the four buttons:
  - Local (llama.cpp): for adding another local model endpoint
  - OpenAI (GPT-4o, etc.)
  - Anthropic (Claude)
  - OpenRouter (multi-provider gateway)
- After selecting a provider, quick-pick buttons appear with suggested model IDs (e.g., "gpt-4o", "claude-3-sonnet"). Click one to auto-fill the Model ID field, or type your own.
- Fill in the fields:
  - Display Name: what this model is called in the dashboard (e.g., "GPT-4o")
  - Model ID: the LiteLLM model identifier (e.g., gpt-4o, anthropic/claude-3-sonnet-20240229)
  - API Key: your provider's API key (e.g., sk-... for OpenAI). This field only appears for cloud providers.
  - API Base URL: (optional) pre-filled based on provider. Override for custom endpoints.
- Click "Add Model"
- The button shows "Saving..."
- The model appears in the Configured Models list above
The API key is encrypted with Fernet symmetric encryption, using a key derived from the LME Ansible vault password, and stored at /opt/lme/config/llm_keys.enc. A systemd path watcher detects the change, decrypts the keys, injects them into a Podman secret, and restarts LiteLLM, all automatically.
After adding a model, you still need to switch to it to start using it. See Switching the Active Model below.
Via the Configuration File
- Edit the LiteLLM config:
  sudo nano /opt/lme/config/litellm_config.yaml
- Add a model entry under model_list:
  model_list:
    # Existing local model
    - model_name: lfm2.5-1.2b-instruct
      litellm_params:
        model: openai/LFM2.5-1.2B-Instruct-Q4_K_M
        api_base: https://lme-llama-cpp:8080/v1
        api_key: dummy
        ssl_verify: false
    # Add a cloud model (example: OpenAI GPT-4)
    - model_name: gpt-4
      litellm_params:
        model: gpt-4
        api_key: sk-your-openai-key-here
- Restart LiteLLM:
  sudo systemctl restart lme-litellm
Supported Cloud Providers
Here are example configurations for each supported provider:
OpenAI
- model_name: gpt-4
  litellm_params:
    model: gpt-4
    api_key: sk-your-key-here
Anthropic (Claude)
- model_name: claude-3-sonnet
  litellm_params:
    model: anthropic/claude-3-sonnet-20240229
    api_key: sk-ant-your-key-here
Azure OpenAI
- model_name: azure-gpt-4
  litellm_params:
    model: azure/your-deployment-name
    api_base: https://your-resource.openai.azure.com/
    api_key: your-azure-key
    api_version: "2024-02-15-preview"
Google Vertex AI (Gemini)
- model_name: gemini-pro
  litellm_params:
    model: vertex_ai/gemini-pro
    vertex_project: your-gcp-project
    vertex_location: us-central1
AWS Bedrock
- model_name: bedrock-claude
  litellm_params:
    model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
    aws_access_key_id: your-access-key
    aws_secret_access_key: your-secret-key
    aws_region_name: us-east-1
Ollama (Self-Hosted)
- model_name: ollama-llama3
  litellm_params:
    model: ollama/llama3
    api_base: http://your-ollama-server:11434
Removing Cloud Models
Via the Dashboard
- Go to Settings > AI Models
- Find the model in the Configured Models list
- Hover over the model card to reveal the "Remove" button
- Click "Remove"
- A confirmation dialog appears: "Remove model 'name'?"
- Click OK — the model is removed from the LiteLLM configuration
Via the Configuration File
Remove the model entry from /opt/lme/config/litellm_config.yaml and restart LiteLLM:
sudo systemctl restart lme-litellm
Switching the Active Model
The active model is the one used for all AI chat and analysis requests.
Via the Dashboard
- Go to Settings > AI Models
- In the Configured Models list, find the model you want to use
- Click the "Use" button on that model's card
- The model is instantly activated — the green "In Use" badge moves to the selected model
- The model pill in the header bar updates to the new model name
You can also click the model pill in the header bar to jump directly to this screen.
Switching between configured models is instant — no service restart needed. The change takes effect on your next chat message or analysis request.
Via the API
curl -sk -X POST https://localhost:8502/api/models/active \
-H "Content-Type: application/json" \
-d '{"model_name": "gpt-4"}'
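The same request can be sent from Python's standard library. This sketch mirrors the curl command above, including -k (certificate verification disabled). The function names are hypothetical, and the response format is not documented here, so only the HTTP status is returned.

```python
import json
import ssl
from urllib.request import Request, urlopen


def build_switch_request(model_name: str,
                         base_url: str = "https://localhost:8502") -> Request:
    """Build the POST request that selects the active model."""
    body = json.dumps({"model_name": model_name}).encode("utf-8")
    return Request(
        f"{base_url}/api/models/active",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def switch_active_model(model_name: str,
                        base_url: str = "https://localhost:8502") -> int:
    """Send the request and return the HTTP status code."""
    context = ssl.create_default_context()
    context.check_hostname = False       # same effect as curl -k
    context.verify_mode = ssl.CERT_NONE  # accept a self-signed certificate
    request = build_switch_request(model_name, base_url)
    with urlopen(request, context=context, timeout=10) as response:
        return response.status
```

Disabling verification is reasonable only for localhost or a trusted network. With a properly signed certificate, drop the two context overrides.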
How API Keys are Secured
Cloud API keys follow this security pipeline:
- You enter the key in the dashboard
- The dashboard encrypts it with Fernet symmetric encryption, using a key derived from the LME Ansible vault password (PBKDF2, 100,000 iterations)
- The encrypted blob is written to /opt/lme/config/llm_keys.enc
- A trigger file is touched: /opt/lme/config/.llm-keys-updated
- A systemd path watcher detects the change
- The sync_llm_keys.py script decrypts the keys and injects them into a Podman secret (llm-keys)
- LiteLLM is restarted and reads the keys from /run/secrets/llm_keys
At no point are plain text API keys stored on disk outside of the Podman secret mount.
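For readers unfamiliar with the key-derivation step, the sketch below shows how a Fernet-format key can be derived from a password with PBKDF2 using only the standard library. It illustrates the technique and is not LME's code: the function name is hypothetical, and the hash function (SHA-256) and the salt handling are assumptions, since the text above states only PBKDF2 and the iteration count.

```python
import base64
import hashlib


def derive_fernet_key(password: str, salt: bytes,
                      iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key with PBKDF2 and encode it the way Fernet expects
    (URL-safe base64, 44 characters)."""
    raw_key = hashlib.pbkdf2_hmac(
        "sha256",                  # assumption: hash function not stated in the docs
        password.encode("utf-8"),
        salt,                      # assumption: salt source not stated in the docs
        iterations,
        dklen=32,
    )
    return base64.urlsafe_b64encode(raw_key)
```

The resulting key can be passed to Fernet from the third-party cryptography package (cryptography.fernet.Fernet) to encrypt or decrypt a blob. The same password, salt, and iteration count always produce the same key.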
Troubleshooting
Model switch stuck on "switching"
Check the switch script status:
sudo journalctl -u lme-llama-model.service -n 20
Common causes:
- The model file does not exist in /opt/lme/llama-models/
- The llama.cpp service failed to restart
LiteLLM not picking up cloud model
- Verify the config syntax:
python3 -c "import yaml; yaml.safe_load(open('/opt/lme/config/litellm_config.yaml'))"
- Check LiteLLM logs:
sudo podman logs lme-litellm --tail 50
- Restart LiteLLM:
sudo systemctl restart lme-litellm
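The syntax check above confirms only that the file is valid YAML. A structural check can catch entries that parse but lack required fields. The sketch below takes the dictionary that yaml.safe_load returns. The function name is hypothetical, and the rules it enforces (every entry needs model_name and litellm_params.model) are inferred from the examples in this guide.

```python
def check_model_list(config: object) -> list[str]:
    """Return a list of human-readable problems found in a parsed config."""
    if not isinstance(config, dict) or not isinstance(config.get("model_list"), list):
        return ["model_list is missing or is not a list"]
    problems = []
    for index, entry in enumerate(config["model_list"]):
        name = entry.get("model_name") if isinstance(entry, dict) else None
        if not name:
            problems.append(f"entry {index}: missing model_name")
            continue
        params = entry.get("litellm_params")
        if not isinstance(params, dict) or not params.get("model"):
            problems.append(f"entry {index} ({name}): missing litellm_params.model")
    return problems
```

An empty list means no structural problems were found. A common cause of a missing litellm_params.model is indentation that places model at the wrong level.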
"Model not found" error in chat
The model name in your chat request must match a model_name in the LiteLLM config. Check available models:
curl -sk https://localhost:4000/v1/models -H "Authorization: Bearer sk-lme-llama-proxy"
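The output of that command can be checked programmatically. The sketch below assumes the endpoint returns the OpenAI-compatible list format, with a top-level data array whose items carry an id field. The function names are hypothetical.

```python
import json


def available_model_names(models_response: str) -> list[str]:
    """Extract model names from the JSON body returned by /v1/models."""
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]


def is_model_configured(requested: str, models_response: str) -> bool:
    """Check whether a requested model name matches exactly (case-sensitive)."""
    return requested in available_model_names(models_response)
```

Matching is exact, so a request for "GPT-4" does not match a configured model_name of "gpt-4".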