Omniscope Evo with an Enterprise licence now lets you configure OpenAI-compatible providers (namely, LLM inference engines that expose an OpenAI-compatible chat completions endpoint).
There are many models, model variants/quantisations and providers available for running locally on your laptop, on prem in a company server, or self-hosted on a private cloud. This article focuses on a single simple example: llama.cpp on a MacBook Pro M2 (or similar/later) running Qwen_QwQ-32B-Q4_K_M.gguf.
For other providers or environments, consult the provider's documentation.
Install llama.cpp
The excellent llama.cpp runs AI models in the GGUF format on CPU only or with GPU acceleration (Apple Silicon is a first-class example). Quantized GGUF files are widely available under the "Quantizations" section for a given model on huggingface.co, allowing much larger models to run on commodity hardware.
On a Mac (such as a MacBook Pro M2 with 32 GB of memory), install llama.cpp following the project's instructions. We recommend using Homebrew:
brew install llama.cpp
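To confirm the install before continuing, you can ask the server binary for its version. The command below assumes a recent llama.cpp build in which llama-server supports the --version flag; if it doesn't, llama-server --help will at least confirm the binary is on your PATH:

llama-server --version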
Download a model
Visit huggingface.co and find the model you want. For example, today I'm playing with https://huggingface.co/Qwen/QwQ-32B.
On the right, follow the Quantizations section. Typically several people offer these, and you can usually pick the top one. Bartowski or Unsloth are good bets, but in this case Qwen themselves provide the top choice: https://huggingface.co/Qwen/QwQ-32B-GGUF
Under "Files and versions", pick the .gguf file you want. If unsure, pick the q4_k_m variant. In this case it's:
qwq-32b-q4_k_m.gguf
(meaning a 32 billion parameter model, quantized to approx. 4.8 bits per weight).
It's a big file; this example is around 20 GB.
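One convenient way to fetch the file from the command line is the Hugging Face CLI. The command below is a sketch assuming you have installed the CLI (for example via pip install -U "huggingface_hub[cli]") and want the file saved into a local models directory:

huggingface-cli download Qwen/QwQ-32B-GGUF qwq-32b-q4_k_m.gguf --local-dir models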
Run the model
In Terminal, run the following, adapted as required:
llama-server --port 8081 -m models/qwq-32b-q4_k_m.gguf
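Before configuring Omniscope, you can check that the OpenAI-compatible endpoint is responding. The curl call below is a minimal sketch: llama-server exposes a chat completions endpoint at /v1/chat/completions, and the model name in the request body is purely illustrative, since the server answers with whichever model it was started with:

curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwq-32b-q4_k_m", "messages": [{"role": "user", "content": "Say hello in one short sentence."}]}'

If the response is JSON containing a choices array with the model's reply, the server is ready for Omniscope.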
Configure Omniscope
Following on from the introduction in How to enable AI in Omniscope, go back into AI Settings in Omniscope, and add a Custom provider. You'll need an Enterprise licence; contact support@visokio.com for a trial.
All you need to configure is the Endpoint base URL. For example, I specified port 8081 above (the default is 8080), so when running llama.cpp on the same machine as Omniscope the base URL will be http://127.0.0.1:8081/ as shown here:
Now go to the Report Ninja integration settings and pick the model in the Default model drop-down.
Try it out
Now try using Instant Dashboard on the home page to upload a data file and see a dashboard.
Note: large local models like 32B ones are not very "instant" and can take a minute or more to respond. You may need to go to the Custom provider Advanced settings and increase the response timeout.
Performance
If performance is poor, try a smaller model such as DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf, which is much faster but less capable: it is more likely to misunderstand you or to misinterpret the Omniscope dashboard. Also consider a newer, faster machine, such as the latest MacBook Pro.
Caveats
Local models are typically only compatible with Report Ninja. The AI Block and Custom SQL typically require real OpenAI models rather than OpenAI-compatible models.
Smaller models do not perform well enough to be useful.
GPU acceleration is essential for acceptable response times.