PocketAI v1.0

Documentation

PocketAI is a Windows desktop app that runs open-source AI language models entirely on your own hardware. No cloud, no subscription, no data leaving your machine.

This documentation covers installation through advanced settings. New users should start with Installation and follow in order.

πŸ’‘ PocketAI is free forever. No paid tiers, no usage limits, no account required.

System Requirements

PocketAI runs on any modern Windows PC. Requirements depend on the model you choose.

ComponentMinimumRecommended
OSWindows 10 (64-bit)Windows 11 (64-bit)
RAM4 GB (for 1B models)8 GB+ (for 7B models)
Storage500 MB + model size10 GB free space
CPUAny x64 processorModern multi-core (4+ cores)
GPUNot required β€” CPU-only in v1.0
⚠️ Running Mistral 7B on under 8 GB of RAM will be slow or fail. Start with Llama 3.2 1B or 3B if you're unsure.

Installation

PocketAI ships as a single .exe file β€” no installer, no setup wizard.

Steps

πŸ›‘οΈ The SmartScreen warning appears because the app is not yet code-signed. It is safe to run. This will be resolved in a future release.

Where PocketAI stores data

C:\Users\<YourName>\PocketAI\
  β”œβ”€β”€ models\          ← downloaded model files (.gguf)
  β”œβ”€β”€ conversations\   ← chat history (.json)
  └── settings.json    ← your preferences

Deleting this folder removes all app data. The PocketAI.exe file can be moved or deleted independently.

First Launch

On first launch you'll see three tabs in the left sidebar:

The status bar at the bottom shows which model is currently loaded. On first launch it reads "No model loaded". Start in the Models tab.

Downloading Models

Models are downloaded once and live on your hard drive. They aren't bundled with the app because they're too large.

How to download

⚠️ Model files range from 0.8 GB to 4.4 GB. Ensure you have enough free disk space before downloading.

To cancel a download in progress, click Cancel. The partial file is deleted automatically. To remove an installed model, click Delete on its row.

Loading a Model

You must load a model into memory before chatting. Only one model can be loaded at a time.

πŸ’‘ PocketAI remembers the last loaded model and reloads it automatically on next launch.

Chatting

Sending a message

Streaming responses

The AI's reply streams in token by token in real time. You don't have to wait for the full response.

Stopping generation

While generating, the Send button becomes Stop. Click it to interrupt at any point.

Conversations

PocketAI saves every conversation automatically.

Conversations are stored as JSON in C:\Users\<YourName>\PocketAI\conversations\.

Available Models

ModelSizeRAMBest for
Llama 3.2 1B0.8 GB2 GB+Fast replies, low-end PCs
Llama 3.2 3B2.0 GB4 GB+Best all-rounder
Phi-3.5 Mini2.2 GB4 GB+Reasoning, code
Gemma 2 2B1.6 GB4 GB+Natural conversation
Mistral 7B4.4 GB8 GB+Highest quality output

Which Model to Choose

4 GB RAM or less

Use Llama 3.2 1B. Fastest and lightest. Good for everyday tasks like summarising and Q&A.

4–8 GB RAM

Use Llama 3.2 3B β€” the recommended default for most users. Or try Gemma 2 2B for more natural conversation.

8 GB+ RAM

Try Mistral 7B for highest quality, or Phi-3.5 Mini for coding and logic tasks.

API Keys

PocketAI supports both local models and cloud models from OpenAI, Anthropic, and any OpenAI-compatible endpoint. You supply your own API key β€” it is stored locally and never transmitted to Vector Dynamics.

Adding an OpenAI or Anthropic key

  1. Open Settings from the sidebar.
  2. Enter your key in the OpenAI API Key or Anthropic API Key field.
  3. Click Save API keys.
  4. Go to Models β€” the cloud section will now show available models. Click Use to activate one.

Custom OpenAI-compatible endpoint

If you have a local server like Ollama or LM Studio, or a self-hosted endpoint:

  1. Enter the base URL in Custom Endpoint URL (e.g. http://localhost:11434/v1).
  2. Optionally add an API key if required.
  3. Enter the model name your server expects (e.g. llama3, mistral).
  4. Save, then activate from the Models tab under Custom Endpoint.
πŸ” Keys are stored in ~/PocketAI/settings.json. You can delete or clear them at any time from Settings.

Temperature

Range: 0.0 – 2.0  Β·  Default: 0.7

Controls randomness in output.

Max Tokens

Default: 2048

The maximum number of tokens generated per response. Roughly ΒΎ of a word per token β€” 2048 tokens β‰ˆ 1,500 words. Increase if responses get cut off; decrease for shorter, faster replies.

Context Length

Default: 4096

How many tokens the model holds in memory at once, including conversation history. Larger windows allow longer conversations but use more RAM.

⚠️ Changing context length requires reloading the model. Unload and reload after saving this setting.

System Prompt

Default: "You are a helpful AI assistant. Be concise and accurate."

A hidden instruction sent to the model before every conversation. Use it to set a persona, a role, or formatting rules.

# Coding assistant
You are an expert software engineer. Answer with clean,
well-commented code. Prefer concise explanations.

# Writing coach
You are a professional editor. Review text for clarity
and tone. Be direct and constructive.

# Study buddy
You are a tutor. Explain simply, use analogies, and check
understanding with follow-up questions.

FAQ

Is PocketAI really free?

Yes. No paid version, no premium tier, no in-app purchases. Free to download and use without limits.

Does PocketAI send data to the internet?

Only when downloading a model β€” those files come from Hugging Face. Conversations and usage data never leave your machine.

Why is the first response slow?

The model loads fully into RAM before generating. Subsequent messages in the same session are faster. Smaller models warm up faster.

Can I use my own models?

Not via the UI in v1.0, but you can place any compatible .gguf file in PocketAI\models\ and the app will detect it. UI support for custom models is planned.

Windows shows a SmartScreen warning. Is it safe?

Yes. The warning appears because PocketAI is not yet code-signed. Click More info β†’ Run anyway. Code signing is planned for the next release.

Does it require a GPU?

No. PocketAI v1.0 runs entirely on CPU. GPU acceleration is on the roadmap.

Mac or Linux support?

Not yet β€” v1.0 is Windows only. Mac and Linux builds are planned.

Can I use my own OpenAI or Anthropic API key?

Yes. PocketAI supports cloud models alongside local ones. Go to Settings β†’ API Keys, enter your key, then head to the Models tab to select a cloud model. Your key is stored locally and is only used to make requests directly to the respective API β€” it is never sent to Vector Dynamics or any third party.

Which cloud providers are supported?

PocketAI supports OpenAI (GPT-4o, GPT-4o mini, GPT-3.5 Turbo), Anthropic (Claude Sonnet 4.6, Claude Haiku 4.5), and any custom OpenAI-compatible endpoint β€” such as Ollama, LM Studio, or your own deployed server.

Do cloud API requests cost money?

PocketAI itself is free. Cloud API usage is billed directly by the provider to your account (OpenAI, Anthropic, etc.). Local models are always free with no per-message cost.

Is my API key stored securely?

Yes. Keys are stored in a plain JSON file in your user directory (~/PocketAI/settings.json). They are never transmitted to Vector Dynamics β€” only to the API endpoint you configure. You can clear them at any time from Settings.

Can I use a custom endpoint like Ollama or LM Studio?

Yes. In Settings β†’ API Keys, enter your endpoint URL (e.g. http://localhost:11434/v1), optionally an API key, and the model name you want to use. Then activate it from the Models tab under Custom Endpoint.

Changelog

v1.0.0 β€” June 2026