PocketAI v1.0

Documentation

PocketAI is a Windows desktop app that runs open-source AI language models entirely on your own hardware. No cloud, no subscription, no data leaving your machine.

This documentation covers installation through advanced settings. New users should start with Installation and follow in order.

💡 PocketAI is free forever. No paid tiers, no usage limits, no account required.

System Requirements

PocketAI runs on any modern Windows PC. Requirements depend on the model you choose.

Component	Minimum	Recommended
OS	Windows 10 (64-bit)	Windows 11 (64-bit)
RAM	4 GB (for 1B models)	8 GB+ (for 7B models)
Storage	500 MB + model size	10 GB free space
CPU	Any x64 processor	Modern multi-core (4+ cores)
GPU	Not required — CPU-only in v1.0

⚠️ Running Mistral 7B on under 8 GB of RAM will be slow or fail. Start with Llama 3.2 1B or 3B if you're unsure.

Installation

PocketAI ships as a single .exe file — no installer, no setup wizard.

Steps

Go to vectordynamics.xyz and click Download for Windows.
Save PocketAI.exe anywhere on your PC.
Double-click to launch. Windows may show a SmartScreen warning — click More info → Run anyway.
The app opens. No installation step required.

🛡️ The SmartScreen warning appears because the app is not yet code-signed. It is safe to run. This will be resolved in a future release.

Where PocketAI stores data

C:\Users\<YourName>\PocketAI\
  ├── models\          ← downloaded model files (.gguf)
  ├── conversations\   ← chat history (.json)
  └── settings.json    ← your preferences

Deleting this folder removes all app data. The PocketAI.exe file can be moved or deleted independently.

First Launch

On first launch you'll see three tabs in the left sidebar:

💬 Chat — where you have conversations.
🤖 Models — where you download and manage AI models.
⚙️ Settings — where you configure generation parameters.

The status bar at the bottom shows which model is currently loaded. On first launch it reads "No model loaded". Start in the Models tab.

Downloading Models

Models are downloaded once and live on your hard drive. They aren't bundled with the app because they're too large.

How to download

Click 🤖 Models in the sidebar.
Browse the library. Each row shows the model name, description, and download size.
Click Download. A progress bar appears.
When complete, a Load button replaces the Download button.

⚠️ Model files range from 0.8 GB to 4.4 GB. Ensure you have enough free disk space before downloading.

To cancel a download in progress, click Cancel. The partial file is deleted automatically. To remove an installed model, click Delete on its row.

Loading a Model

You must load a model into memory before chatting. Only one model can be loaded at a time.

In the Models tab, click Load on an installed model.
The status bar shows "Loading…" while the model initialises.
When ready, the status bar shows "Model: [name]" and the card shows ✓ Loaded.
Switch to Chat and start talking.

💡 PocketAI remembers the last loaded model and reloads it automatically on next launch.

Chatting

Sending a message

Type in the input box at the bottom of the chat panel.
Press Enter to send, Shift + Enter for a newline.

Streaming responses

The AI's reply streams in token by token in real time. You don't have to wait for the full response.

Stopping generation

While generating, the Send button becomes Stop. Click it to interrupt at any point.

Conversations

PocketAI saves every conversation automatically.

The left panel inside Chat shows your conversation history, newest first.
Click any conversation to switch to it.
Click ＋ New Chat to start fresh.
Conversation titles are set from your first message automatically.

Conversations are stored as JSON in C:\Users\<YourName>\PocketAI\conversations\.

Available Models

Model	Size	RAM	Best for
Llama 3.2 1B	0.8 GB	2 GB+	Fast replies, low-end PCs
Llama 3.2 3B	2.0 GB	4 GB+	Best all-rounder
Phi-3.5 Mini	2.2 GB	4 GB+	Reasoning, code
Gemma 2 2B	1.6 GB	4 GB+	Natural conversation
Mistral 7B	4.4 GB	8 GB+	Highest quality output

Which Model to Choose

4 GB RAM or less

Use Llama 3.2 1B. Fastest and lightest. Good for everyday tasks like summarising and Q&A.

4–8 GB RAM

Use Llama 3.2 3B — the recommended default for most users. Or try Gemma 2 2B for more natural conversation.

8 GB+ RAM

Try Mistral 7B for highest quality, or Phi-3.5 Mini for coding and logic tasks.

API Keys

PocketAI supports both local models and cloud models from OpenAI, Anthropic, and any OpenAI-compatible endpoint. You supply your own API key — it is stored locally and never transmitted to Vector Dynamics.

Adding an OpenAI or Anthropic key

Open Settings from the sidebar.
Enter your key in the OpenAI API Key or Anthropic API Key field.
Click Save API keys.
Go to Models — the cloud section will now show available models. Click Use to activate one.

Custom OpenAI-compatible endpoint

If you have a local server like Ollama or LM Studio, or a self-hosted endpoint:

Enter the base URL in Custom Endpoint URL (e.g. http://localhost:11434/v1).
Optionally add an API key if required.
Enter the model name your server expects (e.g. llama3, mistral).
Save, then activate from the Models tab under Custom Endpoint.

🔐 Keys are stored in ~/PocketAI/settings.json. You can delete or clear them at any time from Settings.

Temperature

Range: 0.0 – 2.0 · Default: 0.7

Controls randomness in output.

0.1 – 0.4 — Focused, deterministic. Good for facts and code.
0.5 – 0.8 — Balanced. Works for most use cases.
1.0 – 2.0 — Creative and varied. Good for brainstorming or fiction.

Max Tokens

Default: 2048

The maximum number of tokens generated per response. Roughly ¾ of a word per token — 2048 tokens ≈ 1,500 words. Increase if responses get cut off; decrease for shorter, faster replies.

Context Length

Default: 4096

How many tokens the model holds in memory at once, including conversation history. Larger windows allow longer conversations but use more RAM.

⚠️ Changing context length requires reloading the model. Unload and reload after saving this setting.

System Prompt

Default: "You are a helpful AI assistant. Be concise and accurate."

A hidden instruction sent to the model before every conversation. Use it to set a persona, a role, or formatting rules.

# Coding assistant
You are an expert software engineer. Answer with clean,
well-commented code. Prefer concise explanations.

# Writing coach
You are a professional editor. Review text for clarity
and tone. Be direct and constructive.

# Study buddy
You are a tutor. Explain simply, use analogies, and check
understanding with follow-up questions.

FAQ

Is PocketAI really free?

Yes. No paid version, no premium tier, no in-app purchases. Free to download and use without limits.

Does PocketAI send data to the internet?

Only when downloading a model — those files come from Hugging Face. Conversations and usage data never leave your machine.

Why is the first response slow?

The model loads fully into RAM before generating. Subsequent messages in the same session are faster. Smaller models warm up faster.

Can I use my own models?

Not via the UI in v1.0, but you can place any compatible .gguf file in PocketAI\models\ and the app will detect it. UI support for custom models is planned.

Windows shows a SmartScreen warning. Is it safe?

Yes. The warning appears because PocketAI is not yet code-signed. Click More info → Run anyway. Code signing is planned for the next release.

Does it require a GPU?

No. PocketAI v1.0 runs entirely on CPU. GPU acceleration is on the roadmap.

Mac or Linux support?

Not yet — v1.0 is Windows only. Mac and Linux builds are planned.

Can I use my own OpenAI or Anthropic API key?

Yes. PocketAI supports cloud models alongside local ones. Go to Settings → API Keys, enter your key, then head to the Models tab to select a cloud model. Your key is stored locally and is only used to make requests directly to the respective API — it is never sent to Vector Dynamics or any third party.

Which cloud providers are supported?

PocketAI supports OpenAI (GPT-4o, GPT-4o mini, GPT-3.5 Turbo), Anthropic (Claude Sonnet 4.6, Claude Haiku 4.5), and any custom OpenAI-compatible endpoint — such as Ollama, LM Studio, or your own deployed server.

Do cloud API requests cost money?

PocketAI itself is free. Cloud API usage is billed directly by the provider to your account (OpenAI, Anthropic, etc.). Local models are always free with no per-message cost.

Is my API key stored securely?

Yes. Keys are stored in a plain JSON file in your user directory (~/PocketAI/settings.json). They are never transmitted to Vector Dynamics — only to the API endpoint you configure. You can clear them at any time from Settings.

Can I use a custom endpoint like Ollama or LM Studio?

Yes. In Settings → API Keys, enter your endpoint URL (e.g. http://localhost:11434/v1), optionally an API key, and the model name you want to use. Then activate it from the Models tab under Custom Endpoint.

Changelog

v1.0.0 — June 2026

Initial release for Windows 10 / 11 (64-bit)
Five curated models: Llama 3.2 1B & 3B, Phi-3.5 Mini, Gemma 2 2B, Mistral 7B
In-app model download with progress tracking and cancellation
Streaming chat interface with conversation history
Configurable temperature, max tokens, context length, system prompt
Cloud model support: OpenAI, Anthropic, custom OpenAI-compatible endpoints
Bring-your-own API key — stored locally, never shared
Automatic model reload on next launch
Fully offline after local model download