Tabby's philosophy is to achieve a completion rate comparable to Codex/Copilot with a model of under 1B parameters; with BF16/FP16 support, that keeps VRAM requirements to 2 GB or less. This may seem impossible given a model roughly 10x smaller than Codex, but it is definitely achievable, especially in an on-premises environment where customers want to keep their code behind a firewall.
Related research includes [1] (hint: combining code search with an LLM).
That doesn't answer the question: can anyone without more VRAM than sense actually run it as-is, or should we wait until they reach their allegedly impossible aspirational goal?
The very first line of a post like this should be the required specs and whether the trained model weights are actually available; otherwise it's just straight-up clickbait.
Models are usually trained in fp16, meaning 16 bits = 2 bytes per parameter. So a 1B model would take 2GB to begin with, and can be optimised down from there with sparsification and/or quantization. A 50% reduction in size might be noticeably worse than the original, but still useful for boilerplate or other highly predictable patterns.
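The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is a hypothetical helper, not anything from Tabby itself, and it only counts the weights; real inference also needs memory for activations and the KV cache.

```python
# Approximate VRAM needed just to hold model weights, by precision.
# Hypothetical illustration; ignores activations, KV cache, and framework overhead.

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Return approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"1B params @ {label}: ~{weight_vram_gb(1.0, nbytes):.1f} GB")
```

So a 1B model is roughly 2 GB in fp16/bf16, and int8 quantization (the 50% reduction mentioned above) brings it to about 1 GB.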