
Anyone dare to guess how fast GH Copilot would be if it ran locally?

My main problem with Copilot is latency/speed - I would shell out for a 4090 if it meant I could run a local Copilot model that's super fast, low-latency, and explores deeper suggestions.



One of the weaknesses of transformers is that inference is inherently serial, one token at a time, and the entire model's weights need to be read from memory once per generated token. This means that inference (after prompt processing) is bounded by memory bandwidth rather than compute.
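
As a rough back-of-the-envelope sketch (the numbers are illustrative assumptions: an RTX 4090 with ~1 TB/s of memory bandwidth and a 7B model quantized down to ~4 GB), the bandwidth bound gives you a ceiling on decode speed:

```python
# Rough upper bound on decode speed: every generated token requires
# streaming all model weights through memory once, so
#   tokens/sec <= memory_bandwidth / model_size_in_bytes
# (ignores KV-cache reads, kernel overhead, and batching)

MEM_BANDWIDTH_GBPS = 1008   # RTX 4090 spec, ~1 TB/s (assumption)
MODEL_SIZE_GB = 4.0         # e.g. a 7B model at ~4-bit quantization (assumption)

max_tokens_per_sec = MEM_BANDWIDTH_GBPS / MODEL_SIZE_GB
print(f"Theoretical ceiling: ~{max_tokens_per_sec:.0f} tokens/sec")
# -> roughly 250 tokens/sec; real decode is lower, but it shows why a
#    local model on a 4090 can feel fast once the prompt is processed.
```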

That said, local solutions tend to have lower latency than cloud ones simply because you skip the network round trip.



