Replies: 2 comments 2 replies
-
This is the source for the Mac M1 Air's reading speed limit:
-
This is due to the fact that the entire model needs to be read in order to use it. The 30B q4_0 model needs 15 GB of RAM, so the only real option is to upgrade to a device with 32 GB of RAM (16 GB might also work). As for the SSD speed: I think the reason you are not seeing a few GB/s of read throughput is that llama.cpp is not reading contiguous SSD areas, so it is much slower than what benchmarks show. Try looking up the M1's SSD read speed for non-contiguous data.
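The contiguous-vs-scattered point is easy to see with a small benchmark. This is a hypothetical sketch of my own, not llama.cpp code: it reads the same file either at sequential offsets or at random offsets and reports throughput, which is roughly the difference between a drive benchmark and scattered weight accesses.

```python
import os
import random
import time

def read_throughput(path, block=1 << 20, blocks=256, sequential=True):
    """Read `blocks` chunks of `block` bytes and return MB/s.

    Sequential mode reads contiguous offsets (what SSD benchmarks
    measure); random mode jumps around the file, which is closer to
    scattered weight accesses and is usually much slower on real drives.
    """
    size = os.path.getsize(path)
    if sequential:
        offsets = [i * block for i in range(blocks)]
    else:
        offsets = [random.randrange(0, max(size - block, 1))
                   for _ in range(blocks)]
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        total = 0
        for off in offsets:
            total += len(os.pread(fd, block, off))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total / 1e6 / elapsed
```

On a cached file both modes will look fast, so for a meaningful number the file should be larger than RAM (a multi-GB model file qualifies).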
-
This is very much related to the thread asking why LLaMA 30B uses only 5.8 GB of RAM. After seeing that thread, I got excited to see how the 30B LLaMA model would run on my poor Mac Air M1 with 8 GB of RAM. Well, it works, but excruciatingly slowly: it takes just under a minute to generate a single token. I then checked Activity Monitor and, you guessed it, it runs with about 3.9 GB of RAM while reading the disk at ~900 MB/s. I ran the same model on another computer; RAM usage was similarly low, but the response was much quicker (about 2 seconds per token) with a 2500 MB/s read speed. It seems that disk read speed correlates positively with the responsiveness of the 30B model. I think the disk read speed of the Mac Air M1 can reach about 2000 MB/s, which might make 30B actually viable on this 8 GB machine if the software can utilize the drive's maximum read speed.
I am not knowledgeable in this field. Is what I am proposing feasible?
(On Mac Air M1 8gb):
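The correlation above can be sanity-checked with rough arithmetic. This is my own hedged estimate: the ~20 GB file size and the assumption that every token forces a full re-read of the weights from disk are approximations, not measurements from llama.cpp, so treat the result as a lower bound on latency rather than a prediction.

```python
def seconds_per_token(model_bytes, read_mb_per_s):
    """Lower bound on per-token latency if all weights must be
    streamed from disk for each generated token."""
    return model_bytes / (read_mb_per_s * 1e6)

# 30B q4_0 on disk is roughly 20 GB (approximate figure).
MODEL_BYTES = 20e9

print(seconds_per_token(MODEL_BYTES, 900))   # at the M1 Air's observed ~900 MB/s
print(seconds_per_token(MODEL_BYTES, 2500))  # at the faster machine's 2500 MB/s
```

The faster machine's observed ~2 s/token beats this bound, which suggests a good chunk of the weights stays cached in RAM between tokens; the estimate only shows the direction of the effect, not the exact numbers.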