Mirror of https://github.com/ggerganov/llama.cpp, synced 2025-09-11 07:11:04 +02:00

Default Branch

00681dfc16 · CUDA: Add fastdiv to k_bin_bcast*, giving 1-3% E2E performance (#15872) · Updated 2025-09-10 22:04:03 +02:00

Branches

b6fb92f564 · cont : add flags for debugging and disabling concurrency · Updated 2025-09-10 20:48:32 +02:00

3
2

f50c60d654 · fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl · Updated 2025-09-10 20:30:22 +02:00

2
1

7e1f35ce05 · Add docker protocol support for llama-server model loading · Updated 2025-09-10 14:02:48 +02:00

10
1

333c9ead02 · llama : bump max seq limit from 64 to 256 · Updated 2025-09-10 12:53:38 +02:00

9
1

b3c75d2a72 · server : adjust prompt similarity thold + add logs · Updated 2025-09-10 10:14:38 +02:00

10
1

833d03c25d · convert : for FP8, use scale type to decide auto type · Updated 2025-09-09 20:36:34 +02:00

14
21

e582f1ac63 · convert : fix no-lazy dtypes from direct safetensors · Updated 2025-09-09 20:33:01 +02:00

14
8

0d5cfed596 · Merge branch 'master' into compilade/convert-prequant · Updated 2025-09-09 20:23:06 +02:00

14
5

3f62ee8bee · metal : back to a single queue per device · Updated 2025-09-09 16:06:46 +02:00

18
9

3f62ee8bee · metal : back to a single queue per device · Updated 2025-09-09 16:06:46 +02:00

18
9

ed4d8f22b2 · Merge branch 'master' into cisc/grok-2 · Updated 2025-09-08 23:28:08 +02:00

22
31

296ca594a2 · fix build_attn · Updated 2025-09-08 22:55:48 +02:00

23
6

5dc82f809c · cuda : format device_id from cudaDeviceProp instead of cudaDeviceGetPCIBusId · Updated 2025-09-08 15:03:10 +02:00

63
5

ff1566e2cd · explicit offline · Updated 2025-09-06 13:51:33 +02:00

48
5

053dc6b380 · use ggml_time_us · Updated 2025-09-06 12:43:43 +02:00

48
7

7b717fb4b2 · Rewrite llama-run to use llama-server · Updated 2025-09-05 18:22:36 +02:00

55
1

7915399596 · Initial plan · Updated 2025-09-05 12:26:26 +02:00

55
1

9f2636b7dc · wip · Updated 2025-09-01 10:17:56 +02:00

104
1

d8c17629ac · examples : add compare-mlx · Updated 2025-09-01 08:10:01 +02:00

107
1

4317d5abf5 · wip · Updated 2025-08-28 12:55:21 +02:00

138
1