Commit Graph

332 Commits

Author SHA1 Message Date
Meng, Hengyu b864b50ce5
[SYCL] Align GEMM dispatch (#7566)
* align GEMM dispatch
2024-05-29 07:00:24 +08:00
Aarni Koskela 9146d36fe7
Readme: add akx/ggify to tools (#1484) 2024-05-26 22:09:42 +10:00
Georgi Gerganov 74f33adf5f
readme : remove trailing space (#7469) 2024-05-23 17:43:18 +03:00
Raj Hammeer Singh Hada 8b94e799df
readme : add Bunny in supported models [no ci] (#7469) 2024-05-23 15:30:13 +03:00
Victor Nogueira dacfcebd60
readme : add GPT-NeoX + Pythia to the list of supported models (#7491) 2024-05-23 15:12:43 +03:00
Georgi Gerganov fabf30b4c4
llama : remove Persimmon (#7408)
* llama : remove Persimmon

* requirements : remove
2024-05-21 02:35:28 +10:00
Bingan 26cd4237bc
Update README.md (#7410) 2024-05-20 11:55:34 +02:00
slaren d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Gavin Zhao 82ca83db3c
ROCm: use native CMake HIP support (#5966)
Supercedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe is now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.

The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
2024-05-17 17:03:03 +02:00
Vaibhav Srivastav ad52d5c259
doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)
* chore: add references to the quantisation space.

* fix grammer lol.

* Update README.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Update README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-16 15:38:43 +10:00
Daniel Bevenius 8f7080bf48
readme : remove stray double quote (#7310)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-15 23:41:03 +02:00
Andrei d11afd6652
llava : fix moondream support (#7163)
* Revert "Revert "llava : add support for moondream vision language model (#6899)""

This reverts commit 9da243b36a.

* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00
Georgi Gerganov d46dbc76f8
readme : add scheduled server workflow status badge 2024-05-09 16:40:42 +03:00
l3utterfly 0961d86604
readme : add app (#6371)
* added Layla to supported UIs

* Update README.md
2024-05-09 16:32:40 +03:00
Georgi Gerganov 9da243b36a
Revert "llava : add support for moondream vision language model (#6899)"
This reverts commit 46e12c4692.
2024-05-08 22:14:39 +03:00
Jeximo c780e75305
Further tidy on Android instructions README.md (#7077)
* Further tidy on Android instructions README.md

Fixed some logic when following readme direction

* Clean up redundent information

A new user arriving will see simple directions on llama.cpp homepage

* corrected puncuation

Period after cmake, colon after termux

* re-word for clarity

method seems to be more correct, instead of alternative in this context

* Organized required packages per build type

building llama.cpp with NDK on a pc doesn't require installing clang, cmake, git, or wget in termux.

* README.md

corrected title

* fix trailing whitespace
2024-05-08 02:26:43 +02:00
Georgi Gerganov 53d6c52e22
readme : update hot topics 2024-05-07 21:43:13 +03:00
Georgi Gerganov bcdee0daa7
minor : fix trailing whitespace 2024-05-06 09:31:30 +03:00
Lyle Dean ca36326020
readme : add note that LLaMA 3 is not supported with convert.py (#7065) 2024-05-05 08:21:46 +03:00
Jeximo cf768b7e71
Tidy Android Instructions README.md (#7016)
* Tidy Android Instructions README.md

Remove CLBlast instructions(outdated), added OpenBlas.

* don't assume git is installed

Added apt install git, so that git clone works

* removed OpenBlas

Linked to Linux build instructions

* fix typo

Remove word "run"

* correct style

Co-authored-by: slaren <slarengh@gmail.com>

* correct grammar

Co-authored-by: slaren <slarengh@gmail.com>

* delete reference to Android API

* remove Fdroid reference, link directly to Termux

Fdroid is not required

Co-authored-by: slaren <slarengh@gmail.com>

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
Olivier Chafik b8a7a5a90f
build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964)
* readme: cmake . -B build && cmake --build build

* build: fix typo

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* build: drop implicit . from cmake config command

* build: remove another superfluous .

* build: update MinGW cmake commands

* Update README-sycl.md

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* build: reinstate --config Release as not the default w/ some generators + document how to build Debug

* build: revert more --config Release

* build: nit / remove -H from cmake example

* build: reword debug instructions around single/multi config split

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-04-29 17:02:45 +01:00
Georgi Gerganov 24affa7db3
readme : update hot topics 2024-04-29 17:06:19 +03:00
vik 46e12c4692
llava : add support for moondream vision language model (#6899)
* add support for moondream vision language model

This required making the following changes to the CLIP model:

1. Support for patch embedding bias.
2. Make class embedding and pre-layernorm optional.
3. Add support for post-layernorm.

* Update examples/llava/clip.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-25 22:38:31 +03:00
BarfingLemurs 3fe0596c18
readme : update model list (#6908)
* Update README.md

* missing space

* llama3 !
2024-04-25 16:52:28 +03:00
Johannes Gäßler 784e11dea1
README: add graphic for matrix multiplication (#6881) 2024-04-24 21:29:13 +02:00
Georgi Gerganov 40f74e4d73
llama : add option to render special/control tokens (#6807)
* make : fix common dep on llama.h

* llama : add option to render special tokens

* readme : add API change notice

ggml-ci

* swift : fix build
2024-04-21 18:36:45 +03:00
Jan Boon e8d35f47cb
doc : add link to falcon (#6789) 2024-04-21 15:35:40 +03:00
Mohammadreza Hendiani 2cca09d509
readme : add Fedora instructions (#6783)
* added fedora to list of distros that may need the package (the packages have the same name on Fedora)

* how to add clblast that is avalible in the fedora repos
2024-04-21 15:32:05 +03:00
nopperl 9958c81b79
Implement the OLMo architecture (#6741)
* implement olmo architecture

* remove unused variable

* remove unused moe branch

* remove check for weight

* remove superfluous moe, bias and rope tensors

* clarified comment

* fix clamp_kqv setting

* remove obsolete parameter name filter
2024-04-19 11:35:54 +02:00
Yaroslav 8dd1ec8b3f
readme : add UI (#6724)
* Update README.md

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-17 15:47:50 +03:00
Pierrick Hymbert 4bd0f93e4a
model: support arch `DbrxForCausalLM` (#6515)
* model: dbrx convert to gguf
#6344

* llama: support dbrx
#6344

* doc: dbrx: add the model as supported

* scripts: get-wikitext-2 add unzip

* llama: increase maximum experts allowed

* llama: factorize moe graph implementation between grok, mixtral and dbrx


---------

Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Rene Leonhardt 5c4d767ac0
chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
Pierrick Hymbert 67fac4b95f
docs : how to add a model (#6565)
* docs: how to add a model

* docs: model: typo and docs

* docs: model: add prevision on RoPE

* docs: model: rephrasing README.md

* docs: model: rephrasing README.md

* docs: model: README.md fix trailing spaces

* docs : some fixes

* Update README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-10 09:58:48 +03:00
Artem Zinnatullin 29122d32ac
readme : fix ROCm link (#6579) 2024-04-10 09:49:12 +03:00
sjxx b231b37b09
readme : update UI list (#6560) 2024-04-10 09:34:00 +03:00
Jiří Sejkora ba5e134e07
readme: fix typo in amdgpu target name (#6573) 2024-04-10 00:23:02 +02:00
Jan Boon beea6e1b16
llama : save and restore kv cache for single seq id (#6341)
* llama : save and restore kv cache for single seq id

* remove trailing whitespace

* respond error in case there's no space in the kv cache

* add kv seq save restore to test case

* add --slot-save-path arg to enable save restore and restrict save location

* Returning 0 for some cases, instead of asserting.

* cleanup error cases

* rename sequence state functions

* rename state get set functions

* add previous function names back in with DEPRECATED notice

* update doc

* adjust endpoints to preferred style

* fix restoring zero cell count

* handle seq rm return value

* unused param

* keep in the size check

* fix return types

* add server test case for slot save restore

* cleanup

* add cake

* cleanup style

* add special

* removing a whole sequence never fails

* move sequence state file functionality from server to llama to match session api and add version tags

* catch exceptions on save as well

* error log messages

* check types for stricter restore

* update server doc

* readme : update API changes date

* strict filename validation

* move include, reject bom as well

* also reject empty filename

* reject whitespace and trailing dot

---------

Co-authored-by: Martin Evans <martindevans@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-08 15:43:30 +03:00
Firat d752327c33
Adding KodiBot to UI list (#6535)
KodiBot is free and open source ai chat app released under the GNU General Public License.
2024-04-08 09:48:29 +02:00
Mark Fairbairn 855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525) 2024-04-07 20:52:19 +02:00
DAN™ e0717e751e
Add GritLM as supported models. (#6513) 2024-04-07 19:33:59 +02:00
Hoang Nguyen d0f5deebf8
readme : update UI list (#6503)
* Add MindMac to UI list

* Update proprietary description

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
alexpinel a307375c02
readme : add Dot to UI list (#6487) 2024-04-04 13:22:50 -04:00
Jun Jie b660a5729e
readme : fix typo (#6481) 2024-04-04 13:16:37 -04:00
bryanSwk bb43cf7e9d
llama : add SEA-LION support (#6448)
* initial commit for sealion support

* add sealion support

* minor fix

* q/k ln and pos_embd only if required

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* minor : clear whitespaces

---------

Co-authored-by: bryan <bryansiow@aisingapore.org>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-03 21:05:10 +03:00
Francisco Melo 154d4ee39c
readme : add feature-rich rust bindings (#6465) 2024-04-03 20:53:37 +03:00
Georgi Gerganov 076b08649e
readme : update hot topics 2024-04-03 16:11:15 +03:00
Georgi Gerganov c50a82ce0f
readme : update hot topics 2024-03-31 11:56:30 +03:00
0cc4m ba0c7c70ab
Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
* Fix Vulkan no kv offload incoherence

* Add k-quant mul mat mat shaders

* Rework working buffer allocation, reduces vram use noticeably

Clean up cpu assist code, replaced with ggml-backend offload function

* Default to all dedicated GPUs

* Add fallback for integrated GPUs if no dedicated GPUs are found

* Add debug info which device is allocating memory

* Fix Intel dequant issue

Fix validation issue

* Fix Vulkan GGML_OP_GET_ROWS implementation

* Clean up merge artifacts

* Remove Vulkan warning
2024-03-29 17:29:21 +01:00
hxer7963 069574775c
[Model] Add support for xverse (#6301)
* Support xverse model convert to gguf format.

* 1. Convert xverse models to gguf;
2. Add LLM_ARCH_XVERSE inference in llama.cpp;
3. Add xverse item in Supported models in README.md;

* * gguf-py: remove redundant logs
* llama: remove the init_mapping_prefetch custom parameter

* llama.cpp: Include the changes from #6122 to exclude the unused outputs of the last layers.

* - Fix format issues
- Remove duplicate set kqv_out to llm_build_kv

* Update llama.cpp

---------

Co-authored-by: willhe <willhe@xverse.cn>
Co-authored-by: willhe <hexin@xverse.cn>
2024-03-29 14:37:03 +01:00
zhouwg b910287954
readme : add project (#6356)
* readme: add Android UI binding

* Update README.md
2024-03-29 09:33:46 +02:00