llama.cpp/prompts/LLM-questions.txt

In the context of LLMs, what is "Attention"?
In the context of LLMs, what is a completion?
In the context of LLMs, what is a prompt?
In the context of LLMs, what is GELU?
In the context of LLMs, what is RELU?
In the context of LLMs, what is softmax?
In the context of LLMs, what is decoding?
In the context of LLMs, what is encoding?
In the context of LLMs, what is tokenizing?
In the context of LLMs, what is an embedding?
In the context of LLMs, what is quantization?
In the context of LLMs, what is a tensor?
In the context of LLMs, what is a sparse tensor?
In the context of LLMs, what is a vector?
In the context of LLMs, how is attention implemented?
In the context of LLMs, why is attention all you need?
In the context of LLMs, what is "RoPe" and what is it used for?
In the context of LLMs, what is "LoRA" and what is it used for?
In the context of LLMs, what are weights?
In the context of LLMs, what are biases?
In the context of LLMs, what are checkpoints?
In the context of LLMs, what is "perplexity"?
In the context of LLMs, what are models?
In the context of machine-learning, what is "catastrophic forgetting"?
In the context of machine-learning, what is "elastic weight consolidation (EWC)"?
In the context of neural nets, what is a hidden layer?
In the context of neural nets, what is a convolution?
In the context of neural nets, what is dropout?
In the context of neural nets, what is cross-entropy?
In the context of neural nets, what is over-fitting?
In the context of neural nets, what is under-fitting?
What is the difference between an interpreted computer language and a compiled computer language?
In the context of software development, what is a debugger?
When processing using a GPU, what is off-loading?
When processing using a GPU, what is a batch?
When processing using a GPU, what is a block?
When processing using a GPU, what is the difference between a batch and a block?
When processing using a GPU, what is a scratch tensor?
When processing using a GPU, what is a layer?
When processing using a GPU, what is a cache?
When processing using a GPU, what is unified memory?
When processing using a GPU, what is VRAM?
When processing using a GPU, what is a kernel?
When processing using a GPU, what is "metal"?
In the context of LLMs, what are "Zero-Shot", "One-Shot" and "Few-Shot" learning models?
In the context of LLMs, what is the "Transformer-model" architecture?
In the context of LLMs, what is "Multi-Head Attention"?
In the context of LLMs, what is "Self-Attention"?
In the context of transformer-model architectures, how do attention mechanisms use masks?