Building my own LLM

Yesterday I started building my own LLM because I want to avoid licensing issues for basic chatbot use cases. Training these things is super slow. I won't post the code here because I plan to commercialise it, but here's a screenshot of the progress:



I've set it to use the following parameters; sadly my GPU is rather small, so it is taking ages. After the config I've included two generic sketches of how settings like these typically plug together.

# Sampling settings (these control generation, not training)
TEMPERATURE=0.7
TOP_K=40
TOP_P=0.95
REPEAT_PENALTY=1.1
MAX_NEW_TOKENS=200

# Training hyperparameters
BLOCK_SIZE=32
BATCH_SIZE=8
NUM_EPOCHS=5
LEARNING_RATE=0.001
CHECKPOINT_INTERVAL=1
USE_GPU=true

# File and directory settings
MODEL_DIR=models/
CORPUS_DIR=repo/
LOG_DIR=logs/
BACKUP_DIR=bak/
TOKENIZER_PATH=

# Internal settings
EVAL_INTERVAL=100
EVAL_ITERS=100
SEED=1337
INPUT_FILE=input.txt
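
A note on the first group: temperature, top-k, top-p and the repeat penalty only kick in at generation time. Roughly, they get applied to the model's next-token logits like this. This is a generic PyTorch sketch, not my code, and the function name sample_next is made up:

import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.7, top_k=40, top_p=0.95):
    # logits: 1-D tensor of raw next-token scores from the model.
    # REPEAT_PENALTY=1.1 would first divide the logits of already-generated
    # tokens by 1.1; that needs the generation history, so it is omitted here.
    logits = logits / temperature              # TEMPERATURE: <1 sharpens, >1 flattens
    k = min(top_k, logits.size(-1))
    kth_best = torch.topk(logits, k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth_best, float("-inf"))  # TOP_K cutoff
    probs = F.softmax(logits, dim=-1)
    sorted_p, order = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_p, dim=-1)
    sorted_p = sorted_p.masked_fill(cum - sorted_p > top_p, 0.0)   # TOP_P (nucleus)
    sorted_p = sorted_p / sorted_p.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_p, num_samples=1)
    return order.gather(-1, choice)            # token id in the original vocabulary

# Example: draw one token id from random logits over a 100-token vocabulary.
# A real generation loop would call this up to MAX_NEW_TOKENS=200 times.
next_id = sample_next(torch.randn(100))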
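
And since I'm keeping my actual code private, here is a stand-in for the training side: a minimal character-level loop in PyTorch with a toy bigram model in place of my architecture. Everything except the config values above (the hyperparameters, INPUT_FILE, MODEL_DIR) is illustrative.

import os
import torch
import torch.nn as nn
import torch.nn.functional as F

BLOCK_SIZE, BATCH_SIZE, NUM_EPOCHS = 32, 8, 5
LEARNING_RATE, EVAL_INTERVAL, SEED = 0.001, 100, 1337

torch.manual_seed(SEED)
device = "cuda" if torch.cuda.is_available() else "cpu"   # USE_GPU=true

# Character-level tokenisation of the corpus (INPUT_FILE=input.txt).
text = open("input.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

def get_batch():
    # Sample BATCH_SIZE random windows of BLOCK_SIZE tokens; the targets are
    # the same windows shifted one position to the right.
    ix = torch.randint(len(data) - BLOCK_SIZE - 1, (BATCH_SIZE,))
    x = torch.stack([data[i : i + BLOCK_SIZE] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + BLOCK_SIZE] for i in ix])
    return x.to(device), y.to(device)

class BigramLM(nn.Module):
    # Toy stand-in: each token's embedding row is its next-token logits.
    def __init__(self, vocab_size):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets):
        logits = self.table(idx)                          # (B, T, vocab)
        return F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1))

model = BigramLM(len(chars)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
os.makedirs("models", exist_ok=True)                      # MODEL_DIR=models/

steps_per_epoch = len(data) // (BATCH_SIZE * BLOCK_SIZE)
for epoch in range(NUM_EPOCHS):
    for step in range(steps_per_epoch):
        loss = model(*get_batch())
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % EVAL_INTERVAL == 0:                     # EVAL_INTERVAL=100
            print(f"epoch {epoch} step {step}: loss {loss.item():.4f}")
    # CHECKPOINT_INTERVAL=1: checkpoint after every epoch.
    torch.save(model.state_dict(), f"models/ckpt_epoch{epoch}.pt")

The bigram model is obviously not a real LLM; it just lets the loop run end to end so the plumbing (batching, checkpointing, logging) is clear.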


