Building my own LLM

Yesterday I started building my own LLM because I want to avoid licensing issues for basic chatbot use cases. Training these things is super slow. I won't post the code here because I plan to commercialise it, but here's a screenshot of the progress:



I've set it to use the following parameters; sadly my GPU is rather small, so it is taking ages. After the config I've included two generic sketches of how settings like these typically plug together.

# Sampling settings (these control generation, not training)
TEMPERATURE=0.7
TOP_K=40
TOP_P=0.95
REPEAT_PENALTY=1.1
MAX_NEW_TOKENS=200

# Training hyperparameters
BLOCK_SIZE=32
BATCH_SIZE=8
NUM_EPOCHS=5
LEARNING_RATE=0.001
CHECKPOINT_INTERVAL=1
USE_GPU=true

# File and directory settings
MODEL_DIR=models/
CORPUS_DIR=repo/
LOG_DIR=logs/
BACKUP_DIR=bak/
TOKENIZER_PATH=

# Internal settings
EVAL_INTERVAL=100
EVAL_ITERS=100
SEED=1337
INPUT_FILE=input.txt
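
A note on the first group: temperature, top-k, top-p and the repeat penalty only kick in at generation time. Roughly, they get applied to the model's next-token logits like this. This is a generic PyTorch sketch, not my code, and the function name sample_next is made up:

import torch
import torch.nn.functional as F

def sample_next(logits, temperature=0.7, top_k=40, top_p=0.95):
    # logits: 1-D tensor of raw next-token scores from the model.
    # REPEAT_PENALTY=1.1 would first divide the logits of already-generated
    # tokens by 1.1; that needs the generation history, so it is omitted here.
    logits = logits / temperature              # TEMPERATURE: <1 sharpens, >1 flattens
    k = min(top_k, logits.size(-1))
    kth_best = torch.topk(logits, k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth_best, float("-inf"))  # TOP_K cutoff
    probs = F.softmax(logits, dim=-1)
    sorted_p, order = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_p, dim=-1)
    sorted_p = sorted_p.masked_fill(cum - sorted_p > top_p, 0.0)   # TOP_P (nucleus)
    sorted_p = sorted_p / sorted_p.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_p, num_samples=1)
    return order.gather(-1, choice)            # token id in the original vocabulary

# Example: draw one token id from random logits over a 100-token vocabulary.
# A real generation loop would call this up to MAX_NEW_TOKENS=200 times.
next_id = sample_next(torch.randn(100))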
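
And since I'm keeping my actual code private, here is a stand-in for the training side: a minimal character-level loop in PyTorch with a toy bigram model in place of my architecture. Everything except the config values above (the hyperparameters, INPUT_FILE, MODEL_DIR) is illustrative.

import os
import torch
import torch.nn as nn
import torch.nn.functional as F

BLOCK_SIZE, BATCH_SIZE, NUM_EPOCHS = 32, 8, 5
LEARNING_RATE, EVAL_INTERVAL, SEED = 0.001, 100, 1337

torch.manual_seed(SEED)
device = "cuda" if torch.cuda.is_available() else "cpu"   # USE_GPU=true

# Character-level tokenisation of the corpus (INPUT_FILE=input.txt).
text = open("input.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

def get_batch():
    # Sample BATCH_SIZE random windows of BLOCK_SIZE tokens; the targets are
    # the same windows shifted one position to the right.
    ix = torch.randint(len(data) - BLOCK_SIZE - 1, (BATCH_SIZE,))
    x = torch.stack([data[i : i + BLOCK_SIZE] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + BLOCK_SIZE] for i in ix])
    return x.to(device), y.to(device)

class BigramLM(nn.Module):
    # Toy stand-in: each token's embedding row is its next-token logits.
    def __init__(self, vocab_size):
        super().__init__()
        self.table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets):
        logits = self.table(idx)                          # (B, T, vocab)
        return F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1))

model = BigramLM(len(chars)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
os.makedirs("models", exist_ok=True)                      # MODEL_DIR=models/

steps_per_epoch = len(data) // (BATCH_SIZE * BLOCK_SIZE)
for epoch in range(NUM_EPOCHS):
    for step in range(steps_per_epoch):
        loss = model(*get_batch())
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        if step % EVAL_INTERVAL == 0:                     # EVAL_INTERVAL=100
            print(f"epoch {epoch} step {step}: loss {loss.item():.4f}")
    # CHECKPOINT_INTERVAL=1: checkpoint after every epoch.
    torch.save(model.state_dict(), f"models/ckpt_epoch{epoch}.pt")

The bigram model is obviously not a real LLM; it just lets the loop run end to end so the plumbing (batching, checkpointing, logging) is clear.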


