Gemma 4 E2B Korean SFT v3 (LoRA)

This is a LoRA adapter produced by supervised fine-tuning (SFT) of Google's Gemma 4 E2B-IT model (5.1B params) on nine Korean datasets.

v3 adds 3,000 samples from the NVIDIA Nemotron-Personas-Korea dataset to strengthen multi-turn dialogue grounded in Korean personas.

Model Details

Item	Value
Base Model	google/gemma-4-e2b-it (5.1B params)
Method	LoRA (r=16, alpha=32, dropout=0.05)
Trainable Params	24.2M / 5.1B (0.47%)
Training Data	13,521 samples from 9 Korean datasets
Epochs	1
Training Time	13h 14m on Apple M4 Pro 48GB (MPS)
Framework	TRL 1.2.0 + PEFT 0.19.1 + Transformers 5.5.4

Quick Start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-e2b-it",
    dtype=torch.bfloat16,  # `torch_dtype` is the deprecated name for this argument
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "hoin1218/gemma-4-e2b-korean-sft")
tokenizer = AutoTokenizer.from_pretrained("hoin1218/gemma-4-e2b-korean-sft")

# Generate
# Prompt: "Please describe Korea's four seasons."
messages = [{"role": "user", "content": "한국의 사계절에 대해 설명해주세요."}]
# Tokenize through the chat template directly so special tokens are added only once
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Training Data

A total of 13,521 samples were selected from nine Korean datasets for training.

Dataset	Samples	Format	Description
heegyu/open-korean-instructions	2,500	usr_bot	General Korean instructions
nlpai-lab/kullm-v2	1,491	instruction_input_output	KULLM v2 Korean
llami-team/Korean-OpenThoughts-114k-Normalized	1,500	question_response	Korean reasoning/thinking
kuotient/orca-math-word-problems-193k-korean	1,500	question_response	Korean math word problems
kyujinpy/KOR-OpenOrca-Platypus-v3	1,000	instruction_input_output	Korean translation of OpenOrca
changpt/ko-lima-vicuna	1,030	sharegpt	GPT-4-generated Korean (high quality)
beomi/KoAlpaca-v1.1a	1,000	instruction_output	KoAlpaca Korean
heegyu/namuwiki-extracted	500	title_text	Namuwiki knowledge
nvidia/Nemotron-Personas-Korea	3,000	nemotron_persona	Korean persona multi-turn dialogue
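
As a quick consistency check on the table above, the following sketch sums the per-dataset counts and reports each dataset's share of the mix. The counts are copied from the table; the helper name `mix_summary` is hypothetical and not part of the training code.

```python
# Sample counts copied from the Training Data table.
SAMPLES = {
    "heegyu/open-korean-instructions": 2500,
    "nlpai-lab/kullm-v2": 1491,
    "llami-team/Korean-OpenThoughts-114k-Normalized": 1500,
    "kuotient/orca-math-word-problems-193k-korean": 1500,
    "kyujinpy/KOR-OpenOrca-Platypus-v3": 1000,
    "changpt/ko-lima-vicuna": 1030,
    "beomi/KoAlpaca-v1.1a": 1000,
    "heegyu/namuwiki-extracted": 500,
    "nvidia/Nemotron-Personas-Korea": 3000,
}


def mix_summary(samples):
    """Return (total, {name: share in percent}) for a dataset mix."""
    total = sum(samples.values())
    shares = {name: 100.0 * count / total for name, count in samples.items()}
    return total, shares


total, shares = mix_summary(SAMPLES)
print(f"total samples: {total}")
# Largest contributors first.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<50s} {pct:5.1f}%")
```

The total agrees with the 13,521 samples stated above, and removing the 3,000 Nemotron samples leaves the 10,521 samples listed for v2.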

Training Results

Final Metrics

Metric	Value
Average Loss	12.38
Last Step Loss	12.46
Best Loss	10.36 (at 87% of the epoch)
Best Token Accuracy	69.5% (at 89% of the epoch)
Average Token Accuracy	61.8%
Total Steps	1,691
Total Tokens	5.08M
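
The totals above, combined with the 13h 14m training time and 13,521 samples from Model Details, imply a rough throughput. The sketch below derives it; the function name `throughput` is hypothetical, and because 5.08M is a rounded figure the results are approximate.

```python
def throughput(total_tokens, total_steps, num_samples, hours, minutes):
    """Derive rough throughput figures from the reported training totals."""
    seconds = hours * 3600 + minutes * 60
    return {
        "tokens_per_second": total_tokens / seconds,
        "seconds_per_step": seconds / total_steps,
        "tokens_per_sample": total_tokens / num_samples,
    }


# Reported values: 5.08M tokens, 1,691 steps, 13,521 samples, 13h 14m.
stats = throughput(5_080_000, 1_691, 13_521, hours=13, minutes=14)
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```

This works out to roughly 107 tokens per second, about 28 seconds per optimizer step, and about 376 tokens per sample on average, which sits below the 512-token max_seq_length.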

Loss Curve

Loss
33 |*
   | *
27 |
   |
21 |  *
   |
15 |    *
   |
12 |      * * * * * * * * * * * * * * * * * * * * * * * * * * * *
   |        * * * * * * * * * * * * * * * * * * * * * * * * * * * *
10 |                                              *
   |___________________________________________________________________________
   0%    10%   20%   30%   40%   50%   60%   70%   80%   90%   100%  epoch

Detailed Training Log (every 10%)

Epoch	Loss	Token Acc	Learning Rate
1.2%	32.71	48.7%	1.65e-05
4.7%	14.61	60.2%	8.59e-05
10.1%	12.39	65.3%	9.94e-05
20.1%	11.37	67.6%	9.42e-05
30.2%	13.16	63.3%	8.42e-05
40.2%	12.14	65.1%	7.04e-05
50.3%	11.23	67.2%	5.44e-05
59.8%	11.66	66.6%	3.89e-05
69.8%	11.97	65.8%	2.36e-05
79.9%	11.27	67.2%	1.11e-05
89.9%	11.28	67.2%	3.00e-06
100.0%	12.46	64.9%	6.12e-09

v2 vs v3 Comparison

Metric	v2 (8 datasets)	v3 (9 datasets)
Training Data	10,521 samples	13,521 samples
Avg Loss	12.80	12.38
Best Loss	10.24	10.36
Best Accuracy	69.8%	69.5%
Total Tokens	3.54M	5.08M
Training Time	16h 27m	13h 14m

Training Configuration

# LoRA
r: 16
lora_alpha: 32
lora_dropout: 0.05
target_modules: ".*language_model.*?(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
bias: none
task_type: CAUSAL_LM

# Training
epochs: 1
batch_size: 1
gradient_accumulation_steps: 8
effective_batch_size: 8
learning_rate: 1.0e-4
lr_scheduler: cosine
warmup_ratio: 0.05
max_seq_length: 512
optimizer: adamw_torch
precision: fp16
gradient_checkpointing: true
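
To illustrate how the step count and learning-rate curve follow from this configuration, here is a minimal sketch. It assumes linear warmup followed by a half-cosine decay to zero, which is the usual shape of a cosine scheduler with warmup; the function names are hypothetical, and the logged learning rates may differ from this curve by a few steps.

```python
import math


def total_optimizer_steps(num_samples, batch_size, grad_accum):
    """Optimizer steps for one epoch, counting a final partial accumulation."""
    return math.ceil(num_samples / (batch_size * grad_accum))


def lr_at(step, total_steps, peak_lr, warmup_ratio):
    """Learning rate at `step`: linear warmup, then half-cosine decay (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_optimizer_steps(13_521, batch_size=1, grad_accum=8)
print(f"optimizer steps: {steps}")
for pct in (0, 5, 25, 50, 75, 100):
    step = round(steps * pct / 100)
    print(f"{pct:3d}% -> lr {lr_at(step, steps, 1.0e-4, 0.05):.2e}")
```

With batch_size 1 and gradient_accumulation_steps 8, 13,521 samples give 1,691 optimizer steps, matching the Total Steps value in the results.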

Hardware

Device: Apple M4 Pro (48GB Unified Memory)
Backend: MPS (Metal Performance Shaders)
Precision: fp16 (MPS does not support bf16)
Attention: SDPA (Scaled Dot-Product Attention)

LoRA Target Modules

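Gemma 4 is a multimodal model, and its vision_tower/audio_tower contain Gemma4ClippableLinear modules that PEFT does not support. To work around this, target_modules is given as a regular expression that matches only the submodules under language_model: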

.*language_model.*?(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)
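
The effect of this pattern can be checked without loading the model. The sketch below assumes that PEFT treats a string target_modules as a regex that must match the full module name; the module names are hypothetical examples chosen for illustration, and the real names would come from `model.named_modules()`.

```python
import re

TARGET_MODULES = (
    r".*language_model.*?"
    r"(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
)

# Hypothetical module names, not taken from the actual Gemma 4 checkpoint.
EXAMPLE_NAMES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.down_proj",
    "model.language_model.layers.0.self_attn.q_norm",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.audio_tower.layers.0.attn.k_proj",
]


def is_lora_target(name, pattern=TARGET_MODULES):
    """True if the whole module name matches the pattern (full-match assumed)."""
    return re.fullmatch(pattern, name) is not None


selected = [name for name in EXAMPLE_NAMES if is_lora_target(name)]
for name in EXAMPLE_NAMES:
    print(f"{'LoRA' if name in selected else 'skip':4s} {name}")
```

Only the two projection layers under language_model are selected; the norm layer and the vision/audio tower modules are skipped because their names either lack the language_model prefix or do not end in one of the listed projections.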

Known Issues

Gemma 4 requires mm_token_type_ids even for text-only training. During training, a custom Gemma4DataCollator injects them automatically.
Requires transformers>=5.5.0 (5.4.x does not recognize the gemma4 model type).
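
The Gemma4DataCollator itself is not shown in this card. As a rough, hypothetical sketch of the idea only, the function below adds an mm_token_type_ids field to a text-only batch. It assumes that 0 marks an ordinary text token, which is an assumption and not confirmed by the source, and it uses plain Python lists to stay dependency-free.

```python
def add_text_only_mm_token_type_ids(batch):
    """Return a copy of `batch` with an all-zero `mm_token_type_ids` entry.

    Hypothetical helper, not the author's collator. Assumption: the id 0
    denotes a plain text token, so a text-only batch is all zeros.
    """
    out = dict(batch)
    out["mm_token_type_ids"] = [[0] * len(ids) for ids in batch["input_ids"]]
    return out


# Toy batch with made-up token ids; the second row ends with one padding token.
batch = {
    "input_ids": [[2, 10, 11, 12], [2, 20, 21, 0]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 0]],
}
collated = add_text_only_mm_token_type_ids(batch)
print(collated["mm_token_type_ids"])
```

In a real collator the field would more likely be a tensor with the same shape as input_ids, for example one built with `torch.zeros_like(input_ids)`.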

License

This model inherits the Gemma license from the base model.

Citation

@misc{gemma4-korean-sft-2026,
  title={Gemma 4 E2B Korean SFT v3},
  author={hoin1218},
  year={2026},
  url={https://huggingface.co/hoin1218/gemma-4-e2b-korean-sft}
}
