This is a LoRA adapter produced by supervised fine-tuning (SFT) of the Google Gemma 4 E2B-IT model (5.1B params) on nine Korean datasets.
v3 adds 3,000 samples from the NVIDIA Nemotron-Personas-Korea dataset to strengthen persona-grounded, multi-turn Korean conversation.
| Item | Value |
|---|---|
| Base Model | google/gemma-4-e2b-it (5.1B params) |
| Method | LoRA (r=16, alpha=32, dropout=0.05) |
| Trainable Params | 24.2M / 5.1B (0.47%) |
| Training Data | 13,521 samples from 9 Korean datasets |
| Epochs | 1 |
| Training Time | 13h 14m on Apple M4 Pro 48GB (MPS) |
| Framework | TRL 1.2.0 + PEFT 0.19.1 + Transformers 5.5.4 |
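As a quick check on the Trainable Params row, the percentage can be recomputed from the two rounded figures in the table. The helper `lora_param_count` is a hypothetical illustration of the standard LoRA count of r × (d_in + d_out) parameters per adapted linear layer. The 2048 × 2048 layer shape is an assumption chosen only for the example, not a dimension taken from Gemma 4.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of trainable parameters, in percent."""
    return 100.0 * trainable / total


# Figures from the table above (both are rounded in the card).
fraction = trainable_fraction(24.2e6, 5.1e9)
print(f"{fraction:.2f}%")

# Hypothetical 2048 x 2048 projection, r=16 as in the card's LoRA settings.
print(lora_param_count(2048, 2048, r=16))
```

Because the count grows linearly in r, doubling the rank would roughly double the adapter size while the base model stays frozen.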
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-e2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "hoin1218/gemma-4-e2b-korean-sft")
tokenizer = AutoTokenizer.from_pretrained("hoin1218/gemma-4-e2b-korean-sft")
# Generate (the prompt means "Please describe Korea's four seasons.")
messages = [{"role": "user", "content": "한국의 사계절에 대해 설명해주세요."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
A total of 13,521 samples were selected from nine Korean datasets for training.
| Dataset | Samples | Format | Description |
|---|---|---|---|
| heegyu/open-korean-instructions | 2,500 | usr_bot | General Korean instructions |
| nlpai-lab/kullm-v2 | 1,491 | instruction_input_output | KULLM v2 Korean |
| llami-team/Korean-OpenThoughts-114k-Normalized | 1,500 | question_response | Korean reasoning / thinking |
| kuotient/orca-math-word-problems-193k-korean | 1,500 | question_response | Korean math word problems |
| kyujinpy/KOR-OpenOrca-Platypus-v3 | 1,000 | instruction_input_output | OpenOrca translated into Korean |
| changpt/ko-lima-vicuna | 1,030 | sharegpt | GPT-4-generated Korean (high quality) |
| beomi/KoAlpaca-v1.1a | 1,000 | instruction_output | KoAlpaca Korean |
| heegyu/namuwiki-extracted | 500 | title_text | Namuwiki knowledge |
| nvidia/Nemotron-Personas-Korea | 3,000 | nemotron_persona | Korean persona multi-turn dialogue |
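The Format column implies that each source has to be normalized into chat messages before SFT. The card does not show that preprocessing, so the following is only a sketch: the function `to_messages` and the field names (`instruction`, `input`, `output`, `question`, `response`) are assumptions inferred from the format labels, and only three of the formats are covered.

```python
from typing import Dict, List

Message = Dict[str, str]


def to_messages(record: Dict[str, str], fmt: str) -> List[Message]:
    """Convert one raw record into a single-turn chat (field names are assumed)."""
    if fmt == "instruction_input_output":
        prompt = record["instruction"]
        # Append the optional input below the instruction when it is non-empty.
        if record.get("input"):
            prompt = f"{prompt}\n\n{record['input']}"
        answer = record["output"]
    elif fmt == "instruction_output":
        prompt, answer = record["instruction"], record["output"]
    elif fmt == "question_response":
        prompt, answer = record["question"], record["response"]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]


example = to_messages(
    {
        "instruction": "Summarize the text.",
        "input": "Seoul is the capital of South Korea.",
        "output": "Seoul is South Korea's capital.",
    },
    "instruction_input_output",
)
print(example)
```

Mapping every source to the same `user`/`assistant` message list lets a single chat template handle all nine datasets.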
| Metric | Value |
|---|---|
| Average Loss | 12.38 |
| Last Step Loss | 12.46 |
| Best Loss | 10.36 (epoch 87%) |
| Best Token Accuracy | 69.5% (epoch 89%) |
| Average Token Accuracy | 61.8% |
| Total Steps | 1,691 |
| Total Tokens | 5.08M |
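The Total Steps figure is consistent with the sample count and the batch settings listed in this card's training configuration (batch size 1, gradient accumulation 8). The check below is a sketch that assumes one optimizer step per eight samples, with the final partial window counted as a step. The average tokens per sample is derived from the rounded 5.08M total, so it is approximate.

```python
import math

samples = 13_521          # total training samples
per_device_batch = 1      # batch_size from the training configuration
grad_accum = 8            # gradient_accumulation_steps
total_tokens = 5.08e6     # rounded figure from the table above

effective_batch = per_device_batch * grad_accum
steps = math.ceil(samples / effective_batch)
avg_tokens_per_sample = total_tokens / samples

print(effective_batch, steps, round(avg_tokens_per_sample))
# 8 1691 376
```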
Loss curve (x-axis: epoch progress, 0% to 100%; y-axis: training loss): the loss starts near 33, drops to about 12 within the first 10% of the epoch, and then stays roughly between 10 and 13 for the rest of training.
| Epoch | Loss | Token Acc | Learning Rate |
|---|---|---|---|
| 1.2% | 32.71 | 48.7% | 1.65e-05 |
| 4.7% | 14.61 | 60.2% | 8.59e-05 |
| 10.1% | 12.39 | 65.3% | 9.94e-05 |
| 20.1% | 11.37 | 67.6% | 9.42e-05 |
| 30.2% | 13.16 | 63.3% | 8.42e-05 |
| 40.2% | 12.14 | 65.1% | 7.04e-05 |
| 50.3% | 11.23 | 67.2% | 5.44e-05 |
| 59.8% | 11.66 | 66.6% | 3.89e-05 |
| 69.8% | 11.97 | 65.8% | 2.36e-05 |
| 79.9% | 11.27 | 67.2% | 1.11e-05 |
| 89.9% | 11.28 | 67.2% | 3.00e-06 |
| 100.0% | 12.46 | 64.9% | 6.12e-09 |
| Metric | v2 (8 datasets) | v3 (9 datasets) |
|---|---|---|
| Training Data | 10,521 samples | 13,521 samples |
| Avg Loss | 12.80 | 12.38 |
| Best Loss | 10.24 | 10.36 |
| Best Accuracy | 69.8% | 69.5% |
| Total Tokens | 3.54M | 5.08M |
| Training Time | 16h 27m | 13h 14m |
# LoRA
r: 16
lora_alpha: 32
lora_dropout: 0.05
target_modules: ".*language_model.*?(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
bias: none
task_type: CAUSAL_LM
# Training
epochs: 1
batch_size: 1
gradient_accumulation_steps: 8
effective_batch_size: 8
learning_rate: 1.0e-4
lr_scheduler: cosine
warmup_ratio: 0.05
max_seq_length: 512
optimizer: adamw_torch
precision: fp16
gradient_checkpointing: true
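The `lr_scheduler: cosine` and `warmup_ratio: 0.05` settings describe a linear warmup followed by cosine decay. The function `lr_at` below is a minimal standalone sketch of that shape, not the trainer's actual implementation. It assumes 1,691 total steps (from the results table), a warmup length rounded up to whole steps, and decay to zero. Small differences from the logged Learning Rate column are expected, since the trainer's exact step accounting is not shown in this card.

```python
import math


def lr_at(step: int, total_steps: int = 1691, peak_lr: float = 1.0e-4,
          warmup_ratio: float = 0.05) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the schedule at the start, end of warmup, midpoint, and final step.
for step in (0, 85, 850, 1691):
    print(step, f"{lr_at(step):.2e}")
```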
Gemma 4 is a multimodal model, and its vision_tower/audio_tower contain Gemma4ClippableLinear modules that PEFT does not support. To work around this, target_modules is given as a regular expression that matches only the modules under language_model:
.*language_model.*?(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)
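The effect of this pattern can be checked without loading the model. The sketch below assumes that a string `target_modules` is applied with full-match semantics (`re.fullmatch`) against each module's dotted name. The module names in the list are hypothetical examples for illustration, not names read from the actual Gemma 4 checkpoint.

```python
import re

TARGET_PATTERN = (
    r".*language_model.*?"
    r"(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"
)

# Hypothetical module names for illustration only.
module_names = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.0.mlp.down_proj",
    "model.language_model.layers.0.mlp.act_fn",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "model.audio_tower.layers.0.self_attn.k_proj",
]

# Keep only names that the pattern matches in full.
matched = [name for name in module_names if re.fullmatch(TARGET_PATTERN, name)]
print(matched)
```

Only the two language_model projection layers are selected. The vision and audio tower names are skipped because they never contain `language_model`, and `act_fn` is skipped because it is not one of the listed projections.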
mm_token_type_ids is required. During training, a custom Gemma4DataCollator injects it automatically.

transformers>=5.5.0 is required (5.4.x does not recognize the gemma4 model type).

This model inherits the Gemma license from the base model.
@misc{gemma4-korean-sft-2026,
title={Gemma 4 E2B Korean SFT v3},
author={hoin1218},
year={2026},
url={https://huggingface.co/hoin1218/gemma-4-e2b-korean-sft}
}