SkeptiSTEM-4B-v2 Stage CD (Chat + DPO LoRA)
This is the Stage CD LoRA combining chat restoration (Stage C) and preference tuning (Stage D).
Purpose
Restores normal conversational ability after GRPO training (Stage R3) while preserving:
- Verification skills for suggested answers
- Structured reasoning when appropriate
- Helpful, accurate responses
Training Details
Stage C: Chat SFT
- Dataset: ultrachat_200k (~15,000 examples)
- Purpose: Restore conversational ability
- Epochs: 1
Stage D: DPO
- Dataset: ultrafeedback_binarized (~59,916 preference pairs)
- Purpose: Preference alignment
- Beta: 0.1 (strength of the implicit KL penalty that keeps the policy close to the reference model)
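Stage D lists only the DPO beta. To show what beta = 0.1 scales, here is a minimal sketch of the standard sigmoid DPO loss (Rafailov et al.). It assumes the stage used that default loss variant, and the log-probabilities in the example are made-up numbers, not values from this model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair sigmoid DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    # Log-ratios of the trained policy against the frozen reference model
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Hypothetical sequence log-probabilities (summed over response tokens)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference: log 2, about 0.6931
print(dpo_loss(-5.0, -17.0, -10.0, -12.0))   # margin 0.1 * 10 = 1.0: about 0.3133
```

A smaller beta lets the policy drift further from the reference before the loss saturates; a larger beta keeps it closer.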
Expected Load Order
- Base: HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit
- Merge/apply R2: HallD/SkeptiSTEM-4B-v2-stageR2-format-lora
- Merge/apply R3: HallD/SkeptiSTEM-4B-v2-stageR3-grpo-lora
- Apply this CD adapter
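The load order merges R2 and R3 into the weights before this adapter is applied. For a plain (unquantized) linear layer, merging a LoRA adapter folds its low-rank update into the weight, W' = W + (alpha / r) · B · A, which matches PEFT's default scaling of lora_alpha / r. The NumPy sketch below uses toy random matrices and hypothetical r and alpha values (not this adapter's configuration) to check that a merged weight reproduces the unmerged forward pass for one stacked step.

```python
import numpy as np


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a dense weight: W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)


def lora_forward(W, A, B, alpha, r, x):
    """Unmerged forward pass: base output plus the scaled low-rank path."""
    return W @ x + (alpha / r) * (B @ (A @ x))


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 4  # toy sizes; hypothetical, not this adapter's config
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)

# "R2" adapter, merged into the base weight (analogous to merge_and_unload)
A2, B2 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
W_r2 = merge_lora(W, A2, B2, alpha, r)

# "R3" adapter on top of the merged weight: unmerged vs. merged outputs
A3, B3 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
unmerged = lora_forward(W_r2, A3, B3, alpha, r, x)
merged = merge_lora(W_r2, A3, B3, alpha, r) @ x
print(np.allclose(unmerged, merged))  # True
```

In full precision the merge is exact up to floating-point error; when the base layers are 4-bit quantized, the merge goes through requantization and is only approximate.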
Usage
from unsloth import FastLanguageModel
from peft import PeftModel
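# Load the Stage R1 base.
# Note: merge_and_unload() on 4-bit layers dequantizes, merges, and requantizes,
# which PEFT warns can change generations through rounding errors;
# use load_in_4bit=False if you have the memory for a 16-bit 4B model.
base, tok = FastLanguageModel.from_pretrained(
    "HallD/SkeptiSTEM-4B-v2-stageR1-merged-16bit",
    max_seq_length=4096,
    load_in_4bit=True,
)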
# Merge R2 + R3
base = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR2-format-lora")
base = base.merge_and_unload()
base = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageR3-grpo-lora")
base = base.merge_and_unload()
# Apply CD
model = PeftModel.from_pretrained(base, "HallD/SkeptiSTEM-4B-v2-stageCD-chat-dpo-lora")
FastLanguageModel.for_inference(model)
Behavior
The model now:
- Responds conversationally by default (no format tags unless asked)
- Still verifies suggestions when present in prompts
- Provides helpful, accurate, preference-aligned responses
Trained with Unsloth.
Model tree for HallD/SkeptiSTEM-4B-v2-stageCD-chat-dpo-lora
- Base model: Qwen/Qwen3-4B-Base
- Finetuned: unsloth/Qwen3-4B-Base