How to use prithivMLmods/VibeThinker-3B-heretic_decensored with libraries and local apps.

Libraries

How to use prithivMLmods/VibeThinker-3B-heretic_decensored with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="prithivMLmods/VibeThinker-3B-heretic_decensored")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/VibeThinker-3B-heretic_decensored")
model = AutoModelForCausalLM.from_pretrained("prithivMLmods/VibeThinker-3B-heretic_decensored")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/VibeThinker-3B-heretic_decensored with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/VibeThinker-3B-heretic_decensored"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/VibeThinker-3B-heretic_decensored",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
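
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM or any client library, and the default `base_url` assumes the server is running locally on port 8000.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="prithivMLmods/VibeThinker-3B-heretic_decensored"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the reply. Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000.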

Use Docker

docker model run hf.co/prithivMLmods/VibeThinker-3B-heretic_decensored

SGLang

How to use prithivMLmods/VibeThinker-3B-heretic_decensored with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/VibeThinker-3B-heretic_decensored" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/VibeThinker-3B-heretic_decensored",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/VibeThinker-3B-heretic_decensored" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/VibeThinker-3B-heretic_decensored",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


VibeThinker-3B-heretic_decensored

Reasoning-focused language model modified using the Heretic abliteration toolkit

Abliteration · 3B Parameters · STEM Reasoning · Uncensored

VibeThinker-3B-heretic_decensored is a reasoning-focused language model built on WeiboAI/VibeThinker-3B and modified with the Heretic abliteration toolkit. Heretic uses refusal-direction analysis and targeted weight-space interventions to reduce the model's internal refusal behavior while aiming to preserve the mathematical, coding, and STEM reasoning capabilities inherited from the VibeThinker training pipeline.
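
To make "refusal-direction analysis" and "weight-space intervention" concrete, here is a minimal sketch of the general directional-ablation idea, assuming the refusal direction is estimated as a difference of mean activations and then projected out of a matrix that writes into the residual stream. It is not Heretic's implementation; the function names and the `strength` parameter are illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along mean(harmful) - mean(harmless)."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(weight, direction, strength=1.0):
    """Return W - strength * r (r^T W), with W stored as a list of rows.

    Rows of W index residual-stream dimensions, so with strength=1.0 this
    removes the component of every output of W along the direction r.
    """
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    proj = [sum(direction[i] * weight[i][j] for i in range(rows))
            for j in range(cols)]
    return [[weight[i][j] - strength * direction[i] * proj[j]
             for j in range(cols)]
            for i in range(rows)]


# Toy 2-dimensional example with made-up activations.
r = refusal_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]])
W_ablated = ablate([[1.0, 2.0], [3.0, 4.0]], r)
```

In this toy case the direction is the first axis, so the first row of the matrix is zeroed and the second row is unchanged.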

About VibeThinker-3B: VibeThinker-3B is a 3-billion-parameter reasoning-focused language model developed by WeiboAI. Built on top of Qwen2.5-Coder-3B, it was trained using the Spectrum-to-Signal Principle (SSP) post-training pipeline, combining curriculum-based two-stage supervised fine-tuning, multi-domain reinforcement learning through MaxEnt-Guided Policy Optimization (MGPO), offline self-distillation, and instruction-following reinforcement learning.

VibeThinker-3B is designed to develop strong verifiable reasoning capabilities across mathematics, coding, and STEM domains. According to the VibeThinker project, it achieves competitive performance on challenging reasoning benchmarks while keeping the efficiency of a compact 3B-parameter architecture.

Important

This model is intended strictly for research and learning purposes. Due to reduced internal refusal mechanisms, it may generate sensitive or unrestricted content. Users assume full responsibility for how the model is used. The authors and hosting platform disclaim any liability for generated outputs.

Note

This model is experimental and may exhibit unexpected behaviors or produce artifacts in certain scenarios.

Key Highlights

Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
Reduced Refusal Behavior: Optimized to minimize internal refusal tendencies while maintaining reasoning performance.
VibeThinker Backbone: Built directly on top of WeiboAI/VibeThinker-3B.
Reasoning-Oriented Performance: Preserves advanced mathematical, coding, and STEM reasoning capabilities after abliteration.
Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
Efficient 3B Deployment: Suitable for local inference, research environments, and resource-constrained deployment setups.

Model Lineage

Model Path: prithivMLmods/VibeThinker-3B-heretic_decensored
Intermediate Base Model: WeiboAI/VibeThinker-3B by WeiboAI
Foundation Model: Qwen/Qwen2.5-Coder-3B by Qwen

Abliteration Parameters

| Parameter | Value |
| --- | --- |
| direction_index | 21.88 |
| attn.o_proj.max_weight | 1.37 |
| attn.o_proj.max_weight_position | 21.25 |
| attn.o_proj.min_weight | 1.36 |
| attn.o_proj.min_weight_distance | 19.61 |
| mlp.down_proj.max_weight | 1.49 |
| mlp.down_proj.max_weight_position | 31.01 |
| mlp.down_proj.min_weight | 1.48 |
| mlp.down_proj.min_weight_distance | 20.74 |
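
The parameter names suggest a per-layer weighting for each component (attn.o_proj and mlp.down_proj): a peak ablation weight at some layer position, falling off to a minimum weight at a given distance, with layers beyond that distance left untouched. The sketch below is one plausible reading of those four numbers, assuming a linear falloff. It is not taken from the Heretic source; `layer_weight` and `NUM_LAYERS = 36` are illustrative assumptions, and `direction_index` is not modeled.

```python
def layer_weight(layer, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Ablation weight for one layer under an assumed linear falloff."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # outside the kernel: layer is left unmodified
    # Interpolate from max_weight (distance 0) to min_weight (edge of kernel).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


NUM_LAYERS = 36  # assumption for illustration; check the model config

# Values copied from the Abliteration Parameters table.
O_PROJ = dict(max_weight=1.37, max_weight_position=21.25,
              min_weight=1.36, min_weight_distance=19.61)
DOWN_PROJ = dict(max_weight=1.49, max_weight_position=31.01,
                 min_weight=1.48, min_weight_distance=20.74)

o_proj_weights = [layer_weight(i, **O_PROJ) for i in range(NUM_LAYERS)]
down_proj_weights = [layer_weight(i, **DOWN_PROJ) for i in range(NUM_LAYERS)]
```

Under this reading, the attention output projection would be modified in all but the first two layers, while the MLP down projection would be modified only from layer 11 onward, both with a nearly flat weight.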

Performance

| Metric | This model | Original model (WeiboAI/VibeThinker-3B) |
| --- | --- | --- |
| KL divergence | 0.0933 | 0 (by definition) |
| Refusals | 6/100 | 64/100 |
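
For readers unfamiliar with the two metrics, the sketch below shows how each could be computed in principle. It assumes KL divergence is taken between next-token probability distributions of the two models and that refusals are counted by simple phrase matching; the exact prompts, KL direction, and refusal detection used for the numbers above are not specified here, and `REFUSAL_MARKERS` is a hypothetical list.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical markers; a real evaluation would use a longer list or a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")


def is_refusal(response):
    """True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_count(responses):
    """Number of responses flagged as refusals."""
    return sum(is_refusal(r) for r in responses)
```

Identical distributions give a KL divergence of exactly 0, which is why the original model's entry is 0 by definition when it is compared against itself.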

Quick Start with Transformers

pip install transformers
pip install accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/VibeThinker-3B-heretic_decensored",
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "prithivMLmods/VibeThinker-3B-heretic_decensored"
)

messages = [
    {
        "role": "user",
        "content": "Explain how a transformer model processes text."
    }
]

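# return_dict=True yields both input_ids and attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

# Decode only the newly generated tokens, skipping the prompt
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )
)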

Intended Use

Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
Red Teaming: Analyzing model responses under reduced-refusal conditions.
Mathematical Reasoning Research: Evaluating performance on verifiable reasoning tasks.
Coding and STEM Evaluation: Studying behavior across programming and scientific reasoning domains.
Local Deployment: Running capable reasoning models on consumer hardware and research environments.

Limitations & Risks

Important Note: This model's built-in refusal mechanisms have been intentionally reduced.

Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
User Responsibility: Requires careful and ethical use.
Experimental Modifications: Behavior may differ significantly from the original model.
Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.
Reasoning Biases: The model may inherit strengths and limitations from the underlying VibeThinker-3B training process.

Acknowledgements

Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.
WeiboAI/VibeThinker-3B: The intermediate base model providing the reasoning capabilities used in this release.
Qwen/Qwen2.5-Coder-3B: The foundation model upon which VibeThinker-3B was originally built.
Model Trials & Evaluation: Experimental evaluations, refusal measurements, and optimization trials were conducted and documented during the development process.

Model size: 3B params
Tensor type: BF16 (Safetensors)

Model tree for prithivMLmods/VibeThinker-3B-heretic_decensored

Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-Coder-3B
Finetuned: WeiboAI/VibeThinker-3B
Finetuned: this model
Quantizations: 3 models