franken-gemma-4-dense-1b: untrained
A frankenstein-init Gemma 4 (dense) image/text model with ~1B parameters, assembled by weight transplant from Gemma 3 1B (text backbone) and Gemma 4 E2B-IT (vision tower, tokenizer, and processor).
Architecturally it mirrors google/gemma-4-31B-it (hybrid attention head_dim, no MoE, no PLE, no shared KV), at a much smaller scale.
This is NOT a trained model.
It will not produce coherent text out of the box.
It is intended for testing fine-tuning frameworks and configurations (Axolotl, TRL, DeepSpeed, FSDP) at a 'pilot' scale.
It should still train more easily than random weights, though.
Architecture
| component | value |
|---|---|
| hidden_size | 1152 |
| intermediate_size | 6912 |
| num_hidden_layers | 18 (15 sliding + 3 full, pattern 5:1) |
| num_attention_heads | 4 |
| num_key_value_heads | 1 |
| head_dim (sliding) | 256 |
| head_dim (global) | 512 |
| sliding_window | 1024 |
| max_position_embeddings | 32768 |
| attention_k_eq_v | True (global layers) |
| final_logit_softcapping | 30.0 |
| vocab_size | 262148 (Gemma 4 tokenizer) |
Vision tower: hidden=768, 16 layers, head_dim=64 (copied from Gemma 4 E2B-IT)
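The 5:1 layer pattern in the table can be made concrete with a small sketch. It assumes the Gemma 3 convention in which every sixth layer (1-indexed) is a full-attention layer; the names `layer_types` and `head_dim_for` are illustrative and not taken from the model's config.

```python
# Sketch: expand "18 layers, 15 sliding + 3 full, pattern 5:1" into a per-layer list.
# Assumption: a full-attention layer closes each group of six layers.
NUM_LAYERS = 18
GROUP_SIZE = 6  # 5 sliding layers followed by 1 full layer


def layer_types(num_layers=NUM_LAYERS, group_size=GROUP_SIZE):
    """Return the attention type of each decoder layer, in order."""
    return [
        "full_attention" if (i + 1) % group_size == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def head_dim_for(layer_type):
    """Head dimension per attention type, as listed in the architecture table."""
    return 512 if layer_type == "full_attention" else 256


types = layer_types()
full_indices = [i for i, t in enumerate(types) if t == "full_attention"]
print(full_indices)  # 0-indexed positions of the full-attention layers
print(types.count("sliding_attention"), types.count("full_attention"))
```

Under that assumption the full-attention layers sit at 0-indexed positions 5, 11, and 17, which gives the 15 + 3 split from the table.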
Parameter counts by module:
=============================================================
Layer (type) Param # Trainable
=============================================================
Gemma4TextScaledWordEmbedding 301,989,888 True
ModuleList 490,237,440 True
Gemma4RMSNorm 1,152 True
Gemma4TextRotaryEmbedding -- False
Gemma4TextModel 792,228,480 True
Gemma4VisionPatchEmbedder 16,318,464 True
Gemma4VisionEncoder 151,046,144 True
Gemma4VisionPooler -- False
Gemma4VisionModel 167,364,608 True
Linear 884,736 True
Gemma4RMSNorm -- False
Gemma4MultimodalEmbedder 884,736 True
Gemma4Model 960,477,824 True
Linear 301,989,888 True
Gemma4ForConditionalGeneration 960,477,824 True
=============================================================
Total params: 960,477,824
Trainable params: 960,477,824
Non-trainable params: --
=============================================================
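As a sanity check, the decoder-stack count in the summary can be reproduced from the architecture table. The sketch below rests on several assumptions not stated in this card: bias-free linear layers, a gated MLP with three projection matrices, four hidden-size RMSNorm weights per layer, and one q_norm plus one k_norm weight of size head_dim per layer. The function name `decoder_layer_params` is illustrative.

```python
# Sketch: recompute the decoder-stack parameter count from the architecture table.
HIDDEN = 1152
INTERMEDIATE = 6912
N_HEADS = 4
N_KV_HEADS = 1


def decoder_layer_params(head_dim, k_eq_v):
    """Parameter count of one decoder layer under the assumptions above."""
    q_proj = HIDDEN * N_HEADS * head_dim
    k_proj = HIDDEN * N_KV_HEADS * head_dim
    v_proj = 0 if k_eq_v else k_proj      # V is dropped when attention_k_eq_v=True
    o_proj = N_HEADS * head_dim * HIDDEN
    mlp = 3 * HIDDEN * INTERMEDIATE       # assumed gate, up, and down projections
    layer_norms = 4 * HIDDEN              # assumed four RMSNorm weights per layer
    qk_norms = 2 * head_dim               # assumed q_norm and k_norm weights
    return q_proj + k_proj + v_proj + o_proj + mlp + layer_norms + qk_norms


sliding = decoder_layer_params(head_dim=256, k_eq_v=False)
full = decoder_layer_params(head_dim=512, k_eq_v=True)
decoder_total = 15 * sliding + 3 * full

# Remaining figures are taken directly from the module summary.
EMBEDDING = 301_989_888
FINAL_NORM = 1_152
VISION = 167_364_608
EMBED_VISION = 884_736

text_total = EMBEDDING + decoder_total + FINAL_NORM
model_total = text_total + VISION + EMBED_VISION
print(decoder_total, text_total, model_total)
```

With these assumptions the three printed values agree with the ModuleList, Gemma4TextModel, and Gemma4Model rows of the summary. The second `Linear` row (301,989,888) matches the embedding row and is not added again in the total, which suggests the output head shares weights with the input embedding.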
Frankenstein component inventory
| Component | Source | Method |
|---|---|---|
| Text embeddings | gemma-3-1b-it | Direct copy + 4 rows mean-resized for Gemma 4 special tokens |
| Text MLP weights | gemma-3-1b-it | Direct copy |
| Sliding-attention Q/K/V/O | gemma-3-1b-it | Direct copy |
| Global-attention Q/K | gemma-3-1b-it | Per-head tile (256 → 512) |
| Global-attention O | gemma-3-1b-it | Per-head split-halves (preserves O @ V = O_old @ V_old at init) |
| Global-attention V | --- | Dropped (attention_k_eq_v=True; V reuses K) |
| RMSNorm weights | gemma-3-1b-it | Convention-converted (1.0 + w) |
| q_norm / k_norm | gemma-3-1b-it | Rescaled by 1/√head_dim to compensate for Gemma 4's scaling=1.0 |
| Vision tower | gemma-4-e2b-it | Direct copy |
| embed_vision projection | --- | Fresh init (shape mismatch 768→1536 vs 768→1152) |
| Tokenizer + processor | gemma-4-e2b-it | Wholesale |
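The "Convention-converted (1.0 + w)" row can be illustrated with a minimal pure-Python sketch. It assumes the source checkpoint scales the normalized activations by `(1 + w)`, as Gemma 3 does, and that the target module scales by the stored weight directly; all function names here are illustrative and not the conversion script's own.

```python
import math


def rms_normalize(x, eps=1e-6):
    """Divide x by its root mean square (no learned scale)."""
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]


def rmsnorm_offset(x, w):
    """Source convention: output = normalize(x) * (1 + w)."""
    return [n * (1.0 + wi) for n, wi in zip(rms_normalize(x), w)]


def rmsnorm_direct(x, w):
    """Assumed target convention: output = normalize(x) * w."""
    return [n * wi for n, wi in zip(rms_normalize(x), w)]


def convert_norm_weight(w_old):
    """Fold the +1 offset into the stored weight."""
    return [1.0 + wi for wi in w_old]


x = [0.5, -1.0, 2.0, 0.25]
w_old = [0.1, -0.2, 0.0, 0.3]
w_new = convert_norm_weight(w_old)

before = rmsnorm_offset(x, w_old)
after = rmsnorm_direct(x, w_new)
print(all(math.isclose(a, b) for a, b in zip(before, after)))
```

Folding the offset into the weight leaves the layer's output unchanged, which is why the row counts as a conversion of convention rather than a change to the transplanted norm.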
License
Gemma Terms of Use apply. This is a derivative of Gemma 3 1B and Gemma 4 E2B-IT weights. See https://ai.google.dev/gemma/terms