🔄 In a Training Loop

Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

commented on a paper 1 day ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

upvoted a collection 1 day ago

Apertus Mini

Distillations and Quantizations of our models into more compact formats (<8B parameters) • 17 items • Updated 7 days ago • 9

upvoted a paper 1 day ago

Do We Still Need Fine Tuning? Turkish Sentiment Analysis in the Era of Large Language Model

Paper • 2606.29614 • Published 4 days ago • 1

upvoted 2 papers 2 days ago

A Study of Temporal Fusion Strategies for Named Entity Recognition in Historical Texts

Paper • 2606.27881 • Published 6 days ago • 1

MultiHashFormer: Hash-based Generative Language Models

Paper • 2606.28057 • Published 6 days ago • 19

upvoted a collection 7 days ago

Qwen-AgentWorld

3 items • Updated 7 days ago • 63

upvoted an article 14 days ago

Article

GLM-5.2: Built for Long-Horizon Tasks

zai-org • 14 days ago • 115

upvoted a paper 16 days ago

Small LLMs: Pruning vs. Training from Scratch

Paper • 2606.14150 • Published 20 days ago • 2

upvoted a paper 19 days ago

MÖVE: A Holistic LLM Benchmark for the German Public Sector

Paper • 2606.13111 • Published 21 days ago • 2

upvoted a paper 20 days ago

On Subquadratic Architectures: From Applications to Principles

Paper • 2606.12364 • Published 21 days ago • 23

upvoted a paper 22 days ago

TiME: Tiny Monolingual Encoders for Efficient NLP Pipelines

Paper • 2512.14645 • Published Dec 16, 2025 • 1

upvoted a collection 27 days ago

KletterMix

4 items • Updated 27 days ago • 6

upvoted a paper 28 days ago

KletterMix: Climbing Toward High-Quality German Pretraining Data

Paper • 2606.03773 • Published 29 days ago • 21

upvoted a paper 30 days ago

Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing

Paper • 2605.31338 • Published May 29 • 1

upvoted 3 papers about 1 month ago

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German

Paper • 2605.30214 • Published May 28 • 1

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Paper • 2605.30343 • Published May 28 • 1

LLMSurgeon: Diagnosing Data Mixture of Large Language Models

Paper • 2605.30348 • Published May 28 • 1

upvoted a collection about 1 month ago

RFDetr

RF-DETR checkpoints converted to be used with 🤗 Transformers • 15 items • Updated May 27 • 17

upvoted 2 papers about 1 month ago

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published May 21 • 33

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published May 20 • 321

upvoted a collection about 1 month ago

Fastest timm models > 75.3% IN-1k Top-1 (Original ResNet-50)

Fastest image classification models with 75.3% accuracy in ImageNet-1k. • 21 items • Updated Sep 19, 2025 • 5