Hugging Face – Posts


All HF Hub posts

posted an update 2 days ago

Post

4706

🐯 Chitos — The Security Scanner That Actually Proves It

Most security scanners hand you a suspect list and walk away. That gap between detection and proof is where attackers live — and it's exactly the gap that Chitos was built to close.

Chitos is the successor to Mythos, a static analyzer built for quick code health checks. Mythos was good at pattern matching — spotting dangerous sinks, mapping CWEs, producing readable reports. But static analysis has a structural ceiling. A rule that sees eval(user_input) can tell you that looks dangerous. It cannot tell you whether the input is reachable, whether sanitization three layers up covers this path, or whether there's a live exploit chain for your exact framework version. Chitos was built to answer those questions.
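To make the "structural ceiling" concrete, here is a minimal sketch of the kind of pattern rule described above. It is not the Mythos or Chitos implementation; the function name `find_eval_sinks` and the sample snippet are hypothetical, and only Python's standard `ast` module is used. The rule flags `eval(...)` calls with a non-literal argument, and that is all it can say: it has no idea whether the argument is reachable or already sanitized.

```python
import ast


def find_eval_sinks(source: str) -> list[int]:
    """Return line numbers of eval(...) calls whose first argument is not a literal.

    Hypothetical illustration of a single static rule, not a real scanner.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
            and node.args
            and not isinstance(node.args[0], ast.Constant)
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample input: one suspicious call, one call on a constant string.
SAMPLE = '''
def handler(user_input):
    return eval(user_input)

def safe():
    return eval("1 + 1")
'''

print(find_eval_sinks(SAMPLE))  # prints [3]
```

The rule reports line 3 and stays silent on the constant call, but whether `handler` is ever invoked with attacker-controlled data is exactly the question a pattern match cannot answer.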

🔍 Phase 1 applies 50 language-agnostic rules across Python, JavaScript, Go, Java, C/C++, Rust, PHP, YAML and more — covering injection sinks, deserialization gadgets, credential leakage, broken crypto, and prototype pollution. Every candidate is re-verified before reaching the report. Findings that can't be substantiated are excluded, not handed to you as noise.

🔬 Phase 2 dispatches an autonomous web-search agent to hunt live CVE databases, exploit advisories, and public PoC repositories. It formulates hypotheses, verifies them, and synthesizes a structured threat narrative. This phase needs a user-supplied Claude API key — Phases 1 and 3 run entirely free.

🎯 Phase 3 is where Chitos diverges from everything else. Against targets you own or are authorized to test, it fires real payloads — XSS, SQLi, path traversal, command injection — mutates on block, captures hard evidence, and connects every proven finding into a kill-chain showing which vulnerabilities to remediate first.

No installation. No account. No code sent to third-party APIs.

Article: https://huggingface.co/blog/FINAL-Bench/chitos

Try it now 👉 https://chitos.vidraft.net

ginigen-ai

posted an update about 7 hours ago

Post

922

🧠 Does your LLM know when it's about to be wrong?

Most leaderboards measure accuracy. We measure metacognition — whether a model catches its own errors. Benchmark + leaderboard + adapters, all open. 🎉

The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005 — ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row): ① trap_rate — does it fall for tempting trap options? (lower = stronger) ② adapter gain Δ — how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
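As a rough illustration of the two numbers quoted above, the sketch below computes a trap rate and an AUROC in plain Python. The function names and the toy data are assumptions for illustration, not the benchmark's scoring code. AUROC is taken here in its pairwise form: the probability that a randomly chosen wrong answer gets a higher error score than a randomly chosen correct one, with ties counted as half.

```python
def trap_rate(fell_for_trap: list[bool]) -> float:
    """Fraction of trap problems where the model picked the trap option."""
    return sum(fell_for_trap) / len(fell_for_trap)


def auroc(error_scores: list[float], is_wrong: list[bool]) -> float:
    """Pairwise AUROC of an error score against wrong/correct labels."""
    wrong = [s for s, w in zip(error_scores, is_wrong) if w]
    right = [s for s, w in zip(error_scores, is_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one correct answer")
    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0
            elif w == r:
                wins += 0.5  # ties count as half
    return wins / (len(wrong) * len(right))


# 2 misses in 400 trap problems, matching the figure quoted in the post.
print(trap_rate([True] * 2 + [False] * 398))  # prints 0.005

# Toy labels: a model that reports the same confidence everywhere carries no signal.
labels = [True, False, True, False]
print(auroc([0.1, 0.1, 0.1, 0.1], labels))  # prints 0.5
print(auroc([0.9, 0.2, 0.8, 0.1], labels))  # prints 1.0
```

An AUROC of 0.5 means the score ranks wrong and correct answers no better than a coin flip, which is the sense in which the self-confidence signal is described as random.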

What's open: 📊 300+100 trap problems (each with a hidden trap + TICOS type) 🏆 24-model leaderboard 🧩 11 per-model adapters — adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))
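The post does not describe the adapter architecture, so the following is only a sketch under a strong assumption: that "reads the hidden state → P(wrong)" can be approximated by a logistic probe trained on fixed hidden-state vectors. All names and the two-dimensional toy vectors are hypothetical. The point it illustrates is that the base model is never updated; its hidden states are just inputs.

```python
import math


def p_wrong(hidden_state: list[float], weights: list[float], bias: float) -> float:
    """Logistic probe: map a frozen hidden-state vector to P(answer is wrong)."""
    z = bias + sum(w * h for w, h in zip(weights, hidden_state))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit the probe with plain SGD on log loss; only the probe's parameters change."""
    weights = [0.0] * len(states[0])
    bias = 0.0
    for _ in range(epochs):
        for h, y in zip(states, labels):
            err = p_wrong(h, weights, bias) - y  # gradient of log loss w.r.t. z
            weights = [w - lr * err * x for w, x in zip(weights, h)]
            bias -= lr * err
    return weights, bias


# Toy stand-ins for hidden states; label 1 = the base model's answer was wrong.
states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]

weights, bias = train_probe(states, labels)
print(p_wrong([1.0, 0.0], weights, bias) > 0.5)  # prints True
print(p_wrong([0.0, 1.0], weights, bias) < 0.5)  # prints True
```

Because nothing flows back into the base model, one probe per model can be trained and shipped separately, which is consistent with the "adapters, NOT fine-tunes" framing.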

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → ginigen-ai/Metacognition-Leaderboard-Space

📊 Benchmark → ginigen-ai/Metacognition-Bench

🧩 Adapters → FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961

📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).

ginigen-ai

posted an update 2 days ago

Post

4208

🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action) — and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks — picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more — inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.
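A small sketch of how such a success rate could be aggregated, assuming per-task success is averaged over episodes, then over tasks, then over seeds. The exact aggregation order is an assumption, and the task names below are placeholders rather than the benchmark's real task identifiers.

```python
def success_rate(results: dict[int, dict[str, list[bool]]]) -> float:
    """Mean over seeds of the mean per-task success fraction.

    results[seed][task] is a list of episode outcomes (True = task completed).
    """
    per_seed = []
    for tasks in results.values():
        per_task = [sum(episodes) / len(episodes) for episodes in tasks.values()]
        per_seed.append(sum(per_task) / len(per_task))
    return sum(per_seed) / len(per_seed)


# Toy data: 2 seeds, 3 placeholder tasks (the real suite has 24).
results = {
    0: {
        "pick_cup": [True, True, False, True],
        "open_drawer": [True, False],
        "turn_faucet": [True, True],
    },
    1: {
        "pick_cup": [True, False],
        "open_drawer": [True, True],
        "turn_faucet": [False, False],
    },
}

print(success_rate(results))  # prints 0.625
```

Averaging per task first keeps a task with many episodes from dominating the score, and averaging over seeds is what keeps a single lucky layout from inflating it.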

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched) — head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols — self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop — a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard

AxionLab-official

posted an update 2 days ago

Post

4854

⚠️ Community Notice

We would like to clarify that SupraLabs has no affiliation, partnership, or connection whatsoever with "SupraLarps" or its members.

Please avoid interacting with their organization, repositories, or Spaces under the assumption that they are associated with us.

We are currently aware of the situation and have already contacted the appropriate channels to address it.

Thank you to everyone who continues to support SupraLabs. ❤️

11 replies

Banaxi-Tech

posted an update 3 days ago

Post

3805

Hello AI Community! 👋

We have just released BananaMind 1.5 Base and it outperforms other models at its size.
It outperforms GPT-2 124M while being ~50M params smaller.

Check it out: BananaMind/BananaMind-1.5-Base

OLD POST CONTENTS EDITED:
We have a new AI model and are currently training it.
We are training it on 27B tokens and are currently 8% done.
Follow us to be notified when it releases 🚀
Some Info:
Parameters: 75M
GPU: RTX Pro 6000
We expect to be able to release it in the coming days.
https://huggingface.co/BananaMind/BananaMind-1.5-Base

32 replies

stas

posted an update 1 day ago

Post

1490

After many months of intense work, the Snowflake AI Research team is happy to present to you the new open source project: Arctic RL

https://snowflake.com/en/blog/engineering/arctic-rl-open-source-backend/

- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved higher accuracy scores (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%
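The rounded figures in the list above can be sanity-checked with a few lines of arithmetic. This is only a consistency check on the numbers as quoted, with "~5 days" read as 120 hours; it makes no claim about how the team measured them.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of the old wall-clock time to the new one."""
    return before_hours / after_hours


baseline_hours = 5 * 24  # "~5 days", taken as exactly 120 hours (an assumption)
zorro_hours = 36

print(round(speedup(baseline_hours, zorro_hours), 2))  # prints 3.33

# A 3.5x speedup ending at 36 hours implies a baseline of 126 hours.
implied_baseline_days = 3.5 * zorro_hours / 24
print(implied_baseline_days)  # prints 5.25

# Accuracy gains in percentage points for the two recipes.
bird_gain = round(70.35 - 59.92, 2)
qa_gain = round(72.3 - 69.6, 1)
print(bird_gain, qa_gain)  # prints 10.43 2.7
```

Read this way, "~5 days" and "3.5x" agree to within rounding: a baseline of about 5.25 days gives exactly 3.5x at 36 hours.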

4 replies

mmhamdy

posted an update 6 days ago

Post

225

It has been more than a decade now since the knowledge distillation paper came out.

Knowledge Distillation (KD) is one of my favorite topics, but I have to confess that I'm not a huge fan of the term because I find it confusing (or at least, it has become so over time).

The idea behind KD is not novel; it was there almost a decade before the paper came out (and arguably even a decade before that, back to 1990-91). But this paper is the one that clicked, the one that made the topic much more popular and introduced it to a broader audience.

First, the timing and the authors played a big role: we have Geoffrey Hinton, Oriol Vinyals, and Jeff Dean here. And second, Geoffrey Hinton is really good at idea branding: Model compression?! No, no, no! Let's call it "Knowledge Distillation" and use evocative terms such as "Dark Knowledge" to describe what is being transferred.

It's a great name, but as time has passed, the term has become a bit of a relic. KD is no longer solely about compression (KD used to be introduced as a method for model compression, but now model compression is just one application of KD). The other issue is that the word "distillation" implies some sort of potency, that the student is somehow more powerful than the teacher, which is not the case (though counterarguments could be made: for example, the student can be more powerful than another model trained with no teacher).

Nevertheless, the paper is incredibly well-written, short, and fun to read. It's one of the few papers that I have read several times. Check it out, and maybe share your thoughts on the topic with us here!

If you had to choose another name for Knowledge Distillation, what would it be?

5 replies

its5Q

posted an update 18 days ago

Post

154

I have downloaded about 8.7 million 2048x1024 (or 1664x832 for older captures) Google Street View panoramas, randomly distributed all over the world, which I have toyed around with for training geo-guessing models. Is anybody interested in a dataset of these panorama images along with their latitude-longitude coordinates?

Reubencf

posted an update about 5 hours ago

Post

Introducing Reubencf/Document_Query,
powered by CohereLabs/command-a-plus-05-2026-w4a4 by Cohere.
Just open the admin page -> drop in any document -> ask questions.

ST-x-Tony

posted an update about 20 hours ago

Post

Welcome, Researchers and Developers!

At SKT AI Labs, we are pushing the boundaries of AI architecture and research, and today we are thrilled to open our doors to the global research community!

We warmly welcome researchers, developers, and AI enthusiasts to join us and contribute to our R&D efforts.

🧪 What You Can Explore:

We invite you to experiment with our WMF (Weight Manifold Fusion) technology. You can test this high-dimensional fusion technique on smaller models to gain a deeper understanding of its behavior and token convergence.

❤ CHECK OUT :

SPACE : SKT-NRS/RD
EXPERIMENT : sKT-Ai-Labs/SKT-SURYA-H
DIRECT TO MAIN DISCUSSION : SKT-NRS/RD#1

🤝 Your Feedback Shapes the Future :

If it works: Fantastic! Share your results with us and contribute directly to the core vision of SKT AI Labs.

If it doesn't work: No problem at all! Your critical feedback is just as valuable to us. Every experiment and anomaly helps us refine this architecture to make it more stable and robust.

We firmly believe that true innovation stems from community collaboration and transparent testing. Let's build the future of advanced AI together. Your ideas, test results, and feedback are always welcome!

You can still do research and development on WMF; only the SKT-SURYA-H model has been discontinued.

☄️Let's innovate and build together! 💡
