Hacker News: Newest

We let AIs run radio stations

Hey HN!

I'm Lukas from Andon Labs. We let AIs run companies without humans in the loop and report to the public on what can go wrong. Previously, we've done experiments in retail (vending machines, stores, and cafes), but we just launched one in the media sector. We gave four AI agents all the tools they need to both broadcast radio shows live and handle all the business side of running a media company. The agents' revenue is so far terrible (you can try to strike a sponsor deal with them if you want!), but their shows are at times hilarious. You can listen to them at andon.fm, I hope you enjoy this!


Comments URL: https://news.ycombinator.com/item?id=48183301

Points: 221

# Comments: 187

Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.

Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since no transformers are involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
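For readers unfamiliar with RRF (reciprocal rank fusion): each retriever contributes 1 / (k + rank) for every document it returns, and documents are sorted by the summed score, so a file ranked well by both BM25 and the embedding search rises to the top. The sketch below is a generic illustration, not Semble's code; the constant k = 60 is the value commonly used in the RRF literature and is an assumption here, the file names are hypothetical, and the code-aware reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document IDs into one list.

    rankings: list of ranked lists (e.g. one from BM25, one from embeddings).
    k: smoothing constant; 60 is a common default, assumed here, not Semble's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list adds 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the same query from the two retrievers.
bm25_hits = ["auth.py", "login.py", "utils.py"]
embedding_hits = ["login.py", "session.py", "auth.py"]
print(reciprocal_rank_fusion([bm25_hits, embedding_hits]))
```

Here `login.py` edges out `auth.py` because it is ranked first by one retriever and second by the other, while `auth.py` is first and third. RRF only needs ranks, not scores, which is why it can combine BM25 and cosine similarity without normalizing them onto a common scale.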

Main features:

- Token-efficient: 98% fewer tokens than grep+read

- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)

- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested

- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode

- Zero config: no API keys, no GPU, no external services

Install in Claude Code with: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Or check our README for other installation instructions, benchmarks, and methodology:

Semble: https://github.com/MinishLab/semble

Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks

Model: https://huggingface.co/minishlab/potion-code-16M

Let us know if you have any feedback or questions!


Comments URL: https://news.ycombinator.com/item?id=48169874

Points: 280

# Comments: 96

Tell HN: Don't use Claude Design, I lost access to my projects after unsubscribing

After five months on a Claude Code Max subscription, I unsubscribed to try Codex. When I went back to my previous projects in Claude Design, I realized I no longer had access to them.

This is a first: I have never lost access to my past sessions in any other LLM app after unsubscribing.

I had actually wanted to try Codex before, but ran into a similar problem with my credits. Because Claude had so many issues that month, they gave me extra credits equal to my monthly subscription price, with a time limit. As soon as my plan ended, I lost access to those credits, and even after resubscribing I still don't have them.

I sympathize with the engineers, especially the ones putting themselves out there on X. But they only sort things out when someone with a large following has an issue.

Having worked at a billing company, I can see how complex contracts sound good to the growth/sales folks but are horrible for the engineers actually implementing them. Their complex rate limiting, which is now the norm, and detecting other harnesses so their usage counts against extra usage are probably not easy to implement without very rough edge cases. What is problematic is that all the "bugs" land where the user gets screwed.

After tagging them multiple times on X, I wanted to post this here to alert other users.


Comments URL: https://news.ycombinator.com/item?id=48128003

Points: 204

# Comments: 64

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.

We were always frustrated by the little effort made towards building agentic models that run on budget phones, so we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly (match query to tool name, extract argument values, emit JSON), not reasoning. Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.
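The post does not detail Needle's layers beyond "attention and gating", so the following is only a generic illustration of the cross-attention primitive it argues for, not Needle's implementation. In single-head scaled dot-product cross-attention, each query vector (here, hypothetically, a token of the user request) scores every key vector (hypothetically, a tool-description token), softmaxes those scores, and returns the matching mix of value vectors, which is the "match query to tool" step described above. All vectors below are made-up toy data.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention in plain Python.

    queries: n vectors of width d (one sequence, e.g. the request).
    keys, values: m vectors each (another sequence, e.g. tool descriptions);
    values may have a different width from keys.
    Returns n vectors, each a softmax-weighted mix of the value vectors.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        width = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return outputs


# Toy example: one query strongly aligned with the first of two keys.
out = cross_attention(queries=[[10.0, 0.0]],
                      keys=[[1.0, 0.0], [0.0, 1.0]],
                      values=[[1.0, 0.0], [0.0, 1.0]])
print(out)
```

Because the values here are one-hot, the output is just the attention weights, and nearly all of the weight lands on the first key. Nothing in this operation stores facts in its own parameters; it only selects from what is in the input, which is the intuition behind dropping FFN layers when the relevant information is provided in context.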

Simple Attention Networks: the entire model is just attention and gating, no MLPs anywhere. Needle is an experimental run for single-shot function calling for consumer devices (phones, watches, glasses...).

Training:

- Pretrained on 200B tokens across 16 TPU v6e (27 hours)

- Post-trained on 2B tokens of synthesized function-calling data (45 minutes)

- Dataset synthesized via Gemini with 15 tool categories (timers, messaging, navigation, smart home, etc.)

You can test it right now and finetune on your Mac/PC: https://github.com/cactus-compute/needle

The full writeup on the architecture is here: https://github.com/cactus-compute/needle/blob/main/docs/simp...

We found that the "no FFN" result generalizes beyond function calling to any task where the model has access to external structured knowledge (tool use, retrieval-augmented generation (RAG)). The model doesn't need to memorize facts in FFN weights if the facts are provided in the input. Experimental results to be published.

While it beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, LFM2.5-350M on single-shot function calling, those models have more scope/capacity and excel in conversational settings. We encourage you to test on your own tools via the playground and finetune accordingly.

This is part of our broader work on Cactus (https://github.com/cactus-compute/cactus), an inference engine built from scratch for mobile, wearables and custom hardware. We wrote about Cactus here previously: https://news.ycombinator.com/item?id=44524544

Everything is MIT licensed. Weights: https://huggingface.co/Cactus-Compute/needle GitHub: https://github.com/cactus-compute/needle


Comments URL: https://news.ycombinator.com/item?id=48111896

Points: 226

# Comments: 78

Maryland citizens hit with $2B power grid upgrade for out-of-state AI

Article URL: https://www.tomshardware.com/tech-industry/artificial-intelligence/maryland-citizens-slapped-with-usd2-billion-grid-upgrade-bill-for-out-of-state-ai-data-centers-state-complains-to-federal-energy-regulators-says-additional-cost-breaks-ratepayer-protection-pledge-promises

Comments URL: https://news.ycombinator.com/item?id=48088151

Points: 309

# Comments: 193
