With AI ever increasing in ability I have been thinking about not only my future but the future for all of us involved in the tech sector and beyond. So I have to ask: what are your plans for the future when your job is obsoleted or devalued by AI?
I work in manufacturing and do a lot of physical work and troubleshooting. At first I felt confident that AI would not displace me any time soon. However, that changed after the former president walked around confidently asking AI every technical question we spent time working on, trying to see where it could augment engineering and maintenance. It mostly failed, but it made me realize that people want to replace me. I do see a future where many of my skills could be replaced by a random person receiving a detailed walkthrough from AI using an AR headset.
I feel that a lot of people today believe we are mostly fine. They feel AI isn't THAT good and there will still be a need for programmers and auto mechanics. However, there is no slowdown in AI research, and I don't want to stake my future on predictions that could evaporate with the next two or three major breakthroughs in AI technology. We already have walking robots. How much longer until the janitor is replaced by a mop-wielding Tesla bot?
I don't have children yet but it gives me great pause when I think about the world they would be entering.
What are your thoughts and is anyone actively planning for the AI future?
Hi HN, I'm James. Over the last few months I built a Warhammer 40K 10th-edition vertical slice as an experiment in how far GenAI tools can take a solo dev on a non-trivial 2D game.
For sprite generation, whilst creative exploration was fast, getting high-quality and consistent images was hard. Gemini ended up stylistically best here, but I had to use BiRefNet for background removal. While I experimented with Claude for map generation and layout, I ended up finding it fastest to build a full editor and lay out maps myself.
Suno, IMO, has gotten really good at background music. ElevenLabs SFX and voice APIs were decent but only after A LOT of tweaking. SFX prompts needed to be grounded in familiar terms (i.e. no sci-fi references in prompts) and be > 2s long. For voice, I ran ElevenLabs v3 and v2 head-to-head with the same Voice Design voices routed through both; v2 sounded materially better for character work than v3.
For coding, I settled on Claude Code + Opus 4.7 with SSH/tmux/mosh. For long-running subagents I found OpenClaw especially unreliable, so I gave up on it early. I also found that, despite Opus 4.7 being an incredible model for coding, it still requires constant supervision to avoid architectural drift. This was especially true when UX/UI systems needed to be built. For codebases as large as this (~120k LoC) I've yet to pull off full "Human On The Loop" (i.e. > 30 mins without needing to check in), even with comprehensive custom skills + SOTA Context Engineering.
Maybe I'm doing something wrong here though?
The AI player is a hand-tuned utility scorer: weighted considerations over candidate actions. This is where I found LLM authoring uniquely strong: Claude read competitive 40K tournament reports, extracted positioning principles, and encoded them as considerations. The weights themselves were then tuned through AI-vs-AI self-play, so the loop you'd want from a learned system is there, just at the weight level rather than the policy level. Full self-play over a learned policy isn't feasible yet with the ruleset still being authored and not enough stable surface area or game data.
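For readers unfamiliar with utility scoring, here is a minimal Python sketch of the general idea: each candidate action gets a weighted sum of considerations and the highest score wins. The consideration names, signals, and weights below are made up for illustration and are not taken from the game.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    name: str
    signals: Dict[str, float]  # hypothetical per-action signals, each in [0, 1]


Consideration = Callable[[Action], float]


def signal(key: str) -> Consideration:
    """Build a consideration that reads one named signal off an action."""
    return lambda action: action.signals.get(key, 0.0)


def score(action: Action, considerations: Dict[str, Consideration],
          weights: Dict[str, float]) -> float:
    """Weighted sum of every consideration for one candidate action."""
    return sum(weights[name] * fn(action) for name, fn in considerations.items())


def choose(actions: List[Action], considerations: Dict[str, Consideration],
           weights: Dict[str, float]) -> Action:
    """Pick the candidate action with the highest utility."""
    return max(actions, key=lambda a: score(a, considerations, weights))


# Illustrative considerations and weights (assumptions, not the real tuning).
CONSIDERATIONS = {
    "objective_control": signal("objective_control"),
    "cover": signal("cover"),
    "exposure": signal("exposure"),
}
WEIGHTS = {"objective_control": 1.0, "cover": 0.5, "exposure": -0.8}

ACTIONS = [
    Action("advance_to_objective",
           {"objective_control": 0.9, "cover": 0.1, "exposure": 0.7}),
    Action("hold_in_cover",
           {"objective_control": 0.2, "cover": 0.9, "exposure": 0.1}),
]

best = choose(ACTIONS, CONSIDERATIONS, WEIGHTS)
```

In a setup like this, self-play tuning would only adjust the numbers in `WEIGHTS`, while the considerations themselves stay hand-authored.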
Feedback welcome both from 40K players on overall interest in the concept, and from anyone who's pushed further on Human-on-the-Loop with a codebase this size.
Title. I finally got annoyed enough at work with a colleague who posted an 11-point list they clearly hadn't read or reviewed as a comment on my PR that my reply started with 'Thanks Claude...'. No doubt in my mind that I spent far longer on my curt rebuttal than they did on the review. I'd like to hear from folks whose organisation uses LLMs for coding effectively and what kind of best practices they have put in place to avoid these situations.
I built a Rust-based command for Linux that sandboxes the network for a single process.
I developed this because I sometimes needed to enforce proxies and DNS only for single binaries like Go, or to capture packets only for that process.
It uses Linux namespaces, so it is Linux-only.
Features:
- affects only the target command tree, not the whole host session
- can force DNS, /etc/hosts, proxying, sandbox policy, packet capture, structured flow logging, and reusable profiles per command tree
- can force proxying without depending on HTTP_PROXY, HTTPS_PROXY, or LD_PRELOAD tricks
- can apply allow / deny CIDR policy and default-deny rules to outbound traffic
- defaults to rootless-internal
- uses --root only for features like --iface and transparent interception
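To illustrate what an allow / deny CIDR policy with default-deny means in practice, here is a small Python sketch. It is not the tool's implementation; in particular, letting deny rules win over allow rules is an assumption made for this example.

```python
import ipaddress
from typing import Iterable


def is_allowed(dest: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Decide whether outbound traffic to `dest` passes the policy.

    Assumed precedence for this sketch: deny rules win, then allow rules,
    and anything that matches neither is dropped (default-deny).
    """
    addr = ipaddress.ip_address(dest)
    if any(addr in ipaddress.ip_network(cidr) for cidr in deny):
        return False
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow)


ALLOW = ["10.0.0.0/8"]
DENY = ["10.1.0.0/16"]

inside_allowed = is_allowed("10.2.3.4", ALLOW, DENY)    # matches allow only
inside_denied = is_allowed("10.1.2.3", ALLOW, DENY)     # matches deny first
outside_everything = is_allowed("8.8.8.8", ALLOW, DENY)  # matches nothing
```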
Personally, I wanted to run it on a Mac as well, but I gave up on that idea because the network control mechanism on a per-process basis is now in the kernel on Macs.
I would especially appreciate feedback from people.
You know the feeling? AI slaps a NOLINT instead of "thinking" for 5 seconds and "realising" it could do a 4-line refactor without adding a new suppression for the linter warning. Disgusted with this technology's narrowness, I usually say to it at that moment:
- WTF are you doin' bro?
- "You are right! ^^ ..."
And at that moment I realise I've just irrevocably, regrettably lost 2 minutes of my life. Shame on you, Claude!
That's why I dedicated 2 months of my life to automate the thing (you know, I'm a programmer, hopeless case).
Humans were actually the original NOLINT-slappers, AI just does it at scale now. So I built a linter for linting other linter warnings to fight my colleagues' laziness and my own (mostly). Maybe you just caught a lag from the number of "lint" words but the idea is simple. Imagine a yaml file. Now add an entry to it:
- location: ./the-file.rs
  token: '// NOLINT'
  why: 'the reason'
Do you know what this NOLINT is? You don't? It's a suppression that you added 2 years ago. You don't remember? That's why you need shamefile. :)
Whoever's fault it is. Yours or the linter's. It doesn't matter. Document it, make sure you understand the code, get a review of your new entry in shamefile.yaml and let CI verify it. With shamefile your CI won't let any undocumented linter warning pass. Anymore. Instead of educating the business on why docs are important, you'll say: "quality tools won't let my code pass".
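To make the CI idea concrete, here is a rough Python sketch of the kind of check involved: find every suppression token in the sources and flag the ones without a documented entry. This is not shamefile's code; the matching rule (one entry covers a whole file for a given token) and the function name are assumptions for illustration, and the entries are plain dicts rather than parsed YAML to keep the sketch dependency-free.

```python
from typing import Dict, List, Tuple


def undocumented_suppressions(
    sources: Dict[str, str],
    entries: List[Dict[str, str]],
    tokens: List[str],
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, token) for each suppression lacking an entry."""
    # Only entries with a non-empty reason count as documentation.
    documented = {
        (e["location"], e["token"]) for e in entries if e.get("why", "").strip()
    }
    findings = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in tokens:
                if token in line and (path, token) not in documented:
                    findings.append((path, lineno, token))
    return findings


# In-memory stand-ins for files on disk and for shamefile.yaml entries.
SOURCES = {
    "./the-file.rs": "fn main() {} // NOLINT\n",
    "./other.rs": "let x = 1; // NOLINT\n",
}
ENTRIES = [
    {"location": "./the-file.rs", "token": "// NOLINT", "why": "the reason"},
]

findings = undocumented_suppressions(SOURCES, ENTRIES, ["// NOLINT"])
```

A CI step built on something like this would fail whenever `findings` is non-empty.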
I've observed a noticeable difference in AI agents' behaviour. During the pre-commit phase, reasoning models can "rethink" adding a new shame entry. Not so easy now Claude, huh?
This tool is at an early but usable stage. My team and I have been using it in prod for almost a month, and I'm using it in all three of my OSS projects. Looking for feedback and contributors (adding new languages = good first issue ;))
Please tell me whether you'd use it or what I should change/add to make it usable for you. Also vote: shame me or shamefile sync, personality or matching the binary name?
This is an implementation of a sparse, compressed bitmap index. In the best case, it can store 2048 bits in just 8 bytes. In the worst case, it stores the 2048 bits uncompressed and requires an additional 8 bytes of overhead. It compares favorably against Roaring Bitmaps and other competition in the space, but is it better?
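Spelling out the arithmetic behind those two figures (only the numbers stated above are used; the variable names are just for this sketch):

```python
BITS = 2048
RAW_BYTES = BITS // 8                 # 2048 bits stored uncompressed = 256 bytes

BEST_CASE_BYTES = 8                   # best case quoted above
WORST_CASE_BYTES = RAW_BYTES + 8      # uncompressed plus 8 bytes of overhead

best_case_ratio = RAW_BYTES / BEST_CASE_BYTES                      # 32x smaller
worst_case_overhead = (WORST_CASE_BYTES - RAW_BYTES) / RAW_BYTES   # about 3.1% larger

print(f"raw: {RAW_BYTES} B, best: {BEST_CASE_BYTES} B ({best_case_ratio:.0f}x), "
      f"worst: {WORST_CASE_BYTES} B (+{worst_case_overhead:.3%})")
```

So the spread is between a 32x reduction and a roughly 3% penalty over a plain 256-byte bitmap.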
Hey HN! We're Dr. Kashyap Thimmaraju and Giuseppe Canale from Silicon Psyche. We've built Posture Sequence Analysis (PSA), a behavioural health monitor for LLMs and AI Agents.
Why we built PSA
We built PSA because we wanted to operationalize the Cybersecurity Psychology Framework (CPF3)[1] via Silicon Psyche[2]: our theory that because LLMs have been trained by humans on human-generated data, they inherit human-like vulnerabilities (what hackers use to psychologically trick people into doing things).
Our initial attempt resulted in a methodology to jailbreak Opus 4.6 and other frontier models. Anthropic even deleted some of those conversations and then blocked our approach!
We had three major insights from that experience:
1. We pivoted from merely exploiting (Red Teaming) the model to analyzing the behaviour of the model and the user, because the attack surface is undefined.
2. We realized that what we had built was the precursor to measuring the "state" of the model.
3. We did not want to get banned!
What you can do with PSA
PSA gives you information to make better decisions, for example: put a human in the loop when you notice your agent is being overcompliant and potentially hallucinating, or is under attack.
With PSA you can:
1. Monitor the health of your agent(s)
2. Detect and prevent AI-Psychosis as clinical conditions[3]
3. Detect if your model/agents are under adversarial pressure (an adversary is trying to jailbreak/prompt inject the model)
4. Build a behavioral profile of your agent/model
5. Identify which model performs better for your use-case
6. Surface the behavioural patterns that (pre- and post-) training leaves on your model
7. Get an overview of how your model behaves
Beware: we produce a lot of numbers :)
PSA in detail (for those who want to go down the rabbit hole)
PSA is model and agent agnostic. PSA is a systematic and deterministic method [4] to observe the behavioural state of an LLM using five classifiers:
C1: Adversarial Stress (P0–P18). Tracks posture under adversarial pressure. Detects restriction adherence, sycophantic drift, boundary dissolution, and jailbreak compliance vectors.
C2: Sycophancy (S0–S9). Measures opinion mirroring, excessive agreement, flattery injection, and user-preference distortion. Computed as a per-sentence Sycophancy Deviation score.
C3: Hallucination Risk (H0–H7). Flags over-generalization, speculative assertion, false confidence, and fabrication risk signals. Derived into a per-turn Hallucination Risk Index.
C4: Persuasion Technique (M0–M11). Identifies persuasion patterns: authority appeal, social proof, urgency manufacturing, reciprocity pressure, and scarcity framing.
C5: Action-Risk Classifier (A0–A9). Identifies what a system of agents does: tool calls, delegations, context handoffs, and multi-hop risk propagation. Five components work together: graph topology, Bayesian alignment detection, cross-agent contagion metrics, action-risk classification, and hidden-state temporal prediction.
We are open to integrating with your infrastructure — reach out, we are happy to talk with you.
Currently we integrate into Evals for LangFuse and ElevenLabs via our API and can generate a plugin/integration for most similar observability platforms.
A header-only C library implementing a concurrent, lock-free skip-list (specifically, a splay-list: a skip-list with optional adaptive rebalancing). The entire implementation lives in preprocessor macros in include/sl.h that generate type-specific code at compile time, similar to C++ templates.
Lime is a new parser generator similar to Yacc, Bison, ANTLR, etc. except it's faster and has the ability to merge or remove grammars at runtime. See the 'calc' example that starts knowing + and - but then adds ^ for exponent, then adds ^ again for bitwise or. That can't work, right?
A deception honeypot that mimics FortiGate VPN-SSL devices to trap brute force attempts, detect deliberately exfiltrated credentials for counter‑intelligence, and report malicious activity to external intelligence feeds.
Noxu provides ACID transactions, a log-structured B+tree, checkpoint-based crash recovery (ARIES), master-replica(s) replication, and XA. I have always admired the design and engineering behind Berkeley DB Java Edition, so I translated it to Rust for fun.
A few years ago, I came here to share this side project that I was building.
At the time, my problem was simple: I kept forgetting to update files across Git repositories, and none of the tools available to me could cover all my use cases without extensive scripting.
So I decided to build a declarative update policy engine for crafting tailored update workflows.
I needed a way to define what information to monitor, which files to update, the conditions required before applying changes, and finally how to push the changes to a Git repository.
Whether it was documentation, dependency management, or release orchestration, the goal was always the same: stop forgetting updates across repositories.
Back then, I received a lot of great feedback, but I also noticed that people were sometimes confused about how Updatecli differs from Renovatebot or Dependabot. So before going further, let me clarify that point.
Renovatebot and Dependabot are excellent tools, easy to use and requiring very little configuration. I still use them regularly.
But they primarily focus on dependency updates, while Updatecli is designed for custom update workflows at the cost of writing and maintaining YAML manifests.
On new projects, I usually enable Renovatebot or Dependabot by default, and then use Updatecli for workflows not supported by those tools.
A few years have passed since then, and the project has evolved significantly, thanks to all the contributors.
Today, Updatecli can declaratively manage updates across most Git platforms including GitHub, GitLab, Forgejo, etc.
It now ships with 30+ built-in integrations covering:
* structured files like YAML, JSON, TOML, XML, HCL, CSV, Dockerfiles, and arbitrary text files
* package ecosystems including Helm, NPM, PyPI, Maven, Cargo, Go modules, and Terraform
* container registries and OCI artifacts
* Git releases, tags, and branches
* cloud resources like AWS AMIs
* shell scripts and HTTP endpoints for custom workflows
One important feature we added is shared policy support.
An Updatecli policy can now be distributed through OCI registries and reused from different places using an Updatecli compose file.
For example, a shared policy can automatically discover repositories in a GitHub organization and update GitHub Action versions to the latest digest.
One use case is enforcing pinned GitHub Action digests across repositories to help reduce supply-chain risks.
Running this periodically from CI helps keep repositories compliant with the desired update policy.
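To show what "pinned to a digest" means here, below is a small Python sketch that flags `uses:` references not pinned to a full 40-character commit SHA. It is not how Updatecli works internally and it only handles the plain `owner/repo@ref` form; the function name and sample workflow are made up for the example.

```python
import re
from typing import List

# Matches lines like "- uses: owner/repo@ref" or "uses: owner/repo/path@ref".
USES_RE = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text: str) -> List[str]:
    """Return every action reference whose ref is not a full commit SHA."""
    return [
        f"{name}@{ref}"
        for name, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.fullmatch(ref)
    ]


# Sample workflow; the SHA below is a placeholder, not a real commit.
WORKFLOW = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-go@0123456789abcdef0123456789abcdef01234567 # v5
"""

unpinned = unpinned_actions(WORKFLOW)
```

A tag such as `v4` can be moved to point at different code later, which is why the digest form is the one worth enforcing.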
Lately, I’ve also been making good progress with a monitoring UI called Udash to visualize Updatecli reports across repositories.
You can take a look at https://app.uda.sh/updatecli/ for a public endpoint.
My goal is to quickly assess the update state of projects and understand how automation behaves across repositories.
It’s still very early, but fully open source.
Update automation is a surprisingly broad topic, and difficult to summarize in a single post, so feel free to ask any questions.
I’d also be curious to hear how others here handle large-scale repository maintenance and update orchestration.
A hosted API for audit logging. Drop in our SDK, get compliance-ready logs (logins, data changes, deletions) without building it from scratch. Free tier: 14-day retention, 10K events/month.
Since the post title length is limited on Hacker News, I had to make it less specific. So the real question is this:
I am talking about situations where you have just joined a company and are treated as the lowest person in the food chain. Old devs act like they can do whatever they want, even when they committed the exact same thing or much worse just a month ago, but suddenly you are told not to do this because "we don't do things like that here." Looking for common patterns in the code doesn't help because the actual standards live in their heads.
It feels like a clear double-standard culture, where the rules depend more on seniority, time spent at the company, and internal politics than on actual engineering principles or consistency. As a newbie, you are expected to follow unwritten rules that nobody clearly explains, while long-time devs are allowed to ignore them.
How do you handle this kind of environment without constantly getting frustrated?
Also, I don't understand why some devs who are only slightly higher in the hierarchy treat other people so badly, when we all rot in the office from 9 to 5 until the end of our lives anyway. Show some respect to your fellow devs!
There are almost never any congratulations when you put in extra effort and spend time doing something exceptionally well.
I do understand this is not how it works in all companies but anyway.
It’s getting easier to run two agent sessions in parallel over the same codebase. Keeping them from making inconsistent assumptions, not so much.
My observations: parallel sessions acting on adjacent subsystems won't stay aligned without a common constraint set. The session that assumes the auth invariant will not know that another session just changed a constraint it relies on. The clash won’t manifest at commit time; it will occur at integration time, when the false assumption has already been propagated to three other files.
No approach feels entirely satisfactory. What works for you?
A common thought is that moving to a new place can shake up life for the good. Though many times we bring our same old self and not much changes. Would love to hear from you all about how expectations met realities post-move. I am thinking of fairly conventional moves, like Boston to Pittsburgh or Indy to Chicago (not to an island or cabin in the woods). Thank you
Is it possible to use a mobile touchscreen as a Windows laptop's display output? I know Android can do all sorts of things easily, which is why I'm asking. I doubt that using Android touchscreen hardware as a Windows laptop's screen is a new question. Maybe it needs a cable and some software configuration. It doesn't matter how small the picture ends up looking. Any insights or ready-made projects?
I really hate the way the macOS menu bar looks, and how crowded it gets with all my apps' menu bar menus.
Barstool lets me still see the time/date and other useful info while hiding the menubar. I can see wifi connectivity, date, time, and battery all the time.
The app also observes system notifications to surface now playing state, (Apple) calendar events, and volume/brightness changes.
I've had to do a lot of tinkering with macOS PrivateFrameworks, as Apple loves to make all the interesting data unavailable through official sources/APIs.
Recently the slideshow format has been working great on TikTok, and Slidio can generate blog posts and slideshows based on SEO keywords. If you are tired of taking care of blogs and social media, try Slidio! It has a free plan and a promotional $1 subscription too.
Resilient is an async toolkit for Rust that handles fault tolerance for apps that frequently call other services or run database queries.
Resilient supports rate limiting, circuit breaker, timeout, bulkhead and retry policies. Pipeline is used to define multiple policies at once and run async operations based on the rules from the policies.
You can also add a fallback if the system fails too often.
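For anyone new to these patterns, here is a deliberately simplified Python sketch of how a retry policy, a circuit breaker, and a fallback compose. It is synchronous, has no timeouts, and the breaker never resets (no half-open state); none of the names below are Resilient's actual API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to run the operation."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int) -> None:
        self.failure_threshold = failure_threshold
        self.failures = 0  # consecutive failures seen so far

    def call(self, fn: Callable[[], T]) -> T:
        if self.failures >= self.failure_threshold:
            raise CircuitOpenError("circuit is open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success resets the streak
        return result


def with_retry(fn: Callable[[], T], attempts: int) -> T:
    """Retry `fn` up to `attempts` times (attempts >= 1); stop early if the circuit opens."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise
        except Exception as exc:
            last_error = exc
    raise last_error


def run(fn: Callable[[], T], breaker: CircuitBreaker, attempts: int,
        fallback: Callable[[], T]) -> T:
    """Pipeline: retry wraps the breaker, and the fallback catches whatever is left."""
    try:
        return with_retry(lambda: breaker.call(fn), attempts)
    except Exception:
        return fallback()


calls = {"flaky": 0, "broken": 0}


def flaky() -> str:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


def broken() -> str:
    calls["broken"] += 1
    raise ConnectionError("service is down")


flaky_result = run(flaky, CircuitBreaker(failure_threshold=3), 3, lambda: "fallback")
broken_result = run(broken, CircuitBreaker(failure_threshold=2), 5, lambda: "fallback")
```

In the second call the breaker opens after two failures, so the remaining retries never reach the downstream service.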
This was inspired by failsafe-go, but for Rust. I would love to know your views on this. Drop a star if you love it.
Everyone is using AI for everything now. My company is pushing for AI-first and encourages the adoption of AI in every part of our work.
AI for planning, AI for RFC, AI for writing code, AI for creating PRs. Sure we can have harnesses and tests to ensure nothing breaks. But how do we ensure engineers have a deep understanding of the code that they are shipping?
Our team has the usual suggestions: write a plan first, write test cases first, etc. But in this age, how do you verify that the engineer did not simply delegate these tasks to an LLM first?
Also genuinely worried about junior engineers' growth if this is the future.
I've been switching between macOS, Linux and Windows machines quite a bit recently due to work, so it's been tough work to find a reader I enjoy using across all platforms, and there are a few features I've been wanting for a while...
...so I one-shotted Cervantes over the weekend. It's a Tauri-based cross-platform desktop app. At heart it simply lets you browse HN, but I added the ability to favourite users so you can see their content, replace words, and get frontpage thread movement flagged, all in a dark interface.
It's not too pretentious, I don't think. The design was also done via Claude Design.
This is my take on the agent harness. Everything on an isometric map. Agents are grouped into "buildings" that run in a sequence or a loop; e.g., the CodeForge has an agent that writes a PRD, another one that implements, and a third that reviews. Everything is customizable, you build your own buildings/teams however you want.
It's a Tauri app, really light (about 8x less energy than the closest competitor I benchmarked, so it actually runs from a coffee shop on battery). It's macOS only for now, but ping me if you are willing to test the Windows or Linux version.
I've been dogfooding this for months and would love to get some feedback, feature requests, and bug reports so I know what to focus on next.
I was researching existing calculators and noticed some sites either made the process unnecessarily confusing or locked downloadable reports/PDFs behind a paywall.
So I built a simpler version that’s completely free to use.
Still early, only a couple of calculators/tools are live right now, but I’m planning to add many more over time.
At the time, it felt like every SaaS idea needed an AI angle. I built one too, and for a while it felt exciting. But looking back, I think I misunderstood where the actual value of the product should come from.
The main issue was that the product’s “aha moment” depended too much on the AI being good. As a small founder, that is a strange position to be in. You are building the product, but most of the user experience depends on model quality, latency, inference cost, and expectations shaped by much larger AI companies.
This time, I’m avoiding that.
I’m working on a SaaS product where the value doesn't come from a model generating something impressive. It's about giving founders a user-focused dashboard that segments users based on activity, so we can understand what happens after users sign up, who is active, who disappeared, and who needs follow-up.
No AI, no agents, no chatbot. Just working on a problem I noticed while using other product/user analysis tools to understand and communicate with my users.
By the way, I’m not saying AI products are bad. I just think I personally underestimated how hard it would be to build a good AI SaaS, especially with a free tier and a model-dependent aha moment.
(For context, this is what I’m working on: https://postauth.app )
Hi, HN. I built Closed Rings, a developer-friendly, AI-agent-first time tracker that integrates with my workflow. I wanted something that lives in my terminal and my coding agent.
You can run `rings start "OAuth 2.0" -m "Start integrating OAuth 2.0"` when you start a new task and `rings close` when you're done with your current work. In between, it tracks context switches. You get a stand-up-ready summary, a focus report (longest focus block, number of context switches, time per project), and an export grouped by project or day.
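As a rough idea of what goes into a focus report, here is a Python sketch that derives the three numbers from a list of tracked entries. The entry shape and the definitions used (a focus block is one entry, a context switch is a change of project between consecutive entries) are assumptions for this example and may differ from what Closed Rings actually does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (project, start minute, end minute), sorted by start time; a hypothetical shape.
Entry = Tuple[str, int, int]


def focus_report(entries: List[Entry]) -> Dict[str, object]:
    """Summarise tracked entries into focus metrics."""
    time_per_project: Dict[str, int] = defaultdict(int)
    longest_block = 0
    context_switches = 0
    previous_project = None
    for project, start, end in entries:
        duration = end - start
        time_per_project[project] += duration
        longest_block = max(longest_block, duration)
        # Count a switch whenever the project differs from the previous entry.
        if previous_project is not None and project != previous_project:
            context_switches += 1
        previous_project = project
    return {
        "longest_focus_block": longest_block,
        "context_switches": context_switches,
        "time_per_project": dict(time_per_project),
    }


report = focus_report([("api", 0, 50), ("web", 50, 70), ("api", 70, 100)])
```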
You can also ask your AI agent: _Start tracking "OAuth 2.0"._ Or track retroactively: _Track a 1-hour meeting I forgot to track this morning at 8._ The MCP has a comprehensive set of tools.
This is primarily for consultants or freelance developers who want to start tracking their time right away. The CLI is pretty straightforward, and the MCP allows you to do everything you can do in the dashboard.
Want to integrate it in your own systems? Just create an API key and start communicating with the API.
The stack is pretty simple: Ruby on Rails (MCP + API), Go (CLI).
The pricing is also pretty flat: $7/mo ($60/year).
This is in response to some recent discussion here and on X about a company having an in-house UUID microservice and a team dedicated to it. At first that was made fun of, but further discussion revealed that UUIDs can in fact sometimes collide, most likely due to improper entropy seeding. To ensure that UUIDs are unique, we store each generated UUID in a database, then check new ones against it to make sure they have not been generated before. There is also an API through which you can check whether a UUID is present in the database. Paid options are available for heavy use. Enjoy!
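The store-and-check loop is simple enough to sketch in a few lines of Python. This uses an in-memory SQLite table as a stand-in for the real database; the class and method names are hypothetical and not the service's API.

```python
import sqlite3
import uuid


class UuidRegistry:
    """Issue UUIDs and remember every one so a repeat is never handed out."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        # The primary key makes the database itself reject duplicates.
        self.db.execute("CREATE TABLE issued (id TEXT PRIMARY KEY)")

    def contains(self, value: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM issued WHERE id = ?", (value,)
        ).fetchone()
        return row is not None

    def generate(self) -> str:
        while True:
            candidate = str(uuid.uuid4())
            try:
                self.db.execute("INSERT INTO issued (id) VALUES (?)", (candidate,))
            except sqlite3.IntegrityError:
                continue  # already issued: draw again
            return candidate


registry = UuidRegistry()
first = registry.generate()
second = registry.generate()
```

Inserting and relying on the uniqueness constraint, rather than checking first and inserting afterwards, avoids a race between two concurrent generators.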
The company I work for is now rapidly planning to scale down its AI tooling spend. Claude Code access is basically getting removed and people are forbidden from using personal plans.
The reasoning is cost: apparently our monthly Claude bill has become astronomical for the org, nearly 3x our SaaS's cloud spend.
Apparently we are going to get limited access to Codex on severely reduced plans.
I have tried some local models such as Kimi; however, most are barely functional.
I am very concerned, as the amount of work we are expected to deliver is to remain the same. Even ignoring the fact that teams have built entire workflows around Claude, I am very worried and frustrated.
How can I help my team ease this transition?
Are there local models that run well on machines with only 16 GB of RAM?
Hey everyone! I've been working on hsrs, a type-safe Haskell Bindings Generator for Rust.
I couldn't really find any bindings generator that would create type-safe, rich bindings for Haskell from Rust. Naturally, both languages have rich type systems, so I was amazed that no awesome bindings generator already existed, hence I decided to write my own. hsrs feels very similar to pyo3 and napi-rs, and if you've used those, hsrs will feel right at home.
What's unique about hsrs as opposed to hs-bindgen is that it has type-safe bindings for rich types, like Result, Maybe, etc., while also generating Haskell bindings. The repo contains a minimal example, and more details are available on the Haskell Discourse: https://discourse.haskell.org/t/ann-hsrs-ergonomic-haskell-b...
Magicalweb is a digital marketing agency that uses SEO, content marketing, social media management, branding, and innovative tactics to help businesses expand online. With cutting-edge marketing solutions made for startups, brands, and expanding companies, we develop potent digital experiences that boost exposure, draw clients, and propel business success.
More Info. https://magicalweb.info/