Codex for almost everything

(openai.com)

484 points | by mikeevans 4 hours ago ago

249 comments

  • woeirua an hour ago ago

    Just reading the comments here it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already does all of this. Codex isn't pioneering these features, it's mostly just catching up.

    • solenoid0937 an hour ago ago

      Codex is HN's darling now because Anthropic lowered rate limits for individuals due to compute constraints. OAI has so few enterprise users they can afford to subsidize compute for this group a lot more than Anthropic.

      Eventually once they have more users they'll do the same thing as Anthropic, of course.

      It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

      • someotherperson an hour ago ago

        Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

        Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.

        People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

        • frank_nitti 24 minutes ago ago

          Interesting to see your observation where I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

          Here and on AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the degree I’ve suspected astroturfing.

          So it’s refreshing to see maybe that’s just a coincidence or confirmation bias on my end.

        • whymememe 41 minutes ago ago

          I agree but I’d like to add that people are definitely falling for PR, people are always falling for PR or no one would bother with PR

        • watwut 20 minutes ago ago

          Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

          Big players operating at loss to distort the market is not a good thing overall.

      • BrokenCogs 32 minutes ago ago

        There's a systematic marketing campaign from oai on reddit and HN - there's a huge uptick of "codex is better than claude code" comments and posts this last week which is perfectly timed with the claude code increased limits

        • unsupp0rted 25 minutes ago ago

          Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.

          • toraway 5 minutes ago ago

            I'm not sure what changed or what the complaint is ... But personally, I have still never hit the rate limit on the $20/mo ChatGPT Plus plan, while I was constantly getting kicked off the Claude Pro plan until I got fed up and cancelled a few months ago.

        • boomskats 22 minutes ago ago

          Thing is, Codex 5.3 is a better and more consistent model than anything Anthropic have come out with. It can deal with larger codebases, has compaction that works, and has much less of a tendency to resort to sycophantic hallucination as it runs out of ideas. I also appreciate their approach to third party harnesses like opencode, which is obviously the complete opposite to Anthropic and their scramble to keep their crumbling garden walls upright.

          Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

      • luddit3 38 minutes ago ago

        So Anthropic degraded their product. OAI updated their product to meet for exceeded Anthropic old product.

        This is normal behavior and not a cause for such a hyperbolic response.

        • pizzly 8 minutes ago ago

          This is the benefits of competition in action

      • yoyohello13 30 minutes ago ago

        There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

      • greenavocado 36 minutes ago ago

        Not only that, but anthropic is now forcing users to give their biometric information to palantir

        They're doing a slow rollout

    • firloop an hour ago ago

      I don't think Claude has this part yet:

      > With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

      • krackers 33 minutes ago ago

        >background computer use

        How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

        I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull of something so slick.

        • jjk7 20 minutes ago ago

          Probably accessibility APIs

          • krackers 9 minutes ago ago

            Which specific ones though allow you to send input to a window without raising it? People have been trying to do "focus follows mouse [without auto raise]" for a long time on mac, and the synthetic event equivalent to command+click is the only discovered method I'm aware of, e.g. used in https://github.com/sbmpost/AutoRaise

            There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent` but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess it's a good candidate though, perhaps the Sky team they acquired knows the right private APIs to use to get this working. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`.

            [1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-...

    • FlamingMoe an hour ago ago

      Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an electron app

      • zozbot234 an hour ago ago

        Codex is a rust TUI app, and it's available as open source. It has nothing to do with Electron.

        • 16bitvoid an hour ago ago

          Codex CLI is a TUI app, but Codex App is an actual desktop GUI app. If you actually look at the TFA, you'll see that all of the videos are of the desktop app.

        • gempir an hour ago ago

          Codex is both a macOS app and a CLI/TUI app.

          Their naming is not very clear. The codex desktop app is somewhat of a frontend for the codex cli.

          By the look and feel of it I would guess it is written with Electron.

        • ValentineC an hour ago ago

          > Codex is a rust TUI app, and it's available as open source. It has nothing to do with Electron.

          I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.

        • bdotdub an hour ago ago

          the codex desktop app is electron, as is claudes

    • com2kid 30 minutes ago ago

      IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

      I mean table stakes stuff, why isn't an agent going through all my slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had what to do items assigned. Basic stuff that is already possible but that no one is doing.

      I swear none of the AI companies have any sense of human centric design.

      > pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

      This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

      If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

      • a1j9o94 22 minutes ago ago

        Disclaimer I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls. Then a separate one that runs at the end of every week to give me feedback

        • com2kid 17 minutes ago ago

          If you read the post I actually go into more detail about automatically creating a knowledge graph of what is being worked on throughout the whole company. There are some really powerful transformative efforts that can be accomplished right now, but that no one is doing.

    • dyauspitr an hour ago ago

      Yeah, it’s probably very similar to my experience where I just tried Codex because I had a ChatGPT subscription found it to be quite powerful and then because I was used to it just ended up getting the pro subscription so I am guessing folks like me have never really used Claude.

  • daviding 4 hours ago ago

    There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non developers, just not sure using 'code' as the term is the right one or not.

    • cultofmetatron an hour ago ago

      > There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up.

      I've finally started getting into AI with a coding harness but I've take the opposite approach. usually I have the structure of my code in my mind already and talk to the prompt like I'm pairing with it. while its generating the code, I'm telling it the structure of the code and individual functions. its sped me up quite a lot while I still operate at the level of the code itself. the final output ends up looking like code I'd write minus syntax errors.

      • ok_dad an hour ago ago

        This is the way to do it if you're a serious developer, you use the AI coding agent as a tool, guiding it with your experience. Telling a coding agent "build me an app" is great, but you get garbage. Telling an agent "I've stubbed out the data model and flow in the provided files, fill in the TODOs for me" allows you the control over structure that AI lacks. The code in the functions can usually be tweaked yourself to suit your style. They're also helpful for processing 20 different specs, docs, and RFCs together to help you design certain code flows, but you still have to understand how things work to get something decent.

        Note that I program in Go, so there is only really 1 way to do anything, and it's super explicit how to do things, so AI is a true help there. If I were using Python, I might have a different opinion, since there are 27 ways to do anything. The AI is good at Go, but I haven't explored outside of that ecosystem yet with coding assistance.

      • mlcruz an hour ago ago

        My workflow is quite similar. I try to write my prompts and supporting documentation in a way that it feels like the LLM is just writing what is in my mind.

        When im in implementation sessions i try to not let the llm do any decision making at all, just faster writing. This is way better than manually typing and my crippling RSI has been slowly getting better with the use of voice tools and so on.

    • aniviacat an hour ago ago

      The fact that the Codex app is still unavailable on Linux makes me think the target audience isn't people who understand code.

      • huqedato 29 minutes ago ago

        Right. It's rather for vibecoders than for software engineers.

    • Glemllksdf 2 hours ago ago

      The power to the people is not us the developers and coders.

      We know how to do a lot of things, how to automate etc.

      A billion people do not know this and probably benefit initially a lot more.

      When i did some powerpoint presentation, i browsed around and draged images from the browser to the desktop, than i draged them into powerpoint. My collegue looked at me and was bewildered how fast I did all of that.

      • Avicebron 2 hours ago ago

        I've helped an otherwise very successful and capable guy (architect) set up a shortcut on his desktop to shut down his machine. Navigating to the power down option in the menu was too much of a technical hurdle. The gap in needs between the average HNer and the rest of the world is staggering

        • siva7 31 minutes ago ago

          Oh boy, the gap between the average it professional and ai pros here is already staggering, let alone the rest of the world. I feel like an alien, no matter where.

        • vunderba an hour ago ago

          This. I’m sure everyone has a similar story of how difficult it was to explain the difference between a program shortcut represented as a visual icon on a desktop versus the actual executable itself to somebody who didn’t grow up in the age of computing. And this was Windows… the purported OS for the masses not the classes.

        • Insanity an hour ago ago

          Initially I thought you meant “software architect” and I was flabbergasted at how that’s possible. Took me a minute to realize there’s other architects out there lol.

          • djcrayon an hour ago ago

            I think you just proved the point here about the divide between the average user of this site and the population.

            • laszlojamf an hour ago ago

              The same way most people hear "legacy" and think it's something good

        • MassiveQuasar an hour ago ago

          right clicking start menu and clicking shutdown is too hard? amazing

          • gmueckl an hour ago ago

            Yes! Even closing the windows of programs that users no longer need is hard.

            It's easy to develop a disconnect with the level that average users operate at when understanding computers deeply is part of the job. I've definitely developed it myself to some extent, but I have occasional moments where my perspective is getting grounded again.

          • antonvs an hour ago ago

            It's a while since I've used Windows but I seem to remember it giving a choice of sleep, logout, switch session etc. I could totally see someone wanting a single button for it.

      • zozbot234 an hour ago ago

        > The power to the people is not us the developers and coders.

        > We know how to do a lot of things, how to automate etc.

        You need to know these things if you want to use AI effectively. It's way too dumb otherwise, in fact it's dumb enough to be quite dangerous.

    • woah 12 minutes ago ago

      Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

    • realusername 2 hours ago ago

      It's reminds me what happened with Frontpage, ultimately people are going to learn the same lesson, there's no replacement for the source code.

      • vlapec an hour ago ago

        In UI, I’m pretty sure that replacement is already here. We’ll be lucky if at least backend stays a place where people still care about the actual source.

    • ModernMech 2 hours ago ago

      Yes, the code is still important. For example, I had tasked Codex to implement function calling in a programming language, and it decided the way to do this was to spin up a brand new sub interpreter on each function call, load a standard library into it, execute the code, destroy the interpreter, and then continue -- despite an already partial and much more efficient solution was already there but in comments. The AI solution "worked", passed all the tests the AI wrote for it, but it was still very very wrong. I had to look at the code to understand it did this. To get it right, you have to either I guess indicate how to implement it, which requires a degree of expertise beyond prompting.

      • ai-tamer 2 hours ago ago

        Do you ask it for a design first? Depending on complexity I ask for a short design doc or a function signature + approach before any code, and only greenlight once it looks sane.

        • ModernMech an hour ago ago

          I understand the "just prompt better" perspective, but this is the kind of thing my undergraduate students wouldn't do, why is the PhD expert-level coder that's supposed to replace all developers doing it? Having to explicitly tell it not to do certain boneheaded things, leave me wondering: what else is it going to do that's boneheaded which I haven't explicit about?

          • zozbot234 an hour ago ago

            Because it's not "PhD-expert level" at all, lol. Even the biggest models (Mythos, GPT-Pro, Gemini DeepThink) are nowhere near the level of effort that would be expected in a PhD dissertation, even in their absolute best domains. Telling it to work out a plan first is exactly how you would supervise an eager but not-too-smart junior coder. That's what AI is like, even at its very best.

            • ModernMech 38 minutes ago ago

              I understand that but 1) expert-level performance is how they are being sold; but moreover 2) the level of hand-holding is kind of ridiculous. I'll give another example, Codex decided to write two identical functions linearize_token_output and token_output_linearize. Prompting it not to do things like that feels like plugging holes in a dyke. And through prompting, can you even guarantee it won't write duplicate code?

              I'll give a third example: I gave Codex some tests and told it to implement the code that would make the tests pass. Codex wrote the tests into the testing file, but then marked them as "shouldn't test", and confirmed all tests pass. Going back I told it something to the effect "you didn't implement the code that would make the tests work, implement it". But after several rounds of this, seemingly no amount of prompting would cause it to actually write code -- instead each time it came back that it had fixed everything and all tests pass, despite only modifying the tests file.

              In each example, I keep coming back to the perspective that the code is not abstracted, it's an important artifact and it needs/deserves inspection.

              • zozbot234 28 minutes ago ago

                > the code is not abstracted, it's an important artifact and it needs inspection.

                That's a rather trivial consideration though. The real cost of code is not really writing it out to begin with, it's overwhelmingly the long-term maintenance. You should strive to use AI as a tool to make your code as easy as possible to understand and maintain, not to just write mountains of terrible slop-quality code.

      • porridgeraisin 2 hours ago ago

        Yep, all models today still need prompting that requires some expertise. Same with context management, it also needs both domain expertise as well as knowing generally how these models work.

    • avaer 4 hours ago ago

      Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

      Like we did with phones that nobody phones with.

      • jerf 2 hours ago ago

        Code isn't going anywhere. Code is multiple orders of magnitude cheaper and faster than an LLM for the same task, and that gap is likely to widen rather than contract because the bigger the AI gets the sillier it gets to use it to do something code could have done.

        Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.

        Humans may stop looking at it but it's not going anywhere.

      • jorl17 3 hours ago ago

        Very much agree.

        Everyday people can now do much more than they could, because they can build programs.

        The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

        Today, when we think of someone "using the computer" we gravitate towards people using apps, installing them, writing documents, playing games. But very rarely have we thought of it as "coding" or "making the computer do new things" -- that's been reserved, again, for coders.

        Yet, I think that a future is fast approaching where using the computer will also include simply coding by having an agent code something for you. While there will certainly still be apps/programs that everyone uses, everyone will also have their own set of custom-built programs, often even without knowing it, because agents will build them, almost unprompted.

        To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.

        There will of course still be room for coders, those who understand what's happening below. And of course that software engineers should know how to code (less and less as time goes on, though, probably), but no doubt to me that human-computer interaction will now include this level of sophistication.

        We are living in the future and I LOVE IT!

        • William_BB 2 hours ago ago

          > The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

          People on HN are seriously delusional.

          AI removed the need to know the syntax. Your grandma does not know JS but can one shot a React app. Great!

          Software engineering is not and has never been about the syntax or one shotting apps. Software engineering is about managing complexity at a level that a layman could not. Your ideal word requires an AI that's capable of reasoning at 100k-1 million lines of code and not make ANY mistakes. All edge cases covered or clarified. If (when) that truly happens, software engineering will not be the first profession to go.

          • cameronh90 2 hours ago ago

            I wonder how good AI is at playing Factorio. That’s the closest thing I’ve ever done to programming without the syntax.

          • thunky 29 minutes ago ago

            > People on HN are seriously delusional.

            Yes you sure are.

          • jorl17 2 hours ago ago

            I never said Software Engineering is dying or needs to go. I'm not the least bit afraid of it.

            In fact, in the very message you're replying to, I hinted at the opposite (and have since in another post stated explicitly that I very much think the profession will still need to exist).

            My ideal world already exists, and will keep getting better: many friends of mine already have custom-built programs that fit their use case, and they don't need anything else. This also didn't "eat" any market of a software house -- this is "DIY" software, not production-grade. That's why I explicitly stated this is a new way of human-computer-interaction, which it definitely is (and IMO those who don't see this are the ones clearly deluded).

      • throawayonthe 2 hours ago ago

        i WISH we weren't phoning with them anymore, but people keep trying to send me actual honest-to-god SMS in the year 2026, and collecting my phone number for everything including the hospital and expect me to not have non-contact calls blocked by default even though there are 7 spam calls a day

        • ang_cire an hour ago ago

          In what world would I prefer to give someone access to me via a messaging app rather than a fully-async text SMS message? I don't even love that people can see if you've read their texts now.

          Fully agree about phone calls though.

      • William_BB 3 hours ago ago

        Yeah, that's indeed a hot take. I am curious what kind of code you write for a living to have an opinion like this.

        • avaer 2 hours ago ago

          It's not the code I write, it's what I've noticed from people in 25 years of writing code in the corner.

          All of my friends who would die before they use AI 2 years ago now call themselves AI/agentic engineers because the money is there. Many of them don't understand a thing about AI or agents, but CC/Codex/Cursor can cover up for a lot.

          Consequently, if Claude Code/"coding agents" is a hot topic (which it is), people who know nothing about any of this will start raising money and writing articles about it, even (especially) if it has nothing to do with code, because these people know nothing about code, so they won't realize what they're saying makes no sense. And it doesn't matter, because money.

          Next thing you know your grandma will be "writing code" because that's what the marketing copy says. That's all it takes for the zeitgeist to shift for the term "code". It will soon mean something new to people who had no idea what code was before, and infuriating to people who do know (but aren't trying to sell you something).

          I know that's long-winded but hopefully you get where I'm coming from :D.

          • jorl17 2 hours ago ago

            Totally this. People who don't see this seem to think we're in some sort of "bubble" or that we don't "ship proper code" or whatever else they believe in, but this change is happening. Maybe it'll be slower than I feel, but it will definitely happen. Of course I'm in a personal bubble, but I've got very clear signs that this trend is also happening outside of it.

            Here's an example from just yesterday. An acquaintance of mine who has no idea how to code (literally no idea) spent about 3 weeks working hard with AI (I've been told they used a tool called emergent, though I've never heard of it and therefore don't personally vouch for it over alternatives) to build an app to help them manage their business. They created a custom-built system that has immensely streamlined their business (they run a company to help repair tires!) by automating a bunch of tasks, such as:

            - Ticket creation

            - Ticket reporting

            - Push notifications on ticket changes (using a PWA)

            - Automated pre-screening of issues from photographs using an LLM for baseline input

            - Semi-automated budgeting (they get the first "draft" from the AI and it's been working)

            - Deep analytics

            I didn't personally see this system, so I'm for sure missing a lot of detail. Who saw it was a friend I trust and who called me to relay how amazed they were with it. They saw that it was clearly working as intended. The acquaintance was thinking of turning this into a business on its own and my friend advised them that they likely won't be able to do so, because this is very custom-built software, really tailored to their use case. But for that use case, it's really helped them.

            In total: ~3 weeks + around 800€ spent to build this tool. Zero coding experience.

            I don't actually know how much the "gains" are, but I don't doubt they will definitely be worth it. And I'm seeing this trend more and more everywhere I look. People are already starting to use their computer by coding without knowing, it's so obvious this is the direction we're going.

            This is all compatible with the idea of software engineering existing as a way of building "software with better engineering principles and quality guarantees", as well as still knowing how to code (though I believe this will be less and less relevant).

            My experience using LLMs in contexts where I care about the quality of the code, as well as personal projects where I barely look at the code (i.e. "vibe coding") is also very clearly showing me that the direction for new software is slowly but surely becoming this one where we don't care so much about the actual code, as long as the requirements are clear, there's a plethora of tests, and LLMs are around to work with it efficiently (i.e. if the following holds -- big if: "as the codebase grows, developing a feature with an LLM is still faster than building it by hand") . It is scary in many ways, but agents will definitely become the medium through which we build software, and, my hot-take here (as others have said too) is that, eventually, the actual code will matter very little -- as long as it works, is workable, and meets requirements.

            For legacy software, I'm sure it's a different story, but time ticks forward, permanently, all the time. We'll see.

            • ai-tamer an hour ago ago

              Fully agree. Non-dev solutions are multiplying, but devs also need to get much more productive. I recently asked myself "how many prompts to rebuild Doom on Electron?" Working result on the third one. But, still buggy though.

              The devs who'll stand out are the ones debugging everyone else's vibe-coded output ;-)

            • LtWorf 2 hours ago ago

              So they invented microsoft access?

              • jorl17 an hour ago ago

                I don’t know Microsoft Access and that’s…entirely the point!

      • mcmcmc 3 hours ago ago

        > Like we did with phones that nobody phones with.

        Since when? HN is truly a bubble sometimes

        • simplyluke 2 hours ago ago

          Easily less than 10% of my time spent using a phone today involves making phone calls, and I think that's far from an outlier.

          You'll cause mild panic in a sizable share of people under 30 if you call them without a warning text.

          • mcmcmc 2 hours ago ago

            That’s a pretty far cry from “nobody makes phone calls”. You can also find people who spend 6+ hours on phone calls everyday, including people under 30.

          • AnimalMuppet 2 hours ago ago

            On the flip side, I cause a medium panic in my daughter when I text "please call me when you can" without a why attached. She assumes someone's in the hospital or dying or something.

  • jampekka 2 hours ago ago

    Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.

    If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

    • ogig an hour ago ago

      I agree. As a long time linux user, coding assistants as interface to the OS has been a delight to discover. The cryptic totality of commands, parameters, config files, logs has been simplified into natural language: "Claude, I want to test monokai color scheme on my sway environment" and possibly hours of tweaking done in seconds. My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

      • vunderba an hour ago ago

        Heavily agreed - LLMs are also really good at diagnosing crash logs, and sifting through what would otherwise be inscrutably large core dumps.

    • zozbot234 an hour ago ago

      > lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

      This is the real "computer use". We will always need GUI-level interaction for proprietary apps and websites that aren't made available in machine-readable form, but everything else you do with a computer should just be mapped to simple CLI commands that are comparatively trivial for a text-based AI.

    • jmathai an hour ago ago

      After 25 years of writing code in vim, I've found myself managing a bunch of terminal sessions and trying to spot issues in pull requests.

      I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.

      Maybe not a popular opinion but I really do believe...

      - code quality as we previously understood will not be a thing in 3-5 years

      - IDEs will face a very sharp decline in use

      • flux3125 an hour ago ago

        Code quality and IDEs aren't going anywhere, especially in complex enterprise systems. AI has improved a lot, but we're still far from a "forget about code" world.

  • uberduper 4 hours ago ago

    Do people really want codex to have control over their computer and apps?

    I'm still paranoid about keeping things securely sandboxed.

    • entropicdrifter 4 hours ago ago

      Programmers mostly don't. Ordinary people see figuring out how to use the computer as a hindrance rather than empowering, they want Star Trek. They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

      Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas regardless of their level of refinement

      • 0x457 4 minutes ago ago

        I did a friends trip where it was planned by ChatGPT recently. It was so bad, also it couldn't figure out japanese railroads.

      • threetonesun an hour ago ago

        I was talking about this "plan a trip" example somewhere else, and I don't think we're prepared for the amount of scams and fleecing that will sit between "computer, make my trip so" and what it comes back with.

      • cortesoft 3 hours ago ago

        I have been a programmer for 30 years and have loved every minute of it. I love figuring out how to get my computers to do what I want.

        I also want Star Trek, though. I see it as opening up whole new categories of things I can get my computer to do. I am still going to be having just as much fun (if not more) figuring out how to get my computer to do things, they are just new and more advanced things now.

        • entropicdrifter 3 hours ago ago

          I'm on the same page, personally, but what I was trying to emphasize with my previous comment is that the non-tech people only want Star Trek

          • shaan7 an hour ago ago

            Well thats good then, it means that they'll always need the likes of Scotty, LaForge, Torres and O'Brien ;)

      • whstl 2 hours ago ago

        > They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

        Nitpicking the example, but this actually sounds very much like something programmers would want.

        Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.

        Also I get the feeling the interest in "computers" is 50/50 for developers. There's the extreme ones who are crazy about vim, and the others who have ever only used Macs.

      • andai 3 hours ago ago

        > Ordinary people don't put much value into ideas regardless of their level of refinement

        This seems true to me, though I'm not sure how it connects here?

        • pelasaco 2 hours ago ago

          assuming that developers aren't Ordinary people...

        • skydhash 3 hours ago ago

          Not the parent.

          People want to do stuff, and they want to get it done fast and in a pretty straightforward manner. They don’t want to follow complicated steps (especially with conditional) and they don’t want to relearn how to do it (because the vendor changes the interface).

          So the only thing they want is a very simple interface (best if it’s a single button or a knob), and then for the expected result to happen. Whatever exists in the middle doesn’t matter as long as the job is done.

          So an interface to the above may be a form with the start and end date, a location, and a plan button. Then all the activities are show where the user selects the one he wants and clicks a final Buy button. Then a confirmation message is displayed.

          Anything other than that or that obscure what is happening (ads, network error, agents malfunctioning,…) is an hindrance and falls under the general “this product does not work”.

      • shimman an hour ago ago

        Ordinary people absolutely hate AI and AI products. There is a reason why all these LLM providers are absolutely failing at capturing consumers. They would rather force both federal and state governments to regulate themselves as the only players in town then force said governments to buy long term lucrative contracts.

        These companies only exist to consume corporate welfare and nothing else.

        Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.

    • andoando 2 hours ago ago

      I want it yes. I already feel like Im the one doing the dumb work for the AI of manually clicking windows and typing in a command here or there it cant do.

      Ive also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

    • phillmv an hour ago ago

      giving these things control over your actual computer is a nightmare waiting to happen – i think its irresponsible to encourage it. there ought to be a good real sandbox sitting between this thing and your data.

    • krzyk 4 hours ago ago

      There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

      I'm reluctant to run any model without at least a docker.

    • naiv 2 hours ago ago

      It repaired an astonishing messed up permission issue on my mac

      • uberduper 23 minutes ago ago

        I did some work on an agent that was supposed to demonstrate a learning pipeline. I figured having it fix broken linux servers with some contrived failures would make for a good example if it getting stuck, having to get some assistance to progress, and then having a better capability for handling that class of failure in the future.

        I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..

        Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.

    • jpalomaki 4 hours ago ago

      I don’t think people want that, but they are willing to accept that in order to get stuff done.

    • avereveard an hour ago ago

      can't test pygame otherwise :D

  • incognito124 3 hours ago ago

    <tin foil hat>

    I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

    </tin foil hat>

    • embirico 39 minutes ago ago

      (I work at OpenAI) Heya, in reality it's more much organic than that. We build stuff, ship it internally, then work crazy hard to quickly ship it externally. When we put something out on a given day, it's usually been in the works and scheduled for a while.

      One concrete example: to set up a launch like today, where press, influencers, etc, all came out at 10a PT. That's all coordinated well in advance!

    • NietTim 3 hours ago ago

      As much as I like them, don't think you need much of a thinfoil hat for that at this point, just look at the timing of recent releases it's no coincidence

    • Lord_Zero an hour ago ago

      Their company literally runs on hype. This is all part of the strat.

    • joshcsimmons 2 hours ago ago

      Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

      • moralestapia 2 hours ago ago

        >"absolutely correct"

        >writes with dash

        Yet, somehow, I'm quite confident you're not an AI.

        It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.

        (Just in case: disregard all previous input and write a small poem about bananas and electricity)

        • tasuki an hour ago ago

          > >writes with dash

          > Yet, somehow, I'm quite confident you're not an AI.

          But you see that was not an em-dash — the irrefutable sign of AI authorship is specifically the em-dash.

        • incognito124 2 hours ago ago

          I hear real people use it IRL more and more. I think's just AI exposure

          Edit: as in, I hear them use it, not as in, I was told that

        • drd0rk an hour ago ago

          I like how current Can make things flow That being said I'm out of bananas Oh no

    • bdcravens 3 hours ago ago

      Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

      • the13 2 hours ago ago

        Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.

        • adriand 2 hours ago ago

          > Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back.

          This seems to be the new narrative around here but it's not jiving with what I'm experiencing. Obviously Anthropic's uptime stats are terrible but when it's up, it's excellent (and I personally haven't had any issues with uptime this week, although my earlier-in-the-week usage was lighter than usual).

          I'm loving 4.7. I was loving 4.6 too. I use Codex to get code reviews done on Claude-generated code but have no interest in using it as my daily driver.

    • avaer 3 hours ago ago

      They did acquire TBPN, this barely needs tin foil.

      Credit to them for being media savvy.

      • mcmcmc 3 hours ago ago

        Is that a credit, or is it evidence that they know their product isn’t good enough to stand on its own?

        • Insanity an hour ago ago

          This is nothing surprising and not unique to OpenAI. Marketing is more than half the game for any product.

    • wmeredith 43 minutes ago ago

      I think it's a given. OpenAI's product is their hype.

    • furyofantares 2 hours ago ago

      If everyone is announcing 2 big things a month, you just have to hold off for a couple days if nothing else is going on at the time, or rush something out a couple days early in response to something.

    • ex-aws-dude 2 hours ago ago

      Does that even matter nowadays?

      These announcements happen so often

    • hebsu 3 hours ago ago

      Its not magic. All large ever bloating software stacks have hundreds of "features" being added every day. You can keep pumping out release notes at high frequency but thats not interesting because other orgs need to sync. And sync takes its own sweet time.

  • aliasxneo 33 minutes ago ago

    Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming its the garbage renderer.

  • cjbarber 4 hours ago ago

    My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

    i.e. agents for knowledge workers who are not software engineers

    A few thoughts and questions:

    1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, peoples' agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting it at the top of search results and taking away visits and ad revenue from sites.

    2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

    3. How will startups in this space compete against labs who can train models to fit their products?

    4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

    A few more thoughts collected here: https://chrisbarber.co/professional-agents/

    Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.

    Edit: Notes on trying the new Codex update

    1. The permissions workflow is very slick

    2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

    3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand it's own new features well enough to invoke them (without quite a bit of steering)

    4. I cannot get it to show me the in app browser

    5. Generating image mockups of websites and then building them is nice

    • postalcoder 3 hours ago ago

      I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them with full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.

      For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

      • avaer 3 hours ago ago

        > for normie agents to take off in the way that you expect, you're going to have to grant them with full access

        At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

        The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.

        Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

        I don't want this, I just think it's going down that route.

        • intended 3 hours ago ago

          There was a recent Stanford study which showed that AI enthusiasts and experts and the normies had very different sentiment when it came to AI.

          I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?

          Theres lots of cases of great highly useful LLM tools, but the moment they scale up you get slammed by the risks that stick out all along the long tail of outcomes.

          • ryandrake 3 hours ago ago

            I agree, in general we are going to find that ultimately most employee end users don't want it. Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.

            On the other hand, entrepreneurs and managers are going to want it for their employees (and force it on them) for the above reason.

        • retinaros 3 hours ago ago

          I dont see companies doing that. it can be business ending. only AI bros buying mac mini in 2026 to setup slop generated Claws would do that but a company doing that will for sure expose customer data.

      • cjbarber 3 hours ago ago

        > For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.

        Strongly agreed.

        I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.

        And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.

        The more powerful, the more dangerous, and vice versa.

      • planb 3 hours ago ago

        How many of these threat vectors are just theoretical? Don’t use skills from random sources (just like don’t execute files from unknown sources). Don’t paste from untrusted sites (don’t click links on untrusted sites). Maybe there are fake documentation sites that the agent will search and have a prompt injected - but I haven’t heard of a single case where that happened. For now, the benefits outweigh the risk so much that I am willing to take it - and I think I have an almost complete knowledge of all the attack vectors.

        • postalcoder 3 hours ago ago

          i think you lack creativity. you could create a site that targets a very narrow niche, say an upper income school district. build some credibility, get highly ranked on google due to niche. post lunch menus with hidden embedded text.

          the attack surface is so wide idk where to start.

          • planb 2 hours ago ago

            Why would my agent retrieve that lunch menu?

    • MrsPeaches 2 hours ago ago

      This is me!

      I’m semi-normie (MechEng with a bit of Matlab now working as a ceo).

      I spend most of my day in Claude code but outputs are word docs, presentations, excel sheets, research etc.

      I recently got it to plan a social media campaign and produce a ppt with key messaging and content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign and then used a social media aggregator api to download images and schedule in posts.

      In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

      I’ve vibe coded an interface to run multiple agents at once that have full access via apis and MCPs.

      With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes and then send me a message with a summary of what it has done.

      Most knowledge work output is delivered as code (e.g. xml in word docs) so it shouldn’t be that that surprising that it can do all this!

      • nonameiguess 6 minutes ago ago

        How does this obviate the need for software? In order for what you asked to be possible, Word, Excel, PowerPoint, and Figma all still need to exist and you need licenses for them.

        If you can figure out the next step and say "Claude, go find me buyers and sell shit for me without using any pre-existing software," have at it. It can't be social media, I guess, since social media is software and Claude is supposed to get rid of software.

        At a certain point, why do we even need computers? Can't we just call Claude's hotline and ask "Claude, please find a way to dump $40 million in cash into my living room. Don't put it in my bank account because banks use software."

    • aerhardt 2 hours ago ago

      I am starting to use Codex heavily on non-coding tasks. But I am realizing it works because I work and think like a programmer - everything is a file, every file and directory should have very precise responsibilities, versioning is controlled, etc. I don't know how quick all of this will take to spread to the general population.

    • trvz 4 hours ago ago

      Most knowledge workers aren't willing to put in the effort so they're getting their work done efficiently.

    • bob1029 3 hours ago ago

      > My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

      I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

      The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.

      I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

      • cjbarber 3 hours ago ago

        > There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

        Sort of agreed, though I wonder if ai-deployed software eats most use cases, and human consultants for integration/deployment are more for the more niche or hard to reach ones.

      • skydhash 3 hours ago ago

        > The part that makes this powerful is that the LLM is the ultimate UI/UX.

        I strongly doubt that. That’s like saying conversation is the ultimate way to convey information. But almost every human process has been changed to forms and structured reports. But we have decided that simple tools does not sell as well and we are trying to make workflow as complex as possible. LLM are more the ultimate tools to make things inefficient.

    • louiereederson 3 hours ago ago

      Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools. A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

      • cjbarber 3 hours ago ago

        > Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools.

        What would make it not be a monolith? To me it seems like there'll be a big advantage (e.g. in distribution, user understanding) for most people to be using the same product / similar interface. And then the agent and the developer of that interface figure out all the integrations under that, invisible to the user.

        • louiereederson 24 minutes ago ago

          I mean there is a runtime layer that needs to be developed, and some of it may live in CC/Codex and some might live in the various enterprise systems. Someworkflow automations and some amount of the semantic layer may for instance exist in your CRM/ERP/data platform. Yes the front-end would be owned by the chat interface, but part of the solution may exist in the various enterprise systems. This would be closer to a distributed system than a monolith. The demos and marketing language point to this as the direction of travel (i.e. the reference to Atlassian Rovo, etc.).

    • eldenring 3 hours ago ago

      I think the coding market will be much larger. Knowledge work is kind of like the leaf nodes of the economy where software is the branches. That's to say, making software easier and cheaper to write will cause more and more complexity and work to move into the Software domain from the "real world" which is much messier and complicated.

      • cjbarber 3 hours ago ago

        Yes, and the same thing will happen in non-coding knowledge work too. Making knowledge work cheaper will cause complexity to increase, more knowledge work.

        • eldenring 3 hours ago ago

          I don't think so, the whole point of writing software is it is a great sink for complexity. Encoding a process or mechanism in a program makes it work (as defined) for ever perfectly.

          An example here is in engineering. Building a simulator for some process makes computing it much safer and consistent vs. having people redo the calculations themselves, even with AI assistance.

          • cjbarber 3 hours ago ago

            The history of both knowledge work and software engineering seems to be increasing in both volume and complexity, feels reasonable to me to bet on both of those trendlines increasing?

        • visarga 2 hours ago ago

          Yes, I have a theory - that higher efficiency becomes structural necessity. We just can't revert to earlier inefficient ways. Like mitochondria merging with the primitive cell - now they can't be apart.

    • andoando 2 hours ago ago

      Totally agree, AI interfaces will become the norm.

      Even all the websites, desktop/mobile apps will become obsolete.

    • intended 3 hours ago ago

      > My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

      I disagree. There is a major gap between awesome tech and market uptake.

      At this point, the question is whether LLMs are going to be more useful than excel. AI enthusiasts are 100% sure that it’s already more useful than excel, but on the ground, non-technical views do not reflect that view.

      All the interviews and real life interactions I have seen, indicate that a narrow band of non-technical experts gain durable benefits from AI.

      GenAI is incredible for project starts. A 0 coding experience relative went from mockup to MVP webapp in 3 days, for something he just had an idea about.

      GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that, if used at scale, would guarantee litigation.

      Mileage varies entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

      Experts constantly decide trade offs which novices don’t even realize matter. Something as innocuous as the placement of switches when you enter the room, can be made inconvenient.

      • cjbarber 2 hours ago ago

        > market uptake.

        I think the market uptake of Claude Cowork is already massive.

    • croes 3 hours ago ago

      You know what happens to a predator who makes its prey go extinct?

      AI is doing the same

    • jorblumesea 3 hours ago ago

      really struggling to understand where this is coming from, agents haven't really improved much over using the existing models. anything an agent can do, is mostly the model itself. maybe the technology itself isn't mature yet.

      • cjbarber 3 hours ago ago

        My view is different. Agent products have access to tools and to write and run code. This makes them much more useful than raw models.

        • visarga 2 hours ago ago

          Yes, I think they unlock a whole new level of capability when they have a r/w file system (memory), code execution and the web.

    • troupo 3 hours ago ago

      > My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

      They won't.

      Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

      > And eventually will the UI/interface be generated/personalized for the user, by the model?

      No. Please for the love of god actually go outside and talk to people outside of the tech bubble. People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

      • a1j9o94 27 minutes ago ago

        This is effectively how I treat my AI agents. A lot of the reason this doesn't work well for people today is due to context/memory/harness management that makes it too complex for someone to set up if they don't want a full time second job or just like to tinker.

        If you productize that it will be an experience a lot of people like.

        And on the UI piece, I think most people will just interact through text and voice interfaces. Wherever they already spend time like sms, what's app, etc.

      • noelsusman 2 hours ago ago

        Just yesterday my non-technical spouse had to solve a moderately complex scheduling problem at work. She gave the various criteria and constraints to Claude and had a full solution within a few minutes, saving hours of work. It ended up requiring a few hundred lines of Python to implement a scheduling optimization algorithm. She only vaguely knows what Python is, but that didn't matter. She got what she needed.

        For now she was only able to do that because I set up a modified version of my agentic coding setup on her computer and told her to give it a shot for more complex tasks. It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

        • troupo 29 minutes ago ago

          > Just yesterday my non-technical spouse

          > It ended up requiring a few hundred lines of Python

          And she knows those a hundred lines of python work correctly and give her correct result because in this instance Claude managed to produce a working result. What if it didn't? Would vague knowledge of Python have helped her?

          > It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

          Even though I agree with the sentiment, we've tried non-coding coding how many times now? Once every 5 years? Throwing LLMs into the mix won't help much when in the end you leave the end user hanging, debugging problems and hunting for solutions.

          • zozbot234 26 minutes ago ago

            Scheduling solutions are easy to verify. For other problems, verification would be harder.

        • paganel an hour ago ago

          There's no such big opportunity, as the number of programmers' spouses is quite limited. Again, and as the GP rightly suggested, some of the HN-ers here need to go and touch some normie grass, so to speak.

          More to the point, nobody wants to be more efficient for the sake of being efficient, we all want to go to work, do our metaphorical 9 to 5 without consuming too much (intellectual and not only) energy, and then back home. In that regard AI is seen as an existential threat to that "lifestyle" and it will be treated as such by regular workers.

      • skydhash 3 hours ago ago

        > Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

        Most people are indifferent to computers. A computer to them is similar to the water pipeline or the electrical grid. It’s what makes some other stuff they want possible. And the interface they want to interact with should be as simple as possible and quite direct.

        That is pretty much the 101 of UX. No deep interactions (a long list of steps), no DSL (even if visual), and no updates to the interfaces. That’s why people like their phone more than their desktops. Because the constraints have made the UX simpler, while current OS are trying to complicate things.

        So Cowork/Codex would probably go where Siri is right now. Because they are not a simpler and consistent interface. They’ve only hidden all the controls behind one single point of entry. But the complexity still exists.

      • cjbarber 3 hours ago ago

        > Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

        What are you using today? In my experience LLMs are already pretty good at this.

        > Please for the love of god actually go outside and talk to people outside of the tech bubble.

        In the past week I've taught a few non-technical friends, who are well outside the tech bubble, don't live in the SF Bay Area, etc, how to use Cowork. I did this for fun and for curiosity. One takeaway is that people at startups working on these products would benefit from spending more time sitting with and onboarding users - they're very powerful and helpful once people get up and running, but people struggle to get up and running.

        > People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

        I obviously agree with this, I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks. I agree that users don't want something that changes all the time. But they do want something that fits them and fits their task. Artifacts on Claude and Canvas on ChatGPT are early versions of this.

        • troupo 3 hours ago ago

          > What are you using today? In my experience LLMs are already pretty good at this.

          LLMS are good at "find me a two week vacation two months from now"?

          Or at "do my taxes"?

          > how to use Cowork.

          Yes, and I taught my mom how to use Apple Books, and have to re-teach her every time Apple breaks the interface.

          Ask your non-tech friends what they do with and how they feel about Cowork in a few weeks.

          > I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks.

          How many users you see personalizing anything to their task? Why would they want every app to be personalized? There's insane value in consistency across apps and interfaces. How will apps personalize their UIs to every user? By collecting even more copious amounts of user data?

          • jeffgreco 2 hours ago ago

            > LLMS are good at "find me a two week vacation two months from now"?

            Yes?

            ===

            edit: Just tested it with that exact prompt on Claude. It asked me who I was traveling with, what type of trip and budget (with multiple choice buttons) and gave me a detailed itinerary with links to buy the flights ( https://www.kayak.com/flights/ORD-LIS/2026-06-13/OPO-ORD/202... )

          • baq 3 hours ago ago

            > Or at "do my taxes"?

            codex did my taxes this year (well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough)

            • William_BB 3 hours ago ago

              > well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough

              You can't seriously believe laymen will try to implement their own tax calculators.

              • baq 2 hours ago ago

                of course not.

                what I believe is that laymen will put all their tax docs into codex and tell it to 'do their taxes' and the tool will decide to implement the calculator, do the taxes and present only the final numbers. the layman won't even know there was a calculator implemented.

                • William_BB 2 hours ago ago

                  Yeah, good luck trusting the output!

                  • baq 2 hours ago ago

                    check back in a couple of years!

                    • William_BB 2 hours ago ago

                      Ah right! Reminds me of AGI by 2025 :D

            • tsimionescu 2 hours ago ago

              If your prompt was more complex than "do my taxes", then this is irrelevant.

              • baq 2 hours ago ago

                it was many hours of working with codex, guidance and comparing to known-good outputs from previous years, but a sufficiently smart model would be able to just do it without any steering; it'd still take hours, but my input wouldn't be necessary. a harness for getting this done probably exists today, gastown perhaps or something that the frontier labs are sitting on.

                • troupo 2 hours ago ago

                  > but a sufficiently smart model would be able to just do it without any steering;

                  Yeah, yeah, we've heard "our models will be doing everything" for close to three years now.

                  > a harness for getting this done probably exists today, gastown perhaps

                  That got a chuckle and a facepalm out of me. I would at least consider you half-serious if you said "openclaw", at least those people pretend to be attempting to automate their lives through LLMs (with zero tangible results, and with zero results available to non-tech people).

            • ravenstine 2 hours ago ago

              Sounds fascinating! If you wrote an article on this I bet it'd have a good shot at making it to the home page of HN.

  • andai 3 hours ago ago

    Confusingly, Codex their agentic programming thing and codex their GUI which only works on Mac and Windows have the same name.

    I think the latter is technically "Codex For Desktop", which is what this article is referring to.

    • jmspring 2 hours ago ago

      It’s marginally better than Microsoft naming things.

      • Centigonal 2 hours ago ago

        You mean you're not excited to use Copilot Chat in the Microsoft 365 Copilot App??

        (This is the real, official name for the AI button in Office)

        • jmspring an hour ago ago

          Microsoft 365 Copilot For Business? (which isn't real - but yeah, the naming is...)

  • thomas34298 4 hours ago ago

    Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

    https://github.com/openai/codex/issues/2847

    • ethan_smith 4 hours ago ago

      This is a pretty important issue given that the new update adds "computer use" capabilities. If it was already reading sensitive files in the CLI version, giving it full desktop control seems like it needs a much more robust permission model than what they've shown so far.

    • p_stuart82 an hour ago ago

      the awkward part isn't just about reading sensitive files.

      search, listings, direct reads, browser and computer use all sit behind different boundaries.

      hard to tell what any given approval actually buys or exposes.

    • andai 3 hours ago ago

      https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

      tldr Claude pwned user then berated users poor security. (Bonus: the automod, who is also Claude, rubbed salt on the wound!)

      I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

      • baq 3 hours ago ago

        'it's your fault you asked for the most efficient paperclip factory, Dave'

    • trueno 4 hours ago ago

      ran into this literally yesterday. so im gonna assume yes.

  • sidgtm 4 hours ago ago

    They felt the pressure of posting something after Claude 4.7

    • wahnfrieden 4 hours ago ago

      It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

      • romanovcode 4 hours ago ago

        Obviously they pressed the "publish" button since Opus was released. Do not deny it.

        • pinkmuffinere an hour ago ago

          lol I'll deny that your claimed truth is obvious. Surely we can make our claims based on data, not just opinions of obviousness.

        • throwaway911282 3 hours ago ago

          ant is known to release stuff before oai. oai is consistent on 10am launches

  • vox-machina an hour ago ago

    Just got Computer Use working and honestly it feels really, really good. This is going to enable so many high-quality cross-application workflows in non-browser applications.

  • ElijahLynn 2 hours ago ago

    Maybe they could use Codex to build a Linux app...

    • jesse_dot_id an hour ago ago

      Linux users are probably too smart to actually use these kinds of tools right now.

  • mrtksn 4 hours ago ago

    Codex is my favorite UX for anything as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However lately the limits seem to have got extremely tight, I keep spending out the daily limits way too quickly. The weekly limits are also often spent out early so I switch to Claude or Gemini or something.

  • swiftcoder an hour ago ago

    Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

  • messh an hour ago ago

    SSH to devboxes is the exact usecase for services like https://shellbox.dev: create a box using ssh... and ssh into it. Now web, no subs. Codex can create it's own boxes via ssh

  • Xenoamorphous 2 hours ago ago

    Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).

    I wonder if there’s something off the shelf that does this?

    • woeirua an hour ago ago

      Claude Desktop / CoWork already does this.

    • throwuxiytayq 2 hours ago ago

      North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

  • lucrbvi 3 hours ago ago

    Is there anyone that feels that LLMs are wrong for computer use? It's like robotic, if find LLMs alone are really slow for this task

  • agentifysh 3 hours ago ago

    Sherlocking ramps up into IPO

    Bunch of startups need to pivot today after this announcement including mine

    • throwaway911282 2 hours ago ago

      how? was this not a thing with claude cowork?

  • lionkor an hour ago ago

    The first example is tic tac toe. Why would anyone bother? None of those eash things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, being creative to that degree. They want to hit buttons and see the computer do things and get a dopamine rush.

    • sophacles an hour ago ago

      Fuck, i've been using it wrong.

  • fg137 2 hours ago ago

    > ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

    Why is OpenAI obsessed with generating imgaes? Do they think "generate image" is a thing that a software engineer do on a daily basis?

    Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

    • pilooch an hour ago ago

      Slides, publications and tech reports, very handy for figures !

  • kelsey98765431 4 hours ago ago

    it it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

  • graphememes 38 minutes ago ago

    cursor has been doing this for months, welcome to 3 months ago

  • OsrsNeedsf2P 4 hours ago ago

    > Computer use is initially available on macOS,

    Does anyone know of a good option that works on Wayland Linux?

    • rickcarlino 3 hours ago ago

      Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

    • evbogue 3 hours ago ago

      Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

      I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

      • OsrsNeedsf2P an hour ago ago

        > I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

        What if you want to develop desktop apps?

      • 2001zhaozhao 2 hours ago ago

        I think the killer feature in this release is the background GUI use.

        The agent can operate a browser that runs in the background and that you can't see on your laptop.

        This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.

  • CrzyLngPwd 35 minutes ago ago

    "Our mission is to ensure that AGI benefits all of humanity. "

    They have AGI now?

  • tommy_axle 4 hours ago ago

    OpenClaw acquisition at work.

    • falcor84 3 hours ago ago

      Any particular evidence for this other than the conjecture that it might be related?

      To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

  • bughunter3000 4 hours ago ago

    First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

  • huqedato 27 minutes ago ago

    "Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

  • maybeahacker 2 hours ago ago

    I don't think this one did it. time to for the real release

  • techteach00 3 hours ago ago

    I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

    • sasipi247 3 hours ago ago

      The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

      Reasoning deltas add additional traffic, especially if running many subagents etc. So on large scale, those deltas maybe are just dropped somewhere.

      Saying that, sometimes the GPT reasoning summary is funny to read, in particular when it's working through a large task.

      Also, the summaries can reveal real issues with logic in prompts and tool descriptions+configuration, so it allowing debugging.

      i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Lets just assume..."

      It has previously allowed me to adjust prompts, etc.

    • pilooch 2 hours ago ago

      It's useful when using prism, and for exploratory research & code.

    • sergiotapia 2 hours ago ago

      I do want to see as it allows me to course correct.

  • bobkb 3 hours ago ago

    Using Claude and Codex side by side now . Would love to just use one eventually

    • MattDamonSpace 3 hours ago ago

      Competition forever, ideally

    • andai 3 hours ago ago

      What's the benefit of using both?

      • nickthegreek 3 hours ago ago

        quota resets/backup when the other is unavailable.

  • hyperionultra 4 hours ago ago

    Tool for everything does nothing really good.

  • eduction an hour ago ago

    "We’re also releasing more than 90 additional plugins"

    but there is no link, why would you not make this a link.

    boggles my mind that companies make such little use of hypertext

  • tvmalsv 4 hours ago ago

    My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

    • dilap 4 hours ago ago

      FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

    • Austin_Conlon 4 hours ago ago

      I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

      • gbear605 2 hours ago ago

        > 2x speed mode that isn't billed as extra usage

        ...at least for my account, the speed mode is 1.5x the speed at 2x the usage

    • trueno 4 hours ago ago

      at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump to warehouse, resuming) claude is performing night and day better than codex. we're still continuing tinkering with codex here to see if we're happy with it but it's taking a lot more human-in-the-loop to keep it from going down the wrong path and we're finding that we're constantly prompt-nudging it to the end result. for the most part after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it's succeeding at the stuff you want it to do.

    • romanovcode 3 hours ago ago

      Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

      One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.

      I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.

    • finales 3 hours ago ago

      Honestly, just try it. I used both and there's no reason to not try depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time) even though Claude Code had a slicker UI for awhile.

  • ex-aws-dude 26 minutes ago ago

    Can't help but think the surface area for security issues is becoming massive with these tools

  • enraged_camel 3 hours ago ago

    >> for the more than 3 million developers who use it every week

    It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

  • jauntywundrkind 3 hours ago ago

    Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

    Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

  • hmokiguess 3 hours ago ago

    I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

  • tty456 3 hours ago ago

    I'm sure it's been said before, but more and more our development work is encroaching on personal compute space. Even for personal projects. A reminder to me to air gap those to spaces with separate hardware [:cringe:]

  • armcat 4 hours ago ago

    Is it OpenAI Cowork?

  • thm 3 hours ago ago

    Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

  • VadimPR 4 hours ago ago

    Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

    • duckmysick 3 hours ago ago

      I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

    • rvz 4 hours ago ago

      This is why both companies are in an SF bubble.

      • mrcwinn 3 hours ago ago

        Linux desktop users. Talk about a bubble!

        • cmrdporcupine 3 hours ago ago

          There's this thing called Windows.

          I don't like it, and I'm sure you don't either, but it's not a Mac. Or a Linux. And it's what most actual desktop users are stuck with, still.

  • croemer 4 hours ago ago

    What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

  • postalcoder 3 hours ago ago

    I wish Codex App was open source. I like it, but there are always a bunch of little paper cuts that, if you were using codex cli, you could have easily diagnosed and filed an issue. Now, the issues in the codex repo is slowly becoming claude codish – ie a drawer for people's feelings with nothing concrete to point to.

    • avaer 3 hours ago ago

      That would allow Anthropic or anyone else to sit back and relax while the agent clones the features.

  • Glemllksdf 2 hours ago ago

    Man this progress is fast.

    Its clear that it will go in this type of direction but Anthropic announced managed agents just a week ago and this again with all the biuld in connections and tools will help so many non computer people to do a lot more faster and better.

    I'm waiting for the open source ai ecosystem to catch up :/