GitHub partial outage

(githubstatus.com)

187 points | by danfritz 3 days ago

87 comments

  • theletterf 3 days ago

    I was going crazy thinking that there was something wrong with my SSH keys all of a sudden. Thank $DEITY it's just GitHub.

    • no_wizard 3 days ago

      Same. I reflexively replaced mine, thinking I needed to. Glad it's working now, though.

  • arnvald 3 days ago

    I’m old enough to remember when GitHub was on the main page because of a cool feature they added; now they just end up here when it stops working.

    • alentred 3 days ago

      If I remember correctly, every once in a while a cool new feature also broke stuff, doubling the chances of getting to the top page here. But truth be told, GitHub was fixing those at light speed too, and it was very interesting to follow their progress. Their delivery pipeline (per branch, deliver when ready, etc.) sounded very innovative back then and, I think, inspired many people.

  • numbsafari 3 days ago

    Anyone using GitLab have any insight on how well their operations are running these days?

    We originally left GitLab for GitHub after being bitten by a major outage that resulted in data loss. Our code was saved, but we lost everything else.

    But that was almost 10 years ago at this point.

    • kaishiro 3 days ago

      We use GitLab on the daily. Roughly 200 repos, with pushes to ~20 of them on any given day. There have been a few small, unpublished outages that we determined were server-side since we have a geo-distributed team, but the platform seems far more stable than it was 5-6 years ago.

      My only real current complaint is that the webhooks that are supposed to fire on repo activity have been a little flaky for us over the past 6-8 months. We have a pretty robust chatops system in play, so these things are highly noticeable to our team. It’s generally consistent, but we’ve had hooks fail to post to our systems on a few different occasions, which forced us to chase up threads until we determined our operator ingestion service never even received the hooks.

      That aside, we’re relatively happy customers.

      • gen220 3 days ago

        FWIW, GitHub is also unreliable with webhooks. Many recent GH outages have affected webhooks.

        They are pretty good, in my experience, at *eventually* delivering all updates. The outages take the form of a "pause" in delivery, every so often... maybe once every 5 weeks?

        Usually the outages are pretty brief but sometimes it can be up to a few hours. Basically I'm unaware of any provider whose webhooks are as reliable as their primary API. If you're obsessive about maintaining SLAs around timely state, you can't really get around maintaining some sort of fall-back poll.
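
        A minimal sketch of such a fall-back poll, assuming every event carries a unique ID and the provider offers some way to list recent events. `EventStore`, `reconcile`, and `fake_poll` are hypothetical names for illustration, not part of any provider's API:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class EventStore:
    """Tracks which event IDs have been processed, whatever their source."""
    seen: set = field(default_factory=set)

    def record(self, event_id: str) -> bool:
        """Return True if the event is new, False if it was already handled."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        return True


def reconcile(store: EventStore,
              fetch_recent_ids: Callable[[], Iterable[str]]) -> list:
    """Poll the provider and return the IDs that webhooks never delivered."""
    missed = []
    for event_id in fetch_recent_ids():
        if store.record(event_id):
            missed.append(event_id)
    return missed


# Webhooks delivered events 1 and 2; event 3 was dropped during a "pause".
store = EventStore()
for delivered in ["evt-1", "evt-2"]:
    store.record(delivered)


def fake_poll() -> list:
    # Hypothetical stand-in for a real API call that lists recent events.
    return ["evt-1", "evt-2", "evt-3"]


missed = reconcile(store, fake_poll)
print(missed)
```

        Because both paths go through the same `record` call, an event that shows up late by webhook after the poll has already picked it up is simply ignored.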

        • cmckn 3 days ago

          > you can't really get around maintaining some sort of fall-back poll.

          This has been my experience with GitHub Actions as well, which I imagine rely on the same underlying event system as webhooks.

          Every so often, an Action will not be triggered or otherwise go into the void. So for Actions that trigger on push, I usually just add a cron schedule to them as well.
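
          One thing to watch when adding a schedule next to a push trigger is that the same commit can end up being processed twice. A rough sketch of an idempotency guard, assuming the job can be keyed by commit SHA; `run_if_needed`, `record_run`, and the in-memory `processed` set are hypothetical, and a real setup would persist that state somewhere durable:

```python
from typing import Callable, List, Set, Tuple


def run_if_needed(commit_sha: str, trigger: str, processed: Set[str],
                  job: Callable[[str, str], None]) -> bool:
    """Run `job` for a commit unless an earlier trigger already handled it."""
    if commit_sha in processed:
        return False  # Already handled by push or by a previous scheduled run.
    job(commit_sha, trigger)
    processed.add(commit_sha)
    return True


processed: Set[str] = set()
runs: List[Tuple[str, str]] = []


def record_run(sha: str, trigger: str) -> None:
    # Stand-in for the real work (build, deploy, notify, ...).
    runs.append((sha, trigger))


# Normal case: the push event fires, so the later scheduled run is a no-op.
run_if_needed("abc123", "push", processed, record_run)
run_if_needed("abc123", "schedule", processed, record_run)

# Missed case: the push event for def456 never arrived; the schedule catches it.
run_if_needed("def456", "schedule", processed, record_run)

print(runs)
```

          With a guard like this, the scheduled run only does work for commits whose push event went into the void.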

        • kaishiro 3 days ago

          Completely agree on all points. We've had dual remotes running on a few high-traffic repos, pushing to both GitLab and GitHub simultaneously as a debug mechanism, and our experiences mirror yours.

    • boilerupnc 3 days ago

      Not sure what specific operational services are of interest - but here's a link to their historical service status [0]

      [0] https://status.gitlab.com/pages/history/5b36dc6502d06804c083...

    • geoffbp 3 days ago

      We’re using GitLab. Loads of issues and outages; we want to move to GitHub.

    • colesantiago 3 days ago

      No issues on GitLab.

      Haven't seen any outage from GitLab in like, ever.

      • tux3 3 days ago

        That has definitely not been my experience. I like Gitlab, but they've had regular incidents all along. If a git push failed I wouldn't question it, it's almost never my network. I'd just open Gitlab's Gitlab and find the current active issue.

        To Gitlab's credit their observability seems to be good, and they do a good job communicating and resolving incidents quickly.

        Some companies that shall not be named have status pages that always show green and might as well be a static picture. Some use words like "some customers may have experienced partial service degradation" to mean "complete downtime". Gitlab also has incidents, but they're a lot more trustworthy. You can just open the issue tracker and there's the full incident complete with diagnosis.

        • colesantiago 3 days ago

          Hmmm.

          You must be doing GitLab wrong.

      • philipwhiuk 3 days ago
        • colesantiago 3 days ago

          Never had any problems really.

          GitHub on the other hand has outages more frequently.

    • ctkhn 3 days ago

      My org hosts it on-prem, and while I don't like the way pages are organized for projects, I only really interact with the PR page and that is laid out well. Most of my interaction with git is happening from my terminal anyway so ¯\_(ツ)_/¯

  • arccy 3 days ago

    Your weekly reminder to take a break

    • hoherd 3 days ago

      I still can't pull new branches even though the incident says it's resolved. I don't think my boss would be happy with me taking a break this long... but what else can I do when our business uses GH?

      • arccy a day ago

        Write up a document on the business continuity risks of depending on Github.

  • ecshafer 3 days ago

    GitHub is owned by Microsoft, so this is a pretty small-time indie operation; you need to give them a break.

    • isodev 3 days ago

      I bet Microsoft is sad not because people can’t push, but because the training data for Copilot has slowed down.

      PS: None of our 40+ engineers felt anything, our self hosted Forgejo is as snappy as ever.

      • no_wizard 3 days ago

        Until your hardware fails! Or your VPS provider goes down!

        Or whatever else. Software services going down is going to happen in some capacity, eventually. The real question is what is acceptable.

        • isodev 3 days ago

          When you self-host you also learn how to back up. It’s not complicated, actually; you should look into it.

          • no_wizard 17 hours ago

            I do know how to do that. I also know how to deploy bare metal servers; I have done so for years. Not really sure what you’re assuming here.

            I do, however, prefer to acknowledge the nature of the business: there will inevitably be an untimely failure in some way you did not prepare for, no matter how well read, well practiced, and well researched you are on the problems at hand.

    • cube00 3 days ago

      Not replacing the CEO suggests they aren't focusing on it as much as they were.

    • lysace 3 days ago

      Just your casual $3.8T company.

      There were so many severe GitHub Actions outages (10+?) in the past year. Cause: migration to the disaster zone also known as Azure, I assume. Most of them happened during (morning) CET working hours, so as not to inconvenience the Americans and/or make headlines.

      Money doesn't buy competency. It's a long-term culture thing. You can never let go of maintaining competency in your organization. It rots if you do. I guess Microsoft did let go.

      • mook 3 days ago

        I thought GitHub Actions (in particular; not the rest of GitHub) was always Azure, because it was initially a fork of Azure Pipelines?

        GitHub as a whole, including the previously non-Azure bits, does seem flakier than a few years ago though, for sure.

        • degamad 3 days ago

          It's possible that, even though the Actions part was always on Azure, migrating the other parts to Azure broke some connectivity between the pieces....

        • lysace 3 days ago

          You seem to be correct. Not much is visible from the outside, but yes, it seems they always ran on Azure, from the 2018 launch. (Apologies for the disinfo, although I did qualify it with "I assume".)

          • jedahan 3 days ago

            Pre-launch, I seem to recall using an entirely different product with the same name that supported CUE or HCL and had a better GUI editor. I think post-acquisition they scrapped it for the current (and IMO worse) reskin.

      • conception 3 days ago

        “guess Microsoft did let go” - are we thinking of the same Microsoft here?

        • lysace 3 days ago

          I am thinking of the atrophying one. Not MikeRoweSoft.

  • prymitive 3 days ago

    Speaking of GitHub issues, if you go to Insights -> Traffic in your repo, you’ll most likely see this banner:

    “Referring sites and popular content are temporarily unavailable or may not display accurately. We're actively working to resolve the issue.”

    It’s been like that for months now with no sign of anyone working on it. They just don’t care about user experience anymore.

    https://github.com/orgs/community/discussions/173494

    • vaindil 3 days ago

      Hey, I'm the GitHub employee who's working on fixing that right now. The service powering those stats is _ancient_, and it fell over back in September. It's taken longer than I hoped to get a replacement working, but it should be fixed within the next couple of weeks, fingers crossed.

    • embedding-shape 3 days ago

      Speaking of "temporarily unavailable but it's actually forever", I've been wanting to get into Fallout and Starfield modding, so been waiting for their official wiki to come out of maintenance mode. I think I first tried to access it when Starfield launched (September 2023), and still today it is "currently down for backend maintenance". https://wiki.bethesda.net/

  • JLCarveth 3 days ago

    This sure does seem to happen a lot

  • _heimdall 3 days ago

    I really do feel for those hubbers that are still working on this years into the Microsoft era. GitHub was an excellent product and from what I hear it was an excellent culture too. I can only assume the culture has eroded similarly to the product itself as Microsoft has finally begun integrating the org into the slower moving machine that is MS.

  • contravariant 3 days ago

    Ah that was why. Oh well, I just needed to get the code to the server, so I didn't really need Github anyway.

  • pfyra 3 days ago

    Coincidentally, Azure DevOps was also missing the SSH keys earlier today, both in the web UI and for SSH login.

    • spockz 3 days ago

      Well, GitHub is moving to Azure and they are consolidating systems. No surprise there.

  • danfritz 3 days ago

    Related to the recent announcement that they are moving to Azure?

    • drcongo 3 days ago

      Oh no. I look forward to watching my browser redirect 40 times on every attempted page load.

      • wpm 2 days ago

        And the URL will be 400 characters long

    • the_af 3 days ago

      Wow. It wasn't already running on Azure? What was it (or is it) running on?

      • noir_lord 3 days ago

        iirc (it's been a while) they were on Rackspace when Microsoft bought them out. There was an article a few months ago saying they were moving to Azure and freezing new features while they do the move[1].

        [1] https://thenewstack.io/github-will-prioritize-migrating-to-a...

        Honestly, I don't know half the features they have added because the surface is huge at this point; everyone seems to be using a (different) subset of them anyway.

        So a feature freeze isn't likely to have much impact on me.

        EDIT: went and checked - https://github.blog/news-insights/github-is-moving-to-racksp... not sure if they moved again before the MS acquisition though.

        • nixgeek 3 days ago

          A team of us moved it off Rackspace in 2013; it’s been mostly in a set of GitHub-operated colos since then. There used to be some workloads on AWS and a bit of Direct Connect. Now it’s some workloads on Azure.

          To the best of my knowledge there’s been no Rackspace in the picture since about 2013, the details behind that are fuzzy as it’s been 10+ years since I worked on infrastructure at GitHub.

          • antn 3 days ago

            Yeah, we did not have anything in Rackspace for many years before the Microsoft acquisition. I remember having to migrate some tiny internal things off of Heroku, though!

      • le_stoph 3 days ago

        In the Pragmatic Engineer podcast episode with the former CEO of Github, the latter mentioned that they had their own infra for everything. If I remember correctly, this was due to the fact that Github is quite old and at the time when Github Actions became a thing, cloud providers were not really offering the kind of infra that was necessary to support the feature.

        • Kwpolska 3 days ago

          GitHub is old, but GitHub Actions are not. Indeed, GitHub Actions launched two months after the Microsoft acquisition was announced [0], and it is a half-assed clone of Azure Pipelines.

          [0] https://en.wikipedia.org/wiki/Timeline_of_GitHub

          • le_stoph 2 days ago

            I'll be damned, I feel like I've been using GA since forever!

            You're right though, just re-listened to the segment[0] and the ex-CEO mentions they were initially using AWS, then moved to their own servers because of the limitations of AWS at the time and their particular needs. Github Actions did however always run on Azure!

            [0] https://www.youtube.com/watch?v=2oq__5tDFZI&t=491s

      • saghm 3 days ago

        I can't read the entirety of this article[1] because it's paywalled, but it looks like they ran their own servers:

        > GitHub is currently hosted on the company’s own hardware, centrally located in Virginia

        I imagine this predates their acquisition by Microsoft. Honestly, given how often Github seems to be down compared to the level of dependency people have on it, this might be one of the few cases where I might have understood if Microsoft embraced and extended a bit harder.

        [1]: https://www.theverge.com/tech/796119/microsoft-github-azure-...

        • loloquwowndueo 3 days ago
          • saghm 3 days ago

            Fair enough, my Azure experience is minimal enough that maybe I shouldn't make assumptions about whether this would improve things. That being said, I do think there's merit in the idea that if Microsoft is going to be able to solve this problem, they probably should try to solve it just once, and in a general way, rather than just for Github?

            • balamatom 3 days ago

              >Microsoft

              >solve it just once, and in a general way

    • bob1029 3 days ago
    • stackskipton 3 days ago

      Doubt it. I'm an Ops person on Azure. While they just had a terrible outage recently, they tend to be as stable as any other cloud provider, and I haven't had many issues with Azure itself compared to whatever slop the devs are chucking into production.

      • whoknowsidont 3 days ago

        >they tend to be as stable as any other cloud provider

        Absolutely not.

    • Fokamul 3 days ago

      Not Sharepoint? What a bummer.

  • nkzd 3 days ago

    I thought my SSH keys were revoked, whew.

    • coffeebeqn 3 days ago

      I had just started to replace mine when I saw someone post a message about GitHub.

  • dustfinger 3 days ago

    Why does the main page show all green when there is an ongoing incident? All green here -> https://www.githubstatus.com/

    • zamalek 3 days ago

      This is normal for Microsoft. It's as though status is owned and controlled by either marketing or accounting, not engineering.

    • FrostKiwi 3 days ago

      Very Microsofty. I'm still furious with how they handled the global ban on updating CDN (Front Door) files until last week and not properly communicating it. https://www.reddit.com/r/AZURE/comments/1on0ung/azure_status...

    • gkoberger 3 days ago

      It's marked as resolved for some reason

      • blibble 3 days ago

        because then some mid-level manager gets a telling off

        and/or has to pay the SLA out of their budget

    • dustfinger 3 days ago

      ahh, you are right. I am blind.

  • gunalx 3 days ago

    Yep. Was using GitHub for OAuth on a pet project of mine. Got the unicorn, and was considering taking the break or just setting up something else. Seems to be running again for me now, though.

  • rvz 3 days ago

    Looking forward to the postmortem.

    Are they using AI agents this time to resolve the outage? Probably not.

    But this time, there is no CEO of GitHub to contact and good luck contacting Satya to solve the outage.

    • stuffn 3 days ago

      The postmortem will be simple, since GitHub goes down so consistently every week that you can almost use it as an alternative timekeeping system.

      • ares623 3 days ago

        The pulsar of web services

  • marak830 2 days ago

    Huh, could this be why I can't log in and pushing packages says my account is banned?

  • carlyai 3 days ago

    thought i was going crazy

  • whoknowsidont 3 days ago

    Another outage brought to you by Azure.

  • fishgoesblub 3 days ago

    Must be a day ending in Y.

  • wavemode 3 days ago

    It's possible that Microsoft buying GitHub was a large-scale psyop intended to reduce the productivity of the competition.

    Any time their startup competitors are making too much progress they can just push the "GitHub incident" button and slow everyone down.

    • grepfru_it 3 days ago

      We used to obsessively care about 500s. Like I would make a change that caused a 0.1% spike in 500s and I would silently say I'm sorry to the folks who got the unicorn page.

      I'm not sure the new school cares nearly as much. But then again, this is how companies change as they mature. I saw this with StubHub as well. The people who care the most are the initial employees; employee #7291 usually dgaf.

      • 0x1ch 3 days ago

        I fall into the new school gen z category, and I think you're right. We don't care. We don't care about the problems started before us, and we owe nothing to no one (but our employers, must increase value for shareholders of course).

        I simply want to survive. I'll kiss ass where I have to, but not to people I don't work on behalf of.

        • kataklasm 3 days ago

          Can't say that's entirely true for me ('02). If my [ employer, supervisor, ... ] provides me with logical, traceable tasks with their context properly laid out, I can totally put a ton of effort into providing meticulous, well-thought-out solutions that are as good as it gets under the provided constraints. It's the nonsensical (be it actually nonsensical or just not understood enough because of unprovided context) tasks that make me not care.

        • Arch485 3 days ago

          I'll throw in my $0.02, as a fellow zoomer. I care about the things that are mine (as in, my code, my decisions, etc. etc.). But if management fucks up and tells me to fix it, there is no amount of money that will make me care. Especially if I advised management _not_ to do that in the first place.

        • ares623 3 days ago

          Hell yeah

      • wavemode 3 days ago

        A lot of downvoters seem to have not realized that my comment was a joke.

        Though yeah, for startups who depend on GitHub for CI and CD, it's been noticeable how absurdly unreliable GitHub has become over the years.