Tell HN: Azure outage
356 comments
October 29, 2025
mystcb
Update 16:57 UTC:
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. In addition, customers may experience issues accessing the Azure Portal. Customers can attempt to use programmatic methods (PowerShell, CLI, etc.) to access/utilize resources if they are unable to access the portal directly. We have failed the portal away from Azure Front Door (AFD) to attempt to mitigate the portal access issues and are continuing to assess the situation.
We are actively assessing failover options of internal services from our AFD infrastructure. Our investigation into the contributing factors and additional recovery workstreams continues. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:57 UTC on 29 October 2025
---
Update: 16:35 UTC:
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:35 UTC on 29 October 2025
---
Azure Portal Access Issues
We are investigating an issue with the Azure Portal where customers may be experiencing issues accessing the portal. More information will be provided shortly.
This message was last updated at 16:18 UTC on 29 October 2025
---
Message from the Azure Status Page: https://azure.status.microsoft/en-gb/status
planewave
Azure Network Availability Issues
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. We suspect an inadvertent configuration change as the trigger event for this issue. We are taking two concurrent actions where we are blocking all changes to the AFD services and at the same time rolling back to our last known good state.
We have failed the portal away from Azure Front Door (AFD) to mitigate the portal access issues. Customers should be able to access the Azure management portal directly.
We do not have an ETA for when the rollback will be completed, but we will update this communication within 30 minutes or when we have an update.
This message was last updated at 17:17 UTC on 29 October 2025
cyptus
AFD is down quite often regionally in Europe for our services. In 50%+ of the cases they just don't report it anywhere, even if it's for 2h+.
RajT88
Spam those Azure tickets. If you have a CSAM, build them a nice powerpoint telling the story of all your AFD issues (that's what they are there for).
> In 50%+ of the cases they just don't report it anywhere, even if it's for 2h+.
I assume you mean publicly. Are you getting the service health alerts?
tomashubelbauer
CSAM apparently also means Customer Success Account Manager for those who might have gotten startled by this message like me.
psunavy03
Some really unfortunate acronyms flying around the Microsoft ecosystem . . .
cyptus
in many cases: no service health alerts, no status page updates and no confirmations from the support team in tickets. still we can confirm these issues from different customers across europe. Mostly the issues are region-dependent.
llama052
I got a service health alert an hour after it started, saying the portal was having issues. Pretty useless and misleading.
cyberax
> CSAM
Child Sex-Abuse Material?!? Well, a nice case of acronym collision.
hallh
Same experience. We've recently migrated fully away from AFD due to how unreliable it is.
8cvor6j844qw_d6
I'll be interested in the incident writeup since DNS is mentioned. It will be interesting in a way if it is similar to what happened at AWS.
Insanity
It's pretty unlikely. AWS published a public 'RCA' https://aws.amazon.com/message/101925/. A race condition in a DNS 'record allocator' causing all DNS records for DDB to be wiped out.
I'm simplifying a bit, but I don't think it's likely that Azure has a similar race condition wiping out DNS records on _one_ system that then propagates to all others. The similarity might just end at "it was DNS".
parliament32
That RCA was fun. A distributed system with members that don't know about each other, don't bother with leader elections, and basically all stomp all over each other updating the records. It "worked fine" until one of the members had slightly increased latency and everything cascade-failed down from there. I'm sure there was missing (internal) context but it did not sound like a well-architected system at all.
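For readers who have not run into this class of bug: one common guard against a delayed writer overwriting newer state is to tag every update with a monotonically increasing generation number and reject anything older than what is already applied. The sketch below is only an illustration of that general idea. `RecordStore` and `apply_plan` are hypothetical names, and nothing here claims to resemble how AWS's system is actually built.

```python
import threading


class RecordStore:
    """Toy record store that refuses to apply stale plans."""

    def __init__(self):
        self._lock = threading.Lock()
        self.generation = 0  # generation of the plan currently applied
        self.records = {}    # name -> list of addresses

    def apply_plan(self, generation, records):
        """Apply a plan only if it is newer than the one in place.

        Returns True if the plan was applied, False if it was stale.
        """
        with self._lock:
            if generation <= self.generation:
                return False
            self.generation = generation
            self.records = dict(records)
            return True


store = RecordStore()
store.apply_plan(2, {"db.example.test": ["10.0.0.2"]})

# A delayed writer arrives late with an older, empty plan and is rejected
# instead of wiping out the newer records.
stale_applied = store.apply_plan(1, {"db.example.test": []})
```

The check and the write happen under one lock, so two writers cannot interleave between "is my plan newer?" and "apply it".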
cdr420
It's always DNS
layer8
DNS has both naming and cache invalidation, so no surprise it’s among the hardest things to get right. ;)
jjp
Whilst the status message acknowledges the issue with Front Door (AFD), it seems as though the rest of the actions are about how to get Portal/internal services working without relying on AFD. For those of us using Front Door does that mean we're in for a long haul?
llama052
Please migrate off of front door. It's been a failure mode since it came out historically. Anything else is better at this point
everfrustrated
Didn't the underlying vendor they used for Azure Front Door go bankrupt? It's probably on life support.
jdc0589
yea its not just the portal. microsoft.com is down too
PeterCorless
Seems all Microsoft-related domains are impacted in some way.
• https://www.xbox.com/en-US also doesn't fully paint. Header comes up, but not the rest of the page.
• https://www.minecraft.net/en-us is extremely slow, but eventually came up.
daxfohl
Downdetector says aws and gcp are down too. Might be in for a fun day.
rozenmd
From what I can tell, Downdetector just tracks traffic to their pages without actually checking if the site is down.
The other day during the AWS outage they "reported" OVH down too.
linhns
Not sure if this is true. I just logged in to the console with no glitch.
jdc0589
yea I saw that, but im not sure on how accurate that is. a few large apps/companies I know to be 100% on AWS in us-east-1 are cranking along just fine.
NetMageSCW
AWS was having performance issues, and I believe that's resolved.
mystcb
Yeah, I am guessing it's just a placeholder till they get more info. I thought I saw somewhere that internally within Microsoft it's seen as a "Sev 1" with "all hands on deck" - Annoyingly I can't remember where I saw it, so if someone spots it before I do, please credit that person :D
Edit: Typo!
chad_c
It was here https://news.ycombinator.com/item?id=45749054 but that comment has been deleted.
bossyTeacher
It sure must be embarrassing for the website of the second richest company in the world to be down.
planewave
yes, and it seems that at least for some login.microsoftonline.com is down too, which is part of the Entra login / SSO flow.
NDizzle
They briefly had a statement about using Traffic Manager to work with your AFD to work around this issue, with a link to learn.microsoft.com/...traffic-manager, and the link didn't work. Due to the same issue affecting everyone right now.
They quickly updated the message to REMOVE the link. Comical at this point.
eddie_catflap
We saw issues before 16:00 UTC - approx 15:38
jonathanlydall
Yet another reason to move away from Front Door.
We already had to do it for large files served from Blob Storage since they would cap out at 2MB/s when not in cache of the nearest PoP. If you’ve ever experienced slow Windows Store or Xbox downloads it’s probably the same problem.
I had a support ticket open for months about this and in the end the agent said “this is to be expected and we don’t plan on doing anything about it”.
We’ve moved to Cloudflare and not only is the performance great, but it costs less.
Only thing I need to move off Front Door is a static website for our docs served from Blob Storage, this incident will make us do it sooner rather than later.
out_sider
we are considering the same but because our website uses an APEX domain we would need to move all DNS resolution to Cloudflare, right? Does it have as nice a "rule set builder" as azure?
jonathanlydall
Unless you pay for CloudFlare’s Enterprise plan, you’re required to have them host your DNS zone. You can use a different registrar as long as you just point your NS records to Cloudflare.
Be aware that if you’re using Azure as your registrar, it’s (probably still) impossible to change your NS records to point to CloudFlare’s DNS server, at least it was for me about 6 months ago.
This also makes it impossible to transfer your domain to them either, as CloudFlare’s domain transfer flow requires you set your NS records to point to them before their interface shows a transfer option.
In our case we had to transfer to a different registrar, we used Namecheap.
However, transferring a domain from Azure was also a nightmare. Their UI doesn’t have any kind of transfer option, I eventually found an obscure document (not on their Learn website) which had an az command which would let you get a transfer code which I could give to Namecheap.
Then I had to wait over a week for the transfer timeout to occur because there is no way on Azure side that I could find to accept the transfer immediately.
I found CloudFlare’s way of building rules quite easy to use, different from Front Door but I’m not doing anything more complex than some redirects and reverse proxying.
I will say that Cloudflare’s UI is super fast, with Front Door I always found it painfully slow when trying to do any kind of configuration.
Cloudflare also doesn’t have the problem that Front Door has where it requires a manual process every 6 months or so to renew the APEX certificate.
Uehreka
I noticed that Starbucks mobile ordering was down and thought “welp, I guess I’ll order a bagel and coffee on Grubhub”, then GrubHub was down. My next stop was HN to find the common denominator, and y’all did not disappoint.
pants2
Good thing HN is hosted on a couple servers in a basement. Much more reliable than cloud, it seems!
dang
As long as you don't use genetically identical hardware.
https://news.ycombinator.com/item?id=32031639
https://news.ycombinator.com/item?id=32032235
Edit: wow, I can't believe we hadn't put https://news.ycombinator.com/item?id=32031243 in https://news.ycombinator.com/highlights. Fixed now.
airstrike
I love that "Ask HN: What'd you do while HN was down?" was a thing
lysace
It was on AWS at least (for a while) in 2022.
jjice
Yeah looks like they're back on M5.
dang saying it's temporary: https://news.ycombinator.com/item?id=32031136
$ dig news.ycombinator.com
; <<>> DiG 9.10.6 <<>> news.ycombinator.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54819
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;news.ycombinator.com. IN A
;; ANSWER SECTION:
news.ycombinator.com. 1 IN A 209.216.230.207
;; Query time: 79 msec
;; SERVER: 100.100.100.100#53(100.100.100.100)
;; WHEN: Wed Oct 29 13:59:29 EDT 2025
;; MSG SIZE rcvd: 65
And that IP says it's with M5 again.
parliament32
Always has been.
hypeatei
Starbucks mobile was down during the AWS outage too...
SoftTalker
They are multi-cloud --- vulnerable to all outages!
mring33621
you wouldn't believe some of the crap enterprise bigco mgmt put in place for disaster recovery.
they think that they are 'eliminating a single point of failure', but in reality, they end up adding multiple, complicated points of mostly failure.
andoma
Go multi-cloud they said...
Hamuko
Gonna build my application to be multicloud so that it requires multiple cloud platforms to be online at the same time. The RAID 0 of cloud computing.
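The RAID 0 comparison can be put into numbers. As a rough sketch, assuming each provider fails independently (which real outages often violate) and using made-up availability figures, a design that needs every cloud up multiplies the availabilities together, while a design where any one cloud is enough multiplies the failure probabilities instead:

```python
from math import prod


def all_required(availabilities):
    """Availability when every provider must be up (the 'RAID 0' case)."""
    return prod(availabilities)


def any_sufficient(availabilities):
    """Availability when any single provider being up is enough."""
    return 1 - prod(1 - a for a in availabilities)


clouds = [0.999, 0.999]  # assumed figures for illustration, not measured ones

print(f"all required:   {all_required(clouds):.6f}")
print(f"any sufficient: {any_sufficient(clouds):.6f}")
```

With two providers at an assumed 99.9% each, requiring both gives roughly 99.8%, which is worse than either one alone.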
sergiotapia
Wow I just left a Starbucks drivethru line because it was just not moving. I guess it was because of this.
irusensei
I was working when I saw the portal page showing only resource groups and lots of items missing. I thought it was a weird browser cache issue.
The actual stuff I was working on (App Insights, Function App) that was still open was operational.
hedayet
The sad thing is - $MSFT isn't even down by 1%. And IIRC, $AMZN actually went up during their previous outage.
So if we look at these companies' bottom lines, all those big wigs are actually doing something right. Sales and lobbying capacity is way more effective than reliability or good engineering (at least in the short term).
agency
So that's why I can't check in for my Alaska Airlines flight... https://news.microsoft.com/source/features/digital-transform...
kurttheviking
I am unable to load this article...presumably for related reasons
move-on-by
Instead of cyber security awareness month, we should rename it to cloud availability awareness month.
gianpaj
Can't download VSCode :D
Error: visual-studio-code: Download failed on Cask 'visual-studio-code' with message: Download failed: https://update.code.visualstudio.com/1.105.1/darwin-arm64/st...
Jamie452
Currently standing in a half closed supermarket because the tills are down and they cant take payments
chasd00
There's a Family Dollar by my house that is down at least 2 full days per month because of bad inet connectivity. I live close enough that with a small tower on my roof i can get line of sight to theirs. I've thought about offering them a backup link off my home inet if they give me 50% of sales whenever its in use. It would be a pretty good deal for them, better some sales when their inet is down vs none.
jrodom
50% of sales? what do you think the gross margin is on average for each item sold?
consp
2-3%, bit higher on perishables. Though i'd just ask lump sum payments in cash since it likely has to not go through corporate (as in, avoid the corporation).
ryandrake
You'd think any SeriousBusiness would have a backup way to take customers' money. This is the one thing you always want to be able to do: accept payment. If they made it so they can't do that, they deserve the hit to their revenue. People should just walk out of the store with the goods if they're not being charged.
Why doesn't someone in the store at least have one of those manual kachunk-kachunk carbon copy card readers in the back that they can resuscitate for a few days until the technology is turned back on? Did they throw them all away?
voidmain0001
If they used standalone merchant terminals, then those typically use the local LAN which can rollover to cellular or PoT in the event of a network outage. The store can process a card transaction with the merchant terminal and then reconcile with the end of day chit. This article from 2008 describes their PoS https://www.retailtouchpoints.com/topics/store-operations/ca...
BenjiWiebe
I think a lot of payment terminals have an option to record transactions offline and upload them later, but apparently it's not enabled by default - probably because it increases your risk that someone pays with a bad card.
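A minimal sketch of that store-and-forward idea, to show where the risk comes from. Everything here is hypothetical: `OfflineTerminal`, the `authorize` callback, and the `floor_limit` cap are illustration names, not any real terminal's API.

```python
from collections import deque


class OfflineTerminal:
    """Toy payment terminal that queues transactions while offline."""

    def __init__(self, authorize, floor_limit):
        # authorize(card, amount) -> bool; raises ConnectionError when offline
        self.authorize = authorize
        # largest amount accepted without a live authorization
        self.floor_limit = floor_limit
        self.pending = deque()

    def charge(self, card, amount):
        """Return True if the sale may proceed."""
        try:
            return self.authorize(card, amount)
        except ConnectionError:
            if amount > self.floor_limit:
                return False  # too risky to accept blind
            self.pending.append((card, amount))
            return True       # accepted, but not yet authorized

    def flush(self):
        """Replay queued transactions; return those the processor declined."""
        declined = []
        while self.pending:
            card, amount = self.pending[0]
            ok = self.authorize(card, amount)  # raises if still offline
            self.pending.popleft()             # remove only after an answer
            if not ok:
                declined.append((card, amount))
        return declined
```

Whatever `flush` returns is money the store has already handed over as goods, which is the bad-card risk mentioned above. The floor limit just caps how large each such loss can be.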
ElevenLathe
The kachunk-kachunk credit card machines need raised digits on the cards, and I don't think most banks have been issuing those for years at this point. Mine have been smooth for at least 10 years.
BenjiWiebe
Pretty sure it'd be a lot better deal for them to have no sales than to pay out 50% of sales on stuff with single digit margins.
david422
IIRC, the grocery chain I worked for used to have an offline mode to move customers out the door. But it meant that when the system came back online, if the customers card was denied, the customer got free groceries.
cyberax
I remember that banks will try to honor the transactions, even if the customer's balance/credit limit is exhausted. It doesn't apply only to some gift cards.
SoftTalker
Mind-boggling that any retailer would not have the capability to at least run the checkout stations offline.
withinboredom
I knew an old guy in the '00s who specialized in COBOL/Fortran for working on till software. Guess he retired and they couldn't maintain it
computerdork
Anyone remember Bob's number?? Bob?! Oh the humanity! We're all gonna be canned!
0000000000100
Yeah just took down the prod site for one of our clients since we host the front-end out of their CDN. Just got wrapped up panic hosting it somewhere else for the past hour, very quickly reminds you about the pain of cookies...
alt227
... and DNS caching, and browser file cache, and sessions...
Moving a website quickly is never fun.
vachina
microsoft.com and some subdomains (answers.microsoft.com) have no A and AAAA records. They screwed up big time.
0xbadcafebee
That specific subdomain has issues with propagation: https://dnschecker.org/#A/answers.microsoft.com (only four resolvers return records)
The root zone and www. do not: https://dnschecker.org/#A/microsoft.com (all resolvers return records)
And querying https://www.microsoft.com/ results in HTTP 200 on the root document, but the page elements return errors (a 504 on the .css/.js documents, a 404 on some fonts, Name Not Resolved on scripts.clarity.ms, Connection Timed Out on wcpstatic.microsoft.com and mem.gfx.ms). That many different kinds of errors is actually kind of impressive.
I'm gonna say this was a networking/routing issue. The CDN stayed up, but everything else non-CDN became unroutable, and different requests traveled through different paths/services, but each eventually hit the bad network path, and that's what created all the different responses. Could also have been a bad deploy or a service stopped running and there's different things trying to access that service in different ways, leading to the weird responses... but that wouldn't explain the failed DNS propagation.
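For anyone who wants to repeat the A/AAAA check from their own machine without a web tool, here is a small Python sketch. It goes through the system resolver via `socket.getaddrinfo`, so it reflects the local hosts file and caches rather than querying authoritative servers directly. `record_types` is a hypothetical helper name, and the `resolve` parameter exists only so the function can be exercised without network access.

```python
import socket


def record_types(host, resolve=socket.getaddrinfo):
    """Return which of 'A' / 'AAAA' the local resolver answers for host."""
    found = set()
    for family, label in ((socket.AF_INET, "A"), (socket.AF_INET6, "AAAA")):
        try:
            if resolve(host, None, family):
                found.add(label)
        except socket.gaierror:
            pass  # the resolver returned no address for this family
    return found
```

Calling `record_types("answers.microsoft.com")` returns a set such as `{"A"}`, `{"A", "AAAA"}`, or an empty set, depending on what your resolver sees at that moment.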
Aperocky
wow, right after AWS suffered a similar thing.
I wonder if this is microsoft "learning" to "prevent" such an issue and instead triggered it...
"One often meets his destiny on the path he takes to avoid it" -- Master Oogway
kierenj
Ouch, and login.microsoftonline.com too - i.e. SSO using MS accounts. We'd just rolled that out across most (all?) of our internal systems...
And microsoft.com too - that's gotta hurt
planewave
It is interesting to see the differential across different tenants in different geographies:
- on a US tenant I am unable to access login.microsoftonline.com and the login flow stalls on any SSO authentication attempt.
- on a European tenant, probably germany-west, I am able to login and access the Azure portal.
parliament32
SSO and 365 are working fine for us, but admin portals for Azure/365 are down. Our workloads in Azure don't seem to be impacted.
juancroldan
Guess you have NASSO now (Not A Single Sign On)
ocdtrekkie
I am still stunned people choose to do this, considering major Office 365 outages are basically a weekly thing now.
NetMageSCW
We are very dependent on Azure and Microsoft Authentication and Microsoft 365 and haven’t had weekly or even monthly issues. I can think of maybe three issues this year.
gmassman
I’ve been migrating our services off of Azure slowly for the past couple of years. The last internet facing things remaining are a static assets bucket and an analytics VM running Matomo. Working with Front Door has been an abysmal experience, and today was the push I needed to finally migrate our assets to Cloudflare.
I feel pretty justified in my previous decisions to move away from Azure. Using it feels like building on quicksand…
alt227
All the clouds have had major outages this year.
At this point I don't believe that any one of them is any better or more reliable than the others.
Azure is down for us, we can't even access the azure portal. Are others experiencing this? Our services are located in Canada/Central and US-East 2
https://downdetector.ca/status/windows-azure/
https://azure.status.microsoft/en-gb/status