Okta has a status page called Trust, and because I compete with them I pay attention to it. At Ping Identity we also have a status page on our IDaaS service and our team makes this a focal point for the service, ensuring that we have realtime data and that it is broken down into the component services along with response times. It is this level of automation and granularity that I think is the underpinning of “capital T” trust.
In reviewing the Okta page last week I noticed something interesting, well, two things actually. The first is that the minutes of uptime didn’t match the number of minutes in the year to date; it was off by 6 days. I didn’t think much about it until I went back to the page this morning and noticed that the number had jumped up by a large amount.
With the help of EpochConverter, I calculated the number of days in the year to date, then multiplied by 24 and by 60 to get the number of minutes. Today is day 251 of the year, which means 361,440 minutes will have elapsed at midnight tonight… which is pretty far off the “minutes up” reported by Okta today, at 393,120.
Reversing the math on the 393,120 number gives me 273 days, and EpochConverter dutifully reports that day to be September 30, 2014. In other words, Okta is reporting the full month of September as being 100% uptime even though we are only on September 8th. So we know they aren’t automating the calculation of uptime, which also means the number is only as good as the incidents that are reported.
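For anyone who wants to check the arithmetic without EpochConverter, here is a minimal Python sketch of the same calculation. The function names are my own for illustration, and it assumes the “minutes up” counter is meant to count every minute since January 1, which is my reading of the page rather than anything Okta documents.

```python
from datetime import date, timedelta

MINUTES_PER_DAY = 24 * 60


def minutes_elapsed_through(day: date) -> int:
    """Minutes from January 1 up to midnight at the end of `day`."""
    return day.timetuple().tm_yday * MINUTES_PER_DAY


def date_for_minutes(year: int, minutes: int) -> date:
    """Reverse the math: the calendar day on which `minutes` have fully elapsed."""
    whole_days = minutes // MINUTES_PER_DAY
    return date(year, 1, 1) + timedelta(days=whole_days - 1)


today = date(2014, 9, 8)
reported_minutes_up = 393_120  # figure shown on the status page

print(today.timetuple().tm_yday)                    # 251
print(minutes_elapsed_through(today))               # 361440
print(date_for_minutes(2014, reported_minutes_up))  # 2014-09-30
```

The gap between the two figures is 31,680 minutes, or exactly 22 days, which is the rest of September.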
Which brings me to the second observation: there are no definitions of what each unit of measure means. Okta reports “100% global service uptime” for 2014 (rolling forward to the end of the month), but the “infrastructure” and “features” uptime sections list incidents that have impacted uptime.
For 2014 there are 770 minutes of infrastructure and features incidents that affected uptime, which works out to almost 13 hours of service time (12.83 hours to be exact). How can you acknowledge you had 13 hours of incidents this year and then confidently assert that your service was 100% available and therefore meeting SLA promises? That’s just playing lawyer-ball, using a synthetic measure of the service being reachable for 100% of the customers as opposed to the reality that at various times during the year the service was not available for at least some of the customers.
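Where this gets meaningful is that 770 minutes of incident time against the actual year-to-date minutes of 361,440 means the service was about 99.79% available, and that is short of the 99.9% SLA guarantee. A 99.9% target allows only about 361 minutes of downtime over the same period, less than half of the incident time reported.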
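The availability math can be sketched the same way. This assumes every reported incident minute counts as downtime against the SLA, which is my assumption and not a definition Okta publishes; the function name is made up for the example.

```python
def availability_pct(incident_minutes: float, elapsed_minutes: float) -> float:
    """Percentage of elapsed time that was NOT covered by an incident."""
    return 100.0 * (1 - incident_minutes / elapsed_minutes)


incident_minutes = 770      # infrastructure + features incidents, 2014 to date
elapsed_minutes = 361_440   # day 251 * 24 * 60
sla_pct = 99.9

availability = availability_pct(incident_minutes, elapsed_minutes)
# Downtime a 99.9% SLA would allow over the same period.
allowed_downtime = elapsed_minutes * (100 - sla_pct) / 100

print(f"incident hours:   {incident_minutes / 60:.2f}")   # 12.83
print(f"availability:     {availability:.2f}%")           # 99.79%
print(f"allowed downtime: {allowed_downtime:.0f} minutes")  # 361 minutes
print(f"meets SLA:        {availability >= sla_pct}")     # False
```

If only some of those incident minutes affected all customers, the real figure sits somewhere between this number and 100%, which is exactly why the definitions matter.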
Trust is a truth between a company and a customer and when that truth is impaired, so goes the trust. Realtime data is a wonderful thing and in the world of on demand systems there is no reason for not offering a realtime perspective on system status.
UPDATE: I called them dishonest but have since deleted that because it was unfair. I really don’t know what their motivation is, and it could well be that they simply put up a page that doesn’t have the necessary systems connected in the backend.
About Me: I work at Ping Identity, a competitor to Okta. Obviously that means I’m not an objective observer here but math is a stubborn thing nonetheless. Hopefully you will read this objectively and make up your own mind… but needless to say, this is my personal blog and these are my personal opinions.