Well, according to my logs, it isn't that stable at all, assuming that the network connection is not the culprit. On these dates the connection failed at least once:
Fri Mar 20
Sun Mar 22
Mon Mar 23
Tue Mar 24
Sat Mar 28
Tue Apr 14
Wed Apr 15
Thu Apr 16
Wed Apr 22
Yeah, it doesn't have perfect availability, that's for sure!
I created my own system to monitor this too, on 31 March 2020.
Since then I have noted the following:
20200401203301 Verification check failed to run correctly
20200402173301 Verification check failed to run correctly
20200403123301 Verification check failed to run correctly
20200407063302 Verification check failed to run correctly
20200407083302 Verification check failed to run correctly
20200407103301 Verification check failed to run correctly
20200411221213 Verification tool is reporting incorrect results
20200411231212 Verification tool is reporting incorrect results
20200413043301 Verification check failed to run correctly
20200413063301 Verification check failed to run correctly
20200419113301 Verification check failed to run correctly
20200420103301 Verification check failed to run correctly
20200420153302 Verification check failed to run correctly
Times are UTC, naturally.
The check runs once every hour, and uses Puppeteer to automate a headless browser connection to my personal member verification URL and then reports the results back.
Where the check failed to run, that indicates the verification site was either down or responding very slowly, as it was on Monday (the 20th). Some of these could also have been caused by my own Internet connection.
Where it wasn't reporting correct results, those entries were triggered because the order in which my certs were shown on the tool changed, and my script didn't account for that. I've noticed this still happens randomly, but my script no longer alerts on it since I adjusted it: it will only alert if not all of my certs are shown in the response.
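For what it's worth, the adjusted check boils down to a set comparison rather than an ordered one. Here is a minimal sketch of that idea in Python. It is purely illustrative: my real script drives Puppeteer, and the function name and cert names below are made up for the example.

```python
def all_certs_shown(expected, shown):
    """Return True when every expected cert appears in the response, in any order."""
    return set(expected) <= set(shown)


# Hypothetical cert names, for illustration only.
expected = ["CERT-A", "CERT-B"]

# Order changed on the tool: no alert needed.
print(all_certs_shown(expected, ["CERT-B", "CERT-A"]))

# A cert is missing from the response: this is the case that should alert.
print(all_certs_shown(expected, ["CERT-A"]))
```

Comparing sets means a reshuffled list passes, while a genuinely missing cert still fails.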
EDIT: it's interesting to note how our logs don't seem to tally up. I didn't detect any errors on any of the days you did and vice versa!
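To double-check that, here is a quick sketch that compares the two lists by calendar day. It assumes your dates are all in 2020 and takes my timestamps as UTC in YYYYMMDDHHMMSS form; the variable names are just for illustration.

```python
from datetime import datetime

# Dates from your post; the year 2020 is an assumption.
yours = ["Mar 20", "Mar 22", "Mar 23", "Mar 24", "Mar 28",
         "Apr 14", "Apr 15", "Apr 16", "Apr 22"]

# Timestamps from my log (UTC, YYYYMMDDHHMMSS).
mine = ["20200401203301", "20200402173301", "20200403123301",
        "20200407063302", "20200407083302", "20200407103301",
        "20200411221213", "20200411231212", "20200413043301",
        "20200413063301", "20200419113301", "20200420103301",
        "20200420153302"]

your_days = {datetime.strptime(d + " 2020", "%b %d %Y").date() for d in yours}
my_days = {datetime.strptime(t, "%Y%m%d%H%M%S").date() for t in mine}

# Days on which both of us logged at least one problem.
overlap = sorted(your_days & my_days)
print(overlap)  # → []
```

Bear in mind that my monitoring only started on 31 March, so only your April dates are really comparable with mine.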
Also, looking at my logs, my check has been running for around 522 hours and only truly reported errors for 11 of those, which is a 2.1% error rate or, inversely, 97.9% uptime. That's not that bad really, all things considered!
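The arithmetic behind those figures, as a small sketch. It counts only the 11 "failed to run correctly" entries and treats each hourly run as one sample; the variable names are just for illustration.

```python
hours_monitored = 522  # approximate number of hourly runs so far
failed_checks = 11     # "failed to run correctly" entries only

error_rate = failed_checks / hours_monitored
uptime = 1 - error_rate

print(f"error rate: {error_rate:.1%}")  # error rate: 2.1%
print(f"uptime: {uptime:.1%}")          # uptime: 97.9%
```

Since the check only samples once an hour, this is a rough estimate: short outages between runs would go unnoticed.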
Good to see that you have started your own monitoring system, Alec. Kudos to you!
One possible explanation for the differences between your logs and mine is the method we use. I use the JSON API, whereas you seem to use the standard HTTP(S) connection. I'd figure that they would end up on the same system, but perhaps there is a difference.
Another explanation you already gave is that availability is hampered by network congestion or outage. That can be a local phenomenon or it can be an international issue - e.g. when the American site is not reachable from here, in Holland. In this case, perhaps (ISC)2 could consider employing something like Akamai.
I'll keep monitoring the connection; perhaps we can compare notes from time to time.