At my company we colo at a few local datacenters and have to deal with a huge amount of pressure from our investors over why we're not using AWS.
The points that always seem to come up:
* AWS is a known quantity and it's easier to evaluate our business with it.
* AWS provides "outage damage control" because AWS outages make the news and customers are more understanding. When our ISP has issues it just looks bad on us.
* Our company doesn't look as innovative because we're not cloud. Bleh.
Our app is compute, storage, and data transfer heavy, but switching to AWS being, literally, a 10x cost for us is apparently not a good enough answer.
Also: Investors want you to burn money. They don't want you to save money. If you run out of money and it works they give you more for more equity. If it doesn't work, they can move on earlier.
A startup I was with in 1999 was owned by a guy who had built a super scrappy local ISP, which sold out to a big co, which then sold out to cable.
However, the startup was all built on Oracle and Sun boxes because "this is what investors want to see" and we'll get .80 on the dollar if we have to liquidate. We had some nimrod spend 8 months trying to get Oracle to run on bare drives for our 100 tps (max) website.
They refused to let us use MySQL or Linux even though the owner was very familiar with them from the ISP.
I think we spent 3x headcount on the hardware and software, so we could have run for another 2 years had we been more scrappy. Not that the business idea was all that good.
How do you do failover if a server fails or if connectivity to one of those datacenters is lost? With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures. To me, that is the major selling point of any of the hyperscale cloud providers.
> With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures
"Confident"? "Most" failures? Are you merely hopeful that the probability of a bad failure is low, or are you able to test the AWS resiliency techniques you mention and to ensure that they stay working? At what cost?
My experience with RDS instances that had multi-region failover was that the failovers worked every time we needed them for deployments (I don’t think we ever needed them for RDS failures). The cost, though, was enormous. Our write DB represented the lion's share of our AWS cost, and doubling it for disaster mitigation increased our costs by something like 50%. It was mostly worth it from a business perspective when we were less sure about AWS uptimes, but I’m not sure I could keep justifying the cost given how few problems we had with RDS over time.
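To make that arithmetic concrete: if doubling the database spend raises the whole bill by about 50%, the database must have been roughly half the bill to begin with. A rough sketch, where the function name and the dollar figures are made up purely for illustration and are not our real numbers:

```python
def bill_increase_from_standby(db_cost: float, other_cost: float) -> float:
    """Fractional increase in the total bill when the database spend is doubled.

    Doubling the database adds one extra `db_cost` on top of the existing
    total, so the relative increase equals the database's share of the bill.
    """
    total = db_cost + other_cost
    return db_cost / total


# Hypothetical monthly figures, chosen only to reproduce a ~50% increase.
print(bill_increase_from_standby(db_cost=10_000, other_cost=10_000))
```

With a database that is half the bill, the standby copy costs as much as everything else combined, which is why the number is hard to keep justifying once the failures don't show up.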
Physical hardware failures are handled by having everything in VMs and storage handled by Ceph. We can lose plenty of physical boxes simultaneously before we run into capacity issues.
Multi-DC failover is handled by announcing our public IP block at both locations with different weights. It’s technically active/active because traffic can come in at the secondary DC, but we have an internal site-to-site VPN that is used to direct that traffic to the primary. If the primary DC goes down, the secondary starts handling the traffic instead of passing it along. All the database masters flip to the secondary and things keep humming along.
If we lose the site-to-site then the secondary stops advertising altogether and all traffic is forced to the primary.
So we can lose the site-to-site (which is dedicated) or one of the DCs at any time.
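Roughly, the rules above boil down to a small decision table. This is only a sketch of the logic, not our actual tooling: the names `RoutingState` and `routing_state` are invented for illustration, and it takes the health of each DC and of the site-to-site link as given inputs rather than modelling how they are detected.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RoutingState:
    announcing: FrozenSet[str]   # sites advertising the public IP block
    serving: Optional[str]       # site that actually handles requests
    db_masters: Optional[str]    # site holding the database masters


def routing_state(primary_up: bool, secondary_up: bool, link_up: bool) -> RoutingState:
    """Map DC and site-to-site health to who announces and who serves."""
    if primary_up and secondary_up and link_up:
        # Normal case: both announce; the secondary forwards over the VPN.
        return RoutingState(frozenset({"primary", "secondary"}), "primary", "primary")
    if primary_up:
        # Secondary is down, or the link is lost and the secondary has
        # withdrawn its announcement: everything lands on the primary.
        return RoutingState(frozenset({"primary"}), "primary", "primary")
    if secondary_up:
        # Primary is gone: the secondary serves and takes the DB masters.
        return RoutingState(frozenset({"secondary"}), "secondary", "secondary")
    # Both sites down: nothing is announced or served.
    return RoutingState(frozenset(), None, None)


print(routing_state(primary_up=True, secondary_up=True, link_up=False))
```

The point of the link-loss rule is that the secondary never serves traffic on its own while the primary might still be alive, so the two sites can't both end up holding database masters.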
Not sure how familiar you are with typical ESXi and vSphere setups, but if a server fails, all virtual machines are automatically migrated to a healthy server. And of course, your HPE G10s are compute only; all storage is on the Fibre Channel-connected SAN.