Many people use AWS because everyone uses AWS. Many of my clients have no need f...

ehnto · on Nov 2, 2020

I work in an ecommerce agency environment and it can be quite frustrating. Every client, no matter how small, has been sold the spiel and had some guy come through and build a boutique cloud infrastructure, CI pipeline and development VM. As if traffic will increase 1000% by next week with no prior warning. It just doesn't happen like that.

I spend half my time debugging Hyperscale Bullshit™ that the client never needed and that failed the moment the devops guy left.

If you want this stuff, you need a long term tech team, and a devops member on hand. Every time you hand your project around this stuff costs you hours. You also need to tell your devs to use the devops guy like tech support. They built this rickety tower of obtuse config files, make the devops guy fix it and don't let your devs spin their wheels for 4 hours before their assets compile on the Super Cool Distributed Docker VM microservice stack of teetering jenga blocks.

devonkim · on Nov 2, 2020

As an ops and infrastructure engineer I cringe every time people talk about trying to automate all these processes early in the life of an organization or if there’s zero way for the company to have explosive growth. Not every company is going to need to rapidly scale up and down and I rarely see cloud saving money unless one’s infrastructure is pretty garbage already. Cloud lift and shifts are like bad rewrites inheriting all the architectural problems and increasing opex for the sake of a “cloud” stamp aka cloud washing.

But I do recommend companies have a means of deploying stuff to AWS and doing basic security there for clients that really require AWS. Having a VPC ready to go costs nothing operational and is good for at least having some semblance of collective skill with a major public cloud provider.

Invariably though, when I see undisciplined developers take on colo-style hosting, systems become brittle and unable to accept changes, and with any success it becomes a Herculean task to deploy new features without putting something important at risk. Eventually this results in new engineers spending more time on systems archaeology than on creating new features or improving maintainability.

In 2020 I’d recommend developing as much as possible on PaaS type platforms like Heroku, a managed K8S, or even App Engine to avoid more bike shedding that really doesn’t buy a company anything materially advantageous early on like deciding which kind of EC2 instance to standardize on. Until engineers know what to optimize and concentrate upon in terms of process and there’s perhaps tens of thousands in monthly infrastructure costs most skilled (read: expensive) ops engineers won’t really be of much value except offloading work that developers shouldn’t have been thinking about in the first place.

michaelbuckbee · on Nov 2, 2020

I did work for a YC startup that replaced my ~$100/mo Heroku app with a $5,000/mo AWS stack that nobody in the place knew how to manage.

Their site was so ridiculously low traffic that I did some napkin math and figured out they were paying around $0.10 per web request.
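The napkin math can be redone in a few lines. This is only a sketch using the figures quoted in the comment ($5,000/mo, ~$100/mo, ~$0.10 per request); the 30-day month and the function names are assumptions added for illustration.

```python
# Back-of-envelope check of the figures quoted above.
# The 30-day month is an assumption; the dollar amounts come from the comment.

def implied_monthly_requests(monthly_bill_usd: float, cost_per_request_usd: float) -> float:
    """Number of requests that makes the bill work out to the given unit cost."""
    return monthly_bill_usd / cost_per_request_usd


def cost_per_request(monthly_bill_usd: float, monthly_requests: float) -> float:
    """Unit cost of one request at a given monthly bill and traffic level."""
    return monthly_bill_usd / monthly_requests


aws_bill = 5_000.0      # USD per month, from the comment
heroku_bill = 100.0     # USD per month, from the comment (approximate)
aws_unit_cost = 0.10    # USD per request, from the comment (approximate)

requests = implied_monthly_requests(aws_bill, aws_unit_cost)
print(f"implied traffic: {requests:,.0f} requests/month "
      f"(~{requests / 30:,.0f} per day)")
print(f"same traffic on the Heroku bill: "
      f"${cost_per_request(heroku_bill, requests):.4f} per request")
```

In other words, the quoted numbers imply traffic on the order of tens of thousands of requests a month, which is the scale at which a single small app server is usually more than enough.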

fourseventy · on Nov 2, 2020

I've seen this exact same thing happen to a company I used to work for. They had an enterprise B2B app that had extremely low traffic. The Heroku app worked fine but the CTO spent a year building a super complicated AWS stack as an exercise in resume building. After the AWS stack launch failed miserably and we had to switch back to Heroku the CTO quit and still has "migrated company to infrastructure as code AWS stack" on their LinkedIn profile.

jrochkind1 · on Nov 2, 2020

Huh, making a manual AWS stack 50 times more expensive than a heroku setup seems like a real trick to me! I'm used to heroku being more expensive than running it yourself -- as I would expect, since you're paying them for a lot of management.

user5994461 · on Nov 2, 2020

Heroku is both incredibly cheap and incredibly expensive.

It's only $7/month to deploy a web application. More if you want some of the paid features and a database instance. It all works out of the box with instant deployment, it's fantastic.

Then it suddenly goes $50 more per gigabyte of RAM, which makes it massively expensive for any serious workload. It's crazy how much they can try to charge, makes AWS look like a bunch of clowns in comparison.

jrochkind1 · on Nov 2, 2020

If it saves you a FTE or two from managing your own infrastructure, there is a lot of headroom before it's a losing proposition.

Which is what I think the OP misses discussing in as much detail as they could with AWS -- are there ways AWS is saving some customers developer/ops staff time over the cheaper alternatives? Cause that is often more expensive than the resources themselves. Could just be "because that's what we're familiar with" (and that's not illegitimate as cost savings), but could be actually better dashboard UIs, APIs, integrations, whatever.

[I am currently investigating heroku for our low-traffic tiny-team app, after we had our sysadmin/devops position eliminated. The performance characteristics are surprising me negatively (I can't figure out how anyone gets by with a `standard` instead of `performance` dyno, even for a small low-traffic app; currently writing up my findings for public consumption), but the value of the management they are doing so we don't have to is meeting and exceeding my expectations. (We currently manage our own AWS resources directly, so I know what it is we can't do sustainably anymore with the eliminated position, and the value we're getting from not having to do it).]

vidarh · on Nov 2, 2020

My experience - from doing devops consulting and moving clients off AWS every chance I get - is that it's bad business for devops consultants to move clients off AWS in terms of short term billable hours, because clients spend more money on me when they're on AWS. If I was after maximising billable hours in the short term, then I'd recommend AWS all the time...

As such a lot of devops consultants certainly have all the wrong incentives to recommend it, and these days most of them also lack experience of how to price out alternatives.

E.g. a typical beginner's mistake is to think you'd price out a one-to-one match of servers between AWS and an alternative, but one of the benefits of picking other options is that you're able to look at your app and design a setup that fits your needs better. Network latency matters. Ability to add enough RAM to fit your database working set in RAM if at all possible matters. And so on. With AWS this is often possible but often at the cost of scaling out other things too that you don't need.

And most developers won't do it if you don't give them budget responsibility and make them justify the cost and then cut their budget.

Development teams used to AWS tend to spin up instance after instance instead of actually measuring and figuring out why they're running into limits to keep costs down. Giving dev teams control over infra without having someone with extensive operations experience is an absolute disaster if you want to manage costs.

I work for a VC now. When I evaluate the tech teams of people who apply to us, it's perfectly fine if they use AWS, but it's a massive red flag to me if they don't understand the costs and the costs of alternatives. Usually the ones who do know they're paying for speed and convenience, and have thoughts on how to cut costs by moving parts or all of their services off AWS as they scale, or they have a rationale for why their hosting is never going to be a big part of their overall cost base.

The only case where AWS is cost effective is if you get big enough to negotiate really hefty discounts. It's possible - I've heard examples.

But if you're paying AWS list prices, chances are sooner or later you'll come across a competitor that isn't.

jrochkind1 · on Nov 2, 2020

When you're talking about clients spending more on you as a consultant when they are on AWS... compared to what? Alternatives like GCS? Or actual on-premises hardware? Or what? When you "move clients off AWS every chance you get", you are moving them to what instead?

I'm having trouble following your theory of why having clients stay on AWS ends up leading to more consultant billable hours, I think because I don't understand what alternatives you are comparing it to. I am pretty sure it is not heroku, as in the earlier part of the thread?

Or are you talking about compared to simpler "vps" hosts like, say, linode? Doesn't that require a lot more ops skillset and time to set up and run compared to aws, or you don't think it does?

vidarh · on Nov 7, 2020

Compared to on-premises, colo or managed hosting.

When moving them off AWS it'd usually be to managed hosting on monthly contracts.

Heroku turns expensive real fast. You're paying for AWS + their margins on top of AWS.

Managed hosting ranges from API-based provisioning not much different than AWS to ordering server by server.

In practice the amount of devops time spent dealing with the server itself for me at least is generally at most a matter of downloading a bootstrap script that will provision CoreOS/Flatcar and tie it into a VPN and record the details. The rest of the job can be done by simple orchestration elsewhere. I have servers I haven't needed to touch in 5 years other than recently to switch from CoreOS to Flatcar (other than that the OS auto-updates, and everything runs in containers). Once you've done that, it's irrelevant what the server is or where it is.

For modern server hardware, if you run your own colo setup, that's a matter of having PXE and tftp set up once in a colo, and you can then use an IPMI connection to do the OS installation and config remotely, so even with colocated servers, I'd typically visit the data center once or twice a year to manage several racks of servers. The occasional dead disk would be swapped by data centre staff. Everything else would typically be handled via IPMI.

E.g. one of my setups involved 1k containers across New Zealand, Germany and several colo facilities in the UK. Hetzner (Germany) was the first managed hosting provider we found that could compete on total cost of ownership with leasing servers and putting them in racks in the UK. Had we been located in Germany (cheaper colo facilities than near London), they'd not have been able to compete, but putting stuff in a colo facility somewhere we didn't have people nearby would be too much of a hassle and the cost difference was relatively minor.

Small parts of the bootstrap scripts we had were the only thing different between deploying into KVM VMs (New Zealand), managed servers not in the same racks (Hetzner), and colocated bare metal booting via PXE on their own physical networks (UK). Once they were tied into the VPN and the container runtime and firewall was in place, our orchestration scripts (couple of weeks of work, long before Kubernetes etc. was a thing - we were originally deploying openvz containers and so the same tool could deploy to openvz, KVM and docker over the years) would deploy VMs/containers to them, run backups and failover setups, and dynamically tie them into our frontend load balancers.

We did toy with the idea of tying in AWS instances to that setup too, but over many years of regularly reviewing the cost we could never get AWS cheap enough to justify it. We kept trying because there was a constant stream of people in the business who believed - with no data - that it'd be cheaper, but the closest we got in experiments with AWS was ca. twice the cost.

For the record, in my current job we do use AWS entirely. I could cut the cost of what we're using it for by ~80%-90% by moving it to Hetzner. But the cost is low enough that it's not worth investing the time in doing the move at this point, and it's not likely to grow much (it's used mostly for internal services for a small team). That's the kind of scenario where AWS is great - offloading developer time on setups that are cheap to run even at AWS markups.

I tend to recommend to people that it's fine to start with AWS to deploy fast and let their dev team cobble something together. But they need to keep an eye on the bill, and have some sort of plan for how to manage the costs as their system gets more complex. That means also thinking long and hard before adding complicated dependencies on AWS. E.g. try to hide AWS dependencies behind APIs they can replace.
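One way to read "hide AWS dependencies behind APIs they can replace" is a thin interface that application code depends on, with the provider-specific part kept in one small class. The sketch below is illustrative only: `BlobStore`, `LocalDiskStore`, `InMemoryStore` and `save_invoice` are hypothetical names, not anything from the setups described above, and an S3-backed class implementing the same two methods is assumed rather than shown.

```python
from pathlib import Path
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The only storage API the application code is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDiskStore:
    """Backend for a plain server or colo box: files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStore:
    """Backend for tests; an object-storage backend would have the same shape."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_invoice(store: BlobStore, invoice_id: str, pdf: bytes) -> str:
    """Application code: knows about BlobStore, not about any provider."""
    key = f"invoices/{invoice_id}.pdf"
    store.put(key, pdf)
    return key
```

With this shape, moving off (or onto) a cloud object store means writing one more class with `put` and `get`, rather than touching every call site.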

ksec · on Nov 2, 2020

Well Heroku is running on top of AWS. So you are basically paying an extra premium on top of the premium from AWS.

Although whether it is worth it depends on your view.

( I keep thinking Salesforce is just not a good fit for Heroku, they should have sold it to MS or Google )

KingOfCoders · on Nov 2, 2020

Many (startup) SaaS companies (except metrics collectors e.g) have very low traffic from logged in users.

When I moved from an ecommerce company with 1000 logins/sec to a SaaS company I could not believe the low traffic :-)

NikolaeVarius · on Nov 2, 2020

That's just incompetence.

hbogert · on Nov 2, 2020

You make it sound as if devops is the initiator of the complexity problems. This might well be in your case. My experience is the other way around.

I'm forced to think about stuff like Kubernetes because the devs are popping "micro-services" for almost anything and the answer to trying to keep things manageable is, hopefully, K8s.

Of course they then send funny memes and argue that K8s is overkill. Yet they have no idea how much a devops guy has to do to actually ramp up one of their multi-gigabyte containers.

The whole micro-services mentality is proving to be backwards in my environment. It's seen as the answer to everything. "Oh we have a behemoth, let's just not use this part of the application and reimplement it somewhere else." In essence that's great. However, the brunt of the work is making tests to capture old behaviour/semantics, and working out who the (indirect) clients of this piece of code are. Ignoring a piece of code to death and reimplementing it somewhere else behind a socket is only part of the solution.

The hyperscalers are benefiting greatly from this mindset imo.

ehnto · on Nov 2, 2020

Oh yeah, my point of view is entirely based on my experience with small to medium sized companies, and there especially developers can be the instigators in sudden complexity multipliers.

I tried my best not to rag on devops as I am always incredibly impressed by these systems and what can be achieved and automated. It absolutely has its place.

It's just that all the technology that big companies use become trendy and then end up used at small companies too. Small companies don't realise that the tools aren't solving the problems they have, and they don't understand the vendor lock ins and dependencies that they've introduced into their platform for no real gain.

This happens in all kinds of ways, not just devops. Your example of developers using the microservice pattern is exactly that. When I see a developer recommend microservices my mind flashes forward to all of the extra complexity that entails and how we've just turned a small monolith on one server into 6 pieces of software across distributed servers. Great for big companies, too much complexity for small companies.

Spivak · on Nov 2, 2020

At my company we colo at a few local datacenters and have to deal with a huge amount of pressure from our investors as to why we're not using AWS.

The points that always seem to come up:

* AWS is a known quantity and it's easier to evaluate our business with it.

* AWS provides "outage damage control" because AWS outages make the news and customers are more understanding. When our ISP has issues it just looks bad on us.

* Our company doesn't look as innovative because we're not cloud. Bleh.

Our app is compute, storage, and data transfer heavy but switching to AWS being a, literally, 10x cost for us is apparently not a good enough answer.

KingOfCoders · on Nov 2, 2020

Also: Investors want you to burn money. They don't want you to save money. If you run out of money and it works they give you more for more equity. If it doesn't work, they can move on earlier.

grogenaut · on Nov 2, 2020

A startup I was with in 1999 was owned by a guy who built a super scrappy local ISP who sold out to a big co which then sold out to cable.

However the startup was all built on oracle and sun boxes because "this is what investors want to see" and we'll get .80 on the dollar if we have to liquidate. We had some nimrod spend 8 months trying to get oracle to run on bare drives for our 100 tps (max) website.

They refused to let us use mysql or linux even tho the owner was very familiar with them from the ISP.

I think we spent 3x headcount on the hardware and software, eg we could have run for another 2 years had we been more scrappy. Not that the business idea was all that good.

We also got .3 on the dollar iirc.

mwcampbell · on Nov 2, 2020

How do you do failover if a server fails or if connectivity to one of those datacenters is lost? With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures. To me, that is the major selling point of any of the hyperscale cloud providers.

KingOfCoders · on Nov 2, 2020

"connectivity to one of those datacenters is lost"

Anecdotal, but AWS had more global problems than the triple connected data centers I've used in twenty years.

I suffered through many rough times, data center connection or power problems were not (very seldom) one of those.

Most of the problems came from apps that we didn't build well to scale or that had bugs (most frequent cause of problems and sitedowns).

HelloNurse · on Nov 2, 2020

> With AWS I could just set up a multi-availability-zone RDS deployment for the database and an auto-scaling group for the web tier and be confident that AWS will recover the system from most failures

"Confident"? "Most" failures? Are you merely hopeful that the probability of a bad failure is low, or are you able to test the AWS resiliency techniques you mention and to ensure that they stay working? At what cost?

milesvp · on Nov 2, 2020

My experience with RDS instances that had multi-region failover was that the failovers worked every time we needed them for deployments (I don’t think we ever needed them for RDS failures). The cost though was enormous. Our write db represented the lion's share of our AWS cost, and doubling it for disaster mitigation increased our costs by something like 50%. It was mostly worth it from a business perspective when we were less sure about AWS uptimes, but I’m not sure I could keep justifying the cost given how few problems we had with RDS over time.
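As a rough consistency check on those proportions: if the only change is adding a standby that costs the same as the primary, the relative bill increase equals the database's share of the original bill. The sketch below works that out; the equal-cost standby and "nothing else changed" are assumptions, and the function names are made up for illustration.

```python
def db_share_before_doubling(bill_increase: float) -> float:
    """Share of the ORIGINAL bill taken by the database.

    With database cost d and everything else r, the bill goes from d + r to
    2d + r, so the relative increase is d / (d + r): exactly the database's
    original share. `bill_increase` is a fraction (0.5 means +50%).
    """
    if not 0 <= bill_increase <= 1:
        raise ValueError("bill_increase must be between 0 and 1")
    return bill_increase


def db_share_after_doubling(bill_increase: float) -> float:
    """Share of the NEW bill taken by primary plus standby: 2d / (2d + r)."""
    share = db_share_before_doubling(bill_increase)
    return 2 * share / (1 + share)


increase = 0.5  # "something like 50%", from the comment
print(f"database share before: {db_share_before_doubling(increase):.0%}")
print(f"database share after:  {db_share_after_doubling(increase):.0%}")
```

So a roughly 50% increase is what you would expect if the write database was about half the bill beforehand and about two thirds of it afterwards.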

Spivak · on Nov 3, 2020

Physical hardware failures are handled by having everything in VMs and storage handled by Ceph. We can lose plenty of physical boxes simultaneously before we run into capacity issues.

Multi-DC failover is handled by announcing our public IP block at both locations with different weights. It’s technically active/active because traffic can come in at the secondary DC but we have an internal site-to-site VPN that is used to direct traffic to the primary. If the primary DC goes down the secondary starts handling the traffic instead of passing it along. All the database masters flip to the secondary and things keep humming along.

If we lose the site-to-site then the secondary stops advertising altogether and all traffic is forced to the primary.

So we can lose the site-to-site (which is dedicated) or one of the DCs at any time.

erikbye · on Nov 3, 2020

> How do you do failover if a server fails

Not sure if you do not know anything about typical ESXi and vSphere setups, but if a server fails, all virtual machines are automatically migrated to a healthy server. And of course, your HPE G10s are compute only, all storage is on the Fibre Channel-connected SAN.

tatersolid · on Nov 4, 2020

You don’t get 17 server licenses for vSphere or a SAN for $55k as outlined above. More like $500k plus another $100k/year in maintenance.

You’re still paying a cloud provider, but in this case it’s VMware

Spivak · on Nov 3, 2020

Yep! We have a KVM and Ceph based setup but it's basically the same as you describe.

Cthulhu_ · on Nov 2, 2020

From AWS' own marketing propaganda: How much would said clients have to invest in people and hardware otherwise? What if their application becomes an overnight success and needs to scale up fast?

Sure, if it's an established company then using their own hardware (and people to set up and manage it) might make sense; iirc Dropbox is a fairly recent big player that made that move. But otherwise it's a big upfront investment to make, and you can't know if it'll pay itself back.

So sure, AWS can be 2x or more as expensive as renting servers at a company like OVH or building your own datacenter, but it's paid by the minute if need be, not all in one go. If your startup or its VC money runs out in six months at least you can quickly scale down and pull out.

fabian2k · on Nov 2, 2020

I think 2x is a bit optimistic, if you just compare what kind of hardware you get by renting a dedicated server compared to EC2 it can easily be 5x or more. Of course that compares very different things, and doesn't mean that renting dedicated servers is always cheaper. The comparison gets much, much worse when significant amounts of traffic are involved. And scaling down isn't so much more difficult than with AWS within certain limits.

Comparing managed services like databases is maybe more meaningful than just EC2, but also so much more difficult.

duhast · on Nov 2, 2020

StackOverflow is famous for running on-prem on bare metal Dell hardware: https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

Last time I checked, GitHub was also mostly on bare metal and cloud-free.

908B64B197 · on Nov 2, 2020

They launched in 2008, on a stack built around IIS because that was what their early devs were familiar with.

Nextgrid · on Nov 3, 2020

Is this supposed to be a bad thing?

whatshisface · on Nov 2, 2020

StackOverflow does predate the AWS world takeover.

duhast · on Nov 2, 2020

How is this relevant? Amazon.com also predates AWS.

skj · on Nov 2, 2020

The implication is that they had already solved these problems and redoing the stack has its own cost to consider.

bsenftner · on Nov 2, 2020

For my own startup, I built a small cluster of 17 servers for just beneath $55K, and that had a month-to-month expense of $600 placed in a co-lo. In comparison, the same setup at AWS would be $96K per month. And it is not hard, easy in many ways. Do not be fooled, what the cloud companies are peddling is an expensive scam.
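Taking those three figures at face value, the break-even point is easy to compute. This is a sketch of the arithmetic only: it uses the numbers quoted in the comment and ignores staff time, hardware refresh, bandwidth and anything else not mentioned, and the function names are illustrative.

```python
def break_even_months(upfront_usd: float, own_monthly_usd: float,
                      rented_monthly_usd: float) -> float:
    """Months until buying hardware becomes cheaper than renting."""
    monthly_saving = rented_monthly_usd - own_monthly_usd
    if monthly_saving <= 0:
        raise ValueError("renting is not more expensive per month; no break-even")
    return upfront_usd / monthly_saving


def cumulative_cost(upfront_usd: float, monthly_usd: float, months: int) -> float:
    """Total spend after a number of months."""
    return upfront_usd + monthly_usd * months


hardware = 55_000.0   # USD, "just beneath $55K" (from the comment)
colo = 600.0          # USD per month (from the comment)
aws = 96_000.0        # USD per month, the commenter's own AWS estimate

months = break_even_months(hardware, colo, aws)
year_owned = cumulative_cost(hardware, colo, 12)
year_aws = cumulative_cost(0.0, aws, 12)

print(f"break-even after {months:.2f} months")
print(f"12-month spend: owned ${year_owned:,.0f} vs AWS ${year_aws:,.0f}")
```

The result is only as good as the $96K/month estimate, which is the number the replies further down dispute.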

Retric · on Nov 2, 2020

Cloud companies are useful as long as you want what they're selling. The best case is needing say 2 TB of ram for some workload or test that's only going to take a few hours.

Or something like the Olympics where 95% of your demand comes in a predictable spike across a few days.

chasd00 · on Nov 2, 2020

very true, the original selling point for Cloud was instant upgrade/downgrade as needed. That was the original amazing thing, dials that said RAM and CPU you could turn up or turn down.

ed25519FUUU · on Nov 2, 2020

If I was doing my own thing I would go the same route as you, but I’m knowledgeable about this stuff, and can manage the entire system (network, replacing bad hardware, etc). It would need to be a very good reason for me to be oncall for that, or else I’d save money by going with something like ovh.

ti_ranger · on Nov 2, 2020

> For my own startup, I built a small cluster of 17 servers for just beneath $55K, and that had a month-to-month expense of $600 placed in a co-lo. In comparison, the same setup at AWS would be $96K per month.

Why would you build exactly the same setup in AWS as for on-prem, unless your objective is to (dishonestly) show that on-prem is cheaper?

Lift-and-shift-to-the-cloud is known to be more expensive, because you aren't taking advantage of the features available to you which would allow you to reduce your costs.

bsenftner · on Nov 2, 2020

> Why would you build exactly the same setup in AWS as for on-prem...

It was far better to invest a little up front, and maintain at $600 my operations than the same for $96K a month, that's why.

I never "lifted and shifted", I built and deployed, with physical servers, a 3-way duplicated environment that flew like a hot rod. At a fraction of cloud's expense.

necovek · on Nov 3, 2020

I think the point GP was making is that you could have likely started off much cheaper, eg. with 2k/month of AWS costs before needing to "simply" scale at eg. 12 months, especially so if using managed services and not just bare ec2 instances.

I personally think there's room for both, and I think hybrids between on-prem and cloud are the ideal for long running apps: you size your on-prem infrastructure to handle 99% of the load, and scale to the cloud for that one-off peak.
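To make the hybrid idea concrete, here is one possible reading of "size on-prem for 99% of the load": take the 99th percentile of observed load as the on-prem capacity and count what spills over. The nearest-rank percentile, the hourly sampling and the sample data are all assumptions for illustration, not something the comment specifies.

```python
import math
from typing import List, Sequence, Tuple


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def burst_plan(samples: Sequence[float], pct: float = 99.0) -> Tuple[float, int, float]:
    """Return (on-prem capacity, samples that overflow, total overflow load)."""
    capacity = percentile(samples, pct)
    overflow: List[float] = [s - capacity for s in samples if s > capacity]
    return capacity, len(overflow), sum(overflow)


# Hypothetical hourly load in requests/second: 99 quiet hours and one spike.
hourly_load = [100] * 99 + [1000]
capacity, overflow_hours, overflow_load = burst_plan(hourly_load)
print(f"on-prem capacity: {capacity} req/s")
print(f"hours needing cloud burst: {overflow_hours}, excess load: {overflow_load} req/s-hours")
```

Run against real traffic history, the overflow figures give a first estimate of how much cloud capacity the "one-off peak" would actually need.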

That's still pretty complicated due to different types of vendor lock in (or lock out in some cases). Google has invested in k8s to get people some value for moving away from AWS.

bsenftner · on Nov 3, 2020

My application had (still would have) very high CPU requirements, and 2k/month would have got me spending more money than necessary. When I started I bought 1 server with the capacity I needed and put that in co-lo for $75 a month. That little puppy was equal to $10K a month at AWS, so why would I want to use AWS again? Just do the math, even 1 server outperforms and is exponentially less expensive. The cloud has the majority of engineers looking like morons from a financial literacy perspective.

srtjstjsj · on Nov 3, 2020

Are you claiming that you knew exactly how powerful you needed your machines to be, before you launched? Or are your machines running at 25% utilization which AWS would charge substantially less for?

bsenftner · on Nov 3, 2020

I'm not making any such claim. I'm saying I built a 24-7 available physical 17-server cluster to operate my startup's needs. I had more capacity than I needed, but at the same expense thru AWS I'd not have enough to operate. At less than the expense of one AWS month, I had my entire environment owned outright. How is that difficult to understand?

WrtCdEvrydy · on Nov 2, 2020

AWS also gives you a lot of cost savings for using Spot and signing contracts with minimum spends. Only a small shop pays full price for anything.

shiftpgdn · on Nov 2, 2020

Okay but then you have to engineer your application around interruptible spot instances. You're also making a 36 month commitment when you sign that contract (generally buying the hardware for Amazon.)

WrtCdEvrydy · on Nov 2, 2020

> you have to engineer your application around interruptible spot instances

This is where your ALBs and ASGs come in. If your app doesn't use local writing and you can shift your caching to a shared cache, the cost savings are good.

lvh · on Nov 2, 2020

It's possible that "many" firms do this, but given AWS' growth numbers that would imply they don't really spend meaningful amounts to begin with.

Counterpoint: I've seen a great many false economies with people trying to go on-prem and do alternative hosting because they don't think the AWS premium for e.g. GPU instances is worth it. I don't think that has generally worked out well.

freeone3000 · on Nov 2, 2020

The AWS premium for GPU instances is absolutely not worth it. You don't hear about people running local GPU compute clusters because it's not newsworthy -- it's obvious. Put a few workstations behind a switch, fire up torch.distributed, you're done. And after two months, you've beaten the AWS spend for the same amount of GPU compute, even if only 50% of the time is spent training. Timesharing is done with shell accounts and asking nicely. You do not need the huge technical complexity of the cloud: it gets in the way, as well as costing more!

srtjstjsj · on Nov 3, 2020

What if you want 10x the GPU for one month to build a model?

freeone3000 · on Nov 3, 2020

That's the only scenario I can think where it comes out clearly in favor of AWS - you've tested your model in the small on Colab, you're confident you'll need only a few training runs, you can schedule them in us-east, and you can inference on CPU, and you won't need to rebuild for another eight months (when purchased cards become outdated).

It's not an impossible scenario... But imagine the sort of company that trains their own model instead of using a huggingface refinement or an off-the-shelf redistillation. (These can be done reasonably on an average gaming PC, no need for a cluster.) Such a company has expensive human resources. They bothered to get a data scientist and at least a research engineer, if not a full researcher. Were they hired on a six-month contract as well? This is a huge expense, so it must be an important differentiator to have built a custom model -- and it's a one-and-done? I don't see it. I think it's going to be an ongoing project, or it shouldn't have been approved in the first place.

KingOfCoders · on Nov 2, 2020

"It's possible that "many" firms do this, but given AWS' growth numbers that would imply they don't really spend meaningful amounts to begin with."

I could spend a million EUR a year on AWS without the need for most of the services of AWS?

But YMMV, and if 1M EUR/y is not meaningful, perhaps we differ there.

qaq · on Nov 2, 2020

There is no good data to prove either statement. You could def prove that at some large scale, on-prem for a given type of problem is cheaper than the cloud.