A bunch of the comments are already pointing out the launch issues Pokemon Go had, and it's well known that a rep from AWS was also throwing jabs at them during launch for their issues.
It would be naive for everyone to assume that a high traffic launch is all about the cloud underneath and only that.
The article didn't mention any of the technical details of the Pokemon application itself; for all we know the infrastructure was humming nicely and the application itself didn't scale. Or the other way around, or a combination of both, or one or another of the thousands of moving pieces it takes to launch something.
I think Pokemon scaled remarkably better than any app many of us have ever dealt with, seeing the crazy demand spikes. You would run into a lot of app issues, data choices, a lot of things that don't just get solved from simple uses of autoscaling.
It scaled better than Twitter: it hit the same active user count (or close to it) in a month that took Twitter years to reach, and Twitter would sh*t the bed all the time, with the infamous "fail whale" popping up for users constantly even with its steady growth and time to adapt. Twitter was a pioneer at reaching that scale, and many now use its experience as a guide, so it's forgiven, but how many companies see that much growth, that fast?
To be fair -- Twitter is a much harder architectural problem (significant fan-out and fan-in problems to solve).
Pokemon Go is less complicated in terms of the conceptual model. There are only a few pretty simple ways that different players can interact (gyms, lures on pokestops, that's about it).
On the other hand, Twitter is dealing with strings < 140 characters. Pokemon Go is tracking all the trainers and Pokemon at all times, not to mention the gyms and lures.
Thank you for remembering this. Also worth pointing out that they too ruined their launch experience for millions because of features no one really wanted that were secondary to the core experience ;) (IIRC they did some sort of light simulation offloading and social / competition features in "the cloud" which made the entire game not function without an internet connection.)
Upon first read I actually had the same thoughts as many and applied them to the google services as well as the application itself. However on reflection, yeah the services google provided were pretty impressive, especially after re-reading the following:
"Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers"
Architecture analysis may or may not be standard (I don't actually know since I've never had to deal with something like this) but that sounds great to me.
One of the nice things about cloud in general is that it's self-serve for small customers (enter credit card info, a few seconds later you have some computers in the cloud), but as you scale up and start paying millions of dollars per month and sign up for the "Platinum" or "Enterprise" or "Premier" plans, software engineers start showing up at your door to help you design, launch, and fix your service. I've seen this with Google and Amazon, and I imagine it's probably also true for the other providers.
Gold and Platinum support can help with architectural analysis. I'm also guessing Google had a vested interest to help Pokemon Go do an analysis of their architecture. Doing so could help find any inefficiencies to lighten the load on their cloud servers.
Ingress ran on App Engine. I was one of several people working with Niantic at the time to address application architecture issues to avoid App Engine anti-patterns.
[Edit] Disclaimer: I haven't used App Engine in a while now.
I don't recall whether Ingress specifically ran into anti-patterns (or we advised them early enough to avoid them). As I left Google in 2013 I am not sure what the experience of Niantic was with regards to Pokemon Go.
Before App Engine used snowflake IDs for the App Engine datastore, IDs were allocated sequentially. The App Engine datastore is based on Megastore [1] which is based on BigTable [2].
If you read massive amounts of sequential keys you could potentially run into issues with the automatic management of tablets. Compaction/splits could introduce terrible performance.
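To make the sequential-key concern concrete, here is a minimal sketch. It is not App Engine's actual ID allocator or Bigtable's real split logic: the four fixed tablet ranges, the `tablet_for` helper, and the hash-prefix scheme are all assumptions made up for illustration. It only shows that monotonically increasing keys pile onto one key range, while scattered keys spread across ranges.

```python
import hashlib
from collections import Counter

# Hypothetical setup: 4 tablets, each owning a contiguous key range,
# split on the first hex character of the key.
TABLET_BOUNDS = ["4", "8", "c"]  # tablets: [0-3], [4-7], [8-b], [c-f]

def tablet_for(key: str) -> int:
    """Return the index of the tablet whose key range contains `key`."""
    first = key[0]
    for i, bound in enumerate(TABLET_BOUNDS):
        if first < bound:
            return i
    return len(TABLET_BOUNDS)

def sequential_key(n: int) -> str:
    """Zero-padded hex counter: consecutive IDs sort next to each other."""
    return format(n, "016x")

def scattered_key(n: int) -> str:
    """Prefix the counter with a hash so consecutive IDs land far apart."""
    prefix = hashlib.sha256(str(n).encode()).hexdigest()[:4]
    return prefix + "-" + sequential_key(n)

def load_per_tablet(keys):
    """Count how many keys fall into each tablet."""
    counts = Counter(tablet_for(k) for k in keys)
    return [counts.get(i, 0) for i in range(len(TABLET_BOUNDS) + 1)]

ids = range(1_000_000, 1_001_000)  # 1000 consecutive IDs
sequential_load = load_per_tablet(sequential_key(i) for i in ids)
scattered_load = load_per_tablet(scattered_key(i) for i in ids)
```

With sequential keys every one of the 1000 IDs falls into the same tablet; with the hash prefix the load is roughly even. The trade-off is that range scans over "recent" IDs are no longer contiguous.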
Since Niantic Labs operated within Google there was more flexibility around budget, but with other customers (for example Snapchat), you also needed to carefully control your cost. That meant using projection queries (always a good idea) to return only the data you absolutely need, along with many other optimizations. There is a lot to learn that is specific to App Engine, for example [3] (this is a very old and now outdated SO answer I gave).
From my perspective, designing a high performance and scale, cost-effective application on App Engine is very difficult, but not impossible. This is of course only the case if your application is a standard web application (or REST API). A lot of functionality you may want (support for certain system libraries or connection protocols) may be unavailable.
IMHO, Google Container Engine (and similar services) eliminates many of the use cases for which I previously would have chosen a PaaS such as App Engine. I am in full control of my runtime and application environment without the need to learn proprietary APIs or system behavior, while not having to worry about managing the server infrastructure.
Not GP but I recently assessed GAE for my company and ultimately decided to go with GKE instead. The reason was GAE had no support for websockets [1] and had a weird way of influencing almost all implementation details in your code base. It pushes towards full adoption of GCP's other products like CloudSQL, Memcache, Cloud Datastore, etc, because those products run so bloody performantly you feel like a poor engineer not taking advantage of them.
In short those "negatives" are actually perfectly reasonable when you consider what GAE was designed to be: A Google managed PaaS that handles all the hard parts for you IFF you follow the platform's conventions. The problem is that others see it as a drop-in for EC2 and get burned as they throw away their previous tool chains like Postgres, Redis, and the entire networking stack.
That Q&A is from 2010 (!). GAE Standard has the Sockets API, and GAE Flexible runs in a VM instead of a sandbox, so it supports sockets out of the box.
Ingress, at peak, probably had only a small fraction of the users Pokemon Go had in the first month of launch. So their experience of working at that scale was basically nonexistent. It's a whole other ballpark when you increase your player base by 2-3 orders of magnitude.
That being said, they really should've done better at predicting hype, especially since they had beta sign up and all.
Werner actually posted as himself in that Reddit thread ^
"Guys, this was not intended as a snark, joke, rip or self-promotion. I just want people (including myself) to play given how much fun it is. We have some experience in running big software systems that need to scale and just want to help.
Update for the skeptics: yes I am @werner, not Amazon PR. I have the same username on Instagram."
EDIT: Unless you can prove the snark or malicious intent was there, to say otherwise is disingenuous.
Seems a bit like backpedaling. He knows that Google has experience running big software systems that need to scale as well. We all know it. He also knows Alphabet is a part-owner of Niantic.
Yup, it's all about Werner just wanting to play Pokemon Go in a purely private capacity, and he naturally expects his company to help him with that. Nothing at all to do with PR or business.
Load testing is really hard, it would be interesting to see more research in this area, such as tooling and design patterns that flag bottlenecks that could be outages at larger scales.
By not mentioning that Niantic is made up of Googlers, and spun out of an internal project, the post makes it seem that Niantic organically and independently chose Google Cloud from among the top P/IaaS vendors.
The omission of that fact may cause the reader to infer that Google Cloud Platform was chosen for its technical merits, and not because the company was actually a Google team that spun out.
In theory, the idea that it was a true 'choice' is suspect. They were a Google company previously, so they'd just use Google by default even if everyone on staff also was equally expert at AWS.
That said... in reality, they likely had real reasons to use Google since they probably have a higher expertise with it than AWS.
> Not everything was smooth sailing at launch! When issues emerged around the game’s stability, Niantic and Google engineers braved each problem in sequence, working quickly to create and deploy solutions. Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers — all against a backdrop of millions of new players pouring into the game.
IMO This is the most valuable thing in this article. It essentially says what others are pointing out. You can't just press a button and have scale. It's not that easy. You have to tackle many layers. Considering they 50x'd their worst case scenario it would have only taken a few bad queries to fuck shit up.
At 50x designed max load bad queries would be one of the easier problems to solve. We design systems all the time that have hard scaling limits, and it's only a problem when you operate the system past those limits. e.g. I wrote a service with ACID assumptions from the underlying database but now the largest ACID db box I can buy isn't big enough. Oops. There's a bunch of possible ways around that but they usually involve a nontrivial amount of engineering effort.
I wouldn't expect any service to survive that much unplanned load. Maybe they could've estimated better, but how likely was it for the game to go viral? Worth sinking lots of dev time into and delaying launch? That's a hard question to answer without the benefit of hindsight.
Honestly I was surprised they were able to scale as elastically as they did and we now know the reason why: they used Datastore instead of something like Postgres with sharding. That decision alone is most of the reason they made it. By contrast, my team doesn't use managed NoSQL services for our projects because they're simply too expensive to deploy before seeing considerable growth.
They also have a system that's almost trivial to shard in any way they want, which would help even with postgres all around. Since there's no real interaction between users (only user-environment really), they can split databases by location, user groups, or some hashes as they want. Also the data they deal with is extremely cachable - as long as you play on one device.
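A rough sketch of the two sharding schemes mentioned here, hashing on the user and bucketing by location. Everything in it is an assumption for illustration (the shard count, the 10-degree grid, and the function names); nothing is known about how Niantic actually partitioned its data.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count

def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash-based shard: the same user always maps to the same shard."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_for_location(lat: float, lng: float, cell_deg: float = 10.0) -> str:
    """Coarse lat/lng grid cell used as a shard key for location-bound data."""
    row = int((lat + 90.0) // cell_deg)
    col = int((lng + 180.0) // cell_deg)
    return f"cell-{row}-{col}"

user_shard = shard_for_user("trainer-123")
sf_cell = shard_for_location(37.77, -122.42)
sydney_cell = shard_for_location(-33.87, 151.21)
```

Per-user state (inventory, caught Pokemon) would live on the user shard, while pokestops and spawns would live in the location cell. Anything that touches both, or touches two users at once, is not covered by a scheme this simple.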
There is some, when users battle gyms they are battling with other users Pokemon which would be from anywhere in the world. When the Pokemon on the Gym loses it needs to be removed in a transaction with an update on the gym. This takes the easiest form of sharding off the table, so it's not as simple as you make it seem.
It doesn't really need to be a transaction since it is okay if it is eventually consistent (e.g. they have a cleanup job if a transaction is lost), that would be fine.
This: "but now the largest ACID db box I can buy isn't big enough."
The hard part about scaling is knowing when to invest in scale. The binary choice is to build and throw away, or build so as to be scalable. So hard to predict.
All you need to do is follow the waterfall development process properly! Think hard, establish a perfect design in enough meetings, write a flawless spec, then implement it correctly and the button will work first time!
Technically true, but it doesn't make sense from a business perspective. Anticipating and accounting for every scaling issue costs a lot of money. Most businesses would decide that satisfying 90% of scenarios is sufficient.
Business is irrelevant once you're sure that the service will go into production AND will have some amount of users.
It's known that some things will break from time to time (e.g. servers, hard drives, networking). If you've designed the application poorly so that a single minor failure can snowball to the entire system, it's guaranteed that your site will fall apart periodically, costing you lots of money each time.
At this point, the best technical AND business converge to "pick an option that can sustain [minor] breakages". Either you've done it like that from the start, or you're in the process of rewriting/redesigning your site.
Not really. In the case described they had accounted for some maximum number of users and instead got 50x that number. It is reasonable to design a system which handles just some planned number as long as there is a strategy in place to manage a situation beyond those limits. In this case the strategy was to apply engineering effort. That seems like a valid risk management plan to me.
Experience suggests that things you didn't anticipate will rain on that plan.
Queues, buffers, locks all have ways of interacting at high volume that defy prediction. And if you think your system has no queues or locks then you just haven't looked deeply enough.
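One textbook illustration of how a queue misbehaves near capacity is the M/M/1 model (a single server with random arrivals and random service times), where mean time in the system is 1 / (service rate - arrival rate). The model and the 1000 requests/second figure are assumptions picked for illustration; real systems have many interacting queues and are harder to predict than this.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # at or past capacity the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 1000.0  # hypothetical: requests/second one server can handle

# Mean latency in seconds at 50%, 90%, 99% and 100% utilization.
latency_by_utilization = {
    u: mm1_mean_latency(u * SERVICE_RATE, SERVICE_RATE)
    for u in (0.5, 0.9, 0.99, 1.0)
}
```

Going from 50% to 99% utilization roughly doubles the traffic but multiplies the mean latency by about 50, which is why a system that looks healthy at its planned load can fall over shortly past it.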
Currently on Cloud here at Google. I would like to elaborate on "Google CRE seamlessly provisioned extra capacity on behalf of Niantic to stay well ahead of their record-setting growth."
Just because you have the resources does not make for a well scaling service. As outlined in the post, "Google CRE worked hand-in-hand with Niantic to review every part of their architecture, tapping the expertise of core Google Cloud engineers and product managers". Look past the chart, this wasn't just an estimate for Google, it was also for Niantic. You don't have unlimited development resources, not every aspect of an application may have had the time to flesh out a scale-able approach.
Netflix didn't scale out overnight. I'm sure we've seen their techblog about X extremely specialized framework/tool they've built out over the years. I'm impressed with how quickly Niantic achieved a playable experience.
I experienced it myself. There were chunks of 6+ hours where you could not connect at all, and you had to restart the app every 10 minutes (or every third pokémon), and wait while it was draining your battery like a hungry walrus on its bucket of fish.
However, it became perfectly playable after their "tracker patch" (which was the trigger for the massive exodus that followed).
What tracker patch are you referring to? They changed it several times and I highly doubt there was any massive exodus due to any of them. The loud minority on reddit is not representative of the game's community at large.
When the game experienced the most instability it hadn't even been launched in most countries.
The first one, when they dropped third parties and removed the already-bugged step count under close-by pokémons.
It's purely anecdotal, but after that release, popular spots (with multiple pokestops and arena in one place) usually crowded with more than 50 people playing actively were down to one dude on a bench.
Sure the "loud minority" is not representative of the whole game's community, but they were its core players. They shaped the game's community (and Niantic didn't help) from the beginning. And it didn't help when those people decided to stop playing.
Back to the server issues, Google CREs did a very good job during these times. They can't be blamed for trying to scale an app that was absolutely not designed to get this high.
There was the patch that broke the tracker. Then there was the patch that "fixed" the tracker by removing the footsteps but basically kept it how it was. This patch was released on July 30th. Around August 23rd, it was reported that Pokemon Go had lost over 10 million users, down from 45 million sometime in July. Here's the article on that: http://arstechnica.com/gaming/2016/08/pokemon-go-sheds-more-...
This is pure marketing that might convince decision makers and execs that didn't play Pokémon Go.
No doubt, the CRE program could prove valuable. But in this case, they are congratulating themselves on a rocky and widely panned launch of a product on their platform. One might wonder, "If this is what deploying a viral app on Google Cloud Platform looks like when you have help from Google engineers, what chance does anyone else have of getting something right on their platform?"
I think that's probably the wrong takeaway, but it's not difficult for me to imagine that being the only conclusion one has.
Absolute marketing trash with almost zero technical value.
If I linked an old style "HP Whitepaper about Success with Customer X" it would be downvoted to hell - but that's exactly what this article is, but by Google instead of HP.
From a user point of view it did not go nearly as smoothly as it is made to sound in this post. I would love to see a follow-up post about what went wrong along the way, and whether the hype surge might have continued longer and stronger if more people picking it up for the first time would've had a smooth experience (rather than lots and lots of errors talking to the servers).
The worst case scenario was 5x, which was a factor of 10 off. If you are that far off when you do your capacity planning, you can be pretty sure you've got problems throughout your entire stack.
There is literally no way anyone could have expected it to do this well. Nintendo has steered clear of mobile gaming because they expected all of their IP to flop in the mobile world.
That said, what you said is still true. If you're thinking you're going to have 100k users, you might be willing to allow a lot more data to be transmitted and/or processed than if you had 10m users. Just looking at tracker alone, having to transmit and measure distance between a dozen points and ordering them is a lot more work than checking whether a dozen coordinates are within a range and listing them in any order.
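A small sketch of the two tracker variants being compared. The function names, the 200 m radius, and the use of a bounding box for the cheap check are assumptions for illustration, not the game's actual implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearby_ranked(player, spawns):
    """Costlier variant: exact distance for every spawn, then sort by it."""
    return sorted(spawns, key=lambda s: haversine_m(player[0], player[1], s[0], s[1]))

def nearby_unordered(player, spawns, radius_m=200.0):
    """Cheaper variant: bounding-box check only, results kept in input order."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlng = dlat / max(math.cos(math.radians(player[0])), 1e-9)
    return [s for s in spawns
            if abs(s[0] - player[0]) <= dlat and abs(s[1] - player[1]) <= dlng]

player = (0.0, 0.0)
spawns = [(0.001, 0.0), (0.0005, 0.0), (1.0, 1.0)]
ranked = nearby_ranked(player, spawns)
in_range = nearby_unordered(player, spawns)
```

The ranked version needs several trig calls per spawn plus a sort on every refresh; the unordered version needs two subtractions and comparisons per spawn. Per request the difference is small, but it is multiplied by every player polling the tracker.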
> Nintendo has steered clear of mobile gaming because they expected all of their IP to flop in the mobile world.
I don't believe this for a second. Nintendo knows where its goldmines are. IMHO it is far more likely that they steered clear of mobile because they didn't want to cannibalize their Game Boy sales. It's a classic case of a big corporation being slow to react to change or even trying to stop it because they hold a dominant position in the old system.
Nintendo would much prefer if "mobile gaming" still meant Game Boys but the world has already made their decision and they can't stop it. They can't ignore the sales figures on what would have otherwise been an obscure goofy spinoff fitness app on the Game Boy. They've ignored the mobile market for too long and now there is pent up demand. Without Pokemon Go I doubt you would have seen the sudden scramble to develop and release a Mario game for iPhone.
>> Nintendo would much prefer if "mobile gaming" still meant Game Boys but the world has already made their decision and they can't stop it.
I'm not sure this is entirely accurate. The Nintendo 3DS (I assume what you mean by "game boy"), has been a very successful product for Nintendo. And sales of the 3DS have actually spiked as a result of Pokemon GO.
While it may be a success, the 3DS has sold half of what its predecessor the DS did (~60 million vs ~130 million units worldwide). The NX will probably sell half again of the 3DS. The market for handheld gaming systems is dead or at least terminally ill. And I say this as someone who plays on a New 3DS daily (don't get into Monster Hunter if you want a life).
The reason Nintendo has avoided mobile gaming is probably down to fear of piracy. The DS was ruined (from Nintendo's POV) by flash carts, and with the changes they made on the next generation the barrier to entry for copying games is now lower on Android/iOS than on the 3DS. I think that when they do finally go mobile, the games will have a lot of server-side checks going on. The games will be a pain in the neck to play even if you are paying. For example, always-on internet required will only work on non-rooted devices, that sort of thing.
For sure - we're on the same page, although I'm a little more bullish on the future of some sort of hardware gaming platform, like the upcoming NX. I'll be the first to admit that could be wishful thinking, but I just can't get over the thought that my phone isn't a serious gaming device. Monster Hunter is actually a perfect example. When we get the crew together we'll rock for 5 or 6 hours, swapping out chargers. I can't see myself ever doing that comfortably holding a piece of glass.
That chart is not at all accurate. If it were, it would be telling us that over time, they expected no change in data volume? They set a single number as "Expected volume", and a single number as "worst case", with no planning for growth at all? That's what this chart is showing. Either they were so poor at planning that this chart is accurate and the fault is on them, or the chart is inaccurate and we can't really trust any of the data it represents.
I can guarantee you the chart is accurate as I created it. Yes they did expect change, but the only numbers that matter for this graph are "how big can we get" and then "if we get that big by the end, is there enough capacity". The answer to the first one turned out to be history-making; no one in their right mind would have bet on it happening beforehand. The answer to the second turned out to be "Yes".
Since you created the graph, could you provide more context as to the units of measurement and scale for the axes? Are you using the derivative of growth as biot mentioned below, or something else? This graph shows two lines that stay static over time (meaning that either Niantic started off on day one with their entire server infrastructure already running with no plans to scale up, or they did not plan to scale up as demand grew), and one upwards trending line that shows actual changes in traffic over time. I'm trying to discern what this graph is supposed to represent, and if it's supposed to represent the expected traffic over time versus actual, it's showing that there was no expected growth in traffic.
I cannot dive into too much more detail than is in the graph since it's still sensitive information. Y-axis is essentially traffic to Cloud Datastore (think: after layers of caching, etc), x-axis is date.
The 2 lines can be thought of as ceilings or upper bounds, hence why they are static - these are the numbers that traffic was expected to eventually reach at peak. So you can think of it as, "we thought we'd be looking at graphs that had this line as the top and traffic would be some curve underneath."
Obviously from the graph shown here, we/they needed a tall graph.
I think that my issue with the chart (and it's such a minor issue to quibble about) is that you're effectively treating your single dataset (actual traffic over time) the same way you're treating your annotations (expected and worst-case traffic). Both of these different things are represented the exact same way in your chart, which is a confusing way to structure things. I would alter the appearance of the ceilings/bounds to not be represented in the legend, and instead be on-chart annotations that show where those expectations were relative to the actual traffic. I would also recommend adding even the most rudimentary labels to the axes.
No arguments there. There's a balance that needs to be struck between technical detail and marketing appeal. Not everyone is going to agree on where that balance is, less so when you're trying to share sensitive information without giving too much away.
> The 2 lines can be thought of as ceilings or upper bounds, hence why they are static
I think this is the disconnect. This wasn't obvious to me. It looked like everyone was expecting flat growth at either 1X or 5X. If there was a line showing what they thought would be the traffic that goes up (which would be expected) in addition to the ceiling lines then I think there would have been a lot less confusion.
I wonder if a single bar graph would have illustrated it better with overlaying colors for each ceiling.
> I cannot dive into too much more detail ... since it's still sensitive information.
Question: When should we start checking around for posts containing
A) High-level technical overviews with some basic implementational detail
B) In-depth analyses of the stack you built, the challenges you faced, what improvements you folded back into various open-source components, what you'd have done differently, etc etc
?
I'm thinking in terms of timescales - like n months or so. I suspect (A) will be a little easier (and quicker?) to publish than (B).
If one interprets the graph as "derivative of transaction growth" instead of "number of transactions" then it makes perfect sense. The derivative of linear growth is a horizontal line, whereas quadratic (edited, thanks acomar) growth would be a line that has slope.
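A quick numeric check of this reading, using made-up totals (the coefficients 100 and 10 are arbitrary). It only illustrates the derivative argument; it says nothing about what the chart's author actually plotted.

```python
def daily_growth(totals):
    """Discrete derivative: difference between consecutive daily totals."""
    return [b - a for a, b in zip(totals, totals[1:])]

days = range(6)
linear = [100 * d for d in days]        # 0, 100, 200, 300, 400, 500
quadratic = [10 * d * d for d in days]  # 0, 10, 40, 90, 160, 250

linear_rate = daily_growth(linear)        # same value every day: a flat line
quadratic_rate = daily_growth(quadratic)  # rises by a fixed step: a sloped line
```

Here `linear_rate` is `[100, 100, 100, 100, 100]` and `quadratic_rate` is `[10, 30, 50, 70, 90]`, matching the horizontal and sloped lines described above.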
Do you know how those estimates are made? As someone who only passively follows some Pokemon news it was obvious that Pokemon Go would be absolutely huge on launch. It was all over facebook, youtube, reddit etc since the day it was announced. I don't think I've ever seen so much hype for a game, including major AAA titles. It seems strange to hear that no-one in their right mind would think it would happen.
Yes, and I can tell you even though I spend every day working with extremely large scale systems I wouldn't have told them "you should expect 10x larger", let alone 50x. Their initial estimates would still have been a very large launch.
The Niantic team did incredible things given the instant historic success that became Pokemon Go.
Sorry if I implied you did something wrong, not my intention at all. I'm just curious how someone comes up with an estimation at all for a game that doesn't have pre-orders, and where similar games don't already exist.
> Yes, and I can tell you even though I spend every day working with extremely large scale systems I wouldn't have told them "you should expect 10x larger", let alone 50x. Their initial estimates would still have been a very large launch.
The generic 1X, 5X and 50X are hard to understand in this context I think. Pokemon is so popular it's very difficult to imagine what the real numbers actually are. For instance on launch day in America and Asia I would have expected insane numbers (many hundreds of millions).
I also feel like the 1X, 5X and 50X number placeholders are useless in this conversation because it doesn't give a sense of scale at all.
There was no marketing? It was being talked about everywhere. If they didn't spend any money on marketing then good for them!
But in all seriousness this is Pokemon; they sold over 3 million copies of a remake of a game on a niche portable console in just 3 days; elevating its brand to an open platform for FREE that can be downloaded and installed on theoretically, what, a billion or more devices? Seems incredibly doable for Pokemon. Few other brands could do the same. Even Mario wouldn't be able to come close to competing with Pokemon's brand power.
I don't see how that data couldn't be used to argue either of our points (in fact I feel like you could make a strong case for my previous post using this very link). Ultimately though this conversation is a bit vapid without actual numbers which is a bit of a disappointment. Oh well.
I'm not sure I follow. Why are you only focusing on the US? This was launched in many countries. Granted not at the same time but that's why I said take the aggregate of each launch day.
You dropped the "Asia" from that. "America and Asia". Also looks like I forgot to mention the other regions in this comment (mentioned it in another one).
Sigh. This would have been easier if we had real numbers. Then we could all test our hypotheses. Oh well.
The article says (in one of the only bits of real information) that they blasted past their estimates with only Australia and NZ just 15m after launch.
Whoever came up with those numbers must have had some serious methodology flaws. I know they couldn't predict that it would become the biggest online game ever for a while, but the initial demand prediction was clearly way off even before it started growing like a rocket.
No offense, but your comments are somewhat outrageous to me. I've been on the receiving end of one of those graphs (different scale) because suddenly things happen (we were placed above the apple logo as a feature with no warning AS WE LAUNCHED).
And you're sitting there as the engineering lead or staff going, "How do I even feel about this? Fortunately I have no time to feel because I am off to fight fires." I didn't go home for 2 days, I worked 82 hours that week and >70 the next.
Complaining that the estimates are bad for a product that literally broke everything we know about how to build a successful mobile game and has scaled to a truly unprecedented level is meaningless. Obviously no one expected this. Obviously the engineers wouldn't have wanted it. Obviously the world will respond the way it will to our work.
Show some compassion. But also some humility. None of us are qualified to make projections in the face of phenomena like this.
> Show some compassion. But also some humility. None of us are qualified to make projections in the face of phenomena like this.
You can't predict it will be the biggest game ever, that's not possible. But I feel like their initial estimates were still too low and you could predict it would have been higher. I don't know if it was based on how well Ingress did but...
Pokemon is a huge property. It's had tons of games, movies, sleeping bags, an incredibly successful trading card game, etc. Just having the Pokemon brand on something makes it VERY big.
In the game, you live out the Pokemon dream. This isn't just Pokemon Puzzle League. This isn't Pokemon Mystery Dungeon where you navigate cute little Pokemon around and play a top-down roguelike. You FIND AND CATCH Pokemon in the wild. It's exactly what Ash did in the TV show or comics.
Also, Pokemon are cute as hell. That plus the novelty of the AR stuff meant this had a lot of potential. "Look, I found a cute Eevee over here on my potted plant!" Those pictures were EVERYWHERE. That's tons of viral advertising.
But there is also the in-person effect. You want to compare what Pokemon you have with other people, and that encourages you to get your friends into it. But people were walking outside with their phones playing the game, and they quickly got spotted by people asking "What are you doing?"
All these things make it clear to me that this game had a high chance of success.
There's no way to know it would go to 50x what they guessed or would top the charts. Given their numbers I wonder if the expected should have been closer to 7x and the worst case at 20-25x.
The popularity they got would have taken basically anyone down. That was going to happen. I'm just surprised the estimates weren't much higher.
> Given their numbers I wonder if the expected should have been closer to 7x and the worst case at 20-25x.
What numbers? We know that their estimates and their realities were quite different, but not knowing the real numbers we have no way of even beginning to judge what's reasonable and unreasonable here. For all we know, they modeled directly after the most successful game in the Android market at that date as a baseline and then said, "At the worst case we'd expect 5x THAT."
You can write a ton of paragraphs about how cute Pokemon are, but the truth is that the Pokemon AR game was a massive risk. AR games have had extremely limited uptake. It seems incredibly unreasonable for me to expect that those folks should have realized a priori that they were about to release the most popular mobile game ever created.
I, for one, will not throw stones. I don't get why you feel the need to assert that you (or anyone) could have done a better job by setting a 20x or 25x target. Or that you could have not only foreseen it was necessary, but convinced everyone around you that the capex was justified.
Why are you so keen on assigning blame and shame in this scenario? Some of our peers made history. Can we be happy for them for 6 months before immediately backseat driving about how much better we all are in our armchairs?
From the article, it seems like a few million users/day is a typical expectation, but record-breaking isn't.
>Throughout my career as an engineer, I’ve had a hand in numerous product launches that grew to millions of users. User adoption typically happens gradually over several months, with new features and architectural changes scheduled over relatively long periods of time.
It's hard to estimate with no data. Nintendo had never released any other mobile games and the few NES/SNES ports had modest sales (at admittedly high price points for mobile).
When it comes down to it, the game is closer to a fitness app than a Pokemon game and can be described as bare bones. The fancy accessory wasn't even available at launch. Many of the players are people who have not bought a Pokemon game in over a decade, or ever in some cases.
If there is one area I can solidly criticize Niantic for it is rushing the other region releases so quickly. It's clear the servers were already badly overloaded and they just started adding countries left and right. I know it sucks for people who live in those countries to have to sit out while the rest of the world is having fun, but it's really not much more fun to sit at the eternal loading screen because the backend is entirely on fire.
I'm not sure why you're being downvoted into oblivion; you are completely correct. Yes, to address the other points raised, it's difficult to predict how popular something unreleased will be. But this is Pokemon. Their last game release, which is limited to the niche Nintendo 3DS console, sold over 3 million copies in just 3 days. And this was the Omega / Alpha games which were essentially just remakes of older games.
A mobile game, available to be downloaded onto hundreds of millions of phones, that is also FREE? I feel like they broke 100 million in the first day at the latest (counting totals from each region's first day). It wouldn't surprise me if it's significantly higher than that.
Media, users; everyone has been BEGGING Nintendo to release IP to mobile devices but they have kept it locked up in their own hardware. If no one had even a rough idea of the possible popularity they most certainly had a very, very wrong methodology.
There's a difference between supplying the infrastructure for a service, and writing the software for a service. Just because Google was able to supply the infrastructure for Niantic, that doesn't mean their software was designed in such a way that they could handle the load.
Yes, an example from the article showed that Niantic was running in a Kubernetes cluster that could only scale to ~1k nodes. So even if the resources were provided they couldn't add those nodes to their cluster.
You could argue that Google was providing GKE and therefore GKE couldn't scale, but GKE is really just hosted Kubernetes and its scaling limits were known in advance. Luckily GCP was able to push a quick version update and migrate the cluster but that took considerable time, coordination and engineering effort that couldn't be done "seamlessly".
This is a clear situation where the software Niantic chose couldn't handle the resources that were available. It turns out a lot of the choices you made for your worst case capacity aren't necessarily adequate for 50x that amount :)
I think it's possible that the disconnect between how the Google Cloud team describes what happened and the reality for users can be attributed to the fact that they are just one team in control of one aspect of the entire system. From their standpoint, taking into account their responsibilities, maybe it did go as smoothly as they say. A similar post from Niantic would be very interesting.
Yeah, they talked about numbers of bugs fixed, but those surely seem to have impacted the user experience. My big worry is that if I were to consider deploying to such a service, that I would encounter similar bugs, but not have the political pull to get them fixed and simply have to perform hacky fixes around them myself.
> From a user point of view it did not go nearly as smoothly as it is made to sound in this post.
I think a large part of that was due to the non-Google login from Pokemon Trainer Club (which must have been handled by Nintendo's servers, AFAIK), as opposed to the Google OAuth. Both groups of users had problems, but people who used their pre-existing, Nintendo-issued logins had many more problems (and it took a lot longer for those to get fixed).
There was a period of about a week or so during which time people who used the Pokemon Trainer Club accounts were still having just as much trouble logging in and staying logged in, but people who used their Google accounts were fine.
I wouldn't be surprised if providing multiple login options made it harder for them to properly separate their login servers from their game servers, and if this coupling meant that millions of users (effectively) DDoSing the PTC servers ended up impacting the game's uptime more than simply doing all authentication themselves would have.
There were plenty of login issues with Google accounts as well for days following launch in the US. I think you could attribute issues primarily to the way they did load-balancing though. The app assigned you a server at login time, so if something happened to that server or it got overloaded, you were out of luck until you restarted the app and logged in again. I would guess this probably caused cascading failures of servers as thousands of users got booted from one server and then all put onto the next one in line when they realized their app had frozen and logged back in. Don't forget that at launch it was common for users to have to restart their application and re-login every time they caught a pokemon. A real dynamic load-balancing scheme would probably have solved their issues. Maybe Pokemon trainer club login went down, but there were still plenty of unrelated server issues for those using Google accounts as well.
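To make the cascading-failure guess concrete, here is a rough Python sketch. Nothing in it comes from Niantic; the function names, the round-robin pinning, and the "next server in line" retry rule are all assumptions used only to contrast login-time pinning with a least-loaded scheme.

```python
from collections import Counter

def sticky_assign(users, servers, failed):
    """Hypothetical login-time pinning: user i is pinned to server i % N.
    If that server is down, the user logs in again and lands on the
    next server in line, so the neighbour absorbs the whole herd."""
    if all(s in failed for s in servers):
        raise RuntimeError("no healthy servers left")
    load = Counter()
    for i, _ in enumerate(users):
        idx = i % len(servers)
        while servers[idx] in failed:
            idx = (idx + 1) % len(servers)
        load[servers[idx]] += 1
    return load

def least_loaded_assign(users, servers, failed):
    """Hypothetical dynamic scheme: each user goes to whichever healthy
    server currently carries the fewest users."""
    healthy = [s for s in servers if s not in failed]
    if not healthy:
        raise RuntimeError("no healthy servers left")
    load = Counter({s: 0 for s in healthy})
    for _ in users:
        target = min(healthy, key=lambda s: load[s])
        load[target] += 1
    return load

# One of three servers dies while 900 users are trying to play.
sticky = sticky_assign(range(900), ["a", "b", "c"], {"a"})
dynamic = least_loaded_assign(range(900), ["a", "b", "c"], {"a"})
```

In this toy run the pinned scheme leaves server "b" with twice the load of "c", which is the kind of pile-on that could knock over the next server too, while the least-loaded scheme splits the survivors evenly.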
>I think a large part of that was due to the non-Google login from Pokemon Trainer Club
I can't think of anything further from the problems this game had at launch than the login. So few people actually had pokemon logins this comment is honestly comical to me.
Not just that, playing the game (once logged in) turned out to be hardly possible at certain times of the day, and the game freezes due to communication timeouts didn't help the situation either.
I feel really bad for this article. It is the Kobayashi Maru of sales pitches. Working in IT/DevOps/Servers/Software Dev/Etc all my life, I understand that even if you have the servers, scaling can be hard and time consuming. I also can't even imagine supporting the number of people they have. So they did an awesome job.
However, the Pokemon Go player in me says "Wow, even with all of Google's resources, they still couldn't manage to get this remotely stable for several weeks?".
I'm sure there were many amazing technical feats that occurred, and from a deeply technical level this is a good sales pitch. I'm sure a good sales person could spin it even better "50x your expected traffic? Google Cloud can do that!". But beyond that... most people will probably see this as a failure.
On one hand, handling such a huge amount of traffic is crazy hard and an amazing accomplishment; however, the tone of the blog is off-putting because of just how much of a trainwreck it was from the user's point of view.
Yeah the graph indicates resource demand rather than user playability. This is why it's critical for KPIs/metrics to be grounded in reality. Reality being that data reflects your business objectives - which in this case is the user was able to enjoy the experience.
From watching my partner play the game (from Canada), the scaling issues went on for days and it severely hampered her enjoyment of the game. This should be a cautionary tale about having worst-case scenario planning at launch time, which a large organization like Nintendo should have factored in (when planning with Niantic).
This article doesn't fairly reflect that. But maybe they are acting confident now because they will be better prepared next time? They can point to Pokemon Go as to why you need experts who have been through the trenches of a rocky high-traffic launch.
It's probably because their KPIs/metrics are in fact grounded in reality, the reality of being the cloud platform provider. This isn't a blog post by Niantic.
> paid off when the game launched without incident in Japan, where the number of new users signing up to play tripled the US launch two weeks earlier.
The "without incident" part is hilarious. The game was unusable for over a week when they added more countries. There were memes all over the place about Niantic execs ignoring the burning server and pushing to launch in more countries anyway. I wonder if any of them actually tried to play as a user on the public servers and spent hours trying to log on, only for it to fail or lock up soon after, for a week.
Not to mention they never even got the original tracker functionality (1 footstep, 2 footstep, 3 footstep for anything nearby) working again after that, they had to replace it with a lower load knock off where you just see what is around a certain location that isn't very popular. So not only did they not even keep login working, they cut features too.
I think most of the problems predated the Japan launch. I was in Japan at the time and the game was available most of the time. It was unexpected really considering the insanely high interest over there.
To those that played and were not impressed with game/system stability: okay, okay. I wasn't there, I don't know.
...But, according to the nightly news, it was a tremendous success. The word 'ever' came up a lot.
PMG was the first successful overnight/viral planet-wide/client-side launch. People who had never heard of it saw it on the news and then visited their local app store in response.
And according to the googleblog, it took a tremendous expenditure of money, hardware, electricity, skills, and knowledge to pull it off.
Makes me wonder... Did some other game/app go almost global, but fall short for the want of those very resources described in the blog post?
The broad consensus in the games press was that Pokemon Go is a great example of a strong gameplay loop overriding a massive technical failure.
Google's post is weird because they seem to think the game was a technical success. Google may have done great, it's impossible to tell from the outside, but the actual user experience is - or at least was when I played it - awful.
I think what this article is saying is that the game was relatively technically successful. It admits there were some problems but scaling your infrastructure to 50x what you expected is pretty amazing and most people would have expected more downtime for a similar situation.
So not total-success, but very successful for the situation at hand.
This was one of the most downloaded apps of all time. Went from zero to gazillions of requests in a single day. Nobody could have planned this. Come on, these guys are great.
Yeah the negativity here is overwhelming. Which, isn't surprising, HN's comment section isn't the cheeriest place on the internet. But seriously, for a tech news aggregator you'd think more of the users would appreciate at least the difficulty of scaling an app from nothing to THE most popular app ever, in a matter of days. Yeah they had/still have issues that they could've mentioned in the article, but it doesn't take away from what they DID do.
I really appreciate this blog post. It gives a great insight into what is going on behind the scenes. I was really surprised by how low their worst-case scenario was. Absolute worst case would be every single person capable of running the game playing. Obviously this wouldn't happen, but for a brand with as much recognition as Pokemon, I think "What if everyone in the world started using this" is a good place to start. Obviously this won't happen, but it's important to think about why it won't happen. "What if everyone who has played Pokemon or wanted to know more about Pokemon downloaded this game?" is still unlikely, but it's less unlikely. It's probably not far off from what actually happened.
I don't want to criticize their model too much, because it's obviously simplified for our benefit. However, it appears that their worst-case scenario was "What if we become the next bejeweled or [insert popular F2P game here]?" It's a ridiculous assumption, because Pokemon has a much broader appeal than any other casual game, cause the IP is so insanely popular, and the game still appeals to people who just want a casual game. I know it is a lot easier to get fired for spending too much money than it is for not spending enough, but it's a stretch to say their launch traffic was beyond imagination. Niantic should start looking for new analysts now if their current analysts honestly thought this traffic was outside the realm of possibility.
I don't consider the server issues to be much of a problem though. It's hard to ensure everything will work perfectly under that kind of load. You have to accurately predict who will be playing, how much they will be playing, how they will interact with the game, and so much more. However, I do think they need to figure out their communication with the fan base. I know that there will be a vocal portion of any constituency that hates everything. That isn't a good excuse for communicating poorly. Good communication will help almost every relationship.
> I was really surprised by how low their worst-case scenario was.
There's no Y axis on the chart, so we don't know exactly what their estimate actually was. Regardless, I'm pretty sure Pokemon GO exceeded any reasonable expectations of popularity, even accounting for the brand and marketing efforts behind it.
From my own experience, lots of people that never engaged with mobile games before started playing Pokemon GO within days of its release. My entire extended family was playing the game. Local bars have become arenas for Pokemon fights. The adoption of this game was absolutely crazy.
So even given the scaling problems, the features they had to remove from the game, and the bugs they introduced, I think this is still a solid win for Google CRE.
I'm assuming the Y-axis is to scale. Even if it isn't, the numbers they give imply what actually happened was 10x worse than their worst case. Unless Niantic defines "worst case" as "1 in 10 chance of happening", being an entire order of magnitude off in a worst-case prediction is really bad.
For example, one of the highest estimates of daily US users I saw was 25 million. That's way more than I would have guessed off the top of my head, but it's nowhere near 10 times what I would have guessed the "worst case" scenario would have been. Pokemon Red and Blue sold 9.85 million copies on gameboy alone. My worst case scenario would assume that literally everyone who played the game as a kid would want to check out the game as an adult. Our worst case is massively more accurate than theirs just using the Gameboy numbers alone, with the assumption that each cart sold was only played by one person.
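As a quick back-of-the-envelope check of that comparison, using only the two figures quoted above (the variable names are made up for illustration, and the 10x miss is the multiple implied by the blog post's chart as discussed in this thread):

```python
# Figures taken from the comment above; nothing here is official data.
peak_daily_us_users = 25_000_000   # highest daily-US-users estimate cited
red_blue_copies_sold = 9_850_000   # Pokemon Red and Blue sales figure cited

# How far off a "everyone who bought Red/Blue shows up" baseline would be.
gameboy_baseline_miss = peak_daily_us_users / red_blue_copies_sold

# Multiple by which actual traffic reportedly exceeded the worst case.
reported_worst_case_miss = 10

print(f"Game Boy baseline miss: {gameboy_baseline_miss:.2f}x")
print(f"Reported worst-case miss: {reported_worst_case_miss}x")
```

So under these assumptions the Game Boy baseline undershoots by roughly two and a half times rather than ten.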
The assumption that people who have never played mobile games before wouldn't be interested in Pokemon is kind of crazy as well. I can't think of another mobile game that has as much name recognition before it released as Pokemon go. It's a beloved franchise, whose popularity spans generations. On top of that, cellphones are far more ubiquitous than the gameboy ever was. There are people who are playing Pokemon go who have never been into mobile gaming, but I suspect the number of people playing Pokemon Go who have never played another video game in their life is much, much, much smaller.
We had our team building event for my company at a barcade last year. Every single high score was set by someone older than 50. Literally every single person at the company had played Ms. Pac Man before, including some people who literally have never owned a cell phone. The people who don't own a cellphone are obviously not playing Pokemon Go, but video games are not niche, and have not been niche for a long time now.
I will give Niantic massive credit for recognizing how much of a problem this was going to be immediately. I'm sure I would have under-allocated resources on launch day too, and I don't see myself handling that issue as well. Still, there's no way in hell what actually happened would be 10x worse than my actual worst case.
>Pokemon Red and Blue sold 9.85 million copies on gameboy alone.
That was two games, and that was "by the end of their run".
>The assumption that people who have never played mobile games before wouldn't be interested in Pokemon is kind of crazy as well. I can't think of another mobile game that has as much name recognition before it released as Pokemon go. It's a beloved franchise, whose popularity spans generations. On top of that, cellphones are far more ubiquitous than the gameboy ever was. There are people who are playing Pokemon go who have never been into mobile gaming, but I suspect the number of people playing Pokemon Go who have never played another video game in their life is much, much, much smaller.
You're aided by hindsight. PKGO grew faster than any prior mobile game or any previous pokemon IP, by a large margin.
I would agree. No hard numbers are mentioned so it's hard to understand the context or scale. However Google did mention their new service offering, Google CRE, that gives you access to this same exact service that the post discusses.
So yes this is a PR piece. Not very technical at all.
I would say these numbers are really sensitive and may lead to calculations for active user numbers, etc.
Yeah, internal reports are way more interesting than public posts.
I don't understand. Why is it that you think these numbers are "really sensitive"? What is the threat? Some numbers are already out there, like the number of downloads.
That's pretty misleading - I believe Google's parent company still owns part of Niantic. So other customers shouldn't expect the (implied) same access to Google resources.
I can't be the only person who looked at that graph and burst out with a cackle that startled everyone around them. The deep and inescapable dread of that fire burning around you even as you make history must have been quite a feeling.
Or in the vernacular of youth: "This is fine. Everything is fine" as a scaling graph.
So their estimation was that nothing would increase? I am not sure I trust the two parallel lines in the graph. The estimation should have been a spike at the release date, a small drop, and some growth over time, no?
Pokemon GO really disappointed me. I'm a big fan of Pokemon and this app really sucks. It is really buggy. I stopped playing two weeks ago because of the GPS instability. I hope they get it right some day. But I'm glad they fixed the server issues.
Interesting read but a bit too positive overall. I think the biggest failure of the launch was not learning from the initial launch zones before launching the other zones.
The launch in Europe was a catastrophe imo (constant crashes and freezes). I don't know how much of this is to blame on the cloud infrastructure but I suspect it's not nothing. I feel they didn't provide nearly enough infrastructure given the data they should have had from Australia/USA.
All that being said I think they smoothed out everything and the system seems to be running very nicely now given the scale. It's certainly a positive engineering tale overall.
Software scalability issues aside, I am not sure Pokemon Go would ever have been possible if it were not on the cloud. How else could you get instances up this fast? It had an explosion of players in very little time. There is no way you could have planned these resources ahead of time. And it died down fairly quickly, which means you would have had lots of unused servers if it were not for the cloud.
Would be really interesting to read up on some of the bottlenecks they had identified and how they optimized them away. I know at some point they had to turn off the "location where a pokemon was caught" map and the radar to keep the thing responsive :)
I'm really disappointed with Pokemon GO. Despite the issues with the launch I still had a lot of fun playing it initially. But the tracker breaking together with the lack of communication from Niantic killed it for me.
Interestingly, the one time they did decide to communicate was when they announced that they banned a bunch of third parties from accessing their server[1]. Of course, just like in this post, they show a graph with a missing y-axis which tells you very little about the traffic they actually received.
It's surprising that this wasn't mentioned in the Google blog, since according to Niantic it was thanks to this ban that they were able to launch in more regions.
IIUC it pretty much does. When I asked for the option to just run an arbitrary command on a host (instead of a container) the answer was pretty much to create an empty container that mounted the host root to the container root and pretend the container wasn't there.
There have been a few numbers about the crash. It's there, but Pokemon Go still remains one of if not the most popular mobile game on a day to day basis.
Even with the servers stabilized the game has plenty of other issues that need addressing. The "nearby Pokemon" window is still broken and shows no sign of ever getting fixed. People in suburbs and rural environments are still stuck playing in extra grindy hard mode. The distance tracking has a speed limit that is way too slow (10.5kph) and noisy, so it largely fails if you jog or get on a bike. The gym battles are comically out of balance (slow defensive Pokemon are blatantly overpowered).
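For anyone wondering why a speed cap makes jogging fail, here is a minimal sketch of how such a filter might behave. The 10.5 kph cap is the figure quoted above; the sampling format, the function name, and the all-or-nothing rule per segment are assumptions, not the game's actual logic.

```python
SPEED_CAP_KPH = 10.5  # figure quoted in the comment above

def credited_distance_km(samples):
    """samples: list of (timestamp_seconds, cumulative_km) readings.

    Hypothetical rule: a segment between two readings is credited only if
    its average speed is at or below the cap; otherwise it is dropped."""
    total = 0.0
    for (t0, d0), (t1, d1) in zip(samples, samples[1:]):
        hours = (t1 - t0) / 3600
        if hours <= 0:
            continue  # skip duplicate or out-of-order readings
        segment_km = d1 - d0
        if segment_km / hours <= SPEED_CAP_KPH:
            total += segment_km
    return total

# One reading per minute: walk at 6 kph, jog at 12 kph, walk at 6 kph.
readings = [(0, 0.0), (60, 0.1), (120, 0.3), (180, 0.4)]
credited = credited_distance_km(readings)
```

Under this rule the jogged minute contributes nothing, and a single noisy GPS jump inside an otherwise slow segment would wipe out that whole segment too.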
They shouldn't be hard problems to fix but Niantic doesn't seem interested. They toyed with a half-fix to the nearby Pokemon window, but only deployed it in their hometown and then seemingly forgot about it. It's clear the client has enough data to do the launch style footsteps (see: all of the PokeRadar apps on the store), but for some reason Niantic has no interest in re-enabling it.
> It's clear the client has enough data to do the launch style footsteps (see: all of the PokeRadar apps on the store), but for some reason Niantic has no interest in re-enabling it.
I've always suspected that was because they quickly realized that getting pokemon to not spawn in dangerous spots was an insurmountable task, so having tracking that encourages players to go to those locations was asking for even more lawsuits than what they face currently for that issue.
Yeah. Ingress solved this problem by only having the equivalent of Pokemon Gyms and not spawning anything at random locations, so they could fully curate where they were directing users to - and even then it required a certain amount of common sense and local knowledge from players.
I don't think private property was the main problem, but just people getting into dangerous spots anywhere, including public property. Making a map of private vs public is hard but not impossible; dangerous vs not dangerous: impossible.
This all doesn't really matter when you lock out anyone who has a rooted phone, with no warning. Sure, a few people were cheating with them, but anyone running cyanogenmod or any other custom firmware, or anyone who wants to get rid of a god-awful OEM skin, is now screwed out of the game. Ironically, this hasn't stopped the actual cheaters or people using AutoMagisk.
Wait, they did that? That's like blocking anyone from playing any games when they have the administrator password to their computers. Being very pro-rooting (for independence reasons, how do you "own" a device that you don't have access to?) this is a good reason to boycott the game on any phone.
Yeah, they did that using a very aggressive Google-provided tool called SafetyNet designed exactly for that purpose, which downloads a program from Google's servers and runs it with system-level privileges to do the checks.
The method could be circumvented using Xposed + Magisk modules on Android (I believe iOS had a method as well).
However, this is all irrelevant considering they spent a month or two deciding to ban these devices, and it took only 3 weeks for new scanner apps to pop up that may never be brought down [1]. The owner claims in his posts that they will never be taken down now, which makes me wonder if he is load balancing across thousands of accounts to prevent Niantic from banning his application. Really, it was bound to happen eventually; someone just needed the time to write the necessary code and figure out a quick way to authorize and potentially keep creating accounts.
In all honesty I believe the ban on rooted/jailbroken devices was really to remove people who were not contributing as much money to the game. In my experience a rooted device usually signifies the user wants to pay less for/in apps.
Of course many users need rooting for backing up, tweaking the interface for better performance, but in the end most rooted users did this method to install hacked apps or circumvent some restriction that they would normally have to pay for (such as wifi tethering).
It is silly and I am more than just frustrated that I had to spend hours getting Magisk installed to keep playing, but in the end they are a business and the main focus is making money.
edit Apologies I must have read over you mentioning the Magisk method!
edit2 also I have a lot of friends who bought phones pre-rooted, so it goes to say it's not really a smart maneuver on Niantic's part to just assume people are rooting devices purposely.
As a paying customer of pokemon go, I find it completely unacceptable that I am unable to login with no notice and have to find this out over a hackernews comment at the bottom of the page. I think it is fraud to block the rooted device of a paying customer who always played by the rules.
Eh, it's too much of a pain to hide my root. I'll never play this game again.
Also interesting that it said "WAS the biggest Kubernetes deployment". This tells me that they've tanked hard enough for that to no longer be the case.
When you write blog posts reflecting on something that happened you often use/think in past tenses. Don't try reading too much between the lines on things like that.
It's been 10+ years since WoW launched, and that scaled better than Pokemon Go. It didn't need no fancy cloud or autoscaling or the few Moore's law iterations Google got, and it's a game that, you know, actually has a use for networking.
You could make an offline Pokemon Go version and not notice any difference.
"OMG there is a Dragonair! Do you see it?" "Na, I'm not on your server.."
Blizzard scaled using appropriate technologies for its time, i.e. sharding based on "Realm" and enforcing active player limits. This scaling mechanism was used by every MMO I knew about until EVE.
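Roughly, realm sharding with an active player cap and a wait queue looks like the following. This is only an illustration of the idea; the `Realm` class, its method names, and the cap value are hypothetical and not anything Blizzard has published.

```python
from collections import deque

class Realm:
    """Hypothetical realm shard with a hard cap on active players."""

    def __init__(self, name, cap):
        self.name = name
        self.cap = cap
        self.active = set()
        self.queue = deque()  # players waiting for a slot, oldest first

    def login(self, player):
        # Admit the player if there is room, otherwise put them in line.
        if len(self.active) < self.cap:
            self.active.add(player)
            return "playing"
        self.queue.append(player)
        return "queued"

    def logout(self, player):
        # Free the slot and admit the next queued player, if any.
        self.active.discard(player)
        if self.queue and len(self.active) < self.cap:
            self.active.add(self.queue.popleft())

realm = Realm("example-realm", cap=2)
results = [realm.login(p) for p in ("ann", "bob", "cat")]
realm.logout("ann")
```

The cap keeps each shard's load bounded no matter how many people show up, at the cost of making some of them wait, which is exactly the trade a single shared world cannot make.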
EVE pioneered the modern MMO. It is no longer acceptable for an MMO to be sharded by server. Even WoW has been aggressively trying to scale its architecture and infra such that it can re-connect these shards into more cohesive worlds. With the most credit to them, they have a lot of technical debt in this regard, but the fact that some servers are labeled "full" with wait queues to join, implies that they still don't have the technology to scale let alone connect all their servers[0].
[0] Note, there are also other reasons (i.e. in-game economical and political reasons) why they are not aggressively connecting more shards.
EVE still uses sharding in a sense. One system cannot scale over more than one server. So when you jump through a gate or bridge, you are probably connecting to another server. This limitation shows itself when a system suddenly becomes the flashpoint for a ridiculous battle, and CCP has to scramble and move the system to a beefier server.
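A toy version of that "move the hot system to a beefier server" step, assuming a simple mapping from solar system to server node. The function, the capacity numbers, and the system and node names are all invented for illustration and do not describe CCP's actual tooling.

```python
def rebalance(system_to_node, population, node_capacity, reinforced_node):
    """Move any system whose population exceeds its node's capacity onto
    the reinforced node. Each system still lives on exactly one node."""
    moved = []
    for system, node in system_to_node.items():
        overloaded = population.get(system, 0) > node_capacity[node]
        if overloaded and node != reinforced_node:
            system_to_node[system] = reinforced_node
            moved.append(system)
    return moved

# Hypothetical cluster: two ordinary nodes and one reinforced node.
mapping = {"system_a": "node1", "system_b": "node1", "system_c": "node2"}
pilots = {"system_a": 1500, "system_c": 4000}
capacity = {"node1": 2000, "node2": 2000, "big_node": 10000}

moved = rebalance(mapping, pilots, capacity, "big_node")
```

Note what this does not do: it never splits one system across two nodes, which is the limitation being described.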
Totally true. If it came across like I meant EVE has no sharding mechanism, my bad.
> One system cannot scale over more than one server.
In this regard, Blizzard's dynamic realms are really interesting. Ideally, we can move to a world, where even single "systems"/"zones"/"areas" are able to scale horizontally.
I have zero experience with it, so I don't know if it can practically scale to solve this problem, but http://www.paralleluniverse.co/spacebase/ has a really cool set of features that MMOs could take advantage of.
Pokemon Go uses sharding called "the real world". The fact that it's technically one world is wholly immaterial.
I'd wager that at any point in time WoW dealt with higher player densities than Pokemon Go, realms or no realms. Certainly higher data densities with VASTLY higher latency requirements.
Of course in EVE, when the whole Python shambles eventually crumble under the load, they just slow down time. I also think there is some Excel spreadsheet you are supposed to fill out prior to a battle so they can deploy some more Cython before that epic MMO experience.
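The "slow down time" trick can be sketched in a few lines. The one-second tick budget and the 10% floor below are placeholder assumptions for illustration, and the function name is made up.

```python
def time_dilation(tick_work_seconds, tick_budget_seconds=1.0, floor=0.1):
    """Return the factor by which game time runs relative to wall time.

    If the server needs more wall-clock time than the tick budget to
    simulate one tick, game time is slowed proportionally, but never
    below the floor."""
    if tick_work_seconds <= tick_budget_seconds:
        return 1.0
    return max(floor, tick_budget_seconds / tick_work_seconds)

normal = time_dilation(0.5)    # server keeps up, full speed
busy = time_dilation(2.0)      # twice the budget, half speed
swamped = time_dilation(50.0)  # far over budget, clamped to the floor
```

Everyone in the overloaded system experiences the same slowdown, so the fight stays consistent instead of dropping players.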
(I think EVE is great, it's just also a hilariously convoluted Python hack sending serialized objects rivaling XML in complexity over TCP. Not much leverage in that for other game types.)
Seamless would not be the word I would use. I would instead argue that what Pokemon Go proved was that Google's Cloud is not of the same quality, at scale, that AWS is.
What it probably shows is that using the cloud isn't a magic bullet to making you scale indefinitely, you need to work on your code to decouple from the infrastructure too.