We’ve spent most of this week upgrading the server to be more efficient and robust. This is a big deal, as stability is critical to reaching Alpha (our deck-building system is getting closer, too). Without getting into too much gory technical detail, here are a few of the things we’ve done:
We started off sending JSON messages. These are plain-text messages that are really easy to read (for debugging) and pretty simple to implement. The downside of JSON is that it makes for huge messages, and it takes a lot of processing power to work with. It’s also fragile – having the whole game state based on text strings means things can go wrong in ways we don’t expect.
We’ve switched formats to protobuf (Google’s Protocol Buffers), which is far more compact and far less CPU-intensive. This gave us a 3x decrease in server CPU usage and a 100x decrease in game message size. Every game message we send is now well under 1 KB, so we’re totally fine on that front for a good long time.
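To give a feel for why a binary format is so much smaller, here is a minimal sketch. It is not the actual server code and it does not use protobuf itself (that needs a schema file and generated classes). Instead it uses Python's built-in struct module as a stand-in for "both sides agree on the layout up front, so only the values go over the wire". The message and its field names are made up for illustration.

```python
import json
import struct

# Hypothetical game message; the field names are invented for this example.
move = {"player_id": 7, "card_id": 1042, "target_x": 3, "target_y": 5}

# Text encoding: every message repeats the field names and spells out digits.
as_json = json.dumps(move).encode("utf-8")

# Fixed binary layout agreed on in advance, so only the values are sent.
# "<HIBB" = little-endian: u16 player, u32 card, u8 x, u8 y (8 bytes total).
LAYOUT = struct.Struct("<HIBB")
as_binary = LAYOUT.pack(
    move["player_id"], move["card_id"], move["target_x"], move["target_y"]
)


def decode(payload: bytes) -> dict:
    """Rebuild the message from its binary form."""
    player_id, card_id, target_x, target_y = LAYOUT.unpack(payload)
    return {
        "player_id": player_id,
        "card_id": card_id,
        "target_x": target_x,
        "target_y": target_y,
    }


if __name__ == "__main__":
    print("JSON bytes:  ", len(as_json))
    print("binary bytes:", len(as_binary))
```

The binary form also fails loudly when a payload has the wrong size or a value is out of range, which is part of what makes a schema-based format less fragile than free-form text.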
CPU will always be a battle for us, so we’ll keep looking for more CPU usage improvements as we continue development.
We’ve found a few bugs in the server over the last couple of weeks (thanks for testing!). We’re tracking some of them down and squashing them, and it looks like most of them have the same root cause – the way the server sends messages wasn’t up to modern standards.
We’ve improved it somewhat (now, if one player ends up with a blocked messaging channel, it shouldn’t affect other players). We still have more work to do – there’s no reason any messaging channel should ever lock up, and we have to rework our message send system to make sure it doesn’t.
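One common way to keep a blocked player from holding up everyone else is to give each player a bounded outbox and never wait on it. The sketch below shows that pattern with Python's asyncio. It is an assumption about the general approach, not the server's real design, and PlayerChannel, offer and broadcast are hypothetical names.

```python
import asyncio


class PlayerChannel:
    """Hypothetical per-player outbox; not the actual server code."""

    def __init__(self, name, max_pending=8):
        self.name = name
        self.outbox = asyncio.Queue(maxsize=max_pending)
        self.overflowed = False

    def offer(self, message) -> bool:
        """Queue a message without ever waiting on this player."""
        try:
            self.outbox.put_nowait(message)
            return True
        except asyncio.QueueFull:
            # The caller can later disconnect or resync this player.
            self.overflowed = True
            return False


def broadcast(channels, message):
    """Offer the message to everyone; return the names that were skipped."""
    return [c.name for c in channels if not c.offer(message)]


async def demo():
    fast = PlayerChannel("fast")
    stuck = PlayerChannel("stuck", max_pending=1)  # nobody reads this one
    received = []

    async def drain(channel):
        while True:
            received.append(await channel.outbox.get())

    reader = asyncio.create_task(drain(fast))
    skipped = []
    for turn in range(3):
        skipped += broadcast([fast, stuck], f"turn {turn}")
        await asyncio.sleep(0)  # give the reader task a chance to run
    reader.cancel()
    return received, skipped


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Here the "stuck" player's outbox fills up after one message, but the "fast" player still receives every turn because broadcast never blocks on a full queue.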
The server was built as a very basic system, with some prototype code elements. We’ve been going in this week and taking out some of that prototype code.
For example, we used to require each game to use its own thread, and then sleep the thread while the event loop wasn’t running. This meant we would have a lot of threads hanging around doing nothing most of the time (each thread would be sleeping more than 99% of the time). That’s a huge waste of both memory (every thread needs its own stack) and CPU cycles (the system has to keep switching between threads).
We’re testing out a task-based system, which eliminates the vast majority of this overhead, and should allow us to do some interesting further optimizations. We’re still testing this change – we’ll see how it works over the next couple of weeks, and if it’s good, we’ll include it in the next demo build.
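For readers curious what "task-based" looks like in practice, here is a small sketch using Python's asyncio. The post doesn't say which language or runtime the server uses, so treat this purely as an illustration of the idea: run_game and run_server are hypothetical names, and the tick timing is arbitrary.

```python
import asyncio
import threading


async def run_game(game_id, ticks, tick_seconds, log):
    """Hypothetical game loop: do a little work, then yield until the next tick."""
    for tick in range(ticks):
        log.append((game_id, tick, threading.current_thread().name))
        # Awaiting hands the thread back to the scheduler instead of
        # parking a dedicated OS thread in sleep().
        await asyncio.sleep(tick_seconds)


async def run_server(game_count=100, ticks=3, tick_seconds=0.01):
    """Run many games as tasks that share one thread."""
    log = []
    games = (run_game(g, ticks, tick_seconds, log) for g in range(game_count))
    await asyncio.gather(*games)
    return log


if __name__ == "__main__":
    log = asyncio.run(run_server())
    threads_used = {thread_name for _, _, thread_name in log}
    print(len(log), "ticks handled on", len(threads_used), "thread(s)")
```

Each game only occupies the thread while it actually has work to do, so a hundred mostly idle games cost roughly a hundred small task objects rather than a hundred sleeping threads.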
Have any questions about what we’ve been working on? Let us know!