Updated: Aug 22, 2019
Pre-Alpha launched last week, but I spent all of this one trying to solve lingering performance issues. When I first made the web build, there were two problems I didn't notice during development or my Windows build smoke testing: loading time and frame rate. The first, unfortunately, I can't do anything about. I'm using the third-party A* asset for my AI needs, and while it does come with dedicated asynchronous methods for navmesh generation, WebGL does not support threading. This means that while I can generate my largest-size map in about four seconds on Windows, it takes closer to a minute online. The good news is that the eventual Steam build will target Windows and Mac, so that particular performance issue is only a problem now, when I'm looking for remote playtesters.
Frame rate, however, is something I can improve, and there are several strategies for doing so. A major bottleneck in many games is the number of draw calls needed to show everything on screen. Each draw call carries overhead, so the more of them a frame needs, the longer that frame takes to render, and frame time in turn determines frames per second (a 16 ms frame works out to roughly 60 fps). The more fps, the smoother your game looks. So, to increase fps, we need to decrease draw calls. Without any optimization, it takes one draw call per object onscreen per material on that object. For a game with a large number of objects onscreen at once, or lots of complicated materials, this quickly adds up. I'm going to talk about some different ways to cut this number down, split into three categories: Batching, Instancing and Modeling.
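To make the arithmetic concrete, here is a toy model of the "one draw call per object per material" rule and the frame-time-to-fps conversion. It is only a sketch: the `SceneObject` type and the function names are made up for illustration, and a real renderer has many other costs besides draw calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical stand-in for a visible object in the scene."""
    mesh: str
    materials: tuple  # names of the materials stacked on this object


def unoptimized_draw_calls(objects):
    """One draw call per object per material, with no batching at all."""
    return sum(len(obj.materials) for obj in objects)


def frames_per_second(frame_time_ms):
    """Convert the time one frame takes (in milliseconds) into fps."""
    return 1000.0 / frame_time_ms


# 100 trees, each with three stacked materials, cost 300 draw calls.
forest = [SceneObject("tree", ("bark", "leaves", "outline"))] * 100
print(unoptimized_draw_calls(forest))
print(frames_per_second(16.0))
```

Cutting each tree down to a single material would drop the same forest to 100 draw calls without touching the object count.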
There are two types of Batching: static and dynamic. As the names suggest, static batching is used for objects that do not move. Dynamic batching is used for objects that do move, but it has more restrictions, like a low cap on the number of vertices per object. In general, batching takes objects that share a material and groups them together in the renderer. In simplistic terms, much of the time a draw call takes comes from switching the renderer's settings from the previous call, so if a bunch of objects in a row use the exact same settings, the renderer can rip right through them extremely quickly. Those objects are batched together. But batching is finicky. Besides the vertex limits I mentioned and the obvious restriction that batched objects need to share a material, static batching expects your objects to exist in the scene before the game starts, and multi-pass shaders break batching. My objects fail on all four counts. Because I wanted nice, smooth objects, all my models were high-poly; because I've been conditioned to code for reuse, I created a ton of very generic materials that could be stacked on top of each other on an object for a combined effect I liked; because my map is generated dynamically at runtime, the objects can't exist beforehand; and because the entire aesthetic of the game is built around the cartoony-outline shader, I needed multiple rendering passes. I simply couldn't use a batching technique.
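The "same settings in a row" idea can be sketched in a few lines. This is not how Unity's batcher is actually implemented; it is a simplified model in which each draw is a hypothetical (object name, material name) pair and any run of consecutive draws with the same material counts as one batch.

```python
from itertools import groupby


def count_batches(draws):
    """Count runs of consecutive draws that share a material.

    `draws` is a list of (object_name, material_name) pairs in submission
    order. Each run of identical materials is treated as a single batch.
    """
    return sum(1 for _ in groupby(draws, key=lambda draw: draw[1]))


def sort_for_batching(draws):
    """Reorder draws so that objects with the same material are adjacent."""
    return sorted(draws, key=lambda draw: draw[1])


draws = [
    ("tree_1", "bark"),
    ("rock_1", "stone"),
    ("tree_2", "bark"),
    ("rock_2", "stone"),
]
print(count_batches(draws))                     # interleaved materials
print(count_batches(sort_for_batching(draws)))  # grouped materials
```

Interleaved, every draw forces a settings switch; grouped, the four draws collapse into two batches. An object with several stacked materials shows up once per material in this list, which is part of why my stacked generic materials hurt so much.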
GPU Instancing also requires having the same model with the same material, but that's about it. The general concept is the same as batching, but the underlying strategy is different. Whenever the renderer goes to draw an object, it needs the object's vertex, normal, and texture data. A typical implementation might be a vector3, a vector3, and a vector4 to store all the data...for one point. Now multiply that over your entire object, and for as many objects as are in your scene, and that's a lot of data. GPU Instancing reduces the overhead by submitting all that data once, for the first object only; every identical copy then needs only a small per-instance transform that is applied to all of its points. (For truly identical objects, the only difference is where they sit in the world.) The less repeated data the renderer has to deal with, the faster it can run. Now, there are a few issues here as well. First, shaders don't support Instancing out of the box, so the ones I was using needed to be converted. Then, of course, I still had the one-model-one-material problem. At this point, I realized that I wasn't going to get anywhere until I fixed that, so I used some of my recently gained shader knowledge to write yet another shader, which combined the outline and a 2D texture into one material. Applied to my simplest object, this allowed Instancing to work...sort of. I watched the frame rate in my test scene soar, but the outline was still mucking things up: only a single one of my objects had it. Apparently the outline data couldn't be instanced the same way everything else could, or it was applied too late in the render pipeline. In any case, strike two.
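Here is a rough sketch of the instancing idea: keep one copy of the mesh and a short list of per-instance offsets, and let a (simulated) vertex shader place each copy. I'm assuming translation-only instances to keep it short; real instancing typically passes a full transform per instance, and all the names below are hypothetical.

```python
def vertex_shader(vertex, offset):
    """Place one shared-mesh vertex into the world for a given instance."""
    return tuple(v + o for v, o in zip(vertex, offset))


def draw_instanced(mesh, offsets):
    """Simulate one instanced draw: one mesh, many per-instance offsets."""
    return [[vertex_shader(v, off) for v in mesh] for off in offsets]


def floats_without_instancing(mesh, offsets):
    """Position floats needed if every copy carried its own full mesh."""
    return len(mesh) * 3 * len(offsets)


def floats_with_instancing(mesh, offsets):
    """Position floats needed for one shared mesh plus one offset per copy."""
    return len(mesh) * 3 + len(offsets) * 3


triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
offsets = [(0, 0, 0), (10, 0, 0), (0, 0, 5)]

copies = draw_instanced(triangle, offsets)
print(copies[1])  # the second copy, shifted 10 units along x
print(floats_without_instancing(triangle, offsets),
      floats_with_instancing(triangle, offsets))
```

With a three-vertex mesh the saving is tiny, but the shared-mesh term stays fixed while the per-instance term grows by only a few floats per copy, so the gap widens quickly for high-poly models.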
So finally, we have the Modeling optimizations. If we go back to the one-draw-call-per-object-per-material premise, one way forward is to reduce the number of objects. With some clever coding, it's possible to make Unity treat all like objects as one. Rather than rendering all your trees in order, or one tree over and over, you're now rendering a single, giant tree conglomerate. With the right script to combine objects, you're down to one draw call per material, since all objects with a common material are now one object. This is great, except...
While the whopping 203 fps frame rate is not lost on me, the outline shader is once again wreaking havoc. The only thing that I can figure is that the objects' normals are corrupted during the combination process, causing the outline to only show up at a certain angle. Not what I want.
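For reference, the core of combining meshes is simple bookkeeping: append each object's vertices to one big list and shift its triangle indices by the number of vertices already there. The sketch below is not my Unity script (Unity exposes this through Mesh.CombineMeshes); it is a translation-only illustration with hypothetical names, and it ignores normals and UVs entirely.

```python
def combine_meshes(meshes):
    """Merge several meshes that share a material into one mesh.

    `meshes` is a list of (vertices, triangles, offset) tuples, where
    `vertices` is a list of (x, y, z) points in local space, `triangles`
    is a flat list of vertex indices, and `offset` is the object's world
    position. Returns a single (vertices, triangles) pair in world space.
    """
    combined_vertices = []
    combined_triangles = []
    for vertices, triangles, (ox, oy, oz) in meshes:
        base = len(combined_vertices)  # index shift for this object's triangles
        combined_vertices.extend((x + ox, y + oy, z + oz) for x, y, z in vertices)
        combined_triangles.extend(index + base for index in triangles)
    return combined_vertices, combined_triangles


tree = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
forest = [(*tree, (0, 0, 0)), (*tree, (5, 0, 0))]

vertices, triangles = combine_meshes(forest)
print(len(vertices), triangles)
```

Two three-vertex trees become one six-vertex mesh whose second triangle points at indices 3 to 5, so the renderer sees a single object.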
However, all the work I had done to get to this point was not worthless. After turning off instancing and the model combination, this was the result:
That's a still-respectable 130 fps, up from the approximately 6 (yes, 6) that I was seeing on my largest map before doing anything. Returning to Blender for another hellish week of simplifying meshes, coupled with shader writing and material work to get down to a single material per object, was enough to get solid performance without sacrificing too much visual quality.
Head over and play now to check out the final result in action!