Micro-optimization #1: Setting a `done` flag using bitwise operators

I’m planning a series of very brief micro-optimization notes, both for my own records and to help anyone else who may be looking into similar optimizations. I plan to provide minimal code, results, and brief explanations.

In this case, I came across a bitwise |= being used to set a done flag in one of Penny de Byl’s Udemy courses. I was curious and decided to see if it was actually an optimization. It felt like it would be, though I didn’t expect it to be by much. Sure enough, it is, but not by much. Still, if you have a lot of while-loops in your code checking against a boolean flag, it could be handy.

The code:

static void Main(string[] args)
{
	// Run each test ten times, interleaved (see Methodology below).
	for (int i = 0; i < 10; i++)
	{
		DoneTest1();
		DoneTest2();
	}
	Console.Read();

	Environment.Exit(0);
}

static void DoneTest1()
{
	// "Using done |=": the flag is set with a bitwise OR-assign every iteration.
	bool done = false;
	int x = 0;
	int xSize = 100_000_000;

	while (!done)
	{
		x++;
		done |= (x < 0 || x >= xSize);
	}
}

static void DoneTest2()
{
	// "Using if": the flag is set with a conditional branch.
	bool done = false;
	int x = 0;
	int xSize = 100_000_000;

	while (!done)
	{
		x++;
		if (x < 0 || x >= xSize)
			done = true;
	}
}

The results:

Using done |=  : 112ms  (1122354 ticks).
Using if  : 151ms  (1518356 ticks).
Using done |=  : 107ms  (1073112 ticks).
Using if  : 129ms  (1298421 ticks).
Using done |=  : 127ms  (1275415 ticks).
Using if  : 141ms  (1414998 ticks).
Using done |=  : 111ms  (1112100 ticks).
Using if  : 127ms  (1273705 ticks).
Using done |=  : 108ms  (1086612 ticks).
Using if  : 140ms  (1400030 ticks).
Using done |=  : 127ms  (1271739 ticks).
Using if  : 128ms  (1282120 ticks).
Using done |=  : 108ms  (1089749 ticks).
Using if  : 111ms  (1118823 ticks).
Using done |=  : 108ms  (1086191 ticks).
Using if  : 110ms  (1100477 ticks).
Using done |=  : 100ms  (1002949 ticks).
Using if  : 113ms  (1131274 ticks).
Using done |=  : 104ms  (1040928 ticks).
Using if  : 110ms  (1101986 ticks).

Every iteration shows better performance from the bitwise |= compare-and-set than from the if-statement. Across the ten iterations, the bitwise version averaged 111.2ms while the if-statement averaged 126.0ms, roughly an 11.7% reduction in run time. The bulk of the time in each run is, of course, the computer counting to 100,000,000, but since the only difference between the two methods is how the flag is set, that difference accounts for the gap.

The Reason:

Bitwise operations are among the cheapest instructions on most architectures, while branching statements are typically heavier, since the CPU has to predict the branch and pays for any misprediction. The |= version also writes the flag unconditionally every iteration instead of branching to decide whether to write it. When bitwise operations are an option, they usually result in more performant, if less readable, code.

Methodology:

For those of you not familiar with benchmarking in C#, I typically use the .NET Stopwatch class (System.Diagnostics.Stopwatch). I’ve removed the Stopwatch code from the example above for brevity. So long as you Start, Stop, read, and Reset your stopwatch in the appropriate places, you don’t need to worry about setup and other overhead, as the only portion timed is what is wrapped between Start and Stop.
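For reference, here is a minimal sketch of what that timing wrapper might look like. The helper name TimeIt and the exact message format are just illustrative; this isn’t the code I removed.

using System;
using System.Diagnostics;

static void TimeIt(string label, Action test)
{
	Stopwatch sw = new Stopwatch();

	sw.Start();   // only the code between Start and Stop is timed
	test();
	sw.Stop();

	Console.WriteLine($"Using {label} : {sw.ElapsedMilliseconds}ms  ({sw.ElapsedTicks} ticks).");
	sw.Reset();   // ready for the next measurement
}

// e.g. inside the loop in Main:
// TimeIt("done |= ", DoneTest1);
// TimeIt("if ", DoneTest2);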

I also run the program (usually a .NET Core console application) as a release executable rather than a debug executable to ensure performance isn’t being bogged down by the debugger for any reason.

Lastly, I try to run ten (or more) iterations of each thing that I’m testing. As you can see in the results, timing can vary for all manner of reasons. Sometimes the first execution of a code block is slower than subsequent executions. I also try to interleave the methods or functions being tested (e.g.: 1, 2, 1, 2, 1, 2, 1, 2 rather than 1, 1, 1, 1, 2, 2, 2, 2) to help ensure a code block isn’t being cached and repeated. Running only a single iteration is often misleading: in this case all ten iterations of the bitwise comparison were faster, but it’s often the case that the slower of two methods has a small percentage of faster executions, and a single iteration may give incorrect information about which is typically faster.

Unity Profiling for Fun (and Profit?)

For those who may have been following since before this blog began, you may have seen the Iceglow Gel death sequence that includes the possibility of spawning smaller Mini Iceglow Gels.  This was, by and large, a stroke of sheer genius on my part (yeah yeah yeah, just let me have this one).  It seemed like a cool idea and has led to some other cool ideas for deaths with other mobs.  In fact, each mob has a DeathScript requirement, even if it’s just to, you know… die.

But this has also been a thorn in my side.  When an Iceglow dies, there is a major stutter before the new minis are spawned in.  I finally decided to pull up the profiler to see what was going on, and to write this brief post on profiling, because damn, it’s handy!

For those new to Unity or development, the profiler is a pretty common tool used to “debug” the actual running game.  It can attach to a development build for better and more accurate profiling, but if you’re just looking for bottlenecks, where exact timings and resource numbers matter less than finding the spike, you can simply attach the profiler to the editor itself.

Ice Glow Death Profile 1

This was a capture at the time of the resource spike right when the Iceglow dies.

Ice Glow Death Profile 2

The details of CPU usage show that the >1s spike is caused by the PhysX core baking the mesh for the minis.  I was hoping that, since this was running in a coroutine, it wouldn’t impact the whole of the game.  Maybe I’m doing it wrong (or, at least, not the best way).  Maybe I can pre-bake those meshes.  Maybe I can use simple colliders instead (a cube is far easier to calculate).  I’m just delving into this, and I’m not sure what the best solution is yet (stay tuned for updates on that, or follow my plea for help), but for now, the profiler is my handy tool to figure out what calls are b0rking the game.
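If I do try pre-baking, a rough first sketch might look something like this, assuming a Unity version that exposes Physics.BakeMesh (the class and field names are placeholders, not code from the project):

using System.Collections;
using UnityEngine;

public class MiniMeshPreBaker : MonoBehaviour
{
    // Meshes the Mini Iceglow Gels will use for their MeshColliders.
    public Mesh[] miniMeshes;

    IEnumerator Start()
    {
        foreach (Mesh mesh in miniMeshes)
        {
            // Bake the PhysX collision data ahead of time; assigning the mesh
            // to a MeshCollider later should reuse the baked data instead of
            // baking it mid-death-sequence.
            Physics.BakeMesh(mesh.GetInstanceID(), false);
            yield return null;  // spread the baking across frames
        }
    }
}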

Profiling is easily as important as debugging your code, and definitely more important for tuning.  I can’t recommend learning to love your profiler highly enough.

How To See Your Player… Making Walls Transparent

Over the past several months working on the Dungeon theme for Labyrintheer, I’ve changed my camera angle several times.  I keep moving it higher to prevent walls and such from occluding the player, but I’m never happy with such an oblique view.  So, over the past few days I’ve been looking at options to make walls transparent when they are between the player and the camera.

Some solutions simply disable the geometry.  This isn’t acceptable for my game, and I suspect for many others.  You could accidentally walk backwards out of the playable area, or an errant AI could take a bad turn during its pathing and fall off the world.  Plus, disabling geometry just doesn’t seem like an elegant solution.  My primary goal (and I’m still working on it) is to use a shader for this directly, though that seems like it has some major pitfalls (how do you tell a shader about an object other than the one it’s drawing?).
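On that last question, one answer I may experiment with (just a sketch, untested, and the property name _PlayerWorldPos is made up) is to push the player’s position into a global shader property each frame, which any wall shader could then read:

using UnityEngine;

// Attach to the player; every shader can then read the global _PlayerWorldPos.
public class PlayerPositionToShader : MonoBehaviour
{
    void Update()
    {
        Shader.SetGlobalVector("_PlayerWorldPos", transform.position);
    }
}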

So, for now I’m cheating with a very small amount of code and an extra material for objects that I want to hide.

Basically, I’ve duplicated the four wall materials I have, and the duplicate materials use transparency with an alpha value of 100.

My player controller script now calculates its squared distance from the camera every frame (though I think this could probably be done once in Awake() since the distance should be fairly static), like this:

    private float distanceSquared;

    void Update ()
    {
        GetInput();
        ProcessInput();

        // Squared distance avoids a square root; it's only used for comparison.
        distanceSquared = (transform.position - Camera.main.transform.position).sqrMagnitude;
    }

    public float Dist()
    {
        return distanceSquared;
    }

Then I created a script to go on the walls (or on any object that should turn transparent to prevent occlusion), as such:

TransMaterialSwap.cs

using UnityEngine;

public class TransMaterialSwap : MonoBehaviour {

    public Material _original;
    public Material _transparent;
    private GameObject player;
    private playerController pC;
    private Renderer rend;

    void Start()
    {
        player = GameObject.FindWithTag("Player");
        pC = player.GetComponent<playerController>();
        rend = GetComponent<Renderer>();
    }

    void Update()
    {
        // If this object is closer to the camera than the player is, it may be
        // occluding the player, so swap to the transparent material; otherwise
        // restore the original.
        if ((transform.position - Camera.main.transform.position).sqrMagnitude < pC.Dist())
        {
            rend.material = _transparent;
        }
        else
        {
            rend.material = _original;
        }
    }
}

In the inspector I set both the original material and the transparent material.  If the object is between the camera and the player, it switches the object’s material to the transparent material.  It looks like this:

There are a few issues here.  First, I still need to profile this to see if the solution hurts my runtime performance.  I don’t suspect it’ll be TOO bad, but it doesn’t hurt to check, especially with larger maps.  I may look into only running the check when the object is visible to the camera, rather than in Update() every frame for every wall.  The other issue is that by making the wall transparent, light comes through.  I’m not sure how big an issue this will be – it’ll require some play testing.  But it may be a problem in some situations.
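If I go that route, one possible variation (again just a sketch, untested) would use Unity’s OnBecameVisible/OnBecameInvisible callbacks so that off-screen walls skip the distance check entirely:

using UnityEngine;

public class TransMaterialSwapVisible : MonoBehaviour {

    public Material _original;
    public Material _transparent;
    private playerController pC;
    private Renderer rend;
    private bool isVisible;

    void Start()
    {
        pC = GameObject.FindWithTag("Player").GetComponent<playerController>();
        rend = GetComponent<Renderer>();
    }

    // Unity calls these when any camera starts or stops seeing this renderer.
    void OnBecameVisible()   { isVisible = true; }
    void OnBecameInvisible() { isVisible = false; rend.material = _original; }

    void Update()
    {
        if (!isVisible) return;  // skip the work for walls that aren't on screen

        bool occluding = (transform.position - Camera.main.transform.position).sqrMagnitude < pC.Dist();
        rend.material = occluding ? _transparent : _original;
    }
}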

Lastly, as I said, I really do want to attempt this in a shader.  I figure it’s a good way to learn shader programming, even if exactly what I want isn’t possible.

Another day, another sizetest

So there’s been some refactoring of how rooms are created, a new set of room tiles made a couple of weeks ago, and the addition of some more realistic torch placement in the deep dark cave.  It seemed like it was time to run my ‘SIZETEST’ map again just to get a feel for things.

The ‘SIZETEST’ is set to a base of 2500 rooms, far larger than anything expected in the finished game.  Currently, it ends up being 2674 rooms in the cave biome.  The base size is used to generate a rough estimate of a given maze/dungeon size, but other factors and randomizations will throw that number off somewhat.  At any rate, the refactoring has clearly been good.  With only general logging enabled, it now takes only ~3 seconds to create the 2674-room behemoth pictured below.

That leads me to another thing my day job should’ve taught me, but that had somehow been slipping my mind of late – logging is a resource hog.  Turning on the full complement of logging nearly doubles the generation time.  Still, even with full logging it now takes only ~6 seconds, where the SIZETEST previously took closer to ~35-45 seconds to generate.  This is pleasing.
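My generator’s logging isn’t structured exactly like this, but as a general pattern, verbose logging can be compiled out of a build entirely with a conditional wrapper along these lines (the class and symbol names are made up for illustration):

using System.Diagnostics;

public static class GenLog
{
    // Calls to Verbose() are stripped by the compiler unless the build defines
    // the VERBOSE_LOGGING symbol, so full logging costs nothing when it's off.
    [Conditional("VERBOSE_LOGGING")]
    public static void Verbose(string message)
    {
        UnityEngine.Debug.Log(message);
    }

    // General logging stays on in every build.
    public static void General(string message)
    {
        UnityEngine.Debug.Log(message);
    }
}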

SIZETEST

Same map without the room-stage color coding, so you can see the lighting placement.
