Volnaiskra

Overhauling and Optimising

1/6/2017

It's been a long while since I've written any updates on Spryke, so I figured I'd better hop to it. I've spent the past couple of months overhauling Spryke's code to make it better optimised. I'm doing this for a few reasons. Firstly, the performance was starting to creep into an area where even on my relatively high-end gaming PC it would sometimes dip into subpar territory (i.e. less than 60 frames per second). I always told myself that this would be the point where I'd stop and take a closer look at how I'm doing things. Spryke will never be the sort of game that runs blisteringly fast on a tinny laptop with integrated graphics, but I'd certainly like it to run well on anything even resembling a gaming PC.

Secondly, I'm determined to make Spryke feel like more of a game this year. Not just an assemblage of levels, but an actual game, with a solid skeleton and all the necessary stuff, like proper loading screens, graphical options, robust camera, and so on. Part of that also means cleaning up the code and getting it into something resembling a publishable state.

Also, some of the really core code (e.g. Spryke's unique movement engine) was 3 years old, written when I was a beginner with Fusion. This code was visibly inefficient, and all but completely incomprehensible to present-day me. I was getting tired of breaking things when I added a new feature and not even being able to tell how it was broken.

Redoing all that stuff was a real drag. Merely figuring out how it worked was time-consuming, as has been untangling it and reshaping it into something more modular, optimised and manageable. I've refreshed a lot of the code already, but I'm still knee-deep in the process. And, of course, I've broken lots of little things along the way, so after all this my next step will be to repair all of the 'optimisations' ;)

To be honest, I'm getting pretty sick of this process, but it's important, and I know I'll be very grateful to myself when I'm done. Up until now, I'd taken to heart that silly adage about premature optimisation being the root of all evil. I always knew I'd eventually do a wave of optimisation, so I was never too fussy about how I made my code. I was never sloppy, but tried not to be too fastidious either. I'm now questioning the wisdom of that. Then again, I know so much more now than I did when I started, so it's probably best that I left the fastidious stuff till now. Either way, cleaning up old code is proving so time-consuming that I'm determined to make all my code from now on as optimised as possible. I've done a lot of benchmark testing to see where I can most beneficially streamline the code, and I'm developing a lot of good habits that I'm going to keep as I go forward.

Here's a little sample of the sort of optimisation work I've been doing: 
[Image: the four vent-smoke scenarios: jumping left, jumping right, ground/ceiling left, ground/ceiling right]
Spryke generates animated smoke clouds out of her side vent as she moves. These are hand-drawn and animated in Toon Boom Harmony, but to make them look extra believable and fun, they are also generated and manipulated programmatically. For instance, they come out semi-randomly, at slightly different speeds and sizes, and each smoke cloud randomly chooses from 7 separate handmade animations. They're also more numerous when Spryke's jumping (since that takes more energy), and their angle naturally adapts to the arc of Spryke's jump as they come out of her.

Previously, I had 4 separate events for 4 different scenarios (jumping left, jumping right, ground/ceiling left, ground/ceiling right - all shown in the image), with each event taking care of that scenario's particular requirements for smoke angle, direction, and probability, as well as positioning the smoke just right so that it comes out of her vent. So one event basically said if Spryke's jumping left, generate some smoke and do this, this and this to it, while another said if Spryke's on the ground moving right, generate some smoke and do that, that, and that to it...and so on.

But with some careful maths using some pre-existing variables, I was able to consolidate all 4 events into a single event. The code no longer needs to check whether Spryke's moving right or left, airborne or not. The unified event simply runs in all of those circumstances, and adjusts the smoke's position, angle, direction, and probability as needed on the fly. 

For example, this equation takes care of the angle for all 4 scenarios:

set angle to inAir * (VAngle(XinputSpeed, YmainSpeed) + 180 * Abs(Xdirection-1)*.5)

The VAngle bit uses Spryke's horizontal and vertical movement (XinputSpeed and YmainSpeed) to set the correct angle for the smoke...if she's jumping diagonally right. However, if she's jumping diagonally left, we need to spin the angle of the smoke around by 180 degrees. So I needed the equation to add 180, but only when she's facing left.

Depending on which way Spryke's facing, my variable Xdirection is either 1 (right) or -1 (left). So I used this in the equation, by subtracting 1, taking the absolute value (Abs forces a positive number), and halving the result. When Spryke's facing right, that gives Abs(1 - 1) * 0.5 = 0, and when she's facing left, it's Abs(-1 - 1) * 0.5 = 1. So now, when we multiply 180 by this result, we end up with either 180 * 0 = 0 when facing right or 180 * 1 = 180 when facing left. Which is just what we needed.

Finally, we multiply the whole lot by inAir (which is either 1 or 0), which will leave the result unchanged when Spryke's airborne, but force it to 0 when she's on the ground/ceiling and her smoke doesn't need any angle change at all. So now we don't even need to test for Spryke's direction or whether she's airborne. We simply run the code each time, and it will figure out the rest on its own.
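Here's roughly what that unified formula boils down to, sketched in Python. This is just a sketch: Fusion's event editor has no textual form, `VAngle` is approximated with `atan2`, and the function names and axis conventions here are illustrative, not Fusion's actual API.

```python
import math

def vangle(x_speed, y_speed):
    # Stand-in for Fusion's VAngle(): direction of the velocity
    # vector in degrees (Fusion's axis conventions may differ).
    return math.degrees(math.atan2(y_speed, x_speed))

def smoke_angle(in_air, x_direction, x_speed, y_speed):
    # in_air is 1 when airborne, 0 on ground/ceiling.
    # x_direction is 1 (facing right) or -1 (facing left).
    # Abs(x_direction - 1) * 0.5 evaluates to 0 (right) or 1 (left),
    # so the 180-degree flip is applied only when facing left, and
    # multiplying by in_air forces the angle to 0 on the ground.
    return in_air * (vangle(x_speed, y_speed)
                     + 180 * abs(x_direction - 1) * 0.5)

# Ground/ceiling: angle forced to 0 regardless of direction.
print(smoke_angle(0, 1, 3.0, 4.0))    # 0.0
# Airborne, facing right: the raw VAngle of the jump arc.
print(round(smoke_angle(1, 1, 3.0, 4.0), 2))
# Airborne, facing left: the VAngle spun around by 180.
print(round(smoke_angle(1, -1, -3.0, 4.0), 2))
```

One expression covers all four scenarios, so no conditions are needed to pick between them.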

With this kind of convoluted gymnastics, I was able to dynamically work out the smoke's direction, movement, position and probability as well. The end result is a bit harder to read, but I've commented it well, so future-me shouldn't have too much trouble understanding it. And despite the extra computation required by the slightly more complex actions, the reduction in conditions (from 14 to just 3!) more than makes up for that. I did some performance testing to make sure, and indeed the new version runs about 10% quicker. 

On top of this, I improved performance further by turning the smoke generation code off a portion of the time, preventing it from running at all when Spryke isn't eligible for smoke (i.e. when falling, floating, or on a wall), and switching from animation sequences to animation directions (which is significantly faster, as I discovered in some tests I did recently). All up, this new vent smoke looks better, yet runs up to 50% faster in stress tests.

But stress-tests aside, what's the real-world impact of the change, during normal gameplay? Not much. 1 or 2 FPS on my system, though it could be more on a slower system. But that's the nature of the beast when it comes to this optimisation stuff: few single changes will make a significant difference, and one can only hope for cumulative benefits, from the systematic weeding out of such inefficiencies across the whole code. And that process is what I'm knee-deep in right now.
4 Comments
Bo3b
1/6/2017 06:40:25 pm

Well written post and interesting to read.

However- you are doing it wrong. I say this as a professional software guy who was the Tech Lead for performance testing on Apple's MacOS. I've got a *lot* of experience with performance.

You are going about it wrong, because you are randomly trying things to see where performance might hide. This is no different than randomly changing code when written, with an 'eye' toward performance. This is premature optimization, and it's a waste of your time.

What you want to be doing for an optimization stage is not rewrite everything that is already debugged. You want to find the hot spots and fix those.


What you want to do instead is to *profile* your code. I don't know if fusion has any tools to do performance profiling or not, but that's where you need to spend your time. If they don't have any tools, then they suck and need to be browbeaten. Even if they don't, you can still do profiling, you just have to roll your own.

Performance Profiling takes a given part of your game, where you notice a slowdown, and works to understand where the hot spots are in the code.

In my copious experience, performance problems have *always* been some key spots in the code that wind up being a bottleneck for whatever reason. It's extremely rare that you'd find anything worth fixing from code inspection or guessing.

With a performance profile you can find the hot spots that burn the most CPU (maybe GPU), and then work on that one spot to remove the bottleneck. Once that is done, you move onto the next hot spot and fix it. Fixing the top ten hot spots will gain 100x anything you've done to date.

Take a word of advice from a battle scarred developer- profile it.

All the best,
bo3b

Dave Bleja
2/6/2017 11:52:44 am

Hey bo3b, great to hear from you.

I remember well you writing something to this effect on the 3D forums - that most of the 'optimisations' people in your team at Apple made ended up making no difference because they weren't targeting the real bottlenecks. I was actually quite influenced by reading that statement, and it's one of the reasons why I didn't do much ad hoc optimisation during development, and why I did a big round of performance testing before this current phase of optimisation. I'm not too familiar with what proper profiling entails, but I think I may have been doing a type of profiling - or at least partially.

If you don't mind, please let me tell you a bit more about how I've been working, and tell me if it seems sensible, and what I could do better.

Fusion doesn't have a built-in profiler that I know of, but I've rolled my own little A/B performance testing mechanism - I'm not sure if you would call this a profiler or not. It lets me isolate sections of code and measure the milliseconds required to complete a tick of the frame in each. I use it to A/B test two alternate methods of doing something. Sometimes I use code straight from the game, and sometimes I use a blank testing scenario.

More often than not, when I'm testing pieces of code, I need to repeat them 100, 1,000, or 100,000 times per tick to amplify any inefficiencies and expose the difference between my A and B tests. Otherwise, the result is basically the same no matter what code I test, even code that I know is CPU-intensive: it runs fast. Too fast to accurately measure. Usually everything reports as taking about 8ms, which is no doubt a result of the engine being locked at 120fps (1000 / 8 = 125).
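In Python terms, my amplify-and-compare approach looks something like this (a rough sketch only; my actual tests are built in Fusion's event editor, and the two variant expressions here are made up for illustration):

```python
import time

def ab_test(variant_a, variant_b, repeats=100_000):
    # Run each variant `repeats` times so that per-call costs too
    # small to measure on their own become visible in the totals.
    start = time.perf_counter()
    for _ in range(repeats):
        variant_a()
    a_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    for _ in range(repeats):
        variant_b()
    b_ms = (time.perf_counter() - start) * 1000
    return a_ms, b_ms

# Hypothetical A/B: branch-based vs arithmetic-based direction term.
x_direction = -1
a_ms, b_ms = ab_test(
    lambda: 180 if x_direction == -1 else 0,
    lambda: 180 * abs(x_direction - 1) * 0.5,
)
print(f"A: {a_ms:.2f} ms   B: {b_ms:.2f} ms")
```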

Please have a look at the 2nd post in the thread below, where I've reported some of the results of my testing. No need to read the whole post - just glance over it to get the gist:

https://community.clickteam.com/threads/101380-The-Clickteam-Fusion-2-5-optimisation-amp-performance-Hard-Data-thread?p=720013&viewfull=1#post720013

Also, please see this post from the 4th page, which has screenshots of my actual testing setup (the grey box with a "VACCiNE" watermark shows the results):

https://community.clickteam.com/threads/101380-The-Clickteam-Fusion-2-5-optimisation-amp-performance-Hard-Data-thread?p=722149&viewfull=1#post722149

So, I've done a lot of testing. Most of it hasn't been testing actual real-life code from the game (as I said, that doesn't usually yield results with any helpful granularity), but often it has been variants or simulations of it, amplified or stress-tested to better expose differences between A and B tests.

As you can see in the first link, I tried to identify things which have a big impact vs things which have a small or negligible impact. Only after this big round of testing did I start to actually optimise my code. For example, hiding fastloops in closed groups proved to be extremely effective in the tests, so this was one of the first things I did.

There's probably another difference between my situation and yours when you were Tech Lead at Apple. You were, presumably, working with programmers who knew what they were doing. Much of the earlier code in Spryke was cobbled together when I was basically brand new to Fusion. Some of that older code was in desperate need of an overhaul, and not just for performance. Some of it was just really messy, non-modular and tangled with other things, and it made my life harder having to work with something that would so easily break and that I couldn't even remember how it worked. So that code had to go, and I figured while I was at it, I'd give the rest of it a spruce up too, as the more I fixed of the very early code, the more glaring inefficiencies I started spotting in semi-early code.

The example I posted about here - the vent smoke - is probably something that wasn't strictly necessary to optimise. And indeed, its impact appears to be very minor. But it seemed reasonable to tackle it once I had realised that I could reduce 14 conditions (essentially IF statements) to 3. Reducing conditions proved to be one of the areas of significant impact in my testing (80% faster in one example: see the "Always+condition VS condition+condition" section in the 1st link). While reducing the conditions in the vent smoke code may have negligible impact, I have no doubt that if I do it across all the code, the impact will no longer be negligible.

So, that's a bit more info about how I've gone about it. I hope that it's a little less ad hoc than you initially thought. But I'm sure it's far from perfect, so I'd love to hear your feedback on what I could do better!

bo3b
4/6/2017 06:11:23 am

Hi Dave,

You've done some good work there to get a feel for what ClickFusion does internally, and what might be expensive versus cheap. That's not a complete waste, and you can make more informed architectural decisions based on that. Knowing your tool's weaknesses is worthwhile.

And it's sometimes worth going back to refactor code that is a mess. It depends on the value though. That code has already been debugged, so refactoring it means it's all new again, and will cost you debug time. Might still be worth it, especially for stuff that changes a lot or where the bad/old architecture is causing longer term pain. Always a judgment call, but just be aware that it's never free, and not always time well spent.


As far as actual profiling though, none of this qualifies. And none of this is likely to actually solve any performance problems your customers might see.

Here is a tutorial of sorts for Visual Studio profiling. Naturally it does not match ClickFusion, but if you skim it, you can get a rough idea of what we are looking for.

https://msdn.microsoft.com/en-us/library/ms182372.aspx

Because software in general has somewhere on the order of 90% of the code getting run once or zero times, and 10% doing all the work. Of that 10%, some tiny fraction is actually a bottleneck. Trying to narrow that down by inspection or high level experiments is nearly impossible.

Changing stuff like you've found from experiments is satisfying in the code-monkey sense, cleanup is always deeply satisfying. But, none of that actually translates into any performance advantage, because you are changing stuff in the 90% code that barely gets used. Even if you know it's in the 10% high-use part, you are unlikely to guess the bottleneck that actually matters.


The crux of the problem is defining a good test case. What is slow? Where do you see slowdowns or performance hitches, or bad 'feel'? Is it a question of just overall frame rate is too slow?

Your example of the flip horizontal is a good one, where you would have been able to find that quickly if you were doing profiling. You note that it 'feels' slow when going a particular direction. So that would lead to a test case, where you enable the profiling upon a change in direction, and disable it when you change again. Or change profile output logs.

Using that example, you'd immediately note a difference between left and right, which is illogical, and then can dig further to find out why.

The essence of a profile is to specifically not try to guess what is happening, and just ask the computer. With a good test case, a known trouble spot, you just profile it and quickly get the top 10 CPU or GPU users (routines/objects/methods), and then figure out from there what you can change to improve it.

If you hit even the top one on the list, you'll probably reduce the impact of that performance problem to where there is something else more interesting to work on. Maybe even adding levels, better graphics or non-performance issues.


Your A/B testing is a good way to determine that you can fix a specific problem, but it's not a good way to actually find the problem in the first place. That's because you are having to decide a priori what a problem might be, and as I'm noting- you'll rarely win the guess the bottleneck game.

But, say for example, you profile, and find out a given routine is the bottleneck. Using your A/B you can then directly compare what it was doing, to what it is now doing, and seeing that you were successful. Normally, I'd just do another profile to be sure I killed the bottleneck, but your A/B could also work.


Profiling is taking the entire code base into account, and specifically not ignoring things that "shouldn't matter".

Good profiling tools will just hand you the answer on a silver platter. Like in the example website, a list of the ten functions doing the most work. Hopefully these map to something you are doing in your code, but there may be required translation of what ClickTeam is doing with your commands. You will probably have some abstraction that takes a lot of time.
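To give a rough feel for what such a report looks like, here is a minimal sketch using Python's built-in cProfile (the function names are invented for illustration; Fusion would need its own equivalent):

```python
import cProfile
import io
import pstats

def hot_inner():
    # Deliberately expensive routine: the bottleneck we want exposed.
    return sum(i * i for i in range(10_000))

def game_tick():
    # A made-up per-frame update that calls the hot routine a lot.
    for _ in range(50):
        hot_inner()

profiler = cProfile.Profile()
profiler.enable()
game_tick()
profiler.disable()

# Sort by cumulative time and print the top 10 entries: the profiler,
# not guesswork, names the functions doing the most work.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```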

If it makes sense, I can do a VisualStudio level profile of your game for you. I don't expect you'll probably want to learn all this stuff, but maybe. If you can generate an .exe that has symbolic information, routine names, and describe a good test case- Then I can profile it.


bo3b
4/6/2017 06:15:11 am

--- Well... what a surprise- I type too much. :-> Here is the remaining.
---

Another example would be to roll-your-own profiling. At the heart, profiling is simply measuring time spent in every routine. It's possible to just use a system wide timer, and measure on input and output of every routine, and just sum up the microseconds.

In object oriented languages, you can sometimes do easy on-entry/on-exit routines, which is a nice and clean way to add profiling code.

The goal is to get a complete summary of your routine names with the top users sorted. Sometimes it will be what you expect, and at some point, you'll run into limitations you have no control over. But it's better to know, than to not know.
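Sketched in Python, that entry/exit idea might look like the following (the routine names are invented for illustration):

```python
import time
from collections import defaultdict

totals_us = defaultdict(float)  # accumulated microseconds per routine

def profiled(fn):
    # Measure on entry and exit of every call, and sum the microseconds.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            totals_us[fn.__name__] += (time.perf_counter() - start) * 1e6
    return wrapper

@profiled
def update_physics():
    sum(i * i for i in range(2_000))

@profiled
def draw_frame():
    sum(i * i for i in range(200))

for _ in range(60):  # simulate 60 game ticks
    update_physics()
    draw_frame()

# Summary with the top time users sorted first.
for name, us in sorted(totals_us.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {us:.0f} us")
```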


In terms of testing performance, you need to have a test machine that is the slowest machine you expect to support, and use it as your profiling/test machine.

Maybe that is Intel graphics, maybe just a low spec machine. But, you really do not want to use your best machine for this, because the bottlenecks move to other locations depending upon the hardware.


The goal of profiling is to save you time. Cleaning up the code is emotionally/professionally satisfying, but it's not the best use of your time. Your example of the smoke from the vents is a case in point. Satisfying, because it's simpler and cleaner- but no actual benefit really, just different. Does it really matter that it was 14 conditions? The computer doesn't care. The routine already worked. Working on that meant you have less time for other more valuable work.

Here is an old article I wrote, long, long ago, that is still valid today, on how to focus on what matters. There are of course hundreds of these sorts of things. It's general wisdom, that directly speaks to some of what you have already mentioned.

http://www.mactech.com/articles/develop/issue_25/veteran.html

As a professional software guy, I use this sort of stuff as a metric when interviewing to determine if I'm talking to a junior software person, or a senior software person. The difference is subtle, but also profound in terms of productivity. You will perhaps not be surprised that the code I ran across at Apple was nearly always from junior level programmers. They could build stuff, but not good stuff.


So... long story. Your approach is OK, but isn't going to find the big bottlenecks, and is also inefficient. Not trying to be harsh, just calling it like I see it.

Please let me know if that doesn't clarify what I mean, or if I can help.

Thanks,
bo3b


    Author

    I'm Dave Bleja. I quit my career to make one of the most immersive, deeply crafted platformers of this generation.

    Spryke is the story of a dimension-hopping cyberfish who discovers the joys of dry land!






© 2015 Volnaiskra