After watching the James Bond movie "Skyfall," one scene where the character Q is "hacking" the villians computer showed him analyzing a program which has had it's code obfusticated. He seems to have analyzed the call graph of program, and has visualized the graph in 3D. He mentions that every time he tries to "access" the code, it changes shape.
Once I got over the perposterous misuse of terminology like "security through obscurity" and "code obfustication" and "access," I calmed down and let my imagination run wild for a bit with the science fiction I had just witnessed.
First, it is possible that this "code obfustication" was a graph rewriting algorithm, hence the "rubix cube that fights back." But a skilled analyst would probably try to freeze an image of the code in memory after a single rewriting iteration, and use a graph analysis technique on the code that would not require executing the code so as to prevent it from shifting it's shape.
However some graph analysis must, by necessity, partially execute the code to analyze it. There are situations where a branch of execution is decided entirely by an arithmetic computation. You can narrow-down the number of possible branches by analyzing the possible values that could be plugged into the computation to determine reasonable bounds on the resultant values that choose the branch. But if you want to know exactly which branch the program takes, the only way is to actually execute the program up to that branch and then observe what it does next.
For example, you have a large array, the array contains a reference to every function in the program. As you trace the program execution, you arrive at one function where the inputs to the function are passed through a hash algorithm to produce a new key value. This key you got from the hash is then used to select an element from the array, and then the function selected is called. Your call graph now has a branch pointing to every function in the entire program, which is useless to someone trying to figure out the flow of execution.
Now, what if the entire call graph of the program is determined in this way? That is, every function call in the program is determined by hashing input values and selecting the next function to call from the array by the hashed value. Sure, if the hash is unpredictable, your program will just bounce around through every function in the array, not really doing any interesting work and probably getting stuck in an infinite loop eventually.
But what if your hash algorithm was specially designed in such a way that the function calls were not unpredictable? What if your hash algorithm was so perfect that the correct inputs would result in a call graph that actually performed a specific algorithm, performing useful work?
Well, if your "hash" is really just ordinary arithmetic on the input values to the function, that wouldn't be any different than any ordinary computer program. It would have to be more like a hash tree with loops, an initial value is hashed, the result is passed to the function and used to make a choice, the choice taken is passed a hash of the previous hash value, and so on.
This is indeed far-fetched at best, and impossible at worst, but I am curious. I am reminded of Lie groups, and if a program could be expressed as a Lie algebra, and every possible call graph of the program were a local algebra of the global Lie group, for example. I don't even know if that made sense, but I am envisioning every possible initial value producing an entirely different call graph, but these graphs are all somehow related mathematically due to the nature of the hash algorithm.
Of course, why go to all this trouble? If the program doesn't have the initial hash values somewhere, the program won't be able to execute itself. If you include the initial values, an analyst would eventually figure out exactly how to generate the call graph and your program wouldn't be so obscure anymore. Maybe you intend to install the program and initialize it with the correct hash values by hand when you intend to execute it. But then, why not just encrypt your program with an asymetric key, and only decrypt it when you are ready to execute it?
No comments:
Post a Comment