Open Thoughts

GCC + Machine Learning

Posted by Cheng Soon Ong on July 5, 2009

I recently found a cool project that applies machine learning to something that affects most of us who write software. Milepost is a project that uses statistical machine learning to optimize gcc. They point to for further development. To quote what they hope to do with milepost gcc:

"Next, we plan to use MILEPOST/cTuning technology to enable realistic adaptive parallelization, data partitioning and scheduling for heterogeneous multi-core systems using statistical and machine learning techniques."

There is a lot of infrastructure that needs to be built before coming to the machine learning. In the end, the machine learning question can be stated as follows:

Given M training programs, represented by feature vectors t1,...,tM, the task is to find the best optimization (e.g. compiler flags) for a new program t. In the standard supervised setting, they collect training data for each program ti consisting of pairs of an optimization (x) and its run time (y). The machine learning question then boils down to finding the parameters theta of a distribution over good solutions, q(x|t,theta), i.e. a distribution over the right compiler settings (x) for a given program (represented by features t).
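To make the notation concrete, here is a minimal sketch of estimating theta for a single training program. It assumes, purely for illustration, that an optimization x is a tuple of on/off flags and that q factorizes into one independent probability per flag, fitted on the fastest fraction of the measured runs. The function name, the flag encoding and the toy run times are all hypothetical, and this is not claimed to be what Milepost actually does.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: one 0/1 entry per compiler flag.
Flags = Tuple[int, ...]


def fit_flag_probabilities(
    runs: Sequence[Tuple[Flags, float]], keep_fraction: float = 0.25
) -> List[float]:
    """Estimate theta for one program.

    theta[j] is the fraction of the fastest runs in which flag j was on,
    i.e. the parameters of an independent per-flag distribution over
    "good" settings. The independence is an assumption of this sketch.
    """
    if not runs:
        raise ValueError("need at least one (flags, run_time) pair")
    ranked = sorted(runs, key=lambda pair: pair[1])  # fastest first
    n_keep = max(1, int(len(ranked) * keep_fraction))
    good = [flags for flags, _ in ranked[:n_keep]]
    n_flags = len(good[0])
    return [sum(flags[j] for flags in good) / len(good) for j in range(n_flags)]


# Toy (x, y) pairs for one program: flag settings and made-up run times.
runs = [
    ((1, 0, 1), 1.9),
    ((1, 1, 1), 2.0),
    ((0, 0, 1), 3.5),
    ((0, 1, 0), 4.1),
    ((1, 0, 0), 2.4),
    ((0, 0, 0), 4.0),
    ((1, 1, 0), 2.6),
    ((0, 1, 1), 3.4),
]
theta = fit_flag_probabilities(runs)
print(theta)
```

With eight runs and keep_fraction=0.25, only the two fastest settings contribute to theta. Conditioning on the program features t is not handled here; this sketch fits one theta per training program.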

However, it seems that they use uniform sampling to search q(x|t,theta) for good solutions, and once they have these islands of good solutions they use 1-nearest neighbour for prediction. There seems to be a lot of scope for improvement on the machine learning side.
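As a rough illustration of that baseline, the sketch below samples flag settings uniformly at random for each training program, keeps the fastest one found, and then predicts for a new program by copying the setting of its nearest neighbour in feature space. Everything specific here is an assumption made for the example: the two-dimensional feature vectors, the on/off flag encoding, and the measure callbacks, which stand in for actually compiling and timing a program.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Flags = Tuple[int, ...]          # hypothetical: one 0/1 entry per compiler flag
Features = Tuple[float, ...]     # hypothetical program feature vector t


def uniform_search(
    measure: Callable[[Flags], float],
    n_flags: int,
    n_samples: int,
    rng: random.Random,
) -> Flags:
    """Draw flag settings uniformly at random and keep the fastest one."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    best_flags: Flags = tuple(rng.randint(0, 1) for _ in range(n_flags))
    best_time = measure(best_flags)
    for _ in range(n_samples - 1):
        flags = tuple(rng.randint(0, 1) for _ in range(n_flags))
        run_time = measure(flags)
        if run_time < best_time:
            best_flags, best_time = flags, run_time
    return best_flags


def predict_1nn(
    features: Sequence[float], training: List[Tuple[Features, Flags]]
) -> Flags:
    """Return the stored flags of the training program closest to `features`."""
    nearest = min(training, key=lambda item: math.dist(item[0], features))
    return nearest[1]


def make_toy_measure(target: Flags) -> Callable[[Flags], float]:
    """Stand-in for compile-and-time: fastest when flags equal `target`."""
    return lambda flags: 1.0 + sum(a != b for a, b in zip(flags, target))


rng = random.Random(0)
# Two toy training programs with made-up features and hidden best settings.
programs = [((0.0, 0.0), (1, 0, 1)), ((10.0, 10.0), (0, 1, 0))]
training = [
    (feats, uniform_search(make_toy_measure(best), n_flags=3, n_samples=200, rng=rng))
    for feats, best in programs
]
prediction = predict_1nn((1.0, 2.0), training)
print(prediction)
```

The 1-nearest-neighbour step uses no information about how run time varies with the flags beyond the single best setting per program, which is one place where a richer model could help.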

