Machine Learning for Result Estimation of Timing, Resource Usage, and Operation Delay in High-Level Synthesis
In high-level synthesis (HLS), obtaining accurate result estimates at this early stage is difficult because the later physical synthesis steps apply complex optimizations. This creates a trade-off between efficiency, i.e. evaluating a design within the HLS stage itself, and accuracy, i.e. waiting for the post-synthesis and post-implementation results produced long after the HLS tool finishes. Machine learning can improve the accuracy side of this trade-off by learning from real benchmarks.
One set of parameters to predict is timing, resource usage, and operation delay. The core methodology is to train an ML model that takes the HLS report as input and outputs a more accurate estimate of the implementation report, without running the time-consuming post-implementation flow; a hypothetical illustration of this input/output mapping is sketched below.
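As a concrete illustration of the mapping the model has to learn, the minimal sketch below represents one training sample: the inputs are estimates taken from an HLS report and the outputs are the corresponding measurements from the post-implementation report. All field names and numbers here are illustrative placeholders, not fields of any particular tool's report.

```python
from dataclasses import dataclass

@dataclass
class HlsFeatures:
    """Inputs: estimates read from the HLS report (names are illustrative)."""
    lut_est: int         # LUTs estimated by HLS
    ff_est: int          # flip-flops estimated by HLS
    dsp_est: int         # DSP blocks estimated by HLS
    bram_est: int        # block RAMs estimated by HLS
    cp_est_ns: float     # clock period estimated by HLS (ns)
    target_cp_ns: float  # clock period requested in the constraints (ns)

@dataclass
class ImplResults:
    """Outputs: ground truth taken from the post-implementation report."""
    lut: int
    ff: int
    dsp: int
    bram: int
    cp_ns: float         # achieved post-routing clock period (ns)

# One training sample: the model learns the mapping HlsFeatures -> ImplResults,
# so post-implementation quality can be predicted without running the slow
# implementation flow. Each design point (design x clock period x device)
# contributes one such pair to the dataset.
sample = (HlsFeatures(1200, 950, 4, 2, 3.1, 3.0),
          ImplResults(1480, 1010, 4, 2, 3.6))
```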
The workflow can be broadly divided into two steps:
Data Processing
Like any ML model, an HLS estimator requires training and testing data. The HLS and implementation reports are usually collected by running each benchmark design through the complete C-to-bitstream flow for various target clock periods and FPGA devices. Features extracted from the HLS reports then serve as the model inputs, and features extracted from the implementation reports serve as the ground-truth outputs. Because the resulting dataset is large and many features are strongly correlated, feature selection is applied to counter collinearity, reduce the dimensionality of the data, and retain only the most significant features while discarding the unimportant ones (see the sketch after the figure below).
[Figure: Selected features after dimensionality reduction]
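A minimal sketch of this feature-selection step, assuming the collected features sit in a pandas DataFrame: it first drops one feature out of each highly correlated pair (to reduce collinearity) and then keeps the k features most predictive of the target. The threshold and k are illustrative choices, not values from any published flow.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

def select_features(X: pd.DataFrame, y: pd.Series,
                    corr_threshold: float = 0.95, k: int = 20) -> pd.DataFrame:
    """Reduce collinearity, then keep the k most predictive features."""
    # 1) Drop one feature from every pair whose absolute pairwise
    #    correlation exceeds the threshold (collinearity reduction).
    corr = X.corr().abs()
    cols = list(X.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in to_drop and b not in to_drop and corr.loc[a, b] > corr_threshold:
                to_drop.add(b)
    X = X.drop(columns=sorted(to_drop))

    # 2) Univariate selection: keep the k features with the strongest
    #    linear relationship to the post-implementation target.
    k = min(k, X.shape[1])
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    return X.loc[:, selector.get_support()]
```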
Training the Estimation Models
Once the refined dataset is available, regression models are trained to estimate post-implementation resource usage and clock period. Commonly used metrics for reporting the estimation error are the relative absolute error (RAE) and the relative root mean squared error (RMSE); lower values of both indicate more accurate estimates.
$$\mathrm{RAE} = \frac{\sum_{i=1}^{N}\lvert y_i' - y_i\rvert}{\sum_{i=1}^{N}\lvert \bar{y} - y_i\rvert}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i' - y_i}{y_i}\right)^2}\times 100\%$$
where
$y'$ = vector of values predicted by the model,
$y$ = vector of actual ground-truth values in the testing set,
$\bar{y}$ = mean value of $y$,
$N$ = number of samples,
$y_i'$ = predicted value of sample $i$,
$y_i$ = actual value of sample $i$.
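A sketch of the training and evaluation step under the same assumptions as above. A gradient-boosting regressor is used purely as an example (the approach is not tied to a particular model family), and the two error functions follow the RAE and relative RMSE definitions given above; the relative RMSE assumes non-zero target values.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def rae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Relative absolute error: sum|y' - y| / sum|ybar - y|."""
    return float(np.sum(np.abs(y_pred - y_true)) /
                 np.sum(np.abs(np.mean(y_true) - y_true)))

def relative_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Relative RMSE in percent: sqrt(mean(((y' - y) / y)^2)) * 100.
    Assumes strictly positive targets (e.g. LUT counts, clock periods)."""
    return float(np.sqrt(np.mean(((y_pred - y_true) / y_true) ** 2)) * 100.0)

def train_and_evaluate(X, y):
    """Fit a regressor for one target metric (e.g. LUT count or clock period)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
    pred = model.predict(X_test)
    return model, rae(np.asarray(y_test), pred), relative_rmse(np.asarray(y_test), pred)
```

In practice one such model is trained per estimated quantity (LUTs, FFs, DSPs, BRAMs, clock period), and the test-set RAE and RMSE are reported for each.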
The results are reported in terms of maximum clock frequency, throughput, and throughput-to-area ratio for the RTL code generated by the HLS tool.
Q: Apart from estimation of the mentioned parameters, where else can it be used?
A: It can also be used for cross-platform performance prediction.