GPUs have become the dominant computing platform for many applications, but programming GPUs with the widely used CUDA parallel programming model is difficult. Since sequential C code is relatively easy to obtain, either from legacy repositories or by manual implementation, automatically translating C into its parallel CUDA counterpart is a promising way to relieve the burden of GPU programming. However, because of the huge differences between the sequential C and parallel CUDA programming models, existing approaches fail at this challenging auto-parallelized program translation. In this paper, we propose a learning-based framework, BabelTower, to address this problem. We first create a large-scale dataset consisting of compute-intensive function-level monolingual corpora. We further propose using back-translation with a discriminative reranker to cope with unpaired corpora and parallel semantic conversion. Experimental results show that BabelTower outperforms the state-of-the-art by 1.79, 6.09, and 9.39 points in terms of BLEU, CodeBLEU, and the specifically designed ParaBLEU metric, respectively. The CUDA code generated by BabelTower attains a speedup of up to 347× over the sequential C code, and developer productivity is improved by up to 3.8×.
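To make the gap between the two programming models concrete, the following sketch shows the kind of function-level translation the task targets: a sequential C loop and a hand-written CUDA counterpart. It is a minimal illustration only, not an example drawn from the BabelTower dataset or its outputs.

    /* Sequential C: element-wise vector addition over n floats. */
    void vec_add(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    /* A CUDA counterpart: the loop body becomes a kernel in which each
       thread handles one element, guarded against out-of-range indices. */
    __global__ void vec_add_kernel(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    /* Typical launch: one thread per element, rounded up to whole blocks.
       vec_add_kernel<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n); */

Even in this simple case, the loop bound turns into a thread guard, data must reside in device memory, and a launch configuration must be chosen; such structural changes are what make the translation far harder than ordinary source-to-source conversion.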
Overview of the BabelTower learning framework. We train the discriminative ranking model in the back-translation step, i.e., CUDA→C→CUDA, which synthesizes paired data. Further, we design the ParaBLEU metric specifically for CUDA, and the model learns to predict the ParaBLEU score by minimizing the KL-divergence between its output distribution and the target distribution.
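As a sketch of this reranker objective (the softmax-over-ParaBLEU target distribution and the temperature \(\tau\) below are assumptions, not details stated in the caption): given \(k\) candidate CUDA translations \(c_1, \ldots, c_k\) of a source function with reference \(r\), one plausible instantiation is

\[
q(c_i) = \frac{\exp\bigl(\mathrm{ParaBLEU}(c_i, r)/\tau\bigr)}{\sum_{j=1}^{k} \exp\bigl(\mathrm{ParaBLEU}(c_j, r)/\tau\bigr)}, \qquad
\mathcal{L}_{\mathrm{rank}} = \mathrm{KL}\bigl(p_\theta \,\|\, q\bigr) = \sum_{i=1}^{k} p_\theta(c_i)\,\log\frac{p_\theta(c_i)}{q(c_i)},
\]

where \(p_\theta\) is the ranking model's softmax distribution over the candidates. Minimizing \(\mathcal{L}_{\mathrm{rank}}\) pushes the ranker to assign higher probability to candidates with higher ParaBLEU scores, so that at inference time it can select the best CUDA translation among the beam of candidates.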