uvkcompute

CodeGen Performant Convolution Kernels for Mobile GPUs

This blog post explains how to generate performant code for convolution ops using MLIR’s multiple levels of abstraction and transformations. I initially developed the approach to target ARM Mali GPUs in IREE, but since it is just direct tiling and vectorization, it should be widely applicable. I will walk through the lowering steps, so if you are interested in how to organize MLIR’s various dialects and patterns to achieve similar tasks, this blog post might also be useful.

2021-09-19

36 min read

android, ml-inference, gpu-performance, compiler, mlir

gpu-codegen