kernel | Lei.Chat()

CodeGen Performant Convolution Kernels for Mobile GPUs

This blog post talks about how to generate performant code for convolution ops using MLIR’s multiple levels of abstraction and transformations. I initially developed it to target ARM Mali GPUs in IREE, but since it is just direct tiling and vectorization, it should be widely applicable. I will walk through the lowering steps, so if you are interested in how to organize MLIR’s various dialects and patterns together to achieve similar tasks, this blog post might also be useful.

2021-09-19

36 min read

android, ml-inference, gpu-performance, compiler, mlir

gpu-codegen

Sampling Performance Counters from Mobile GPU Drivers

In a previous blog post I gave a general introduction to GPU driver internals in Android/Linux systems. Following up on it, today I will explain how a specific functionality, hardware performance counter (perf counter) queries, is handled in both Qualcomm Adreno and ARM Mali drivers by walking through the kernel driver source code.

2021-07-08

10 min read

android, gpu-driver, gpu-performance

gpu-driver

Android/Linux GPU Drivers: Internals and Resources

Recently I have been working on a library that needs to interact directly with GPU kernel drivers from various vendors on Android/Linux systems. Compared to the various GPU APIs, information at this level is quite sparse, so it is not a straightforward task, to say the least; I ended up piecing multiple sources together to figure out the details. So I am logging these driver internals and resources here in case they are useful to others interested in these low-level bits.

2021-07-05

12 min read

android, linux, gpu-driver

gpu-driver