ml-inference | Lei.Chat()

Single-node ML Runtime Foundation

Previous blog posts gave an overview of the MLIR dialect hierarchy for kernel code generation (CodeGen) and zoomed in on the Linalg and Vector dialects among them. Now I will switch to the runtime side for a bit, in order to provide a holistic view of MLIR-based machine learning (ML) compilers. This post covers the foundation and basics, including the target landscape, runtime requirements, and designs to meet them.

2023-04-01

18 min read

runtime, mlir, ml-inference

ml-inference, compiler-development

MLIR Linalg Dialect and Patterns

I explained the Vector dialect and related patterns in the previous blog post. In this one, let us look one layer higher and talk about the Linalg dialect and the transformations around it.

2022-08-31

13 min read

compiler, ir, mlir, ml-inference

compiler-development

MLIR Vector Dialect and Patterns

The Vector dialect and related transformations are crucial components in the MLIR CodeGen flow for machine learning (ML). Today I will zoom in on it to explain its positioning in the overall picture, its characteristics, important operations and transformations, and best practices for using it based on my experience.

2022-07-31

36 min read

compiler, ir, mlir, ml-inference

compiler-development

MLIR CodeGen Dialects for Machine Learning Compilers

The initial blog post in this series captured my overall take on the evolution trends of compilers and IRs. It also touched on LLVM IR, SPIR-V, and MLIR, explaining the problems they address and their design focuses. Today I will expand on MLIR and talk systematically about its dialect hierarchy for machine learning (ML) compilers.

2022-02-20

16 min read

compiler, ir, mlir, ml-inference

compiler-development

CodeGen Performant Convolution Kernels for Mobile GPUs

This blog post talks about how to generate performant code for convolution ops using MLIR’s multiple levels of abstractions and transformations. I initially created it for targeting ARM Mali GPUs in IREE. But given that it is just direct tiling and vectorization, it should be widely applicable. I will walk through the lowering steps, so if you are interested in how to organize MLIR’s various dialects and patterns to achieve similar tasks, this blog post might also be useful.

2021-09-19

36 min read

android, ml-inference, gpu-performance, compiler, mlir

gpu-codegen

GPGPU, ML Inference, and Vulkan Compute

Nowadays GPUs are used for both graphics rendering and general-purpose compute (GPGPU). For the latter, CUDA is the indisputable leading solution. However, with so many other GPU vendors, the quest for a GPGPU standard never stops. OpenCL was a great attempt and is widely used, but it still falls short in many aspects. Given the success of Vulkan in graphics and the fact that it is both a graphics and a compute API, one would wonder whether it can actually be the next-generation GPGPU standard. I certainly believe so, but the road is not full of roses.

2021-07-25

19 min read

android, ml-inference, vulkan-compute

vulkan-compute

Edge/Mobile ML Inference Challenges

These days, if you would like to learn about machine learning, there are abundant great resources on the web discussing model architectures and how to code and train them. Materials about inference, though, are generally much harder to find, especially for edge and mobile. You might ask: inference is just the forward pass of training, so how hard can it be? Actually, it faces lots of unique challenges, to the extent that we are basically solving completely different major problems. I have been working on inference at the edge for a while, so let me capture those challenges in this blog post by contrasting them with training and inference in the cloud.

2021-07-17

11 min read

android, ml-inference

ml-inference