Compilers and IRs: LLVM IR, SPIR-V, and MLIR

2022-01-08
27 min read

Compilers are often critical components in various development toolchains that boost developer productivity. A compiler is normally used as a monolithic black box that consumes a high-level source program and produces a semantically equivalent low-level one. Internally, though, it is still structured into layers; what flows between those layers are called intermediate representations (IRs).

IRs are critical to compilers. Just as there are many compilers, there are many IRs in use. I’m fortunate to have direct experience with three major schools of IRs and infrastructures thus far—LLVM IR, SPIR-V, and MLIR—particularly the last two, both of which I joined at an early stage of development. So I’d like to write a series of blog posts to write down my understanding of compilers and IRs. Hopefully it will be useful to others.

This is the first one, where I’ll sketch the trend of compiler and IR development in general. The focus is on explaining why, in my opinion, existing IRs are the way they are, not on their exact mechanisms or how to use them. The what and how are better served by various language specifications and tutorials. This inevitably means the discussion will be abstract and philosophical, or meta, to use recent trendy terms.😉 I’ll be more concrete in later blog posts. Also, I’ll try to explain things in a comparative fashion, because anchoring new concepts on existing ones typically makes them easier to grasp.

Oh, by the way, I may fork off here and there to talk about domains other than compilers. But I promise they are still related and I won’t digress for too long; threads will join eventually. So, have some popcorn while busy waiting.

Without further ado—

Compilers and IRs

Before diving into concrete IRs, let’s discuss compilers and IRs in general.

Abstractions and semantics

Despite astounding technology advancements, human brains have changed little since the dawn of civilization. The way we comprehend the ever-growing complexity almost everywhere is via abstractions. Abstractions help us ignore minor factors and focus on major aspects. They reduce the number of variables the human brain needs to juggle, and that’s essential.

What we derive from abstracting a problem is commonly called a model, e.g., a machine model (in programming languages) for describing the computing machine in an abstract fashion, a data model (in data systems) for describing how one datum relates to another, or a system model (in distributed systems) for describing timing and failure assumptions.

A model is typically an idealized description of the original entity. Models are sometimes very far away from reality; it’s hard, for example, to come up with a model for the stock market. Still, they are indispensable for keeping complexity at a particular level contained—models typically give us clear semantics, which are principles that must hold by definition. That allows us to reason about the entity we are modeling in a logical or mathematical way. Human brains just crave logic and explainability.

It may seem that I digressed greatly here; but really this is to explain, in a broader sense, the jargon we’ll see a lot in compiler development discussions—abstractions, models, semantics, and reasoning.

Other domains in computer science care about these aspects too; for instance, we’d want to write our object-oriented classes with clear abstractions. But oftentimes there is more art to it than science. Compilers, on the other hand, approach abstractions more from the science perspective; they want clear semantics so that they can prove the transformations they perform are correct w.r.t. the source program. That introduces—

Correctness and optimization

The top concern for compilers is correctness; optimization is always second to that. Generating “performant” code does not matter in any sense if it does not follow the original program’s intent.

Correctness cannot be defined and agreed upon without clear semantics. Unknown operations tie the hands of compilers; the only safe option is to do nothing with them. Correctness needs to be maintained through the full transformation flow inside compilers. So there are “boundaries” inside compilers: within each transformation step (typically called a pass), there might be violations of some form, but after it, the resultant code should be correct. Compilers rely heavily on validation to check correctness after transformation steps.

Only after correctness is established can we talk about optimization and performance. Generating performant code seems an unequivocal goal; however, there are still plenty of subtleties.

Different source code exhibits different programming patterns, and different hardware favors different instruction sequences. Compilers, sitting at the connection between source code and target hardware, really have limited choices regarding performance, because they need to trade off among many factors to make sure optimization transformations are indeed beneficial for the majority of cases. That’s a very high bar to meet, especially for compilers handling general-purpose programming languages like C++. So in the end, only quite a few transformations (like dead code elimination, constant folding, canonicalization, etc.) can be run universally; lots of other transformations are pushed down to a second target-specific optimization phase.
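To make “constant folding” a bit more concrete, here is a minimal sketch over a hypothetical toy expression IR (the Expr type and tryFold function are made up purely for illustration; this is not how LLVM implements it). Note how the folder must give up as soon as it reaches something whose value it cannot prove—echoing the earlier point that unknown semantics tie a compiler’s hands:

```cpp
#include <memory>
#include <optional>

// A toy expression IR: integer constants, unknown variables, and two ops.
struct Expr {
  enum Kind { Const, Var, Add, Mul } kind;
  int value = 0;                   // Meaningful only when kind == Const.
  std::unique_ptr<Expr> lhs, rhs;  // Meaningful only for Add/Mul.
};

// Try to fold an expression into a single constant, bottom-up. If any
// sub-expression is unknown (a Var), the only safe choice is to do nothing
// and report failure.
std::optional<int> tryFold(const Expr &e) {
  switch (e.kind) {
  case Expr::Const:
    return e.value;
  case Expr::Var:
    return std::nullopt;  // Value unknown at compile time.
  case Expr::Add:
  case Expr::Mul: {
    auto l = tryFold(*e.lhs);
    auto r = tryFold(*e.rhs);
    if (!l || !r)
      return std::nullopt;
    return e.kind == Expr::Add ? *l + *r : *l * *r;
  }
  }
  return std::nullopt;
}
```

Real compilers play the same game on their actual IR, just with far more operation kinds and much more careful handling of overflow and other semantic corner cases.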

As programs flow through compiler layers, more and more high-level information dissolves into low-level, detailed instructions; this is called lowering. The reverse direction is called raising, which is typically significantly harder, as it tries to recover the big picture from messy details. Lowering is the normal flow inside compilers. Thus we can see that transformations happening at a later stage are structurally disadvantaged because high-level information is missing. That limits what target-specific optimizations can do.

The fundamental problem behind this is strong coupling: among different application domains (especially when using a general-purpose programming language), and among different vertical transformation paths inside the compiler (especially when sharing the same common IR).

Decoupling is the general approach to enabling more complex systems and advanced use cases, as witnessed in lots of domains, if not all. Going forward, domain-specific languages and decoupled compiler internals would better unleash the power of optimizations. More on this later when discussing MLIR.

In summary, a mature compiler may deliver the majority of the performance, say 80%, for all the cases it supports. It’s impractical to expect top performance for all applications. But that’s precisely the benefit it brings: it saves a huge amount of engineering effort to get that 80% and lets developers focus on the remaining 20%.

Productivity tools

It might be quite obvious already, but to reiterate, compilers are really tools for developer productivity. It might seem technically geeky and cool to write assembly directly, but it’s hardly a productive approach (though it is a totally valid way to get the best performance if done properly). Being able to write in a higher-level language frees developers from thinking about registers and instructions on the chip, which is tedious, time-consuming, and error-prone. It’s a significant boost to productivity.

This brings us to how we manage the ever-growing complexity—we have abstractions at different levels; we just need to build tools to automate the conversion between those levels of abstraction.

Not all such conversions can be automated; but for those that can, we can view their conversion tools as compilers in a broader sense. For example, in data processing systems, Apache Beam provides a unified abstraction for describing data processing jobs and converts them to whatever abstractions are provided by underlying execution engines like Spark or Flink, which then eventually compile down to tasks and orchestration on concrete machines. The full flow here can be thought of as compilation.

The existence of compiler-like tools improves developer productivity greatly by hiding details at a lower level and letting developers focus on a higher-level description of the problem. Compilers automatically convert between abstraction levels and bridge the gap by running a sequence of transformations on IRs.

IR forms and compatibility

IRs, as the name shows, are just representations of programs. They are designed to make transformations easier. (Well, mostly, as we will see later they can fulfill other purposes.)

In the early days, a compiler could have one single IR for all internal use, but as compilers evolve and toolchains become more and more complex, the boundaries blur: nowadays compilers can have multiple levels of representations internally (e.g., a C++ program compiled via Clang goes through Clang AST, LLVM IR, MachineInstr, MC, etc.). We can also see the full compiler being split into offline and online stages (e.g., for GPU), and the program representation in between can also be called an IR (e.g., SPIR-V), albeit one that is not entirely internal anymore.

An IR can have three forms: an in-memory form for efficient analysis and transformation, a bytecode form for persistence and exchange, and a textual form for human inspection and debugging. There are interesting design tradeoffs to be made here regarding which form is central and how much compatibility is supported, depending on the use case. For example, LLVM IR is intended for compiler-internal use, so it opts for efficient processing of the in-memory form and weak compatibility, while SPIR-V is intended for hardware driver consumption, so it opts for fast processing of the bytecode form and strong compatibility. There is no right or wrong; it’s a matter of satisfying the needs. But it does greatly affect the whole ecosystem built on top of them.
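To make the three forms concrete for LLVM IR, here is a hedged sketch (assuming a standard LLVM development setup and linking against the LLVM libraries) that parses a textual module into the in-memory form and then serializes it into the bitcode form:

```cpp
#include "llvm/AsmParser/Parser.h"       // Textual form -> in-memory form.
#include "llvm/Bitcode/BitcodeWriter.h"  // In-memory form -> bytecode form.
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  // Textual form: human-readable, great for inspection and debugging.
  const char *ir = R"(
    define i32 @add(i32 %a, i32 %b) {
      %sum = add i32 %a, %b
      ret i32 %sum
    }
  )";

  llvm::LLVMContext context;
  llvm::SMDiagnostic err;
  // In-memory form: what analyses and transformations actually operate on.
  std::unique_ptr<llvm::Module> module =
      llvm::parseAssemblyString(ir, err, context);
  if (!module) {
    err.print("ir-forms-demo", llvm::errs());
    return 1;
  }

  // Bytecode (bitcode) form: a compact serialization for persistence/exchange.
  std::error_code ec;
  llvm::raw_fd_ostream out("add.bc", ec);
  if (ec)
    return 1;
  llvm::WriteBitcodeToFile(*module, out);
  return 0;
}
```

(The llvm-as and llvm-dis command-line tools perform essentially the same conversions between the textual and bitcode forms.)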

IR design considerations

There are no universal rules that guarantee a good IR design. Most of the time, it’s about weighing different tradeoffs and making a choice. Such tradeoffs involve how common or special the case at hand is, the combinatorial cost it brings to the whole compiler stack, the impact on transformations, and so on.

But as general guidelines:

  1. Operations in an IR need to have clear semantics. As explained before, this is the foundation of correctness.
  2. Then we typically want operations to be orthogonal if possible; this makes it possible to have a canonical form and reduces the number of cases we need to consider when writing transformations.
  3. We also typically don’t want to have duplicated information from different pieces of the IR—that runs a high risk of inconsistency after various transformations.
  4. It’s also beneficial to retain high-level information as much as possible, because lowering it away and then raising it back when needed is very hard, if possible at all.
  5. And many others.

These all seem like reasonable guidelines, right? Yet I can actually give counterexamples for most of them:

  1. If the hardware has a special function unit, we would want to expose that, even if it introduces non-orthogonal operations in the IR. For example, it’s common to see fused multiply-add operations in GPU IRs in addition to the separate multiply and add operations.
  2. In SPIR-V, a module declares all the capabilities it needs upfront; that information could be deduced by running analysis on the IR, so it is duplicated information. But it saves the GPU driver compiler from performing such analysis at runtime, thus improving runtime efficiency.
  3. As for retaining high-level information, if the original input is low level enough (like C), then we have no choice but to raise the abstraction level, for example by performing auto-vectorization on scalar source code.

The above is really to highlight that IR design is full of tradeoffs and is specific to domains and use cases. As discussed previously, it’s natural to have conflicting goals if the compiler is trying to serve a broad range of applications and one IR is trying to serve different vertical conversion flows. Different needs make it hard to identify uncontroversial directions in discussions. That means losing the ability to iterate and evolve fast, which can be costly.

Unbundling compilers by going domain specific and with separate levels/pieces of IRs would be helpful here. This allows each domain to shape its compilers according to its own characteristics. Different levels/pieces of IRs can focus only on the use cases they intend to serve and thus make the most suitable choices.

LLVM IR

LLVM was initially released in 2003. After almost 20 years of development, it is very mature and has a fantastic ecosystem. Many frontend programming languages and backend hardware targets are supported. Many software or hardware vendors release their own modified fork of LLVM to support their own stacks. I think the significance of LLVM to the industry does not need any belaboring.

Decoupling and modularizing compilers

The most important thing LLVM brings is the practice of decoupling and modularization. The plethora of wonderful libraries and tools built around LLVM is just a natural result of that.

In the pre-LLVM era, compilers were quite ad-hoc and monolithic. Those compilers could still consist of three phases like current LLVM-based compilers—a frontend for parsing the source language, an optimizer for performing optimizations, and a backend for generating target machine code—but they typically only focused on a specific language (family) or target hardware. Different compiler stacks shared little. It was not possible to leverage the existing frontend/backend support in one compiler stack and plug in new ones, due to the tight coupling and monolithic nature. That impeded the reuse of compilation logic and thus did not really deliver the retargetability promised by three-phase compilers.

LLVM revolutionized the above with decoupling. At the center is clearly LLVM IR, which can fully represent programs using control flow graphs and basic blocks containing instructions in SSA form. This completeness makes it self-contained, so it can be detached from other representations and serve as the sole intermediate exchange between frontends and backends. That decouples frontends and backends entirely.

Then all we need is to enforce good practices of modularization. The LLVM codebase is organized as a set of libraries. Granted, libraries have their problems, but they are battle-tested as the way to do system-level modularization. Libraries put relatively clear boundaries between compiler components. By exposing proper APIs, they also allow one to pick and mix the compiler functionality of interest and perform various tasks, like static analysis and formatting by calling into Clang libraries. All of that is super useful. (Think of the time clang-format saves us from arguing about styles and formatting code manually!)
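As a small, hedged illustration of this library-based modularity, the sketch below (again assuming a standard LLVM setup) calls into the LLVM libraries directly to build the in-memory IR for a trivial function with IRBuilder, verify it, and print its textual form:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::LLVMContext context;
  llvm::Module module("demo", context);
  llvm::IRBuilder<> builder(context);

  // Create i32 @add(i32, i32) with a single entry basic block.
  llvm::Type *i32 = builder.getInt32Ty();
  llvm::FunctionType *fnType =
      llvm::FunctionType::get(i32, {i32, i32}, /*isVarArg=*/false);
  llvm::Function *fn = llvm::Function::Create(
      fnType, llvm::Function::ExternalLinkage, "add", &module);
  llvm::BasicBlock *entry = llvm::BasicBlock::Create(context, "entry", fn);
  builder.SetInsertPoint(entry);

  // Emit SSA-form instructions: %sum = add i32 %a, %b; ret i32 %sum.
  llvm::Value *sum = builder.CreateAdd(fn->getArg(0), fn->getArg(1), "sum");
  builder.CreateRet(sum);

  // The verifier checks that IR invariants (e.g., well-formed SSA) hold.
  if (llvm::verifyFunction(*fn, &llvm::errs()))
    return 1;
  module.print(llvm::outs(), /*AnnotationWriter=*/nullptr);
  return 0;
}
```

The same libraries back clang, opt, llc, and friends; an out-of-tree tool just links against the pieces it needs.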

Textual IR forms

In addition to decoupling and modularization, LLVM IR also introduces many other usability and productivity improvements. The native support of a textual IR form in addition to the in-memory form brings the traditional UNIX philosophy back to compilers—have each tool do one simple job and chain tools together with files and pipes.

In UNIX-like systems, files are the universal abstraction for resources. Textual files, in particular, are the medium through which the majority of tools exchange information. They are powerful enough to support different needs and yet intuitive to use. It’s just so natural to dump the textual representation in the middle of a processing pipeline (e.g., cat <file> | cut -f2 | sort | uniq -c) to inspect or debug the intermediate state. Nothing beats simplicity in the long run!

It’s hard to argue that compiler components, even modularized, are simple tools. But the textual LLVM IR form does serve the same purpose as UNIX files: it makes chaining tools and inspecting internal state simple. That includes testing the compiler itself with FileCheck, because we can just feed in human-readable textual forms and check against the output textual forms.

The other side of the coin

LLVM is a great leap forward for compiler development. With its good design and the effort of a vigorous open-source community, we have seen so many great tools come into existence and improve developer productivity. However, it is said that every coin has two sides. With the existing LLVM ecosystem as the basis, the shadows cast by its design tradeoffs become more and more obvious.

Centralization and forks, forks, forks

LLVM IR is central to the whole LLVM ecosystem. That is the foundation of the great decoupling of frontends and backends. However, it also means a full flow must pass through LLVM IR.

Changing LLVM IR itself needs to meet a very lofty bar nowadays because it is the center of gravity. All the tools are expected to process it, and so many transformations and different organizations' workflows go through it. Even if you don’t need to plumb through the full flow that frequently, tweaking a small aspect of the IR can still trigger a surprising ripple effect. So changes are naturally slow and require extensive discussions and sign-offs from many stakeholders. That is all necessary to guarantee the quality of LLVM IR; but if I just have a very isolated need, it would be quite hard to motivate a change and justify its landing.

One way, and actually the typical way, is just to fork LLVM and keep the modifications local. But that has a high cost too. LLVM IR being central really favors upstreaming as much as possible. Almost 100 commits land per day in the monorepo, bringing various new features and bug fixes. A fork that does not merge consistently diverges more and more until it eventually becomes unmanageable, unless staleness is acceptable. On the other hand, consistently catching up with upstream means dedicated teams and engineering effort.

So the end result is that we have many different flavors of LLVM forks at different versions or commits, with varying degrees of freshness. Maintaining and updating those forks certainly takes a significant amount of engineering effort when taken globally.

Though it’s hard to say the problem is unique to LLVM; large and complex systems that are developed by open-source communities and productionized by various organizations have similar problems. However, those projects typically expect customization of some sort; in contrast, LLVM IR’s absolutely central role makes that hard. In a sense, it’s a form of strong coupling. MLIR goes another step towards decoupling the IR.

Evolution and compatibility

Another of LLVM IR’s design choices is to co-evolve the IR with its various analyses and transformations. This is crucial for building better and better toolchains, but it does mean weak compatibility guarantees. The community tries to maintain compatibility as much as possible, but the ability to make breaking changes is certainly reserved.

Compilers typically operate at a level that is close to the operating system and hardware devices. So it’s quite natural that people leverage LLVM IR as the program representation handed to device drivers, especially given the great ecosystem and the fact that LLVM IR has a bytecode form!

However, using LLVM IR as the representation flowing through a coherent set of software and tools is the well-supported path; once hardware devices are involved, it’s a different story. Hardware devices may sit in some end product (e.g., a cell phone) and thus be controlled by the device manufacturer and end consumer; so there is no guarantee when the driver, which contains LLVM libraries to consume LLVM IR, will be updated, if ever.

There have been quite a few attempts at using LLVM IR this way, with mixed success. A notable example is the Standard Portable Intermediate Representation (SPIR). SPIR was meant to represent OpenCL kernels. It is LLVM IR pinned at specific versions, with OpenCL compute constructs defined as LLVM intrinsics and metadata. Over the years, the Khronos Group gradually realized that LLVM IR is not really designed for this kind of task, which led to the birth of SPIR-V.

SPIR-V

SPIR-V was initially released in 2015. It is meant to be the intermediate language for multiple Khronos APIs, including Vulkan, OpenGL, and OpenCL. Defining a new IR and building out the full software stack around it is a tremendous amount of work. Nonetheless, SPIR-V is deemed worthwhile because of the particular domain and use cases.

Standard, extensibility, and compatibility

The slogan of the Khronos Group is “connecting software to silicon”, which actually is a quite concise and accurate summary of what it does. The connection is made via standards and APIs.

The Khronos Group defines standards and APIs for hardware vendors to implement and for software vendors to target hardware regardless of platforms and devices. Prominent Khronos standards include Vulkan, OpenGL, OpenCL; but there are many more.

The main purpose of a standard is to provide abstraction and consistency across different implementations. However, a standard should also be able to expose vendor-specific features to acknowledge the differences in reality and let software get the most out of specific implementations. These competing needs are served by tiered feature support and a clear procedure to propose, promote, and deprecate features.

SPIR-V is a standard, so it’s no exception. Apart from core features, SPIR-V provides many mechanisms for extending the IR, including adding enumeration token values, introducing extensions, and even full extended instruction sets under a particular namespace. Notably, extensions also have multiple tiers—vendor-specific, EXT, KHR. Any vendor can propose an extension, but tiers closer to core need more vendors on board and go through a more rigorous review and approval procedure. SPIR-V and its governing working group in Khronos provide both the technical and organizational framework for tiered feature support.

LLVM IR does support ways to extend the IR, particularly via intrinsics and metadata, but it’s hard to imagine it supporting all sorts of vendor-specific intrinsics, let alone introducing vendor-specific types or vendor-specific modes for core LLVM IR instructions.

Further, a standard bridging software and hardware puts a strong emphasis on stability and compatibility, because hardware drivers are updated much less frequently than software toolchains.

It’s also not surprising to see drivers never getting updated in the lifetime of a device; for example, many Vulkan drivers on low-end Android phones stay at Vulkan 1.0, which was released almost 6 years ago. If we used LLVM IR as the representation, there would be a high chance that the producer uses a very recent LLVM version while the consumer, inside the device driver, stays at a version from many years ago. That can cause all sorts of issues and headaches. In contrast, SPIR-V provides the must-have stability and compatibility with version and extension mechanisms and a stable binary encoding.

Stable binary form

The full compiler is split into two phases: an offline phase where developers produce SPIR-V from some high level source code, and an online phase where drivers further compile SPIR-V into machine code.

Although, like LLVM IR, SPIR-V is also “intermediate” in this compilation flow, it focuses more on the efficiency of driver consumption, as that’s the step happening at runtime (so, online). Therefore, SPIR-V is primarily a binary format, and the encoding makes design choices to ease driver consumption, e.g., declaring required hardware capabilities upfront to save driver compilers from running heavy analysis to deduce them. There is no defined in-memory form or textual form; that’s up to the specific toolchain implementing the SPIR-V standard. For example, SPIRV-Tools defines its own in-memory representation and textual form; so does the SPIR-V dialect in MLIR.
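As a hedged sketch of how a toolchain can work with the binary form, the snippet below uses SPIRV-Tools’ C++ API (assuming it is installed and linked) to assemble a minimal compute module from its assembly text and validate the resulting stream of 32-bit words. Note how the capability, memory model, entry point, and execution mode are all declared before any function body:

```cpp
#include <cstdint>
#include <string>
#include <vector>

#include "spirv-tools/libspirv.hpp"  // SPIRV-Tools C++ API.

int main() {
  // A minimal Vulkan compute module in SPIR-V assembly. Capabilities, the
  // memory model, entry points, and execution modes all come first, so a
  // driver compiler learns the requirements before reading any code.
  const std::string text = R"(
    OpCapability Shader
    OpMemoryModel Logical GLSL450
    OpEntryPoint GLCompute %main "main"
    OpExecutionMode %main LocalSize 1 1 1
    %void = OpTypeVoid
    %fnty = OpTypeFunction %void
    %main = OpFunction %void None %fnty
    %entry = OpLabel
    OpReturn
    OpFunctionEnd
  )";

  spvtools::SpirvTools tools(SPV_ENV_VULKAN_1_0);
  std::vector<uint32_t> binary;  // SPIR-V's native form: a word stream.
  if (!tools.Assemble(text, &binary))
    return 1;
  // Validation checks the module against the rules of the target environment.
  return tools.Validate(binary) ? 0 : 1;
}
```

Targeting a newer environment is largely a matter of picking a different spv_target_env value—exactly the kind of explicit versioning the driver-facing use case needs.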

Serving the GPU domain

Okay, I have talked a lot about the “standard” and “portable” parts, but haven’t really touched on the IR aspect in SPIR-V thus far. 😊

Frankly, the IR aspect does not differ too much from LLVM IR. Actually, SPIR-V takes lots of inspiration from LLVM IR (why not!)—it also employs control flow graphs with basic blocks containing instructions in SSA form. The granularity of those instructions is akin to that of LLVM IR.

What’s special in SPIR-V, though, is the native support for many GPU concepts and intrinsics via constructs like decorations, builtins, and dedicated instructions (e.g., for calculating derivatives and sampling textures). Also, to cater to both graphics and compute usages, there are many execution models and modes. And, of course, structured control flow requirements for graphics.

Being a GPU-centric standard requires native support for GPU concepts, tiered extensibility, and a stable, compatible binary format. These requirements do not fit LLVM IR’s assumptions and tradeoffs, so SPIR-V was designed anew.

But designing an IR is just a start; building out the full compiler toolchain takes significant engineering effort. Since it is entirely detached from LLVM IR, the SPIR-V compiler stack cannot leverage existing LLVM libraries. So it started from scratch, going from an assembler and disassembler to compilers and optimizers. It would be much simpler if we had an infrastructure that could help build domain-specific compilers—

MLIR

MLIR landed in the LLVM monorepo at the end of 2019, so it’s just 2 years old. My feeling is that it takes at least 5 years to see a reasonably mature ecosystem. In that sense, MLIR is still very young and there is lots of development to happen. But still, MLIR is already bringing many novel ideas and profound changes to compilers, notably infrastructurization to further decouple compilers and IRs.

Infrastructurization

Infrastructurization is a natural endpoint of technology evolution. Being part of the infrastructure means the solution is mature enough and widely deployed. Based on that, the next technology evolution can then happen. We see this for transportation, electricity, the internet, public cloud, and so on. Those are massive examples for sure; but it also happens for smaller-scale technologies, because sharing the cost of developing infrastructure lets everyone focus on their core business logic.

Many people come to know MLIR as one way to implement machine learning compilers. Serving the ML domain and powering ML compilers is indeed the initial application and still one important role for MLIR; but MLIR is much more than that.

MLIR is a compiler infrastructure that aims to provide reusable and extensible fundamental components to facilitate building domain specific compilers. Unlike LLVM IR or SPIR-V, where we have one central IR containing a complete set of instructions to represent the CPU/GPU programs they aim to compile, in MLIR, there is no one IR that is clearly at the center.

What MLIR provides is just infrastructure to define operations (instructions in the broader sense) and form logical groups of them (called dialects in MLIR) based on functionality. MLIR also tries to provide universal patterns and passes that apply to any suitable operation without hardcoding specific ones.

Both goals require MLIR to look at compilers in a more fine-grained manner and decompose concepts further. Instead of treating operations as the atom, the infrastructure goes down to types, values, attributes, regions, and interfaces (like attribute/type/operation interfaces).

Operations can have an arbitrary number of operands, results, and attributes, and may contain an arbitrary number of regions. Regions are a powerful mechanism that allows nesting operations and keeps information localized, which simplifies analysis and transformation. Operations can also implement interfaces, which allows patterns and passes to be written against interfaces, detached from concrete operations.
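To give a flavor of how generic these concepts are, here is a small, hedged sketch against MLIR’s C++ APIs (exact signatures vary across versions): it creates an operation from a hypothetical, unregistered mydsl dialect purely from its structure—a name, a result type, and an attribute—and inserts it into a module:

```cpp
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/Operation.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  mlir::MLIRContext context;
  // Let the context accept ops from dialects we never registered; this is
  // purely for illustration here.
  context.allowUnregisteredDialects();

  mlir::OpBuilder builder(&context);
  mlir::ModuleOp module = mlir::ModuleOp::create(builder.getUnknownLoc());

  // "mydsl.constant" is a made-up op. MLIR itself only prescribes the
  // structure: operands, result types, attributes, and (optional) regions.
  mlir::OperationState state(builder.getUnknownLoc(), "mydsl.constant");
  mlir::Type i32 = builder.getI32Type();
  state.addTypes(i32);
  state.addAttribute("value", builder.getI32IntegerAttr(42));
  mlir::Operation *op = mlir::Operation::create(state);

  module.getBody()->push_back(op);
  module->print(llvm::outs());
  return 0;
}
```

A registered dialect would additionally attach verification, a nicer custom syntax, and interface implementations to such ops, which is what lets generic patterns and passes pick them up.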

All concepts in MLIR are abstract and detached by design, in order to map to various domains and use cases.

Dialects, dialects, dialects

But the purpose of an infrastructure is to help build products (here, domain-specific compilers). We program C++ by calling into the STL or even higher-level libraries; rarely would one write everything from scratch. An infrastructure also needs to co-evolve with its products, as that is where the needs come from. So MLIR also has in-tree dialects providing abstractions at different levels. They are to MLIR what the STL is to C++.

MLIR’s dialect ecosystem is still in its organically growing phase, but there are already early signs of stability and maturity. For example, we have both LLVM and SPIR-V as edge dialects to convert to/from IRs in other systems. (This fact alone signifies that MLIR is really an infrastructure, as it’s able to define other IRs entirely!) We have middle-level abstractions like Linalg, Tensor, Vector, and SCF for structured code generation. We have Affine, Math, and Arithmetic for low-level computation. There are AI framework dialects like TensorFlow, TFLite, MHLO, Torch, and TOSA for importing ML model graphs into MLIR. And many others.

Alex posted an awesome graph and discussion of the different dialects in MLIR, which is well worth a read if you are interested. I’ll also write up my understanding of the dialect ecosystem later. These dialects (and eventually partial or full flows connecting and wrapping them) will make developing domain-specific compilers much simpler.

Further unbundling compilers and IRs

The infrastructurization and plethora of dialects are actually another step towards decoupling and modularizing both compilers and IRs.

Instead of one single central IR, we now have many reusable “partial” IRs that are logically organized into MLIR dialects. If the readily available ones don’t satisfy your needs, defining new ones and writing their verification is also very easy with declarative op definitions. So, given a few more years for MLIR to mature further, we can imagine a world where developing a domain-specific compiler means just defining an edge dialect for project-specific operations, picking existing middle- or low-level dialects, and chaining them together to form a full flow! This certainly takes much less effort than building everything from the ground up.

Also, the decoupling gives us the flexibility to make tradeoffs based on the specific needs of the domain. We can choose only the necessary partial IRs to piece together a full-fledged compiler; we are not required to pick up a full IR like LLVM IR in its entirety with all its complexity. Extending the functionality of existing components is also simpler, given that interfaces are what connect operations and patterns/passes. We can either define new ops implementing existing interfaces so that existing patterns and passes work on them immediately, or attach new interfaces to existing operations so that they support external patterns and passes.

In other words, if LLVM IR is centralized by nature and favors unified compiler flows, the MLIR infrastructure and its dialect ecosystem is decentralized by nature and favors diverse compiler flows.

The typical trend of technology is to evolve from a single monolithic option to abundant diverse choices. That’s especially the case for upper layers, because there we are closer to concrete business needs, which are diverse by nature. For example, think of how many web frontend frameworks, distributed data processing frameworks, and ML frameworks we have.

The lower layers have been quite stable; we just have a few main hardware architectures, compilers, and operating systems. But the slowdown of semiconductor advancement and ever-growing computation needs are driving changes here too. It’s hard to keep relying only on general architectures and optimizing for all domains. Developing domain-specific end-to-end solutions is an interesting way out. We see RISC-V pioneering customization and modularization at the ISA level; MLIR is effectively customizing and modularizing compilers and IRs. It will be very interesting to see how they can work together to push the frontier of the lower layers of the stack.

Progressive lowering across system boundaries

There is another aspect I’d like to touch on before closing this section. There are two dimensions along which we can look at the unbundling brought by MLIR:

  1. Horizontally, dialects split a complete IR into separate partial IRs at the same level, e.g., scalar operations, vector operations, control flow operations, etc.
  2. Vertically, MLIR enables dialects and operations to model concepts at different abstraction levels.

This is very useful for domain-specific compilers, which typically have a very high-level source program that describes the job in a declarative way and needs to be compiled down to imperative machine code. Going in one step is unmanageable. Performing progressive lowering with multiple levels of abstractions is preferable, because it separates concerns and lets each layer focus on one dedicated task. Again, decoupling is the key to make complex systems tractable.

But this isn’t entirely new for sure, as we have had IR-like constructs in various projects for a long time, like Clang ASTs and ML framework graphs. What is quite powerful is that MLIR enables different levels to be represented using the same infrastructure, so that the flow between different levels can become seamless.

One of the major tasks in developing modern complex systems is actually choosing various subsystems and chaining them together. The boundaries between subsystems place rigid barriers. Much of the effort is spent taking the output of one subsystem, verifying and mutating it, and then feeding it into the next. If all the systems use the same internal representation and infrastructure, that effort is saved entirely. What’s more, it also makes cross-team and cross-project collaboration easier thanks to the shared mindset and tools!

Closing words

Okay, the blog post has grown quite lengthy. Because I’m trying to capture my overall understanding of compilers and IRs and their evolution trends, it’s also abstract and a bit free form. Hopefully it lays down the necessary foundation for my future discussions of more concrete mechanisms. Thanks for following through! A final recap:

Abstractions are how human beings handle the complexity of the world. Compilers are developer productivity tools that automatically convert between different abstraction levels. The top concern for compilers is correctness; generating optimal code is second to that. Correctness is based on clear semantics and verification. Compilers can typically get the majority of the performance with ease, saving us engineering effort so we can focus on the most impactful parts.

LLVM decoupled and modularized compilers with LLVM IR and libraries. It also brings UNIX simplicity to compilers with textual forms. However, certain design choices in LLVM IR make it unsuitable for certain domains, e.g., it does not have a hard guarantee of stability and compatibility, and LLVM IR itself is a monolithic central IR.

SPIR-V is a standard focusing on GPU domains extensively. It provides technical mechanisms and organizational structures for extensibility. It also provides strong compatibility with a stable binary form.

MLIR decouples compilers and IRs further by breaking down monolithic IRs into mixable dialects. Its infrastructure unleashes the power of defining and converting abstractions at different levels with ease. This matches the general direction of going from monolithic to more and more modular, letting each domain have its own customized solutions with its own tradeoffs. Hopefully writing domain-specific compilers can one day be as easy as choosing, customizing, and mixing open dialects!