Time flies—almost 9 years have passed since I joined Google. Now the time has come for me to leave and move on. While here, I was super lucky to work mostly on open source projects that I can talk about publicly. So at the end of my tenure with Google, I’d like to reflect on and summarize this incredible journey, which I am deeply grateful for and thoroughly enjoyed, before I forget the details.
Google is the best place I can imagine to start my career after school. Over the years, I was extremely fortunate to get to know many great Googlers. I’ve learnt a ton from everybody and deeply cherish all the great times. It has always been the awesome people that make Google such a fantastic place to work!
Along the way, I am extremely appreciative that I was given the opportunity to work on various projects in different organizations (Chrome, Android, Stadia, Brain, Cerebra) across two countries (Canada and the U.S.):
Graphics Compiler Toolchain
I started at Google working on GPU graphics compiler toolchain projects. This set the tone for my whole career thus far—developer tools, especially compilers for GPUs. :)
It was the early days for Vulkan and SPIR-V, the industry’s next-generation (at the time) graphics and compute standards; they were still under wraps in stealth development mode. Google decided to adopt Vulkan as the graphics API for Android, hence the investment in their toolchains.
Having learned from the messy experience with SPIR (an LLVM IR derivative), SPIR-V detaches completely from LLVM with an entirely separate stack. The implication is that we needed to build a full compiler toolchain for it, including a basic IR assembler, disassembler, parser, and validator, as well as full-fledged optimizers and compilers. The reference frontend language for demonstrating Vulkan features is GLSL, a C derivative, but the most actively used frontend language is actually HLSL, a C++ derivative.
I was fortunate to work on all the above pieces and thus witness a full compiler stack growing piece by piece:
KhronosGroup/SPIRV-Headers hosts the machine-readable SPIR-V grammar and header files for various programming languages. I defined the JSON schema for the grammar and contributed the initial JSON encoding. The JSON grammar encoding is used in various downstream projects to automatically generate SPIR-V instruction processing functionality.
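To give a flavor of how such code generation works: a small sketch below parses a simplified, illustrative excerpt of the grammar (the real spirv.core.grammar.json is far larger and richer—operand kinds, quantifiers, capabilities, and so on) and emits opcode constants from it. The grammar shape here is a simplification for demonstration, not the exact schema.

```python
import json

# Illustrative excerpt of the SPIR-V JSON grammar; the real file
# (spirv.core.grammar.json) carries many more fields per instruction.
GRAMMAR = json.loads("""
{
  "instructions": [
    {"opname": "OpNop", "opcode": 0, "operands": []},
    {"opname": "OpCapability", "opcode": 17,
     "operands": [{"kind": "Capability"}]}
  ]
}
""")

def generate_opcode_enum(grammar):
    """Emit C-style opcode constants from the machine-readable grammar."""
    return "\n".join(
        f"static const int Spv{inst['opname']} = {inst['opcode']};"
        for inst in grammar["instructions"])

print(generate_opcode_enum(GRAMMAR))
```

Downstream projects apply the same idea at a larger scale, generating not just opcode enums but full per-instruction (dis)assembly and validation tables from the one grammar file.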
KhronosGroup/SPIRV-Tools hosts the SPIR-V assembler, disassembler, parser, validator, and optimizer. KhronosGroup/glslang hosts the frontend compiler from GLSL to SPIR-V, but its command-line experience is not very friendly, so we wrapped it in google/shaderc. I’ve made various contributions to these projects; code statistics from GitHub:
- SPIRV-Tools: 250 commits, 75,803 ++, 75,392 --
- glslang: 51 commits, 4,365 ++, 2,175 --
- Shaderc: 139 commits, 4,427 ++, 1,786 --
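The assembler/disassembler/parser work above all revolves around SPIR-V’s simple binary layout: a five-word header (magic number, version, generator, ID bound, schema) followed by instructions whose first word packs the word count in the high 16 bits and the opcode in the low 16 bits. A minimal sketch of a parser for that layout, exercised on a hand-assembled one-instruction module:

```python
import struct

SPIRV_MAGIC = 0x07230203  # fixed magic word defined by the SPIR-V spec

def parse_spirv(blob):
    """Split a SPIR-V binary into its version and (opcode, word_count) pairs."""
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    assert words[0] == SPIRV_MAGIC, "not a SPIR-V module"
    magic, version, generator, bound, schema = words[:5]
    insts, i = [], 5
    while i < len(words):
        word_count = words[i] >> 16   # high 16 bits: instruction length in words
        opcode = words[i] & 0xFFFF    # low 16 bits: opcode number
        insts.append((opcode, word_count))
        i += word_count
    return version, insts

# Hand-assembled minimal module: header followed by "OpCapability Shader"
# (opcode 17, word count 2, operand Shader = 1).
module = struct.pack("<7I",
                     SPIRV_MAGIC, 0x00010000, 0, 100, 0,  # header, version 1.0
                     (2 << 16) | 17, 1)                   # OpCapability Shader
version, insts = parse_spirv(module)
print(hex(version), insts)
```

A real parser (as in SPIRV-Tools) additionally decodes each instruction’s operands against the JSON grammar and validates them, but the word-stream walk is the same.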
Later, Stadia, which also adopted Vulkan as its only graphics API, needed an HLSL to SPIR-V compiler for game studio partners. I was fortunate to lead the effort to land a solution in microsoft/DirectXShaderCompiler. It became the production compiler for Stadia and was eventually featured in the Vulkan 1.2 release.
- DirectXShaderCompiler: 416 commits, 81,273 ++, 34,362 --
For a while I was deeply intrigued by the Rust programming language, so I created two SPIR-V related projects in Rust in my spare time: gfx-rs/rspirv is a Rust implementation of SPIR-V module processing functionality, including a Rust header, parser, assembler, and disassembler; google/shaderc-rs offers Rust bindings for the shaderc library.
Beyond the SPIR-V toolchain, I also contributed to other Vulkan ecosystem projects, including KhronosGroup/VK-GL-CTS, which hosts the Vulkan conformance tests, and google/gapid, a graphics API debugger.
- VK-GL-CTS: 35 commits, 24,132 ++, 5,503 --
- GAPID: 75 commits, 76,596 ++, 54,110 -- [history lost in initial commit squash]
ML Compiler and Runtime
Then one day I learnt about a cool project called MLIR, led by Chris Lattner and under active development in TensorFlow. It aims to build a reusable and extensible compiler infrastructure, notably for ML. MLIR was later announced to the public and contributed to the LLVM Foundation in 2019.
In late 2018, I chatted with Chris and joined the team as an early member to work on it. That closed the graphics chapter and began the ML/compute chapter for me.
My work on MLIR’s TableGen subsystem was driven by the concrete needs of TensorFlow adoption, particularly the MLIR-based TensorFlow to TFLite converter. I helped analyze the old TFLite/TOCO converter and added various missing pieces to initially enable the new converter end to end. Aside from the TFLite converter, I also helped create tools to automatically generate op definitions for the TensorFlow dialect, and ported over some TensorFlow to XLA HLO op conversion patterns. Code statistics from GitHub:
- TensorFlow: 416 commits, 57,620 ++, 28,626 --
While on the MLIR team, I also initiated the SPIR-V dialect and various conversions to it. This paved the way towards using standards and compilers to target GPUs of different form factors and from various vendors via MLIR.
It’s extensively used by openxla/iree, an MLIR-based end-to-end compiler and runtime solution. IREE adopts a holistic approach, lowering ML models to both host scheduling and device execution logic, and aims to provide a smooth deployment story for needs ranging from datacenter to edge.
IREE basically grew up together with MLIR. I collaborated with the team from the beginning and formally joined it after about a year in the TensorFlow organization.
During this time, I led various SPIR-V code generation and client-side GPU efforts in MLIR and IREE. We were able to deliver portability and performance proofs over different GPU vendors and platforms, notably MobileBERT on Mali GPUs and stable diffusion on AMD GPUs.
In ML land, though, GPU basically means NVIDIA, so the mission of building a cross-vendor, widely applicable solution using standards and compilers is never short of challenges. During this journey, I also sat on the Vulkan ML TSG as the Google representative to help with ML-related standardization. I gave quite a few presentations there, some in private settings and some in public ones.
Vulkan Developer Tools
On the developer tooling side, there are no cross-vendor solutions for benchmarking and profiling, so I created two additional projects to fill the gaps: google/µVkCompute hosts a micro Vulkan compute pipeline and a collection of benchmarking compute shaders for compiler code generation pathfinding, and google/hardware-perfcounter hosts a set of libraries and utilities for sampling hardware performance counters directly from the kernel. The latter is half-baked, though, as I couldn’t find enough time for it.
During my 9 years at Google, I was fortunate to work on so many interesting projects with so many talented Googlers. I’m also very lucky to have worked mostly in open source; across externally visible code, I’ve contributed 3k+ commits, with 500k+ additions and 300k+ deletions in total. What a fun journey!
The AI era feels more concrete than ever to me. The industry is full of energy and dynamism at the moment—people are rethinking different layers of the computation stack.
Building a scalable and flexible ML compiler and runtime stack that can target all kinds of hardware for different needs is still what excites me daily. I believe IREE has great potential to become a fundamental piece in the future landscape. I will help drive and advance it further by joining nod.ai. :)