Vulkan Learning Notes

Personal notes from following the Vulkan Tutorial.

Environment

  • Platform: Windows (MSYS2 UCRT64)
  • Compiler: Clang 21 (C++23 Modules & libc++)
  • Build system: CMake 3.30+ / Ninja
  • Vulkan SDK: 1.4.341.1 (via find_package(Vulkan), using Vulkan-Hpp C++ Module)

Dependencies

All third-party libraries are managed as git submodules under deps/.

Library         Purpose
GLFW            Window & input, Vulkan surface
GLM             Math (vectors, matrices, quaternions)
tinyobjloader   OBJ model loading

Quick Start

Download and install Vulkan SDK first, then:

git clone --recurse-submodules https://github.com/qiekn/vulkan
cd vulkan

cmake -B build -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++

cmake --build build
./build/vulkan

Dev Environment

Vulkan SDK

Download and install Vulkan SDK 1.4+. Make sure the VULKAN_SDK environment variable is set — CMake uses find_package(Vulkan) to locate it.

Toolchain (MSYS2 UCRT64)

Install MSYS2, then in the UCRT64 terminal:

pacman -S mingw-w64-ucrt-x86_64-clang \
          mingw-w64-ucrt-x86_64-cmake \
          mingw-w64-ucrt-x86_64-ninja \
          mingw-w64-ucrt-x86_64-clang-tools-extra

Clone & Build

git clone --recurse-submodules <repo-url>
cd vulkan

# if already cloned without submodules:
git submodule update --init --recursive
cmake -B build -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++

This generates compile_commands.json for clangd.

cmake --build build -j$(nproc)
./build/vulkan

Or use the shortcut:

./run.sh          # build + run
./run.sh debug    # build + gdb

Editor

Any editor with clangd support works (VS Code, Neovim, etc.). The build generates build/compile_commands.json automatically.

Formatting and linting are enforced by .clang-format and .clang-tidy.

Current submodules

Submodule            Upstream                       CMake target
deps/glfw            glfw/glfw                      glfw
deps/glm             g-truc/glm                     glm
deps/tinyobjloader   tinyobjloader/tinyobjloader    tinyobjloader

Git Submodules

All third-party dependencies live under deps/ as git submodules. This keeps the repo self-contained without vendoring source code.

Adding a new dependency

git submodule add https://github.com/<owner>/<repo>.git deps/<name>

For example, adding GLFW:

git submodule add https://github.com/glfw/glfw.git deps/glfw

Then wire it up in CMakeLists.txt:

add_subdirectory(${CMAKE_SOURCE_DIR}/deps/glfw)

target_link_libraries(${PROJECT_NAME} PRIVATE glfw)

Cloning a repo with submodules

# clone with all submodules at once
git clone --recurse-submodules <repo-url>

# or, if already cloned:
git submodule update --init --recursive

Updating a submodule to latest

cd deps/glfw
git pull origin master
cd ../..
git add deps/glfw
git commit -m "deps: update glfw"

Removing a submodule

git submodule deinit -f deps/<name>
git rm -f deps/<name>
rm -rf .git/modules/deps/<name>

Gotcha: header-only vs submodule

Initially I used tinyobjloader as a header-only drop-in, then switched to a submodule so that CMake handles include paths and a specific version is tracked. The trade-off: a submodule adds a git dependency, but it keeps things cleaner when the library ships its own CMakeLists.txt.

C++20 Modules

This project uses C++20 modules instead of traditional headers. Here’s what I learned getting them to work with CMake + Clang on MSYS2.

File convention

  • Module interface files: .cppm extension
  • Regular source files: .cpp (use import to consume modules)

Both live under src/.

CMake setup

CMake 3.28+ has native support for C++20 modules via FILE_SET CXX_MODULES:

cmake_minimum_required(VERSION 3.28)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

file(GLOB_RECURSE SOURCE_FILES CONFIGURE_DEPENDS "src/*.cpp")
file(GLOB_RECURSE MODULE_FILES CONFIGURE_DEPENDS "src/*.cppm")

add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE ${SOURCE_FILES})
target_sources(${PROJECT_NAME} PRIVATE FILE_SET CXX_MODULES FILES ${MODULE_FILES})

Ninja is required

The MinGW Makefiles generator does not support C++20 modules. You must use Ninja:

cmake -B build -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++

If you try -G "MinGW Makefiles", CMake will fail to resolve module dependencies.

Writing a module

A .cppm file looks like this:

module;

// global module fragment — #include traditional headers here
#include <vulkan/vulkan.h>
#include <GLFW/glfw3.h>

export module app;

export class App {
public:
  void Run();

private:
  void InitWindow();
  void InitVulkan();
  void MainLoop();
  void Cleanup();
};

Key points:

  • module; starts the global module fragment — put all #include directives here
  • export module <name>; declares the module name
  • export before declarations makes them visible to importers

Consuming a module

In a .cpp file:

import app;

int main() {
  App app;
  app.Run();
  return 0;
}

Gotchas

  1. Include order matters: All #include lines must go in the global module fragment (between module; and export module). Putting #include after export module is a hard error.

  2. clangd support: clangd needs compile_commands.json (generated by CMake with CMAKE_EXPORT_COMPILE_COMMANDS ON). Module support in clangd is still evolving — occasional false errors are normal.

  3. Build order: CMake + Ninja handle module dependency scanning automatically. No manual ordering needed.

Synchronization: Semaphores & Fences

Vulkan is radically asynchronous. The CPU records commands and submits them at blazing speed; the GPU executes them later, on its own timeline. Without explicit synchronization, the CPU would happily overwrite a command buffer the GPU is still reading, or the display engine would show a half-rendered frame.

There are exactly two synchronization primitives to learn:

                Fence                          Semaphore
Direction       GPU → CPU                      GPU → GPU
CPU visible?    Yes (waitForFences)            No
Purpose         CPU waits for GPU to finish    Order GPU operations internally

Fence: The Buzzer

A fence is like a restaurant buzzer. The GPU buzzes it when it’s done. The CPU blocks on waitForFences() until the buzz comes.

Critical rule: once a fence is signaled, the CPU must manually reset it via resetFences(). If you forget, the next waitForFences() returns instantly (it’s already signaled!), and the CPU charges ahead into GPU memory that’s still being used. Crash.

Creation

Fences start signaled on purpose. On the very first frame, DrawFrame() calls waitForFences() before any submit has happened. If the fence started unsignaled, it would block forever.

// one fence per frame-in-flight slot
for (auto _ : std::views::iota(0uz, kMAX_FRAMES_IN_FLIGHT)) {
    in_flight_fences_.emplace_back(
        device_,
        vk::FenceCreateInfo{.flags = vk::FenceCreateFlagBits::eSignaled}
    );
}

Usage in DrawFrame

void DrawFrame() {
    // Block CPU until GPU finishes with this frame slot's command buffer
    auto fence_result = device_.waitForFences(
        *in_flight_fences_[frame_index_], vk::True, UINT64_MAX);

    // MUST reset immediately after wait succeeds
    device_.resetFences(*in_flight_fences_[frame_index_]);

    // ... acquire, record, ...

    // Submit passes the fence — GPU will signal it when done
    graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);

    // ...
}

The fence protects the command buffer. We have kMAX_FRAMES_IN_FLIGHT = 2 frame slots, each with its own command buffer. The fence ensures that when slot 0 comes around again, the GPU has finished executing slot 0’s previous commands before we overwrite them.

Semaphore: The Traffic Light

Semaphores coordinate the ordering of GPU-side operations. The CPU can’t see or wait on them — it just sets up the rules at submit time, and the GPU follows them internally.

We use two semaphores per frame cycle:

present_complete_semaphore (acquire → render)

// Acquire an image from the swapchain
auto [result, image_index] = swapchain_.acquireNextImage(
    UINT64_MAX,
    *present_complete_semaphores_[frame_index_],  // <-- GPU will signal this
    nullptr);

acquireNextImage returns an image index immediately, but the image may not be safe to write yet (the display engine might still be reading it). The semaphore is a promise: “when I’m signaled, the image is truly yours to render into.”

Then at submit time, we tell the GPU to wait on this semaphore before it starts writing color data:

vk::SubmitInfo submit_info{
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
    .pWaitDstStageMask = &wait_stage,  // eColorAttachmentOutput
    // ...
};

pWaitDstStageMask = eColorAttachmentOutput is subtle: the GPU can start earlier pipeline stages (vertex processing, etc.) before the semaphore fires. It only blocks at the exact stage where we’d write pixels. This maximizes parallelism.

render_finished_semaphore (render → present)

vk::SubmitInfo submit_info{
    // ...
    .signalSemaphoreCount = 1,
    .pSignalSemaphores = &*render_finished_semaphores_[image_index],
};
graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);

After the GPU finishes all commands in this submit, it signals render_finished. Then we tell presentKHR to wait on it:

vk::PresentInfoKHR present_info{
    .waitSemaphoreCount = 1,
    .pWaitSemaphores = &*render_finished_semaphores_[image_index],
    .swapchainCount = 1,
    .pSwapchains = &*swapchain_,
    .pImageIndices = &image_index,
};
graphics_queue_.presentKHR(present_info);

Without this, the display engine could push a half-drawn image to the screen.

Why Different Counts?

This is the gotcha that confused me at first. The sync objects are not all the same count:

// Indexed by frame_index_ (0..MAX_FRAMES_IN_FLIGHT-1)
std::vector<vk::raii::Semaphore> present_complete_semaphores_;  // size = 2
std::vector<vk::raii::Fence>     in_flight_fences_;             // size = 2
std::vector<vk::raii::CommandBuffer> command_buffers_;           // size = 2

// Indexed by image_index (0..swapchain_images_.size()-1)
std::vector<vk::raii::Semaphore> render_finished_semaphores_;   // size = 3 (usually)

Frame-indexed (2): These belong to the “frame slot” — the CPU rotates between slot 0 and slot 1. The fence ensures we don’t reuse a slot that the GPU is still executing. The present_complete semaphore is tied to the acquire call, which happens once per frame slot.

Image-indexed (3): The swapchain typically has 3 images. render_finished protects a specific image — “don’t present image #2 until it’s fully drawn.” Different frame slots can end up drawing to the same image index, so we need one semaphore per image.

Creation code:

void CreateSyncObjects() {
    // One render_finished per swapchain image
    for (auto _ : std::views::iota(0uz, swapchain_images_.size())) {
        render_finished_semaphores_.emplace_back(device_, vk::SemaphoreCreateInfo());
    }

    // One present_complete + one fence per frame-in-flight slot
    for (auto _ : std::views::iota(0uz, kMAX_FRAMES_IN_FLIGHT)) {
        present_complete_semaphores_.emplace_back(device_, vk::SemaphoreCreateInfo());
        in_flight_fences_.emplace_back(
            device_,
            vk::FenceCreateInfo{.flags = vk::FenceCreateFlagBits::eSignaled});
    }
}

The * Syntax

You’ll see &* a lot in the submit/present calls. Here’s what it does:

.pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
  • present_complete_semaphores_[frame_index_] is a vk::raii::Semaphore (RAII wrapper)
  • * calls operator*() which returns a const vk::Semaphore& (the raw Vulkan handle)
  • & takes the address of that handle, giving const vk::Semaphore*

The Vulkan C API needs raw pointers. &* bridges the RAII wrapper to the C interface.

Complete DrawFrame Walkthrough

Here’s the full function with annotations showing which sync object does what:

void DrawFrame() {
    // ---- STEP 1: Wait for this frame slot to be free ----
    // Fence was signaled by the PREVIOUS submit that used this slot.
    // First frame: fence starts signaled, so this returns immediately.
    device_.waitForFences(*in_flight_fences_[frame_index_], vk::True, UINT64_MAX);
    device_.resetFences(*in_flight_fences_[frame_index_]);

    // ---- STEP 2: Acquire a swapchain image ----
    // present_complete_semaphore will be signaled when the image
    // is actually available for rendering (display engine released it).
    auto [result, image_index] = swapchain_.acquireNextImage(
        UINT64_MAX, *present_complete_semaphores_[frame_index_], nullptr);

    // ---- STEP 3: Record commands ----
    // Safe to reset — fence confirmed GPU is done with this command buffer.
    command_buffers_[frame_index_].reset();
    RecordCommandBuffer(image_index);

    // ---- STEP 4: Submit to GPU ----
    vk::PipelineStageFlags wait_stage(vk::PipelineStageFlagBits::eColorAttachmentOutput);
    vk::SubmitInfo submit_info{
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
            // ^ GPU waits for image to be available before writing pixels
        .pWaitDstStageMask = &wait_stage,
            // ^ Only block at color output stage, vertex work can proceed early
        .commandBufferCount = 1,
        .pCommandBuffers = &*command_buffers_[frame_index_],
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &*render_finished_semaphores_[image_index],
            // ^ GPU signals this when all commands complete
    };
    graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);
        // ^ Fence signaled when GPU finishes — unblocks next waitForFences

    // ---- STEP 5: Present ----
    vk::PresentInfoKHR present_info{
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &*render_finished_semaphores_[image_index],
            // ^ Display engine waits until rendering is fully done
        .swapchainCount = 1,
        .pSwapchains = &*swapchain_,
        .pImageIndices = &image_index,
    };
    graphics_queue_.presentKHR(present_info);

    // ---- STEP 6: Advance frame slot ----
    frame_index_ = (frame_index_ + 1) % kMAX_FRAMES_IN_FLIGHT;
}

Timeline: Two Frames In Flight

Here’s what happens when MAX_FRAMES_IN_FLIGHT = 2:

Time ──────────────────────────────────────────────────────────────►

CPU:  [slot0: wait+acquire+record+submit] [slot1: wait+acquire+record+submit] [slot0: wait...
                                            │                                    │
GPU:  ·········[slot0: render──────────────]│[slot1: render──────────────]        │
                                         fence0                               fence1
                                        signaled                             signaled
  • While GPU renders slot 0, CPU is already recording slot 1.
  • When CPU circles back to slot 0, waitForFences blocks until GPU finishes slot 0.
  • This keeps both CPU and GPU busy — neither waits for the other (except at the fence).

The gotcha: if you used queue.waitIdle() instead of fences, every frame would fully serialize:

CPU:  [record+submit] ......waiting...... [record+submit] ......waiting......
GPU:  ................[render]............[render]

Fences let us overlap CPU and GPU work — that’s the whole point of frames-in-flight.

Shutdown: waitIdle

Before destroying any Vulkan objects, we must ensure the GPU is completely idle:

void MainLoop() {
    while (!glfwWindowShouldClose(window_)) {
        glfwPollEvents();
        DrawFrame();
    }
    device_.waitIdle();  // block until ALL GPU work finishes
}

Without this, RAII destructors would destroy semaphores/fences/command buffers that the GPU is still using.