Vulkan Learning Notes
Personal notes from following the Vulkan Tutorial.
Environment
- Platform: Windows (MSYS2 UCRT64)
- Compiler: Clang 21 (C++23 Modules & libc++)
- Build system: CMake 3.30+ / Ninja
- Vulkan SDK: 1.4.341.1 (via find_package(Vulkan), using the Vulkan-Hpp C++ Module)
Dependencies
All third-party libraries are managed as git submodules under deps/.
| Library | Purpose |
|---|---|
| GLFW | Window & input, Vulkan surface |
| GLM | Math (vectors, matrices, quaternions) |
| tinyobjloader | OBJ model loading |
Quick Start
Download and install Vulkan SDK first, then:
git clone --recurse-submodules https://github.com/qiekn/vulkan
cd vulkan
cmake -B build -G Ninja \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
cmake --build build
./build/vulkan
Dev Environment
Vulkan SDK
Download and install Vulkan SDK 1.4+. Make sure the VULKAN_SDK environment variable is set — CMake uses find_package(Vulkan) to locate it.
Toolchain (MSYS2 UCRT64)
Install MSYS2, then in the UCRT64 terminal:
pacman -S mingw-w64-ucrt-x86_64-clang \
mingw-w64-ucrt-x86_64-cmake \
mingw-w64-ucrt-x86_64-ninja \
mingw-w64-ucrt-x86_64-clang-tools-extra
Clone & Build
git clone --recurse-submodules <repo-url>
cd vulkan
# if already cloned without submodules:
git submodule update --init --recursive
cmake -B build -G Ninja \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
This generates compile_commands.json for clangd.
cmake --build build -j$(nproc)
./build/vulkan
Or use the shortcut:
./run.sh # build + run
./run.sh debug # build + gdb
Editor
Any editor with clangd support works (VS Code, Neovim, etc.).
The build generates build/compile_commands.json automatically.
Formatting and linting are enforced by .clang-format and .clang-tidy.
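If clangd does not pick up the compilation database from `build/` automatically, one common approach is a `.clangd` file at the repo root (a sketch; it assumes the default `build/` directory used in the commands above):

```
CompileFlags:
  CompilationDatabase: build
```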
Current submodules
| Submodule | Upstream | CMake target |
|---|---|---|
| deps/glfw | glfw/glfw | glfw |
| deps/glm | g-truc/glm | glm |
| deps/tinyobjloader | tinyobjloader/tinyobjloader | tinyobjloader |
Git Submodules
All third-party dependencies live under deps/ as git submodules. This keeps the repo self-contained without vendoring source code.
Adding a new dependency
git submodule add https://github.com/<owner>/<repo>.git deps/<name>
For example, adding GLFW:
git submodule add https://github.com/glfw/glfw.git deps/glfw
Then wire it up in CMakeLists.txt:
add_subdirectory(${CMAKE_SOURCE_DIR}/deps/glfw)
target_link_libraries(${PROJECT_NAME} PRIVATE glfw)
Cloning a repo with submodules
# clone with all submodules at once
git clone --recurse-submodules <repo-url>
# or, if already cloned:
git submodule update --init --recursive
Updating a submodule to latest
cd deps/glfw
git pull origin master
cd ../..
git add deps/glfw
git commit -m "deps: update glfw"
Removing a submodule
git submodule deinit -f deps/<name>
git rm -f deps/<name>
rm -rf .git/modules/deps/<name>
Gotcha: header-only vs submodule
Initially I used tinyobjloader as a header-only drop-in, then switched to a submodule so CMake handles include paths and the repo tracks a specific version. The trade-off: a submodule adds a git dependency, but things stay cleaner when the library ships its own CMakeLists.txt.
C++20 Modules
This project uses C++20 modules instead of traditional headers. Here’s what I learned getting them to work with CMake + Clang on MSYS2.
File convention
- Module interface files: `.cppm` extension
- Regular source files: `.cpp` (use `import` to consume modules)
Both live under src/.
CMake setup
CMake 3.28+ has native support for C++20 modules via FILE_SET CXX_MODULES:
cmake_minimum_required(VERSION 3.28)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
file(GLOB_RECURSE SOURCE_FILES CONFIGURE_DEPENDS "src/*.cpp")
file(GLOB_RECURSE MODULE_FILES CONFIGURE_DEPENDS "src/*.cppm")
add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE ${SOURCE_FILES})
target_sources(${PROJECT_NAME} PRIVATE FILE_SET CXX_MODULES FILES ${MODULE_FILES})
Ninja is required
The MinGW Makefiles generator does not support C++20 modules; you must use Ninja:
cmake -B build -G Ninja \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
If you try -G "MinGW Makefiles", CMake will fail to resolve module dependencies.
Writing a module
A .cppm file looks like this:
module;
// global module fragment — #include traditional headers here
#include <vulkan/vulkan.h>
#include <GLFW/glfw3.h>
export module app;
export class App {
public:
void Run();
private:
void InitWindow();
void InitVulkan();
void MainLoop();
void Cleanup();
};
Key points:
- `module;` starts the global module fragment; put all `#include` directives here
- `export module <name>;` declares the module name
- `export` before a declaration makes it visible to importers
Consuming a module
In a .cpp file:
import app;
int main() {
App app;
app.Run();
return 0;
}
Gotchas
- Include order matters: all `#include` lines must go in the global module fragment (between `module;` and `export module`). Putting an `#include` after `export module` is a hard error.
- clangd support: clangd needs `compile_commands.json` (generated by CMake with `CMAKE_EXPORT_COMPILE_COMMANDS ON`). Module support in clangd is still evolving; occasional false errors are normal.
- Build order: CMake + Ninja handle module dependency scanning automatically. No manual ordering needed.
C++23 Modules + import std
Synchronization: Semaphores & Fences
Vulkan is radically asynchronous. The CPU records commands and submits them at blazing speed; the GPU executes them later, on its own timeline. Without explicit synchronization, the CPU would happily overwrite a command buffer the GPU is still reading, or the display engine would show a half-rendered frame.
There are exactly two synchronization primitives to learn:
| | Fence | Semaphore |
|---|---|---|
| Direction | GPU → CPU | GPU → GPU |
| CPU visible? | Yes (waitForFences) | No |
| Purpose | CPU waits for GPU to finish | Order GPU operations internally |
Fence: The Buzzer
A fence is like a restaurant buzzer. The GPU buzzes it when it’s done. The CPU blocks on waitForFences() until the buzz comes.
Critical rule: once a fence is signaled, the CPU must manually reset it via resetFences(). If you forget, the next waitForFences() returns instantly (it’s already signaled!), and the CPU charges ahead into GPU memory that’s still being used. Crash.
Creation
Fences start signaled on purpose. On the very first frame, DrawFrame() calls waitForFences() before any submit has happened. If the fence started unsignaled, it would block forever.
// one fence per frame-in-flight slot
for (auto _ : std::views::iota(0uz, kMAX_FRAMES_IN_FLIGHT)) {
in_flight_fences_.emplace_back(
device_,
vk::FenceCreateInfo{.flags = vk::FenceCreateFlagBits::eSignaled}
);
}
Usage in DrawFrame
void DrawFrame() {
// Block CPU until GPU finishes with this frame slot's command buffer
auto fence_result = device_.waitForFences(
*in_flight_fences_[frame_index_], vk::True, UINT64_MAX);
// MUST reset immediately after wait succeeds
device_.resetFences(*in_flight_fences_[frame_index_]);
// ... acquire, record, ...
// Submit passes the fence — GPU will signal it when done
graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);
// ...
}
The fence protects the command buffer. We have kMAX_FRAMES_IN_FLIGHT = 2 frame slots, each with its own command buffer. The fence ensures that when slot 0 comes around again, the GPU has finished executing slot 0’s previous commands before we overwrite them.
Semaphore: The Traffic Light
Semaphores coordinate the ordering of GPU-side operations. The CPU can’t see or wait on them — it just sets up the rules at submit time, and the GPU follows them internally.
We use two semaphores per frame cycle:
present_complete_semaphore (acquire → render)
// Acquire an image from the swapchain
auto [result, image_index] = swapchain_.acquireNextImage(
UINT64_MAX,
*present_complete_semaphores_[frame_index_], // <-- GPU will signal this
nullptr);
acquireNextImage returns an image index immediately, but the image may not be safe to write yet (the display engine might still be reading it). The semaphore is a promise: “when I’m signaled, the image is truly yours to render into.”
Then at submit time, we tell the GPU to wait on this semaphore before it starts writing color data:
vk::SubmitInfo submit_info{
.waitSemaphoreCount = 1,
.pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
.pWaitDstStageMask = &wait_stage, // eColorAttachmentOutput
// ...
};
pWaitDstStageMask = eColorAttachmentOutput is subtle: the GPU can start earlier pipeline stages (vertex processing, etc.) before the semaphore fires. It only blocks at the exact stage where we’d write pixels. This maximizes parallelism.
render_finished_semaphore (render → present)
vk::SubmitInfo submit_info{
// ...
.signalSemaphoreCount = 1,
.pSignalSemaphores = &*render_finished_semaphores_[image_index],
};
graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);
After the GPU finishes all commands in this submit, it signals render_finished. Then we tell presentKHR to wait on it:
vk::PresentInfoKHR present_info{
.waitSemaphoreCount = 1,
.pWaitSemaphores = &*render_finished_semaphores_[image_index],
.swapchainCount = 1,
.pSwapchains = &*swapchain_,
.pImageIndices = &image_index,
};
graphics_queue_.presentKHR(present_info);
Without this, the display engine could push a half-drawn image to the screen.
Why Different Counts?
This is the gotcha that confused me at first. The sync objects are not all the same count:
// Indexed by frame_index_ (0..MAX_FRAMES_IN_FLIGHT-1)
std::vector<vk::raii::Semaphore> present_complete_semaphores_; // size = 2
std::vector<vk::raii::Fence> in_flight_fences_; // size = 2
std::vector<vk::raii::CommandBuffer> command_buffers_; // size = 2
// Indexed by image_index (0..swapchain_images_.size()-1)
std::vector<vk::raii::Semaphore> render_finished_semaphores_; // size = 3 (usually)
Frame-indexed (2): These belong to the “frame slot” — the CPU rotates between slot 0 and slot 1. The fence ensures we don’t reuse a slot that the GPU is still executing. The present_complete semaphore is tied to the acquire call, which happens once per frame slot.
Image-indexed (3): The swapchain typically has 3 images. render_finished protects a specific image — “don’t present image #2 until it’s fully drawn.” Different frame slots can end up drawing to the same image index, so we need one semaphore per image.
Creation code:
void CreateSyncObjects() {
// One render_finished per swapchain image
for (auto _ : std::views::iota(0uz, swapchain_images_.size())) {
render_finished_semaphores_.emplace_back(device_, vk::SemaphoreCreateInfo());
}
// One present_complete + one fence per frame-in-flight slot
for (auto _ : std::views::iota(0uz, kMAX_FRAMES_IN_FLIGHT)) {
present_complete_semaphores_.emplace_back(device_, vk::SemaphoreCreateInfo());
in_flight_fences_.emplace_back(
device_,
vk::FenceCreateInfo{.flags = vk::FenceCreateFlagBits::eSignaled});
}
}
The &* Syntax
You’ll see &* a lot in the submit/present calls. Here’s what it does:
.pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
- `present_complete_semaphores_[frame_index_]` is a `vk::raii::Semaphore` (RAII wrapper)
- `*` calls `operator*()`, which returns a `const vk::Semaphore&` (the raw Vulkan handle)
- `&` takes the address of that handle, giving `const vk::Semaphore*`
The Vulkan C API needs raw pointers. &* bridges the RAII wrapper to the C interface.
Complete DrawFrame Walkthrough
Here’s the full function with annotations showing which sync object does what:
void DrawFrame() {
// ---- STEP 1: Wait for this frame slot to be free ----
// Fence was signaled by the PREVIOUS submit that used this slot.
// First frame: fence starts signaled, so this returns immediately.
device_.waitForFences(*in_flight_fences_[frame_index_], vk::True, UINT64_MAX);
device_.resetFences(*in_flight_fences_[frame_index_]);
// ---- STEP 2: Acquire a swapchain image ----
// present_complete_semaphore will be signaled when the image
// is actually available for rendering (display engine released it).
auto [result, image_index] = swapchain_.acquireNextImage(
UINT64_MAX, *present_complete_semaphores_[frame_index_], nullptr);
// ---- STEP 3: Record commands ----
// Safe to reset — fence confirmed GPU is done with this command buffer.
command_buffers_[frame_index_].reset();
RecordCommandBuffer(image_index);
// ---- STEP 4: Submit to GPU ----
vk::PipelineStageFlags wait_stage(vk::PipelineStageFlagBits::eColorAttachmentOutput);
vk::SubmitInfo submit_info{
.waitSemaphoreCount = 1,
.pWaitSemaphores = &*present_complete_semaphores_[frame_index_],
// ^ GPU waits for image to be available before writing pixels
.pWaitDstStageMask = &wait_stage,
// ^ Only block at color output stage, vertex work can proceed early
.commandBufferCount = 1,
.pCommandBuffers = &*command_buffers_[frame_index_],
.signalSemaphoreCount = 1,
.pSignalSemaphores = &*render_finished_semaphores_[image_index],
// ^ GPU signals this when all commands complete
};
graphics_queue_.submit(submit_info, *in_flight_fences_[frame_index_]);
// ^ Fence signaled when GPU finishes — unblocks next waitForFences
// ---- STEP 5: Present ----
vk::PresentInfoKHR present_info{
.waitSemaphoreCount = 1,
.pWaitSemaphores = &*render_finished_semaphores_[image_index],
// ^ Display engine waits until rendering is fully done
.swapchainCount = 1,
.pSwapchains = &*swapchain_,
.pImageIndices = &image_index,
};
graphics_queue_.presentKHR(present_info);
// ---- STEP 6: Advance frame slot ----
frame_index_ = (frame_index_ + 1) % kMAX_FRAMES_IN_FLIGHT;
}
Timeline: Two Frames In Flight
Here’s what happens when MAX_FRAMES_IN_FLIGHT = 2:
Time ──────────────────────────────────────────────────────────────►
CPU: [slot0: wait+acquire+record+submit] [slot1: wait+acquire+record+submit] [slot0: wait...
│ │
GPU: ·········[slot0: render──────────────]│[slot1: render──────────────] │
fence0 fence1
signaled signaled
- While GPU renders slot 0, CPU is already recording slot 1.
- When the CPU circles back to slot 0, `waitForFences` blocks until the GPU finishes slot 0.
- This keeps both CPU and GPU busy; neither waits for the other (except at the fence).
The gotcha: if you used queue.waitIdle() instead of fences, every frame would fully serialize:
CPU: [record+submit] ......waiting...... [record+submit] ......waiting......
GPU: ................[render]............[render]
Fences let us overlap CPU and GPU work — that’s the whole point of frames-in-flight.
Shutdown: waitIdle
Before destroying any Vulkan objects, we must ensure the GPU is completely idle:
void MainLoop() {
while (!glfwWindowShouldClose(window_)) {
glfwPollEvents();
DrawFrame();
}
device_.waitIdle(); // block until ALL GPU work finishes
}
Without this, RAII destructors would destroy semaphores/fences/command buffers that the GPU is still using.