4 comments

  • qrios 5 hours ago

    Works on my computer: RTX 3090, CUDA 12.6

    Interesting project! I haven't really worked with Vulkan myself yet. Hence my question: how is the code compiled and then loaded onto the GPU cores?

    Or is the entire code always compiled in the REPL and then uploaded, with only the existing data addresses being updated?

    • mr_octopus 3 hours ago

      Thanks for trying it! :)

      Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls — no round-trips to CPU unless you need the result.

      No thread_id exposed. The runtime handles thread indexing internally — gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.

      The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel. For the other 20% you still want CUDA/Vulkan directly.
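
      For a sense of what lands in that 20%: something like a block-level sum using shared memory still wants a hand-written kernel. A rough CUDA sketch for contrast (not OctoFlow code, just the kind of thing the runtime doesn't try to express):

        // Sketch for contrast: explicit shared memory and thread cooperation.
        // Assumes blockDim.x is a power of two.
        __global__ void block_sum(const float* in, float* out, int n) {
            extern __shared__ float buf[];                    // sized at launch time
            int tid = threadIdx.x;
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            buf[tid] = (i < n) ? in[i] : 0.0f;                // one element per thread
            __syncthreads();
            for (int s = blockDim.x / 2; s > 0; s >>= 1) {    // tree reduction in shared memory
                if (tid < s) buf[tid] += buf[tid + s];
                __syncthreads();
            }
            if (tid == 0) out[blockIdx.x] = buf[0];           // one partial sum per block
        }
        // launched e.g. as: block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);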

      Cheers

  • billconan 5 hours ago

    I'm curious how a GPU language's syntax design can be different from a CUDA kernel's.

    Because I think there is no way to avoid concepts like thread_id.

    I'm curious how GPU programming can be made (a lot) simpler than CUDA.

    • mr_octopus 3 hours ago

      Most GPU work boils down to a few patterns — map, reduce, scan. Each one has a known way to assign threads.

      So instead of writing a kernel with thread_id, you write:

        let c = gpu_add(a, b)
        let total = gpu_sum(c)
      
      The thread indexing is still there — just handled by the runtime, like how Python hides pointer math.
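
      For comparison, here's the rough CUDA equivalent of that first line, the kernel you'd otherwise write by hand with the index exposed (a sketch, not anything OctoFlow generates):

        // Hand-written CUDA equivalent of gpu_add(a, b): the per-thread index is explicit.
        __global__ void add(const float* a, const float* b, float* c, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;    // one thread per element
            if (i < n) c[i] = a[i] + b[i];
        }
        // launched e.g. as: add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);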