// printk emits a single 8-bit character to standard output
//
//go:linkname printk runtime.printk
func printk(c byte)
So, printing “Hello, world!” necessarily has to make 13 calls to this function. I think I would have preferred a printk that prints a slice of bytes; I expect that can be significantly faster on a lot of hardware.

In contrast, there’s
Here, they seem to acknowledge that it can be faster to make a single call.

The method for printing uses an Intel UART driver to print characters. AFAIK, a standard low-level UART generally only does single-character transfers unless you write a (relatively) complex driver.
Rendering per string is better, but I'm not sure how big the difference is when it comes to a UART; I doubt the system has enough throughput for the first implementation to matter.
> The method for printing uses an Intel UART driver to print characters
The spec (rightfully) says “(e.g. serial console)”, not “Intel UART driver”.
You cannot know what bare metal you’re running on. On some hardware it could be sending data out over Bluetooth, USB or WiFi because that’s the only connection to the outside world.
I wonder if this is related to that bare metal bios os post from a week or so ago. I asked the author why he used tty asm calls to print instead of calling int 10 directly and he said it was more efficient, but for different reasons.
https://news.ycombinator.com/item?id=43873822
Arguably `printk(c byte)` should be `printck(c byte)`, and there should be a separate `printk(s []byte)` that handles a slice of bytes.
If `printk` isn't implemented, then fall back to repeated calls to `printck`.
printk is the low-level primitive for stdout printing, and it's done this way because low-level drivers generally only accept single characters.
There are upper-level functions which simply take a []byte and make fmt.Printf() work seamlessly and efficiently when not printing on a UART that only takes a single character at a time.
In TamaGo stdout is primarily used for debugging.
> Here, they seem to acknowledge that it can be faster to make a single call.
It calls the internal Fill function to fill 4 bytes of the slice at a time. That calls the rng assembly stub function, which uses 'rdrand' to get 32 bits of random data, and it gets called len(b)/4 times.
I don't think they did it for speed but rather to be more idiomatic.
Anyway, OSDev has had a "Go Bare Bones" page for quite a while:
https://wiki.osdev.org/Go_Bare_Bones
We use 'scratch' containers for many of our Go applications, so they have no user-space stuff other than our application binary. It reduces exposure for security vulnerabilities. This proposal seems to be taking that approach to the extreme - not even a kernel. Super-interesting; I wonder if it could run on cloud VMs? How tiny could the image become?
Cloud VMs are a main target for unikernels. However, as Russ mentions in one of the linked issues, there is actually quite a lot of other code you need to include in your system depending on what you are deploying to.
For instance, arm64 systems might need UEFI, or if you enable SEV you now need additional support for that, which is why I'd agree with Russ's stance on this.
Every time someone asks us to provide support for a new cloud instance type (like a Graviton 4 or Azure's Arm) we have to go in and sometimes write a ton of new code to get it working.
I assume you're referring to this[1]. I don't think it's necessary to bring all of that into the Go runtime itself, or ask the Go team to maintain it. It would be part of your application, and similar to a board support package.
TamaGo already supports UEFI on x86, and that too would be part of the BSP for your application, not something that would need to be upstreamed to Go proper. Same for AMD SEV SNP.
As for you (nanovms) supporting new instance types, wouldn't it be nice to do that work in Go? :)
Edit: I wonder how big the performance impact would be if you used TamaGo's virtio-net support instead of calling from Go into nanos.
> This proposal seems to be taking that approach to the extreme - not even a kernel.
To be fair, there is a kernel - the Go runtime. But since there is no privilege separation, it qualifies as a unikernel. Performance gains should be expected compared to a system where you have to copy data between guest VM kernel space and guest VM user space.
> I wonder if it could run on cloud VMs?
Yes. TamaGo currently runs in KVM guests with the following VMMs: Cloud Hypervisor, Firecracker microvm, QEMU microvm.
> How tiny could the image become?
Roughly the same size as your current Go binary. TamaGo doesn't add much.
> To be fair, there is a kernel - the Go runtime.
I like Anil Madhavapeddy's definition for such setups. A compiler that just refuses to stop:
> MirageOS is a system written in pure OCaml where not only do common network protocols and file systems and high-level things like web servers and web stacks can all be expressed in OCaml but the compiler just refuses to stop ... compiler, instead of stopping and generating a binary that you then run inside Linux or Windows, will continue to specialize the application that it is compiling and ... emit a full operating system that can just boot by itself.
https://signalsandthreads.com/what-is-an-operating-system / https://archive.vn/yLfkq
Looks like TamaGo targets multiple VM runtimes: https://github.com/usbarmory/tamago?tab=readme-ov-file
How do you handle temp file space, timezone data, and other things that a minimal image provide?
For timezone data, Go already has https://pkg.go.dev/time/tzdata
Temp file space: Use RAM, or talk to host storage over Virtio.
Timezone data etc: You would have to fetch that over the network, or from a metadata API such as the one Firecracker provides to VM guests.
Services rarely need timezone data. So if one is OK with supporting only UTC, the Go runtime works fine without any timezone data.
We use a minimal image to run on AWS Nitro VMs, and it contains only the kernel, init.d, the Go application binary, and TLS certificate roots, with the root filesystem mounted over tmpfs.
Note that Nitro VMs use a custom kernel provided by AWS, so the new proposal is not relevant for us. But if we could run Go directly in that VM, it would surely make things faster and save around 10% memory overhead. It would also avoid the OOM killer and a few other unwanted interactions between the Go runtime and Linux kernel memory management.
I would be interested in this if it enabled deterministic simulation testing for the Go programming language. There have been some efforts in this area, but with little success.
I use TinyGo, and it does that job well. Not sure if it’s necessary to mainline it.
TinyGo targets an entirely different class of systems and is not something that can be upstreamed, as it is a different compiler; see https://github.com/usbarmory/tamago/wiki/Frequently-Asked-Qu...