Basically on Linux the syscalls are the equivalent of Win32 except much narrower in scope.
The Win32 API doesn't even use the "C" calling convention. C is just another language to Windows, and the standard C library is a cross-platform library for C. You could also write C code on classic Mac OS, and it had its own API as well, but one styled more for Pascal.
The OS and C being closely related is not universal across all operating systems, it's just a Unix thing.
From literally any language. The WriteFile function comes from the kernel32.dll shared library and follows a particular calling convention. You don't need to use this calling convention inside your own binary (and indeed, MinGW and MSYS use the SysV ABI for everything except calls into the Win32 API), or ask a random C runtime coming from God knows where to do this for you if you write something other than C.
In the UNIX world there is this strange notion that the C language is somehow special and that the OS itself should provide its runtime (a single global version of it) for every program, even those written in other languages, to interact with the OS, but... it's just silly.
> Does windows really offer no API for writing text (rather than bytes) to files? Or does it rely on the application developer to manage line endings in their own code? Neither of those sounds very developer-friendly.
No it doesn't. That logic belongs in the OS-specific layer of the runtimes/standard libraries of the different programming language implementations. They may decide to re-use each other's libraries, of course, or they may decide not to.
> You don't need to use this calling convention inside your own binary (and indeed, MinGW and MSYS use the SysV ABI for everything except calls into the Win32 API), or ask a random C runtime coming from God knows where to do this for you if you write something other than C.
Well sure, but you have to define it somewhere. At some point there's an interface where something that's part of the application asks something that's part of the OS to do something, and that interface had better be stable and well-specified. If you really want, you can use a different interface from your C ABI, sure, but given that, like it or not, most of Windows is written in C (or in C++ but using C linkage between component boundaries), what do you gain?
Even so, most of Windows historically did not use the C ABI but rather stdcall, so a call from your C library into the Windows libraries couldn't be specified in purely standards-compliant C (which has no calling-convention modifiers); a slightly pedantic quirk of the C spec design.
> At some point there's an interface where something that's part of the application asks something that's part of the OS to do something, and that interface had better be stable and well-specified.
It's defined, and well-specified.
> your C ABI
Which is a C ABI. Borland's Turbo C and C++Builder used different ABI than Microsoft C compiler did. GCC for Windows used to use a third, entirely different ABI as well. The ABI is not part of the language definition, you see.
> most of windows is written in C
And compiled with a very specific C compiler that used a particular ABI. That only means that you need to follow it when you call into the OS, sure, but not that you have to stick to it anywhere else — and indeed, most implementations of many programming languages on Windows didn't; they invented and used their own ABIs.
> Which is a C ABI. Borland's Turbo C and C++Builder used different ABI than Microsoft C compiler did. GCC for Windows used to use a third, entirely different ABI as well.
Sure, you can do that. Userspace code can use any ABI it wants, or none. But again, why, what do you gain?
And regardless of whether it's "the" ABI or merely "a" ABI, that ABI presumably has a representation for strings and allows passing them around - and while you certainly could use a different representation in your program (or in the OS internals) and transform strings back and forth when calling the OS (or when receiving calls from userspace), you probably don't want to. At which point we're back at needing a way to write strings in an in-memory format to OS-standard files in the filesystem.
Performance? Codegen simplicity? Why, again, must one use the syscall ABI for anything that is not a syscall?
> that ABI presumably has a representation for strings and allows passing them around
In this particular case, the API operates with binary buffers, not text strings. Sure, you can go the VMS way, or even IBM way, and turn files from binary blobs into arrays of fixed-length records (that's why C's fwrite/fread have both num and sz arguments: some OSes literally can't write data any other way).
> At which point we're back at needing a way to write strings in an in-memory format to OS-standard files in the filesystem.
Yes? Some text editors converted LFs to NULs to work on the text in memory, and then converted the NULs back into LFs when writing to disk (IIRC). Neither emacs nor vi stores text in memory the way it's laid out in the file; they translate it when writing to disk.
Again, why do you want the OS to get involved into any of this? It's not the OS's job, period, stop trying to make the world an even worse place.
> The whole issue is specific to C and languages that copied C or use its runtime underneath in implementations (like Python)
So it's "specific to" almost all programming languages in actual use. That's a rather esoteric point.
> For reference, Unix has no API other than bytes either.
Unix does offer an API for writing C-standard in-memory text strings to Unix-standard on-disk text files, it just happens to be the same one as the API for writing in-memory binary strings to on-disk binary files.
> Unix does offer an API for writing C-standard in-memory text strings
Why on bloody Earth should a presumably generic-purpose OS provide a special API for dealing with internal representation of some data structure in a (particular) implementation of a (particular) programming language?
Besides, it doesn't offer such an API anyhow; you need to take care to manually pass the result of a strlen() call instead of sizeof() as the len argument to write(), otherwise the NUL terminator will get written into the file as well.
And C says nothing about what constitutes a line break, by the way. Nor does it have any concept of a "line", or any utilities for working with lines specifically, it only knows of strings, and that's all. The concept of "text line" is POSIX.
> Why on bloody Earth should a presumably generic-purpose OS provide a special API for dealing with internal representation of some data structure in a (particular) implementation of a (particular) programming language?
Because the purpose of the OS is to facilitate applications (and, on the other end, facilitate hardware), and those applications tend to have a need to process text in-memory and then store it on the filesystem?
All you need for that is the ability to read and write binary blobs to and from files, which Windows gives you, and to know what "text files" means for the other programs on that platform. Windows itself doesn't care for text much; but the other programs have a shared convention that ASCII text files have CRLF-separated variable-length lines of text, and Unicode text files store text in UTF16-LE, (including the CRLF pairs, so those look like "\x0D\x00\x0A\x00" as raw bytes).
All of this is left to the user space to sort out, just as it is on Linux, so I am not entirely sure why you demand Windows to do more for you than Linux does.
The OS is the one providing the filesystem, so it should define and support how it's used (including providing standard utilities for manipulating it, both from programs and by the operator) rather than leaving the programs to figure it out among themselves. After all, if the text storage format didn't matter to the OS, why would we bother using the CRLF format on Windows at all? I submit that third-party programs did not spontaneously come up with an arbitrary convention to use a different text format on Windows; rather, programs use CRLF when running on Windows precisely because the standard utilities that ship as part of DOS/Windows expect that format.
As already stated multiple times here, CRLF is actually the "correct" line ending (at least in the telex days, when CR and LF had the literal meanings "return the carriage to home" and "feed a new line"), while the LF-only convention is a Unix "hack"/abstraction (the LF was converted back into CRLF when fed to a telex or a terminal). It is not really a surprise that DOS, which was inspired by CP/M, simply copied what was originally a physical signal; this is the same reason the ASCII/ANSI code has a BEL character for ringing a bell. In short, CRLF was the way to handle newlines at the time DOS was designed. You would expect CRLF as the line ending because that's how terminals work (unlike Unix, which magics two distinct motions into a single character).
If you are writing a developer suite, whether you're Delphi developing for MS-DOS or Microsoft developing for the Apple II, you kinda have the idea of how things should work (because you have the reference book for the platform, not just the compiler/language). There was no assumption that the OS provides an abstraction for text: in those days, everyone just implemented it from scratch, really ("code page" comes from literal code pages, where each character had a well-defined byte). This is manifested in command-line handling on Windows: the platform convention is that the command line is just a flat string, and the C runtime determines how to chop that up (MSVC and Intel C have historically disagreed heavily here). The aberration of Windows being alone in using CRLF exists because Unix-based designs took over the world: macOS is Unix, Linux was inspired by Unix, *BSD is Unix-derived.
I believe that it technically belongs to Visual C++, not the operating system, but it needs to ship with the OS because the user space binaries are compiled with MSVC.
It's both. Originally Visual C++ binaries built for DLL-based C runtime relied on MSVCRT.DLL and that was installed by the redist. Starting with Visual Studio .NET 2002, separate CRT DLLs starting with MSVCR70.DLL were used. MSVCRT.DLL is now part of Windows to support parts of the OS itself and for compatibility with programs that still use it. I think some versions of MinGW also use MSVCRT.
Current versions of the OS ship with functions in MSVCRT.DLL that weren't in the last VC6 version, such as the updated C++ exception handler (__CxxFrameHandler4). AFAIK there is no redistributable version of it; it's unique to the OS.
That is for backwards compatibility. The now finally official C standard library distributed with the OS, since Windows 10, is the UCRT (Universal C Runtime).
The C standard library is definitely not part of Windows.
It is now with the Universal C runtime, introduced in Windows 10, which is ironically written in C++ with extern "C" { ... }
On non UNIX clones, including Windows, it has always been the role of commercial C compilers to provide the C standard library on top of the actual C APIs.
Indeed, the C runtime is not part of the Windows API, and it's normal for a program to include a few different copies of the C runtime library because different modules were compiled with different compilers/options.
The C runtime library being part of the OS is an accident of Unix history; the 16-bit and 32-bit Windows APIs don't even use a C-compatible ABI (a Pascal-compatible one is used instead).
Going all the way back to the earliest C compilers on DOS, there was a decision made to make “\n” just work on DOS, for portability of Unix programs and so that the examples from the C programming book would just work.
But in Unix “\n” is a single byte, and in DOS it is 2. So they introduced text and binary modes for files on DOS. Behind the scenes the library will handle the extra byte. This is not necessary in Unix.
I used to have to be careful about importing files to DOS. Did the file come from Unix?
Linefeed (\n) is a single byte in DOS as well.
I think you are talking about the carriage return / linefeed pair (CRLF or \r\n).
These control codes go back to line printers. Linefeed advances the paper one line and carriage return moves the print head to the left.
>Linefeed (\n) is a single byte in DOS as well.
In binary mode. In text mode if you printf(“Hello World\n”) you get CRLF because that’s how text works on DOS. Unix had the convention of only requiring the LF for text. And Unix didn’t have text/binary modes. That’s the compatibility hack on DOS.
>These control codes go back to line printers.
Back to teletypes even. Believe me, I go back to line printers.
I'm pretty sure that conversion was done by the C library, just as stated in the article. Not by DOS. ASCII 0x0A '\n' is always one byte*, and C library implementations for DOS would insert an ASCII 0x0D '\r' byte before it at output time if the C FILE stream had been opened in text mode.
Note that printf(), which you use in your example, is a C library function that writes to a predefined text-mode stream. So it follows the same rules.
I wasn't able to dig up the source code of a vintage DOS compiler's C library in a few minutes of looking, so I can't prove it right now, but this section of the C standard (7.21.2 - Streams) hints that my recollection is correct:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#p...
*(On systems where the char type is one byte, of course, which is the case for DOS C compilers.)
Agreed. I didn’t mean that DOS somehow converted it; this was a compatibility feature put into the C library.
I wrote a C Standard library for MS/PC/DR-DOS. Your recollection is correct.
Thanks for confirming!
Teletypes is what I meant to say, thank you. Line printers have no carriage return mechanism.
Annoyingly, I actually think '\r\n' is the correct line ending here: advance the paper and return the carriage. But I suppose Unix took the simpler implementation, which makes looping over characters, words (split by ' '), and lines (split by '\n') simpler, as each loop only needs a single comparison.
The carriage return and linefeed combo are the commands to move to the next line of a teletype. Other commands might (in theory) be used for this purpose on other devices. These are implementation details.
Text inside a computer doesn't need any of that just to signal a newline. UNIX chose to use a single line feed character as a line separator because there was no good reason to use two. MacOS chose a single carriage return for similar reasons. Anything going out to a printer or teletype would run through a device driver that would turn the newline character into whatever the device expects.
Windows copied DOS which copied CP/M which was a very basic program loader for 8-bit machines and didn't really have "drivers" like we think of them today. I'm guessing here, but I imagine they chose the teletype combo because that's what most serial printers understood and printing was a major use case for those machines. That was probably the right choice for CP/M, but I can't imagine Microsoft would choose it if they were developing Windows from scratch today.
Yep, on Unixen the translation of CRLF to LF when printing to the terminal (and from CR to CRLF when reading input from the terminal) is done in the kernel, it's called "line discipline".
And if you switch the tty from "cooked" to "raw" mode then it doesn't do the conversion, and a CR just moves the cursor back to the start of the line and a LF just moves the cursor one line down.
Which is how you do the fun spinny icons on the command line without having to invoke ncurses!
You can also just use a \r directly without a \n. For example:
Similarly, on Classic Mac OS, C compilers would map a single '\n' to a single '\r', which was the Mac OS convention.
Interestingly, the IETF has several published RFCs for text protocols, all of which require \r\n line endings.
<https://www.rfc-editor.org/old/EOLstory.txt>
Note this does not apply to file formats (except for RFCs).
fopen(..., "wb") ?
It's the C library taking care of the "b" part for you, according to the article.
It's the other way around. It's the C runtime that treats text ("t") mode differently, because the C standard specifies \n as a line delimiter but the Windows convention is \r\n. In text mode C stdio translates between \n and \r\n. In binary mode it does no translation.
The article seems to be taking the position that the C runtime library is not part of "Windows", which feels like a rather odd view to me. What is the stable API that Windows offers to application developers if not that?
There is a very unfortunate situation in Unix systems in that the library named 'libc' simultaneously serves several different roles. One of those roles--what it is named for--is serving as the C standard library. The more important role is that the library also provides the implementation of a different standard API, the POSIX API, which is the main API used to access system details. There's also yet another role of providing the stable system interface to the kernel in most Unix implementations. On Windows, these roles are provided by different libraries: ucrt (what used to be msvcrt), kernel32, and ntdll, respectively.
And for what it's worth, the actual C standard library tends to be fairly rarely used, especially if you consider the malloc/free interface to be part of the system library rather than the C standard library. The C stdio functionality, for example, is extremely underpowered compared to the capabilities of all major operating systems' I/O libraries, and so most applications--even those written entirely in C--will choose to avoid the C standard library and instead use the more direct primitives of the system API layer instead.
Not OP, but thank you for sharing this. If you don't mind a follow-on question: I always hear people talk about the "runtime" in languages like Go and libraries like Tokio. What is it that these runtimes do that you cannot get from the likes of libc and these Windows DLLs?
MSVCRT, the Microsoft Visual C/C++ Runtime library, is also 'the runtime library'. It was the runtime library for Microsoft's C/C++ compiler. In the days when there were multiple C/C++ implementations for Win32 (which still exist, if one is willing to dig up Watcom C/C++ or some such) there would be different runtime library DLLs for the different C/C++ vendors, even for different versions of their products.
Runtime libraries for C/C++ provide two general sets of stuff: the stuff mandated for the Standard C and Standard C++ libraries, and the stuff that is needed by the basic mechanics of the language.
The former is everything from abort() to wscanf(). The latter is a bunch of internal functions, calls to which the compiler inserts in order to do stuff. This is basically the split nowadays between UCRT and VCRUNTIME.
In the days of programming targeting the 80486SX without an 80487 present, for instance, every piece of floating-point arithmetic was not a machine instruction but a call to a runtime library routine that did the floating-point operation longhand using non-FPU instructions. Other runtime functionality over the years has included doing 32-bit or 64-bit integer arithmetic on 16-bit and 32-bit architectures where this was not a native word size, functions to do stack checking in the function perilogue, and functions to do C++ run-time type checking and exception processing.
This pattern is followed by other (compiled) programming languages. Naturally, the programming languages do not necessarily have any relation to the Standard C or Standard C++ libraries, nor do they generate code that needs the same helper functions for stuff as C/C++ code does. (But the situation is complicated by the POSIX API and the old C language bindings for the MS-DOS system call API, some of which another programming language might also allow program code to use.)
The Win32 API. E.g. using WriteFile to write files (https://learn.microsoft.com/en-us/windows/win32/api/fileapi/...)
It wasn't until fairly recently that the C runtime was stably shipped with Windows. Previously you had to install the correct version of the C library alongside your application.
> The Win32 API. E.g. using WriteFile to write files (https://learn.microsoft.com/en-us/windows/win32/api/fileapi/...)
Which is called from what, if not C? Does Windows really offer no API for writing text (rather than bytes) to files? Or does it rely on the application developer to manage line endings in their own code? Neither of those sounds very developer-friendly.
Calling it from C does not mean you need a full C standard library to exist. For example, much of the C standard library is itself written in C. But it's a "freestanding" C which assumes only a minimal set of library functions exist (e.g. functions for copying memory from one place to another, filling memory with zeroes, etc).
And you can of course use non-C languages to call the Win32 API. Or even directly using assembly code.
> you can of course use non-C languages to call the Win32 API. Or even directly using assembly code.
Is that a supported/official API though? On Linux you "can" put your arguments in registers and trigger the system call interrupt directly, and I think Go programs even do this, but it's not the official interface and they reserve the right to break your program in future updates, at least in theory.
This is incorrect. The syscall ABI is the supported stable ABI for Linux, not the libc API - there's no single supported C library for Linux, and libc often lags behind the kernel in terms of providing syscall wrappers, so punting it to that level wouldn't work. This is in contrast to the BSDs that have libc tightly coupled to the kernel.
Of course, the Linux solution results in some weirdness, especially because specs like POSIX cover the C API, not the syscall ABI. setuid() at the libc layer is specced as changing the UID for all threads in a process. The Linux setuid() syscall only changes the current thread[1], and it's up to the C library to do some absolute magic to then propagate that to all other threads. Which made things difficult for things not using the C library, like Go (https://github.com/golang/go/issues/1435). But that's still not an argument that the supported interface is the C library - the kernel advertises the interface it exposes via the syscall ABI, and will retain that functionality, and if you want POSIX compatibility then you get it from somewhere else.
[1] In Linux, a thread is just a very slightly special case of a process
Sure. C has never been the only language supported on Windows.
For instance, Delphi had a period of popularity for Windows application development, and AFAIK it has always used its own runtime library which is completely independent of the C runtime.
Go does not trigger low-level system call interrupts on Windows. (It does that on Linux, but Windows syscall numbers are not stable even across minor Windows updates, so if Go did that, its Windows binaries would be incredibly fragile.)
On Windows NT, Go uses the userspace wrappers provided in Windows system libraries such as NTDLL.DLL and KERNEL32.DLL. But those too are entirely separate from the C runtime.
Don't forget the days when multiple C/C++ implementations from multiple vendors all came with their own runtime library DLLs, too.
Calling win32 from other languages is supported, calling it from assembly is supported (as long as you use the calling convention properly, obviously), using ntdll to bypass the win32 API is not supported.
Basically on Linux the syscalls are the equivalent of Win32 except much narrower in scope.
> Is that a supported/official API though?
The Win32 API doesn't even use the "C" calling convention. C is just another language to Windows, and the standard C library is a cross-platform library for C. You could also write C code on classic Mac OS, which had its own API as well, but one styled more for Pascal.
The OS and C being closely related is not universal across all operating systems, it's just a Unix thing.
> Which is called from what, if not C?
A prominent example is Delphi[1]. At work our primary application is a 20 year old Delphi Win32 application, which we ship new features in weekly.
Delphi does not rely on the C runtime, instead having its own system library which interfaces with the Win32 API that gets compiled in.
[1]: https://en.wikipedia.org/wiki/Delphi_(software)
From literally any language. The WriteFile function comes from the kernel32.dll shared library and follows a particular calling convention. You don't need to use that calling convention inside your own binary (and indeed, MinGW and MSYS use the SysV ABI for everything except calls into the Win32 API), or ask a random C runtime coming from God knows where to do this for you if you write something other than C.
In the UNIX world there is this strange notion that the C language is somehow special and that the OS itself should provide its runtime (a single global version of it) for every program, even those written in other languages, to interact with the OS, but... it's just silly.
> Does windows really offer no API for writing text (rather than bytes) to files? Or does it rely on the application developer to manage line endings in their own code? Neither of those sounds very developer-friendly.
No it doesn't. That logic belongs in the OS-specific layer in the runtimes/standard libraries of the implementations of the different programming languages. They may decide to re-use each other's libraries, of course, or they may decide not to.
> You don't need to use this calling convention inside your own binary (and indeed, MinGW and MSYS use SysV ABI for everything except when calling Win32 API), or ask a random C runtime coming from God knows where to do this for you if you write something other than C.
Well sure but you have to define it somewhere. At some point there's an interface where something that's part of the application asks something that's part of the OS to do something, and that interface had better be stable and well-specified. If you really want you can use a different interface from your C ABI, sure, but given that, like it or not, most of windows is written in C (or in C++ but using C linkage between component boundaries), what do you gain?
Even so, most of Windows historically did not use C ABI, but rather stdcall, so specifying a call from your C library to the Windows C library couldn’t be done in a purely standards-compliant C compiler (which doesn’t have calling convention modifiers), in a slightly pedantic quirk of the C spec design
> At some point there's an interface where something that's part of the application asks something that's part of the OS to do something, and that interface had better be stable and well-specified.
It's defined, and well-specified.
> your C ABI
Which is a C ABI. Borland's Turbo C and C++Builder used different ABI than Microsoft C compiler did. GCC for Windows used to use a third, entirely different ABI as well. The ABI is not part of the language definition, you see.
> most of windows is written in C
And compiled with a very specific C compiler that used a particular ABI. That only means that you need to follow it when you call into the OS, sure, but not that you have to stick to it anywhere else — and indeed, most implementations of many programming languages on Windows didn't; they invented and used their own ABIs.
> Which is a C ABI. Borland's Turbo C and C++Builder used different ABI than Microsoft C compiler did. GCC for Windows used to use a third, entirely different ABI as well.
Sure, you can do that. Userspace code can use any ABI it wants, or none. But again, why, what do you gain?
And regardless of whether it's "the" ABI or merely "a" ABI, that ABI presumably has a representation for strings and allows passing them around - and while you certainly could use a different representation in your program (or in the OS internals) and transform strings back and forth when calling the OS (or when receiving calls from userspace), you probably don't want to. At which point we're back at needing a way to write strings in an in-memory format to OS-standard files in the filesystem.
> But again, why, what do you gain?
Performance? Codegen simplicity? Why, again, must one use the syscall ABI for anything that is not a syscall?
> that ABI presumably has a representation for strings and allows passing them around
In this particular case, the API operates on binary buffers, not text strings. Sure, you could go the VMS way, or even the IBM way, and turn files from binary blobs into arrays of fixed-length records (that's why C's fwrite/fread have both num and sz arguments: some OSes literally can't write data any other way).
> At which point we're back at needing a way to write strings in an in-memory format to OS-standard files in the filesystem.
Yes? Some text editors converted LFs to NULs to work on the text in memory, and then converted the NULs back into LFs when writing to disk (IIRC). Both emacs and vi don't store text in memory the way it's laid out in the file; they translate it when writing to disk.
Again, why do you want the OS to get involved in any of this? It's not the OS's job, period. Stop trying to make the world an even worse place.
The whole issue is specific to C and languages that copied C or use its runtime underneath in implementations (like Python)
For reference, Unix has no API other than bytes either.
> The whole issue is specific to C and languages that copied C or use its runtime underneath in implementations (like Python)
So it's "specific to" almost all programming languages in actual use. That's a rather esoteric point.
> For reference, Unix has no API other than bytes either.
Unix does offer an API for writing C-standard in-memory text strings to Unix-standard on-disk text files, it just happens to be the same one as the API for writing in-memory binary strings to on-disk binary files.
> Unix does offer an API for writing C-standard in-memory text strings
Why on bloody Earth should a presumably generic-purpose OS provide a special API for dealing with internal representation of some data structure in a (particular) implementation of a (particular) programming language?
Besides, it doesn't offer such an API anyhow; you need to take care to manually pass the result of a strlen() call instead of sizeof()'s as the value for the len parameter of a write() call, otherwise a NUL-terminator will get written into the file as well.
And C says nothing about what constitutes a line break, by the way. Nor does it have any concept of a "line", or any utilities for working with lines specifically, it only knows of strings, and that's all. The concept of "text line" is POSIX.
> Why on bloody Earth should a presumably generic-purpose OS provide a special API for dealing with internal representation of some data structure in a (particular) implementation of a (particular) programming language?
Because the purpose of the OS is to facilitate applications (and, on the other end, facilitate hardware), and those applications tend to have a need to process text in-memory and then store it on the filesystem?
All you need for that is the ability to read and write binary blobs to and from files, which Windows gives you, and to know what "text files" means for the other programs on that platform. Windows itself doesn't care for text much; but the other programs have a shared convention that ASCII text files have CRLF-separated variable-length lines of text, and Unicode text files store text in UTF16-LE, (including the CRLF pairs, so those look like "\x0D\x00\x0A\x00" as raw bytes).
All of this is left to the user space to sort out, just as it is on Linux, so I am not entirely sure why you demand Windows to do more for you than Linux does.
The OS is the one providing the filesystem, so it should define and support how it's used (including providing standard utilities for manipulating it, both from programs and by the operator) rather than leaving the programs to figure it out among themselves. After all, if the text storage format didn't matter to the OS, why would we bother using the CRLF format on Windows at all? I submit that third-party programs did not spontaneously come up with an arbitrary convention to use a different text format on Windows; rather, programs use CRLF when running on Windows precisely because the standard utilities that ship as part of DOS/Windows expect that format.
As already stated multiple times here, CRLF is actually the "correct" way (at least in the teletype days, when CR and LF had the literal meanings "return the carriage to home" and "feed a new line"), while the LF-only convention is a Unix "hack"/abstraction (which was converted back into CRLF when fed to a teletype or terminal). It is not really a surprise that DOS, which was inspired by CP/M, simply copied what was originally a physical signal; this is the same reason the ASCII/ANSI code has a BEL character for ringing a bell. In short, CRLF was the way to handle newlines at the time DOS was designed. You would expect CRLF as the line ending because that's how terminals work (unlike Unix, which smooshes two different actions into a single character).
If you are writing a developer suite, whether you're Delphi targeting MS-DOS or Microsoft developing for the Apple II, you already have an idea of how things should work, because you have the reference book for the platform, not just for the compiler/language. There was no assumption that the OS provides an abstraction for text: in those days, everyone just implemented it from scratch ("code page" comes from literal pages of code tables, where each character had a well-defined byte). This is manifested in command-line handling on Windows: the platform convention is that the command line is just a flat string, and the C runtime determines how to chop it up into arguments (MSVC and Intel C have historically disagreed heavily here). The perception that Windows' CRLF is an aberration only exists because Unix-based designs took over the world: macOS is Unix, Linux was inspired by Unix, *BSD is Unix-derived.
It still shows up in IETF-style textual network protocols, which evolved on non-Unix systems (HTTP, SMTP, etc.)
From whatever programming language you feel like using.
> It wasn't until fairly recently
By "recently" you mean Win95? MSVCRT.DLL has been there for at least that long.
For some background on what I meant see:
https://devblogs.microsoft.com/oldnewthing/20140411-00/?p=12...
https://learn.microsoft.com/en-us/cpp/windows/universal-crt-...
I believe that it technically belongs to Visual C++, not the operating system, but it needs to ship with the OS because the user space binaries are compiled with MSVC.
It's both. Originally Visual C++ binaries built for DLL-based C runtime relied on MSVCRT.DLL and that was installed by the redist. Starting with Visual Studio .NET 2002, separate CRT DLLs starting with MSVCR70.DLL were used. MSVCRT.DLL is now part of Windows to support parts of the OS itself and for compatibility with programs that still use it. I think some versions of MinGW also use MSVCRT.
Current versions of the OS ship with functions in MSVCRT.DLL that weren't in the last VC6 version, such as the updated C++ exception handler (__CxxFrameHandler4). AFAIK, there is no redistributable version of it, it's unique to the OS.
That is for backwards compatibility. The now finally official C standard library distributed with the OS, since Windows 10, is the UCRT (Universal C Runtime).
It was there but mystery meat vs whatever version you might need for your binary.
They are backwards-compatible. I've written many tiny (few KB) utilities that work from Win95 through Win11, and of course WINE.
Win32.
The C standard library is definitely not part of Windows.
It is now with the Universal C runtime, introduced in Windows 10, which is ironically written in C++ with extern "C" { ... }
On non UNIX clones, including Windows, it has always been the role of commercial C compilers to provide the C standard library on top of the actual C APIs.
Indeed, the C runtime is not part of the Windows API, and it's normal for a program to include a few different copies of the C runtime library due to different modules being compiled with different compilers/options.
The C runtime library being part of the OS is an accident of Unix history; the 16-bit and 32-bit Windows APIs don't even use a C-compatible ABI (they use a Pascal-compatible calling convention instead).