[HN Gopher] Writing a "bare metal" operating system for Raspberr... ___________________________________________________________________ Writing a "bare metal" operating system for Raspberry Pi 4 Author : rcarmo Score : 294 points Date : 2021-10-06 15:05 UTC (7 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | poetaster wrote: | For my older Pis I found | https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ | great. But this is ARM assembly territory. I believe subsequent | generations of Pi have had good tutorials. OS, of course, is a | very large, encompassing term. What is a minimal OS? | hikerclimber1 wrote: | Everything is subjective. Especially laws. | nanis wrote: | I find the writing style tedious: the author expects a reader who | does not know about `make` or cross compilers to relate to | writing ARM64 assembly for the bootloader. | | If I am following along with this material, then I don't need all the | digressions with close-enough descriptions of the tools. Like, if | I am reading a home-building tutorial, don't explain what a | hammer is. | subhro wrote: | Very refreshing to see this. It is so much fucking easier to grok | bare metal C compared to the <flavour of the year>-script junk | that floats around these days. | kennywinker wrote: | Ah yes, nice easy-to-grok code like `curval &= ~(field_mask << shift);` :P | | But for real - I've had way more luck grokking embedded Rust | than all of the bare metal C examples I've looked at. C breeds | dense bit-twiddling and code that relies on inscrutable compiler | behavior. There are easier ways to learn how these systems work | at a bare-metal level. | PaulDavisThe1st wrote: | Would you like to propose or reference a way of doing bit-twiddling | that is clearer than this? | | Also, hint: C doesn't breed bit-twiddling; writing software | that actually interacts directly with hardware does. | Veserv wrote: | They are just implementing a generic contiguous bitfield | clear.
| | field_mask was probably constructed as ((1 << width) - 1) | instead of as a manifest constant. So you can just do:
|     ClearBitField(input, width, shift) { return input & ~(((1 << width) - 1) << shift); }
| Now you just use that everywhere you would clear a | contiguous bitfield, which is a pretty common operation when | operating on hardware. Now all your bit-twiddling is | isolated to a single well-defined, generically useful | function instead of being repeated a billion times. | | We know this is a generically valuable operation since this | is basically a C implementation of the ARMv8 BFI | (b)it(f)ield (i)nsert instruction with a fixed 0 argument, | or in assembly:
|     BFI X{n}, XZR, #shift, #width
| kennywinker wrote: | I like this answer too. When opaque code is irreducibly | opaque, put it in a fn with a well-chosen name. | kennywinker wrote: | That was just a throwaway example of a pretty write-only | line of code from the OP's codebase, but since you asked: | | One operation per line. A comment for every operation. | Shifts that explicitly say if they are wrapping or | overflowing. Rust uses ! instead of ~, but if I had my way | it'd be a named function like bitwise_invert().
|     // curval &= ~(field_mask << shift); // original line
|     // pseudo-rust version
|     let shifted_mask = FIELD_MASK.wrapping_shl(shift); // be clear about what kind of shift we're doing
|     let inverted_mask = shifted_mask.bitwise_invert(); // use a fictional invert fn to avoid single-char operators
|     let shifted_val = curval & inverted_mask; // new variable instead of mutating the existing one
| Ideally those comments would say WHY we're doing those ops | rather than what's notable about them - but I didn't dig | into the code enough to write explanations. | | And then we let the compiler crush that into an efficient | lil one-liner like the author of the original code did | manually.
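Veserv's ClearBitField, together with the matching "set" half of the usual &= ~ / |= pair, can be written as compilable C. This is a minimal sketch assuming 32-bit registers; the names `field_mask`, `clear_bit_field` and `insert_bit_field` are illustrative and not taken from the tutorial's code. It also handles an edge case that the one-liner `(1 << width) - 1` hides, which is the sort of hidden compiler behavior kennywinker is pointing at: in C, shifting a 32-bit value by 32 is undefined, so a full-width field needs a special case.

```c
#include <stdint.h>

/* A run of `width` one-bits starting at bit 0. UINT32_C(1) keeps the shift
 * unsigned (a plain int 1 << 31 overflows), and shifting a 32-bit value by
 * 32 or more is undefined behaviour in C, so width >= 32 is special-cased. */
static inline uint32_t field_mask(unsigned width)
{
    return width >= 32u ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
}

/* Clear `width` bits of `reg` starting at bit `shift` (the effect of a
 * bitfield insert with a zero source). Assumes shift < 32 and
 * shift + width <= 32. */
static inline uint32_t clear_bit_field(uint32_t reg, unsigned width, unsigned shift)
{
    return reg & ~(field_mask(width) << shift);
}

/* General bitfield insert: clear the field (the &= ~ idiom), then set it
 * (the |= idiom) from `value`, discarding any bits of `value` that do not
 * fit in the field. Same assumptions as clear_bit_field. */
static inline uint32_t insert_bit_field(uint32_t reg, uint32_t value,
                                        unsigned width, unsigned shift)
{
    const uint32_t mask = field_mask(width) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}
```

For example, a call like `reg1 = ClearBitField(reg1, 1, shift);` from the thread would become `reg1 = clear_bit_field(reg1, 1, shift);`, and `reg = insert_bit_field(reg, 5, 3, 4);` writes the value 5 into bits 4 to 6.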
| adrian_b wrote: | When booting a real CPU you might easily have to modify | anywhere from a few tens to a few hundreds of hardware registers, | applying one or more such bit operations to each. | | If you chose such a deliberately verbose style | (splitting into multiple lines is the worst part), | the code would become really unreadable, as too | much space would be filled with text that does not | provide any information, obscuring the important parts. | | Normally the name of the register, the mask constant and | the shift constant have informative names that should | indicate all that needs to be known about the operation | being done, and any other symbols should occupy as little space as | possible on the line of code. | kennywinker wrote: | That's what functions and automatic compiler inlining are | for. See Veserv's answer | https://news.ycombinator.com/item?id=28776751 | adrian_b wrote: | No, using functions for such things is worse. | | It does not matter if the compiler inlines them; | encapsulating the bit field operations obfuscates the | code instead of making it more easily understandable. | | It is not possible to make the name of the function | provide more information than the triplet register name + | bit field name (the name of the shift constant) + the | name of the configuration option (the name of the mask | constant). | | Encapsulating the bit operations into a function just | makes you write exactly the same thing twice, and when you | are reading the code you must waste extra time to check | each function definition to see whether it does the right | thing. | | The C code would look just like a table of names, | where the operators just provide some delimiters in the | table that occupy little space. | | Replacing the operators with words makes such code less | readable, and concatenating the named constants into | function names or using them as function arguments brings | no improvement.
| | The only possible improvement over explicit bit | operations is to define the registers as structures with | bit-field members and use member assignment instead of | bit string operations. | | Unfortunately the number of register definitions for any | CPU is huge, so most programmers use headers provided by | the hardware vendor, as it would be too much work to | rewrite them. | | For almost all processors with which I have worked, the | hardware vendor has preferred to provide names for mask | constants and shift constants instead of defining the | registers as structures, even though the latter would have | allowed easier-to-read code. | kennywinker wrote: | > you must waste extra time to check each function | definition to see whether it does the right thing | | I think I see what you're arguing. That this:
|     reg1 &= ~(width_mask_1 << shift);
|     reg2 &= ~(width_mask_2 << shift);
|     reg3 &= ~(width_mask_3 << shift);
|     // etc...
| is clearer than something like this:
|     reg1 = ClearBitField(reg1, 1, shift);
|     reg2 = ClearBitField(reg2, 2, shift);
|     reg3 = ClearBitField(reg3, 3, shift);
|     // etc...
| If that's what you're arguing, I simply don't agree. | `ClearBitField` is descriptive and readable. It avoids | creating all those width_mask_n constants, since you | specify the width as input to the fn. You don't have to | go digging into `ClearBitField` because you wrote a unit | test to confirm that it does what it says on the label | and handles the edge cases. | | On top of that, the code inside `ClearBitField` can be as | verbose or as compact as you desire, because it's | contained and separated from the rest of the code. | adrian_b wrote: | Obviously this is a matter of personal preference and | experience. | | Real register names are usually very long, to indicate | their purpose, so you would not want to repeat them on | each line. | | This can be avoided by redefining ClearBitField.
| | Even so, writing an extra "ClearBitField" on each line | does not provide any information. It just clutters the | space. | | Anyone working with such code is very aware that &=~ | means clear bits and |= means set bits. | | When reading the table of names, the repeated function | name is just a distraction that is harder to overlook | than the operators. | | The way to improve on that is not to add anything to | the lines, but to use simpler symbols by defining the | registers as structures, e.g.:
|     register_1.bit_field_1 = constant_name_1;
|     register_2.bit_field_2 = constant_name_2;
|     register_3.bit_field_3 = constant_name_3;
| Unfortunately, as I have said, the hardware vendors | seldom provide header files with structure definitions | for the registers, and rewriting the headers is a huge | amount of work. | | However, if you are able to rewrite just the register | definitions that you use, that would be time better spent | than attempting to write functions or macros for these | tasks. | fouric wrote: | I find the first code example easier to read and process. | | However, that's because I've written a fair bit of C | code, and so when my brain goes into "C mode", the | symbols &, =, ~, <<, etc. all have clear and unambiguous | meanings - whereas ClearBitField does not. Additionally, | the pattern ~(foo << bar) is a common C idiom, so beyond | the individual symbols, my brain recognizes the whole | pattern, so it's "semantically compressed" (easier to | think about) for me. This would not be the case for a | beginner. | | Which style is better depends on an individual's | preferences and experiences - there's no "right" answer. | | This is a stellar example of one of the many reasons why | code-as-text is a huge mistake - because structure and | representation are conflated and coupled together.
A | sanely written programming language represents code as | _code objects_, and you can configure those code objects | to be displayed however you like, whether that's baz &= ~(foo << bar) | or ClearBitField(baz, 1, bar). | junon wrote: | No thanks, I'll take the C version any day. | kennywinker wrote: | Sure, the single line is more aesthetically pleasing. | Compact, clever, concise. But try fixing a bug or adding | new functionality to that one line. Especially as a | beginner. This is supposed to be an educational codebase. | isometimes wrote: | I've stated in part 1 of the tutorial that "This tutorial | is not intended to teach you how to code in assembly | language or C". | | My goal was to demonstrate some basic principles to get | code running on bare metal, encourage curiosity, further | my own knowledge and document my findings. | | I appreciate that more self-documenting code might be | desirable, but to some people (me included) a large | number of lines can be as off-putting as more esoteric | syntax. I acknowledge, however, that it is very hard to | please everyone! | NobodyNada wrote: | That's significantly less readable than the C version. I | still have to know what a "left-shift" and "bitwise | invert" are, and if I knew that then I wouldn't have a | problem with `<<` or `~` either. IMO `<<` is even more | intuitive than `shl` because I can just look at the arrow | instead of having to think about which way "left" is (and | I don't even have a tendency to get "left" and "right" | confused). | | All the extra verbosity simply obfuscates the actual | intent of the code: clear all bits in field_mask (shifted | to the left by some offset). That's pretty easy to see | at a glance from the C code (some comments could make | that clearer, but this is simple enough that any | experienced systems programmer will know what this does | without comments).
| | I agree that Rust embedded code is often more readable | than C, but that's done by creating abstractions to | manage complexity rather than just by adding more words. | For instance, one could write a wrapper struct that | provides a less-tedious interface than a bitfield (like | `curval.set_field(false)`). | kennywinker wrote: | >> All the extra verbosity simply obfuscates the actual | intent of the code | | I have a preference for verbosity in code, and I know | that many people don't share my preference. That's | alright - there's no exact right way to write that code. | But my point was that C encourages you to write code that | relies on knowing secrets about specific hidden behavior | in your compiler. `shl` isn't clearer than `<<`, but | `wrapping_shl` and `overflowing_shl` ARE clearer, | because they make us explicitly aware of behavior that | `<<` doesn't surface. | | As for clarity, I agree an abstraction would be best. And | Rust encourages those abstractions where C discourages | them. I'd still argue that the inside of that abstraction | should be the verbose version, but other than the | wrapping_shl that's mostly just a style/preference thing. | NobodyNada wrote: | In general, I'd agree with you -- I prefer spelling | things out explicitly instead of using terse | abbreviations. However, really common & fundamental math | operations benefit from some shorthand. For instance, 'y = ax + b' | is _way_ easier to read than:
|     let multiplied = a.wrapping_mul(x);
|     let y = multiplied.wrapping_add(b);
| The "terse" equation I can instantly recognize as a | linear function, while I'd have to stare at the more | verbose version for a while to figure out what it | does. In my opinion, bitwise operators work the same way: | if you're working in a domain where you have to write | thousands of simple bitwise operations, a bit of | shorthand can make the code much more expressive.
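adrian_b's preferred alternative earlier in this subthread, describing a register as a structure with bit-field members so that updates read as plain member assignments, might look roughly like this in C. It is a sketch with a made-up register layout (`ctrl_reg_t` and `ctrl_set_speed` are hypothetical, not from any vendor header or from the tutorial). One caveat: the order and packing of bit-fields are implementation-defined in C, so the layout below relies on the compiler's ABI, which may be one reason vendor headers tend to stick to mask and shift constants.

```c
#include <stdint.h>

/* Hypothetical 32-bit control register, described field by field.
 * Bit-field order and packing are implementation-defined; GCC and Clang on
 * little-endian AArch64 and x86-64 allocate from bit 0 upwards, so this
 * layout assumes that ABI. */
typedef union {
    uint32_t raw;               /* the register as a whole */
    struct {
        uint32_t mode   : 4;    /* bits 0..3  */
        uint32_t speed  : 4;    /* bits 4..7  */
        uint32_t enable : 1;    /* bit 8      */
        uint32_t        : 23;   /* bits 9..31, reserved */
    } f;
} ctrl_reg_t;

/* Update one field with plain member assignment. Working on a copy keeps
 * the access to the real (volatile) register to one 32-bit read and one
 * 32-bit write, whatever the compiler does for bit-field stores. */
static inline void ctrl_set_speed(volatile uint32_t *reg, unsigned speed)
{
    ctrl_reg_t r;
    r.raw = *reg;
    r.f.speed = speed;          /* same as: (*reg & ~(0xFu << 4)) | ((speed & 0xFu) << 4) */
    *reg = r.raw;
}
```

Here the `r.f.speed = speed;` line plays the role of adrian_b's `register_1.bit_field_1 = constant_name_1;`, with the volatile read and write kept at the edges of the function.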
| sneak wrote: | I'm surprised this doesn't start with QEMU on a Real Computer for | building/testing. | mrlonglong wrote: | I can recommend https://www.giters.com/rust-embedded/rust-raspberrypi-OS-tut... | for those of you interested in using Rust. | Be aware that it also requires the use of Docker, though I don't | need Docker and have changed my code not to need it. | dljsjr wrote: | What is this site that's re-hosting GitHub repos? | AQuantized wrote: | This is perfect for me; I recently made the project OS for Nand | to Tetris and have been learning systems programming with Rust. | I wish there were a way to find resources like this more easily | than scouring HN or trying to sort through Google searches. | chucksmash wrote: | I also did Nand2Tetris, like Rust, and had an interest in | more material in this area. I followed v2 of this tutorial[1] | and enjoyed it enough to become a GitHub sponsor for the in-progress | UEFI work. You might enjoy it: | | [1]: https://os.phil-opp.com/ | jfoutz wrote: | For a few glorious years Google was amazing at this. The | difference coming from AltaVista was unreal. | | At this point, I think I'd prefer boolean queries like | AltaVista so I can search the word vectors myself. Maybe some | meta info so I can include/exclude based on various tags and | links. | ggregoire wrote: | Is this an alternative UI for GitHub but without the files, | commit history and so on? Why tho? I'm confused. | | Actual GitHub repo for anyone looking for the files: | https://github.com/rust-embedded/rust-raspberrypi-OS-tutoria... | superkuh wrote: | Unlike GitHub or GitLab, this page is actually an HTML file | and does not need JavaScript executed to define the web | components "HTML". I appreciate an accessible link, at least. | I don't know if that's why he linked it. | mrlonglong wrote: | Oh, was it on GitHub? I hadn't noticed it was different. | I'll do better next time.
| Brian_K_White wrote: | This is a great idea, but I can't square that kind of project | with WSL and brew instead of actual Linux or MacPorts. | | If it's meant for the youths and neophites where you don't want | to scare them with strange not-Windows things, perfectly fine, | but then aarch64 assembly is already out of scope. | | If you don't know what's wrong with brew (as an OS developer, not | a casual user) then I can't take you seriously as a system | architect or OS developer. | vagrantJin wrote: | Neophites? | | > _can't take you seriously as a system architect or OS | developer_ | | I don't think OS devs and Sys Architects are the intended | audience. What might be helpful is if those rather busy people | could chip in with their knowledge to improve said project | rather than off-handedly dismiss it. | | It is, after all, being made available freely, and some devs who | aren't low-level programmers might find it a good reason to | learn something as low-level as an OS, don't you think so? | Wouldn't it be nice? | kennywinker wrote: | I'm very familiar with many of Homebrew's faults, but none of | them are dealbreakers for the casual "install latest version of | tool". If you don't like it, just install the same tools using | MacPorts. | ac42 wrote: | And not to forget https://github.com/rsta2/circle | [deleted] | throwaway889900 wrote: | Of all the sections in the tutorial, | https://github.com/isometimes/rpi4-osdev/tree/master/part10-... | is probably the best one for anyone to read. I don't think a lot | of people grasp that all the cores on a system start running | immediately on power-up and they're all running the same code | from memory initially. | Unklejoe wrote: | What happens when they all race to store to the same memory | location? I guess if they all run in lockstep it doesn't really | matter?
| | I've worked with some ARM SoCs from NXP and I could have sworn | that one core comes up first and the others get released from | reset later, with a "bringing up secondary CPUs" message | printed. | pm215 wrote: | The code that runs at startup makes sure they don't all write | to the same location :-) A common simple approach goes:
|   * read the CPU main ID register
|   * if core 0, branch to primary-core bootup code
|   * otherwise, go into a loop (e.g. "read x from known location for this core; if x is non-zero, branch to x, else keep looping")
|   * core 0 releases each secondary from the loop when it is ready -- this is when core 0 prints that "bringing up secondary CPUs" message
| (There are a bunch of minor variants on this, e.g. waking | secondaries by sending them an interrupt so they can sleep | via a wfi instruction instead of busy looping, but the basic approach | is always the same.) | | It is also possible to do this in hardware -- you can have an | SoC with a power controller so secondaries start powered off | or held in reset, and the primary core prods the power | controller to start each secondary. | | On 64-bit Arm the common standard is that this is all handled | by the firmware (which implements a standard ABI called | PSCI), and the OS code just makes SMC calls into the firmware | for "power on the secondary". (The firmware does something | like the above under the hood.) | throwaway889900 wrote: | This is assuming an asymmetric multiprocessing model. It | may be that the hardware is set up as such, but symmetric | multiprocessing is also an option, which is what the Pi | seems to do. | my123 wrote: | The code that runs at reset on the Arm CPU complex for the | RPi: https://github.com/raspberrypi/tools/blob/master/armstubs/ar... | monocasa wrote: | They don't all store to the same location. | SavantIdiot wrote: | That depends on the architecture.
E.g., Intel uses a wired-OR | circuit and the cores race to determine who booted first; | that core then becomes the boot core and executes the first | instruction from boot ROM. | throwaway889900 wrote: | I'm assuming a simplistic CPU architecture aimed at | beginners, which is generally who a tutorial is aimed at. | From there you can learn about all the nitty-gritty details | that you need to get an actual chip to work. | not-elite wrote: | Wow, is it really this [1] easy to run a C routine? | | Where does the rpi4 store the firmware necessary to read from the | sd card where this software is (presumably) stored? | | [1] | https://github.com/isometimes/rpi4-osdev/blob/master/part1-b... | teraflop wrote: | Yeah, the bootloader is responsible for the hardware stuff up | to this point. It doesn't take _that_ much more assembly code | to bootstrap C in the Linux kernel on x86: | https://github.com/torvalds/linux/blob/master/arch/x86/boot/... | | There are a bunch of other headers in that file, but the | "start_of_setup:" label is what's invoked by the bootloader, | and "calll main" transitions to C. So 32 lines of code, by my | count. | Teknoman117 wrote: | There's a bootloader (u-boot) written into flash memory in the | RPi4 SoC that handles the early initialization of the core and | finding a kernel to boot. Think of u-boot as the UEFI | equivalent for your RPi. | | Getting into C (or Rust) assuming the presence of some kind of | system firmware (BIOS, UEFI, u-boot, coreboot, etc.) isn't too | difficult in the grand scheme of things. | | Not to toot my own horn much, but here's an example I did of | getting into Rust on a 386EX SBC I had hanging around. I | actually yanked out the BIOS chip and this replaces it. Please | forgive any poor Rust practices; this was written in a hurry. | | https://github.com/teknoman117/ts-3100-images/tree/master/ru... | | I discovered a mind-melting bug where replacing the RTC clock | chip / battery-backed RAM can erase the BIOS.
This SBC uses the | same flash chip for both user storage and the BIOS. The | partition between the user area and the BIOS is stored in the | CMOS RAM, so if there is any junk in it, the BIOS might | misidentify the flash boundaries and erase itself... | | So, I wrote this to recover the boards. Bonus points were that | I only had an 8 KiB EEPROM hanging around, so it had to fit in | 8K initially. | my123 wrote: | It isn't u-boot, it's something totally barebones. | Teknoman117 wrote: | I forgot the RPi didn't use u-boot, but they use an | equivalent. | | https://github.com/isometimes/rpi4-osdev/tree/master/part2-b... | | https://raspberrypi.stackexchange.com/questions/10489/how-do... | | You don't handle the CPU from the reset vector like you | would in a microcontroller or system firmware environment. | Underneath you there is an entire loader stack that finds a boot device | to read a kernel image from. | | That's not to take away from this series at all; it's just that | the parent comment was asking how it was so easy to | get into a kernel image written in C on an SD card without | any apparent SD card or FS logic. | cesarb wrote: | > Where does the rpi4 store the firmware necessary to read from | the sd card where this software is (presumably) stored? | | The main chip of the RPi4 has a small amount of code in a | built-in ROM which runs on boot. In the normal boot flow, that | code loads the bootloader from the EEPROM chip, but it can also | read a recovery image from the SD card. See | https://www.raspberrypi.com/documentation/computers/raspberr... | for details (or | https://www.raspberrypi.com/documentation/computers/raspberr... | for how it was on the older RPi devices). | dragontamer wrote: | Yeah, C is really easy to interface with assembly language. | | The hard part is the linker-script to get this working right | :-) | https://github.com/isometimes/rpi4-osdev/blob/master/part1-b...
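Returning to pm215's description of how secondary cores are held and then released, the scheme can be modelled on an ordinary hosted machine, with POSIX threads standing in for cores. This is a simulation sketch under stated assumptions, not bare-metal code: the thread argument replaces reading the CPU ID register, `release_addr`, `secondary_main` and `simulate_reset` are hypothetical names, and real firmware would jump to a physical address (and might sleep with wfe or wfi) rather than spin and call a C function pointer. It does show the two properties Unklejoe and throwaway889900 were discussing: every "core" starts in the same `reset_entry` code, and the secondaries do not proceed until core 0 publishes an entry point for them. The release/acquire pair also makes core 0's earlier writes visible to each secondary once it is released.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4

typedef void (*entry_fn)(int core);

/* Hypothetical per-core "mailbox": NULL means keep waiting, anything else is
 * the address the secondary should jump to. Static, so it starts zeroed. */
static _Atomic(entry_fn) release_addr[NUM_CORES];
static int reached_entry[NUM_CORES];   /* records which cores got going */

static void secondary_main(int core) { reached_entry[core] = 1; }

/* Core 0's bring-up: do its own setup, then release each secondary. */
static void primary_boot(void)
{
    reached_entry[0] = 1;
    for (int core = 1; core < NUM_CORES; core++)
        atomic_store_explicit(&release_addr[core], secondary_main,
                              memory_order_release);
}

/* Every core starts here, running the same code. */
static void *reset_entry(void *arg)
{
    int core = (int)(intptr_t)arg;     /* stands in for reading the CPU ID register */
    if (core == 0) {
        primary_boot();
        return NULL;
    }
    entry_fn target;
    while ((target = atomic_load_explicit(&release_addr[core],
                                          memory_order_acquire)) == NULL)
        ;                              /* busy-wait until core 0 releases us */
    target(core);
    return NULL;
}

/* "Power on": start all cores at once and wait for them to finish. */
static void simulate_reset(void)
{
    pthread_t threads[NUM_CORES];
    for (int core = 0; core < NUM_CORES; core++)
        pthread_create(&threads[core], NULL, reset_entry, (void *)(intptr_t)core);
    for (int core = 0; core < NUM_CORES; core++)
        pthread_join(threads[core], NULL);
}
```

Because each secondary spins on its own slot, the cores never race to store to the same location, which is the point pm215 and monocasa make.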
| [deleted] | sigjuice wrote: | _Yeah, C is really easy to interface with assembly language._ | | Doesn't necessarily have to involve assembly! | | https://duckduckgo.com/?q=c+interpreter | jandrese wrote: | The main core gets bootstrapped by the opaque binary blob in | the graphics subsystem. | rkagerer wrote: | _The Bluetooth modem is a Broadcom chip (BCM43455), and it needs | to be loaded with proprietary software before it's useful to | us._ | | Are there any efforts out there to create open-source firmware | for this chip? | isometimes wrote: | I looked for a long time, but to no avail. Broadcom are | notoriously tight-lipped when it comes to their intellectual | property. | | Some have attempted to reverse-engineer it... This was a good | read: https://blog.quarkslab.com/reverse-engineering-broadcom-wire... | erdo wrote: | Might sound like a strange comment, but I really like the tone of | the README. It's very welcoming and clear; you don't often find | that level of professionalism in README docs - it makes me think | the author is probably quite unusual (in a good way). | isometimes wrote: | I'm going to take that as a compliment, thank you ;-) | isometimes wrote: | Quick thanks to the OP for linking my project: | https://www.rpi4os.com & | https://github.com/isometimes/rpi4-osdev. I'm humbled to be | mentioned here. So great to read the feedback! Feel free to get | in touch. | rcarmo wrote: | No problem. I've been looking into bare metal runtimes for ARM | chips, and when I chanced upon this, I thought it was too much of | a gem not to be posted here. :) | 908B64B197 wrote: | CS107E[0] might be an interesting course for those interested in | bare metal programming. | | Enrollment is over for it as of now, so better luck next | semester! | | [0] http://web.stanford.edu/class/cs107e/ | hikerclimber1 wrote: | You should only invest in the country you live in since you know | what goes on politically.
| sylwester wrote: | On the Pi Zero and Pi CM (maybe also others) you don't even need | an SD card to boot it. You can boot it via rpi-boot | (https://github.com/raspberrypi/usbboot), so there's no need for QEMU for | testing. You can just test it on real hardware in no time. | junon wrote: | Without QEMU you have to implement serial interfaces early in | order to debug your own OS. QEMU is a huge benefit early on, | and provides a sane environment for deterministic development | as opposed to potentially quirky hardware. | | Usually hobbyist OS projects don't start directly on hardware. | Many of them never reach the point of running on hardware. | toast0 wrote: | I don't know about QEMU for a Pi/Arm in general, but you do | have to be careful on x86, because segmentation limits aren't | checked by default, and you can be lulled into doing things | that don't work on hardware (I messed up secondary processor | starting), and if you don't frequently test on hardware, it's | easy to forget things. (See also homebrew games that only | work on emulators.) | | But you can certainly wait to start on hardware until you've | got it started on QEMU. Debuggability is a lot better unless | you've got specialized equipment for your hardware setup. | pm215 wrote: | On Arm a couple of only-on-hardware pitfalls are (a) cache | maintenance -- QEMU doesn't model caches so won't notice if | you forget to clean the dcache and flush the icache before | executing code you just wrote or modified and (b) | synchronization/barriers -- QEMU doesn't reorder memory | accesses and generally makes writes to system registers | take effect immediately, so won't notice if you forget | necessary barrier instructions. | LeoPanthera wrote: | The Pi 3 and 4 can also boot from TFTP with an NFS root. This | also allows you to switch the OS your Pi is booting just by | renaming a symlink on your server. All my home Pis boot that | way. As a bonus, you never need to worry about an SD card going | bad.
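pm215's first only-on-hardware pitfall, cache maintenance after writing code, has a portable spelling in GCC and Clang: the `__builtin___clear_cache` builtin. A minimal sketch, assuming a GCC- or Clang-compatible compiler (`install_code` is a hypothetical helper name): copy the instructions into place, then clear the cache over exactly that range before jumping to it. With GCC on AArch64 the builtin may call a libgcc helper, so a `-nostdlib` kernel build might need to link libgcc. It does nothing for pm215's second pitfall, which still needs explicit barrier instructions.

```c
#include <stddef.h>
#include <string.h>

/* Copy freshly generated (or loaded) machine code into `dst` and make the
 * instruction side see it. __builtin___clear_cache is a GCC/Clang builtin:
 * on AArch64 it cleans the data cache and invalidates the instruction cache
 * for the range; on x86, where no such step is needed, it emits nothing.
 * Because QEMU does not model caches, leaving this call out typically only
 * breaks on real hardware. */
static void *install_code(void *dst, const void *code, size_t len)
{
    memcpy(dst, code, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return dst;
}
```

The design choice is to keep the copy and the cache maintenance in one function, so no caller can write code into memory and forget the second step.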
___________________________________________________________________ (page generated 2021-10-06 23:00 UTC)