[HN Gopher] My First Kernel Module: A Debugging Nightmare
___________________________________________________________________

My First Kernel Module: A Debugging Nightmare

Author : ksml
Score : 62 points
Date : 2020-11-19 19:30 UTC (3 hours ago)

(HTM) web link (reberhardt.com)
(TXT) w3m dump (reberhardt.com)

| Taniwha wrote:
| So a story: I've been a kernel hack since Unix V6, made a living
| doing it one way or another for over half my life ... learning to
| think about concurrency, time, interrupts, race conditions etc is
| hard, very hard - I got pretty good at it ... but then my career
| took a diversion, I designed chips for a decade or so, everything
| is concurrency, at the lowest levels .... after a while I came
| back to doing kernel stuff and found that with this new
| background all that hard stuff was trivial and obvious.
|
| Mostly you just have to steep your brain in it for long enough

| ksml wrote:
| Concurrency is still hard for me, but I do find it getting much
| easier over the years :) thanks for the story!

| sweettea wrote:
| You probably already did this, but for the audience: one of the
| best ways to make sure you're using a function reasonably is to
| use elixir.bootlin.com to look at other uses and make sure you're
| using the function similarly. For instance, check out
| https://elixir.bootlin.com/linux/latest/A/ident/for_each_pro... .

| ksml wrote:
| Elixir was extremely helpful to me! It didn't always help me
| understand _why_ code was written the way it was (hence my
| incorrect use of rcu_read_lock), but it was very helpful to see
| some examples.

| ksml wrote:
| Hi HN, this was my first attempt at writing any sort of kernel
| code.
I would love to hear your thoughts on this experience and
| on the fixes I applied, especially from anyone with more Linux
| experience than me :)

| ylyn wrote:
| Seems like someone did try to get those functions exported, but
| the maintainer rejected it, saying that no driver should be
| poking so deep into fd internals. Makes sense. Your use case is
| kind of niche.
|
| https://lore.kernel.org/lkml/20180730163256.GC27761@infradea...
|
| By the way, C Playground is really helpful for teaching an OS
| course!

| ksml wrote:
| That is really interesting and good to know -- thanks for
| that!
|
| I hope C Playground is helpful, and I'm building it with
| teaching in mind. If you teach anywhere and could find it
| useful, let me know!

| ylyn wrote:
| Here's a hack you could use to get around the functions not
| being exported:
| https://github.com/anbox/anbox-modules/blob/master/binder/de...

| ksml wrote:
| Oh, that's clever! I might try that. I really don't feel
| comfortable building my own kernel

| warybeary wrote:
| Have you looked into using eBPF instead of writing a kernel
| module?
|
| http://ebpf.io for some more insights.
|
| At the very least, it'll provide some useful tooling for you to
| debug problems in kernel-space.

| ksml wrote:
| I hadn't considered this! Can eBPF be used to access
| arbitrary kernel data structures, though?

| warybeary wrote:
| Yes (to a degree) :)
|
| Check out https://github.com/iovisor/bpftrace and the
| example tools/ for a taste. You'll likely want to play with
| kprobes/kretprobes.

| ksml wrote:
| This is really interesting; I hadn't realized it was so
| capable/general. I'll look into this. Thanks for the
| references!

| nosefrog wrote:
| Great story! I've had a lot of debugging nightmares, but
| thankfully never anything as bad as that.
|
| One thing that looks fishy is this branch:
|
|     if (container_tasks_len == max_container_tasks) {
|         printk("cplayground: ERROR: container_tasks list hit capacity!
We "
|                "may be missing processes from the procfile output.\n");
|         break;
|     }
|
| Since you said printk can block, why isn't calling it in the rcu
| critical section a bug? Is it because you immediately break
| afterwards and don't try to reference the next task?

| ksml wrote:
| That's a good point. I'm hoping that this never gets hit, and
| if that line ever appears in the logs, then things are already
| broken. However, it's probably better to improve the failure
| mode where possible :) [edit] and yes, since we break and don't
| follow the `next` pointer in the linked list, that also
| shouldn't cause any problems.

| devit wrote:
| You can do most or all of that by reading /proc/<pid>/fdinfo/<fd>
| and /proc/<pid>/fd/<fd> or by making system calls on the affected
| fds (which you can do e.g. by injecting code with LD_PRELOAD or
| ptrace or with nsenter with fd namespace or equivalent C code).
|
| Even if you write a kernel driver, iterating over all tasks in
| the system is a terrible design (there may be millions), not to
| mention "determining if a task belongs to a C playground program"
| in the kernel (obviously the kernel should have no knowledge
| about such specifics).
|
| Of course, if a developer cannot even produce a reasonable
| overall design, it's not surprising that they aren't capable of
| writing correct code.

| nosefrog wrote:
| "Be kind. Don't be snarky. Have curious conversation; don't
| cross-examine. Please don't fulminate. Please don't sneer,
| including at the rest of the community."
|
| https://news.ycombinator.com/newsguidelines.html

| ksml wrote:
| I actually cannot get enough information from doing that.
| Crucially, I need to be able to recognize whether two file
| descriptors point to the same open `struct file`. (To be clear,
| this isn't the same as whether they're pointing to the same
| file path. I need to know when the two file descriptors are
| sharing the same cursor.)
There is no way to do this using
| existing APIs, because there is nothing identifying a
| `struct file` besides the memory address of the struct. (The
| "open file IDs" I mention are hashes of the `struct file`
| address.)
|
| I did spend a lot of time trying to avoid writing a kernel
| module, and this was the only way I could find to do it :)

| devit wrote:
| You can use the kcmp system call with KCMP_FILE argument to
| find out if two fds point to the same files structure (of
| course you must use this as the custom comparison function of
| a sort algorithm so you don't end up with quadratic run
| time).
|
| Linux has a project called CRIU that can save and restore
| processes to disk without needing additional kernel modules,
| so pretty much all state is already gettable and settable
| from user space.

| ksml wrote:
| I can't do that across processes, though, can I? (to see
| whether two processes have file descriptors pointing to the
| same open file)
|
| I hadn't heard of CRIU. I'll check that out. (edit: CRIU
| looks super useful. I think the speed/overhead of
| snapshotting will decide whether I can use it for this
| project, but I can imagine it being handy in the future
| regardless. Thanks for the link.)

| dilyevsky wrote:
| I recommend checking out podman (or docker) - they have
| built-in criu support. Otherwise you'll need some other
| namespacing mechanism to avoid colliding pids

| lallysingh wrote:
| eBPF is honestly the first thing to try _before_ writing a
| module.
|
| I'm glad to see you used a VM. That's the first step in the right
| direction. Others have mentioned that you should've used
| printk(), which is true.
|
| I'll mention that you can also run the kernel in a debugger:
| https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-...

| ksml wrote:
| I hadn't considered eBPF because I needed some pretty obscure
| information from the kernel internals (i.e.
the addresses of
| the `struct file`s) and I didn't realize eBPF was as capable as
| it is. Another commenter suggested trying it, though, so I'm
| checking it out now!
|
| I did use printk for debugging, but I (incorrectly) assumed it
| could block. Another commenter pointed out that this is not the
| case. TIL!
|
| The gdb link looks very helpful and I'll try that next time.
| Thanks for linking that.

| cesarb wrote:
| > However, printk can block (while allocating memory)
|
| No, printk() is magic. It can be called even in NMI context,
| which is a worse place. Quoting https://lwn.net/Articles/800946/,
| "[...] kernel code must be able to call printk() from any
| context. Calls from atomic context prevent it from blocking;
| calls from non-maskable interrupts (NMIs) can even rule out the
| use of spinlocks. [...]"

| ksml wrote:
| This is really good to know. I had assumed it could block when
| allocating memory for the formatted string buffer, but the
| rationale explained in that article makes a lot of sense. Being
| able to use printk simplifies things a lot.

| kanox wrote:
| Also: allocating memory with GFP_ATOMIC doesn't sleep.

| lhoursquentin wrote:
| Great post, also love what you are trying to do with C
| playground, this is awesome!
|
| I've recently been trying to build something similar, visualizing
| forks/execve/read/write, but using the strace output of a binary,
| which is much less challenging.

| ksml wrote:
| Thank you! It's open source, and I'd love to hear if you have
| any suggestions for it. Would also love to see what you're
| building!

| secondcoming wrote:
| Great article! Reminds me of when I was working on a bug in a
| phone kernel and adding its equivalent of printk() made the bug
| disappear! Lauterbach time!
___________________________________________________________________
(page generated 2020-11-19 23:00 UTC)