commented:
The meat of the article is of course what tripwires you can put into your code, and what color the walls of the rabbit hole are. However, if you are mainly interested in using the described Trip tool, the strace tool might also be an alternative. For example, to simulate the example case of fork failing every second time:

    $ strace -f -e fault=clone:when=1+2:error=ENOMEM -o /dev/null bash
    bash: fork: Cannot allocate memory
    $ date
    Sat Aug 24 11:59:22 CEST 2024
    $ date
    bash: fork: Cannot allocate memory
    $ date
    Sat Aug 24 11:59:23 CEST 2024
    $ date
    bash: fork: Cannot allocate memory
    $

This uses the ptrace mechanism rather than LD_PRELOAD, so it has other pros and cons. For example, it works at the syscall level: the fork() function is apparently implemented using the clone() syscall on my system, hence fault=clone above. And of course strace is dangerous (but I guess so is LD_PRELOAD). Still, a handy tool. I stumbled upon these surprising strace features just yesterday and wanted to share them.

commented:
I’ve seen that article about strace before but never found out what they mean by dangerous. AFAICT there’s nothing in the article that explains that sentence. Do you know?

commented:
Good point; maybe “dangerous” is not the right word here. From my understanding the main danger of strace is that it meddles directly with the traced process, at least by sending it STOP and CONT signals. And in general the ptrace interface provided by the kernel can do basically anything to a process (read and write all of its memory, including its machine code, and thereby also influence control flow). I don’t know how much of the ptrace capabilities are used by strace, but generally ptrace is a very powerful and therefore “dangerous” way to influence a process. The linked article mainly mentions that strace is slow, which I guess can be a real danger if used for production processes.

The part about strace meddling with the traced process is mentioned under “Versus Advanced Tracers”:

“There is a possible con: in the past, strace has had bugs which can leave the target process, or its followed children, in the STOP state (e.g., here, here). This could cause a serious production outage, as the application is now frozen mid-flight. If you realize this immediately and can fix it (kill strace, then kill -CONT the process), then you may avoid a serious outage. However, you may still have caused a burst of application requests with multi-second latency (outliers), depending on how quickly you typed in the kill command.”

My understanding is that nowadays tools like perf or dtrace can provide the same level of insight in a less dangerous way (they don’t need to directly meddle with the process, but only “passively” look at the syscalls inside the kernel). So that’s nicer. OTOH perf apparently cannot easily influence the process the same way strace -e fault or strace -e inject can.

commented:
The article is saying it’s dangerous in production because it will alter the behavior of the program, notably by making it a lot slower. By inserting innumerable pauses in the execution it might also trigger issues due to locking or latency assumptions.
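
For anyone who wants to see the STOP/CONT recovery from the quoted passage without risking a real service, here is a minimal sketch on a throwaway process. It does not involve strace at all: it only shows what a process left in the STOP state looks like and that kill -CONT thaws it. It assumes Linux (it reads /proc) and a POSIX shell; proc_state is a hypothetical helper name, and the sleep process is just a stand-in for the frozen application.

```shell
#!/bin/sh
# Sketch: freeze a throwaway process with SIGSTOP, then thaw it with SIGCONT.
# Assumptions: Linux /proc is available; proc_state is a made-up helper.

# Print the one-letter process state from /proc/PID/stat ("T" = stopped).
# Field 3 is the state; this simple parse is only safe because the command
# name of the stand-in process contains no spaces.
proc_state() {
    awk '{ print $3 }' "/proc/$1/stat"
}

sleep 30 &                 # stand-in for the application being traced
pid=$!

kill -STOP "$pid"          # the state a buggy tracer might leave behind
sleep 1                    # give the kernel a moment to stop the process
stopped_state=$(proc_state "$pid")
echo "after STOP: $stopped_state"

kill -CONT "$pid"          # the manual fix: kill -CONT the process
sleep 1
resumed_state=$(proc_state "$pid")
echo "after CONT: $resumed_state"

kill "$pid"                # clean up the stand-in process
wait "$pid" 2>/dev/null || true
```

The state letter reported after STOP should be T (stopped), and after CONT it should be something else, typically S for a sleeping process. In a real incident you would first kill strace and then send CONT to the affected PID, as the quote describes.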