On Thu, Apr 21, 2022 at 2:10 PM Shaobo Huang wrote:

> > > From: huangshaobo
> > >
> > > when writing out of bounds to the red zone, it can only be detected at
> > > kfree. However, there were many scenarios before kfree that caused this
> > > out-of-bounds write to not be detected. Therefore, it is necessary to
> > > provide a method for actively detecting out-of-bounds writing to the red
> > > zone, so that users can actively detect, and can be detected in the
> > > system reboot or panic.
> >
> > After having analyzed a couple of KFENCE memory corruption reports in the
> > wild, I have doubts that this approach will be helpful.
> >
> > Note that KFENCE knows nothing about the memory access that performs the
> > actual corruption.
> >
> > It's rather easy to investigate corruptions of short-living objects, e.g.
> > those that are allocated and freed within the same function. In that case,
> > one can examine the region of the code between these two events and try to
> > understand what exactly caused the corruption.
> >
> > But for long-living objects checked at panic/reboot we'll effectively have
> > only the allocation stack and will have to check all the places where the
> > corrupted object was potentially used.
> > Most of the time, such reports won't be actionable.
>
> The detection mechanism of kfence is probabilistic. It is not easy to find
> a bug. It is a pity to catch a bug without reporting it. and the cost of
> panic detection is not large, so panic detection is still valuable.

I am also a big fan of showing as much information as possible to help the
developers debug a memory corruption.
But I am still struggling to understand how the proposed patch helps.
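
To make the limitation concrete, here is a minimal userspace sketch of a
canary-based red zone check. It is not KFENCE code: rz_alloc(), rz_check(),
rz_free(), REDZONE_SIZE and CANARY_BYTE are hypothetical names, and the fixed
0xAA pattern is an assumption made for simplicity. What it is meant to show is
that such a check can only tell *that* red zone bytes changed, not which code
changed them or when.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE_SIZE 16   /* hypothetical red zone length, in bytes */
#define CANARY_BYTE  0xAA /* hypothetical fixed canary pattern */

/* Allocate `size` usable bytes followed by a canary-filled red zone. */
static void *rz_alloc(size_t size)
{
	unsigned char *p = malloc(size + REDZONE_SIZE);

	if (!p)
		return NULL;
	memset(p + size, CANARY_BYTE, REDZONE_SIZE);
	return p;
}

/*
 * Scan the red zone that follows the object.
 * Returns the offset of the first corrupted byte, or -1 if it is intact.
 * Note that nothing here identifies the writer or the time of the write.
 */
static long rz_check(const void *obj, size_t size)
{
	const unsigned char *rz;
	size_t i;

	assert(obj != NULL);
	rz = (const unsigned char *)obj + size;
	for (i = 0; i < REDZONE_SIZE; i++) {
		if (rz[i] != CANARY_BYTE)
			return (long)i;
	}
	return -1;
}

/* Free the object; returns -1 if the red zone was corrupted, 0 otherwise. */
static int rz_free(void *obj, size_t size)
{
	int corrupted = rz_check(obj, size) >= 0;

	free(obj);
	return corrupted ? -1 : 0;
}
```

Calling rz_check() from a hypothetical panic or reboot hook instead of from
rz_free() would return exactly the same information: an offset into the red
zone, with no stack trace for the offending write.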
Assume we have some generic allocation of an skbuff, so the report looks
like this:

=============================================
BUG: KFENCE: memory corruption in
Corrupted memory at
kfence-#59: -,size=100,cache=kmalloc-128
allocated by task 77 on cpu 0 at 28.018073s:
 kmem_cache_alloc
 __alloc_skb
 alloc_skb_with_frags
 sock_alloc_send_pskb
 unix_stream_sendmsg
 sock_sendmsg
 __sys_sendto
 __x64_sys_sendto
=============================================

This report tells us that, on a system that could have been running for
days, a particular skbuff was corrupted by some unknown task at some unknown
point in time.
How do we figure out what exactly caused this corruption?

When we deploy KFENCE at scale, it is rarely possible for the kernel
developer to get access to the host that reported the bug and try to
reproduce it.
With that in mind, the report (plus the kernel source) must contain all the
information necessary to address the bug; otherwise reporting it only wastes
the developer's time.
Moreover, if we report such bugs too often, our tool loses credibility,
which is hard to regain.

> > > for example, if the application memory is out of bounds and written to
> > > the red zone in the kfence object, the system suddenly panics, and the
> > > following log can be seen during system reset:
> > > BUG: KFENCE: memory corruption in atomic_notifier_call_chain+0x49/0x70
> [...]
>
> thanks,
> ShaoBo Huang

--
Alexander Potapenko
Software Engineer
Google Germany GmbH