* SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Jann Horn @ 2020-03-04  0:23 UTC
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton
Cc: Linux-MM, kernel list, Kees Cook, Matthew Garrett

Hi!

FYI, I noticed that if you do something like the following as root,
the system blows up pretty quickly with error messages about stuff
like corrupt freelist pointers because SLUB actually allows root to
force a page order that is smaller than what is required to store a
single object:

echo 0 > /sys/kernel/slab/task_struct/order

The other SLUB debugging options, like red_zone, also look kind of
suspicious with regards to races (either racing with other writes to
the SLUB debugging options, or with object allocations).
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: David Rientjes @ 2020-03-04  1:26 UTC
To: Jann Horn
Cc: Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Kees Cook, Matthew Garrett

On Wed, 4 Mar 2020, Jann Horn wrote:

> Hi!
>
> FYI, I noticed that if you do something like the following as root,
> the system blows up pretty quickly with error messages about stuff
> like corrupt freelist pointers because SLUB actually allows root to
> force a page order that is smaller than what is required to store a
> single object:
>
> echo 0 > /sys/kernel/slab/task_struct/order
>
> The other SLUB debugging options, like red_zone, also look kind of
> suspicious with regards to races (either racing with other writes to
> the SLUB debugging options, or with object allocations).

Thanks for the report, Jann.  To address the most immediate issue,
allowing a smaller order than allowed, I think we'd need something like
this.

I can propose it as a formal patch if nobody has any alternate
suggestions?
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	 */
 	size = ALIGN(size, s->align);
 	s->size = size;
-	if (forced_order >= 0)
+	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
 		order = forced_order;
 	else
 		order = calculate_order(size);
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Kees Cook @ 2020-03-04  2:22 UTC
To: David Rientjes
Cc: Jann Horn, Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Matthew Garrett

On Tue, Mar 03, 2020 at 05:26:14PM -0800, David Rientjes wrote:
> On Wed, 4 Mar 2020, Jann Horn wrote:
> > Hi!
> >
> > FYI, I noticed that if you do something like the following as root,
> > the system blows up pretty quickly with error messages about stuff
> > like corrupt freelist pointers because SLUB actually allows root to
> > force a page order that is smaller than what is required to store a
> > single object:
> >
> > echo 0 > /sys/kernel/slab/task_struct/order
> >
> > The other SLUB debugging options, like red_zone, also look kind of
> > suspicious with regards to races (either racing with other writes to
> > the SLUB debugging options, or with object allocations).
>
> Thanks for the report, Jann.  To address the most immediate issue,
> allowing a smaller order than allowed, I think we'd need something like
> this.
>
> I can propose it as a formal patch if nobody has any alternate
> suggestions?
> ---
>  mm/slub.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
>  	 */
>  	size = ALIGN(size, s->align);
>  	s->size = size;
> -	if (forced_order >= 0)
> +	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
>  		order = forced_order;
>  	else
>  		order = calculate_order(size);

Seems reasonable!

For the race concerns, should this logic just make sure the resulting
order can never shrink? Or does it need much stronger atomicity?

-- 
Kees Cook
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Vlastimil Babka @ 2020-03-04 17:26 UTC
To: Kees Cook, David Rientjes
Cc: Jann Horn, Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Matthew Garrett, Vijayanand Jitta

On 3/4/20 3:22 AM, Kees Cook wrote:
> On Tue, Mar 03, 2020 at 05:26:14PM -0800, David Rientjes wrote:
>
> Seems reasonable!
>
> For the race concerns, should this logic just make sure the resulting
> order can never shrink? Or does it need much stronger atomicity?

If order grows, I think we also need to recalculate the random sequence for
freelist randomization [1]. I expect that would be rather problematic with
parallel allocations/freeing going on.

As was also noted, the any_slab_objects(s) checks are racy - they might return
false and immediately some other CPU can allocate some.

I wonder if this race window could be fixed at all without introducing extra
locking in the fast path? Which means it's probably not worth the trouble of
having these runtime knobs. How about making the files read-only (if not
removing them completely)? Vijayanand described a use case in [2]; shouldn't it
be possible to implement that scenario (all caches have debugging enabled
except the zram cache) with kernel parameters only?

Thanks,
Vlastimil

[1] https://lore.kernel.org/linux-mm/d3acc069-a5c6-f40a-f95c-b546664bc4ee@suse.cz/
[2] https://lore.kernel.org/linux-mm/1383cd32-1ddc-4dac-b5f8-9c42282fa81c@codeaurora.org/
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: David Rientjes @ 2020-03-04 20:39 UTC
To: Vlastimil Babka
Cc: Kees Cook, Jann Horn, Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Matthew Garrett, Vijayanand Jitta

On Wed, 4 Mar 2020, Vlastimil Babka wrote:

> > Seems reasonable!
> >
> > For the race concerns, should this logic just make sure the resulting
> > order can never shrink? Or does it need much stronger atomicity?
>
> If order grows, I think we also need to recalculate the random sequence for
> freelist randomization [1]. I expect that would be rather problematic with
> parallel allocations/freeing going on.
>
> As was also noted, the any_slab_objects(s) checks are racy - might return false
> and immediately some other CPU can allocate some.
>
> I wonder if this race window could be fixed at all without introducing extra
> locking in the fast path? Which means it's probably not worth the trouble of
> having these runtime knobs. How about making the files read-only (if not remove
> completely). Vijayanand described a use case in [2], shouldn't it be possible to
> implement that scenario (all caches have debugging enabled except zram cache)
> with kernel parameters only?

I'm not sure how dependent the CONFIG_SLUB_DEBUG users are on being able
to modify these at runtime (they've been around for 12+ years), but I
agree that it seems particularly dangerous.  I think they can be fixed by
freezing allocations and frees for the particular kmem_cache on all cpus,
which would add an additional conditional in the fastpath, and that's only
going to be required in the very small minority of cases where an admin
actually wants to change these.

The slub_debug kernel command line options are already pretty
comprehensive, as described by Documentation/vm/slub.rst.  I *think* these
tunables were primarily introduced for kernel debugging and not general
purpose use, perhaps with the exception of "order".

So I think we may be able to fix "order" with a combination of my patch as
well as a fix to the freelist randomization, and that the others should
likely be made read-only.
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Christopher Lameter @ 2020-03-08 19:34 UTC
To: David Rientjes
Cc: Vlastimil Babka, Kees Cook, Jann Horn, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Matthew Garrett, Vijayanand Jitta

On Wed, 4 Mar 2020, David Rientjes wrote:

> I'm not sure how dependent the CONFIG_SLUB_DEBUG users are on being able
> to modify these at runtime (they've been around for 12+ years), but I
> agree that it seems particularly dangerous.

The order of each individual slab page is stored in struct page. That is
why every slub slab page can have a different order. This enables fallback
to order-0 allocations and also allows a dynamic configuration of the
order at runtime.

> The slub_debug kernel command line options are already pretty
> comprehensive, as described by Documentation/vm/slub.rst.  I *think* these
> tunables were primarily introduced for kernel debugging and not general
> purpose use, perhaps with the exception of "order".

What do you mean by "general purpose"? Certainly the allocator should not
blow up when forcing zero order allocations.

> So I think we may be able to fix "order" with a combination of my patch as
> well as a fix to the freelist randomization, and that the others should
> likely be made read-only.

Hmmm. Races increase as more metadata is added that depends on the size of
the slab page and the number of objects in it.
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Pekka Enberg @ 2020-03-04 14:57 UTC
To: David Rientjes, Jann Horn
Cc: Christoph Lameter, Pekka Enberg, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Kees Cook, Matthew Garrett

On 3/4/20 3:26 AM, David Rientjes wrote:
> On Wed, 4 Mar 2020, Jann Horn wrote:
>
>> Hi!
>>
>> FYI, I noticed that if you do something like the following as root,
>> the system blows up pretty quickly with error messages about stuff
>> like corrupt freelist pointers because SLUB actually allows root to
>> force a page order that is smaller than what is required to store a
>> single object:
>>
>> echo 0 > /sys/kernel/slab/task_struct/order
>>
>> The other SLUB debugging options, like red_zone, also look kind of
>> suspicious with regards to races (either racing with other writes to
>> the SLUB debugging options, or with object allocations).
>
> Thanks for the report, Jann.  To address the most immediate issue,
> allowing a smaller order than allowed, I think we'd need something like
> this.
>
> I can propose it as a formal patch if nobody has any alternate
> suggestions?
> ---
>  mm/slub.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
>  	 */
>  	size = ALIGN(size, s->align);
>  	s->size = size;
> -	if (forced_order >= 0)
> +	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
>  		order = forced_order;
>  	else
>  		order = calculate_order(size);

Reviewed-by: Pekka Enberg <penberg@iki.fi>
* Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

From: Vlastimil Babka @ 2020-03-04 13:17 UTC
To: Jann Horn, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton
Cc: Linux-MM, kernel list, Kees Cook, Matthew Garrett

On 3/4/20 1:23 AM, Jann Horn wrote:
> Hi!
>
> FYI, I noticed that if you do something like the following as root,
> the system blows up pretty quickly with error messages about stuff
> like corrupt freelist pointers because SLUB actually allows root to
> force a page order that is smaller than what is required to store a
> single object:
>
> echo 0 > /sys/kernel/slab/task_struct/order
>
> The other SLUB debugging options, like red_zone, also look kind of
> suspicious with regards to races (either racing with other writes to
> the SLUB debugging options, or with object allocations).

Yeah, I also wondered last week that there seems to be no synchronization
with alloc/free activity. Increasing the order is AFAICS also dangerous
with freelist randomization:
https://lore.kernel.org/linux-mm/d3acc069-a5c6-f40a-f95c-b546664bc4ee@suse.cz/