All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
@ 2020-04-26  7:06 Sven Eckelmann
  2020-04-26 23:14 ` Sasha Levin
  2020-04-28 10:13 ` Vlastimil Babka
  0 siblings, 2 replies; 7+ messages in thread
From: Sven Eckelmann @ 2020-04-26  7:06 UTC (permalink / raw)
  To: stable
  Cc: Vlastimil Babka, Kees Cook, Daniel Micay, Eric Dumazet,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Matthias Schiffer, Andrew Morton, Linus Torvalds, Sven Eckelmann

From: Vlastimil Babka <vbabka@suse.cz>

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

In SLUB, prefetch_freepointer() is used when allocating an object from
cache's freelist, to make sure the next object in the list is cache-hot,
since it's probable it will be allocated soon.

Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has
unintentionally changed the prefetch in a way where the prefetch is
turned to a real fetch, and only the next->next pointer is prefetched.
In case there is not a stream of allocations that would benefit from
prefetching, the extra real fetch might add a useless cache miss to the
allocation.  Restore the previous behavior.

Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz
Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
---
The original problem is explained in the patch description as
performance problem. And maybe this could also be one reason why it was
never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
related to "random" data bus errors. At least applying this patch seemed to
have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
some other persons who where debugging/testing this problem with him.

More details about it can be found in
https://github.com/freifunk-gluon/gluon/issues/1982
---
 mm/slub.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 3c1a16f03b2b..481518c3f61a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -269,8 +269,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
 
 static void prefetch_freepointer(const struct kmem_cache *s, void *object)
 {
-	if (object)
-		prefetch(freelist_dereference(s, object + s->offset));
+	prefetch(object + s->offset);
 }
 
 static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-26  7:06 [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer() Sven Eckelmann
@ 2020-04-26 23:14 ` Sasha Levin
  2020-04-27  7:01   ` Sven Eckelmann
  2020-04-28 10:13 ` Vlastimil Babka
  1 sibling, 1 reply; 7+ messages in thread
From: Sasha Levin @ 2020-04-26 23:14 UTC (permalink / raw)
  To: Sven Eckelmann
  Cc: stable, Vlastimil Babka, Kees Cook, Daniel Micay, Eric Dumazet,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Matthias Schiffer, Andrew Morton, Linus Torvalds

On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:
>From: Vlastimil Babka <vbabka@suse.cz>
>
>commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.
>
>In SLUB, prefetch_freepointer() is used when allocating an object from
>cache's freelist, to make sure the next object in the list is cache-hot,
>since it's probable it will be allocated soon.
>
>Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has
>unintentionally changed the prefetch in a way where the prefetch is
>turned to a real fetch, and only the next->next pointer is prefetched.
>In case there is not a stream of allocations that would benefit from
>prefetching, the extra real fetch might add a useless cache miss to the
>allocation.  Restore the previous behavior.
>
>Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz
>Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>Acked-by: Kees Cook <keescook@chromium.org>
>Cc: Daniel Micay <danielmicay@gmail.com>
>Cc: Eric Dumazet <edumazet@google.com>
>Cc: Christoph Lameter <cl@linux.com>
>Cc: Pekka Enberg <penberg@kernel.org>
>Cc: David Rientjes <rientjes@google.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Matthias Schiffer <mschiffer@universe-factory.net>
>Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>Signed-off-by: Sven Eckelmann <sven@narfation.org>
>---
>The original problem is explained in the patch description as
>performance problem. And maybe this could also be one reason why it was
>never submitted for a stable kernel.
>
>But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
>related to "random" data bus errors. At least applying this patch seemed to
>have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
>some other persons who where debugging/testing this problem with him.
>
>More details about it can be found in
>https://github.com/freifunk-gluon/gluon/issues/1982

Interesting... I wonder why this issue has started only now.

I've queued it up for 4.14, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-26 23:14 ` Sasha Levin
@ 2020-04-27  7:01   ` Sven Eckelmann
  2020-04-27  8:45     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Sven Eckelmann @ 2020-04-27  7:01 UTC (permalink / raw)
  To: Sasha Levin
  Cc: stable, Vlastimil Babka, Kees Cook, Daniel Micay, Eric Dumazet,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Matthias Schiffer, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:
> On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:
> >From: Vlastimil Babka <vbabka@suse.cz>
> >
> >commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.
[...]
> >---
> >The original problem is explained in the patch description as
> >performance problem. And maybe this could also be one reason why it was
> >never submitted for a stable kernel.
> >
> >But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
> >related to "random" data bus errors. At least applying this patch seemed to
> >have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
> >some other persons who where debugging/testing this problem with him.
> >
> >More details about it can be found in
> >https://github.com/freifunk-gluon/gluon/issues/1982
> 
> Interesting... I wonder why this issue has started only now.

Unfortunately, I don't know the details. So I (actually we) would love to get 
some feedback from the slub experts. Not that there is another problem which 
we just don't grasp yet.

Just some background information about the "why" from freifunk-gluon's 
perspective:

OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it 
was the first release using kernel 4.14 on the most used target: ar71xx 
(ath79). The wireless community network firmware projects (freifunk-gluon in 
this example) updated their frameworks to this OpenWrt release in the last 
months and just now started to roll it out on their networks.

And while the wireless community networks around here usually don't track the 
connected clients, the health of the APs is often tracked on some central 
system. And some people then just noticed a sudden spike of reboots on their 
APs. Since ar71xx is (often) the most used architecture at the moment, this 
could be spotted rather easily if you spend some time looking at graphs.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-27  7:01   ` Sven Eckelmann
@ 2020-04-27  8:45     ` Vlastimil Babka
  2020-04-27 19:08       ` Matthias Schiffer
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2020-04-27  8:45 UTC (permalink / raw)
  To: Sven Eckelmann, Sasha Levin
  Cc: stable, Kees Cook, Daniel Micay, Eric Dumazet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Matthias Schiffer,
	Andrew Morton, Linus Torvalds

On 4/27/20 9:01 AM, Sven Eckelmann wrote:
> On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:
>> On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:
>> >From: Vlastimil Babka <vbabka@suse.cz>
>> >
>> >commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.
> [...]
>> >---
>> >The original problem is explained in the patch description as
>> >performance problem. And maybe this could also be one reason why it was
>> >never submitted for a stable kernel.
>> >
>> >But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
>> >related to "random" data bus errors. At least applying this patch seemed to
>> >have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
>> >some other persons who where debugging/testing this problem with him.
>> >
>> >More details about it can be found in
>> >https://github.com/freifunk-gluon/gluon/issues/1982

Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.

>> 
>> Interesting... I wonder why this issue has started only now.
> 
> Unfortunately, I don't know the details. So I (actually we) would love to get 
> some feedback from the slub experts. Not that there is another problem which 
> we just don't grasp yet.

I think the prefetch my go to an address that would cause a real fetch to page
fault. Under normal circumstances that could be only the NULL pointer that
terminates a freelist, otherwise the address should be valid.

So that could mean:
1) prefetch() on mips is implemented/compiled wrong?
2) the CPU really has issues with prefetch causing a page fault
3) the prefetch gets reordered between LL/SC and there's some bug similar to
this one described in arch/mips/include/asm/sync.h:

/*
 * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
 * store or prefetch) in between an LL & SC can cause the SC instruction to
 * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
 * containing such sequences, this bug bites harder than we might otherwise
 * expect due to reordering & speculation:


> Just some background information about the "why" from freifunk-gluon's 
> perspective:
> 
> OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it 
> was the first release using kernel 4.14 on the most used target: ar71xx 
> (ath79). The wireless community network firmware projects (freifunk-gluon in 
> this example) updated their frameworks to this OpenWrt release in the last 
> months and just now started to roll it out on their networks.
> 
> And while the wireless community networks around here usually don't track the 
> connected clients, the health of the APs is often tracked on some central 
> system. And some people then just noticed a sudden spike of reboots on their 
> APs. Since ar71xx is (often) the most used architecture at the moment, this 
> could be spotted rather easily if you spend some time looking at graphs.
> 
> Kind regards,
> 	Sven
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-27  8:45     ` Vlastimil Babka
@ 2020-04-27 19:08       ` Matthias Schiffer
  2020-04-28 11:43         ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: Matthias Schiffer @ 2020-04-27 19:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sven Eckelmann, Sasha Levin, stable, Kees Cook, Daniel Micay,
	Eric Dumazet, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linus Torvalds


[-- Attachment #1.1: Type: text/plain, Size: 4389 bytes --]

On 4/27/20 10:45 AM, Vlastimil Babka wrote:
> On 4/27/20 9:01 AM, Sven Eckelmann wrote:
>> On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:
>>> On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:
>>>> From: Vlastimil Babka <vbabka@suse.cz>
>>>>
>>>> commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.
>> [...]
>>>> ---
>>>> The original problem is explained in the patch description as
>>>> performance problem. And maybe this could also be one reason why it was
>>>> never submitted for a stable kernel.
>>>>
>>>> But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
>>>> related to "random" data bus errors. At least applying this patch seemed to
>>>> have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
>>>> some other persons who where debugging/testing this problem with him.
>>>>
>>>> More details about it can be found in
>>>> https://github.com/freifunk-gluon/gluon/issues/1982
> 
> Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.

The fact that the location of the data bus error was so imprecise made me
suspect that no regular load could be the cause - therefore I looked at
that prefetch in detail and eventually found your patch.

> 
>>>
>>> Interesting... I wonder why this issue has started only now.
>>
>> Unfortunately, I don't know the details. So I (actually we) would love to get 
>> some feedback from the slub experts. Not that there is another problem which 
>> we just don't grasp yet.
> 
> I think the prefetch my go to an address that would cause a real fetch to page
> fault. Under normal circumstances that could be only the NULL pointer that
> terminates a freelist, otherwise the address should be valid.

For further analysis, I just replaced the prefetch as implemented in 4.14
(i.e. before applying the patch in question) with a regular load (excluding
NULL, which would lead to an immediate fault on boot). With the test
program, I quickly hit a fault, at an address that looks completely bogus
(i.e. neither NULL nor an address looking like it might be mapped to
anything). Is this expected with the incorrect prefetch_freepointer()
implementation of 4.14? Is it possible that prefetch_freepointer() of 4.14
is even more broken than suspected before? Note that we hit this issue with
the "names_cache" slab, which has page-sized objects, if that might provide
any clue...

In any case, it seems like the "pref" instruction should not be used on
bogus addresses on the ath79 platform... The exact behaviour is also
hardware-dependent: On some SoCs, the error would be visible as a data bus
error, while others just reset without any way to find out what was going
wrong.

Matthias

> 
> So that could mean:
> 1) prefetch() on mips is implemented/compiled wrong?
> 2) the CPU really has issues with prefetch causing a page fault
> 3) the prefetch gets reordered between LL/SC and there's some bug similar to
> this one described in arch/mips/include/asm/sync.h:
> 
> /*
>  * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
>  * store or prefetch) in between an LL & SC can cause the SC instruction to
>  * erroneously succeed, breaking atomicity. Whilst it's unusual to write code
>  * containing such sequences, this bug bites harder than we might otherwise
>  * expect due to reordering & speculation:
> 
> 
>> Just some background information about the "why" from freifunk-gluon's 
>> perspective:
>>
>> OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it 
>> was the first release using kernel 4.14 on the most used target: ar71xx 
>> (ath79). The wireless community network firmware projects (freifunk-gluon in 
>> this example) updated their frameworks to this OpenWrt release in the last 
>> months and just now started to roll it out on their networks.
>>
>> And while the wireless community networks around here usually don't track the 
>> connected clients, the health of the APs is often tracked on some central 
>> system. And some people then just noticed a sudden spike of reboots on their 
>> APs. Since ar71xx is (often) the most used architecture at the moment, this 
>> could be spotted rather easily if you spend some time looking at graphs.
>>
>> Kind regards,
>> 	Sven
>>
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-26  7:06 [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer() Sven Eckelmann
  2020-04-26 23:14 ` Sasha Levin
@ 2020-04-28 10:13 ` Vlastimil Babka
  1 sibling, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2020-04-28 10:13 UTC (permalink / raw)
  To: Sven Eckelmann, stable
  Cc: Kees Cook, Daniel Micay, Eric Dumazet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Matthias Schiffer,
	Andrew Morton, Linus Torvalds

On 4/26/20 9:06 AM, Sven Eckelmann wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
> 
> commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.
> 
> In SLUB, prefetch_freepointer() is used when allocating an object from
> cache's freelist, to make sure the next object in the list is cache-hot,
> since it's probable it will be allocated soon.
> 
> Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has
> unintentionally changed the prefetch in a way where the prefetch is
> turned to a real fetch, and only the next->next pointer is prefetched.
> In case there is not a stream of allocations that would benefit from
> prefetching, the extra real fetch might add a useless cache miss to the
> allocation.  Restore the previous behavior.
> 
> Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz
> Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Kees Cook <keescook@chromium.org>
> Cc: Daniel Micay <danielmicay@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Matthias Schiffer <mschiffer@universe-factory.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Sven Eckelmann <sven@narfation.org>

Regardless of how the discussion of exact root cause turns out, I agree with the 
stable inclusion, consider this as Ack, if it makes sense in that context :)

> ---
> The original problem is explained in the patch description as
> performance problem. And maybe this could also be one reason why it was
> never submitted for a stable kernel.
> 
> But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely
> related to "random" data bus errors. At least applying this patch seemed to
> have solved it for Matthias Schiffer <mschiffer@universe-factory.net> and
> some other persons who where debugging/testing this problem with him.
> 
> More details about it can be found in
> https://github.com/freifunk-gluon/gluon/issues/1982
> ---
>   mm/slub.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 3c1a16f03b2b..481518c3f61a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -269,8 +269,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)
>   
>   static void prefetch_freepointer(const struct kmem_cache *s, void *object)
>   {
> -	if (object)
> -		prefetch(freelist_dereference(s, object + s->offset));
> +	prefetch(object + s->offset);
>   }
>   
>   static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()
  2020-04-27 19:08       ` Matthias Schiffer
@ 2020-04-28 11:43         ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2020-04-28 11:43 UTC (permalink / raw)
  To: Matthias Schiffer
  Cc: Sven Eckelmann, Sasha Levin, stable, Kees Cook, Daniel Micay,
	Eric Dumazet, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Linus Torvalds

On 4/27/20 9:08 PM, Matthias Schiffer wrote:
> On 4/27/20 10:45 AM, Vlastimil Babka wrote:
>> On 4/27/20 9:01 AM, Sven Eckelmann wrote:
>>>>> More details about it can be found in
>>>>> https://github.com/freifunk-gluon/gluon/issues/1982
>> 
>> Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.
> 
> The fact that the location of the data bus error was so imprecise made me
> suspect that no regular load could be the cause - therefore I looked at
> that prefetch in detail and eventually found your patch.

Ah, great, thanks.

>> 
>> 
>> I think the prefetch my go to an address that would cause a real fetch to page
>> fault. Under normal circumstances that could be only the NULL pointer that
>> terminates a freelist, otherwise the address should be valid.
> 
> For further analysis, I just replaced the prefetch as implemented in 4.14
> (i.e. before applying the patch in question) with a regular load (excluding
> NULL, which would lead to an immediate fault on boot). With the test
> program, I quickly hit a fault, at an address that looks completely bogus
> (i.e. neither NULL nor an address looking like it might be mapped to
> anything). Is this expected with the incorrect prefetch_freepointer()
> implementation of 4.14? Is it possible that prefetch_freepointer() of 4.14
> is even more broken than suspected before? Note that we hit this issue with
> the "names_cache" slab, which has page-sized objects, if that might provide
> any clue...

Thanks for the test and it might indeed be worthwile to chase it down as it's 
really unclear to me what's going on and there might be a worse underlying issue 
that we just mask.
One question, is CONFIG_SLAB_FREELIST_HARDENED enabled in your config? That 
could play a role in seeing addresses that look completely bogus.

Looking closer at prefetch_freepointer() usage from slab_alloc_node() we start with

object = c->freelist;

if that's not NULL, we do

void *next_object = get_freepointer_safe(s, object);

then there's the tricky this_cpu_cmpxchg_double() operation. We can assume that 
if that succeeds, c->freelist == next_object
So, if next_object was already a broken pointer, the next allocation should 
crash, even if we don't crash due to the prefetch.

It's possible next_object is NULL. But then the 4.14 implementation should be 
actually fine, prefetch_freepointer() has "if (object)" condition so it won't 
access (not just prefetch) that NULL. On the contrary, my patch removes that 
condition, as it also didn't exist before commit 2482ddec670fb. So AFAICS it's 
possible to trigger a NULL prefetch after my backport, and apparently the CPU 
doesn't mind that?

What if next_object is not NULL? Then it should be a valid pointer, as stated 
above. With my patch, then it's just prefetched and that's it. 4.14 
implementation will read (not prefetch) the next free pointer in the chain, 
which AFAICT should also be either valid or NULL, and then prefetch that. So 
both implementation. should be prefetching either a valid value, or NULL?
Aha! But we have already put next_object to c->freelist by the 
this_cpu_cmpxchg_double(), so we could have been then interrupted, and the 
interrupt handler or another thread scheduled instead of us on the same CPU have 
now allocated next_object from c->freelist and filled it with its own data, 
overwriting the free pointer. Then we resume execution and indeed can read bogus 
value as a free pointer and try to prefetch it... so that might be the reason.

> In any case, it seems like the "pref" instruction should not be used on
> bogus addresses on the ath79 platform... The exact behaviour is also
> hardware-dependent: On some SoCs, the error would be visible as a data bus
> error, while others just reset without any way to find out what was going
> wrong.

If I knew that, and realized the scenario above is possible, I would have marked 
the patch stable myself. Sorry.

> Matthias
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-04-28 11:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-26  7:06 [PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer() Sven Eckelmann
2020-04-26 23:14 ` Sasha Levin
2020-04-27  7:01   ` Sven Eckelmann
2020-04-27  8:45     ` Vlastimil Babka
2020-04-27 19:08       ` Matthias Schiffer
2020-04-28 11:43         ` Vlastimil Babka
2020-04-28 10:13 ` Vlastimil Babka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.