From: David Mozes <david.mozes@silk.us>
To: Matthew Wilcox <willy@infradead.org>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Darren Hart <dvhart@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: futex/call -to plist_for_each_entry_safe with head=NULL
Date: Mon, 14 Jun 2021 11:12:03 +0000	[thread overview]
Message-ID: <AM6PR04MB5639FBD246251DEF694AB02FF1319@AM6PR04MB5639.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <YMZkwsa4yQ/SsMW/@casper.infradead.org>

Thx Matthew

1) You are probably correct about where the actual crash happened, unless something happens in between...
But that is what gdb told us; in addition, RDI holds the value 0x0000000000000246, the same value as the faulting address.


Jun 10 20:49:40 c-node04 kernel: [97562.144463] BUG: unable to handle kernel NULL pointer dereference at 0000000000000246
Jun 10 20:49:40 c-node04 kernel: [97562.145450] PGD 2012ee4067 P4D 2012ee4067 PUD 20135a0067 PMD 0
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Oops: 0000 [#1] SMP
Jun 10 20:49:40 c-node04 kernel: [97562.145450] CPU: 36 PID: 12668 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G        W  OE     4.19.149-KM6 #1
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe624135b90, 85, 50) returning 0 times: 0, 0, 0, 2203, 0 ccount 42
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RIP: 0010:do_futex+0xdf/0xa90
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Code: 08 4c 8d 6d 08 48 8b 3a 48 8d 72 e8 49 39 d5 4c 8d 67 e8 0f 84 89 00 00 00 31 c0 44 89 3c 24 41 89 df 44 89 f3 41 89 c6 eb 16 <49> 8b 7c 24 18 49 8d 54 24 18 4c 89 e6 4c 39 ea 4c 8d 67 e8 74 58
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RSP: 0018:ffff97f6ea8bbdf0 EFLAGS: 00010283
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RAX: 00007f6db1a5d000 RBX: 0000000000000001 RCX: ffffa5530c5f0140
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RDX: ffff97f6e4287d58 RSI: ffff97f6e4287d40 RDI: 0000000000000246
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe62414a860, 2, 50) returning 0 times: 0, 0, 0, 2191, 0 ccount 277
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RBP: ffffa5530c5bd580 R08: 00007f6db1a5d9c0 R09: 0000000000000001


2) In addition, we got a second crash in the same function, a few lines above the previous one:

Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? pointer+0x137/0x350
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  printk+0x58/0x6f
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  panic+0xce/0x238
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  __stack_chk_fail+0x15/0x20
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? plist_del+0x5f/0xb0
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  __schedule+0x243/0x830
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? schedule+0x28/0x80
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? exit_to_usermode_loop+0x57/0xe0
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? prepare_exit_to_usermode+0x70/0x90
Jun 12 11:20:43 c-node06 kernel: [91837.319613]  ? retint_user+0x8/0x8
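
The __stack_chk_fail frame means the stack-protector canary in do_futex's stack frame was found corrupted on function exit, so this second crash is detected stack corruption rather than a page fault. Purely to illustrate that mechanism, a minimal userspace analogue (assuming gcc with -fstack-protector-all; this is not kernel code):

    #include <string.h>

    static void overflow(void)
    {
            char buf[8];

            /*
             * Deliberately write past the end of buf. This clobbers the
             * stack canary, and the check at function exit calls
             * __stack_chk_fail (provided by libc here, by the kernel in
             * the trace above) and aborts.
             */
            memset(buf, 'A', 32);
    }

    int main(void)
    {
            overflow();
            return 0;
    }

Built with "gcc -fstack-protector-all demo.c", it aborts with "stack smashing detected" when overflow() returns.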


(gdb) l *do_futex+0xa3d
0xffffffff8113985d is in do_futex (kernel/futex.c:1742).
1737			if (!(flags & FLAGS_SHARED)) {
1738				cond_resched();
1739				goto retry_private;
1740			}
1741	
1742			put_futex_key(&key2);
1743			put_futex_key(&key1);
1744			cond_resched();
1745			goto retry;
1746		}
(gdb)


That is closer to the double_lock_hb(hb1, hb2) call you mention.
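
Which also fits your point about hb1 itself: as far as I understand, hash_futex() just returns a pointer into the futex_queues hash table that is allocated once at boot, so it should never return NULL. A rough userspace model of that lookup pattern, with invented names and purely for illustration (not the kernel code):

    #include <stdio.h>

    #define NBUCKETS 256                    /* the kernel sizes its table at boot */

    struct bucket { int dummy; };

    static struct bucket buckets[NBUCKETS]; /* lives for the whole runtime */

    static struct bucket *lookup(unsigned long hash)
    {
            /* always a pointer into buckets[], never NULL */
            return &buckets[hash & (NBUCKETS - 1)];
    }

    int main(void)
    {
            struct bucket *hb = lookup(0x246);
            printf("bucket = %p\n", (void *)hb);
            return 0;
    }

So a NULL hb1 would mean something went wrong well before the plist walk.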


Regarding running without the proprietary modules: we didn't manage to reproduce the problem that way, but without them we only reach about half of the I/O load at which the problem happens.

Thx
David 



-----Original Message-----
From: Matthew Wilcox <willy@infradead.org> 
Sent: Sunday, June 13, 2021 11:04 PM
To: David Mozes <david.mozes@silk.us>
Cc: linux-fsdevel@vger.kernel.org; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; Peter Zijlstra <peterz@infradead.org>; Darren Hart <dvhart@infradead.org>; linux-kernel@vger.kernel.org
Subject: Re: futex/call -to plist_for_each_entry_safe with head=NULL

On Sun, Jun 13, 2021 at 12:24:52PM +0000, David Mozes wrote:
> Hi *,
> Under a very high load of IO traffic, we got the BUG trace below.
> We can see that:
> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>                 if (match_futex (&this->key, &key1))
>  
> were called with hb1 = NULL in the futex_wake_up function.
> And there is no protection in the code against such a scenario.
>  
> The NULL could be coming from:
> hb1 = hash_futex(&key1);
>  
> How can we protect against such a situation?

Can you reproduce it without loading proprietary modules?

Your analysis doesn't quite make sense:

        hb1 = hash_futex(&key1);
        hb2 = hash_futex(&key2);

retry_private:
        double_lock_hb(hb1, hb2);

If hb1 were NULL, then the oops would come earlier, in double_lock_hb().

> RIP: 0010:do_futex+0xdf/0xa90
>  
> 0xffffffff81138eff is in do_futex (kernel/futex.c:1748).
> 1743                                       put_futex_key(&key1);
> 1744                                       cond_resched();
> 1745                                       goto retry;
> 1746                       }
> 1747      
> 1748                       plist_for_each_entry_safe(this, next, &hb1->chain, list) {
> 1749                                       if (match_futex (&this->key, &key1)) {
> 1750                                                       if (this->pi_state || this->rt_waiter) {
> 1751                                                                       ret = -EINVAL;
> 1752                                                                       goto out_unlock;
> (gdb)
>  
>  
>  
> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
>                 if (match_futex (&this->key, &key1)) {
>  
>  
>  
>  
> This happened on kernel 4.19.149 running on an Azure VM.
>  
>  
> Thx
> David
> 


Thread overview: 6+ messages
2021-06-13 12:24 futex/call -to plist_for_each_entry_safe with head=NULL David Mozes
2021-06-13 20:04 ` Matthew Wilcox
2021-06-14 11:12   ` David Mozes [this message]
2021-06-15 15:03   ` Thomas Gleixner
2021-06-16 15:42     ` David Mozes
2021-06-20 16:04       ` David Mozes
