From: Cong Wang <xiyou.wangcong@gmail.com>
To: Borislav Petkov <bp@alien8.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-edac@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2 1/2] ras: fix an off-by-one error in __find_elem()
Date: Tue, 16 Apr 2019 16:16:08 -0700
Message-ID: <CAM_iQpVd02zkVJ846cj-Fg1yUNuz6tY5q1Vpj4LrXmE06dPYYg@mail.gmail.com>
In-Reply-To: <20190416214634.GP31772@zn.tnic>

On Tue, Apr 16, 2019 at 2:46 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Apr 16, 2019 at 02:33:50PM -0700, Cong Wang wrote:
> > ce_arr.array[] is always within the range [0, ce_arr.n-1].
> > However, the binary search code in __find_elem() uses ce_arr.n
> > as the maximum index, which could lead to an off-by-one
> > out-of-bound access right after the while loop. In this case,
> > we should not even read it, just return -ENOKEY instead.
> >
> > Note, this could cause a kernel crash if ce_arr.n is exactly
> > MAX_ELEMS.
>
> "Could cause"?
>
> I'm still waiting for a demonstration. You can build a case through
> writing values in the debugfs nodes I pointed you at or even with a
> patch ontop preparing the exact conditions for it to crash. And then
> give me that "recipe" to trigger it here in a VM.

It is actually fairly easy:

1) Fill the whole page with PFNs:
for i in `seq 0 511`; do echo $i >> /sys/kernel/debug/ras/cec/pfn; done

2) Set the threshold to 1 in order to trigger the deletion:
echo 1 > /sys/kernel/debug/ras/cec/count_threshold

3) Repeatedly add and remove the last element:
echo 512 >> /sys/kernel/debug/ras/cec/pfn
(until you get a crash.)

In case you still don't get it, here it is:

[   57.732593] BUG: unable to handle kernel paging request at ffff9c667bca0000
[   57.734994] #PF error: [PROT] [WRITE]
[   57.735891] PGD 75601067 P4D 75601067 PUD 75605067 PMD 7bca1063 PTE 800000007bca0061
[   57.737702] Oops: 0003 [#1] SMP PTI
[   57.738533] CPU: 0 PID: 649 Comm: bash Not tainted 5.1.0-rc5+ #561
[   57.739965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[   57.742892] RIP: 0010:__memmove+0x57/0x1a0
[   57.743853] Code: 00 72 05 40 38 fe 74 3b 48 83 ea 20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 4c 89 1f 4c 89 57 08 <4c> 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00
[   57.748150] RSP: 0018:ffffbe2ec0c8bdf8 EFLAGS: 00010206
[   57.749371] RAX: ffff9c667a5c1ff0 RBX: 0000000000000001 RCX: 0000000000000ff8
[   57.751018] RDX: 00000007fe921fb8 RSI: ffff9c667bca0018 RDI: ffff9c667bc9fff0
[   57.752674] RBP: 0000000000000200 R08: 0000000000000000 R09: 0000015c00000000
[   57.754325] R10: 0000000000040001 R11: 5a5a5a5a5a5a5a5a R12: 0000000000000004
[   57.755976] R13: ffff9c6671787778 R14: ffff9c6671787728 R15: ffff9c6671787750
[   57.757631] FS:  00007f33ca294740(0000) GS:ffff9c667d800000(0000) knlGS:0000000000000000
[   57.759689] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.761023] CR2: ffff9c667bca0000 CR3: 000000007061e000 CR4: 00000000000406f0
[   57.762681] Call Trace:
[   57.763275]  del_elem.constprop.1+0x39/0x40
[   57.764260]  cec_add_elem+0x1e4/0x211
[   57.765129]  simple_attr_write+0xa2/0xc3
[   57.766057]  debugfs_attr_write+0x45/0x5c
[   57.767005]  full_proxy_write+0x4b/0x65
[   57.767911]  ? full_proxy_poll+0x50/0x50
[   57.768844]  vfs_write+0xb8/0xf5
[   57.769613]  ksys_write+0x6b/0xb8
[   57.770407]  do_syscall_64+0x57/0x65
[   57.771249]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

I will leave explaining why the crash is inside memmove() as
homework. ;)

Thanks.
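
Editor's sketch (not part of the original message): a user-space C model of
the off-by-one described in the quoted patch text, offered as one plausible
way to approach the memmove() question. Everything here is hypothetical
illustration: struct model_array, index_read_after_loop(),
find_elem_checked() and del_move_bytes() are made-up names, and the length
formula in del_move_bytes() is an assumption about del_elem(), whose source
does not appear in this thread. The model only computes which index would be
read, so it never goes out of bounds itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value; fallback for platforms without it */
#endif

/* Hypothetical stand-in for ce_arr: a sorted array with n used slots. */
struct model_array {
	const uint64_t *array;
	unsigned int n;
};

/*
 * Binary search that uses n as the upper bound, as the patch description
 * says __find_elem() does. Returns the index that would be read after the
 * loop; the read itself is not performed here.
 */
static unsigned int index_read_after_loop(const struct model_array *ca,
					  uint64_t key)
{
	unsigned int min = 0, max = ca->n;

	while (min < max) {
		unsigned int mid = min + (max - min) / 2;

		if (ca->array[mid] < key)
			min = mid + 1;
		else if (ca->array[mid] > key)
			max = mid;
		else
			return mid;
	}
	return min;	/* equals n when key is larger than every element */
}

/* Bounds-checked lookup: return -ENOKEY instead of reading array[n]. */
static int find_elem_checked(const struct model_array *ca, uint64_t key)
{
	unsigned int idx = index_read_after_loop(ca, key);

	if (idx >= ca->n || ca->array[idx] != key)
		return -ENOKEY;
	return (int)idx;
}

/*
 * ASSUMED length formula for deleting slot idx: the elements after idx are
 * moved down by one, i.e. (n - (idx + 1)) eight-byte entries. With unsigned
 * arithmetic, idx == n makes the element count wrap around.
 */
static size_t del_move_bytes(unsigned int n, unsigned int idx)
{
	return (size_t)(n - (idx + 1)) * sizeof(uint64_t);
}
```

Under these assumptions, searching for a key larger than every stored
element leaves the index equal to n, one past the last valid slot. If the
value read there happened to match and that index were then passed to a
deletion step using the formula above, the byte count would wrap to a huge
value, which would be consistent with a fault inside memmove().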
