* kernel NULL pointer dereference in gntdev_mmap -> mmu_interval_notifier_remove
From: Marek Marczykowski-Górecki @ 2021-04-18 14:44 UTC
To: xen-devel
Hi,
I recently got the crash below. I'm not sure what exactly triggers it
(besides a grant table mapping, as seen in the call trace), and I don't
have a reliable reproducer: it happened about once in ~30 startups.
The previously tested version was 5.10.25, and the crash didn't happen
there, but since the reproduction rate is low, that could just be luck...
[ 1053.550389] BUG: kernel NULL pointer dereference, address: 00000000000003b0
[ 1053.557844] #PF: supervisor read access in kernel mode
[ 1053.557847] #PF: error_code(0x0000) - not-present page
[ 1053.557851] PGD 0 P4D 0
[ 1053.557858] Oops: 0000 [#1] SMP NOPTI
[ 1053.557863] CPU: 1 PID: 8806 Comm: Xorg Tainted: G W 5.10.28-1.fc32.qubes.x86_64 #1
[ 1053.557865] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 1053.557876] RIP: e030:mmu_interval_notifier_remove+0x2e/0x190
[ 1053.557879] Code: 00 41 55 41 54 55 48 89 fd 53 48 83 ec 30 4c 8b 67 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 c7 04 24 00 00 00 00 <49> 8b 9c 24 b0 03 00 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10
[ 1053.557881] RSP: e02b:ffffc90041617d18 EFLAGS: 00010246
[ 1053.557883] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1053.557884] RDX: 0000000000000001 RSI: ffffffff81c3e9a0 RDI: ffff88812588b700
[ 1053.557885] RBP: ffff88812588b700 R08: 7fffffffffffffff R09: 0000000000000000
[ 1053.557886] R10: ffff8881088d4708 R11: ffff888108aa6180 R12: 0000000000000000
[ 1053.557887] R13: 00000000fffffffc R14: ffff888106a3ec00 R15: ffff888106a3ec10
[ 1053.557913] FS: 0000716f7f9a3a40(0000) GS:ffff888140300000(0000) knlGS:0000000000000000
[ 1053.557915] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1053.557916] CR2: 00000000000003b0 CR3: 0000000105cf4000 CR4: 0000000000000660
[ 1053.557919] Call Trace:
[ 1053.557944] gntdev_mmap+0x275/0x2f9 [xen_gntdev]
[ 1053.557950] mmap_region+0x47e/0x720
[ 1053.557953] do_mmap+0x438/0x540
[ 1053.557959] ? security_mmap_file+0x81/0xd0
[ 1053.557963] vm_mmap_pgoff+0xdf/0x130
[ 1053.557967] ksys_mmap_pgoff+0x1d6/0x240
[ 1053.557973] do_syscall_64+0x33/0x40
[ 1053.557977] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1053.557981] RIP: 0033:0x716f7fe8c2e6
[ 1053.557985] Code: 01 00 66 90 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74 37 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 62 5b 5d c3 0f 1f 80 00 00 00 00 48 8b 05 79
[ 1053.557986] RSP: 002b:00007ffcb4ef35c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 1053.557988] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000716f7fe8c2e6
[ 1053.557989] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[ 1053.557990] RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000000
[ 1053.557991] R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcb4ef35e0
[ 1053.557992] R13: 0000000000000001 R14: 0000000000000009 R15: 0000000000000001
[ 1053.557995] Modules linked in: loop nf_tables nfnetlink vfat fat xfs snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation ppdev snd_soc_core snd_compress snd_pcm_dmaengine soundwire_cadence joydev snd_hda_codec snd_hda_core ac97_bus snd_hwdep snd_seq snd_seq_device snd_pcm edac_mce_amd snd_timer pcspkr snd soundcore e1000e i2c_piix4 parport_pc parport xenfs fuse ip_tables dm_crypt bochs_drm drm_vram_helper drm_kms_helper cec drm_ttm_helper ttm serio_raw drm virtio_scsi virtio_console ehci_pci ehci_hcd ata_generic pata_acpi floppy qemu_fw_cfg xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
[ 1053.558040] CR2: 00000000000003b0
[ 1053.558135] ---[ end trace 3c5c2ca63aac717a ]---
[ 1054.277085] snd_hda_intel 0000:00:03.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[ 1054.927022] RIP: e030:mmu_interval_notifier_remove+0x2e/0x190
[ 1054.929170] Code: 00 41 55 41 54 55 48 89 fd 53 48 83 ec 30 4c 8b 67 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 c7 04 24 00 00 00 00 <49> 8b 9c 24 b0 03 00 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10
[ 1054.937800] RSP: e02b:ffffc90041617d18 EFLAGS: 00010246
[ 1054.947281] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1054.949535] RDX: 0000000000000001 RSI: ffffffff81c3e9a0 RDI: ffff88812588b700
[ 1054.973016] RBP: ffff88812588b700 R08: 7fffffffffffffff R09: 0000000000000000
[ 1054.976678] R10: ffff8881088d4708 R11: ffff888108aa6180 R12: 0000000000000000
[ 1054.978850] R13: 00000000fffffffc R14: ffff888106a3ec00 R15: ffff888106a3ec10
[ 1054.980751] FS: 0000716f7f9a3a40(0000) GS:ffff888140300000(0000) knlGS:0000000000000000
[ 1054.982878] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1054.984509] CR2: 00000000000003b0 CR3: 0000000105cf4000 CR4: 0000000000000660
[ 1054.990508] Kernel panic - not syncing: Fatal exception
[ 1054.991967] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
Looking at the surrounding code, the fault is a read from 0x3b0(%r12),
where %r12 was just loaded from 0x38(%rdi):
ffffffff812f5930 <mmu_interval_notifier_remove>:
ffffffff812f5930: e8 8b 09 d7 ff callq ffffffff810662c0 <__fentry__>
ffffffff812f5935: 41 55 push %r13
ffffffff812f5937: 41 54 push %r12
ffffffff812f5939: 55 push %rbp
ffffffff812f593a: 48 89 fd mov %rdi,%rbp
ffffffff812f593d: 53 push %rbx
ffffffff812f593e: 48 83 ec 30 sub $0x30,%rsp
ffffffff812f5942: 4c 8b 67 38 mov 0x38(%rdi),%r12
ffffffff812f5946: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
ffffffff812f594d: 00 00
ffffffff812f594f: 48 89 44 24 28 mov %rax,0x28(%rsp)
ffffffff812f5954: 31 c0 xor %eax,%eax
ffffffff812f5956: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
ffffffff812f595d: 00
ffffffff812f595e: 49 8b 9c 24 b0 03 00 mov 0x3b0(%r12),%rbx
ffffffff812f5965: 00
If my calculation is right, it means map->notifier.mm is NULL.
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: kernel NULL pointer dereference in gntdev_mmap -> mmu_interval_notifier_remove
From: Juergen Gross @ 2021-04-19 9:33 UTC
To: Marek Marczykowski-Górecki, xen-devel
On 18.04.21 16:44, Marek Marczykowski-Górecki wrote:
> If my calculation is right, it means map->notifier->mm is NULL.
>
Could you try the attached patch?
Juergen
[-- Attachment #1.1.2: 0001-xen-gntdev-fix-gntdev_mmap-error-exit-path.patch --]
[-- Type: text/x-patch, Size: 1472 bytes --]
From 7ff3c32b36279aacef9cf80f4103fc6050759c10 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Mon, 19 Apr 2021 11:15:59 +0200
Subject: [PATCH] xen/gntdev: fix gntdev_mmap() error exit path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert")
introduced an error in gntdev_mmap(): if the call of
mmu_interval_notifier_insert_locked() fails, the exit path should not
call mmu_interval_notifier_remove().
One reason for failure is, for example, a signal pending for the running
process.
Fixes: d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert")
Cc: stable@vger.kernel.org
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
drivers/xen/gntdev.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index f01d58c7a042..a3e7be96527d 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1017,8 +1017,10 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
err = mmu_interval_notifier_insert_locked(
&map->notifier, vma->vm_mm, vma->vm_start,
vma->vm_end - vma->vm_start, &gntdev_mmu_ops);
- if (err)
+ if (err) {
+ map->vma = NULL;
goto out_unlock_put;
+ }
}
mutex_unlock(&priv->lock);
--
2.26.2
* Re: kernel NULL pointer dereference in gntdev_mmap -> mmu_interval_notifier_remove
From: Marek Marczykowski-Górecki @ 2021-04-23 3:22 UTC
To: Juergen Gross; +Cc: xen-devel
On Mon, Apr 19, 2021 at 11:33:27AM +0200, Juergen Gross wrote:
> Could you try the attached patch?
I've tried it and it works, in the sense that I didn't get the crash in
~20 runs. Since the issue is quite hard to reproduce, I'm not fully sure
it helped, but the fix sounds plausible. I think you can treat this as a
Tested-by: ;)
Thanks!
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab