From: Jani Nikula <jani.nikula@linux.intel.com> To: tcamuso <tcamuso@redhat.com>, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: airlied@linux.ie, dkwon@redhat.com, tcamuso@redhat.com Subject: Re: [PATCH] drm: assure aux_dev is nonzero before using it Date: Fri, 24 May 2019 11:36:14 +0300 [thread overview] Message-ID: <87v9y0mept.fsf@intel.com> (raw) In-Reply-To: <20190523110905.22445-1-tcamuso@redhat.com> On Thu, 23 May 2019, tcamuso <tcamuso@redhat.com> wrote: > From Daniel Kwon <dkwon@redhat.com> > > The system was crashed due to invalid memory access while trying to access > auxiliary device. > > crash> bt > PID: 9863 TASK: ffff89d1bdf11040 CPU: 1 COMMAND: "ipmitool" > #0 [ffff89cedd7f3868] machine_kexec at ffffffffb0663674 > #1 [ffff89cedd7f38c8] __crash_kexec at ffffffffb071cf62 > #2 [ffff89cedd7f3998] crash_kexec at ffffffffb071d050 > #3 [ffff89cedd7f39b0] oops_end at ffffffffb0d6d758 > #4 [ffff89cedd7f39d8] no_context at ffffffffb0d5bcde > #5 [ffff89cedd7f3a28] __bad_area_nosemaphore at ffffffffb0d5bd75 > #6 [ffff89cedd7f3a78] bad_area at ffffffffb0d5c085 > #7 [ffff89cedd7f3aa0] __do_page_fault at ffffffffb0d7080c > #8 [ffff89cedd7f3b10] do_page_fault at ffffffffb0d70905 > #9 [ffff89cedd7f3b40] page_fault at ffffffffb0d6c758 > [exception RIP: drm_dp_aux_dev_get_by_minor+0x3d] > RIP: ffffffffc0a589bd RSP: ffff89cedd7f3bf0 RFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff89cedd7f3fd8 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc0a613e0 > RBP: ffff89cedd7f3bf8 R8: ffff89f1bcbabbd0 R9: 0000000000000000 > R10: ffff89f1be7a1cc0 R11: 0000000000000000 R12: 0000000000000000 > R13: ffff89f1b32a2830 R14: ffff89d18fadfa00 R15: 0000000000000000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > RIP: 00002b45f0d80d30 RSP: 00007ffc416066a0 RFLAGS: 00010246 > RAX: 0000000000000002 RBX: 000056062e212d80 RCX: 00007ffc41606810 > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffc41606ec0 > RBP: 0000000000000000 R8: 000056062dfed229 R9: 00002b45f0cdf14d > R10: 0000000000000002 R11: 0000000000000246 R12: 00007ffc41606ec0 > R13: 00007ffc41606ed0 R14: 00007ffc41606ee0 R15: 0000000000000000 > ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b > > ---------------------------------------------------------------------------- > > It was trying to open '/dev/ipmi0', but as no entry in aux_dir, it returned > NULL from 'idr_find()'. This drm_dp_aux_dev_get_by_minor() should have done a > check on this, but had failed to do it. I think the better question is, *why* does the idr_find() return NULL? I don't think it should, under any circumstances. I fear adding the check here papers over some other problem, taking us further away from the root cause. Also, can you reproduce this on a recent upstream kernel? The aux device nodes were introduced in kernel v4.6. Whatever you reproduced on v3.10 is pretty much irrelevant for upstream. BR, Jani. > > ---------------------------------------------------------------------------- > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 114 > 114 struct idr_layer *hint = rcu_dereference_raw(idr->hint); > 0xffffffffc0a58998 <drm_dp_aux_dev_get_by_minor+0x18>: mov 0x8a41(%rip),%rax # 0xffffffffc0a613e0 <aux_idr> > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 116 > 116 if (hint && (id & ~IDR_MASK) == hint->prefix) > 117 return rcu_dereference_raw(hint->ary[id & IDR_MASK]); > 0xffffffffc0a5899f <drm_dp_aux_dev_get_by_minor+0x1f>: test %rax,%rax > 0xffffffffc0a589a2 <drm_dp_aux_dev_get_by_minor+0x22>: je 0xffffffffc0a589ac <drm_dp_aux_dev_get_by_minor+0x2c> > 0xffffffffc0a589a4 <drm_dp_aux_dev_get_by_minor+0x24>: mov %ebx,%edx > 0xffffffffc0a589a6 <drm_dp_aux_dev_get_by_minor+0x26>: xor %dl,%dl > 0xffffffffc0a589a8 <drm_dp_aux_dev_get_by_minor+0x28>: cmp (%rax),%edx > 0xffffffffc0a589aa <drm_dp_aux_dev_get_by_minor+0x2a>: je 0xffffffffc0a589f0 <drm_dp_aux_dev_get_by_minor+0x70> > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 119 > 119 return idr_find_slowpath(idr, id); > 0xffffffffc0a589ac <drm_dp_aux_dev_get_by_minor+0x2c>: mov %ebx,%esi > 0xffffffffc0a589ae <drm_dp_aux_dev_get_by_minor+0x2e>: mov $0xffffffffc0a613e0,%rdi > 0xffffffffc0a589b5 <drm_dp_aux_dev_get_by_minor+0x35>: callq 0xffffffffb09771b0 <idr_find_slowpath> > 0xffffffffc0a589ba <drm_dp_aux_dev_get_by_minor+0x3a>: mov %rax,%rbx > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/arch/x86/include/asm/atomic.h: 25 > 25 return ACCESS_ONCE((v)->counter); > 0xffffffffc0a589bd <drm_dp_aux_dev_get_by_minor+0x3d>: mov 0x18(%rbx),%edx > > crash> struct file.f_path 0xffff89d18fadfa00 > f_path = { > mnt = 0xffff89f23feaa620, > dentry = 0xffff89f1be7a1cc0 > } > crash> files -d 0xffff89f1be7a1cc0 > DENTRY INODE SUPERBLK TYPE PATH > ffff89f1be7a1cc0 ffff89f1b32a2830 ffff89d293aa8800 CHR /dev/ipmi0 > > crash> struct inode.i_rdev ffff89f1b32a2830 > i_rdev = 0xf200000 > crash> eval (0xfffff & 0xf200000) > hexadecimal: 0 > decimal: 0 > octal: 0 > binary: 0000000000000000000000000000000000000000000000000000000000000000 > ---------------------------------------------------------------------------- > > As the index value was 0 and aux_idr had value 0 for all, it can have value > NULL from idr_find() function, but the below function doesn't check and just > tries to use it. > > ---------------------------------------------------------------------------- > crash> aux_idr > aux_idr = $8 = { > hint = 0x0, > top = 0x0, > id_free = 0x0, > layers = 0x0, > id_free_cnt = 0x0, > cur = 0x0, > lock = { > { > rlock = { > raw_lock = { > val = { > counter = 0x0 > } > } > } > } > } > } > > crash> edis -f drm_dp_aux_dev_get_by_minor > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/drivers/gpu/drm/drm_dp_aux_dev.c: 57 > > 56 static struct drm_dp_aux_dev *drm_dp_aux_dev_get_by_minor(unsigned index) > 57 { > 58 struct drm_dp_aux_dev *aux_dev = NULL; > 59 > 60 mutex_lock(&aux_idr_mutex); > 61 aux_dev = idr_find(&aux_idr, index); > 62 if (!kref_get_unless_zero(&aux_dev->refcount)) > 63 aux_dev = NULL; > 64 mutex_unlock(&aux_idr_mutex); > 65 > 66 return aux_dev; > 67 } > ---------------------------------------------------------------------------- > > To avoid this kinds of situation, we should make a safeguard for the returned > value. Changing the line 62 with the below would do. > > 62 if (aux_dev && !kref_get_unless_zero(&aux_dev->refcount)) > ^^^^^^^^^^ > From Tony Camuso <tcamuso@redhat.com> > I built a patched kernel for several architectures. > Booted the kernel, and ran the following for 100 iterations. > rmmod ipmi kmods to remove /dev/ipmi0. > Invoked ipmitool > insmod ipmi kmods > Did not see any crashes or call traces. > > Suggested-by: Daniel Kwon <dkwon@redhat.com> > Signed-off-by: Tony Camuso <tcamuso@redhat.com> > --- > drivers/gpu/drm/drm_dp_aux_dev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c > index 0e4f25d63fd2d..0b11210c882ee 100644 > --- a/drivers/gpu/drm/drm_dp_aux_dev.c > +++ b/drivers/gpu/drm/drm_dp_aux_dev.c > @@ -60,7 +60,7 @@ static struct drm_dp_aux_dev *drm_dp_aux_dev_get_by_minor(unsigned index) > > mutex_lock(&aux_idr_mutex); > aux_dev = idr_find(&aux_idr, index); > - if (!kref_get_unless_zero(&aux_dev->refcount)) > + if (aux_dev && !kref_get_unless_zero(&aux_dev->refcount)) > aux_dev = NULL; > mutex_unlock(&aux_idr_mutex); -- Jani Nikula, Intel Open Source Graphics Center
WARNING: multiple messages have this Message-ID (diff)
From: Jani Nikula <jani.nikula@linux.intel.com> To: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: airlied@linux.ie, dkwon@redhat.com, tcamuso@redhat.com Subject: Re: [PATCH] drm: assure aux_dev is nonzero before using it Date: Fri, 24 May 2019 11:36:14 +0300 [thread overview] Message-ID: <87v9y0mept.fsf@intel.com> (raw) In-Reply-To: <20190523110905.22445-1-tcamuso@redhat.com> On Thu, 23 May 2019, tcamuso <tcamuso@redhat.com> wrote: > From Daniel Kwon <dkwon@redhat.com> > > The system was crashed due to invalid memory access while trying to access > auxiliary device. > > crash> bt > PID: 9863 TASK: ffff89d1bdf11040 CPU: 1 COMMAND: "ipmitool" > #0 [ffff89cedd7f3868] machine_kexec at ffffffffb0663674 > #1 [ffff89cedd7f38c8] __crash_kexec at ffffffffb071cf62 > #2 [ffff89cedd7f3998] crash_kexec at ffffffffb071d050 > #3 [ffff89cedd7f39b0] oops_end at ffffffffb0d6d758 > #4 [ffff89cedd7f39d8] no_context at ffffffffb0d5bcde > #5 [ffff89cedd7f3a28] __bad_area_nosemaphore at ffffffffb0d5bd75 > #6 [ffff89cedd7f3a78] bad_area at ffffffffb0d5c085 > #7 [ffff89cedd7f3aa0] __do_page_fault at ffffffffb0d7080c > #8 [ffff89cedd7f3b10] do_page_fault at ffffffffb0d70905 > #9 [ffff89cedd7f3b40] page_fault at ffffffffb0d6c758 > [exception RIP: drm_dp_aux_dev_get_by_minor+0x3d] > RIP: ffffffffc0a589bd RSP: ffff89cedd7f3bf0 RFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff89cedd7f3fd8 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc0a613e0 > RBP: ffff89cedd7f3bf8 R8: ffff89f1bcbabbd0 R9: 0000000000000000 > R10: ffff89f1be7a1cc0 R11: 0000000000000000 R12: 0000000000000000 > R13: ffff89f1b32a2830 R14: ffff89d18fadfa00 R15: 0000000000000000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > RIP: 00002b45f0d80d30 RSP: 00007ffc416066a0 RFLAGS: 00010246 > RAX: 0000000000000002 RBX: 000056062e212d80 RCX: 00007ffc41606810 > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffc41606ec0 > RBP: 0000000000000000 R8: 000056062dfed229 R9: 00002b45f0cdf14d > R10: 0000000000000002 R11: 0000000000000246 R12: 00007ffc41606ec0 > R13: 00007ffc41606ed0 R14: 00007ffc41606ee0 R15: 0000000000000000 > ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b > > ---------------------------------------------------------------------------- > > It was trying to open '/dev/ipmi0', but as no entry in aux_dir, it returned > NULL from 'idr_find()'. This drm_dp_aux_dev_get_by_minor() should have done a > check on this, but had failed to do it. I think the better question is, *why* does the idr_find() return NULL? I don't think it should, under any circumstances. I fear adding the check here papers over some other problem, taking us further away from the root cause. Also, can you reproduce this on a recent upstream kernel? The aux device nodes were introduced in kernel v4.6. Whatever you reproduced on v3.10 is pretty much irrelevant for upstream. BR, Jani. > > ---------------------------------------------------------------------------- > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 114 > 114 struct idr_layer *hint = rcu_dereference_raw(idr->hint); > 0xffffffffc0a58998 <drm_dp_aux_dev_get_by_minor+0x18>: mov 0x8a41(%rip),%rax # 0xffffffffc0a613e0 <aux_idr> > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 116 > 116 if (hint && (id & ~IDR_MASK) == hint->prefix) > 117 return rcu_dereference_raw(hint->ary[id & IDR_MASK]); > 0xffffffffc0a5899f <drm_dp_aux_dev_get_by_minor+0x1f>: test %rax,%rax > 0xffffffffc0a589a2 <drm_dp_aux_dev_get_by_minor+0x22>: je 0xffffffffc0a589ac <drm_dp_aux_dev_get_by_minor+0x2c> > 0xffffffffc0a589a4 <drm_dp_aux_dev_get_by_minor+0x24>: mov %ebx,%edx > 0xffffffffc0a589a6 <drm_dp_aux_dev_get_by_minor+0x26>: xor %dl,%dl > 0xffffffffc0a589a8 <drm_dp_aux_dev_get_by_minor+0x28>: cmp (%rax),%edx > 0xffffffffc0a589aa <drm_dp_aux_dev_get_by_minor+0x2a>: je 0xffffffffc0a589f0 <drm_dp_aux_dev_get_by_minor+0x70> > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/include/linux/idr.h: 119 > 119 return idr_find_slowpath(idr, id); > 0xffffffffc0a589ac <drm_dp_aux_dev_get_by_minor+0x2c>: mov %ebx,%esi > 0xffffffffc0a589ae <drm_dp_aux_dev_get_by_minor+0x2e>: mov $0xffffffffc0a613e0,%rdi > 0xffffffffc0a589b5 <drm_dp_aux_dev_get_by_minor+0x35>: callq 0xffffffffb09771b0 <idr_find_slowpath> > 0xffffffffc0a589ba <drm_dp_aux_dev_get_by_minor+0x3a>: mov %rax,%rbx > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/arch/x86/include/asm/atomic.h: 25 > 25 return ACCESS_ONCE((v)->counter); > 0xffffffffc0a589bd <drm_dp_aux_dev_get_by_minor+0x3d>: mov 0x18(%rbx),%edx > > crash> struct file.f_path 0xffff89d18fadfa00 > f_path = { > mnt = 0xffff89f23feaa620, > dentry = 0xffff89f1be7a1cc0 > } > crash> files -d 0xffff89f1be7a1cc0 > DENTRY INODE SUPERBLK TYPE PATH > ffff89f1be7a1cc0 ffff89f1b32a2830 ffff89d293aa8800 CHR /dev/ipmi0 > > crash> struct inode.i_rdev ffff89f1b32a2830 > i_rdev = 0xf200000 > crash> eval (0xfffff & 0xf200000) > hexadecimal: 0 > decimal: 0 > octal: 0 > binary: 0000000000000000000000000000000000000000000000000000000000000000 > ---------------------------------------------------------------------------- > > As the index value was 0 and aux_idr had value 0 for all, it can have value > NULL from idr_find() function, but the below function doesn't check and just > tries to use it. > > ---------------------------------------------------------------------------- > crash> aux_idr > aux_idr = $8 = { > hint = 0x0, > top = 0x0, > id_free = 0x0, > layers = 0x0, > id_free_cnt = 0x0, > cur = 0x0, > lock = { > { > rlock = { > raw_lock = { > val = { > counter = 0x0 > } > } > } > } > } > } > > crash> edis -f drm_dp_aux_dev_get_by_minor > /usr/src/debug/kernel-3.10.0-957.12.1.el7/linux-3.10.0-957.12.1.el7.x86_64/drivers/gpu/drm/drm_dp_aux_dev.c: 57 > > 56 static struct drm_dp_aux_dev *drm_dp_aux_dev_get_by_minor(unsigned index) > 57 { > 58 struct drm_dp_aux_dev *aux_dev = NULL; > 59 > 60 mutex_lock(&aux_idr_mutex); > 61 aux_dev = idr_find(&aux_idr, index); > 62 if (!kref_get_unless_zero(&aux_dev->refcount)) > 63 aux_dev = NULL; > 64 mutex_unlock(&aux_idr_mutex); > 65 > 66 return aux_dev; > 67 } > ---------------------------------------------------------------------------- > > To avoid this kinds of situation, we should make a safeguard for the returned > value. Changing the line 62 with the below would do. > > 62 if (aux_dev && !kref_get_unless_zero(&aux_dev->refcount)) > ^^^^^^^^^^ > From Tony Camuso <tcamuso@redhat.com> > I built a patched kernel for several architectures. > Booted the kernel, and ran the following for 100 iterations. > rmmod ipmi kmods to remove /dev/ipmi0. > Invoked ipmitool > insmod ipmi kmods > Did not see any crashes or call traces. > > Suggested-by: Daniel Kwon <dkwon@redhat.com> > Signed-off-by: Tony Camuso <tcamuso@redhat.com> > --- > drivers/gpu/drm/drm_dp_aux_dev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_dp_aux_dev.c b/drivers/gpu/drm/drm_dp_aux_dev.c > index 0e4f25d63fd2d..0b11210c882ee 100644 > --- a/drivers/gpu/drm/drm_dp_aux_dev.c > +++ b/drivers/gpu/drm/drm_dp_aux_dev.c > @@ -60,7 +60,7 @@ static struct drm_dp_aux_dev *drm_dp_aux_dev_get_by_minor(unsigned index) > > mutex_lock(&aux_idr_mutex); > aux_dev = idr_find(&aux_idr, index); > - if (!kref_get_unless_zero(&aux_dev->refcount)) > + if (aux_dev && !kref_get_unless_zero(&aux_dev->refcount)) > aux_dev = NULL; > mutex_unlock(&aux_idr_mutex); -- Jani Nikula, Intel Open Source Graphics Center
next prev parent reply other threads:[~2019-05-24 8:36 UTC|newest] Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-05-23 11:09 [PATCH] drm: assure aux_dev is nonzero before using it tcamuso 2019-05-23 11:09 ` tcamuso 2019-05-24 8:36 ` Jani Nikula [this message] 2019-05-24 8:36 ` Jani Nikula 2019-05-24 10:48 ` tony camuso 2019-05-24 11:58 ` Ville Syrjälä 2019-07-10 13:47 ` Tony Camuso 2019-07-10 13:56 ` Ville Syrjälä 2019-07-12 16:07 ` Tony Camuso 2019-07-12 17:06 ` Ville Syrjälä 2019-07-12 17:35 ` Tony Camuso 2019-09-23 15:03 ` Tony Camuso 2019-09-23 15:22 ` Ville Syrjälä 2020-08-10 17:11 Zwane Mwaikambo 2020-08-10 17:11 ` Zwane Mwaikambo 2020-08-11 8:58 ` Daniel Vetter 2020-08-11 8:58 ` Daniel Vetter 2020-08-11 22:16 ` Zwane Mwaikambo 2020-08-11 22:16 ` Zwane Mwaikambo 2020-08-12 14:10 ` Daniel Vetter 2020-08-12 14:10 ` Daniel Vetter 2020-08-12 15:44 ` Lyude Paul 2020-08-12 15:44 ` Lyude Paul 2020-08-12 20:21 ` Zwane Mwaikambo 2020-08-12 20:21 ` Zwane Mwaikambo 2020-08-18 17:58 ` Zwane Mwaikambo 2020-08-18 17:58 ` Zwane Mwaikambo 2020-09-08 18:41 ` Lyude Paul 2020-09-08 18:41 ` Lyude Paul
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87v9y0mept.fsf@intel.com \ --to=jani.nikula@linux.intel.com \ --cc=airlied@linux.ie \ --cc=dkwon@redhat.com \ --cc=dri-devel@lists.freedesktop.org \ --cc=linux-kernel@vger.kernel.org \ --cc=tcamuso@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.