linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Oops in current tree in i2c
@ 2018-10-27 16:08 Linus Torvalds
  2018-10-27 16:19 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Linus Torvalds @ 2018-10-27 16:08 UTC (permalink / raw)
  To: Jiri Kosina, Julian Sax, Benjamin Tissoires, Hans de Goede
  Cc: linux-input, Linux Kernel Mailing List

Julian, Jiri,
 On my laptop I'm getting a kernel page fault with the current git
tree, and I'm tentatively blaming commit

  9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices")

but that's simply because it's the only thing that seems to touch this
particular area in this merge window.

The oops looks like this:

  BUG: unable to handle kernel paging request at 000000007a25d598
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 888 Comm: systemd-udevd Not tainted 4.19.0-07715-g345671ea0f92 #4
  Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.7.0 01/16/2018
  RIP: 0010:strstr+0x19/0x70

where the code disassembly (and the register contents) shows that the
wild pointer is the first argument to "strstr()", which just has a
bogus value that is not a valid kernel pointer (RDI: 000000007a25d598
- which is obviously also the address of the page fault)

The call trace is:

   dmi_matches+0x55/0xc0
   dmi_first_match+0x26/0x40
   i2c_hid_get_dmi_i2c_hid_desc_override+0x16/0x40 [i2c_hid]
   i2c_hid_probe+0x28c/0x760 [i2c_hid]
   i2c_device_probe+0x1e7/0x260
   really_probe+0xf8/0x3e0
   driver_probe_device+0x10f/0x120
   bus_for_each_drv+0x66/0xb0
   __device_attach+0xd9/0x150
   bus_probe_device+0x8a/0xa0
   device_add+0x48e/0x660
   i2c_new_device+0x162/0x350

which is why I suspect that new i2c_hid_get_dmi_hid_report_desc_override() code.

I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
isn't terminated by a NULL entry, and I will test that next.

What makes me *very* unhappy about this is that if I'm right, I think
it means that code was literally not tested at all by anybody who
didn't have one of the entries in that list.

                  Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2018-10-27 16:08 Oops in current tree in i2c Linus Torvalds
@ 2018-10-27 16:19 ` Linus Torvalds
  2018-10-27 16:20 ` Jiri Kosina
  2018-10-28 10:44 ` Hans de Goede
  2 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2018-10-27 16:19 UTC (permalink / raw)
  To: Jiri Kosina, Julian Sax, Benjamin Tissoires, Hans de Goede
  Cc: linux-input, Linux Kernel Mailing List

On Sat, Oct 27, 2018 at 9:08 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
> isn't terminated by a NULL entry, and I will test that next.

Confirmed.  That makes my laptop boot cleanly.

See commit b59dfdaef173 ("i2c-hid: properly terminate
i2c_hid_dmi_desc_override_table[] array")

                       Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2018-10-27 16:08 Oops in current tree in i2c Linus Torvalds
  2018-10-27 16:19 ` Linus Torvalds
@ 2018-10-27 16:20 ` Jiri Kosina
  2018-10-28 10:44 ` Hans de Goede
  2 siblings, 0 replies; 8+ messages in thread
From: Jiri Kosina @ 2018-10-27 16:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Julian Sax, Benjamin Tissoires, Hans de Goede, linux-input,
	Linux Kernel Mailing List

On Sat, 27 Oct 2018, Linus Torvalds wrote:

> I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
> isn't terminated by a NULL entry, and I will test that next.

Hm, that almost certainly is indeed the issue, thanks a lot for reporting 
it.

> What makes me *very* unhappy about this is that if I'm right, I think it 
> means that code was literally not tested at all by anybody who didn't 
> have one of the entries in that list.

Honestly, my autotesting of HID tree is running on HW that doesn't have 
i2c transport at all, only USB a Bluetooth. Something to improve I guess; 
will fix that next week.

Benjamin, do you have something for that in place already?

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2018-10-27 16:08 Oops in current tree in i2c Linus Torvalds
  2018-10-27 16:19 ` Linus Torvalds
  2018-10-27 16:20 ` Jiri Kosina
@ 2018-10-28 10:44 ` Hans de Goede
  2018-10-29  8:36   ` Benjamin Tissoires
  2 siblings, 1 reply; 8+ messages in thread
From: Hans de Goede @ 2018-10-28 10:44 UTC (permalink / raw)
  To: Linus Torvalds, Jiri Kosina, Julian Sax, Benjamin Tissoires
  Cc: linux-input, Linux Kernel Mailing List

Hi Linus,

On 27-10-18 18:08, Linus Torvalds wrote:
> Julian, Jiri,
>   On my laptop I'm getting a kernel page fault with the current git
> tree, and I'm tentatively blaming commit
> 
>    9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices")
> 
> but that's simply because it's the only thing that seems to touch this
> particular area in this merge window.
> 
> The oops looks like this:
> 
>    BUG: unable to handle kernel paging request at 000000007a25d598
>    PGD 0 P4D 0
>    Oops: 0000 [#1] SMP PTI
>    CPU: 1 PID: 888 Comm: systemd-udevd Not tainted 4.19.0-07715-g345671ea0f92 #4
>    Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.7.0 01/16/2018
>    RIP: 0010:strstr+0x19/0x70
> 
> where the code disassembly (and the register contents) shows that the
> wild pointer is the first argument to "strstr()", which just has a
> bogus value that is not a valid kernel pointer (RDI: 000000007a25d598
> - which is obviously also the address of the page fault)
> 
> The call trace is:
> 
>     dmi_matches+0x55/0xc0
>     dmi_first_match+0x26/0x40
>     i2c_hid_get_dmi_i2c_hid_desc_override+0x16/0x40 [i2c_hid]
>     i2c_hid_probe+0x28c/0x760 [i2c_hid]
>     i2c_device_probe+0x1e7/0x260
>     really_probe+0xf8/0x3e0
>     driver_probe_device+0x10f/0x120
>     bus_for_each_drv+0x66/0xb0
>     __device_attach+0xd9/0x150
>     bus_probe_device+0x8a/0xa0
>     device_add+0x48e/0x660
>     i2c_new_device+0x162/0x350
> 
> which is why I suspect that new i2c_hid_get_dmi_hid_report_desc_override() code.
> 
> I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
> isn't terminated by a NULL entry, and I will test that next.

Yes that likely is the problem. I already had a bug report from one of the
Manjaro maintainers who was cherry picking this into the Manjaro kernel.

So I ran some tests on a laptop of mine which does use i2c-hid but I
failed to reproduce the issue, so we both (me and the Manjaro maintainer)
both assumed something went wrong with the backport.

Both of us seem to have overlooked the missing terminating entry,
as well as other people involved in the patch.

> What makes me *very* unhappy about this is that if I'm right, I think
> it means that code was literally not tested at all by anybody who
> didn't have one of the entries in that list.

That is not true, I've hit one of these unterminated dmi lists
issues before and it depends on what get put in mem directly
after the list by the linker, bugs caused by this do not always
reproduce unfortunately.

And as mentioned I have tested the patch on a machine with an i2c-hid
touchpad, which is not on the list and I did not hit this problem.

Anyways this is fixed now, thank you for catching this.

Regards,

Hans

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2018-10-28 10:44 ` Hans de Goede
@ 2018-10-29  8:36   ` Benjamin Tissoires
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Tissoires @ 2018-10-29  8:36 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Linus Torvalds, jkosina, Julian Sax, open list:HID CORE LAYER, lkml

Hi,

On Sun, Oct 28, 2018 at 11:45 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi Linus,
>
> On 27-10-18 18:08, Linus Torvalds wrote:
> > Julian, Jiri,
> >   On my laptop I'm getting a kernel page fault with the current git
> > tree, and I'm tentatively blaming commit
> >
> >    9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices")
> >
> > but that's simply because it's the only thing that seems to touch this
> > particular area in this merge window.
> >
> > The oops looks like this:
> >
> >    BUG: unable to handle kernel paging request at 000000007a25d598
> >    PGD 0 P4D 0
> >    Oops: 0000 [#1] SMP PTI
> >    CPU: 1 PID: 888 Comm: systemd-udevd Not tainted 4.19.0-07715-g345671ea0f92 #4
> >    Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.7.0 01/16/2018
> >    RIP: 0010:strstr+0x19/0x70
> >
> > where the code disassembly (and the register contents) shows that the
> > wild pointer is the first argument to "strstr()", which just has a
> > bogus value that is not a valid kernel pointer (RDI: 000000007a25d598
> > - which is obviously also the address of the page fault)
> >
> > The call trace is:
> >
> >     dmi_matches+0x55/0xc0
> >     dmi_first_match+0x26/0x40
> >     i2c_hid_get_dmi_i2c_hid_desc_override+0x16/0x40 [i2c_hid]
> >     i2c_hid_probe+0x28c/0x760 [i2c_hid]
> >     i2c_device_probe+0x1e7/0x260
> >     really_probe+0xf8/0x3e0
> >     driver_probe_device+0x10f/0x120
> >     bus_for_each_drv+0x66/0xb0
> >     __device_attach+0xd9/0x150
> >     bus_probe_device+0x8a/0xa0
> >     device_add+0x48e/0x660
> >     i2c_new_device+0x162/0x350
> >
> > which is why I suspect that new i2c_hid_get_dmi_hid_report_desc_override() code.
> >
> > I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
> > isn't terminated by a NULL entry, and I will test that next.

Thanks for fixing that in master. I should have spotted it while
reviewing the series. My bad.

>
> Yes that likely is the problem. I already had a bug report from one of the
> Manjaro maintainers who was cherry picking this into the Manjaro kernel.
>
> So I ran some tests on a laptop of mine which does use i2c-hid but I
> failed to reproduce the issue, so we both (me and the Manjaro maintainer)
> both assumed something went wrong with the backport.
>
> Both of us seem to have overlooked the missing terminating entry,
> as well as other people involved in the patch.
>
> > What makes me *very* unhappy about this is that if I'm right, I think
> > it means that code was literally not tested at all by anybody who
> > didn't have one of the entries in that list.
>
> That is not true, I've hit one of these unterminated dmi lists
> issues before and it depends on what get put in mem directly
> after the list by the linker, bugs caused by this do not always
> reproduce unfortunately.
>
> And as mentioned I have tested the patch on a machine with an i2c-hid
> touchpad, which is not on the list and I did not hit this problem.

Oh. This explains why I was also sure I tested this on a i2c-hid
laptop without having the oops.

I'll try to setup some automate testing for those also.

Cheers,
Benjamin

>
> Anyways this is fixed now, thank you for catching this.
>
> Regards,
>
> Hans

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2019-04-24 17:04 ` Greg KH
@ 2019-04-24 17:30   ` Ambrož Bizjak
  0 siblings, 0 replies; 8+ messages in thread
From: Ambrož Bizjak @ 2019-04-24 17:30 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

Thanks! I just tested the patch with 4.19.36 and it works for me.

On Wed, Apr 24, 2019 at 7:04 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Apr 24, 2019 at 06:59:09PM +0200, Ambrož Bizjak wrote:
> > Hi everyone,
> >
> > This buggy i2c patch (see https://lkml.org/lkml/2018/10/27/248) has
> > just landed in stable 4.19.36 without the appropriate fix, causing
> > oops on boot on many machines. For example we're experiencing this
> > issue with the NixOS distribution
> > (https://github.com/NixOS/nixpkgs/issues/60126). I am currently trying
> > the fix (https://github.com/torvalds/linux/commit/b59dfdaef173677b0b7e10f375226c0a1114fd20)
> > and will report if fixes the crash.
>
> Good catch, that fix is needed, will go queue that up now, thanks!
>
> greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
  2019-04-24 16:59 Ambrož Bizjak
@ 2019-04-24 17:04 ` Greg KH
  2019-04-24 17:30   ` Ambrož Bizjak
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2019-04-24 17:04 UTC (permalink / raw)
  To: Ambrož Bizjak; +Cc: linux-kernel

On Wed, Apr 24, 2019 at 06:59:09PM +0200, Ambrož Bizjak wrote:
> Hi everyone,
> 
> This buggy i2c patch (see https://lkml.org/lkml/2018/10/27/248) has
> just landed in stable 4.19.36 without the appropriate fix, causing
> oops on boot on many machines. For example we're experiencing this
> issue with the NixOS distribution
> (https://github.com/NixOS/nixpkgs/issues/60126). I am currently trying
> the fix (https://github.com/torvalds/linux/commit/b59dfdaef173677b0b7e10f375226c0a1114fd20)
> and will report if fixes the crash.

Good catch, that fix is needed, will go queue that up now, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Oops in current tree in i2c
@ 2019-04-24 16:59 Ambrož Bizjak
  2019-04-24 17:04 ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Ambrož Bizjak @ 2019-04-24 16:59 UTC (permalink / raw)
  To: linux-kernel, Greg KH

Hi everyone,

This buggy i2c patch (see https://lkml.org/lkml/2018/10/27/248) has
just landed in stable 4.19.36 without the appropriate fix, causing
oops on boot on many machines. For example we're experiencing this
issue with the NixOS distribution
(https://github.com/NixOS/nixpkgs/issues/60126). I am currently trying
the fix (https://github.com/torvalds/linux/commit/b59dfdaef173677b0b7e10f375226c0a1114fd20)
and will report if fixes the crash.

Best regards,
Ambroz Bizjak

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-24 17:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-27 16:08 Oops in current tree in i2c Linus Torvalds
2018-10-27 16:19 ` Linus Torvalds
2018-10-27 16:20 ` Jiri Kosina
2018-10-28 10:44 ` Hans de Goede
2018-10-29  8:36   ` Benjamin Tissoires
2019-04-24 16:59 Ambrož Bizjak
2019-04-24 17:04 ` Greg KH
2019-04-24 17:30   ` Ambrož Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).