linux-kernel.vger.kernel.org archive mirror
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com,
	sakari.ailus@linux.intel.com, heikki.krogerus@linux.intel.com,
	gregkh@linuxfoundation.org
Subject: Re: kernel oops in 'typec_ucsi' due to commit 'drivers property: When no children in primary, try secondary'
Date: Thu, 16 Jul 2020 20:59:27 +0300
Message-ID: <824383bdd2dbb4ad09d2342792e11d418efdd60b.camel@redhat.com>
In-Reply-To: <20200716143411.GN3703480@smile.fi.intel.com>

On Thu, 2020-07-16 at 17:34 +0300, Andy Shevchenko wrote:
> On Thu, Jul 16, 2020 at 11:17:03AM +0300, Maxim Levitsky wrote:
> > Hi!
> > 
> > Few days ago I bisected a regression on 5.8 kernel:
> > 
> > I have nvidia rtx 2070s and its USB type C port driver (which is open source)
> > started to crash on load:
> 
> I'm looking at this, but I have questions:
> - any pointers to the device tree excerpt which this tries to iterate over
> - can you provide full Code: line?
> 
> Only way I see, why it happens, is that fwnode is not initialized properly
> somewhere (means it has garbage in the secondary pointer).
> 
> > [  +0.000043] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: P        W  O      5.8.0-rc3.stable #133
> > [  +0.000045] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
> > [  +0.000030] Workqueue: events_long ucsi_init_work [typec_ucsi]
> > [  +0.000048] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> > [  +0.000024] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48>
> > [  +0.000065] RSP: 0018:ffffc900038d7e08 EFLAGS: 00010246
> > [  +0.000044] RAX: ffff889fb6b62f00 RBX: 0000000000000000 RCX: 0000000000000001
> > [  +0.000027] RDX: ffff889fb6fd4a70 RSI: 0000000000000000 RDI: ffff889fb6b63608
> > [  +0.000046] RBP: 0000000000000000 R08: 0000000000000001 R09: 7fffffffffffffff
> > [  +0.000024] R10: 00002075ce282580 R11: 000000000062de3e R12: ffff889fb6b63608
> > [  +0.000043] R13: 0000000000010000 R14: ffff889fb6b63018 R15: 0000000000000001
> > [  +0.000044] FS:  0000000000000000(0000) GS:ffff889fbe4c0000(0000) knlGS:0000000000000000
> > [  +0.000024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  +0.000042] CR2: 0000000000000000 CR3: 000000175621b000 CR4: 0000000000340ea0
> > [  +0.000046] Call Trace:
> > [  +0.000030]  ucsi_init+0x213/0x530 [typec_ucsi]
> > [  +0.000028]  ucsi_init_work+0x12/0x20 [typec_ucsi]
> > [  +0.000049]  process_one_work+0x1d2/0x390
> > [  +0.000027]  worker_thread+0x4a/0x3b0
> > [  +0.000025]  ? process_one_work+0x390/0x390
> > [  +0.000049]  kthread+0xf9/0x130
> > [  +0.000026]  ? kthread_park+0x90/0x90
> > [  +0.000028]  ret_from_fork+0x1f/0x30
> > [  +0.000048] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom ntfs msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb tap xfs rfcomm xt_M>
> > [  +0.000039]  usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops crc32_pclmul libahci crc32c_intel igb ccp >
> > [  +0.000289] CR2: 0000000000000000
> > [  +0.000026] ---[ end trace 38ebb9aebd55fbff ]---
> > [  +0.014201] RIP: 0010:device_get_next_child_node+0x5b/0xb0
> > [  +0.000030] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48>
> > [  +0.000075] RSP: 0018:ffffc900038d7e08 EFLAGS: 00010246
> > [  +0.000027] RAX: ffff889fb6b62f00 RBX: 0000000000000000 RCX: 0000000000000001
> > [  +0.000048] RDX: ffff889fb6fd4a70 RSI: 0000000000000000 RDI: ffff889fb6b63608
> > [  +0.000049] RBP: 0000000000000000 R08: 0000000000000001 R09: 7fffffffffffffff
> > [  +0.000027] R10: 00002075ce282580 R11: 000000000062de3e R12: ffff889fb6b63608
> > [  +0.000049] R13: 0000000000010000 R14: ffff889fb6b63018 R15: 0000000000000001
> > [  +0.000050] FS:  0000000000000000(0000) GS:ffff889fbe4c0000(0000) knlGS:0000000000000000
> > [  +0.000027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  +0.000050] CR2: 0000000000000000 CR3: 000000175621b000 CR4: 0000000000340ea0
> > 
> > I bisected this, while passing the UCSI controller to a VM, and this
> > is the result:
> > 
> > git bisect start
> > # good: [3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162] Linux 5.7
> > git bisect good 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162
> > # bad: [48778464bb7d346b47157d21ffde2af6b2d39110] Linux 5.8-rc2
> > git bisect bad 48778464bb7d346b47157d21ffde2af6b2d39110
> > # good: [a98f670e41a99f53acb1fb33cee9c6abbb2e6f23] Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> > git bisect good a98f670e41a99f53acb1fb33cee9c6abbb2e6f23
> > # good: [081096d98bb23946f16215357b141c5616b234bf] Merge tag 'tty-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > git bisect good 081096d98bb23946f16215357b141c5616b234bf
> > # bad: [3a2a8751742133a7bbc49b9d1bcbd52e212edff6] Merge tag 'for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
> > git bisect bad 3a2a8751742133a7bbc49b9d1bcbd52e212edff6
> > # bad: [a1e81f9654eef650d3ee35c94a8cab00b5cd379c] m68k: implement flush_icache_user_range
> > git bisect bad a1e81f9654eef650d3ee35c94a8cab00b5cd379c
> > # good: [c336c022503d1be719ca06f2526c211709e3d2d3] staging: wfx: remove false positive warning
> > git bisect good c336c022503d1be719ca06f2526c211709e3d2d3
> > # good: [05c8a4fc44a916dd897769ca69b42381f9177ec4] habanalabs: correctly cast u64 to void*
> > git bisect good 05c8a4fc44a916dd897769ca69b42381f9177ec4
> > # good: [a3975dea1696b7c81319dc4b66e3c378dd47ccfb] Merge tag 'iio-for-5.8c' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
> > git bisect good a3975dea1696b7c81319dc4b66e3c378dd47ccfb
> > # bad: [f558b8364e19f9222e7976c64e9367f66bab02cc] Merge tag 'driver-core-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> > git bisect bad f558b8364e19f9222e7976c64e9367f66bab02cc
> > # good: [b6d90ef9a439b4ef73a350789bf766a1339a703d] staging: vchi: Get rid of not implemented function declarations
> > git bisect good b6d90ef9a439b4ef73a350789bf766a1339a703d
> > # good: [93d2e4322aa74c1ad1e8c2160608eb9a960d69ff] of: platform: Batch fwnode parsing when adding all top level devices
> > git bisect good 93d2e4322aa74c1ad1e8c2160608eb9a960d69ff
> > # bad: [c2c076166b5880eabe068ce1cab30bf6edeeea1a] firmware_loader: change enum fw_opt to u32
> > git bisect bad c2c076166b5880eabe068ce1cab30bf6edeeea1a
> > # bad: [2cd38fd15e4ebcfe917a443734820269f8b5ba2b] driver core: Remove unnecessary is_fwnode_dev variable in device_add()
> > git bisect bad 2cd38fd15e4ebcfe917a443734820269f8b5ba2b
> > # good: [c82c83c330654c5639960ebc3dabbae53c43f79e] driver core: platform: Fix spelling errors in platform.c
> > git bisect good c82c83c330654c5639960ebc3dabbae53c43f79e
> > # bad: [114dbb4fa7c4053a51964d112e2851e818e085c6] drivers property: When no children in primary, try secondary
> > git bisect bad 114dbb4fa7c4053a51964d112e2851e818e085c6
> > # first bad commit: [114dbb4fa7c4053a51964d112e2851e818e085c6] drivers property: When no children in primary, try secondary
> > 
> > 
> > Reverting the commit helped fix this oops.
> > 
> > My .config attached.
> > If any more info is needed I'll be happy to provide it,
> > and of course test patches.
> > 
> > Best regards,
> > 	Maxim Levitsky
> 
> 


It turns out that the kernel ships scripts/decode_stacktrace.sh. Until now I always decoded the symbols manually.
I will include the decoded trace in bug reports from now on.

IMHO it would be useful to mention the script in the kernel oops report itself, since many people like me don't know about it.

[mlevitsk@starship ~/UPSTREAM/linux-kernel/work_area/ucsi_crash]$ ../../src/scripts/decode_stacktrace.sh ../../src/vmlinux ../../src/ ../../src/ < ./stacktrace.txt
[  +0.000043] CPU: 19 PID: 31281 Comm: kworker/19:1 Tainted: P        W  O      5.8.0-rc3.stable #133
[  +0.000045] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
[  +0.000030] Workqueue: events_long ucsi_init_work [typec_ucsi]
[   +0.000048] RIP: 0010:device_get_next_child_node (/home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/base/property.c:715) 
[ +0.000024] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48>
All code
========
   0:	18 48 85             	sbb    %cl,-0x7b(%rax)
   3:	db 74 24 48          	(bad)  0x48(%rsp)
   7:	8b 43 08             	mov    0x8(%rbx),%eax
   a:	48 85 c0             	test   %rax,%rax
   d:	74 1b                	je     0x2a
   f:	48 8b 40 50          	mov    0x50(%rax),%rax
  13:	48 85 c0             	test   %rax,%rax
  16:	74 12                	je     0x2a
  18:	48 89 ee             	mov    %rbp,%rsi
  1b:	48 89 df             	mov    %rbx,%rdi
  1e:	ff d0                	callq  *%rax
  20:	48 85 c0             	test   %rax,%rax
  23:	74 05                	je     0x2a
  25:	5b                   	pop    %rbx
  26:	5d                   	pop    %rbp
  27:	41 5c                	pop    %r12
  29:	c3                   	retq   
  2a:*	48 8b 03             	mov    (%rbx),%rax		<-- trapping instruction
  2d:	48 85 c0             	test   %rax,%rax
  30:	74 f3                	je     0x25
  32:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 03             	mov    (%rbx),%rax
   3:	48 85 c0             	test   %rax,%rax
   6:	74 f3                	je     0xfffffffffffffffb
   8:	48                   	rex.W
[  +0.000065] RSP: 0018:ffffc900038d7e08 EFLAGS: 00010246
[  +0.000044] RAX: ffff889fb6b62f00 RBX: 0000000000000000 RCX: 0000000000000001
[  +0.000027] RDX: ffff889fb6fd4a70 RSI: 0000000000000000 RDI: ffff889fb6b63608
[  +0.000046] RBP: 0000000000000000 R08: 0000000000000001 R09: 7fffffffffffffff
[  +0.000024] R10: 00002075ce282580 R11: 000000000062de3e R12: ffff889fb6b63608
[  +0.000043] R13: 0000000000010000 R14: ffff889fb6b63018 R15: 0000000000000001
[  +0.000044] FS:  0000000000000000(0000) GS:ffff889fbe4c0000(0000) knlGS:0000000000000000
[  +0.000024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000042] CR2: 0000000000000000 CR3: 000000175621b000 CR4: 0000000000340ea0
[  +0.000046] Call Trace:
[   +0.000030] ucsi_init (/home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/usb/typec/ucsi/ucsi.c:932 /home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/usb/typec/ucsi/ucsi.c:1049) typec_ucsi
[   +0.000028] ucsi_init_work (/home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/usb/typec/ucsi/ucsi.c:1089) typec_ucsi
[   +0.000049] process_one_work (/home/mlevitsk/UPSTREAM/linux-kernel/src/./arch/x86/include/asm/jump_label.h:25 /home/mlevitsk/UPSTREAM/linux-kernel/src/./include/linux/jump_label.h:200 /home/mlevitsk/UPSTREAM/linux-kernel/src/./include/trace/events/workqueue.h:108 /home/mlevitsk/UPSTREAM/linux-kernel/src/kernel/workqueue.c:2274) 
[   +0.000027] worker_thread (/home/mlevitsk/UPSTREAM/linux-kernel/src/kernel/workqueue.c:2415) 
[   +0.000025] ? process_one_work (/home/mlevitsk/UPSTREAM/linux-kernel/src/kernel/workqueue.c:2462) 
[   +0.000049] kthread (/home/mlevitsk/UPSTREAM/linux-kernel/src/kernel/kthread.c:266) 
[   +0.000026] ? kthread_park (/home/mlevitsk/UPSTREAM/linux-kernel/src/kernel/kthread.c:1296) 
[   +0.000028] ret_from_fork (/home/mlevitsk/UPSTREAM/linux-kernel/src/arch/x86/entry/entry_64.S:299) 
[  +0.000048] Modules linked in: ucsi_ccg typec_ucsi typec hfsplus cdrom ntfs msdos vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb tap xfs rfcomm xt_M>
[  +0.000039]  usb_storage ext4 mbcache jbd2 amdgpu gpu_sched ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt fb_sys_fops crc32_pclmul libahci crc32c_intel igb ccp >
[  +0.000289] CR2: 0000000000000000
[  +0.000026] ---[ end trace 38ebb9aebd55fbff ]---
[   +0.014201] RIP: 0010:device_get_next_child_node (/home/mlevitsk/UPSTREAM/linux-kernel/src/drivers/base/property.c:715) 
[ +0.000030] Code: 18 48 85 db 74 24 48 8b 43 08 48 85 c0 74 1b 48 8b 40 50 48 85 c0 74 12 48 89 ee 48 89 df ff d0 48 85 c0 74 05 5b 5d 41 5c c3 <48> 8b 03 48 85 c0 74 f3 48>
All code
========
   0:	18 48 85             	sbb    %cl,-0x7b(%rax)
   3:	db 74 24 48          	(bad)  0x48(%rsp)
   7:	8b 43 08             	mov    0x8(%rbx),%eax
   a:	48 85 c0             	test   %rax,%rax
   d:	74 1b                	je     0x2a
   f:	48 8b 40 50          	mov    0x50(%rax),%rax
  13:	48 85 c0             	test   %rax,%rax
  16:	74 12                	je     0x2a
  18:	48 89 ee             	mov    %rbp,%rsi
  1b:	48 89 df             	mov    %rbx,%rdi
  1e:	ff d0                	callq  *%rax
  20:	48 85 c0             	test   %rax,%rax
  23:	74 05                	je     0x2a
  25:	5b                   	pop    %rbx
  26:	5d                   	pop    %rbp
  27:	41 5c                	pop    %r12
  29:	c3                   	retq   
  2a:*	48 8b 03             	mov    (%rbx),%rax		<-- trapping instruction
  2d:	48 85 c0             	test   %rax,%rax
  30:	74 f3                	je     0x25
  32:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	48 8b 03             	mov    (%rbx),%rax
   3:	48 85 c0             	test   %rax,%rax
   6:	74 f3                	je     0xfffffffffffffffb
   8:	48                   	rex.W
[  +0.000075] RSP: 0018:ffffc900038d7e08 EFLAGS: 00010246
[  +0.000027] RAX: ffff889fb6b62f00 RBX: 0000000000000000 RCX: 0000000000000001
[  +0.000048] RDX: ffff889fb6fd4a70 RSI: 0000000000000000 RDI: ffff889fb6b63608
[  +0.000049] RBP: 0000000000000000 R08: 0000000000000001 R09: 7fffffffffffffff
[  +0.000027] R10: 00002075ce282580 R11: 000000000062de3e R12: ffff889fb6b63608
[  +0.000049] R13: 0000000000010000 R14: ffff889fb6b63018 R15: 0000000000000001
[  +0.000050] FS:  0000000000000000(0000) GS:ffff889fbe4c0000(0000) knlGS:0000000000000000
[  +0.000027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000050] CR2: 0000000000000000 CR3: 000000175621b000 CR4: 0000000000340ea0
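
A note on reading the disassembly above, offered as a sketch rather than a confirmed diagnosis. The first bytes of the Code: line (48 85 db 74 24, mis-decoded because the dump starts mid-instruction) look like "test %rbx,%rbx; je 0x2a". If that reading is right, a NULL pointer in RBX skips the ops call and jumps straight to the trapping "mov (%rbx),%rax", which then loads a field at offset 0 from address 0 (RBX = 0, CR2 = 0). The userspace C model below illustrates that pattern and the kind of guard that avoids it. All names (struct node, node_ops, toy_get_next_child, get_next_child_guarded) are hypothetical stand-ins and not kernel API, and the field layout is only inferred from the offsets in the dump.

```c
#include <assert.h>
#include <stddef.h>

struct node;

/* Hypothetical ops table, standing in for per-node callbacks. */
struct node_ops {
	struct node *(*get_next_child)(const struct node *parent,
				       struct node *child);
};

/* Hypothetical node; field order is an assumption taken from the dump. */
struct node {
	struct node *secondary;		/* offset 0x0: the trapping load */
	const struct node_ops *ops;	/* offset 0x8 on x86-64 */
	struct node *only_child;	/* sketch-only: at most one child */
};

/* Toy op: yields only_child on the first call, then NULL. */
struct node *toy_get_next_child(const struct node *parent, struct node *child)
{
	assert(parent != NULL);	/* ops are never invoked on a NULL node */
	return child ? NULL : parent->only_child;
}

const struct node_ops toy_ops = { .get_next_child = toy_get_next_child };

/* NULL-safe op call: a NULL parent or a missing op simply yields NULL. */
static struct node *call_get_next_child(const struct node *parent,
					struct node *child)
{
	if (!parent || !parent->ops || !parent->ops->get_next_child)
		return NULL;
	return parent->ops->get_next_child(parent, child);
}

/* Try the primary node first, then fall back to its secondary. */
struct node *get_next_child_guarded(const struct node *parent,
				    struct node *child)
{
	struct node *next = call_get_next_child(parent, child);

	if (next)
		return next;

	/*
	 * Reading parent->secondary without checking parent first is the
	 * pattern that faults at address 0 when parent is NULL.
	 */
	if (parent && parent->secondary)
		next = call_get_next_child(parent->secondary, child);

	return next;
}
```

With a NULL parent, get_next_child_guarded() returns NULL instead of faulting. With a primary that has no children and a secondary that has one, it returns the secondary's child on the first call.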



Best regards,
	Maxim Levitsky

