* 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
@ 2018-02-13 20:04 Meelis Roos
  2018-02-14 14:29   ` Meelis Roos
  0 siblings, 1 reply; 11+ messages in thread
From: Meelis Roos @ 2018-02-13 20:04 UTC (permalink / raw)
  To: nouveau, dri-devel, Ben Skeggs, Linux Kernel list

This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:

[    7.361155] nouveau 0000:01:00.0: NVIDIA NV05 (20154000)
[    7.386601] nouveau 0000:01:00.0: bios: version 02.05.19.03.00
[    7.386715] nouveau 0000:01:00.0: bios: DCB table not found
[    7.386983] nouveau 0000:01:00.0: bios: DCB table not found
[    7.387166] nouveau 0000:01:00.0: bios: DCB table not found
[    7.387266] nouveau 0000:01:00.0: bios: DCB table not found
[    7.397578] agpgart-intel 0000:00:00.0: AGP 2.0 bridge
[    7.397705] agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
[    7.397827] nouveau 0000:01:00.0: putting AGP V2 device into 4x mode
[    7.398021] ================================================================================
[    7.398163] UBSAN: Undefined behaviour in drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c:315:12
[    7.398302] member access within null pointer of type 'struct nvkm_therm'
[    7.398403] CPU: 0 PID: 125 Comm: systemd-udevd Not tainted 4.16.0-rc1-00010-g178e834c47b0 #65
[    7.398543] Hardware name:  /D850GB                         , BIOS GB85010A.86A.0078.P18.0110081719 10/08/2001
[    7.398686] Call Trace:
[    7.398788]  dump_stack+0x16/0x18
[    7.398885]  ubsan_epilogue+0xe/0x2f
[    7.398979]  ubsan_type_mismatch_common+0xdc/0x152
[    7.399079]  __ubsan_handle_type_mismatch+0x24/0x26
[    7.399368]  nvkm_therm_clkgate_fini+0x14d/0x174 [nouveau]
[    7.399638]  ? nvkm_device_subdev+0x1b9/0x1fa [nouveau]
[    7.399907]  nvkm_device_fini+0x113/0x3e9 [nouveau]
[    7.400010]  ? ktime_get+0x4b/0x135
[    7.400253]  ? nvkm_devinit_post+0x35/0xbf [nouveau]
[    7.400519]  nvkm_device_init+0x228/0x5b0 [nouveau]
[    7.400626]  ? kmem_cache_alloc+0xbd/0x12a
[    7.400893]  nvkm_udevice_init+0x51/0xa9 [nouveau]
[    7.401137]  nvkm_object_init+0xc8/0x442 [nouveau]
[    7.401244]  ? check_preempt_wakeup+0xc2/0x1c1
[    7.401487]  ? nvkm_client_child_new+0x1d/0x38 [nouveau]
[    7.401729]  nvkm_ioctl_new+0x152/0x3d9 [nouveau]
[    7.401835]  ? default_wake_function+0x1a/0x35
[    7.402077]  ? nvif_vmm_init+0x2ce/0x2ce [nouveau]
[    7.402345]  ? nvkm_udevice_rd08+0x5b/0x5b [nouveau]
[    7.402587]  nvkm_ioctl+0x1c6/0x48d [nouveau]
[    7.402829]  ? nvif_client_init+0xc3/0x114 [nouveau]
[    7.403094]  ? nvkm_client_map+0xf/0xf [nouveau]
[    7.403382]  nvkm_client_ioctl+0x1c/0x22 [nouveau]
[    7.403643]  nvif_object_ioctl+0x6f/0xff [nouveau]
[    7.403903]  nvif_object_init+0xd4/0x1de [nouveau]
[    7.404164]  nvif_device_init+0x21/0x5c [nouveau]
[    7.404453]  nouveau_cli_init+0x21f/0xe1f [nouveau]
[    7.404733]  ? nouveau_drm_load+0x1d/0xe11 [nouveau]
[    7.405011]  nouveau_drm_load+0x54/0xe11 [nouveau]
[    7.405112]  ? kernfs_new_node+0x2b/0x8e
[    7.405209]  ? kernfs_create_link+0x55/0xcd
[    7.405323]  ? drm_dev_register+0x12f/0x2e0 [drm]
[    7.405437]  drm_dev_register+0x168/0x2e0 [drm]
[    7.405538]  ? pci_enable_device_flags+0xeb/0x15e
[    7.405651]  drm_get_pci_dev+0xbf/0x230 [drm]
[    7.405924]  nouveau_drm_probe+0x183/0x1ea [nouveau]
[    7.406035]  pci_device_probe+0xaa/0x163
[    7.406136]  driver_probe_device+0x1db/0x383
[    7.406234]  __driver_attach+0x86/0xb8
[    7.406330]  ? driver_probe_device+0x383/0x383
[    7.406427]  bus_for_each_dev+0x4e/0x83
[    7.406522]  driver_attach+0x1d/0x33
[    7.406618]  ? driver_probe_device+0x383/0x383
[    7.406714]  bus_add_driver+0x184/0x273
[    7.406810]  driver_register+0x66/0x107
[    7.407039]  ? nouveau_drm_init+0x66/0x1000 [nouveau]
[    7.407146]  __pci_register_driver+0x47/0x71
[    7.407379]  nouveau_drm_init+0x18a/0x1000 [nouveau]
[    7.407478]  ? 0xf831a000
[    7.407575]  do_one_initcall+0x4f/0x1e2
[    7.407672]  ? free_unref_page_commit.isra.88+0xd5/0x176
[    7.407771]  ? kvfree+0x3c/0x3e
[    7.407864]  ? __vunmap+0x89/0xef
[    7.407960]  ? do_init_module+0x1a/0x23f
[    7.408055]  do_init_module+0x82/0x23f
[    7.408153]  load_module+0x243c/0x36ae
[    7.408253]  ? kernel_read+0x4c/0xa1
[    7.408350]  SyS_finit_module+0x78/0x8d
[    7.408447]  do_fast_syscall_32+0xc1/0x31b
[    7.408545]  entry_SYSENTER_32+0x4e/0x7c
[    7.408640] EIP: 0xb7ee9ad5
[    7.408730] EFLAGS: 00000296 CPU: 0
[    7.408823] EAX: ffffffda EBX: 00000019 ECX: b7ce0bdd EDX: 00000000
[    7.408920] ESI: 00eb6670 EDI: 00ebe610 EBP: 00000000 ESP: bff8704c
[    7.409017]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[    7.409113] ================================================================================
[    7.409344] BUG: unable to handle kernel NULL pointer dereference at   (null)
[    7.409640] IP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau]
[    7.409738] *pde = 00000000 
[    7.409833] Oops: 0000 [#1]
[    7.409923] Modules linked in: nouveau(+) evdev wmi video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops uhci_hcd ttm ehci_hcd usbcore drm pcspkr psmouse sr_mod cdrom sg drm_panel_orientation_quirks parport_pc floppy i2c_i801 parport usb_common snd_intel8x0 snd_ac97_codec button rng_core ac97_bus snd_pcm snd_timer snd soundcore eeprom adm1031 adm1025 hwmon_vid i2c_core ip_tables x_tables ipv6 autofs4
[    7.410357] CPU: 0 PID: 125 Comm: systemd-udevd Not tainted 4.16.0-rc1-00010-g178e834c47b0 #65
[    7.410499] Hardware name:  /D850GB                         , BIOS GB85010A.86A.0078.P18.0110081719 10/08/2001
[    7.410824] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau]
[    7.410921] EFLAGS: 00010286 CPU: 0
[    7.411014] EAX: f6b3b800 EBX: 00000000 ECX: 00000006 EDX: 00000007
[    7.411109] ESI: 00000000 EDI: 00000000 EBP: f6155858 ESP: f6155834
[    7.411205]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[    7.411299] CR0: 80050033 CR2: 00000000 CR3: 3614b000 CR4: 000006d0
[    7.411395] Call Trace:
[    7.411662]  ? nvkm_device_subdev+0x1b9/0x1fa [nouveau]
[    7.411926]  nvkm_device_fini+0x113/0x3e9 [nouveau]
[    7.412030]  ? ktime_get+0x4b/0x135
[    7.412274]  ? nvkm_devinit_post+0x35/0xbf [nouveau]
[    7.412536]  nvkm_device_init+0x228/0x5b0 [nouveau]
[    7.412640]  ? kmem_cache_alloc+0xbd/0x12a
[    7.412906]  nvkm_udevice_init+0x51/0xa9 [nouveau]
[    7.413146]  nvkm_object_init+0xc8/0x442 [nouveau]
[    7.413248]  ? check_preempt_wakeup+0xc2/0x1c1
[    7.413602]  ? nvkm_client_child_new+0x1d/0x38 [nouveau]
[    7.413956]  nvkm_ioctl_new+0x152/0x3d9 [nouveau]
[    7.414055]  ? default_wake_function+0x1a/0x35
[    7.414409]  ? nvif_vmm_init+0x2ce/0x2ce [nouveau]
[    7.414788]  ? nvkm_udevice_rd08+0x5b/0x5b [nouveau]
[    7.415150]  nvkm_ioctl+0x1c6/0x48d [nouveau]
[    7.416466]  ? nvif_client_init+0xc3/0x114 [nouveau]
[    7.416832]  ? nvkm_client_map+0xf/0xf [nouveau]
[    7.417201]  nvkm_client_ioctl+0x1c/0x22 [nouveau]
[    7.417554]  nvif_object_ioctl+0x6f/0xff [nouveau]
[    7.417909]  nvif_object_init+0xd4/0x1de [nouveau]
[    7.418271]  nvif_device_init+0x21/0x5c [nouveau]
[    7.418536]  nouveau_cli_init+0x21f/0xe1f [nouveau]
[    7.418799]  ? nouveau_drm_load+0x1d/0xe11 [nouveau]
[    7.419058]  nouveau_drm_load+0x54/0xe11 [nouveau]
[    7.419158]  ? kernfs_new_node+0x2b/0x8e
[    7.419255]  ? kernfs_create_link+0x55/0xcd
[    7.419369]  ? drm_dev_register+0x12f/0x2e0 [drm]
[    7.419496]  drm_dev_register+0x168/0x2e0 [drm]
[    7.419596]  ? pci_enable_device_flags+0xeb/0x15e
[    7.419724]  drm_get_pci_dev+0xbf/0x230 [drm]
[    7.420102]  nouveau_drm_probe+0x183/0x1ea [nouveau]
[    7.420207]  pci_device_probe+0xaa/0x163
[    7.420305]  driver_probe_device+0x1db/0x383
[    7.420402]  __driver_attach+0x86/0xb8
[    7.420497]  ? driver_probe_device+0x383/0x383
[    7.420597]  bus_for_each_dev+0x4e/0x83
[    7.420694]  driver_attach+0x1d/0x33
[    7.420790]  ? driver_probe_device+0x383/0x383
[    7.420886]  bus_add_driver+0x184/0x273
[    7.420983]  driver_register+0x66/0x107
[    7.421215]  ? nouveau_drm_init+0x66/0x1000 [nouveau]
[    7.421322]  __pci_register_driver+0x47/0x71
[    7.421555]  nouveau_drm_init+0x18a/0x1000 [nouveau]
[    7.421654]  ? 0xf831a000
[    7.421751]  do_one_initcall+0x4f/0x1e2
[    7.421850]  ? free_unref_page_commit.isra.88+0xd5/0x176
[    7.421947]  ? kvfree+0x3c/0x3e
[    7.422041]  ? __vunmap+0x89/0xef
[    7.422136]  ? do_init_module+0x1a/0x23f
[    7.422232]  do_init_module+0x82/0x23f
[    7.422329]  load_module+0x243c/0x36ae
[    7.422428]  ? kernel_read+0x4c/0xa1
[    7.422524]  SyS_finit_module+0x78/0x8d
[    7.422624]  do_fast_syscall_32+0xc1/0x31b
[    7.422722]  entry_SYSENTER_32+0x4e/0x7c
[    7.422817] EIP: 0xb7ee9ad5
[    7.422907] EFLAGS: 00000296 CPU: 0
[    7.423001] EAX: ffffffda EBX: 00000019 ECX: b7ce0bdd EDX: 00000000
[    7.423098] ESI: 00eb6670 EDI: 00ebe610 EBP: 00000000 ESP: bff8704c
[    7.423195]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[    7.423291] Code: e9 30 ff ff ff 31 d2 b8 78 cf b0 f8 e8 ba 07 a2 c8 e9 0f ff ff ff 55 89 e5 57 56 53 83 ec 18 89 c3 89 d6 85 c0 0f 84 2c 01 00 00 <8b> 3b 85 ff 0f 84 11 01 00 00 8b 47 30 85 c0 0f 84 a1 00 00 00
[    7.423757] EIP: nvkm_therm_clkgate_fini+0x15/0x174 [nouveau] SS:ESP: 0068:f6155834
[    7.423899] CR2: 0000000000000000
[    7.424033] ---[ end trace cad535783d11d7b9 ]---

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
  2018-02-13 20:04 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini Meelis Roos
@ 2018-02-14 14:29   ` Meelis Roos
  0 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2018-02-14 14:29 UTC (permalink / raw)
  To: nouveau, dri-devel, Ben Skeggs, Linux Kernel list

> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:

NV5 in another PC (secondary card in x86-64) made the system crash on 
boot, in nvkm_therm_clkgate_fini.

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
  2018-02-14 14:29   ` Meelis Roos
@ 2018-02-14 14:35     ` Ilia Mirkin
  -1 siblings, 0 replies; 11+ messages in thread
From: Ilia Mirkin @ 2018-02-14 14:35 UTC (permalink / raw)
  To: Meelis Roos; +Cc: nouveau, dri-devel, Ben Skeggs, Linux Kernel list

On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos@linux.ee> wrote:
>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:
>
> NV5 in another PC (secondary card in x86-64) made the system crash on
> boot, in nvkm_therm_clkgate_fini.

Mind booting with nouveau.debug=trace? That should hopefully tell us
more exactly which thing is dying. If you have a cross-compile/distcc
setup handy, a bisect may be even more useful.

It's funny, I had a NV5 plugged into my desktop for testing, and
*just* took it out (because the box wouldn't even get to BIOS anymore
... although it was unrelated to the NV5, probably just something
mis-seated.)

  -ilia

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
  2018-02-14 14:35     ` Ilia Mirkin
@ 2018-02-14 14:36       ` Ilia Mirkin
  -1 siblings, 0 replies; 11+ messages in thread
From: Ilia Mirkin @ 2018-02-14 14:36 UTC (permalink / raw)
  To: Meelis Roos; +Cc: nouveau, dri-devel, Ben Skeggs, Linux Kernel list

On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos@linux.ee> wrote:
>>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:
>>
>> NV5 in another PC (secondary card in x86-64) made the system crash on
>> boot, in nvkm_therm_clkgate_fini.
>
> Mind booting with nouveau.debug=trace? That should hopefully tell us
> more exactly which thing is dying. If you have a cross-compile/distcc
> setup handy, a bisect may be even more useful.

Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
somehow mis-hooked up for NV5 now. A bisect result would still make
the culprit a lot more obvious.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
  2018-02-14 14:36       ` Ilia Mirkin
  (?)
@ 2018-02-14 17:41       ` Pierre Moreau
  2018-02-14 19:11           ` Lyude Paul
  -1 siblings, 1 reply; 11+ messages in thread
From: Pierre Moreau @ 2018-02-14 17:41 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: Lyude Paul, Meelis Roos, nouveau, Ben Skeggs, dri-devel,
	Linux Kernel list

On 2018-02-14 — 09:36, Ilia Mirkin wrote:
> On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos@linux.ee> wrote:
> >>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15:
> >>
> >> NV5 in another PC (secondary card in x86-64) made the system crash on
> >> boot, in nvkm_therm_clkgate_fini.
> >
> > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > more exactly which thing is dying. If you have a cross-compile/distcc
> > setup handy, a bisect may be even more useful.
> 
> Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> somehow mis-hooked up for NV5 now. A bisect result would still make
> the culprit a lot more obvious.

CC’ing Lyude Paul as she hooked up the clockgating support.

Looking at the code, only NV40+ chipsets have a therm engine. Therefore, shouldn’t
nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
nvkm_therm_clkgate_oneinit() all check that therm is not NULL, on top of
their check that the clkgate_* hooks are present? Or, alternatively, the
check could live in nvkm_device_init()?
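
To make the first option concrete, here is a minimal, self-contained sketch of
such a guard. It is not the nouveau code and not the patch discussed in this
thread: struct mock_therm, struct mock_therm_func, mock_hook_fini() and
mock_therm_clkgate_fini() are hypothetical stand-ins that only mimic the shape
of a subdev with optional hooks. The one point it illustrates is ordering: the
pointer itself is tested before any member is read through it.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the driver's types. */
struct mock_therm;

struct mock_therm_func {
	/* Optional hook: chipsets without clock gating leave it NULL. */
	void (*clkgate_fini)(struct mock_therm *therm, int suspend);
};

struct mock_therm {
	const struct mock_therm_func *func;
	int fini_calls;	/* counts hook invocations, for illustration only */
};

/* Example hook implementation: just records that it was called. */
void mock_hook_fini(struct mock_therm *therm, int suspend)
{
	(void)suspend;
	therm->fini_calls++;
}

/*
 * Guarded wrapper. Returns 1 if the hook ran and 0 if it was skipped.
 * The !therm test comes first, so therm->func is never read through a
 * NULL pointer when the device has no therm subdev at all.
 */
int mock_therm_clkgate_fini(struct mock_therm *therm, int suspend)
{
	if (!therm || !therm->func || !therm->func->clkgate_fini)
		return 0;

	therm->func->clkgate_fini(therm, suspend);
	return 1;
}
```

With this shape, calling mock_therm_clkgate_fini(NULL, 0) is a harmless no-op,
which corresponds to the NV05 case reported above where no therm object exists.
The second option, checking once in the caller, keeps the wrappers simpler but
requires every call site to remember the test.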

Pierre

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
@ 2018-02-14 19:11           ` Lyude Paul
  0 siblings, 0 replies; 11+ messages in thread
From: Lyude Paul @ 2018-02-14 19:11 UTC (permalink / raw)
  To: Pierre Moreau, Ilia Mirkin
  Cc: Meelis Roos, nouveau, Ben Skeggs, dri-devel, Linux Kernel list

Actually, this was already brought up to me. There's a fix from NVIDIA on the
mailing list, which I reviewed a little while ago, that we should pull in:

https://patchwork.freedesktop.org/patch/203205/

Would you guys mind confirming that this patch fixes your issues?

On Wed, 2018-02-14 at 18:41 +0100, Pierre Moreau wrote:
> On 2018-02-14 — 09:36, Ilia Mirkin wrote:
> > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos@linux.ee> wrote:
> > > > > This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in
> > > > > 4.15:
> > > > 
> > > > NV5 in another PC (secondary card in x86-64) made the system crash on
> > > > boot, in nvkm_therm_clkgate_fini.
> > > 
> > > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > > more exactly which thing is dying. If you have a cross-compile/distcc
> > > setup handy, a bisect may be even more useful.
> > 
> > Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> > somehow mis-hooked up for NV5 now. A bisect result would still make
> > the culprit a lot more obvious.
> 
> CC’ing Lyude Paul as she hooked up the clockgating support.
> 
> Looking at the code, only NV40+ chipsets have a therm engine. Therefore, shouldn’t
> nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
> nvkm_therm_clkgate_oneinit() all check that therm is not NULL, on top of
> their check that the clkgate_* hooks are present? Or, alternatively, the
> check could live in nvkm_device_init()?
> 
> Pierre
-- 
Cheers,
	Lyude Paul

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
  2018-02-14 19:11           ` Lyude Paul
  (?)
@ 2018-02-14 20:59           ` Meelis Roos
  -1 siblings, 0 replies; 11+ messages in thread
From: Meelis Roos @ 2018-02-14 20:59 UTC (permalink / raw)
  To: Lyude Paul
  Cc: Pierre Moreau, Ilia Mirkin, nouveau, Ben Skeggs, dri-devel,
	Linux Kernel list

> Actually, this was already brought up to me. There's a fix from NVIDIA on the
> mailing list, which I reviewed a little while ago, that we should pull in:
> 
> https://patchwork.freedesktop.org/patch/203205/
> 
> Would you guys mind confirming that this patch fixes your issues?

It works on my amd64 machine; the P4 build is still compiling.

[    1.124987] nouveau 0000:04:05.0: NVIDIA NV05 (20154000)
[    1.161464] nouveau 0000:04:05.0: bios: version 03.05.00.10.00
[    1.161475] nouveau 0000:04:05.0: bios: DCB table not found
[    1.161535] nouveau 0000:04:05.0: bios: DCB table not found
[    1.161577] nouveau 0000:04:05.0: bios: DCB table not found
[    1.161586] nouveau 0000:04:05.0: bios: DCB table not found
[    1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz
[    1.344024] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb67c69f81, max_idle_ns: 440795210317 ns
[    1.344037] clocksource: Switched to clocksource tsc
[    1.408102] nouveau 0000:04:05.0: tmr: unknown input clock freq
[    1.409471] nouveau 0000:04:05.0: fb: 32 MiB SDRAM
[    1.414459] nouveau 0000:04:05.0: DRM: VRAM: 31 MiB
[    1.414467] nouveau 0000:04:05.0: DRM: GART: 128 MiB
[    1.414476] nouveau 0000:04:05.0: DRM: BMP version 5.17
[    1.414484] nouveau 0000:04:05.0: DRM: No DCB data found in VBIOS
[    1.415629] nouveau 0000:04:05.0: DRM: Adaptor not initialised, running VBIOS init tables.
[    1.415829] nouveau 0000:04:05.0: bios: DCB table not found
[    1.416125] nouveau 0000:04:05.0: DRM: Saving VGA fonts
[    1.477526] nouveau 0000:04:05.0: DRM: No DCB data found in VBIOS
[    1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.478438] [drm] Driver supports precise vblank timestamp query.
[    1.479618] nouveau 0000:04:05.0: DRM: MM: using M2MF for buffer copies
[    1.517930] nouveau 0000:04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo 00000000a09f4d1f
[    1.519294] nouveau 0000:04:05.0: fb1: nouveaufb frame buffer device
[    1.519313] [drm] Initialized nouveau 1.3.1 20120801 for 0000:04:05.0 on minor 1


-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 11+ messages in thread
