linux-kernel.vger.kernel.org archive mirror
* [BUG] 3.4.109 - unable to handle kernel NULL pointer dereference at (null)
From: Steven Rostedt @ 2015-10-01 21:07 UTC
  To: Zefan Li; +Cc: stable, LKML

I merged 3.4.109 into 3.4-rt, and it bugged. I then booted 3.4.109
vanilla and it bugged too. 3.4.108 is fine.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 76c22067 PUD 78329067 PMD 0
Oops: 0010 [#1] PREEMPT SMP
Dumping ftrace buffer:
   (ftrace buffer empty)
CPU 2
Modules linked in: sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd i2c_i801 shpchp soundcore floppy i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: freq_table]

Pid: 69, comm: kworker/u:5 Not tainted 3.4.109-test #409 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff880037ac3d78  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8800774081c8 RCX: 0000000000293a02
RDX: ffff880077408100 RSI: ffff880077409728 RDI: ffff8800774081c8
RBP: ffff880037ac3dc0 R08: 0000000000014350 R09: ffffea0001e29e00
R10: ffffffffa0066b8d R11: ffff880078a78100 R12: ffff880037bf1800
R13: ffff880077409758 R14: 0000000000000001 R15: ffff880037ac7a05
FS:  0000000000000000(0000) GS:ffff88007d500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000078a90000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:5 (pid: 69, threadinfo ffff880037ac2000, task ffff880037a74d80)
Stack:
 ffffffffa0066adc 00000005c838e132 ffff880077408158 ffff880079faf440
 ffff880077408000 ffff880037bf1800 ffff880077409758 0000000000000001
 ffff880037ac7a05 ffff880037ac3df0 ffffffffa00686fe 0000000000000002
Call Trace:
 [<ffffffffa0066adc>] ? i915_gem_retire_requests_ring+0x1f/0x19b [i915]
 [<ffffffffa00686fe>] i915_gem_retire_requests+0x75/0x8a [i915]
 [<ffffffffa006876f>] i915_gem_retire_work_handler+0x5c/0x12a [i915]
 [<ffffffff81059e8f>] ? get_parent_ip+0xf/0x40
 [<ffffffff81049787>] process_one_work+0x187/0x298
 [<ffffffff8104a293>] worker_thread+0xd3/0x157
 [<ffffffff8104a1c0>] ? manage_workers.isra.26+0x16f/0x16f
 [<ffffffff8104dc79>] kthread+0x6f/0x77
 [<ffffffff8151a9d4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81513cce>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104dc0a>] ? kthread_freezable_should_stop+0x43/0x43
 [<ffffffff8151a9d0>] ? gs_change+0xb/0xb
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff880037ac3d78>
CR2: 0000000000000000
eth0: no IPv6 routers present


Config attached.

-- Steve

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 23809 bytes --]
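
For readers decoding the trace above: on x86, the hexadecimal number after
"Oops:" is the page-fault error code, a small bit mask. The sketch below
decodes it. The function name decode_oops_code is hypothetical and not part
of any kernel tool, and only the five low bits are handled. For the "0010"
reported above, only the instruction-fetch bit is set, which fits the NULL
RIP and "Bad RIP value" lines: the CPU tried to execute code at address 0,
which typically points to a call through a NULL function pointer.

```python
def decode_oops_code(code_hex):
    """Decode the low bits of an x86 page-fault error code given as hex text."""
    code = int(code_hex, 16)
    if code & 0x10:
        access = "instruction fetch"   # bit 4: fault on fetching an instruction
    elif code & 0x02:
        access = "write"               # bit 1: fault on a write
    else:
        access = "read"
    return {
        "present": bool(code & 0x01),       # bit 0: page was present (protection fault)
        "access": access,
        "user_mode": bool(code & 0x04),     # bit 2: fault happened in user mode
        "reserved_bit": bool(code & 0x08),  # bit 3: reserved bit set in a page-table entry
    }


# The code printed in the oops above.
print(decode_oops_code("0010"))
```

A result with "present" false and "user_mode" false means a kernel-mode
access to an unmapped address, which is what a NULL dereference looks like.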

* Re: [BUG] 3.4.109 - unable to handle kernel NULL pointer dereference at (null)
From: Cal Peake @ 2015-10-04  4:26 UTC
  To: Steven Rostedt; +Cc: Zefan Li, stable, LKML

On Thu, 1 Oct 2015, Steven Rostedt wrote:

> 
> I merged 3.4.109 into 3.4-rt, and it bugged. I then booted 3.4.109
> vanilla and it bugged too. 3.4.108 is fine.
> 

I'm getting a similar type bug here.  I've bisected it down to this commit:

commit 961bd13539b9e7ca5d2e667668141496b7a1d6bc
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Thu Apr 16 11:17:27 2015 +0900

    drm/radeon: Use drm_calloc_ab for CS relocs
    
    commit b421ed15d2c3039eb724680e4de1e4b2bd196a9a upstream.
    
    The number of relocs is passed in by userspace and can be large. It has
    been observed to cause kcalloc failures in the wild.


Backing it out of vanilla 3.4.109 has so far eliminated the problem.
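
For readers unfamiliar with bisection, the idea is a binary search over the
ordered commits between a known-good and a known-bad kernel. The sketch
below shows only that search idea on a linear history. It is not how
"git bisect" is implemented, and it assumes a hypothetical is_bad(commit)
test (in practice: build, boot, and watch for the oops) and a single commit
that introduces the regression. The commit labels are made up.

```python
def find_first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    commits is ordered oldest to newest, the last one is known bad, and
    every commit before the culprit is assumed to be good.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is mid or something earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo]


# Toy history with hypothetical labels c1..c8; c6 plays the culprit.
history = ["c%d" % i for i in range(1, 9)]
tested = []


def is_bad(commit):
    tested.append(commit)
    return int(commit[1:]) >= 6


culprit = find_first_bad(history, is_bad)
print(culprit, "found after testing", tested)
```

Each test halves the remaining range, so even a few hundred commits
between two stable releases need fewer than ten builds.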

Steven, you look to be using i915 graphics instead of radeon, so it seems 
unlikely to me that we're hitting the same problem.  Here's my oops for 
comparison though:


BUG: unable to handle kernel NULL pointer dereference at 00000000000002f1
IP: [<ffffffffa016202a>] evdev_poll+0x2a/0x70 [evdev]
PGD 211441067 PUD 213771067 PMD 0 
Oops: 0000 [#1] SMP 
CPU 0 
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 iptable_nat nf_nat bridge stp llc nfs auth_rpcgss lockd ntfs cifs udf crypto_hash sunrpc crypto_algapi isofs crc_itu_t vfat msdos fat nls_cp437 nls_utf8 nls_iso8859_1 nf_conntrack_ftp nf_conntrack_ipv4 xt_state nf_conntrack_tftp nf_defrag_ipv4 nf_conntrack xt_LOG ipt_REJECT xt_tcpudp iptable_filter kvm_amd ip_tables kvm x_tables f71882fg af_packet edac_core msr pcspkr cpuid edac_mce_amd mousedev usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek usb_storage snd_hda_intel snd_hda_codec radeon sr_mod cdrom ttm drm_kms_helper ohci_hcd powernow_k8 freq_table psmouse evdev snd_pcm mperf snd_timer k10temp e1000 sata_sil drm snd 8250_pnp serio_raw 8250 serial_core floppy ehci_hcd microcode soundcore snd_page_alloc pata_atiixp backlight i2c_piix4 usbcore i2c_algo_bit processor i2c_core sg thermal_sys usb_common button power_supply r8169 hwmon firmware_class mii loop ext4 jbd2 crc16 raid1 dm_mod raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx md_mod ahci libahci libata sd_mod scsi_mod

Pid: 1862, comm: X Not tainted 3.4.108-00059-g961bd13 #7 MICRO-STAR INTERNATIONAL CO.,LTD MS-7551/KA780G (MS-7551)
RIP: 0010:[<ffffffffa016202a>]  [<ffffffffa016202a>] evdev_poll+0x2a/0x70 [evdev]
RSP: 0018:ffff880211dd79f8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802106b6800 RCX: 0000000000000069
RDX: ffff88021184f800 RSI: ffff880211dd7ad8 RDI: ffff88021184f800
RBP: 0000000000000019 R08: ffff880211dd7f48 R09: ffff880211dd7de0
R10: 0000000000000000 R11: 0000000000003246 R12: 0000000000010000
R13: 000000000007e000 R14: ffff88021184f800 R15: 0000000000000040
FS:  00007f649ddd08a0(0000) GS:ffff88021fc00000(0000) knlGS:00000000f70486d0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002f1 CR3: 0000000212ac6000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1862, threadinfo ffff880211dd6000, task ffff8802135e62d0)
Stack:
 0000000000000001 0000000000000013 0000000000000010 ffffffff810d4b93
 ffff880211f9c700 ffffffffa03d90b2 ffff880211dd7cc8 0000000000000000
 000000000007e000 0000000000000000 0000000000000000 0000000111dd7cc8
Call Trace:
 [<ffffffff810d4b93>] ? do_select+0x333/0x5f0
 [<ffffffffa03d90b2>] ? r600_cs_packet_parse+0x42/0x140 [radeon]
 [<ffffffff810d4500>] ? __pollwait+0x110/0x110
Oct  3 23:24:38 lancer last message repeated 7 times
 [<ffffffff810ba216>] ? kmem_cache_free+0x86/0x90
 [<ffffffff81038d92>] ? __dequeue_signal+0x102/0x190
 [<ffffffff810d505c>] ? core_sys_select+0x20c/0x380
 [<ffffffff8103b608>] ? set_current_blocked+0x38/0x60
 [<ffffffff8103b6ec>] ? block_sigmask+0x3c/0x50
 [<ffffffff81001c84>] ? do_signal+0x1d4/0x620
 [<ffffffff8105b1ad>] ? ktime_get_ts+0x6d/0xe0
 [<ffffffff810d5212>] ? sys_select+0x42/0x110
 [<ffffffff81288982>] ? system_call_fastpath+0x16/0x1b
Code: <80> bd d8 02 00 00 01 8b 4b 04 48 8b 6c 24 10 19 c0 24 14 05 04 01 
RIP  [<ffffffffa016202a>] evdev_poll+0x2a/0x70 [evdev]
 RSP <ffff880211dd79f8>
CR2: 00000000000002f1
---[ end trace 369d4585fbe82a04 ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a1
IP: [<ffffffff81285f4e>] mutex_lock_interruptible+0xe/0x40
PGD 0 
Oops: 0002 [#2] SMP 
CPU 0 
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 iptable_nat nf_nat bridge stp llc nfs auth_rpcgss lockd ntfs cifs udf crypto_hash sunrpc crypto_algapi isofs crc_itu_t vfat msdos fat nls_cp437 nls_utf8 nls_iso8859_1 nf_conntrack_ftp nf_conntrack_ipv4 xt_state nf_conntrack_tftp nf_defrag_ipv4 nf_conntrack xt_LOG ipt_REJECT xt_tcpudp iptable_filter kvm_amd ip_tables kvm x_tables f71882fg af_packet edac_core msr pcspkr cpuid edac_mce_amd mousedev usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek usb_storage snd_hda_intel snd_hda_codec radeon sr_mod cdrom ttm drm_kms_helper ohci_hcd powernow_k8 freq_table psmouse evdev snd_pcm mperf snd_timer k10temp e1000 sata_sil drm snd 8250_pnp serio_raw 8250 serial_core floppy ehci_hcd microcode soundcore snd_page_alloc pata_atiixp backlight i2c_piix4 usbcore i2c_algo_bit processor i2c_core sg thermal_sys usb_common button power_supply r8169 hwmon firmware_class mii loop ext4 jbd2 crc16 raid1 dm_mod raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx md_mod ahci libahci libata sd_mod scsi_mod

Pid: 1862, comm: X Tainted: G      D      3.4.108-00059-g961bd13 #7 MICRO-STAR INTERNATIONAL CO.,LTD MS-7551/KA780G (MS-7551)
RIP: 0010:[<ffffffff81285f4e>]  [<ffffffff81285f4e>] mutex_lock_interruptible+0xe/0x40
RSP: 0018:ffff880211dd7698  EFLAGS: 00010296
RAX: 00000000ffffffff RBX: 00000000000000a1 RCX: 00000000000000e9
RDX: ffff880211dd7fd8 RSI: ffff8802105d1e40 RDI: 00000000000000a1
RBP: 0000000000000019 R08: 00000000000128c0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88021184f800
R13: ffff8802105d1e50 R14: 0000000000000010 R15: ffff8802105d1e40
FS:  00007f649ddd08a0(0000) GS:ffff88021fc00000(0000) knlGS:00000000f70486d0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a1 CR3: 000000000160b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process X (pid: 1862, threadinfo ffff880211dd6000, task ffff8802135e62d0)
Stack:
 00000000000000a1 ffffffffa01621e0 ffff8802105d1e40 ffff88021184ea00
 ffff88021184f800 ffff8802105d1e40 0000000000000000 ffffffff810c0de8
 0000000000000000 0000000000000001 000000000000001f ffffffff8102ef1d
Call Trace:
 [<ffffffffa01621e0>] ? evdev_flush+0x30/0x80 [evdev]
 [<ffffffff810c0de8>] ? filp_close+0x38/0x90
 [<ffffffff8102ef1d>] ? put_files_struct+0x8d/0x100
 [<ffffffff8102f69d>] ? do_exit+0x63d/0x870
 [<ffffffff812857b1>] ? printk+0x40/0x45
 [<ffffffff81005433>] ? oops_end+0x73/0xa0
 [<ffffffff810210b2>] ? no_context+0x122/0x2d0
 [<ffffffff81021b75>] ? do_page_fault+0x3c5/0x420
 [<ffffffff8100aaad>] ? __switch_to_xtra+0xcd/0x130
 [<ffffffff8100109f>] ? __switch_to+0x34f/0x3b0
 [<ffffffff81287206>] ? wait_for_common+0xe6/0x190
 [<ffffffff81052760>] ? try_to_wake_up+0x280/0x280
 [<ffffffff8128849f>] ? page_fault+0x1f/0x30
 [<ffffffffa016202a>] ? evdev_poll+0x2a/0x70 [evdev]
 [<ffffffff810d4b93>] ? do_select+0x333/0x5f0
 [<ffffffffa03d90b2>] ? r600_cs_packet_parse+0x42/0x140 [radeon]
 [<ffffffff810d4500>] ? __pollwait+0x110/0x110
Oct  3 23:24:40 lancer last message repeated 7 times
 [<ffffffff810ba216>] ? kmem_cache_free+0x86/0x90
 [<ffffffff81038d92>] ? __dequeue_signal+0x102/0x190
 [<ffffffff810d505c>] ? core_sys_select+0x20c/0x380
 [<ffffffff8103b608>] ? set_current_blocked+0x38/0x60
 [<ffffffff8103b6ec>] ? block_sigmask+0x3c/0x50
 [<ffffffff81001c84>] ? do_signal+0x1d4/0x620
 [<ffffffff8105b1ad>] ? ktime_get_ts+0x6d/0xe0
 [<ffffffff810d5212>] ? sys_select+0x42/0x110
 [<ffffffff81288982>] ? system_call_fastpath+0x16/0x1b
Code: 8b 44 24 08 48 89 42 08 48 89 10 80 43 04 01 b8 fc ff ff ff eb 96 0f 1f 80 00 00 00 00 53 48 89 fb e8 97 11 00 00 b8 ff ff ff ff <f0> 0f c1 03 ff c8 78 11 65 48 8b 04 25 c0 a7 00 00 48 89 43 18 
RIP  [<ffffffff81285f4e>] mutex_lock_interruptible+0xe/0x40
 RSP <ffff880211dd7698>
CR2: 00000000000000a1
---[ end trace 369d4585fbe82a05 ]---
Fixing recursive fault but reboot is needed!

-- 
Cal Peake
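
A note on the release string in the oops above: "3.4.108-00059-g961bd13"
appears to follow the "git describe" style that the kernel build appends
for trees that are not exactly at a tag, namely the base version, the number
of commits on top of it, and "g" plus an abbreviated commit hash. The
sketch below splits such a string and compares the abbreviation with the
full hash of the bisected commit quoted in this message. The function name
parse_release is hypothetical.

```python
import re

# <base version>-<commits since tag>-g<abbreviated hash>
RELEASE_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+)-(?P<count>\d+)-g(?P<abbrev>[0-9a-f]+)$"
)


def parse_release(release):
    """Split a describe-style kernel release into (base, count, abbrev)."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError("not in <base>-<count>-g<hash> form: %r" % release)
    return m.group("base"), int(m.group("count")), m.group("abbrev")


base, count, abbrev = parse_release("3.4.108-00059-g961bd13")
full_hash = "961bd13539b9e7ca5d2e667668141496b7a1d6bc"  # bisected commit from this message
print(base, count, abbrev, full_hash.startswith(abbrev))
```

The abbreviation matches the bisected commit, which suggests the oops was
captured on a bisection build whose tip was that commit, 59 commits after
3.4.108.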

* Re: [BUG] 3.4.109 - unable to handle kernel NULL pointer dereference at (null)
From: Zefan Li @ 2015-10-08  6:17 UTC
  To: Cal Peake, Steven Rostedt; +Cc: stable, LKML, satoshi.iwamoto

(back from vacation)

On 2015/10/4 12:26, Cal Peake wrote:
> On Thu, 1 Oct 2015, Steven Rostedt wrote:
>
>>
>> I merged 3.4.109 into 3.4-rt, and it bugged. I then booted 3.4.109
>> vanilla and it bugged too. 3.4.108 is fine.
>>
>

I guess this is caused by the following commit, which has already been
reverted in the mainline kernel. I'll fix it in 3.4.110.

drm/i915: Don't skip request retirement if the active list is empty
commit 0aedb1626566efd72b369c01992ee7413c82a0c5 upstream.

> I'm getting a similar type bug here.  I've bisected it down to this commit:
>
> commit 961bd13539b9e7ca5d2e667668141496b7a1d6bc
> Author: Michel Dänzer <michel.daenzer@amd.com>
> Date:   Thu Apr 16 11:17:27 2015 +0900
>
>      drm/radeon: Use drm_calloc_ab for CS relocs
>
>      commit b421ed15d2c3039eb724680e4de1e4b2bd196a9a upstream.
>
>      The number of relocs is passed in by userspace and can be large. It has
>      been observed to cause kcalloc failures in the wild.
>
>
> Backing it out of vanilla 3.4.109 has so far eliminated the problem.
>

As you and Satoshi-san have already found the culprit, I'll just revert
it in 3.4.110.

There are two other commits in drivers/gpu/drm/radeon between 3.4.108 and 3.4.109,
and "drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling" has been partially
reverted in the mainline kernel, so I'll fix this too.

> Steven, you look to be using i915 graphics instead of radeon, so it seems
> unlikely to me that we're hitting the same problem.  Here's my oops for
> comparison though:
>
...
>   [<ffffffff810d4b93>] ? do_select+0x333/0x5f0
>   [<ffffffffa03d90b2>] ? r600_cs_packet_parse+0x42/0x140 [radeon]
>   [<ffffffff810d4500>] ? __pollwait+0x110/0x110
> Oct  3 23:24:38 lancer last message repeated 7 times
>   [<ffffffff810ba216>] ? kmem_cache_free+0x86/0x90
>   [<ffffffff81038d92>] ? __dequeue_signal+0x102/0x190
>   [<ffffffff810d505c>] ? core_sys_select+0x20c/0x380
>   [<ffffffff8103b608>] ? set_current_blocked+0x38/0x60
>   [<ffffffff8103b6ec>] ? block_sigmask+0x3c/0x50
>   [<ffffffff81001c84>] ? do_signal+0x1d4/0x620
>   [<ffffffff8105b1ad>] ? ktime_get_ts+0x6d/0xe0
>   [<ffffffff810d5212>] ? sys_select+0x42/0x110
>   [<ffffffff81288982>] ? system_call_fastpath+0x16/0x1b
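
Call-trace lines like the ones quoted above have a regular shape: an
address in "[<...>]", an optional "?", then symbol+offset/size and an
optional "[module]". The "?" marks an entry the kernel found on the stack
but could not confirm as part of the actual call chain. A minimal parsing
sketch follows; the name parse_trace_line and the returned field names are
hypothetical, and the pattern is only checked against the lines in this
thread.

```python
import re

TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace_line(line):
    """Return the fields of one call-trace line, or None if it is not one."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "reliable": m.group("unreliable") is None,  # no "?" in front of the symbol
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),                # None for built-in code
    }


lines = [
    ">   [<ffffffff810d4b93>] ? do_select+0x333/0x5f0",
    ">   [<ffffffffa03d90b2>] ? r600_cs_packet_parse+0x42/0x140 [radeon]",
    "> Oct  3 23:24:38 lancer last message repeated 7 times",
]
for line in lines:
    print(parse_trace_line(line))
```

Syslog noise such as the "last message repeated" line does not match and
comes back as None, so it can be filtered out before comparing traces.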

