From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965693AbbDVQLm (ORCPT ); Wed, 22 Apr 2015 12:11:42 -0400 Received: from mga09.intel.com ([134.134.136.24]:42814 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965142AbbDVQLk (ORCPT ); Wed, 22 Apr 2015 12:11:40 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.11,624,1422950400"; d="scan'208";a="684036449" Message-ID: <5537C839.2060702@intel.com> Date: Wed, 22 Apr 2015 17:11:37 +0100 From: Michel Thierry Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Linus Torvalds , Daniel Vetter , Jani Nikula , David Airlie , Ben Widawsky , Mika Kuoppala CC: intel-gfx , Linux Kernel Mailing List Subject: Re: NULL ptr dereference in current i915 driver References: In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/22/2015 12:36 AM, Linus Torvalds wrote: > So I just go the appended NULL pointer de-reference when trying to > look at a video from my GoPro. > > The code disassembles to > > 0: 81 fb 00 04 00 00 cmp $0x400,%ebx > 6: 41 89 07 mov %eax,(%r15) > 9: 74 78 je 0x83 > b: 48 8d 7c 24 18 lea 0x18(%rsp),%rdi > 10: e8 6e b3 1b c1 callq 0xffffffffc11bb383 > 15: 84 c0 test %al,%al > 17: 74 4a je 0x63 > 19: 48 85 ed test %rbp,%rbp > 1c: 75 b5 jne 0xffffffffffffffd3 > 1e: 48 8b 04 24 mov (%rsp),%rax > 22: 49 8b 84 c4 98 01 00 mov 0x198(%r12,%rax,8),%rax > 29: 00 > 2a:* 48 8b 28 mov (%rax),%rbp <-- trapping instruction > 2d: 65 ff 05 1f e8 ef 3f incl %gs:0x3fefe81f(%rip) # 0x3fefe853 > 34: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax > 3b: 16 00 00 > > which matches up with the asm code > > cmpl $1024, %ebx #, act_pte > movl %eax, (%r15) # D.49217, *_26 > je .L118 #, > .L110: > leaq 24(%rsp), %rdi #, tmp156 > call __sg_page_iter_next # > testb %al, %al # D.49219 > je .L119 #, > testq %rbp, %rbp # pt_vaddr > jne .L109 #, > movq (%rsp), %rax # %sfp, act_pt > movq 408(%r12,%rax,8), %rax # MEM[(struct i915_hw_ppgtt > *)vm_8(D)].D.36998.pd.page > movq (%rax), %rbp # _21->page, D.49221 > #APP > # 72 "./arch/x86/include/asm/preempt.h" 1 > incl %gs:__preempt_count(%rip) # __preempt_count > # 0 "" 2 > #NO_APP > movabsq $24189255811072, %rax #, tmp150 > > which in turn seems to come from the C code > > pt_vaddr = > kmap_atomic(ppgtt->pd.page_table[act_pt]->page); > > (that "testq %rbp,%rbp; jne" just before the oopsing instruction group > is that "if (pt_vaddr == NULL)" test. > > IOW, it looks like > > ppgtt->pd.page_table[act_pt] > > is NULL, and then trying to dereference ->page off of it is what > oopses (the preempt-count increment that comes after is the > "pagefault_disable()" in kmap_atomic, and the big constant we're > loading into %rax is part of "page_address(page)"). > > I have no idea why "ppgtt->pd.page_table[act_pt]" would be NULL, but > clearly it can be. Can somebody who knows this code look into it. I've > added a few people who have worked in this area recently, in addition > to the usual maintainer list.. > > Thanks, > > Linus > > --- > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [] gen6_ppgtt_insert_entries+0xa7/0x120 [i915] > PGD 0 > Oops: 0000 [#1] SMP > Modules linked in: rfcomm fuse cmac ip6t_rpfilter ip6t_REJECT > nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 > nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute > bridge stp llc ebtable_filter ebtables ip6table_mangle > ip6table_security ip6table_raw ip6table_filter ip6_tables > iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat > x86_pkg_temp_thermal pn544_mei mei_phy coretemp pn544 hci nfc > kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek > snd_hda_codec_hdmi kvm snd_hda_codec_generic uvcvideo > videobuf2_vmalloc videobuf2_memops microcode videobuf2_core > snd_hda_intel v4l2_common hid_multitouch snd_hda_controller videodev > btusb snd_hda_codec iwlmvm media snd_hwdep mac80211 btbcm snd_seq > btintel bluetooth snd_seq_device joydev snd_pcm serio_raw > i2c_i801 iwlwifi cfg80211 snd_hda_core sony_laptop snd_timer snd > rfkill mei_me soundcore lpc_ich shpchp mei mfd_core dm_crypt > crct10dif_pclmul i915 crc32_pclmul crc32c_intel i2c_algo_bit > drm_kms_helper ghash_clmulni_intel drm i2c_core video > CPU: 1 PID: 2697 Comm: chrome Not tainted 4.0.0-09362-g1fc149933fd4 #8 > Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013 > task: ffff88010dc51b30 ti: ffff88003f328000 task.ti: ffff88003f328000 > RIP: 0010:[] [] > gen6_ppgtt_insert_entries+0xa7/0x120 [i915] > RSP: 0018:ffff88003f32b9a8 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000075b1b > RDX: ffff88007d848990 RSI: 0000000000000001 RDI: ffff88003f32b9c0 > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003f6f7e58 > R10: 000000000d836000 R11: 0000000000000000 R12: ffff8800d4164000 > R13: 0000000000000000 R14: 0000000000000001 R15: ffff88003f7bbffc > FS: 00007f7f0ee94a00(0000) GS:ffff88011fa80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 000000005f607000 CR4: 00000000001407e0 > Stack: > 0000000000000201 000002010dc51b30 0000000000000000 ffff88007d848990 > 0000004000075b1b ffff880100000001 0000000000000fe0 ffff880011215900 > 0000000000000000 ffff88006cc4c380 ffff88003f6f0000 0000000000000001 > Call Trace: > ggtt_bind_vma+0x97/0x110 [i915] > i915_vma_bind+0x40/0x410 [i915] > swiotlb_map_sg_attrs+0x74/0x140 > i915_gem_object_do_pin+0x864/0x9f0 [i915] > mutex_lock+0x9/0x30 > i915_gem_execbuffer_reserve_vma.isra.20+0x66/0x130 [i915] > i915_gem_execbuffer_reserve+0x2ec/0x320 [i915] > i915_gem_do_execbuffer.isra.27+0x5ee/0xf80 [i915] > mutex_optimistic_spin+0x16e/0x1f0 > __mutex_lock_interruptible_slowpath+0x21/0x130 > shmem_fault+0x57/0x1c0 > drm_gem_object_lookup+0x14/0xa0 [drm] > i915_gem_execbuffer2+0xb2/0x2a0 [i915] > drm_ioctl+0x15a/0x580 [drm] > current_fs_time+0x9/0x50 > do_vfs_ioctl+0x2e8/0x4f0 > file_has_perm+0x77/0x80 > syscall_trace_enter_phase1+0x116/0x140 > SyS_ioctl+0x79/0x90 > system_call_fastpath+0x12/0x6a > Code: 00 81 fb 00 04 00 00 41 89 07 74 78 48 8d 7c 24 18 e8 6e b3 1b > c1 84 c0 74 4a 48 85 ed 75 b5 48 8b 04 24 49 8b 84 c4 98 01 00 00 <48> > 8b 28 65 ff 05 1f e8 ef 3f 48 b8 00 00 00 00 00 16 00 00 48 > RIP gen6_ppgtt_insert_entries+0xa7/0x120 [i915] > RSP > CR2: 0000000000000000 > Hi, I see a possible va re-allocation that could be the culprit, but the change was commited just 2 days ago (http://cgit.freedesktop.org/drm-intel/commit/?id=5c5f645773b6d147bf68c350674dc3ef4f8de83d). -Michel