linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NULL ptr dereference in current i915 driver
@ 2015-04-21 23:36 Linus Torvalds
  2015-04-22 16:11 ` Michel Thierry
  2015-04-22 16:45 ` [PATCH] drm/i915: Add checks to i915_bind_vma Mika Kuoppala
  0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2015-04-21 23:36 UTC (permalink / raw)
  To: Daniel Vetter, Jani Nikula, David Airlie, Ben Widawsky,
	Michel Thierry, Mika Kuoppala
  Cc: intel-gfx, Linux Kernel Mailing List

So I just go the appended NULL pointer de-reference when trying to
look at a video from my GoPro.

The code disassembles to

   0: 81 fb 00 04 00 00     cmp    $0x400,%ebx
   6: 41 89 07             mov    %eax,(%r15)
   9: 74 78                 je     0x83
   b: 48 8d 7c 24 18       lea    0x18(%rsp),%rdi
  10: e8 6e b3 1b c1       callq  0xffffffffc11bb383
  15: 84 c0                 test   %al,%al
  17: 74 4a                 je     0x63
  19: 48 85 ed             test   %rbp,%rbp
  1c: 75 b5                 jne    0xffffffffffffffd3
  1e: 48 8b 04 24           mov    (%rsp),%rax
  22: 49 8b 84 c4 98 01 00 mov    0x198(%r12,%rax,8),%rax
  29: 00
  2a:* 48 8b 28             mov    (%rax),%rbp <-- trapping instruction
  2d: 65 ff 05 1f e8 ef 3f incl   %gs:0x3fefe81f(%rip)        # 0x3fefe853
  34: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
  3b: 16 00 00

which matches up with the asm code

        cmpl    $1024, %ebx     #, act_pte
        movl    %eax, (%r15)    # D.49217, *_26
        je      .L118   #,
  .L110:
        leaq    24(%rsp), %rdi  #, tmp156
        call    __sg_page_iter_next     #
        testb   %al, %al        # D.49219
        je      .L119   #,
        testq   %rbp, %rbp      # pt_vaddr
        jne     .L109   #,
        movq    (%rsp), %rax    # %sfp, act_pt
        movq    408(%r12,%rax,8), %rax  # MEM[(struct i915_hw_ppgtt
*)vm_8(D)].D.36998.pd.page
        movq    (%rax), %rbp    # _21->page, D.49221
  #APP
  # 72 "./arch/x86/include/asm/preempt.h" 1
        incl %gs:__preempt_count(%rip)  # __preempt_count
  # 0 "" 2
  #NO_APP
        movabsq $24189255811072, %rax   #, tmp150

which in turn seems to come from the C code

                        pt_vaddr =
kmap_atomic(ppgtt->pd.page_table[act_pt]->page);

(that "testq %rbp,%rbp; jne" just before the oopsing instruction group
is that "if (pt_vaddr == NULL)" test.

IOW, it looks like

     ppgtt->pd.page_table[act_pt]

is NULL, and then trying to dereference ->page off of it is what
oopses (the preempt-count increment that comes after is the
"pagefault_disable()" in kmap_atomic, and the big constant we're
loading into %rax is part of "page_address(page)").

I have no idea why "ppgtt->pd.page_table[act_pt]" would be NULL, but
clearly it can be. Can somebody who knows this code look into it. I've
added a few people who have worked in this area recently, in addition
to the usual maintainer list..

Thanks,

                  Linus

---
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffc010c137>] gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: rfcomm fuse cmac ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute
bridge stp llc ebtable_filter ebtables ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
x86_pkg_temp_thermal pn544_mei mei_phy coretemp pn544 hci nfc
kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
snd_hda_codec_hdmi kvm snd_hda_codec_generic uvcvideo
videobuf2_vmalloc videobuf2_memops microcode videobuf2_core
snd_hda_intel v4l2_common hid_multitouch snd_hda_controller videodev
btusb snd_hda_codec iwlmvm media snd_hwdep mac80211 btbcm snd_seq
btintel bluetooth snd_seq_device joydev snd_pcm serio_raw
 i2c_i801 iwlwifi cfg80211 snd_hda_core sony_laptop snd_timer snd
rfkill mei_me soundcore lpc_ich shpchp mei mfd_core dm_crypt
crct10dif_pclmul i915 crc32_pclmul crc32c_intel i2c_algo_bit
drm_kms_helper ghash_clmulni_intel drm i2c_core video
CPU: 1 PID: 2697 Comm: chrome Not tainted 4.0.0-09362-g1fc149933fd4 #8
Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
task: ffff88010dc51b30 ti: ffff88003f328000 task.ti: ffff88003f328000
RIP: 0010:[<ffffffffc010c137>]  [<ffffffffc010c137>]
gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
RSP: 0018:ffff88003f32b9a8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000075b1b
RDX: ffff88007d848990 RSI: 0000000000000001 RDI: ffff88003f32b9c0
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003f6f7e58
R10: 000000000d836000 R11: 0000000000000000 R12: ffff8800d4164000
R13: 0000000000000000 R14: 0000000000000001 R15: ffff88003f7bbffc
FS:  00007f7f0ee94a00(0000) GS:ffff88011fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000005f607000 CR4: 00000000001407e0
Stack:
 0000000000000201 000002010dc51b30 0000000000000000 ffff88007d848990
 0000004000075b1b ffff880100000001 0000000000000fe0 ffff880011215900
 0000000000000000 ffff88006cc4c380 ffff88003f6f0000 0000000000000001
Call Trace:
 ggtt_bind_vma+0x97/0x110 [i915]
 i915_vma_bind+0x40/0x410 [i915]
 swiotlb_map_sg_attrs+0x74/0x140
 i915_gem_object_do_pin+0x864/0x9f0 [i915]
 mutex_lock+0x9/0x30
 i915_gem_execbuffer_reserve_vma.isra.20+0x66/0x130 [i915]
 i915_gem_execbuffer_reserve+0x2ec/0x320 [i915]
 i915_gem_do_execbuffer.isra.27+0x5ee/0xf80 [i915]
 mutex_optimistic_spin+0x16e/0x1f0
 __mutex_lock_interruptible_slowpath+0x21/0x130
 shmem_fault+0x57/0x1c0
 drm_gem_object_lookup+0x14/0xa0 [drm]
 i915_gem_execbuffer2+0xb2/0x2a0 [i915]
 drm_ioctl+0x15a/0x580 [drm]
 current_fs_time+0x9/0x50
 do_vfs_ioctl+0x2e8/0x4f0
 file_has_perm+0x77/0x80
 syscall_trace_enter_phase1+0x116/0x140
 SyS_ioctl+0x79/0x90
 system_call_fastpath+0x12/0x6a
Code: 00 81 fb 00 04 00 00 41 89 07 74 78 48 8d 7c 24 18 e8 6e b3 1b
c1 84 c0 74 4a 48 85 ed 75 b5 48 8b 04 24 49 8b 84 c4 98 01 00 00 <48>
8b 28 65 ff 05 1f e8 ef 3f 48 b8 00 00 00 00 00 16 00 00 48
RIP  gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
 RSP <ffff88003f32b9a8>
CR2: 0000000000000000

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NULL ptr dereference in current i915 driver
  2015-04-21 23:36 NULL ptr dereference in current i915 driver Linus Torvalds
@ 2015-04-22 16:11 ` Michel Thierry
  2015-04-22 16:45 ` [PATCH] drm/i915: Add checks to i915_bind_vma Mika Kuoppala
  1 sibling, 0 replies; 4+ messages in thread
From: Michel Thierry @ 2015-04-22 16:11 UTC (permalink / raw)
  To: Linus Torvalds, Daniel Vetter, Jani Nikula, David Airlie,
	Ben Widawsky, Mika Kuoppala
  Cc: intel-gfx, Linux Kernel Mailing List

On 4/22/2015 12:36 AM, Linus Torvalds wrote:
> So I just go the appended NULL pointer de-reference when trying to
> look at a video from my GoPro.
>
> The code disassembles to
>
>     0: 81 fb 00 04 00 00     cmp    $0x400,%ebx
>     6: 41 89 07             mov    %eax,(%r15)
>     9: 74 78                 je     0x83
>     b: 48 8d 7c 24 18       lea    0x18(%rsp),%rdi
>    10: e8 6e b3 1b c1       callq  0xffffffffc11bb383
>    15: 84 c0                 test   %al,%al
>    17: 74 4a                 je     0x63
>    19: 48 85 ed             test   %rbp,%rbp
>    1c: 75 b5                 jne    0xffffffffffffffd3
>    1e: 48 8b 04 24           mov    (%rsp),%rax
>    22: 49 8b 84 c4 98 01 00 mov    0x198(%r12,%rax,8),%rax
>    29: 00
>    2a:* 48 8b 28             mov    (%rax),%rbp <-- trapping instruction
>    2d: 65 ff 05 1f e8 ef 3f incl   %gs:0x3fefe81f(%rip)        # 0x3fefe853
>    34: 48 b8 00 00 00 00 00 movabs $0x160000000000,%rax
>    3b: 16 00 00
>
> which matches up with the asm code
>
>          cmpl    $1024, %ebx     #, act_pte
>          movl    %eax, (%r15)    # D.49217, *_26
>          je      .L118   #,
>    .L110:
>          leaq    24(%rsp), %rdi  #, tmp156
>          call    __sg_page_iter_next     #
>          testb   %al, %al        # D.49219
>          je      .L119   #,
>          testq   %rbp, %rbp      # pt_vaddr
>          jne     .L109   #,
>          movq    (%rsp), %rax    # %sfp, act_pt
>          movq    408(%r12,%rax,8), %rax  # MEM[(struct i915_hw_ppgtt
> *)vm_8(D)].D.36998.pd.page
>          movq    (%rax), %rbp    # _21->page, D.49221
>    #APP
>    # 72 "./arch/x86/include/asm/preempt.h" 1
>          incl %gs:__preempt_count(%rip)  # __preempt_count
>    # 0 "" 2
>    #NO_APP
>          movabsq $24189255811072, %rax   #, tmp150
>
> which in turn seems to come from the C code
>
>                          pt_vaddr =
> kmap_atomic(ppgtt->pd.page_table[act_pt]->page);
>
> (that "testq %rbp,%rbp; jne" just before the oopsing instruction group
> is that "if (pt_vaddr == NULL)" test.
>
> IOW, it looks like
>
>       ppgtt->pd.page_table[act_pt]
>
> is NULL, and then trying to dereference ->page off of it is what
> oopses (the preempt-count increment that comes after is the
> "pagefault_disable()" in kmap_atomic, and the big constant we're
> loading into %rax is part of "page_address(page)").
>
> I have no idea why "ppgtt->pd.page_table[act_pt]" would be NULL, but
> clearly it can be. Can somebody who knows this code look into it. I've
> added a few people who have worked in this area recently, in addition
> to the usual maintainer list..
>
> Thanks,
>
>                    Linus
>
> ---
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffffc010c137>] gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: rfcomm fuse cmac ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
> nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute
> bridge stp llc ebtable_filter ebtables ip6table_mangle
> ip6table_security ip6table_raw ip6table_filter ip6_tables
> iptable_mangle iptable_security iptable_raw bnep arc4 vfat fat
> x86_pkg_temp_thermal pn544_mei mei_phy coretemp pn544 hci nfc
> kvm_intel iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
> snd_hda_codec_hdmi kvm snd_hda_codec_generic uvcvideo
> videobuf2_vmalloc videobuf2_memops microcode videobuf2_core
> snd_hda_intel v4l2_common hid_multitouch snd_hda_controller videodev
> btusb snd_hda_codec iwlmvm media snd_hwdep mac80211 btbcm snd_seq
> btintel bluetooth snd_seq_device joydev snd_pcm serio_raw
>   i2c_i801 iwlwifi cfg80211 snd_hda_core sony_laptop snd_timer snd
> rfkill mei_me soundcore lpc_ich shpchp mei mfd_core dm_crypt
> crct10dif_pclmul i915 crc32_pclmul crc32c_intel i2c_algo_bit
> drm_kms_helper ghash_clmulni_intel drm i2c_core video
> CPU: 1 PID: 2697 Comm: chrome Not tainted 4.0.0-09362-g1fc149933fd4 #8
> Hardware name: Sony Corporation SVP11213CXB/VAIO, BIOS R0270V7 05/17/2013
> task: ffff88010dc51b30 ti: ffff88003f328000 task.ti: ffff88003f328000
> RIP: 0010:[<ffffffffc010c137>]  [<ffffffffc010c137>]
> gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
> RSP: 0018:ffff88003f32b9a8  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000075b1b
> RDX: ffff88007d848990 RSI: 0000000000000001 RDI: ffff88003f32b9c0
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88003f6f7e58
> R10: 000000000d836000 R11: 0000000000000000 R12: ffff8800d4164000
> R13: 0000000000000000 R14: 0000000000000001 R15: ffff88003f7bbffc
> FS:  00007f7f0ee94a00(0000) GS:ffff88011fa80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000005f607000 CR4: 00000000001407e0
> Stack:
>   0000000000000201 000002010dc51b30 0000000000000000 ffff88007d848990
>   0000004000075b1b ffff880100000001 0000000000000fe0 ffff880011215900
>   0000000000000000 ffff88006cc4c380 ffff88003f6f0000 0000000000000001
> Call Trace:
>   ggtt_bind_vma+0x97/0x110 [i915]
>   i915_vma_bind+0x40/0x410 [i915]
>   swiotlb_map_sg_attrs+0x74/0x140
>   i915_gem_object_do_pin+0x864/0x9f0 [i915]
>   mutex_lock+0x9/0x30
>   i915_gem_execbuffer_reserve_vma.isra.20+0x66/0x130 [i915]
>   i915_gem_execbuffer_reserve+0x2ec/0x320 [i915]
>   i915_gem_do_execbuffer.isra.27+0x5ee/0xf80 [i915]
>   mutex_optimistic_spin+0x16e/0x1f0
>   __mutex_lock_interruptible_slowpath+0x21/0x130
>   shmem_fault+0x57/0x1c0
>   drm_gem_object_lookup+0x14/0xa0 [drm]
>   i915_gem_execbuffer2+0xb2/0x2a0 [i915]
>   drm_ioctl+0x15a/0x580 [drm]
>   current_fs_time+0x9/0x50
>   do_vfs_ioctl+0x2e8/0x4f0
>   file_has_perm+0x77/0x80
>   syscall_trace_enter_phase1+0x116/0x140
>   SyS_ioctl+0x79/0x90
>   system_call_fastpath+0x12/0x6a
> Code: 00 81 fb 00 04 00 00 41 89 07 74 78 48 8d 7c 24 18 e8 6e b3 1b
> c1 84 c0 74 4a 48 85 ed 75 b5 48 8b 04 24 49 8b 84 c4 98 01 00 00 <48>
> 8b 28 65 ff 05 1f e8 ef 3f 48 b8 00 00 00 00 00 16 00 00 48
> RIP  gen6_ppgtt_insert_entries+0xa7/0x120 [i915]
>   RSP <ffff88003f32b9a8>
> CR2: 0000000000000000
>

Hi,

I see a possible va re-allocation that could be the culprit, but the 
change was commited just 2 days ago 
(http://cgit.freedesktop.org/drm-intel/commit/?id=5c5f645773b6d147bf68c350674dc3ef4f8de83d).

-Michel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] drm/i915: Add checks to i915_bind_vma
  2015-04-21 23:36 NULL ptr dereference in current i915 driver Linus Torvalds
  2015-04-22 16:11 ` Michel Thierry
@ 2015-04-22 16:45 ` Mika Kuoppala
  2015-04-23 13:14   ` Josh Boyer
  1 sibling, 1 reply; 4+ messages in thread
From: Mika Kuoppala @ 2015-04-22 16:45 UTC (permalink / raw)
  To: torvalds, daniel.vetter, jani.nikula, airlied, ben
  Cc: intel-gfx, linux-kernel, Chris Wilson, Michel Thierry

The current aliasing ppgtt implementation allocates
the page table structures on driver initialization
for the entire vm address space. Earlier the page tables
were allocated as array of struct pages, but introduction
of dynamic allocation of page structures changed the page
tables to be inside a page directory.

We have a detailed bug report where traversing of tables and
deferencing page_table[pte]->page oopses. This indicates that
our pre allocation of page tables has failed or that we get
corrupt vma which does not fit inside the vm area and throws
pte > 511.

Add more checks to catch the latter. Warn and bail out if
we get vma which is out of bounds for binding.

v2: Check vma node early (Chris)

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0239fbf..2ffa8f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2746,6 +2746,13 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
 int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 		  u32 flags)
 {
+
+	if (WARN_ON(!drm_mm_node_allocated(&vma->node)))
+		return -EINVAL;
+
+	if (WARN_ON(vma->node.start > vma->vm->total - vma->node.size))
+		return -EINVAL;
+
 	if (i915_is_ggtt(vma->vm)) {
 		int ret = i915_get_ggtt_vma_pages(vma);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Add checks to i915_bind_vma
  2015-04-22 16:45 ` [PATCH] drm/i915: Add checks to i915_bind_vma Mika Kuoppala
@ 2015-04-23 13:14   ` Josh Boyer
  0 siblings, 0 replies; 4+ messages in thread
From: Josh Boyer @ 2015-04-23 13:14 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: Linus Torvalds, Daniel Vetter, Jani Nikula, Dave Airlie, ben,
	Intel Graphics Development, Linux-Kernel@Vger. Kernel. Org,
	Chris Wilson, Michel Thierry

On Wed, Apr 22, 2015 at 12:45 PM, Mika Kuoppala
<mika.kuoppala@linux.intel.com> wrote:
> The current aliasing ppgtt implementation allocates
> the page table structures on driver initialization
> for the entire vm address space. Earlier the page tables
> were allocated as array of struct pages, but introduction
> of dynamic allocation of page structures changed the page
> tables to be inside a page directory.
>
> We have a detailed bug report where traversing of tables and
> deferencing page_table[pte]->page oopses. This indicates that
> our pre allocation of page tables has failed or that we get
> corrupt vma which does not fit inside the vm area and throws
> pte > 511.
>
> Add more checks to catch the latter. Warn and bail out if
> we get vma which is out of bounds for binding.
>
> v2: Check vma node early (Chris)
>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 0239fbf..2ffa8f6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2746,6 +2746,13 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma)
>  int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>                   u32 flags)
>  {
> +
> +       if (WARN_ON(!drm_mm_node_allocated(&vma->node)))
> +               return -EINVAL;
> +
> +       if (WARN_ON(vma->node.start > vma->vm->total - vma->node.size))
> +               return -EINVAL;
> +

Do you really need a full backtrace in these cases?  From an end user
perspective, they're going to get a popup saying they have a kernel
problem or an automated tool will file a bug for them and they will
have no idea what any of this means.

I understand the need for the check, but could it be done in a way
that doesn't splat an oops on a user's system?  The i915 driver has
tons of these kinds of WARN_ONs already and they don't seem to be
helping anything overall...

josh

>         if (i915_is_ggtt(vma->vm)) {
>                 int ret = i915_get_ggtt_vma_pages(vma);
>
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-04-23 13:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-21 23:36 NULL ptr dereference in current i915 driver Linus Torvalds
2015-04-22 16:11 ` Michel Thierry
2015-04-22 16:45 ` [PATCH] drm/i915: Add checks to i915_bind_vma Mika Kuoppala
2015-04-23 13:14   ` Josh Boyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).