From: Lyude Paul <lyude@redhat.com>
To: stable@vger.kernel.org
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
	"Sinclair Yeh" <syeh@vmware.com>,
	"Christian König" <christian.koenig@amd.com>,
	"David Airlie" <airlied@linux.ie>,
	linux-kernel@vger.kernel.org,
	"Nicolai Hähnle" <nicolai.haehnle@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Chunming Zhou" <david1.zhou@amd.com>,
	"Michel Dänzer" <michel.daenzer@amd.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	"Harish Kasiviswanathan" <harish.kasiviswanathan@amd.com>,
	"Alex Xie" <alexbin.xie@amd.com>,
	"Zhang, Jerry" <jerry.zhang@amd.com>,
	"Felix Kuehling" <felix.kuehling@amd.com>,
	amd-gfx@lists.freedesktop.org
Subject: [PATCH 0/4] Backported amdgpu ttm deadlock fixes for 4.14
Date: Thu, 30 Nov 2017 19:23:02 -0500
Message-ID: <20171201002311.28098-1-lyude@redhat.com>

I haven't tracked down where it started, but lately a good number of
pretty nasty deadlock issues have appeared in the kernel. An easy way to
reproduce them on a laptop with i915/amdgpu PRIME and lockdep enabled:

DRI_PRIME=1 glxinfo

Additionally, some more race conditions remain, which I've managed to
trigger with piglit and lockdep enabled after applying these patches:

    =============================
    WARNING: suspicious RCU usage
    4.14.3Lyude-Test+ #2 Not tainted
    -----------------------------
    ./include/linux/reservation.h:216 suspicious rcu_dereference_protected() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ext_image_dma_b/27451:
     #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffa034f2ff>] ttm_bo_unref+0x9f/0x3c0 [ttm]

    stack backtrace:
    CPU: 0 PID: 27451 Comm: ext_image_dma_b Not tainted 4.14.3Lyude-Test+ #2
    Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.02 06/09/2017
    Call Trace:
     dump_stack+0x8e/0xce
     lockdep_rcu_suspicious+0xc5/0x100
     reservation_object_copy_fences+0x292/0x2b0
     ? ttm_bo_unref+0x9f/0x3c0 [ttm]
     ttm_bo_unref+0xbd/0x3c0 [ttm]
     amdgpu_bo_unref+0x2a/0x50 [amdgpu]
     amdgpu_gem_object_free+0x4b/0x50 [amdgpu]
     drm_gem_object_free+0x1f/0x40 [drm]
     drm_gem_object_put_unlocked+0x40/0xb0 [drm]
     drm_gem_object_handle_put_unlocked+0x6c/0xb0 [drm]
     drm_gem_object_release_handle+0x51/0x90 [drm]
     drm_gem_handle_delete+0x5e/0x90 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     drm_gem_close_ioctl+0x20/0x30 [drm]
     drm_ioctl_kernel+0x5d/0xb0 [drm]
     drm_ioctl+0x2f7/0x3b0 [drm]
     ? drm_gem_handle_create+0x40/0x40 [drm]
     ? trace_hardirqs_on_caller+0xf4/0x190
     ? trace_hardirqs_on+0xd/0x10
     amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
     do_vfs_ioctl+0x93/0x670
     ? __fget+0x108/0x1f0
     SyS_ioctl+0x79/0x90
     entry_SYSCALL_64_fastpath+0x23/0xc2

I've also included the relevant fixes for the warning shown above.

Christian König (3):
  drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more
  dma-buf: make reservation_object_copy_fences rcu save
  drm/amdgpu: reserve root PD while releasing it

Michel Dänzer (1):
  drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list

 drivers/dma-buf/reservation.c          | 56 +++++++++++++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++--
 drivers/gpu/drm/ttm/ttm_bo.c           | 43 +++++++++++++-------------
 3 files changed, 74 insertions(+), 38 deletions(-)

--
2.14.3
