From mboxrd@z Thu Jan 1 00:00:00 1970
From: Karol Herbst
Date: Tue, 13 Jun 2023 16:18:11 +0200
Subject: Re: [PATCH 3/9] drm/ttm: use per BO cleanup workers
In-Reply-To: <8cb02812-1bc5-c1ff-13b0-eeec87c26859@amd.com>
References: <20221125102137.1801-1-christian.koenig@amd.com> <20221125102137.1801-3-christian.koenig@amd.com> <8ff841e3-8eef-9ec2-2ba5-4907f18873c0@amd.com> <8cb02812-1bc5-c1ff-13b0-eeec87c26859@amd.com>
To: Christian König
Cc: David Airlie, Felix Kuehling, amd-gfx@lists.freedesktop.org, Intel Graphics Development, dri-devel@lists.freedesktop.org

On Tue, Jun 13, 2023 at 3:59 PM Christian König wrote:
>
> Am 13.06.23 um 15:05 schrieb Karol Herbst:
> > On Mon, Dec 5, 2022 at 2:40 PM Christian König wrote:
> >> Am 29.11.22 um 22:14 schrieb Felix Kuehling:
> >>> On 2022-11-25 05:21, Christian König wrote:
> >>>> Instead of a single worker going over the list of delete BOs in regular
> >>>> intervals use a per BO worker which blocks for the resv object and
> >>>> locking of the BO.
> >>>>
> >>>> This not only simplifies the handling massively, but also results in
> >>>> much better response time when cleaning up buffers.
> >>>>
> >>>> Signed-off-by: Christian König
> >>> Just thinking out loud: If I understand it correctly, this can cause a
> >>> lot of sleeping worker threads when
> >>> AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is used and many BOs are freed
> >>> at the same time. This happens e.g. when a KFD process terminates or
> >>> crashes. I guess with a concurrency-managed workqueue this isn't going
> >>> to be excessive. And since it's on a per device workqueue, it doesn't
> >>> stall work items on the system work queue or from other devices.
> >> Yes, exactly that. The last parameter to alloc_workqueue() limits how
> >> many work items can be sleeping.
> >>
> >>> I'm trying to understand why you set WQ_MEM_RECLAIM. This work queue
> >>> is not about freeing ttm_resources but about freeing the BOs. But it
> >>> affects freeing of ghost_objs that are holding the ttm_resources being
> >>> freed.
> >> Well if the BO is idle, but not immediately lockable, we delegate freeing
> >> the backing pages in the TT object to those workers as well. It might
> >> even be a good idea to use a separate wq for this case.
> >>
> >>> If those assumptions all make sense, patches 1-3 are
> >>>
> >>> Reviewed-by: Felix Kuehling
> >> Thanks,
> >> Christian.
> >>
> > This patch causes a heap use-after-free when using nouveau, with the
> > potential of trashing filesystems. Is there a way to revert it until
> > we figure out a proper solution to the problem?
>
> Uff, I don't think so; we have quite some work based on top of this. But
> let me double check.
>

Yeah. I've already talked with Dave about fixing this issue, as he has
more knowledge of this part of the driver (I hope), so we might have a
fix soonish. The concerning part is that the change has already reached
users, so it would be better to be able to revert it if the fix takes a
while to emerge.

> On the other hand have you tried running this with KASAN to catch use
> after free errors?

Yes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213#note_1942777

>
> Since we now block for work to finish, instead of checking every few
> milliseconds to garbage collect, memory will now be reclaimed much faster
> after freeing it.

Yeah, that kinda makes sense. This entire issue feels like a race, since
I need to run the OpenGL CTS in parallel with 8+ threads to trigger it
reliably.
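For readers skimming the thread, the sketch below condenses the cleanup flow
the quoted patch introduces (the full diff follows further down). It is only
an illustration: the example_* wrappers are made-up names, ttm_bo_cleanup_memtype_use()
is internal to drivers/gpu/drm/ttm/ttm_bo.c, and refcounting details, error
handling and the "BO is already idle" fast path are omitted.

    #include <linux/workqueue.h>
    #include <linux/sched.h>
    #include <linux/dma-resv.h>
    #include <drm/ttm/ttm_bo_api.h>
    #include <drm/ttm/ttm_device.h>

    /* One work item per dying BO: sleep until the BO is idle, then clean up. */
    static void example_delayed_delete(struct work_struct *work)
    {
            struct ttm_buffer_object *bo =
                    container_of(work, typeof(*bo), delayed_delete);

            /* Block until all fences, including bookkeeping ones, have signaled. */
            dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
                                  false, MAX_SCHEDULE_TIMEOUT);
            dma_resv_lock(bo->base.resv, NULL);
            ttm_bo_cleanup_memtype_use(bo);         /* frees resource and tt */
            dma_resv_unlock(bo->base.resv);
            ttm_bo_put(bo);                         /* drop the final reference */
    }

    /* Per-device queue, created at init time; the last argument bounds how
     * many work items can be executing (and therefore sleeping) at once. */
    static int example_init_wq(struct ttm_device *bdev)
    {
            bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);
            return bdev->wq ? 0 : -ENOMEM;
    }

    /* Release path: instead of parking the BO on a device-wide ddestroy list
     * that a single delayed worker polls, hand it its own work item. */
    static void example_queue_delete(struct ttm_device *bdev,
                                     struct ttm_buffer_object *bo)
    {
            INIT_WORK(&bo->delayed_delete, example_delayed_delete);
            queue_work(bdev->wq, &bo->delayed_delete);
    }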
>
> Regards,
> Christian.
>
> >
> > Bug: https://gitlab.freedesktop.org/drm/nouveau/-/issues/213
> >
> > example trace on affected systems:
> >
> > [ 4102.946946] general protection fault, probably for non-canonical
> > address 0x5f775ce3bd949b45: 0000 [#3] PREEMPT SMP NOPTI
> > [ 4102.957794] CPU: 12 PID: 89561 Comm: glcts Tainted: G      D
> > 6.3.5-200.fc38.x86_64 #1
> > [ 4102.966556] Hardware name: ASUS System Product Name/PRIME B660-PLUS
> > D4, BIOS 0418 10/13/2021
> > [ 4102.974972] RIP: 0010:__kmem_cache_alloc_node+0x1ba/0x320
> > [ 4102.980362] Code: 2b 14 25 28 00 00 00 0f 85 74 01 00 00 48 83 c4
> > 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 41 8b 47 28 4d 8b 07
> > 48 01 f8 <48> 8b 18 48 89 c1 49 33 9f b8 00 00 00 48 0f c9 48 31 cb 41
> > f6 c0
> > [ 4102.999073] RSP: 0018:ffff9764e0057b40 EFLAGS: 00010202
> > [ 4103.004291] RAX: 5f775ce3bd949b45 RBX: 0000000000000dc0 RCX: 0000000000000046
> > [ 4103.011408] RDX: 00000002cf87600c RSI: 0000000000000dc0 RDI: 5f775ce3bd949b15
> > [ 4103.018528] RBP: 0000000000000dc0 R08: 00000000000390c0 R09: 0000000030302d6d
> > [ 4103.025649] R10: 00000000756c7473 R11: 0000000020090298 R12: 0000000000000000
> > [ 4103.032767] R13: 00000000ffffffff R14: 0000000000000046 R15: ffff8bda80042600
> > [ 4103.039887] FS:  00007f386a85ef00(0000) GS:ffff8be1df700000(0000)
> > knlGS:0000000000000000
> > [ 4103.047958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4103.053692] CR2: 000000000493b868 CR3: 000000014c3ba000 CR4: 0000000000f50ee0
> > [ 4103.060812] PKRU: 55555554
> > [ 4103.063520] Call Trace:
> > [ 4103.065970]  <TASK>
> > [ 4103.068071]  ? die_addr+0x36/0x90
> > [ 4103.071384]  ? exc_general_protection+0x1be/0x420
> > [ 4103.076081]  ? asm_exc_general_protection+0x26/0x30
> > [ 4103.080952]  ? __kmem_cache_alloc_node+0x1ba/0x320
> > [ 4103.085734]  ? ext4_htree_store_dirent+0x42/0x180
> > [ 4103.090431]  ? ext4_htree_store_dirent+0x42/0x180
> > [ 4103.095132]  __kmalloc+0x4d/0x150
> > [ 4103.098444]  ext4_htree_store_dirent+0x42/0x180
> > [ 4103.102970]  htree_dirblock_to_tree+0x1ed/0x370
> > [ 4103.107494]  ext4_htree_fill_tree+0x109/0x3d0
> > [ 4103.111846]  ext4_readdir+0x6d4/0xa80
> > [ 4103.115505]  iterate_dir+0x178/0x1c0
> > [ 4103.119076]  __x64_sys_getdents64+0x88/0x130
> > [ 4103.123341]  ? __pfx_filldir64+0x10/0x10
> > [ 4103.127260]  do_syscall_64+0x5d/0x90
> > [ 4103.130835]  ? handle_mm_fault+0x11e/0x310
> > [ 4103.134927]  ? do_user_addr_fault+0x1e0/0x720
> > [ 4103.139278]  ? exc_page_fault+0x7c/0x180
> > [ 4103.143195] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 4103.148240] RIP: 0033:0x7f386a418047
> > [ 4103.151828] Code: 24 fb ff 4c 89 e0 5b 41 5c 5d c3 0f 1f 84 00 00
> > 00 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 91 cd 0f 00 f7 d8 64 89
> > 02 48
> > [ 4103.170543] RSP: 002b:00007ffd4793ff38 EFLAGS: 00000293 ORIG_RAX:
> > 00000000000000d9
> > [ 4103.178095] RAX: ffffffffffffffda RBX: 0000000004933830 RCX: 00007f386a418047
> > [ 4103.185214] RDX: 0000000000008000 RSI: 0000000004933860 RDI: 0000000000000006
> > [ 4103.192335] RBP: 00007ffd4793ff70 R08: 0000000000000000 R09: 0000000000000001
> > [ 4103.199454] R10: 0000000000000004 R11: 0000000000000293 R12: 0000000004933834
> > [ 4103.206573] R13: 0000000004933860 R14: ffffffffffffff60 R15: 0000000000000000
> > [ 4103.213695]  </TASK>
> > [ 4103.215883] Modules linked in: snd_seq_dummy snd_hrtimer
> > nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
> > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat
> > ip6table_mangle ip6table_raw ip6table
> > [ 4103.215911] kvm_intel snd_hwdep snd_seq eeepc_wmi kvm
> > snd_seq_device asus_wmi iTCO_wdt mei_pxp mei_hdcp ledtrig_audio
> > irqbypass snd_pcm ee1004 intel_pmc_bxt sparse_keymap rapl snd_timer
> > pmt_telemetry mei_me iTCO_vendor_support platform_profile joydev
> > intel_cstate pmt_class snde
> > [ 4103.366194] ---[ end trace 0000000000000000 ]---
> >
> >>>
> >>>> ---
> >>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
> >>>>   drivers/gpu/drm/i915/i915_gem.c            |   2 +-
> >>>>   drivers/gpu/drm/i915/intel_region_ttm.c    |   2 +-
> >>>>   drivers/gpu/drm/ttm/ttm_bo.c               | 112 ++++++++-------------
> >>>>   drivers/gpu/drm/ttm/ttm_bo_util.c          |   1 -
> >>>>   drivers/gpu/drm/ttm/ttm_device.c           |  24 ++---
> >>>>   include/drm/ttm/ttm_bo_api.h               |  18 +---
> >>>>   include/drm/ttm/ttm_device.h               |   7 +-
> >>>>   8 files changed, 57 insertions(+), 111 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> index 2b1db37e25c1..74ccbd566777 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> @@ -3984,7 +3984,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device
> >>>> *adev)
> >>>>          amdgpu_fence_driver_hw_fini(adev);
> >>>>          if (adev->mman.initialized)
> >>>> -                flush_delayed_work(&adev->mman.bdev.wq);
> >>>> +                drain_workqueue(adev->mman.bdev.wq);
> >>>>          if (adev->pm_sysfs_en)
> >>>>                  amdgpu_pm_sysfs_fini(adev);
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>>> b/drivers/gpu/drm/i915/i915_gem.c
> >>>> index 8468ca9885fd..c38306f156d6 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>> @@ -1099,7 +1099,7 @@ void i915_gem_drain_freed_objects(struct
> >>>> drm_i915_private *i915)
> >>>>   {
> >>>>          while (atomic_read(&i915->mm.free_count)) {
> >>>>                  flush_work(&i915->mm.free_work);
> >>>> -                flush_delayed_work(&i915->bdev.wq);
> >>>> +                drain_workqueue(i915->bdev.wq);
> >>>>                  rcu_barrier();
> >>>>          }
> >>>>   }
> >>>> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c
> >>>> b/drivers/gpu/drm/i915/intel_region_ttm.c
> >>>> index cf89d0c2a2d9..657bbc16a48a 100644
> >>>> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> >>>> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> >>>> @@ -132,7 +132,7 @@ int intel_region_ttm_fini(struct
> >>>> intel_memory_region *mem)
> >>>>                          break;
> >>>>                  msleep(20);
> >>>> -                flush_delayed_work(&mem->i915->bdev.wq);
> >>>> +                drain_workqueue(mem->i915->bdev.wq);
> >>>>          }
> >>>>          /* If we leaked objects, Don't free the region causing use
> >>>> after free */
> >>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >>>> index b77262a623e0..4749b65bedc4 100644
> >>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >>>> @@ -280,14 +280,13 @@ static int ttm_bo_cleanup_refs(struct
> >>>> ttm_buffer_object *bo,
> >>>>                          ret = 0;
> >>>>          }
> >>>> -        if (ret || unlikely(list_empty(&bo->ddestroy))) {
> >>>> +        if (ret) {
> >>>>                  if (unlock_resv)
> >>>>                          dma_resv_unlock(bo->base.resv);
> >>>>                  spin_unlock(&bo->bdev->lru_lock);
> >>>>                  return ret;
> >>>>          }
> >>>> -        list_del_init(&bo->ddestroy);
> >>>>          spin_unlock(&bo->bdev->lru_lock);
> >>>>          ttm_bo_cleanup_memtype_use(bo);
> >>>> @@ -300,47 +299,21 @@ static int ttm_bo_cleanup_refs(struct
> >>>> ttm_buffer_object *bo,
> >>>>   }
> >>>>   /*
> >>>> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
> >>>> - * encountered buffers.
> >>>> + * Block for the dma_resv object to become idle, lock the buffer and
> >>>> clean up
> >>>> + * the resource and tt object.
> >>>>    */
> >>>> -bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
> >>>> +static void ttm_bo_delayed_delete(struct work_struct *work)
> >>>>   {
> >>>> -        struct list_head removed;
> >>>> -        bool empty;
> >>>> -
> >>>> -        INIT_LIST_HEAD(&removed);
> >>>> -
> >>>> -        spin_lock(&bdev->lru_lock);
> >>>> -        while (!list_empty(&bdev->ddestroy)) {
> >>>> -                struct ttm_buffer_object *bo;
> >>>> -
> >>>> -                bo = list_first_entry(&bdev->ddestroy, struct
> >>>> ttm_buffer_object,
> >>>> -                                      ddestroy);
> >>>> -                list_move_tail(&bo->ddestroy, &removed);
> >>>> -                if (!ttm_bo_get_unless_zero(bo))
> >>>> -                        continue;
> >>>> -
> >>>> -                if (remove_all || bo->base.resv != &bo->base._resv) {
> >>>> -                        spin_unlock(&bdev->lru_lock);
> >>>> -                        dma_resv_lock(bo->base.resv, NULL);
> >>>> -
> >>>> -                        spin_lock(&bdev->lru_lock);
> >>>> -                        ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> >>>> -
> >>>> -                } else if (dma_resv_trylock(bo->base.resv)) {
> >>>> -                        ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> >>>> -                } else {
> >>>> -                        spin_unlock(&bdev->lru_lock);
> >>>> -                }
> >>>> +        struct ttm_buffer_object *bo;
> >>>> -                ttm_bo_put(bo);
> >>>> -                spin_lock(&bdev->lru_lock);
> >>>> -        }
> >>>> -        list_splice_tail(&removed, &bdev->ddestroy);
> >>>> -        empty = list_empty(&bdev->ddestroy);
> >>>> -        spin_unlock(&bdev->lru_lock);
> >>>> +        bo = container_of(work, typeof(*bo), delayed_delete);
> >>>> -        return empty;
> >>>> +        dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
> >>>> false,
> >>>> +                              MAX_SCHEDULE_TIMEOUT);
> >>>> +        dma_resv_lock(bo->base.resv, NULL);
> >>>> +        ttm_bo_cleanup_memtype_use(bo);
> >>>> +        dma_resv_unlock(bo->base.resv);
> >>>> +        ttm_bo_put(bo);
> >>>>   }
> >>>>   static void ttm_bo_release(struct kref *kref)
> >>>> @@ -369,44 +342,40 @@ static void ttm_bo_release(struct kref *kref)
> >>>>                  drm_vma_offset_remove(bdev->vma_manager,
> >>>> &bo->base.vma_node);
> >>>>                  ttm_mem_io_free(bdev, bo->resource);
> >>>> -        }
> >>>> -
> >>>> -        if (!dma_resv_test_signaled(bo->base.resv,
> >>>> DMA_RESV_USAGE_BOOKKEEP) ||
> >>>> -            !dma_resv_trylock(bo->base.resv)) {
> >>>> -                /* The BO is not idle, resurrect it for delayed destroy */
> >>>> -                ttm_bo_flush_all_fences(bo);
> >>>> -                bo->deleted = true;
> >>>> -                spin_lock(&bo->bdev->lru_lock);
> >>>> +        if (!dma_resv_test_signaled(bo->base.resv,
> >>>> +                                    DMA_RESV_USAGE_BOOKKEEP) ||
> >>>> +            !dma_resv_trylock(bo->base.resv)) {
> >>>> +                /* The BO is not idle, resurrect it for delayed destroy */
> >>>> +                ttm_bo_flush_all_fences(bo);
> >>>> +                bo->deleted = true;
> >>>> -                /*
> >>>> -                 * Make pinned bos immediately available to
> >>>> -                 * shrinkers, now that they are queued for
> >>>> -                 * destruction.
> >>>> -                 *
> >>>> -                 * FIXME: QXL is triggering this. Can be removed when the
> >>>> -                 * driver is fixed.
> >>>> -                 */
> >>>> -                if (bo->pin_count) {
> >>>> -                        bo->pin_count = 0;
> >>>> -                        ttm_resource_move_to_lru_tail(bo->resource);
> >>>> -                }
> >>>> +                spin_lock(&bo->bdev->lru_lock);
> >>>> -                kref_init(&bo->kref);
> >>>> -                list_add_tail(&bo->ddestroy, &bdev->ddestroy);
> >>>> -                spin_unlock(&bo->bdev->lru_lock);
> >>>> +                /*
> >>>> +                 * Make pinned bos immediately available to
> >>>> +                 * shrinkers, now that they are queued for
> >>>> +                 * destruction.
> >>>> +                 *
> >>>> +                 * FIXME: QXL is triggering this. Can be removed when the
> >>>> +                 * driver is fixed.
> >>>> +                 */
> >>>> +                if (bo->pin_count) {
> >>>> +                        bo->pin_count = 0;
> >>>> +                        ttm_resource_move_to_lru_tail(bo->resource);
> >>>> +                }
> >>>> -                schedule_delayed_work(&bdev->wq,
> >>>> -                                      ((HZ / 100) < 1) ? 1 : HZ / 100);
> >>>> -                return;
> >>>> -        }
> >>>> +                kref_init(&bo->kref);
> >>>> +                spin_unlock(&bo->bdev->lru_lock);
> >>>> -        spin_lock(&bo->bdev->lru_lock);
> >>>> -        list_del(&bo->ddestroy);
> >>>> -        spin_unlock(&bo->bdev->lru_lock);
> >>>> +                INIT_WORK(&bo->delayed_delete, ttm_bo_delayed_delete);
> >>>> +                queue_work(bdev->wq, &bo->delayed_delete);
> >>>> +                return;
> >>>> +        }
> >>>> -        ttm_bo_cleanup_memtype_use(bo);
> >>>> -        dma_resv_unlock(bo->base.resv);
> >>>> +                ttm_bo_cleanup_memtype_use(bo);
> >>>> +                dma_resv_unlock(bo->base.resv);
> >>>> +        }
> >>>>          atomic_dec(&ttm_glob.bo_count);
> >>>>          bo->destroy(bo);
> >>>> @@ -946,7 +915,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev,
> >>>> struct ttm_buffer_object *bo,
> >>>>          int ret;
> >>>>          kref_init(&bo->kref);
> >>>> -        INIT_LIST_HEAD(&bo->ddestroy);
> >>>>          bo->bdev = bdev;
> >>>>          bo->type = type;
> >>>>          bo->page_alignment = alignment;
> >>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c
> >>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
> >>>> index ba3aa0a0fc43..ae4b7922ee1a 100644
> >>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> >>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> >>>> @@ -230,7 +230,6 @@ static int ttm_buffer_object_transfer(struct
> >>>> ttm_buffer_object *bo,
> >>>>           */
> >>>>          atomic_inc(&ttm_glob.bo_count);
> >>>> -        INIT_LIST_HEAD(&fbo->base.ddestroy);
> >>>>          drm_vma_node_reset(&fbo->base.base.vma_node);
> >>>>          kref_init(&fbo->base.kref);
> >>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c
> >>>> b/drivers/gpu/drm/ttm/ttm_device.c
> >>>> index e7147e304637..e9bedca4dfdc 100644
> >>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
> >>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> >>>> @@ -175,16 +175,6 @@ int ttm_device_swapout(struct ttm_device *bdev,
> >>>> struct ttm_operation_ctx *ctx,
> >>>>   }
> >>>>   EXPORT_SYMBOL(ttm_device_swapout);
> >>>> -static void ttm_device_delayed_workqueue(struct work_struct *work)
> >>>> -{
> >>>> -        struct ttm_device *bdev =
> >>>> -                container_of(work, struct ttm_device, wq.work);
> >>>> -
> >>>> -        if (!ttm_bo_delayed_delete(bdev, false))
> >>>> -                schedule_delayed_work(&bdev->wq,
> >>>> -                                      ((HZ / 100) < 1) ? 1 : HZ / 100);
> >>>> -}
> >>>> -
> >>>>   /**
> >>>>    * ttm_device_init
> >>>>    *
> >>>> @@ -215,15 +205,19 @@ int ttm_device_init(struct ttm_device *bdev,
> >>>> struct ttm_device_funcs *funcs,
> >>>>          if (ret)
> >>>>                  return ret;
> >>>> +        bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI,
> >>>> 16);
> >>>> +        if (!bdev->wq) {
> >>>> +                ttm_global_release();
> >>>> +                return -ENOMEM;
> >>>> +        }
> >>>> +
> >>>>          bdev->funcs = funcs;
> >>>>          ttm_sys_man_init(bdev);
> >>>>          ttm_pool_init(&bdev->pool, dev, use_dma_alloc, use_dma32);
> >>>>          bdev->vma_manager = vma_manager;
> >>>> -        INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
> >>>>          spin_lock_init(&bdev->lru_lock);
> >>>> -        INIT_LIST_HEAD(&bdev->ddestroy);
> >>>>          INIT_LIST_HEAD(&bdev->pinned);
> >>>>          bdev->dev_mapping = mapping;
> >>>>          mutex_lock(&ttm_global_mutex);
> >>>> @@ -247,10 +241,8 @@ void ttm_device_fini(struct ttm_device *bdev)
> >>>>          list_del(&bdev->device_list);
> >>>>          mutex_unlock(&ttm_global_mutex);
> >>>> -        cancel_delayed_work_sync(&bdev->wq);
> >>>> -
> >>>> -        if (ttm_bo_delayed_delete(bdev, true))
> >>>> -                pr_debug("Delayed destroy list was clean\n");
> >>>> +        drain_workqueue(bdev->wq);
> >>>> +        destroy_workqueue(bdev->wq);
> >>>>          spin_lock(&bdev->lru_lock);
> >>>>          for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
> >>>> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> >>>> index 7758347c461c..69e62bbb01e3 100644
> >>>> --- a/include/drm/ttm/ttm_bo_api.h
> >>>> +++ b/include/drm/ttm/ttm_bo_api.h
> >>>> @@ -92,7 +92,6 @@ struct ttm_tt;
> >>>>    * @ttm: TTM structure holding system pages.
> >>>>    * @evicted: Whether the object was evicted without user-space
> >>>> knowing.
> >>>>    * @deleted: True if the object is only a zombie and already deleted.
> >>>> - * @ddestroy: List head for the delayed destroy list.
> >>>>    * @swap: List head for swap LRU list.
> >>>>    * @offset: The current GPU offset, which can have different meanings
> >>>>    * depending on the memory type. For SYSTEM type memory, it should
> >>>> be 0.
> >>>> @@ -135,19 +134,14 @@ struct ttm_buffer_object {
> >>>>          struct ttm_tt *ttm;
> >>>>          bool deleted;
> >>>>          struct ttm_lru_bulk_move *bulk_move;
> >>>> +        unsigned priority;
> >>>> +        unsigned pin_count;
> >>>>          /**
> >>>> -         * Members protected by the bdev::lru_lock.
> >>>> -         */
> >>>> -
> >>>> -        struct list_head ddestroy;
> >>>> -
> >>>> -        /**
> >>>> -         * Members protected by a bo reservation.
> >>>> +         * @delayed_delete: Work item used when we can't delete the BO
> >>>> +         * immediately
> >>>>           */
> >>>> -
> >>>> -        unsigned priority;
> >>>> -        unsigned pin_count;
> >>>> +        struct work_struct delayed_delete;
> >>>>          /**
> >>>>           * Special members that are protected by the reserve lock
> >>>> @@ -448,8 +442,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma);
> >>>>   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> >>>>                        void *buf, int len, int write);
> >>>> -bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all);
> >>>> -
> >>>>   vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);
> >>>>   #endif
> >>>> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> >>>> index 95b3c04b1ab9..4f3e81eac6f3 100644
> >>>> --- a/include/drm/ttm/ttm_device.h
> >>>> +++ b/include/drm/ttm/ttm_device.h
> >>>> @@ -251,11 +251,6 @@ struct ttm_device {
> >>>>           */
> >>>>          spinlock_t lru_lock;
> >>>> -        /**
> >>>> -         * @ddestroy: Destroyed but not yet cleaned up buffer objects.
> >>>> -         */
> >>>> -        struct list_head ddestroy;
> >>>> -
> >>>>          /**
> >>>>           * @pinned: Buffer objects which are pinned and so not on any
> >>>> LRU list.
> >>>>           */
> >>>> @@ -270,7 +265,7 @@ struct ttm_device {
> >>>>          /**
> >>>>           * @wq: Work queue structure for the delayed delete workqueue.
> >>>>           */
> >>>> -        struct delayed_work wq;
> >>>> +        struct workqueue_struct *wq;
> >>>>   };
> >>>>   int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t
> >>>> gfp_flags);
>
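The amdgpu, i915 and ttm_device_fini() hunks above all make the same switch at
teardown: instead of flushing one delayed work item, the whole per-device queue
is drained before it is destroyed. A minimal sketch of that pattern, with
example_device_teardown() as an illustrative name that is not part of the patch:

    #include <linux/workqueue.h>
    #include <drm/ttm/ttm_device.h>

    static void example_device_teardown(struct ttm_device *bdev)
    {
            /* Wait for every queued per-BO delete work item to finish ... */
            drain_workqueue(bdev->wq);
            /* ... only then is it safe to free the queue itself. */
            destroy_workqueue(bdev->wq);
    }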