From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 23EB6C433F5
	for ; Tue, 2 Nov 2021 17:55:18 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D601760F02
	for ; Tue, 2 Nov 2021 17:55:17 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D601760F02
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id 9682D734F7;
	Tue, 2 Nov 2021 17:55:16 +0000 (UTC)
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
	by gabe.freedesktop.org (Postfix) with ESMTPS id 3BE52734F7;
	Tue, 2 Nov 2021 17:55:15 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10156"; a="317533438"
X-IronPort-AV: E=Sophos;i="5.87,203,1631602800"; d="scan'208";a="317533438"
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	02 Nov 2021 10:55:14 -0700
X-IronPort-AV: E=Sophos;i="5.87,203,1631602800"; d="scan'208";a="467828374"
Received: from vanderss-mobl.ger.corp.intel.com (HELO [10.249.254.234]) ([10.249.254.234])
	by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	02 Nov 2021 10:55:13 -0700
Message-ID:
Date: Tue, 2 Nov 2021 18:55:10 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0
Subject: Re: [PATCH v3 2/2] drm/i915/ttm: Failsafe migration blits
Content-Language: en-US
To: Matthew Auld
References: <20211102163425.505732-1-thomas.hellstrom@linux.intel.com>
	<20211102163425.505732-3-thomas.hellstrom@linux.intel.com>
From: =?UTF-8?Q?Thomas_Hellstr=c3=b6m?=
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Intel Graphics Development, Matthew Auld, ML dri-devel
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

On 11/2/21 18:40, Matthew Auld wrote:
> On Tue, 2 Nov 2021 at 16:39, Thomas Hellström wrote:
>> If the initial fill blit or copy blit of an object fails, the old
>> content of the data might be exposed and read as soon as either CPU- or
>> GPU PTEs are set up to point at the pages.
>>
>> Intercept the blit fence with an async callback that checks the
>> blit fence for errors and if there are errors performs an async cpu blit
>> instead. If there is a failure to allocate the async dma_fence_work,
>> allocate it on the stack and sync wait for the blit to complete.
>>
>> Add selftests that simulate gpu blit failures and failure to allocate
>> the async dma_fence_work.
>>
>> A previous version of this patch used dma_fence_work, now that's
>> opencoded which adds more code but might lower the latency
>> somewhat in the common non-error case.
>>
>> v3:
>> - Style fixes (Matthew Auld)
>>
>> Signed-off-by: Thomas Hellström
>> Reviewed-by: Matthew Auld
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  | 322 +++++++++++++++---
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h  |   5 +
>>  .../drm/i915/gem/selftests/i915_gem_migrate.c |  24 +-
>>  3 files changed, 295 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> index 0ed6b7f2b95f..b89672c547f8 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
>> @@ -18,6 +18,29 @@
>>  #include "gt/intel_gt.h"
>>  #include "gt/intel_migrate.h"
>>
>> +/**
>> + * DOC: Selftest failure modes for failsafe migration:
>> + *
>> + * For fail_gpu_migration, the gpu blit scheduled is always a clear blit
>> + * rather than a copy blit, and then we force the failure paths as if
>> + * the blit fence returned an error.
>> + *
>> + * For fail_work_allocation we fail the kmalloc of the async worker, we
>> + * sync the gpu blit. If it then fails, or fail_gpu_migration is set to
>> + * true, then a memcpy operation is performed sync.
>> + */
>> +#ifdef CONFIG_DRM_I915_SELFTEST
> When pushing maybe make this:
>
> #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>
> Which seems to be consistent with most of the other places.

Hmm, I noticed that i915 is doing that, although I thought these macros
were primarily intended for C expressions?

/Thomas

>
>> +static bool fail_gpu_migration;
>> +static bool fail_work_allocation;
>> +
>> +void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
>> +					bool work_allocation)
>> +{
>> +	fail_gpu_migration = gpu_migration;
>> +	fail_work_allocation = work_allocation;
>> +}
>> +#endif
>> +