All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Tvrtko Ursulin <tvrtko.ursulin@intel.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: [PATCH 5.19 026/101] drm/i915/gt: Restrict forced preemption to the active context
Date: Mon,  3 Oct 2022 09:10:22 +0200	[thread overview]
Message-ID: <20221003070725.132925976@linuxfoundation.org> (raw)
In-Reply-To: <20221003070724.490989164@linuxfoundation.org>

From: Chris Wilson <chris@chris-wilson.co.uk>

commit 6ef7d362123ecb5bf6d163bb9c7fd6ba2d8c968c upstream.

When we submit a new pair of contexts to ELSP for execution, we start a
timer by which point we expect the HW to have switched execution to the
pending contexts. If the promotion to the new pair of contexts has not
occurred, we declare the executing context to have hung and force the
preemption to take place by resetting the engine and resubmitting the
new contexts.

This can lead to an unfair situation where almost all of the preemption
timeout is consumed by the first context which just switches into the
second context immediately prior to the timer firing and triggering the
preemption reset (assuming that the timer interrupts before we process
the CS events for the context switch). The second context hasn't yet had
a chance to yield to the incoming ELSP (and send the ACk for the
promotion) and so ends up being blamed for the reset.

If we see that a context switch has occurred since setting the
preemption timeout, but have not yet received the ACK for the ELSP
promotion, rearm the preemption timer and check again. This is
especially significant if the first context was not schedulable and so
we used the shortest timer possible, greatly increasing the chance of
accidentally blaming the second innocent context.

Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption")
Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Tested-by: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220921135258.1714873-1-andrzej.hajda@intel.com
(cherry picked from commit 107ba1a2c705f4358f2602ec2f2fd821bb651f42)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h         |   15 +++++++++++++
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c |   21 ++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -156,6 +156,21 @@ struct intel_engine_execlists {
 	struct timer_list preempt;
 
 	/**
+	 * @preempt_target: active request at the time of the preemption request
+	 *
+	 * We force a preemption to occur if the pending contexts have not
+	 * been promoted to active upon receipt of the CS ack event within
+	 * the timeout. This timeout maybe chosen based on the target,
+	 * using a very short timeout if the context is no longer schedulable.
+	 * That short timeout may not be applicable to other contexts, so
+	 * if a context switch should happen within before the preemption
+	 * timeout, we may shoot early at an innocent context. To prevent this,
+	 * we record which context was active at the time of the preemption
+	 * request and only reset that context upon the timeout.
+	 */
+	const struct i915_request *preempt_target;
+
+	/**
 	 * @ccid: identifier for contexts submitted to this engine
 	 */
 	u32 ccid;
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1241,6 +1241,9 @@ static unsigned long active_preempt_time
 	if (!rq)
 		return 0;
 
+	/* Only allow ourselves to force reset the currently active context */
+	engine->execlists.preempt_target = rq;
+
 	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
 	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
 		return 1;
@@ -2427,8 +2430,24 @@ static void execlists_submission_tasklet
 	GEM_BUG_ON(inactive - post > ARRAY_SIZE(post));
 
 	if (unlikely(preempt_timeout(engine))) {
+		const struct i915_request *rq = *engine->execlists.active;
+
+		/*
+		 * If after the preempt-timeout expired, we are still on the
+		 * same active request/context as before we initiated the
+		 * preemption, reset the engine.
+		 *
+		 * However, if we have processed a CS event to switch contexts,
+		 * but not yet processed the CS event for the pending
+		 * preemption, reset the timer allowing the new context to
+		 * gracefully exit.
+		 */
 		cancel_timer(&engine->execlists.preempt);
-		engine->execlists.error_interrupt |= ERROR_PREEMPT;
+		if (rq == engine->execlists.preempt_target)
+			engine->execlists.error_interrupt |= ERROR_PREEMPT;
+		else
+			set_timer_ms(&engine->execlists.preempt,
+				     active_preempt_timeout(engine, rq));
 	}
 
 	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {



  parent reply	other threads:[~2022-10-03  7:19 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03  7:09 [PATCH 5.19 000/101] 5.19.13-rc1 review Greg Kroah-Hartman
2022-10-03  7:09 ` [PATCH 5.19 001/101] riscv: make t-head erratas depend on MMU Greg Kroah-Hartman
2022-10-03  7:09 ` [PATCH 5.19 002/101] tools/perf: Fix out of bound access to cpu mask array Greg Kroah-Hartman
2022-10-03  7:09   ` Greg Kroah-Hartman
2022-10-03  7:09 ` [PATCH 5.19 003/101] perf record: Fix cpu mask bit setting for mixed mmaps Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 004/101] counter: 104-quad-8: Utilize iomap interface Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 005/101] counter: 104-quad-8: Implement and utilize register structures Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 006/101] counter: 104-quad-8: Fix skipped IRQ lines during events configuration Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 007/101] uas: add no-uas quirk for Hiksemi usb_disk Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 008/101] usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 009/101] uas: ignore UAS for Thinkplus chips Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 010/101] usb: typec: ucsi: Remove incorrect warning Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 011/101] thunderbolt: Explicitly reset plug events delay back to USB4 spec value Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 012/101] net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455 Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 013/101] Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 014/101] can: c_can: dont cache TX messages for C_CAN cores Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 015/101] clk: ingenic-tcu: Properly enable registers before accessing timers Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 016/101] wifi: mac80211: ensure vif queues are operational after start Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 017/101] x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 018/101] frontswap: dont call ->init if no ops are registered Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 019/101] ARM: dts: integrator: Tag PCI host with device_type Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 020/101] ntfs: fix BUG_ON in ntfs_lookup_inode_by_name() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 021/101] x86/uaccess: avoid check_object_size() in copy_from_user_nmi() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 022/101] mm/damon/dbgfs: fix memory leak when using debugfs_lookup() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 023/101] net: mt7531: only do PLL once after the reset Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 024/101] Revert "firmware: arm_scmi: Add clock management to the SCMI power domain" Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 025/101] powerpc/64s/radix: dont need to broadcast IPI for radix pmd collapse flush Greg Kroah-Hartman
2022-10-03  7:10 ` Greg Kroah-Hartman [this message]
2022-10-03  7:10 ` [PATCH 5.19 027/101] drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 028/101] vduse: prevent uninitialized memory accesses Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 029/101] libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205 Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 030/101] mm: fix BUG splat with kvmalloc + GFP_ATOMIC Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 031/101] mptcp: factor out __mptcp_close() without socket lock Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 032/101] mptcp: fix unreleased socket in accept queue Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 033/101] mmc: moxart: fix 4-bit bus width and remove 8-bit bus width Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 034/101] mmc: hsq: Fix data stomping during mmc recovery Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 035/101] mm: gup: fix the fast GUP race against THP collapse Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 036/101] mm/page_alloc: fix race condition between build_all_zonelists and page allocation Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 037/101] mm: prevent page_frag_alloc() from corrupting the memory Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 038/101] mm/page_isolation: fix isolate_single_pageblock() isolation behavior Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 039/101] mm: fix dereferencing possible ERR_PTR Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 040/101] mm/migrate_device.c: flush TLB while holding PTL Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 041/101] mm/migrate_device.c: add missing flush_cache_page() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 042/101] mm/migrate_device.c: copy pte dirty bit to page Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 043/101] mm: fix madivse_pageout mishandling on non-LRU page Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 044/101] mm: bring back update_mmu_cache() to finish_fault() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 045/101] mm/hugetlb: correct demote page offset logic Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 046/101] mm,hwpoison: check mm when killing accessing process Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 047/101] media: dvb_vb2: fix possible out of bound access Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 048/101] media: rkvdec: Disable H.264 error detection Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 049/101] media: mediatek: vcodec: Drop platform_get_resource(IORESOURCE_IRQ) Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 050/101] media: v4l2-compat-ioctl32.c: zero buffer passed to v4l2_compat_get_array_args() Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 051/101] ARM: dts: am33xx: Fix MMCHS0 dma properties Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 052/101] reset: imx7: Fix the iMX8MP PCIe PHY PERST support Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 053/101] ARM: dts: am5748: keep usb4_tm disabled Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 054/101] soc: sunxi: sram: Actually claim SRAM regions Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 055/101] soc: sunxi: sram: Prevent the driver from being unbound Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 056/101] soc: sunxi: sram: Fix probe function ordering issues Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 057/101] soc: sunxi: sram: Fix debugfs info for A64 SRAM C Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 058/101] ASoC: imx-card: Fix refcount issue with of_node_put Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 059/101] clk: microchip: mpfs: fix clk_cfg array bounds violation Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 060/101] clk: microchip: mpfs: make the rtcs ahb clock critical Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 061/101] arm64: dts: qcom: sm8350: fix UFS PHY serdes size Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 062/101] ASoC: tas2770: Reinit regcache on reset Greg Kroah-Hartman
2022-10-03  7:10 ` [PATCH 5.19 063/101] drm/bridge: lt8912b: add vsync hsync Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 064/101] drm/bridge: lt8912b: set hdmi or dvi mode Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 065/101] drm/bridge: lt8912b: fix corrupted image output Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 066/101] net: macb: Fix ZynqMP SGMII non-wakeup source resume failure Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 067/101] Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time" Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 068/101] Input: melfas_mip4 - fix return value check in mip4_probe() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 069/101] gpio: mvebu: Fix check for pwm support on non-A8K platforms Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 070/101] perf parse-events: Break out tracepoint and printing Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 071/101] perf print-events: Fix "perf list" can not display the PMU prefix for some hybrid cache events Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 072/101] perf parse-events: Remove "not supported" " Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 073/101] usbnet: Fix memory leak in usbnet_disconnect() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 074/101] net: sched: act_ct: fix possible refcount leak in tcf_ct_init() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 075/101] cxgb4: fix missing unlock on ETHOFLD desc collect fail path Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 076/101] net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 077/101] nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 078/101] wifi: cfg80211: fix MCS divisor value Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 079/101] wifi: mac80211: fix regression with non-QoS drivers Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 080/101] wifi: mac80211: fix memory corruption in minstrel_ht_update_rates() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 081/101] net: stmmac: power up/down serdes in stmmac_open/release Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 082/101] net: phy: Dont WARN for PHY_UP state in mdio_bus_phy_resume() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 083/101] selftests: Fix the if conditions of in test_extra_filter() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 084/101] ice: xsk: change batched Tx descriptor cleaning Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 085/101] ice: xsk: drop power of 2 ring size restriction for AF_XDP Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 086/101] vdpa/ifcvf: fix the calculation of queuepair Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 087/101] virtio-blk: Fix WARN_ON_ONCE in virtio_queue_rq() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 088/101] vdpa/mlx5: Fix MQ to support non power of two num queues Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 089/101] dont use __kernel_write() on kmap_local_page() Greg Kroah-Hartman
2022-10-03  9:09   ` Geert Uytterhoeven
2022-10-04 17:47     ` Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 090/101] clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 091/101] drm/i915/gt: Perf_limit_reasons are only available for Gen11+ Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 092/101] clk: iproc: Do not rely on node name for correct PLL setup Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 093/101] clk: imx93: drop of_match_ptr Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 094/101] net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 095/101] net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2} Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 096/101] perf test: Fix test case 87 ("perf record tests") for hybrid systems Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 097/101] perf tests record: Fail the test if the errs counter is not zero Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 098/101] KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 099/101] x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 100/101] x86/alternative: Fix race in try_get_desc() Greg Kroah-Hartman
2022-10-03  7:11 ` [PATCH 5.19 101/101] damon/sysfs: fix possible memleak on damon_sysfs_add_target Greg Kroah-Hartman
2022-10-03 17:51 ` [PATCH 5.19 000/101] 5.19.13-rc1 review Guenter Roeck
2022-10-03 18:50 ` Florian Fainelli
2022-10-03 19:02 ` Justin Forbes
2022-10-03 20:39 ` Slade Watkins
2022-10-03 21:28 ` Shuah Khan
2022-10-03 23:24 ` Zan Aziz
2022-10-04  6:06 ` Ron Economos
2022-10-04  6:48 ` Naresh Kamboju
2022-10-05  9:38   ` Feng Tang
2022-10-06  7:45     ` Naresh Kamboju
2022-10-05 10:50   ` Hyeonggon Yoo
2022-10-04  7:27 ` Bagas Sanjaya
2022-10-04 11:47 ` Sudip Mukherjee (Codethink)
2022-10-04 13:40 ` Fenil Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221003070725.132925976@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=andi.shyti@linux.intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rodrigo.vivi@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tvrtko.ursulin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.