linux-kernel.vger.kernel.org archive mirror
* [PATCH AUTOSEL 5.15 1/5] scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
@ 2024-03-11 15:14 Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 2/5] ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port Sasha Levin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sasha Levin @ 2024-03-11 15:14 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ranjan Kumar, Martin K . Petersen, Sasha Levin, sathya.prakash,
	sreekanth.reddy, suganath-prabu.subramani, jejb,
	MPT-FusionLinux.pdl, linux-scsi

From: Ranjan Kumar <ranjan.kumar@broadcom.com>

[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]

If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.

Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index e524e1fc53fa3..8325875bfc4ed 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -7238,7 +7238,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
 		return -EFAULT;
 	}
 
- issue_diag_reset:
+	return 0;
+
+issue_diag_reset:
 	rc = _base_diag_reset(ioc);
 	return rc;
 }
-- 
2.43.0
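
The control-flow change in the hunk above can be modeled outside the kernel.
The following is a minimal sketch, not the driver's code: struct fake_ioc,
fake_diag_reset() and both wait_for_iocstate_*() functions are hypothetical
stand-ins, and the real _base_wait_for_iocstate() has more paths than the
two shown here. It only illustrates why a success path that reaches a label
without an intervening return still executes the reset.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the adapter state; not struct MPT3SAS_ADAPTER. */
struct fake_ioc {
	bool ready_after_wait;	/* controller became ready within the timeout */
	int diag_resets;	/* number of diagnostic resets issued so far */
};

/* Hypothetical stand-in for _base_diag_reset(): just counts the call. */
static int fake_diag_reset(struct fake_ioc *ioc)
{
	ioc->diag_resets++;
	return 0;
}

/* Before the fix: the ready path falls through into the label. */
static int wait_for_iocstate_old(struct fake_ioc *ioc)
{
	if (!ioc->ready_after_wait)
		goto issue_diag_reset;
	/* Controller is ready, but nothing stops execution here. */
issue_diag_reset:
	return fake_diag_reset(ioc);
}

/* After the fix: return before the label when the controller is ready. */
static int wait_for_iocstate_new(struct fake_ioc *ioc)
{
	if (!ioc->ready_after_wait)
		goto issue_diag_reset;
	return 0;
issue_diag_reset:
	return fake_diag_reset(ioc);
}
```

With a ready controller, the old variant still counts one reset while the
new variant counts none; a controller that never becomes ready is reset in
both.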



* [PATCH AUTOSEL 5.15 2/5] ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port
  2024-03-11 15:14 [PATCH AUTOSEL 5.15 1/5] scsi: mpt3sas: Prevent sending diag_reset when the controller is ready Sasha Levin
@ 2024-03-11 15:14 ` Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-03-11 15:14 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kailang Yang, Takashi Iwai, Sasha Levin, perex, tiwai, sbinding,
	luke, andy.chi, shenghao-ding, ruinairas1992, vitalyr,
	linux-sound

From: Kailang Yang <kailang@realtek.com>

[ Upstream commit b34bf65838f7c6e785f62681605a538b73c2808c ]

The headphone port produced pop noise when the system rebooted.
Writing the default value to NID 58h, index 0x0, reduces the pop noise.

Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/r/7493e207919a4fb3a0599324fd010e3e@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 sound/pci/hda/patch_realtek.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index a6e6ed1355abf..3a86f0fd78278 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -3675,6 +3675,7 @@ static void alc285_hp_init(struct hda_codec *codec)
 	int i, val;
 	int coef38, coef0d, coef36;
 
+	alc_write_coefex_idx(codec, 0x58, 0x00, 0x1888); /* write default value */
 	alc_update_coef_idx(codec, 0x4a, 1<<15, 1<<15); /* Reset HP JD */
 	coef38 = alc_read_coef_idx(codec, 0x38); /* Amp control */
 	coef0d = alc_read_coef_idx(codec, 0x0d); /* Digital Misc control */
-- 
2.43.0



* [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
  2024-03-11 15:14 [PATCH AUTOSEL 5.15 1/5] scsi: mpt3sas: Prevent sending diag_reset when the controller is ready Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 2/5] ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port Sasha Levin
@ 2024-03-11 15:14 ` Sasha Levin
  2024-03-13 20:03   ` Felix Kuehling
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 4/5] Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 5/5] firewire: core: use long bus reset on gap count error Sasha Levin
  3 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2024-03-11 15:14 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Prike Liang, Alex Deucher, Sasha Levin, christian.koenig,
	Xinhui.Pan, airlied, daniel, Hawking.Zhang, lijo.lazar, le.ma,
	James.Zhu, shane.xiao, sonny.jiang, amd-gfx, dri-devel

From: Prike Liang <Prike.Liang@amd.com>

[ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]

GPU resets can now be performed successfully on the Raven series, and a
GPU reset is required for the S3 suspend abort case. So enable GPU reset
for S3 abort cases on the Raven series.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +++++++++++++++++-------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 6a3486f52d698..ef5b3eedc8615 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -605,11 +605,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
 		return AMD_RESET_METHOD_MODE1;
 }
 
+static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
+{
+	u32 sol_reg;
+
+	sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
+
+	/* Will reset for the following suspend abort cases.
+	 * 1) Only reset limit on APU side, dGPU hasn't checked yet.
+	 * 2) S3 suspend abort and TOS already launched.
+	 */
+	if (adev->flags & AMD_IS_APU && adev->in_s3 &&
+			!adev->suspend_complete &&
+			sol_reg)
+		return true;
+
+	return false;
+}
+
 static int soc15_asic_reset(struct amdgpu_device *adev)
 {
 	/* original raven doesn't have full asic reset */
-	if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
-	    (adev->apu_flags & AMD_APU_IS_RAVEN2))
+	/* On the latest Raven, the GPU reset can be performed
+	 * successfully. So now, temporarily enable it for the
+	 * S3 suspend abort case.
+	 */
+	if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
+	    (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
+		!soc15_need_reset_on_resume(adev))
 		return 0;
 
 	switch (soc15_asic_reset_method(adev)) {
@@ -1490,24 +1513,6 @@ static int soc15_common_suspend(void *handle)
 	return soc15_common_hw_fini(adev);
 }
 
-static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
-{
-	u32 sol_reg;
-
-	sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
-
-	/* Will reset for the following suspend abort cases.
-	 * 1) Only reset limit on APU side, dGPU hasn't checked yet.
-	 * 2) S3 suspend abort and TOS already launched.
-	 */
-	if (adev->flags & AMD_IS_APU && adev->in_s3 &&
-			!adev->suspend_complete &&
-			sol_reg)
-		return true;
-
-	return false;
-}
-
 static int soc15_common_resume(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-- 
2.43.0
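
The combined condition in soc15_asic_reset() can be read as a small truth
table. The sketch below is an illustrative model only: struct fake_adev and
its boolean fields are hypothetical simplifications of the flag tests in the
hunk (AMD_IS_APU, AMD_APU_IS_RAVEN/RAVEN2, in_s3, suspend_complete), and
sol_reg stands for the value read from mmMP0_SMN_C2PMSG_81.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the fields the patch tests. */
struct fake_adev {
	bool is_apu;		/* adev->flags & AMD_IS_APU */
	bool is_raven;		/* AMD_APU_IS_RAVEN or AMD_APU_IS_RAVEN2 set */
	bool in_s3;		/* adev->in_s3 */
	bool suspend_complete;	/* adev->suspend_complete */
	uint32_t sol_reg;	/* value read from mmMP0_SMN_C2PMSG_81 */
};

/* Mirrors the condition in soc15_need_reset_on_resume(). */
static bool need_reset_on_resume(const struct fake_adev *adev)
{
	return adev->is_apu && adev->in_s3 &&
	       !adev->suspend_complete && adev->sol_reg != 0;
}

/*
 * True when the patched soc15_asic_reset() would return 0 early,
 * that is, when a Raven part skips the reset.
 */
static bool raven_skips_reset(const struct fake_adev *adev)
{
	return adev->is_raven && !need_reset_on_resume(adev);
}
```

In this model, a Raven APU skips the reset in every case except an aborted
S3 suspend with a non-zero sol_reg, which matches the intent stated in the
commit message.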



* [PATCH AUTOSEL 5.15 4/5] Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
  2024-03-11 15:14 [PATCH AUTOSEL 5.15 1/5] scsi: mpt3sas: Prevent sending diag_reset when the controller is ready Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 2/5] ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series Sasha Levin
@ 2024-03-11 15:14 ` Sasha Levin
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 5/5] firewire: core: use long bus reset on gap count error Sasha Levin
  3 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-03-11 15:14 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yuxuan Hu, Luiz Augusto von Dentz, Sasha Levin, marcel,
	johan.hedberg, luiz.dentz, linux-bluetooth

From: Yuxuan Hu <20373622@buaa.edu.cn>

[ Upstream commit 2535b848fa0f42ddff3e5255cf5e742c9b77bb26 ]

During our fuzz testing of the connection and disconnection process at the
RFCOMM layer, we discovered this bug. By comparing the packets from a
normal connection and disconnection process with the test case that
triggered a KASAN report, we analyzed the cause of this bug as follows:

1. In the packets captured during a normal connection, the host sends a
`Read Encryption Key Size` type of `HCI_CMD` packet
(Command Opcode: 0x1408) to the controller to inquire about the length of
the encryption key. After receiving this packet, the controller immediately
replies with a Command Complete packet (Event Code: 0x0e) to return the
Encryption Key Size.

2. In our fuzz test case, the timing of the controller's response to this
packet was delayed to an unexpected point: after the RFCOMM and L2CAP
layers had disconnected but before the HCI layer had disconnected.

3. After receiving the Encryption Key Size Response at the time described
in point 2, the host still called the rfcomm_check_security function.
However, by this time `struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn;`
had already been released, and when the function executed
`return hci_conn_security(conn->hcon, d->sec_level, auth_type, d->out);`,
specifically when accessing `conn->hcon`, a null-ptr-deref error occurred.

To fix this bug, check if `sk->sk_state` is BT_CLOSED before calling
rfcomm_recv_frame in rfcomm_process_rx.

Signed-off-by: Yuxuan Hu <20373622@buaa.edu.cn>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/bluetooth/rfcomm/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 8d6fce9005bdd..4f54c7df3a94f 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -1937,7 +1937,7 @@ static struct rfcomm_session *rfcomm_process_rx(struct rfcomm_session *s)
 	/* Get data directly from socket receive queue without copying it. */
 	while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
 		skb_orphan(skb);
-		if (!skb_linearize(skb)) {
+		if (!skb_linearize(skb) && sk->sk_state != BT_CLOSED) {
 			s = rfcomm_recv_frame(s, skb);
 			if (!s)
 				break;
-- 
2.43.0
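
The effect of the added sk_state test can be sketched in isolation. This is
a simplified model under stated assumptions, not the RFCOMM code: every name
prefixed with fake_ or FAKE_ is hypothetical, and the loop only counts which
queued frames would be handed to the frame handler. The real loop also stops
early when rfcomm_recv_frame() returns NULL, which is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket states; FAKE_BT_CLOSED stands for BT_CLOSED. */
enum fake_state { FAKE_BT_CONNECTED, FAKE_BT_CLOSED };

/* Hypothetical queued frame; linearize_ok models !skb_linearize(skb). */
struct fake_frame {
	bool linearize_ok;
};

/* Count the queued frames that would reach the frame handler. */
static size_t frames_delivered(const struct fake_frame *queue, size_t n,
			       enum fake_state state)
{
	size_t delivered = 0;

	for (size_t i = 0; i < n; i++) {
		/* Patched condition: linearized and socket not closed. */
		if (queue[i].linearize_ok && state != FAKE_BT_CLOSED)
			delivered++;
		/* Otherwise the frame is not passed to the handler. */
	}
	return delivered;
}
```

Once the socket is closed, no frame reaches the handler in this model, so
the security check described in point 3 is never entered with a released
connection.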



* [PATCH AUTOSEL 5.15 5/5] firewire: core: use long bus reset on gap count error
  2024-03-11 15:14 [PATCH AUTOSEL 5.15 1/5] scsi: mpt3sas: Prevent sending diag_reset when the controller is ready Sasha Levin
                   ` (2 preceding siblings ...)
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 4/5] Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security Sasha Levin
@ 2024-03-11 15:14 ` Sasha Levin
  3 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-03-11 15:14 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Takashi Sakamoto, Adam Goldman, Sasha Levin, linux1394-devel

From: Takashi Sakamoto <o-takashi@sakamocchi.jp>

[ Upstream commit d0b06dc48fb15902d7da09c5c0861e7f042a9381 ]

When resetting the bus after a gap count error, use a long rather than
short bus reset.

IEEE 1394-1995 uses only long bus resets. IEEE 1394a adds the option of
short bus resets. When video or audio transmission is in progress and a
device is hot-plugged elsewhere on the bus, the resulting bus reset can
cause video frame drops or audio dropouts. Short bus resets reduce or
eliminate this problem. Accordingly, short bus resets are almost always
preferred.

However, on a mixed 1394/1394a bus, a short bus reset can trigger an
immediate additional bus reset. This double bus reset can be interpreted
differently by different nodes on the bus, resulting in an inconsistent gap
count after the bus reset. An inconsistent gap count will cause another bus
reset, leading to a neverending bus reset loop. This only happens for some
bus topologies, not for all mixed 1394/1394a buses.

By instead sending a long bus reset after a gap count inconsistency, we
avoid the doubled bus reset, restoring the bus to normal operation.

Signed-off-by: Adam Goldman <adamg@pobox.com>
Link: https://sourceforge.net/p/linux1394/mailman/message/58741624/
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/firewire/core-card.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/firewire/core-card.c b/drivers/firewire/core-card.c
index be195ba834632..d446a72629414 100644
--- a/drivers/firewire/core-card.c
+++ b/drivers/firewire/core-card.c
@@ -500,7 +500,19 @@ static void bm_work(struct work_struct *work)
 		fw_notice(card, "phy config: new root=%x, gap_count=%d\n",
 			  new_root_id, gap_count);
 		fw_send_phy_config(card, new_root_id, generation, gap_count);
-		reset_bus(card, true);
+		/*
+		 * Where possible, use a short bus reset to minimize
+		 * disruption to isochronous transfers. But in the event
+		 * of a gap count inconsistency, use a long bus reset.
+		 *
+		 * As noted in 1394a 8.4.6.2, nodes on a mixed 1394/1394a bus
+		 * may set different gap counts after a bus reset. On a mixed
+		 * 1394/1394a bus, a short bus reset can get doubled. Some
+		 * nodes may treat the double reset as one bus reset and others
+		 * may treat it as two, causing a gap count inconsistency
+		 * again. Using a long bus reset prevents this.
+		 */
+		reset_bus(card, card->gap_count != 0);
 		/* Will allocate broadcast channel after the reset. */
 		goto out;
 	}
-- 
2.43.0
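
The reset selection in the hunk above reduces to one boolean. The sketch
below is a hedged illustration: enum reset_kind and pick_reset() are
hypothetical, the second argument of reset_bus() is taken to mean "use a
short reset" because the old code passed true, and a recorded gap count of
zero is assumed to be how the core marks an inconsistency. That last point
is inferred from the patch comment rather than stated in it.

```c
#include <stdbool.h>

/* Hypothetical labels for the two reset types discussed in the commit. */
enum reset_kind { RESET_LONG, RESET_SHORT };

/*
 * Model of reset_bus(card, card->gap_count != 0): a non-zero recorded
 * gap count selects a short reset, zero selects a long reset.
 */
static enum reset_kind pick_reset(unsigned int recorded_gap_count)
{
	bool use_short = recorded_gap_count != 0;

	return use_short ? RESET_SHORT : RESET_LONG;
}
```

Under these assumptions, the common case keeps the short reset that
minimizes disruption to isochronous transfers, and only the inconsistent
case pays for a long reset.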



* Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
  2024-03-11 15:14 ` [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series Sasha Levin
@ 2024-03-13 20:03   ` Felix Kuehling
  2024-03-13 20:46     ` Alex Deucher
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Kuehling @ 2024-03-13 20:03 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Prike Liang, Alex Deucher, christian.koenig, Xinhui.Pan, airlied,
	daniel, Hawking.Zhang, lijo.lazar, le.ma, James.Zhu, shane.xiao,
	sonny.jiang, amd-gfx, dri-devel

On 2024-03-11 11:14, Sasha Levin wrote:
> From: Prike Liang <Prike.Liang@amd.com>
>
> [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
>
> Currently, GPU resets can now be performed successfully on the Raven
> series. While GPU reset is required for the S3 suspend abort case.
> So now can enable gpu reset for S3 abort cases on the Raven series.

This looks suspicious to me. I'm not sure what conditions made the GPU 
reset successful. But unless all the changes involved were also 
backported, this should probably not be applied to older kernel 
branches. I'm speculating it may be related to the removal of AMD IOMMUv2.

Regards,
   Felix




* Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
  2024-03-13 20:03   ` Felix Kuehling
@ 2024-03-13 20:46     ` Alex Deucher
  2024-03-14  3:00       ` Liang, Prike
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Deucher @ 2024-03-13 20:46 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Sasha Levin, linux-kernel, stable, Prike Liang, Alex Deucher,
	christian.koenig, Xinhui.Pan, airlied, daniel, Hawking.Zhang,
	lijo.lazar, le.ma, James.Zhu, shane.xiao, sonny.jiang, amd-gfx,
	dri-devel

On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling <felix.kuehling@amd.com> wrote:
>
> On 2024-03-11 11:14, Sasha Levin wrote:
> > From: Prike Liang <Prike.Liang@amd.com>
> >
> > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> >
> > Currently, GPU resets can now be performed successfully on the Raven
> > series. While GPU reset is required for the S3 suspend abort case.
> > So now can enable gpu reset for S3 abort cases on the Raven series.
>
> This looks suspicious to me. I'm not sure what conditions made the GPU
> reset successful. But unless all the changes involved were also
> backported, this should probably not be applied to older kernel
> branches. I'm speculating it may be related to the removal of AMD IOMMUv2.
>

We should get confirmation from Prike, but I think he tested this on
older kernels as well.

Alex



* RE: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
  2024-03-13 20:46     ` Alex Deucher
@ 2024-03-14  3:00       ` Liang, Prike
  0 siblings, 0 replies; 8+ messages in thread
From: Liang, Prike @ 2024-03-14  3:00 UTC (permalink / raw)
  To: Alex Deucher, Kuehling, Felix
  Cc: Sasha Levin, linux-kernel, stable, Deucher, Alexander, Koenig,
	Christian, Pan, Xinhui, airlied, daniel, Zhang, Hawking, Lazar,
	Lijo, Ma, Le, Zhu, James, Xiao, Shane, Jiang, Sonny, amd-gfx,
	dri-devel

[AMD Official Use Only - General]

> From: Alex Deucher <alexdeucher@gmail.com>
> Sent: Thursday, March 14, 2024 4:46 AM
> To: Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: Sasha Levin <sashal@kernel.org>; linux-kernel@vger.kernel.org;
> stable@vger.kernel.org; Liang, Prike <Prike.Liang@amd.com>; Deucher,
> Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>;
> airlied@gmail.com; daniel@ffwll.ch; Zhang, Hawking
> <Hawking.Zhang@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>; Ma, Le
> <Le.Ma@amd.com>; Zhu, James <James.Zhu@amd.com>; Xiao, Shane
> <shane.xiao@amd.com>; Jiang, Sonny <Sonny.Jiang@amd.com>; amd-
> gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3
> abort cases on Raven series
>
> On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling <felix.kuehling@amd.com>
> wrote:
> >
> > On 2024-03-11 11:14, Sasha Levin wrote:
> > > From: Prike Liang <Prike.Liang@amd.com>
> > >
> > > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> > >
> > > Currently, GPU resets can now be performed successfully on the Raven
> > > series. While GPU reset is required for the S3 suspend abort case.
> > > So now can enable gpu reset for S3 abort cases on the Raven series.
> >
> > This looks suspicious to me. I'm not sure what conditions made the GPU
> > reset successful. But unless all the changes involved were also
> > backported, this should probably not be applied to older kernel
> > branches. I'm speculating it may be related to the removal of AMD
> IOMMUv2.
> >
>
> We should get confirmation from Prike, but I think he tested this on older
> kernels as well.
>
> Alex
>
> > Regards,
> >    Felix
> >

The Raven/Raven2 series GPU reset function was enabled in some older kernel versions such as 5.5 but filtered out in more recent kernel driver versions. Therefore, this patch only applies to the latest kernel version, and it should be safe because it enables the Raven GPU reset only for the S3 suspend abort case and does not affect other cases. The Chrome kernel log indicates that the AMD IOMMUv2 driver is loaded, and with this patch triggering the GPU reset before the AMDGPU device reinitialization, the S3 suspend abort resume problem on the Raven series is handled effectively.

Was the Raven GPU reset previously disabled due to the AMD IOMMUv2 driver? If so, based on the Chromebook's verification result, the Raven series GPU reset can probably be enabled with IOMMUv2 for other cases as well.

Thanks,
Prike
