All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: markus.dobel@gmx.de, Brad Love <brad@nextdimension.cc>,
	linux-media <linux-media@vger.kernel.org>
Subject: Re: [PATCH] Revert 95f408bb Ryzen DMA related RiSC engine stall fixes
Date: Thu, 6 Dec 2018 17:32:04 -0200	[thread overview]
Message-ID: <20181206173204.21b9366e@coco.lan> (raw)
In-Reply-To: <20181206170752.1f3ac305@coco.lan>

Em Thu, 6 Dec 2018 17:07:52 -0200
Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:

> Em Thu, 6 Dec 2018 13:36:24 -0500
> Alex Deucher <alexdeucher@gmail.com> escreveu:
> 
> > On Thu, Dec 6, 2018 at 1:05 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:  
> > >
> > > Em Thu, 06 Dec 2018 18:18:23 +0100
> > > Markus Dobel <markus.dobel@gmx.de> escreveu:
> > >    
> > > > Hi everyone,
> > > >
> > > > I will try if the hack mentioned fixes the issue for me on the weekend (but I assume, as if effectively removes the function).    
> > >
> > > It should, but it keeps a few changes. Just want to be sure that what
> > > would be left won't cause issues. If this works, the logic that would
> > > solve Ryzen DMA fixes will be contained into a single point, making
> > > easier to maintain it.
> > >    
> > > >
> > > > Just in case this is of interest, I neither have Ryzen nor Intel, but an HP Microserver G7 with an AMD Turion II Neo  N54L, so the machine is more on the slow side.    
> > >
> > > Good to know. It would probably worth to check if this Ryzen
> > > bug occors with all versions of it or with just a subset.
> > > I mean: maybe it is only at the first gen or Ryzen and doesn't
> > > affect Ryzen 2 (or vice versa).    
> > 
> > The original commit also mentions some Xeons are affected too.  Seems
> > like this is potentially an issue on the device side rather than the
> > platform.  
> 
> Maybe.
> 
> > >
> > > The PCI quirks logic will likely need to detect the PCI ID of
> > > the memory controllers found at the buggy CPUs, in order to enable
> > > the quirk only for the affected ones.
> > >
> > > It could be worth talking with AMD people in order to be sure about
> > > the differences at the DMA engine side.
> > >    
> > 
> > It's not clear to me what the pci or platform quirk would do.  The
> > workaround seems to be in the driver, not on the platform.  
> 
> Yeah, the fix should be at the driver, but pci/quirk.c would be able
> to detect memory controllers that would require a hack inside the
> driver, in a similar way to what the media PCI drivers already do
> for DMA controllers that don't support pci2pci transfers.
> 
> There, basically the PCI core (drivers/pci/pci.c and 
> drivers/pci/quirks.c) sets a flag (pci_pci_problems) indicating
> potential issues.
> 
> Then, the driver compares such flag in order to enable the specific quirk.
> 
> Ok, there would be a different way to handle it. The driver could use a 
> logic similar to the one I wrote for drivers/edac/i7core_edac.c. There,
> the logic seeks for some specific PCI device IDs using pci_get_device(),
> adjusting the code accordingly, depending on the detected PCI devices.
> 
> In other words, running something like this (untested), at probe time should
> produce similar results:
> 
> 	/*
> 	 * FIXME: It probably makes sense to also be able to identify specific
> 	 * versions of the same PCI ID, just in case a latter stepping got a
> 	 * fix for the issue.
> 	 */
> 	const static struct {
> 		int vendor, dev;
> 	} broken_dev_id[] = {
> 		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_foo,
> 		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_bar,
> 	},
> 
> 	bool cx23885_does_dma_require_reset(void) 
> 	{
> 		int i;
> 		struct pci_dev *pdev = NULL;
> 
> 		for (i = 0; i < sizeof(broken_dev_id); i++) {
> 			pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
> 			if (pdev) {
> 				pci_put_device(pdev);
> 				return true;
> 			}
> 		}
> 		return false;
> 	}
> 
> Should work. In any case, we need to know what memory controllers 
> have problems, and what are their PCI IDs, and add them (if not there yet)
> at include/linux/pci_ids.h
> 
> Thanks,
> Mauro

To be clearer, I'm thinking on something like the (untested)
code below (untested).

PS.: the PCI ID used there may be wrong. I just added one in
order to have a proof of concept.

Thanks,
Mauro

[PATCH] media: cx23885: only reset DMA on problematic CPUs

It is reported that changeset 95f408bbc4e4 ("media: cx23885:
Ryzen DMA related RiSC engine stall fixes") caused regresssions
with other CPUs.

Ensure that the quirk will be applied only for the CPUs that
are known to cause problems.

Fixes: 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

diff --git a/drivers/media/pci/cx23885/cx23885-core.c b/drivers/media/pci/cx23885/cx23885-core.c
index 39804d830305..48da7d194cc1 100644
--- a/drivers/media/pci/cx23885/cx23885-core.c
+++ b/drivers/media/pci/cx23885/cx23885-core.c
@@ -23,6 +23,7 @@
 #include <linux/moduleparam.h>
 #include <linux/kmod.h>
 #include <linux/kernel.h>
+#include <linux/pci.h>
 #include <linux/slab.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
@@ -603,8 +604,13 @@ static void cx23885_risc_disasm(struct cx23885_tsport *port,
 
 static void cx23885_clear_bridge_error(struct cx23885_dev *dev)
 {
-	uint32_t reg1_val = cx_read(TC_REQ); /* read-only */
-	uint32_t reg2_val = cx_read(TC_REQ_SET);
+	uint32_t reg1_val, reg2_val;
+
+	if (!dev->need_dma_reset)
+		return;
+
+	reg1_val = cx_read(TC_REQ); /* read-only */
+	reg2_val = cx_read(TC_REQ_SET);
 
 	if (reg1_val && reg2_val) {
 		cx_write(TC_REQ, reg1_val);
@@ -2058,6 +2064,31 @@ void cx23885_gpio_enable(struct cx23885_dev *dev, u32 mask, int asoutput)
 	/* TODO: 23-19 */
 }
 
+static struct {
+	int vendor, dev;
+} const broken_dev_id[] = {
+	/* According with
+	 * https://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen%207%201800X/lspci,
+	 * 0x1451 is PCI ID for the IOMMU found on Ryzen 7
+	 */
+	{ PCI_VENDOR_ID_AMD, 0x1451 },
+};
+
+static bool cx23885_does_need_dma_reset(void)
+{
+	int i;
+	struct pci_dev *pdev = NULL;
+
+	for (i = 0; i < sizeof(broken_dev_id); i++) {
+		pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
+		if (pdev) {
+			pci_dev_put(pdev);
+			return true;
+		}
+	}
+	return false;
+}
+
 static int cx23885_initdev(struct pci_dev *pci_dev,
 			   const struct pci_device_id *pci_id)
 {
@@ -2069,6 +2100,8 @@ static int cx23885_initdev(struct pci_dev *pci_dev,
 	if (NULL == dev)
 		return -ENOMEM;
 
+	dev->need_dma_reset = cx23885_does_need_dma_reset();
+
 	err = v4l2_device_register(&pci_dev->dev, &dev->v4l2_dev);
 	if (err < 0)
 		goto fail_free;
diff --git a/drivers/media/pci/cx23885/cx23885.h b/drivers/media/pci/cx23885/cx23885.h
index d54c7ee1ab21..cf965efabe66 100644
--- a/drivers/media/pci/cx23885/cx23885.h
+++ b/drivers/media/pci/cx23885/cx23885.h
@@ -451,6 +451,8 @@ struct cx23885_dev {
 	/* Analog raw audio */
 	struct cx23885_audio_dev   *audio_dev;
 
+	/* Does the system require periodic DMA resets? */
+	unsigned int		need_dma_reset:1;
 };
 
 static inline struct cx23885_dev *to_cx23885(struct v4l2_device *v4l2_dev)


  reply	other threads:[~2018-12-06 19:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-21 13:45 [PATCH] Revert 95f408bb Ryzen DMA related RiSC engine stall fixes Markus Dobel
2018-12-05 11:07 ` Mauro Carvalho Chehab
2018-12-06 16:37   ` Brad Love
2018-12-06 17:18     ` Markus Dobel
2018-12-06 18:01       ` Mauro Carvalho Chehab
2018-12-06 18:36         ` Alex Deucher
2018-12-06 19:07           ` Mauro Carvalho Chehab
2018-12-06 19:32             ` Mauro Carvalho Chehab [this message]
2018-12-18 22:59               ` [PATCH v2] cx23885: only reset DMA on problematic CPUs Brad Love
2018-12-18 23:49                 ` Alex Deucher
2018-12-19 17:26                   ` Brad Love
2018-12-19 17:40                     ` Brad Love
2018-12-19 11:08                 ` Matthias Schwarzott
2018-12-19 17:09                   ` Brad Love
2018-12-19 17:07                 ` [PATCH v3] " Brad Love
2018-12-20 13:43                   ` Mauro Carvalho Chehab
2018-12-16 10:37         ` [PATCH] Revert 95f408bb Ryzen DMA related RiSC engine stall fixes Markus Dobel
2018-12-16 14:23           ` Mauro Carvalho Chehab
2018-12-18  2:05             ` Alex Deucher
2018-12-18  6:32               ` Markus Dobel
2018-12-18 12:45               ` Mauro Carvalho Chehab
2018-12-18 23:11                 ` Brad Love
2018-12-18 23:46                 ` Alex Deucher
2018-12-19  0:05                   ` Brad Love
2018-12-19  0:08                     ` Brad Love
2018-12-19 19:07                       ` Alex Deucher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181206173204.21b9366e@coco.lan \
    --to=mchehab+samsung@kernel.org \
    --cc=alexdeucher@gmail.com \
    --cc=brad@nextdimension.cc \
    --cc=linux-media@vger.kernel.org \
    --cc=markus.dobel@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.