linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: 'Manivannan Sadhasivam' <manivannan.sadhasivam@linaro.org>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Shradha Todi <shradha.t@samsung.com>,
	bp@alien8.de, tony.luck@intel.com, james.morse@arm.com,
	mchehab@kernel.org, rric@kernel.org, lpieralisi@kernel.org,
	kw@linux.com, robh@kernel.org, bhelgaas@google.com,
	jingoohan1@gmail.com, gustavo.pimentel@synopsys.com,
	josh@joshtriplett.org, lukas.bulwahn@gmail.com,
	hongxing.zhu@nxp.com, pankaj.dubey@samsung.com,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	vidyas@nvidia.com, gost.dev@samsung.com, alim.akhtar@samsung.com,
	shiju.jose@huawei.com, Terry Bowman <Terry.Bowman@amd.com>,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW controller
Date: Fri, 22 Mar 2024 20:11:41 +0530	[thread overview]
Message-ID: <20240322144141.GC3333@thinkpad> (raw)
In-Reply-To: <20240322125817.0000791f@Huawei.com>

On Fri, Mar 22, 2024 at 12:58:17PM +0000, Jonathan Cameron wrote:
> On Fri, 22 Mar 2024 16:09:35 +0530
> 'Manivannan Sadhasivam' <manivannan.sadhasivam@linaro.org> wrote:
> 
> > On Wed, Mar 20, 2024 at 10:01:44AM +0000, Jonathan Cameron wrote:
> > > On Tue, 19 Mar 2024 22:03:15 +0530
> > > 'Manivannan Sadhasivam' <manivannan.sadhasivam@linaro.org> wrote:
> > >   
> > > > On Thu, Feb 22, 2024 at 04:30:47PM +0530, Shradha Todi wrote:  
> > > > > + Borislav, Tony, James, Mauro, Robert
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > Synopsys DesignWare PCIe controllers have a vendor specific capability (which
> > > > > means that this set of registers are only present in DesignWare controllers)
> > > > > to perform debug operations called "RASDES".
> > > > > The functionalities provided by this extended capability are:
> > > > > 
> > > > > 1. Debug: This has some debug related diagnostic features like holding LTSSM
> > > > > in certain states, reading the status of lane detection, checking if any PCIe
> > > > > lanes are broken (RX Valid) and so on. It's a debug only feature used for diagnostic
> > > > > use-cases.
> > > > > 
> > > > > 2. Error Injection: This is a way to inject certain errors in PCIe like LCRC, ECRC,
> > > > > Bad TLPs and so on. Again, this is a debug feature and generally not used in
> > > > > functional use-case.
> > > > > 
> > > > > 3. Statistical counters: This has 3 parts
> > > > >  - Error counters
> > > > >  - Non error counters (covered as part of perf [1])
> > > > >  - Time based analysis counters (covered as part of perf [1])
> > > > > 
> > > > > Selective features of  the above functionality has been implemented
> > > > > by vendor specific PCIe controller drivers (pcie-tegra194.c) that use
> > > > > Synopsys DesignWare PCIe controllers.
> > > > > In order to make it useful to all vendors using DWC controller, we had
> > > > > proposed a common implementation in DWC PCIe controller directory
> > > > > (drivers/pci/controller/dwc/) and our original idea was based on debugfs
> > > > > filesystem. v1 and v2 are mentioned in [2] and [3].
> > > > > 
> > > > > We got a suggestion to implement this as part of EDAC framework [3] and
> > > > > we looked into the same. But as far as I understood, what I am trying to
> > > > > implement is a very specific feature (only valid for Synopsys DWC PCIe controllers).  
> > > 
> > > For error part there are (at least superficially) similar features in the PCIe
> > > standard that we've started thinking about how to support.
> > > 
> > > See Flit Logging Extended capablity (7.7.8 in PCIe Base Spec rev6.
> > > That has the benefit that they are part of the standard so we can
> > > support them directly in portdrv / EP drivers using some library code in the
> > > PCI core.
> > >   
> > 
> > Sounds good. But v6 is a relatively new version and the DWC RAS predates that.
> > So we still need to support it somehow (either in EDAC or in
> > drivers/pci/controller/dwc).
> > 
> > > There are other interconnect and PCI PMU drivers that log retries etc which are also basically error
> > > counts. At least some of that is done through perf today. 
> > >   
> > 
> > IMO all the RAS support should be exposed through EDAC, otherwise it defeats the
> > purpose of the subsystem.
> 
> Some RAS flows go nowhere near EDAC today.  Individual drivers poke out
> the tracepoints and userspace tooling deals with it. One example is
> CXL but there are plenty of others.
> 

That's the problem. You'd end up with custom tooling for each individual
drivers leading to userspace fragmentation.

- Mani

> Perhaps that should not be the case, but there is definitely
> precedence for ignoring edac when considering error flows.
> 
> Jonathan
> 
> > 
> > - Mani
> > 
> > >   
> > > > > This doesn't seem to fit in very well with the EDAC framework and we can 
> > > > > hardly use any of the EDAC framework APIs. We tried implementing a
> > > > > "pci_driver" but since a function driver will already be running on the EP and
> > > > > portdrv on the root-complex, we will not be able to bind 2 drivers to a single
> > > > > PCI device (root-complex or endpoint). Ultimately, what I will be doing is
> > > > > writing a platform driver with debugfs entries which will be present in EDAC
> > > > > directory instead of DWC directory.  
> > > 
> > > The addition of this type of functionality to pordrv is a long running question.
> > > Everyone wants a solution, I believe some people are looking at it (+CC Terry)
> > > 
> > > Terry, another case for your long list.
> > > 
> > > For the EP end, this should be fired up by the EP driver, whilst it might be
> > > infrastructure used on a bunch of devices,  it is a feature of that particular
> > > EP - so you'd want to provide any functionality in a form that could be used
> > > by both the EP driver and a nice shiny new portdrv replacement.
> > >   
> > > > > 
> > > > > Can  you please help us out by going through this thread [3] and letting us
> > > > > know if our understanding is wrong at any point. If you think it is a better
> > > > > idea to integrate this in the EDAC framework, can you guide me as
> > > > > to how I can utilize the framework better?
> > > > > Please let me know if you need any other information to conclude.
> > > > > 
> > > > > [1] https://lore.kernel.org/linux-pci/20231121013400.18367-1-xueshuai@linux.alibaba.com/
> > > > > [2] https://lore.kernel.org/all/20210518174618.42089-1-shradha.t@samsung.com/T/
> > > > > [3] https://lore.kernel.org/all/20231130115044.53512-1-shradha.t@samsung.com/
> > > > >     
> > > > 
> > > > Gentle ping for the EDAC maintainers.
> > > > 
> > > > - Mani
> > > >   
> > > > > Thanks,
> > > > > Shradha
> > > > >     
> > > > > > -----Original Message-----
> > > > > > From: 'Manivannan Sadhasivam' <manivannan.sadhasivam@linaro.org>
> > > > > > Sent: 16 February 2024 19:19
> > > > > > To: Shradha Todi <shradha.t@samsung.com>
> > > > > > Cc: lpieralisi@kernel.org; kw@linux.com; robh@kernel.org;
> > > > > > bhelgaas@google.com; jingoohan1@gmail.com;
> > > > > > gustavo.pimentel@synopsys.com; josh@joshtriplett.org;
> > > > > > lukas.bulwahn@gmail.com; hongxing.zhu@nxp.com;
> > > > > > pankaj.dubey@samsung.com; linux-kernel@vger.kernel.org; linux-
> > > > > > pci@vger.kernel.org; vidyas@nvidia.com; gost.dev@samsung.com
> > > > > > Subject: Re: [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW
> > > > > > controller
> > > > > > 
> > > > > > On Thu, Feb 15, 2024 at 02:55:06PM +0530, Shradha Todi wrote:    
> > > > > > >
> > > > > > >    
> > > > > > 
> > > > > > [...]
> > > > > >     
> > > > > > > > For the error injection and counters, we already have the EDAC
> > > > > > > > framework. So adding them in the DWC driver doesn't make sense to me.
> > > > > > > >    
> > > > > > >
> > > > > > > Sorry for late response, was going through the EDAC framework to understand    
> > > > > > better how we can fit RAS DES support in it. Below are some technical challenges
> > > > > > found so far:    
> > > > > > > 1: This debugfs framework proposed [1] can run on both side of the link i.e. RC    
> > > > > > and EP as it will be a part of the link controller platform driver. Here for the EP
> > > > > > side the assumption is that it has Linux running, which is primarily a use case for
> > > > > > chip-to-chip communication.  After your suggestion to migrate to EDAC
> > > > > > framework we studied and here are the findings:    
> > > > > > > - If we move to EDAC framework, we need to have RAS DES as a
> > > > > > > pci_driver which will be binded based on vendor_id and device_id. Our
> > > > > > > observation is that on EP side system we are unable to bind two
> > > > > > > function driver (pci_driver), as pci_endpoint_test function driver or
> > > > > > > some other chip-to-chip function driver will already be bound. On the
> > > > > > > other hand, on RC side we observed that if we have portdrv enabled in
> > > > > > > Linux running on RC system, it gets bound to RC controller and then it
> > > > > > > does not allow EDAC pci_driver to bind. So basically we see a problem
> > > > > > > here, that we can't have two pci_driver binding to same PCI device
> > > > > > > 2: Another point is even though we use EDAC driver framework, we may not be    
> > > > > > able to use any of EDAC framework APIs as they are mostly suitable for memory
> > > > > > controller devices sitting on PCI BUS. We will end up using debugfs entries just via
> > > > > > a pci_driver placed inside EDAC framework.
> > > > > > 
> > > > > > Please wrap your replies to 80 characters.
> > > > > > 
> > > > > > There is no need to bind the edac driver to VID:PID of the device. The edac driver
> > > > > > can be a platform driver and you can instantiate the platform device from the
> > > > > > DWC driver. This way, the PCI device can be assocaited with whatever driver, but
> > > > > > still there can be a separate edac driver for handling errors.
> > > > > > 
> > > > > > Regarding API limitation, you should ask the maintainer about the possibility of
> > > > > > extending them.
> > > > > >     
> > > > > > >
> > > > > > > Please let me know if my understanding is wrong.
> > > > > > >    
> > > > > > > > But first check with the perf driver author if they have any plans
> > > > > > > > on adding the proposed functionality. If they do not have any plan
> > > > > > > > or not working on it, then look into EDAC.
> > > > > > > >
> > > > > > > > - Mani
> > > > > > > >    
> > > > > > >
> > > > > > > Since we already worked and posted patches [1], [2], we will continue to work    
> > > > > > on this and based on consent from community we will adopt to most suitable
> > > > > > framework.    
> > > > > > > We see many subsystems like ethernet, usb, gpu, cxl having debugfs files that    
> > > > > > give information about the current status of the running system and as of now
> > > > > > based on our findings, we still feel there is no harm in having debugfs entry based
> > > > > > support in DesignWare controller driver itself.
> > > > > > 
> > > > > > There is no issue in exposing the debug information through debugfs, that's the
> > > > > > sole purpose of the interface. But here, you are trying to add support for DWC
> > > > > > RAS feature for which a dedicated framework already exists.
> > > > > > 
> > > > > > And there will be more similar requests coming for vendor specific error protocols
> > > > > > as well. So your investigation could benefit everyone.
> > > > > > 
> > > > > > From your above investigation, looks like there are some shortcomings of the
> > > > > > EDAC framework. So let's get that clarified by writing to the EDAC maintainers
> > > > > > (keep us in CC). If the EDAC maintainer suggests you to add support for this
> > > > > > feature in DWC driver itself citing some reasons, then no issues with me.
> > > > > > 
> > > > > > - Mani
> > > > > > 
> > > > > > --
> > > > > > மணிவண்ணன் சதாசிவம்    
> > > > > 
> > > > >     
> > > >   
> > >   
> > 
> 

-- 
மணிவண்ணன் சதாசிவம்

  reply	other threads:[~2024-03-22 14:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20231130115055epcas5p4e29befa80877be45dbee308846edc0ba@epcas5p4.samsung.com>
2023-11-30 11:50 ` [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW controller Shradha Todi
     [not found]   ` <CGME20231130115103epcas5p19a56bf80e3c7cb062dba9e60d7363039@epcas5p1.samsung.com>
2023-11-30 11:50     ` [PATCH v2 1/3] PCI: dwc: Add support for vendor specific capability search Shradha Todi
     [not found]   ` <CGME20231130115108epcas5p1b874d14bc1e306a0860c6671b149a35c@epcas5p1.samsung.com>
2023-11-30 11:50     ` [PATCH v2 2/3] PCI: debugfs: Add support for RASDES framework in DWC Shradha Todi
     [not found]   ` <CGME20231130115113epcas5p4bcd4ffb2baac60a0be51d6a3cb15c2a6@epcas5p4.samsung.com>
2023-11-30 11:50     ` [PATCH v2 3/3] PCI: dwc: Create debugfs files in DWC driver Shradha Todi
2023-11-30 16:55   ` [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW controller Manivannan Sadhasivam
2023-12-04  8:40     ` Shradha Todi
2024-01-03  5:43     ` Shradha Todi
2024-01-04  5:50       ` 'Manivannan Sadhasivam'
2024-02-15  9:25         ` Shradha Todi
2024-02-16 13:49           ` 'Manivannan Sadhasivam'
2024-02-22 11:00             ` Shradha Todi
2024-03-19 16:33               ` 'Manivannan Sadhasivam'
2024-03-20 10:01                 ` Jonathan Cameron
2024-03-22 10:39                   ` 'Manivannan Sadhasivam'
2024-03-22 11:21                     ` Shradha Todi
2024-03-22 12:58                     ` Jonathan Cameron
2024-03-22 14:41                       ` 'Manivannan Sadhasivam' [this message]
2024-04-24 15:32                 ` 'Manivannan Sadhasivam'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240322144141.GC3333@thinkpad \
    --to=manivannan.sadhasivam@linaro.org \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=Terry.Bowman@amd.com \
    --cc=alim.akhtar@samsung.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=gost.dev@samsung.com \
    --cc=gustavo.pimentel@synopsys.com \
    --cc=hongxing.zhu@nxp.com \
    --cc=james.morse@arm.com \
    --cc=jingoohan1@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=kw@linux.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lpieralisi@kernel.org \
    --cc=lukas.bulwahn@gmail.com \
    --cc=mchehab@kernel.org \
    --cc=pankaj.dubey@samsung.com \
    --cc=robh@kernel.org \
    --cc=rric@kernel.org \
    --cc=shiju.jose@huawei.com \
    --cc=shradha.t@samsung.com \
    --cc=tony.luck@intel.com \
    --cc=vidyas@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).