linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vidya Sagar <vidyas@nvidia.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: <jingoohan1@gmail.com>, <gustavo.pimentel@synopsys.com>,
	<lorenzo.pieralisi@arm.com>, <bhelgaas@google.com>,
	<amurray@thegoodpenguin.co.uk>, <robh@kernel.org>,
	<treding@nvidia.com>, <jonathanh@nvidia.com>,
	<linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<kthota@nvidia.com>, <mmaddireddy@nvidia.com>,
	<sagar.tv@gmail.com>
Subject: Re: [PATCH V3 2/2] PCI: dwc: Add support to configure for ECRC
Date: Wed, 4 Nov 2020 23:10:45 +0530	[thread overview]
Message-ID: <91b92687-3bb5-7d87-548b-00485dcad63e@nvidia.com> (raw)
In-Reply-To: <20201104162233.GA341405@bjorn-Precision-5520>



On 11/4/2020 9:52 PM, Bjorn Helgaas wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Nov 04, 2020 at 05:13:07PM +0530, Vidya Sagar wrote:
>> On 11/4/2020 2:37 AM, Bjorn Helgaas wrote:
>>> On Tue, Nov 03, 2020 at 08:57:01AM +0530, Vidya Sagar wrote:
>>>> On 11/3/2020 4:32 AM, Bjorn Helgaas wrote:
>>>>> On Thu, Oct 29, 2020 at 11:09:59AM +0530, Vidya Sagar wrote:
>>>>>> DesignWare core has a TLP digest (TD) override bit in one of the control
>>>>>> registers of ATU. This bit also needs to be programmed for proper ECRC
>>>>>> functionality. This is currently identified as an issue with DesignWare
>>>>>> IP version 4.90a. This patch does the required programming in ATU upon
>>>>>> querying the system policy for ECRC.
>>>>>
>>>>> I guess this is a hardware defect, right?
>>>> Yes. This is common across all DWC implementations (version 4.90 precisely)
>>>>
>>>>> How much of a problem would it be if we instead added a "no_ecrc"
>>>>> quirk for this hardware so we never enabled ECRC?
>>>> Well, on Tegra for some of the high fidelity use cases, ECRC is required to
>>>> be turned on and if it can be done safely with these patches, why shouldn't
>>>> we not enable ECRC at all?
>>>>
>>>>> IIUC, the current Linux support of ECRC is a single choice at
>>>>> boot-time: by default ECRC is not enabled, but if you boot with
>>>>> "pci=ecrc=on", we turn on ECRC for every device.
>>>>>
>>>>> That seems like the minimal support, but I think the spec allows ECRC
>>>>> to be enabled selectively, on individual devices.  I can imagine a
>>>>> sysfs knob that would allow us to enable/disable ECRC per-device at
>>>>> run-time.
>>>>>
>>>>> If we had such a sysfs knob, it would be pretty ugly and maybe
>>>>> impractical to work around this hardware issue.  So I'm a little bit
>>>>> hesitant to add functionality that might have to be removed in the
>>>>> future.
>>>>
>>>> Agree with this. But since it is a boot-time choice at this point, I think
>>>> we can still go ahead with this approach to have a working ECRC mechanism
>>>> right? I don't see any sysfs knob for AER controlling at this point.
>>>
>>> I don't want to do anything that will prevent us from adding
>>> per-device ECRC control in the future.
>>>
>>> My concern is that if we add a run-time sysfs knob in the future, the
>>> user experience on this hardware will be poor because there's no
>>> convenient path to twiddle the PCIE_ATU_TD bit when the generic code
>>> changes the AER Control bit.
>>
>> Agree.
>> Can we add it to the documentation that run time changing of ECRC settings
>> are not supported on this (i.e. Tegra194) platform (or for that matter on
>> any SoC with PCIe based on DesignWare core version 4.90A). By 'not
>> supported', I meant that the ECRC digest part may not work but the normal
>> functionality will continue to work without reporting any AER errors. I
>> tried to emulate the following scenarios on Tegra194 silicon and here are my
>> observations.
>> FWIW, I'm referring to the PCIe spec Rev 5.0 Ver 1.0 (22 May 2019)
>>
>>> What is the failure mode in these scenarios:
>>>
>>>     - User boots with defaults, ECRC is disabled.
>>>     - User enables ECRC via sysfs.
>>>     - What happens here?  ECRC is enabled via AER Control but not via
>>>       DWC TD bit.  I assume ECRC doesn't work.  Does it cause PCIe
>>>       errors (malformed TLP, etc)?
>>
>> Since DWC TD bit is not programmed, for all the transactions that go through
>> ATU, TLP Digest won't get generated (although AER registers indicate that
>> ECRC should get generated).
>> As per the spec section "2.7.1 ECRC Rules"
>>
>> ---
>> If a device Function is enabled to generate ECRC, it must calculate and
>> apply ECRC for all TLPs originated by the Function
>> ---
>>
>> So the RP would be violating the PCIe spec, but it doesn't result in any
>> error because the same section has the following rule as well
>>
>> ---
>> If a device Function is enabled to check ECRC, it must do so for all TLPs
>> with ECRC where the device is the ultimate PCI Express Receiver
>>        Note that it is still possible for the Function to receive TLPs without
>> ECRC, and these are processed normally - this is not an error
>> ---
>>
>> so, even if the EP has ECRC check enabled, because of the above rule, it
>> just processes those packets without ECRC as normal packets.
>> Basically, whoever is enabling ECRC run time gets cheated as transactions
>> routed through ATU don't really have ECRC digest
>>
>>> Or this one:
>>>
>>>     - User boots with "pci=ecrc=on", ECRC is enabled.
>>>     - ECRC works fine.
>>>     - User disables ECRC via sysfs.
>>>     - What happens here?  ECRC is disabled via AER Control, but DWC TD
>>>       bit thinks it's still enabled.
>>
>> In this case, the EP doesn't have ECRC check enabled but receives TLPs with
>> ECRC digest. This again won't result in any error because of the section
>> "2.2.3 TLP Digest Rules" which says
>>
>> ---
>> If an intermediate or ultimate PCI Express Receiver of the TLP does not
>> support ECRC checking, the Receiver must ignore the TLP Digest
>> ---
>>
>> So the EP just ignores the TLP Digest and process the TLP normally.
>> Although functionality wise there is no issue here, there could be some
>> impact on the perf because of the extra DWord data. This is again debatable
>> as the perf/data path is typically from EP's DMA engine to host system
>> memory and not config/BAR accesses.
>>>
>>> If you enabled ECRC unconditionally on DWC and the sysfs knob had no
>>> effect, I'd be OK with that.  I'm more worried about what happens when
>>> the AER bit and the DWC TD bit are set so they don't match.  What is
>>> the failure mode there?
>>
>> Based on the above experiments, we can as well keep the DWC TD bit
>> programmed unconditionally and it won't lead to any errors. As mentioned
>> before, the only downside of it is the extra DWord in each ATU routed TLP
>> which may load the bus (in downstream direction) with no real benefit as
>> such.
> 
> IIUC the issue only affects traffic from the Root Port to the
> Endpoint, and traffic going upstream is unaffected.  Here's what I
> think happens based on the RP and EP settings, please correct me if
> wrong:
> 
>    1)  RP TD+ ECRC_gen+       generates ECRC
>        EP     ECRC_check-     ignores ECRC
>        EP     ECRC_check+     checks ECRC
> 
>    2)  RP TD+ ECRC_gen-       generates ECRC when it shouldn't (defect)
>        EP     ECRC_check-     ignores ECRC
>        EP     ECRC_check+     checks ECRC (may signal errors)
> 
>    3)  RP TD- ECRC_gen+       fails to generate ECRC (defect)
>        EP     ECRC_check-     ignores ECRC
>        EP     ECRC_check+     handles TLPs without ECRC normally,
>                               but cannot detect ECRC errors
> 
>    4)  RP TD- ECRC_gen-       no ECRC generated (as expected)
>        EP     ECRC_check-     ignores ECRC
>        EP     ECRC_check+     handles TLPs without ECRC normally
> 
> If my assumptions above are correct, this defect never causes extra
> errors like Malformed TLP, etc, to be signaled regardless of the AER
> ECRC_check settings.
Yes.
> 
> The only functional problem is that in case 3, the EP *should* be able
> to check and signal ECRC errors, but it cannot because the RP doesn't
> generate ECRC.
Yes. Thats correct
> 
> Case 2 is a performance issue because we add the extra dword in every
> TLP going downstream.  That doesn't seem unreasonable since this is
> just config and BAR accesses.  The EP may detect ECRC errors when we
> don't think the RP is even generating ECRC, but in general we won't
> enable checking in the EP unless we also enable generation in the RP.
> 
> So I think setting the DWC TD bit unconditionally seems like a pretty
> good solution.
Ok. Sure. I'll push a patch to program DWC TD bit unconditionally then.
Thanks for your insights and guidance.

> 
>> Please let me know where can we go from here.
>> I can push a different patch to keep DWC TD bit always programmed if that is
>> the best approach to take at this point.
>>>
>>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>>> Reviewed-by: Jingoo Han <jingoohan1@gmail.com>
>>>>>> ---
>>>>>> V3:
>>>>>> * Added 'Reviewed-by: Jingoo Han <jingoohan1@gmail.com>'
>>>>>>
>>>>>> V2:
>>>>>> * Addressed Jingoo's review comment
>>>>>> * Removed saving 'td' bit information in 'dw_pcie' structure
>>>>>>
>>>>>>     drivers/pci/controller/dwc/pcie-designware.c | 8 ++++++--
>>>>>>     drivers/pci/controller/dwc/pcie-designware.h | 1 +
>>>>>>     2 files changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
>>>>>> index b5e438b70cd5..cbd651b219d2 100644
>>>>>> --- a/drivers/pci/controller/dwc/pcie-designware.c
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.c
>>>>>> @@ -246,6 +246,8 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
>>>>>>          dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_TARGET,
>>>>>>                                   upper_32_bits(pci_addr));
>>>>>>          val = type | PCIE_ATU_FUNC_NUM(func_no);
>>>>>> +     if (pci->version == 0x490A)
>>>>>> +             val = val | pcie_is_ecrc_enabled() << PCIE_ATU_TD_SHIFT;
>>>>>>          val = upper_32_bits(size - 1) ?
>>>>>>                  val | PCIE_ATU_INCREASE_REGION_SIZE : val;
>>>>>>          dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, val);
>>>>>> @@ -294,8 +296,10 @@ static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
>>>>>>                             lower_32_bits(pci_addr));
>>>>>>          dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET,
>>>>>>                             upper_32_bits(pci_addr));
>>>>>> -     dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
>>>>>> -                        PCIE_ATU_FUNC_NUM(func_no));
>>>>>> +     val = type | PCIE_ATU_FUNC_NUM(func_no);
>>>>>> +     if (pci->version == 0x490A)
>>>>>> +             val = val | pcie_is_ecrc_enabled() << PCIE_ATU_TD_SHIFT;
>>>>>> +     dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, val);
>>>>>>          dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE);
>>>>>>
>>>>>>          /*
>>>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
>>>>>> index e7f441441db2..b01ef407fd52 100644
>>>>>> --- a/drivers/pci/controller/dwc/pcie-designware.h
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.h
>>>>>> @@ -89,6 +89,7 @@
>>>>>>     #define PCIE_ATU_TYPE_IO             0x2
>>>>>>     #define PCIE_ATU_TYPE_CFG0           0x4
>>>>>>     #define PCIE_ATU_TYPE_CFG1           0x5
>>>>>> +#define PCIE_ATU_TD_SHIFT            8
>>>>>>     #define PCIE_ATU_FUNC_NUM(pf)           ((pf) << 20)
>>>>>>     #define PCIE_ATU_CR2                 0x908
>>>>>>     #define PCIE_ATU_ENABLE                      BIT(31)
>>>>>> --
>>>>>> 2.17.1
>>>>>>

      reply	other threads:[~2020-11-04 17:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-29  5:39 [PATCH V3 0/2] Add support to configure DWC for ECRC Vidya Sagar
2020-10-29  5:39 ` [PATCH V3 1/2] PCI/AER: Add pcie_is_ecrc_enabled() API Vidya Sagar
2020-10-29  5:39 ` [PATCH V3 2/2] PCI: dwc: Add support to configure for ECRC Vidya Sagar
2020-10-29 22:03   ` Jingoo Han
2020-10-30  6:45     ` Vidya Sagar
2020-11-02 14:15   ` Rob Herring
2020-11-02 14:27     ` Vidya Sagar
2020-11-02 15:11       ` Gustavo Pimentel
2020-11-02 21:16         ` Rob Herring
2020-11-02 22:38           ` Gustavo Pimentel
2020-11-03 19:48             ` Rob Herring
2020-11-02 23:02   ` Bjorn Helgaas
2020-11-03  3:27     ` Vidya Sagar
2020-11-03 21:07       ` Bjorn Helgaas
2020-11-04 11:43         ` Vidya Sagar
2020-11-04 16:22           ` Bjorn Helgaas
2020-11-04 17:40             ` Vidya Sagar [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91b92687-3bb5-7d87-548b-00485dcad63e@nvidia.com \
    --to=vidyas@nvidia.com \
    --cc=amurray@thegoodpenguin.co.uk \
    --cc=bhelgaas@google.com \
    --cc=gustavo.pimentel@synopsys.com \
    --cc=helgaas@kernel.org \
    --cc=jingoohan1@gmail.com \
    --cc=jonathanh@nvidia.com \
    --cc=kthota@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=mmaddireddy@nvidia.com \
    --cc=robh@kernel.org \
    --cc=sagar.tv@gmail.com \
    --cc=treding@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).