* [PATCH] Use maximum latency when determining L1 ASPM
@ 2020-07-26 22:06 Ian Kumlien
2020-07-26 22:06 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-07-26 22:06 UTC (permalink / raw)
To: linux-pci
Hi, just thought that I should send this out properly...
I would also suggest that this should go into stable, if you don't
disagree =)
I'm still interested in L0s but I don't know how it works...
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] Use maximum latency when determining L1 ASPM
2020-07-26 22:06 [PATCH] Use maximum latency when determining L1 ASPM Ian Kumlien
@ 2020-07-26 22:06 ` Ian Kumlien
2020-07-27 21:17 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-07-26 22:06 UTC (permalink / raw)
To: linux-pci; +Cc: Ian Kumlien
The current code verifies each "hop" individually: it checks that
hop's max latency for upstream and downstream, but never the maximum
latency along the whole path.
This would work if all "hops" had the same latency, but:
root -> a -> b -> c -> endpoint
If c or b has the highest latency, it may not be taken into account.
Fix this by maintaining a maximum value for the comparison.
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
drivers/pci/pcie/aspm.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..bd53fba7f382 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
static void pcie_aspm_check_latency(struct pci_dev *endpoint)
{
- u32 latency, l1_switch_latency = 0;
+ u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
struct aspm_latency *acceptable;
struct pcie_link_state *link;
@@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
* substate latencies (and hence do not do any check).
*/
latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+ l1_max_latency = max_t(u32, latency, l1_max_latency);
if ((link->aspm_capable & ASPM_STATE_L1) &&
- (latency + l1_switch_latency > acceptable->l1))
+ (l1_max_latency + l1_switch_latency > acceptable->l1))
link->aspm_capable &= ~ASPM_STATE_L1;
l1_switch_latency += 1000;
--
2.27.0
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-07-26 22:06 ` Ian Kumlien
@ 2020-07-27 21:17 ` Ian Kumlien
0 siblings, 0 replies; 21+ messages in thread
From: Ian Kumlien @ 2020-07-27 21:17 UTC (permalink / raw)
To: linux-pci
Sorry, the changelog is broken -- Will resend...
That'll teach me for thinking after working out... =)
On Mon, Jul 27, 2020 at 12:07 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> The current solution verifies per "hop" but doesn't
> check the maximum latency, instead it checks the
> current "hops" max latency for upstream and downstream.
>
> This would work if all "hops" have the same latency, but:
>
> root -> a -> b -> c -> endpoint
>
> If c or b has the highest latency, it might not register
>
> Fix this by maintaining a maximum value for comparison
>
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
> drivers/pci/pcie/aspm.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bd53fba7f382 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> {
> - u32 latency, l1_switch_latency = 0;
> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> struct aspm_latency *acceptable;
> struct pcie_link_state *link;
>
> @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> * substate latencies (and hence do not do any check).
> */
> latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> + l1_max_latency = max_t(u32, latency, l1_max_latency);
> if ((link->aspm_capable & ASPM_STATE_L1) &&
> - (latency + l1_switch_latency > acceptable->l1))
> + (l1_max_latency + l1_switch_latency > acceptable->l1))
> link->aspm_capable &= ~ASPM_STATE_L1;
> l1_switch_latency += 1000;
>
> --
> 2.27.0
>
* [PATCH] Use maximum latency when determining L1 ASPM
@ 2020-07-27 21:30 Ian Kumlien
2020-07-29 22:27 ` Bjorn Helgaas
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-07-27 21:30 UTC (permalink / raw)
To: linux-pci; +Cc: Ian Kumlien
Currently we check the maximum latency of upstream and downstream
per link, not the maximum for the path.
This would work if all links had the same latency, but:
endpoint -> c -> b -> a -> root (in the order we walk the path)
If c or b has the highest latency, it will not be taken into account.
Fix this by maintaining the maximum latency value for the path.
This change fixes a regression introduced by:
66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges")
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
drivers/pci/pcie/aspm.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..bd53fba7f382 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
static void pcie_aspm_check_latency(struct pci_dev *endpoint)
{
- u32 latency, l1_switch_latency = 0;
+ u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
struct aspm_latency *acceptable;
struct pcie_link_state *link;
@@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
* substate latencies (and hence do not do any check).
*/
latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+ l1_max_latency = max_t(u32, latency, l1_max_latency);
if ((link->aspm_capable & ASPM_STATE_L1) &&
- (latency + l1_switch_latency > acceptable->l1))
+ (l1_max_latency + l1_switch_latency > acceptable->l1))
link->aspm_capable &= ~ASPM_STATE_L1;
l1_switch_latency += 1000;
--
2.27.0
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-07-27 21:30 Ian Kumlien
@ 2020-07-29 22:27 ` Bjorn Helgaas
2020-07-29 22:43 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2020-07-29 22:27 UTC (permalink / raw)
To: Ian Kumlien; +Cc: linux-pci, Kai-Heng Feng, Mika Westerberg
On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
>
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root (in the order we walk the path)
>
> If c or b has the highest latency, it will not register
>
> Fix this by maintaining the maximum latency value for the path
>
> This change fixes a regression introduced by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
Hi Ian,
Sorry about the regression, and thank you very much for doing the
hard work of debugging and fixing it!
My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM
to be enabled on a longer path, and we weren't computing the maximum
latency correctly, so ASPM on that longer path exceeded the amount we
could tolerate. If that's the case, 66ff14e59e8a probably just
exposed an existing problem that could occur in other topologies even
without 66ff14e59e8a.
I'd like to work through this latency code with concrete examples.
Can you collect the "sudo lspci -vv" output and attach it to an entry
at https://bugzilla.kernel.org? If it's convenient, it would be
really nice to compare it with similar output from before this patch.
Bjorn
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
> drivers/pci/pcie/aspm.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bd53fba7f382 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> {
> - u32 latency, l1_switch_latency = 0;
> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> struct aspm_latency *acceptable;
> struct pcie_link_state *link;
>
> @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> * substate latencies (and hence do not do any check).
> */
> latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> + l1_max_latency = max_t(u32, latency, l1_max_latency);
> if ((link->aspm_capable & ASPM_STATE_L1) &&
> - (latency + l1_switch_latency > acceptable->l1))
> + (l1_max_latency + l1_switch_latency > acceptable->l1))
> link->aspm_capable &= ~ASPM_STATE_L1;
> l1_switch_latency += 1000;
>
> --
> 2.27.0
>
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-07-29 22:27 ` Bjorn Helgaas
@ 2020-07-29 22:43 ` Ian Kumlien
0 siblings, 0 replies; 21+ messages in thread
From: Ian Kumlien @ 2020-07-29 22:43 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-pci, Kai-Heng Feng, Mika Westerberg
On Thu, Jul 30, 2020 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote:
> > Currently we check the maximum latency of upstream and downstream
> > per link, not the maximum for the path
> >
> > This would work if all links have the same latency, but:
> > endpoint -> c -> b -> a -> root (in the order we walk the path)
> >
> > If c or b has the highest latency, it will not register
> >
> > Fix this by maintaining the maximum latency value for the path
> >
> > This change fixes a regression introduced by:
> > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> Hi Ian,
>
> Sorry about the regression, and thank you very much for doing the
> hard work of debugging and fixing it!
>
> My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM
> to be enabled on a longer path, and we weren't computing the maximum
> latency correctly, so ASPM on that longer path exceeded the amount we
> could tolerate. If that's the case, 66ff14e59e8a probably just
> exposed an existing problem that could occur in other topologies even
> without 66ff14e59e8a.
I agree, this is why I didn't add a Fixes: tag, but it does fix a
regression - and it's hard to say exactly what - I'd like it to go
into stable when accepted...
But we can rewrite it any way you see fit :)
> I'd like to work through this latency code with concrete examples.
> Can you collect the "sudo lspci -vv" output and attach it to an entry
> at https://bugzilla.kernel.org? If it's convenient, it would be
> really nice to compare it with similar output from before this patch.
I can cut from the mails I had in the conversation with Alexander Duyck.
I submitted it against PCI; you can find it here:
https://bugzilla.kernel.org/show_bug.cgi?id=208741
Still filling in the data
> Bjorn
>
> > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > ---
> > drivers/pci/pcie/aspm.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b17e5ffd31b1..bd53fba7f382 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> > static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > {
> > - u32 latency, l1_switch_latency = 0;
> > + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> > struct aspm_latency *acceptable;
> > struct pcie_link_state *link;
> >
> > @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > * substate latencies (and hence do not do any check).
> > */
> > latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > + l1_max_latency = max_t(u32, latency, l1_max_latency);
> > if ((link->aspm_capable & ASPM_STATE_L1) &&
> > - (latency + l1_switch_latency > acceptable->l1))
> > + (l1_max_latency + l1_switch_latency > acceptable->l1))
> > link->aspm_capable &= ~ASPM_STATE_L1;
> > l1_switch_latency += 1000;
> >
> > --
> > 2.27.0
> >
* [PATCH] Use maximum latency when determining L1 ASPM
@ 2020-10-07 13:28 Ian Kumlien
2020-10-08 4:20 ` Kai-Heng Feng
2020-10-08 16:13 ` Bjorn Helgaas
0 siblings, 2 replies; 21+ messages in thread
From: Ian Kumlien @ 2020-10-07 13:28 UTC (permalink / raw)
To: helgaas, linux-pci, alexander.duyck, refactormyself, puranjay12,
kai.heng.feng
Cc: Ian Kumlien
Make pcie_aspm_check_latency comply with the PCIe spec, specifically
"5.4.1.2.2. Exit from the L1 State", which makes it clear that each
switch is required to initiate an exit transition within 1 μs of
receiving one. This latency accumulates per switch, and in addition
we have to wait for the slowest link along the path before entering
the L0 state from L1.
The current code doesn't take the maximum latency into account.
From the example:
+----------------+
| |
| Root complex |
| |
| +-----+ |
| |32 μs| |
+----------------+
|
| Link 1
|
+----------------+
| |8 μs| |
| +----+ |
| Switch A |
| +----+ |
| |8 μs| |
+----------------+
|
| Link 2
|
+----------------+
| |32 μs| |
| +-----+ |
| Switch B |
| +-----+ |
| |32 μs| |
+----------------+
|
| Link 3
|
+----------------+
| |8μs| |
| +---+ |
| Endpoint C |
| |
| |
+----------------+
Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
its ports, Link 3 will transition to L0 at T+32 (the longest time,
considering T+8 for endpoint C and T+32 for switch B).
Switch B is required to initiate a transition from the L1 state on its
upstream port no more than 1 μs after the beginning of the L1 exit
transition on its downstream port. Therefore, the transition from
L1 to L0 will begin on link 2 at T+1 and cascade up the path.
The path will exit L1 at T+34.
On my specific system:
lspci -PP -s 04:00.0
00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
lspci -vvv -s 04:00.0
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
...
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
...
Which means that the endpoint's own <64 μs exit latency already consumes
its entire <64 μs acceptable latency, so it can't tolerate any switch on
the path also being in L1.
This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
                                           LnkCtl    LnkCtl
         ------DevCap-------  ----LnkCap-------  -Before-  -After--
00:01.2                               L1 <32us      L1+       L1-
01:00.0                               L1 <32us      L1+       L1-
02:04.0                               L1 <32us      L1+       L1-
04:00.0  L0s <512  L1 <64us           L1 <64us      L1+       L1-
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..893b37669087 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
static void pcie_aspm_check_latency(struct pci_dev *endpoint)
{
- u32 latency, l1_switch_latency = 0;
+ u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
struct aspm_latency *acceptable;
struct pcie_link_state *link;
@@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
(link->latency_dw.l0s > acceptable->l0s))
link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+
/*
* Check L1 latency.
- * Every switch on the path to root complex need 1
- * more microsecond for L1. Spec doesn't mention L0s.
+ *
+ * PCIe r5.0, sec 5.4.1.2.2 states:
+ * A Switch is required to initiate an L1 exit transition on its
+ * Upstream Port Link after no more than 1 μs from the beginning of an
+ * L1 exit transition on any of its Downstream Port Links.
*
* The exit latencies for L1 substates are not advertised
* by a device. Since the spec also doesn't mention a way
@@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
* L1 exit latencies advertised by a device include L1
* substate latencies (and hence do not do any check).
*/
- latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
- if ((link->aspm_capable & ASPM_STATE_L1) &&
- (latency + l1_switch_latency > acceptable->l1))
- link->aspm_capable &= ~ASPM_STATE_L1;
- l1_switch_latency += 1000;
+ if (link->aspm_capable & ASPM_STATE_L1) {
+ latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+ l1_max_latency = max_t(u32, latency, l1_max_latency);
+ if (l1_max_latency + l1_switch_latency > acceptable->l1)
+ link->aspm_capable &= ~ASPM_STATE_L1;
+
+ l1_switch_latency += 1000;
+ }
link = link->parent;
}
--
2.28.0
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-07 13:28 Ian Kumlien
@ 2020-10-08 4:20 ` Kai-Heng Feng
2020-10-08 16:13 ` Bjorn Helgaas
1 sibling, 0 replies; 21+ messages in thread
From: Kai-Heng Feng @ 2020-10-08 4:20 UTC (permalink / raw)
To: Ian Kumlien
Cc: Bjorn Helgaas, open list:PCI SUBSYSTEM, Alexander Duyck,
refactormyself, puranjay12
> On Oct 7, 2020, at 21:28, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> "5.4.1.2.2. Exit from the L1 State"
>
> Which makes it clear that each switch is required to initiate a
> transition within 1μs from receiving it, accumulating this latency and
> then we have to wait for the slowest link along the path before
> entering L0 state from L1.
>
> The current code doesn't take the maximum latency into account.
>
> From the example:
> +----------------+
> | |
> | Root complex |
> | |
> | +-----+ |
> | |32 μs| |
> +----------------+
> |
> | Link 1
> |
> +----------------+
> | |8 μs| |
> | +----+ |
> | Switch A |
> | +----+ |
> | |8 μs| |
> +----------------+
> |
> | Link 2
> |
> +----------------+
> | |32 μs| |
> | +-----+ |
> | Switch B |
> | +-----+ |
> | |32 μs| |
> +----------------+
> |
> | Link 3
> |
> +----------------+
> | |8μs| |
> | +---+ |
> | Endpoint C |
> | |
> | |
> +----------------+
>
> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> its ports, Link 3 will transition to L0 at T+32 (longest time
> considering T+8 for endpoint C and T+32 for switch B).
>
> Switch B is required to initiate a transition from the L1 state on its
> upstream port after no more than 1 μs from the beginning of the
> transition from L1 state on the downstream port. Therefore, transition from
> L1 to L0 will begin on link 2 at T+1, this will cascade up the path.
>
> The path will exit L1 at T+34.
>
> On my specific system:
> lspci -PP -s 04:00.0
> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
>
> lspci -vvv -s 04:00.0
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ...
> LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> ...
>
> Which means that it can't be followed by any switch that is in L1 state.
>
> This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
>
> LnkCtl LnkCtl
> ------DevCap------- ----LnkCap------- -Before- -After--
> 00:01.2 L1 <32us L1+ L1-
> 01:00.0 L1 <32us L1+ L1-
> 02:04.0 L1 <32us L1+ L1-
> 04:00.0 L0s <512 L1 <64us L1 <64us L1+ L1-
>
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
This solves an issue where enabling ASPM on a Dell Latitude 5495 causes a system freeze.
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 253c30cc1967..893b37669087 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> {
> - u32 latency, l1_switch_latency = 0;
> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> struct aspm_latency *acceptable;
> struct pcie_link_state *link;
>
> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> (link->latency_dw.l0s > acceptable->l0s))
> link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +
> /*
> * Check L1 latency.
> - * Every switch on the path to root complex need 1
> - * more microsecond for L1. Spec doesn't mention L0s.
> + *
> + * PCIe r5.0, sec 5.4.1.2.2 states:
> + * A Switch is required to initiate an L1 exit transition on its
> + * Upstream Port Link after no more than 1 μs from the beginning of an
> + * L1 exit transition on any of its Downstream Port Links.
> *
> * The exit latencies for L1 substates are not advertised
> * by a device. Since the spec also doesn't mention a way
> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> * L1 exit latencies advertised by a device include L1
> * substate latencies (and hence do not do any check).
> */
> - latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> - if ((link->aspm_capable & ASPM_STATE_L1) &&
> - (latency + l1_switch_latency > acceptable->l1))
> - link->aspm_capable &= ~ASPM_STATE_L1;
> - l1_switch_latency += 1000;
> + if (link->aspm_capable & ASPM_STATE_L1) {
> + latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> + l1_max_latency = max_t(u32, latency, l1_max_latency);
> + if (l1_max_latency + l1_switch_latency > acceptable->l1)
> + link->aspm_capable &= ~ASPM_STATE_L1;
> +
> + l1_switch_latency += 1000;
> + }
>
> link = link->parent;
> }
> --
> 2.28.0
>
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-07 13:28 Ian Kumlien
2020-10-08 4:20 ` Kai-Heng Feng
@ 2020-10-08 16:13 ` Bjorn Helgaas
2020-10-12 10:20 ` Ian Kumlien
1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2020-10-08 16:13 UTC (permalink / raw)
To: Ian Kumlien
Cc: linux-pci, alexander.duyck, refactormyself, puranjay12, kai.heng.feng
On Wed, Oct 07, 2020 at 03:28:08PM +0200, Ian Kumlien wrote:
> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> "5.4.1.2.2. Exit from the L1 State"
>
> Which makes it clear that each switch is required to initiate a
> transition within 1μs from receiving it, accumulating this latency and
> then we have to wait for the slowest link along the path before
> entering L0 state from L1.
>
> The current code doesn't take the maximum latency into account.
>
> From the example:
> +----------------+
> | |
> | Root complex |
> | |
> | +-----+ |
> | |32 μs| |
> +----------------+
> |
> | Link 1
> |
> +----------------+
> | |8 μs| |
> | +----+ |
> | Switch A |
> | +----+ |
> | |8 μs| |
> +----------------+
> |
> | Link 2
> |
> +----------------+
> | |32 μs| |
> | +-----+ |
> | Switch B |
> | +-----+ |
> | |32 μs| |
> +----------------+
> |
> | Link 3
> |
> +----------------+
> | |8μs| |
> | +---+ |
> | Endpoint C |
> | |
> | |
> +----------------+
>
> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> > its ports, Link 3 will transition to L0 at T+32 (longest time
> considering T+8 for endpoint C and T+32 for switch B).
>
> > Switch B is required to initiate a transition from the L1 state on its
> upstream port after no more than 1 μs from the beginning of the
> transition from L1 state on the downstream port. Therefore, transition from
> L1 to L0 will begin on link 2 at T+1, this will cascade up the path.
>
> The path will exit L1 at T+34.
>
> On my specific system:
> lspci -PP -s 04:00.0
> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
>
> lspci -vvv -s 04:00.0
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ...
> LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> ...
>
> Which means that it can't be followed by any switch that is in L1 state.
>
> This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
>
> LnkCtl LnkCtl
> ------DevCap------- ----LnkCap------- -Before- -After--
> 00:01.2 L1 <32us L1+ L1-
> 01:00.0 L1 <32us L1+ L1-
> 02:04.0 L1 <32us L1+ L1-
> 04:00.0 L0s <512 L1 <64us L1 <64us L1+ L1-
OK, now we're getting close. We just need to flesh out the
justification. We need:
- Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
and follow the example.
- Description of the problem. I think it's poor bandwidth on your
Intel I211 device, but we don't have the complete picture because
that NIC is 03:00.0, which doesn't appear above at all.
- Explanation of what's wrong with the "before" ASPM configuration.
I want to identify what is wrong on your system. The generic
"doesn't match spec" part is good, but step 1 is the specific
details, step 2 is the generalization to relate it to the spec.
- Complete "sudo lspci -vv" information for before and after the
patch below. https://bugzilla.kernel.org/show_bug.cgi?id=208741
has some of this, but some of the lspci output appears to be
copy/pasted and lost all its formatting, and it's not clear how
some was collected (what kernel version, with/without patch, etc).
Since I'm asking for bugzilla attachments, there's no space
constraint, so just attach the complete unedited output for the
whole system.
- URL to the bugzilla. Please open a new one with just the relevant
problem report ("NIC is slow") and attach (1) "before" lspci
output, (2) proposed patch, (3) "after" lspci output. The
existing 208741 report is full of distractions and jumps to the
conclusion without actually starting with the details of the
problem.
Some of this I would normally just do myself, but I can't get the
lspci info. It would be really nice if Kai-Heng could also add
before/after lspci output from the system he tested.
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
> drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 253c30cc1967..893b37669087 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> {
> - u32 latency, l1_switch_latency = 0;
> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> struct aspm_latency *acceptable;
> struct pcie_link_state *link;
>
> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> (link->latency_dw.l0s > acceptable->l0s))
> link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +
> /*
> * Check L1 latency.
> - * Every switch on the path to root complex need 1
> - * more microsecond for L1. Spec doesn't mention L0s.
> + *
> + * PCIe r5.0, sec 5.4.1.2.2 states:
> + * A Switch is required to initiate an L1 exit transition on its
> + * Upstream Port Link after no more than 1 μs from the beginning of an
> + * L1 exit transition on any of its Downstream Port Links.
> *
> * The exit latencies for L1 substates are not advertised
> * by a device. Since the spec also doesn't mention a way
> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> * L1 exit latencies advertised by a device include L1
> * substate latencies (and hence do not do any check).
> */
> - latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> - if ((link->aspm_capable & ASPM_STATE_L1) &&
> - (latency + l1_switch_latency > acceptable->l1))
> - link->aspm_capable &= ~ASPM_STATE_L1;
> - l1_switch_latency += 1000;
> + if (link->aspm_capable & ASPM_STATE_L1) {
> + latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> + l1_max_latency = max_t(u32, latency, l1_max_latency);
> + if (l1_max_latency + l1_switch_latency > acceptable->l1)
> + link->aspm_capable &= ~ASPM_STATE_L1;
> +
> + l1_switch_latency += 1000;
> + }
>
> link = link->parent;
> }
> --
> 2.28.0
>
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-08 16:13 ` Bjorn Helgaas
@ 2020-10-12 10:20 ` Ian Kumlien
2020-10-14 8:34 ` Kai-Heng Feng
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-10-12 10:20 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, Alexander Duyck, Saheed O. Bolarinwa, Puranjay Mohan,
Kai-Heng Feng
On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Oct 07, 2020 at 03:28:08PM +0200, Ian Kumlien wrote:
> > Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> > "5.4.1.2.2. Exit from the L1 State"
> >
> > Which makes it clear that each switch is required to initiate a
> > transition within 1μs from receiving it, accumulating this latency and
> > then we have to wait for the slowest link along the path before
> > entering L0 state from L1.
> >
> > The current code doesn't take the maximum latency into account.
> >
> > From the example:
> > +----------------+
> > | |
> > | Root complex |
> > | |
> > | +-----+ |
> > | |32 μs| |
> > +----------------+
> > |
> > | Link 1
> > |
> > +----------------+
> > | |8 μs| |
> > | +----+ |
> > | Switch A |
> > | +----+ |
> > | |8 μs| |
> > +----------------+
> > |
> > | Link 2
> > |
> > +----------------+
> > | |32 μs| |
> > | +-----+ |
> > | Switch B |
> > | +-----+ |
> > | |32 μs| |
> > +----------------+
> > |
> > | Link 3
> > |
> > +----------------+
> > | |8μs| |
> > | +---+ |
> > | Endpoint C |
> > | |
> > | |
> > +----------------+
> >
> > Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> > transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> > > its ports, Link 3 will transition to L0 at T+32 (longest time
> > considering T+8 for endpoint C and T+32 for switch B).
> >
> > > Switch B is required to initiate a transition from the L1 state on its
> > upstream port after no more than 1 μs from the beginning of the
> > transition from L1 state on the downstream port. Therefore, transition from
> > L1 to L0 will begin on link 2 at T+1, this will cascade up the path.
> >
> > The path will exit L1 at T+34.
> >
> > On my specific system:
> > lspci -PP -s 04:00.0
> > 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
> >
> > lspci -vvv -s 04:00.0
> > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > ...
> > LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> > ...
> >
> > Which means that it can't be followed by any switch that is in L1 state.
> >
> > This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
> >
> > LnkCtl LnkCtl
> > ------DevCap------- ----LnkCap------- -Before- -After--
> > 00:01.2 L1 <32us L1+ L1-
> > 01:00.0 L1 <32us L1+ L1-
> > 02:04.0 L1 <32us L1+ L1-
> > 04:00.0 L0s <512 L1 <64us L1 <64us L1+ L1-
>
> OK, now we're getting close. We just need to flesh out the
> justification. We need:
>
> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
> and follow the example.
Will do
> - Description of the problem. I think it's poor bandwidth on your
> Intel I211 device, but we don't have the complete picture because
> that NIC is 03:00.0, which doesn't appear above at all.
I think we'll use Kai-Heng's issue, since it's actually more related to
the change itself...
Mine is a side effect, while Kai-Heng is actually hitting an issue
caused by the bug.
> - Explanation of what's wrong with the "before" ASPM configuration.
> I want to identify what is wrong on your system. The generic
> "doesn't match spec" part is good, but step 1 is the specific
> details, step 2 is the generalization to relate it to the spec.
>
> - Complete "sudo lspci -vv" information for before and after the
> patch below. https://bugzilla.kernel.org/show_bug.cgi?id=208741
> has some of this, but some of the lspci output appears to be
> copy/pasted and lost all its formatting, and it's not clear how
> some was collected (what kernel version, with/without patch, etc).
> Since I'm asking for bugzilla attachments, there's no space
> constraint, so just attach the complete unedited output for the
> whole system.
>
> - URL to the bugzilla. Please open a new one with just the relevant
> problem report ("NIC is slow") and attach (1) "before" lspci
> output, (2) proposed patch, (3) "after" lspci output. The
> existing 208741 report is full of distractions and jumps to the
> conclusion without actually starting with the details of the
> problem.
>
> Some of this I would normally just do myself, but I can't get the
> lspci info. It would be really nice if Kai-Heng could also add
> before/after lspci output from the system he tested.
>
> > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > ---
> > drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
> > 1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 253c30cc1967..893b37669087 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> > static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > {
> > - u32 latency, l1_switch_latency = 0;
> > + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> > struct aspm_latency *acceptable;
> > struct pcie_link_state *link;
> >
> > @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > (link->latency_dw.l0s > acceptable->l0s))
> > link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +
> > /*
> > * Check L1 latency.
> > - * Every switch on the path to root complex need 1
> > - * more microsecond for L1. Spec doesn't mention L0s.
> > + *
> > + * PCIe r5.0, sec 5.4.1.2.2 states:
> > + * A Switch is required to initiate an L1 exit transition on its
> > + * Upstream Port Link after no more than 1 μs from the beginning of an
> > + * L1 exit transition on any of its Downstream Port Links.
> > *
> > * The exit latencies for L1 substates are not advertised
> > * by a device. Since the spec also doesn't mention a way
> > @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > * L1 exit latencies advertised by a device include L1
> > * substate latencies (and hence do not do any check).
> > */
> > - latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > - if ((link->aspm_capable & ASPM_STATE_L1) &&
> > - (latency + l1_switch_latency > acceptable->l1))
> > - link->aspm_capable &= ~ASPM_STATE_L1;
> > - l1_switch_latency += 1000;
> > + if (link->aspm_capable & ASPM_STATE_L1) {
> > + latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > + l1_max_latency = max_t(u32, latency, l1_max_latency);
> > + if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > + link->aspm_capable &= ~ASPM_STATE_L1;
> > +
> > + l1_switch_latency += 1000;
> > + }
> >
> > link = link->parent;
> > }
> > --
> > 2.28.0
> >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-12 10:20 ` Ian Kumlien
@ 2020-10-14 8:34 ` Kai-Heng Feng
2020-10-14 13:33 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Kai-Heng Feng @ 2020-10-14 8:34 UTC (permalink / raw)
To: Ian Kumlien
Cc: Bjorn Helgaas, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
> On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>> On Wed, Oct 07, 2020 at 03:28:08PM +0200, Ian Kumlien wrote:
>>> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
>>> "5.4.1.2.2. Exit from the L1 State"
>>>
>>> Which makes it clear that each switch is required to initiate a
>>> transition within 1μs from receiving it, accumulating this latency and
>>> then we have to wait for the slowest link along the path before
>>> entering L0 state from L1.
>>>
>>> The current code doesn't take the maximum latency into account.
>>>
>>> From the example:
>>> +----------------+
>>> | |
>>> | Root complex |
>>> | |
>>> | +-----+ |
>>> | |32 μs| |
>>> +----------------+
>>> |
>>> | Link 1
>>> |
>>> +----------------+
>>> | |8 μs| |
>>> | +----+ |
>>> | Switch A |
>>> | +----+ |
>>> | |8 μs| |
>>> +----------------+
>>> |
>>> | Link 2
>>> |
>>> +----------------+
>>> | |32 μs| |
>>> | +-----+ |
>>> | Switch B |
>>> | +-----+ |
>>> | |32 μs| |
>>> +----------------+
>>> |
>>> | Link 3
>>> |
>>> +----------------+
>>> | |8μs| |
>>> | +---+ |
>>> | Endpoint C |
>>> | |
>>> | |
>>> +----------------+
>>>
>>> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
>>> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
>>> its ports, Link 3 will transition to L0 at T+32 (the longest time,
>>> considering T+8 for endpoint C and T+32 for switch B).
>>>
>>> Switch B is required to initiate a transition from the L1 state on its
>>> upstream port after no more than 1 μs from the beginning of the
>>> transition from L1 state on the downstream port. Therefore, the transition from
>>> L1 to L0 will begin on link 2 at T+1; this will cascade up the path.
>>>
>>> The path will exit L1 at T+34.
>>>
>>> On my specific system:
>>> lspci -PP -s 04:00.0
>>> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
>>>
>>> lspci -vvv -s 04:00.0
>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>> ...
>>> LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
>>> ...
>>>
>>> Which means that it can't be followed by any switch that is in L1 state.
>>>
>>> This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
>>>
>>> LnkCtl LnkCtl
>>> ------DevCap------- ----LnkCap------- -Before- -After--
>>> 00:01.2 L1 <32us L1+ L1-
>>> 01:00.0 L1 <32us L1+ L1-
>>> 02:04.0 L1 <32us L1+ L1-
>>> 04:00.0 L0s <512 L1 <64us L1 <64us L1+ L1-
>>
>> OK, now we're getting close. We just need to flesh out the
>> justification. We need:
>>
>> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
>> and follow the example.
>
> Will do
>
>> - Description of the problem. I think it's poor bandwidth on your
>> Intel I211 device, but we don't have the complete picture because
>> that NIC is 03:00.0, which doesn't appear above at all.
>
> I think we'll use Kai-Heng's issue, since it's actually more related to
> the change itself...
>
> Mine is a side effect while Kai-Heng is actually hitting an issue
> caused by the bug.
I filed a bug here:
https://bugzilla.kernel.org/show_bug.cgi?id=209671
Kai-Heng
>
>> - Explanation of what's wrong with the "before" ASPM configuration.
>> I want to identify what is wrong on your system. The generic
>> "doesn't match spec" part is good, but step 1 is the specific
>> details, step 2 is the generalization to relate it to the spec.
>>
>> - Complete "sudo lspci -vv" information for before and after the
>> patch below. https://bugzilla.kernel.org/show_bug.cgi?id=208741
>> has some of this, but some of the lspci output appears to be
>> copy/pasted and lost all its formatting, and it's not clear how
>> some was collected (what kernel version, with/without patch, etc).
>> Since I'm asking for bugzilla attachments, there's no space
>> constraint, so just attach the complete unedited output for the
>> whole system.
>>
>> - URL to the bugzilla. Please open a new one with just the relevant
>> problem report ("NIC is slow") and attach (1) "before" lspci
>> output, (2) proposed patch, (3) "after" lspci output. The
>> existing 208741 report is full of distractions and jumps to the
>> conclusion without actually starting with the details of the
>> problem.
>>
>> Some of this I would normally just do myself, but I can't get the
>> lspci info. It would be really nice if Kai-Heng could also add
>> before/after lspci output from the system he tested.
>>
>>> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
>>> ---
>>> drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
>>> 1 file changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>> index 253c30cc1967..893b37669087 100644
>>> --- a/drivers/pci/pcie/aspm.c
>>> +++ b/drivers/pci/pcie/aspm.c
>>> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>>>
>>> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>>> {
>>> - u32 latency, l1_switch_latency = 0;
>>> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
>>> struct aspm_latency *acceptable;
>>> struct pcie_link_state *link;
>>>
>>> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>>> if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
>>> (link->latency_dw.l0s > acceptable->l0s))
>>> link->aspm_capable &= ~ASPM_STATE_L0S_DW;
>>> +
>>> /*
>>> * Check L1 latency.
>>> - * Every switch on the path to root complex need 1
>>> - * more microsecond for L1. Spec doesn't mention L0s.
>>> + *
>>> + * PCIe r5.0, sec 5.4.1.2.2 states:
>>> + * A Switch is required to initiate an L1 exit transition on its
>>> + * Upstream Port Link after no more than 1 μs from the beginning of an
>>> + * L1 exit transition on any of its Downstream Port Links.
>>> *
>>> * The exit latencies for L1 substates are not advertised
>>> * by a device. Since the spec also doesn't mention a way
>>> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>>> * L1 exit latencies advertised by a device include L1
>>> * substate latencies (and hence do not do any check).
>>> */
>>> - latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
>>> - if ((link->aspm_capable & ASPM_STATE_L1) &&
>>> - (latency + l1_switch_latency > acceptable->l1))
>>> - link->aspm_capable &= ~ASPM_STATE_L1;
>>> - l1_switch_latency += 1000;
>>> + if (link->aspm_capable & ASPM_STATE_L1) {
>>> + latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
>>> + l1_max_latency = max_t(u32, latency, l1_max_latency);
>>> + if (l1_max_latency + l1_switch_latency > acceptable->l1)
>>> + link->aspm_capable &= ~ASPM_STATE_L1;
>>> +
>>> + l1_switch_latency += 1000;
>>> + }
>>>
>>> link = link->parent;
>>> }
>>> --
>>> 2.28.0
>>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-14 8:34 ` Kai-Heng Feng
@ 2020-10-14 13:33 ` Ian Kumlien
2020-10-14 14:36 ` Bjorn Helgaas
2020-10-16 21:28 ` Bjorn Helgaas
0 siblings, 2 replies; 21+ messages in thread
From: Ian Kumlien @ 2020-10-14 13:33 UTC (permalink / raw)
To: Kai-Heng Feng
Cc: Bjorn Helgaas, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Wed, Oct 14, 2020 at 10:34 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
>
>
> > On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>
> >> On Wed, Oct 07, 2020 at 03:28:08PM +0200, Ian Kumlien wrote:
> >>> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> >>> "5.4.1.2.2. Exit from the L1 State"
> >>>
> >>> Which makes it clear that each switch is required to initiate a
> >>> transition within 1μs from receiving it, accumulating this latency and
> >>> then we have to wait for the slowest link along the path before
> >>> entering L0 state from L1.
> >>>
> >>> The current code doesn't take the maximum latency into account.
> >>>
> >>> From the example:
> >>> +----------------+
> >>> | |
> >>> | Root complex |
> >>> | |
> >>> | +-----+ |
> >>> | |32 μs| |
> >>> +----------------+
> >>> |
> >>> | Link 1
> >>> |
> >>> +----------------+
> >>> | |8 μs| |
> >>> | +----+ |
> >>> | Switch A |
> >>> | +----+ |
> >>> | |8 μs| |
> >>> +----------------+
> >>> |
> >>> | Link 2
> >>> |
> >>> +----------------+
> >>> | |32 μs| |
> >>> | +-----+ |
> >>> | Switch B |
> >>> | +-----+ |
> >>> | |32 μs| |
> >>> +----------------+
> >>> |
> >>> | Link 3
> >>> |
> >>> +----------------+
> >>> | |8μs| |
> >>> | +---+ |
> >>> | Endpoint C |
> >>> | |
> >>> | |
> >>> +----------------+
> >>>
> >>> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> >>> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> >>> its ports, Link 3 will transition to L0 at T+32 (the longest time,
> >>> considering T+8 for endpoint C and T+32 for switch B).
> >>>
> >>> Switch B is required to initiate a transition from the L1 state on its
> >>> upstream port after no more than 1 μs from the beginning of the
> >>> transition from L1 state on the downstream port. Therefore, the transition from
> >>> L1 to L0 will begin on link 2 at T+1; this will cascade up the path.
> >>>
> >>> The path will exit L1 at T+34.
> >>>
> >>> On my specific system:
> >>> lspci -PP -s 04:00.0
> >>> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
> >>>
> >>> lspci -vvv -s 04:00.0
> >>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> >>> ...
> >>> LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> >>> ...
> >>>
> >>> Which means that it can't be followed by any switch that is in L1 state.
> >>>
> >>> This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
> >>>
> >>> LnkCtl LnkCtl
> >>> ------DevCap------- ----LnkCap------- -Before- -After--
> >>> 00:01.2 L1 <32us L1+ L1-
> >>> 01:00.0 L1 <32us L1+ L1-
> >>> 02:04.0 L1 <32us L1+ L1-
> >>> 04:00.0 L0s <512 L1 <64us L1 <64us L1+ L1-
> >>
> >> OK, now we're getting close. We just need to flesh out the
> >> justification. We need:
> >>
> >> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
> >> and follow the example.
> >
> > Will do
> >
> >> - Description of the problem. I think it's poor bandwidth on your
> >> Intel I211 device, but we don't have the complete picture because
> >> that NIC is 03:00.0, which doesn't appear above at all.
> >
> > I think we'll use Kai-Heng's issue, since it's actually more related to
> > the change itself...
> >
> > Mine is a side effect while Kai-Heng is actually hitting an issue
> > caused by the bug.
>
> I filed a bug here:
> https://bugzilla.kernel.org/show_bug.cgi?id=209671
Thanks!
I'm actually starting to think that reporting what we do with the
latency could be beneficial - i.e. report which links have their L1
disabled due to which device...
I also think that this could benefit debugging - I have no clue how to
read the lspci output - I mean I do see some differences that might be
the fix, but nothing really specific without a proper message in
dmesg...
Björn, what do you think?
> Kai-Heng
>
> >
> >> - Explanation of what's wrong with the "before" ASPM configuration.
> >> I want to identify what is wrong on your system. The generic
> >> "doesn't match spec" part is good, but step 1 is the specific
> >> details, step 2 is the generalization to relate it to the spec.
> >>
> >> - Complete "sudo lspci -vv" information for before and after the
> >> patch below. https://bugzilla.kernel.org/show_bug.cgi?id=208741
> >> has some of this, but some of the lspci output appears to be
> >> copy/pasted and lost all its formatting, and it's not clear how
> >> some was collected (what kernel version, with/without patch, etc).
> >> Since I'm asking for bugzilla attachments, there's no space
> >> constraint, so just attach the complete unedited output for the
> >> whole system.
> >>
> >> - URL to the bugzilla. Please open a new one with just the relevant
> >> problem report ("NIC is slow") and attach (1) "before" lspci
> >> output, (2) proposed patch, (3) "after" lspci output. The
> >> existing 208741 report is full of distractions and jumps to the
> >> conclusion without actually starting with the details of the
> >> problem.
> >>
> >> Some of this I would normally just do myself, but I can't get the
> >> lspci info. It would be really nice if Kai-Heng could also add
> >> before/after lspci output from the system he tested.
> >>
> >>> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> >>> ---
> >>> drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
> >>> 1 file changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>> index 253c30cc1967..893b37669087 100644
> >>> --- a/drivers/pci/pcie/aspm.c
> >>> +++ b/drivers/pci/pcie/aspm.c
> >>> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >>>
> >>> static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>> {
> >>> - u32 latency, l1_switch_latency = 0;
> >>> + u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> >>> struct aspm_latency *acceptable;
> >>> struct pcie_link_state *link;
> >>>
> >>> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>> if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> >>> (link->latency_dw.l0s > acceptable->l0s))
> >>> link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> >>> +
> >>> /*
> >>> * Check L1 latency.
> >>> - * Every switch on the path to root complex need 1
> >>> - * more microsecond for L1. Spec doesn't mention L0s.
> >>> + *
> >>> + * PCIe r5.0, sec 5.4.1.2.2 states:
> >>> + * A Switch is required to initiate an L1 exit transition on its
> >>> + * Upstream Port Link after no more than 1 μs from the beginning of an
> >>> + * L1 exit transition on any of its Downstream Port Links.
> >>> *
> >>> * The exit latencies for L1 substates are not advertised
> >>> * by a device. Since the spec also doesn't mention a way
> >>> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>> * L1 exit latencies advertised by a device include L1
> >>> * substate latencies (and hence do not do any check).
> >>> */
> >>> - latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> >>> - if ((link->aspm_capable & ASPM_STATE_L1) &&
> >>> - (latency + l1_switch_latency > acceptable->l1))
> >>> - link->aspm_capable &= ~ASPM_STATE_L1;
> >>> - l1_switch_latency += 1000;
> >>> + if (link->aspm_capable & ASPM_STATE_L1) {
> >>> + latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> >>> + l1_max_latency = max_t(u32, latency, l1_max_latency);
> >>> + if (l1_max_latency + l1_switch_latency > acceptable->l1)
> >>> + link->aspm_capable &= ~ASPM_STATE_L1;
> >>> +
> >>> + l1_switch_latency += 1000;
> >>> + }
> >>>
> >>> link = link->parent;
> >>> }
> >>> --
> >>> 2.28.0
> >>>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-14 13:33 ` Ian Kumlien
@ 2020-10-14 14:36 ` Bjorn Helgaas
2020-10-14 15:39 ` Ian Kumlien
2020-10-16 21:28 ` Bjorn Helgaas
1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2020-10-14 14:36 UTC (permalink / raw)
To: Ian Kumlien
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
> I'm actually starting to think that reporting what we do with the
> latency bit could
> be beneficial - i.e. report which links have their L1 disabled due to
> which device...
>
> I also think that this could benefit debugging - I have no clue of how
> to read the lspci:s - I mean i do see some differences that might be
> the fix but nothing really specific without a proper message in
> dmesg....
Yeah, might be worth doing. Probably pci_dbg() since I think it would
be too chatty to be enabled all the time.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-14 14:36 ` Bjorn Helgaas
@ 2020-10-14 15:39 ` Ian Kumlien
2020-10-16 14:53 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-10-14 15:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Wed, Oct 14, 2020 at 4:36 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
>
> > I'm actually starting to think that reporting what we do with the
> > latency could be beneficial - i.e. report which links have their L1
> > disabled due to which device...
> >
> > I also think that this could benefit debugging - I have no clue how
> > to read the lspci output - I mean I do see some differences that might be
> > the fix, but nothing really specific without a proper message in
> > dmesg...
>
> Yeah, might be worth doing. Probably pci_dbg() since I think it would
> be too chatty to be enabled all the time.
OK, will look in to it =)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-14 15:39 ` Ian Kumlien
@ 2020-10-16 14:53 ` Ian Kumlien
0 siblings, 0 replies; 21+ messages in thread
From: Ian Kumlien @ 2020-10-16 14:53 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Wed, Oct 14, 2020 at 5:39 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 4:36 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
> >
> > > I'm actually starting to think that reporting what we do with the
> > > latency could be beneficial - i.e. report which links have their L1
> > > disabled due to which device...
> > >
> > > I also think that this could benefit debugging - I have no clue how
> > > to read the lspci output - I mean I do see some differences that might be
> > > the fix, but nothing really specific without a proper message in
> > > dmesg...
> >
> > Yeah, might be worth doing. Probably pci_dbg() since I think it would
> > be too chatty to be enabled all the time.
>
> OK, will look in to it =)
Just did some very basic hack, since I think it's much better to get
the information in dmesg than to have to boot with full debug.
I assume that there is a niftier way to do it - but I wanted some
feedback, basically...
My current output is:
dmesg |grep latency
[ 0.817872] pci 0000:04:00.0: ASPM latency exceeded, disabling: L1:0000:01:00.0-0000:00:01.2
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..5a5146f47aae 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
static void pcie_aspm_check_latency(struct pci_dev *endpoint)
{
- u32 latency, l1_switch_latency = 0;
+ u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+ bool aspm_disable = 0;
struct aspm_latency *acceptable;
struct pcie_link_state *link;
@@ -446,6 +447,16 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
link = endpoint->bus->self->link_state;
acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
+#define aspm_info(device, type, down, up) \
+ if (!aspm_disable) \
+ { \
+ pr_cont("pci %s: ASPM latency exceeded, disabling: %s:%s-%s", \
+ pci_name(device), type, pci_name(down), pci_name(up)); \
+ aspm_disable = 1; \
+ } \
+ else \
+ pr_cont(", %s:%s-%s", type, pci_name(down), pci_name(up));
+
while (link) {
/* Check upstream direction L0s latency */
if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
@@ -456,10 +467,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
(link->latency_dw.l0s > acceptable->l0s))
link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+
/*
* Check L1 latency.
- * Every switch on the path to root complex need 1
- * more microsecond for L1. Spec doesn't mention L0s.
+ *
+ * PCIe r5.0, sec 5.4.1.2.2 states:
+ * A Switch is required to initiate an L1 exit transition on its
+ * Upstream Port Link after no more than 1 μs from the beginning of an
+ * L1 exit transition on any of its Downstream Port Links.
*
* The exit latencies for L1 substates are not advertised
* by a device. Since the spec also doesn't mention a way
@@ -469,14 +484,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
* L1 exit latencies advertised by a device include L1
* substate latencies (and hence do not do any check).
*/
- latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
- if ((link->aspm_capable & ASPM_STATE_L1) &&
- (latency + l1_switch_latency > acceptable->l1))
- link->aspm_capable &= ~ASPM_STATE_L1;
- l1_switch_latency += 1000;
+ if (link->aspm_capable & ASPM_STATE_L1) {
+ latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+ l1_max_latency = max_t(u32, latency, l1_max_latency);
+ if (l1_max_latency + l1_switch_latency > acceptable->l1)
+ {
+ aspm_info(endpoint, "L1",
link->downstream, link->pdev);
+ link->aspm_capable &= ~ASPM_STATE_L1;
+ }
+ l1_switch_latency += 1000;
+ }
link = link->parent;
}
+ if (aspm_disable)
+ pr_cont("\n");
+#undef aspm_info
}
for L1 and L0s, which we haven't discussed that much yet:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 5a5146f47aae..b01d393e742d 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
static void pcie_aspm_check_latency(struct pci_dev *endpoint)
{
- u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+ u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+ l0s_latency_up = 0, l0s_latency_dw = 0;
bool aspm_disable = 0;
struct aspm_latency *acceptable;
struct pcie_link_state *link;
@@ -459,14 +460,24 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
while (link) {
/* Check upstream direction L0s latency */
- if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
- (link->latency_up.l0s > acceptable->l0s))
- link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+ if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+ l0s_latency_up += link->latency_up.l0s;
+ if (l0s_latency_up > acceptable->l0s)
+ {
+ aspm_info(endpoint, "L0s-up",
link->downstream, link->pdev);
+ link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+ }
+ }
/* Check downstream direction L0s latency */
- if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
- (link->latency_dw.l0s > acceptable->l0s))
- link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+ if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+ l0s_latency_dw += link->latency_dw.l0s;
+ if (l0s_latency_dw > acceptable->l0s)
+ {
+ aspm_info(endpoint, "L0s-dw",
link->downstream, link->pdev);
+ link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+ }
+ }
/*
* Check L1 latency.
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-14 13:33 ` Ian Kumlien
2020-10-14 14:36 ` Bjorn Helgaas
@ 2020-10-16 21:28 ` Bjorn Helgaas
2020-10-16 22:41 ` Ian Kumlien
1 sibling, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2020-10-16 21:28 UTC (permalink / raw)
To: Ian Kumlien
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
> On Wed, Oct 14, 2020 at 10:34 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
> > > On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >> OK, now we're getting close. We just need to flesh out the
> > >> justification. We need:
> > >>
> > >> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
> > >> and follow the example.
> > >
> > > Will do
> > >
> > >> - Description of the problem. I think it's poor bandwidth on your
> > >> Intel I211 device, but we don't have the complete picture because
> > >> that NIC is 03:00.0, which doesn't appear above at all.
> > >
> > > I think we'll use Kai-Heng's issue, since it's actually more related to
> > > the change itself...
> > >
> > > Mine is a side effect while Kai-Heng is actually hitting an issue
> > > caused by the bug.
> >
> > I filed a bug here:
> > https://bugzilla.kernel.org/show_bug.cgi?id=209671
>
> Thanks!
Sigh. I feel like I'm just not getting anywhere here. I still do not
have a "before" and "after" set of lspci output.
Kai-Heng's bugzilla has two sets of output, but one is a working
config with CONFIG_PCIEASPM_DEFAULT=y and the other is a working
config with Ian's patch applied and CONFIG_PCIEASPM_POWERSAVE=y.
Comparing them doesn't show the effect of Ian's patch; it shows the
combined effect of Ian's patch and the CONFIG_PCIEASPM_POWERSAVE=y
change. I'm not really interested in spending a few hours trying to
reverse-engineer the important changes.
Can you please, please, collect these on your system, Ian? I assume
that you can easily collect it once without your patch, when you see
poor I211 NIC performance but the system is otherwise working. And
you can collect it again *with* your patch. Same Kconfig, same
*everything* except adding your patch.
Bjorn
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-16 21:28 ` Bjorn Helgaas
@ 2020-10-16 22:41 ` Ian Kumlien
2020-10-18 11:35 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-10-16 22:41 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Fri, Oct 16, 2020 at 11:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
> > On Wed, Oct 14, 2020 at 10:34 AM Kai-Heng Feng
> > <kai.heng.feng@canonical.com> wrote:
> > > > On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > > >> OK, now we're getting close. We just need to flesh out the
> > > >> justification. We need:
> > > >>
> > > >> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
> > > >> and follow the example.
> > > >
> > > > Will do
> > > >
> > > >> - Description of the problem. I think it's poor bandwidth on your
> > > >> Intel I211 device, but we don't have the complete picture because
> > > >> that NIC is 03:00.0, which doesn't appear above at all.
> > > >
> > > > I think we'll use Kai-Heng's issue, since it's actually more related to
> > > > the change itself...
> > > >
> > > > Mine is a side effect while Kai-Heng is actually hitting an issue
> > > > caused by the bug.
> > >
> > > I filed a bug here:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=209671
> >
> > Thanks!
>
> Sigh. I feel like I'm just not getting anywhere here. I still do not
> have a "before" and "after" set of lspci output.
>
> Kai-Heng's bugzilla has two sets of output, but one is a working
> config with CONFIG_PCIEASPM_DEFAULT=y and the other is a working
> config with Ian's patch applied and CONFIG_PCIEASPM_POWERSAVE=y.
>
> Comparing them doesn't show the effect of Ian's patch; it shows the
> combined effect of Ian's patch and the CONFIG_PCIEASPM_POWERSAVE=y
> change. I'm not really interested in spending a few hours trying to
> reverse-engineer the important changes.
>
> Can you please, please, collect these on your system, Ian? I assume
> that you can easily collect it once without your patch, when you see
> poor I211 NIC performance but the system is otherwise working. And
> you can collect it again *with* your patch. Same Kconfig, same
> *everything* except adding your patch.
Yeah, I can do that, but I would also like the output from the latest
patch suggestion running on Kai-Heng's system, so we can actually see
what it does...
> Bjorn
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-16 22:41 ` Ian Kumlien
@ 2020-10-18 11:35 ` Ian Kumlien
2020-10-22 15:37 ` Bjorn Helgaas
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-10-18 11:35 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Sat, Oct 17, 2020 at 12:41 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Fri, Oct 16, 2020 at 11:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Oct 14, 2020 at 03:33:17PM +0200, Ian Kumlien wrote:
> > > On Wed, Oct 14, 2020 at 10:34 AM Kai-Heng Feng
> > > <kai.heng.feng@canonical.com> wrote:
> > > > > On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > > >> OK, now we're getting close. We just need to flesh out the
> > > > >> justification. We need:
> > > > >>
> > > > >> - Tidy subject line. Use "git log --oneline drivers/pci/pcie/aspm.c"
> > > > >> and follow the example.
> > > > >
> > > > > Will do
> > > > >
> > > > >> - Description of the problem. I think it's poor bandwidth on your
> > > > >> Intel I211 device, but we don't have the complete picture because
> > > > >> that NIC is 03:00.0, which doesn't appear above at all.
> > > > >
> > > > > I think we'll use Kai-Heng's issue, since it's actually more related to
> > > > > the change itself...
> > > > >
> > > > > Mine is a side effect while Kai-Heng is actually hitting an issue
> > > > > caused by the bug.
> > > >
> > > > I filed a bug here:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=209671
> > >
> > > Thanks!
> >
> > Sigh. I feel like I'm just not getting anywhere here. I still do not
> > have a "before" and "after" set of lspci output.
> >
> > Kai-Heng's bugzilla has two sets of output, but one is a working
> > config with CONFIG_PCIEASPM_DEFAULT=y and the other is a working
> > config with Ian's patch applied and CONFIG_PCIEASPM_POWERSAVE=y.
> >
> > Comparing them doesn't show the effect of Ian's patch; it shows the
> > combined effect of Ian's patch and the CONFIG_PCIEASPM_POWERSAVE=y
> > change. I'm not really interested in spending a few hours trying to
> > reverse-engineer the important changes.
> >
> > Can you please, please, collect these on your system, Ian? I assume
> > that you can easily collect it once without your patch, when you see
> > poor I211 NIC performance but the system is otherwise working. And
> > you can collect it again *with* your patch. Same Kconfig, same
> > *everything* except adding your patch.
>
> Yeah, I can do that, but I'd first like the output of the changes from
> the latest patch suggestion running on Kai-Heng's system, so we can
> actually see what it does...
Is:
https://bugzilla.kernel.org/show_bug.cgi?id=209725
More to your liking?
> > Bjorn
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-18 11:35 ` Ian Kumlien
@ 2020-10-22 15:37 ` Bjorn Helgaas
2020-10-22 15:41 ` Ian Kumlien
0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2020-10-22 15:37 UTC (permalink / raw)
To: Ian Kumlien
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Sun, Oct 18, 2020 at 01:35:27PM +0200, Ian Kumlien wrote:
> On Sat, Oct 17, 2020 at 12:41 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Fri, Oct 16, 2020 at 11:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > Can you please, please, collect these on your system, Ian? I assume
> > > that you can easily collect it once without your patch, when you see
> > > poor I211 NIC performance but the system is otherwise working. And
> > > you can collect it again *with* your patch. Same Kconfig, same
> > > *everything* except adding your patch.
> >
> > Yeah, I can do that, but I'd first like the output of the changes from
> > the latest patch suggestion running on Kai-Heng's system, so we can
> > actually see what it does...
>
> Is:
> https://bugzilla.kernel.org/show_bug.cgi?id=209725
That's a great start. Can you attach the patch to the bugzilla too,
please, so it is self-contained?
And also the analysis of the path from Root Port to Endpoint, with the
exit latencies of each link, the acceptable latency of the endpoint
and
(1) the computation done by the existing code that results in
"latency < acceptable", which means we can enable ASPM, and
(2) the correct computation per spec that results in
"latency > acceptable", so we cannot enable ASPM?
This analysis will be the core of the commit log, and the bugzilla
with lspci info is the supporting evidence.
Bjorn
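[Editorial note: the difference between the two computations described above can be illustrated with a minimal Python sketch. This is not the kernel code; the latency values are made-up example numbers (in microseconds) for a hypothetical root -> switch -> endpoint path, and the 1 us-per-switch adjustment follows the PCIe spec's L1 exit latency rule.]

```python
def existing_check(links, acceptable):
    """Per-hop check, as in the pre-patch code: each link's own L1 exit
    latency, plus the accumulated per-switch delay, is compared against
    the endpoint's acceptable latency individually."""
    l1_switch_latency = 0
    for latency in links:  # walk from the endpoint toward the root port
        if latency + l1_switch_latency > acceptable:
            return False   # over budget: L1 would be disabled
        l1_switch_latency += 1  # each intermediate switch adds ~1 us
    return True            # every hop passed: L1 stays enabled

def spec_check(links, acceptable):
    """Per-spec check: the *maximum* exit latency seen anywhere along the
    path, plus 1 us per intermediate switch, must fit in the budget."""
    l1_max_latency = 0
    l1_switch_latency = 0
    for latency in links:
        l1_max_latency = max(l1_max_latency, latency)
        if l1_max_latency + l1_switch_latency > acceptable:
            return False   # worst link no longer fits: disable L1
        l1_switch_latency += 1
    return True

# Example: the endpoint-side link is slow (32 us) and the upstream link
# is fast (8 us); the endpoint tolerates at most 32 us.  The per-hop
# check passes (32+0 <= 32, then 8+1 <= 32), but per spec the slow
# link's 32 us plus the 1 us switch delay exceeds the budget.
links = [32, 8]   # endpoint-side link first, then the upstream link
acceptable = 32
print(existing_check(links, acceptable))  # True  (L1 wrongly enabled)
print(spec_check(links, acceptable))      # False (L1 must stay off)
```

This is the effect of the one-line `l1_max_latency` change in the patch: the comparison carries the worst latency forward instead of resetting at each hop.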
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-22 15:37 ` Bjorn Helgaas
@ 2020-10-22 15:41 ` Ian Kumlien
2020-10-22 18:30 ` Bjorn Helgaas
0 siblings, 1 reply; 21+ messages in thread
From: Ian Kumlien @ 2020-10-22 15:41 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Thu, Oct 22, 2020 at 5:37 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sun, Oct 18, 2020 at 01:35:27PM +0200, Ian Kumlien wrote:
> > On Sat, Oct 17, 2020 at 12:41 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Fri, Oct 16, 2020 at 11:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> > > > Can you please, please, collect these on your system, Ian? I assume
> > > > that you can easily collect it once without your patch, when you see
> > > > poor I211 NIC performance but the system is otherwise working. And
> > > > you can collect it again *with* your patch. Same Kconfig, same
> > > > *everything* except adding your patch.
> > >
> > > Yeah, I can do that, but I'd first like the output of the changes from
> > > the latest patch suggestion running on Kai-Heng's system, so we can
> > > actually see what it does...
> >
> > Is:
> > https://bugzilla.kernel.org/show_bug.cgi?id=209725
>
> That's a great start. Can you attach the patch to the bugzilla too,
> please, so it is self-contained?
>
> And also the analysis of the path from Root Port to Endpoint, with the
> exit latencies of each link, the acceptable latency of the endpoint
> and
>
> (1) the computation done by the existing code that results in
> "latency < acceptable" that means we can enable ASPM, and
>
> (2) the correct computation per spec that results in
> "latency > acceptable" so we cannot enable ASPM?
>
> This analysis will be the core of the commit log, and the bugzilla
> with lspci info is the supporting evidence.
Ok, will do, there will be some bio-latency (i.e., human delay) though
Were you ok with the pr_cont output per endpoint?
> Bjorn
* Re: [PATCH] Use maximum latency when determining L1 ASPM
2020-10-22 15:41 ` Ian Kumlien
@ 2020-10-22 18:30 ` Bjorn Helgaas
0 siblings, 0 replies; 21+ messages in thread
From: Bjorn Helgaas @ 2020-10-22 18:30 UTC (permalink / raw)
To: Ian Kumlien
Cc: Kai-Heng Feng, linux-pci, Alexander Duyck, Saheed O. Bolarinwa,
Puranjay Mohan
On Thu, Oct 22, 2020 at 05:41:45PM +0200, Ian Kumlien wrote:
> On Thu, Oct 22, 2020 at 5:37 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Sun, Oct 18, 2020 at 01:35:27PM +0200, Ian Kumlien wrote:
> > > On Sat, Oct 17, 2020 at 12:41 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Fri, Oct 16, 2020 at 11:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > > > Can you please, please, collect these on your system, Ian? I assume
> > > > > that you can easily collect it once without your patch, when you see
> > > > > poor I211 NIC performance but the system is otherwise working. And
> > > > > you can collect it again *with* your patch. Same Kconfig, same
> > > > > *everything* except adding your patch.
> > > >
> > > > Yeah, I can do that, but I'd first like the output of the changes from
> > > > the latest patch suggestion running on Kai-Heng's system, so we can
> > > > actually see what it does...
> > >
> > > Is:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=209725
> >
> > That's a great start. Can you attach the patch to the bugzilla too,
> > please, so it is self-contained?
> >
> > And also the analysis of the path from Root Port to Endpoint, with the
> > exit latencies of each link, the acceptable latency of the endpoint
> > and
> >
> > (1) the computation done by the existing code that results in
> > "latency < acceptable", which means we can enable ASPM, and
> >
> > (2) the correct computation per spec that results in
> > "latency > acceptable", so we cannot enable ASPM?
> >
> > This analysis will be the core of the commit log, and the bugzilla
> > with lspci info is the supporting evidence.
>
> Ok, will do, there will be some bio-latency though
>
> Were you ok with the pr_cont output per endpoint?
Haven't looked at that yet. Will respond to that patch when I do.
end of thread, other threads:[~2020-10-22 18:30 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-26 22:06 [PATCH] Use maximum latency when determining L1 ASPM Ian Kumlien
2020-07-26 22:06 ` Ian Kumlien
2020-07-27 21:17 ` Ian Kumlien
2020-07-27 21:30 Ian Kumlien
2020-07-29 22:27 ` Bjorn Helgaas
2020-07-29 22:43 ` Ian Kumlien
2020-10-07 13:28 Ian Kumlien
2020-10-08 4:20 ` Kai-Heng Feng
2020-10-08 16:13 ` Bjorn Helgaas
2020-10-12 10:20 ` Ian Kumlien
2020-10-14 8:34 ` Kai-Heng Feng
2020-10-14 13:33 ` Ian Kumlien
2020-10-14 14:36 ` Bjorn Helgaas
2020-10-14 15:39 ` Ian Kumlien
2020-10-16 14:53 ` Ian Kumlien
2020-10-16 21:28 ` Bjorn Helgaas
2020-10-16 22:41 ` Ian Kumlien
2020-10-18 11:35 ` Ian Kumlien
2020-10-22 15:37 ` Bjorn Helgaas
2020-10-22 15:41 ` Ian Kumlien
2020-10-22 18:30 ` Bjorn Helgaas