All of lore.kernel.org
 help / color / mirror / Atom feed
* hfi1 use of PCI internals
@ 2016-06-16 16:20 ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2016-06-16 16:20 UTC (permalink / raw)
  To: Mike Marciniszyn, Dennis Dalessandro
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA

I noticed drivers/infiniband/hw/hfi1 got moved from staging to
drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.

I know there have been lots of ASPM issues, both hardware problems and
Linux kernel problems, but it is *supposed* to be manageable by the
core, without special driver support.  What's the justification for
having to do this in the hfi1 driver?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* hfi1 use of PCI internals
@ 2016-06-16 16:20 ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2016-06-16 16:20 UTC (permalink / raw)
  To: Mike Marciniszyn, Dennis Dalessandro
  Cc: Doug Ledford, Sean Hefty, Hal Rosenstock, linux-rdma,
	linux-kernel, linux-pci

I noticed drivers/infiniband/hw/hfi1 got moved from staging to
drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.

I know there have been lots of ASPM issues, both hardware problems and
Linux kernel problems, but it is *supposed* to be manageable by the
core, without special driver support.  What's the justification for
having to do this in the hfi1 driver?

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
  2016-06-16 16:20 ` Bjorn Helgaas
  (?)
@ 2016-06-16 18:48   ` Ashutosh Dixit
  -1 siblings, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-16 18:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org

On Thu, Jun 16 2016 at 12:20:52 PM, Bjorn Helgaas <helgaas-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> I noticed drivers/infiniband/hw/hfi1 got moved from staging to
> drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
> configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.
>
> I know there have been lots of ASPM issues, both hardware problems and
> Linux kernel problems, but it is *supposed* to be manageable by the
> core, without special driver support.  What's the justification for
> having to do this in the hfi1 driver?

The description for commit affa48de84 "staging/rdma/hfi1: Add support
for enabling/disabling PCIe ASPM" anticipates this question and
describes why this was done in the hfi1 driver:

    Finally, the kernel ASPM API is not used in this patch. This is
    because this patch does several non-standard things as SW
    workarounds for HW issues. As mentioned above, it enables ASPM even
    when advertised actual latencies are greater than acceptable
    latencies. Also, whereas the kernel API only allows drivers to
    disable ASPM from driver probe, this patch enables/disables ASPM
    directly from interrupt context. Due to these reasons the kernel
    ASPM API was not used.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
@ 2016-06-16 18:48   ` Ashutosh Dixit
  0 siblings, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-16 18:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16 2016 at 12:20:52 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> I noticed drivers/infiniband/hw/hfi1 got moved from staging to
> drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
> configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.
>
> I know there have been lots of ASPM issues, both hardware problems and
> Linux kernel problems, but it is *supposed* to be manageable by the
> core, without special driver support.  What's the justification for
> having to do this in the hfi1 driver?

The description for commit affa48de84 "staging/rdma/hfi1: Add support
for enabling/disabling PCIe ASPM" anticipates this question and
describes why this was done in the hfi1 driver:

    Finally, the kernel ASPM API is not used in this patch. This is
    because this patch does several non-standard things as SW
    workarounds for HW issues. As mentioned above, it enables ASPM even
    when advertised actual latencies are greater than acceptable
    latencies. Also, whereas the kernel API only allows drivers to
    disable ASPM from driver probe, this patch enables/disables ASPM
    directly from interrupt context. Due to these reasons the kernel
    ASPM API was not used.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
@ 2016-06-16 18:48   ` Ashutosh Dixit
  0 siblings, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-16 18:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16 2016 at 12:20:52 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> I noticed drivers/infiniband/hw/hfi1 got moved from staging to
> drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
> configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.
>
> I know there have been lots of ASPM issues, both hardware problems and
> Linux kernel problems, but it is *supposed* to be manageable by the
> core, without special driver support.  What's the justification for
> having to do this in the hfi1 driver?

The description for commit affa48de84 "staging/rdma/hfi1: Add support
for enabling/disabling PCIe ASPM" anticipates this question and
describes why this was done in the hfi1 driver:

    Finally, the kernel ASPM API is not used in this patch. This is
    because this patch does several non-standard things as SW
    workarounds for HW issues. As mentioned above, it enables ASPM even
    when advertised actual latencies are greater than acceptable
    latencies. Also, whereas the kernel API only allows drivers to
    disable ASPM from driver probe, this patch enables/disables ASPM
    directly from interrupt context. Due to these reasons the kernel
    ASPM API was not used.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
  2016-06-16 18:48   ` Ashutosh Dixit
  (?)
  (?)
@ 2016-06-16 20:08   ` Bjorn Helgaas
  2016-06-17 13:58     ` Dennis Dalessandro
  2016-06-17 22:05       ` Ashutosh Dixit
  -1 siblings, 2 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2016-06-16 20:08 UTC (permalink / raw)
  To: Ashutosh Dixit
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16, 2016 at 02:48:30PM -0400, Ashutosh Dixit wrote:
> On Thu, Jun 16 2016 at 12:20:52 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > I noticed drivers/infiniband/hw/hfi1 got moved from staging to
> > drivers/ for v4.7.  It does a bunch of grubbing around in PCIe ASPM
> > configuration, e.g., see drivers/infiniband/hw/hfi1/aspm.h.
> >
> > I know there have been lots of ASPM issues, both hardware problems and
> > Linux kernel problems, but it is *supposed* to be manageable by the
> > core, without special driver support.  What's the justification for
> > having to do this in the hfi1 driver?
> 
> The description for commit affa48de84 "staging/rdma/hfi1: Add support
> for enabling/disabling PCIe ASPM" anticipates this question and
> describes why this was done in the hfi1 driver:
> 
>     Finally, the kernel ASPM API is not used in this patch. This is
>     because this patch does several non-standard things as SW
>     workarounds for HW issues. As mentioned above, it enables ASPM even
>     when advertised actual latencies are greater than acceptable
>     latencies. Also, whereas the kernel API only allows drivers to
>     disable ASPM from driver probe, this patch enables/disables ASPM
>     directly from interrupt context. Due to these reasons the kernel
>     ASPM API was not used.

That's a good start, but leads to more questions.  For example, it
doesn't answer the obvious question of why the driver needs to
enable/disable ASPM from interrupt context.

Disabling ASPM should only require writing the device's Link Control
register.  The PCI core could probably provide an interface to do that
in interrupt context.

Enabling ASPM is not latency-critical and could probably be done from
a work queue outside interrupt context, although conceptually there
shouldn't be much required here either, and possibly the PCI core
interface could be improved.

It's possible the latency problem could be handled by some sort of
quirk that overrides the acceptable latency.

It's hard enough to get ASPM support in the PCI core correct without
having to worry about drivers doing their own thing behind the back of
the core.

As far as I can tell, none of these PCI questions were raised on
linux-pci, so we never even had a chance to have a conversation about
them.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
  2016-06-16 20:08   ` Bjorn Helgaas
@ 2016-06-17 13:58     ` Dennis Dalessandro
  2016-06-17 22:05       ` Ashutosh Dixit
  1 sibling, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2016-06-17 13:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ashutosh Dixit, Marciniszyn, Mike, Doug Ledford, Hefty, Sean,
	Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16, 2016 at 03:08:17PM -0500, Bjorn Helgaas wrote:
>As far as I can tell, none of these PCI questions were raised on
>linux-pci, so we never even had a chance to have a conversation about
>them.

I'll let Ashutosh handle the technical details here since he is most 
familiar with the code in question. I just want to mention that the move out 
of staging doesn't imply the driver is done being developed. We are very 
much open to discussing whatever the PCI folks see as needing to be 
addressed.

-Denny

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
  2016-06-16 20:08   ` Bjorn Helgaas
  2016-06-17 13:58     ` Dennis Dalessandro
@ 2016-06-17 22:05       ` Ashutosh Dixit
  1 sibling, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-17 22:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16 2016 at 04:08:17 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> That's a good start, but leads to more questions.  For example, it
> doesn't answer the obvious question of why the driver needs to
> enable/disable ASPM from interrupt context.

For power saving reasons we keep ASPM L1 enabled, but implement a
heuristic to "quickly" disable ASPM L1 when we notice PCIe traffic (as
measured by the interrupt rate) starting up. If interrupt activity
ceases ASPM L1 is re-enabled.

> Disabling ASPM should only require writing the device's Link Control
> register.  The PCI core could probably provide an interface to do that
> in interrupt context.
>
> Enabling ASPM is not latency-critical and could probably be done from
> a work queue outside interrupt context, although conceptually there
> shouldn't be much required here either, and possibly the PCI core
> interface could be improved.

That is true, to keep latencies low we need to disable ASPM from
interrupt context, but re-enabling ASPM is not latency critical.

> It's possible the latency problem could be handled by some sort of
> quirk that overrides the acceptable latency.

Correct, this is another issue that needs to be resolved.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
@ 2016-06-17 22:05       ` Ashutosh Dixit
  0 siblings, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-17 22:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16 2016 at 04:08:17 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> That's a good start, but leads to more questions.  For example, it
> doesn't answer the obvious question of why the driver needs to
> enable/disable ASPM from interrupt context.

For power saving reasons we keep ASPM L1 enabled, but implement a
heuristic to "quickly" disable ASPM L1 when we notice PCIe traffic (as
measured by the interrupt rate) starting up. If interrupt activity
ceases ASPM L1 is re-enabled.

> Disabling ASPM should only require writing the device's Link Control
> register.  The PCI core could probably provide an interface to do that
> in interrupt context.
>
> Enabling ASPM is not latency-critical and could probably be done from
> a work queue outside interrupt context, although conceptually there
> shouldn't be much required here either, and possibly the PCI core
> interface could be improved.

That is true, to keep latencies low we need to disable ASPM from
interrupt context, but re-enabling ASPM is not latency critical.

> It's possible the latency problem could be handled by some sort of
> quirk that overrides the acceptable latency.

Correct, this is another issue that needs to be resolved.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
@ 2016-06-17 22:05       ` Ashutosh Dixit
  0 siblings, 0 replies; 11+ messages in thread
From: Ashutosh Dixit @ 2016-06-17 22:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Thu, Jun 16 2016 at 04:08:17 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> That's a good start, but leads to more questions.  For example, it
> doesn't answer the obvious question of why the driver needs to
> enable/disable ASPM from interrupt context.

For power saving reasons we keep ASPM L1 enabled, but implement a
heuristic to "quickly" disable ASPM L1 when we notice PCIe traffic (as
measured by the interrupt rate) starting up. If interrupt activity
ceases ASPM L1 is re-enabled.

> Disabling ASPM should only require writing the device's Link Control
> register.  The PCI core could probably provide an interface to do that
> in interrupt context.
>
> Enabling ASPM is not latency-critical and could probably be done from
> a work queue outside interrupt context, although conceptually there
> shouldn't be much required here either, and possibly the PCI core
> interface could be improved.

That is true, to keep latencies low we need to disable ASPM from
interrupt context, but re-enabling ASPM is not latency critical.

> It's possible the latency problem could be handled by some sort of
> quirk that overrides the acceptable latency.

Correct, this is another issue that needs to be resolved.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: hfi1 use of PCI internals
  2016-06-17 22:05       ` Ashutosh Dixit
  (?)
  (?)
@ 2016-06-17 23:04       ` Bjorn Helgaas
  -1 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2016-06-17 23:04 UTC (permalink / raw)
  To: Ashutosh Dixit
  Cc: Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford, Hefty,
	Sean, Hal Rosenstock, linux-rdma, linux-kernel, linux-pci

On Fri, Jun 17, 2016 at 06:05:43PM -0400, Ashutosh Dixit wrote:
> On Thu, Jun 16 2016 at 04:08:17 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > That's a good start, but leads to more questions.  For example, it
> > doesn't answer the obvious question of why the driver needs to
> > enable/disable ASPM from interrupt context.
> 
> For power saving reasons we keep ASPM L1 enabled, but implement a
> heuristic to "quickly" disable ASPM L1 when we notice PCIe traffic (as
> measured by the interrupt rate) starting up. If interrupt activity
> ceases ASPM L1 is re-enabled.
> 
> > Disabling ASPM should only require writing the device's Link Control
> > register.  The PCI core could probably provide an interface to do that
> > in interrupt context.
> >
> > Enabling ASPM is not latency-critical and could probably be done from
> > a work queue outside interrupt context, although conceptually there
> > shouldn't be much required here either, and possibly the PCI core
> > interface could be improved.
> 
> That is true, to keep latencies low we need to disable ASPM from
> interrupt context, but re-enabling ASPM is not latency critical.

For endpoint devices, it should be theoretically possible to
enable/disable ASPM very quickly, by touching only that device.  We
don't do that today because pcie/aspm.c does all sorts of buffoonery
and path walking.  I think that could be simplified, assuming we think
this sort of intensive ASPM-management is desirable.

> > It's possible the latency problem could be handled by some sort of
> > quirk that overrides the acceptable latency.
> 
> Correct, this is another issue that needs to be resolved.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-06-17 23:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16 16:20 hfi1 use of PCI internals Bjorn Helgaas
2016-06-16 16:20 ` Bjorn Helgaas
2016-06-16 18:48 ` Ashutosh Dixit
2016-06-16 18:48   ` Ashutosh Dixit
2016-06-16 18:48   ` Ashutosh Dixit
2016-06-16 20:08   ` Bjorn Helgaas
2016-06-17 13:58     ` Dennis Dalessandro
2016-06-17 22:05     ` Ashutosh Dixit
2016-06-17 22:05       ` Ashutosh Dixit
2016-06-17 22:05       ` Ashutosh Dixit
2016-06-17 23:04       ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.