All of lore.kernel.org
 help / color / mirror / Atom feed
* suspend/resume not working in MSIX mode
@ 2010-06-15  1:13 Michael Chan
  2010-06-17 19:16 ` [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access Ben Hutchings
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Chan @ 2010-06-15  1:13 UTC (permalink / raw)
  To: linux-pci, netdev

I'm debugging the bnx2 driver which doesn't work after suspend/resume if
it is running in MSI-X mode.  The problem is that during suspend, the
MSI-X vectors are disabled by the following sequence on x86:

take_cpu_down() -> cpu_disable_common() -> fixup_irqs()

The MSI-X address/data used to disable the vectors are remembered in the
above sequence.  During resume, these address/data are then programmed
back to the device during pci_restore_state(), causing all the vectors
to remain disabled.

Some drivers call free_irq() during suspend and request_irq() during
resume, and that should avoid the problem.  bnx2 and some other drivers
do not do that.  These drivers rely on pci_restore_state() to restore
the MSI-X vectors to the same working state before suspend.

What's the right way to fix this?  Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-06-15  1:13 suspend/resume not working in MSIX mode Michael Chan
@ 2010-06-17 19:16 ` Ben Hutchings
  2010-06-17 23:58   ` Michael Chan
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ben Hutchings @ 2010-06-17 19:16 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Michael Chan, Matthew Wilcox, linux-pci, netdev

During suspend on an SMP system, {read,write}_msi_msg_desc() may be
called to mask and unmask interrupts on a device that is already in a
reduced power state.  At this point memory-mapped registers including
MSI-X tables are not accessible, and config space may not be fully
functional either.

While a device is in a reduced power state its interrupts are
effectively masked and its MSI(-X) state will be restored when it is
brought back to D0.  Therefore these functions can simply read and
write msi_desc::msg for devices not in D0.

Further, read_msi_msg_desc() should only ever be used to update a
previously written message, so it can always read msi_desc::msg
and never needs to touch the hardware.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote: 
> I'm debugging the bnx2 driver which doesn't work after suspend/resume if
> it is running in MSI-X mode.  The problem is that during suspend, the
> MSI-X vectors are disabled by the following sequence on x86:
> 
> take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
>
> The MSI-X address/data used to disable the vectors are remembered in the
> above sequence. During resume, these address/data are then programmed
> back to the device during pci_restore_state(), causing all the vectors
> to remain disabled.

That's not quite what I see.  What I see is that the message is read
back from the table *after* the driver's suspend method has been called.
At this point the device is already in D3 and memory-mapped registers
are not accessible, so we get random bits as the message.  At least,
that's what I see happening with the sfc driver.

> Some drivers call free_irq() during suspend and request_irq() during
> resume, and that should avoid the problem.  bnx2 and some other drivers
> do not do that.  These drivers rely on pci_restore_state() to restore
> the MSI-X vectors to the same working state before suspend.
> 
> What's the right way to fix this?  Thanks.

This is my attempt, which works for sfc.  See if it works for bnx2.

Ben.

 drivers/pci/msi.c |   34 +++++++++++-----------------------
 1 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 77b68ea..03f04dc 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
 void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
 	struct msi_desc *entry = get_irq_desc_msi(desc);
-	if (entry->msi_attrib.is_msix) {
-		void __iomem *base = entry->mask_base +
-			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
 
-		msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
-		msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
-		msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
-	} else {
-		struct pci_dev *dev = entry->dev;
-		int pos = entry->msi_attrib.pos;
-		u16 data;
+	/* We do not touch the hardware (which may not even be
+	 * accessible at the moment) but return the last message
+	 * written.  Assert that this is valid, assuming that
+	 * valid messages are not all-zeroes. */
+	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
+		 entry->msg.data));
 
-		pci_read_config_dword(dev, msi_lower_address_reg(pos),
-					&msg->address_lo);
-		if (entry->msi_attrib.is_64) {
-			pci_read_config_dword(dev, msi_upper_address_reg(pos),
-						&msg->address_hi);
-			pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
-		} else {
-			msg->address_hi = 0;
-			pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
-		}
-		msg->data = data;
-	}
+	*msg = entry->msg;
 }
 
 void read_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
 void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
 	struct msi_desc *entry = get_irq_desc_msi(desc);
-	if (entry->msi_attrib.is_msix) {
+
+	if (entry->dev->current_state != PCI_D0) {
+		/* Don't touch the hardware now */
+	} else if (entry->msi_attrib.is_msix) {
 		void __iomem *base;
 		base = entry->mask_base +
 			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-06-17 19:16 ` [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access Ben Hutchings
@ 2010-06-17 23:58   ` Michael Chan
  2010-07-02 23:16   ` Jesse Barnes
  2010-10-15 18:26   ` Emil S Tantilov
  2 siblings, 0 replies; 7+ messages in thread
From: Michael Chan @ 2010-06-17 23:58 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jesse Barnes, Matthew Wilcox, linux-pci, netdev


On Thu, 2010-06-17 at 12:16 -0700, Ben Hutchings wrote:
> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
> 
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
> 
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

This works for me too.  Thanks Ben.

> ---
> On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote: 
> > I'm debugging the bnx2 driver which doesn't work after suspend/resume if
> > it is running in MSI-X mode.  The problem is that during suspend, the
> > MSI-X vectors are disabled by the following sequence on x86:
> > 
> > take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
> >
> > The MSI-X address/data used to disable the vectors are remembered in the
> > above sequence. During resume, these address/data are then programmed
> > back to the device during pci_restore_state(), causing all the vectors
> > to remain disabled.
> 
> That's not quite what I see.  What I see is that the message is read
> back from the table *after* the driver's suspend method has been called.
> At this point the device is already in D3 and memory-mapped registers
> are not accessible, so we get random bits as the message.  At least,
> that's what I see happening with the sfc driver.
> 
> > Some drivers call free_irq() during suspend and request_irq() during
> > resume, and that should avoid the problem.  bnx2 and some other drivers
> > do not do that.  These drivers rely on pci_restore_state() to restore
> > the MSI-X vectors to the same working state before suspend.
> > 
> > What's the right way to fix this?  Thanks.
> 
> This is my attempt, which works for sfc.  See if it works for bnx2.
> 
> Ben.
> 
>  drivers/pci/msi.c |   34 +++++++++++-----------------------
>  1 files changed, 11 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 77b68ea..03f04dc 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
>  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>  	struct msi_desc *entry = get_irq_desc_msi(desc);
> -	if (entry->msi_attrib.is_msix) {
> -		void __iomem *base = entry->mask_base +
> -			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
>  
> -		msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
> -		msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
> -		msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
> -	} else {
> -		struct pci_dev *dev = entry->dev;
> -		int pos = entry->msi_attrib.pos;
> -		u16 data;
> +	/* We do not touch the hardware (which may not even be
> +	 * accessible at the moment) but return the last message
> +	 * written.  Assert that this is valid, assuming that
> +	 * valid messages are not all-zeroes. */
> +	BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> +		 entry->msg.data));
>  
> -		pci_read_config_dword(dev, msi_lower_address_reg(pos),
> -					&msg->address_lo);
> -		if (entry->msi_attrib.is_64) {
> -			pci_read_config_dword(dev, msi_upper_address_reg(pos),
> -						&msg->address_hi);
> -			pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
> -		} else {
> -			msg->address_hi = 0;
> -			pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
> -		}
> -		msg->data = data;
> -	}
> +	*msg = entry->msg;
>  }
>  
>  void read_msi_msg(unsigned int irq, struct msi_msg *msg)
> @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
>  void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>  	struct msi_desc *entry = get_irq_desc_msi(desc);
> -	if (entry->msi_attrib.is_msix) {
> +
> +	if (entry->dev->current_state != PCI_D0) {
> +		/* Don't touch the hardware now */
> +	} else if (entry->msi_attrib.is_msix) {
>  		void __iomem *base;
>  		base = entry->mask_base +
>  			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
> -- 
> 1.6.2.5
> 
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-06-17 19:16 ` [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access Ben Hutchings
  2010-06-17 23:58   ` Michael Chan
@ 2010-07-02 23:16   ` Jesse Barnes
  2010-10-15 18:26   ` Emil S Tantilov
  2 siblings, 0 replies; 7+ messages in thread
From: Jesse Barnes @ 2010-07-02 23:16 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Michael Chan, Matthew Wilcox, linux-pci, netdev

On Thu, 17 Jun 2010 20:16:36 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
> 
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
> 
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied to my linux-next branch, thanks.

Matthew, let me know if you have an issue with this.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-06-17 19:16 ` [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access Ben Hutchings
  2010-06-17 23:58   ` Michael Chan
  2010-07-02 23:16   ` Jesse Barnes
@ 2010-10-15 18:26   ` Emil S Tantilov
  2010-10-15 20:06     ` Jesse Barnes
  2 siblings, 1 reply; 7+ messages in thread
From: Emil S Tantilov @ 2010-10-15 18:26 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jesse Barnes, Michael Chan, Matthew Wilcox, linux-pci, NetDev,
	Tantilov, Emil S, Jesse Brandeburg, Kirsher, Jeffrey T

On Thu, Jun 17, 2010 at 12:16 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> called to mask and unmask interrupts on a device that is already in a
> reduced power state.  At this point memory-mapped registers including
> MSI-X tables are not accessible, and config space may not be fully
> functional either.
>
> While a device is in a reduced power state its interrupts are
> effectively masked and its MSI(-X) state will be restored when it is
> brought back to D0.  Therefore these functions can simply read and
> write msi_desc::msg for devices not in D0.
>
> Further, read_msi_msg_desc() should only ever be used to update a
> previously written message, so it can always read msi_desc::msg
> and never needs to touch the hardware.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote:
>> I'm debugging the bnx2 driver which doesn't work after suspend/resume if
>> it is running in MSI-X mode.  The problem is that during suspend, the
>> MSI-X vectors are disabled by the following sequence on x86:
>>
>> take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
>>
>> The MSI-X address/data used to disable the vectors are remembered in the
>> above sequence. During resume, these address/data are then programmed
>> back to the device during pci_restore_state(), causing all the vectors
>> to remain disabled.
>
> That's not quite what I see.  What I see is that the message is read
> back from the table *after* the driver's suspend method has been called.
> At this point the device is already in D3 and memory-mapped registers
> are not accessible, so we get random bits as the message.  At least,
> that's what I see happening with the sfc driver.
>
>> Some drivers call free_irq() during suspend and request_irq() during
>> resume, and that should avoid the problem.  bnx2 and some other drivers
>> do not do that.  These drivers rely on pci_restore_state() to restore
>> the MSI-X vectors to the same working state before suspend.
>>
>> What's the right way to fix this?  Thanks.
>
> This is my attempt, which works for sfc.  See if it works for bnx2.
>
> Ben.
>
>  drivers/pci/msi.c |   34 +++++++++++-----------------------
>  1 files changed, 11 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 77b68ea..03f04dc 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
>  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>        struct msi_desc *entry = get_irq_desc_msi(desc);
> -       if (entry->msi_attrib.is_msix) {
> -               void __iomem *base = entry->mask_base +
> -                       entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
>
> -               msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
> -               msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
> -               msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
> -       } else {
> -               struct pci_dev *dev = entry->dev;
> -               int pos = entry->msi_attrib.pos;
> -               u16 data;
> +       /* We do not touch the hardware (which may not even be
> +        * accessible at the moment) but return the last message
> +        * written.  Assert that this is valid, assuming that
> +        * valid messages are not all-zeroes. */
> +       BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> +                entry->msg.data));
>
> -               pci_read_config_dword(dev, msi_lower_address_reg(pos),
> -                                       &msg->address_lo);
> -               if (entry->msi_attrib.is_64) {
> -                       pci_read_config_dword(dev, msi_upper_address_reg(pos),
> -                                               &msg->address_hi);
> -                       pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
> -               } else {
> -                       msg->address_hi = 0;
> -                       pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
> -               }
> -               msg->data = data;
> -       }
> +       *msg = entry->msg;
>  }
>
>  void read_msi_msg(unsigned int irq, struct msi_msg *msg)
> @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
>  void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>        struct msi_desc *entry = get_irq_desc_msi(desc);
> -       if (entry->msi_attrib.is_msix) {
> +
> +       if (entry->dev->current_state != PCI_D0) {

This check exposed a problem in ixgb (patch is on the way) where
pci_disable_device() was not being called in ixgb_remove(). As a
result the current_state was set to PCI_UNKNOWN and the interface
failed to work on subsequent load of the driver.

Even though the problem was in ixgb, it made me wonder about this
check as the presumption here (low power state) may not always be
true. Like in the case of unloading a driver, which sets
dev->current_state to PCI_UNKNOWN which is not a representation of the
_real_ state of the device (actual state could be D0).

BTW - quick search shows other drivers that could potentially suffer
the faith of ixgb due to lack of pci_disable_device() call on removal.

Thanks,
Emil

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-10-15 18:26   ` Emil S Tantilov
@ 2010-10-15 20:06     ` Jesse Barnes
  2010-10-20 19:05       ` Tantilov, Emil S
  0 siblings, 1 reply; 7+ messages in thread
From: Jesse Barnes @ 2010-10-15 20:06 UTC (permalink / raw)
  To: Emil S Tantilov
  Cc: Ben Hutchings, Michael Chan, Matthew Wilcox, linux-pci, NetDev,
	Tantilov, Emil S, Jesse Brandeburg, Kirsher, Jeffrey T

On Fri, 15 Oct 2010 11:26:08 -0700
Emil S Tantilov <emils.tantilov@gmail.com> wrote:

> On Thu, Jun 17, 2010 at 12:16 PM, Ben Hutchings
> <bhutchings@solarflare.com> wrote:
> > During suspend on an SMP system, {read,write}_msi_msg_desc() may be
> > called to mask and unmask interrupts on a device that is already in a
> > reduced power state.  At this point memory-mapped registers including
> > MSI-X tables are not accessible, and config space may not be fully
> > functional either.
> >
> > While a device is in a reduced power state its interrupts are
> > effectively masked and its MSI(-X) state will be restored when it is
> > brought back to D0.  Therefore these functions can simply read and
> > write msi_desc::msg for devices not in D0.
> >
> > Further, read_msi_msg_desc() should only ever be used to update a
> > previously written message, so it can always read msi_desc::msg
> > and never needs to touch the hardware.
> >
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> > On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote:
> >> I'm debugging the bnx2 driver which doesn't work after suspend/resume if
> >> it is running in MSI-X mode.  The problem is that during suspend, the
> >> MSI-X vectors are disabled by the following sequence on x86:
> >>
> >> take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
> >>
> >> The MSI-X address/data used to disable the vectors are remembered in the
> >> above sequence. During resume, these address/data are then programmed
> >> back to the device during pci_restore_state(), causing all the vectors
> >> to remain disabled.
> >
> > That's not quite what I see.  What I see is that the message is read
> > back from the table *after* the driver's suspend method has been called.
> > At this point the device is already in D3 and memory-mapped registers
> > are not accessible, so we get random bits as the message.  At least,
> > that's what I see happening with the sfc driver.
> >
> >> Some drivers call free_irq() during suspend and request_irq() during
> >> resume, and that should avoid the problem.  bnx2 and some other drivers
> >> do not do that.  These drivers rely on pci_restore_state() to restore
> >> the MSI-X vectors to the same working state before suspend.
> >>
> >> What's the right way to fix this?  Thanks.
> >
> > This is my attempt, which works for sfc.  See if it works for bnx2.
> >
> > Ben.
> >
> >  drivers/pci/msi.c |   34 +++++++++++-----------------------
> >  1 files changed, 11 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 77b68ea..03f04dc 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
> >  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
> >  {
> >        struct msi_desc *entry = get_irq_desc_msi(desc);
> > -       if (entry->msi_attrib.is_msix) {
> > -               void __iomem *base = entry->mask_base +
> > -                       entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
> >
> > -               msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
> > -               msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
> > -               msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
> > -       } else {
> > -               struct pci_dev *dev = entry->dev;
> > -               int pos = entry->msi_attrib.pos;
> > -               u16 data;
> > +       /* We do not touch the hardware (which may not even be
> > +        * accessible at the moment) but return the last message
> > +        * written.  Assert that this is valid, assuming that
> > +        * valid messages are not all-zeroes. */
> > +       BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
> > +                entry->msg.data));
> >
> > -               pci_read_config_dword(dev, msi_lower_address_reg(pos),
> > -                                       &msg->address_lo);
> > -               if (entry->msi_attrib.is_64) {
> > -                       pci_read_config_dword(dev, msi_upper_address_reg(pos),
> > -                                               &msg->address_hi);
> > -                       pci_read_config_word(dev, msi_data_reg(pos, 1), &data);
> > -               } else {
> > -                       msg->address_hi = 0;
> > -                       pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
> > -               }
> > -               msg->data = data;
> > -       }
> > +       *msg = entry->msg;
> >  }
> >
> >  void read_msi_msg(unsigned int irq, struct msi_msg *msg)
> > @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
> >  void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
> >  {
> >        struct msi_desc *entry = get_irq_desc_msi(desc);
> > -       if (entry->msi_attrib.is_msix) {
> > +
> > +       if (entry->dev->current_state != PCI_D0) {
> 
> This check exposed a problem in ixgb (patch is on the way) where
> pci_disable_device() was not being called in ixgb_remove(). As a
> result the current_state was set to PCI_UNKNOWN and the interface
> failed to work on subsequent load of the driver.
> 
> Even though the problem was in ixgb, it made me wonder about this
> check as the presumption here (low power state) may not always be
> true. Like in the case of unloading a driver, which sets
> dev->current_state to PCI_UNKNOWN which is not a representation of the
> _real_ state of the device (actual state could be D0).
> 
> BTW - quick search shows other drivers that could potentially suffer
> the faith of ixgb due to lack of pci_disable_device() call on removal.

Yeah we just ran into this in the DRM layer as well; which does a
pci_enable_device but never calls _disable, so we're stuck with
potentially stale state.

I came up with the below to address that, but really I don't like the
idea of nested pci_enable_device() calls at all.  But I haven't looked
at the latest Wireless USB stuff to see if those drivers still rely on
it.

-- 
Jesse Barnes, Intel Open Source Technology Center

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7fa3cbd..37facc1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -994,6 +994,18 @@ static int __pci_enable_device_flags(struct pci_dev *dev,
 	int err;
 	int i, bars = 0;
 
+	/*
+	 * Power state could be unknown at this point, either due to a fresh
+	 * boot or a device removal call.  So get the current power state
+	 * so that things like MSI message writing will behave as expected
+	 * (e.g. if the device really is in D0 at enable time).
+	 */
+	if (dev->pm_cap) {
+		u16 pmcsr;
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	}
+
 	if (atomic_add_return(1, &dev->enable_cnt) > 1)
 		return 0;		/* already enabled */
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access
  2010-10-15 20:06     ` Jesse Barnes
@ 2010-10-20 19:05       ` Tantilov, Emil S
  0 siblings, 0 replies; 7+ messages in thread
From: Tantilov, Emil S @ 2010-10-20 19:05 UTC (permalink / raw)
  To: Jesse Barnes, Emil S Tantilov
  Cc: Ben Hutchings, Michael Chan, Matthew Wilcox, linux-pci, NetDev,
	Brandeburg, Jesse, Kirsher, Jeffrey T

>-----Original Message-----
>From: Jesse Barnes [mailto:jbarnes@virtuousgeek.org]
>Sent: Friday, October 15, 2010 1:06 PM
>To: Emil S Tantilov
>Cc: Ben Hutchings; Michael Chan; Matthew Wilcox; linux-pci@vger.kernel.org;
>NetDev; Tantilov, Emil S; Brandeburg, Jesse; Kirsher, Jeffrey T
>Subject: Re: [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware
>access
>
>On Fri, 15 Oct 2010 11:26:08 -0700
>Emil S Tantilov <emils.tantilov@gmail.com> wrote:
>
>> On Thu, Jun 17, 2010 at 12:16 PM, Ben Hutchings
>> <bhutchings@solarflare.com> wrote:
>> > During suspend on an SMP system, {read,write}_msi_msg_desc() may be
>> > called to mask and unmask interrupts on a device that is already in a
>> > reduced power state.  At this point memory-mapped registers including
>> > MSI-X tables are not accessible, and config space may not be fully
>> > functional either.
>> >
>> > While a device is in a reduced power state its interrupts are
>> > effectively masked and its MSI(-X) state will be restored when it is
>> > brought back to D0.  Therefore these functions can simply read and
>> > write msi_desc::msg for devices not in D0.
>> >
>> > Further, read_msi_msg_desc() should only ever be used to update a
>> > previously written message, so it can always read msi_desc::msg
>> > and never needs to touch the hardware.
>> >
>> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>> > ---
>> > On Mon, 2010-06-14 at 18:13 -0700, Michael Chan wrote:
>> >> I'm debugging the bnx2 driver which doesn't work after suspend/resume
>if
>> >> it is running in MSI-X mode.  The problem is that during suspend, the
>> >> MSI-X vectors are disabled by the following sequence on x86:
>> >>
>> >> take_cpu_down() -> cpu_disable_common() -> fixup_irqs()
>> >>
>> >> The MSI-X address/data used to disable the vectors are remembered in
>the
>> >> above sequence. During resume, these address/data are then programmed
>> >> back to the device during pci_restore_state(), causing all the vectors
>> >> to remain disabled.
>> >
>> > That's not quite what I see.  What I see is that the message is read
>> > back from the table *after* the driver's suspend method has been
>called.
>> > At this point the device is already in D3 and memory-mapped registers
>> > are not accessible, so we get random bits as the message.  At least,
>> > that's what I see happening with the sfc driver.
>> >
>> >> Some drivers call free_irq() during suspend and request_irq() during
>> >> resume, and that should avoid the problem.  bnx2 and some other
>drivers
>> >> do not do that.  These drivers rely on pci_restore_state() to restore
>> >> the MSI-X vectors to the same working state before suspend.
>> >>
>> >> What's the right way to fix this?  Thanks.
>> >
>> > This is my attempt, which works for sfc.  See if it works for bnx2.
>> >
>> > Ben.
>> >
>> >  drivers/pci/msi.c |   34 +++++++++++-----------------------
>> >  1 files changed, 11 insertions(+), 23 deletions(-)
>> >
>> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> > index 77b68ea..03f04dc 100644
>> > --- a/drivers/pci/msi.c
>> > +++ b/drivers/pci/msi.c
>> > @@ -196,30 +196,15 @@ void unmask_msi_irq(unsigned int irq)
>> >  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>> >  {
>> >        struct msi_desc *entry = get_irq_desc_msi(desc);
>> > -       if (entry->msi_attrib.is_msix) {
>> > -               void __iomem *base = entry->mask_base +
>> > -                       entry->msi_attrib.entry_nr *
>PCI_MSIX_ENTRY_SIZE;
>> >
>> > -               msg->address_lo = readl(base +
>PCI_MSIX_ENTRY_LOWER_ADDR);
>> > -               msg->address_hi = readl(base +
>PCI_MSIX_ENTRY_UPPER_ADDR);
>> > -               msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
>> > -       } else {
>> > -               struct pci_dev *dev = entry->dev;
>> > -               int pos = entry->msi_attrib.pos;
>> > -               u16 data;
>> > +       /* We do not touch the hardware (which may not even be
>> > +        * accessible at the moment) but return the last message
>> > +        * written.  Assert that this is valid, assuming that
>> > +        * valid messages are not all-zeroes. */
>> > +       BUG_ON(!(entry->msg.address_hi | entry->msg.address_lo |
>> > +                entry->msg.data));
>> >
>> > -               pci_read_config_dword(dev, msi_lower_address_reg(pos),
>> > -                                       &msg->address_lo);
>> > -               if (entry->msi_attrib.is_64) {
>> > -                       pci_read_config_dword(dev,
>msi_upper_address_reg(pos),
>> > -                                               &msg->address_hi);
>> > -                       pci_read_config_word(dev, msi_data_reg(pos, 1),
>&data);
>> > -               } else {
>> > -                       msg->address_hi = 0;
>> > -                       pci_read_config_word(dev, msi_data_reg(pos, 0),
>&data);
>> > -               }
>> > -               msg->data = data;
>> > -       }
>> > +       *msg = entry->msg;
>> >  }
>> >
>> >  void read_msi_msg(unsigned int irq, struct msi_msg *msg)
>> > @@ -232,7 +217,10 @@ void read_msi_msg(unsigned int irq, struct msi_msg
>*msg)
>> >  void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>> >  {
>> >        struct msi_desc *entry = get_irq_desc_msi(desc);
>> > -       if (entry->msi_attrib.is_msix) {
>> > +
>> > +       if (entry->dev->current_state != PCI_D0) {
>>
>> This check exposed a problem in ixgb (patch is on the way) where
>> pci_disable_device() was not being called in ixgb_remove(). As a
>> result the current_state was set to PCI_UNKNOWN and the interface
>> failed to work on subsequent load of the driver.
>>
>> Even though the problem was in ixgb, it made me wonder about this
>> check as the presumption here (low power state) may not always be
>> true. Like in the case of unloading a driver, which sets
>> dev->current_state to PCI_UNKNOWN which is not a representation of the
>> _real_ state of the device (actual state could be D0).
>>
>> BTW - quick search shows other drivers that could potentially suffer
>> the faith of ixgb due to lack of pci_disable_device() call on removal.
>
>Yeah we just ran into this in the DRM layer as well; which does a
>pci_enable_device but never calls _disable, so we're stuck with
>potentially stale state.
>
>I came up with the below to address that, but really I don't like the
>idea of nested pci_enable_device() calls at all.  But I haven't looked
>at the latest Wireless USB stuff to see if those drivers still rely on
>it.
>
>--
>Jesse Barnes, Intel Open Source Technology Center
>
>diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>index 7fa3cbd..37facc1 100644
>--- a/drivers/pci/pci.c
>+++ b/drivers/pci/pci.c
>@@ -994,6 +994,18 @@ static int __pci_enable_device_flags(struct pci_dev
>*dev,
> 	int err;
> 	int i, bars = 0;
>
>+	/*
>+	 * Power state could be unknown at this point, either due to a fresh
>+	 * boot or a device removal call.  So get the current power state
>+	 * so that things like MSI message writing will behave as expected
>+	 * (e.g. if the device really is in D0 at enable time).
>+	 */
>+	if (dev->pm_cap) {
>+		u16 pmcsr;
>+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>+		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>+	}
>+
> 	if (atomic_add_return(1, &dev->enable_cnt) > 1)
> 		return 0;		/* already enabled */
>

With this patch applied I could reload the driver and confirmed that current_state is set to the actual power state.

Thanks,
Emil


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-10-20 19:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-15  1:13 suspend/resume not working in MSIX mode Michael Chan
2010-06-17 19:16 ` [PATCH] PCI: MSI: Remove unsafe and unnecessary hardware access Ben Hutchings
2010-06-17 23:58   ` Michael Chan
2010-07-02 23:16   ` Jesse Barnes
2010-10-15 18:26   ` Emil S Tantilov
2010-10-15 20:06     ` Jesse Barnes
2010-10-20 19:05       ` Tantilov, Emil S

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.