* A question about PCI suspend-resume functionallity
@ 2014-07-08 15:39 Igor Bezukh
  2014-07-08 20:47 ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Igor Bezukh @ 2014-07-08 15:39 UTC
  To: linux-pci

Hi,


We are testing the Intel Gigabit adapter driver (igb) under Fedora 20, kernel 3.14.4, for the following use case:

(*) The adapter is connected to a PCIe slot
(*) We suspend the system by running pm-suspend from user space
(*) We remove the adapter from the PCIe slot
(*) We wake up the system

Currently, we get kernel panics and the system gets stuck.

My question is: does the PCI subsystem call the driver's remove function when the driver's resume function returns an error code?

Or should I implement the call to igb_remove from igb_resume in the Intel driver?

Thanks,
Igor Bezukh


* Re: A question about PCI suspend-resume functionallity
  2014-07-08 15:39 A question about PCI suspend-resume functionallity Igor Bezukh
@ 2014-07-08 20:47 ` Bjorn Helgaas
  2014-07-08 22:05   ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-08 20:47 UTC
  To: Igor Bezukh; +Cc: linux-pci, Linux PM list

[+cc linux-pm]

On Tue, Jul 8, 2014 at 9:39 AM, Igor Bezukh <Igor@galilsoft.com> wrote:
> Hi,
>
>
> We are testing the Intel Gigabit adapter driver (igb) under Fedora 20, kernel 3.14.4, for the following use case:
>
> (*) The adapter is connected to a PCIe slot
> (*) We suspend the system by running pm-suspend from user space
> (*) We remove the adapter from the PCIe slot
> (*) We wake up the system
>
> Currently, we get kernel panics and the system gets stuck.
>
> My question is: does the PCI subsystem call the driver's remove function when the driver's resume function returns an error code?
>
> Or should I implement the call to igb_remove from igb_resume in the Intel driver?

I don't know what the best design is here.  I suspect that when we
resume, we should re-enumerate the PCI topology to see if anything
changed, but I don't think we do that.  I think there *is* something
like that for ACPI -- the BIOS can send a Bus Check notification on
resume if it knows something has changed.  But my guess is that if you
remove an adapter below a switch that is powered down because the
system is suspended, the interrupt we would normally get is lost
forever.

It doesn't seem like something that should be solved in the driver,
because you could just as easily have *added* a device while
suspended, and the core has to re-enumerate to find that.

Bjorn


* Re: A question about PCI suspend-resume functionallity
  2014-07-08 20:47 ` Bjorn Helgaas
@ 2014-07-08 22:05   ` Rafael J. Wysocki
  2014-07-09 14:18     ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2014-07-08 22:05 UTC
  To: Bjorn Helgaas; +Cc: Igor Bezukh, linux-pci, Linux PM list

On Tuesday, July 08, 2014 02:47:03 PM Bjorn Helgaas wrote:
> [+cc linux-pm]
> 
> I don't know what the best design is here.  I suspect that when we
> resume, we should re-enumerate the PCI topology to see if anything
> changed, but I don't think we do that.  I think there *is* something
> like that for ACPI -- the BIOS can send a Bus Check notification on
> resume if it knows something has changed.

In ACPI we wait for system resume to complete and handle the notifications
then.  Calling .remove() callbacks from drivers during system resume doesn't
really work, especially if they happen too early (e.g. during the "noirq resume"
stage).

> But my guess is that if you
> remove an adapter below a switch that is powered down because the
> system is suspended, the interrupt we would normally get is lost
> forever.
> 
> It doesn't seem like something that should be solved in the driver,
> because you could just as easily have *added* a device while
> suspended, and the core has to re-enumerate to find that.

The driver's system resume callbacks need to be able to cope with
missing devices.

Rafael



* Re: A question about PCI suspend-resume functionallity
  2014-07-08 22:05   ` Rafael J. Wysocki
@ 2014-07-09 14:18     ` Alan Stern
  2014-07-09 15:55       ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2014-07-09 14:18 UTC
  To: Rafael J. Wysocki; +Cc: Bjorn Helgaas, Igor Bezukh, linux-pci, Linux PM list

On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:

> On Tuesday, July 08, 2014 02:47:03 PM Bjorn Helgaas wrote:
> > [+cc linux-pm]
> > 
> > I don't know what the best design is here.  I suspect that when we
> > resume, we should re-enumerate the PCI topology to see if anything
> > changed, but I don't think we do that.  I think there *is* something
> > like that for ACPI -- the BIOS can send a Bus Check notification on
> > resume if it knows something has changed.
> 
> In ACPI we wait for system resume to complete and handle the notifications
> then.  Calling .remove() callbacks from drivers during system resume doesn't
> really work, especially if they happen too early (e.g. during the "noirq resume"
> stage).

USB uses a similar approach.  Hot-unplugs are handled by the khubd 
thread, which is freezable and therefore doesn't run until after the 
system is resumed.
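
The deferral idea can be illustrated with a toy userspace model: removals noticed during resume are only recorded, and are handled once resume has finished. This is a sketch for illustration, not kernel code; `hotplug_queue`, `queue_event` and `drain_events` are made-up names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Toy model: hot-unplug events noticed while the system is resuming. */
struct hotplug_queue {
    int    slot[MAX_EVENTS];  /* slot numbers with a pending removal */
    size_t count;             /* number of pending events */
    bool   resume_done;       /* set once system resume has finished */
};

/* Record an event without handling it. Returns false if the queue is full. */
static bool queue_event(struct hotplug_queue *q, int slot)
{
    if (q->count >= MAX_EVENTS)
        return false;
    q->slot[q->count++] = slot;
    return true;
}

/*
 * Hand pending removals to the caller, but only after resume has
 * completed. Copies up to out_len slot numbers into out and returns
 * how many were copied (always 0 while still resuming).
 */
static size_t drain_events(struct hotplug_queue *q, int *out, size_t out_len)
{
    size_t n = 0;

    if (!q->resume_done)
        return 0;
    while (n < q->count && n < out_len) {
        out[n] = q->slot[n];
        n++;
    }
    /* Keep anything that did not fit in out for a later call. */
    for (size_t i = n; i < q->count; i++)
        q->slot[i - n] = q->slot[i];
    q->count -= n;
    return n;
}
```

In this model nothing is removed while `resume_done` is false, which mirrors the point that removal work should not run in the middle of resume.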

> > But my guess is that if you
> > remove an adapter below a switch that is powered down because the
> > system is suspended, the interrupt we would normally get is lost
> > forever.
> > 
> > It doesn't seem like something that should be solved in the driver,
> > because you could just as easily have *added* a device while
> > suspended, and the core has to re-enumerate to find that.
> 
> The driver's system resume callbacks need to be able to cope with
> missing devices.

In the USB stack, the subsystem core resume code checks to see if a
device has been unplugged before the driver's ->resume callback is
invoked.  If a device is gone, the driver callback is skipped.  Thus
drivers don't have to worry about trying to resume a device that has
been unplugged.
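
As a rough illustration of that ordering, here is a toy userspace model in which the core checks for presence before calling the driver. It is not the USB core code; `toy_device`, `toy_driver_resume` and `core_resume_all` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_device {
    bool present;                           /* false once unplugged */
    int  resumed;                           /* counts ->resume invocations */
    int (*resume)(struct toy_device *dev);  /* driver callback */
};

/* Example driver callback: it may assume the hardware is there. */
static int toy_driver_resume(struct toy_device *dev)
{
    dev->resumed++;
    return 0;
}

/*
 * Core resume path: check for presence first and skip the driver
 * callback for devices that are gone. Returns how many were skipped
 * because they were missing.
 */
static size_t core_resume_all(struct toy_device *devs, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].present) {
            skipped++;
            continue;
        }
        if (devs[i].resume)
            devs[i].resume(&devs[i]);
    }
    return skipped;
}
```

The design point is that the presence check lives in one place in the core, so individual drivers do not each need their own.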

Alan Stern



* Re: A question about PCI suspend-resume functionallity
  2014-07-09 14:18     ` Alan Stern
@ 2014-07-09 15:55       ` Bjorn Helgaas
  2014-07-09 16:26         ` Alan Stern
  2014-07-09 16:35         ` Rafael J. Wysocki
  0 siblings, 2 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-09 15:55 UTC
  To: Alan Stern; +Cc: Rafael J. Wysocki, Igor Bezukh, linux-pci, Linux PM list

On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
>
>> ...
>> The driver's system resume callbacks need to be able to cope with
>> missing devices.

Based on this, it sounds like igb_resume() should call igb_remove()
when it figures out the device is missing.

That might be the best we can do right now, but it doesn't sound like
a general-purpose solution.  Detecting device removal sounds like a
core function, not a driver function.  It doesn't seem like drivers
should have to implement ->resume just to deal with this case.
Calling ->remove from ->resume is a little strange and may expose
locking issues or races.  I suspect that most ->remove methods are not
prepared to deal with missing devices (they may want to stop DMA,
disable interrupts, etc.)

> In the USB stack, the subsystem core resume code checks to see if a
> device has been unplugged before the driver's ->resume callback is
> invoked.  If a device is gone, the driver callback is skipped.  Thus
> drivers don't have to worry about trying to resume a device that has
> been unplugged.

I assume that you do want to call the driver's ->remove method to free
any per-device state.  Do USB drivers just have to be smart enough to
do that without touching the device?

Bjorn


* Re: A question about PCI suspend-resume functionallity
  2014-07-09 15:55       ` Bjorn Helgaas
@ 2014-07-09 16:26         ` Alan Stern
  2014-07-09 16:35         ` Rafael J. Wysocki
  1 sibling, 0 replies; 11+ messages in thread
From: Alan Stern @ 2014-07-09 16:26 UTC
  To: Bjorn Helgaas; +Cc: Rafael J. Wysocki, Igor Bezukh, linux-pci, Linux PM list

On Wed, 9 Jul 2014, Bjorn Helgaas wrote:

> > In the USB stack, the subsystem core resume code checks to see if a
> > device has been unplugged before the driver's ->resume callback is
> > invoked.  If a device is gone, the driver callback is skipped.  Thus
> > drivers don't have to worry about trying to resume a device that has
> > been unplugged.
> 
> I assume that you do want to call the driver's ->remove method to free
> any per-device state.  Do USB drivers just have to be smart enough to
> do that without touching the device?

Not quite how I'd describe it.  They have to be smart enough for the
remove routine to run correctly even if the device is already gone --
_any_ driver for a hot-unpluggable device has to be like this.

Obviously no driver can touch a device that is gone.  But it can
try.  If a USB driver tries to touch a device that has been unplugged,
the corresponding function call just returns an error.  It doesn't
crash.
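
A toy userspace sketch of such a remove routine follows. It assumes a made-up accessor, `toy_write_reg`, that fails with -ENODEV when the device is gone; none of these names come from a real driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_dev {
    bool  present;  /* false once the hardware has been unplugged */
    void *state;    /* per-device driver state to free on remove */
};

/* Device access returns an error instead of crashing when unplugged. */
static int toy_write_reg(struct toy_dev *dev, unsigned int reg, unsigned int val)
{
    (void)reg;
    (void)val;
    return dev->present ? 0 : -ENODEV;
}

/*
 * Remove routine: try to quiesce the hardware, tolerate failure, and
 * always release the software state. The quiesce result is returned
 * only so the caller can log it.
 */
static int toy_remove(struct toy_dev *dev)
{
    int err = toy_write_reg(dev, 0 /* hypothetical control register */, 0);

    free(dev->state);
    dev->state = NULL;
    return err;
}
```

The cleanup of per-device state does not depend on the hardware access succeeding, so the same routine works for an orderly removal and for a surprise unplug.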

Alan Stern



* Re: A question about PCI suspend-resume functionallity
  2014-07-09 15:55       ` Bjorn Helgaas
  2014-07-09 16:26         ` Alan Stern
@ 2014-07-09 16:35         ` Rafael J. Wysocki
  2014-07-09 19:24           ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2014-07-09 16:35 UTC
  To: Bjorn Helgaas; +Cc: Alan Stern, Igor Bezukh, linux-pci, Linux PM list

On Wednesday, July 09, 2014 09:55:24 AM Bjorn Helgaas wrote:
> On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
> >
> >> ...
> >> The driver's system resume callbacks need to be able to cope with
> >> missing devices.
> 
> Based on this, it sounds like igb_resume() should call igb_remove()
> when it figures out the device is missing.

I wouldn't say so.  igb_resume() should not crash when the device is missing
and should just handle that situation cleanly.  Obviously it is not its role
to remove the device from the hierarchy.

> That might be the best we can do right now, but it doesn't sound like
> a general-purpose solution.  Detecting device removal sounds like a
> core function, not a driver function.  It doesn't seem like drivers
> should have to implement ->resume just to deal with this case.

No, they shouldn't.  They just need to be able to cope with missing devices
cleanly.

Devices (and PCI devices in particular) can go away at any time, including
during system resume, without notice anyway and drivers need to be able to
cope with that regardless.

The notification can actually come in *after* the device has gone in any
case and then whoever gets the notification should handle the device
removal. That is not the driver in particular, but in the meantime
the driver should still work without crashing.
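
One way a resume path might cope is sketched below as a userspace model. It relies on the usual behaviour that reads from an absent PCI device return all ones; the accessor type and the `toy_*` names are hypothetical, and a real register could legitimately read as all ones, so a real driver would pick a register for which that value is invalid.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor; a real driver would read MMIO or config space. */
typedef uint32_t (*read_reg_fn)(void *ctx, uint32_t offset);

/* A read of all ones is treated as "device is gone". */
static bool toy_device_gone(read_reg_fn read_reg, void *ctx)
{
    return read_reg(ctx, 0 /* hypothetical status register */) == UINT32_MAX;
}

/* Resume model: bail out cleanly instead of touching missing hardware. */
static int toy_resume(read_reg_fn read_reg, void *ctx)
{
    if (toy_device_gone(read_reg, ctx))
        return -ENODEV;  /* no crash, and no attempt to unbind the device */

    /* ... reinitialize the hardware here ... */
    return 0;
}

/* Two fake accessors for demonstration. */
static uint32_t read_present(void *ctx, uint32_t offset)
{
    (void)ctx;
    (void)offset;
    return 0x00000001u;
}

static uint32_t read_absent(void *ctx, uint32_t offset)
{
    (void)ctx;
    (void)offset;
    return UINT32_MAX;
}
```

Here `toy_resume(read_absent, NULL)` returns -ENODEV and `toy_resume(read_present, NULL)` returns 0; in neither case does the resume path try to remove the device itself.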

Rafael



* Re: A question about PCI suspend-resume functionallity
  2014-07-09 16:35         ` Rafael J. Wysocki
@ 2014-07-09 19:24           ` Bjorn Helgaas
  2014-07-10 11:39             ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-09 19:24 UTC
  To: Rafael J. Wysocki; +Cc: Alan Stern, Igor Bezukh, linux-pci, Linux PM list

On Wed, Jul 9, 2014 at 10:35 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, July 09, 2014 09:55:24 AM Bjorn Helgaas wrote:
>> On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
>> >
>> >> ...
>> >> The driver's system resume callbacks need to be able to cope with
>> >> missing devices.
>>
>> Based on this, it sounds like igb_resume() should call igb_remove()
>> when it figures out the device is missing.
>
> I wouldn't say so.  igb_resume() should not crash when the device is missing
> and should just handle that situation cleanly.  Obviously it is not its role
> to remove the device from the hierarchy.

OK, that makes sense.

However, I don't know of anything in the PCI core that will notice
that the device has disappeared, so I doubt it will be removed from
the hierarchy.  I think that means the slot will become unusable until
a reboot, because the original device is gone and we can't add a new
one because the original one is still in the hierarchy.  That's not
very good, but it is better than a panic.

>> That might be the best we can do right now, but it doesn't sound like
>> a general-purpose solution.  Detecting device removal sounds like a
>> core function, not a driver function.  It doesn't seem like drivers
>> should have to implement ->resume just to deal with this case.
>
> No, they shouldn't.  They just need to be able to cope with missing devices
> cleanly.
>
> Devices (and PCI devices in particular) can go away at any time, including
> during system resume, without notice anyway and drivers need to be able to
> cope with that regardless.
>
> The notification can actually come in *after* the device has gone in any
> case and then whoever gets the notification should handle the device
> removal. That is not the driver in particular, but in the meantime
> the driver should still work without crashing.

Yep.  So the panic Igor is seeing is probably an igb_resume() problem,
but after that's fixed, we'll probably trip over the PCI bug about not
handling the remove.

Bjorn


* Re: A question about PCI suspend-resume functionallity
  2014-07-09 19:24           ` Bjorn Helgaas
@ 2014-07-10 11:39             ` Rafael J. Wysocki
  2014-07-10 22:12               ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2014-07-10 11:39 UTC
  To: Bjorn Helgaas; +Cc: Alan Stern, Igor Bezukh, linux-pci, Linux PM list

On Wednesday, July 09, 2014 01:24:20 PM Bjorn Helgaas wrote:
> On Wed, Jul 9, 2014 at 10:35 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, July 09, 2014 09:55:24 AM Bjorn Helgaas wrote:
> >> On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> >> > On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
> >> >
> >> >> ...
> >> >> The driver's system resume callbacks need to be able to cope with
> >> >> missing devices.
> >>
> >> Based on this, it sounds like igb_resume() should call igb_remove()
> >> when it figures out the device is missing.
> >
> > I wouldn't say so.  igb_resume() should not crash when the device is missing
> > and should just handle that situation cleanly.  Obviously it is not its role
> > to remove the device from the hierarchy.
> 
> OK, that makes sense.
> 
> However, I don't know of anything in the PCI core that will notice
> that the device has disappeared, so I doubt it will be removed from
> the hierarchy.

If we don't get a notification via ACPI or PCIe hotplug or anything,
then no, it won't be removed automatically.

However, it still can be removed manually via sysfs, can't it?
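
For reference, manual removal through sysfs amounts to writing "1" to the device's `remove` file, and a rescan is triggered the same way through `/sys/bus/pci/rescan`. A minimal C sketch of that write is below; `sysfs_write_one` is a made-up helper name, and whether this actually frees the slot in the scenario discussed here is untested.

```c
#include <stdio.h>

/*
 * Write "1" to a sysfs control file such as
 *   /sys/bus/pci/devices/<domain:bus:dev.fn>/remove  (drop one device)
 *   /sys/bus/pci/rescan                              (re-enumerate all buses)
 * Returns 0 on success, -1 on failure. Needs root on a real system.
 */
static int sysfs_write_one(const char *path)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputs("1", f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffer, so a failed write shows up here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

The device address in the path has to be filled in by the caller; it is left as a placeholder in the comment on purpose.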

> I think that means the slot will become unusable until
> a reboot, because the original device is gone and we can't add a new
> one because the original one is still in the hierarchy.  That's not
> very good, but it is better than a panic.
> 
> >> That might be the best we can do right now, but it doesn't sound like
> >> a general-purpose solution.  Detecting device removal sounds like a
> >> core function, not a driver function.  It doesn't seem like drivers
> >> should have to implement ->resume just to deal with this case.
> >
> > No, they shouldn't.  They just need to be able to cope with missing devices
> > cleanly.
> >
> > Devices (and PCI devices in particular) can go away at any time, including
> > during system resume, without notice anyway and drivers need to be able to
> > cope with that regardless.
> >
> > The notification can actually come in *after* the device has gone in any
> > case and then whoever gets the notification should handle the device
> > removal. That is not the driver in particular, but in the meantime
> > the driver should still work without crashing.
> 
> Yep.  So the panic Igor is seeing is probably an igb_resume() problem,
> but after that's fixed, we'll probably trip over the PCI bug about not
> handling the remove.

If there's any kind of notification coming in either before or after the event,
we should act on it and remove the device.  It's a bug if we don't I'd say.

Rafael



* Re: A question about PCI suspend-resume functionallity
  2014-07-10 11:39             ` Rafael J. Wysocki
@ 2014-07-10 22:12               ` Bjorn Helgaas
  2014-07-11  7:53                 ` Igor Bezukh
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2014-07-10 22:12 UTC
  To: Rafael J. Wysocki; +Cc: Alan Stern, Igor Bezukh, linux-pci, Linux PM list

On Thu, Jul 10, 2014 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, July 09, 2014 01:24:20 PM Bjorn Helgaas wrote:
>> On Wed, Jul 9, 2014 at 10:35 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Wednesday, July 09, 2014 09:55:24 AM Bjorn Helgaas wrote:
>> >> On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> >> > On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
>> >> >
>> >> >> ...
>> >> >> The driver's system resume callbacks need to be able to cope with
>> >> >> missing devices.
>> >>
>> >> Based on this, it sounds like igb_resume() should call igb_remove()
>> >> when it figures out the device is missing.
>> >
>> > I wouldn't say so.  igb_resume() should not crash when the device is missing
>> > and should just handle that situation cleanly.  Obviously it is not its role
>> > to remove the device from the hierarchy.
>>
>> OK, that makes sense.
>>
>> However, I don't know of anything in the PCI core that will notice
>> that the device has disappeared, so I doubt it will be removed from
>> the hierarchy.
>
> If we don't get a notification via ACPI or PCIe hotplug or anything,
> then no, it won't be removed automatically.
>
> However, it still can be removed manually via sysfs, can't it?

Yes, I would think so.  So I guess there's a workaround at least.

Igor, can you test this scenario (after fixing igb_resume() so it
doesn't crash when the device is missing)?  I.e., suspend the system,
remove the adapter, resume the system, then do an "lspci" to see if
the kernel thinks the adapter is still there, then put an adapter in
the slot again (either hot-add if the slot supports it, or
suspend/add/resume)?
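
The "is it still listed" check can also be scripted instead of reading lspci output by eye. A small C sketch follows, assuming the usual sysfs layout where each known device has an entry under /sys/bus/pci/devices; `pci_dev_listed` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Report whether the kernel still lists a PCI device under devices_dir
 * (normally "/sys/bus/pci/devices"). bdf is the full address as printed
 * by "lspci -D", e.g. "0000:03:00.0" (a made-up example address).
 */
static bool pci_dev_listed(const char *devices_dir, const char *bdf)
{
    char path[512];
    struct stat st;
    int n = snprintf(path, sizeof(path), "%s/%s", devices_dir, bdf);

    if (n < 0 || (size_t)n >= sizeof(path))
        return false;             /* path did not fit in the buffer */
    return stat(path, &st) == 0;  /* stat() follows the sysfs symlink */
}
```

Calling this once after resume and once after re-adding the adapter gives the two data points asked for above.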

Bjorn


* RE: A question about PCI suspend-resume functionallity
  2014-07-10 22:12               ` Bjorn Helgaas
@ 2014-07-11  7:53                 ` Igor Bezukh
  0 siblings, 0 replies; 11+ messages in thread
From: Igor Bezukh @ 2014-07-11  7:53 UTC
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Alan Stern, linux-pci, Linux PM list, Dmitry Kuzminov, Yanir Lubetkin


> On Thu, Jul 10, 2014 at 5:39 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, July 09, 2014 01:24:20 PM Bjorn Helgaas wrote:
> >> On Wed, Jul 9, 2014 at 10:35 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> > On Wednesday, July 09, 2014 09:55:24 AM Bjorn Helgaas wrote:
> >> >> On Wed, Jul 9, 2014 at 8:18 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> >> >> > On Wed, 9 Jul 2014, Rafael J. Wysocki wrote:
> >> >> >
> >> >> >> ...
> >> >> >> The driver's system resume callbacks need to be able to cope with
> >> >> >> missing devices.
> >> >>
> >> >> Based on this, it sounds like igb_resume() should call igb_remove()
> >> >> when it figures out the device is missing.
> >> >
> >> > I wouldn't say so.  igb_resume() should not crash when the device is missing
> >> > and should just handle that situation cleanly.  Obviously it is not its role
> >> > to remove the device from the hierarchy.
> >>
> >> OK, that makes sense.
> >>
> >> However, I don't know of anything in the PCI core that will notice
> >> that the device has disappeared, so I doubt it will be removed from
> >> the hierarchy.
> >
> > If we don't get a notification via ACPI or PCIe hotplug or anything,
> > then no, it won't be removed automatically.
> >
> > However, it still can be removed manually via sysfs, can't it?
> 
> Yes, I would think so.  So I guess there's a workaround at least.
> 
> Igor, can you test this scenario (after fixing igb_resume() so it
> doesn't crash when the device is missing)?  I.e., suspend the system,
> remove the adapter, resume the system, then do an "lspci" to see if
> the kernel thinks the adapter is still there, then put an adapter in
> the slot again (either hot-add if the slot supports it, or
> suspend/add/resume)?


Sure. I think I have already found the root cause of the kernel panic. I will test the fix (and submit a patch if it is correct).

I will also test the PCI enumeration with the fixed driver and will update you soon.

Alan, Rafael, Bjorn, thank you for the information!

Igor





