From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Return-Path: MIME-Version: 1.0 In-Reply-To: <20180802072036.GN2534@lahna.fi.intel.com> References: <20180801164358.GI2534@lahna.fi.intel.com> <20180801171512.GA28440@wunner.de> <20180802072036.GN2534@lahna.fi.intel.com> From: gokul cg Date: Thu, 2 Aug 2018 12:59:18 +0530 Message-ID: Subject: Re: [PATCH] PCI: pciehp: Differentiate between surprise and safe removal To: Mika Westerberg Cc: Lukas Wunner , Bjorn Helgaas , Ashok Raj , Keith Busch , Yinghai Lu , Sinan Kaya , linux-pci@vger.kernel.org, Alexandru Gagniuc Content-Type: multipart/alternative; boundary="000000000000e724d705726ec4ae" List-ID: --000000000000e724d705726ec4ae Content-Type: text/plain; charset="UTF-8" Hi , >After the pci_dev is removed from the >> hierarchy, accessing it seems at least questionable. What about AER driver . I was discussing the same in another mail chain with subject "Possible race condition in the kernel between PCI driver and AER handling" Regards Gokul, --------------------FYI--------------- I am suspecting a possible race condition in the kernel between PCI driver and AER handling. Because of the same kernel panic happens from worker thread which handles bottom half of aer irq. I am seeing this issue when I suddenly power off PCI card which supports/enabled PCIE AER error reporting. While powering off PCI device, AER driver will get AER IRQ for the device, from AER IRQ handler, it will cache AER error code and schedule worker thread to handle error. The PCIe device will get removed from PCI tree before worker thread completes its task and kernel panic is happening when worker thread tries to access PCI device's config space. Issue: crash> crash> bt PID: 2727 TASK: ffff880272adc530 CPU: 0 COMMAND: "kworker/0:2" #0 [ffff88027469fac8] machine_kexec at ffffffff8102cf18 #1 [ffff88027469fb28] crash_kexec at ffffffff810a6b05 #2 [ffff88027469fbf0] oops_end at ffffffff8176d960 #3 [ffff88027469fc18] die at ffffffff810060db #4 [ffff88027469fc48] do_general_protection at ffffffff8176d452 #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2 [exception RIP: pci_bus_read_config_dword+100] RIP: ffffffff813405f4 RSP: ffff88027469fd20 RFLAGS: 00010046 RAX: 435f494350006963 RBX: ffff880274892000 RCX: 0000000000000004 RDX: 0000000000000100 RSI: 0000000000000060 RDI: ffff880274892000 RBP: ffff88027469fd48 R8: ffff88027469fd2c R9: 00000000000012c0 R10: 0000000000000006 R11: 00000000000012bf R12: ffff88027469fd5c R13: 0000000000000246 R14: 0000000000000000 R15: ffff8802741a4000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225 #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38 #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0 #11 [ffff88027469fe70] worker_thread at ffffffff8105e251 #12 [ffff88027469fed0] kthread at ffffffff81064260 #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38 crash> I have tested it on kernel 3.10 . But from source i could see that this case is still relevant for latest Linux source . --------------------END-------------- On Thu, Aug 2, 2018 at 12:50 PM, Mika Westerberg < mika.westerberg@linux.intel.com> wrote: > On Wed, Aug 01, 2018 at 07:15:12PM +0200, Lukas Wunner wrote: > > On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote: > > > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote: > > > > -static void remove_board(struct slot *p_slot) > > > > +static void remove_board(struct slot *p_slot, bool safe_removal) > > > > { > > > > struct controller *ctrl = p_slot->ctrl; > > > > > > > > - pciehp_unconfigure_device(p_slot); > > > > + pciehp_unconfigure_device(p_slot, safe_removal); > > > > > > Below we turn off power to the slot if it has power controller. Even if > > > we disable slot from sysfs, I think it ends up being inaccessible after > > > power is turned off. I wonder if we should mark the devices > disconnected > > > in that case as well? > > > > > > > > > > > if (POWER_CTRL(ctrl)) { > > > > pciehp_power_off_slot(p_slot); > > > > No, when pciehp_unconfigure_device() returns, the PCI devices below > > the hotplug bridge are unbound and removed from the system. They're > > gone, so the bit set in their pci_dev struct would no longer be > > accessible anyway. Unless of course something is holding a ref on > > the pci_dev, but that would seem to be a bug. (Accessing a device > > that's already removed from the system, that is.) > > > > Calling pci_dev_set_disconnected() only gives the PCI core and the > > driver bound to the device an indication that's it's inaccessible, > > so any code paths during unbound and PCI device teardown can skip > > accesses. (Because pci_dev_is_disconnected() is currently scoped > > to the PCI core, the disconnected status can only be queried from > > drivers that live in the PCI core, such as portdrv and all the > > port services drivers.) After the pci_dev is removed from the > > hierarchy, accessing it seems at least questionable. > > > > Does this make things clearer? Shout if it not. :-) > > Yes it does. Thank you :) > --000000000000e724d705726ec4ae Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi ,

>After the pci_dev is removed= from the
>>= hierarchy, accessing it seems at least questionable.

=
What about AER driver .=C2=A0 I was discussing the same in anoth= er mail chain with subject=C2=A0 "Possible race condition in the kernel between PCI = driver and AER handling"

Regards
Gokul,

-----= ---------------FYI---------------

I=C2=A0am suspecting a poss= ible race condition in the kernel between PCI driver and AER handling.

Because of the same kernel panic happens from worker thread whic= h handles bottom half of aer irq.


<= p class=3D"gmail-m_6437089603770748812gmail-p1" style=3D"font-variant-numer= ic:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8= px;line-height:normal;text-decoration-style:initial;text-decoration-color:i= nitial;margin:0px">= I am seeing this issue when I suddenly power off PCI card which supports/en= abled PCIE AER error reporting.

While powering off PCI de= vice, AER driver will get AER IRQ for the device, from AER IRQ handler, it = will cache AER error code and schedule worker thread to handle error.

The PCIe device will get removed from PCI tree before worker t= hread completes its task and kernel panic is=C2=A0 happening when worker th= read tries to access PCI device's config space.


Issue:


crash>

crash> bt

PID: 2727=C2=A0 =C2=A0TASK: ffff88= 0272adc530=C2=A0 CPU: 0=C2=A0 =C2=A0COMMAND: "kworker/0:2"=

#0 [ffff88027469fac8] machine_kexec at ffffffff8102cf18<= /font>

#1 [ffff88027469fb28] crash_kexec at ffffffff810a6b05

#2 [ffff88027469fbf0] oops_end at ffffffff8176d960

#3 [ffff88027469fc18] die at ffffffff810060db

#4 [fff= f88027469fc48] do_general_protection at ffffffff8176d452

<= p class=3D"gmail-m_6437089603770748812gmail-p1" style=3D"font-variant-numer= ic:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8= px;line-height:normal;text-decoration-style:initial;text-decoration-color:i= nitial;margin:0px">= #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2<= /p>

=C2=A0 =C2=A0 [exception RIP: pci_bus_read_config_dword+100]

=C2=A0 =C2=A0 RIP: ffffffff813405f4=C2=A0 RSP: ffff88027469fd20=C2= =A0 RFLAGS: 00010046

=C2=A0 =C2=A0 RAX: 435f494350006963= =C2=A0 RBX: ffff880274892000=C2=A0 RCX: 0000000000000004

<= p class=3D"gmail-m_6437089603770748812gmail-p1" style=3D"font-variant-numer= ic:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12.8= px;line-height:normal;text-decoration-style:initial;text-decoration-color:i= nitial;margin:0px">= =C2=A0 =C2=A0 RDX: 0000000000000100=C2=A0 RSI: 0000000000000060=C2=A0 RDI: = ffff880274892000

=C2=A0 =C2=A0 RBP: ffff88027469fd48=C2= =A0 =C2=A0R8: ffff88027469fd2c=C2=A0 =C2=A0R9: 00000000000012c0

=C2=A0 =C2=A0 R10: 0000000000000006=C2=A0 R11: 00000000000012bf=C2= =A0 R12: ffff88027469fd5c

=C2=A0 =C2=A0 R13: 0000000000= 000246=C2=A0 R14: 0000000000000000=C2=A0 R15: ffff8802741a4000

=C2=A0 =C2=A0 ORIG_RAX: ffffffffffffffff=C2=A0 CS: 0010=C2=A0 SS: 000= 0

#6 [ffff88027469fd50] pci_find_next_ext_capability at= ffffffff81345d7b

#7 [ffff88027469fd90] pci_find_ext_capa= bility at ffffffff81347225

#8 [ffff88027469fda0] get_dev= ice_error_info at ffffffff81356c4d

#9 [ffff88027469fdd0] = aer_isr at ffffffff81357a38

#10 [ffff88027469fe28] proces= s_one_work at ffffffff8105d4c0

#11 [ffff88027469fe70] wor= ker_thread at ffffffff8105e251

#12 [ffff88027469fed0] kth= read at ffffffff81064260

#13 [ffff88027469ff50] ret_from_= fork at ffffffff81773a38


crash&g= t;


I have tested it on kernel 3.= 10 . But from source i could see that this case is still relevant for lates= t Linux source .


--------------------END------= --------=C2=A0

On Thu, Aug 2, 2018 at 12:50 PM, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
On Wed, Aug 01, 2018 at 07:15:12PM +0200, Lukas Wunner = wrote:
> On Wed, Aug 01, 2018 at 07:43:58PM +0300, Mika Westerberg wrote:
> > On Tue, Jul 31, 2018 at 07:50:37AM +0200, Lukas Wunner wrote:
> > > -static void remove_board(struct slot *p_slot)
> > > +static void remove_board(struct slot *p_slot, bool safe_rem= oval)
> > >=C2=A0 {
> > >=C2=A0 =C2=A0struct controller *ctrl =3D p_slot->ctrl;
> > >=C2=A0
> > > - pciehp_unconfigure_device(p_slot);
> > > + pciehp_unconfigure_device(p_slot, safe_removal);
> >
> > Below we turn off power to the slot if it has power controller. E= ven if
> > we disable slot from sysfs, I think it ends up being inaccessible= after
> > power is turned off. I wonder if we should mark the devices disco= nnected
> > in that case as well?
> >
> > >=C2=A0
> > >=C2=A0 =C2=A0if (POWER_CTRL(ctrl)) {
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pciehp_power_off_slo= t(p_slot);
>
> No, when pciehp_unconfigure_device() returns, the PCI devices below > the hotplug bridge are unbound and removed from the system.=C2=A0 They= 're
> gone, so the bit set in their pci_dev struct would no longer be
> accessible anyway.=C2=A0 Unless of course something is holding a ref o= n
> the pci_dev, but that would seem to be a bug.=C2=A0 (Accessing a devic= e
> that's already removed from the system, that is.)
>
> Calling pci_dev_set_disconnected() only gives the PCI core and the
> driver bound to the device an indication that's it's inaccessi= ble,
> so any code paths during unbound and PCI device teardown can skip
> accesses.=C2=A0 (Because pci_dev_is_disconnected() is currently scoped=
> to the PCI core, the disconnected status can only be queried from
> drivers that live in the PCI core, such as portdrv and all the
> port services drivers.)=C2=A0 After the pci_dev is removed from the > hierarchy, accessing it seems at least questionable.
>
> Does this make things clearer?=C2=A0 Shout if it not. :-)

Yes it does. Thank you :)

--000000000000e724d705726ec4ae--