On Wed, Mar 20, 2019 at 05:02:57PM +1100, Alexey Kardashevskiy wrote: > > > On 20/03/2019 13:58, Sam Bobroff wrote: > > The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the > > use of driver callbacks in drivers that have been bound part way > > through the recovery process. This is necessary to prevent later stage > > handlers from being called when the earlier stage handlers haven't, > > which can be confusing for drivers. > > The flag is used from eeh_pe_report()->eeh_pe_report_edev which is > called many times from eeh_handle_normal_event() (and you clear the flag > here unconditionally) and once from eeh_handle_special_event() - so this > is actually the only case now when the flag matters. Is my understanding > correct? Also is not clearing the flag correct in that case? I do not > quite understand eeh_handle_normal_event vs. eeh_handle_special_event > business though. I'm not sure I fully understand your question, but here's the situation: * EEH is detected on a PCI device that has no driver bound but there is a driver that COULD bind. * eeh_handle_normal_event() follows the "EEH: Reset with hotplug activity" path because the device doesn't (currently) have a driver that supports EEH. * eeh_reset_device() removes the device (pci_hp_remove_devices()). * eeh_reset_device() re-discovers the device with pci_hp_add_devices(). * As part of re-discovery the PCI subsystem will bind the available driver. * eeh_handle_normal_event() calls eeh_report_resume() (via eeh_pe_report()). If the (newly bound) driver has a resume() handler, then eeh_report_resume() will call it and AFAIK this will cause a problem for some drivers because their error_detected() handler wasn't called first. The fix for this is to have eeh_add_device_late() set EEH_DEV_NO_HANDLER so that we can detect that the device has been added DURING recovery, and avoid calling it's handlers later. I see what you mean about the eeh_handle_special_event() case, where EEH_DEV_NO_HANDLER isn't cleared before calling eeh_pe_report(), and I think it's a bug! I'll fix it in the next version. (Cleaning up that flag is on my list. I don't think it's a very good solution.) > > > > However, the flag is set for all devices that are added after boot > > time and only cleared at the end of the EEH recovery process. This > > results in hot plugged devices erroneously having the flag set during > > the first recovery after they are added (causing their driver's > > handlers to be incorrectly ignored). > > > > To remedy this, clear the flag at the beginning of recovery > > processing. The flag is still cleared at the end of recovery > > processing, although it is no longer really necessary. > > Then may be remove that redundant clearing? I don't really mind either way; clearing it when we are finished with recovery seems "cleaner" to me but it doesn't have any function. (In any case, I think I will eventually want to remove it.) > > > > Signed-off-by: Sam Bobroff > > --- > > arch/powerpc/kernel/eeh_driver.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c > > index 6f3ee30565dd..4c34b9901f15 100644 > > --- a/arch/powerpc/kernel/eeh_driver.c > > +++ b/arch/powerpc/kernel/eeh_driver.c > > @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe) > > result = PCI_ERS_RESULT_DISCONNECT; > > } > > > > + eeh_for_each_pe(pe, tmp_pe) > > + eeh_pe_for_each_dev(tmp_pe, edev, tmp) > > + edev->mode &= ~EEH_DEV_NO_HANDLER; > > + > > /* Walk the various device drivers attached to this slot through > > * a reset sequence, giving each an opportunity to do what it needs > > * to accomplish the reset. Each child gets a report of the > > > > -- > Alexey >