On Tue, Sep 17, 2019 at 11:45:14AM +1000, Oliver O'Halloran wrote: > On Tue, Sep 17, 2019 at 11:04 AM Sam Bobroff wrote: > > > > On Tue, Sep 03, 2019 at 08:15:56PM +1000, Oliver O'Halloran wrote: > > > Currently we print a stack trace in the event handler to help with > > > debugging EEH issues. In the case of suprise hot-unplug this is unneeded, > > > so we want to prevent printing the stack trace unless we know it's due to > > > an actual device error. To accomplish this, we can save a stack trace at > > > the point of detection and only print it once the EEH recovery handler has > > > determined the freeze was due to an actual error. > > > > > > Since the whole point of this is to prevent spurious EEH output we also > > > move a few prints out of the detection thread, or mark them as pr_debug > > > so anyone interested can get output from the eeh_check_dev_failure() > > > if they want. > > > > > > Signed-off-by: Oliver O'Halloran > > > > I think this is a good change, and even in the normal case it will place > > the stacktrace closer to the rest of the recovery information. > > > > But, I think it would make more sense to put the stacktrace into the > > struct eeh_event, rather than the struct eeh_pe. Is there some reason > > we can't do that? (It would save a fair bit of memory!) > > Two reasons: > > 1) the eeh_event structures are allocated with GFP_ATOMIC since > eeh_dev_check_failure() can be called from any context. Minimising the > number of atomic allocations we do is a good idea as a matter of > course. Yes, but I meant directly inside eeh_event so there wouldn't be a second allocation. It would just be a bit bigger. > 2) We don't pass the eeh_event structure to the event handler > function. I guess we could, but... eh > > I don't see the memory saving as hugely significant either. There's > always fewer eeh_pe structures than there are PCI devices since some > will share PEs (e.g. switches, multifunction cards) so you'd be saving > a dozen KB at most. > > root@zaius1:~# lspci | wc -l > 59 > root@zaius1:~# echo $(( $(lspci | wc -l) * 64 * 8)) > 30208 > > I think we'll live... Sure, I don't have very strong feelings about it either way. > > > > Cheers, > > Sam.