On Thu, 2012-11-29 at 13:29 -0500, chas williams - CONTRACTOR wrote: > On Thu, 29 Nov 2012 18:11:48 +0000 > David Woodhouse wrote: > > > We do see the 'packet received for unknown VCC' complaint, after we > > reboot the host without resetting the card. And as shown in the commit I > > just referenced, we rely on the !ATM_VF_READY check in order to prevent > > a use-after-free when the tasklet is sending packets up a VCC that's > > just been closed. > > well that behavior is just crap. why is it delivering cells for a > vpi/vci pair that is not open? In the reboot case... Because it *was* open and the device has no way of knowing that the host just rebooted. We don't reset the card on loading the driver, because that would cause an ADSL resync. Perhaps we could send a 'close all VCCs' command to the firmware though. Nathan, could we add that to the firmware? Or we could just respond to any unwanted incoming packet by sending a close for that specific VCC. And be careful about potential races with open(). In the close case... The *tasklet* is running to process an incoming packet, finds an open and active VCC and is *about* to send a packet up to it. Meanwhile, our close() runs and the VCC is destroyed. And then the tasklet... oops, use-after-free and crash. Hence commit 1f6ea6e51 which makes the tasklet refuse to process RX for a VCC which doesn't have the ATM_VF_READY flag set. Because it knows it's *being* closed. And the close() routine waits for any *existing* tasklet run to finish, to make sure nobody's referencing the VCC, before it returns and allows vcc_destroy_socket() to complete. It's the RCU principle, basically. > > You mean that (e.g.) pppoatm_push(vcc, NULL) should be waiting for any > > pending TX skb (that has already been passed off to the driver) to > > *complete*? How would it even do that? > > i think the order of the vcc_destroy_socket() operations is a bit > wrong. it should call detach protocol (i.e. push a NULL). this should > cause the attached protocol to stop any future sends and receives, and > it CAN sleep in this context (and only this context -- generally > sending cannot sleep which is why this might seem confusing) to do > whatever it needs to do to wait for the attached protocol to clean up > queues, flush data etc. > > then the driver close() should be called. this takes care of cleaning > up any pending tx or rx that is in the hardware. and of course, > close() can sleep since it will be in a interrutible context. This is basically what Krzysztof's patch 1/7 was doing, which we've now dropped from the series. There are issues with *either* ordering. The current case is that we call vcc->dev->ops->close(vcc) *first*, before vcc->push(vcc, NULL). And that means that the device is told to close the VCC while the protocol may still be pushing packets to it. Hence the patches to both pppoatm and br2684 to make them check for ATM_VF_READY and *stop* pushing packets if it's not set. If we flip it round and tell the protocol first, then it tears down all its data structures while the driver is still happily calling its ->pop() on transmitted skbs. Which leads to the panic in br2684_pop() that we've *also* seen (because we weren't actually flushing the TX packets in the driver's close(), which had the same effect). Yes, you suggest that the protocol could keep track of the skbs it's sent down and wait for them... but surely it's better to let the driver get called first and *abort* them? At this point, I think we're better off as we are (with Krzysztof's patch 1/7 dropped, and leaving vcc->dev->ops->close() being called before vcc->push(NULL). We've fairly much solved the issues with that arrangement, by checking ATM_VF_READY in the protocols' ->push() functions. -- dwmw2