On 6/7/2016 10:32 AM, Erez Shitrit wrote: > On Tue, Jun 7, 2016 at 4:48 PM, Doug Ledford wrote: >> On 6/7/2016 2:01 AM, Erez Shitrit wrote: >>> On Tue, Jun 7, 2016 at 2:59 AM, Doug Ledford wrote: >>>> On 6/5/2016 5:10 PM, Or Gerlitz wrote: >>>> >>>>>> - neigh->alive = jiffies; >>>>>> + >>>>>> + if (likely(skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)) >>>>>> + neigh->alive = jiffies; >>>>> >>>>> why testing the queue len makes things right? >>>> >>>> I'm with Or on this one. This test doesn't make sense unless you assume >>>> that the neighbor is being hit with a steady stream of packets. If >>>> someone just runs ping -c 1 to a dead neighbor, this test will fail. >>>> The idea of the patch is OK, but the test needs to be reworked for >>>> situations other than a blasting stream of data. >>>> >>> >>> I will try to explain the idea behind that test, and why it works (I >>> hope I'll make it clear this time :)) >>> >>> there are 2 objects that are taking place here, the garbage-collector >>> that removes neigh if was not used for defined time, and the >>> path-query for each neigh. >>> if the path-query failed the packets for specific neigh are kept in >>> the neigh queue, but only if there are no more than >>> IPOIB_MAX_PATH_REC_QUEUE, and the neigh is still kept for ever (let's >>> assume CM connected that was stopped without any notification with >>> DREQ etc.) >>> The only way to know that there is an unresolved neigh is by checking >>> its queue, if it is full we can assume that this is an unresolved >>> neigh otherwise it is resolved. >> >> That's not true. Reread what I wrote above (I was specific when I >> picked that command). Ping -c 1 will only send one ping. It will not >> trip this test. As I said, your test relies on a stream of packets, a >> single packet to a gone neighbor will never cause it to get deleted. > > If you ping one time (ping -c 1) there is no problem at all, the neigh > will be deleted by the GC anyway, because no (unresolved) packet > updates the neigh validity and the GC will check the last time it was > refreshed and since only one ping refreshed it long ago, it will be > deleted. (the GC always running) > > the problem is when there are non stop traffic to that neigh. IMHO > that test works for that. Ok, that makes sense. >> You need a timeout on the queue and then if the packet in the queue >> times out, regardless of whether the queue is full or not, then you mark >> the neighbor for garbage collection and drop the packet(s). >> >>> So, the test doesn't take any assumption on the shape of the traffic, >>> even in a ping requests once a second it will failed after 3 times the >>> request was sent, and the neigh time will not be updated which will >>> leave it to the garbage-collector to delete it, that gives the driver >>> the possibility to recreate the neigh and to update its ah/pr, >>> otherwise the driver will try over and over to resend to that >>> unresolved neigh. >>> >>>> >> >>