All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] neighbour.c: Avoid GC directly after state change
@ 2015-03-11 21:01 Ulf Samuelsson
  2015-03-12 18:26 ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-11 21:01 UTC (permalink / raw)
  To: netdev

The neighbour state is changed in the ARP timer handler.
If the state is changed to NUD_STALE, then the neighbour
entry becomes a candidate for garbage collection.

The garbage collection is handled by a "periodic work" routine.

When :

	* noone refers to the entry
	* the state is no longer valid (I.E: NUD_STALE).
	* the timeout value has  been reached or state is FAILED

the "periodic work" routine will notify
the stack that the entry should be deleted.

A user application monitoring and controlling the neighbour table
using NETLINK may fail, if the "period work" routine is run
directly after the state has been changed to NUD_STALE,
but before the user application has had a chance to change
the state to something valid.

The "period work" routine will detect the NUD_STALE state
and if the timeout value has been reached, it will notify the stack
that the entry should be deleted.

The patch adds a check in the periodic work routine
which will skip test for garbage collection
unless a number of ticks has passed since the last time
the neighbour entry state was changed.

The feature is controlled through Kconfig

The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
The guardband time (in ticks) is set in ARP_GC_GUARDBAND
Default time is 100 ms if HZ_### is set.

Signed-off-by: Ulf Samuelsson <netdev@emagii.com>
---
 net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
 net/core/neighbour.c |   15 ++++++++++++---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 44dd578..099a5dd 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -77,6 +77,38 @@ config INET
 	  Short answer: say Y.
 
 if INET
+
+#
+# Core Network configuration
+#
+
+config ARP_GC_APPLY_GUARDBAND
+	bool "IP: ARP: Avoid garbage collection directly after state change"
+	default n
+	---help---
+	  With this item selected, an entry in the neighbour table
+	  will not be garbage collected directly after the ARP state
+	  has changed to STALE of FAILED
+	  This allows an application program change the state to something valid
+	  before garbage colllection occurs.
+
+	  If unsure, say N.
+
+config ARP_GC_GUARDBAND
+	int "Guardband time on garbage collection"
+	depends on ARP_GC_APPLY_GUARDBAND
+	default 10 if HZ_100
+	default 25 if HZ_250
+	default 30 if HZ_300
+	default 100 if HZ_1000
+	default 100
+
+	---help---
+	  The number of ticks to delay garbage collection
+	  after the neighbour entry has been updated
+	  A delay of 100 ms is reasonable.
+	  With CONFIG_HZ = 250, this value should be 25
+
 source "net/ipv4/Kconfig"
 source "net/ipv6/Kconfig"
 source "net/netlabel/Kconfig"
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 70fe9e1..194195d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -786,13 +786,23 @@ static void neigh_periodic_work(struct work_struct *work)
 
 			state = n->nud_state;
 			if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
-				write_unlock(&n->lock);
 				goto next_elt;
 			}
 
 			if (time_before(n->used, n->confirmed))
 				n->used = n->confirmed;
 
+#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
+			/* Do not garbage collect directly after we
+			 * updated n->state to allow applications to
+			 * react to the event
+			 */
+			if (time_before(jiffies,
+					n->updated + CONFIG_ARP_GC_GUARDBAND)) {
+				goto next_elt;
+			}
+#endif
+
 			if (atomic_read(&n->refcnt) == 1 &&
 			    (state == NUD_FAILED ||
 			     time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
@@ -802,9 +812,8 @@ static void neigh_periodic_work(struct work_struct *work)
 				neigh_cleanup_and_release(n);
 				continue;
 			}
-			write_unlock(&n->lock);
-
 next_elt:
+			write_unlock(&n->lock);
 			np = &n->next;
 		}
 		/*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-11 21:01 [PATCH] neighbour.c: Avoid GC directly after state change Ulf Samuelsson
@ 2015-03-12 18:26 ` David Miller
  2015-03-17 23:33   ` Ulf Samuelsson
  2015-04-10  8:26   ` Ulf Samuelsson
  0 siblings, 2 replies; 22+ messages in thread
From: David Miller @ 2015-03-12 18:26 UTC (permalink / raw)
  To: netdev; +Cc: netdev


I hate changes like this.

By making this a Kconfig options it cannot be dynamic, and every
distribution is going to have to scratch their head and decide
what to set this to.

That's seriously suboptimal.

If you want to change behavior based upon whether userspace is
managing the damn table, make it so the user doing so has to
ask for the new behavior at _RUN TIME_ via the netlink interface
or similar.

Picking the guard time itself at compile time is also undesirable.

And you don't even want a damn timer, what you want is for the
state of the entry to be frozen and for the user to "release"
it by either adjusting the state to something else or marking
in some other way to allow it to be unfrozen and released again.

Why put it to chance with some timeout?  Make things explicit.

I'm not applying this patch.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-12 18:26 ` David Miller
@ 2015-03-17 23:33   ` Ulf Samuelsson
  2015-03-18  1:56     ` YOSHIFUJI Hideaki/吉藤英明
  2015-04-10  8:26   ` Ulf Samuelsson
  1 sibling, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-17 23:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Den 2015-03-12 19:26, David Miller skrev:
> I hate changes like this.
>
> By making this a Kconfig options it cannot be dynamic, and every
> distribution is going to have to scratch their head and decide
> what to set this to.
>
> That's seriously suboptimal.
>
> If you want to change behavior based upon whether userspace is
> managing the damn table, make it so the user doing so has to
> ask for the new behavior at _RUN TIME_ via the netlink interface
> or similar.
>
> Picking the guard time itself at compile time is also undesirable.
>
> And you don't even want a damn timer, what you want is for the
> state of the entry to be frozen and for the user to "release"
> it by either adjusting the state to something else or marking
> in some other way to allow it to be unfrozen and released again.
>
> Why put it to chance with some timeout?  Make things explicit.
>
> I'm not applying this patch.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sounds reasonable comments to me.

Would this approach work?

2 new message types are defined, to enable/disable the garbage
collection functionality.

RTM_ENANEIGHGC
RTM_DISANEIGHGC


When the functionality is disabled, the stack will not garbage collect,
and the external application will have to send netlink messages
to delete unwanted entries.

1 new message is defined to move an entry from STALE to DELAY.
RTM_NEIGHRECOVER

Will have to discuss internally to see if this works.

BR
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-17 23:33   ` Ulf Samuelsson
@ 2015-03-18  1:56     ` YOSHIFUJI Hideaki/吉藤英明
  0 siblings, 0 replies; 22+ messages in thread
From: YOSHIFUJI Hideaki/吉藤英明 @ 2015-03-18  1:56 UTC (permalink / raw)
  To: Ulf Samuelsson, David Miller; +Cc: hideaki.yoshifuji, netdev

Hi,

Ulf Samuelsson wrote:
> Den 2015-03-12 19:26, David Miller skrev:
>> I hate changes like this.
>>
>> By making this a Kconfig options it cannot be dynamic, and every
>> distribution is going to have to scratch their head and decide
>> what to set this to.
>>
>> That's seriously suboptimal.
>>
>> If you want to change behavior based upon whether userspace is
>> managing the damn table, make it so the user doing so has to
>> ask for the new behavior at _RUN TIME_ via the netlink interface
>> or similar.
>>
>> Picking the guard time itself at compile time is also undesirable.
>>
>> And you don't even want a damn timer, what you want is for the
>> state of the entry to be frozen and for the user to "release"
>> it by either adjusting the state to something else or marking
>> in some other way to allow it to be unfrozen and released again.
>>
>> Why put it to chance with some timeout?  Make things explicit.
>>
>> I'm not applying this patch.
:
> Sounds reasonable comments to me.
>
> Would this approach work?
>
> 2 new message types are defined, to enable/disable the garbage
> collection functionality.
>
> RTM_ENANEIGHGC
> RTM_DISANEIGHGC
>
>
> When the functionality is disabled, the stack will not garbage collect,
> and the external application will have to send netlink messages
> to delete unwanted entries.
>
> 1 new message is defined to move an entry from STALE to DELAY.
> RTM_NEIGHRECOVER

Why don't you "explicitly" set the neighbour, or mark the neighbour
entry as PERMANENT via netlink when you recognized that the entry
has become REACHABLE, then?

--yoshfuji

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-12 18:26 ` David Miller
  2015-03-17 23:33   ` Ulf Samuelsson
@ 2015-04-10  8:26   ` Ulf Samuelsson
  2015-04-15 13:40     ` Ulf Samuelsson
  2015-04-16  5:16     ` YOSHIFUJI Hideaki
  1 sibling, 2 replies; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-10  8:26 UTC (permalink / raw)
  To: netdev; +Cc: netdev


On 03/12/2015 07:26 PM, David Miller wrote:
> I hate changes like this.
>
> By making this a Kconfig options it cannot be dynamic, and every
> distribution is going to have to scratch their head and decide
> what to set this to.
>
> That's seriously suboptimal.
>
> If you want to change behavior based upon whether userspace is
> managing the damn table, make it so the user doing so has to
> ask for the new behavior at _RUN TIME_ via the netlink interface
> or similar.
>
> Picking the guard time itself at compile time is also undesirable.
>
> And you don't even want a damn timer, what you want is for the
> state of the entry to be frozen and for the user to "release"
> it by either adjusting the state to something else or marking
> in some other way to allow it to be unfrozen and released again.
>
> Why put it to chance with some timeout?  Make things explicit.

The desired functionality is that if communication stops,
you want to send out ARP probes, before the entry is deleted.

The current (pseudo) code of the neigh timer is:

     if (state & NUD_REACHABLE) {
         if (now <= "confirmed + "reachable_time")) {
                     ... /* We are OK */
         } else if (now < "used" + DELAY_PROBE_TIME) {    /* Never 
happens */
                     state = NUD_DELAY;
         } else {
             state = NUD_STALE;
             notify = 1;
         }

We never see the state beeing changed from REACHABLE to DELAY,
so the probes are not beeing sent out, instead you always go
from REACHABLE to STALE.

DELAY_PROBE_TIME is set to (5 x HZ) and "used"
seems to be only set by the periodic_work routine
when the neigh entry is in STALE state, and then it is too late.
It is also set by "arp_find" which is used by "broken" devices.

In practice, the second condition: "(now < "used" + DELAY_PROBE_TIME)" 
is never used.
What is the intention of this test?

By adding a new test + parameter, we would get the desired functionality,
and no need to listen for notifications or doing ARP state updates from 
applications.

         if (now <= "confirmed + "reachable_time")) {
                     ... /* We are OK */
+        else if (now <= "confirmed + "reprobe_time")) {
+                   state <= NUD_DELAY;
         } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never 
happens */
                     state <= NUD_DELAY;
         } else {
             state = NUD_STALE;
             notify = 1;
         }

This way the entry would remain in REACHABLE while normal communication 
occurs,
then it would enter DELAY state to probe, and if that fails, it goes to 
STALE state.

Alternatively, we just change the second test:
         if (now <= "confirmed + "reachable_time")) {
                     ... /* We are OK */
-        } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never 
happens */
+       } else if (now < "confirmed" + DELAY_PROBE_TIME))) {
                     state <= NUD_DELAY;
         } else {
             state = NUD_STALE;
             notify = 1;
         }


The  DELAY_PROBE_TIME, should preferrably be a kernel Kconfig parameter.

Best Regards,
Ulf Samuelsson


> I'm not applying this patch.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-10  8:26   ` Ulf Samuelsson
@ 2015-04-15 13:40     ` Ulf Samuelsson
  2015-04-16  5:16     ` YOSHIFUJI Hideaki
  1 sibling, 0 replies; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-15 13:40 UTC (permalink / raw)
  To: netdev; +Cc: netdev

No reply on this....

in net/core/neighbour.c: neigh_timer_handler I see:

     if (state & NUD_REACHABLE) {
         if (time_before_eq(now,
                    neigh->confirmed + neigh->parms->reachable_time)) {
             neigh_dbg(2, "neigh %p is still alive\n", neigh);
             next = neigh->confirmed + neigh->parms->reachable_time;
         } else if (time_before_eq(now,
                       neigh->used +
                       NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
             neigh_dbg(2, "neigh %p is delayed\n", neigh);
             neigh->nud_state = NUD_DELAY;
             neigh->updated = jiffies;
             neigh_suspect(neigh);
             next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME);
         } else {
             neigh_dbg(2, "neigh %p is suspected\n", neigh);
             neigh->nud_state = NUD_STALE;
             neigh->updated = jiffies;
             neigh_suspect(neigh);
             notify = 1;
         }
     } else ...

Why is the second test there in the first place?

In the normal case, "neigh->used" does not get updated until the entry 
is found STALE
in the periodic work.

Why not use "neigh->confirmed" and another sysctl variable?

     if (state & NUD_REACHABLE) {
         if (time_before_eq(now,
                    neigh->confirmed + neigh->parms->reachable_time)) {
             neigh_dbg(2, "neigh %p is still alive\n", neigh);
             next = neigh->confirmed + neigh->parms->reachable_time;
         } else if (time_before_eq(now,
-                     neigh->used +
-                      NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
+                    neigh->confirmed +
+                     NEIGH_VAR(neigh->parms, DELAY_REPROBE_TIME))) {
             neigh_dbg(2, "neigh %p is delayed\n", neigh);
             neigh->nud_state = NUD_DELAY;
             neigh->updated = jiffies;
             neigh_suspect(neigh);
             next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME);
         } else {
             neigh_dbg(2, "neigh %p is suspected\n", neigh);
             neigh->nud_state = NUD_STALE;
             neigh->updated = jiffies;
             neigh_suspect(neigh);
             notify = 1;
         }
     } else ...

if DELAY_REPROBE_TIME is larger than "reachable_time", then
     the kernel will send out ARP probes when it is about
     to lose communication with a remote machine.
     This is what we need.

If it is smaller, then
     it will go from REACHABLE to STALE.

The initial value of DELAY_REPROBE_TIME needs to be settable in Kconfig
to allow the selection of functionality.

I am told that setting stuff using sysctl has a performance penalty, when
interfaces are dynamically created and deleted in hundreds.

Best Regards,
Ulf Samuelsson
  

On 04/10/2015 10:26 AM, Ulf Samuelsson wrote:
>
> On 03/12/2015 07:26 PM, David Miller wrote:
>> I hate changes like this.
>>
>> By making this a Kconfig options it cannot be dynamic, and every
>> distribution is going to have to scratch their head and decide
>> what to set this to.
>>
>> That's seriously suboptimal.
>>
>> If you want to change behavior based upon whether userspace is
>> managing the damn table, make it so the user doing so has to
>> ask for the new behavior at _RUN TIME_ via the netlink interface
>> or similar.
>>
>> Picking the guard time itself at compile time is also undesirable.
>>
>> And you don't even want a damn timer, what you want is for the
>> state of the entry to be frozen and for the user to "release"
>> it by either adjusting the state to something else or marking
>> in some other way to allow it to be unfrozen and released again.
>>
>> Why put it to chance with some timeout?  Make things explicit.
>
> The desired functionality is that if communication stops,
> you want to send out ARP probes, before the entry is deleted.
>
> The current (pseudo) code of the neigh timer is:
>
>     if (state & NUD_REACHABLE) {
>         if (now <= "confirmed + "reachable_time")) {
>                     ... /* We are OK */
>         } else if (now < "used" + DELAY_PROBE_TIME) {    /* Never 
> happens */
>                     state = NUD_DELAY;
>         } else {
>             state = NUD_STALE;
>             notify = 1;
>         }
>
> We never see the state beeing changed from REACHABLE to DELAY,
> so the probes are not beeing sent out, instead you always go
> from REACHABLE to STALE.
>
> DELAY_PROBE_TIME is set to (5 x HZ) and "used"
> seems to be only set by the periodic_work routine
> when the neigh entry is in STALE state, and then it is too late.
> It is also set by "arp_find" which is used by "broken" devices.
>
> In practice, the second condition: "(now < "used" + DELAY_PROBE_TIME)" 
> is never used.
> What is the intention of this test?
>
> By adding a new test + parameter, we would get the desired functionality,
> and no need to listen for notifications or doing ARP state updates 
> from applications.
>
>         if (now <= "confirmed + "reachable_time")) {
>                     ... /* We are OK */
> +        else if (now <= "confirmed + "reprobe_time")) {
> +                   state <= NUD_DELAY;
>         } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never 
> happens */
>                     state <= NUD_DELAY;
>         } else {
>             state = NUD_STALE;
>             notify = 1;
>         }
>
> This way the entry would remain in REACHABLE while normal 
> communication occurs,
> then it would enter DELAY state to probe, and if that fails, it goes 
> to STALE state.
>
> Alternatively, we just change the second test:
>         if (now <= "confirmed + "reachable_time")) {
>                     ... /* We are OK */
> -        } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never 
> happens */
> +       } else if (now < "confirmed" + DELAY_PROBE_TIME))) {
>                     state <= NUD_DELAY;
>         } else {
>             state = NUD_STALE;
>             notify = 1;
>         }
>
>
> The  DELAY_PROBE_TIME, should preferrably be a kernel Kconfig parameter.
>
> Best Regards,
> Ulf Samuelsson
>
>
>> I'm not applying this patch.
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-10  8:26   ` Ulf Samuelsson
  2015-04-15 13:40     ` Ulf Samuelsson
@ 2015-04-16  5:16     ` YOSHIFUJI Hideaki
  2015-04-17  8:03       ` Ulf Samuelsson
  1 sibling, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-04-16  5:16 UTC (permalink / raw)
  To: Ulf Samuelsson, netdev; +Cc: hideaki.yoshifuji, netdev

Hi,

Ulf Samuelsson wrote:

> The desired functionality is that if communication stops,
> you want to send out ARP probes, before the entry is deleted.
> 
> The current (pseudo) code of the neigh timer is:
> 
>     if (state & NUD_REACHABLE) {
>         if (now <= "confirmed + "reachable_time")) {
>                     ... /* We are OK */
>         } else if (now < "used" + DELAY_PROBE_TIME) {    /* Never happens */
>                     state = NUD_DELAY;
>         } else {
>             state = NUD_STALE;
>             notify = 1;
>         }
> 
> We never see the state beeing changed from REACHABLE to DELAY,
> so the probes are not beeing sent out, instead you always go
> from REACHABLE to STALE.

That's right.


> DELAY_PROBE_TIME is set to (5 x HZ) and "used"
> seems to be only set by the periodic_work routine
> when the neigh entry is in STALE state, and then it is too late.
> It is also set by "arp_find" which is used by "broken" devices.
> 

In STALE state, neigh->used is set by neigh_event_send(), called
by neigh_resolve_output() via neigh->output().


> In practice, the second condition: "(now < "used" + DELAY_PROBE_TIME)" is never used.
> What is the intention of this test?

That's right.  It is NOT used in normal condition unless
reachable time is too short.


> 
> By adding a new test + parameter, we would get the desired functionality,
> and no need to listen for notifications or doing ARP state updates from applications.
> 
>         if (now <= "confirmed + "reachable_time")) {
>                     ... /* We are OK */
> +        else if (now <= "confirmed + "reprobe_time")) {
> +                   state <= NUD_DELAY;
>         } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never happens */
>                     state <= NUD_DELAY;
>         } else {
>             state = NUD_STALE;
>             notify = 1;
>         }
> 
> This way the entry would remain in REACHABLE while normal communication occurs,
> then it would enter DELAY state to probe, and if that fails, it goes to STALE state.

No, it is not what REACHABLE and DELAY mean.

>From RFC2461:

|      REACHABLE   Roughly speaking, the neighbor is known to have been
|                  reachable recently (within tens of seconds ago).
:
|      STALE       The neighbor is no longer known to be reachable but
|                  until traffic is sent to the neighbor, no attempt
|                  should be made to verify its reachability.
|      DELAY       The neighbor is no longer known to be reachable, and
|                  traffic has recently been sent to the neighbor.
|                  Rather than probe the neighbor immediately, however,
|                  delay sending probes for a short while in order to
|                  give upper layer protocols a chance to provide
|                  reachability confirmation.


-- 
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-16  5:16     ` YOSHIFUJI Hideaki
@ 2015-04-17  8:03       ` Ulf Samuelsson
  2015-04-20  2:33         ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-17  8:03 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, netdev; +Cc: netdev


On 04/16/2015 07:16 AM, YOSHIFUJI Hideaki wrote:
> Hi,
>
> Ulf Samuelsson wrote:
>
>> The desired functionality is that if communication stops,
>> you want to send out ARP probes, before the entry is deleted.
>>
>> The current (pseudo) code of the neigh timer is:
>>
>>      if (state & NUD_REACHABLE) {
>>          if (now <= "confirmed + "reachable_time")) {
>>                      ... /* We are OK */
>>          } else if (now < "used" + DELAY_PROBE_TIME) {    /* Never happens */
>>                      state = NUD_DELAY;
>>          } else {
>>              state = NUD_STALE;
>>              notify = 1;
>>          }
>>
>> We never see the state beeing changed from REACHABLE to DELAY,
>> so the probes are not beeing sent out, instead you always go
>> from REACHABLE to STALE.
> That's right.
But not acceptable, in telecom.
>
>
>> DELAY_PROBE_TIME is set to (5 x HZ) and "used"
>> seems to be only set by the periodic_work routine
>> when the neigh entry is in STALE state, and then it is too late.
>> It is also set by "arp_find" which is used by "broken" devices.
>>
> In STALE state, neigh->used is set by neigh_event_send(), called
> by neigh_resolve_output() via neigh->output().



>> In practice, the second condition: "(now < "used" + DELAY_PROBE_TIME)" is never used.
>> What is the intention of this test?
> That's right.  It is NOT used in normal condition unless
> reachable time is too short.
>
>
>> By adding a new test + parameter, we would get the desired functionality,
>> and no need to listen for notifications or doing ARP state updates from applications.
>>
>>          if (now <= "confirmed + "reachable_time")) {
>>                      ... /* We are OK */
>> +        else if (now <= "confirmed + "reprobe_time")) {
>> +                   state <= NUD_DELAY;
>>          } else if (now < "used" + DELAY_PROBE_TIME))) {    /* Never happens */
>>                      state <= NUD_DELAY;
>>          } else {
>>              state = NUD_STALE;
>>              notify = 1;
>>          }
>>
>> This way the entry would remain in REACHABLE while normal communication occurs,
>> then it would enter DELAY state to probe, and if that fails, it goes to STALE state.
> No, it is not what REACHABLE and DELAY mean.
>
>  From RFC2461:
>
> |      REACHABLE   Roughly speaking, the neighbor is known to have been
> |                  reachable recently (within tens of seconds ago).
> :
> |      STALE       The neighbor is no longer known to be reachable but
> |                  until traffic is sent to the neighbor, no attempt
> |                  should be made to verify its reachability.
> |      DELAY       The neighbor is no longer known to be reachable, and
> |                  traffic has recently been sent to the neighbor.
> |                  Rather than probe the neighbor immediately, however,
> |                  delay sending probes for a short while in order to
> |                  give upper layer protocols a chance to provide
> |                  reachability confirmation.
>
>

It is all depending on the meaning of the word "recently".
You imply, that if timeouts have been triggered, then it is no longer 
"recent",
but that is not the only interpretation, it is up to the implementer to 
decide
what is "recently".

You can argue, that for REACHABLE they define it as "(within tens of 
seconds ago)",
but in a standards document, that is not enough,
so the definition of STALE is perfectly OK due to this ambiguity.

We have the situation in that machines enter and exit the network, at 
unpredictable times,
and while traffic is sporadic, they still need to be reachable.
They should not enter FAILED state unless they leave the network.

I see also in the RFC2461:
"To reduce unnecessary network traffic, probe messages are only sent to
neighbors to which the node is actively sending packets."

In telecom applications, as long as the neighbour is present on the network,
the node will be sending packets, even if it is not that frequent.

These probes are *neccessary* for the system to work properly,
due to the long time for garbage collection.

The PROBE state need to be entered once, and only when these probes get 
no answer,
the entry should move into STALE.
I think that is compliant with the statement above.

Since they leave at unpredictable times, it is not good to set them to 
PERMANENT.

Therefore, if a timeout occurs due to no traffic, they must be probed before
they are garbage collected.

If this is not acceptable, how do you propose to solve the problem that 
you cannot
make remote units inaccessible for more than a fraction of a second?


Best Regards,
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-17  8:03       ` Ulf Samuelsson
@ 2015-04-20  2:33         ` YOSHIFUJI Hideaki
  2015-04-20 12:48           ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-04-20  2:33 UTC (permalink / raw)
  To: Ulf Samuelsson, netdev; +Cc: hideaki.yoshifuji, netdev

Hi,

Ulf Samuelsson wrote:
>>  From RFC2461:
>>
>> |      REACHABLE   Roughly speaking, the neighbor is known to have been
>> |                  reachable recently (within tens of seconds ago).
>> :
>> |      STALE       The neighbor is no longer known to be reachable but
>> |                  until traffic is sent to the neighbor, no attempt
>> |                  should be made to verify its reachability.
>> |      DELAY       The neighbor is no longer known to be reachable, and
>> |                  traffic has recently been sent to the neighbor.
>> |                  Rather than probe the neighbor immediately, however,
>> |                  delay sending probes for a short while in order to
>> |                  give upper layer protocols a chance to provide
>> |                  reachability confirmation.
>>
>>
> 
> It is all depending on the meaning of the word "recently".
> You imply, that if timeouts have been triggered, then it is no longer "recent",
> but that is not the only interpretation, it is up to the implementer to decide
> what is "recently".

That quoted text is just a "brief" description.  The document has detailed
state machine.


> Therefore, if a timeout occurs due to no traffic, they must be probed before
> they are garbage collected.

It is what we do in PROBE state.


> If this is not acceptable, how do you propose to solve the problem that you cannot
> make remote units inaccessible for more than a fraction of a second?

How many neighbors do you want to maintain?
I guess you have to increase the number of gc_thresh1.

-- 
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-20  2:33         ` YOSHIFUJI Hideaki
@ 2015-04-20 12:48           ` Ulf Samuelsson
  2015-04-21  3:58             ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-20 12:48 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, netdev; +Cc: netdev


On 04/20/2015 04:33 AM, YOSHIFUJI Hideaki wrote:
> Hi,
>
> Ulf Samuelsson wrote:
>>>   From RFC2461:
>>>
>>> |      REACHABLE   Roughly speaking, the neighbor is known to have been
>>> |                  reachable recently (within tens of seconds ago).
>>> :
>>> |      STALE       The neighbor is no longer known to be reachable but
>>> |                  until traffic is sent to the neighbor, no attempt
>>> |                  should be made to verify its reachability.
>>> |      DELAY       The neighbor is no longer known to be reachable, and
>>> |                  traffic has recently been sent to the neighbor.
>>> |                  Rather than probe the neighbor immediately, however,
>>> |                  delay sending probes for a short while in order to
>>> |                  give upper layer protocols a chance to provide
>>> |                  reachability confirmation.
>>>
>>>
>> It is all depending on the meaning of the word "recently".
>> You imply, that if timeouts have been triggered, then it is no longer "recent",
>> but that is not the only interpretation, it is up to the implementer to decide
>> what is "recently".
> That quoted text is just a "brief" description.  The document has detailed
> state machine.
>

It is not *mandatory* to follow the state machine strictly, Page 85:

    "This appendix contains a summary of the rules specified in Sections
    7.2 and 7.3.  This document does not mandate that implementations
    adhere to this model as long as their external behavior is consistent
    with that described in this document."

The kernel does not follow the state machine today.
The kernel already have a test which compares

     "neigh->used" + timeout with current time,
     and move the entry to DELAY.

This is not documented in the state machine so there is already
a precedent to compare

     "neigh->compared" + timeout with current time
     and move the entry into DELAY state.

Obviously, some people would not want you to send probes before going STALE,
so it needs to be configurable.
>> Therefore, if a timeout occurs due to no traffic, they must be probed before
>> they are garbage collected.
> It is what we do in PROBE state.
Yes, but you have to start by moving it into DELAY state first, to init 
the probe counter.
If you move the entry from REACHABLE to DELAY, then the probe counter
may be any value.

>
>> If this is not acceptable, how do you propose to solve the problem that you cannot
>> make remote units inaccessible for more than a fraction of a second?
> How many neighbors do you want to maintain?
> I guess you have to increase the number of gc_thresh1.
The current use cases have up to 2048 entries.
This is expected to grow in the future.
The 3.4 kernel used in the system today is limited to 1024,
but that has been raised to about 10k.

The gc_thresh1 test is not implemented in 3.4 but can be backported,
but still not convinced it is a good idea.

To complicate things, one requirement is that for some interfaces
you always want to keep things alive, if connected, but
for other interfaces you want things to be removed
to conserve memory.
Actually you would want to do this selection on a subnet level.

Internal discussions resulted in a proposal to change the patch,
so that you have a "keepalive" flag which is tested after
it has been decided to exit the REACHABLE state.

if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.


Best Regards,
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-20 12:48           ` Ulf Samuelsson
@ 2015-04-21  3:58             ` YOSHIFUJI Hideaki
  2015-04-22  7:42               ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-04-21  3:58 UTC (permalink / raw)
  To: Ulf Samuelsson, netdev; +Cc: hideaki.yoshifuji, netdev

Ulf Samuelsson wrote:
>> How many neighbors do you want to maintain?
>> I guess you have to increase the number of gc_thresh1.
> The current use cases have up to 2048 entries.
> This is expected to grow in the future.
> The 3.4 kernel used in the system today is limited to 1024,
> but that has been raised to about 10k.
> 
> The gc_thresh1 test is not implemented in 3.4 but can be backported,
> but still not convinced it is a good idea.

Why?

> To complicate things, one requirement is that for some interfaces
> you always want to keep things alive, if connected, but
> for other interfaces you want things to be removed
> to conserve memory.
> Actually you would want to do this selection on a subnet level.

If you want to introduce per-interface parameter, I am okay with it.

> 
> Internal discussions resulted in a proposal to change the patch,
> so that you have a "keepalive" flag which is tested after
> it has been decided to exit the REACHABLE state.
> 
> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.

No.

-- 
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-21  3:58             ` YOSHIFUJI Hideaki
@ 2015-04-22  7:42               ` Ulf Samuelsson
  2015-04-22 10:46                 ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-22  7:42 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, netdev; +Cc: netdev


On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
> Ulf Samuelsson wrote:
>>> How many neighbors do you want to maintain?
>>> I guess you have to increase the number of gc_thresh1.
>> The current use cases have up to 2048 entries.
>> This is expected to grow in the future.
>> The 3.4 kernel used in the system today is limited to 1024,
>> but that has been raised to about 10k.
>>
>> The gc_thresh1 test is not implemented in 3.4 but can be backported,
>> but still not convinced it is a good idea.
> Why?
>
A good solution makes sure that:
* equipment which is connected NEVER IS garbage collected
* equipment which is disconnected IS garbage collected.

The threshold idea does not meet the criteria for a good solution.

With this solution you keep unnecessary entries in the table.
If you ever pass the limit, then equipment which should not
be garbage collected may be.
It relies on someone keeping track of traffic loss,
so needs more maintenance by the SysOp.

The ARP probes should be considered to be NECESSARY traffic
to maintain a quality link.
Obviously not everyone would want to make this trade-off.


>> To complicate things, one requirement is that for some interfaces
>> you always want to keep things alive, if connected, but
>> for other interfaces you want things to be removed
>> to conserve memory.
>> Actually you would want to do this selection on a subnet level.
> If you want to introduce per-interface parameter, I am okay with it.
>
>> Internal discussions resulted in a proposal to change the patch,
>> so that you have a "keepalive" flag which is tested after
>> it has been decided to exit the REACHABLE state.
>>
>> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.
> No.
>
And why is it a bad idea to have a high quality connection?

Best Regards,
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-22  7:42               ` Ulf Samuelsson
@ 2015-04-22 10:46                 ` YOSHIFUJI Hideaki
  2015-04-22 11:49                   ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-04-22 10:46 UTC (permalink / raw)
  To: Ulf Samuelsson, netdev; +Cc: hideaki.yoshifuji, netdev

Ulf Samuelsson wrote:
> 
> On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
>> Ulf Samuelsson wrote:
>>>> How many neighbors do you want to maintain?
>>>> I guess you have to increase the number of gc_thresh1.
>>> The current use cases have up to 2048 entries.
>>> This is expected to grow in the future.
>>> The 3.4 kernel used in the system today is limited to 1024,
>>> but that has been raised to about 10k.
>>>
>>> The gc_thresh1 test is not implemented in 3.4 but can be backported,
>>> but still not convinced it is a good idea.
>> Why?
>>
> A good solution makes sure that:
> * equipment which is connected NEVER IS garbage collected
> * equipment which is disconnected IS garbage collected.
> 
> The threshold idea does not meet the criteria for a good solution.

We try providing "good solution" if you have less than gc_thresh1
entries only.  Otherwise, we try hard to protect ourselves.


> With this solution you keep unnecessary entries in the table.
> If you ever pass the limit, then equipment which should not
> be garbage collected may be.
> It relies on someone keeping track of traffic loss,
> so needs more maintenance by the SysOp.try pr
> 
> The ARP probes should be considered to be NECESSARY traffic
> to maintain a quality link.
> Obviously not everyone would want to make this trade-off.
> 
> 
>>> To complicate things, one requirement is that for some interfaces
>>> you always want to keep things alive, if connected, but
>>> for other interfaces you want things to be removed
>>> to conserve memory.
>>> Actually you would want to do this selection on a subnet level.
>> If you want to introduce per-interface parameter, I am okay with it.
>>
>>> Internal discussions resulted in a proposal to change the patch,
>>> so that you have a "keepalive" flag which is tested after
>>> it has been decided to exit the REACHABLE state.
>>>
>>> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.
>> No.
>>
> And why is it a bad idea to have a high quality connection?

We reclaim neighbor entries as much as possible to protect
ourselves if the number is below gc_thresh1.  We could stop
purging entries, but the idea was rejected AFAIK.  That is
our design.

Again, you should increase gc_thresh1, first.

-- 
Hideaki Yoshifuji <hideaki.yoshifuji@miraclelinux.com>
Technical Division, MIRACLE LINUX CORPORATION

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-22 10:46                 ` YOSHIFUJI Hideaki
@ 2015-04-22 11:49                   ` Ulf Samuelsson
  2015-05-08  9:39                     ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-04-22 11:49 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, netdev; +Cc: netdev


On 04/22/2015 12:46 PM, YOSHIFUJI Hideaki wrote:
> Ulf Samuelsson wrote:
>> On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
>>> Ulf Samuelsson wrote:
>>>>> How many neighbors do you want to maintain?
>>>>> I guess you have to increase the number of gc_thresh1.
>>>> The current use cases have up to 2048 entries.
>>>> This is expected to grow in the future.
>>>> The 3.4 kernel used in the system today is limited to 1024,
>>>> but that has been raised to about 10k.
>>>>
>>>> The gc_thresh1 test is not implemented in 3.4 but can be backported,
>>>> but still not convinced it is a good idea.
>>> Why?
>>>
>> A good solution makes sure that:
>> * equipment which is connected NEVER IS garbage collected
>> * equipment which is disconnected IS garbage collected.
>>
>> The threshold idea does not meet the criteria for a good solution.
> We try providing "good solution" if you have less than gc_thresh1
> entries only.  Otherwise, we try hard to protect ourselves.

Protect against what?


>
>> With this solution you keep unnecessary entries in the table.
>> If you ever pass the limit, then equipment which should not
>> be garbage collected may be.
>> It relies on someone keeping track of traffic loss,
>> so needs more maintenance by the SysOp.try pr
>>
>> The ARP probes should be considered to be NECESSARY traffic
>> to maintain a quality link.
>> Obviously not everyone would want to make this trade-off.
>>
>>
>>>> To complicate things, one requirement is that for some interfaces
>>>> you always want to keep things alive, if connected, but
>>>> for other interfaces you want things to be removed
>>>> to conserve memory.
>>>> Actually you would want to do this selection on a subnet level.
>>> If you want to introduce per-interface parameter, I am okay with it.
>>>
>>>> Internal discussions resulted in a proposal to change the patch,
>>>> so that you have a "keepalive" flag which is tested after
>>>> it has been decided to exit the REACHABLE state.
>>>>
>>>> if the "keepalive" flag is set, you always go to DELAY state from REACHABLE.
>>> No.
>>>
>> And why is it a bad idea to have a high quality connection?
> We reclaim neighbor entries as much as possible to protect
> ourselves if the number is below gc_thresh1.  We could stop
> purging entries, but the idea was rejected AFAIK.  That is
> our design.
No you keep disconnected entries which can be removed
in the table as long as you are below gc_tresh1, so you waste memory.

This is a trade-off to reduce communication over the wire
at the expense of quality of service.

If you can live with the drawback of extra bandwidth usage,
then doing probes before you remove an entry will reduce
memory usage and improve the quality of the communication.

Why do you want to decide which trade-off is best?
Why not let the system designer make that decision?

I am not asking to stop purging entries.
Entries which are not needed *should* be purged.
But entries which are needed should not be purged,
because that means traffic loss.
This design does not differentiate between the two
providing unnecessary risks and wastes memory.

We know beforehand that if a neighbour is connected
to a certain interface, it should always be present in the table.
You do not want us to use that knowledge when configuring the kernel.

I think that the idea that was rejected must have been something different.
>
> Again, you should increase gc_thresh1, first.
>
To what number?
When equipment is delivered from a factory, it is not known
how many entries are needed, and if it becomes known, it can later grow.
Why add extra maintenance to the system?


Best Regards,
Ulf Samuelsson

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-04-22 11:49                   ` Ulf Samuelsson
@ 2015-05-08  9:39                     ` Ulf Samuelsson
  0 siblings, 0 replies; 22+ messages in thread
From: Ulf Samuelsson @ 2015-05-08  9:39 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, netdev; +Cc: netdev

In order to meet our goal, not to garbage collect neighbour entries 
which are still online,
even if they have not communicated in a while,
we have tested what happens if the "timer_handler" always changes
to DELAY state when the REACHABLE timeout has expired.

While this change is not acceptable to you guys, it seems mostly to work.

We have seen one problem which might affect an unpatched kernel.

The current kernel can move into DELAY state inside the timer_handler
or when executing "__neigh_event_send" while in STALE state,

If you are in DELAY or PROBE state, and receive an ARP request,
"arp_process" may call "neigh_event_ns" which calls "neigh_update" 
requesting STALE state.

All the tests to change to something else fails, so the neighbour state 
becomes STALE.

Is this intentional, or a bug?

Best Regards,
Ulf Samuelsson

On 04/22/2015 01:49 PM, Ulf Samuelsson wrote:
>
> On 04/22/2015 12:46 PM, YOSHIFUJI Hideaki wrote:
>> Ulf Samuelsson wrote:
>>> On 04/21/2015 05:58 AM, YOSHIFUJI Hideaki wrote:
>>>> Ulf Samuelsson wrote:
>>>>>> How many neighbors do you want to maintain?
>>>>>> I guess you have to increase the number of gc_thresh1.
>>>>> The current use cases have up to 2048 entries.
>>>>> This is expected to grow in the future.
>>>>> The 3.4 kernel used in the system today is limited to 1024,
>>>>> but that has been raised to about 10k.
>>>>>
>>>>> The gc_thresh1 test is not implemented in 3.4 but can be backported,
>>>>> but still not convinced it is a good idea.
>>>> Why?
>>>>
>>> A good solution makes sure that:
>>> * equipment which is connected NEVER IS garbage collected
>>> * equipment which is disconnected IS garbage collected.
>>>
>>> The threshold idea does not meet the criteria for a good solution.
>> We try providing "good solution" if you have less than gc_thresh1
>> entries only.  Otherwise, we try hard to protect ourselves.
>
> Protect against what?
>
>
>>
>>> With this solution you keep unnecessary entries in the table.
>>> If you ever pass the limit, then equipment which should not
>>> be garbage collected may be.
>>> It relies on someone keeping track of traffic loss,
>>> so needs more maintenance by the SysOp.try pr
>>>
>>> The ARP probes should be considered to be NECESSARY traffic
>>> to maintain a quality link.
>>> Obviously not everyone would want to make this trade-off.
>>>
>>>
>>>>> To complicate things, one requirement is that for some interfaces
>>>>> you always want to keep things alive, if connected, but
>>>>> for other interfaces you want things to be removed
>>>>> to conserve memory.
>>>>> Actually you would want to do this selection on a subnet level.
>>>> If you want to introduce per-interface parameter, I am okay with it.
>>>>
>>>>> Internal discussions resulted in a proposal to change the patch,
>>>>> so that you have a "keepalive" flag which is tested after
>>>>> it has been decided to exit the REACHABLE state.
>>>>>
>>>>> if the "keepalive" flag is set, you always go to DELAY state from 
>>>>> REACHABLE.
>>>> No.
>>>>
>>> And why is it a bad idea to have a high quality connection?
>> We reclaim neighbor entries as much as possible to protect
>> ourselves if the number is below gc_thresh1.  We could stop
>> purging entries, but the idea was rejected AFAIK.  That is
>> our design.
> No you keep disconnected entries which can be removed
> in the table as long as you are below gc_tresh1, so you waste memory.
>
> This is a trade-off to reduce communication over the wire
> at the expense of quality of service.
>
> If you can live with the drawback of extra bandwidth usage,
> then doing probes before you remove an entry will reduce
> memory usage and improve the quality of the communication.
>
> Why do you want to decide which trade-off is best?
> Why not let the system designer make that decision?
>
> I am not asking to stop purging entries.
> Entries which are not needed *should* be purged.
> But entries which are needed should not be purged,
> because that means traffic loss.
> This design does not differentiate between the two
> providing unnecessary risks and wastes memory.
>
> We know beforehand that if a neighbour is connected
> to a certain interface, it should always be present in the table.
> You do not want us to use that knowledge when configuring the kernel.
>
> I think that the idea that was rejected must have been something 
> different.
>>
>> Again, you should increase gc_thresh1, first.
>>
> To what number?
> When equipment is delivered from a factory, it is not known
> how many entries are needed, and if it becomes known, it can later grow.
> Why add extra maintenance to the system?
>
>
> Best Regards,
> Ulf Samuelsson
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-17 12:31         ` YOSHIFUJI Hideaki
@ 2015-03-17 23:27           ` Ulf Samuelsson
  0 siblings, 0 replies; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-17 23:27 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, ulf, netdev; +Cc: YOSHIFUJI Hideaki

Den 2015-03-17 13:31, YOSHIFUJI Hideaki skrev:
> Hello,
>
> Ulf Samuelsson wrote:
>> Den 2015-03-16 05:57, YOSHIFUJI Hideaki/吉藤英明 skrev:
>>> Hello.
>>>
>>> Ulf Samuelsson wrote:
>>>> Den 2015-03-15 09:27, YOSHIFUJI Hideaki skrev:
>>>>> Hello.
>>>>>
>>>>> Ulf Samuelsson wrote:
>>>>>> From: Ulf Samuelsson <ulf@emagii.com>
>>>>>>
>>>>>> The neighbour state is changed in the ARP timer handler.
>>>>>> If the state is changed to NUD_STALE, then the neighbour
>>>>>> entry becomes a candidate for garbage collection.
>>>>>>
>>>>>> The garbage collection is handled by a "periodic work" routine.
>>>>>>
>>>>>> When :
>>>>>>
>>>>>>     * noone refers to the entry
>>>>>>     * the state is no longer valid (I.E: NUD_STALE).
>>>>> NUD_STALE is still valid.
>>>> Yes, my fault.
>>>> The condition which causes garbage collection to be skipped is.
>>>>
>>>>
>>>>      if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>>>>
>>>>      NUD_STALE is not part of that, so GC will not be skipped,
>>>>      and therefore the patch is needed if you want to be able
>>>>      to use the API to modify the neigh statemachine..
>>>>>
>>>>>>     * the timeout value has  been reached or state is FAILED
>>>>>>
>>>>>> the "periodic work" routine will notify
>>>>>> the stack that the entry should be deleted.
>>>>>>
>>>>>> A user application monitoring and controlling the neighbour table
>>>>>> using NETLINK may fail, if the "period work" routine is run
>>>>>> directly after the state has been changed to NUD_STALE,
>>>>>> but before the user application has had a chance to change
>>>>>> the state to something valid.
>>>>>>
>>>>>> The "period work" routine will detect the NUD_STALE state
>>>>>> and if the timeout value has been reached, it will notify the stack
>>>>>> that the entry should be deleted.
>>>>>>
>>>>>> The patch adds a check in the periodic work routine
>>>>>> which will skip test for garbage collection
>>>>>> unless a number of ticks has passed since the last time
>>>>>> the neighbour entry state was changed.
>>>>>>
>>>>>> The feature is controlled through Kconfig
>>>>>>
>>>>>> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
>>>>>> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
>>>>>> Default time is 100 ms if HZ_### is set.
>>>>> We have "lower limit" not to start releasing neighbour entries.
>>>>> Try increasing gc_thresh1.
>>>> Why would  that work?
>>>>
>>>> The only place where this is used is
>>>>
>>>>      "if (atomic_read(&tbl->entries) < tbl->gc_thresh1)"
>>>>
>>>> tbl->entries is related to how many entries there are in the 
>>>> neighbour table.
>>>>
>>>> The only way I think this would work, is if this is raised so high 
>>>> that
>>>> garbage collection does not occur.
>>>>
>>>> That is not the intention.
>>>>
>>>> It does not solve the race condition between the timer_handler and 
>>>> the periodic_work.
>>>
>>> I don't think it is a race.
>> And I think you are wrong and my logging shows that it is.
>
> We do not gurantee holding "stale" entries more than
> gc_thresh1, so it shall not be called as a race at all.


And that is a problem, which results in traffic loss.

>
>>
>>>
>>> You can try increasing gc_staletime to hold each entry based
>>> on last usage.  Plus, you can "confirm" neighbors by
>>> MSG_CONFIRM.
>>>
>>> Note that if the number of entries becomes high, "forced GC" will
>>> drop valid, "not connected" entries as well.
>>
>> I can try increasing gc_staletime, but its a waste of time, because 
>> it is not the last usage which is interesting.
>> What is interesting is the time when the entry state was updated by 
>> the timer handler.
>>
>> Pls explain further MSG_CONFIRM.
>
> Typically, base reachable time is set to 30sec and gc_staletime is
> set to 60sec.  So, entries are expected to be held for a while
> after it has become "stale", no?
>
> MSG_CONFIRM is a sendmsg() flag.  It allows user-space application
> to confirm reachability of neighbor.  It refreshes "confirmed"
> time. In neigh_periodic_work(), "used" time is updated to
> "confirmed" time if "used" time is before "confirmed" time.
>
May work, if the application was totally redesigned,

> ping(8), ping6(8), tftpd(8) use that flag, for example.
>
>
>> The problem occurs when the periodic_work routine runs immediately
>> after the timer handler has changes the state to NUD_STALE and
>> the entry has reached the expiry time.
>
> It is what we expect.
>
> In neigh_periodic_work(), you may try to release non-STALE
> entries first, and then STALE entries if the number of entries is
> still high.
That sounds much more complex than the proposed fix.

>
> --yoshfuji

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-16 19:55       ` Ulf Samuelsson
@ 2015-03-17 12:31         ` YOSHIFUJI Hideaki
  2015-03-17 23:27           ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-03-17 12:31 UTC (permalink / raw)
  To: ulf, Ulf Samuelsson, netdev; +Cc: hideaki.yoshifuji, YOSHIFUJI Hideaki

Hello,

Ulf Samuelsson wrote:
> Den 2015-03-16 05:57, YOSHIFUJI Hideaki/吉藤英明 skrev:
>> Hello.
>>
>> Ulf Samuelsson wrote:
>>> Den 2015-03-15 09:27, YOSHIFUJI Hideaki skrev:
>>>> Hello.
>>>>
>>>> Ulf Samuelsson wrote:
>>>>> From: Ulf Samuelsson <ulf@emagii.com>
>>>>>
>>>>> The neighbour state is changed in the ARP timer handler.
>>>>> If the state is changed to NUD_STALE, then the neighbour
>>>>> entry becomes a candidate for garbage collection.
>>>>>
>>>>> The garbage collection is handled by a "periodic work" routine.
>>>>>
>>>>> When :
>>>>>
>>>>>     * noone refers to the entry
>>>>>     * the state is no longer valid (I.E: NUD_STALE).
>>>> NUD_STALE is still valid.
>>> Yes, my fault.
>>> The condition which causes garbage collection to be skipped is.
>>>
>>>
>>>      if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>>>
>>>      NUD_STALE is not part of that, so GC will not be skipped,
>>>      and therefore the patch is needed if you want to be able
>>>      to use the API to modify the neigh statemachine..
>>>>
>>>>>     * the timeout value has  been reached or state is FAILED
>>>>>
>>>>> the "periodic work" routine will notify
>>>>> the stack that the entry should be deleted.
>>>>>
>>>>> A user application monitoring and controlling the neighbour table
>>>>> using NETLINK may fail, if the "period work" routine is run
>>>>> directly after the state has been changed to NUD_STALE,
>>>>> but before the user application has had a chance to change
>>>>> the state to something valid.
>>>>>
>>>>> The "period work" routine will detect the NUD_STALE state
>>>>> and if the timeout value has been reached, it will notify the stack
>>>>> that the entry should be deleted.
>>>>>
>>>>> The patch adds a check in the periodic work routine
>>>>> which will skip test for garbage collection
>>>>> unless a number of ticks has passed since the last time
>>>>> the neighbour entry state was changed.
>>>>>
>>>>> The feature is controlled through Kconfig
>>>>>
>>>>> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
>>>>> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
>>>>> Default time is 100 ms if HZ_### is set.
>>>> We have "lower limit" not to start releasing neighbour entries.
>>>> Try increasing gc_thresh1.
>>> Why would  that work?
>>>
>>> The only place where this is used is
>>>
>>>      "if (atomic_read(&tbl->entries) < tbl->gc_thresh1)"
>>>
>>> tbl->entries is related to how many entries there are in the neighbour table.
>>>
>>> The only way I think this would work, is if this is raised so high that
>>> garbage collection does not occur.
>>>
>>> That is not the intention.
>>>
>>> It does not solve the race condition between the timer_handler and the periodic_work.
>>
>> I don't think it is a race.
> And I think you are wrong and my logging shows that it is.

We do not gurantee holding "stale" entries more than
gc_thresh1, so it shall not be called as a race at all.

>
>>
>> You can try increasing gc_staletime to hold each entry based
>> on last usage.  Plus, you can "confirm" neighbors by
>> MSG_CONFIRM.
>>
>> Note that if the number of entries becomes high, "forced GC" will
>> drop valid, "not connected" entries as well.
>
> I can try increasing gc_staletime, but its a waste of time, because it is not the last usage which is interesting.
> What is interesting is the time when the entry state was updated by the timer handler.
>
> Pls explain further MSG_CONFIRM.

Typically, base reachable time is set to 30sec and gc_staletime is
set to 60sec.  So, entries are expected to be held for a while
after it has become "stale", no?

MSG_CONFIRM is a sendmsg() flag.  It allows user-space application
to confirm reachability of neighbor.  It refreshes "confirmed"
time. In neigh_periodic_work(), "used" time is updated to
"confirmed" time if "used" time is before "confirmed" time.

ping(8), ping6(8), tftpd(8) use that flag, for example.


> The problem occurs when the periodic_work routine runs immediately
> after the timer handler has changes the state to NUD_STALE and
> the entry has reached the expiry time.

It is what we expect.

In neigh_periodic_work(), you may try to release non-STALE
entries first, and then STALE entries if the number of entries is
still high.

--yoshfuji

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-16  4:57     ` YOSHIFUJI Hideaki/吉藤英明
@ 2015-03-16 19:55       ` Ulf Samuelsson
  2015-03-17 12:31         ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-16 19:55 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki/吉藤英明,
	Ulf Samuelsson, netdev
  Cc: YOSHIFUJI Hideaki

Den 2015-03-16 05:57, YOSHIFUJI Hideaki/吉藤英明 skrev:
> Hello.
>
> Ulf Samuelsson wrote:
>> Den 2015-03-15 09:27, YOSHIFUJI Hideaki skrev:
>>> Hello.
>>>
>>> Ulf Samuelsson wrote:
>>>> From: Ulf Samuelsson <ulf@emagii.com>
>>>>
>>>> The neighbour state is changed in the ARP timer handler.
>>>> If the state is changed to NUD_STALE, then the neighbour
>>>> entry becomes a candidate for garbage collection.
>>>>
>>>> The garbage collection is handled by a "periodic work" routine.
>>>>
>>>> When :
>>>>
>>>>     * noone refers to the entry
>>>>     * the state is no longer valid (I.E: NUD_STALE).
>>> NUD_STALE is still valid.
>> Yes, my fault.
>> The condition which causes garbage collection to be skipped is.
>>
>>
>>      if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>>
>>      NUD_STALE is not part of that, so GC will not be skipped,
>>      and therefore the patch is needed if you want to be able
>>      to use the API to modify the neigh statemachine..
>>>
>>>>     * the timeout value has  been reached or state is FAILED
>>>>
>>>> the "periodic work" routine will notify
>>>> the stack that the entry should be deleted.
>>>>
>>>> A user application monitoring and controlling the neighbour table
>>>> using NETLINK may fail, if the "period work" routine is run
>>>> directly after the state has been changed to NUD_STALE,
>>>> but before the user application has had a chance to change
>>>> the state to something valid.
>>>>
>>>> The "period work" routine will detect the NUD_STALE state
>>>> and if the timeout value has been reached, it will notify the stack
>>>> that the entry should be deleted.
>>>>
>>>> The patch adds a check in the periodic work routine
>>>> which will skip test for garbage collection
>>>> unless a number of ticks has passed since the last time
>>>> the neighbour entry state was changed.
>>>>
>>>> The feature is controlled through Kconfig
>>>>
>>>> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
>>>> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
>>>> Default time is 100 ms if HZ_### is set.
>>> We have "lower limit" not to start releasing neighbour entries.
>>> Try increasing gc_thresh1.
>> Why would  that work?
>>
>> The only place where this is used is
>>
>>      "if (atomic_read(&tbl->entries) < tbl->gc_thresh1)"
>>
>> tbl->entries is related to how many entries there are in the 
>> neighbour table.
>>
>> The only way I think this would work, is if this is raised so high that
>> garbage collection does not occur.
>>
>> That is not the intention.
>>
>> It does not solve the race condition between the timer_handler and 
>> the periodic_work.
>
> I don't think it is a race.
And I think you are wrong and my logging shows that it is.

>
> You can try increasing gc_staletime to hold each entry based
> on last usage.  Plus, you can "confirm" neighbors by
> MSG_CONFIRM.
>
> Note that if the number of entries becomes high, "forced GC" will
> drop valid, "not connected" entries as well.

I can try increasing gc_staletime, but its a waste of time, because it 
is not the last usage which is interesting.
What is interesting is the time when the entry state was updated by the 
timer handler.

Pls explain further MSG_CONFIRM.

The problem occurs when the periodic_work routine runs immediately
after the timer handler has changes the state to NUD_STALE and
the entry has reached the expiry time.

I suggest you read the code, and you will understand what I mean.



Ulf

>
> --yoshfuji
>
>>
>> BR
>> Ulf Samuelsson
>>
>>>
>>> --yoshfuji
>>>
>>>> Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
>>>> ---
>>>>   net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
>>>>   net/core/neighbour.c |   15 ++++++++++++---
>>>>   2 files changed, 44 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/Kconfig b/net/Kconfig
>>>> index 44dd578..099a5dd 100644
>>>> --- a/net/Kconfig
>>>> +++ b/net/Kconfig
>>>> @@ -77,6 +77,38 @@ config INET
>>>>         Short answer: say Y.
>>>>   if INET
>>>> +
>>>> +#
>>>> +# Core Network configuration
>>>> +#
>>>> +
>>>> +config ARP_GC_APPLY_GUARDBAND
>>>> +    bool "IP: ARP: Avoid garbage collection directly after state 
>>>> change"
>>>> +    default n
>>>> +    ---help---
>>>> +      With this item selected, an entry in the neighbour table
>>>> +      will not be garbage collected directly after the ARP state
>>>> +      has changed to STALE of FAILED
>>>> +      This allows an application program change the state to 
>>>> something valid
>>>> +      before garbage colllection occurs.
>>>> +
>>>> +      If unsure, say N.
>>>> +
>>>> +config ARP_GC_GUARDBAND
>>>> +    int "Guardband time on garbage collection"
>>>> +    depends on ARP_GC_APPLY_GUARDBAND
>>>> +    default 10 if HZ_100
>>>> +    default 25 if HZ_250
>>>> +    default 30 if HZ_300
>>>> +    default 100 if HZ_1000
>>>> +    default 100
>>>> +
>>>> +    ---help---
>>>> +      The number of ticks to delay garbage collection
>>>> +      after the neighbour entry has been updated
>>>> +      A delay of 100 ms is reasonable.
>>>> +      With CONFIG_HZ = 250, this value should be 25
>>>> +
>>>>   source "net/ipv4/Kconfig"
>>>>   source "net/ipv6/Kconfig"
>>>>   source "net/netlabel/Kconfig"
>>>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>>>> index 70fe9e1..194195d 100644
>>>> --- a/net/core/neighbour.c
>>>> +++ b/net/core/neighbour.c
>>>> @@ -786,13 +786,23 @@ static void neigh_periodic_work(struct 
>>>> work_struct *work)
>>>>               state = n->nud_state;
>>>>               if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>>>> -                write_unlock(&n->lock);
>>>>                   goto next_elt;
>>>>               }
>>>>               if (time_before(n->used, n->confirmed))
>>>>                   n->used = n->confirmed;
>>>> +#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
>>>> +            /* Do not garbage collect directly after we
>>>> +             * updated n->state to allow applications to
>>>> +             * react to the event
>>>> +             */
>>>> +            if (time_before(jiffies,
>>>> +                    n->updated + CONFIG_ARP_GC_GUARDBAND)) {
>>>> +                goto next_elt;
>>>> +            }
>>>> +#endif
>>>> +
>>>>               if (atomic_read(&n->refcnt) == 1 &&
>>>>                   (state == NUD_FAILED ||
>>>>                    time_after(jiffies, n->used + 
>>>> NEIGH_VAR(n->parms, GC_STALETIME)))) {
>>>> @@ -802,9 +812,8 @@ static void neigh_periodic_work(struct 
>>>> work_struct *work)
>>>>                   neigh_cleanup_and_release(n);
>>>>                   continue;
>>>>               }
>>>> -            write_unlock(&n->lock);
>>>> -
>>>>   next_elt:
>>>> +            write_unlock(&n->lock);
>>>>               np = &n->next;
>>>>           }
>>>>           /*
>>>>
>>
>>
>


-- 
Best Regards
Ulf Samuelsson
ulf@emagii.com
+46 722 427437

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-15 19:34   ` Ulf Samuelsson
@ 2015-03-16  4:57     ` YOSHIFUJI Hideaki/吉藤英明
  2015-03-16 19:55       ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki/吉藤英明 @ 2015-03-16  4:57 UTC (permalink / raw)
  To: ulf, Ulf Samuelsson, netdev; +Cc: YOSHIFUJI Hideaki, hideaki.yoshifuji

Hello.

Ulf Samuelsson wrote:
> Den 2015-03-15 09:27, YOSHIFUJI Hideaki skrev:
>> Hello.
>>
>> Ulf Samuelsson wrote:
>>> From: Ulf Samuelsson <ulf@emagii.com>
>>>
>>> The neighbour state is changed in the ARP timer handler.
>>> If the state is changed to NUD_STALE, then the neighbour
>>> entry becomes a candidate for garbage collection.
>>>
>>> The garbage collection is handled by a "periodic work" routine.
>>>
>>> When :
>>>
>>>     * noone refers to the entry
>>>     * the state is no longer valid (I.E: NUD_STALE).
>> NUD_STALE is still valid.
> Yes, my fault.
> The condition which causes garbage collection to be skipped is.
>
>
>      if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>
>      NUD_STALE is not part of that, so GC will not be skipped,
>      and therefore the patch is needed if you want to be able
>      to use the API to modify the neigh statemachine..
>>
>>>     * the timeout value has  been reached or state is FAILED
>>>
>>> the "periodic work" routine will notify
>>> the stack that the entry should be deleted.
>>>
>>> A user application monitoring and controlling the neighbour table
>>> using NETLINK may fail, if the "period work" routine is run
>>> directly after the state has been changed to NUD_STALE,
>>> but before the user application has had a chance to change
>>> the state to something valid.
>>>
>>> The "period work" routine will detect the NUD_STALE state
>>> and if the timeout value has been reached, it will notify the stack
>>> that the entry should be deleted.
>>>
>>> The patch adds a check in the periodic work routine
>>> which will skip test for garbage collection
>>> unless a number of ticks has passed since the last time
>>> the neighbour entry state was changed.
>>>
>>> The feature is controlled through Kconfig
>>>
>>> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
>>> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
>>> Default time is 100 ms if HZ_### is set.
>> We have "lower limit" not to start releasing neighbour entries.
>> Try increasing gc_thresh1.
> Why would  that work?
>
> The only place where this is used is
>
>      "if (atomic_read(&tbl->entries) < tbl->gc_thresh1)"
>
> tbl->entries is related to how many entries there are in the neighbour table.
>
> The only way I think this would work, is if this is raised so high that
> garbage collection does not occur.
>
> That is not the intention.
>
> It does not solve the race condition between the timer_handler and the periodic_work.

I don't think it is a race.

You can try increasing gc_staletime to hold each entry based
on last usage.  Plus, you can "confirm" neighbors by
MSG_CONFIRM.

Note that if the number of entries becomes high, "forced GC" will
drop valid, "not connected" entries as well.

--yoshfuji

>
> BR
> Ulf Samuelsson
>
>>
>> --yoshfuji
>>
>>> Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
>>> ---
>>>   net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
>>>   net/core/neighbour.c |   15 ++++++++++++---
>>>   2 files changed, 44 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/Kconfig b/net/Kconfig
>>> index 44dd578..099a5dd 100644
>>> --- a/net/Kconfig
>>> +++ b/net/Kconfig
>>> @@ -77,6 +77,38 @@ config INET
>>>         Short answer: say Y.
>>>   if INET
>>> +
>>> +#
>>> +# Core Network configuration
>>> +#
>>> +
>>> +config ARP_GC_APPLY_GUARDBAND
>>> +    bool "IP: ARP: Avoid garbage collection directly after state change"
>>> +    default n
>>> +    ---help---
>>> +      With this item selected, an entry in the neighbour table
>>> +      will not be garbage collected directly after the ARP state
>>> +      has changed to STALE of FAILED
>>> +      This allows an application program change the state to something valid
>>> +      before garbage colllection occurs.
>>> +
>>> +      If unsure, say N.
>>> +
>>> +config ARP_GC_GUARDBAND
>>> +    int "Guardband time on garbage collection"
>>> +    depends on ARP_GC_APPLY_GUARDBAND
>>> +    default 10 if HZ_100
>>> +    default 25 if HZ_250
>>> +    default 30 if HZ_300
>>> +    default 100 if HZ_1000
>>> +    default 100
>>> +
>>> +    ---help---
>>> +      The number of ticks to delay garbage collection
>>> +      after the neighbour entry has been updated
>>> +      A delay of 100 ms is reasonable.
>>> +      With CONFIG_HZ = 250, this value should be 25
>>> +
>>>   source "net/ipv4/Kconfig"
>>>   source "net/ipv6/Kconfig"
>>>   source "net/netlabel/Kconfig"
>>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>>> index 70fe9e1..194195d 100644
>>> --- a/net/core/neighbour.c
>>> +++ b/net/core/neighbour.c
>>> @@ -786,13 +786,23 @@ static void neigh_periodic_work(struct work_struct *work)
>>>               state = n->nud_state;
>>>               if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>>> -                write_unlock(&n->lock);
>>>                   goto next_elt;
>>>               }
>>>               if (time_before(n->used, n->confirmed))
>>>                   n->used = n->confirmed;
>>> +#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
>>> +            /* Do not garbage collect directly after we
>>> +             * updated n->state to allow applications to
>>> +             * react to the event
>>> +             */
>>> +            if (time_before(jiffies,
>>> +                    n->updated + CONFIG_ARP_GC_GUARDBAND)) {
>>> +                goto next_elt;
>>> +            }
>>> +#endif
>>> +
>>>               if (atomic_read(&n->refcnt) == 1 &&
>>>                   (state == NUD_FAILED ||
>>>                    time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
>>> @@ -802,9 +812,8 @@ static void neigh_periodic_work(struct work_struct *work)
>>>                   neigh_cleanup_and_release(n);
>>>                   continue;
>>>               }
>>> -            write_unlock(&n->lock);
>>> -
>>>   next_elt:
>>> +            write_unlock(&n->lock);
>>>               np = &n->next;
>>>           }
>>>           /*
>>>
>
>

-- 
吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
ミラクル・リナックス株式会社 技術本部 サポート部

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-15  8:27 ` YOSHIFUJI Hideaki
@ 2015-03-15 19:34   ` Ulf Samuelsson
  2015-03-16  4:57     ` YOSHIFUJI Hideaki/吉藤英明
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-15 19:34 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki, Ulf Samuelsson, netdev

Den 2015-03-15 09:27, YOSHIFUJI Hideaki skrev:
> Hello.
>
> Ulf Samuelsson wrote:
>> From: Ulf Samuelsson <ulf@emagii.com>
>>
>> The neighbour state is changed in the ARP timer handler.
>> If the state is changed to NUD_STALE, then the neighbour
>> entry becomes a candidate for garbage collection.
>>
>> The garbage collection is handled by a "periodic work" routine.
>>
>> When :
>>
>> 	* noone refers to the entry
>> 	* the state is no longer valid (I.E: NUD_STALE).
> NUD_STALE is still valid.
Yes, my fault.
The condition which causes garbage collection to be skipped is.


     if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {

     NUD_STALE is not part of that, so GC will not be skipped,
     and therefore the patch is needed if you want to be able
     to use the API to modify the neigh statemachine..
>
>> 	* the timeout value has  been reached or state is FAILED
>>
>> the "periodic work" routine will notify
>> the stack that the entry should be deleted.
>>
>> A user application monitoring and controlling the neighbour table
>> using NETLINK may fail, if the "period work" routine is run
>> directly after the state has been changed to NUD_STALE,
>> but before the user application has had a chance to change
>> the state to something valid.
>>
>> The "period work" routine will detect the NUD_STALE state
>> and if the timeout value has been reached, it will notify the stack
>> that the entry should be deleted.
>>
>> The patch adds a check in the periodic work routine
>> which will skip test for garbage collection
>> unless a number of ticks has passed since the last time
>> the neighbour entry state was changed.
>>
>> The feature is controlled through Kconfig
>>
>> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
>> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
>> Default time is 100 ms if HZ_### is set.
> We have "lower limit" not to start releasing neighbour entries.
> Try increasing gc_thresh1.
Why would  that work?

The only place where this is used is

     "if (atomic_read(&tbl->entries) < tbl->gc_thresh1)"

tbl->entries is related to how many entries there are in the neighbour 
table.

The only way I think this would work, is if this is raised so high that
garbage collection does not occur.

That is not the intention.

It does not solve the race condition between the timer_handler and the 
periodic_work.

BR
Ulf Samuelsson

>
> --yoshfuji
>
>> Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
>> ---
>>   net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
>>   net/core/neighbour.c |   15 ++++++++++++---
>>   2 files changed, 44 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/Kconfig b/net/Kconfig
>> index 44dd578..099a5dd 100644
>> --- a/net/Kconfig
>> +++ b/net/Kconfig
>> @@ -77,6 +77,38 @@ config INET
>>   	  Short answer: say Y.
>>   
>>   if INET
>> +
>> +#
>> +# Core Network configuration
>> +#
>> +
>> +config ARP_GC_APPLY_GUARDBAND
>> +	bool "IP: ARP: Avoid garbage collection directly after state change"
>> +	default n
>> +	---help---
>> +	  With this item selected, an entry in the neighbour table
>> +	  will not be garbage collected directly after the ARP state
>> +	  has changed to STALE of FAILED
>> +	  This allows an application program change the state to something valid
>> +	  before garbage colllection occurs.
>> +
>> +	  If unsure, say N.
>> +
>> +config ARP_GC_GUARDBAND
>> +	int "Guardband time on garbage collection"
>> +	depends on ARP_GC_APPLY_GUARDBAND
>> +	default 10 if HZ_100
>> +	default 25 if HZ_250
>> +	default 30 if HZ_300
>> +	default 100 if HZ_1000
>> +	default 100
>> +
>> +	---help---
>> +	  The number of ticks to delay garbage collection
>> +	  after the neighbour entry has been updated
>> +	  A delay of 100 ms is reasonable.
>> +	  With CONFIG_HZ = 250, this value should be 25
>> +
>>   source "net/ipv4/Kconfig"
>>   source "net/ipv6/Kconfig"
>>   source "net/netlabel/Kconfig"
>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>> index 70fe9e1..194195d 100644
>> --- a/net/core/neighbour.c
>> +++ b/net/core/neighbour.c
>> @@ -786,13 +786,23 @@ static void neigh_periodic_work(struct work_struct *work)
>>   
>>   			state = n->nud_state;
>>   			if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
>> -				write_unlock(&n->lock);
>>   				goto next_elt;
>>   			}
>>   
>>   			if (time_before(n->used, n->confirmed))
>>   				n->used = n->confirmed;
>>   
>> +#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
>> +			/* Do not garbage collect directly after we
>> +			 * updated n->state to allow applications to
>> +			 * react to the event
>> +			 */
>> +			if (time_before(jiffies,
>> +					n->updated + CONFIG_ARP_GC_GUARDBAND)) {
>> +				goto next_elt;
>> +			}
>> +#endif
>> +
>>   			if (atomic_read(&n->refcnt) == 1 &&
>>   			    (state == NUD_FAILED ||
>>   			     time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
>> @@ -802,9 +812,8 @@ static void neigh_periodic_work(struct work_struct *work)
>>   				neigh_cleanup_and_release(n);
>>   				continue;
>>   			}
>> -			write_unlock(&n->lock);
>> -
>>   next_elt:
>> +			write_unlock(&n->lock);
>>   			np = &n->next;
>>   		}
>>   		/*
>>


-- 
Best Regards
Ulf Samuelsson
ulf@emagii.com
+46 722 427437

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] neighbour.c: Avoid GC directly after state change
  2015-03-11 20:28 Ulf Samuelsson
@ 2015-03-15  8:27 ` YOSHIFUJI Hideaki
  2015-03-15 19:34   ` Ulf Samuelsson
  0 siblings, 1 reply; 22+ messages in thread
From: YOSHIFUJI Hideaki @ 2015-03-15  8:27 UTC (permalink / raw)
  To: Ulf Samuelsson, netdev; +Cc: Ulf Samuelsson, YOSHIFUJI Hideaki

Hello.

Ulf Samuelsson wrote:
> From: Ulf Samuelsson <ulf@emagii.com>
> 
> The neighbour state is changed in the ARP timer handler.
> If the state is changed to NUD_STALE, then the neighbour
> entry becomes a candidate for garbage collection.
> 
> The garbage collection is handled by a "periodic work" routine.
> 
> When :
> 
> 	* noone refers to the entry
> 	* the state is no longer valid (I.E: NUD_STALE).

NUD_STALE is still valid.

> 	* the timeout value has  been reached or state is FAILED
> 
> the "periodic work" routine will notify
> the stack that the entry should be deleted.
> 
> A user application monitoring and controlling the neighbour table
> using NETLINK may fail, if the "period work" routine is run
> directly after the state has been changed to NUD_STALE,
> but before the user application has had a chance to change
> the state to something valid.
> 
> The "period work" routine will detect the NUD_STALE state
> and if the timeout value has been reached, it will notify the stack
> that the entry should be deleted.
> 
> The patch adds a check in the periodic work routine
> which will skip test for garbage collection
> unless a number of ticks has passed since the last time
> the neighbour entry state was changed.
> 
> The feature is controlled through Kconfig
> 
> The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
> The guardband time (in ticks) is set in ARP_GC_GUARDBAND
> Default time is 100 ms if HZ_### is set.

We have "lower limit" not to start releasing neighbour entries.
Try increasing gc_thresh1.

--yoshfuji

> 
> Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
> ---
>  net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
>  net/core/neighbour.c |   15 ++++++++++++---
>  2 files changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/net/Kconfig b/net/Kconfig
> index 44dd578..099a5dd 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -77,6 +77,38 @@ config INET
>  	  Short answer: say Y.
>  
>  if INET
> +
> +#
> +# Core Network configuration
> +#
> +
> +config ARP_GC_APPLY_GUARDBAND
> +	bool "IP: ARP: Avoid garbage collection directly after state change"
> +	default n
> +	---help---
> +	  With this item selected, an entry in the neighbour table
> +	  will not be garbage collected directly after the ARP state
> +	  has changed to STALE of FAILED
> +	  This allows an application program change the state to something valid
> +	  before garbage colllection occurs.
> +
> +	  If unsure, say N.
> +
> +config ARP_GC_GUARDBAND
> +	int "Guardband time on garbage collection"
> +	depends on ARP_GC_APPLY_GUARDBAND
> +	default 10 if HZ_100
> +	default 25 if HZ_250
> +	default 30 if HZ_300
> +	default 100 if HZ_1000
> +	default 100
> +
> +	---help---
> +	  The number of ticks to delay garbage collection
> +	  after the neighbour entry has been updated
> +	  A delay of 100 ms is reasonable.
> +	  With CONFIG_HZ = 250, this value should be 25
> +
>  source "net/ipv4/Kconfig"
>  source "net/ipv6/Kconfig"
>  source "net/netlabel/Kconfig"
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 70fe9e1..194195d 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -786,13 +786,23 @@ static void neigh_periodic_work(struct work_struct *work)
>  
>  			state = n->nud_state;
>  			if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
> -				write_unlock(&n->lock);
>  				goto next_elt;
>  			}
>  
>  			if (time_before(n->used, n->confirmed))
>  				n->used = n->confirmed;
>  
> +#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
> +			/* Do not garbage collect directly after we
> +			 * updated n->state to allow applications to
> +			 * react to the event
> +			 */
> +			if (time_before(jiffies,
> +					n->updated + CONFIG_ARP_GC_GUARDBAND)) {
> +				goto next_elt;
> +			}
> +#endif
> +
>  			if (atomic_read(&n->refcnt) == 1 &&
>  			    (state == NUD_FAILED ||
>  			     time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
> @@ -802,9 +812,8 @@ static void neigh_periodic_work(struct work_struct *work)
>  				neigh_cleanup_and_release(n);
>  				continue;
>  			}
> -			write_unlock(&n->lock);
> -
>  next_elt:
> +			write_unlock(&n->lock);
>  			np = &n->next;
>  		}
>  		/*
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] neighbour.c: Avoid GC directly after state change
@ 2015-03-11 20:28 Ulf Samuelsson
  2015-03-15  8:27 ` YOSHIFUJI Hideaki
  0 siblings, 1 reply; 22+ messages in thread
From: Ulf Samuelsson @ 2015-03-11 20:28 UTC (permalink / raw)
  To: netdev; +Cc: Ulf Samuelsson

From: Ulf Samuelsson <ulf@emagii.com>

The neighbour state is changed in the ARP timer handler.
If the state is changed to NUD_STALE, then the neighbour
entry becomes a candidate for garbage collection.

The garbage collection is handled by a "periodic work" routine.

When :

	* noone refers to the entry
	* the state is no longer valid (I.E: NUD_STALE).
	* the timeout value has  been reached or state is FAILED

the "periodic work" routine will notify
the stack that the entry should be deleted.

A user application monitoring and controlling the neighbour table
using NETLINK may fail, if the "period work" routine is run
directly after the state has been changed to NUD_STALE,
but before the user application has had a chance to change
the state to something valid.

The "period work" routine will detect the NUD_STALE state
and if the timeout value has been reached, it will notify the stack
that the entry should be deleted.

The patch adds a check in the periodic work routine
which will skip test for garbage collection
unless a number of ticks has passed since the last time
the neighbour entry state was changed.

The feature is controlled through Kconfig

The featuree is enabled by setting ARP_GC_APPLY_GUARDBAND
The guardband time (in ticks) is set in ARP_GC_GUARDBAND
Default time is 100 ms if HZ_### is set.

Signed-off-by: Ulf Samuelsson <ulf@emagii.com>
---
 net/Kconfig          |   32 ++++++++++++++++++++++++++++++++
 net/core/neighbour.c |   15 ++++++++++++---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 44dd578..099a5dd 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -77,6 +77,38 @@ config INET
 	  Short answer: say Y.
 
 if INET
+
+#
+# Core Network configuration
+#
+
+config ARP_GC_APPLY_GUARDBAND
+	bool "IP: ARP: Avoid garbage collection directly after state change"
+	default n
+	---help---
+	  With this item selected, an entry in the neighbour table
+	  will not be garbage collected directly after the ARP state
+	  has changed to STALE of FAILED
+	  This allows an application program change the state to something valid
+	  before garbage colllection occurs.
+
+	  If unsure, say N.
+
+config ARP_GC_GUARDBAND
+	int "Guardband time on garbage collection"
+	depends on ARP_GC_APPLY_GUARDBAND
+	default 10 if HZ_100
+	default 25 if HZ_250
+	default 30 if HZ_300
+	default 100 if HZ_1000
+	default 100
+
+	---help---
+	  The number of ticks to delay garbage collection
+	  after the neighbour entry has been updated
+	  A delay of 100 ms is reasonable.
+	  With CONFIG_HZ = 250, this value should be 25
+
 source "net/ipv4/Kconfig"
 source "net/ipv6/Kconfig"
 source "net/netlabel/Kconfig"
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 70fe9e1..194195d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -786,13 +786,23 @@ static void neigh_periodic_work(struct work_struct *work)
 
 			state = n->nud_state;
 			if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
-				write_unlock(&n->lock);
 				goto next_elt;
 			}
 
 			if (time_before(n->used, n->confirmed))
 				n->used = n->confirmed;
 
+#if defined(CONFIG_ARP_GC_APPLY_GUARDBAND)
+			/* Do not garbage collect directly after we
+			 * updated n->state to allow applications to
+			 * react to the event
+			 */
+			if (time_before(jiffies,
+					n->updated + CONFIG_ARP_GC_GUARDBAND)) {
+				goto next_elt;
+			}
+#endif
+
 			if (atomic_read(&n->refcnt) == 1 &&
 			    (state == NUD_FAILED ||
 			     time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
@@ -802,9 +812,8 @@ static void neigh_periodic_work(struct work_struct *work)
 				neigh_cleanup_and_release(n);
 				continue;
 			}
-			write_unlock(&n->lock);
-
 next_elt:
+			write_unlock(&n->lock);
 			np = &n->next;
 		}
 		/*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-05-08  9:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-11 21:01 [PATCH] neighbour.c: Avoid GC directly after state change Ulf Samuelsson
2015-03-12 18:26 ` David Miller
2015-03-17 23:33   ` Ulf Samuelsson
2015-03-18  1:56     ` YOSHIFUJI Hideaki/吉藤英明
2015-04-10  8:26   ` Ulf Samuelsson
2015-04-15 13:40     ` Ulf Samuelsson
2015-04-16  5:16     ` YOSHIFUJI Hideaki
2015-04-17  8:03       ` Ulf Samuelsson
2015-04-20  2:33         ` YOSHIFUJI Hideaki
2015-04-20 12:48           ` Ulf Samuelsson
2015-04-21  3:58             ` YOSHIFUJI Hideaki
2015-04-22  7:42               ` Ulf Samuelsson
2015-04-22 10:46                 ` YOSHIFUJI Hideaki
2015-04-22 11:49                   ` Ulf Samuelsson
2015-05-08  9:39                     ` Ulf Samuelsson
  -- strict thread matches above, loose matches on Subject: below --
2015-03-11 20:28 Ulf Samuelsson
2015-03-15  8:27 ` YOSHIFUJI Hideaki
2015-03-15 19:34   ` Ulf Samuelsson
2015-03-16  4:57     ` YOSHIFUJI Hideaki/吉藤英明
2015-03-16 19:55       ` Ulf Samuelsson
2015-03-17 12:31         ` YOSHIFUJI Hideaki
2015-03-17 23:27           ` Ulf Samuelsson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.