netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] neighbour: update neigh timestamps iff update is effective
@ 2017-05-10  0:06 Ihar Hrachyshka
  2017-05-15 17:10 ` David Miller
  2017-05-15 20:05 ` [PATCH] " Julian Anastasov
  0 siblings, 2 replies; 9+ messages in thread
From: Ihar Hrachyshka @ 2017-05-10  0:06 UTC (permalink / raw)
  To: David S. Miller, Ihar Hrachyshka, Julian Anastasov, He Chunhui, netdev

It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. Then we effectively
ignore the update request, bailing out of touching the neigh entry,
except that we still bump its timestamps.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service
availability.

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
---
 net/core/neighbour.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 58b0bcc..d274f81 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1132,10 +1132,6 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		lladdr = neigh->ha;
 	}
 
-	if (new & NUD_CONNECTED)
-		neigh->confirmed = jiffies;
-	neigh->updated = jiffies;
-
 	/* If entry was valid and address is not changed,
 	   do not change entry state, if new one is STALE.
 	 */
@@ -1157,6 +1153,16 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		}
 	}
 
+	/* Update timestamps only once we know we will make a change to the
+	 * neighbour entry. Otherwise we risk to move the locktime window with
+	 * noop updates and ignore relevant ARP updates.
+	 */
+	if (new != old || lladdr != neigh->ha) {
+		if (new & NUD_CONNECTED)
+			neigh->confirmed = jiffies;
+		neigh->updated = jiffies;
+	}
+
 	if (new != old) {
 		neigh_del_timer(neigh);
 		if (new & NUD_PROBE)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-10  0:06 [PATCH] neighbour: update neigh timestamps iff update is effective Ihar Hrachyshka
@ 2017-05-15 17:10 ` David Miller
  2017-05-15 21:40   ` Ihar Hrachyshka
  2017-05-16 15:44   ` [PATCH v2] " Ihar Hrachyshka
  2017-05-15 20:05 ` [PATCH] " Julian Anastasov
  1 sibling, 2 replies; 9+ messages in thread
From: David Miller @ 2017-05-15 17:10 UTC (permalink / raw)
  To: ihrachys; +Cc: ja, hchunhui, netdev

From: Ihar Hrachyshka <ihrachys@redhat.com>
Date: Tue,  9 May 2017 17:06:05 -0700

> Sometimes neigh_update calls won't touch neither lladdr nor state, for
> example if an update arrives in locktime interval. Then we effectively
> ignore the update request, bailing out of touching the neigh entry,
> except that we still bump its timestamps.

So, in order to understand this, one has to know that the ->updated
value is tested by the protocol specific neigh code, which in turn
will thus influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not.

Please update your commit message to explain that this is how the
locktime mechanism influences neigh_update()'s behavior.

Thank you.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-10  0:06 [PATCH] neighbour: update neigh timestamps iff update is effective Ihar Hrachyshka
  2017-05-15 17:10 ` David Miller
@ 2017-05-15 20:05 ` Julian Anastasov
  2017-05-15 21:35   ` Ihar Hrachyshka
  1 sibling, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2017-05-15 20:05 UTC (permalink / raw)
  To: Ihar Hrachyshka; +Cc: David S. Miller, He Chunhui, netdev


	Hello,

On Tue, 9 May 2017, Ihar Hrachyshka wrote:

> It's a common practice to send gratuitous ARPs after moving an
> IP address to another device to speed up healing of a service. To
> fulfill service availability constraints, the timing of network peers
> updating their caches to point to a new location of an IP address can be
> particularly important.
> 
> Sometimes neigh_update calls won't touch neither lladdr nor state, for
> example if an update arrives in locktime interval. Then we effectively
> ignore the update request, bailing out of touching the neigh entry,
> except that we still bump its timestamps.
> 
> This may be a problem for updates arriving in quick succession. For
> example, consider the following scenario:
> 
> A service is moved to another device with its IP address. The new device
> sends three gratuitous ARP requests into the network with ~1 seconds
> interval between them. Just before the first request arrives to one of
> network peer nodes, its neigh entry for the IP address transitions from
> STALE to DELAY.  This transition, among other things, updates
> neigh->updated. Once the kernel receives the first gratuitous ARP, it
> ignores it because its arrival time is inside the locktime interval. The
> kernel still bumps neigh->updated. Then the second gratuitous ARP
> request arrives, and it's also ignored because it's still in the (new)
> locktime interval. Same happens for the third request. The node
> eventually heals itself (after delay_first_probe_time seconds since the
> initial transition to DELAY state), but it just wasted some time and
> require a new ARP request/reply round trip. This unfortunate behaviour
> both puts more load on the network, as well as reduces service
> availability.
> 
> This patch changes neigh_update so that it bumps neigh->updated (as well
> as neigh->confirmed) only once we are sure that either lladdr or entry
> state will change). In the scenario described above, it means that the
> second gratuitous ARP request will actually update the entry lladdr.
> 
> Ideally, we would update the neigh entry on the very first gratuitous
> ARP request. The locktime mechanism is designed to ignore ARP updates in
> a short timeframe after a previous ARP update was honoured by the kernel
> layer. This would require tracking timestamps for state transitions
> separately from timestamps when actual updates are received. This would
> probably involve changes in neighbour struct. Therefore, the patch
> doesn't tackle the issue of the first gratuitous APR ignored, leaving
> it for a follow-up.
> 
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

	Looks ok to me,

Acked-by: Julian Anastasov <ja@ssi.bg>

	It seems arp_accept value currently has influence on
the locktime for GARP requests. My understanding is that
locktime is used to ignore replies from proxy_arp
routers while the requested IP is present on the LAN
and replies immediately. IMHO, GARP requests should not
depend on locktime, even when arp_accept=0. For example:

	if (IN_DEV_ARP_ACCEPT(in_dev)) {
	...
+	} else if (n && tip == sip && arp->ar_op == htons(ARPOP_REQUEST)) {
+		unsigned int addr_type = inet_addr_type_dev_table(net, dev, sip);
+
+		is_garp = (addr_type == RTN_UNICAST);
	}

> ---
>  net/core/neighbour.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 58b0bcc..d274f81 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1132,10 +1132,6 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>  		lladdr = neigh->ha;
>  	}
>  
> -	if (new & NUD_CONNECTED)
> -		neigh->confirmed = jiffies;
> -	neigh->updated = jiffies;
> -
>  	/* If entry was valid and address is not changed,
>  	   do not change entry state, if new one is STALE.
>  	 */
> @@ -1157,6 +1153,16 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>  		}
>  	}
>  
> +	/* Update timestamps only once we know we will make a change to the
> +	 * neighbour entry. Otherwise we risk to move the locktime window with
> +	 * noop updates and ignore relevant ARP updates.
> +	 */
> +	if (new != old || lladdr != neigh->ha) {
> +		if (new & NUD_CONNECTED)
> +			neigh->confirmed = jiffies;
> +		neigh->updated = jiffies;
> +	}
> +
>  	if (new != old) {
>  		neigh_del_timer(neigh);
>  		if (new & NUD_PROBE)
> -- 
> 2.9.3

Regards

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-15 20:05 ` [PATCH] " Julian Anastasov
@ 2017-05-15 21:35   ` Ihar Hrachyshka
  2017-05-15 22:27     ` Julian Anastasov
  0 siblings, 1 reply; 9+ messages in thread
From: Ihar Hrachyshka @ 2017-05-15 21:35 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David S. Miller, He Chunhui, netdev

On Mon, May 15, 2017 at 1:05 PM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         It seems arp_accept value currently has influence on
> the locktime for GARP requests. My understanding is that
> locktime is used to ignore replies from proxy_arp
> routers while the requested IP is present on the LAN
> and replies immediately. IMHO, GARP requests should not
> depend on locktime, even when arp_accept=0. For example:

Yes, I believe so.

I actually thought about introducing the patch that does just that:
forcing override on garp, but then I was thinking, maybe there is some
reason to still apply locktime rules to garps; f.e. if you have
multiple nodes carrying the ip address and located on the same
segment, maybe you want to pick the first that replies to you (in
theory, it may be the node that is less loaded, or closer to us; but
then, it's so fragile even if that was the intent...) Do you want me
to post the patch, or will you cover it?

Ihar

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-15 17:10 ` David Miller
@ 2017-05-15 21:40   ` Ihar Hrachyshka
  2017-05-16 15:47     ` Ihar Hrachyshka
  2017-05-16 15:44   ` [PATCH v2] " Ihar Hrachyshka
  1 sibling, 1 reply; 9+ messages in thread
From: Ihar Hrachyshka @ 2017-05-15 21:40 UTC (permalink / raw)
  To: davem; +Cc: ja, netdev, Ihar Hrachyshka

It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. The neigh->updated
value is tested by the protocol specific neigh code, which in turn
will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not. As a result, we may effectively ignore
the update request, bailing out of touching the neigh entry, except that
we still bump its timestamps inside neigh_update.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service
availability.

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
---
 net/core/neighbour.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 58b0bcc..d274f81 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1132,10 +1132,6 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		lladdr = neigh->ha;
 	}
 
-	if (new & NUD_CONNECTED)
-		neigh->confirmed = jiffies;
-	neigh->updated = jiffies;
-
 	/* If entry was valid and address is not changed,
 	   do not change entry state, if new one is STALE.
 	 */
@@ -1157,6 +1153,16 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		}
 	}
 
+	/* Update timestamps only once we know we will make a change to the
+	 * neighbour entry. Otherwise we risk to move the locktime window with
+	 * noop updates and ignore relevant ARP updates.
+	 */
+	if (new != old || lladdr != neigh->ha) {
+		if (new & NUD_CONNECTED)
+			neigh->confirmed = jiffies;
+		neigh->updated = jiffies;
+	}
+
 	if (new != old) {
 		neigh_del_timer(neigh);
 		if (new & NUD_PROBE)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-15 21:35   ` Ihar Hrachyshka
@ 2017-05-15 22:27     ` Julian Anastasov
  0 siblings, 0 replies; 9+ messages in thread
From: Julian Anastasov @ 2017-05-15 22:27 UTC (permalink / raw)
  To: Ihar Hrachyshka; +Cc: David S. Miller, He Chunhui, netdev


	Hello,

On Mon, 15 May 2017, Ihar Hrachyshka wrote:

> On Mon, May 15, 2017 at 1:05 PM, Julian Anastasov <ja@ssi.bg> wrote:
> >
> >         It seems arp_accept value currently has influence on
> > the locktime for GARP requests. My understanding is that
> > locktime is used to ignore replies from proxy_arp
> > routers while the requested IP is present on the LAN
> > and replies immediately. IMHO, GARP requests should not
> > depend on locktime, even when arp_accept=0. For example:
> 
> Yes, I believe so.
> 
> I actually thought about introducing the patch that does just that:
> forcing override on garp, but then I was thinking, maybe there is some
> reason to still apply locktime rules to garps; f.e. if you have
> multiple nodes carrying the ip address and located on the same
> segment, maybe you want to pick the first that replies to you (in
> theory, it may be the node that is less loaded, or closer to us; but
> then, it's so fragile even if that was the intent...) Do you want me
> to post the patch, or will you cover it?

	Feel free to post a patch for this, I see that you change
in another patch the is_garp value, so it seems the same logic
should be used twice.

Regards

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] neighbour: update neigh timestamps iff update is effective
  2017-05-15 17:10 ` David Miller
  2017-05-15 21:40   ` Ihar Hrachyshka
@ 2017-05-16 15:44   ` Ihar Hrachyshka
  2017-05-17 15:42     ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Ihar Hrachyshka @ 2017-05-16 15:44 UTC (permalink / raw)
  To: davem; +Cc: ja, netdev, Ihar Hrachyshka

It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. The neigh->updated
value is tested by the protocol specific neigh code, which in turn
will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not. As a result, we may effectively ignore
the update request, bailing out of touching the neigh entry, except that
we still bump its timestamps inside neigh_update.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service
availability.

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
---
v2: added more details to commit message to explain relation between arp
    and neigh code.
---
 net/core/neighbour.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 58b0bcc..d274f81 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1132,10 +1132,6 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		lladdr = neigh->ha;
 	}
 
-	if (new & NUD_CONNECTED)
-		neigh->confirmed = jiffies;
-	neigh->updated = jiffies;
-
 	/* If entry was valid and address is not changed,
 	   do not change entry state, if new one is STALE.
 	 */
@@ -1157,6 +1153,16 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 		}
 	}
 
+	/* Update timestamps only once we know we will make a change to the
+	 * neighbour entry. Otherwise we risk to move the locktime window with
+	 * noop updates and ignore relevant ARP updates.
+	 */
+	if (new != old || lladdr != neigh->ha) {
+		if (new & NUD_CONNECTED)
+			neigh->confirmed = jiffies;
+		neigh->updated = jiffies;
+	}
+
 	if (new != old) {
 		neigh_del_timer(neigh);
 		if (new & NUD_PROBE)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] neighbour: update neigh timestamps iff update is effective
  2017-05-15 21:40   ` Ihar Hrachyshka
@ 2017-05-16 15:47     ` Ihar Hrachyshka
  0 siblings, 0 replies; 9+ messages in thread
From: Ihar Hrachyshka @ 2017-05-16 15:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: Julian Anastasov, netdev, Ihar Hrachyshka

On Mon, May 15, 2017 at 2:40 PM, Ihar Hrachyshka <ihrachys@redhat.com> wrote:
> It's a common practice to send gratuitous ARPs after moving an
> IP address to another device to speed up healing of a service. To
> fulfill service availability constraints, the timing of network peers
> updating their caches to point to a new location of an IP address can be
> particularly important.
>
> Sometimes neigh_update calls won't touch neither lladdr nor state, for
> example if an update arrives in locktime interval. The neigh->updated
> value is tested by the protocol specific neigh code, which in turn
> will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
> call to neigh_update() or not. As a result, we may effectively ignore
> the update request, bailing out of touching the neigh entry, except that
> we still bump its timestamps inside neigh_update.
>
>
> This may be a problem for updates arriving in quick succession. For
> example, consider the following scenario:
>
> A service is moved to another device with its IP address. The new device
> sends three gratuitous ARP requests into the network with ~1 seconds
> interval between them. Just before the first request arrives to one of
> network peer nodes, its neigh entry for the IP address transitions from
> STALE to DELAY.  This transition, among other things, updates
> neigh->updated. Once the kernel receives the first gratuitous ARP, it
> ignores it because its arrival time is inside the locktime interval. The
> kernel still bumps neigh->updated. Then the second gratuitous ARP
> request arrives, and it's also ignored because it's still in the (new)
> locktime interval. Same happens for the third request. The node
> eventually heals itself (after delay_first_probe_time seconds since the
> initial transition to DELAY state), but it just wasted some time and
> require a new ARP request/reply round trip. This unfortunate behaviour
> both puts more load on the network, as well as reduces service
> availability.
>
> This patch changes neigh_update so that it bumps neigh->updated (as well
> as neigh->confirmed) only once we are sure that either lladdr or entry
> state will change). In the scenario described above, it means that the
> second gratuitous ARP request will actually update the entry lladdr.
>
> Ideally, we would update the neigh entry on the very first gratuitous
> ARP request. The locktime mechanism is designed to ignore ARP updates in
> a short timeframe after a previous ARP update was honoured by the kernel
> layer. This would require tracking timestamps for state transitions
> separately from timestamps when actual updates are received. This would
> probably involve changes in neighbour struct. Therefore, the patch
> doesn't tackle the issue of the first gratuitous APR ignored, leaving
> it for a follow-up.
>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

Please disregard this email, I forgot to update the patch version to
v2 and provide changelog. I posted (hopefully) correct version. Still
learning the process...

Ihar

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] neighbour: update neigh timestamps iff update is effective
  2017-05-16 15:44   ` [PATCH v2] " Ihar Hrachyshka
@ 2017-05-17 15:42     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2017-05-17 15:42 UTC (permalink / raw)
  To: ihrachys; +Cc: ja, netdev

From: Ihar Hrachyshka <ihrachys@redhat.com>
Date: Tue, 16 May 2017 08:44:24 -0700

> It's a common practice to send gratuitous ARPs after moving an
> IP address to another device to speed up healing of a service. To
> fulfill service availability constraints, the timing of network peers
> updating their caches to point to a new location of an IP address can be
> particularly important.
> 
> Sometimes neigh_update calls won't touch neither lladdr nor state, for
> example if an update arrives in locktime interval. The neigh->updated
> value is tested by the protocol specific neigh code, which in turn
> will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
> call to neigh_update() or not. As a result, we may effectively ignore
> the update request, bailing out of touching the neigh entry, except that
> we still bump its timestamps inside neigh_update.
> 
> This may be a problem for updates arriving in quick succession. For
> example, consider the following scenario:
> 
> A service is moved to another device with its IP address. The new device
> sends three gratuitous ARP requests into the network with ~1 seconds
> interval between them. Just before the first request arrives to one of
> network peer nodes, its neigh entry for the IP address transitions from
> STALE to DELAY.  This transition, among other things, updates
> neigh->updated. Once the kernel receives the first gratuitous ARP, it
> ignores it because its arrival time is inside the locktime interval. The
> kernel still bumps neigh->updated. Then the second gratuitous ARP
> request arrives, and it's also ignored because it's still in the (new)
> locktime interval. Same happens for the third request. The node
> eventually heals itself (after delay_first_probe_time seconds since the
> initial transition to DELAY state), but it just wasted some time and
> require a new ARP request/reply round trip. This unfortunate behaviour
> both puts more load on the network, as well as reduces service
> availability.
> 
> This patch changes neigh_update so that it bumps neigh->updated (as well
> as neigh->confirmed) only once we are sure that either lladdr or entry
> state will change). In the scenario described above, it means that the
> second gratuitous ARP request will actually update the entry lladdr.
> 
> Ideally, we would update the neigh entry on the very first gratuitous
> ARP request. The locktime mechanism is designed to ignore ARP updates in
> a short timeframe after a previous ARP update was honoured by the kernel
> layer. This would require tracking timestamps for state transitions
> separately from timestamps when actual updates are received. This would
> probably involve changes in neighbour struct. Therefore, the patch
> doesn't tackle the issue of the first gratuitous APR ignored, leaving
> it for a follow-up.
> 
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> ---
> v2: added more details to commit message to explain relation between arp
>     and neigh code.

Applied, thanks for documenting things better.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-05-17 15:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-10  0:06 [PATCH] neighbour: update neigh timestamps iff update is effective Ihar Hrachyshka
2017-05-15 17:10 ` David Miller
2017-05-15 21:40   ` Ihar Hrachyshka
2017-05-16 15:47     ` Ihar Hrachyshka
2017-05-16 15:44   ` [PATCH v2] " Ihar Hrachyshka
2017-05-17 15:42     ` David Miller
2017-05-15 20:05 ` [PATCH] " Julian Anastasov
2017-05-15 21:35   ` Ihar Hrachyshka
2017-05-15 22:27     ` Julian Anastasov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).