* [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
@ 2022-05-10  7:43 menglong8.dong
  2022-05-10 21:51 ` Julian Anastasov
  0 siblings, 1 reply; 5+ messages in thread
From: menglong8.dong @ 2022-05-10  7:43 UTC (permalink / raw)
  To: ja
  Cc: horms, pablo, kadlec, fw, davem, edumazet, kuba, pabeni, netdev,
	lvs-devel, linux-kernel, netfilter-devel, coreteam,
	Menglong Dong, Jiang Biao, Hao Peng

From: Menglong Dong <imagedong@tencent.com>

Currently, the starting point of the RR/WRR scheduler follows the order
in which destinations were added, which can result in imbalance when the
director is local to the clients and the number of connections is small.

For example, suppose we have client1, client2, ..., client100 and real
servers service1, service2, ..., service10. All clients run a local
director with the same ipvs config, and each of them creates 2 long-lived
TCP connections to the virtual service. As a result, all the clients will
connect to service1 and service2, leaving the others idle and making
service1 and service2 overloaded.

Fix this by randomizing the starting destination when
IP_VS_SVC_F_SCHED_RR_RANDOM/IP_VS_SVC_F_SCHED_WRR_RANDOM is set.

I start the randomization from svc->destinations: since the starting
destination is chosen from all of the destinations, it makes no
difference whether we start from svc->sched_data or svc->destinations.
If we started from svc->sched_data, we would have to skip the 'head'
node of the list while walking, to make the choice truly random.

Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v2:
- Randomize the starting destination of the WRR scheduler too
- Replace '|' with '&' in ip_vs_rr_random_start (Julian Anastasov)
- Replace get_random_u32() with prandom_u32_max() (Julian Anastasov)
---
 include/uapi/linux/ip_vs.h     |  3 +++
 net/netfilter/ipvs/ip_vs_rr.c  | 25 ++++++++++++++++++++++++-
 net/netfilter/ipvs/ip_vs_wrr.c | 20 ++++++++++++++++++++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
index 4102ddcb4e14..9543906dae7d 100644
--- a/include/uapi/linux/ip_vs.h
+++ b/include/uapi/linux/ip_vs.h
@@ -28,6 +28,9 @@
 #define IP_VS_SVC_F_SCHED_SH_FALLBACK	IP_VS_SVC_F_SCHED1 /* SH fallback */
 #define IP_VS_SVC_F_SCHED_SH_PORT	IP_VS_SVC_F_SCHED2 /* SH use port */
 
+#define IP_VS_SVC_F_SCHED_WRR_RANDOM	IP_VS_SVC_F_SCHED1 /* random start */
+#define IP_VS_SVC_F_SCHED_RR_RANDOM	IP_VS_SVC_F_SCHED1 /* random start */
+
 /*
  *      Destination Server Flags
  */
diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
index 38495c6f6c7c..d53bfaf7aadf 100644
--- a/net/netfilter/ipvs/ip_vs_rr.c
+++ b/net/netfilter/ipvs/ip_vs_rr.c
@@ -22,13 +22,36 @@
 
 #include <net/ip_vs.h>
 
+static void ip_vs_rr_random_start(struct ip_vs_service *svc)
+{
+	struct list_head *cur;
+	u32 start;
+
+	if (!(svc->flags & IP_VS_SVC_F_SCHED_RR_RANDOM) ||
+	    svc->num_dests <= 1)
+		return;
+
+	start = prandom_u32_max(svc->num_dests);
+	spin_lock_bh(&svc->sched_lock);
+	cur = &svc->destinations;
+	while (start--)
+		cur = cur->next;
+	svc->sched_data = cur;
+	spin_unlock_bh(&svc->sched_lock);
+}
 
 static int ip_vs_rr_init_svc(struct ip_vs_service *svc)
 {
 	svc->sched_data = &svc->destinations;
+	ip_vs_rr_random_start(svc);
 	return 0;
 }
 
+static int ip_vs_rr_add_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
+{
+	ip_vs_rr_random_start(svc);
+	return 0;
+}
 
 static int ip_vs_rr_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
 {
@@ -104,7 +127,7 @@ static struct ip_vs_scheduler ip_vs_rr_scheduler = {
 	.module =		THIS_MODULE,
 	.n_list =		LIST_HEAD_INIT(ip_vs_rr_scheduler.n_list),
 	.init_service =		ip_vs_rr_init_svc,
-	.add_dest =		NULL,
+	.add_dest =		ip_vs_rr_add_dest,
 	.del_dest =		ip_vs_rr_del_dest,
 	.schedule =		ip_vs_rr_schedule,
 };
diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
index 1bc7a0789d85..ed6230976379 100644
--- a/net/netfilter/ipvs/ip_vs_wrr.c
+++ b/net/netfilter/ipvs/ip_vs_wrr.c
@@ -102,6 +102,24 @@ static int ip_vs_wrr_max_weight(struct ip_vs_service *svc)
 	return weight;
 }
 
+static void ip_vs_wrr_random_start(struct ip_vs_service *svc)
+{
+	struct ip_vs_wrr_mark *mark = svc->sched_data;
+	struct list_head *cur;
+	u32 start;
+
+	if (!(svc->flags & IP_VS_SVC_F_SCHED_WRR_RANDOM) ||
+	    svc->num_dests <= 1)
+		return;
+
+	start = prandom_u32_max(svc->num_dests);
+	spin_lock_bh(&svc->sched_lock);
+	cur = &svc->destinations;
+	while (start--)
+		cur = cur->next;
+	mark->cl = list_entry(cur, struct ip_vs_dest, n_list);
+	spin_unlock_bh(&svc->sched_lock);
+}
 
 static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
 {
@@ -119,6 +137,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
 	mark->mw = ip_vs_wrr_max_weight(svc) - (mark->di - 1);
 	mark->cw = mark->mw;
 	svc->sched_data = mark;
+	ip_vs_wrr_random_start(svc);
 
 	return 0;
 }
@@ -149,6 +168,7 @@ static int ip_vs_wrr_dest_changed(struct ip_vs_service *svc,
 	else if (mark->di > 1)
 		mark->cw = (mark->cw / mark->di) * mark->di + 1;
 	spin_unlock_bh(&svc->sched_lock);
+	ip_vs_wrr_random_start(svc);
 	return 0;
 }
 
-- 
2.36.0



* Re: [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
  2022-05-10  7:43 [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler menglong8.dong
@ 2022-05-10 21:51 ` Julian Anastasov
  2022-05-12  7:16   ` Menglong Dong
  0 siblings, 1 reply; 5+ messages in thread
From: Julian Anastasov @ 2022-05-10 21:51 UTC (permalink / raw)
  To: menglong8.dong
  Cc: Simon Horman, pablo, lvs-devel, netfilter-devel, Menglong Dong,
	Jiang Biao, Hao Peng


	Hello,

On Tue, 10 May 2022, menglong8.dong@gmail.com wrote:

> From: Menglong Dong <imagedong@tencent.com>
> 
> Currently, the starting point of the RR/WRR scheduler follows the order
> in which destinations were added, which can result in imbalance when the
> director is local to the clients and the number of connections is small.
> 
> For example, suppose we have client1, client2, ..., client100 and real
> servers service1, service2, ..., service10. All clients run a local
> director with the same ipvs config, and each of them creates 2 long-lived
> TCP connections to the virtual service. As a result, all the clients will
> connect to service1 and service2, leaving the others idle and making
> service1 and service2 overloaded.

	The more time I spend on this topic, the less
convinced I am that it is worth the effort. Randomness can
come from another place: the client address/port. Schedulers
like MH and SH can probably help in such a case.
RR is such a simple scheduler that I doubt it is used much
in practice. People prefer WLC, WRR and, more recently, MH,
which has many features:

- lockless
- supports server weights
- safer on dest adding/removal or weight changes
- fallback, optional
- suitable for sloppy TCP/SCTP mode to avoid the
SYNC mechanism in active-active setups

> Fix this by randomizing the starting destination when
> IP_VS_SVC_F_SCHED_RR_RANDOM/IP_VS_SVC_F_SCHED_WRR_RANDOM is set.
> 
> I start the randomization from svc->destinations: since the starting
> destination is chosen from all of the destinations, it makes no
> difference whether we start from svc->sched_data or svc->destinations.
> If we started from svc->sched_data, we would have to skip the 'head'
> node of the list while walking, to make the choice truly random.

	Yes, the first dest would get twice the chance of the
others if we did not account for the head: walking from
svc->sched_data, landing on the head node effectively selects
the dest right after it, which can also be selected directly.
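
	A minimal sketch of such a head-skipping walk, for
illustration only (hypothetical, not part of this patch; cur
and start as declared in ip_vs_rr_random_start):

cur = svc->sched_data;
if (cur == &svc->destinations)		/* do not start on the head */
	cur = cur->next;
start = prandom_u32_max(svc->num_dests);
while (start--) {
	cur = cur->next;
	if (cur == &svc->destinations)	/* skip the head node */
		cur = cur->next;
}
svc->sched_data = cur;

	Each of the num_dests entries is then reached for exactly
one value of start, so the choice stays uniform.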

> Reviewed-by: Jiang Biao <benbjiang@tencent.com>
> Reviewed-by: Hao Peng <flyingpeng@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
> v2:
> - Randomize the starting destination of the WRR scheduler too
> - Replace '|' with '&' in ip_vs_rr_random_start (Julian Anastasov)
> - Replace get_random_u32() with prandom_u32_max() (Julian Anastasov)
> ---
>  include/uapi/linux/ip_vs.h     |  3 +++
>  net/netfilter/ipvs/ip_vs_rr.c  | 25 ++++++++++++++++++++++++-
>  net/netfilter/ipvs/ip_vs_wrr.c | 20 ++++++++++++++++++++
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/ip_vs.h b/include/uapi/linux/ip_vs.h
> index 4102ddcb4e14..9543906dae7d 100644
> --- a/include/uapi/linux/ip_vs.h
> +++ b/include/uapi/linux/ip_vs.h
> @@ -28,6 +28,9 @@
>  #define IP_VS_SVC_F_SCHED_SH_FALLBACK	IP_VS_SVC_F_SCHED1 /* SH fallback */
>  #define IP_VS_SVC_F_SCHED_SH_PORT	IP_VS_SVC_F_SCHED2 /* SH use port */
>  
> +#define IP_VS_SVC_F_SCHED_WRR_RANDOM	IP_VS_SVC_F_SCHED1 /* random start */
> +#define IP_VS_SVC_F_SCHED_RR_RANDOM	IP_VS_SVC_F_SCHED1 /* random start */
> +
>  /*
>   *      Destination Server Flags
>   */
> diff --git a/net/netfilter/ipvs/ip_vs_rr.c b/net/netfilter/ipvs/ip_vs_rr.c
> index 38495c6f6c7c..d53bfaf7aadf 100644
> --- a/net/netfilter/ipvs/ip_vs_rr.c
> +++ b/net/netfilter/ipvs/ip_vs_rr.c
> @@ -22,13 +22,36 @@
>  
>  #include <net/ip_vs.h>
>  
> +static void ip_vs_rr_random_start(struct ip_vs_service *svc)
> +{
> +	struct list_head *cur;
> +	u32 start;
> +
> +	if (!(svc->flags & IP_VS_SVC_F_SCHED_RR_RANDOM) ||
> +	    svc->num_dests <= 1)
> +		return;
> +
> +	start = prandom_u32_max(svc->num_dests);
> +	spin_lock_bh(&svc->sched_lock);
> +	cur = &svc->destinations;
> +	while (start--)
> +		cur = cur->next;
> +	svc->sched_data = cur;
> +	spin_unlock_bh(&svc->sched_lock);
> +}
>  
>  static int ip_vs_rr_init_svc(struct ip_vs_service *svc)
>  {
>  	svc->sched_data = &svc->destinations;
> +	ip_vs_rr_random_start(svc);
>  	return 0;
>  }
>  
> +static int ip_vs_rr_add_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
> +{
> +	ip_vs_rr_random_start(svc);
> +	return 0;
> +}
>  
>  static int ip_vs_rr_del_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest)
>  {
> @@ -104,7 +127,7 @@ static struct ip_vs_scheduler ip_vs_rr_scheduler = {
>  	.module =		THIS_MODULE,
>  	.n_list =		LIST_HEAD_INIT(ip_vs_rr_scheduler.n_list),
>  	.init_service =		ip_vs_rr_init_svc,
> -	.add_dest =		NULL,
> +	.add_dest =		ip_vs_rr_add_dest,
>  	.del_dest =		ip_vs_rr_del_dest,
>  	.schedule =		ip_vs_rr_schedule,
>  };
> diff --git a/net/netfilter/ipvs/ip_vs_wrr.c b/net/netfilter/ipvs/ip_vs_wrr.c
> index 1bc7a0789d85..ed6230976379 100644
> --- a/net/netfilter/ipvs/ip_vs_wrr.c
> +++ b/net/netfilter/ipvs/ip_vs_wrr.c
> @@ -102,6 +102,24 @@ static int ip_vs_wrr_max_weight(struct ip_vs_service *svc)
>  	return weight;
>  }
>  
> +static void ip_vs_wrr_random_start(struct ip_vs_service *svc)
> +{
> +	struct ip_vs_wrr_mark *mark = svc->sched_data;
> +	struct list_head *cur;
> +	u32 start;
> +
> +	if (!(svc->flags & IP_VS_SVC_F_SCHED_WRR_RANDOM) ||
> +	    svc->num_dests <= 1)
> +		return;
> +
> +	start = prandom_u32_max(svc->num_dests);
> +	spin_lock_bh(&svc->sched_lock);
> +	cur = &svc->destinations;
> +	while (start--)
> +		cur = cur->next;
> +	mark->cl = list_entry(cur, struct ip_vs_dest, n_list);

	The problem with WRR is that mark->cl and mark->cw
work together; we cannot change just cl.

> +	spin_unlock_bh(&svc->sched_lock);
> +}
>  
>  static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
>  {
> @@ -119,6 +137,7 @@ static int ip_vs_wrr_init_svc(struct ip_vs_service *svc)
>  	mark->mw = ip_vs_wrr_max_weight(svc) - (mark->di - 1);
>  	mark->cw = mark->mw;
>  	svc->sched_data = mark;
> +	ip_vs_wrr_random_start(svc);
>  
>  	return 0;
>  }
> @@ -149,6 +168,7 @@ static int ip_vs_wrr_dest_changed(struct ip_vs_service *svc,
>  	else if (mark->di > 1)
>  		mark->cw = (mark->cw / mark->di) * mark->di + 1;
>  	spin_unlock_bh(&svc->sched_lock);
> +	ip_vs_wrr_random_start(svc);

	This will be called even on upd_dest (while packets are
being processed), so the region under the lock should be short
and the cl/cw state should not be damaged.

Regards

--
Julian Anastasov <ja@ssi.bg>



* Re: [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
  2022-05-10 21:51 ` Julian Anastasov
@ 2022-05-12  7:16   ` Menglong Dong
  2022-05-12 10:33     ` Julian Anastasov
  0 siblings, 1 reply; 5+ messages in thread
From: Menglong Dong @ 2022-05-12  7:16 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, Pablo Neira Ayuso, lvs-devel, netfilter-devel,
	Menglong Dong, Jiang Biao, Hao Peng

On Wed, May 11, 2022 at 5:51 AM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Tue, 10 May 2022, menglong8.dong@gmail.com wrote:
>
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > Currently, the starting point of the RR/WRR scheduler follows the order
> > in which destinations were added, which can result in imbalance when the
> > director is local to the clients and the number of connections is small.
> >
> > For example, suppose we have client1, client2, ..., client100 and real
> > servers service1, service2, ..., service10. All clients run a local
> > director with the same ipvs config, and each of them creates 2 long-lived
> > TCP connections to the virtual service. As a result, all the clients will
> > connect to service1 and service2, leaving the others idle and making
> > service1 and service2 overloaded.
>
>         The more time I spend on this topic, the less
> convinced I am that it is worth the effort. Randomness can
> come from another place: the client address/port. Schedulers
> like MH and SH can probably help in such a case.
> RR is such a simple scheduler that I doubt it is used much
> in practice. People prefer WLC, WRR and, more recently, MH,
> which has many features:
>
> - lockless
> - supports server weights
> - safer on dest adding/removal or weight changes
> - fallback, optional
> - suitable for sloppy TCP/SCTP mode to avoid the
> SYNC mechanism in active-active setups
>

Yeah, WLC and MH are excellent schedulers. As for SH, my
colleagues' feedback was that it doesn't work well. In fact, that
was their own mistake: they forgot to enable the port hash and
used the default IP hash. With my guidance, their case was solved
by SH.
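
For reference, a minimal sketch of such an SH config (addresses
are made up; the sh-port/sh-fallback scheduler flags assume a
reasonably recent ipvsadm):

  # hash on client address and port, fall back on dead dests
  ipvsadm -A -t 10.0.0.1:80 -s sh -b sh-port,sh-fallback
  ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.1:80 -m
  ipvsadm -a -t 10.0.0.1:80 -r 192.168.1.2:80 -m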

Now, I'm not sure whether this feature is necessary. Maybe
someone needs it? Such as someone who needs RR/WRR with a
random start......

Anyway, I have made a V3 according to your advice. I can
send it if there is any positive reply :/


Thanks for your guidance
Menglong Dong



* Re: [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
  2022-05-12  7:16   ` Menglong Dong
@ 2022-05-12 10:33     ` Julian Anastasov
  2022-05-15 13:47       ` Menglong Dong
  0 siblings, 1 reply; 5+ messages in thread
From: Julian Anastasov @ 2022-05-12 10:33 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Simon Horman, Pablo Neira Ayuso, lvs-devel, netfilter-devel,
	Menglong Dong, Jiang Biao, Hao Peng


	Hello,

On Thu, 12 May 2022, Menglong Dong wrote:

> Yeah, WLC and MH are excellent schedulers. As for SH, my
> colleagues' feedback was that it doesn't work well. In fact, that
> was their own mistake: they forgot to enable the port hash and
> used the default IP hash. With my guidance, their case was solved
> by SH.
> 
> Now, I'm not sure whether this feature is necessary. Maybe
> someone needs it? Such as someone who needs RR/WRR with a
> random start......

	Perhaps there is some more clever way to add randomness.
The problem is that the random start should be applied only
after the initial configuration. But there is no clear point
marking where configuration ends and the service starts.
This is not so bad for RR but it is bad for WRR.

	One way would be for the user tool to add dests in random
order. IIRC, this should not affect setups with backup servers
as long as they share the same set of real servers, i.e. the
order in svc->destinations does not matter for SYNC, but the
real servers should be present in all directors.
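
	For example, with hypothetical addresses, using shuf(1)
to randomize the order on each director:

  # add the real servers in a random order
  for rs in $(shuf -e 192.168.1.1 192.168.1.2 192.168.1.3); do
          ipvsadm -a -t 10.0.0.1:80 -r "$rs:80" -m
  done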

	Another option would be for __ip_vs_update_dest() to
use list_add_rcu() or list_add_tail_rcu() per dest, depending
on some switch that enables random ordering for the svc.
But it will affect only schedulers that do not reorder the
dests later. So, it should help for RR and for WRR (random
order per same weight). In this case, the scheduler's code
is not affected.
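
	A rough sketch of that option (the flag name is
hypothetical):

/* in __ip_vs_update_dest(), when linking the new dest */
if ((svc->flags & IP_VS_SVC_F_SCHED_RANDOM_ORDER) &&
    prandom_u32_max(2))
	list_add_rcu(&dest->n_list, &svc->destinations);
else
	list_add_tail_rcu(&dest->n_list, &svc->destinations);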

	For RR, the scheduler does not use weights and
dests are usually not updated. But for WRR the weights
can be changed, and this affects the order of selection
without changing the order in svc->destinations.

	OTOH, WRR is a scheduler that can support frequent
updates of dest weights. This is not true for MH, which can
easily change a weight only between 0 and some fixed value
without rebuilding its table. As a result,
ip_vs_wrr_dest_changed() can be called frequently even after
the initial configuration, at the same time while packets
are being scheduled.

	When you mentioned that ip_vs_wrr_gcd_weight() is
slow, I noticed that ip_vs_wrr_dest_changed() can be improved
to reduce the time spent holding the lock. Maybe in a
separate patch we can call ip_vs_wrr_gcd_weight() and
ip_vs_wrr_max_weight() before the spin_lock_bh(), using
two new tmp vars.
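
	An untested sketch of that idea, against the current
ip_vs_wrr_dest_changed():

static int ip_vs_wrr_dest_changed(struct ip_vs_service *svc,
				  struct ip_vs_dest *dest)
{
	struct ip_vs_wrr_mark *mark = svc->sched_data;
	int di, mw;

	/* compute the expensive aggregates before taking the lock */
	di = ip_vs_wrr_gcd_weight(svc);
	mw = ip_vs_wrr_max_weight(svc) - (di - 1);

	spin_lock_bh(&svc->sched_lock);
	mark->cl = list_entry(&svc->destinations,
			      struct ip_vs_dest, n_list);
	mark->di = di;
	mark->mw = mw;
	if (mark->cw > mark->mw || !mark->cw)
		mark->cw = mark->mw;
	else if (mark->di > 1)
		mark->cw = (mark->cw / mark->di) * mark->di + 1;
	spin_unlock_bh(&svc->sched_lock);
	return 0;
}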

> Anyway, I have made a V3 according to your advice. I can
> send it if there is any positive reply :/

	You can send it with the RFC tag, so that we can
at least review it and then decide what can be done.

Regards

--
Julian Anastasov <ja@ssi.bg>



* Re: [PATCH net-next v2] net: ipvs: randomize starting destination of RR/WRR scheduler
  2022-05-12 10:33     ` Julian Anastasov
@ 2022-05-15 13:47       ` Menglong Dong
  0 siblings, 0 replies; 5+ messages in thread
From: Menglong Dong @ 2022-05-15 13:47 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Simon Horman, Pablo Neira Ayuso, lvs-devel, netfilter-devel,
	Menglong Dong

On Thu, May 12, 2022 at 6:33 PM Julian Anastasov <ja@ssi.bg> wrote:
>
>
>         Hello,
>
> On Thu, 12 May 2022, Menglong Dong wrote:
>
> > Yeah, WLC and MH are excellent schedulers. As for SH, my
> > colleagues' feedback was that it doesn't work well. In fact, that
> > was their own mistake: they forgot to enable the port hash and
> > used the default IP hash. With my guidance, their case was solved
> > by SH.
> >
> > Now, I'm not sure whether this feature is necessary. Maybe
> > someone needs it? Such as someone who needs RR/WRR with a
> > random start......
>
>         Perhaps there is some more clever way to add randomness.
> The problem is that the random start should be applied only
> after the initial configuration. But there is no clear point
> marking where configuration ends and the service starts.
> This is not so bad for RR but it is bad for WRR.
>

Yes, we need a nicer way to do this. It's easy for RR,
but a little more complex for WRR, now that I have figured
out how WRR works.

>         One way would be for the user tool to add dests in random
> order. IIRC, this should not affect setups with backup servers
> as long as they share the same set of real servers, i.e. the
> order in svc->destinations does not matter for SYNC, but the
> real servers should be present in all directors.
>
>         Another option would be for __ip_vs_update_dest() to
> use list_add_rcu() or list_add_tail_rcu() per dest, depending
> on some switch that enables random ordering for the svc.
> But it will affect only schedulers that do not reorder the
> dests later. So, it should help for RR and for WRR (random
> order per same weight). In this case, the scheduler's code
> is not affected.
>
>         For RR, the scheduler does not use weights and
> dests are usually not updated. But for WRR the weights
> can be changed, and this affects the order of selection
> without changing the order in svc->destinations.
>
>         OTOH, WRR is a scheduler that can support frequent
> updates of dest weights. This is not true for MH, which can
> easily change a weight only between 0 and some fixed value
> without rebuilding its table. As a result,
> ip_vs_wrr_dest_changed() can be called frequently even after
> the initial configuration, at the same time while packets
> are being scheduled.
>

Having the user tool add dests in random order is an option
I thought about before, but it requires users to make some
extra changes.

Thanks for your explanation. The options you mentioned are
sensible and helpful. I'll weigh them together and try to find
the best approach after I dig deeper into the ipvs code.

>         When you mentioned that ip_vs_wrr_gcd_weight() is
> slow, I noticed that ip_vs_wrr_dest_changed() can be improved
> to reduce the time spent holding the lock. Maybe in a
> separate patch we can call ip_vs_wrr_gcd_weight() and
> ip_vs_wrr_max_weight() before the spin_lock_bh(), using
> two new tmp vars.
>

Yeah, it seems that the scope of the lock can be reduced. I
think I can handle that.



> > Anyway, I have made a V3 according to your advice. I can
> > send it if there is any positive reply :/
>
>         You can send it with the RFC tag, so that we can
> at least review it and then decide what can be done.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>

