* [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-25 17:41 ` Xin Long 0 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-05-25 17:41 UTC (permalink / raw) To: network dev, linux-sctp Cc: davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, Neil Horman, syzkaller syzbot reported a rcu_sched self-detected stall on CPU which is caused by too small value set on rto_min with SCTP_RTOINFO sockopt. With this value, hb_timer will get stuck there, as in its timer handler it starts this timer again with this value, then goes to the timer handler again. This problem is there since very beginning, and thanks to Eric for the reproducer shared from a syzbot mail. This patch fixes it by not allowing to set rto_min with a value below 200 msecs, which is based on TCP's, by either setsockopt or sysctl. Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> --- include/net/sctp/constants.h | 1 + net/sctp/socket.c | 10 +++++++--- net/sctp/sysctl.c | 3 ++- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 20ff237..2ee7a7b 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; #define SCTP_RTO_INITIAL (3 * 1000) #define SCTP_RTO_MIN (1 * 1000) #define SCTP_RTO_MAX (60 * 1000) +#define SCTP_RTO_HARD_MIN 200 #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ diff --git a/net/sctp/socket.c b/net/sctp/socket.c index ae7e7c6..6ef12c7 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, * be changed. * */ -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, + unsigned int optlen) { struct sctp_rtoinfo rtoinfo; struct sctp_association *asoc; @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne else rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; - if (rto_min) + if (rto_min) { + if (rto_min < SCTP_RTO_HARD_MIN) + return -EINVAL; rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; - else + } else { rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; + } if (rto_min > rto_max) return -EINVAL; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 33ca5b7..7ec854a 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; static int rto_beta_min = 0; static int rto_alpha_max = 1000; static int rto_beta_max = 1000; +static int rto_hard_min = SCTP_RTO_HARD_MIN; static unsigned long max_autoclose_min = 0; static unsigned long max_autoclose_max = @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = proc_sctp_do_rto_min, - .extra1 = &one, + .extra1 = &rto_hard_min, .extra2 = &init_net.sctp.rto_max }, { -- 2.1.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-25 17:41 ` Xin Long 0 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-05-25 17:41 UTC (permalink / raw) To: network dev, linux-sctp Cc: davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, Neil Horman, syzkaller syzbot reported a rcu_sched self-detected stall on CPU which is caused by too small value set on rto_min with SCTP_RTOINFO sockopt. With this value, hb_timer will get stuck there, as in its timer handler it starts this timer again with this value, then goes to the timer handler again. This problem is there since very beginning, and thanks to Eric for the reproducer shared from a syzbot mail. This patch fixes it by not allowing to set rto_min with a value below 200 msecs, which is based on TCP's, by either setsockopt or sysctl. Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> --- include/net/sctp/constants.h | 1 + net/sctp/socket.c | 10 +++++++--- net/sctp/sysctl.c | 3 ++- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 20ff237..2ee7a7b 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; #define SCTP_RTO_INITIAL (3 * 1000) #define SCTP_RTO_MIN (1 * 1000) #define SCTP_RTO_MAX (60 * 1000) +#define SCTP_RTO_HARD_MIN 200 #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ diff --git a/net/sctp/socket.c b/net/sctp/socket.c index ae7e7c6..6ef12c7 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, * be changed. * */ -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, + unsigned int optlen) { struct sctp_rtoinfo rtoinfo; struct sctp_association *asoc; @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne else rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; - if (rto_min) + if (rto_min) { + if (rto_min < SCTP_RTO_HARD_MIN) + return -EINVAL; rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; - else + } else { rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; + } if (rto_min > rto_max) return -EINVAL; diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 33ca5b7..7ec854a 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; static int rto_beta_min = 0; static int rto_alpha_max = 1000; static int rto_beta_max = 1000; +static int rto_hard_min = SCTP_RTO_HARD_MIN; static unsigned long max_autoclose_min = 0; static unsigned long max_autoclose_max @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = proc_sctp_do_rto_min, - .extra1 = &one, + .extra1 = &rto_hard_min, .extra2 = &init_net.sctp.rto_max }, { -- 2.1.0 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-25 17:41 ` Xin Long @ 2018-05-25 19:13 ` Neil Horman -1 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-25 19:13 UTC (permalink / raw) To: Xin Long Cc: network dev, linux-sctp, davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > syzbot reported a rcu_sched self-detected stall on CPU which is caused > by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > value, hb_timer will get stuck there, as in its timer handler it starts > this timer again with this value, then goes to the timer handler again. > > This problem is there since very beginning, and thanks to Eric for the > reproducer shared from a syzbot mail. > > This patch fixes it by not allowing to set rto_min with a value below > 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- > include/net/sctp/constants.h | 1 + > net/sctp/socket.c | 10 +++++++--- > net/sctp/sysctl.c | 3 ++- > 3 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > index 20ff237..2ee7a7b 100644 > --- a/include/net/sctp/constants.h > +++ b/include/net/sctp/constants.h > @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > #define SCTP_RTO_INITIAL (3 * 1000) > #define SCTP_RTO_MIN (1 * 1000) > #define SCTP_RTO_MAX (60 * 1000) > +#define SCTP_RTO_HARD_MIN 200 > > #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > index ae7e7c6..6ef12c7 100644 > --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > * be changed. > * > */ > -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > + unsigned int optlen) > { > struct sctp_rtoinfo rtoinfo; > struct sctp_association *asoc; > @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > else > rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > - if (rto_min) > + if (rto_min) { > + if (rto_min < SCTP_RTO_HARD_MIN) > + return -EINVAL; > rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > - else > + } else { > rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > + } > > if (rto_min > rto_max) > return -EINVAL; > diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > index 33ca5b7..7ec854a 100644 > --- a/net/sctp/sysctl.c > +++ b/net/sctp/sysctl.c > @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > static int rto_beta_min = 0; > static int rto_alpha_max = 1000; > static int rto_beta_max = 1000; > +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > static unsigned long max_autoclose_min = 0; > static unsigned long max_autoclose_max = > @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > .maxlen = sizeof(unsigned int), > .mode = 0644, > .proc_handler = proc_sctp_do_rto_min, > - .extra1 = &one, > + .extra1 = &rto_hard_min, > .extra2 = &init_net.sctp.rto_max > }, > { > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Patch looks fine, you probably want to note this hard minimum in man(7) sctp as well Acked-by: Neil Horman <nhorman@tuxdriver.com> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-25 19:13 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-25 19:13 UTC (permalink / raw) To: Xin Long Cc: network dev, linux-sctp, davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > syzbot reported a rcu_sched self-detected stall on CPU which is caused > by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > value, hb_timer will get stuck there, as in its timer handler it starts > this timer again with this value, then goes to the timer handler again. > > This problem is there since very beginning, and thanks to Eric for the > reproducer shared from a syzbot mail. > > This patch fixes it by not allowing to set rto_min with a value below > 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > Signed-off-by: Xin Long <lucien.xin@gmail.com> > --- > include/net/sctp/constants.h | 1 + > net/sctp/socket.c | 10 +++++++--- > net/sctp/sysctl.c | 3 ++- > 3 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > index 20ff237..2ee7a7b 100644 > --- a/include/net/sctp/constants.h > +++ b/include/net/sctp/constants.h > @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > #define SCTP_RTO_INITIAL (3 * 1000) > #define SCTP_RTO_MIN (1 * 1000) > #define SCTP_RTO_MAX (60 * 1000) > +#define SCTP_RTO_HARD_MIN 200 > > #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > index ae7e7c6..6ef12c7 100644 > --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > * be changed. > * > */ > -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > + unsigned int optlen) > { > struct sctp_rtoinfo rtoinfo; > struct sctp_association *asoc; > @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > else > rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > - if (rto_min) > + if (rto_min) { > + if (rto_min < SCTP_RTO_HARD_MIN) > + return -EINVAL; > rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > - else > + } else { > rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > + } > > if (rto_min > rto_max) > return -EINVAL; > diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > index 33ca5b7..7ec854a 100644 > --- a/net/sctp/sysctl.c > +++ b/net/sctp/sysctl.c > @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > static int rto_beta_min = 0; > static int rto_alpha_max = 1000; > static int rto_beta_max = 1000; > +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > static unsigned long max_autoclose_min = 0; > static unsigned long max_autoclose_max > @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > .maxlen = sizeof(unsigned int), > .mode = 0644, > .proc_handler = proc_sctp_do_rto_min, > - .extra1 = &one, > + .extra1 = &rto_hard_min, > .extra2 = &init_net.sctp.rto_max > }, > { > -- > 2.1.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Patch looks fine, you probably want to note this hard minimum in man(7) sctp as well Acked-by: Neil Horman <nhorman@tuxdriver.com> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-25 19:13 ` Neil Horman @ 2018-05-26 15:42 ` Michael Tuexen -1 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-26 15:42 UTC (permalink / raw) To: Neil Horman Cc: Xin Long, network dev, linux-sctp, davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller > On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >> syzbot reported a rcu_sched self-detected stall on CPU which is caused >> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >> value, hb_timer will get stuck there, as in its timer handler it starts >> this timer again with this value, then goes to the timer handler again. >> >> This problem is there since very beginning, and thanks to Eric for the >> reproducer shared from a syzbot mail. >> >> This patch fixes it by not allowing to set rto_min with a value below >> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >> >> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >> Signed-off-by: Xin Long <lucien.xin@gmail.com> >> --- >> include/net/sctp/constants.h | 1 + >> net/sctp/socket.c | 10 +++++++--- >> net/sctp/sysctl.c | 3 ++- >> 3 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >> index 20ff237..2ee7a7b 100644 >> --- a/include/net/sctp/constants.h >> +++ b/include/net/sctp/constants.h >> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >> #define SCTP_RTO_INITIAL (3 * 1000) >> #define SCTP_RTO_MIN (1 * 1000) >> #define SCTP_RTO_MAX (60 * 1000) >> +#define SCTP_RTO_HARD_MIN 200 >> >> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >> index ae7e7c6..6ef12c7 100644 >> --- a/net/sctp/socket.c >> +++ b/net/sctp/socket.c >> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >> * be changed. >> * >> */ >> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >> + unsigned int optlen) >> { >> struct sctp_rtoinfo rtoinfo; >> struct sctp_association *asoc; >> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >> else >> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >> >> - if (rto_min) >> + if (rto_min) { >> + if (rto_min < SCTP_RTO_HARD_MIN) >> + return -EINVAL; >> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >> - else >> + } else { >> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >> + } >> >> if (rto_min > rto_max) >> return -EINVAL; >> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >> index 33ca5b7..7ec854a 100644 >> --- a/net/sctp/sysctl.c >> +++ b/net/sctp/sysctl.c >> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >> static int rto_beta_min = 0; >> static int rto_alpha_max = 1000; >> static int rto_beta_max = 1000; >> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >> >> static unsigned long max_autoclose_min = 0; >> static unsigned long max_autoclose_max = >> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >> .maxlen = sizeof(unsigned int), >> .mode = 0644, >> .proc_handler = proc_sctp_do_rto_min, >> - .extra1 = &one, >> + .extra1 = &rto_hard_min, >> .extra2 = &init_net.sctp.rto_max >> }, >> { >> -- >> 2.1.0 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > well > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. So could this be reduced? Best regards Michael > Acked-by: Neil Horman <nhorman@tuxdriver.com> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-26 15:42 ` Michael Tuexen 0 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-26 15:42 UTC (permalink / raw) To: Neil Horman Cc: Xin Long, network dev, linux-sctp, davem, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller > On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >> syzbot reported a rcu_sched self-detected stall on CPU which is caused >> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >> value, hb_timer will get stuck there, as in its timer handler it starts >> this timer again with this value, then goes to the timer handler again. >> >> This problem is there since very beginning, and thanks to Eric for the >> reproducer shared from a syzbot mail. >> >> This patch fixes it by not allowing to set rto_min with a value below >> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >> >> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >> Signed-off-by: Xin Long <lucien.xin@gmail.com> >> --- >> include/net/sctp/constants.h | 1 + >> net/sctp/socket.c | 10 +++++++--- >> net/sctp/sysctl.c | 3 ++- >> 3 files changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >> index 20ff237..2ee7a7b 100644 >> --- a/include/net/sctp/constants.h >> +++ b/include/net/sctp/constants.h >> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >> #define SCTP_RTO_INITIAL (3 * 1000) >> #define SCTP_RTO_MIN (1 * 1000) >> #define SCTP_RTO_MAX (60 * 1000) >> +#define SCTP_RTO_HARD_MIN 200 >> >> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >> index ae7e7c6..6ef12c7 100644 >> --- a/net/sctp/socket.c >> +++ b/net/sctp/socket.c >> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >> * be changed. >> * >> */ >> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >> + unsigned int optlen) >> { >> struct sctp_rtoinfo rtoinfo; >> struct sctp_association *asoc; >> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >> else >> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >> >> - if (rto_min) >> + if (rto_min) { >> + if (rto_min < SCTP_RTO_HARD_MIN) >> + return -EINVAL; >> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >> - else >> + } else { >> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >> + } >> >> if (rto_min > rto_max) >> return -EINVAL; >> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >> index 33ca5b7..7ec854a 100644 >> --- a/net/sctp/sysctl.c >> +++ b/net/sctp/sysctl.c >> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >> static int rto_beta_min = 0; >> static int rto_alpha_max = 1000; >> static int rto_beta_max = 1000; >> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >> >> static unsigned long max_autoclose_min = 0; >> static unsigned long max_autoclose_max >> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >> .maxlen = sizeof(unsigned int), >> .mode = 0644, >> .proc_handler = proc_sctp_do_rto_min, >> - .extra1 = &one, >> + .extra1 = &rto_hard_min, >> .extra2 = &init_net.sctp.rto_max >> }, >> { >> -- >> 2.1.0 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > well > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. So could this be reduced? Best regards Michael > Acked-by: Neil Horman <nhorman@tuxdriver.com> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-26 15:42 ` Michael Tuexen @ 2018-05-26 15:50 ` Dmitry Vyukov -1 siblings, 0 replies; 34+ messages in thread From: Dmitry Vyukov @ 2018-05-26 15:50 UTC (permalink / raw) To: Michael Tuexen Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen <michael.tuexen@lurchi.franken.de> wrote: >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >> >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>> value, hb_timer will get stuck there, as in its timer handler it starts >>> this timer again with this value, then goes to the timer handler again. >>> >>> This problem is there since very beginning, and thanks to Eric for the >>> reproducer shared from a syzbot mail. >>> >>> This patch fixes it by not allowing to set rto_min with a value below >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>> >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>> --- >>> include/net/sctp/constants.h | 1 + >>> net/sctp/socket.c | 10 +++++++--- >>> net/sctp/sysctl.c | 3 ++- >>> 3 files changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>> index 20ff237..2ee7a7b 100644 >>> --- a/include/net/sctp/constants.h >>> +++ b/include/net/sctp/constants.h >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>> #define SCTP_RTO_INITIAL (3 * 1000) >>> #define SCTP_RTO_MIN (1 * 1000) >>> #define SCTP_RTO_MAX (60 * 1000) >>> +#define SCTP_RTO_HARD_MIN 200 >>> >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>> index ae7e7c6..6ef12c7 100644 >>> --- a/net/sctp/socket.c >>> +++ b/net/sctp/socket.c >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>> * be changed. >>> * >>> */ >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>> + unsigned int optlen) >>> { >>> struct sctp_rtoinfo rtoinfo; >>> struct sctp_association *asoc; >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>> else >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>> >>> - if (rto_min) >>> + if (rto_min) { >>> + if (rto_min < SCTP_RTO_HARD_MIN) >>> + return -EINVAL; >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>> - else >>> + } else { >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>> + } >>> >>> if (rto_min > rto_max) >>> return -EINVAL; >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>> index 33ca5b7..7ec854a 100644 >>> --- a/net/sctp/sysctl.c >>> +++ b/net/sctp/sysctl.c >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>> static int rto_beta_min = 0; >>> static int rto_alpha_max = 1000; >>> static int rto_beta_max = 1000; >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>> >>> static unsigned long max_autoclose_min = 0; >>> static unsigned long max_autoclose_max = >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>> .maxlen = sizeof(unsigned int), >>> .mode = 0644, >>> .proc_handler = proc_sctp_do_rto_min, >>> - .extra1 = &one, >>> + .extra1 = &rto_hard_min, >>> .extra2 = &init_net.sctp.rto_max >>> }, >>> { >>> -- >>> 2.1.0 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >> well >> > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > So could this be reduced? Hi Michael, What value do they use? Xin, Neil, is there more principled way of ensuring that a timer won't cause a hard CPU stall? There are slow machines and there are slow kernels (in particular syzbot kernel has tons of debug configs enabled). 200ms _should_ not cause problems because we did not see them with tcp. But it's hard to say what's the low limit as we are trying to put a hard upper bound on execution time of a complex section of code. Is there something like cond_resched for timers? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-26 15:50 ` Dmitry Vyukov 0 siblings, 0 replies; 34+ messages in thread From: Dmitry Vyukov @ 2018-05-26 15:50 UTC (permalink / raw) To: Michael Tuexen Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen <michael.tuexen@lurchi.franken.de> wrote: >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >> >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>> value, hb_timer will get stuck there, as in its timer handler it starts >>> this timer again with this value, then goes to the timer handler again. >>> >>> This problem is there since very beginning, and thanks to Eric for the >>> reproducer shared from a syzbot mail. >>> >>> This patch fixes it by not allowing to set rto_min with a value below >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>> >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>> --- >>> include/net/sctp/constants.h | 1 + >>> net/sctp/socket.c | 10 +++++++--- >>> net/sctp/sysctl.c | 3 ++- >>> 3 files changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>> index 20ff237..2ee7a7b 100644 >>> --- a/include/net/sctp/constants.h >>> +++ b/include/net/sctp/constants.h >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>> #define SCTP_RTO_INITIAL (3 * 1000) >>> #define SCTP_RTO_MIN (1 * 1000) >>> #define SCTP_RTO_MAX (60 * 1000) >>> +#define SCTP_RTO_HARD_MIN 200 >>> >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>> index ae7e7c6..6ef12c7 100644 >>> --- a/net/sctp/socket.c >>> +++ b/net/sctp/socket.c >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>> * be changed. >>> * >>> */ >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>> + unsigned int optlen) >>> { >>> struct sctp_rtoinfo rtoinfo; >>> struct sctp_association *asoc; >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>> else >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>> >>> - if (rto_min) >>> + if (rto_min) { >>> + if (rto_min < SCTP_RTO_HARD_MIN) >>> + return -EINVAL; >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>> - else >>> + } else { >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>> + } >>> >>> if (rto_min > rto_max) >>> return -EINVAL; >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>> index 33ca5b7..7ec854a 100644 >>> --- a/net/sctp/sysctl.c >>> +++ b/net/sctp/sysctl.c >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>> static int rto_beta_min = 0; >>> static int rto_alpha_max = 1000; >>> static int rto_beta_max = 1000; >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>> >>> static unsigned long max_autoclose_min = 0; >>> static unsigned long max_autoclose_max >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>> .maxlen = sizeof(unsigned int), >>> .mode = 0644, >>> .proc_handler = proc_sctp_do_rto_min, >>> - .extra1 = &one, >>> + .extra1 = &rto_hard_min, >>> .extra2 = &init_net.sctp.rto_max >>> }, >>> { >>> -- >>> 2.1.0 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >> well >> > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > So could this be reduced? Hi Michael, What value do they use? Xin, Neil, is there more principled way of ensuring that a timer won't cause a hard CPU stall? There are slow machines and there are slow kernels (in particular syzbot kernel has tons of debug configs enabled). 200ms _should_ not cause problems because we did not see them with tcp. But it's hard to say what's the low limit as we are trying to put a hard upper bound on execution time of a complex section of code. Is there something like cond_resched for timers? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-26 15:50 ` Dmitry Vyukov @ 2018-05-27 1:01 ` Neil Horman -1 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-27 1:01 UTC (permalink / raw) To: Dmitry Vyukov Cc: Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > <michael.tuexen@lurchi.franken.de> wrote: > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > >> > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > >>> value, hb_timer will get stuck there, as in its timer handler it starts > >>> this timer again with this value, then goes to the timer handler again. > >>> > >>> This problem is there since very beginning, and thanks to Eric for the > >>> reproducer shared from a syzbot mail. > >>> > >>> This patch fixes it by not allowing to set rto_min with a value below > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > >>> > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>> --- > >>> include/net/sctp/constants.h | 1 + > >>> net/sctp/socket.c | 10 +++++++--- > >>> net/sctp/sysctl.c | 3 ++- > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > >>> index 20ff237..2ee7a7b 100644 > >>> --- a/include/net/sctp/constants.h > >>> +++ b/include/net/sctp/constants.h > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > >>> #define SCTP_RTO_INITIAL (3 * 1000) > >>> #define SCTP_RTO_MIN (1 * 1000) > >>> #define SCTP_RTO_MAX (60 * 1000) > >>> +#define SCTP_RTO_HARD_MIN 200 > >>> > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > >>> index ae7e7c6..6ef12c7 100644 > >>> --- a/net/sctp/socket.c > >>> +++ b/net/sctp/socket.c > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > >>> * be changed. > >>> * > >>> */ > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > >>> + unsigned int optlen) > >>> { > >>> struct sctp_rtoinfo rtoinfo; > >>> struct sctp_association *asoc; > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > >>> else > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > >>> > >>> - if (rto_min) > >>> + if (rto_min) { > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > >>> + return -EINVAL; > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > >>> - else > >>> + } else { > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > >>> + } > >>> > >>> if (rto_min > rto_max) > >>> return -EINVAL; > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > >>> index 33ca5b7..7ec854a 100644 > >>> --- a/net/sctp/sysctl.c > >>> +++ b/net/sctp/sysctl.c > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > >>> static int rto_beta_min = 0; > >>> static int rto_alpha_max = 1000; > >>> static int rto_beta_max = 1000; > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > >>> > >>> static unsigned long max_autoclose_min = 0; > >>> static unsigned long max_autoclose_max = > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > >>> .maxlen = sizeof(unsigned int), > >>> .mode = 0644, > >>> .proc_handler = proc_sctp_do_rto_min, > >>> - .extra1 = &one, > >>> + .extra1 = &rto_hard_min, > >>> .extra2 = &init_net.sctp.rto_max > >>> }, > >>> { > >>> -- > >>> 2.1.0 > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>> > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >> well > >> > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > So could this be reduced? > > Hi Michael, > > What value do they use? > > Xin, Neil, is there more principled way of ensuring that a timer won't > cause a hard CPU stall? There are slow machines and there are slow > kernels (in particular syzbot kernel has tons of debug configs > enabled). 200ms _should_ not cause problems because we did not see > them with tcp. But it's hard to say what's the low limit as we are > trying to put a hard upper bound on execution time of a complex > section of code. Is there something like cond_resched for timers? Unfortunately, Theres not really a way to do conditional rescheduling of timers, additionally, we have a problem because the timer is reset as a side effect of the SCTP state machine, and so the execution time between timer updates has a signifcant amount of jitter (meaning its a pretty hard value to calibrate, unless you just select a 'safe' large value for the floor). What we might could do (though this might impact the protocol function is change the timer update side effects to simply set a flag, and consistently update the timers on exit from sctp_do_sm, so they don't re-arm until all state machine processing is complete. Anyone have any thoughts on that? Neil > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-27 1:01 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-27 1:01 UTC (permalink / raw) To: Dmitry Vyukov Cc: Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > <michael.tuexen@lurchi.franken.de> wrote: > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > >> > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > >>> value, hb_timer will get stuck there, as in its timer handler it starts > >>> this timer again with this value, then goes to the timer handler again. > >>> > >>> This problem is there since very beginning, and thanks to Eric for the > >>> reproducer shared from a syzbot mail. > >>> > >>> This patch fixes it by not allowing to set rto_min with a value below > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > >>> > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>> --- > >>> include/net/sctp/constants.h | 1 + > >>> net/sctp/socket.c | 10 +++++++--- > >>> net/sctp/sysctl.c | 3 ++- > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > >>> index 20ff237..2ee7a7b 100644 > >>> --- a/include/net/sctp/constants.h > >>> +++ b/include/net/sctp/constants.h > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > >>> #define SCTP_RTO_INITIAL (3 * 1000) > >>> #define SCTP_RTO_MIN (1 * 1000) > >>> #define SCTP_RTO_MAX (60 * 1000) > >>> +#define SCTP_RTO_HARD_MIN 200 > >>> > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > >>> index ae7e7c6..6ef12c7 100644 > >>> --- a/net/sctp/socket.c > >>> +++ b/net/sctp/socket.c > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > >>> * be changed. > >>> * > >>> */ > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > >>> + unsigned int optlen) > >>> { > >>> struct sctp_rtoinfo rtoinfo; > >>> struct sctp_association *asoc; > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > >>> else > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > >>> > >>> - if (rto_min) > >>> + if (rto_min) { > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > >>> + return -EINVAL; > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > >>> - else > >>> + } else { > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > >>> + } > >>> > >>> if (rto_min > rto_max) > >>> return -EINVAL; > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > >>> index 33ca5b7..7ec854a 100644 > >>> --- a/net/sctp/sysctl.c > >>> +++ b/net/sctp/sysctl.c > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > >>> static int rto_beta_min = 0; > >>> static int rto_alpha_max = 1000; > >>> static int rto_beta_max = 1000; > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > >>> > >>> static unsigned long max_autoclose_min = 0; > >>> static unsigned long max_autoclose_max > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > >>> .maxlen = sizeof(unsigned int), > >>> .mode = 0644, > >>> .proc_handler = proc_sctp_do_rto_min, > >>> - .extra1 = &one, > >>> + .extra1 = &rto_hard_min, > >>> .extra2 = &init_net.sctp.rto_max > >>> }, > >>> { > >>> -- > >>> 2.1.0 > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>> > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >> well > >> > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > So could this be reduced? > > Hi Michael, > > What value do they use? > > Xin, Neil, is there more principled way of ensuring that a timer won't > cause a hard CPU stall? There are slow machines and there are slow > kernels (in particular syzbot kernel has tons of debug configs > enabled). 200ms _should_ not cause problems because we did not see > them with tcp. But it's hard to say what's the low limit as we are > trying to put a hard upper bound on execution time of a complex > section of code. Is there something like cond_resched for timers? Unfortunately, Theres not really a way to do conditional rescheduling of timers, additionally, we have a problem because the timer is reset as a side effect of the SCTP state machine, and so the execution time between timer updates has a signifcant amount of jitter (meaning its a pretty hard value to calibrate, unless you just select a 'safe' large value for the floor). What we might could do (though this might impact the protocol function is change the timer update side effects to simply set a flag, and consistently update the timers on exit from sctp_do_sm, so they don't re-arm until all state machine processing is complete. Anyone have any thoughts on that? Neil > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-27 1:01 ` Neil Horman @ 2018-05-28 19:43 ` Marcelo Ricardo Leitner -1 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-28 19:43 UTC (permalink / raw) To: Neil Horman Cc: Dmitry Vyukov, Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > > <michael.tuexen@lurchi.franken.de> wrote: > > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > >> > > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > > >>> value, hb_timer will get stuck there, as in its timer handler it starts > > >>> this timer again with this value, then goes to the timer handler again. > > >>> > > >>> This problem is there since very beginning, and thanks to Eric for the > > >>> reproducer shared from a syzbot mail. > > >>> > > >>> This patch fixes it by not allowing to set rto_min with a value below > > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > >>> > > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > > >>> --- > > >>> include/net/sctp/constants.h | 1 + > > >>> net/sctp/socket.c | 10 +++++++--- > > >>> net/sctp/sysctl.c | 3 ++- > > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > > >>> > > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > > >>> index 20ff237..2ee7a7b 100644 > > >>> --- a/include/net/sctp/constants.h > > >>> +++ b/include/net/sctp/constants.h > > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > > >>> #define SCTP_RTO_INITIAL (3 * 1000) > > >>> #define SCTP_RTO_MIN (1 * 1000) > > >>> #define SCTP_RTO_MAX (60 * 1000) > > >>> +#define SCTP_RTO_HARD_MIN 200 > > >>> > > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > >>> index ae7e7c6..6ef12c7 100644 > > >>> --- a/net/sctp/socket.c > > >>> +++ b/net/sctp/socket.c > > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > > >>> * be changed. > > >>> * > > >>> */ > > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > > >>> + unsigned int optlen) > > >>> { > > >>> struct sctp_rtoinfo rtoinfo; > > >>> struct sctp_association *asoc; > > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > > >>> else > > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > >>> > > >>> - if (rto_min) > > >>> + if (rto_min) { > > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > > >>> + return -EINVAL; > > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > > >>> - else > > >>> + } else { > > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > > >>> + } > > >>> > > >>> if (rto_min > rto_max) > > >>> return -EINVAL; > > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > > >>> index 33ca5b7..7ec854a 100644 > > >>> --- a/net/sctp/sysctl.c > > >>> +++ b/net/sctp/sysctl.c > > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > > >>> static int rto_beta_min = 0; > > >>> static int rto_alpha_max = 1000; > > >>> static int rto_beta_max = 1000; > > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > >>> > > >>> static unsigned long max_autoclose_min = 0; > > >>> static unsigned long max_autoclose_max = > > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > > >>> .maxlen = sizeof(unsigned int), > > >>> .mode = 0644, > > >>> .proc_handler = proc_sctp_do_rto_min, > > >>> - .extra1 = &one, > > >>> + .extra1 = &rto_hard_min, > > >>> .extra2 = &init_net.sctp.rto_max > > >>> }, > > >>> { > > >>> -- > > >>> 2.1.0 > > >>> > > >>> -- > > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > > >>> the body of a message to majordomo@vger.kernel.org > > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > >>> > > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > > >> well > > >> > > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > > So could this be reduced? > > > > Hi Michael, > > > > What value do they use? > > > > Xin, Neil, is there more principled way of ensuring that a timer won't > > cause a hard CPU stall? There are slow machines and there are slow > > kernels (in particular syzbot kernel has tons of debug configs > > enabled). 200ms _should_ not cause problems because we did not see > > them with tcp. But it's hard to say what's the low limit as we are > > trying to put a hard upper bound on execution time of a complex > > section of code. Is there something like cond_resched for timers? > Unfortunately, Theres not really a way to do conditional rescheduling of timers, > additionally, we have a problem because the timer is reset as a side effect of > the SCTP state machine, and so the execution time between timer updates has a > signifcant amount of jitter (meaning its a pretty hard value to calibrate, > unless you just select a 'safe' large value for the floor). > > What we might could do (though this might impact the protocol function is change > the timer update side effects to simply set a flag, and consistently update the > timers on exit from sctp_do_sm, so they don't re-arm until all state machine > processing is complete. Anyone have any thoughts on that? I was reviewing all this again and I'm thinking that we are missing the real point. With the parameters that reproducer [1] has, setting those very low RTO parameters, it causes the timer to actually busyloop on the heartbeats, as Xin had explained. But thing is, it busy loops not just because RTO is too low, but because hbinterval was not accounted. /* What is the next timeout value for this transport? */ unsigned long sctp_transport_timeout(struct sctp_transport *trans) { /* RTO + timer slack +/- 50% of RTO */ unsigned long timeout = trans->rto >> 1; <-- [a] if (trans->state != SCTP_UNCONFIRMED && trans->state != SCTP_PF) <--- [2] timeout += trans->hbinterval; return timeout; } The if() in [2] is to speed up path verification before using them, as per the commit changelog. Secondary paths added on processing the cookie are created with status SCTP_UNCONFIRMED, and HB timers are started in the sequence: sctp_sf_do_5_1D_ce -> sctp_process_init |> sctp_process_param | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); which starts the timer using only the small RTO for secondary paths: static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, struct sctp_association *asoc) { struct sctp_transport *t; /* Start a heartbeat timer for each transport on the association. * hold a reference on the transport to make sure none of * the needed data structures go away. */ list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) sctp_transport_reset_hb_timer(t); } But if the system is too busy generating HBs, it likely won't process incoming HB ACKs, which would stop the loop as it would mark the transport as Active. I'm now thinking a better fix would be to have a specific way to kickstart these initial heartbeets, and then always use hbinterval on subsequent ones. This would not only fix the issue, but also improve the time we need to identify the transports as Active upon association start, which is currently RTO/2 (equals to 500ms by default). While working on this, I got myself wondering how HZ can affect the stack with such small RTO. If we have HZ=250, for example, we probably should be careful when doing calcs such as in mark [a] to not let it tend to 0. This should not be related to the reported issue as syzkaller was using HZ=1000. (I didn't do any tests, this is only based on code review so far) 1. https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") b. https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-28 19:43 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-28 19:43 UTC (permalink / raw) To: Neil Horman Cc: Dmitry Vyukov, Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > > <michael.tuexen@lurchi.franken.de> wrote: > > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > >> > > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > > >>> value, hb_timer will get stuck there, as in its timer handler it starts > > >>> this timer again with this value, then goes to the timer handler again. > > >>> > > >>> This problem is there since very beginning, and thanks to Eric for the > > >>> reproducer shared from a syzbot mail. > > >>> > > >>> This patch fixes it by not allowing to set rto_min with a value below > > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > >>> > > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > > >>> --- > > >>> include/net/sctp/constants.h | 1 + > > >>> net/sctp/socket.c | 10 +++++++--- > > >>> net/sctp/sysctl.c | 3 ++- > > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > > >>> > > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > > >>> index 20ff237..2ee7a7b 100644 > > >>> --- a/include/net/sctp/constants.h > > >>> +++ b/include/net/sctp/constants.h > > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > > >>> #define SCTP_RTO_INITIAL (3 * 1000) > > >>> #define SCTP_RTO_MIN (1 * 1000) > > >>> #define SCTP_RTO_MAX (60 * 1000) > > >>> +#define SCTP_RTO_HARD_MIN 200 > > >>> > > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > >>> index ae7e7c6..6ef12c7 100644 > > >>> --- a/net/sctp/socket.c > > >>> +++ b/net/sctp/socket.c > > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > > >>> * be changed. > > >>> * > > >>> */ > > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > > >>> + unsigned int optlen) > > >>> { > > >>> struct sctp_rtoinfo rtoinfo; > > >>> struct sctp_association *asoc; > > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > > >>> else > > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > >>> > > >>> - if (rto_min) > > >>> + if (rto_min) { > > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > > >>> + return -EINVAL; > > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > > >>> - else > > >>> + } else { > > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > > >>> + } > > >>> > > >>> if (rto_min > rto_max) > > >>> return -EINVAL; > > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > > >>> index 33ca5b7..7ec854a 100644 > > >>> --- a/net/sctp/sysctl.c > > >>> +++ b/net/sctp/sysctl.c > > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > > >>> static int rto_beta_min = 0; > > >>> static int rto_alpha_max = 1000; > > >>> static int rto_beta_max = 1000; > > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > >>> > > >>> static unsigned long max_autoclose_min = 0; > > >>> static unsigned long max_autoclose_max > > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > > >>> .maxlen = sizeof(unsigned int), > > >>> .mode = 0644, > > >>> .proc_handler = proc_sctp_do_rto_min, > > >>> - .extra1 = &one, > > >>> + .extra1 = &rto_hard_min, > > >>> .extra2 = &init_net.sctp.rto_max > > >>> }, > > >>> { > > >>> -- > > >>> 2.1.0 > > >>> > > >>> -- > > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > > >>> the body of a message to majordomo@vger.kernel.org > > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > >>> > > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > > >> well > > >> > > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > > So could this be reduced? > > > > Hi Michael, > > > > What value do they use? > > > > Xin, Neil, is there more principled way of ensuring that a timer won't > > cause a hard CPU stall? There are slow machines and there are slow > > kernels (in particular syzbot kernel has tons of debug configs > > enabled). 200ms _should_ not cause problems because we did not see > > them with tcp. But it's hard to say what's the low limit as we are > > trying to put a hard upper bound on execution time of a complex > > section of code. Is there something like cond_resched for timers? > Unfortunately, Theres not really a way to do conditional rescheduling of timers, > additionally, we have a problem because the timer is reset as a side effect of > the SCTP state machine, and so the execution time between timer updates has a > signifcant amount of jitter (meaning its a pretty hard value to calibrate, > unless you just select a 'safe' large value for the floor). > > What we might could do (though this might impact the protocol function is change > the timer update side effects to simply set a flag, and consistently update the > timers on exit from sctp_do_sm, so they don't re-arm until all state machine > processing is complete. Anyone have any thoughts on that? I was reviewing all this again and I'm thinking that we are missing the real point. With the parameters that reproducer [1] has, setting those very low RTO parameters, it causes the timer to actually busyloop on the heartbeats, as Xin had explained. But thing is, it busy loops not just because RTO is too low, but because hbinterval was not accounted. /* What is the next timeout value for this transport? */ unsigned long sctp_transport_timeout(struct sctp_transport *trans) { /* RTO + timer slack +/- 50% of RTO */ unsigned long timeout = trans->rto >> 1; <-- [a] if (trans->state != SCTP_UNCONFIRMED && trans->state != SCTP_PF) <--- [2] timeout += trans->hbinterval; return timeout; } The if() in [2] is to speed up path verification before using them, as per the commit changelog. Secondary paths added on processing the cookie are created with status SCTP_UNCONFIRMED, and HB timers are started in the sequence: sctp_sf_do_5_1D_ce -> sctp_process_init |> sctp_process_param | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); which starts the timer using only the small RTO for secondary paths: static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, struct sctp_association *asoc) { struct sctp_transport *t; /* Start a heartbeat timer for each transport on the association. * hold a reference on the transport to make sure none of * the needed data structures go away. */ list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) sctp_transport_reset_hb_timer(t); } But if the system is too busy generating HBs, it likely won't process incoming HB ACKs, which would stop the loop as it would mark the transport as Active. I'm now thinking a better fix would be to have a specific way to kickstart these initial heartbeets, and then always use hbinterval on subsequent ones. This would not only fix the issue, but also improve the time we need to identify the transports as Active upon association start, which is currently RTO/2 (equals to 500ms by default). While working on this, I got myself wondering how HZ can affect the stack with such small RTO. If we have HZ%0, for example, we probably should be careful when doing calcs such as in mark [a] to not let it tend to 0. This should not be related to the reported issue as syzkaller was using HZ\x1000. (I didn't do any tests, this is only based on code review so far) 1. https://syzkaller.appspot.com/x/repro.syz?x\x1079cf8f800000 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") b. https://syzkaller.appspot.com/x/.config?xób4e30da84ec1ed ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-28 19:43 ` Marcelo Ricardo Leitner @ 2018-05-29 11:41 ` Neil Horman -1 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-29 11:41 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Dmitry Vyukov, Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: > On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > > On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > > > <michael.tuexen@lurchi.franken.de> wrote: > > > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > > >> > > > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > > > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > > > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > > > >>> value, hb_timer will get stuck there, as in its timer handler it starts > > > >>> this timer again with this value, then goes to the timer handler again. > > > >>> > > > >>> This problem is there since very beginning, and thanks to Eric for the > > > >>> reproducer shared from a syzbot mail. > > > >>> > > > >>> This patch fixes it by not allowing to set rto_min with a value below > > > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > > >>> > > > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > > > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > > > >>> --- > > > >>> include/net/sctp/constants.h | 1 + > > > >>> net/sctp/socket.c | 10 +++++++--- > > > >>> net/sctp/sysctl.c | 3 ++- > > > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > > > >>> > > > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > > > >>> index 20ff237..2ee7a7b 100644 > > > >>> --- a/include/net/sctp/constants.h > > > >>> +++ b/include/net/sctp/constants.h > > > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > > > >>> #define SCTP_RTO_INITIAL (3 * 1000) > > > >>> #define SCTP_RTO_MIN (1 * 1000) > > > >>> #define SCTP_RTO_MAX (60 * 1000) > > > >>> +#define SCTP_RTO_HARD_MIN 200 > > > >>> > > > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > > > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > > > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > > >>> index ae7e7c6..6ef12c7 100644 > > > >>> --- a/net/sctp/socket.c > > > >>> +++ b/net/sctp/socket.c > > > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > > > >>> * be changed. > > > >>> * > > > >>> */ > > > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > > > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > > > >>> + unsigned int optlen) > > > >>> { > > > >>> struct sctp_rtoinfo rtoinfo; > > > >>> struct sctp_association *asoc; > > > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > > > >>> else > > > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > > >>> > > > >>> - if (rto_min) > > > >>> + if (rto_min) { > > > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > > > >>> + return -EINVAL; > > > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > > > >>> - else > > > >>> + } else { > > > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > > > >>> + } > > > >>> > > > >>> if (rto_min > rto_max) > > > >>> return -EINVAL; > > > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > > > >>> index 33ca5b7..7ec854a 100644 > > > >>> --- a/net/sctp/sysctl.c > > > >>> +++ b/net/sctp/sysctl.c > > > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > > > >>> static int rto_beta_min = 0; > > > >>> static int rto_alpha_max = 1000; > > > >>> static int rto_beta_max = 1000; > > > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > > >>> > > > >>> static unsigned long max_autoclose_min = 0; > > > >>> static unsigned long max_autoclose_max = > > > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > > > >>> .maxlen = sizeof(unsigned int), > > > >>> .mode = 0644, > > > >>> .proc_handler = proc_sctp_do_rto_min, > > > >>> - .extra1 = &one, > > > >>> + .extra1 = &rto_hard_min, > > > >>> .extra2 = &init_net.sctp.rto_max > > > >>> }, > > > >>> { > > > >>> -- > > > >>> 2.1.0 > > > >>> > > > >>> -- > > > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > > > >>> the body of a message to majordomo@vger.kernel.org > > > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >>> > > > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > > > >> well > > > >> > > > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > > > So could this be reduced? > > > > > > Hi Michael, > > > > > > What value do they use? > > > > > > Xin, Neil, is there more principled way of ensuring that a timer won't > > > cause a hard CPU stall? There are slow machines and there are slow > > > kernels (in particular syzbot kernel has tons of debug configs > > > enabled). 200ms _should_ not cause problems because we did not see > > > them with tcp. But it's hard to say what's the low limit as we are > > > trying to put a hard upper bound on execution time of a complex > > > section of code. Is there something like cond_resched for timers? > > Unfortunately, Theres not really a way to do conditional rescheduling of timers, > > additionally, we have a problem because the timer is reset as a side effect of > > the SCTP state machine, and so the execution time between timer updates has a > > signifcant amount of jitter (meaning its a pretty hard value to calibrate, > > unless you just select a 'safe' large value for the floor). > > > > What we might could do (though this might impact the protocol function is change > > the timer update side effects to simply set a flag, and consistently update the > > timers on exit from sctp_do_sm, so they don't re-arm until all state machine > > processing is complete. Anyone have any thoughts on that? > > I was reviewing all this again and I'm thinking that we are missing > the real point. With the parameters that reproducer [1] has, setting > those very low RTO parameters, it causes the timer to actually > busyloop on the heartbeats, as Xin had explained. > > But thing is, it busy loops not just because RTO is too low, but > because hbinterval was not accounted. > > /* What is the next timeout value for this transport? */ > unsigned long sctp_transport_timeout(struct sctp_transport *trans) > { > /* RTO + timer slack +/- 50% of RTO */ > unsigned long timeout = trans->rto >> 1; <-- [a] > > if (trans->state != SCTP_UNCONFIRMED && > trans->state != SCTP_PF) <--- [2] > timeout += trans->hbinterval; > > return timeout; > } > > The if() in [2] is to speed up path verification before using them, as > per the commit changelog. Secondary paths added on processing the > cookie are created with status SCTP_UNCONFIRMED, and HB timers are > started in the sequence: > sctp_sf_do_5_1D_ce > -> sctp_process_init > |> sctp_process_param > | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) > '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); > > which starts the timer using only the small RTO for secondary paths: > static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, > struct sctp_association *asoc) > { > struct sctp_transport *t; > > /* Start a heartbeat timer for each transport on the association. > * hold a reference on the transport to make sure none of > * the needed data structures go away. > */ > list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) > sctp_transport_reset_hb_timer(t); > } > > But if the system is too busy generating HBs, it likely won't process > incoming HB ACKs, which would stop the loop as it would mark the > transport as Active. > > I'm now thinking a better fix would be to have a specific way to > kickstart these initial heartbeets, and then always use hbinterval on > subsequent ones. > I like the idea, but I don't think we can just use the hbinterval to set the timeout. That said, it seems like we should always be using the HB interval, not just on unconfirmed or partially failed transports. From the RFC: On an idle destination address that is allowed to heartbeat, it is recommended that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value, and exponential backoff of the RTO if the previous HEARTBEAT is unanswered It seems like we should be adding it to the timer expiration universally. By my read, we've never done this quite right. And yes, I agree, if we account this properly, we will avoid this issue. Its also probably important to note here, that, like RTO.min currently, there is no hard floor to the heartbeat interval, and the RFC is silent on what it should be. So it would be possible to still find ourselves in this situation if we set the interval to 0 from userspace. Is it worth considering a floor on the minimum hb interval of the rto is to have no floor? Neil > This would not only fix the issue, but also improve the time we need > to identify the transports as Active upon association start, which is > currently RTO/2 (equals to 500ms by default). > > While working on this, I got myself wondering how HZ can affect the > stack with such small RTO. If we have HZ=250, for example, we probably > should be careful when doing calcs such as in mark [a] to not let it > tend to 0. This should not be related to the reported issue as > syzkaller was using HZ=1000. > > (I didn't do any tests, this is only based on code review so far) > > 1. https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000 > 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") > b. https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 11:41 ` Neil Horman 0 siblings, 0 replies; 34+ messages in thread From: Neil Horman @ 2018-05-29 11:41 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Dmitry Vyukov, Michael Tuexen, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: > On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > > On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > > > <michael.tuexen@lurchi.franken.de> wrote: > > > >> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > > > >> > > > >> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > > > >>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > > > >>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > > > >>> value, hb_timer will get stuck there, as in its timer handler it starts > > > >>> this timer again with this value, then goes to the timer handler again. > > > >>> > > > >>> This problem is there since very beginning, and thanks to Eric for the > > > >>> reproducer shared from a syzbot mail. > > > >>> > > > >>> This patch fixes it by not allowing to set rto_min with a value below > > > >>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > > > >>> > > > >>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > > > >>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > > > >>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > > > >>> --- > > > >>> include/net/sctp/constants.h | 1 + > > > >>> net/sctp/socket.c | 10 +++++++--- > > > >>> net/sctp/sysctl.c | 3 ++- > > > >>> 3 files changed, 10 insertions(+), 4 deletions(-) > > > >>> > > > >>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > > > >>> index 20ff237..2ee7a7b 100644 > > > >>> --- a/include/net/sctp/constants.h > > > >>> +++ b/include/net/sctp/constants.h > > > >>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > > > >>> #define SCTP_RTO_INITIAL (3 * 1000) > > > >>> #define SCTP_RTO_MIN (1 * 1000) > > > >>> #define SCTP_RTO_MAX (60 * 1000) > > > >>> +#define SCTP_RTO_HARD_MIN 200 > > > >>> > > > >>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > > > >>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > > > >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > > >>> index ae7e7c6..6ef12c7 100644 > > > >>> --- a/net/sctp/socket.c > > > >>> +++ b/net/sctp/socket.c > > > >>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > > > >>> * be changed. > > > >>> * > > > >>> */ > > > >>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > > > >>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > > > >>> + unsigned int optlen) > > > >>> { > > > >>> struct sctp_rtoinfo rtoinfo; > > > >>> struct sctp_association *asoc; > > > >>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > > > >>> else > > > >>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > > > >>> > > > >>> - if (rto_min) > > > >>> + if (rto_min) { > > > >>> + if (rto_min < SCTP_RTO_HARD_MIN) > > > >>> + return -EINVAL; > > > >>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > > > >>> - else > > > >>> + } else { > > > >>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > > > >>> + } > > > >>> > > > >>> if (rto_min > rto_max) > > > >>> return -EINVAL; > > > >>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > > > >>> index 33ca5b7..7ec854a 100644 > > > >>> --- a/net/sctp/sysctl.c > > > >>> +++ b/net/sctp/sysctl.c > > > >>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > > > >>> static int rto_beta_min = 0; > > > >>> static int rto_alpha_max = 1000; > > > >>> static int rto_beta_max = 1000; > > > >>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > > > >>> > > > >>> static unsigned long max_autoclose_min = 0; > > > >>> static unsigned long max_autoclose_max > > > >>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > > > >>> .maxlen = sizeof(unsigned int), > > > >>> .mode = 0644, > > > >>> .proc_handler = proc_sctp_do_rto_min, > > > >>> - .extra1 = &one, > > > >>> + .extra1 = &rto_hard_min, > > > >>> .extra2 = &init_net.sctp.rto_max > > > >>> }, > > > >>> { > > > >>> -- > > > >>> 2.1.0 > > > >>> > > > >>> -- > > > >>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > > > >>> the body of a message to majordomo@vger.kernel.org > > > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >>> > > > >> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > > > >> well > > > >> > > > > I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > > > > So could this be reduced? > > > > > > Hi Michael, > > > > > > What value do they use? > > > > > > Xin, Neil, is there more principled way of ensuring that a timer won't > > > cause a hard CPU stall? There are slow machines and there are slow > > > kernels (in particular syzbot kernel has tons of debug configs > > > enabled). 200ms _should_ not cause problems because we did not see > > > them with tcp. But it's hard to say what's the low limit as we are > > > trying to put a hard upper bound on execution time of a complex > > > section of code. Is there something like cond_resched for timers? > > Unfortunately, Theres not really a way to do conditional rescheduling of timers, > > additionally, we have a problem because the timer is reset as a side effect of > > the SCTP state machine, and so the execution time between timer updates has a > > signifcant amount of jitter (meaning its a pretty hard value to calibrate, > > unless you just select a 'safe' large value for the floor). > > > > What we might could do (though this might impact the protocol function is change > > the timer update side effects to simply set a flag, and consistently update the > > timers on exit from sctp_do_sm, so they don't re-arm until all state machine > > processing is complete. Anyone have any thoughts on that? > > I was reviewing all this again and I'm thinking that we are missing > the real point. With the parameters that reproducer [1] has, setting > those very low RTO parameters, it causes the timer to actually > busyloop on the heartbeats, as Xin had explained. > > But thing is, it busy loops not just because RTO is too low, but > because hbinterval was not accounted. > > /* What is the next timeout value for this transport? */ > unsigned long sctp_transport_timeout(struct sctp_transport *trans) > { > /* RTO + timer slack +/- 50% of RTO */ > unsigned long timeout = trans->rto >> 1; <-- [a] > > if (trans->state != SCTP_UNCONFIRMED && > trans->state != SCTP_PF) <--- [2] > timeout += trans->hbinterval; > > return timeout; > } > > The if() in [2] is to speed up path verification before using them, as > per the commit changelog. Secondary paths added on processing the > cookie are created with status SCTP_UNCONFIRMED, and HB timers are > started in the sequence: > sctp_sf_do_5_1D_ce > -> sctp_process_init > |> sctp_process_param > | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) > '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); > > which starts the timer using only the small RTO for secondary paths: > static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, > struct sctp_association *asoc) > { > struct sctp_transport *t; > > /* Start a heartbeat timer for each transport on the association. > * hold a reference on the transport to make sure none of > * the needed data structures go away. > */ > list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) > sctp_transport_reset_hb_timer(t); > } > > But if the system is too busy generating HBs, it likely won't process > incoming HB ACKs, which would stop the loop as it would mark the > transport as Active. > > I'm now thinking a better fix would be to have a specific way to > kickstart these initial heartbeets, and then always use hbinterval on > subsequent ones. > I like the idea, but I don't think we can just use the hbinterval to set the timeout. That said, it seems like we should always be using the HB interval, not just on unconfirmed or partially failed transports. From the RFC: On an idle destination address that is allowed to heartbeat, it is recommended that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value, and exponential backoff of the RTO if the previous HEARTBEAT is unanswered It seems like we should be adding it to the timer expiration universally. By my read, we've never done this quite right. And yes, I agree, if we account this properly, we will avoid this issue. Its also probably important to note here, that, like RTO.min currently, there is no hard floor to the heartbeat interval, and the RFC is silent on what it should be. So it would be possible to still find ourselves in this situation if we set the interval to 0 from userspace. Is it worth considering a floor on the minimum hb interval of the rto is to have no floor? Neil > This would not only fix the issue, but also improve the time we need > to identify the transports as Active upon association start, which is > currently RTO/2 (equals to 500ms by default). > > While working on this, I got myself wondering how HZ can affect the > stack with such small RTO. If we have HZ%0, for example, we probably > should be careful when doing calcs such as in mark [a] to not let it > tend to 0. This should not be related to the reported issue as > syzkaller was using HZ\x1000. > > (I didn't do any tests, this is only based on code review so far) > > 1. https://syzkaller.appspot.com/x/repro.syz?x\x1079cf8f800000 > 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") > b. https://syzkaller.appspot.com/x/.config?xób4e30da84ec1ed > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 11:41 ` Neil Horman @ 2018-05-29 13:06 ` Michael Tuexen -1 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-29 13:06 UTC (permalink / raw) To: Neil Horman Cc: Marcelo Ricardo Leitner, Dmitry Vyukov, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller > On 29. May 2018, at 13:41, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: >> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: >>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: >>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen >>>> <michael.tuexen@lurchi.franken.de> wrote: >>>>>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >>>>>> >>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>>>>> this timer again with this value, then goes to the timer handler again. >>>>>>> >>>>>>> This problem is there since very beginning, and thanks to Eric for the >>>>>>> reproducer shared from a syzbot mail. >>>>>>> >>>>>>> This patch fixes it by not allowing to set rto_min with a value below >>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>>>>> >>>>>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>>>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>>>>> --- >>>>>>> include/net/sctp/constants.h | 1 + >>>>>>> net/sctp/socket.c | 10 +++++++--- >>>>>>> net/sctp/sysctl.c | 3 ++- >>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>>>>> index 20ff237..2ee7a7b 100644 >>>>>>> --- a/include/net/sctp/constants.h >>>>>>> +++ b/include/net/sctp/constants.h >>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>>>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>>>>> #define SCTP_RTO_MIN (1 * 1000) >>>>>>> #define SCTP_RTO_MAX (60 * 1000) >>>>>>> +#define SCTP_RTO_HARD_MIN 200 >>>>>>> >>>>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>>>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>>>>> index ae7e7c6..6ef12c7 100644 >>>>>>> --- a/net/sctp/socket.c >>>>>>> +++ b/net/sctp/socket.c >>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>>>>>> * be changed. >>>>>>> * >>>>>>> */ >>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>>>>> + unsigned int optlen) >>>>>>> { >>>>>>> struct sctp_rtoinfo rtoinfo; >>>>>>> struct sctp_association *asoc; >>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>>>>>> else >>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>>>>> >>>>>>> - if (rto_min) >>>>>>> + if (rto_min) { >>>>>>> + if (rto_min < SCTP_RTO_HARD_MIN) >>>>>>> + return -EINVAL; >>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>>>>> - else >>>>>>> + } else { >>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>>>>> + } >>>>>>> >>>>>>> if (rto_min > rto_max) >>>>>>> return -EINVAL; >>>>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>>>>>> index 33ca5b7..7ec854a 100644 >>>>>>> --- a/net/sctp/sysctl.c >>>>>>> +++ b/net/sctp/sysctl.c >>>>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>>>>>> static int rto_beta_min = 0; >>>>>>> static int rto_alpha_max = 1000; >>>>>>> static int rto_beta_max = 1000; >>>>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>>>>>> >>>>>>> static unsigned long max_autoclose_min = 0; >>>>>>> static unsigned long max_autoclose_max = >>>>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>>>>>> .maxlen = sizeof(unsigned int), >>>>>>> .mode = 0644, >>>>>>> .proc_handler = proc_sctp_do_rto_min, >>>>>>> - .extra1 = &one, >>>>>>> + .extra1 = &rto_hard_min, >>>>>>> .extra2 = &init_net.sctp.rto_max >>>>>>> }, >>>>>>> { >>>>>>> -- >>>>>>> 2.1.0 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >>>>>> well >>>>>> >>>>> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. >>>>> So could this be reduced? >>>> >>>> Hi Michael, >>>> >>>> What value do they use? >>>> >>>> Xin, Neil, is there more principled way of ensuring that a timer won't >>>> cause a hard CPU stall? There are slow machines and there are slow >>>> kernels (in particular syzbot kernel has tons of debug configs >>>> enabled). 200ms _should_ not cause problems because we did not see >>>> them with tcp. But it's hard to say what's the low limit as we are >>>> trying to put a hard upper bound on execution time of a complex >>>> section of code. Is there something like cond_resched for timers? >>> Unfortunately, Theres not really a way to do conditional rescheduling of timers, >>> additionally, we have a problem because the timer is reset as a side effect of >>> the SCTP state machine, and so the execution time between timer updates has a >>> signifcant amount of jitter (meaning its a pretty hard value to calibrate, >>> unless you just select a 'safe' large value for the floor). >>> >>> What we might could do (though this might impact the protocol function is change >>> the timer update side effects to simply set a flag, and consistently update the >>> timers on exit from sctp_do_sm, so they don't re-arm until all state machine >>> processing is complete. Anyone have any thoughts on that? >> >> I was reviewing all this again and I'm thinking that we are missing >> the real point. With the parameters that reproducer [1] has, setting >> those very low RTO parameters, it causes the timer to actually >> busyloop on the heartbeats, as Xin had explained. >> >> But thing is, it busy loops not just because RTO is too low, but >> because hbinterval was not accounted. >> >> /* What is the next timeout value for this transport? */ >> unsigned long sctp_transport_timeout(struct sctp_transport *trans) >> { >> /* RTO + timer slack +/- 50% of RTO */ >> unsigned long timeout = trans->rto >> 1; <-- [a] >> >> if (trans->state != SCTP_UNCONFIRMED && >> trans->state != SCTP_PF) <--- [2] >> timeout += trans->hbinterval; >> >> return timeout; >> } >> >> The if() in [2] is to speed up path verification before using them, as >> per the commit changelog. Secondary paths added on processing the >> cookie are created with status SCTP_UNCONFIRMED, and HB timers are >> started in the sequence: >> sctp_sf_do_5_1D_ce >> -> sctp_process_init >> |> sctp_process_param >> | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) >> '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); >> >> which starts the timer using only the small RTO for secondary paths: >> static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, >> struct sctp_association *asoc) >> { >> struct sctp_transport *t; >> >> /* Start a heartbeat timer for each transport on the association. >> * hold a reference on the transport to make sure none of >> * the needed data structures go away. >> */ >> list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) >> sctp_transport_reset_hb_timer(t); >> } >> >> But if the system is too busy generating HBs, it likely won't process >> incoming HB ACKs, which would stop the loop as it would mark the >> transport as Active. >> >> I'm now thinking a better fix would be to have a specific way to >> kickstart these initial heartbeets, and then always use hbinterval on >> subsequent ones. >> > I like the idea, but I don't think we can just use the hbinterval to set the > timeout. That said, it seems like we should always be using the HB interval, > not just on unconfirmed or partially failed transports. From the RFC: > > On an idle destination address that is allowed to heartbeat, it is > recommended that a HEARTBEAT chunk is sent once per RTO of that > destination address plus the protocol parameter 'HB.interval', with > jittering of +/- 50% of the RTO value, and exponential backoff of the > RTO if the previous HEARTBEAT is unanswered Aren't we talking about the path confirmation procedure? This is described in https://tools.ietf.org/html/rfc4960#section-5.4 where it is stated: In each RTO, a probe may be sent on an active UNCONFIRMED path in an attempt to move it to the CONFIRMED state. If during this probing the path becomes inactive, this rate is lowered to the normal HEARTBEAT rate. At the expiration of the RTO timer, the error counter of any path that was probed but not CONFIRMED is incremented by one and subjected to path failure detection, as defined in Section 8.2. When probing UNCONFIRMED addresses, however, the association overall error count is NOT incremented. So during path confirmation there is no requirement to add HB.interval. Best regards Michael > > It seems like we should be adding it to the timer expiration universally. By my > read, we've never done this quite right. And yes, I agree, if we account this > properly, we will avoid this issue. > > Its also probably important to note here, that, like RTO.min currently, there is > no hard floor to the heartbeat interval, and the RFC is silent on what it should > be. So it would be possible to still find ourselves in this situation if we set > the interval to 0 from userspace. Is it worth considering a floor on the > minimum hb interval of the rto is to have no floor? > > Neil > > >> This would not only fix the issue, but also improve the time we need >> to identify the transports as Active upon association start, which is >> currently RTO/2 (equals to 500ms by default). >> >> While working on this, I got myself wondering how HZ can affect the >> stack with such small RTO. If we have HZ=250, for example, we probably >> should be careful when doing calcs such as in mark [a] to not let it >> tend to 0. This should not be related to the reported issue as >> syzkaller was using HZ=1000. >> >> (I didn't do any tests, this is only based on code review so far) >> >> 1. https://syzkaller.appspot.com/x/repro.syz?x=1079cf8f800000 >> 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") >> b. https://syzkaller.appspot.com/x/.config?x=f3b4e30da84ec1ed >> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 13:06 ` Michael Tuexen 0 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-29 13:06 UTC (permalink / raw) To: Neil Horman Cc: Marcelo Ricardo Leitner, Dmitry Vyukov, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller > On 29. May 2018, at 13:41, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: >> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: >>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: >>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen >>>> <michael.tuexen@lurchi.franken.de> wrote: >>>>>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >>>>>> >>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>>>>> this timer again with this value, then goes to the timer handler again. >>>>>>> >>>>>>> This problem is there since very beginning, and thanks to Eric for the >>>>>>> reproducer shared from a syzbot mail. >>>>>>> >>>>>>> This patch fixes it by not allowing to set rto_min with a value below >>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>>>>> >>>>>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>>>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>>>>> --- >>>>>>> include/net/sctp/constants.h | 1 + >>>>>>> net/sctp/socket.c | 10 +++++++--- >>>>>>> net/sctp/sysctl.c | 3 ++- >>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>>>>> index 20ff237..2ee7a7b 100644 >>>>>>> --- a/include/net/sctp/constants.h >>>>>>> +++ b/include/net/sctp/constants.h >>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>>>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>>>>> #define SCTP_RTO_MIN (1 * 1000) >>>>>>> #define SCTP_RTO_MAX (60 * 1000) >>>>>>> +#define SCTP_RTO_HARD_MIN 200 >>>>>>> >>>>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>>>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>>>>> index ae7e7c6..6ef12c7 100644 >>>>>>> --- a/net/sctp/socket.c >>>>>>> +++ b/net/sctp/socket.c >>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>>>>>> * be changed. >>>>>>> * >>>>>>> */ >>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>>>>> + unsigned int optlen) >>>>>>> { >>>>>>> struct sctp_rtoinfo rtoinfo; >>>>>>> struct sctp_association *asoc; >>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>>>>>> else >>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>>>>> >>>>>>> - if (rto_min) >>>>>>> + if (rto_min) { >>>>>>> + if (rto_min < SCTP_RTO_HARD_MIN) >>>>>>> + return -EINVAL; >>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>>>>> - else >>>>>>> + } else { >>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>>>>> + } >>>>>>> >>>>>>> if (rto_min > rto_max) >>>>>>> return -EINVAL; >>>>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>>>>>> index 33ca5b7..7ec854a 100644 >>>>>>> --- a/net/sctp/sysctl.c >>>>>>> +++ b/net/sctp/sysctl.c >>>>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>>>>>> static int rto_beta_min = 0; >>>>>>> static int rto_alpha_max = 1000; >>>>>>> static int rto_beta_max = 1000; >>>>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>>>>>> >>>>>>> static unsigned long max_autoclose_min = 0; >>>>>>> static unsigned long max_autoclose_max >>>>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>>>>>> .maxlen = sizeof(unsigned int), >>>>>>> .mode = 0644, >>>>>>> .proc_handler = proc_sctp_do_rto_min, >>>>>>> - .extra1 = &one, >>>>>>> + .extra1 = &rto_hard_min, >>>>>>> .extra2 = &init_net.sctp.rto_max >>>>>>> }, >>>>>>> { >>>>>>> -- >>>>>>> 2.1.0 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >>>>>> well >>>>>> >>>>> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. >>>>> So could this be reduced? >>>> >>>> Hi Michael, >>>> >>>> What value do they use? >>>> >>>> Xin, Neil, is there more principled way of ensuring that a timer won't >>>> cause a hard CPU stall? There are slow machines and there are slow >>>> kernels (in particular syzbot kernel has tons of debug configs >>>> enabled). 200ms _should_ not cause problems because we did not see >>>> them with tcp. But it's hard to say what's the low limit as we are >>>> trying to put a hard upper bound on execution time of a complex >>>> section of code. Is there something like cond_resched for timers? >>> Unfortunately, Theres not really a way to do conditional rescheduling of timers, >>> additionally, we have a problem because the timer is reset as a side effect of >>> the SCTP state machine, and so the execution time between timer updates has a >>> signifcant amount of jitter (meaning its a pretty hard value to calibrate, >>> unless you just select a 'safe' large value for the floor). >>> >>> What we might could do (though this might impact the protocol function is change >>> the timer update side effects to simply set a flag, and consistently update the >>> timers on exit from sctp_do_sm, so they don't re-arm until all state machine >>> processing is complete. Anyone have any thoughts on that? >> >> I was reviewing all this again and I'm thinking that we are missing >> the real point. With the parameters that reproducer [1] has, setting >> those very low RTO parameters, it causes the timer to actually >> busyloop on the heartbeats, as Xin had explained. >> >> But thing is, it busy loops not just because RTO is too low, but >> because hbinterval was not accounted. >> >> /* What is the next timeout value for this transport? */ >> unsigned long sctp_transport_timeout(struct sctp_transport *trans) >> { >> /* RTO + timer slack +/- 50% of RTO */ >> unsigned long timeout = trans->rto >> 1; <-- [a] >> >> if (trans->state != SCTP_UNCONFIRMED && >> trans->state != SCTP_PF) <--- [2] >> timeout += trans->hbinterval; >> >> return timeout; >> } >> >> The if() in [2] is to speed up path verification before using them, as >> per the commit changelog. Secondary paths added on processing the >> cookie are created with status SCTP_UNCONFIRMED, and HB timers are >> started in the sequence: >> sctp_sf_do_5_1D_ce >> -> sctp_process_init >> |> sctp_process_param >> | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) >> '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); >> >> which starts the timer using only the small RTO for secondary paths: >> static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, >> struct sctp_association *asoc) >> { >> struct sctp_transport *t; >> >> /* Start a heartbeat timer for each transport on the association. >> * hold a reference on the transport to make sure none of >> * the needed data structures go away. >> */ >> list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) >> sctp_transport_reset_hb_timer(t); >> } >> >> But if the system is too busy generating HBs, it likely won't process >> incoming HB ACKs, which would stop the loop as it would mark the >> transport as Active. >> >> I'm now thinking a better fix would be to have a specific way to >> kickstart these initial heartbeets, and then always use hbinterval on >> subsequent ones. >> > I like the idea, but I don't think we can just use the hbinterval to set the > timeout. That said, it seems like we should always be using the HB interval, > not just on unconfirmed or partially failed transports. From the RFC: > > On an idle destination address that is allowed to heartbeat, it is > recommended that a HEARTBEAT chunk is sent once per RTO of that > destination address plus the protocol parameter 'HB.interval', with > jittering of +/- 50% of the RTO value, and exponential backoff of the > RTO if the previous HEARTBEAT is unanswered Aren't we talking about the path confirmation procedure? This is described in https://tools.ietf.org/html/rfc4960#section-5.4 where it is stated: In each RTO, a probe may be sent on an active UNCONFIRMED path in an attempt to move it to the CONFIRMED state. If during this probing the path becomes inactive, this rate is lowered to the normal HEARTBEAT rate. At the expiration of the RTO timer, the error counter of any path that was probed but not CONFIRMED is incremented by one and subjected to path failure detection, as defined in Section 8.2. When probing UNCONFIRMED addresses, however, the association overall error count is NOT incremented. So during path confirmation there is no requirement to add HB.interval. Best regards Michael > > It seems like we should be adding it to the timer expiration universally. By my > read, we've never done this quite right. And yes, I agree, if we account this > properly, we will avoid this issue. > > Its also probably important to note here, that, like RTO.min currently, there is > no hard floor to the heartbeat interval, and the RFC is silent on what it should > be. So it would be possible to still find ourselves in this situation if we set > the interval to 0 from userspace. Is it worth considering a floor on the > minimum hb interval of the rto is to have no floor? > > Neil > > >> This would not only fix the issue, but also improve the time we need >> to identify the transports as Active upon association start, which is >> currently RTO/2 (equals to 500ms by default). >> >> While working on this, I got myself wondering how HZ can affect the >> stack with such small RTO. If we have HZ%0, for example, we probably >> should be careful when doing calcs such as in mark [a] to not let it >> tend to 0. This should not be related to the reported issue as >> syzkaller was using HZ\x1000. >> >> (I didn't do any tests, this is only based on code review so far) >> >> 1. https://syzkaller.appspot.com/x/repro.syz?x\x1079cf8f800000 >> 2. ad8fec1720e0 ("[SCTP]: Verify all the paths to a peer via heartbeat before using them.") >> b. https://syzkaller.appspot.com/x/.config?xób4e30da84ec1ed >> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 13:06 ` Michael Tuexen @ 2018-05-29 15:45 ` Marcelo Ricardo Leitner -1 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 15:45 UTC (permalink / raw) To: Michael Tuexen Cc: Neil Horman, Dmitry Vyukov, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 03:06:06PM +0200, Michael Tuexen wrote: > > On 29. May 2018, at 13:41, Neil Horman <nhorman@tuxdriver.com> wrote: > > > > On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: > >> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > >>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > >>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > >>>> <michael.tuexen@lurchi.franken.de> wrote: > >>>>>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > >>>>>> > >>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > >>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > >>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > >>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts > >>>>>>> this timer again with this value, then goes to the timer handler again. > >>>>>>> > >>>>>>> This problem is there since very beginning, and thanks to Eric for the > >>>>>>> reproducer shared from a syzbot mail. > >>>>>>> > >>>>>>> This patch fixes it by not allowing to set rto_min with a value below > >>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > >>>>>>> > >>>>>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > >>>>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>>>>>> --- > >>>>>>> include/net/sctp/constants.h | 1 + > >>>>>>> net/sctp/socket.c | 10 +++++++--- > >>>>>>> net/sctp/sysctl.c | 3 ++- > >>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-) > >>>>>>> > >>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > >>>>>>> index 20ff237..2ee7a7b 100644 > >>>>>>> --- a/include/net/sctp/constants.h > >>>>>>> +++ b/include/net/sctp/constants.h > >>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > >>>>>>> #define SCTP_RTO_INITIAL (3 * 1000) > >>>>>>> #define SCTP_RTO_MIN (1 * 1000) > >>>>>>> #define SCTP_RTO_MAX (60 * 1000) > >>>>>>> +#define SCTP_RTO_HARD_MIN 200 > >>>>>>> > >>>>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > >>>>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > >>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > >>>>>>> index ae7e7c6..6ef12c7 100644 > >>>>>>> --- a/net/sctp/socket.c > >>>>>>> +++ b/net/sctp/socket.c > >>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > >>>>>>> * be changed. > >>>>>>> * > >>>>>>> */ > >>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > >>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > >>>>>>> + unsigned int optlen) > >>>>>>> { > >>>>>>> struct sctp_rtoinfo rtoinfo; > >>>>>>> struct sctp_association *asoc; > >>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > >>>>>>> else > >>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > >>>>>>> > >>>>>>> - if (rto_min) > >>>>>>> + if (rto_min) { > >>>>>>> + if (rto_min < SCTP_RTO_HARD_MIN) > >>>>>>> + return -EINVAL; > >>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > >>>>>>> - else > >>>>>>> + } else { > >>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > >>>>>>> + } > >>>>>>> > >>>>>>> if (rto_min > rto_max) > >>>>>>> return -EINVAL; > >>>>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > >>>>>>> index 33ca5b7..7ec854a 100644 > >>>>>>> --- a/net/sctp/sysctl.c > >>>>>>> +++ b/net/sctp/sysctl.c > >>>>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > >>>>>>> static int rto_beta_min = 0; > >>>>>>> static int rto_alpha_max = 1000; > >>>>>>> static int rto_beta_max = 1000; > >>>>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > >>>>>>> > >>>>>>> static unsigned long max_autoclose_min = 0; > >>>>>>> static unsigned long max_autoclose_max = > >>>>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > >>>>>>> .maxlen = sizeof(unsigned int), > >>>>>>> .mode = 0644, > >>>>>>> .proc_handler = proc_sctp_do_rto_min, > >>>>>>> - .extra1 = &one, > >>>>>>> + .extra1 = &rto_hard_min, > >>>>>>> .extra2 = &init_net.sctp.rto_max > >>>>>>> }, > >>>>>>> { > >>>>>>> -- > >>>>>>> 2.1.0 > >>>>>>> > >>>>>>> -- > >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > >>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>>>>> > >>>>>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >>>>>> well > >>>>>> > >>>>> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > >>>>> So could this be reduced? > >>>> > >>>> Hi Michael, > >>>> > >>>> What value do they use? > >>>> > >>>> Xin, Neil, is there more principled way of ensuring that a timer won't > >>>> cause a hard CPU stall? There are slow machines and there are slow > >>>> kernels (in particular syzbot kernel has tons of debug configs > >>>> enabled). 200ms _should_ not cause problems because we did not see > >>>> them with tcp. But it's hard to say what's the low limit as we are > >>>> trying to put a hard upper bound on execution time of a complex > >>>> section of code. Is there something like cond_resched for timers? > >>> Unfortunately, Theres not really a way to do conditional rescheduling of timers, > >>> additionally, we have a problem because the timer is reset as a side effect of > >>> the SCTP state machine, and so the execution time between timer updates has a > >>> signifcant amount of jitter (meaning its a pretty hard value to calibrate, > >>> unless you just select a 'safe' large value for the floor). > >>> > >>> What we might could do (though this might impact the protocol function is change > >>> the timer update side effects to simply set a flag, and consistently update the > >>> timers on exit from sctp_do_sm, so they don't re-arm until all state machine > >>> processing is complete. Anyone have any thoughts on that? > >> > >> I was reviewing all this again and I'm thinking that we are missing > >> the real point. With the parameters that reproducer [1] has, setting > >> those very low RTO parameters, it causes the timer to actually > >> busyloop on the heartbeats, as Xin had explained. > >> > >> But thing is, it busy loops not just because RTO is too low, but > >> because hbinterval was not accounted. > >> > >> /* What is the next timeout value for this transport? */ > >> unsigned long sctp_transport_timeout(struct sctp_transport *trans) > >> { > >> /* RTO + timer slack +/- 50% of RTO */ > >> unsigned long timeout = trans->rto >> 1; <-- [a] > >> > >> if (trans->state != SCTP_UNCONFIRMED && > >> trans->state != SCTP_PF) <--- [2] > >> timeout += trans->hbinterval; > >> > >> return timeout; > >> } > >> > >> The if() in [2] is to speed up path verification before using them, as > >> per the commit changelog. Secondary paths added on processing the > >> cookie are created with status SCTP_UNCONFIRMED, and HB timers are > >> started in the sequence: > >> sctp_sf_do_5_1D_ce > >> -> sctp_process_init > >> |> sctp_process_param > >> | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) > >> '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); > >> > >> which starts the timer using only the small RTO for secondary paths: > >> static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, > >> struct sctp_association *asoc) > >> { > >> struct sctp_transport *t; > >> > >> /* Start a heartbeat timer for each transport on the association. > >> * hold a reference on the transport to make sure none of > >> * the needed data structures go away. > >> */ > >> list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) > >> sctp_transport_reset_hb_timer(t); > >> } > >> > >> But if the system is too busy generating HBs, it likely won't process > >> incoming HB ACKs, which would stop the loop as it would mark the > >> transport as Active. > >> > >> I'm now thinking a better fix would be to have a specific way to > >> kickstart these initial heartbeets, and then always use hbinterval on > >> subsequent ones. > >> > > I like the idea, but I don't think we can just use the hbinterval to set the > > timeout. That said, it seems like we should always be using the HB interval, > > not just on unconfirmed or partially failed transports. From the RFC: > > > > On an idle destination address that is allowed to heartbeat, it is > > recommended that a HEARTBEAT chunk is sent once per RTO of that > > destination address plus the protocol parameter 'HB.interval', with > > jittering of +/- 50% of the RTO value, and exponential backoff of the > > RTO if the previous HEARTBEAT is unanswered > Aren't we talking about the path confirmation procedure? > This is described in https://tools.ietf.org/html/rfc4960#section-5.4 > where it is stated: > > In each RTO, a probe may be sent on an active UNCONFIRMED path in an > attempt to move it to the CONFIRMED state. If during this probing > the path becomes inactive, this rate is lowered to the normal > HEARTBEAT rate. At the expiration of the RTO timer, the error > counter of any path that was probed but not CONFIRMED is incremented > by one and subjected to path failure detection, as defined in Section 8.2. > When probing UNCONFIRMED addresses, however, the association > overall error count is NOT incremented. > > So during path confirmation there is no requirement to add HB.interval. Right. > > Best regards > Michael > > > > It seems like we should be adding it to the timer expiration universally. By my > > read, we've never done this quite right. And yes, I agree, if we account this > > properly, we will avoid this issue. > > > > Its also probably important to note here, that, like RTO.min currently, there is > > no hard floor to the heartbeat interval, and the RFC is silent on what it should > > be. So it would be possible to still find ourselves in this situation if we set > > the interval to 0 from userspace. Is it worth considering a floor on the > > minimum hb interval of the rto is to have no floor? Seems so, yes. I was discussing this with Xin Long offline and he suggested that we can add a floor to hb timeouts (not interval) with this: diff --git a/net/sctp/transport.c b/net/sctp/transport.c index 47f82bd..9f66708 100644 --- a/net/sctp/transport.c +++ b/net/sctp/transport.c @@ -634,7 +634,7 @@ unsigned long sctp_transport_timeout(struct sctp_transport *trans) trans->state != SCTP_PF) timeout += trans->hbinterval; - return timeout; + return max_t(unsigned long, timeout, HZ/5); } /* Reset transport variables to their initial values */ This avoids the issue at hand and without forcing a rto_min value. But as you were anticipating, there are other vectors that can be exploited to trigger something like this. The one I could think of, is to set rto_min=1 rto_max=2 and pathmaxrxt=<large value>. This is likely to get us into rtxing the same packet over and over based on timer/softirq, and it doesn't even need root for that. Seems a more complete fix is: - patch1 - fix issue at hand - Use the max_t above - patch2 - fix rtx attack vector - Add the floor value to rto_min to HZ/20 (which fits the values that Michael shared on the other email) - patch3 - speed up initial HB again - change sctp_cmd_hb_timers_start() so hb timers are kickstarted when the association is established. AFAICT RFC doesn't specify when these initial ones should be sent, and I see no issues with speeding them up. WDYT? Michael, what is the lowest heartbeat interval you have ever seen? Hopefully it's bigger than 200ms. :) Best, Marcelo ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 15:45 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 15:45 UTC (permalink / raw) To: Michael Tuexen Cc: Neil Horman, Dmitry Vyukov, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 03:06:06PM +0200, Michael Tuexen wrote: > > On 29. May 2018, at 13:41, Neil Horman <nhorman@tuxdriver.com> wrote: > > > > On Mon, May 28, 2018 at 04:43:15PM -0300, Marcelo Ricardo Leitner wrote: > >> On Sat, May 26, 2018 at 09:01:00PM -0400, Neil Horman wrote: > >>> On Sat, May 26, 2018 at 05:50:39PM +0200, Dmitry Vyukov wrote: > >>>> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > >>>> <michael.tuexen@lurchi.franken.de> wrote: > >>>>>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > >>>>>> > >>>>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: > >>>>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused > >>>>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this > >>>>>>> value, hb_timer will get stuck there, as in its timer handler it starts > >>>>>>> this timer again with this value, then goes to the timer handler again. > >>>>>>> > >>>>>>> This problem is there since very beginning, and thanks to Eric for the > >>>>>>> reproducer shared from a syzbot mail. > >>>>>>> > >>>>>>> This patch fixes it by not allowing to set rto_min with a value below > >>>>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. > >>>>>>> > >>>>>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com > >>>>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> > >>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> > >>>>>>> --- > >>>>>>> include/net/sctp/constants.h | 1 + > >>>>>>> net/sctp/socket.c | 10 +++++++--- > >>>>>>> net/sctp/sysctl.c | 3 ++- > >>>>>>> 3 files changed, 10 insertions(+), 4 deletions(-) > >>>>>>> > >>>>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h > >>>>>>> index 20ff237..2ee7a7b 100644 > >>>>>>> --- a/include/net/sctp/constants.h > >>>>>>> +++ b/include/net/sctp/constants.h > >>>>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; > >>>>>>> #define SCTP_RTO_INITIAL (3 * 1000) > >>>>>>> #define SCTP_RTO_MIN (1 * 1000) > >>>>>>> #define SCTP_RTO_MAX (60 * 1000) > >>>>>>> +#define SCTP_RTO_HARD_MIN 200 > >>>>>>> > >>>>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ > >>>>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ > >>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c > >>>>>>> index ae7e7c6..6ef12c7 100644 > >>>>>>> --- a/net/sctp/socket.c > >>>>>>> +++ b/net/sctp/socket.c > >>>>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, > >>>>>>> * be changed. > >>>>>>> * > >>>>>>> */ > >>>>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) > >>>>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, > >>>>>>> + unsigned int optlen) > >>>>>>> { > >>>>>>> struct sctp_rtoinfo rtoinfo; > >>>>>>> struct sctp_association *asoc; > >>>>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne > >>>>>>> else > >>>>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; > >>>>>>> > >>>>>>> - if (rto_min) > >>>>>>> + if (rto_min) { > >>>>>>> + if (rto_min < SCTP_RTO_HARD_MIN) > >>>>>>> + return -EINVAL; > >>>>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; > >>>>>>> - else > >>>>>>> + } else { > >>>>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; > >>>>>>> + } > >>>>>>> > >>>>>>> if (rto_min > rto_max) > >>>>>>> return -EINVAL; > >>>>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c > >>>>>>> index 33ca5b7..7ec854a 100644 > >>>>>>> --- a/net/sctp/sysctl.c > >>>>>>> +++ b/net/sctp/sysctl.c > >>>>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; > >>>>>>> static int rto_beta_min = 0; > >>>>>>> static int rto_alpha_max = 1000; > >>>>>>> static int rto_beta_max = 1000; > >>>>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; > >>>>>>> > >>>>>>> static unsigned long max_autoclose_min = 0; > >>>>>>> static unsigned long max_autoclose_max > >>>>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { > >>>>>>> .maxlen = sizeof(unsigned int), > >>>>>>> .mode = 0644, > >>>>>>> .proc_handler = proc_sctp_do_rto_min, > >>>>>>> - .extra1 = &one, > >>>>>>> + .extra1 = &rto_hard_min, > >>>>>>> .extra2 = &init_net.sctp.rto_max > >>>>>>> }, > >>>>>>> { > >>>>>>> -- > >>>>>>> 2.1.0 > >>>>>>> > >>>>>>> -- > >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > >>>>>>> the body of a message to majordomo@vger.kernel.org > >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>>>>> > >>>>>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >>>>>> well > >>>>>> > >>>>> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > >>>>> So could this be reduced? > >>>> > >>>> Hi Michael, > >>>> > >>>> What value do they use? > >>>> > >>>> Xin, Neil, is there more principled way of ensuring that a timer won't > >>>> cause a hard CPU stall? There are slow machines and there are slow > >>>> kernels (in particular syzbot kernel has tons of debug configs > >>>> enabled). 200ms _should_ not cause problems because we did not see > >>>> them with tcp. But it's hard to say what's the low limit as we are > >>>> trying to put a hard upper bound on execution time of a complex > >>>> section of code. Is there something like cond_resched for timers? > >>> Unfortunately, Theres not really a way to do conditional rescheduling of timers, > >>> additionally, we have a problem because the timer is reset as a side effect of > >>> the SCTP state machine, and so the execution time between timer updates has a > >>> signifcant amount of jitter (meaning its a pretty hard value to calibrate, > >>> unless you just select a 'safe' large value for the floor). > >>> > >>> What we might could do (though this might impact the protocol function is change > >>> the timer update side effects to simply set a flag, and consistently update the > >>> timers on exit from sctp_do_sm, so they don't re-arm until all state machine > >>> processing is complete. Anyone have any thoughts on that? > >> > >> I was reviewing all this again and I'm thinking that we are missing > >> the real point. With the parameters that reproducer [1] has, setting > >> those very low RTO parameters, it causes the timer to actually > >> busyloop on the heartbeats, as Xin had explained. > >> > >> But thing is, it busy loops not just because RTO is too low, but > >> because hbinterval was not accounted. > >> > >> /* What is the next timeout value for this transport? */ > >> unsigned long sctp_transport_timeout(struct sctp_transport *trans) > >> { > >> /* RTO + timer slack +/- 50% of RTO */ > >> unsigned long timeout = trans->rto >> 1; <-- [a] > >> > >> if (trans->state != SCTP_UNCONFIRMED && > >> trans->state != SCTP_PF) <--- [2] > >> timeout += trans->hbinterval; > >> > >> return timeout; > >> } > >> > >> The if() in [2] is to speed up path verification before using them, as > >> per the commit changelog. Secondary paths added on processing the > >> cookie are created with status SCTP_UNCONFIRMED, and HB timers are > >> started in the sequence: > >> sctp_sf_do_5_1D_ce > >> -> sctp_process_init > >> |> sctp_process_param > >> | -> sctp_assoc_add_peer(asoc, &addr, gfp, SCTP_UNCONFIRMED) > >> '> sctp_add_cmd_sf(commands, SCTP_CMD_HB_TIMERS_START, SCTP_NULL()); > >> > >> which starts the timer using only the small RTO for secondary paths: > >> static void sctp_cmd_hb_timers_start(struct sctp_cmd_seq *cmds, > >> struct sctp_association *asoc) > >> { > >> struct sctp_transport *t; > >> > >> /* Start a heartbeat timer for each transport on the association. > >> * hold a reference on the transport to make sure none of > >> * the needed data structures go away. > >> */ > >> list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) > >> sctp_transport_reset_hb_timer(t); > >> } > >> > >> But if the system is too busy generating HBs, it likely won't process > >> incoming HB ACKs, which would stop the loop as it would mark the > >> transport as Active. > >> > >> I'm now thinking a better fix would be to have a specific way to > >> kickstart these initial heartbeets, and then always use hbinterval on > >> subsequent ones. > >> > > I like the idea, but I don't think we can just use the hbinterval to set the > > timeout. That said, it seems like we should always be using the HB interval, > > not just on unconfirmed or partially failed transports. From the RFC: > > > > On an idle destination address that is allowed to heartbeat, it is > > recommended that a HEARTBEAT chunk is sent once per RTO of that > > destination address plus the protocol parameter 'HB.interval', with > > jittering of +/- 50% of the RTO value, and exponential backoff of the > > RTO if the previous HEARTBEAT is unanswered > Aren't we talking about the path confirmation procedure? > This is described in https://tools.ietf.org/html/rfc4960#section-5.4 > where it is stated: > > In each RTO, a probe may be sent on an active UNCONFIRMED path in an > attempt to move it to the CONFIRMED state. If during this probing > the path becomes inactive, this rate is lowered to the normal > HEARTBEAT rate. At the expiration of the RTO timer, the error > counter of any path that was probed but not CONFIRMED is incremented > by one and subjected to path failure detection, as defined in Section 8.2. > When probing UNCONFIRMED addresses, however, the association > overall error count is NOT incremented. > > So during path confirmation there is no requirement to add HB.interval. Right. > > Best regards > Michael > > > > It seems like we should be adding it to the timer expiration universally. By my > > read, we've never done this quite right. And yes, I agree, if we account this > > properly, we will avoid this issue. > > > > Its also probably important to note here, that, like RTO.min currently, there is > > no hard floor to the heartbeat interval, and the RFC is silent on what it should > > be. So it would be possible to still find ourselves in this situation if we set > > the interval to 0 from userspace. Is it worth considering a floor on the > > minimum hb interval of the rto is to have no floor? Seems so, yes. I was discussing this with Xin Long offline and he suggested that we can add a floor to hb timeouts (not interval) with this: diff --git a/net/sctp/transport.c b/net/sctp/transport.c index 47f82bd..9f66708 100644 --- a/net/sctp/transport.c +++ b/net/sctp/transport.c @@ -634,7 +634,7 @@ unsigned long sctp_transport_timeout(struct sctp_transport *trans) trans->state != SCTP_PF) timeout += trans->hbinterval; - return timeout; + return max_t(unsigned long, timeout, HZ/5); } /* Reset transport variables to their initial values */ This avoids the issue at hand and without forcing a rto_min value. But as you were anticipating, there are other vectors that can be exploited to trigger something like this. The one I could think of, is to set rto_min=1 rto_max=2 and pathmaxrxt=<large value>. This is likely to get us into rtxing the same packet over and over based on timer/softirq, and it doesn't even need root for that. Seems a more complete fix is: - patch1 - fix issue at hand - Use the max_t above - patch2 - fix rtx attack vector - Add the floor value to rto_min to HZ/20 (which fits the values that Michael shared on the other email) - patch3 - speed up initial HB again - change sctp_cmd_hb_timers_start() so hb timers are kickstarted when the association is established. AFAICT RFC doesn't specify when these initial ones should be sent, and I see no issues with speeding them up. WDYT? Michael, what is the lowest heartbeat interval you have ever seen? Hopefully it's bigger than 200ms. :) Best, Marcelo ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 15:45 ` Marcelo Ricardo Leitner @ 2018-05-29 16:03 ` Neal Cardwell -1 siblings, 0 replies; 34+ messages in thread From: Neal Cardwell @ 2018-05-29 16:03 UTC (permalink / raw) To: marcelo.leitner Cc: Michael Tuexen, nhorman, Dmitry Vyukov, lucien.xin, Netdev, linux-sctp, David Miller, dsa, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < marcelo.leitner@gmail.com> wrote: > - patch2 - fix rtx attack vector > - Add the floor value to rto_min to HZ/20 (which fits the values > that Michael shared on the other email) I would encourage allowing minimum RTO values down to 5ms, if the ACK policy in the receiver makes this feasible. Our experience is that in datacenter environments it can be advantageous to allow timer-based loss recoveries using timeout values as low as 5ms, e.g.: https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00 cheers, neal ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 16:03 ` Neal Cardwell 0 siblings, 0 replies; 34+ messages in thread From: Neal Cardwell @ 2018-05-29 16:03 UTC (permalink / raw) To: marcelo.leitner Cc: Michael Tuexen, nhorman, Dmitry Vyukov, lucien.xin, Netdev, linux-sctp, David Miller, dsa, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < marcelo.leitner@gmail.com> wrote: > - patch2 - fix rtx attack vector > - Add the floor value to rto_min to HZ/20 (which fits the values > that Michael shared on the other email) I would encourage allowing minimum RTO values down to 5ms, if the ACK policy in the receiver makes this feasible. Our experience is that in datacenter environments it can be advantageous to allow timer-based loss recoveries using timeout values as low as 5ms, e.g.: https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00 cheers, neal ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 16:03 ` Neal Cardwell @ 2018-05-29 17:06 ` Marcelo Ricardo Leitner -1 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 17:06 UTC (permalink / raw) To: Neal Cardwell Cc: Michael Tuexen, nhorman, Dmitry Vyukov, lucien.xin, Netdev, linux-sctp, David Miller, dsa, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: > On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < > marcelo.leitner@gmail.com> wrote: > > - patch2 - fix rtx attack vector > > - Add the floor value to rto_min to HZ/20 (which fits the values > > that Michael shared on the other email) > > I would encourage allowing minimum RTO values down to 5ms, if the ACK > policy in the receiver makes this feasible. Our experience is that in > datacenter environments it can be advantageous to allow timer-based loss > recoveries using timeout values as low as 5ms, e.g.: Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at ~25ms already. Xin, can you share more details on the hw, which CPU was used? Anyway, what about we add a floor to rto_max too, so that RTO can actually grow into something bigger that don't hog the CPU? Like: rto_min floor = 5ms rto_max floor = 50ms Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 17:06 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 17:06 UTC (permalink / raw) To: Neal Cardwell Cc: Michael Tuexen, nhorman, Dmitry Vyukov, lucien.xin, Netdev, linux-sctp, David Miller, dsa, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: > On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < > marcelo.leitner@gmail.com> wrote: > > - patch2 - fix rtx attack vector > > - Add the floor value to rto_min to HZ/20 (which fits the values > > that Michael shared on the other email) > > I would encourage allowing minimum RTO values down to 5ms, if the ACK > policy in the receiver makes this feasible. Our experience is that in > datacenter environments it can be advantageous to allow timer-based loss > recoveries using timeout values as low as 5ms, e.g.: Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at ~25ms already. Xin, can you share more details on the hw, which CPU was used? Anyway, what about we add a floor to rto_max too, so that RTO can actually grow into something bigger that don't hog the CPU? Like: rto_min floor = 5ms rto_max floor = 50ms Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 17:06 ` Marcelo Ricardo Leitner @ 2018-05-29 17:45 ` Xin Long -1 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-05-29 17:45 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Neal Cardwell, Michael Tuexen, Neil Horman, Dmitry Vyukov, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote: > On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >> marcelo.leitner@gmail.com> wrote: >> > - patch2 - fix rtx attack vector >> > - Add the floor value to rto_min to HZ/20 (which fits the values >> > that Michael shared on the other email) >> >> I would encourage allowing minimum RTO values down to 5ms, if the ACK >> policy in the receiver makes this feasible. Our experience is that in >> datacenter environments it can be advantageous to allow timer-based loss >> recoveries using timeout values as low as 5ms, e.g.: > > Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at > ~25ms already. Xin, can you share more details on the hw, which CPU > was used? It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 2 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 13 Model name: QEMU Virtual CPU version 1.5.3 Stepping: 3 CPU MHz: 2397.222 BogoMIPS: 4794.44 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 4096K NUMA node0 CPU(s): 0,1 Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid pni cx16 hypervisor lahf_lm abm pti If we're counting on max_t to fix this CPU stuck. It should not that matter if min rto < the value causing that stuck. > > Anyway, what about we add a floor to rto_max too, so that RTO can > actually grow into something bigger that don't hog the CPU? Like: > rto_min floor = 5ms > rto_max floor = 50ms > > Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 17:45 ` Xin Long 0 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-05-29 17:45 UTC (permalink / raw) To: Marcelo Ricardo Leitner Cc: Neal Cardwell, Michael Tuexen, Neil Horman, Dmitry Vyukov, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote: > On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >> marcelo.leitner@gmail.com> wrote: >> > - patch2 - fix rtx attack vector >> > - Add the floor value to rto_min to HZ/20 (which fits the values >> > that Michael shared on the other email) >> >> I would encourage allowing minimum RTO values down to 5ms, if the ACK >> policy in the receiver makes this feasible. Our experience is that in >> datacenter environments it can be advantageous to allow timer-based loss >> recoveries using timeout values as low as 5ms, e.g.: > > Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at > ~25ms already. Xin, can you share more details on the hw, which CPU > was used? It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 2 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 13 Model name: QEMU Virtual CPU version 1.5.3 Stepping: 3 CPU MHz: 2397.222 BogoMIPS: 4794.44 Hypervisor vendor: KVM Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 4096K NUMA node0 CPU(s): 0,1 Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid pni cx16 hypervisor lahf_lm abm pti If we're counting on max_t to fix this CPU stuck. It should not that matter if min rto < the value causing that stuck. > > Anyway, what about we add a floor to rto_max too, so that RTO can > actually grow into something bigger that don't hog the CPU? Like: > rto_min floor = 5ms > rto_max floor = 50ms > > Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 17:45 ` Xin Long @ 2018-05-29 18:02 ` Marcelo Ricardo Leitner -1 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 18:02 UTC (permalink / raw) To: Xin Long Cc: Neal Cardwell, Michael Tuexen, Neil Horman, Dmitry Vyukov, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Wed, May 30, 2018 at 01:45:08AM +0800, Xin Long wrote: > If we're counting on max_t to fix this CPU stuck. It should not that > matter if min rto < the value causing that stuck. Yes but putting a floor to rto_{min,max} now is to protect the rtx timer now, not the heartbeat one. > > > > > Anyway, what about we add a floor to rto_max too, so that RTO can > > actually grow into something bigger that don't hog the CPU? Like: > > rto_min floor = 5ms > > rto_max floor = 50ms > > > > Marcelo > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-29 18:02 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-29 18:02 UTC (permalink / raw) To: Xin Long Cc: Neal Cardwell, Michael Tuexen, Neil Horman, Dmitry Vyukov, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Wed, May 30, 2018 at 01:45:08AM +0800, Xin Long wrote: > If we're counting on max_t to fix this CPU stuck. It should not that > matter if min rto < the value causing that stuck. Yes but putting a floor to rto_{min,max} now is to protect the rtx timer now, not the heartbeat one. > > > > > Anyway, what about we add a floor to rto_max too, so that RTO can > > actually grow into something bigger that don't hog the CPU? Like: > > rto_min floor = 5ms > > rto_max floor = 50ms > > > > Marcelo > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-29 17:45 ` Xin Long @ 2018-06-04 8:34 ` Dmitry Vyukov -1 siblings, 0 replies; 34+ messages in thread From: Dmitry Vyukov @ 2018-06-04 8:34 UTC (permalink / raw) To: Xin Long Cc: Marcelo Ricardo Leitner, Neal Cardwell, Michael Tuexen, Neil Horman, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@gmail.com> wrote: > On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner > <marcelo.leitner@gmail.com> wrote: >> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >>> marcelo.leitner@gmail.com> wrote: >>> > - patch2 - fix rtx attack vector >>> > - Add the floor value to rto_min to HZ/20 (which fits the values >>> > that Michael shared on the other email) >>> >>> I would encourage allowing minimum RTO values down to 5ms, if the ACK >>> policy in the receiver makes this feasible. Our experience is that in >>> datacenter environments it can be advantageous to allow timer-based loss >>> recoveries using timeout values as low as 5ms, e.g.: >> >> Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at >> ~25ms already. Xin, can you share more details on the hw, which CPU >> was used? Hi, Did we reach any decision on this? This continues to produce bug reports on syzbot. I am not sure whom you are asking, because Xin is you unless I am missing something :) But if you mean syzbot hardware, then it's GCE VMs with modern Intel CPUs but an important aspect is a heavy-debug config (which you can take from here https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f) and systematic bug reporting. So if it's any flaky in your testing, it will produce dozens of bug emails on syzbot. > It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" > # lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 2 > On-line CPU(s) list: 0,1 > Thread(s) per core: 1 > Core(s) per socket: 1 > Socket(s): 2 > NUMA node(s): 1 > Vendor ID: GenuineIntel > CPU family: 6 > Model: 13 > Model name: QEMU Virtual CPU version 1.5.3 > Stepping: 3 > CPU MHz: 2397.222 > BogoMIPS: 4794.44 > Hypervisor vendor: KVM > Virtualization type: full > L1d cache: 32K > L1i cache: 32K > L2 cache: 4096K > NUMA node0 CPU(s): 0,1 > Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr > pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good > nopl cpuid pni cx16 hypervisor lahf_lm abm pti > > If we're counting on max_t to fix this CPU stuck. It should not that > matter if min rto < the value causing that stuck. > >> >> Anyway, what about we add a floor to rto_max too, so that RTO can >> actually grow into something bigger that don't hog the CPU? Like: >> rto_min floor = 5ms >> rto_max floor = 50ms >> >> Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-06-04 8:34 ` Dmitry Vyukov 0 siblings, 0 replies; 34+ messages in thread From: Dmitry Vyukov @ 2018-06-04 8:34 UTC (permalink / raw) To: Xin Long Cc: Marcelo Ricardo Leitner, Neal Cardwell, Michael Tuexen, Neil Horman, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@gmail.com> wrote: > On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner > <marcelo.leitner@gmail.com> wrote: >> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >>> marcelo.leitner@gmail.com> wrote: >>> > - patch2 - fix rtx attack vector >>> > - Add the floor value to rto_min to HZ/20 (which fits the values >>> > that Michael shared on the other email) >>> >>> I would encourage allowing minimum RTO values down to 5ms, if the ACK >>> policy in the receiver makes this feasible. Our experience is that in >>> datacenter environments it can be advantageous to allow timer-based loss >>> recoveries using timeout values as low as 5ms, e.g.: >> >> Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at >> ~25ms already. Xin, can you share more details on the hw, which CPU >> was used? Hi, Did we reach any decision on this? This continues to produce bug reports on syzbot. I am not sure whom you are asking, because Xin is you unless I am missing something :) But if you mean syzbot hardware, then it's GCE VMs with modern Intel CPUs but an important aspect is a heavy-debug config (which you can take from here https://syzkaller.appspot.com/bug?extid=cd59a1f907245f891f) and systematic bug reporting. So if it's any flaky in your testing, it will produce dozens of bug emails on syzbot. > It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" > # lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 2 > On-line CPU(s) list: 0,1 > Thread(s) per core: 1 > Core(s) per socket: 1 > Socket(s): 2 > NUMA node(s): 1 > Vendor ID: GenuineIntel > CPU family: 6 > Model: 13 > Model name: QEMU Virtual CPU version 1.5.3 > Stepping: 3 > CPU MHz: 2397.222 > BogoMIPS: 4794.44 > Hypervisor vendor: KVM > Virtualization type: full > L1d cache: 32K > L1i cache: 32K > L2 cache: 4096K > NUMA node0 CPU(s): 0,1 > Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr > pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good > nopl cpuid pni cx16 hypervisor lahf_lm abm pti > > If we're counting on max_t to fix this CPU stuck. It should not that > matter if min rto < the value causing that stuck. > >> >> Anyway, what about we add a floor to rto_max too, so that RTO can >> actually grow into something bigger that don't hog the CPU? Like: >> rto_min floor = 5ms >> rto_max floor = 50ms >> >> Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-06-04 8:34 ` Dmitry Vyukov @ 2018-06-04 12:15 ` Xin Long -1 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-06-04 12:15 UTC (permalink / raw) To: Dmitry Vyukov Cc: Marcelo Ricardo Leitner, Neal Cardwell, Michael Tuexen, Neil Horman, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Mon, Jun 4, 2018 at 4:34 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@gmail.com> wrote: >> On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner >> <marcelo.leitner@gmail.com> wrote: >>> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >>>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >>>> marcelo.leitner@gmail.com> wrote: >>>> > - patch2 - fix rtx attack vector >>>> > - Add the floor value to rto_min to HZ/20 (which fits the values >>>> > that Michael shared on the other email) >>>> >>>> I would encourage allowing minimum RTO values down to 5ms, if the ACK >>>> policy in the receiver makes this feasible. Our experience is that in >>>> datacenter environments it can be advantageous to allow timer-based loss >>>> recoveries using timeout values as low as 5ms, e.g.: >>> >>> Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at >>> ~25ms already. Xin, can you share more details on the hw, which CPU >>> was used? > > Hi, > > Did we reach any decision on this? This continues to produce bug > reports on syzbot. I will post a patch later today for the suggestion: - patch1 - fix issue at hand - Use the max_t above to fix this. As for patch2 and patch3: - patch2 - fix rtx attack vector - Add the floor value to rto_min to HZ/20 (which fits the values that Michael shared on the other email) - patch3 - speed up initial HB again - change sctp_cmd_hb_timers_start() so hb timers are kickstarted when the association is established. AFAICT RFC doesn't specify when these initial ones should be sent, and I see no issues with speeding them up. They are more like improvements, we will do it in the future after getting more information. > > I am not sure whom you are asking, because Xin is you unless I am > missing something :) > But if you mean syzbot hardware, then it's GCE VMs with modern Intel > CPUs but an important aspect is a heavy-debug config (which you can > take from here https://syzkaller.appspot.com/bug?extid=3dcd59a1f907245f891f) > and systematic bug reporting. So if it's any flaky in your testing, it > will produce dozens of bug emails on syzbot. > > >> It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" >> # lscpu >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Byte Order: Little Endian >> CPU(s): 2 >> On-line CPU(s) list: 0,1 >> Thread(s) per core: 1 >> Core(s) per socket: 1 >> Socket(s): 2 >> NUMA node(s): 1 >> Vendor ID: GenuineIntel >> CPU family: 6 >> Model: 13 >> Model name: QEMU Virtual CPU version 1.5.3 >> Stepping: 3 >> CPU MHz: 2397.222 >> BogoMIPS: 4794.44 >> Hypervisor vendor: KVM >> Virtualization type: full >> L1d cache: 32K >> L1i cache: 32K >> L2 cache: 4096K >> NUMA node0 CPU(s): 0,1 >> Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr >> pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good >> nopl cpuid pni cx16 hypervisor lahf_lm abm pti >> >> If we're counting on max_t to fix this CPU stuck. It should not that >> matter if min rto < the value causing that stuck. >> >>> >>> Anyway, what about we add a floor to rto_max too, so that RTO can >>> actually grow into something bigger that don't hog the CPU? Like: >>> rto_min floor = 5ms >>> rto_max floor = 50ms >>> >>> Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-06-04 12:15 ` Xin Long 0 siblings, 0 replies; 34+ messages in thread From: Xin Long @ 2018-06-04 12:15 UTC (permalink / raw) To: Dmitry Vyukov Cc: Marcelo Ricardo Leitner, Neal Cardwell, Michael Tuexen, Neil Horman, Netdev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Mon, Jun 4, 2018 at 4:34 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > On Tue, May 29, 2018 at 7:45 PM, Xin Long <lucien.xin@gmail.com> wrote: >> On Wed, May 30, 2018 at 1:06 AM, Marcelo Ricardo Leitner >> <marcelo.leitner@gmail.com> wrote: >>> On Tue, May 29, 2018 at 12:03:46PM -0400, Neal Cardwell wrote: >>>> On Tue, May 29, 2018 at 11:45 AM Marcelo Ricardo Leitner < >>>> marcelo.leitner@gmail.com> wrote: >>>> > - patch2 - fix rtx attack vector >>>> > - Add the floor value to rto_min to HZ/20 (which fits the values >>>> > that Michael shared on the other email) >>>> >>>> I would encourage allowing minimum RTO values down to 5ms, if the ACK >>>> policy in the receiver makes this feasible. Our experience is that in >>>> datacenter environments it can be advantageous to allow timer-based loss >>>> recoveries using timeout values as low as 5ms, e.g.: >>> >>> Thanks Neal. On Xin's tests, the hearbeat timer becomes an issue at >>> ~25ms already. Xin, can you share more details on the hw, which CPU >>> was used? > > Hi, > > Did we reach any decision on this? This continues to produce bug > reports on syzbot. I will post a patch later today for the suggestion: - patch1 - fix issue at hand - Use the max_t above to fix this. As for patch2 and patch3: - patch2 - fix rtx attack vector - Add the floor value to rto_min to HZ/20 (which fits the values that Michael shared on the other email) - patch3 - speed up initial HB again - change sctp_cmd_hb_timers_start() so hb timers are kickstarted when the association is established. AFAICT RFC doesn't specify when these initial ones should be sent, and I see no issues with speeding them up. They are more like improvements, we will do it in the future after getting more information. > > I am not sure whom you are asking, because Xin is you unless I am > missing something :) > But if you mean syzbot hardware, then it's GCE VMs with modern Intel > CPUs but an important aspect is a heavy-debug config (which you can > take from here https://syzkaller.appspot.com/bug?extid=cd59a1f907245f891f) > and systematic bug reporting. So if it's any flaky in your testing, it > will produce dozens of bug emails on syzbot. > > >> It was on a KVM guest, "-smp 2,cores=1,threads=1,sockets=2" >> # lscpu >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Byte Order: Little Endian >> CPU(s): 2 >> On-line CPU(s) list: 0,1 >> Thread(s) per core: 1 >> Core(s) per socket: 1 >> Socket(s): 2 >> NUMA node(s): 1 >> Vendor ID: GenuineIntel >> CPU family: 6 >> Model: 13 >> Model name: QEMU Virtual CPU version 1.5.3 >> Stepping: 3 >> CPU MHz: 2397.222 >> BogoMIPS: 4794.44 >> Hypervisor vendor: KVM >> Virtualization type: full >> L1d cache: 32K >> L1i cache: 32K >> L2 cache: 4096K >> NUMA node0 CPU(s): 0,1 >> Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr >> pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good >> nopl cpuid pni cx16 hypervisor lahf_lm abm pti >> >> If we're counting on max_t to fix this CPU stuck. It should not that >> matter if min rto < the value causing that stuck. >> >>> >>> Anyway, what about we add a floor to rto_max too, so that RTO can >>> actually grow into something bigger that don't hog the CPU? Like: >>> rto_min floor = 5ms >>> rto_max floor = 50ms >>> >>> Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-26 15:50 ` Dmitry Vyukov @ 2018-05-27 8:58 ` Michael Tuexen -1 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-27 8:58 UTC (permalink / raw) To: Dmitry Vyukov Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller > On 26. May 2018, at 17:50, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > <michael.tuexen@lurchi.franken.de> wrote: >>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >>> >>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>> this timer again with this value, then goes to the timer handler again. >>>> >>>> This problem is there since very beginning, and thanks to Eric for the >>>> reproducer shared from a syzbot mail. >>>> >>>> This patch fixes it by not allowing to set rto_min with a value below >>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>> >>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>> --- >>>> include/net/sctp/constants.h | 1 + >>>> net/sctp/socket.c | 10 +++++++--- >>>> net/sctp/sysctl.c | 3 ++- >>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>> index 20ff237..2ee7a7b 100644 >>>> --- a/include/net/sctp/constants.h >>>> +++ b/include/net/sctp/constants.h >>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>> #define SCTP_RTO_MIN (1 * 1000) >>>> #define SCTP_RTO_MAX (60 * 1000) >>>> +#define SCTP_RTO_HARD_MIN 200 >>>> >>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>> index ae7e7c6..6ef12c7 100644 >>>> --- a/net/sctp/socket.c >>>> +++ b/net/sctp/socket.c >>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>>> * be changed. >>>> * >>>> */ >>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>> + unsigned int optlen) >>>> { >>>> struct sctp_rtoinfo rtoinfo; >>>> struct sctp_association *asoc; >>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>>> else >>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>> >>>> - if (rto_min) >>>> + if (rto_min) { >>>> + if (rto_min < SCTP_RTO_HARD_MIN) >>>> + return -EINVAL; >>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>> - else >>>> + } else { >>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>> + } >>>> >>>> if (rto_min > rto_max) >>>> return -EINVAL; >>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>>> index 33ca5b7..7ec854a 100644 >>>> --- a/net/sctp/sysctl.c >>>> +++ b/net/sctp/sysctl.c >>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>>> static int rto_beta_min = 0; >>>> static int rto_alpha_max = 1000; >>>> static int rto_beta_max = 1000; >>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>>> >>>> static unsigned long max_autoclose_min = 0; >>>> static unsigned long max_autoclose_max = >>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>>> .maxlen = sizeof(unsigned int), >>>> .mode = 0644, >>>> .proc_handler = proc_sctp_do_rto_min, >>>> - .extra1 = &one, >>>> + .extra1 = &rto_hard_min, >>>> .extra2 = &init_net.sctp.rto_max >>>> }, >>>> { >>>> -- >>>> 2.1.0 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >>> well >>> >> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. >> So could this be reduced? > > Hi Michael, > > What value do they use? I have seen values of RTO.Min = 50ms RTO.Max = 200ms RTO.Initial = 100ms Best regards Michael > > Xin, Neil, is there more principled way of ensuring that a timer won't > cause a hard CPU stall? There are slow machines and there are slow > kernels (in particular syzbot kernel has tons of debug configs > enabled). 200ms _should_ not cause problems because we did not see > them with tcp. But it's hard to say what's the low limit as we are > trying to put a hard upper bound on execution time of a complex > section of code. Is there something like cond_resched for timers? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-27 8:58 ` Michael Tuexen 0 siblings, 0 replies; 34+ messages in thread From: Michael Tuexen @ 2018-05-27 8:58 UTC (permalink / raw) To: Dmitry Vyukov Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller > On 26. May 2018, at 17:50, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen > <michael.tuexen@lurchi.franken.de> wrote: >>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: >>> >>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote: >>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused >>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this >>>> value, hb_timer will get stuck there, as in its timer handler it starts >>>> this timer again with this value, then goes to the timer handler again. >>>> >>>> This problem is there since very beginning, and thanks to Eric for the >>>> reproducer shared from a syzbot mail. >>>> >>>> This patch fixes it by not allowing to set rto_min with a value below >>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl. >>>> >>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com >>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> >>>> Signed-off-by: Xin Long <lucien.xin@gmail.com> >>>> --- >>>> include/net/sctp/constants.h | 1 + >>>> net/sctp/socket.c | 10 +++++++--- >>>> net/sctp/sysctl.c | 3 ++- >>>> 3 files changed, 10 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h >>>> index 20ff237..2ee7a7b 100644 >>>> --- a/include/net/sctp/constants.h >>>> +++ b/include/net/sctp/constants.h >>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 }; >>>> #define SCTP_RTO_INITIAL (3 * 1000) >>>> #define SCTP_RTO_MIN (1 * 1000) >>>> #define SCTP_RTO_MAX (60 * 1000) >>>> +#define SCTP_RTO_HARD_MIN 200 >>>> >>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */ >>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ >>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>>> index ae7e7c6..6ef12c7 100644 >>>> --- a/net/sctp/socket.c >>>> +++ b/net/sctp/socket.c >>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval, >>>> * be changed. >>>> * >>>> */ >>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen) >>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, >>>> + unsigned int optlen) >>>> { >>>> struct sctp_rtoinfo rtoinfo; >>>> struct sctp_association *asoc; >>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne >>>> else >>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max; >>>> >>>> - if (rto_min) >>>> + if (rto_min) { >>>> + if (rto_min < SCTP_RTO_HARD_MIN) >>>> + return -EINVAL; >>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min; >>>> - else >>>> + } else { >>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min; >>>> + } >>>> >>>> if (rto_min > rto_max) >>>> return -EINVAL; >>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c >>>> index 33ca5b7..7ec854a 100644 >>>> --- a/net/sctp/sysctl.c >>>> +++ b/net/sctp/sysctl.c >>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0; >>>> static int rto_beta_min = 0; >>>> static int rto_alpha_max = 1000; >>>> static int rto_beta_max = 1000; >>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN; >>>> >>>> static unsigned long max_autoclose_min = 0; >>>> static unsigned long max_autoclose_max >>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = { >>>> .maxlen = sizeof(unsigned int), >>>> .mode = 0644, >>>> .proc_handler = proc_sctp_do_rto_min, >>>> - .extra1 = &one, >>>> + .extra1 = &rto_hard_min, >>>> .extra2 = &init_net.sctp.rto_max >>>> }, >>>> { >>>> -- >>>> 2.1.0 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as >>> well >>> >> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. >> So could this be reduced? > > Hi Michael, > > What value do they use? I have seen values of RTO.Min = 50ms RTO.Max = 200ms RTO.Initial = 100ms Best regards Michael > > Xin, Neil, is there more principled way of ensuring that a timer won't > cause a hard CPU stall? There are slow machines and there are slow > kernels (in particular syzbot kernel has tons of debug configs > enabled). 200ms _should_ not cause problems because we did not see > them with tcp. But it's hard to say what's the low limit as we are > trying to put a hard upper bound on execution time of a complex > section of code. Is there something like cond_resched for timers? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs 2018-05-27 8:58 ` Michael Tuexen @ 2018-05-28 18:56 ` Marcelo Ricardo Leitner -1 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-28 18:56 UTC (permalink / raw) To: Michael Tuexen Cc: Dmitry Vyukov, Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Sun, May 27, 2018 at 10:58:35AM +0200, Michael Tuexen wrote: ... > >>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >>> well > >>> > >> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > >> So could this be reduced? > > > > Hi Michael, > > > > What value do they use? > I have seen values of > RTO.Min = 50ms > RTO.Max = 200ms > RTO.Initial = 100ms Those look doable to me. Thanks, Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs @ 2018-05-28 18:56 ` Marcelo Ricardo Leitner 0 siblings, 0 replies; 34+ messages in thread From: Marcelo Ricardo Leitner @ 2018-05-28 18:56 UTC (permalink / raw) To: Michael Tuexen Cc: Dmitry Vyukov, Neil Horman, Xin Long, network dev, linux-sctp, David Miller, David Ahern, Eric Dumazet, syzkaller On Sun, May 27, 2018 at 10:58:35AM +0200, Michael Tuexen wrote: ... > >>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as > >>> well > >>> > >> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms. > >> So could this be reduced? > > > > Hi Michael, > > > > What value do they use? > I have seen values of > RTO.Min = 50ms > RTO.Max = 200ms > RTO.Initial = 100ms Those look doable to me. Thanks, Marcelo ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2018-06-04 12:15 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-25 17:41 [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs Xin Long 2018-05-25 17:41 ` Xin Long 2018-05-25 19:13 ` Neil Horman 2018-05-25 19:13 ` Neil Horman 2018-05-26 15:42 ` Michael Tuexen 2018-05-26 15:42 ` Michael Tuexen 2018-05-26 15:50 ` Dmitry Vyukov 2018-05-26 15:50 ` Dmitry Vyukov 2018-05-27 1:01 ` Neil Horman 2018-05-27 1:01 ` Neil Horman 2018-05-28 19:43 ` Marcelo Ricardo Leitner 2018-05-28 19:43 ` Marcelo Ricardo Leitner 2018-05-29 11:41 ` Neil Horman 2018-05-29 11:41 ` Neil Horman 2018-05-29 13:06 ` Michael Tuexen 2018-05-29 13:06 ` Michael Tuexen 2018-05-29 15:45 ` Marcelo Ricardo Leitner 2018-05-29 15:45 ` Marcelo Ricardo Leitner 2018-05-29 16:03 ` Neal Cardwell 2018-05-29 16:03 ` Neal Cardwell 2018-05-29 17:06 ` Marcelo Ricardo Leitner 2018-05-29 17:06 ` Marcelo Ricardo Leitner 2018-05-29 17:45 ` Xin Long 2018-05-29 17:45 ` Xin Long 2018-05-29 18:02 ` Marcelo Ricardo Leitner 2018-05-29 18:02 ` Marcelo Ricardo Leitner 2018-06-04 8:34 ` Dmitry Vyukov 2018-06-04 8:34 ` Dmitry Vyukov 2018-06-04 12:15 ` Xin Long 2018-06-04 12:15 ` Xin Long 2018-05-27 8:58 ` Michael Tuexen 2018-05-27 8:58 ` Michael Tuexen 2018-05-28 18:56 ` Marcelo Ricardo Leitner 2018-05-28 18:56 ` Marcelo Ricardo Leitner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.