All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2] sctp: Don't add the shutdown timer if its already been added
@ 2016-03-18 20:47 Neil Horman
  2016-03-18 20:53 ` marcelo.leitner
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Neil Horman @ 2016-03-18 20:47 UTC (permalink / raw)
  To: linux-sctp

This BUG halt was recently reported:

PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 #1 [f418dd7c] oops_end at c0863e02
 #2 [f418dd90] do_invalid_op at c040aaca
 #3 [f418de28] error_code (via invalid_op) at c08631a5
    EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
    DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
    CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
 #4 [f418de5c] add_timer at c046fa5e
 #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 #7 [f418df48] inet_shutdown at c080baf9
 #8 [f418df5c] sys_shutdown at c079eedf
 #9 [f418df70] sys_socketcall at c079fe88
    EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
    DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
    SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
    CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282

It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it.  This of
course leads to the BUG halt in add_timer getting called.

Fix seems pretty straightforward, just check before the timer is added if its
already been started.  If it has mod the timer instead to min(current
expiration, new expiration)

Its been tested but not confirmed to fix the problem, as the issue has only
occured in production environments where test kernels are enjoined from being
installed.  It appears to be a sane fix to me though.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>

Change notes:

v2) Removed WARN_ON, as its not useful
---
 net/sctp/sm_sideeffect.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index b5327bb..2a3a71a 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1480,9 +1480,24 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 			timeout = asoc->timeouts[cmd->obj.to];
 			BUG_ON(!timeout);
 
-			timer->expires = jiffies + timeout;
-			sctp_association_hold(asoc);
-			add_timer(timer);
+			/*
+			 * SCTP has a hard time with timer starts.  Because we process
+			 * timer starts as side effects, it can be hard to tell if we
+			 * have already started a timer or not, which leads to BUG
+			 * halts when we call add_timer. So here, instead of just starting
+			 * a timer, we warn if the timer is already started, and just mod
+			 * the timer with the shorter of the two expiration times
+			 */
+			if (WARN_ON_ONCE(timer_pending(timer))) {
+				if (time_after(timer->expires, (jiffies + timeout))) {
+					timer->expires = jiffies+timeout;
+					mod_timer(timer, timer->expires);
+				}
+			} else {
+				timer->expires = jiffies + timeout;
+				sctp_association_hold(asoc);
+				mod_timer(timer, timer->expires);
+			}
 			break;
 
 		case SCTP_CMD_TIMER_RESTART:
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
@ 2016-03-18 20:53 ` marcelo.leitner
  2016-03-21 11:31 ` Neil Horman
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: marcelo.leitner @ 2016-03-18 20:53 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 18, 2016 at 04:47:26PM -0400, Neil Horman wrote:
> This BUG halt was recently reported:
> 
> PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
>  #0 [f418dd28] crash_kexec at c04a7d8c
>  #1 [f418dd7c] oops_end at c0863e02
>  #2 [f418dd90] do_invalid_op at c040aaca
>  #3 [f418de28] error_code (via invalid_op) at c08631a5
>     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
>     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
>     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
>  #4 [f418de5c] add_timer at c046fa5e
>  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
>  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
>  #7 [f418df48] inet_shutdown at c080baf9
>  #8 [f418df5c] sys_shutdown at c079eedf
>  #9 [f418df70] sys_socketcall at c079fe88
>     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
>     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
>     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
>     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> 
> It appears that the side effect that starts the shutdown timer was processed
> multiple times, which can happen as multiple paths can trigger it.  This of
> course leads to the BUG halt in add_timer getting called.
> 
> Fix seems pretty straightforward, just check before the timer is added if its
> already been started.  If it has mod the timer instead to min(current
> expiration, new expiration)
> 
> Its been tested but not confirmed to fix the problem, as the issue has only
> occured in production environments where test kernels are enjoined from being
> installed.  It appears to be a sane fix to me though.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: "David S. Miller" <davem@davemloft.net>
> 
> Change notes:
> 
> v2) Removed WARN_ON, as its not useful
> ---

....

> +			 */
> +			if (WARN_ON_ONCE(timer_pending(timer))) {

Not sure what happened but it's still here :-)

> +				if (time_after(timer->expires, (jiffies + timeout))) {
> +					timer->expires = jiffies+timeout;
> +					mod_timer(timer, timer->expires);
> +				}
> +			} else {
> +				timer->expires = jiffies + timeout;
> +				sctp_association_hold(asoc);
> +				mod_timer(timer, timer->expires);
> +			}
>  			break;
>  
>  		case SCTP_CMD_TIMER_RESTART:
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
  2016-03-18 20:53 ` marcelo.leitner
@ 2016-03-21 11:31 ` Neil Horman
  2020-04-29 11:36 ` Neil Horman
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2016-03-21 11:31 UTC (permalink / raw)
  To: linux-sctp

On Fri, Mar 18, 2016 at 05:53:21PM -0300, marcelo.leitner@gmail.com wrote:
> On Fri, Mar 18, 2016 at 04:47:26PM -0400, Neil Horman wrote:
> > This BUG halt was recently reported:
> > 
> > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> >  #0 [f418dd28] crash_kexec at c04a7d8c
> >  #1 [f418dd7c] oops_end at c0863e02
> >  #2 [f418dd90] do_invalid_op at c040aaca
> >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> >  #4 [f418de5c] add_timer at c046fa5e
> >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> >  #7 [f418df48] inet_shutdown at c080baf9
> >  #8 [f418df5c] sys_shutdown at c079eedf
> >  #9 [f418df70] sys_socketcall at c079fe88
> >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > 
> > It appears that the side effect that starts the shutdown timer was processed
> > multiple times, which can happen as multiple paths can trigger it.  This of
> > course leads to the BUG halt in add_timer getting called.
> > 
> > Fix seems pretty straightforward, just check before the timer is added if its
> > already been started.  If it has mod the timer instead to min(current
> > expiration, new expiration)
> > 
> > Its been tested but not confirmed to fix the problem, as the issue has only
> > occured in production environments where test kernels are enjoined from being
> > installed.  It appears to be a sane fix to me though.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Vlad Yasevich <vyasevich@gmail.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > 
> > Change notes:
> > 
> > v2) Removed WARN_ON, as its not useful
> > ---
> 
> ....
> 
> > +			 */
> > +			if (WARN_ON_ONCE(timer_pending(timer))) {
> 
> Not sure what happened but it's still here :-)
My favorite fat finger action of course, forgot to git add before I did the
commit --amend.  Sorry, new patch in a sec :)

> 
> > +				if (time_after(timer->expires, (jiffies + timeout))) {
> > +					timer->expires = jiffies+timeout;
> > +					mod_timer(timer, timer->expires);
> > +				}
> > +			} else {
> > +				timer->expires = jiffies + timeout;
> > +				sctp_association_hold(asoc);
> > +				mod_timer(timer, timer->expires);
> > +			}
> >  			break;
> >  
> >  		case SCTP_CMD_TIMER_RESTART:
> > -- 
> > 2.5.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
  2016-03-18 20:53 ` marcelo.leitner
  2016-03-21 11:31 ` Neil Horman
@ 2020-04-29 11:36 ` Neil Horman
  2020-04-30  8:54 ` Jere Leppänen
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2020-04-29 11:36 UTC (permalink / raw)
  To: linux-sctp

This BUG halt was reported a while back, but the patch somehow got
missed:

PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 #1 [f418dd7c] oops_end at c0863e02
 #2 [f418dd90] do_invalid_op at c040aaca
 #3 [f418de28] error_code (via invalid_op) at c08631a5
    EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
    DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
    CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
 #4 [f418de5c] add_timer at c046fa5e
 #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 #7 [f418df48] inet_shutdown at c080baf9
 #8 [f418df5c] sys_shutdown at c079eedf
 #9 [f418df70] sys_socketcall at c079fe88
    EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
    DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
    SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
    CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282

It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it.  This of
course leads to the BUG halt in add_timer getting called.

Fix seems pretty straightforward, just check before the timer is added if its
already been started.  If it has mod the timer instead to min(current
expiration, new expiration)

Its been tested but not confirmed to fix the problem, as the issue has only
occured in production environments where test kernels are enjoined from being
installed.  It appears to be a sane fix to me though.  Also, recentely,
Jere found a reproducer posted on list to confirm that this resolves the
issues

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jere Leppänen <jere.leppanen@nokia.com>
CC: marcelo.leitner@gmail.com

---
Change notes:
V2) Updated to use timer_reduce
---
 net/sctp/sm_sideeffect.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 2bc29463e1dc..9f36fe911d08 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1523,9 +1523,17 @@ static int sctp_cmd_interpreter(enum sctp_event_type event_type,
 			timeout = asoc->timeouts[cmd->obj.to];
 			BUG_ON(!timeout);
 
-			timer->expires = jiffies + timeout;
-			sctp_association_hold(asoc);
-			add_timer(timer);
+			/*
+			 * SCTP has a hard time with timer starts.  Because we process
+			 * timer starts as side effects, it can be hard to tell if we
+			 * have already started a timer or not, which leads to BUG
+			 * halts when we call add_timer. So here, instead of just starting
+			 * a timer, if the timer is already started, and just mod
+			 * the timer with the shorter of the two expiration times
+			 */
+			if (!timer_pending(timer))
+				sctp_association_hold(asoc);
+			timer_reduce(timer, jiffies + timeout);
 			break;
 
 		case SCTP_CMD_TIMER_RESTART:
-- 
2.25.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (2 preceding siblings ...)
  2020-04-29 11:36 ` Neil Horman
@ 2020-04-30  8:54 ` Jere Leppänen
  2020-04-30 10:59 ` Neil Horman
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jere Leppänen @ 2020-04-30  8:54 UTC (permalink / raw)
  To: linux-sctp

[-- Attachment #1: Type: text/plain, Size: 3359 bytes --]

On Wed, 29 Apr 2020, Neil Horman wrote:

> This BUG halt was reported a while back, but the patch somehow got
> missed:
> 
> PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
>  #0 [f418dd28] crash_kexec at c04a7d8c
>  #1 [f418dd7c] oops_end at c0863e02
>  #2 [f418dd90] do_invalid_op at c040aaca
>  #3 [f418de28] error_code (via invalid_op) at c08631a5
>     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
>     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
>     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
>  #4 [f418de5c] add_timer at c046fa5e
>  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
>  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
>  #7 [f418df48] inet_shutdown at c080baf9
>  #8 [f418df5c] sys_shutdown at c079eedf
>  #9 [f418df70] sys_socketcall at c079fe88
>     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
>     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
>     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
>     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> 
> It appears that the side effect that starts the shutdown timer was processed
> multiple times, which can happen as multiple paths can trigger it.  This of
> course leads to the BUG halt in add_timer getting called.
> 
> Fix seems pretty straightforward, just check before the timer is added if its
> already been started.  If it has mod the timer instead to min(current
> expiration, new expiration)
> 
> Its been tested but not confirmed to fix the problem, as the issue has only
> occured in production environments where test kernels are enjoined from being
> installed.  It appears to be a sane fix to me though.  Also, recentely,
> Jere found a reproducer posted on list to confirm that this resolves the
> issues
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Jere Leppänen <jere.leppanen@nokia.com>
> CC: marcelo.leitner@gmail.com
> 
> ---
> Change notes:
> V2) Updated to use timer_reduce
> ---
>  net/sctp/sm_sideeffect.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 2bc29463e1dc..9f36fe911d08 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1523,9 +1523,17 @@ static int sctp_cmd_interpreter(enum sctp_event_type event_type,
>  			timeout = asoc->timeouts[cmd->obj.to];
>  			BUG_ON(!timeout);
>  
> -			timer->expires = jiffies + timeout;
> -			sctp_association_hold(asoc);
> -			add_timer(timer);
> +			/*
> +			 * SCTP has a hard time with timer starts.  Because we process
> +			 * timer starts as side effects, it can be hard to tell if we
> +			 * have already started a timer or not, which leads to BUG
> +			 * halts when we call add_timer. So here, instead of just starting
> +			 * a timer, if the timer is already started, and just mod
> +			 * the timer with the shorter of the two expiration times
> +			 */
> +			if (!timer_pending(timer))
> +				sctp_association_hold(asoc);
> +			timer_reduce(timer, jiffies + timeout);
>  			break;
>  
>  		case SCTP_CMD_TIMER_RESTART:
> -- 
> 2.25.4

Thanks Neil. This patch works for me, but I only tested the one case I 
posted about.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (3 preceding siblings ...)
  2020-04-30  8:54 ` Jere Leppänen
@ 2020-04-30 10:59 ` Neil Horman
  2020-04-30 13:59 ` Marcelo Ricardo Leitner
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2020-04-30 10:59 UTC (permalink / raw)
  To: linux-sctp

On Thu, Apr 30, 2020 at 11:54:11AM +0300, Jere Leppänen wrote:
> On Wed, 29 Apr 2020, Neil Horman wrote:
> 
> > This BUG halt was reported a while back, but the patch somehow got
> > missed:
> > 
> > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> >  #0 [f418dd28] crash_kexec at c04a7d8c
> >  #1 [f418dd7c] oops_end at c0863e02
> >  #2 [f418dd90] do_invalid_op at c040aaca
> >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> >  #4 [f418de5c] add_timer at c046fa5e
> >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> >  #7 [f418df48] inet_shutdown at c080baf9
> >  #8 [f418df5c] sys_shutdown at c079eedf
> >  #9 [f418df70] sys_socketcall at c079fe88
> >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > 
> > It appears that the side effect that starts the shutdown timer was processed
> > multiple times, which can happen as multiple paths can trigger it.  This of
> > course leads to the BUG halt in add_timer getting called.
> > 
> > Fix seems pretty straightforward, just check before the timer is added if its
> > already been started.  If it has mod the timer instead to min(current
> > expiration, new expiration)
> > 
> > Its been tested but not confirmed to fix the problem, as the issue has only
> > occured in production environments where test kernels are enjoined from being
> > installed.  It appears to be a sane fix to me though.  Also, recentely,
> > Jere found a reproducer posted on list to confirm that this resolves the
> > issues
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Vlad Yasevich <vyasevich@gmail.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: Jere Leppänen <jere.leppanen@nokia.com>
> > CC: marcelo.leitner@gmail.com
> > 
> > ---
> > Change notes:
> > V2) Updated to use timer_reduce
> > ---
> >  net/sctp/sm_sideeffect.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> > index 2bc29463e1dc..9f36fe911d08 100644
> > --- a/net/sctp/sm_sideeffect.c
> > +++ b/net/sctp/sm_sideeffect.c
> > @@ -1523,9 +1523,17 @@ static int sctp_cmd_interpreter(enum sctp_event_type event_type,
> >  			timeout = asoc->timeouts[cmd->obj.to];
> >  			BUG_ON(!timeout);
> >  
> > -			timer->expires = jiffies + timeout;
> > -			sctp_association_hold(asoc);
> > -			add_timer(timer);
> > +			/*
> > +			 * SCTP has a hard time with timer starts.  Because we process
> > +			 * timer starts as side effects, it can be hard to tell if we
> > +			 * have already started a timer or not, which leads to BUG
> > +			 * halts when we call add_timer. So here, instead of just starting
> > +			 * a timer, if the timer is already started, and just mod
> > +			 * the timer with the shorter of the two expiration times
> > +			 */
> > +			if (!timer_pending(timer))
> > +				sctp_association_hold(asoc);
> > +			timer_reduce(timer, jiffies + timeout);
> >  			break;
> >  
> >  		case SCTP_CMD_TIMER_RESTART:
> > -- 
> > 2.25.4
> 
> Thanks Neil. This patch works for me, but I only tested the one case I 
> posted about.

Thank you, that test is more than we had before though, so I appreciate it.

Best
Neil

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (4 preceding siblings ...)
  2020-04-30 10:59 ` Neil Horman
@ 2020-04-30 13:59 ` Marcelo Ricardo Leitner
  2020-05-19  8:51 ` Jere Leppänen
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-04-30 13:59 UTC (permalink / raw)
  To: linux-sctp

On Wed, Apr 29, 2020 at 07:36:13AM -0400, Neil Horman wrote:
> This BUG halt was reported a while back, but the patch somehow got
> missed:
> 
> PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
>  #0 [f418dd28] crash_kexec at c04a7d8c
>  #1 [f418dd7c] oops_end at c0863e02
>  #2 [f418dd90] do_invalid_op at c040aaca
>  #3 [f418de28] error_code (via invalid_op) at c08631a5
>     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
>     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
>     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
>  #4 [f418de5c] add_timer at c046fa5e
>  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
>  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
>  #7 [f418df48] inet_shutdown at c080baf9
>  #8 [f418df5c] sys_shutdown at c079eedf
>  #9 [f418df70] sys_socketcall at c079fe88
>     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
>     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
>     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
>     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> 
> It appears that the side effect that starts the shutdown timer was processed
> multiple times, which can happen as multiple paths can trigger it.  This of
> course leads to the BUG halt in add_timer getting called.
> 
> Fix seems pretty straightforward, just check before the timer is added if its
> already been started.  If it has mod the timer instead to min(current
> expiration, new expiration)
> 
> Its been tested but not confirmed to fix the problem, as the issue has only
> occured in production environments where test kernels are enjoined from being
> installed.  It appears to be a sane fix to me though.  Also, recentely,
> Jere found a reproducer posted on list to confirm that this resolves the
> issues
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Jere Leppänen <jere.leppanen@nokia.com>
> CC: marcelo.leitner@gmail.com
> 
> ---
> Change notes:
> V2) Updated to use timer_reduce

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (5 preceding siblings ...)
  2020-04-30 13:59 ` Marcelo Ricardo Leitner
@ 2020-05-19  8:51 ` Jere Leppänen
  2020-05-19 19:24 ` Marcelo Ricardo Leitner
  2020-05-19 19:59 ` Neil Horman
  8 siblings, 0 replies; 10+ messages in thread
From: Jere Leppänen @ 2020-05-19  8:51 UTC (permalink / raw)
  To: linux-sctp

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Thu, 30 Apr 2020, Marcelo Ricardo Leitner wrote:

> On Wed, Apr 29, 2020 at 07:36:13AM -0400, Neil Horman wrote:
> > This BUG halt was reported a while back, but the patch somehow got
> > missed:
> > 
> > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> >  #0 [f418dd28] crash_kexec at c04a7d8c
> >  #1 [f418dd7c] oops_end at c0863e02
> >  #2 [f418dd90] do_invalid_op at c040aaca
> >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> >  #4 [f418de5c] add_timer at c046fa5e
> >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> >  #7 [f418df48] inet_shutdown at c080baf9
> >  #8 [f418df5c] sys_shutdown at c079eedf
> >  #9 [f418df70] sys_socketcall at c079fe88
> >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > 
> > It appears that the side effect that starts the shutdown timer was processed
> > multiple times, which can happen as multiple paths can trigger it.  This of
> > course leads to the BUG halt in add_timer getting called.
> > 
> > Fix seems pretty straightforward, just check before the timer is added if its
> > already been started.  If it has mod the timer instead to min(current
> > expiration, new expiration)
> > 
> > Its been tested but not confirmed to fix the problem, as the issue has only
> > occured in production environments where test kernels are enjoined from being
> > installed.  It appears to be a sane fix to me though.  Also, recentely,
> > Jere found a reproducer posted on list to confirm that this resolves the
> > issues
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Vlad Yasevich <vyasevich@gmail.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: Jere Leppänen <jere.leppanen@nokia.com>
> > CC: marcelo.leitner@gmail.com
> > 
> > ---
> > Change notes:
> > V2) Updated to use timer_reduce
> 
> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Hey is this patch falling through the cracks again? No rush, I'm just 
wondering what's going on.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (6 preceding siblings ...)
  2020-05-19  8:51 ` Jere Leppänen
@ 2020-05-19 19:24 ` Marcelo Ricardo Leitner
  2020-05-19 19:59 ` Neil Horman
  8 siblings, 0 replies; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2020-05-19 19:24 UTC (permalink / raw)
  To: linux-sctp

On Tue, May 19, 2020 at 11:51:23AM +0300, Jere Leppänen wrote:
> On Thu, 30 Apr 2020, Marcelo Ricardo Leitner wrote:
> 
> > On Wed, Apr 29, 2020 at 07:36:13AM -0400, Neil Horman wrote:
> > > This BUG halt was reported a while back, but the patch somehow got
> > > missed:
> > > 
> > > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> > >  #0 [f418dd28] crash_kexec at c04a7d8c
> > >  #1 [f418dd7c] oops_end at c0863e02
> > >  #2 [f418dd90] do_invalid_op at c040aaca
> > >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> > >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> > >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> > >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> > >  #4 [f418de5c] add_timer at c046fa5e
> > >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> > >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> > >  #7 [f418df48] inet_shutdown at c080baf9
> > >  #8 [f418df5c] sys_shutdown at c079eedf
> > >  #9 [f418df70] sys_socketcall at c079fe88
> > >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> > >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> > >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> > >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > > 
> > > It appears that the side effect that starts the shutdown timer was processed
> > > multiple times, which can happen as multiple paths can trigger it.  This of
> > > course leads to the BUG halt in add_timer getting called.
> > > 
> > > Fix seems pretty straightforward, just check before the timer is added if its
> > > already been started.  If it has mod the timer instead to min(current
> > > expiration, new expiration)
> > > 
> > > Its been tested but not confirmed to fix the problem, as the issue has only
> > > occured in production environments where test kernels are enjoined from being
> > > installed.  It appears to be a sane fix to me though.  Also, recentely,
> > > Jere found a reproducer posted on list to confirm that this resolves the
> > > issues
> > > 
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > CC: Vlad Yasevich <vyasevich@gmail.com>
> > > CC: "David S. Miller" <davem@davemloft.net>
> > > CC: Jere Leppänen <jere.leppanen@nokia.com>
> > > CC: marcelo.leitner@gmail.com
> > > 
> > > ---
> > > Change notes:
> > > V2) Updated to use timer_reduce
> > 
> > Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> 
> Hey is this patch falling through the cracks again? No rush, I'm just 
> wondering what's going on.

Whoops, sounds like Neil forgot to Cc netdev@..

  Marcelo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv2] sctp: Don't add the shutdown timer if its already been added
  2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
                   ` (7 preceding siblings ...)
  2020-05-19 19:24 ` Marcelo Ricardo Leitner
@ 2020-05-19 19:59 ` Neil Horman
  8 siblings, 0 replies; 10+ messages in thread
From: Neil Horman @ 2020-05-19 19:59 UTC (permalink / raw)
  To: linux-sctp

On Tue, May 19, 2020 at 04:24:10PM -0300, Marcelo Ricardo Leitner wrote:
> On Tue, May 19, 2020 at 11:51:23AM +0300, Jere Leppänen wrote:
> > On Thu, 30 Apr 2020, Marcelo Ricardo Leitner wrote:
> > 
> > > On Wed, Apr 29, 2020 at 07:36:13AM -0400, Neil Horman wrote:
> > > > This BUG halt was reported a while back, but the patch somehow got
> > > > missed:
> > > > 
> > > > PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
> > > >  #0 [f418dd28] crash_kexec at c04a7d8c
> > > >  #1 [f418dd7c] oops_end at c0863e02
> > > >  #2 [f418dd90] do_invalid_op at c040aaca
> > > >  #3 [f418de28] error_code (via invalid_op) at c08631a5
> > > >     EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
> > > >     DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
> > > >     CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
> > > >  #4 [f418de5c] add_timer at c046fa5e
> > > >  #5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
> > > >  #6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
> > > >  #7 [f418df48] inet_shutdown at c080baf9
> > > >  #8 [f418df5c] sys_shutdown at c079eedf
> > > >  #9 [f418df70] sys_socketcall at c079fe88
> > > >     EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
> > > >     DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
> > > >     SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
> > > >     CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282
> > > > 
> > > > It appears that the side effect that starts the shutdown timer was processed
> > > > multiple times, which can happen as multiple paths can trigger it.  This of
> > > > course leads to the BUG halt in add_timer getting called.
> > > > 
> > > > Fix seems pretty straightforward, just check before the timer is added if its
> > > > already been started.  If it has mod the timer instead to min(current
> > > > expiration, new expiration)
> > > > 
> > > > Its been tested but not confirmed to fix the problem, as the issue has only
> > > > occured in production environments where test kernels are enjoined from being
> > > > installed.  It appears to be a sane fix to me though.  Also, recentely,
> > > > Jere found a reproducer posted on list to confirm that this resolves the
> > > > issues
> > > > 
> > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > > CC: Vlad Yasevich <vyasevich@gmail.com>
> > > > CC: "David S. Miller" <davem@davemloft.net>
> > > > CC: Jere Leppänen <jere.leppanen@nokia.com>
> > > > CC: marcelo.leitner@gmail.com
> > > > 
> > > > ---
> > > > Change notes:
> > > > V2) Updated to use timer_reduce
> > > 
> > > Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > 
> > Hey is this patch falling through the cracks again? No rush, I'm just 
> > wondering what's going on.
> 
> Whoops, sounds like Neil forgot to Cc netdev@..
> 
>   Marcelo
> 
Crap, my bad.  I'm stuck in a call at the moment, but I'll resend this tomorrow
morning.
Thanks for following up!
Neil

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-05-19 19:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-18 20:47 [PATCHv2] sctp: Don't add the shutdown timer if its already been added Neil Horman
2016-03-18 20:53 ` marcelo.leitner
2016-03-21 11:31 ` Neil Horman
2020-04-29 11:36 ` Neil Horman
2020-04-30  8:54 ` Jere Leppänen
2020-04-30 10:59 ` Neil Horman
2020-04-30 13:59 ` Marcelo Ricardo Leitner
2020-05-19  8:51 ` Jere Leppänen
2020-05-19 19:24 ` Marcelo Ricardo Leitner
2020-05-19 19:59 ` Neil Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.