All of lore.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe)
@ 2019-10-25  7:28 Jon Hunter
  2019-11-05 12:03 ` Jon Hunter
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Hunter @ 2019-10-25  7:28 UTC (permalink / raw)
  To: Trond Myklebust, linux-tegra, Linux Kernel Mailing List

Hi Trond,

Similar to the change 431235818bc3 ("SUNRPC: Declare RPC timers as
TIMER_DEFERRABLE") I have been tracking down another suspend/NFS related
issue where again I am seeing random delays exiting suspend. The delays
can be up to a couple minutes in the worst case and this is causing a
suspend test we have to fail. For example, with this change I see ...

[  130.599520] PM: suspend entry (deep)

[  130.607267] Filesystems sync: 0.000 seconds

[  130.615800] Freezing user space processes ... (elapsed 0.001 seconds) done.

[  130.628247] OOM killer disabled.

[  130.635382] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.

[  130.648052] printk: Suspending console(s) (use no_console_suspend to debug)

[  130.686015] Disabling non-boot CPUs ...

[  130.689568] IRQ 17: no longer affine to CPU2

[  130.693435] Entering suspend state LP1

[  130.693489] Enabling non-boot CPUs ...

[  130.697108] CPU1 is up

[  130.700602] CPU2 is up

[  130.704338] CPU3 is up

[  130.781259] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)

[  130.789742] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[  130.792793] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)

[  130.820913] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)

[  131.345569] OOM killer enabled.

[  131.352643] Restarting tasks ... done.

[  131.365480] PM: suspend exit

[  134.524261] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1

[  243.745788] nfs: server 192.168.99.1 not responding, still trying

[  243.745811] nfs: server 192.168.99.1 not responding, still trying

[  243.767939] nfs: server 192.168.99.1 not responding, still trying

[  243.778233] nfs: server 192.168.99.1 OK

[  243.787058] nfs: server 192.168.99.1 OK

[  243.787542] nfs: server 192.168.99.1 OK


Running a git bisect I was able to track it down to the commit referenced
in the $subject. Reverting this on top of the current mainline fixes the
problem and I no longer see these long delays.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe)
  2019-10-25  7:28 [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe) Jon Hunter
@ 2019-11-05 12:03 ` Jon Hunter
  2019-11-05 14:21   ` [PATCH] SUNRPC: Avoid RPC delays when exiting suspend Trond Myklebust
  2019-11-05 19:37   ` [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe) Trond Myklebust
  0 siblings, 2 replies; 6+ messages in thread
From: Jon Hunter @ 2019-11-05 12:03 UTC (permalink / raw)
  To: Trond Myklebust, linux-tegra, Linux Kernel Mailing List

Hi Trond,

Any feedback on this?

Jon

On 25/10/2019 08:28, Jon Hunter wrote:
> Hi Trond,
> 
> Similar to the change 431235818bc3 ("SUNRPC: Declare RPC timers as
> TIMER_DEFERRABLE") I have been tracking down another suspend/NFS related
> issue where again I am seeing random delays exiting suspend. The delays
> can be up to a couple minutes in the worst case and this is causing a
> suspend test we have to fail. For example, with this change I see ...
> 
> [  130.599520] PM: suspend entry (deep)
> 
> [  130.607267] Filesystems sync: 0.000 seconds
> 
> [  130.615800] Freezing user space processes ... (elapsed 0.001 seconds) done.
> 
> [  130.628247] OOM killer disabled.
> 
> [  130.635382] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> 
> [  130.648052] printk: Suspending console(s) (use no_console_suspend to debug)
> 
> [  130.686015] Disabling non-boot CPUs ...
> 
> [  130.689568] IRQ 17: no longer affine to CPU2
> 
> [  130.693435] Entering suspend state LP1
> 
> [  130.693489] Enabling non-boot CPUs ...
> 
> [  130.697108] CPU1 is up
> 
> [  130.700602] CPU2 is up
> 
> [  130.704338] CPU3 is up
> 
> [  130.781259] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)
> 
> [  130.789742] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> 
> [  130.792793] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> 
> [  130.820913] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)
> 
> [  131.345569] OOM killer enabled.
> 
> [  131.352643] Restarting tasks ... done.
> 
> [  131.365480] PM: suspend exit
> 
> [  134.524261] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
> 
> [  243.745788] nfs: server 192.168.99.1 not responding, still trying
> 
> [  243.745811] nfs: server 192.168.99.1 not responding, still trying
> 
> [  243.767939] nfs: server 192.168.99.1 not responding, still trying
> 
> [  243.778233] nfs: server 192.168.99.1 OK
> 
> [  243.787058] nfs: server 192.168.99.1 OK
> 
> [  243.787542] nfs: server 192.168.99.1 OK
> 
> 
> Running a git bisect I was able to track it down to the commit referenced
> in the $subject. Reverting this on top of the current mainline fixes the
> problem and I no longer see these long delays.
> 
> Cheers
> Jon
> 

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] SUNRPC: Avoid RPC delays when exiting suspend
  2019-11-05 12:03 ` Jon Hunter
@ 2019-11-05 14:21   ` Trond Myklebust
  2019-11-06 11:15     ` Jon Hunter
  2019-11-05 19:37   ` [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe) Trond Myklebust
  1 sibling, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2019-11-05 14:21 UTC (permalink / raw)
  To: Jon Hunter; +Cc: linux-nfs

Jon Hunter: "I have been tracking down another suspend/NFS related
issue where again I am seeing random delays exiting suspend. The delays
can be up to a couple minutes in the worst case and this is causing a
suspend test we have to fail."

Change the use of a deferrable work to a standard delayed one.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 7e0a0e38fcfea ("SUNRPC: Replace the queue timer with a delayed work function")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 360afe153193..987c4b1f0b17 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -260,7 +260,7 @@ static void __rpc_init_priority_wait_queue(struct rpc_wait_queue *queue, const c
 	rpc_reset_waitqueue_priority(queue);
 	queue->qlen = 0;
 	queue->timer_list.expires = 0;
-	INIT_DEFERRABLE_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
+	INIT_DELAYED_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
 	INIT_LIST_HEAD(&queue->timer_list.list);
 	rpc_assign_waitqueue_name(queue, qname);
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe)
  2019-11-05 12:03 ` Jon Hunter
  2019-11-05 14:21   ` [PATCH] SUNRPC: Avoid RPC delays when exiting suspend Trond Myklebust
@ 2019-11-05 19:37   ` Trond Myklebust
  1 sibling, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2019-11-05 19:37 UTC (permalink / raw)
  To: jonathanh, linux-kernel, linux-tegra

On Tue, 2019-11-05 at 12:03 +0000, Jon Hunter wrote:
> Hi Trond,
> 
> Any feedback on this?

See the patch I sent this morning. I believe that ought to fix the
problem.
Sorry about the regression. As you saw from the original patch, it
predates your first bug report, and when I queued it up for submission,
I missed that it was copying the "deferrable" status that caused that
first bug.

Cheers
  Trond

> 
> Jon
> 
> On 25/10/2019 08:28, Jon Hunter wrote:
> > Hi Trond,
> > 
> > Similar to the change 431235818bc3 ("SUNRPC: Declare RPC timers as
> > TIMER_DEFERRABLE") I have been tracking down another suspend/NFS
> > related
> > issue where again I am seeing random delays exiting suspend. The
> > delays
> > can be up to a couple minutes in the worst case and this is causing
> > a
> > suspend test we have to fail. For example, with this change I see
> > ...
> > 
> > [  130.599520] PM: suspend entry (deep)
> > 
> > [  130.607267] Filesystems sync: 0.000 seconds
> > 
> > [  130.615800] Freezing user space processes ... (elapsed 0.001
> > seconds) done.
> > 
> > [  130.628247] OOM killer disabled.
> > 
> > [  130.635382] Freezing remaining freezable tasks ... (elapsed
> > 0.001 seconds) done.
> > 
> > [  130.648052] printk: Suspending console(s) (use
> > no_console_suspend to debug)
> > 
> > [  130.686015] Disabling non-boot CPUs ...
> > 
> > [  130.689568] IRQ 17: no longer affine to CPU2
> > 
> > [  130.693435] Entering suspend state LP1
> > 
> > [  130.693489] Enabling non-boot CPUs ...
> > 
> > [  130.697108] CPU1 is up
> > 
> > [  130.700602] CPU2 is up
> > 
> > [  130.704338] CPU3 is up
> > 
> > [  130.781259] mmc1: queuing unknown CIS tuple 0x80 (50 bytes)
> > 
> > [  130.789742] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> > 
> > [  130.792793] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
> > 
> > [  130.820913] mmc1: queuing unknown CIS tuple 0x02 (1 bytes)
> > 
> > [  131.345569] OOM killer enabled.
> > 
> > [  131.352643] Restarting tasks ... done.
> > 
> > [  131.365480] PM: suspend exit
> > 
> > [  134.524261] asix 1-1:1.0 eth0: link up, 100Mbps, full-duplex,
> > lpa 0xCDE1
> > 
> > [  243.745788] nfs: server 192.168.99.1 not responding, still
> > trying
> > 
> > [  243.745811] nfs: server 192.168.99.1 not responding, still
> > trying
> > 
> > [  243.767939] nfs: server 192.168.99.1 not responding, still
> > trying
> > 
> > [  243.778233] nfs: server 192.168.99.1 OK
> > 
> > [  243.787058] nfs: server 192.168.99.1 OK
> > 
> > [  243.787542] nfs: server 192.168.99.1 OK
> > 
> > 
> > Running a git bisect I was able to track it down to the commit
> > referenced
> > in the $subject. Reverting this on top of the current mainline
> > fixes the
> > problem and I no longer see these long delays.
> > 
> > Cheers
> > Jon
> > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] SUNRPC: Avoid RPC delays when exiting suspend
  2019-11-05 14:21   ` [PATCH] SUNRPC: Avoid RPC delays when exiting suspend Trond Myklebust
@ 2019-11-06 11:15     ` Jon Hunter
  2019-11-18 10:21       ` Jon Hunter
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Hunter @ 2019-11-06 11:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, linux-tegra


On 05/11/2019 14:21, Trond Myklebust wrote:
> Jon Hunter: "I have been tracking down another suspend/NFS related
> issue where again I am seeing random delays exiting suspend. The delays
> can be up to a couple minutes in the worst case and this is causing a
> suspend test we have to fail."
> 
> Change the use of a deferrable work to a standard delayed one.
> 
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Fixes: 7e0a0e38fcfea ("SUNRPC: Replace the queue timer with a delayed work function")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  net/sunrpc/sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 360afe153193..987c4b1f0b17 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -260,7 +260,7 @@ static void __rpc_init_priority_wait_queue(struct rpc_wait_queue *queue, const c
>  	rpc_reset_waitqueue_priority(queue);
>  	queue->qlen = 0;
>  	queue->timer_list.expires = 0;
> -	INIT_DEFERRABLE_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
> +	INIT_DELAYED_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
>  	INIT_LIST_HEAD(&queue->timer_list.list);
>  	rpc_assign_waitqueue_name(queue, qname);
>  }

Thanks!

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] SUNRPC: Avoid RPC delays when exiting suspend
  2019-11-06 11:15     ` Jon Hunter
@ 2019-11-18 10:21       ` Jon Hunter
  0 siblings, 0 replies; 6+ messages in thread
From: Jon Hunter @ 2019-11-18 10:21 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, linux-tegra

Hi Trond,

On 06/11/2019 11:15, Jon Hunter wrote:
> 
> On 05/11/2019 14:21, Trond Myklebust wrote:
>> Jon Hunter: "I have been tracking down another suspend/NFS related
>> issue where again I am seeing random delays exiting suspend. The delays
>> can be up to a couple minutes in the worst case and this is causing a
>> suspend test we have to fail."
>>
>> Change the use of a deferrable work to a standard delayed one.
>>
>> Reported-by: Jon Hunter <jonathanh@nvidia.com>
>> Fixes: 7e0a0e38fcfea ("SUNRPC: Replace the queue timer with a delayed work function")
>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>> ---
>>  net/sunrpc/sched.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 360afe153193..987c4b1f0b17 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -260,7 +260,7 @@ static void __rpc_init_priority_wait_queue(struct rpc_wait_queue *queue, const c
>>  	rpc_reset_waitqueue_priority(queue);
>>  	queue->qlen = 0;
>>  	queue->timer_list.expires = 0;
>> -	INIT_DEFERRABLE_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
>> +	INIT_DELAYED_WORK(&queue->timer_list.dwork, __rpc_queue_timer_fn);
>>  	INIT_LIST_HEAD(&queue->timer_list.list);
>>  	rpc_assign_waitqueue_name(queue, qname);
>>  }
> 
> Thanks!
> 
> Tested-by: Jon Hunter <jonathanh@nvidia.com>

I see this is now applied in -next, but I am seeing the failures on
mainline. Any chance we could still get this into v5.4?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-18 10:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-25  7:28 [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe) Jon Hunter
2019-11-05 12:03 ` Jon Hunter
2019-11-05 14:21   ` [PATCH] SUNRPC: Avoid RPC delays when exiting suspend Trond Myklebust
2019-11-06 11:15     ` Jon Hunter
2019-11-18 10:21       ` Jon Hunter
2019-11-05 19:37   ` [REGRESSION v5.3] SUNRPC: Replace the queue timer with a delayed work function (7e0a0e38fcfe) Trond Myklebust

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.