linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Regression: SUNRPC: Use poll() to fix up the socket requeue races
@ 2019-02-25 20:25 Jon Hunter
  2019-02-25 21:03 ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Hunter @ 2019-02-25 20:25 UTC (permalink / raw)
  To: Trond Myklebust, linux-nfs, Linux Kernel Mailing List, linux-tegra

Hi Trond,

Starting in next-20190222 I have observed a regression with NFS causing
some of our boards to fail to boot. Bisect points to your commit ...

commit 0ffe86f48026b7f34db22d1004bc9992f0db8b33
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Wed Jan 30 14:51:26 2019 -0500

    SUNRPC: Use poll() to fix up the socket requeue races


After reverting this on top of -next I no longer see the problem. I have
not had chance to look any closer, but wanted to see if you had any
ideas what might be the problem.

Cheers
Jon

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-25 20:25 Regression: SUNRPC: Use poll() to fix up the socket requeue races Jon Hunter
@ 2019-02-25 21:03 ` Trond Myklebust
  2019-02-25 22:27   ` Jon Hunter
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2019-02-25 21:03 UTC (permalink / raw)
  To: jonathanh, linux-kernel, linux-nfs, linux-tegra

On Mon, 2019-02-25 at 20:25 +0000, Jon Hunter wrote:
> Hi Trond,
> 
> Starting in next-20190222 I have observed a regression with NFS
> causing
> some of our boards to fail to boot. Bisect points to your commit ...
> 
> commit 0ffe86f48026b7f34db22d1004bc9992f0db8b33
> Author: Trond Myklebust <trond.myklebust@hammerspace.com>
> Date:   Wed Jan 30 14:51:26 2019 -0500
> 
>     SUNRPC: Use poll() to fix up the socket requeue races
> 
> 
> After reverting this on top of -next I no longer see the problem. I
> have
> not had chance to look any closer, but wanted to see if you had any
> ideas what might be the problem.
> 
> Cheers
> Jon

What kind of boot is this? UDP or TCP? nfsroot? NFSv3 or NFSv4? 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-25 21:03 ` Trond Myklebust
@ 2019-02-25 22:27   ` Jon Hunter
  2019-02-25 22:41     ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Hunter @ 2019-02-25 22:27 UTC (permalink / raw)
  To: Trond Myklebust, linux-kernel, linux-nfs, linux-tegra


On 25/02/2019 21:03, Trond Myklebust wrote:
> On Mon, 2019-02-25 at 20:25 +0000, Jon Hunter wrote:
>> Hi Trond,
>>
>> Starting in next-20190222 I have observed a regression with NFS
>> causing
>> some of our boards to fail to boot. Bisect points to your commit ...
>>
>> commit 0ffe86f48026b7f34db22d1004bc9992f0db8b33
>> Author: Trond Myklebust <trond.myklebust@hammerspace.com>
>> Date:   Wed Jan 30 14:51:26 2019 -0500
>>
>>     SUNRPC: Use poll() to fix up the socket requeue races
>>
>>
>> After reverting this on top of -next I no longer see the problem. I
>> have
>> not had chance to look any closer, but wanted to see if you had any
>> ideas what might be the problem.
>>
>> Cheers
>> Jon
> 
> What kind of boot is this? UDP or TCP? nfsroot? NFSv3 or NFSv4? 

This is nfsroot. I don't specify any particular NFS version from 
the kernel cmdline, but this is seen with ARM kernel configs
tegra_defconfig and multi_v7_defconfig. 

Looking at the logs I am seeing the following crash which appears
to point to UDP ...

[    8.032956] Unable to handle kernel NULL pointer dereference at virtual address 00000024
[    8.041137] pgd = (ptrval)
[    8.043858] [00000024] *pgd=00000000
[    8.047437] Internal error: Oops: 5 [#1] SMP ARM
[    8.052049] Modules linked in:
[    8.055104] CPU: 1 PID: 100 Comm: kworker/u9:2 Not tainted 5.0.0-rc7-next-20190222-g94a4752 #1
[    8.063699] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    8.069960] Workqueue: xprtiod xs_udp_data_receive_workfn
[    8.075353] PC is at udp_poll+0x30/0x64
[    8.079178] LR is at udp_poll+0x10/0x64
[    8.083006] pc : [<c0d3be08>]    lr : [<c0d3bde8>]    psr: 20000113
[    8.089257] sp : e8963ea8  ip : 00000000  fp : 00001064
[    8.094471] r10: 00000000  r9 : e8963ef4  r8 : e8963efc
[    8.099687] r7 : e940db34  r6 : 00000000  r5 : e9d1c540  r4 : 00000345
[    8.106203] r3 : c0d3bdd8  r2 : e8908cc0  r1 : e8908cc0  r0 : e89089c0
[    8.112718] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    8.119843] Control: 10c5387d  Table: 8020406a  DAC: 00000051
[    8.125579] Process kworker/u9:2 (pid: 100, stack limit = 0x(ptrval))
[    8.132008] Stack: (0xe8963ea8 to 0xe8964000)
[    8.136352] 3ea0:                   e940d800 e940db04 e89089c0 c0dcac04 fffffff5 00000000
[    8.144515] 3ec0: e88f4400 c0dcae80 e8963efc e8963ef4 e940da58 e940d800 c187c680 e8941b58
[    8.152681] 3ee0: 000010f0 e940da50 e940db44 00000000 e8963f44 fffffff5 00000000 00000000
[    8.160838] 3f00: e894247c ea203180 e940db34 ea3f5c00 e940db38 ea35e500 00000000 00000000
[    8.169001] 3f20: c188ef50 c035f21c ea3f5c00 ea3f5c00 e8962018 ea203180 ea3f5c00 ea3f5c18
[    8.177164] 3f40: e8962018 c188e91c ea203194 c1703d00 00000088 c035f548 e890d05c e8962000
[    8.185327] 3f60: ea200480 c1703d00 e890d05c e890d040 ea200480 00000000 e890d05c ea203180
[    8.193492] 3f80: c035f508 ea433ebc 00000000 c0364810 ea200480 c03646ec 00000000 00000000
[    8.201657] 3fa0: 00000000 00000000 00000000 c03010e8 00000000 00000000 00000000 00000000
[    8.209815] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.217979] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    8.226154] [<c0d3be08>] (udp_poll) from [<c0dcac04>] (xs_poll_check_readable+0x30/0x70)
[    8.234241] [<c0dcac04>] (xs_poll_check_readable) from [<c0dcae80>] (xs_udp_data_receive_workfn+0x23c/0x2a0)
[    8.244062] [<c0dcae80>] (xs_udp_data_receive_workfn) from [<c035f21c>] (process_one_work+0x164/0x450)
[    8.253359] [<c035f21c>] (process_one_work) from [<c035f548>] (worker_thread+0x40/0x524)
[    8.261444] [<c035f548>] (worker_thread) from [<c0364810>] (kthread+0x124/0x154)
[    8.268834] [<c0364810>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
[    8.276044] Exception stack(0xe8963fb0 to 0xe8963ff8)
[    8.281084] 3fa0:                                     00000000 00000000 00000000 00000000
[    8.289248] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    8.297411] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    8.304017] Code: e1510002 13844041 e3140040 0a000005 (e5962024) 
[    8.310129] ---[ end trace 52d5b8bca1dce72c ]---
[   19.321470] nfs: server 192.168.99.1 not responding, still trying
[   55.001474] nfs: server 192.168.99.1 not responding, still trying

Cheers
Jon
 

-- 
nvpublic

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-25 22:27   ` Jon Hunter
@ 2019-02-25 22:41     ` Trond Myklebust
  2019-02-26 18:33       ` Tony Lindgren
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2019-02-25 22:41 UTC (permalink / raw)
  To: jonathanh, linux-kernel, linux-nfs, linux-tegra

On Mon, 2019-02-25 at 22:27 +0000, Jon Hunter wrote:
> On 25/02/2019 21:03, Trond Myklebust wrote:
> > On Mon, 2019-02-25 at 20:25 +0000, Jon Hunter wrote:
> > > Hi Trond,
> > > 
> > > Starting in next-20190222 I have observed a regression with NFS
> > > causing
> > > some of our boards to fail to boot. Bisect points to your commit
> > > ...
> > > 
> > > commit 0ffe86f48026b7f34db22d1004bc9992f0db8b33
> > > Author: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Date:   Wed Jan 30 14:51:26 2019 -0500
> > > 
> > >     SUNRPC: Use poll() to fix up the socket requeue races
> > > 
> > > 
> > > After reverting this on top of -next I no longer see the problem.
> > > I
> > > have
> > > not had chance to look any closer, but wanted to see if you had
> > > any
> > > ideas what might be the problem.
> > > 
> > > Cheers
> > > Jon
> > 
> > What kind of boot is this? UDP or TCP? nfsroot? NFSv3 or NFSv4? 
> 
> This is nfsroot. I don't specify any particular NFS version from 
> the kernel cmdline, but this is seen with ARM kernel configs
> tegra_defconfig and multi_v7_defconfig. 
> 
> Looking at the logs I am seeing the following crash which appears
> to point to UDP ...
> 
> [    8.032956] Unable to handle kernel NULL pointer dereference at
> virtual address 00000024
> [    8.041137] pgd = (ptrval)
> [    8.043858] [00000024] *pgd=00000000
> [    8.047437] Internal error: Oops: 5 [#1] SMP ARM
> [    8.052049] Modules linked in:
> [    8.055104] CPU: 1 PID: 100 Comm: kworker/u9:2 Not tainted 5.0.0-
> rc7-next-20190222-g94a4752 #1
> [    8.063699] Hardware name: NVIDIA Tegra SoC (Flattened Device
> Tree)
> [    8.069960] Workqueue: xprtiod xs_udp_data_receive_workfn
> [    8.075353] PC is at udp_poll+0x30/0x64
> [    8.079178] LR is at udp_poll+0x10/0x64

Thanks! I see what the issue is now and I'll be fixing it ASAP.

Cheers
   Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-25 22:41     ` Trond Myklebust
@ 2019-02-26 18:33       ` Tony Lindgren
  2019-02-26 20:40         ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Tony Lindgren @ 2019-02-26 18:33 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jonathanh, linux-kernel, linux-nfs, linux-tegra

Hi,

* Trond Myklebust <trondmy@hammerspace.com> [700101 00:00]:
> On Mon, 2019-02-25 at 22:27 +0000, Jon Hunter wrote:
> > On 25/02/2019 21:03, Trond Myklebust wrote:
> > This is nfsroot. I don't specify any particular NFS version from 
> > the kernel cmdline, but this is seen with ARM kernel configs
> > tegra_defconfig and multi_v7_defconfig. 
> > 
> > Looking at the logs I am seeing the following crash which appears
> > to point to UDP ...
> > 
> > [    8.032956] Unable to handle kernel NULL pointer dereference at
> > virtual address 00000024
> > [    8.041137] pgd = (ptrval)
> > [    8.043858] [00000024] *pgd=00000000
> > [    8.047437] Internal error: Oops: 5 [#1] SMP ARM
> > [    8.052049] Modules linked in:
> > [    8.055104] CPU: 1 PID: 100 Comm: kworker/u9:2 Not tainted 5.0.0-
> > rc7-next-20190222-g94a4752 #1
> > [    8.063699] Hardware name: NVIDIA Tegra SoC (Flattened Device
> > Tree)
> > [    8.069960] Workqueue: xprtiod xs_udp_data_receive_workfn
> > [    8.075353] PC is at udp_poll+0x30/0x64
> > [    8.079178] LR is at udp_poll+0x10/0x64
> 
> Thanks! I see what the issue is now and I'll be fixing it ASAP.

I'm seeing this also with NFSroot. I can test the fix when available,
that is if you can please Cc me on the fix.

Thanks,

Tony

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-26 18:33       ` Tony Lindgren
@ 2019-02-26 20:40         ` Trond Myklebust
  2019-02-27  0:31           ` Tony Lindgren
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2019-02-26 20:40 UTC (permalink / raw)
  To: tony; +Cc: jonathanh, linux-kernel, linux-nfs, linux-tegra

On Tue, 2019-02-26 at 10:33 -0800, Tony Lindgren wrote:
> Hi,
> 
> * Trond Myklebust <trondmy@hammerspace.com> [700101 00:00]:
> > On Mon, 2019-02-25 at 22:27 +0000, Jon Hunter wrote:
> > > On 25/02/2019 21:03, Trond Myklebust wrote:
> > > This is nfsroot. I don't specify any particular NFS version from 
> > > the kernel cmdline, but this is seen with ARM kernel configs
> > > tegra_defconfig and multi_v7_defconfig. 
> > > 
> > > Looking at the logs I am seeing the following crash which appears
> > > to point to UDP ...
> > > 
> > > [    8.032956] Unable to handle kernel NULL pointer dereference
> > > at
> > > virtual address 00000024
> > > [    8.041137] pgd = (ptrval)
> > > [    8.043858] [00000024] *pgd=00000000
> > > [    8.047437] Internal error: Oops: 5 [#1] SMP ARM
> > > [    8.052049] Modules linked in:
> > > [    8.055104] CPU: 1 PID: 100 Comm: kworker/u9:2 Not tainted
> > > 5.0.0-
> > > rc7-next-20190222-g94a4752 #1
> > > [    8.063699] Hardware name: NVIDIA Tegra SoC (Flattened Device
> > > Tree)
> > > [    8.069960] Workqueue: xprtiod xs_udp_data_receive_workfn
> > > [    8.075353] PC is at udp_poll+0x30/0x64
> > > [    8.079178] LR is at udp_poll+0x10/0x64
> > 
> > Thanks! I see what the issue is now and I'll be fixing it ASAP.
> 
> I'm seeing this also with NFSroot. I can test the fix when available,
> that is if you can please Cc me on the fix.
> 
> Thanks,
> 
> Tony

I've applied the following patch to my 'testing' branch: 
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=patch;h=a73881c96d73ee72b7dbbd38a6eeef66182a8ef7

It has so far stood up to testing on my side, so I'm expecting to push
it to linux-next this evening.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Regression: SUNRPC: Use poll() to fix up the socket requeue races
  2019-02-26 20:40         ` Trond Myklebust
@ 2019-02-27  0:31           ` Tony Lindgren
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lindgren @ 2019-02-27  0:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: jonathanh, linux-kernel, linux-nfs, linux-tegra

* Trond Myklebust <trondmy@hammerspace.com> [190226 20:40]:
> I've applied the following patch to my 'testing' branch: 
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=patch;h=a73881c96d73ee72b7dbbd38a6eeef66182a8ef7
> 
> It has so far stood up to testing on my side, so I'm expecting to push
> it to linux-next this evening.

OK, works for me too based on a quick test of few NFSroot boots.

Thanks,

Tony

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-02-27  0:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-25 20:25 Regression: SUNRPC: Use poll() to fix up the socket requeue races Jon Hunter
2019-02-25 21:03 ` Trond Myklebust
2019-02-25 22:27   ` Jon Hunter
2019-02-25 22:41     ` Trond Myklebust
2019-02-26 18:33       ` Tony Lindgren
2019-02-26 20:40         ` Trond Myklebust
2019-02-27  0:31           ` Tony Lindgren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).