* net/sunrpc: v4.14-rc4 lockdep warning
@ 2017-10-09 18:17 Lorenzo Pieralisi
  2017-10-09 18:32   ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Pieralisi @ 2017-10-09 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-nfs
  Cc: Trond Myklebust, J. Bruce Fields, Anna Schumaker, Jeff Layton

Hi,

I have run into the lockdep warning below while running v4.14-rc3/rc4
on an ARM64 defconfig Juno dev board - reporting it to check whether
it is a known/genuine issue.

Please let me know if you need further debug data or need some
specific tests.

Thanks,
Lorenzo

[    6.209384] ======================================================
[    6.215569] WARNING: possible circular locking dependency detected
[    6.221755] 4.14.0-rc4 #54 Not tainted
[    6.225503] ------------------------------------------------------
[    6.231689] kworker/4:0H/32 is trying to acquire lock:
[    6.236830]  ((&task->u.tk_work)){+.+.}, at: [<ffff0000080e64cc>] process_one_work+0x1cc/0x3f0
[    6.245472] 
               but task is already holding lock:
[    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>] process_one_work+0x1cc/0x3f0
[    6.259158] 
               which lock already depends on the new lock.

[    6.267345] 
               the existing dependency chain (in reverse order) is:
[    6.274836] 
               -> #1 ("xprtiod"){+.+.}:
[    6.279903]        lock_acquire+0x6c/0xb8
[    6.283914]        flush_work+0x188/0x270
[    6.287926]        __cancel_work_timer+0x120/0x198
[    6.292720]        cancel_work_sync+0x10/0x18
[    6.297081]        xs_destroy+0x34/0x58
[    6.300917]        xprt_destroy+0x84/0x90
[    6.304927]        xprt_put+0x34/0x40
[    6.308589]        rpc_task_release_client+0x6c/0x80
[    6.313557]        rpc_release_resources_task+0x2c/0x38
[    6.318786]        __rpc_execute+0x9c/0x210
[    6.322971]        rpc_async_schedule+0x10/0x18
[    6.327504]        process_one_work+0x240/0x3f0
[    6.332036]        worker_thread+0x48/0x420
[    6.336222]        kthread+0x12c/0x158
[    6.339972]        ret_from_fork+0x10/0x18
[    6.344068] 
               -> #0 ((&task->u.tk_work)){+.+.}:
[    6.349920]        __lock_acquire+0x12ec/0x14a8
[    6.354451]        lock_acquire+0x6c/0xb8
[    6.358462]        process_one_work+0x22c/0x3f0
[    6.362994]        worker_thread+0x48/0x420
[    6.367180]        kthread+0x12c/0x158
[    6.370929]        ret_from_fork+0x10/0x18
[    6.375025] 
               other info that might help us debug this:

[    6.383038]  Possible unsafe locking scenario:

[    6.388962]        CPU0                    CPU1
[    6.393493]        ----                    ----
[    6.398023]   lock("xprtiod");
[    6.401080]                                lock((&task->u.tk_work));
[    6.407444]                                lock("xprtiod");
[    6.413024]   lock((&task->u.tk_work));
[    6.416863] 
                *** DEADLOCK ***

[    6.422789] 1 lock held by kworker/4:0H/32:
[    6.426972]  #0:  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>] process_one_work+0x1cc/0x3f0
[    6.435258] 
               stack backtrace:
[    6.439618] CPU: 4 PID: 32 Comm: kworker/4:0H Not tainted 4.14.0-rc4 #54
[    6.446325] Hardware name: ARM Juno development board (r2) (DT)
[    6.452252] Workqueue: xprtiod rpc_async_schedule
[    6.456959] Call trace:
[    6.459406] [<ffff000008089430>] dump_backtrace+0x0/0x3c8
[    6.464810] [<ffff00000808980c>] show_stack+0x14/0x20
[    6.469866] [<ffff000008a01a30>] dump_stack+0xb8/0xf0
[    6.474922] [<ffff0000081194ac>] print_circular_bug+0x224/0x3a0
[    6.480849] [<ffff00000811a304>] check_prev_add+0x304/0x860
[    6.486426] [<ffff00000811c8c4>] __lock_acquire+0x12ec/0x14a8
[    6.492177] [<ffff00000811d144>] lock_acquire+0x6c/0xb8
[    6.497406] [<ffff0000080e652c>] process_one_work+0x22c/0x3f0
[    6.503156] [<ffff0000080e6738>] worker_thread+0x48/0x420
[    6.508560] [<ffff0000080ed5bc>] kthread+0x12c/0x158
[    6.513528] [<ffff000008084d48>] ret_from_fork+0x10/0x18


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-09 18:17 net/sunrpc: v4.14-rc4 lockdep warning Lorenzo Pieralisi
@ 2017-10-09 18:32   ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2017-10-09 18:32 UTC (permalink / raw)
  To: linux-kernel, lorenzo.pieralisi, linux-nfs, jiangshanlai, tj
  Cc: bfields, anna.schumaker, jlayton

On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> Hi,
> 
> I have run into the lockdep warning below while running v4.14-rc3/rc4
> on an ARM64 defconfig Juno dev board - reporting it to check whether
> it is a known/genuine issue.
> 
> Please let me know if you need further debug data or need some
> specific tests.
> 
> Thanks,
> Lorenzo
> 
> [    6.209384] ======================================================
> [    6.215569] WARNING: possible circular locking dependency detected
> [    6.221755] 4.14.0-rc4 #54 Not tainted
> [    6.225503] ------------------------------------------------------
> [    6.231689] kworker/4:0H/32 is trying to acquire lock:
> [    6.236830]  ((&task->u.tk_work)){+.+.}, at: [<ffff0000080e64cc>]
> process_one_work+0x1cc/0x3f0
> [    6.245472] 
>                but task is already holding lock:
> [    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> process_one_work+0x1cc/0x3f0
> [    6.259158] 
>                which lock already depends on the new lock.
> 
> [    6.267345] 
>                the existing dependency chain (in reverse order) is:
> [    6.274836] 
>                -> #1 ("xprtiod"){+.+.}:
> [    6.279903]        lock_acquire+0x6c/0xb8
> [    6.283914]        flush_work+0x188/0x270
> [    6.287926]        __cancel_work_timer+0x120/0x198
> [    6.292720]        cancel_work_sync+0x10/0x18
> [    6.297081]        xs_destroy+0x34/0x58
> [    6.300917]        xprt_destroy+0x84/0x90
> [    6.304927]        xprt_put+0x34/0x40
> [    6.308589]        rpc_task_release_client+0x6c/0x80
> [    6.313557]        rpc_release_resources_task+0x2c/0x38
> [    6.318786]        __rpc_execute+0x9c/0x210
> [    6.322971]        rpc_async_schedule+0x10/0x18
> [    6.327504]        process_one_work+0x240/0x3f0
> [    6.332036]        worker_thread+0x48/0x420
> [    6.336222]        kthread+0x12c/0x158
> [    6.339972]        ret_from_fork+0x10/0x18
> [    6.344068] 
>                -> #0 ((&task->u.tk_work)){+.+.}:
> [    6.349920]        __lock_acquire+0x12ec/0x14a8
> [    6.354451]        lock_acquire+0x6c/0xb8
> [    6.358462]        process_one_work+0x22c/0x3f0
> [    6.362994]        worker_thread+0x48/0x420
> [    6.367180]        kthread+0x12c/0x158
> [    6.370929]        ret_from_fork+0x10/0x18
> [    6.375025] 
>                other info that might help us debug this:
> 
> [    6.383038]  Possible unsafe locking scenario:
> 
> [    6.388962]        CPU0                    CPU1
> [    6.393493]        ----                    ----
> [    6.398023]   lock("xprtiod");
> [    6.401080]                                lock((&task-
> >u.tk_work));
> [    6.407444]                                lock("xprtiod");
> [    6.413024]   lock((&task->u.tk_work));
> [    6.416863] 
>                 *** DEADLOCK ***
> 
> [    6.422789] 1 lock held by kworker/4:0H/32:
> [    6.426972]  #0:  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> process_one_work+0x1cc/0x3f0
> [    6.435258] 
>                stack backtrace:
> [    6.439618] CPU: 4 PID: 32 Comm: kworker/4:0H Not tainted 4.14.0-
> rc4 #54
> [    6.446325] Hardware name: ARM Juno development board (r2) (DT)
> [    6.452252] Workqueue: xprtiod rpc_async_schedule
> [    6.456959] Call trace:
> [    6.459406] [<ffff000008089430>] dump_backtrace+0x0/0x3c8
> [    6.464810] [<ffff00000808980c>] show_stack+0x14/0x20
> [    6.469866] [<ffff000008a01a30>] dump_stack+0xb8/0xf0
> [    6.474922] [<ffff0000081194ac>] print_circular_bug+0x224/0x3a0
> [    6.480849] [<ffff00000811a304>] check_prev_add+0x304/0x860
> [    6.486426] [<ffff00000811c8c4>] __lock_acquire+0x12ec/0x14a8
> [    6.492177] [<ffff00000811d144>] lock_acquire+0x6c/0xb8
> [    6.497406] [<ffff0000080e652c>] process_one_work+0x22c/0x3f0
> [    6.503156] [<ffff0000080e6738>] worker_thread+0x48/0x420
> [    6.508560] [<ffff0000080ed5bc>] kthread+0x12c/0x158
> [    6.513528] [<ffff000008084d48>] ret_from_fork+0x10/0x18
> 

Adding Tejun and Lai, since this looks like a workqueue locking issue.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-09 18:32   ` Trond Myklebust
@ 2017-10-10 14:03   ` tj
  2017-10-10 16:48       ` Trond Myklebust
  -1 siblings, 1 reply; 10+ messages in thread
From: tj @ 2017-10-10 14:03 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: linux-kernel, lorenzo.pieralisi, linux-nfs, jiangshanlai,
	bfields, anna.schumaker, jlayton

Hello, Trond.

On Mon, Oct 09, 2017 at 06:32:13PM +0000, Trond Myklebust wrote:
> On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> > I have run into the lockdep warning below while running v4.14-rc3/rc4
> > on an ARM64 defconfig Juno dev board - reporting it to check whether
> > it is a known/genuine issue.
> > 
> > Please let me know if you need further debug data or need some
> > specific tests.
> > 
> > [    6.209384] ======================================================
> > [    6.215569] WARNING: possible circular locking dependency detected
> > [    6.221755] 4.14.0-rc4 #54 Not tainted
> > [    6.225503] ------------------------------------------------------
> > [    6.231689] kworker/4:0H/32 is trying to acquire lock:
> > [    6.236830]  ((&task->u.tk_work)){+.+.}, at: [<ffff0000080e64cc>]
> > process_one_work+0x1cc/0x3f0
> > [    6.245472] 
> >                but task is already holding lock:
> > [    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> > process_one_work+0x1cc/0x3f0
> > [    6.259158] 
> >                which lock already depends on the new lock.
> > 
> > [    6.267345] 
> >                the existing dependency chain (in reverse order) is:
..
> Adding Tejun and Lai, since this looks like a workqueue locking issue.

It looks a bit cryptic but it's warning against the following case.

1. Memory pressure is high and rescuer kicks in for the xprtiod
   workqueue.  There are no other kworkers serving the workqueue.

2. The rescuer runs the xprt_destroy path and ends up calling
   cancel_work_sync() on a work item which is queued on xprtiod.

3. The work item is pending on the same workqueue and assuming that
   memory pressure doesn't let off (let's say reclaim is trying to
   kick off nfs pages), the only way it can get executed is by the
   rescuer, which is waiting for the work item - an A-B-A deadlock; see
   the sketch below.
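
A minimal, self-contained sketch of the pattern being flagged; the names
(demo_wq, cleanup_work, async_fn) are made up and only mirror the roles of
xprtiod, xprt->task_cleanup and rpc_async_schedule() - this is illustrative,
not the actual sunrpc code.

#include <linux/module.h>
#include <linux/workqueue.h>

/* Stand-in for xprtiod: WQ_MEM_RECLAIM is what gives it a rescuer. */
static struct workqueue_struct *demo_wq;

/* Stand-in for the work item that the destroy path cancels. */
static struct work_struct cleanup_work;

static void cleanup_fn(struct work_struct *work)
{
        /* tear down transport state */
}

/* Stand-in for rpc_async_schedule(), which runs on the same workqueue. */
static void async_fn(struct work_struct *work)
{
        /*
         * The RPC task drops its last xprt reference here, and the
         * destroy path calls cancel_work_sync() on a work item queued
         * on the very same workqueue.  If the rescuer running async_fn()
         * is the only worker left (sustained memory pressure), nothing
         * can ever execute cleanup_work, so this call never returns.
         */
        cancel_work_sync(&cleanup_work);
}

static DECLARE_WORK(async_work, async_fn);

static int __init demo_init(void)
{
        demo_wq = alloc_workqueue("demo_xprtiod", WQ_MEM_RECLAIM, 0);
        if (!demo_wq)
                return -ENOMEM;
        INIT_WORK(&cleanup_work, cleanup_fn);
        queue_work(demo_wq, &async_work);
        /* later, some other path queues the cleanup work as well: */
        queue_work(demo_wq, &cleanup_work);
        return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");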

Thanks.

-- 
tejun


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-10 14:03   ` tj
@ 2017-10-10 16:48       ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2017-10-10 16:48 UTC (permalink / raw)
  To: tj
  Cc: bfields, linux-kernel, lorenzo.pieralisi, jlayton, linux-nfs,
	jiangshanlai, anna.schumaker

On Tue, 2017-10-10 at 07:03 -0700, tj@kernel.org wrote:
> Hello, Trond.
> 
> On Mon, Oct 09, 2017 at 06:32:13PM +0000, Trond Myklebust wrote:
> > On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> > > I have run into the lockdep warning below while running v4.14-
> > > rc3/rc4
> > > on an ARM64 defconfig Juno dev board - reporting it to check
> > > whether
> > > it is a known/genuine issue.
> > > 
> > > Please let me know if you need further debug data or need some
> > > specific tests.
> > > 
> > > [    6.209384]
> > > ======================================================
> > > [    6.215569] WARNING: possible circular locking dependency
> > > detected
> > > [    6.221755] 4.14.0-rc4 #54 Not tainted
> > > [    6.225503] --------------------------------------------------
> > > ----
> > > [    6.231689] kworker/4:0H/32 is trying to acquire lock:
> > > [    6.236830]  ((&task->u.tk_work)){+.+.}, at:
> > > [<ffff0000080e64cc>]
> > > process_one_work+0x1cc/0x3f0
> > > [    6.245472] 
> > >                but task is already holding lock:
> > > [    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> > > process_one_work+0x1cc/0x3f0
> > > [    6.259158] 
> > >                which lock already depends on the new lock.
> > > 
> > > [    6.267345] 
> > >                the existing dependency chain (in reverse order)
> > > is:
> 
> ..
> > Adding Tejun and Lai, since this looks like a workqueue locking
> > issue.
> 
> It looks a bit cryptic but it's warning against the following case.
> 
> 1. Memory pressure is high and rescuer kicks in for the xprtiod
>    workqueue.  There are no other kworkers serving the workqueue.
> 
> 2. The rescuer runs the xprt_destroy path and ends up calling
>    cancel_work_sync() on a work item which is queued on xprtiod.
> 
> 3. The work item is pending on the same workqueue and assuming that
>    memory pressure doesn't let off (let's say reclaim is trying to
>    kick off nfs pages), the only way it can get executed is by the
>    rescuer which is waiting for the work item - an A-B-A deadlock.
> 

Hi Tejun,

Thanks for the explanation. What I'm not really understanding here,
though, is how the work item could be queued at all. We have a
wait_on_bit_lock() in xprt_destroy() that should mean the
xprt->task_cleanup work item has completed running, and that it cannot
be requeued.

Is there a possibility that the flush_queue() might be triggered
despite the work item not being queued?
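
For reference, the ordering being relied on looks roughly like this; the
structure approximates the v4.14-era net/sunrpc/xprt.c rather than quoting
it, so treat the details as a sketch:

static void xprt_destroy(struct rpc_xprt *xprt)
{
        /*
         * xprt->task_cleanup only runs (and is only requeued) while
         * XPRT_LOCKED is held, so taking the bit here should guarantee
         * that the work item has finished and cannot be queued again.
         */
        wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE);

        /* ...timers and wait queues are torn down here... */

        xprt->ops->destroy(xprt);       /* -> xs_destroy() -> cancel_work_sync() */
}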

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-10 16:48       ` Trond Myklebust
@ 2017-10-10 17:19       ` tj
  2017-10-11 17:49           ` Trond Myklebust
  -1 siblings, 1 reply; 10+ messages in thread
From: tj @ 2017-10-10 17:19 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: bfields, linux-kernel, lorenzo.pieralisi, jlayton, linux-nfs,
	jiangshanlai, anna.schumaker

Hello,

On Tue, Oct 10, 2017 at 04:48:57PM +0000, Trond Myklebust wrote:
> Thanks for the explanation. What I'm not really understanding here
> though, is how the work item could be queued at all. We have a
> wait_on_bit_lock() in xprt_destroy() that should mean the xprt-
> >task_cleanup work item has completed running, and that it cannot be
> requeued.
> 
> Is there a possibility that the flush_queue() might be triggered
> despite the work item not being queued?

Yeah, for sure.  The lockdep annotations don't distinguish those
cases and assume the worst case.
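
Paraphrasing kernel/workqueue.c (not verbatim), the relevant annotations
look roughly like this, which is why the dependency can be recorded even
when the item is not actually queued at that point:

        /* worker side: the wq's map, then the work item's map */
        process_one_work(worker, work)
        {
                lock_map_acquire(&pwq->wq->lockdep_map);   /* "xprtiod"          */
                lock_map_acquire(&work->lockdep_map);      /* (&task->u.tk_work) */
                ...
                worker->current_func(work);
        }

        /* flush/cancel side (cancel_work_sync() -> flush_work()): the work
         * item's map is taken up front, and because xprtiod has a rescuer
         * the workqueue's map is taken as well */
        flush_work(work)
        {
                lock_map_acquire(&work->lockdep_map);
                lock_map_release(&work->lockdep_map);
                ...
                lock_map_acquire(&pwq->wq->lockdep_map);   /* "xprtiod" */
                lock_map_release(&pwq->wq->lockdep_map);
                ...
        }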

Thanks.

-- 
tejun


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-10 17:19       ` tj
@ 2017-10-11 17:49           ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2017-10-11 17:49 UTC (permalink / raw)
  To: tj
  Cc: bfields, linux-kernel, lorenzo.pieralisi, jlayton, linux-nfs,
	jiangshanlai, anna.schumaker

On Tue, 2017-10-10 at 10:19 -0700, tj@kernel.org wrote:
> Hello,
> 
> On Tue, Oct 10, 2017 at 04:48:57PM +0000, Trond Myklebust wrote:
> > Thanks for the explanation. What I'm not really understanding here
> > though, is how the work item could be queued at all. We have a
> > wait_on_bit_lock() in xprt_destroy() that should mean the xprt-
> > > task_cleanup work item has completed running, and that it cannot
> > > be
> > 
> > requeued.
> > 
> > Is there a possibility that the flush_queue() might be triggered
> > despite the work item not being queued?
> 
> Yeah, for sure.  The lockdep annotations don't distinguish those
> cases and assume the worst case.
> 

OK. Let's just remove that call to cancel_work_sync() then. As I said,
it should be redundant due to the wait_on_bit_lock().
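
Something along these lines, as an untested sketch of the proposed change
(the surrounding xs_destroy() body and the name of the work item being
cancelled are approximated from v4.14, not a formal patch):

 static void xs_destroy(struct rpc_xprt *xprt)
 {
         struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);

         cancel_delayed_work_sync(&transport->connect_worker);
         xs_close(xprt);
-        cancel_work_sync(&transport->recv_worker);
         xs_xprt_free(xprt);
         module_put(THIS_MODULE);
 }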

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: net/sunrpc: v4.14-rc4 lockdep warning
  2017-10-11 17:49           ` Trond Myklebust
@ 2017-10-16 13:34           ` Jan Glauber
  -1 siblings, 0 replies; 10+ messages in thread
From: Jan Glauber @ 2017-10-16 13:34 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: tj, bfields, linux-kernel, lorenzo.pieralisi, jlayton, linux-nfs,
	jiangshanlai, anna.schumaker

Hi Trond,

is there a patch available for this issue? I'm running into it with
4.14-rc5 on my ARM64 board.

thanks, Jan

2017-10-11 19:49 GMT+02:00 Trond Myklebust <trondmy@primarydata.com>:
> On Tue, 2017-10-10 at 10:19 -0700, tj@kernel.org wrote:
>> Hello,
>>
>> On Tue, Oct 10, 2017 at 04:48:57PM +0000, Trond Myklebust wrote:
>> > Thanks for the explanation. What I'm not really understanding here
>> > though, is how the work item could be queued at all. We have a
>> > wait_on_bit_lock() in xprt_destroy() that should mean the xprt-
>> > > task_cleanup work item has completed running, and that it cannot
>> > > be
>> >
>> > requeued.
>> >
>> > Is there a possibility that the flush_queue() might be triggered
>> > despite the work item not being queued?
>>
>> Yeah, for sure.  The lockdep annotations don't distinguish those
>> cases and assume the worst case.
>>
>
> OK. Let's just remove that call to cancel_work_sync() then. As I said,
> it should be redundant due to the wait_on_bit_lock().
>
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
