On Tue, Aug 21 2018, Jeff Layton wrote:

> On Tue, 2018-08-21 at 15:11 +1000, NeilBrown wrote:
>> On Thu, Aug 16 2018, NeilBrown wrote:
>>
>> > On Wed, Aug 15 2018, Jeff Layton wrote:
>> >
>> > > On Wed, 2018-08-15 at 14:28 +0200, Krzysztof Kozlowski wrote:
>> > > > Hi,
>> > > >
>> > > > Bisect pointed to commit ce3147990450a68b3f549088b30f087742a08b5d
>> > > > ("fs/locks: allow a lock request to block other requests.") as the
>> > > > cause of boot failures with an NFSv4 root on several boards.
>> > > >
>> > > > Log is here:
>> > > > https://krzk.eu/#/builders/21/builds/836/steps/12/logs/serial0
>> > > >
>> > > > With several errors:
>> > > > kernel BUG at ../fs/locks.c:336!
>> > > > Unable to handle kernel NULL pointer dereference at virtual address 00000004
>> > > >
>> > > > Configuration:
>> > > > 1. exynos_defconfig
>> > > > 2. Arch ARM Linux
>> > > > 3. Boards:
>> > > >    a. Odroid family (ARMv7, octa-core (Cortex-A7+A15), Exynos5422 SoC)
>> > > >    b. Toradex Colibri VF50 (ARMv7, UP, Cortex-A5)
>> > > > 4. Systemd: v236, 238
>> > > > 5. All boards boot from TFTP with NFS root (NFSv4)
>> > > >
>> > > > On Colibri VF50 I got slightly different errors:
>> > > > [   11.663204] Internal error: Oops - undefined instruction: 0 [#1] ARM
>> > > > [   12.455273] Unable to handle kernel NULL pointer dereference at
>> > > > virtual address 00000004
>> > > > and only with a specific GCC (v6.3) or under other conditions which
>> > > > I have not bisected yet. Maybe Colibri's failure is unrelated to that
>> > > > commit.
>> > > >
>> > > > Best regards,
>> > > > Krzysztof
>> >
>> > Thanks a lot for the report Krzysztof!!
>> >
>> > >
>> > > The BUG is due to a lock being freed when the fl_blocked list wasn't
>> > > empty (implying that there were still blocked locks waiting on it).
>> > >
>> > > There are a number of calls to locks_delete_lock_ctx in posix_lock_inode
>> > > and I don't think the fl_blocked list is being handled properly with all
>> > > of them.
>> > > It only transplants the blocked locks to a new lock when there
>> > > are surviving locks on the list, and that may not be the case when the
>> > > whole file is being unlocked.
>> >
>> > locks_delete_lock_ctx() calls locks_unlink_lock_ctx() which calls
>> > locks_wake_up_block(), which not only wakes up the blocked requests but
>> > also detaches them. When that function completes, ->fl_blocked must be
>> > empty.
>> >
>> > The trace shows the locks_free_lock() call at the end of fcntl_setlk64()
>> > as the problematic call.
>> > This suggests that do_lock_file_wait() exited with ->fl_blocked
>> > non-empty, which it shouldn't.
>> >
>> > I think we need to insert a call to locks_wake_up_block() in
>> > do_lock_file_wait() before it returns.
>> > I cannot find a sequence that would make this necessary, but
>> > it isn't surprising that there might be one.
>> >
>> > I'll dig through the code a bit more later and make sure I understand
>> > what is happening.
>> >
>>
>> I think this problem is fixed by the following. It is probably
>> triggered when the owner already has a lock for part of the requested
>> range. After waiting for some other lock, the pending request gets
>> merged with the existing lock, and blocked requests aren't moved across
>> in that case.
>>
>> I haven't done any further testing yet, so this is just FYI, not a
>> submission.
>>
>> Thanks,
>> NeilBrown
>>
>> From: NeilBrown
>> Date: Tue, 21 Aug 2018 15:09:06 +1000
>> Subject: [PATCH] fs/locks: always delete_block after waiting.
>>
>> Now that requests can block other requests, we
>> need to be careful to always clean up those blocked
>> requests.
>> Any time that we wait for a request, we might have
>> other requests attached, and when we stop waiting,
>> we must clean them up.
>> If the lock was granted, the requests might have been
>> moved to the new lock, though when merged with a
>> pre-existing lock, this might not happen.
>> In all cases we don't want blocked locks to remain
>> attached, so we remove them to be safe.
>>
>> Signed-off-by: NeilBrown
>> ---
>>  fs/locks.c | 24 +++++++++---------------
>>  1 file changed, 9 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/locks.c b/fs/locks.c
>> index de38bafb7f7b..6b310112cf3b 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -1276,12 +1276,10 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>> -		if (!error)
>> -			continue;
>> -
>> -		locks_delete_block(fl);
>> -		break;
>> +		if (error)
>> +			break;
>>  	}
>> +	locks_delete_block(fl);
>>  	return error;
>>  }
>>
>> @@ -1971,12 +1969,10 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>> -		if (!error)
>> -			continue;
>> -
>> -		locks_delete_block(fl);
>> -		break;
>> +		if (error)
>> +			break;
>>  	}
>> +	locks_delete_block(fl);
>>  	return error;
>>  }
>>
>> @@ -2250,12 +2246,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
>>  		if (error != FILE_LOCK_DEFERRED)
>>  			break;
>>  		error = wait_event_interruptible(fl->fl_wait, !fl->fl_blocker);
>> -		if (!error)
>> -			continue;
>> -
>> -		locks_delete_block(fl);
>> -		break;
>> +		if (error)
>> +			break;
>>  	}
>> +	locks_delete_block(fl);
>>
>>  	return error;
>>  }
>
> Thanks Neil.
>
> FWIW, I was able to reproduce something that looked a lot like what
> Krzysztof reported by running the cthon04 lock tests on a client running
> the kernel with the original set.
>
> I applied the above patch on top of that set, reran the test and got a
> different BUG (list corruption):
>
> [   85.117307] ------------[ cut here ]------------
> [   85.118130] kernel BUG at lib/list_debug.c:53!
> [   85.118684] invalid opcode: 0000 [#1] SMP NOPTI
> [   85.119800] CPU: 5 PID: 92 Comm: kworker/u16:1 Not tainted 4.18.0+ #46
> [   85.120845] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
> [   85.122350] Workqueue: rpciod rpc_async_schedule [sunrpc]
> [   85.123242] RIP: 0010:__list_del_entry_valid.cold.1+0x34/0x4c
> [   85.124116] Code: 03 10 9b e8 08 07 ca ff 0f 0b 48 c7 c7 c8 03 10 9b e8 fa 06 ca ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 88 03 10 9b e8 e6 06 ca ff <0f> 0b 48 89 fe 48 c7 c7 50 03 10 9b e8 d5 06 ca ff 0f 0b 90 90 90
> [   85.126704] RSP: 0018:ffffa0fe0133bd90 EFLAGS: 00010246
> [   85.127382] RAX: 0000000000000054 RBX: ffff92bcf3a46ad8 RCX: 0000000000000000
> [   85.128322] RDX: 0000000000000000 RSI: ffff92bcffd56828 RDI: ffff92bcffd56828
> [   85.129251] RBP: ffff92bcf3a46b10 R08: 0000000000000000 R09: 0000000000aaaaaa
> [   85.130250] R10: 0000000000000000 R11: 0000000000000001 R12: ffff92bce2588618
> [   85.131230] R13: ffff92bcf3a45800 R14: ffffffffc06f9f60 R15: ffffffffc06f9f60
> [   85.132191] FS:  0000000000000000(0000) GS:ffff92bcffd40000(0000) knlGS:0000000000000000
> [   85.133296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   85.134088] CR2: 00007f16c1292008 CR3: 0000000138588000 CR4: 00000000000006e0
> [   85.134926] Call Trace:
> [   85.135251]  __locks_delete_block+0x3f/0x70
> [   85.135751]  locks_delete_block+0x25/0x30
> [   85.136259]  locks_lock_inode_wait+0x63/0x150
> [   85.136841]  ? nfs41_release_slot+0x98/0xd0 [nfsv4]
> [   85.137556]  nfs4_lock_done+0x1a2/0x1c0 [nfsv4]
> [   85.138272]  rpc_exit_task+0x2d/0x80 [sunrpc]
> [   85.138994]  __rpc_execute+0x7f/0x340 [sunrpc]
> [   85.139953]  process_one_work+0x1a1/0x350
> [   85.140678]  worker_thread+0x30/0x380
> [   85.141800]  ? wq_update_unbound_numa+0x1a0/0x1a0
> [   85.142904]  kthread+0x112/0x130
> [   85.143445]  ? kthread_create_worker_on_cpu+0x70/0x70
> [   85.144273]  ret_from_fork+0x22/0x40
> [   85.144859] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_conntrack nf_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc i2c_piix4 edac_mce_amd joydev pcspkr virtio_balloon xfs libcrc32c virtio_net net_failover virtio_console failover virtio_blk floppy qxl serio_raw drm_kms_helper ttm qemu_fw_cfg drm virtio_pci ata_generic pata_acpi virtio_rng virtio_ring virtio
> [   85.153426] ---[ end trace 63df06139208ee23 ]---
>

Oh dear.

nfs4_alloc_lockdata contains:

	memcpy(&p->fl, fl, sizeof(p->fl));

so any list_heads that are valid in fl will be invalid in p->fl.

Maybe I should initialize the relevant list_heads at the start of wait
functions.
I should look more closely at what filesystems do with locks though.

Thanks,
NeilBrown
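To illustrate why the patch above moves locks_delete_block() out of the error path, here is a small userspace model. It is a sketch built on stated assumptions, not kernel code. struct req, struct scenario, block_on(), unblock(), delete_block(), try_lock(), wait_model() and DEFERRED (standing in for FILE_LOCK_DEFERRED) are all hypothetical. delete_block() assumes that locks_delete_block() detaches both the request itself and anything still queued on it, as the commit message describes. The scenario is the one suspected above:

1. The request is deferred once.
2. Another request queues behind it while it sleeps.
3. The retry is granted by merging with a pre-existing lock, so nothing moves the queued request elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#define DEFERRED 1 /* hypothetical stand-in for FILE_LOCK_DEFERRED */

/* Hypothetical, much-simplified lock request; NOT struct file_lock. */
struct req {
	struct req *blocker;    /* request we are waiting on, or NULL */
	struct req *waiters[4]; /* requests waiting on us (cf. fl_blocked) */
	int nwaiters;
};

static void block_on(struct req *blocker, struct req *waiter)
{
	waiter->blocker = blocker;
	blocker->waiters[blocker->nwaiters++] = waiter;
}

/* Detach r from the request it waits on, if any (what a wake-up does). */
static void unblock(struct req *r)
{
	struct req *b = r->blocker;
	for (int i = 0; b && i < b->nwaiters; i++) {
		if (b->waiters[i] == r) {
			b->waiters[i] = b->waiters[--b->nwaiters];
			break;
		}
	}
	r->blocker = NULL;
}

/* Model of locks_delete_block() as the commit message describes it:
 * detach r from its blocker and detach anything still waiting on r. */
static void delete_block(struct req *r)
{
	unblock(r);
	while (r->nwaiters > 0)
		unblock(r->waiters[0]);
}

/* First attempt is deferred behind 'holder'; the retry is granted by
 * merging with a pre-existing lock, which leaves our waiters in place. */
struct scenario {
	struct req *holder, *late;
	int attempts;
};

static int try_lock(struct scenario *s, struct req *fl)
{
	if (s->attempts++ == 0) {
		block_on(s->holder, fl);
		return DEFERRED;
	}
	return 0;
}

/* Stand-in for wait_event_interruptible(): while we sleep, 'late'
 * queues behind us; then the holder releases and wakes us. */
static int wait_model(struct scenario *s, struct req *fl)
{
	block_on(fl, s->late);
	unblock(fl);
	return 0; /* not interrupted */
}

/* Loop shape before the patch: delete_block() only on error. */
static int wait_loop_old(struct scenario *s, struct req *fl)
{
	int error;
	for (;;) {
		error = try_lock(s, fl);
		if (error != DEFERRED)
			break;
		error = wait_model(s, fl);
		if (!error)
			continue;
		delete_block(fl);
		break;
	}
	return error;
}

/* Loop shape after the patch: delete_block() always runs. */
static int wait_loop_new(struct scenario *s, struct req *fl)
{
	int error;
	for (;;) {
		error = try_lock(s, fl);
		if (error != DEFERRED)
			break;
		error = wait_model(s, fl);
		if (error)
			break;
	}
	delete_block(fl);
	return error;
}
```

In this model, the old loop shape returns success with a waiter still attached to the granted request. That is the state in which freeing the request would hit the fl_blocked BUG described above. The new shape detaches the waiter before returning.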
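The memcpy() problem can be shown in a few lines. A struct list_head stores pointers to its neighbours, so a byte-for-byte copy gets pointers into the original's list, and those neighbours do not point back at the copy. The following is a userspace sketch, not kernel code. struct fake_lock, fake_lock_init(), copy_lock(), entry_valid() and the list helpers are hypothetical stand-ins that only mimic <linux/list.h>. entry_valid() is in the spirit of the CONFIG_DEBUG_LIST checks in lib/list_debug.c that fired in the trace above. The reinit flag models re-initialising the copy's list_heads, one reading of the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal stand-in for the kernel's struct list_head (circular, doubly linked). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* An entry is consistent only if both neighbours point back at it; this is
 * the kind of check CONFIG_DEBUG_LIST performs before list_del(). */
static bool entry_valid(const struct list_head *e)
{
	return e->prev->next == e && e->next->prev == e;
}

/* Hypothetical stand-in for struct file_lock with two embedded lists. */
struct fake_lock {
	long start, end;
	struct list_head blocked; /* head: requests waiting on us (cf. fl_blocked) */
	struct list_head member;  /* entry: our place on a blocker's list */
};

static void fake_lock_init(struct fake_lock *fl, long start, long end)
{
	fl->start = start;
	fl->end = end;
	init_list_head(&fl->blocked);
	init_list_head(&fl->member);
}

/* Byte-for-byte copy, as memcpy(&p->fl, fl, sizeof(p->fl)) does. With
 * reinit, the copy's list_heads are reset to point at the copy itself,
 * so the copy starts out empty and on no list. */
static void copy_lock(struct fake_lock *dst, const struct fake_lock *src,
		      bool reinit)
{
	memcpy(dst, src, sizeof(*dst));
	if (reinit) {
		init_list_head(&dst->blocked);
		init_list_head(&dst->member);
	}
}
```

In this model, the plain copy's blocked head does not look empty because it points into the original. Its member entry also fails the neighbour check, which is the condition the debug list code reports as corruption. The re-initialised copy passes both checks and keeps its range fields. The thread leaves open where the kernel should actually do this re-initialisation.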