From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB2FCC4363A for ; Mon, 26 Oct 2020 11:26:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 73F2F2464E for ; Mon, 26 Oct 2020 11:26:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=suse.com header.i=@suse.com header.b="MQo0xXqT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1773791AbgJZL0w (ORCPT ); Mon, 26 Oct 2020 07:26:52 -0400 Received: from mx2.suse.de ([195.135.220.15]:49252 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1773761AbgJZL0v (ORCPT ); Mon, 26 Oct 2020 07:26:51 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1603711610; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=jvjWuxh1rJEFJZVLVpImzX0aYv344XBeVk5if5mZwXM=; b=MQo0xXqTGIT4cQPVjeQuE4i1Lfz91bgQjv8yduRJI/rYkMkyOpWGbgYRnZfyuwArNyCkwR VaNWu7388AU3a6a/UKrN951e88GZcyEPRwn9u+4DgGdOeyQd1nKKqEh1rFLueApw1aTe+W /fBKLZsA6GjJl+a92tuj1l+8ZDrucRc= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 4AF87AC98; Mon, 26 Oct 2020 11:26:50 +0000 (UTC) From: Filipe Manana Subject: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion") To: LKML Cc: peterz@infradead.org, Jan Kara , David Sterba Message-ID: Date: Mon, 26 Oct 2020 11:26:49 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, I've recently started to hit a warning followed by tasks hanging after attempts to freeze a filesystem. A git bisection pointed to the following commit: commit 4d004099a668c41522242aa146a38cc4eb59cb1e Author: Peter Zijlstra Date: Fri Oct 2 11:04:21 2020 +0200 lockdep: Fix lockdep recursion This happens very reliably when running all xfstests with lockdep enabled, and the tested filesystem is btrfs (haven't tried other filesystems, but it shouldn't matter). The warning and task hangs always happen at either test generic/068 or test generic/390, and (oddly) always have to run all tests for it to trigger, running those tests individually on an infinite loop doesn't seem to trigger it (at least for a couple hours). The warning triggered is at fs/super.c:__sb_start_write() which always results later in several tasks hanging on a percpu rw_sem: https://pastebin.com/qnLvf94E What happens is percpu_rwsem_is_held() is apparently returning a false positive, so this makes __sb_start_write() do a percpu_down_read_trylock() on a percpu_rw_sem at a higher level, which is expected to always succeed, because if the calling task is holding a freeze percpu_rw_sem at level 1, it's supposed to be able to try_lock the semaphore at level 2, since the freeze semaphores are always acquired by increasing level order. But the try_lock fails, it triggers the warning at __sb_start_write(), then its caller sb_start_pagefault() ignores the return value and callers such as btrfs_page_mkwrite() make the assumption the freeze semaphore was taken, proceed to do their stuff, and later call sb_end_pagefault(), which which will do an up_read() on the percpu_rwsem at level 2 despite not having not been able to down_read() the semaphore. This obviously corrupts the semaphore's read_count state, and later causes any task trying to down_write() it to hang forever. After such a hang I ran a drgn script to confirm it: $ cat dump_freeze_sems.py import sys import drgn from drgn import NULL, Object, cast, container_of, execscript, \ reinterpret, sizeof from drgn.helpers.linux import * mnt_path = b'/home/fdmanana/btrfs-tests/scratch_1' mnt = None for mnt in for_each_mount(prog, dst = mnt_path): pass if mnt is None: sys.stderr.write(f'Error: mount point {mnt_path} not found\n') sys.exit(1) def dump_sem(level_enum): level = level_enum.value_() sem = mnt.mnt.mnt_sb.s_writers.rw_sem[level - 1] print(f'freeze semaphore at level {level}, {str(level_enum)}') print(f' block {sem.block.counter.value_()}') for i in for_each_possible_cpu(prog): read_count = per_cpu_ptr(sem.read_count, i) print(f' read_count at cpu {i} = {read_count}') print() # dump semaphore read counts for all freeze levels (fs.h) dump_sem(prog['SB_FREEZE_WRITE']) dump_sem(prog['SB_FREEZE_PAGEFAULT']) dump_sem(prog['SB_FREEZE_FS']) $ drgn dump_freeze_sems.py freeze semaphore at level 1, (enum )SB_FREEZE_WRITE block 1 read_count at cpu 0 = *(unsigned int *)0xffffc2ec3ee00c74 = 3 read_count at cpu 1 = *(unsigned int *)0xffffc2ec3f200c74 = 4294967293 read_count at cpu 2 = *(unsigned int *)0xffffc2ec3f600c74 = 3 read_count at cpu 3 = *(unsigned int *)0xffffc2ec3fa00c74 = 4294967293 freeze semaphore at level 2, (enum )SB_FREEZE_PAGEFAULT block 1 read_count at cpu 0 = *(unsigned int *)0xffffc2ec3ee00c78 = 0 read_count at cpu 1 = *(unsigned int *)0xffffc2ec3f200c78 = 4294967295 read_count at cpu 2 = *(unsigned int *)0xffffc2ec3f600c78 = 0 read_count at cpu 3 = *(unsigned int *)0xffffc2ec3fa00c78 = 0 freeze semaphore at level 3, (enum )SB_FREEZE_FS block 0 read_count at cpu 0 = *(unsigned int *)0xffffc2ec3ee00c7c = 0 read_count at cpu 1 = *(unsigned int *)0xffffc2ec3f200c7c = 0 read_count at cpu 2 = *(unsigned int *)0xffffc2ec3f600c7c = 0 read_count at cpu 3 = *(unsigned int *)0xffffc2ec3fa00c7c = 0 At levels 1 and 3, read_count sums to 0, so it's fine, but at level 2 it sums to -1. The system remains like that for hours at least, with no progress at all. Is there a known regression with that lockdep commit? Anything I can do to help debug it in case it's not obvious? Thanks.