From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7bebf860-2415-7eb6-55a1-47dc4439d9e9@I-love.SAKURA.ne.jp>
Date: Thu, 27 Jan 2022 10:05:45 +0900
Subject: Re: yet another approach to fix loop autoclear for xfstests xfs/049
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Christoph Hellwig, Jens Axboe
Cc: Jan Kara, linux-block@vger.kernel.org, Ming Lei
References: <20220126155040.1190842-1-hch@lst.de>
In-Reply-To: <20220126155040.1190842-1-hch@lst.de>
X-Mailing-List: linux-block@vger.kernel.org

On 2022/01/27 0:50, Christoph Hellwig wrote:
> Hi Jens, hi Tetsuo,
>
> this series uses the approach from Tetsuo to delay the destroy_workqueue
> call, extended by a delayed teardown of the workers to fix a potential
> race window where the workqueue can still be around after finishing the
> commands. It then also removes the queue freezing in release, which is
> not needed to fix the dependency chain (which can't be reported by
> lockdep) as well.

I want to remove the disk->open_mutex => lo->lo_mutex => I/O completion
chain itself from /proc/lockdep. Even if a real deadlock cannot happen
thanks to the lo->lo_refcnt exclusion, I consider it a bad sign that the
disk->open_mutex => lo->lo_mutex dependency gets recorded at all; it
makes it harder to spot a real deadlock possibility when things change
in the future. I consider that lo_release() is doing too many things
under disk->open_mutex.

I tried to defer taking lo->lo_mutex in lo_release() as much as
possible, but this chain still remains.
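To make the chain concrete, this is roughly the dependency that gets
recorded (my own summary, not verbatim lockdep output):

  disk->open_mutex      (held by the block layer around lo_open()/lo_release())
    => lo->lo_mutex     (taken by lo_release() on the final close)
      => I/O completion (__loop_clr_fd() waits for in-flight commands)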
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index cf7830a68113..a9abd213b38d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1706,12 +1706,19 @@ static int lo_open(struct block_device *bdev, fmode_t mode)
 	struct loop_device *lo = bdev->bd_disk->private_data;
 	int err;
 
+	/*
+	 * Since lo_open() and lo_release() are serialized by disk->open_mutex,
+	 * we can racelessly increment/decrement lo->lo_refcnt without holding
+	 * lo->lo_mutex.
+	 */
 	if (atomic_inc_return(&lo->lo_refcnt) > 1)
 		return 0;
 
 	err = mutex_lock_killable(&lo->lo_mutex);
-	if (err)
+	if (err) {
+		atomic_dec(&lo->lo_refcnt);
 		return err;
+	}
 	if (lo->lo_state == Lo_deleting) {
 		atomic_dec(&lo->lo_refcnt);
 		err = -ENXIO;
@@ -1727,16 +1734,18 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
 	if (!atomic_dec_and_test(&lo->lo_refcnt))
 		return;
 
-	mutex_lock(&lo->lo_mutex);
-	if (lo->lo_state == Lo_bound &&
-	    (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
-		lo->lo_state = Lo_rundown;
-		mutex_unlock(&lo->lo_mutex);
-		__loop_clr_fd(lo, true);
+	/*
+	 * Since lo_open() and lo_release() are serialized by disk->open_mutex,
+	 * and lo->lo_refcnt == 0 means nobody is using this device, we can
+	 * read lo->lo_state and lo->lo_flags without holding lo->lo_mutex.
+	 */
+	if (lo->lo_state != Lo_bound || !(lo->lo_flags & LO_FLAGS_AUTOCLEAR))
 		return;
-	}
-
+	mutex_lock(&lo->lo_mutex);
+	lo->lo_state = Lo_rundown;
 	mutex_unlock(&lo->lo_mutex);
+	__loop_clr_fd(lo, true);
+	return;
 }
 
 static const struct block_device_operations lo_fops = {

In __loop_clr_fd(), we wait for loop_validate_mutex, and
loop_validate_mutex can be held while loop_change_fd() calls
blk_mq_freeze_queue().

Loop recursion interacts with other loop devices. While each loop device
takes care of only a single backing file, we can use multiple loop
devices for multiple backing files within the same mount point (e.g.
/dev/loop0 attached to /mnt/file0 and /dev/loop1 attached to
/mnt/file1), can't we? But fsfreeze works per mount point (e.g. /mnt/).
That is, fsfreeze also interacts with other loop devices, doesn't it?

disk->open_mutex serializes per loop device, but loop_validate_mutex and
fsfreeze serialize globally. Nesting global serialization under
per-device serialization worries me, when we already know that the
combination of sysfs + fsfreeze opens a possibility of deadlock. I
expect lo_open()/lo_release() to be done without holding
disk->open_mutex.
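One way to realize that (a rough sketch of the direction I have in mind,
not a tested patch; the rundown_work member and loop_rundown_workfn()
are names I made up for this example) would be to defer the autoclear
work to a workqueue, so that waiting for loop_validate_mutex and for I/O
completion happens outside disk->open_mutex:

/*
 * Assumed: a work item added to struct loop_device, set up with
 * INIT_WORK(&lo->rundown_work, loop_rundown_workfn) at allocation time.
 */
static void loop_rundown_workfn(struct work_struct *work)
{
	struct loop_device *lo = container_of(work, struct loop_device,
					      rundown_work);

	/* Runs in workqueue context; disk->open_mutex is not held here. */
	__loop_clr_fd(lo, true);
}

static void lo_release(struct gendisk *disk, fmode_t mode)
{
	struct loop_device *lo = disk->private_data;

	if (!atomic_dec_and_test(&lo->lo_refcnt))
		return;
	if (lo->lo_state != Lo_bound || !(lo->lo_flags & LO_FLAGS_AUTOCLEAR))
		return;

	mutex_lock(&lo->lo_mutex);
	lo->lo_state = Lo_rundown;
	mutex_unlock(&lo->lo_mutex);

	/*
	 * Defer __loop_clr_fd() so that lo_release() returns without
	 * sleeping on loop_validate_mutex or I/O completion while the
	 * caller holds disk->open_mutex.
	 */
	queue_work(system_long_wq, &lo->rundown_work);
}

A real version would also have to make lo_open() cope with a device
still sitting in Lo_rundown state, which this sketch glosses over.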