From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5F33C001DD for ; Wed, 28 Jun 2023 21:23:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230451AbjF1VW6 (ORCPT ); Wed, 28 Jun 2023 17:22:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232315AbjF1VWO (ORCPT ); Wed, 28 Jun 2023 17:22:14 -0400 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D66053A81 for ; Wed, 28 Jun 2023 14:17:45 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1b7e31154c4so660695ad.1 for ; Wed, 28 Jun 2023 14:17:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20221208.gappssmtp.com; s=20221208; t=1687987065; x=1690579065; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=aQziuYew91uw/0bw2Pl0Zc1BFg+GVKPj9CR2YVGi5Aw=; b=t0JfEyC4VCwk0TDX5fgFzNA1W/AC0cp+cY4pSXiKLQ5cRkjoC9Auv7zdmpsRdNo9SL Kev05hC8jnTwSuWnzLfDnB6qHIV9Z8QOZXBMldUNiWSTyayPXkFgdyiqjmSkU+G9E42G am0pzlVPpYFnCh8DBcK8JMdjVbD4Ok1bKqAzdvVmlmSfcd1v4kyA6LYjWIV1e6mEW6yL lc6hKXH4wIYu64cAzVsLS+L09TZ6qCT7aWT0XFzXtyEPKybvsimdjobwgPI+BVKSHhJG VGczUOtqgaziUl4AAlB/DnyZZ2Fj3k1LyDhLomXMrDgf2BaaIP5FAkvxa7yd67z+BSMQ pFWw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687987065; x=1690579065; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=aQziuYew91uw/0bw2Pl0Zc1BFg+GVKPj9CR2YVGi5Aw=; b=OKAmMlseGC+XWH/+ssc33+mQq0MRIVuVfE4cB2MiY++LW8tqsoFoe/l2jbM3hcuS9f OXkIXDSbtxDdxiywk/q0cSbm0oV8sf8yn+bLU+6LtMMzVyq8jsWpumKwrCNI7Ht/PBnm 1CQfYLb/Ml/sWDsvOewPSXeZHdsW4PwgquLGz9JuA1MtJGtQq5DAtq5rJ0UcZdZeKSTN OButo5N9boi+kgn0aezY8iU6c+9wBZuSP1Ho8ILScRhzAeASdRIQbYNqEHLSqc3KcTpM Q6oE1guJn7gvc94fjilh+A16PdIuwd0MUSiAKYjaz7I/5DPe50wY/IroKgpthueO3gLz EMPw== X-Gm-Message-State: AC+VfDz3t3Q8ReRulDbg4Yxhp0sDzBJW2VG/QlQdiT4iX40aL8n2c2w7 RKKgpbJ1soF0l8hJmYr4YGlXOw== X-Google-Smtp-Source: ACHHUZ7wLqD+FTjJOLVRqthq+lfZRxWqssJ5dC4d2+07nBRUW0cx35fP1Tj7YFExunDGZggSBp2MMQ== X-Received: by 2002:a17:903:2051:b0:1b8:2ba0:c9a8 with SMTP id q17-20020a170903205100b001b82ba0c9a8mr4657183pla.2.1687987065278; Wed, 28 Jun 2023 14:17:45 -0700 (PDT) Received: from [192.168.1.136] ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id az12-20020a170902a58c00b001b0034557afsm8086924plb.15.2023.06.28.14.17.43 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 28 Jun 2023 14:17:44 -0700 (PDT) Message-ID: <2e635579-37ba-ddfc-a2ab-e6c080ab4971@kernel.dk> Date: Wed, 28 Jun 2023 15:17:43 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [GIT PULL] bcachefs Content-Language: en-US From: Jens Axboe To: Kent Overstreet Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org, Christoph Hellwig , Christian Brauner References: <20230627000635.43azxbkd2uf3tu6b@moria.home.lan> <91e9064b-84e3-1712-0395-b017c7c4a964@kernel.dk> <20230627020525.2vqnt2pxhtgiddyv@moria.home.lan> <23922545-917a-06bd-ec92-ff6aa66118e2@kernel.dk> <20230627201524.ool73bps2lre2tsz@moria.home.lan> <20230628040114.oz46icbsjpa4egpp@moria.home.lan> <4b863e62-4406-53e4-f96a-f4d1daf098ab@kernel.dk> <20230628175204.oeek4nnqx7ltlqmg@moria.home.lan> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 6/28/23 2:44?PM, Jens Axboe wrote: > On 6/28/23 11:52?AM, Kent Overstreet wrote: >> On Wed, Jun 28, 2023 at 10:57:02AM -0600, Jens Axboe wrote: >>> I discussed this with Christian offline. I have a patch that is pretty >>> simple, but it does mean that you'd wait for delayed fput flush off >>> umount. Which seems kind of iffy. >>> >>> I think we need to back up a bit and consider if the kill && umount >>> really is sane. If you kill a task that has open files, then any fput >>> from that task will end up being delayed. This means that the umount may >>> very well fail. >>> >>> It'd be handy if we could have umount wait for that to finish, but I'm >>> not at all confident this is a sane solution for all cases. And as >>> discussed, we have no way to even identify which files we'd need to >>> flush out of the delayed list. >>> >>> Maybe the test case just needs fixing? Christian suggested lazy/detach >>> umount and wait for sb release. There's an fsnotify hook for that, >>> fsnotify_sb_delete(). Obviously this is a bit more involved, but seems >>> to me that this would be the way to make it more reliable when killing >>> of tasks with open files are involved. >> >> No, this is a real breakage. Any time we introduce unexpected >> asynchrony there's the potential for breakage: case in point, there was >> a filesystem that made rm asynchronous, then there were scripts out >> there that deleted until df showed under some threshold.. whoops... > > This is nothing new - any fput done from an exiting task will end up > being deferred. The window may be a bit wider now or a bit different, > but it's the same window. If an application assumes it can kill && wait > on a task and be guaranteed that the files are released as soon as wait > returns, it is mistaken. That is NOT the case. Case in point, just changed my reproducer to use aio instead of io_uring. Here's the full script: #!/bin/bash DEV=/dev/nvme1n1 MNT=/data ITER=0 while true; do echo loop $ITER sudo mount $DEV $MNT fio --name=test --ioengine=aio --iodepth=2 --filename=$MNT/foo --size=1g --buffered=1 --overwrite=0 --numjobs=12 --minimal --rw=randread --output=/dev/null & Y=$(($RANDOM % 3)) X=$(($RANDOM % 10)) VAL="$Y.$X" sleep $VAL ps -e | grep fio > /dev/null 2>&1 while [ $? -eq 0 ]; do killall -9 fio > /dev/null 2>&1 echo will wait wait > /dev/null 2>&1 echo done waiting ps -e | grep "fio " > /dev/null 2>&1 done sudo umount /data if [ $? -ne 0 ]; then break fi ((ITER++)) done and if I run that, fails on the first umount attempt in that loop: axboe@m1max-kvm ~> bash test2.sh loop 0 will wait done waiting umount: /data: target is busy. So yeah, this is _nothing_ new. I really don't think trying to address this in the kernel is the right approach, it'd be a lot saner to harden the xfstest side to deal with the umount a bit more sanely. There are obviously tons of other ways that a mount could get pinned, which isn't too relevant here since the bdev and mount point are basically exclusive to the test being run. But the kill and delayed fput is enough to make that case imho. -- Jens Axboe