From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0580DC4320E for ; Mon, 30 Aug 2021 17:36:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DF5C860F56 for ; Mon, 30 Aug 2021 17:36:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238289AbhH3RhY (ORCPT ); Mon, 30 Aug 2021 13:37:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36646 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238117AbhH3RhW (ORCPT ); Mon, 30 Aug 2021 13:37:22 -0400 Received: from mail-vs1-xe29.google.com (mail-vs1-xe29.google.com [IPv6:2607:f8b0:4864:20::e29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9CB1C06175F for ; Mon, 30 Aug 2021 10:36:28 -0700 (PDT) Received: by mail-vs1-xe29.google.com with SMTP id e9so11151029vst.6 for ; Mon, 30 Aug 2021 10:36:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2YdUdeUKZfBFzMR1jBM9RNC90FnM3wxYhfPH5nEI0FQ=; b=LX9ib2ZyUcSD/ZaFz+8eAUshfQOAn/4woZAxZhLaESc5RNlCL/7SNuGmtjHlB8Ra8a LYkHAy0oqhLyoAHQ/kFNQirR9PaGmx5S2gMPptsrRfGHJFRy0XPIcFTJMwD7Ebf4zMkp +ICtUsKogy/NDpufwRoIImN521zHSHcy6j20E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2YdUdeUKZfBFzMR1jBM9RNC90FnM3wxYhfPH5nEI0FQ=; b=MybRu7yQpMIDzT035d4On2LBvIDjfArrlLh5ERQiQyI+qKaI2rQdhlnTSpbav8c4EA lfVGiAGzFLUjEzw+idPEUu7p7rrigcfRsclw3RHZKQCzG8dusvQK7qHQV9FwBM73uaJn Zxk7L2fmaythkjrnZok4ExBwuEwm/VRkveywp07Te8TbZ6gj3DPGCg2uwICq/BgLB7yl jHnsMSDxkj3aemdR//gJPHAh3EF2Skow0lTgP7Q1xKbmmE7AYh7ZFrN/g6t27EpeXnr0 0bw1g2j5sAGrfxYXIo0FtFInF4jSzGi24HES1emApOV0g/nGBZfIsrzGW48bGosDw561 64uw== X-Gm-Message-State: AOAM5325701+tibvDWD1gC2QoEwXP4f9CqCVfvj+0pNHxPGJA8O3+V9W 1HcmYfQvq5djEkoWG2xLEAKH11g+YcC1YdnJ5DV5Cg== X-Google-Smtp-Source: ABdhPJwHwYOFG4pDjI/FzqFAeEptCPiadG2mKKA2G7M/xzR6URTq7cbu6NBgFZRZcR6Fj7zJzLpBXzKKJN97qEq8Mjw= X-Received: by 2002:a67:4c7:: with SMTP id 190mr16166130vse.0.1630344987923; Mon, 30 Aug 2021 10:36:27 -0700 (PDT) MIME-Version: 1.0 References: <20210520154654.1791183-1-groug@kaod.org> <20210520154654.1791183-6-groug@kaod.org> In-Reply-To: From: Miklos Szeredi Date: Mon, 30 Aug 2021 19:36:17 +0200 Message-ID: Subject: Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server To: Vivek Goyal Cc: Amir Goldstein , Greg Kurz , virtualization@lists.linux-foundation.org, linux-fsdevel , linux-kernel , virtio-fs-list , Stefan Hajnoczi , Max Reitz , Robert Krawitz Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 30 Aug 2021 at 19:01, Vivek Goyal wrote: > > static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi, > > @@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) > > struct fuse_args_pages *ap = &wpa->ia.ap; > > int i; > > > > + if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages)) > > Hi Miklos, > > Wondering why this wpa->bucket check is there. Isn't every wpa is associated > bucket. So when do we run into situation when wpa->bucket = NULL. In case fc->sync_fs is false. > > @@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void) > > > > } > > > > +static void fuse_writepage_add_to_bucket(struct fuse_conn *fc, > > + struct fuse_writepage_args *wpa) > > +{ > > + if (!fc->sync_fs) > > + return; > > + > > + rcu_read_lock(); > > + do { > > + wpa->bucket = rcu_dereference(fc->curr_bucket); > > + } while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages))); > > So this loop is there because fuse_sync_fs() might be replacing > fc->curr_bucket. And we are fetching this pointer under rcu. So it is > possible that fuse_fs_sync() dropped its reference and that led to > ->num_writepages 0 and we don't want to use this bucket. > > What if fuse_sync_fs() dropped its reference but still there is another > wpa in progress and hence ->num_writepages is not zero. We still don't > want to use this bucket for new wpa, right? It's an unlikely race in which case the the write will go into the old bucket, and will be waited for, but that definitely should not be a problem. > > @@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait) > > if (!fc->sync_fs) > > return 0; > > > > + new_bucket = fuse_sync_bucket_alloc(); > > + spin_lock(&fc->lock); > > + bucket = fc->curr_bucket; > > + if (atomic_read(&bucket->num_writepages) != 0) { > > + /* One more for count completion of old bucket */ > > + atomic_inc(&new_bucket->num_writepages); > > + rcu_assign_pointer(fc->curr_bucket, new_bucket); > > + /* Drop initially added active count */ > > + atomic_dec(&bucket->num_writepages); > > + spin_unlock(&fc->lock); > > + > > + wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0); > > + /* > > + * Drop count on new bucket, possibly resulting in a completion > > + * if more than one syncfs is going on > > + */ > > + if (atomic_dec_and_test(&new_bucket->num_writepages)) > > + wake_up(&new_bucket->waitq); > > + kfree_rcu(bucket, rcu); > > + } else { > > + spin_unlock(&fc->lock); > > + /* Free unused */ > > + kfree(new_bucket); > When can we run into the situation when fc->curr_bucket is num_writepages > == 0. When install a bucket it has count 1. And only time it can go to > 0 is when we have dropped the initial reference. And initial reference > can be dropped only after removing bucket from fc->curr_bucket. > > IOW, we don't drop initial reference on a bucket if it is in > fc->curr_bucket. And that mean anything installed fc->curr_bucket should > not ever have a reference count of 0. What am I missing. You are correct. I fixed it by warning on zero count and checking for count != 1. I have other fixes as well, will send v2. Thanks, Miklos From mboxrd@z Thu Jan 1 00:00:00 1970 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2YdUdeUKZfBFzMR1jBM9RNC90FnM3wxYhfPH5nEI0FQ=; b=LX9ib2ZyUcSD/ZaFz+8eAUshfQOAn/4woZAxZhLaESc5RNlCL/7SNuGmtjHlB8Ra8a LYkHAy0oqhLyoAHQ/kFNQirR9PaGmx5S2gMPptsrRfGHJFRy0XPIcFTJMwD7Ebf4zMkp +ICtUsKogy/NDpufwRoIImN521zHSHcy6j20E= MIME-Version: 1.0 References: <20210520154654.1791183-1-groug@kaod.org> <20210520154654.1791183-6-groug@kaod.org> In-Reply-To: From: Miklos Szeredi Date: Mon, 30 Aug 2021 19:36:17 +0200 Message-ID: Content-Type: text/plain; charset="UTF-8" Subject: Re: [Virtio-fs] [PATCH v4 5/5] virtiofs: propagate sync() to file server List-Id: Development discussions about virtio-fs List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vivek Goyal Cc: Amir Goldstein , linux-kernel , virtio-fs-list , Max Reitz , linux-fsdevel , virtualization@lists.linux-foundation.org, Robert Krawitz On Mon, 30 Aug 2021 at 19:01, Vivek Goyal wrote: > > static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi, > > @@ -1608,6 +1609,9 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa) > > struct fuse_args_pages *ap = &wpa->ia.ap; > > int i; > > > > + if (wpa->bucket && atomic_dec_and_test(&wpa->bucket->num_writepages)) > > Hi Miklos, > > Wondering why this wpa->bucket check is there. Isn't every wpa is associated > bucket. So when do we run into situation when wpa->bucket = NULL. In case fc->sync_fs is false. > > @@ -1871,6 +1875,19 @@ static struct fuse_writepage_args *fuse_writepage_args_alloc(void) > > > > } > > > > +static void fuse_writepage_add_to_bucket(struct fuse_conn *fc, > > + struct fuse_writepage_args *wpa) > > +{ > > + if (!fc->sync_fs) > > + return; > > + > > + rcu_read_lock(); > > + do { > > + wpa->bucket = rcu_dereference(fc->curr_bucket); > > + } while (unlikely(!atomic_inc_not_zero(&wpa->bucket->num_writepages))); > > So this loop is there because fuse_sync_fs() might be replacing > fc->curr_bucket. And we are fetching this pointer under rcu. So it is > possible that fuse_fs_sync() dropped its reference and that led to > ->num_writepages 0 and we don't want to use this bucket. > > What if fuse_sync_fs() dropped its reference but still there is another > wpa in progress and hence ->num_writepages is not zero. We still don't > want to use this bucket for new wpa, right? It's an unlikely race in which case the the write will go into the old bucket, and will be waited for, but that definitely should not be a problem. > > @@ -528,6 +542,31 @@ static int fuse_sync_fs(struct super_block *sb, int wait) > > if (!fc->sync_fs) > > return 0; > > > > + new_bucket = fuse_sync_bucket_alloc(); > > + spin_lock(&fc->lock); > > + bucket = fc->curr_bucket; > > + if (atomic_read(&bucket->num_writepages) != 0) { > > + /* One more for count completion of old bucket */ > > + atomic_inc(&new_bucket->num_writepages); > > + rcu_assign_pointer(fc->curr_bucket, new_bucket); > > + /* Drop initially added active count */ > > + atomic_dec(&bucket->num_writepages); > > + spin_unlock(&fc->lock); > > + > > + wait_event(bucket->waitq, atomic_read(&bucket->num_writepages) == 0); > > + /* > > + * Drop count on new bucket, possibly resulting in a completion > > + * if more than one syncfs is going on > > + */ > > + if (atomic_dec_and_test(&new_bucket->num_writepages)) > > + wake_up(&new_bucket->waitq); > > + kfree_rcu(bucket, rcu); > > + } else { > > + spin_unlock(&fc->lock); > > + /* Free unused */ > > + kfree(new_bucket); > When can we run into the situation when fc->curr_bucket is num_writepages > == 0. When install a bucket it has count 1. And only time it can go to > 0 is when we have dropped the initial reference. And initial reference > can be dropped only after removing bucket from fc->curr_bucket. > > IOW, we don't drop initial reference on a bucket if it is in > fc->curr_bucket. And that mean anything installed fc->curr_bucket should > not ever have a reference count of 0. What am I missing. You are correct. I fixed it by warning on zero count and checking for count != 1. I have other fixes as well, will send v2. Thanks, Miklos