All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Dave Chinner <david@fromorbit.com>,
	Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Matthew Wilcox <willy@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org, x86@kernel.org,
	xfs@oss.sgi.com, Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v6 4/7] dax: add support for fsync/msync
Date: Tue, 5 Jan 2016 12:13:58 +0100	[thread overview]
Message-ID: <20160105111358.GD2724@quack.suse.cz> (raw)
In-Reply-To: <1450899560-26708-5-git-send-email-ross.zwisler@linux.intel.com>

On Wed 23-12-15 12:39:17, Ross Zwisler wrote:
> To properly handle fsync/msync in an efficient way DAX needs to track dirty
> pages so it is able to flush them durably to media on demand.
> 
> The tracking of dirty pages is done via the radix tree in struct
> address_space.  This radix tree is already used by the page writeback
> infrastructure for tracking dirty pages associated with an open file, and
> it already has support for exceptional (non struct page*) entries.  We
> build upon these features to add exceptional entries to the radix tree for
> DAX dirty PMD or PTE pages at fault time.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
...
> +static int dax_writeback_one(struct block_device *bdev,
> +		struct address_space *mapping, pgoff_t index, void *entry)
> +{
> +	struct radix_tree_root *page_tree = &mapping->page_tree;
> +	int type = RADIX_DAX_TYPE(entry);
> +	struct radix_tree_node *node;
> +	struct blk_dax_ctl dax;
> +	void **slot;
> +	int ret = 0;
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * Regular page slots are stabilized by the page lock even
> +	 * without the tree itself locked.  These unlocked entries
> +	 * need verification under the tree lock.
> +	 */
> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unlock;
> +	if (*slot != entry)
> +		goto unlock;
> +
> +	/* another fsync thread may have already written back this entry */
> +	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +		goto unlock;
> +
> +	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +
> +	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	dax.sector = RADIX_DAX_SECTOR(entry);
> +	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> +	spin_unlock_irq(&mapping->tree_lock);
> +
> +	/*
> +	 * We cannot hold tree_lock while calling dax_map_atomic() because it
> +	 * eventually calls cond_resched().
> +	 */
> +	ret = dax_map_atomic(bdev, &dax);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (WARN_ON_ONCE(ret < dax.size)) {
> +		ret = -EIO;
> +		dax_unmap_atomic(bdev, &dax);
> +		return ret;
> +	}
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * We need to revalidate our radix entry while holding tree_lock
> +	 * before we do the writeback.
> +	 */

Do we really need to revalidate here? dax_map_atomic() makes sure the addr
& size is still part of the device. I guess you are concerned that due to
truncate or similar operation those sectors needn't belong to the same file
anymore but we don't really care about flushing sectors for someone else,
do we?

Otherwise the patch looks good to me.

> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unmap;
> +	if (*slot != entry)
> +		goto unmap;
> +
> +	wb_cache_pmem(dax.addr, dax.size);
> + unmap:
> +	dax_unmap_atomic(bdev, &dax);
> + unlock:
> +	spin_unlock_irq(&mapping->tree_lock);
> +	return ret;
> +}

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Dave Chinner <david@fromorbit.com>,
	Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Matthew Wilcox <willy@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org,
	xfs@oss.sgi.com, Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v6 4/7] dax: add support for fsync/msync
Date: Tue, 5 Jan 2016 12:13:58 +0100	[thread overview]
Message-ID: <20160105111358.GD2724@quack.suse.cz> (raw)
In-Reply-To: <1450899560-26708-5-git-send-email-ross.zwisler@linux.intel.com>

On Wed 23-12-15 12:39:17, Ross Zwisler wrote:
> To properly handle fsync/msync in an efficient way DAX needs to track dirty
> pages so it is able to flush them durably to media on demand.
> 
> The tracking of dirty pages is done via the radix tree in struct
> address_space.  This radix tree is already used by the page writeback
> infrastructure for tracking dirty pages associated with an open file, and
> it already has support for exceptional (non struct page*) entries.  We
> build upon these features to add exceptional entries to the radix tree for
> DAX dirty PMD or PTE pages at fault time.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
...
> +static int dax_writeback_one(struct block_device *bdev,
> +		struct address_space *mapping, pgoff_t index, void *entry)
> +{
> +	struct radix_tree_root *page_tree = &mapping->page_tree;
> +	int type = RADIX_DAX_TYPE(entry);
> +	struct radix_tree_node *node;
> +	struct blk_dax_ctl dax;
> +	void **slot;
> +	int ret = 0;
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * Regular page slots are stabilized by the page lock even
> +	 * without the tree itself locked.  These unlocked entries
> +	 * need verification under the tree lock.
> +	 */
> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unlock;
> +	if (*slot != entry)
> +		goto unlock;
> +
> +	/* another fsync thread may have already written back this entry */
> +	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +		goto unlock;
> +
> +	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +
> +	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	dax.sector = RADIX_DAX_SECTOR(entry);
> +	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> +	spin_unlock_irq(&mapping->tree_lock);
> +
> +	/*
> +	 * We cannot hold tree_lock while calling dax_map_atomic() because it
> +	 * eventually calls cond_resched().
> +	 */
> +	ret = dax_map_atomic(bdev, &dax);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (WARN_ON_ONCE(ret < dax.size)) {
> +		ret = -EIO;
> +		dax_unmap_atomic(bdev, &dax);
> +		return ret;
> +	}
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * We need to revalidate our radix entry while holding tree_lock
> +	 * before we do the writeback.
> +	 */

Do we really need to revalidate here? dax_map_atomic() makes sure the addr
& size is still part of the device. I guess you are concerned that due to
truncate or similar operation those sectors needn't belong to the same file
anymore but we don't really care about flushing sectors for someone else,
do we?

Otherwise the patch looks good to me.

> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unmap;
> +	if (*slot != entry)
> +		goto unmap;
> +
> +	wb_cache_pmem(dax.addr, dax.size);
> + unmap:
> +	dax_unmap_atomic(bdev, &dax);
> + unlock:
> +	spin_unlock_irq(&mapping->tree_lock);
> +	return ret;
> +}

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	linux-mm@kvack.org, Andreas Dilger <adilger.kernel@dilger.ca>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-nvdimm@lists.01.org, x86@kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	linux-ext4@vger.kernel.org, xfs@oss.sgi.com,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Theodore Ts'o <tytso@mit.edu>,
	linux-kernel@vger.kernel.org, Jan Kara <jack@suse.com>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>
Subject: Re: [PATCH v6 4/7] dax: add support for fsync/msync
Date: Tue, 5 Jan 2016 12:13:58 +0100	[thread overview]
Message-ID: <20160105111358.GD2724@quack.suse.cz> (raw)
In-Reply-To: <1450899560-26708-5-git-send-email-ross.zwisler@linux.intel.com>

On Wed 23-12-15 12:39:17, Ross Zwisler wrote:
> To properly handle fsync/msync in an efficient way DAX needs to track dirty
> pages so it is able to flush them durably to media on demand.
> 
> The tracking of dirty pages is done via the radix tree in struct
> address_space.  This radix tree is already used by the page writeback
> infrastructure for tracking dirty pages associated with an open file, and
> it already has support for exceptional (non struct page*) entries.  We
> build upon these features to add exceptional entries to the radix tree for
> DAX dirty PMD or PTE pages at fault time.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
...
> +static int dax_writeback_one(struct block_device *bdev,
> +		struct address_space *mapping, pgoff_t index, void *entry)
> +{
> +	struct radix_tree_root *page_tree = &mapping->page_tree;
> +	int type = RADIX_DAX_TYPE(entry);
> +	struct radix_tree_node *node;
> +	struct blk_dax_ctl dax;
> +	void **slot;
> +	int ret = 0;
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * Regular page slots are stabilized by the page lock even
> +	 * without the tree itself locked.  These unlocked entries
> +	 * need verification under the tree lock.
> +	 */
> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unlock;
> +	if (*slot != entry)
> +		goto unlock;
> +
> +	/* another fsync thread may have already written back this entry */
> +	if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +		goto unlock;
> +
> +	radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +
> +	if (WARN_ON_ONCE(type != RADIX_DAX_PTE && type != RADIX_DAX_PMD)) {
> +		ret = -EIO;
> +		goto unlock;
> +	}
> +
> +	dax.sector = RADIX_DAX_SECTOR(entry);
> +	dax.size = (type == RADIX_DAX_PMD ? PMD_SIZE : PAGE_SIZE);
> +	spin_unlock_irq(&mapping->tree_lock);
> +
> +	/*
> +	 * We cannot hold tree_lock while calling dax_map_atomic() because it
> +	 * eventually calls cond_resched().
> +	 */
> +	ret = dax_map_atomic(bdev, &dax);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (WARN_ON_ONCE(ret < dax.size)) {
> +		ret = -EIO;
> +		dax_unmap_atomic(bdev, &dax);
> +		return ret;
> +	}
> +
> +	spin_lock_irq(&mapping->tree_lock);
> +	/*
> +	 * We need to revalidate our radix entry while holding tree_lock
> +	 * before we do the writeback.
> +	 */

Do we really need to revalidate here? dax_map_atomic() makes sure the addr
& size is still part of the device. I guess you are concerned that due to
truncate or similar operation those sectors needn't belong to the same file
anymore but we don't really care about flushing sectors for someone else,
do we?

Otherwise the patch looks good to me.

> +	if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +		goto unmap;
> +	if (*slot != entry)
> +		goto unmap;
> +
> +	wb_cache_pmem(dax.addr, dax.size);
> + unmap:
> +	dax_unmap_atomic(bdev, &dax);
> + unlock:
> +	spin_unlock_irq(&mapping->tree_lock);
> +	return ret;
> +}

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2016-01-05 11:13 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-23 19:39 [PATCH v6 0/7] DAX fsync/msync support Ross Zwisler
2015-12-23 19:39 ` Ross Zwisler
2015-12-23 19:39 ` Ross Zwisler
2015-12-23 19:39 ` [PATCH v6 1/7] pmem: add wb_cache_pmem() to the PMEM API Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39 ` [PATCH v6 2/7] dax: support dirty DAX entries in radix tree Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-30  8:02   ` Bob Liu
2015-12-30  8:02     ` Bob Liu
2015-12-30  8:02     ` Bob Liu
2015-12-30 20:39     ` Dan Williams
2015-12-30 20:39       ` Dan Williams
2015-12-30 20:39       ` Dan Williams
2015-12-31  3:28       ` Bob Liu
2015-12-31  3:28         ` Bob Liu
2015-12-31  3:28         ` Bob Liu
2015-12-31  3:28         ` Bob Liu
2015-12-31 22:08         ` Dan Williams
2015-12-31 22:08           ` Dan Williams
2015-12-31 22:08           ` Dan Williams
2016-01-05  9:41   ` Jan Kara
2016-01-05  9:41     ` Jan Kara
2016-01-05  9:41     ` Jan Kara
2015-12-23 19:39 ` [PATCH v6 3/7] mm: add find_get_entries_tag() Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-24  0:28   ` Elliott, Robert (Persistent Memory)
2015-12-24  0:28     ` Elliott, Robert (Persistent Memory)
2015-12-24  0:28     ` Elliott, Robert (Persistent Memory)
2015-12-23 19:39 ` [PATCH v6 4/7] dax: add support for fsync/msync Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2016-01-03 18:13   ` Dan Williams
2016-01-03 18:13     ` Dan Williams
2016-01-03 18:13     ` Dan Williams
2016-01-05 11:13     ` Jan Kara
2016-01-05 11:13       ` Jan Kara
2016-01-05 11:13       ` Jan Kara
2016-01-05 15:50       ` Ross Zwisler
2016-01-05 15:50         ` Ross Zwisler
2016-01-05 15:50         ` Ross Zwisler
2016-01-05 15:50         ` Ross Zwisler
2016-01-06 18:10     ` Ross Zwisler
2016-01-06 18:10       ` Ross Zwisler
2016-01-06 18:10       ` Ross Zwisler
2016-01-06 18:10       ` Ross Zwisler
2016-01-05 11:13   ` Jan Kara [this message]
2016-01-05 11:13     ` Jan Kara
2016-01-05 11:13     ` Jan Kara
2016-01-05 17:12     ` Ross Zwisler
2016-01-05 17:12       ` Ross Zwisler
2016-01-05 17:12       ` Ross Zwisler
2016-01-05 17:12       ` Ross Zwisler
2016-01-05 17:20       ` Dan Williams
2016-01-05 17:20         ` Dan Williams
2016-01-05 17:20         ` Dan Williams
2016-01-05 17:20         ` Dan Williams
2016-01-05 18:14         ` Ross Zwisler
2016-01-05 18:14           ` Ross Zwisler
2016-01-05 18:14           ` Ross Zwisler
2016-01-05 18:22           ` Dan Williams
2016-01-05 18:22             ` Dan Williams
2016-01-05 18:22             ` Dan Williams
2016-01-05 18:22             ` Dan Williams
2015-12-23 19:39 ` [PATCH v6 5/7] ext2: call dax_pfn_mkwrite() for DAX fsync/msync Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39 ` [PATCH v6 6/7] ext4: " Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39 ` [PATCH v6 7/7] xfs: " Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler
2015-12-23 19:39   ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160105111358.GD2724@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=hpa@zytor.com \
    --cc=jack@suse.com \
    --cc=jlayton@poochiereds.net \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mingo@redhat.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@linux.intel.com \
    --cc=x86@kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.