From: Jan Kara <jack@suse.cz>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org, x86@kernel.org,
	xfs@oss.sgi.com
Subject: Re: [PATCH v8 4/9] dax: support dirty DAX entries in radix tree
Date: Wed, 13 Jan 2016 10:44:11 +0100
Message-ID: <20160113094411.GA17057@quack.suse.cz>
In-Reply-To: <1452230879-18117-5-git-send-email-ross.zwisler@linux.intel.com>

On Thu 07-01-16 22:27:54, Ross Zwisler wrote:
> Add support for tracking dirty DAX entries in the struct address_space
> radix tree.  This tree is already used for dirty page writeback, and it
> already supports the use of exceptional (non struct page*) entries.
> 
> In order to properly track dirty DAX pages we will insert new exceptional
> entries into the radix tree that represent dirty DAX PTE or PMD pages.
> These exceptional entries will also contain the writeback sectors for the
> PTE or PMD faults that we can use at fsync/msync time.
> 
> There are currently two types of exceptional entries (shmem and shadow)
> that can be placed into the radix tree, and this adds a third.  We rely on
> the fact that only one type of exceptional entry can be found in a given
> radix tree based on its usage.  This happens for free with DAX vs shmem but
> we explicitly prevent shadow entries from being added to radix trees for
> DAX mappings.
> 
> The only shadow entries that would be generated for DAX radix trees would
> be to track zero page mappings that were created for holes.  These pages
> would receive minimal benefit from having shadow entries, and the choice
> to have only one type of exceptional entry in a given radix tree makes the
> logic simpler both in clear_exceptional_entry() and in the rest of DAX.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

I have realized there's one issue with this code. See below:

> @@ -34,31 +35,39 @@ static void clear_exceptional_entry(struct address_space *mapping,
>  		return;
>  
>  	spin_lock_irq(&mapping->tree_lock);
> -	/*
> -	 * Regular page slots are stabilized by the page lock even
> -	 * without the tree itself locked.  These unlocked entries
> -	 * need verification under the tree lock.
> -	 */
> -	if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot))
> -		goto unlock;
> -	if (*slot != entry)
> -		goto unlock;
> -	radix_tree_replace_slot(slot, NULL);
> -	mapping->nrshadows--;
> -	if (!node)
> -		goto unlock;
> -	workingset_node_shadows_dec(node);
> -	/*
> -	 * Don't track node without shadow entries.
> -	 *
> -	 * Avoid acquiring the list_lru lock if already untracked.
> -	 * The list_empty() test is safe as node->private_list is
> -	 * protected by mapping->tree_lock.
> -	 */
> -	if (!workingset_node_shadows(node) &&
> -	    !list_empty(&node->private_list))
> -		list_lru_del(&workingset_shadow_nodes, &node->private_list);
> -	__radix_tree_delete_node(&mapping->page_tree, node);
> +
> +	if (dax_mapping(mapping)) {
> +		if (radix_tree_delete_item(&mapping->page_tree, index, entry))
> +			mapping->nrexceptional--;

So when you punch a hole in a file, you can delete a PMD entry from the radix
tree even though it covers part of the file that remains. In that case you
have to split the PMD entry into PTE entries (that probably needs to happen
up in truncate_inode_pages_range()) or something similar...
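
To make the partial-coverage case concrete, here is a minimal userspace
sketch, not kernel code. It assumes 4 KiB pages and 2 MiB PMD entries (512
page indices per PMD, the x86-64 values), and the helper names
pmd_first_index() and pmd_partially_punched() are hypothetical, not
existing kernel functions. It only checks whether a punched range of page
indices overlaps a PMD-sized range without covering all of it, which is the
situation where dropping the whole PMD entry would lose dirty tracking for
the part of the file that remains.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, 2 MiB PMD entries => 512 page indices per PMD. */
#define PAGES_PER_PMD 512UL

/* First page index of the PMD-sized range that contains 'index'. */
static unsigned long pmd_first_index(unsigned long index)
{
	return index & ~(PAGES_PER_PMD - 1);
}

/*
 * Hypothetical helper: true when punching page indices [start, end]
 * (inclusive) overlaps the PMD-sized range containing 'index' but does
 * not cover all of it, so part of that range still belongs to the file.
 */
static bool pmd_partially_punched(unsigned long index,
				  unsigned long start, unsigned long end)
{
	unsigned long first = pmd_first_index(index);
	unsigned long last = first + PAGES_PER_PMD - 1;

	assert(start <= end);

	if (end < first || start > last)
		return false;	/* hole does not touch this PMD range */

	/* Overlaps; partial if either edge of the PMD range survives. */
	return start > first || end < last;
}
```

For example, with a PMD entry covering indices 512..1023, punching 600..800
gives a partial overlap, while punching 512..1023 or 0..2047 covers the
whole entry and deleting it outright would be fine.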

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


