* DAX: __dax_fault race question
@ 2016-02-08 11:23 Dmitry Monakhov
2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
2016-02-11 18:43 ` DAX: __dax_fault race question Ross Zwisler
0 siblings, 2 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2016-02-08 11:23 UTC (permalink / raw)
To: Linux Memory Management; +Cc: willy, ross.zwisler
Hi,

I am trying to understand the locking rules for DAX and realized that there
is a suspicious case in dax_fault.

In __dax_fault we try to replace a normal page with a dax entry.
Basically, the __dax_fault steps look as follows:
1) page = find_get_page(..)
2) lock_page_or_retry(page)
3) get_block
4) delete_from_page_cache(page)
5) unlock_page(page)
6) dax_insert_mapping(inode, &bh, vma, vmf)
...
But what protects us from another task doing a new page fault after (4) but
before (6)? AFAIU this case is not prohibited.
Let's see what happens when two tasks (a writer and a reader) fault inside a
file hole:
task_1(writer) task_2(reader)
__dax_fault(write)
->lock_page_or_retry
->delete_from_page_cache() __dax_fault(read)
->dax_load_hole
->find_or_create_page()
->new page in mapping->radix_tree
->dax_insert_mapping
->dax_radix_entry->collision: return -EIO
Before the dax/fsync patch set this race resulted in silent dax/page duality
(which likely leads to data incoherence or data corruption). Luckily, the race
now results in a collision on insertion into the radix tree and returns -EIO.
At first glance the test case looks very simple, but I cannot reproduce it in
my environment.

IMHO it is reasonable to pass the locked page to dax_insert_mapping() and let
dax_radix_entry() do an atomic page/dax-entry replacement, similar to
replace_page_cache_page(). Am I right?
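To make the window concrete, here is a hypothetical userspace sketch of the
race and of the proposed atomic replacement. A pthread mutex stands in for
mapping->tree_lock and a single enum slot stands in for the radix-tree entry;
none of the names below are kernel API, they only model the locking shape:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* One radix-tree slot for a single pgoff: empty, a page-cache page, or a
 * DAX exceptional entry. */
enum slot_state { SLOT_EMPTY, SLOT_PAGE, SLOT_DAX };

struct mapping {
	pthread_mutex_t tree_lock;	/* models mapping->tree_lock */
	enum slot_state slot;
};

/* Models a concurrent read fault hitting a hole: if the slot is empty,
 * dax_load_hole() installs a fresh zero page. */
static void reader_fault(struct mapping *m)
{
	pthread_mutex_lock(&m->tree_lock);
	if (m->slot == SLOT_EMPTY)
		m->slot = SLOT_PAGE;
	pthread_mutex_unlock(&m->tree_lock);
}

/* Models the current code: delete_from_page_cache() and the later DAX
 * insertion run in separate critical sections, so a reader can slip into
 * the window between them. */
static int racy_replace(struct mapping *m, void (*reader)(struct mapping *))
{
	pthread_mutex_lock(&m->tree_lock);
	m->slot = SLOT_EMPTY;			/* delete_from_page_cache() */
	pthread_mutex_unlock(&m->tree_lock);

	if (reader)
		reader(m);			/* the window */

	pthread_mutex_lock(&m->tree_lock);
	int err = 0;
	if (m->slot != SLOT_EMPTY)
		err = -EIO;			/* collision on insertion */
	else
		m->slot = SLOT_DAX;		/* dax_insert_mapping() */
	pthread_mutex_unlock(&m->tree_lock);
	return err;
}

/* Models the proposal: deletion and insertion in one critical section,
 * analogous to replace_page_cache_page(), so no window exists. */
static int atomic_replace(struct mapping *m)
{
	pthread_mutex_lock(&m->tree_lock);
	m->slot = SLOT_DAX;	/* delete + insert, one step */
	pthread_mutex_unlock(&m->tree_lock);
	return 0;
}
```

With the racy variant a reader firing in the window forces the -EIO
collision; the atomic variant cannot observe an intermediate empty slot.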
* [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert
2016-02-08 11:23 DAX: __dax_fault race question Dmitry Monakhov
@ 2016-02-08 13:53 ` Dmitry Monakhov
2016-02-08 13:53 ` [PATCH 2/2] dax: fix race dax_fault write vs read Dmitry Monakhov
2016-02-11 17:42 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Ross Zwisler
2016-02-11 18:43 ` DAX: __dax_fault race question Ross Zwisler
1 sibling, 2 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2016-02-08 13:53 UTC (permalink / raw)
To: linux-mm; +Cc: willy, ross.zwisler, Dmitry Monakhov
- dax_radix_entry_insert is a more appropriate name for this function
- Add a lockless helper, __dax_radix_entry_insert; it will be used by the second patch
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/dax.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index fc2e314..89bb1f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -349,7 +349,7 @@ static int copy_user_bh(struct page *to, struct inode *inode,
#define NO_SECTOR -1
#define DAX_PMD_INDEX(page_index) (page_index & (PMD_MASK >> PAGE_CACHE_SHIFT))
-static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
+static int __dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
sector_t sector, bool pmd_entry, bool dirty)
{
struct radix_tree_root *page_tree = &mapping->page_tree;
@@ -358,10 +358,6 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
void *entry;
WARN_ON_ONCE(pmd_entry && !dirty);
- if (dirty)
- __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
- spin_lock_irq(&mapping->tree_lock);
entry = radix_tree_lookup(page_tree, pmd_index);
if (entry && RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD) {
@@ -374,8 +370,7 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
type = RADIX_DAX_TYPE(entry);
if (WARN_ON_ONCE(type != RADIX_DAX_PTE &&
type != RADIX_DAX_PMD)) {
- error = -EIO;
- goto unlock;
+ return -EIO;
}
if (!pmd_entry || type == RADIX_DAX_PMD)
@@ -402,19 +397,31 @@ static int dax_radix_entry(struct address_space *mapping, pgoff_t index,
* pte_same() check will fail, eventually causing page fault
* to be retried by the CPU.
*/
- goto unlock;
+ return 0;
}
error = radix_tree_insert(page_tree, index,
RADIX_DAX_ENTRY(sector, pmd_entry));
if (error)
- goto unlock;
+ return error;
mapping->nrexceptional++;
- dirty:
+dirty:
if (dirty)
radix_tree_tag_set(page_tree, index, PAGECACHE_TAG_DIRTY);
- unlock:
+ return error;
+}
+
+static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
+ sector_t sector, bool pmd_entry, bool dirty)
+{
+ int error;
+
+ if (dirty)
+ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+ spin_lock_irq(&mapping->tree_lock);
+ error = __dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
spin_unlock_irq(&mapping->tree_lock);
return error;
}
@@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
}
dax_unmap_atomic(bdev, &dax);
- error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
- vmf->flags & FAULT_FLAG_WRITE);
+ error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
+ vmf->flags & FAULT_FLAG_WRITE, vmf->page);
if (error)
goto out;
@@ -984,7 +991,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
* the write to insert a dirty entry.
*/
if (write) {
- error = dax_radix_entry(mapping, pgoff, dax.sector,
+ error = dax_radix_entry_insert(mapping, pgoff, dax.sector,
true, true);
if (error) {
dax_pmd_dbg(&bh, address,
@@ -1057,14 +1064,14 @@ int dax_pfn_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
struct file *file = vma->vm_file;
/*
- * We pass NO_SECTOR to dax_radix_entry() because we expect that a
+ * We pass NO_SECTOR to dax_radix_entry_insert() because we expect that a
* RADIX_DAX_PTE entry already exists in the radix tree from a
* previous call to __dax_fault(). We just want to look up that PTE
* entry using vmf->pgoff and make sure the dirty tag is set. This
* saves us from having to make a call to get_block() here to look
* up the sector.
*/
- dax_radix_entry(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);
+ dax_radix_entry_insert(file->f_mapping, vmf->pgoff, NO_SECTOR, false, true);
return VM_FAULT_NOPAGE;
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
--
1.8.3.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* [PATCH 2/2] dax: fix race dax_fault write vs read
2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
@ 2016-02-08 13:53 ` Dmitry Monakhov
2016-02-11 17:42 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Ross Zwisler
1 sibling, 0 replies; 5+ messages in thread
From: Dmitry Monakhov @ 2016-02-08 13:53 UTC (permalink / raw)
To: linux-mm; +Cc: willy, ross.zwisler, Dmitry Monakhov
Two tasks (a writer and a reader) fault inside a file hole:
task_1(writer) task_2(reader)
__dax_fault(write)
->lock_page_or_retry
->delete_from_page_cache() __dax_fault(read)
->dax_load_hole
->find_or_create_page()
->new page in mapping->radix_tree
->dax_insert_mapping
->dax_radix_entry => collision
Let's move the radix tree update to dax_radix_entry_replace(), where
page deletion and dax entry insertion are protected by ->tree_lock.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
fs/dax.c | 37 +++++++++++++++++++++++++++++++------
1 file changed, 31 insertions(+), 6 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 89bb1f8..0294fc9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -424,6 +424,31 @@ static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
error = __dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
spin_unlock_irq(&mapping->tree_lock);
return error;
+
+}
+
+static int dax_radix_entry_replace(struct address_space *mapping, pgoff_t index,
+ sector_t sector, bool pmd_entry, bool dirty,
+ struct page *old_page)
+{
+ int error;
+
+ BUG_ON(old_page && !PageLocked(old_page));
+ if (dirty)
+ __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
+ spin_lock_irq(&mapping->tree_lock);
+ if (old_page)
+ __delete_from_page_cache(old_page, NULL);
+ error = __dax_radix_entry_insert(mapping, index, sector, pmd_entry, dirty);
+ spin_unlock_irq(&mapping->tree_lock);
+ if (old_page) {
+ if (mapping->a_ops->freepage)
+ mapping->a_ops->freepage(old_page);
+ page_cache_release(old_page);
+ }
+ return error;
+
}
static int dax_writeback_one(struct block_device *bdev,
@@ -586,7 +611,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
}
dax_unmap_atomic(bdev, &dax);
- error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
+ error = dax_radix_entry_replace(mapping, vmf->pgoff, dax.sector, false,
vmf->flags & FAULT_FLAG_WRITE, vmf->page);
if (error)
goto out;
@@ -711,14 +736,16 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
page = find_lock_page(mapping, vmf->pgoff);
if (page) {
+ vmf->page = page;
unmap_mapping_range(mapping, vmf->pgoff << PAGE_SHIFT,
PAGE_CACHE_SIZE, 0);
- delete_from_page_cache(page);
+ }
+ error = dax_insert_mapping(inode, &bh, vma, vmf);
+ if (page) {
unlock_page(page);
page_cache_release(page);
- page = NULL;
+ vmf->page = page = NULL;
}
-
/*
* If we successfully insert the new mapping over an unwritten extent,
* we need to ensure we convert the unwritten extent. If there is an
@@ -729,14 +756,12 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
* indicate what the callback should do via the uptodate variable, same
* as for normal BH based IO completions.
*/
- error = dax_insert_mapping(inode, &bh, vma, vmf);
if (buffer_unwritten(&bh)) {
if (complete_unwritten)
complete_unwritten(&bh, !error);
else
WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE));
}
-
out:
if (error == -ENOMEM)
return VM_FAULT_OOM | major;
--
1.8.3.1
* Re: [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert
2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
2016-02-08 13:53 ` [PATCH 2/2] dax: fix race dax_fault write vs read Dmitry Monakhov
@ 2016-02-11 17:42 ` Ross Zwisler
1 sibling, 0 replies; 5+ messages in thread
From: Ross Zwisler @ 2016-02-11 17:42 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: linux-mm, willy, ross.zwisler
On Mon, Feb 08, 2016 at 05:53:17PM +0400, Dmitry Monakhov wrote:
> - dax_radix_entry_insert is a more appropriate name for this function
I think I may have actually had it named that at some point. :) I changed it
because it doesn't always insert an entry - in the read case, for example, we
insert a clean entry, and then on the following dax_pfn_mkwrite() we call back
in and mark it as dirty.
> - Add a lockless helper, __dax_radix_entry_insert; it will be used by the second patch
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
> fs/dax.c | 39 +++++++++++++++++++++++----------------
> 1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index fc2e314..89bb1f8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
<>
> @@ -579,8 +586,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
> }
> dax_unmap_atomic(bdev, &dax);
>
> - error = dax_radix_entry(mapping, vmf->pgoff, dax.sector, false,
> - vmf->flags & FAULT_FLAG_WRITE);
> + error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
> + vmf->flags & FAULT_FLAG_WRITE, vmf->page);
fs/dax.c: In function 'dax_insert_mapping':
fs/dax.c:589:10: error: too many arguments to function 'dax_radix_entry_insert'
error = dax_radix_entry_insert(mapping, vmf->pgoff, dax.sector, false,
^
fs/dax.c:415:12: note: declared here
static int dax_radix_entry_insert(struct address_space *mapping, pgoff_t index,
^
scripts/Makefile.build:258: recipe for target 'fs/dax.o' failed
* Re: DAX: __dax_fault race question
2016-02-08 11:23 DAX: __dax_fault race question Dmitry Monakhov
2016-02-08 13:53 ` [PATCH 1/2] dax: rename dax_radix_entry to dax_radix_entry_insert Dmitry Monakhov
@ 2016-02-11 18:43 ` Ross Zwisler
1 sibling, 0 replies; 5+ messages in thread
From: Ross Zwisler @ 2016-02-11 18:43 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: Linux Memory Management, willy, ross.zwisler
On Mon, Feb 08, 2016 at 02:23:49PM +0300, Dmitry Monakhov wrote:
>
> Hi,
>
> I am trying to understand the locking rules for DAX and realized that there
> is a suspicious case in dax_fault.
>
> In __dax_fault we try to replace a normal page with a dax entry.
> Basically, the __dax_fault steps look as follows:
>
> 1) page = find_get_page(..)
> 2) lock_page_or_retry(page)
> 3) get_block
> 4) delete_from_page_cache(page)
> 5) unlock_page(page)
> 6) dax_insert_mapping(inode, &bh, vma, vmf)
> ...
>
> But what protects us from another task doing a new page fault after (4) but
> before (6)? AFAIU this case is not prohibited.
> Let's see what happens when two tasks (a writer and a reader) fault inside a
> file hole:
> task_1(writer) task_2(reader)
> __dax_fault(write)
> ->lock_page_or_retry
> ->delete_from_page_cache() __dax_fault(read)
> ->dax_load_hole
> ->find_or_create_page()
> ->new page in mapping->radix_tree
> ->dax_insert_mapping
> ->dax_radix_entry->collision: return -EIO
>
> Before the dax/fsync patch set this race resulted in silent dax/page duality
> (which likely leads to data incoherence or data corruption). Luckily, the
> race now results in a collision on insertion into the radix tree and returns
> -EIO.
> At first glance the test case looks very simple, but I cannot reproduce it
> in my environment.
>
> IMHO it is reasonable to pass the locked page to dax_insert_mapping() and
> let dax_radix_entry() do an atomic page/dax-entry replacement, similar to
> replace_page_cache_page(). Am I right?
We are trying to come up with a general locking scheme that will solve this
race as well as others.
https://lkml.org/lkml/2016/2/9/607