All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fuse: fix races related to fuse writeback
@ 2013-08-12 16:39 Maxim Patlasov
  2013-08-12 16:39 ` [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked() Maxim Patlasov
  2013-08-12 16:39 ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() Maxim Patlasov
  0 siblings, 2 replies; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-12 16:39 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, bfoster, linux-kernel, devel, xemul

Hi,

The patchset fixes a few subtle races stemmed from incorrect expectation
of what fuse_set_nowrite() guarantees. The fact that it makes fi->writectr
negative and waits for fi->writectr == FUSE_NOWRITE ensures only two things:

1) If there are any in-flight writeback requests right now, let's wait for
them being completed.
2) Suspend processing new writeback requests until fuse_release_nowrite().

Both are related to communication between in-kernel fuse and userspace
fuse daemon. But fuse_set_nowrite() does not prevent generic kernel code
from sending dirty pages to writeback resulting in fuse_writepage being
called. I.e. fi->queued_writes may grow independently on fuse_set_nowrite()
machinery.

As soon as fuse_writepage_locked() called end_page_writeback() generic
kernel code may do with the page virtually anything w/o notifying fuse. See
per-patch descriptions for details of some races.

Thanks,
Maxim

---

Maxim Patlasov (2):
      fuse: postpone end_page_writeback() in fuse_writepage_locked()
      fuse: wait for writeback in fuse_file_fallocate()


 fs/fuse/file.c |   56 ++++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 46 insertions(+), 10 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked()
  2013-08-12 16:39 [PATCH 0/2] fuse: fix races related to fuse writeback Maxim Patlasov
@ 2013-08-12 16:39 ` Maxim Patlasov
  2013-08-29 16:27   ` Miklos Szeredi
  2013-08-12 16:39 ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() Maxim Patlasov
  1 sibling, 1 reply; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-12 16:39 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, bfoster, linux-kernel, devel, xemul

The patch fixes a race between ftruncate(2), mmap-ed write and write(2):

1) An user makes a page dirty via mmap-ed write.
2) The user performs shrinking truncate(2) intended to purge the page.
3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
   writeback. fuse_writepage_locked fills FUSE_WRITE request and releases
   the original page by end_page_writeback.
4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex
   is free.
5) Ordinary write(2) extends i_size back to cover the page. Note that
   fuse_send_write_pages do wait for fuse writeback, but for another
   page->index.
6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request.
   fuse_send_writepage is supposed to crop inarg->size of the request,
   but it doesn't because i_size has already been extended back.

Moving end_page_writeback to the end of fuse_writepage_locked fixes the race
because now the fact that truncate_pagecache is successfully returned infers
that fuse_writepage_locked has already called end_page_writeback. And this,
in turn, infers that fuse_flush_writepages has already called
fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2)
could not extend it because of i_mutex held by ftruncate(2).

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 5c121fe..d1715b3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1529,7 +1529,6 @@ static int fuse_writepage_locked(struct page *page)
 
 	inc_bdi_stat(mapping->backing_dev_info, BDI_WRITEBACK);
 	inc_zone_page_state(tmp_page, NR_WRITEBACK_TEMP);
-	end_page_writeback(page);
 
 	spin_lock(&fc->lock);
 	list_add(&req->writepages_entry, &fi->writepages);
@@ -1537,6 +1536,8 @@ static int fuse_writepage_locked(struct page *page)
 	fuse_flush_writepages(inode);
 	spin_unlock(&fc->lock);
 
+	end_page_writeback(page);
+
 	return 0;
 
 err_free:


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate()
  2013-08-12 16:39 [PATCH 0/2] fuse: fix races related to fuse writeback Maxim Patlasov
  2013-08-12 16:39 ` [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked() Maxim Patlasov
@ 2013-08-12 16:39 ` Maxim Patlasov
  2013-08-13 12:05   ` Brian Foster
  2013-08-16 11:30   ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2 Maxim Patlasov
  1 sibling, 2 replies; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-12 16:39 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, bfoster, linux-kernel, devel, xemul

The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):

1) An user makes a page dirty via mmap-ed write.
2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
   and <offset, size> covering the page.
3) Before truncate_pagecache_range call from fuse_file_fallocate,
   the page goes to write-back. The page is fully processed by fuse_writepage
   (including end_page_writeback on the page), but fuse_flush_writepages did
   nothing because fi->writectr < 0.
4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
   by calling fuse_release_nowrite. The latter triggers processing queued
   write-back request which will write stale date to the hole soon.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   53 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d1715b3..2b18c4b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -344,6 +344,31 @@ static bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
 	return found;
 }
 
+static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
+				    pgoff_t idx_to)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_req *req;
+	bool found = false;
+
+	spin_lock(&fc->lock);
+	list_for_each_entry(req, &fi->writepages, writepages_entry) {
+		pgoff_t curr_index;
+
+		BUG_ON(req->inode != inode);
+		curr_index = req->misc.write.in.offset >> PAGE_CACHE_SHIFT;
+		if (!(idx_from >= curr_index + req->num_pages ||
+		      idx_to < curr_index)) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&fc->lock);
+
+	return found;
+}
+
 /*
  * Wait for page writeback to be completed.
  *
@@ -358,6 +383,19 @@ static int fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
 	return 0;
 }
 
+static void fuse_wait_on_writeback(struct inode *inode, pgoff_t start,
+				   size_t bytes)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	pgoff_t idx_from, idx_to;
+
+	idx_from = start >> PAGE_CACHE_SHIFT;
+	idx_to = (start + bytes - 1) >> PAGE_CACHE_SHIFT;
+
+	wait_event(fi->page_waitq,
+		   !fuse_range_is_writeback(inode, idx_from, idx_to));
+}
+
 static int fuse_flush(struct file *file, fl_owner_t id)
 {
 	struct inode *inode = file_inode(file);
@@ -2478,8 +2516,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 
 	if (lock_inode) {
 		mutex_lock(&inode->i_mutex);
-		if (mode & FALLOC_FL_PUNCH_HOLE)
-			fuse_set_nowrite(inode);
+		if (mode & FALLOC_FL_PUNCH_HOLE) {
+			truncate_pagecache_range(inode, offset,
+						 offset + length - 1);
+			fuse_wait_on_writeback(inode, offset, length);
+		}
 	}
 
 	req = fuse_get_req_nopages(fc);
@@ -2508,17 +2549,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 	if (!(mode & FALLOC_FL_KEEP_SIZE))
 		fuse_write_update_size(inode, offset + length);
 
-	if (mode & FALLOC_FL_PUNCH_HOLE)
-		truncate_pagecache_range(inode, offset, offset + length - 1);
-
 	fuse_invalidate_attr(inode);
 
 out:
-	if (lock_inode) {
-		if (mode & FALLOC_FL_PUNCH_HOLE)
-			fuse_release_nowrite(inode);
+	if (lock_inode)
 		mutex_unlock(&inode->i_mutex);
-	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate()
  2013-08-12 16:39 ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() Maxim Patlasov
@ 2013-08-13 12:05   ` Brian Foster
  2013-08-13 12:56     ` Maxim Patlasov
  2013-08-16 11:30   ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2 Maxim Patlasov
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2013-08-13 12:05 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: miklos, fuse-devel, linux-kernel, devel, xemul

On 08/12/2013 12:39 PM, Maxim Patlasov wrote:
> The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
> 
> 1) An user makes a page dirty via mmap-ed write.
> 2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
>    and <offset, size> covering the page.
> 3) Before truncate_pagecache_range call from fuse_file_fallocate,
>    the page goes to write-back. The page is fully processed by fuse_writepage
>    (including end_page_writeback on the page), but fuse_flush_writepages did
>    nothing because fi->writectr < 0.
> 4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
>    by calling fuse_release_nowrite. The latter triggers processing queued
>    write-back request which will write stale date to the hole soon.
> 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---

Hi Maxim,

Nice catch and description, one minor concern...

>  fs/fuse/file.c |   53 ++++++++++++++++++++++++++++++++++++++++++++---------
>  1 files changed, 44 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index d1715b3..2b18c4b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -344,6 +344,31 @@ static bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
>  	return found;
>  }
>  
> +static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
> +				    pgoff_t idx_to)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +	struct fuse_inode *fi = get_fuse_inode(inode);
> +	struct fuse_req *req;
> +	bool found = false;
> +
> +	spin_lock(&fc->lock);
> +	list_for_each_entry(req, &fi->writepages, writepages_entry) {
> +		pgoff_t curr_index;
> +
> +		BUG_ON(req->inode != inode);
> +		curr_index = req->misc.write.in.offset >> PAGE_CACHE_SHIFT;
> +		if (!(idx_from >= curr_index + req->num_pages ||
> +		      idx_to < curr_index)) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	spin_unlock(&fc->lock);
> +
> +	return found;
> +}
> +
>  /*
>   * Wait for page writeback to be completed.
>   *
> @@ -358,6 +383,19 @@ static int fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
>  	return 0;
>  }
>  
> +static void fuse_wait_on_writeback(struct inode *inode, pgoff_t start,
> +				   size_t bytes)
> +{
> +	struct fuse_inode *fi = get_fuse_inode(inode);
> +	pgoff_t idx_from, idx_to;
> +
> +	idx_from = start >> PAGE_CACHE_SHIFT;
> +	idx_to = (start + bytes - 1) >> PAGE_CACHE_SHIFT;
> +
> +	wait_event(fi->page_waitq,
> +		   !fuse_range_is_writeback(inode, idx_from, idx_to));
> +}
> +
>  static int fuse_flush(struct file *file, fl_owner_t id)
>  {
>  	struct inode *inode = file_inode(file);
> @@ -2478,8 +2516,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>  
>  	if (lock_inode) {
>  		mutex_lock(&inode->i_mutex);
> -		if (mode & FALLOC_FL_PUNCH_HOLE)
> -			fuse_set_nowrite(inode);
> +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> +			truncate_pagecache_range(inode, offset,
> +						 offset + length - 1);
> +			fuse_wait_on_writeback(inode, offset, length);
> +		}

If this happens to be the first attempt on an fs that doesn't support
fallocate, we'll return -EOPNOTSUPP after having already punched out the
data in the pagecache. What about replacing the nowrite logic with a
flush (and still followed by your new writeback wait logic) rather than
moving the pagecache truncate?

Brian

>  	}
>  
>  	req = fuse_get_req_nopages(fc);
> @@ -2508,17 +2549,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>  	if (!(mode & FALLOC_FL_KEEP_SIZE))
>  		fuse_write_update_size(inode, offset + length);
>  
> -	if (mode & FALLOC_FL_PUNCH_HOLE)
> -		truncate_pagecache_range(inode, offset, offset + length - 1);
> -
>  	fuse_invalidate_attr(inode);
>  
>  out:
> -	if (lock_inode) {
> -		if (mode & FALLOC_FL_PUNCH_HOLE)
> -			fuse_release_nowrite(inode);
> +	if (lock_inode)
>  		mutex_unlock(&inode->i_mutex);
> -	}
>  
>  	return err;
>  }
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate()
  2013-08-13 12:05   ` Brian Foster
@ 2013-08-13 12:56     ` Maxim Patlasov
  2013-08-13 13:23       ` Brian Foster
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-13 12:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: miklos, fuse-devel, linux-kernel, devel, xemul

Hi,

08/13/2013 04:05 PM, Brian Foster пишет:
> ...
> @@ -2478,8 +2516,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>   
>   	if (lock_inode) {
>   		mutex_lock(&inode->i_mutex);
> -		if (mode & FALLOC_FL_PUNCH_HOLE)
> -			fuse_set_nowrite(inode);
> +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> +			truncate_pagecache_range(inode, offset,
> +						 offset + length - 1);
> +			fuse_wait_on_writeback(inode, offset, length);
> +		}
> If this happens to be the first attempt on an fs that doesn't support
> fallocate, we'll return -EOPNOTSUPP after having already punched out the
> data in the pagecache.

Yes, this is unpleasant, but it's not critical, imo. We're returning an 
error code (even though equal to -EOPNOTSUPP) and a sane application 
should not make any assumption about current state of the punched 
region. Also, the application intended to discard given region of the 
file, so why should it pay care for its content afterwards?

> What about replacing the nowrite logic with a
> flush (and still followed by your new writeback wait logic) rather than
> moving the pagecache truncate?

The "flush" you mentioned should firstly flush page cache. 
invalidate_inode_pages2_range() seems to be a candidate. We definitely 
cannot ignore error code from it because it can be fuse_launder_page() 
who got -ENOMEM from fuse_writepage_locked(). In case of err == -ENOMEM, 
we could safely fail fallocate, but what should we do if it's -EBUSY? 
Any ideas?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate()
  2013-08-13 12:56     ` Maxim Patlasov
@ 2013-08-13 13:23       ` Brian Foster
  2013-08-13 13:45         ` Maxim Patlasov
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2013-08-13 13:23 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: miklos, fuse-devel, linux-kernel, devel, xemul

On 08/13/2013 08:56 AM, Maxim Patlasov wrote:
> Hi,
> 
> 08/13/2013 04:05 PM, Brian Foster пишет:
>> ...
>> @@ -2478,8 +2516,11 @@ static long fuse_file_fallocate(struct file
>> *file, int mode, loff_t offset,
>>         if (lock_inode) {
>>           mutex_lock(&inode->i_mutex);
>> -        if (mode & FALLOC_FL_PUNCH_HOLE)
>> -            fuse_set_nowrite(inode);
>> +        if (mode & FALLOC_FL_PUNCH_HOLE) {
>> +            truncate_pagecache_range(inode, offset,
>> +                         offset + length - 1);
>> +            fuse_wait_on_writeback(inode, offset, length);
>> +        }
>> If this happens to be the first attempt on an fs that doesn't support
>> fallocate, we'll return -EOPNOTSUPP after having already punched out the
>> data in the pagecache.
> 
> Yes, this is unpleasant, but it's not critical, imo. We're returning an
> error code (even though equal to -EOPNOTSUPP) and a sane application
> should not make any assumption about current state of the punched
> region. Also, the application intended to discard given region of the
> file, so why should it pay care for its content afterwards?
> 

I agree, though most users probably wouldn't expect that a blatant error
like EOPNOTSUPP leave the range in a weird state. What's more, it only
does so if it's the first attempt and behaves more appropriately after
that.

>> What about replacing the nowrite logic with a
>> flush (and still followed by your new writeback wait logic) rather than
>> moving the pagecache truncate?
> 
> The "flush" you mentioned should firstly flush page cache.
> invalidate_inode_pages2_range() seems to be a candidate. We definitely
> cannot ignore error code from it because it can be fuse_launder_page()
> who got -ENOMEM from fuse_writepage_locked(). In case of err == -ENOMEM,
> we could safely fail fallocate, but what should we do if it's -EBUSY?
> Any ideas?
> 

I was referring to something like filemap_write_and_wait_range(), for
example. Then continue to use truncate_pagecache_range() as we do today.
Thoughts?

Brian

> Thanks,
> Maxim


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate()
  2013-08-13 13:23       ` Brian Foster
@ 2013-08-13 13:45         ` Maxim Patlasov
  0 siblings, 0 replies; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-13 13:45 UTC (permalink / raw)
  To: Brian Foster; +Cc: miklos, fuse-devel, linux-kernel, devel, xemul

08/13/2013 05:23 PM, Brian Foster пишет:
> On 08/13/2013 08:56 AM, Maxim Patlasov wrote:
>> Hi,
>>
>> 08/13/2013 04:05 PM, Brian Foster пишет:
>>> ...
>>> @@ -2478,8 +2516,11 @@ static long fuse_file_fallocate(struct file
>>> *file, int mode, loff_t offset,
>>>          if (lock_inode) {
>>>            mutex_lock(&inode->i_mutex);
>>> -        if (mode & FALLOC_FL_PUNCH_HOLE)
>>> -            fuse_set_nowrite(inode);
>>> +        if (mode & FALLOC_FL_PUNCH_HOLE) {
>>> +            truncate_pagecache_range(inode, offset,
>>> +                         offset + length - 1);
>>> +            fuse_wait_on_writeback(inode, offset, length);
>>> +        }
>>> If this happens to be the first attempt on an fs that doesn't support
>>> fallocate, we'll return -EOPNOTSUPP after having already punched out the
>>> data in the pagecache.
>> Yes, this is unpleasant, but it's not critical, imo. We're returning an
>> error code (even though equal to -EOPNOTSUPP) and a sane application
>> should not make any assumption about current state of the punched
>> region. Also, the application intended to discard given region of the
>> file, so why should it pay care for its content afterwards?
>>
> I agree, though most users probably wouldn't expect that a blatant error
> like EOPNOTSUPP leave the range in a weird state. What's more, it only
> does so if it's the first attempt and behaves more appropriately after
> that.
>
>>> What about replacing the nowrite logic with a
>>> flush (and still followed by your new writeback wait logic) rather than
>>> moving the pagecache truncate?
>> The "flush" you mentioned should firstly flush page cache.
>> invalidate_inode_pages2_range() seems to be a candidate. We definitely
>> cannot ignore error code from it because it can be fuse_launder_page()
>> who got -ENOMEM from fuse_writepage_locked(). In case of err == -ENOMEM,
>> we could safely fail fallocate, but what should we do if it's -EBUSY?
>> Any ideas?
>>
> I was referring to something like filemap_write_and_wait_range(), for
> example. Then continue to use truncate_pagecache_range() as we do today.
> Thoughts

Looks nice. I'll send updated patch after some testing. Thanks a lot for 
the suggestion!

Maxim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-12 16:39 ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() Maxim Patlasov
  2013-08-13 12:05   ` Brian Foster
@ 2013-08-16 11:30   ` Maxim Patlasov
  2013-08-23 13:02     ` Brian Foster
  2013-08-29 15:41     ` Miklos Szeredi
  1 sibling, 2 replies; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-16 11:30 UTC (permalink / raw)
  Cc: miklos, fuse-devel, bfoster, xemul, linux-kernel, devel

The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):

1) An user makes a page dirty via mmap-ed write.
2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
   and <offset, size> covering the page.
3) Before truncate_pagecache_range call from fuse_file_fallocate,
   the page goes to write-back. The page is fully processed by fuse_writepage
   (including end_page_writeback on the page), but fuse_flush_writepages did
   nothing because fi->writectr < 0.
4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
   by calling fuse_release_nowrite. The latter triggers processing queued
   write-back request which will write stale data to the hole soon.

Changed in v2 (thanks to Brian for suggestion):
 - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
   we can end up in returning -ENOTSUPP while user data is already punched
   from page cache. Use filemap_write_and_wait_range() instead.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d1715b3..e88ba8b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -344,6 +344,31 @@ static bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
 	return found;
 }
 
+static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
+				    pgoff_t idx_to)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_req *req;
+	bool found = false;
+
+	spin_lock(&fc->lock);
+	list_for_each_entry(req, &fi->writepages, writepages_entry) {
+		pgoff_t curr_index;
+
+		BUG_ON(req->inode != inode);
+		curr_index = req->misc.write.in.offset >> PAGE_CACHE_SHIFT;
+		if (!(idx_from >= curr_index + req->num_pages ||
+		      idx_to < curr_index)) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&fc->lock);
+
+	return found;
+}
+
 /*
  * Wait for page writeback to be completed.
  *
@@ -358,6 +383,19 @@ static int fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
 	return 0;
 }
 
+static void fuse_wait_on_writeback(struct inode *inode, pgoff_t start,
+				   size_t bytes)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	pgoff_t idx_from, idx_to;
+
+	idx_from = start >> PAGE_CACHE_SHIFT;
+	idx_to = (start + bytes - 1) >> PAGE_CACHE_SHIFT;
+
+	wait_event(fi->page_waitq,
+		   !fuse_range_is_writeback(inode, idx_from, idx_to));
+}
+
 static int fuse_flush(struct file *file, fl_owner_t id)
 {
 	struct inode *inode = file_inode(file);
@@ -2478,8 +2516,15 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 
 	if (lock_inode) {
 		mutex_lock(&inode->i_mutex);
-		if (mode & FALLOC_FL_PUNCH_HOLE)
-			fuse_set_nowrite(inode);
+		if (mode & FALLOC_FL_PUNCH_HOLE) {
+			loff_t endbyte = offset + length - 1;
+			err = filemap_write_and_wait_range(inode->i_mapping,
+							   offset, endbyte);
+			if (err)
+				goto out;
+
+			fuse_wait_on_writeback(inode, offset, length);
+		}
 	}
 
 	req = fuse_get_req_nopages(fc);
@@ -2514,11 +2559,8 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 	fuse_invalidate_attr(inode);
 
 out:
-	if (lock_inode) {
-		if (mode & FALLOC_FL_PUNCH_HOLE)
-			fuse_release_nowrite(inode);
+	if (lock_inode)
 		mutex_unlock(&inode->i_mutex);
-	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-16 11:30   ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2 Maxim Patlasov
@ 2013-08-23 13:02     ` Brian Foster
  2013-08-29 15:41     ` Miklos Szeredi
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2013-08-23 13:02 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: miklos, fuse-devel, xemul, linux-kernel, devel

On 08/16/2013 07:30 AM, Maxim Patlasov wrote:
> The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
> 
> 1) An user makes a page dirty via mmap-ed write.
> 2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
>    and <offset, size> covering the page.
> 3) Before truncate_pagecache_range call from fuse_file_fallocate,
>    the page goes to write-back. The page is fully processed by fuse_writepage
>    (including end_page_writeback on the page), but fuse_flush_writepages did
>    nothing because fi->writectr < 0.
> 4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
>    by calling fuse_release_nowrite. The latter triggers processing queued
>    write-back request which will write stale data to the hole soon.
> 
> Changed in v2 (thanks to Brian for suggestion):
>  - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
>    we can end up in returning -ENOTSUPP while user data is already punched
>    from page cache. Use filemap_write_and_wait_range() instead.
> 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---

Looks good to me, thanks Maxim.

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/fuse/file.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 files changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index d1715b3..e88ba8b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -344,6 +344,31 @@ static bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
>  	return found;
>  }
>  
> +static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
> +				    pgoff_t idx_to)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +	struct fuse_inode *fi = get_fuse_inode(inode);
> +	struct fuse_req *req;
> +	bool found = false;
> +
> +	spin_lock(&fc->lock);
> +	list_for_each_entry(req, &fi->writepages, writepages_entry) {
> +		pgoff_t curr_index;
> +
> +		BUG_ON(req->inode != inode);
> +		curr_index = req->misc.write.in.offset >> PAGE_CACHE_SHIFT;
> +		if (!(idx_from >= curr_index + req->num_pages ||
> +		      idx_to < curr_index)) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	spin_unlock(&fc->lock);
> +
> +	return found;
> +}
> +
>  /*
>   * Wait for page writeback to be completed.
>   *
> @@ -358,6 +383,19 @@ static int fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
>  	return 0;
>  }
>  
> +static void fuse_wait_on_writeback(struct inode *inode, pgoff_t start,
> +				   size_t bytes)
> +{
> +	struct fuse_inode *fi = get_fuse_inode(inode);
> +	pgoff_t idx_from, idx_to;
> +
> +	idx_from = start >> PAGE_CACHE_SHIFT;
> +	idx_to = (start + bytes - 1) >> PAGE_CACHE_SHIFT;
> +
> +	wait_event(fi->page_waitq,
> +		   !fuse_range_is_writeback(inode, idx_from, idx_to));
> +}
> +
>  static int fuse_flush(struct file *file, fl_owner_t id)
>  {
>  	struct inode *inode = file_inode(file);
> @@ -2478,8 +2516,15 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>  
>  	if (lock_inode) {
>  		mutex_lock(&inode->i_mutex);
> -		if (mode & FALLOC_FL_PUNCH_HOLE)
> -			fuse_set_nowrite(inode);
> +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> +			loff_t endbyte = offset + length - 1;
> +			err = filemap_write_and_wait_range(inode->i_mapping,
> +							   offset, endbyte);
> +			if (err)
> +				goto out;
> +
> +			fuse_wait_on_writeback(inode, offset, length);
> +		}
>  	}
>  
>  	req = fuse_get_req_nopages(fc);
> @@ -2514,11 +2559,8 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>  	fuse_invalidate_attr(inode);
>  
>  out:
> -	if (lock_inode) {
> -		if (mode & FALLOC_FL_PUNCH_HOLE)
> -			fuse_release_nowrite(inode);
> +	if (lock_inode)
>  		mutex_unlock(&inode->i_mutex);
> -	}
>  
>  	return err;
>  }
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-16 11:30   ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2 Maxim Patlasov
  2013-08-23 13:02     ` Brian Foster
@ 2013-08-29 15:41     ` Miklos Szeredi
  2013-08-29 16:27       ` Maxim Patlasov
  1 sibling, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-08-29 15:41 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, bfoster, xemul, linux-kernel, devel

On Fri, Aug 16, 2013 at 03:30:27PM +0400, Maxim Patlasov wrote:
> The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
> 
> 1) An user makes a page dirty via mmap-ed write.
> 2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
>    and <offset, size> covering the page.
> 3) Before truncate_pagecache_range call from fuse_file_fallocate,
>    the page goes to write-back. The page is fully processed by fuse_writepage
>    (including end_page_writeback on the page), but fuse_flush_writepages did
>    nothing because fi->writectr < 0.
> 4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
>    by calling fuse_release_nowrite. The latter triggers processing queued
>    write-back request which will write stale data to the hole soon.
> 
> Changed in v2 (thanks to Brian for suggestion):
>  - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
>    we can end up in returning -ENOTSUPP while user data is already punched
>    from page cache. Use filemap_write_and_wait_range() instead.

The problem with fuse_wait_on_writeback() is starvation.  You could have the
page range continually being dirtied and written back and fallocate() livelocked
in fuse_wait_on_writeback() for ever AFAICS.

So having a barrier like FUSE_NOWRITE is good but then we need to take care of
throwing away the truncated part of the queue.  But that should be doable by
passing the truncated range explicitly to fuse_release_nowrite().

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked()
  2013-08-12 16:39 ` [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked() Maxim Patlasov
@ 2013-08-29 16:27   ` Miklos Szeredi
  0 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-08-29 16:27 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Brian Foster, Kernel Mailing List, devel, Pavel Emelianov

On Mon, Aug 12, 2013 at 6:39 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
> The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
>
> 1) An user makes a page dirty via mmap-ed write.
> 2) The user performs shrinking truncate(2) intended to purge the page.
> 3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
>    writeback. fuse_writepage_locked fills FUSE_WRITE request and releases
>    the original page by end_page_writeback.
> 4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex
>    is free.
> 5) Ordinary write(2) extends i_size back to cover the page. Note that
>    fuse_send_write_pages do wait for fuse writeback, but for another
>    page->index.
> 6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request.
>    fuse_send_writepage is supposed to crop inarg->size of the request,
>    but it doesn't because i_size has already been extended back.
>
> Moving end_page_writeback to the end of fuse_writepage_locked fixes the race
> because now the fact that truncate_pagecache is successfully returned infers
> that fuse_writepage_locked has already called end_page_writeback. And this,
> in turn, infers that fuse_flush_writepages has already called
> fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2)
> could not extend it because of i_mutex held by ftruncate(2).
>
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>


Applied this, too.

Thanks,
Miklos

> ---
>  fs/fuse/file.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 5c121fe..d1715b3 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1529,7 +1529,6 @@ static int fuse_writepage_locked(struct page *page)
>
>         inc_bdi_stat(mapping->backing_dev_info, BDI_WRITEBACK);
>         inc_zone_page_state(tmp_page, NR_WRITEBACK_TEMP);
> -       end_page_writeback(page);
>
>         spin_lock(&fc->lock);
>         list_add(&req->writepages_entry, &fi->writepages);
> @@ -1537,6 +1536,8 @@ static int fuse_writepage_locked(struct page *page)
>         fuse_flush_writepages(inode);
>         spin_unlock(&fc->lock);
>
> +       end_page_writeback(page);
> +
>         return 0;
>
>  err_free:
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-29 15:41     ` Miklos Szeredi
@ 2013-08-29 16:27       ` Maxim Patlasov
  2013-08-29 16:37         ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-29 16:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, bfoster, xemul, linux-kernel, devel

Hi,

08/29/2013 07:41 PM, Miklos Szeredi пишет:
> On Fri, Aug 16, 2013 at 03:30:27PM +0400, Maxim Patlasov wrote:
>> The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
>>
>> 1) An user makes a page dirty via mmap-ed write.
>> 2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
>>     and <offset, size> covering the page.
>> 3) Before truncate_pagecache_range call from fuse_file_fallocate,
>>     the page goes to write-back. The page is fully processed by fuse_writepage
>>     (including end_page_writeback on the page), but fuse_flush_writepages did
>>     nothing because fi->writectr < 0.
>> 4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
>>     by calling fuse_release_nowrite. The latter triggers processing queued
>>     write-back request which will write stale data to the hole soon.
>>
>> Changed in v2 (thanks to Brian for suggestion):
>>   - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
>>     we can end up in returning -ENOTSUPP while user data is already punched
>>     from page cache. Use filemap_write_and_wait_range() instead.
> The problem with fuse_wait_on_writeback() is starvation.  You could have the
> page range continually being dirtied and written back and fallocate() livelocked
> in fuse_wait_on_writeback() for ever AFAICS.

Yes, I agree. I thought being infinitely dirtied is impossible if 
i_mutex is held, but now I understand it's not true for mmap-ed writes. 
I need to think more about it (livelock avoidance).

>
> So having a barrier like FUSE_NOWRITE is good but then we need to take care of
> throwing away the truncated part of the queue.  But that should be doable by
> passing the truncated range explicitly to fuse_release_nowrite().

Yes, I considered this approach, but splitting a fuse request into two 
in fuse_send_writepage() made me sick. What if allocation fails?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-29 16:27       ` Maxim Patlasov
@ 2013-08-29 16:37         ` Miklos Szeredi
  2013-08-29 16:41           ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-08-29 16:37 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, bfoster, xemul, linux-kernel, devel

On Thu, Aug 29, 2013 at 08:27:30PM +0400, Maxim Patlasov wrote:

> >So having a barrier like FUSE_NOWRITE is good but then we need to take care
> >of throwing away the truncated part of the queue.  But that should be doable
> >by passing the truncated range explicitly to fuse_release_nowrite().
> 
> Yes, I considered this approach, but splitting a fuse request into
> two in fuse_send_writepage() made me sick. What if allocation fails?

Heh, yeah.

I can think of a hundred ways this could be solved without needing an
allocation.  Probably none of them worth the hassle.

Or if the hole fits inside the write we could just zero out the affected pages.
Which is cheating a bit, but no one will notice ;)

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-29 16:37         ` Miklos Szeredi
@ 2013-08-29 16:41           ` Miklos Szeredi
  2013-08-30  9:13             ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-08-29 16:41 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Brian Foster, Pavel Emelianov, Kernel Mailing List, devel

BTW, isn't it enough to do the filemap_write_and_wait() *plus* the
fuse_set_nowrite()?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-29 16:41           ` Miklos Szeredi
@ 2013-08-30  9:13             ` Miklos Szeredi
  2013-08-30 11:33               ` Maxim Patlasov
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-08-30  9:13 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Brian Foster, Pavel Emelianov, Kernel Mailing List, devel

On Thu, Aug 29, 2013 at 6:41 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> BTW, isn't it enough to do the filemap_write_and_wait() *plus* the
> fuse_set_nowrite()?

Thought about it a bit and I think this should do fine.

Any writes before the fallocate will go trough before the fallocate.
i_mutex guarantees that only one instance of fuse_set_nowrite() is
running.  Any mmaped writes during the fallocate() will go after the
fallocate request and the page cache truncation and that's fine too.
Page cache is consistent since it doens't contain pages for those
writes to the hole.  Subsequent reads to that area will fill them in.

Any other concerns?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-30  9:13             ` Miklos Szeredi
@ 2013-08-30 11:33               ` Maxim Patlasov
  2013-09-11 10:12                 ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Patlasov @ 2013-08-30 11:33 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: fuse-devel, Brian Foster, Pavel Emelianov, Kernel Mailing List, devel

08/30/2013 01:13 PM, Miklos Szeredi пишет:
> On Thu, Aug 29, 2013 at 6:41 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> BTW, isn't it enough to do the filemap_write_and_wait() *plus* the
>> fuse_set_nowrite()?
> Thought about it a bit and I think this should do fine.
>
> Any writes before the fallocate will go trough before the fallocate.
> i_mutex guarantees that only one instance of fuse_set_nowrite() is
> running.  Any mmaped writes during the fallocate() will go after the
> fallocate request and the page cache truncation and that's fine too.
> Page cache is consistent since it doens't contain pages for those
> writes to the hole.  Subsequent reads to that area will fill them in.
>
> Any other concerns?

No. What you suggest looks as a neat and correct solution. I'll resend 
the updated patch after some testing (since now till Monday).

As for proof-of-correctness, all you wrote above is correct, but the 
first point had been boiling my mind for a while. I came to the 
following reasoning (hopefully it is what you meant):

The fact that filemap_write_and_wait() returned infers that 
end_page_writeback() was called for all relevant pages. And fuse doesn't 
call it before adding request to fi->queued_writes and calling 
fuse_flush_writepages(). And the latter, in turn, guarantees proper 
accounting of request in fi->writectr. Here, of course, it's crucial 
that we can't have concurrent fuse_set_nowrite(), as you explained. 
Hence, so far as fi->writectr was bumped, fuse_set_nowrite() we call 
after filemap_write_and_wait() would wait until all changes have gone to 
the server.

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-08-30 11:33               ` Maxim Patlasov
@ 2013-09-11 10:12                 ` Miklos Szeredi
  2013-09-11 12:21                   ` Maxim Patlasov
  0 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-09-11 10:12 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Brian Foster, Pavel Emelianov, Kernel Mailing List, devel

On Fri, Aug 30, 2013 at 1:33 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
> 08/30/2013 01:13 PM, Miklos Szeredi пишет:
>
>> On Thu, Aug 29, 2013 at 6:41 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>>> BTW, isn't it enough to do the filemap_write_and_wait() *plus* the
>>> fuse_set_nowrite()?
>>
>> Thought about it a bit and I think this should do fine.
>>
>> Any writes before the fallocate will go trough before the fallocate.
>> i_mutex guarantees that only one instance of fuse_set_nowrite() is
>> running.  Any mmaped writes during the fallocate() will go after the
>> fallocate request and the page cache truncation and that's fine too.
>> Page cache is consistent since it doens't contain pages for those
>> writes to the hole.  Subsequent reads to that area will fill them in.
>>
>> Any other concerns?
>
>
> No. What you suggest looks as a neat and correct solution. I'll resend the
> updated patch after some testing (since now till Monday).
>
> As for proof-of-correctness, all you wrote above is correct, but the first
> point had been boiling my mind for a while. I came to the following
> reasoning (hopefully it is what you meant):
>
> The fact that filemap_write_and_wait() returned infers that
> end_page_writeback() was called for all relevant pages. And fuse doesn't
> call it before adding request to fi->queued_writes and calling
> fuse_flush_writepages(). And the latter, in turn, guarantees proper
> accounting of request in fi->writectr. Here, of course, it's crucial that we
> can't have concurrent fuse_set_nowrite(), as you explained. Hence, so far as
> fi->writectr was bumped, fuse_set_nowrite() we call after
> filemap_write_and_wait() would wait until all changes have gone to the
> server.

Any news about this?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2
  2013-09-11 10:12                 ` Miklos Szeredi
@ 2013-09-11 12:21                   ` Maxim Patlasov
  0 siblings, 0 replies; 18+ messages in thread
From: Maxim Patlasov @ 2013-09-11 12:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: fuse-devel, Brian Foster, Pavel Emelianov, Kernel Mailing List, devel

On 09/11/2013 02:12 PM, Miklos Szeredi wrote:
> On Fri, Aug 30, 2013 at 1:33 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>> 08/30/2013 01:13 PM, Miklos Szeredi пишет:
>>
>>> On Thu, Aug 29, 2013 at 6:41 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>> BTW, isn't it enough to do the filemap_write_and_wait() *plus* the
>>>> fuse_set_nowrite()?
>>> Thought about it a bit and I think this should do fine.
>>>
>>> Any writes before the fallocate will go trough before the fallocate.
>>> i_mutex guarantees that only one instance of fuse_set_nowrite() is
>>> running.  Any mmaped writes during the fallocate() will go after the
>>> fallocate request and the page cache truncation and that's fine too.
>>> Page cache is consistent since it doens't contain pages for those
>>> writes to the hole.  Subsequent reads to that area will fill them in.
>>>
>>> Any other concerns?
>>
>> No. What you suggest looks as a neat and correct solution. I'll resend the
>> updated patch after some testing (since now till Monday).
>>
>> As for proof-of-correctness, all you wrote above is correct, but the first
>> point had been boiling my mind for a while. I came to the following
>> reasoning (hopefully it is what you meant):
>>
>> The fact that filemap_write_and_wait() returned infers that
>> end_page_writeback() was called for all relevant pages. And fuse doesn't
>> call it before adding request to fi->queued_writes and calling
>> fuse_flush_writepages(). And the latter, in turn, guarantees proper
>> accounting of request in fi->writectr. Here, of course, it's crucial that we
>> can't have concurrent fuse_set_nowrite(), as you explained. Hence, so far as
>> fi->writectr was bumped, fuse_set_nowrite() we call after
>> filemap_write_and_wait() would wait until all changes have gone to the
>> server.
> Any news about this?

Testing updated patch revealed a problem (fsx caught data corruption). 
Then I instrumented debug version to get a cue. The debug version 
survived several days of testing, but now I discovered that that test 
setup was not fully correct. I'll re-run it now.

Thanks,
Maxim


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-09-11 12:22 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-12 16:39 [PATCH 0/2] fuse: fix races related to fuse writeback Maxim Patlasov
2013-08-12 16:39 ` [PATCH 1/2] fuse: postpone end_page_writeback() in fuse_writepage_locked() Maxim Patlasov
2013-08-29 16:27   ` Miklos Szeredi
2013-08-12 16:39 ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() Maxim Patlasov
2013-08-13 12:05   ` Brian Foster
2013-08-13 12:56     ` Maxim Patlasov
2013-08-13 13:23       ` Brian Foster
2013-08-13 13:45         ` Maxim Patlasov
2013-08-16 11:30   ` [PATCH 2/2] fuse: wait for writeback in fuse_file_fallocate() -v2 Maxim Patlasov
2013-08-23 13:02     ` Brian Foster
2013-08-29 15:41     ` Miklos Szeredi
2013-08-29 16:27       ` Maxim Patlasov
2013-08-29 16:37         ` Miklos Szeredi
2013-08-29 16:41           ` Miklos Szeredi
2013-08-30  9:13             ` Miklos Szeredi
2013-08-30 11:33               ` Maxim Patlasov
2013-09-11 10:12                 ` Miklos Szeredi
2013-09-11 12:21                   ` Maxim Patlasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.