ocfs2-devel.oss.oracle.com archive mirror
* [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
@ 2021-04-26 22:05 Junxiao Bi
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-04-26 22:05 UTC (permalink / raw)
  To: ocfs2-devel, cluster-devel, linux-fsdevel

When handling truncate/fallocate, some filesystems such as ocfs2 zero
pages beyond the inode size and only update the inode size afterwards,
so they need this API to write back pages beyond EOF.

Cc: <stable@vger.kernel.org>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/buffer.c                 | 14 +++++++++++---
 include/linux/buffer_head.h |  3 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0cb7ffd4977c..802f0bacdbde 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1709,9 +1709,9 @@ static struct buffer_head *create_page_buffers(struct page *page, struct inode *
  * WB_SYNC_ALL, the writes are posted using REQ_SYNC; this
  * causes the writes to be flagged as synchronous writes.
  */
-int __block_write_full_page(struct inode *inode, struct page *page,
+int __block_write_full_page_eof(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
-			bh_end_io_t *handler)
+			bh_end_io_t *handler, bool eof_write)
 {
 	int err;
 	sector_t block;
@@ -1746,7 +1746,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	 * handle any aliases from the underlying blockdev's mapping.
 	 */
 	do {
-		if (block > last_block) {
+		if (block > last_block && !eof_write) {
 			/*
 			 * mapped buffers outside i_size will occur, because
 			 * this page can be outside i_size when there is a
@@ -1871,6 +1871,14 @@ int __block_write_full_page(struct inode *inode, struct page *page,
 	unlock_page(page);
 	goto done;
 }
+EXPORT_SYMBOL(__block_write_full_page_eof);
+
+int __block_write_full_page(struct inode *inode, struct page *page,
+			get_block_t *get_block, struct writeback_control *wbc,
+			bh_end_io_t *handler)
+{
+	return __block_write_full_page_eof(inode, page, get_block, wbc, handler, false);
+}
 EXPORT_SYMBOL(__block_write_full_page);
 
 /*
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6b47f94378c5..5da15a1ba15c 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -221,6 +221,9 @@ int block_write_full_page(struct page *page, get_block_t *get_block,
 int __block_write_full_page(struct inode *inode, struct page *page,
 			get_block_t *get_block, struct writeback_control *wbc,
 			bh_end_io_t *handler);
+int __block_write_full_page_eof(struct inode *inode, struct page *page,
+			get_block_t *get_block, struct writeback_control *wbc,
+			bh_end_io_t *handler, bool eof_write);
 int block_read_full_page(struct page*, get_block_t*);
 int block_is_partially_uptodate(struct page *page, unsigned long from,
 				unsigned long count);
-- 
2.24.3 (Apple Git-128)


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size
  2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
@ 2021-04-26 22:05 ` Junxiao Bi
  2021-04-28 16:00   ` Junxiao Bi
  2021-04-29 13:09   ` Joseph Qi
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-04-26 22:05 UTC (permalink / raw)
  To: ocfs2-devel, cluster-devel, linux-fsdevel

When fallocate/truncate extends the inode size and the original i_size is
in the middle of the last cluster, the range from i_size to the end of the
cluster needs to be zeroed with buffered writes. At that time i_size has
not yet been updated to the new size, so if writeback kicks in, it invokes
ocfs2_writepage()->block_write_full_page(), which drops the pages beyond
the inode size. That causes file corruption.

Running the following command with qemu-img 4.2.1 easily produces a
corrupted converted image file.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

Cc: <stable@vger.kernel.org>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/ocfs2/aops.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index ad20403b383f..7a3e3d59f6a9 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -402,11 +402,28 @@ static void ocfs2_readahead(struct readahead_control *rac)
  */
 static int ocfs2_writepage(struct page *page, struct writeback_control *wbc)
 {
+	struct inode * const inode = page->mapping->host;
+	loff_t i_size = i_size_read(inode);
+	const pgoff_t end_index = i_size >> PAGE_SHIFT;
+	unsigned int offset;
+
 	trace_ocfs2_writepage(
 		(unsigned long long)OCFS2_I(page->mapping->host)->ip_blkno,
 		page->index);
 
-	return block_write_full_page(page, ocfs2_get_block, wbc);
+	/*
+	 * The page straddles i_size.  It must be zeroed out on each and every
+	 * writepage invocation because it may be mmapped.  "A file is mapped
+	 * in multiples of the page size.  For a file that is not a multiple of
+	 * the  page size, the remaining memory is zeroed when mapped, and
+	 * writes to that region are not written out to the file."
+	 */
+	offset = i_size & (PAGE_SIZE-1);
+	if (page->index == end_index && offset)
+		zero_user_segment(page, offset, PAGE_SIZE);
+
+	return __block_write_full_page_eof(inode, page, ocfs2_get_block, wbc,
+			end_buffer_async_write, true);
 }
 
 /* Taken from ext3. We don't necessarily need the full blown
-- 
2.24.3 (Apple Git-128)



* [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback
  2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
@ 2021-04-26 22:05 ` Junxiao Bi
  2021-04-28 16:02   ` Junxiao Bi
  2021-04-29 11:58 ` [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Joseph Qi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-04-26 22:05 UTC (permalink / raw)
  To: ocfs2-devel, cluster-devel, linux-fsdevel

The dirty flag of buffers beyond the inode size is cleared, so those
buffers are never written back.

Cc: <stable@vger.kernel.org>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/gfs2/aops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index cc4f987687f3..cd8a87555b3a 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -133,8 +133,8 @@ static int gfs2_write_jdata_page(struct page *page,
 	if (page->index == end_index && offset)
 		zero_user_segment(page, offset, PAGE_SIZE);
 
-	return __block_write_full_page(inode, page, gfs2_get_block_noalloc, wbc,
-				       end_buffer_async_write);
+	return __block_write_full_page_eof(inode, page, gfs2_get_block_noalloc, wbc,
+				       end_buffer_async_write, true);
 }
 
 /**
-- 
2.24.3 (Apple Git-128)



* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
@ 2021-04-28 16:00   ` Junxiao Bi
  2021-04-29 13:09   ` Joseph Qi
  1 sibling, 0 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-04-28 16:00 UTC (permalink / raw)
  To: ocfs2-devel, cluster-devel, linux-fsdevel

Hi Joseph,

Can you help review the first two patches?

Thanks,

Junxiao.



* Re: [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
@ 2021-04-28 16:02   ` Junxiao Bi
  0 siblings, 0 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-04-28 16:02 UTC (permalink / raw)
  To: ocfs2-devel, cluster-devel, linux-fsdevel; +Cc: rpeterso, agruenba

Hi Bob & Andreas,

Can you help review this patch?

Thanks,

Junxiao.



* Re: [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
@ 2021-04-29 11:58 ` Joseph Qi
  2021-04-29 17:14 ` [Ocfs2-devel] [Cluster-devel] " Andreas Gruenbacher
  2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
  4 siblings, 0 replies; 20+ messages in thread
From: Joseph Qi @ 2021-04-29 11:58 UTC (permalink / raw)
  To: Junxiao Bi, ocfs2-devel, cluster-devel, linux-fsdevel, akpm



On 4/27/21 6:05 AM, Junxiao Bi wrote:
> When handling truncate/fallocate, some filesystems such as ocfs2 zero
> pages beyond the inode size and only update the inode size afterwards,
> so they need this API to write back pages beyond EOF.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>


Looks good.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


* Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size
  2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
  2021-04-28 16:00   ` Junxiao Bi
@ 2021-04-29 13:09   ` Joseph Qi
  1 sibling, 0 replies; 20+ messages in thread
From: Joseph Qi @ 2021-04-29 13:09 UTC (permalink / raw)
  To: Junxiao Bi, ocfs2-devel, cluster-devel, linux-fsdevel, akpm



On 4/27/21 6:05 AM, Junxiao Bi wrote:
> When fallocate/truncate extends the inode size and the original i_size is
> in the middle of the last cluster, the range from i_size to the end of the
> cluster needs to be zeroed with buffered writes. At that time i_size has
> not yet been updated to the new size, so if writeback kicks in, it invokes
> ocfs2_writepage()->block_write_full_page(), which drops the pages beyond
> the inode size. That causes file corruption.
> 
> Running the following command with qemu-img 4.2.1 easily produces a
> corrupted converted image file.
> 
>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>              -O qcow2 -o compat=1.1 $qcow_image.conv
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
                   ` (2 preceding siblings ...)
  2021-04-29 11:58 ` [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Joseph Qi
@ 2021-04-29 17:14 ` Andreas Gruenbacher
  2021-04-29 18:07   ` Junxiao Bi
  2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
  4 siblings, 1 reply; 20+ messages in thread
From: Andreas Gruenbacher @ 2021-04-29 17:14 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: cluster-devel, linux-fsdevel, Jan Kara, ocfs2-devel

Junxiao,

On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> When handling truncate/fallocate, some filesystems such as ocfs2 zero
> pages beyond the inode size and only update the inode size afterwards,
> so they need this API to write back pages beyond EOF.

is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
cache filling races" patch set [*]? It doesn't look like the kind of
patch Christoph would be happy with.

Thanks,
Andreas

[*] https://lore.kernel.org/linux-fsdevel/20210423171010.12-1-jack@suse.cz/




* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-29 17:14 ` [Ocfs2-devel] [Cluster-devel] " Andreas Gruenbacher
@ 2021-04-29 18:07   ` Junxiao Bi
  2021-04-30 12:47     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-04-29 18:07 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: cluster-devel, linux-fsdevel, Jan Kara, ocfs2-devel

On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:

> Junxiao,
>
> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>> When handling truncate/fallocate, some filesystems such as ocfs2 zero
>> pages beyond the inode size and only update the inode size afterwards,
>> so they need this API to write back pages beyond EOF.
> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> cache filling races" patch set [*]? It doesn't look like the kind of
> patch Christoph would be happy with.

Thank you for pointing out the patch set. I think it fixes a different
issue.

The issue here is that when extending the file size with fallocate/truncate,
if the original inode size is in the middle of the last cluster block (1M),
the EOF part is first zeroed with buffered writes and only then is the new
inode size updated. So there is a window in which dirty pages lie beyond
the inode size; if writeback kicks in during that window,
block_write_full_page() drops all those EOF pages.
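
The window described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: the function name
blocks_dropped() and its parameters are hypothetical, and only the
last_block arithmetic and the "block > last_block && !eof_write"
condition are taken from the patch to __block_write_full_page(). It
counts how many blocks of a page would have their dirty flag cleared
instead of being submitted for I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model (hypothetical helper, not a kernel API) of the
 * per-buffer check in __block_write_full_page().  Returns how many of
 * the page's blocks would be dropped rather than written.
 */
static unsigned int blocks_dropped(int64_t i_size, uint64_t page_index,
				   unsigned int page_shift,
				   unsigned int blkbits, bool eof_write)
{
	/* The model assumes a non-empty file and blocks no larger than a page. */
	assert(i_size > 0 && blkbits <= page_shift);

	/* Last block that still holds file data according to i_size. */
	uint64_t last_block = (uint64_t)(i_size - 1) >> blkbits;
	/* First block backed by this page. */
	uint64_t block = page_index << (page_shift - blkbits);
	unsigned int per_page = 1u << (page_shift - blkbits);
	unsigned int dropped = 0;

	for (unsigned int i = 0; i < per_page; i++, block++) {
		/* Same condition as the patched loop in patch 1/3. */
		if (block > last_block && !eof_write)
			dropped++;
	}
	return dropped;
}
```

With 4 KiB pages, 1 KiB blocks and an i_size of 5000 bytes, page 1
covers blocks 4..7, of which blocks 5..7 lie beyond i_size and are
dropped unless eof_write is set. Note that block_write_full_page()
itself also returns early for pages that lie entirely beyond i_size,
which this model does not cover.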

I guess gfs2 has a similar issue?

I think it would be good to provide an API that allows EOF writeback.
If this is not a good approach, do you have any advice on how to
improve/fix it?

Thanks,

Junxiao.




* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-29 18:07   ` Junxiao Bi
@ 2021-04-30 12:47     ` Jan Kara
  2021-04-30 21:18       ` Junxiao Bi
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2021-04-30 12:47 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Jan Kara, Andreas Gruenbacher, cluster-devel, linux-fsdevel, ocfs2-devel

On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
> > On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > > When handling truncate/fallocate, some filesystems such as ocfs2 zero
> > > pages beyond the inode size and only update the inode size afterwards,
> > > so they need this API to write back pages beyond EOF.
> > is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> > cache filling races" patch set [*]? It doesn't look like the kind of
> > patch Christoph would be happy with.
> 
> Thank you for pointing out the patch set. I think it fixes a different
> issue.
> 
> The issue here is that when extending the file size with fallocate/truncate,
> if the original inode size is in the middle of the last cluster block (1M),
> the EOF part is first zeroed with buffered writes and only then is the new
> inode size updated. So there is a window in which dirty pages lie beyond
> the inode size; if writeback kicks in during that window,
> block_write_full_page() drops all those EOF pages.

I agree that the buffers describing the part of the cluster beyond i_size
won't be written. But the page cache will remain zeroed out, so that is
fine. So you only need to zero out the on-disk contents. Since this is
actually a physically contiguous range of blocks, why don't you just use
sb_issue_zeroout() to zero out the tail of the cluster? It will be more
efficient than going through the page cache, and you also won't have to
tweak block_write_full_page()...
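
As a rough illustration of which range such a zeroout would have to
cover, the following userspace sketch computes the file-logical blocks
between i_size and the end of its cluster. The helper
cluster_tail_blocks() and struct block_range are hypothetical names used
only here; the result would still have to be translated to on-disk block
numbers before it could be handed to sb_issue_zeroout(), and the partial
block that contains i_size itself is not included.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a run of whole blocks, in file-logical units. */
struct block_range {
	uint64_t first;	/* first block wholly beyond i_size */
	uint64_t count;	/* number of blocks up to the end of the cluster */
};

/*
 * Compute the whole blocks between i_size and the end of the cluster
 * that contains the last byte of the file.  Returns count == 0 when
 * i_size is already cluster aligned.
 */
static struct block_range cluster_tail_blocks(uint64_t i_size,
					      unsigned int blkbits,
					      unsigned int clusterbits)
{
	assert(blkbits <= clusterbits);

	uint64_t blocksize = 1ULL << blkbits;
	uint64_t clustersize = 1ULL << clusterbits;

	/* Round i_size up to the next block boundary. */
	uint64_t first = (i_size + blocksize - 1) >> blkbits;
	/* Round i_size up to the next cluster boundary, expressed in blocks. */
	uint64_t end = ((i_size + clustersize - 1) >> clusterbits)
			<< (clusterbits - blkbits);

	struct block_range r = { first, end - first };
	return r;
}
```

For example, with 4 KiB blocks, 1 MiB clusters and an i_size of 5000
bytes, the tail starts at block 2 and spans 254 blocks.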

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-30 12:47     ` Jan Kara
@ 2021-04-30 21:18       ` Junxiao Bi
  2021-05-03 10:29         ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-04-30 21:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: cluster-devel, linux-fsdevel, Andreas Gruenbacher, ocfs2-devel

On 4/30/21 5:47 AM, Jan Kara wrote:

> On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
>> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
>>> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>> When handling truncate/fallocate, some filesystems such as ocfs2 zero
>>>> pages beyond the inode size and only update the inode size afterwards,
>>>> so they need this API to write back pages beyond EOF.
>>> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
>>> cache filling races" patch set [*]? It doesn't look like the kind of
>>> patch Christoph would be happy with.
>> Thank you for pointing out the patch set. I think it fixes a different
>> issue.
>>
>> The issue here is that when extending the file size with fallocate/truncate,
>> if the original inode size is in the middle of the last cluster block (1M),
>> the EOF part is first zeroed with buffered writes and only then is the new
>> inode size updated. So there is a window in which dirty pages lie beyond
>> the inode size; if writeback kicks in during that window,
>> block_write_full_page() drops all those EOF pages.
> I agree that the buffers describing part of the cluster beyond i_size won't
> be written. But page cache will remain zeroed out so that is fine. So you
> only need to zero out the on disk contents. Since this is actually
> physically contiguous range of blocks why don't you just use
> sb_issue_zeroout() to zero out the tail of the cluster? It will be more
> efficient than going through the page cache and you also won't have to
> tweak block_write_full_page()...

Thanks for the review.

The physical blocks to be zeroed are contiguous only when sparse mode
is enabled. If sparse mode is disabled, ocfs2 does not support unwritten
extents, so all the blocks up to the new size are zeroed by the buffer
write. Since sb_issue_zeroout() has to wait for the I/O to complete,
there would be a lot of delay when extending the file size. Using
writeback to flush asynchronously seems more efficient?

Thanks,

Junxiao.

>
> 								Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-30 21:18       ` Junxiao Bi
@ 2021-05-03 10:29         ` Jan Kara
  2021-05-03 17:25           ` Junxiao Bi
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2021-05-03 10:29 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Jan Kara, Andreas Gruenbacher, cluster-devel, linux-fsdevel, ocfs2-devel

On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
> On 4/30/21 5:47 AM, Jan Kara wrote:
> 
> > On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
> > > On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
> > > > On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > > > > When doing truncate/fallocate for some filesytem like ocfs2, it
> > > > > will zero some pages that are out of inode size and then later
> > > > > update the inode size, so it needs this api to writeback eof
> > > > > pages.
> > > > is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> > > > cache filling races" patch set [*]? It doesn't look like the kind of
> > > > patch Christoph would be happy with.
> > > Thank you for pointing the patch set. I think that is fixing a different
> > > issue.
> > > 
> > > The issue here is when extending file size with fallocate/truncate, if the
> > > original inode size
> > > 
> > > is in the middle of the last cluster block(1M), eof part will be zeroed with
> > > buffer write first,
> > > 
> > > and then new inode size is updated, so there is a window that dirty pages is
> > > out of inode size,
> > > 
> > > if writeback is kicked in, block_write_full_page will drop all those eof
> > > pages.
> > I agree that the buffers describing part of the cluster beyond i_size won't
> > be written. But page cache will remain zeroed out so that is fine. So you
> > only need to zero out the on disk contents. Since this is actually
> > physically contiguous range of blocks why don't you just use
> > sb_issue_zeroout() to zero out the tail of the cluster? It will be more
> > efficient than going through the page cache and you also won't have to
> > tweak block_write_full_page()...
> 
> Thanks for the review.
> 
> The physical blocks to be zeroed were continuous only when sparse mode is
> enabled, if sparse mode is disabled, unwritten extent was not supported for
> ocfs2, then all the blocks to the new size will be zeroed by the buffer
> write, since sb_issue_zeroout() will need waiting io done, there will be a
> lot of delay when extending file size. Use writeback to flush async seemed
> more efficient?

It depends. Higher-end storage (e.g. NVMe or NAS, maybe some better SATA
flash disks as well) does support the WRITE_ZERO command, so you don't
actually have to write all those zeros. The storage will just internally
mark all those blocks as having zeros. This is rather fast, so I'd expect
the overall result to be faster than zeroing the page cache and then writing
all those pages with zeroes on transaction commit. But I agree that for
lower-end storage this may be slower because of synchronous writing of
zeroes. That being said, your transaction commit has to write those zeroes
anyway, so the cost is mostly only shifted, but it could still make a
difference for some workloads. Not sure if that matters, that is your call
I'd say.

Also note that you could submit those zeroing bios asynchronously but that
would be more coding and you need to make sure they are completed on
transaction commit so probably it isn't worth the complexity.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-03 10:29         ` Jan Kara
@ 2021-05-03 17:25           ` Junxiao Bi
  2021-05-04  9:02             ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-05-03 17:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: cluster-devel, linux-fsdevel, Andreas Gruenbacher, ocfs2-devel


On 5/3/21 3:29 AM, Jan Kara wrote:
> On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
>> On 4/30/21 5:47 AM, Jan Kara wrote:
>>
>>> On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
>>>> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
>>>>> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>>>> When doing truncate/fallocate for some filesytem like ocfs2, it
>>>>>> will zero some pages that are out of inode size and then later
>>>>>> update the inode size, so it needs this api to writeback eof
>>>>>> pages.
>>>>> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
>>>>> cache filling races" patch set [*]? It doesn't look like the kind of
>>>>> patch Christoph would be happy with.
>>>> Thank you for pointing the patch set. I think that is fixing a different
>>>> issue.
>>>>
>>>> The issue here is when extending file size with fallocate/truncate, if the
>>>> original inode size
>>>>
>>>> is in the middle of the last cluster block(1M), eof part will be zeroed with
>>>> buffer write first,
>>>>
>>>> and then new inode size is updated, so there is a window that dirty pages is
>>>> out of inode size,
>>>>
>>>> if writeback is kicked in, block_write_full_page will drop all those eof
>>>> pages.
>>> I agree that the buffers describing part of the cluster beyond i_size won't
>>> be written. But page cache will remain zeroed out so that is fine. So you
>>> only need to zero out the on disk contents. Since this is actually
>>> physically contiguous range of blocks why don't you just use
>>> sb_issue_zeroout() to zero out the tail of the cluster? It will be more
>>> efficient than going through the page cache and you also won't have to
>>> tweak block_write_full_page()...
>> Thanks for the review.
>>
>> The physical blocks to be zeroed were continuous only when sparse mode is
>> enabled, if sparse mode is disabled, unwritten extent was not supported for
>> ocfs2, then all the blocks to the new size will be zeroed by the buffer
>> write, since sb_issue_zeroout() will need waiting io done, there will be a
>> lot of delay when extending file size. Use writeback to flush async seemed
>> more efficient?
> It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
> flash disks as well) do support WRITE_ZERO command so you don't actually
> have to write all those zeros. The storage will just internally mark all
> those blocks as having zeros. This is rather fast so I'd expect the overall
> result to be faster that zeroing page cache and then writing all those
> pages with zeroes on transaction commit. But I agree that for lower end
> storage this may be slower because of synchronous writing of zeroes. That
> being said your transaction commit has to write those zeroes anyway so the
> cost is only mostly shifted but it could still make a difference for some
> workloads. Not sure if that matters, that is your call I'd say.

Ocfs2 is mostly used with SAN, and I don't think it's common for SAN
storage to support the WRITE_ZERO command.

Is there anything bad about adding a new API to support EOF writeback?

Thanks,

Junxiao.

>
> Also note that you could submit those zeroing bios asynchronously but that
> would be more coding and you need to make sure they are completed on
> transaction commit so probably it isn't worth the complexity.
>
> 								Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-03 17:25           ` Junxiao Bi
@ 2021-05-04  9:02             ` Jan Kara
  2021-05-04 23:35               ` Junxiao Bi
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2021-05-04  9:02 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Jan Kara, Andreas Gruenbacher, cluster-devel, linux-fsdevel, ocfs2-devel

On Mon 03-05-21 10:25:31, Junxiao Bi wrote:
> 
> On 5/3/21 3:29 AM, Jan Kara wrote:
> > On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
> > > On 4/30/21 5:47 AM, Jan Kara wrote:
> > > 
> > > > On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
> > > > > On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
> > > > > > On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > > > > > > When doing truncate/fallocate for some filesytem like ocfs2, it
> > > > > > > will zero some pages that are out of inode size and then later
> > > > > > > update the inode size, so it needs this api to writeback eof
> > > > > > > pages.
> > > > > > is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> > > > > > cache filling races" patch set [*]? It doesn't look like the kind of
> > > > > > patch Christoph would be happy with.
> > > > > Thank you for pointing the patch set. I think that is fixing a different
> > > > > issue.
> > > > > 
> > > > > The issue here is when extending file size with fallocate/truncate, if the
> > > > > original inode size
> > > > > 
> > > > > is in the middle of the last cluster block(1M), eof part will be zeroed with
> > > > > buffer write first,
> > > > > 
> > > > > and then new inode size is updated, so there is a window that dirty pages is
> > > > > out of inode size,
> > > > > 
> > > > > if writeback is kicked in, block_write_full_page will drop all those eof
> > > > > pages.
> > > > I agree that the buffers describing part of the cluster beyond i_size won't
> > > > be written. But page cache will remain zeroed out so that is fine. So you
> > > > only need to zero out the on disk contents. Since this is actually
> > > > physically contiguous range of blocks why don't you just use
> > > > sb_issue_zeroout() to zero out the tail of the cluster? It will be more
> > > > efficient than going through the page cache and you also won't have to
> > > > tweak block_write_full_page()...
> > > Thanks for the review.
> > > 
> > > The physical blocks to be zeroed were continuous only when sparse mode is
> > > enabled, if sparse mode is disabled, unwritten extent was not supported for
> > > ocfs2, then all the blocks to the new size will be zeroed by the buffer
> > > write, since sb_issue_zeroout() will need waiting io done, there will be a
> > > lot of delay when extending file size. Use writeback to flush async seemed
> > > more efficient?
> > It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
> > flash disks as well) do support WRITE_ZERO command so you don't actually
> > have to write all those zeros. The storage will just internally mark all
> > those blocks as having zeros. This is rather fast so I'd expect the overall
> > result to be faster that zeroing page cache and then writing all those
> > pages with zeroes on transaction commit. But I agree that for lower end
> > storage this may be slower because of synchronous writing of zeroes. That
> > being said your transaction commit has to write those zeroes anyway so the
> > cost is only mostly shifted but it could still make a difference for some
> > workloads. Not sure if that matters, that is your call I'd say.
> 
> Ocfs2 is mostly used with SAN, i don't think it's common for SAN storage to
> support WRITE_ZERO command.
> 
> Anything bad to add a new api to support eof writeback?

OK, now that I reread the whole series you've posted I think I somewhat
misunderstood your original problem and intention. So let's first settle
on that. As far as I understand the problem happens when extending a file
(either through truncate or through write beyond i_size). When that
happens, we need to make sure that blocks (or their parts) that used to be
above i_size and are not going to be after extension are zeroed out.
Usually, for simple filesystems such as ext2, there is only one such block
- the one straddling i_size - where we need to make sure this happens. And
we achieve that by zeroing out tail of this block on writeout (in
->writepage() handler) and also by zeroing out tail of the block when
reducing i_size (block_truncate_page() takes care of this for ext2). So the
tail of this block is zeroed out on disk at all times and thus we have no
problem when extending i_size.

Now what I described doesn't work for OCFS2. As far as I understand the
reason is that when block size is smaller than page size and OCFS2 uses
cluster size larger than block size, the page straddling i_size can have
also some buffers mapped (with underlying blocks allocated) that are fully
outside of i_size. These blocks are never written because of how
__block_write_full_page() currently behaves (never writes buffers fully
beyond i_size) so even if you zero out page cache and dirty the page,
racing writeback can clear dirty bits without writing those blocks and so
they are not zeroed out on disk although we are about to expand i_size.
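
The skipping described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: the helper names
(last_block_of, buffer_is_written, skipped_buffers) are made up for
illustration, and eof_write stands for the flag proposed in this patch.
It assumes the check is "block > last_block", as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model (hypothetical helpers, not kernel code) of the
 * per-buffer check in __block_write_full_page(): a buffer whose file
 * block number is greater than the block holding the last byte of the
 * file is not submitted for I/O.
 */

/* File block number that contains the last byte of an i_size-byte file. */
static uint64_t last_block_of(uint64_t i_size, unsigned blkbits)
{
	assert(i_size > 0);
	return (i_size - 1) >> blkbits;
}

/* True if the buffer for file block `block` would be written out. */
static bool buffer_is_written(uint64_t block, uint64_t i_size,
			      unsigned blkbits, bool eof_write)
{
	/* eof_write mirrors the flag proposed in this patch. */
	return eof_write || block <= last_block_of(i_size, blkbits);
}

/* Count the buffers of page `page_index` that writeback would skip. */
static unsigned skipped_buffers(uint64_t page_index, uint64_t i_size,
				unsigned page_shift, unsigned blkbits,
				bool eof_write)
{
	unsigned per_page = 1u << (page_shift - blkbits);
	uint64_t first = page_index << (page_shift - blkbits);
	unsigned skipped = 0;

	for (unsigned i = 0; i < per_page; i++)
		if (!buffer_is_written(first + i, i_size, blkbits, eof_write))
			skipped++;
	return skipped;
}
```

For example, with 4096-byte pages, 512-byte blocks and i_size = 1000,
skipped_buffers(0, 1000, 12, 9, false) counts the six buffers of page 0
that lie fully beyond i_size; with eof_write set, none are skipped.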

Did I understand the problem correctly? But what confuses me is that
ocfs2_zero_extend_range() (ocfs2_write_zero_page() in fact) actually does
extend i_size to contain the range it zeroes out while still holding the
page lock so it should be protected against the race with writeback I
outlined above. What am I missing?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-04  9:02             ` Jan Kara
@ 2021-05-04 23:35               ` Junxiao Bi
  2021-05-05 11:43                 ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-05-04 23:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: cluster-devel, linux-fsdevel, Andreas Gruenbacher, ocfs2-devel

On 5/4/21 2:02 AM, Jan Kara wrote:

> On Mon 03-05-21 10:25:31, Junxiao Bi wrote:
>> On 5/3/21 3:29 AM, Jan Kara wrote:
>>> On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
>>>> On 4/30/21 5:47 AM, Jan Kara wrote:
>>>>
>>>>> On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
>>>>>> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
>>>>>>> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>>>>>> When doing truncate/fallocate for some filesytem like ocfs2, it
>>>>>>>> will zero some pages that are out of inode size and then later
>>>>>>>> update the inode size, so it needs this api to writeback eof
>>>>>>>> pages.
>>>>>>> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
>>>>>>> cache filling races" patch set [*]? It doesn't look like the kind of
>>>>>>> patch Christoph would be happy with.
>>>>>> Thank you for pointing the patch set. I think that is fixing a different
>>>>>> issue.
>>>>>>
>>>>>> The issue here is when extending file size with fallocate/truncate, if the
>>>>>> original inode size
>>>>>>
>>>>>> is in the middle of the last cluster block(1M), eof part will be zeroed with
>>>>>> buffer write first,
>>>>>>
>>>>>> and then new inode size is updated, so there is a window that dirty pages is
>>>>>> out of inode size,
>>>>>>
>>>>>> if writeback is kicked in, block_write_full_page will drop all those eof
>>>>>> pages.
>>>>> I agree that the buffers describing part of the cluster beyond i_size won't
>>>>> be written. But page cache will remain zeroed out so that is fine. So you
>>>>> only need to zero out the on disk contents. Since this is actually
>>>>> physically contiguous range of blocks why don't you just use
>>>>> sb_issue_zeroout() to zero out the tail of the cluster? It will be more
>>>>> efficient than going through the page cache and you also won't have to
>>>>> tweak block_write_full_page()...
>>>> Thanks for the review.
>>>>
>>>> The physical blocks to be zeroed were continuous only when sparse mode is
>>>> enabled, if sparse mode is disabled, unwritten extent was not supported for
>>>> ocfs2, then all the blocks to the new size will be zeroed by the buffer
>>>> write, since sb_issue_zeroout() will need waiting io done, there will be a
>>>> lot of delay when extending file size. Use writeback to flush async seemed
>>>> more efficient?
>>> It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
>>> flash disks as well) do support WRITE_ZERO command so you don't actually
>>> have to write all those zeros. The storage will just internally mark all
>>> those blocks as having zeros. This is rather fast so I'd expect the overall
>>> result to be faster that zeroing page cache and then writing all those
>>> pages with zeroes on transaction commit. But I agree that for lower end
>>> storage this may be slower because of synchronous writing of zeroes. That
>>> being said your transaction commit has to write those zeroes anyway so the
>>> cost is only mostly shifted but it could still make a difference for some
>>> workloads. Not sure if that matters, that is your call I'd say.
>> Ocfs2 is mostly used with SAN, i don't think it's common for SAN storage to
>> support WRITE_ZERO command.
>>
>> Anything bad to add a new api to support eof writeback?
> OK, now that I reread the whole series you've posted I think I somewhat
> misunderstood your original problem and intention. So let's first settle
> on that. As far as I understand the problem happens when extending a file
> (either through truncate or through write beyond i_size). When that
> happens, we need to make sure that blocks (or their parts) that used to be
> above i_size and are not going to be after extension are zeroed out.
> Usually, for simple filesystems such as ext2, there is only one such block
> - the one straddling i_size - where we need to make sure this happens. And
> we achieve that by zeroing out tail of this block on writeout (in
> ->writepage() handler) and also by zeroing out tail of the block when
> reducing i_size (block_truncate_page() takes care of this for ext2). So the
> tail of this block is zeroed out on disk at all times and thus we have no
> problem when extending i_size.
>
> Now what I described doesn't work for OCFS2. As far as I understand the
> reason is that when block size is smaller than page size and OCFS2 uses
> cluster size larger than block size, the page straddling i_size can have
> also some buffers mapped (with underlying blocks allocated) that are fully
> outside of i_size. These blocks are never written because of how
> __block_write_full_page() currently behaves (never writes buffers fully
> beyond i_size) so even if you zero out page cache and dirty the page,
> racing writeback can clear dirty bits without writing those blocks and so
> they are not zeroed out on disk although we are about to expand i_size.
Correct.
>
> Did I understand the problem correctly? But what confuses me is that
> ocfs2_zero_extend_range() (ocfs2_write_zero_page() in fact) actually does
> extend i_size to contain the range it zeroes out while still holding the
> page lock so it should be protected against the race with writeback I
> outlined above. What am I missing?

Thank you for pointing this out. I didn't realize ocfs2_zero_extend() will
update the inode size; with it, truncate to extend a file will not suffer
from this issue. The original issue happened with qemu, which used the
following fallocate calls to extend the file size. The first fallocate
punched a hole beyond the inode size (2276196352) but did not update
i_size; the second one updated i_size. The first one does some buffer
writes to zero EOF blocks in ocfs2_remove_inode_range
->ocfs2_zero_partial_clusters->ocfs2_zero_range_for_truncate.

     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
     fallocate(11, 0, 2276196352, 65536) = 0
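
For anyone who wants to replay this sequence outside qemu, a minimal
userspace sketch follows. The helper names (punch_then_extend,
open_scratch_file, file_size) are made up for illustration. It only
issues the same two fallocate() calls on a descriptor; whether stale
data is then visible depends on the filesystem, its cluster and block
sizes, and timing, none of which this sketch controls.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Issue the two calls from the trace on an already open descriptor.
 * Returns 0 on success or a negative errno value from the failing call.
 */
static int punch_then_extend(int fd, off_t offset, off_t len)
{
	/* Punch a hole past EOF; KEEP_SIZE leaves i_size unchanged. */
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
		      offset, len) < 0)
		return -errno;

	/* Plain preallocation of the same range, which moves i_size up. */
	if (fallocate(fd, 0, offset, len) < 0)
		return -errno;

	return 0;
}

/* Create an unlinked scratch file of initial_size bytes; fd or -errno. */
static int open_scratch_file(char *path_template, off_t initial_size)
{
	int fd;

	assert(path_template != NULL);
	fd = mkstemp(path_template);
	if (fd < 0)
		return -errno;
	unlink(path_template);
	if (ftruncate(fd, initial_size) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}

/* Current file size in bytes, or -1 if fstat() fails. */
static off_t file_size(int fd)
{
	struct stat st;

	return fstat(fd, &st) < 0 ? (off_t)-1 : st.st_size;
}
```

To mirror the trace, the file would live on an ocfs2 mount with i_size
at 2276196352, and the call would be
punch_then_extend(fd, 2276196352, 65536). On filesystems without hole
punch support the first call is expected to fail, typically with
EOPNOTSUPP.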


Thanks,

Junxiao.

>
> 								Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-04 23:35               ` Junxiao Bi
@ 2021-05-05 11:43                 ` Jan Kara
  2021-05-05 15:54                   ` Junxiao Bi
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2021-05-05 11:43 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Jan Kara, Andreas Gruenbacher, cluster-devel, linux-fsdevel, ocfs2-devel

On Tue 04-05-21 16:35:53, Junxiao Bi wrote:
> On 5/4/21 2:02 AM, Jan Kara wrote:
> > On Mon 03-05-21 10:25:31, Junxiao Bi wrote:
> > > On 5/3/21 3:29 AM, Jan Kara wrote:
> > > > On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
> > > > > On 4/30/21 5:47 AM, Jan Kara wrote:
> > > > > 
> > > > > > On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
> > > > > > > On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > > > > > > > > When doing truncate/fallocate for some filesytem like ocfs2, it
> > > > > > > > > will zero some pages that are out of inode size and then later
> > > > > > > > > update the inode size, so it needs this api to writeback eof
> > > > > > > > > pages.
> > > > > > > > is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
> > > > > > > > cache filling races" patch set [*]? It doesn't look like the kind of
> > > > > > > > patch Christoph would be happy with.
> > > > > > > Thank you for pointing the patch set. I think that is fixing a different
> > > > > > > issue.
> > > > > > > 
> > > > > > > The issue here is when extending file size with fallocate/truncate, if the
> > > > > > > original inode size
> > > > > > > 
> > > > > > > is in the middle of the last cluster block(1M), eof part will be zeroed with
> > > > > > > buffer write first,
> > > > > > > 
> > > > > > > and then new inode size is updated, so there is a window that dirty pages is
> > > > > > > out of inode size,
> > > > > > > 
> > > > > > > if writeback is kicked in, block_write_full_page will drop all those eof
> > > > > > > pages.
> > > > > > I agree that the buffers describing part of the cluster beyond i_size won't
> > > > > > be written. But page cache will remain zeroed out so that is fine. So you
> > > > > > only need to zero out the on disk contents. Since this is actually
> > > > > > physically contiguous range of blocks why don't you just use
> > > > > > sb_issue_zeroout() to zero out the tail of the cluster? It will be more
> > > > > > efficient than going through the page cache and you also won't have to
> > > > > > tweak block_write_full_page()...
> > > > > Thanks for the review.
> > > > > 
> > > > > The physical blocks to be zeroed were continuous only when sparse mode is
> > > > > enabled, if sparse mode is disabled, unwritten extent was not supported for
> > > > > ocfs2, then all the blocks to the new size will be zeroed by the buffer
> > > > > write, since sb_issue_zeroout() will need waiting io done, there will be a
> > > > > lot of delay when extending file size. Use writeback to flush async seemed
> > > > > more efficient?
> > > > It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
> > > > flash disks as well) do support WRITE_ZERO command so you don't actually
> > > > have to write all those zeros. The storage will just internally mark all
> > > > those blocks as having zeros. This is rather fast so I'd expect the overall
> > > > result to be faster that zeroing page cache and then writing all those
> > > > pages with zeroes on transaction commit. But I agree that for lower end
> > > > storage this may be slower because of synchronous writing of zeroes. That
> > > > being said your transaction commit has to write those zeroes anyway so the
> > > > cost is only mostly shifted but it could still make a difference for some
> > > > workloads. Not sure if that matters, that is your call I'd say.
> > > Ocfs2 is mostly used with SAN, i don't think it's common for SAN storage to
> > > support WRITE_ZERO command.
> > > 
> > > Anything bad to add a new api to support eof writeback?
> > OK, now that I reread the whole series you've posted I think I somewhat
> > misunderstood your original problem and intention. So let's first settle
> > on that. As far as I understand the problem happens when extending a file
> > (either through truncate or through write beyond i_size). When that
> > happens, we need to make sure that blocks (or their parts) that used to be
> > above i_size and are not going to be after extension are zeroed out.
> > Usually, for simple filesystems such as ext2, there is only one such block
> > - the one straddling i_size - where we need to make sure this happens. And
> > we achieve that by zeroing out tail of this block on writeout (in
> > ->writepage() handler) and also by zeroing out tail of the block when
> > reducing i_size (block_truncate_page() takes care of this for ext2). So the
> > tail of this block is zeroed out on disk at all times and thus we have no
> > problem when extending i_size.
> > 
> > Now what I described doesn't work for OCFS2. As far as I understand the
> > reason is that when block size is smaller than page size and OCFS2 uses
> > cluster size larger than block size, the page straddling i_size can have
> > also some buffers mapped (with underlying blocks allocated) that are fully
> > outside of i_size. These blocks are never written because of how
> > __block_write_full_page() currently behaves (never writes buffers fully
> > beyond i_size) so even if you zero out page cache and dirty the page,
> > racing writeback can clear dirty bits without writing those blocks and so
> > they are not zeroed out on disk although we are about to expand i_size.
> Correct.
> > 
> > Did I understand the problem correctly? But what confuses me is that
> > ocfs2_zero_extend_range() (ocfs2_write_zero_page() in fact) actually does
> > extend i_size to contain the range it zeroes out while still holding the
> > page lock so it should be protected against the race with writeback I
> > outlined above. What am I missing?
> 
> Thank you for pointing this. I didn't realize ocfs2_zero_extend() will
> update inode size,
> 
> with it, truncate to extend file will not suffer this issue. The original
> issue happened with
> 
> qemu that used the following fallocate to extend file size. The first
> fallocate punched
> 
> hole beyond the inode size(2276196352) but not update isize, the second one
> updated
> 
> isize, the first one will do some buffer write to zero eof blocks in
> ocfs2_remove_inode_range
> 
> ->ocfs2_zero_partial_clusters->ocfs2_zero_range_for_truncate.
> 
>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352,
> 65536) = 0
>     fallocate(11, 0, 2276196352, 65536) = 0

OK, I see. And AFAICT it is not about writeback racing with the zeroing in
ocfs2_zero_range_for_truncate() but rather the filemap_fdatawrite_range()
there not writing out zeroed pages if they are beyond i_size. And honestly,
rather than trying to extend block_write_full_page() for this odd corner
case, I'd use sb_issue_zeroout() or code something similar to
__blkdev_issue_zero_pages() inside OCFS2. Because making pages in the page
cache beyond i_size work is always going to be fragile...
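
As a rough illustration of what such direct zeroing would have to cover,
the sketch below computes the file blocks that lie fully beyond i_size
but still inside the cluster holding the last byte of the file. The
struct and function names are hypothetical, and the result is in logical
file blocks: real code would still have to map them to physical blocks
before handing a start block and a block count to something like
sb_issue_zeroout().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a run of file blocks. */
struct blk_range {
	uint64_t start;	/* first block fully beyond i_size */
	uint64_t count;	/* blocks from start to the end of the cluster */
};

/*
 * Blocks of the last cluster that are entirely past i_size. The block
 * straddling i_size is left out on purpose: its tail is zeroed in the
 * page cache and it is still written by normal writeback.
 */
static struct blk_range cluster_tail_blocks(uint64_t i_size,
					    unsigned blkbits,
					    unsigned clusterbits)
{
	struct blk_range r;
	uint64_t blocksize = 1ULL << blkbits;
	uint64_t clustersize = 1ULL << clusterbits;
	uint64_t end;

	assert(clusterbits >= blkbits);

	/* Round i_size up to a block boundary, in units of blocks. */
	r.start = (i_size + blocksize - 1) >> blkbits;

	/* Round i_size up to a cluster boundary, then convert to blocks. */
	end = ((i_size + clustersize - 1) >> clusterbits)
		<< (clusterbits - blkbits);

	r.count = end - r.start;
	return r;
}
```

With 512-byte blocks, 1M clusters and i_size = 1000, this gives blocks 2
through 2047 of the first cluster; when i_size is cluster aligned the
count is zero and nothing needs zeroing.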

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [Cluster-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-05 11:43                 ` Jan Kara
@ 2021-05-05 15:54                   ` Junxiao Bi
  0 siblings, 0 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-05-05 15:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: cluster-devel, linux-fsdevel, Andreas Gruenbacher, ocfs2-devel


On 5/5/21 4:43 AM, Jan Kara wrote:
> On Tue 04-05-21 16:35:53, Junxiao Bi wrote:
>> On 5/4/21 2:02 AM, Jan Kara wrote:
>>> On Mon 03-05-21 10:25:31, Junxiao Bi wrote:
>>>> On 5/3/21 3:29 AM, Jan Kara wrote:
>>>>> On Fri 30-04-21 14:18:15, Junxiao Bi wrote:
>>>>>> On 4/30/21 5:47 AM, Jan Kara wrote:
>>>>>>
>>>>>>> On Thu 29-04-21 11:07:15, Junxiao Bi wrote:
>>>>>>>> On 4/29/21 10:14 AM, Andreas Gruenbacher wrote:
>>>>>>>>> On Tue, Apr 27, 2021 at 4:44 AM Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>>>>>>>> When doing truncate/fallocate for some filesytem like ocfs2, it
>>>>>>>>>> will zero some pages that are out of inode size and then later
>>>>>>>>>> update the inode size, so it needs this api to writeback eof
>>>>>>>>>> pages.
>>>>>>>>> is this in reaction to Jan's "[PATCH 0/12 v4] fs: Hole punch vs page
>>>>>>>>> cache filling races" patch set [*]? It doesn't look like the kind of
>>>>>>>>> patch Christoph would be happy with.
>>>>>>>> Thank you for pointing the patch set. I think that is fixing a different
>>>>>>>> issue.
>>>>>>>>
>>>>>>>> The issue here is when extending file size with fallocate/truncate, if the
>>>>>>>> original inode size
>>>>>>>>
>>>>>>>> is in the middle of the last cluster block(1M), eof part will be zeroed with
>>>>>>>> buffer write first,
>>>>>>>>
>>>>>>>> and then new inode size is updated, so there is a window that dirty pages is
>>>>>>>> out of inode size,
>>>>>>>>
>>>>>>>> if writeback is kicked in, block_write_full_page will drop all those eof
>>>>>>>> pages.
>>>>>>> I agree that the buffers describing part of the cluster beyond i_size won't
>>>>>>> be written. But page cache will remain zeroed out so that is fine. So you
>>>>>>> only need to zero out the on disk contents. Since this is actually
>>>>>>> physically contiguous range of blocks why don't you just use
>>>>>>> sb_issue_zeroout() to zero out the tail of the cluster? It will be more
>>>>>>> efficient than going through the page cache and you also won't have to
>>>>>>> tweak block_write_full_page()...
>>>>>> Thanks for the review.
>>>>>>
>>>>>> The physical blocks to be zeroed were continuous only when sparse mode is
>>>>>> enabled, if sparse mode is disabled, unwritten extent was not supported for
>>>>>> ocfs2, then all the blocks to the new size will be zeroed by the buffer
>>>>>> write, since sb_issue_zeroout() will need waiting io done, there will be a
>>>>>> lot of delay when extending file size. Use writeback to flush async seemed
>>>>>> more efficient?
>>>>> It depends. Higher end storage (e.g. NVME or NAS, maybe some better SATA
>>>>> flash disks as well) does support WRITE_ZERO command so you don't actually
>>>>> have to write all those zeros. The storage will just internally mark all
>>>>> those blocks as having zeros. This is rather fast so I'd expect the overall
>>>>> result to be faster than zeroing page cache and then writing all those
>>>>> pages with zeroes on transaction commit. But I agree that for lower end
>>>>> storage this may be slower because of synchronous writing of zeroes. That
>>>>> being said your transaction commit has to write those zeroes anyway so the
>>>>> cost is only mostly shifted but it could still make a difference for some
>>>>> workloads. Not sure if that matters, that is your call I'd say.
>>>> Ocfs2 is mostly used with SAN; I don't think it's common for SAN storage to
>>>> support the WRITE_ZERO command.
>>>>
>>>> Is there anything wrong with adding a new api to support eof writeback?
>>> OK, now that I reread the whole series you've posted I think I somewhat
>>> misunderstood your original problem and intention. So let's first settle
>>> on that. As far as I understand the problem happens when extending a file
>>> (either through truncate or through write beyond i_size). When that
>>> happens, we need to make sure that blocks (or their parts) that used to be
>>> above i_size and are not going to be after extension are zeroed out.
>>> Usually, for simple filesystems such as ext2, there is only one such block
>>> - the one straddling i_size - where we need to make sure this happens. And
>>> we achieve that by zeroing out tail of this block on writeout (in
>>> ->writepage() handler) and also by zeroing out tail of the block when
>>> reducing i_size (block_truncate_page() takes care of this for ext2). So the
>>> tail of this block is zeroed out on disk at all times and thus we have no
>>> problem when extending i_size.
>>>
>>> Now what I described doesn't work for OCFS2. As far as I understand the
>>> reason is that when block size is smaller than page size and OCFS2 uses
>>> cluster size larger than block size, the page straddling i_size can have
>>> also some buffers mapped (with underlying blocks allocated) that are fully
>>> outside of i_size. These blocks are never written because of how
>>> __block_write_full_page() currently behaves (never writes buffers fully
>>> beyond i_size) so even if you zero out page cache and dirty the page,
>>> racing writeback can clear dirty bits without writing those blocks and so
>>> they are not zeroed out on disk although we are about to expand i_size.
>> Correct.
>>> Did I understand the problem correctly? But what confuses me is that
>>> ocfs2_zero_extend_range() (ocfs2_write_zero_page() in fact) actually does
>>> extend i_size to contain the range it zeroes out while still holding the
>>> page lock so it should be protected against the race with writeback I
>>> outlined above. What am I missing?
>> Thank you for pointing this out. I didn't realize ocfs2_zero_extend() will
>> update the inode size; with it, truncate to extend a file will not suffer
>> from this issue. The original issue happened with qemu, which used the
>> following fallocate calls to extend the file size. The first fallocate
>> punched a hole beyond the inode size (2276196352) but did not update isize;
>> the second one updated isize. The first one does some buffered writes to
>> zero eof blocks in
>> ocfs2_remove_inode_range->ocfs2_zero_partial_clusters->ocfs2_zero_range_for_truncate.
>>
>>      fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>      fallocate(11, 0, 2276196352, 65536) = 0
> OK, I see. And AFAICT it is not about writeback racing with the zeroing in
> ocfs2_zero_range_for_truncate() but rather the filemap_fdatawrite_range()
> there not writing out zeroed pages if they are beyond i_size. And honestly,
> rather than trying to extend block_write_full_page() for this odd corner
> case, I'd use sb_issue_zeroout() or code something similar to
> __blkdev_issue_zero_pages() inside OCFS2. Because making pages in the page
> cache beyond i_size work is always going to be fragile...

Thanks for the suggestion. I will make v2 using zeroout.

Thanks,

Junxiao.

>
> 								Honza
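
The per-buffer check discussed in this message can be modeled in userspace. The sketch below is not kernel code: the function name, the PAGE_SHIFT_MODEL constant and the assumption that every buffer in the page is mapped and dirty are illustrative only. It mirrors the `block > last_block && !eof_write` test from the posted diff and counts how many buffers of a page would be submitted for I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this model: 4 KiB pages. */
#define PAGE_SHIFT_MODEL 12

/*
 * Hypothetical userspace model of the buffer loop in
 * __block_write_full_page(). Returns how many buffers of page
 * `page_index` would be written, assuming all of them are mapped and
 * dirty. blkbits is log2(block size); i_size must be greater than 0.
 */
static unsigned int buffers_written(uint64_t i_size, uint64_t page_index,
                                    unsigned int blkbits, bool eof_write)
{
    unsigned int per_page = 1u << (PAGE_SHIFT_MODEL - blkbits);
    uint64_t first_block = page_index << (PAGE_SHIFT_MODEL - blkbits);
    uint64_t last_block = (i_size - 1) >> blkbits; /* last block inside i_size */
    unsigned int written = 0;

    for (unsigned int i = 0; i < per_page; i++) {
        uint64_t block = first_block + i;

        /* Same condition as the patched kernel test. */
        if (block > last_block && !eof_write)
            continue; /* fully beyond i_size: skipped, no I/O issued */
        written++;
    }
    return written;
}
```

With 512-byte blocks and the i_size from the strace output (2276196352, which is page aligned), every buffer of page 555712 lies beyond i_size, so the model reports 0 buffers written without the flag and 8 with it. The real kernel path is more involved (for example, buffer state and block mapping are not modeled here), so this only illustrates the skip condition.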

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
                   ` (3 preceding siblings ...)
  2021-04-29 17:14 ` [Ocfs2-devel] [Cluster-devel] " Andreas Gruenbacher
@ 2021-05-09 23:23 ` Andrew Morton
  2021-05-10 22:15   ` Junxiao Bi
  4 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2021-05-09 23:23 UTC (permalink / raw)
  To: Junxiao Bi; +Cc: cluster-devel, linux-fsdevel, ocfs2-devel

On Mon, 26 Apr 2021 15:05:50 -0700 Junxiao Bi <junxiao.bi@oracle.com> wrote:

> When doing truncate/fallocate for some filesystem like ocfs2, it
> will zero some pages that are out of inode size and then later
> update the inode size, so it needs this api to writeback eof
> pages.

Seems reasonable.  But can we please update the
__block_write_full_page_eof() comment?  It now uses the wrong function
name and doesn't document the new `eof' argument.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
@ 2021-05-10 22:15   ` Junxiao Bi
  2021-05-11 12:19     ` Bob Peterson
  0 siblings, 1 reply; 20+ messages in thread
From: Junxiao Bi @ 2021-05-10 22:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: cluster-devel, linux-fsdevel, Jan Kara, ocfs2-devel

On 5/9/21 4:23 PM, Andrew Morton wrote:

> On Mon, 26 Apr 2021 15:05:50 -0700 Junxiao Bi <junxiao.bi@oracle.com> wrote:
>
>> When doing truncate/fallocate for some filesystem like ocfs2, it
>> will zero some pages that are out of inode size and then later
>> update the inode size, so it needs this api to writeback eof
>> pages.
> Seems reasonable.  But can we please update the
> __block_write_full_page_eof() comment?  It now uses the wrong function
> name and doesn't document the new `eof' argument.

Jan suggested using sb_issue_zeroout to zero eof pages in ocfs2_fallocate;
that can also fix the issue for ocfs2. For gfs2, I thought it had the same
issue, but I didn't get confirmation from the gfs2 maintainer. If gfs2 is ok,
then maybe this new api is not necessary?

Thanks,

Junxiao.

>
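
As a rough illustration of the zeroout approach mentioned above, the range that would need zeroing is the tail of the cluster containing i_size. The sketch below is a userspace model under stated assumptions, not the v2 patch: `struct blk_range` and `cluster_tail_blocks()` are hypothetical names, and the result is a logical block range. A real implementation would still have to map it to physical blocks through the extent tree before handing a start block and a block count to sb_issue_zeroout(), and the partial block straddling i_size is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a run of whole blocks. */
struct blk_range {
    uint64_t start; /* first logical block at or after i_size */
    uint64_t count; /* whole blocks up to the end of that cluster */
};

/*
 * Compute the whole blocks between i_size and the end of the cluster
 * that contains it. blkbits and clusterbits are log2 of the block and
 * cluster sizes; clusterbits >= blkbits is assumed.
 */
static struct blk_range cluster_tail_blocks(uint64_t i_size,
                                            unsigned int blkbits,
                                            unsigned int clusterbits)
{
    uint64_t blk_size = 1ULL << blkbits;
    uint64_t cl_size = 1ULL << clusterbits;

    /* Round i_size up to the next cluster and block boundaries. */
    uint64_t cluster_end = (i_size + cl_size - 1) & ~(cl_size - 1);
    uint64_t first = (i_size + blk_size - 1) >> blkbits;
    uint64_t end = cluster_end >> blkbits;

    return (struct blk_range){ first, end - first };
}
```

For a 1M cluster with 512-byte blocks, an i_size of 1536 leaves blocks 3 through 2047 to zero; when i_size is already cluster aligned the count is 0 and nothing needs to be issued.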


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback
  2021-05-10 22:15   ` Junxiao Bi
@ 2021-05-11 12:19     ` Bob Peterson
  0 siblings, 0 replies; 20+ messages in thread
From: Bob Peterson @ 2021-05-11 12:19 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: Jan Kara, Andreas Gruenbacher, cluster-devel, linux-fsdevel, ocfs2-devel

----- Original Message -----
> On 5/9/21 4:23 PM, Andrew Morton wrote:
> 
> > On Mon, 26 Apr 2021 15:05:50 -0700 Junxiao Bi <junxiao.bi@oracle.com>
> > wrote:
> >
> >> When doing truncate/fallocate for some filesystem like ocfs2, it
> >> will zero some pages that are out of inode size and then later
> >> update the inode size, so it needs this api to writeback eof
> >> pages.
> > Seems reasonable.  But can we please update the
> > __block_write_full_page_eof() comment?  It now uses the wrong function
> > name and doesn't document the new `eof' argument.
> 
> Jan suggested using sb_issue_zeroout to zero eof pages in ocfs2_fallocate;
> that can also fix the issue for ocfs2. For gfs2, I thought it had the same
> issue, but I didn't get confirmation from the gfs2 maintainer. If gfs2 is ok,
> then maybe this new api is not necessary?
> 
> Thanks,
> 
> Junxiao.

Hi,

Sorry. I was on holiday/vacation for the past week and a half without
Internet access except for my phone. I'll try to find the time to read
the thread and look into it soon.

Bob Peterson



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-05-13 15:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-26 22:05 [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Junxiao Bi
2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: allow writing back pages out of inode size Junxiao Bi
2021-04-28 16:00   ` Junxiao Bi
2021-04-29 13:09   ` Joseph Qi
2021-04-26 22:05 ` [Ocfs2-devel] [PATCH 3/3] gfs2: fix out of inode size writeback Junxiao Bi
2021-04-28 16:02   ` Junxiao Bi
2021-04-29 11:58 ` [Ocfs2-devel] [PATCH 1/3] fs/buffer.c: add new api to allow eof writeback Joseph Qi
2021-04-29 17:14 ` [Ocfs2-devel] [Cluster-devel] " Andreas Gruenbacher
2021-04-29 18:07   ` Junxiao Bi
2021-04-30 12:47     ` Jan Kara
2021-04-30 21:18       ` Junxiao Bi
2021-05-03 10:29         ` Jan Kara
2021-05-03 17:25           ` Junxiao Bi
2021-05-04  9:02             ` Jan Kara
2021-05-04 23:35               ` Junxiao Bi
2021-05-05 11:43                 ` Jan Kara
2021-05-05 15:54                   ` Junxiao Bi
2021-05-09 23:23 ` [Ocfs2-devel] " Andrew Morton
2021-05-10 22:15   ` Junxiao Bi
2021-05-11 12:19     ` Bob Peterson
