[Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
@ 2017-12-18 12:06 Changwei Ge
  2017-12-18 21:53 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Changwei Ge @ 2017-12-18 12:06 UTC (permalink / raw)
  To: ocfs2-devel

Before ocfs2 supporting allocating clusters while doing append-dio, all append
dio will fall back to buffer io to allocate clusters firstly. Also, when it
steps on a file hole, it will fall back to buffer io, too. But for current
code, writing to file hole will leverage dio to allocate clusters. This is not
right, since whether append-io is enabled tells the capability whether ocfs2 can
allocate space while doing dio.
So introduce file hole check function back into ocfs2.
Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
back to buffer IO to allocate clusters.

Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
---
  fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
  1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index d151632..a982cf6 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
  	return ret;
  }
  
+/*
+ * Will look for holes and unwritten extents in the range starting at
+ * pos for count bytes (inclusive).
+ */
+static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
+				       size_t count)
+{
+	int ret = 0;
+	unsigned int extent_flags;
+	u32 cpos, clusters, extent_len, phys_cpos;
+	struct super_block *sb = inode->i_sb;
+
+	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
+	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
+
+	while (clusters) {
+		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
+					 &extent_flags);
+		if (ret < 0) {
+			mlog_errno(ret);
+			goto out;
+		}
+
+		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
+			ret = 1;
+			break;
+		}
+
+		if (extent_len > clusters)
+			extent_len = clusters;
+
+		clusters -= extent_len;
+		cpos += extent_len;
+	}
+out:
+	return ret;
+}
+
  static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
  {
  	struct file *file = iocb->ki_filp;
@@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
  		return 0;
  
  	/* Fallback to buffered I/O if we do not support append dio. */
-	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
-	    !ocfs2_supports_append_dio(osb))
+	if (!ocfs2_supports_append_dio(osb) &&
+			(iocb->ki_pos + iter->count > i_size_read(inode) ||
+			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
+						     iter->count)))
  		return 0;
  
  	if (iov_iter_rw(iter) == READ)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-18 12:06 [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing Changwei Ge
@ 2017-12-18 21:53 ` Andrew Morton
  2017-12-19  1:24   ` Joseph Qi
  2017-12-19  3:00   ` Changwei Ge
  2017-12-19  1:44 ` piaojun
  2017-12-19  2:35 ` Gang He
  2 siblings, 2 replies; 21+ messages in thread
From: Andrew Morton @ 2017-12-18 21:53 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, 18 Dec 2017 12:06:21 +0000 Changwei Ge <ge.changwei@h3c.com> wrote:

> Before ocfs2 supporting allocating clusters while doing append-dio, all append
> dio will fall back to buffer io to allocate clusters firstly. Also, when it
> steps on a file hole, it will fall back to buffer io, too. But for current
> code, writing to file hole will leverage dio to allocate clusters. This is not
> right, since whether append-io is enabled tells the capability whether ocfs2 can
> allocate space while doing dio.
> So introduce file hole check function back into ocfs2.
> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
> back to buffer IO to allocate clusters.
> 

hm, that's a bit hard to understand.  Hopefully reviewers will know
what's going on ;)

> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>   	return ret;
>   }
>   
> +/*
> + * Will look for holes and unwritten extents in the range starting at
> + * pos for count bytes (inclusive).
> + */
> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> +				       size_t count)
> +{
> +	int ret = 0;
> +	unsigned int extent_flags;
> +	u32 cpos, clusters, extent_len, phys_cpos;
> +	struct super_block *sb = inode->i_sb;
> +
> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
> +
> +	while (clusters) {
> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
> +					 &extent_flags);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +
> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> +			ret = 1;
> +			break;
> +		}
> +
> +		if (extent_len > clusters)
> +			extent_len = clusters;
> +
> +		clusters -= extent_len;
> +		cpos += extent_len;
> +	}
> +out:
> +	return ret;
> +}

A few thoughts:

- a function which does "check_foo" isn't well named.  Because the
  reader cannot determine whether the return value means "foo is true"
  or "foo is false".

  So a better name for this function is ocfs2_range_has_holes(), so
  the reader immediately understand what its return value *means".

  Also a bool return value is more logical.

- Mixing "goto out" with "break" as above is a bit odd.

So...


--- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
+++ a/fs/ocfs2/aops.c
@@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
  * Will look for holes and unwritten extents in the range starting at
  * pos for count bytes (inclusive).
  */
-static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
-				       size_t count)
+static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
 {
-	int ret = 0;
+	bool ret = false;
 	unsigned int extent_flags;
 	u32 cpos, clusters, extent_len, phys_cpos;
 	struct super_block *sb = inode->i_sb;
@@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
 		}
 
 		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
-			ret = 1;
-			break;
+			ret = true;
+			goto out;
 		}
 
 		if (extent_len > clusters)
_

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-18 21:53 ` Andrew Morton
@ 2017-12-19  1:24   ` Joseph Qi
  2017-12-19  1:27     ` Andrew Morton
  2017-12-19  3:02     ` Changwei Ge
  2017-12-19  3:00   ` Changwei Ge
  1 sibling, 2 replies; 21+ messages in thread
From: Joseph Qi @ 2017-12-19  1:24 UTC (permalink / raw)
  To: ocfs2-devel


On 17/12/19 05:53, Andrew Morton wrote:
> On Mon, 18 Dec 2017 12:06:21 +0000 Changwei Ge <ge.changwei@h3c.com> wrote:
> 
>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>> steps on a file hole, it will fall back to buffer io, too. But for current
>> code, writing to file hole will leverage dio to allocate clusters. This is not
>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>> allocate space while doing dio.
>> So introduce file hole check function back into ocfs2.
>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>> back to buffer IO to allocate clusters.
>>
> 
> hm, that's a bit hard to understand.  Hopefully reviewers will know
> what's going on ;)> 
Also suggest rearrange the description:)

>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>   	return ret;
>>   }
>>   
>> +/*
>> + * Will look for holes and unwritten extents in the range starting at
>> + * pos for count bytes (inclusive).
>> + */
>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> +				       size_t count)
>> +{
>> +	int ret = 0;
>> +	unsigned int extent_flags;
>> +	u32 cpos, clusters, extent_len, phys_cpos;
>> +	struct super_block *sb = inode->i_sb;
>> +
>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>> +
>> +	while (clusters) {
>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>> +					 &extent_flags);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>> +
>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> +			ret = 1;
>> +			break;
>> +		}
>> +
>> +		if (extent_len > clusters)
>> +			extent_len = clusters;
>> +
>> +		clusters -= extent_len;
>> +		cpos += extent_len;
>> +	}
>> +out:
>> +	return ret;
>> +}
> 
> A few thoughts:
> 
> - a function which does "check_foo" isn't well named.  Because the
>   reader cannot determine whether the return value means "foo is true"
>   or "foo is false".
> 
>   So a better name for this function is ocfs2_range_has_holes(), so
>   the reader immediately understand what its return value *means".
> 
>   Also a bool return value is more logical.
> 
> - Mixing "goto out" with "break" as above is a bit odd.
> 
> So...
> 
> 
> --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
> +++ a/fs/ocfs2/aops.c
> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>   * Will look for holes and unwritten extents in the range starting at
>   * pos for count bytes (inclusive).
>   */
> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> -				       size_t count)
> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
>  {
> -	int ret = 0;
> +	bool ret = false;

I have a different opinion here. If we change return value from int to
bool, the error returned by ocfs2_get_clusters cannot be reflected. So
I'd prefer the original version.

Thanks,
Joseph

>  	unsigned int extent_flags;
>  	u32 cpos, clusters, extent_len, phys_cpos;
>  	struct super_block *sb = inode->i_sb;
> @@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
>  		}
>  
>  		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> -			ret = 1;
> -			break;
> +			ret = true;
> +			goto out;
>  		}
>  
>  		if (extent_len > clusters)
> _
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  1:24   ` Joseph Qi
@ 2017-12-19  1:27     ` Andrew Morton
  2017-12-19  1:39       ` Joseph Qi
  2017-12-19  3:02     ` Changwei Ge
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2017-12-19  1:27 UTC (permalink / raw)
  To: ocfs2-devel

On Tue, 19 Dec 2017 09:24:17 +0800 Joseph Qi <jiangqi903@gmail.com> wrote:

> > --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
> > +++ a/fs/ocfs2/aops.c
> > @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
> >   * Will look for holes and unwritten extents in the range starting at
> >   * pos for count bytes (inclusive).
> >   */
> > -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> > -				       size_t count)
> > +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
> >  {
> > -	int ret = 0;
> > +	bool ret = false;
> 
> I have a different opinion here. If we change return value from int to
> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
> I'd prefer the original version.

But that error code is not used?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  1:27     ` Andrew Morton
@ 2017-12-19  1:39       ` Joseph Qi
  2017-12-19  3:03         ` Changwei Ge
  0 siblings, 1 reply; 21+ messages in thread
From: Joseph Qi @ 2017-12-19  1:39 UTC (permalink / raw)
  To: ocfs2-devel



On 17/12/19 09:27, Andrew Morton wrote:
> On Tue, 19 Dec 2017 09:24:17 +0800 Joseph Qi <jiangqi903@gmail.com> wrote:
> 
>>> --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
>>> +++ a/fs/ocfs2/aops.c
>>> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>>>   * Will look for holes and unwritten extents in the range starting at
>>>   * pos for count bytes (inclusive).
>>>   */
>>> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>> -				       size_t count)
>>> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
>>>  {
>>> -	int ret = 0;
>>> +	bool ret = false;
>>
>> I have a different opinion here. If we change return value from int to
>> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
>> I'd prefer the original version.
> 
> But that error code is not used?
> 
IMO, since ocfs2_get_clusters will get io involved, the caller shouldn't
ignore its error.

Something like:
ret = ocfs2_check_range_for_holes();
if (ret < 0) {
	mlog_errno(ret);
	goto out;
}

Then check if append dio feature has been enabled as well as write range
and hole.

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-18 12:06 [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing Changwei Ge
  2017-12-18 21:53 ` Andrew Morton
@ 2017-12-19  1:44 ` piaojun
  2017-12-19  3:05   ` Changwei Ge
  2017-12-19  2:35 ` Gang He
  2 siblings, 1 reply; 21+ messages in thread
From: piaojun @ 2017-12-19  1:44 UTC (permalink / raw)
  To: ocfs2-devel

Hi Changwei,

On 2017/12/18 20:06, Changwei Ge wrote:
> Before ocfs2 supporting allocating clusters while doing append-dio, all append
> dio will fall back to buffer io to allocate clusters firstly. Also, when it
> steps on a file hole, it will fall back to buffer io, too. But for current
> code, writing to file hole will leverage dio to allocate clusters. This is not
> right, since whether append-io is enabled tells the capability whether ocfs2 can
> allocate space while doing dio.
> So introduce file hole check function back into ocfs2.
> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
> back to buffer IO to allocate clusters.
> 
1. Do you mean that filling hole can't go with dio when append-dio is disabled?
2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>   fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index d151632..a982cf6 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>   	return ret;
>   }
>   
> +/*
> + * Will look for holes and unwritten extents in the range starting at
> + * pos for count bytes (inclusive).
> + */
> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> +				       size_t count)
> +{
> +	int ret = 0;
> +	unsigned int extent_flags;
> +	u32 cpos, clusters, extent_len, phys_cpos;
> +	struct super_block *sb = inode->i_sb;
> +
> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
> +
> +	while (clusters) {
> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
> +					 &extent_flags);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +
> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> +			ret = 1;
> +			break;
> +		}
> +
> +		if (extent_len > clusters)
> +			extent_len = clusters;
> +
> +		clusters -= extent_len;
> +		cpos += extent_len;
> +	}
> +out:
> +	return ret;
> +}
> +
>   static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   {
>   	struct file *file = iocb->ki_filp;
> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   		return 0;
>   
>   	/* Fallback to buffered I/O if we do not support append dio. */
> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
> -	    !ocfs2_supports_append_dio(osb))
> +	if (!ocfs2_supports_append_dio(osb) &&
> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
> +						     iter->count)))
we should check error here, right?

thanks,
Jun
>   		return 0;
>   
>   	if (iov_iter_rw(iter) == READ)
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-18 12:06 [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing Changwei Ge
  2017-12-18 21:53 ` Andrew Morton
  2017-12-19  1:44 ` piaojun
@ 2017-12-19  2:35 ` Gang He
  2017-12-19  5:50   ` Changwei Ge
  2 siblings, 1 reply; 21+ messages in thread
From: Gang He @ 2017-12-19  2:35 UTC (permalink / raw)
  To: ocfs2-devel

Hi Changwei,


>>> 
> Before ocfs2 supporting allocating clusters while doing append-dio, all append
> dio will fall back to buffer io to allocate clusters firstly. Also, when it
> steps on a file hole, it will fall back to buffer io, too. But for current
> code, writing to file hole will leverage dio to allocate clusters. This is 
> not
> right, since whether append-io is enabled tells the capability whether ocfs2 
> can
> allocate space while doing dio.
> So introduce file hole check function back into ocfs2.
> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will 
> fall
> back to buffer IO to allocate clusters.
> 
> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
> ---
>   fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index d151632..a982cf6 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>   	return ret;
>   }
>   
> +/*
> + * Will look for holes and unwritten extents in the range starting at
> + * pos for count bytes (inclusive).
> + */
Why do we also need to look for unwritten extents here? unwritten extents means, the clusters have been allocated, but have not been written yet.
Unwritten extents will not bring any block allocation, my understanding is right here?

> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> +				       size_t count)
> +{
> +	int ret = 0;
> +	unsigned int extent_flags;
> +	u32 cpos, clusters, extent_len, phys_cpos;
> +	struct super_block *sb = inode->i_sb;
> +
> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
Please consider data-inline case, if in data-inline case, we maybe need not to allocate more cluster.
Then, if there is not any more block need to be allocated, we should make the check return true.

> +
> +	while (clusters) {
> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
> +					 &extent_flags);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +
> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> +			ret = 1;
> +			break;
> +		}
> +
> +		if (extent_len > clusters)
> +			extent_len = clusters;
> +
> +		clusters -= extent_len;
> +		cpos += extent_len;
> +	}
> +out:
> +	return ret;
> +}
> +
>   static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>   {
>   	struct file *file = iocb->ki_filp;
> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>   		return 0;
>   
>   	/* Fallback to buffered I/O if we do not support append dio. */
> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
> -	    !ocfs2_supports_append_dio(osb))
> +	if (!ocfs2_supports_append_dio(osb) &&
> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
> +						     iter->count)))
>   		return 0;
>   
>   	if (iov_iter_rw(iter) == READ)
> -- 
> 2.7.4
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com 
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-18 21:53 ` Andrew Morton
  2017-12-19  1:24   ` Joseph Qi
@ 2017-12-19  3:00   ` Changwei Ge
  1 sibling, 0 replies; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  3:00 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 5:54, Andrew Morton wrote:
> On Mon, 18 Dec 2017 12:06:21 +0000 Changwei Ge <ge.changwei@h3c.com> wrote:
> 
>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>> steps on a file hole, it will fall back to buffer io, too. But for current
>> code, writing to file hole will leverage dio to allocate clusters. This is not
>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>> allocate space while doing dio.
>> So introduce file hole check function back into ocfs2.
>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>> back to buffer IO to allocate clusters.
>>
> 
> hm, that's a bit hard to understand.  Hopefully reviewers will know
> what's going on ;)
Hi Andrew,
Sorry for mess up the changelog.
I will enrich the changelog and elaborate my intention further in the next version
of this patch.


> 
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>    	return ret;
>>    }
>>    
>> +/*
>> + * Will look for holes and unwritten extents in the range starting at
>> + * pos for count bytes (inclusive).
>> + */
>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> +				       size_t count)
>> +{
>> +	int ret = 0;
>> +	unsigned int extent_flags;
>> +	u32 cpos, clusters, extent_len, phys_cpos;
>> +	struct super_block *sb = inode->i_sb;
>> +
>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>> +
>> +	while (clusters) {
>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>> +					 &extent_flags);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>> +
>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> +			ret = 1;
>> +			break;
>> +		}
>> +
>> +		if (extent_len > clusters)
>> +			extent_len = clusters;
>> +
>> +		clusters -= extent_len;
>> +		cpos += extent_len;
>> +	}
>> +out:
>> +	return ret;
>> +}
> 
> A few thoughts:
> 
> - a function which does "check_foo" isn't well named.  Because the
>    reader cannot determine whether the return value means "foo is true"
>    or "foo is false".

Very true.

> 
>    So a better name for this function is ocfs2_range_has_holes(), so
>    the reader immediately understand what its return value *means".
> 
>    Also a bool return value is more logical.

I prefer int value.
It's my fault here since I didn't check the return value for other exception
scenario.

> 
> - Mixing "goto out" with "break" as above is a bit odd.

Agree.

> 
> So...
> 
> 
> --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
> +++ a/fs/ocfs2/aops.c
> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>    * Will look for holes and unwritten extents in the range starting at
>    * pos for count bytes (inclusive).
>    */
> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> -				       size_t count)
> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
>   {
> -	int ret = 0;
> +	bool ret = false;
>   	unsigned int extent_flags;
>   	u32 cpos, clusters, extent_len, phys_cpos;
>   	struct super_block *sb = inode->i_sb;
> @@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
>   		}
>   
>   		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> -			ret = 1;
> -			break;
> +			ret = true;
> +			goto out;
>   		}

Ok, I will take this into account.

Thanks,
Changwei

>   
>   		if (extent_len > clusters)
> _
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  1:24   ` Joseph Qi
  2017-12-19  1:27     ` Andrew Morton
@ 2017-12-19  3:02     ` Changwei Ge
  1 sibling, 0 replies; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  3:02 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 9:24, Joseph Qi wrote:
> 
> On 17/12/19 05:53, Andrew Morton wrote:
>> On Mon, 18 Dec 2017 12:06:21 +0000 Changwei Ge <ge.changwei@h3c.com> wrote:
>>
>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>> allocate space while doing dio.
>>> So introduce file hole check function back into ocfs2.
>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>> back to buffer IO to allocate clusters.
>>>
>>
>> hm, that's a bit hard to understand.  Hopefully reviewers will know
>> what's going on ;)>
> Also suggest rearrange the description:)

Hi Joseph,
OK, I will elaborate the intention of this patch further in the next version.

> 
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>    	return ret;
>>>    }
>>>    
>>> +/*
>>> + * Will look for holes and unwritten extents in the range starting at
>>> + * pos for count bytes (inclusive).
>>> + */
>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>> +				       size_t count)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned int extent_flags;
>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>> +	struct super_block *sb = inode->i_sb;
>>> +
>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>> +
>>> +	while (clusters) {
>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>> +					 &extent_flags);
>>> +		if (ret < 0) {
>>> +			mlog_errno(ret);
>>> +			goto out;
>>> +		}
>>> +
>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>> +			ret = 1;
>>> +			break;
>>> +		}
>>> +
>>> +		if (extent_len > clusters)
>>> +			extent_len = clusters;
>>> +
>>> +		clusters -= extent_len;
>>> +		cpos += extent_len;
>>> +	}
>>> +out:
>>> +	return ret;
>>> +}
>>
>> A few thoughts:
>>
>> - a function which does "check_foo" isn't well named.  Because the
>>    reader cannot determine whether the return value means "foo is true"
>>    or "foo is false".
>>
>>    So a better name for this function is ocfs2_range_has_holes(), so
>>    the reader immediately understand what its return value *means".
>>
>>    Also a bool return value is more logical.
>>
>> - Mixing "goto out" with "break" as above is a bit odd.
>>
>> So...
>>
>>
>> --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
>> +++ a/fs/ocfs2/aops.c
>> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>>    * Will look for holes and unwritten extents in the range starting at
>>    * pos for count bytes (inclusive).
>>    */
>> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> -				       size_t count)
>> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
>>   {
>> -	int ret = 0;
>> +	bool ret = false;
> 
> I have a different opinion here. If we change return value from int to
> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
> I'd prefer the original version.

Yes, it's my mistake here.
You have to forgive me.:)

Thanks,
Changwei

> 
> Thanks,
> Joseph
> 
>>   	unsigned int extent_flags;
>>   	u32 cpos, clusters, extent_len, phys_cpos;
>>   	struct super_block *sb = inode->i_sb;
>> @@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
>>   		}
>>   
>>   		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> -			ret = 1;
>> -			break;
>> +			ret = true;
>> +			goto out;
>>   		}
>>   
>>   		if (extent_len > clusters)
>> _
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  1:39       ` Joseph Qi
@ 2017-12-19  3:03         ` Changwei Ge
  0 siblings, 0 replies; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  3:03 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 9:39, Joseph Qi wrote:
> 
> 
> On 17/12/19 09:27, Andrew Morton wrote:
>> On Tue, 19 Dec 2017 09:24:17 +0800 Joseph Qi <jiangqi903@gmail.com> wrote:
>>
>>>> --- a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
>>>> +++ a/fs/ocfs2/aops.c
>>>> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>>>>    * Will look for holes and unwritten extents in the range starting at
>>>>    * pos for count bytes (inclusive).
>>>>    */
>>>> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>> -				       size_t count)
>>>> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t count)
>>>>   {
>>>> -	int ret = 0;
>>>> +	bool ret = false;
>>>
>>> I have a different opinion here. If we change return value from int to
>>> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
>>> I'd prefer the original version.
>>
>> But that error code is not used?
>>
> IMO, since ocfs2_get_clusters will get io involved, the caller shouldn't
> ignore its error.
> 
> Something like:
> ret = ocfs2_check_range_for_holes();
> if (ret < 0) {
> 	mlog_errno(ret);
> 	goto out;
> }
> 
> Then check if append dio feature has been enabled as well as write range
> and hole.

Accept this point.

> 
> Thanks,
> Joseph
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  1:44 ` piaojun
@ 2017-12-19  3:05   ` Changwei Ge
  2017-12-19  3:40     ` piaojun
  0 siblings, 1 reply; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  3:05 UTC (permalink / raw)
  To: ocfs2-devel

Hi Jun,

On 2017/12/19 9:48, piaojun wrote:
> Hi Changwei,
> 
> On 2017/12/18 20:06, Changwei Ge wrote:
>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>> steps on a file hole, it will fall back to buffer io, too. But for current
>> code, writing to file hole will leverage dio to allocate clusters. This is not
>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>> allocate space while doing dio.
>> So introduce file hole check function back into ocfs2.
>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>> back to buffer IO to allocate clusters.
>>
> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?

Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.

> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?

Just for append-dio

>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>    fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>    1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index d151632..a982cf6 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>    	return ret;
>>    }
>>    
>> +/*
>> + * Will look for holes and unwritten extents in the range starting at
>> + * pos for count bytes (inclusive).
>> + */
>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> +				       size_t count)
>> +{
>> +	int ret = 0;
>> +	unsigned int extent_flags;
>> +	u32 cpos, clusters, extent_len, phys_cpos;
>> +	struct super_block *sb = inode->i_sb;
>> +
>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>> +
>> +	while (clusters) {
>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>> +					 &extent_flags);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>> +
>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> +			ret = 1;
>> +			break;
>> +		}
>> +
>> +		if (extent_len > clusters)
>> +			extent_len = clusters;
>> +
>> +		clusters -= extent_len;
>> +		cpos += extent_len;
>> +	}
>> +out:
>> +	return ret;
>> +}
>> +
>>    static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>    {
>>    	struct file *file = iocb->ki_filp;
>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>    		return 0;
>>    
>>    	/* Fallback to buffered I/O if we do not support append dio. */
>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>> -	    !ocfs2_supports_append_dio(osb))
>> +	if (!ocfs2_supports_append_dio(osb) &&
>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>> +						     iter->count)))
> we should check error here, right?

Accept this point.

Thanks,
Changwei

> 
> thanks,
> Jun
>>    		return 0;
>>    
>>    	if (iov_iter_rw(iter) == READ)
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  3:05   ` Changwei Ge
@ 2017-12-19  3:40     ` piaojun
  2017-12-19  6:02       ` Changwei Ge
  0 siblings, 1 reply; 21+ messages in thread
From: piaojun @ 2017-12-19  3:40 UTC (permalink / raw)
  To: ocfs2-devel

Hi Changwei,

On 2017/12/19 11:05, Changwei Ge wrote:
> Hi Jun,
> 
> On 2017/12/19 9:48, piaojun wrote:
>> Hi Changwei,
>>
>> On 2017/12/18 20:06, Changwei Ge wrote:
>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>> allocate space while doing dio.
>>> So introduce file hole check function back into ocfs2.
>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>> back to buffer IO to allocate clusters.
>>>
>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
> 
> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.

Why does dio need fall back to buffer-io when append-dio disabled?
Could 'common-dio' on file hole go through direct io process? If not,
could you please point out the necessity.

> 
>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
> 
> Just for append-dio
> 

If your patch is just for 'append-dio', I wonder that will have impact
on 'common-dio'.

thanks,
Jun

>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>    fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index d151632..a982cf6 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>    	return ret;
>>>    }
>>>    
>>> +/*
>>> + * Will look for holes and unwritten extents in the range starting at
>>> + * pos for count bytes (inclusive).
>>> + */
>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>> +				       size_t count)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned int extent_flags;
>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>> +	struct super_block *sb = inode->i_sb;
>>> +
>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>> +
>>> +	while (clusters) {
>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>> +					 &extent_flags);
>>> +		if (ret < 0) {
>>> +			mlog_errno(ret);
>>> +			goto out;
>>> +		}
>>> +
>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>> +			ret = 1;
>>> +			break;
>>> +		}
>>> +
>>> +		if (extent_len > clusters)
>>> +			extent_len = clusters;
>>> +
>>> +		clusters -= extent_len;
>>> +		cpos += extent_len;
>>> +	}
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>>    static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>    {
>>>    	struct file *file = iocb->ki_filp;
>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>    		return 0;
>>>    
>>>    	/* Fallback to buffered I/O if we do not support append dio. */
>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>> -	    !ocfs2_supports_append_dio(osb))
>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>> +						     iter->count)))
>> we should check error here, right?
> 
> Accept this point.
> 
> Thanks,
> Changwei
> 
>>
>> thanks,
>> Jun
>>>    		return 0;
>>>    
>>>    	if (iov_iter_rw(iter) == READ)
>>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  2:35 ` Gang He
@ 2017-12-19  5:50   ` Changwei Ge
  2017-12-19  6:27     ` Gang He
  0 siblings, 1 reply; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  5:50 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 10:35, Gang He wrote:
> Hi Changwei,
> 
> 
>>>>
>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>> steps on a file hole, it will fall back to buffer io, too. But for current
>> code, writing to file hole will leverage dio to allocate clusters. This is
>> not
>> right, since whether append-io is enabled tells the capability whether ocfs2
>> can
>> allocate space while doing dio.
>> So introduce file hole check function back into ocfs2.
>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will
>> fall
>> back to buffer IO to allocate clusters.
>>
>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>> ---
>>    fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>    1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index d151632..a982cf6 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>    	return ret;
>>    }
>>    
>> +/*
>> + * Will look for holes and unwritten extents in the range starting at
>> + * pos for count bytes (inclusive).
>> + */
> Why do we also need to look for unwritten extents here? unwritten extents means, the clusters have been allocated, but have not been written yet.
> Unwritten extents will not bring any block allocation, my understanding is right here?
> 

Hi Gang,
For now, I think your comment here makes scene.
Unwritten cluster should not make dio fall back to buffer io.
Actually, I copied this function snippet from earlier version of ocfs2 :)

>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> +				       size_t count)
>> +{
>> +	int ret = 0;
>> +	unsigned int extent_flags;
>> +	u32 cpos, clusters, extent_len, phys_cpos;
>> +	struct super_block *sb = inode->i_sb;
>> +
>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
> Please consider data-inline case, if in data-inline case, we maybe need not to allocate more cluster.
> Then, if there is not any more block need to be allocated, we should make the check return true.

I suppose current code in ocfs2_direct_IO already checks this for us.

Thanks,
Changwei

> 
>> +
>> +	while (clusters) {
>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>> +					 &extent_flags);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>> +
>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> +			ret = 1;
>> +			break;
>> +		}
>> +
>> +		if (extent_len > clusters)
>> +			extent_len = clusters;
>> +
>> +		clusters -= extent_len;
>> +		cpos += extent_len;
>> +	}
>> +out:
>> +	return ret;
>> +}
>> +
>>    static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>    {
>>    	struct file *file = iocb->ki_filp;
>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb,
>> struct iov_iter *iter)
>>    		return 0;
>>    
>>    	/* Fallback to buffered I/O if we do not support append dio. */
>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>> -	    !ocfs2_supports_append_dio(osb))
>> +	if (!ocfs2_supports_append_dio(osb) &&
>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>> +						     iter->count)))
>>    		return 0;
>>    
>>    	if (iov_iter_rw(iter) == READ)
>> -- 
>> 2.7.4
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  3:40     ` piaojun
@ 2017-12-19  6:02       ` Changwei Ge
  2017-12-19  8:04         ` Junxiao Bi
  0 siblings, 1 reply; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  6:02 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 11:41, piaojun wrote:
> Hi Changwei,
> 
> On 2017/12/19 11:05, Changwei Ge wrote:
>> Hi Jun,
>>
>> On 2017/12/19 9:48, piaojun wrote:
>>> Hi Changwei,
>>>
>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>> allocate space while doing dio.
>>>> So introduce file hole check function back into ocfs2.
>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>> back to buffer IO to allocate clusters.
>>>>
>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>
>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
> 
> Why does dio need fall back to buffer-io when append-dio disabled?
> Could 'common-dio' on file hole go through direct io process? If not,
> could you please point out the necessity.
Hi Jun,

The intention to make dio fall back to buffer io is to provide *an option* to users, which
is more stable and even fast.
 From my perspective, current ocfs2 dio implementation especially around space allocation during
doing dio still needs more test and improvements.

Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
It rather makes ocfs2 configurable and flexible.

Besides, do you still remember John's report about dio crash weeks ago?

I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
It's an option not a mandatory action.

Besides, append-dio feature is key to whether to allocate space with dio writing.
So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
:)

Thanks,
Changwei

> 
>>
>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>
>> Just for append-dio
>>
> 
> If your patch is just for 'append-dio', I wonder that will have impact
> on 'common-dio'.
> 
> thanks,
> Jun
> 
>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>> ---
>>>>     fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>     1 file changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index d151632..a982cf6 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>     	return ret;
>>>>     }
>>>>     
>>>> +/*
>>>> + * Will look for holes and unwritten extents in the range starting at
>>>> + * pos for count bytes (inclusive).
>>>> + */
>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>> +				       size_t count)
>>>> +{
>>>> +	int ret = 0;
>>>> +	unsigned int extent_flags;
>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>> +	struct super_block *sb = inode->i_sb;
>>>> +
>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>> +
>>>> +	while (clusters) {
>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>> +					 &extent_flags);
>>>> +		if (ret < 0) {
>>>> +			mlog_errno(ret);
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>> +			ret = 1;
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		if (extent_len > clusters)
>>>> +			extent_len = clusters;
>>>> +
>>>> +		clusters -= extent_len;
>>>> +		cpos += extent_len;
>>>> +	}
>>>> +out:
>>>> +	return ret;
>>>> +}
>>>> +
>>>>     static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>     {
>>>>     	struct file *file = iocb->ki_filp;
>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>     		return 0;
>>>>     
>>>>     	/* Fallback to buffered I/O if we do not support append dio. */
>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>> -	    !ocfs2_supports_append_dio(osb))
>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>> +						     iter->count)))
>>> we should check error here, right?
>>
>> Accept this point.
>>
>> Thanks,
>> Changwei
>>
>>>
>>> thanks,
>>> Jun
>>>>     		return 0;
>>>>     
>>>>     	if (iov_iter_rw(iter) == READ)
>>>>
>>>
>>
>> .
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  5:50   ` Changwei Ge
@ 2017-12-19  6:27     ` Gang He
  2017-12-19  7:26       ` Changwei Ge
  0 siblings, 1 reply; 21+ messages in thread
From: Gang He @ 2017-12-19  6:27 UTC (permalink / raw)
  To: ocfs2-devel




>>> 
> On 2017/12/19 10:35, Gang He wrote:
>> Hi Changwei,
>> 
>> 
>>>>>
>>> Before ocfs2 supporting allocating clusters while doing append-dio, all 
> append
>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>> code, writing to file hole will leverage dio to allocate clusters. This is
>>> not
>>> right, since whether append-io is enabled tells the capability whether ocfs2
>>> can
>>> allocate space while doing dio.
>>> So introduce file hole check function back into ocfs2.
>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will
>>> fall
>>> back to buffer IO to allocate clusters.
>>>
>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>> ---
>>>    fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 42 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index d151632..a982cf6 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>    	return ret;
>>>    }
>>>    
>>> +/*
>>> + * Will look for holes and unwritten extents in the range starting at
>>> + * pos for count bytes (inclusive).
>>> + */
>> Why do we also need to look for unwritten extents here? unwritten extents 
> means, the clusters have been allocated, but have not been written yet.
>> Unwritten extents will not bring any block allocation, my understanding is 
> right here?
>> 
> 
> Hi Gang,
> For now, I think your comment here makes scene.
> Unwritten cluster should not make dio fall back to buffer io.
> Actually, I copied this function snippet from earlier version of ocfs2 :)
> 
>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>> +				       size_t count)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned int extent_flags;
>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>> +	struct super_block *sb = inode->i_sb;
>>> +
>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>> Please consider data-inline case, if in data-inline case, we maybe need not to 
> allocate more cluster.
>> Then, if there is not any more block need to be allocated, we should make 
> the check return true.
> 
> I suppose current code in ocfs2_direct_IO already checks this for us.
Hi Changwei, please take a look again.
My means, if the user try to do di-write a inline-data inode, 
the inode has not any cluster, but still can write in inline-data area.
like the similar cases, maybe you need to check, make sure all the cases can work well.

Thanks
Gang

> 
> Thanks,
> Changwei
> 
>> 
>>> +
>>> +	while (clusters) {
>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>> +					 &extent_flags);
>>> +		if (ret < 0) {
>>> +			mlog_errno(ret);
>>> +			goto out;
>>> +		}
>>> +
>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>> +			ret = 1;
>>> +			break;
>>> +		}
>>> +
>>> +		if (extent_len > clusters)
>>> +			extent_len = clusters;
>>> +
>>> +		clusters -= extent_len;
>>> +		cpos += extent_len;
>>> +	}
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>>    static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>    {
>>>    	struct file *file = iocb->ki_filp;
>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb,
>>> struct iov_iter *iter)
>>>    		return 0;
>>>    
>>>    	/* Fallback to buffered I/O if we do not support append dio. */
>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>> -	    !ocfs2_supports_append_dio(osb))
>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>> +						     iter->count)))
>>>    		return 0;
>>>    
>>>    	if (iov_iter_rw(iter) == READ)
>>> -- 
>>> 2.7.4
>>>
>>> _______________________________________________
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel at oss.oracle.com 
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>> 
>> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  6:27     ` Gang He
@ 2017-12-19  7:26       ` Changwei Ge
  0 siblings, 0 replies; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  7:26 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 14:28, Gang He wrote:
> 
> 
> 
>>>>
>> On 2017/12/19 10:35, Gang He wrote:
>>> Hi Changwei,
>>>
>>>
>>>>>>
>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all
>> append
>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>> code, writing to file hole will leverage dio to allocate clusters. This is
>>>> not
>>>> right, since whether append-io is enabled tells the capability whether ocfs2
>>>> can
>>>> allocate space while doing dio.
>>>> So introduce file hole check function back into ocfs2.
>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will
>>>> fall
>>>> back to buffer IO to allocate clusters.
>>>>
>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>> ---
>>>>     fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>     1 file changed, 42 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>> index d151632..a982cf6 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>     	return ret;
>>>>     }
>>>>     
>>>> +/*
>>>> + * Will look for holes and unwritten extents in the range starting at
>>>> + * pos for count bytes (inclusive).
>>>> + */
>>> Why do we also need to look for unwritten extents here? unwritten extents
>> means, the clusters have been allocated, but have not been written yet.
>>> Unwritten extents will not bring any block allocation, my understanding is
>> right here?
>>>
>>
>> Hi Gang,
>> For now, I think your comment here makes scene.
>> Unwritten cluster should not make dio fall back to buffer io.
>> Actually, I copied this function snippet from earlier version of ocfs2 :)
>>
>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>> +				       size_t count)
>>>> +{
>>>> +	int ret = 0;
>>>> +	unsigned int extent_flags;
>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>> +	struct super_block *sb = inode->i_sb;
>>>> +
>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>> Please consider data-inline case, if in data-inline case, we maybe need not to
>> allocate more cluster.
>>> Then, if there is not any more block need to be allocated, we should make
>> the check return true.
>>
>> I suppose current code in ocfs2_direct_IO already checks this for us.
> Hi Changwei, please take a look again.
> My means, if the user try to do di-write a inline-data inode,
> the inode has not any cluster, but still can write in inline-data area.
> like the similar cases, maybe you need to check, make sure all the cases can work well.

Aha, I got your point.
Thanks, I will take this into account and perform some test.

> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Changwei
>>
>>>
>>>> +
>>>> +	while (clusters) {
>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>> +					 &extent_flags);
>>>> +		if (ret < 0) {
>>>> +			mlog_errno(ret);
>>>> +			goto out;
>>>> +		}
>>>> +
>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>> +			ret = 1;
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		if (extent_len > clusters)
>>>> +			extent_len = clusters;
>>>> +
>>>> +		clusters -= extent_len;
>>>> +		cpos += extent_len;
>>>> +	}
>>>> +out:
>>>> +	return ret;
>>>> +}
>>>> +
>>>>     static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>     {
>>>>     	struct file *file = iocb->ki_filp;
>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb,
>>>> struct iov_iter *iter)
>>>>     		return 0;
>>>>     
>>>>     	/* Fallback to buffered I/O if we do not support append dio. */
>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>> -	    !ocfs2_supports_append_dio(osb))
>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>> +						     iter->count)))
>>>>     		return 0;
>>>>     
>>>>     	if (iov_iter_rw(iter) == READ)
>>>> -- 
>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  6:02       ` Changwei Ge
@ 2017-12-19  8:04         ` Junxiao Bi
  2017-12-19  9:11           ` Changwei Ge
  0 siblings, 1 reply; 21+ messages in thread
From: Junxiao Bi @ 2017-12-19  8:04 UTC (permalink / raw)
  To: ocfs2-devel

Hi Changwei,

On 12/19/2017 02:02 PM, Changwei Ge wrote:
> On 2017/12/19 11:41, piaojun wrote:
>> Hi Changwei,
>>
>> On 2017/12/19 11:05, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> On 2017/12/19 9:48, piaojun wrote:
>>>> Hi Changwei,
>>>>
>>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>>> allocate space while doing dio.
>>>>> So introduce file hole check function back into ocfs2.
>>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>>> back to buffer IO to allocate clusters.
>>>>>
>>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>>
>>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
>>
>> Why does dio need fall back to buffer-io when append-dio disabled?
>> Could 'common-dio' on file hole go through direct io process? If not,
>> could you please point out the necessity.
> Hi Jun,
> 
> The intention to make dio fall back to buffer io is to provide *an option* to users, which
> is more stable and even fast.
More memory will be consumed for some important user cases. Like if
ocfs2 is using to store vm system image file, the file is highly sparse
but never extended. If fall back buffer io, more memory will be consumed
by page cache in dom0, that will cause less VM running on dom0 and
performance issue.

>  From my perspective, current ocfs2 dio implementation especially around space allocation during
> doing dio still needs more test and improvements.
> 
> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
> It rather makes ocfs2 configurable and flexible.
> 
> Besides, do you still remember John's report about dio crash weeks ago?
Looks like this is a workaround, why not fix the bug directly? If with
this, people may disable append-dio by default to avoid dio issues. That
will make it never stable. But it is a useful feature.

Thanks,
Junxiao.

> 
> I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
> So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
> It's an option not a mandatory action.
> 
> Besides, append-dio feature is key to whether to allocate space with dio writing.
> So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
> :)
> 
> Thanks,
> Changwei
> 
>>
>>>
>>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>>
>>> Just for append-dio
>>>
>>
>> If your patch is just for 'append-dio', I wonder that will have impact
>> on 'common-dio'.
>>
>> thanks,
>> Jun
>>
>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>> ---
>>>>>     fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>>     1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index d151632..a982cf6 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>>     	return ret;
>>>>>     }
>>>>>     
>>>>> +/*
>>>>> + * Will look for holes and unwritten extents in the range starting at
>>>>> + * pos for count bytes (inclusive).
>>>>> + */
>>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>>> +				       size_t count)
>>>>> +{
>>>>> +	int ret = 0;
>>>>> +	unsigned int extent_flags;
>>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>>> +	struct super_block *sb = inode->i_sb;
>>>>> +
>>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>>> +
>>>>> +	while (clusters) {
>>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>>> +					 &extent_flags);
>>>>> +		if (ret < 0) {
>>>>> +			mlog_errno(ret);
>>>>> +			goto out;
>>>>> +		}
>>>>> +
>>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>>> +			ret = 1;
>>>>> +			break;
>>>>> +		}
>>>>> +
>>>>> +		if (extent_len > clusters)
>>>>> +			extent_len = clusters;
>>>>> +
>>>>> +		clusters -= extent_len;
>>>>> +		cpos += extent_len;
>>>>> +	}
>>>>> +out:
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>>     static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>     {
>>>>>     	struct file *file = iocb->ki_filp;
>>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>     		return 0;
>>>>>     
>>>>>     	/* Fallback to buffered I/O if we do not support append dio. */
>>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>>> -	    !ocfs2_supports_append_dio(osb))
>>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>>> +						     iter->count)))
>>>> we should check error here, right?
>>>
>>> Accept this point.
>>>
>>> Thanks,
>>> Changwei
>>>
>>>>
>>>> thanks,
>>>> Jun
>>>>>     		return 0;
>>>>>     
>>>>>     	if (iov_iter_rw(iter) == READ)
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  8:04         ` Junxiao Bi
@ 2017-12-19  9:11           ` Changwei Ge
  2017-12-19  9:25             ` Junxiao Bi
  0 siblings, 1 reply; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  9:11 UTC (permalink / raw)
  To: ocfs2-devel

Hi Junxiao,

On 2017/12/19 16:15, Junxiao Bi wrote:
> Hi Changwei,
> 
> On 12/19/2017 02:02 PM, Changwei Ge wrote:
>> On 2017/12/19 11:41, piaojun wrote:
>>> Hi Changwei,
>>>
>>> On 2017/12/19 11:05, Changwei Ge wrote:
>>>> Hi Jun,
>>>>
>>>> On 2017/12/19 9:48, piaojun wrote:
>>>>> Hi Changwei,
>>>>>
>>>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>>>> allocate space while doing dio.
>>>>>> So introduce file hole check function back into ocfs2.
>>>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>>>> back to buffer IO to allocate clusters.
>>>>>>
>>>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>>>
>>>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
>>>
>>> Why does dio need fall back to buffer-io when append-dio disabled?
>>> Could 'common-dio' on file hole go through direct io process? If not,
>>> could you please point out the necessity.
>> Hi Jun,
>>
>> The intention to make dio fall back to buffer io is to provide *an option* to users, which
>> is more stable and even fast.
> More memory will be consumed for some important user cases. Like if
> ocfs2 is using to store vm system image file, the file is highly sparse
> but never extended. If fall back buffer io, more memory will be consumed
> by page cache in dom0, that will cause less VM running on dom0 and
> performance issue.

I admit your point above makes scene.
But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially when file is extremely
sparse. So for the worst case, virtual machine even can't run due to crash again and again.

I think the most benefit DIO brings to VM is that data can be transferred to LUN as soon as possible,
thus no data could be missed.

> 
>>   From my perspective, current ocfs2 dio implementation especially around space allocation during
>> doing dio still needs more test and improvements.
>>
>> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
>> It rather makes ocfs2 configurable and flexible.
>>
>> Besides, do you still remember John's report about dio crash weeks ago?
> Looks like this is a workaround, why not fix the bug directly? If with
> this, people may disable append-dio by default to avoid dio issues. That
> will make it never stable. But it is a useful feature.

Arguably, this patch just provides an extra option to users. It's up to ocfs2 users how to use ocfs2 for
their business. I think we should not limit ocfs2 users.

Moreover, I agree that direct IO is a useful feature, but it is not mature enough yet.
We have to improve it, however, it needs time. I suppose we still have a short or long journey until that.
So before that we could provide a backup way.
This may look like kind of workaround, but I prefer to call it an extra option.

Thanks,
Changwei

> 
> Thanks,
> Junxiao.
> 
>>
>> I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
>> So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
>> It's an option not a mandatory action.
>>
>> Besides, append-dio feature is key to whether to allocate space with dio writing.
>> So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
>> :)
>>
>> Thanks,
>> Changwei
>>
>>>
>>>>
>>>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>>>
>>>> Just for append-dio
>>>>
>>>
>>> If your patch is just for 'append-dio', I wonder that will have impact
>>> on 'common-dio'.
>>>
>>> thanks,
>>> Jun
>>>
>>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>>> ---
>>>>>>      fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>>>      1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>> index d151632..a982cf6 100644
>>>>>> --- a/fs/ocfs2/aops.c
>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>>>      	return ret;
>>>>>>      }
>>>>>>      
>>>>>> +/*
>>>>>> + * Will look for holes and unwritten extents in the range starting at
>>>>>> + * pos for count bytes (inclusive).
>>>>>> + */
>>>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>>>> +				       size_t count)
>>>>>> +{
>>>>>> +	int ret = 0;
>>>>>> +	unsigned int extent_flags;
>>>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>>>> +	struct super_block *sb = inode->i_sb;
>>>>>> +
>>>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>>>> +
>>>>>> +	while (clusters) {
>>>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>>>> +					 &extent_flags);
>>>>>> +		if (ret < 0) {
>>>>>> +			mlog_errno(ret);
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>>>> +			ret = 1;
>>>>>> +			break;
>>>>>> +		}
>>>>>> +
>>>>>> +		if (extent_len > clusters)
>>>>>> +			extent_len = clusters;
>>>>>> +
>>>>>> +		clusters -= extent_len;
>>>>>> +		cpos += extent_len;
>>>>>> +	}
>>>>>> +out:
>>>>>> +	return ret;
>>>>>> +}
>>>>>> +
>>>>>>      static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>      {
>>>>>>      	struct file *file = iocb->ki_filp;
>>>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>      		return 0;
>>>>>>      
>>>>>>      	/* Fallback to buffered I/O if we do not support append dio. */
>>>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>>>> -	    !ocfs2_supports_append_dio(osb))
>>>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>>>> +						     iter->count)))
>>>>> we should check error here, right?
>>>>
>>>> Accept this point.
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>>>
>>>>> thanks,
>>>>> Jun
>>>>>>      		return 0;
>>>>>>      
>>>>>>      	if (iov_iter_rw(iter) == READ)
>>>>>>
>>>>>
>>>>
>>>> .
>>>>
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  9:11           ` Changwei Ge
@ 2017-12-19  9:25             ` Junxiao Bi
  2017-12-19  9:44               ` Changwei Ge
  0 siblings, 1 reply; 21+ messages in thread
From: Junxiao Bi @ 2017-12-19  9:25 UTC (permalink / raw)
  To: ocfs2-devel

On 12/19/2017 05:11 PM, Changwei Ge wrote:
> Hi Junxiao,
> 
> On 2017/12/19 16:15, Junxiao Bi wrote:
>> Hi Changwei,
>>
>> On 12/19/2017 02:02 PM, Changwei Ge wrote:
>>> On 2017/12/19 11:41, piaojun wrote:
>>>> Hi Changwei,
>>>>
>>>> On 2017/12/19 11:05, Changwei Ge wrote:
>>>>> Hi Jun,
>>>>>
>>>>> On 2017/12/19 9:48, piaojun wrote:
>>>>>> Hi Changwei,
>>>>>>
>>>>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>>>>> allocate space while doing dio.
>>>>>>> So introduce file hole check function back into ocfs2.
>>>>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>>>>> back to buffer IO to allocate clusters.
>>>>>>>
>>>>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>>>>
>>>>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
>>>>
>>>> Why does dio need fall back to buffer-io when append-dio disabled?
>>>> Could 'common-dio' on file hole go through direct io process? If not,
>>>> could you please point out the necessity.
>>> Hi Jun,
>>>
>>> The intention to make dio fall back to buffer io is to provide *an option* to users, which
>>> is more stable and even fast.
>> More memory will be consumed for some important user cases. Like if
>> ocfs2 is using to store vm system image file, the file is highly sparse
>> but never extended. If fall back buffer io, more memory will be consumed
>> by page cache in dom0, that will cause less VM running on dom0 and
>> performance issue.
> 
> I admit your point above makes scene.
> But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially when file is extremely
> sparse. So for the worst case, virtual machine even can't run due to crash again and again.
Can you please point out where those crash were? I would like take a
look at them. I run ocfs2-test for mainline sometimes, and never find a
dio crash. As i know, Eric also run the test regularly, he also didn't
have a dio crash report.

> 
> I think the most benefit DIO brings to VM is that data can be transferred to LUN as soon as possible,
> thus no data could be missed.
Right, another benefit.

Thanks,
Junxiao.
> 
>>
>>>   From my perspective, current ocfs2 dio implementation especially around space allocation during
>>> doing dio still needs more test and improvements.
>>>
>>> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
>>> It rather makes ocfs2 configurable and flexible.
>>>
>>> Besides, do you still remember John's report about dio crash weeks ago?
>> Looks like this is a workaround, why not fix the bug directly? If with
>> this, people may disable append-dio by default to avoid dio issues. That
>> will make it never stable. But it is a useful feature.
> 
> Arguably, this patch just provides an extra option to users. It's up to ocfs2 users how to use ocfs2 for
> their business. I think we should not limit ocfs2 users.
> 
> Moreover, I agree that direct IO is a useful feature, but it is not mature enough yet.
> We have to improve it, however, it needs time. I suppose we still have a short or long journey until that.
> So before that we could provide a backup way.
> This may look like kind of workaround, but I prefer to call it an extra option.
> 
> Thanks,
> Changwei
> 
>>
>> Thanks,
>> Junxiao.
>>
>>>
>>> I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
>>> So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
>>> It's an option not a mandatory action.
>>>
>>> Besides, append-dio feature is key to whether to allocate space with dio writing.
>>> So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
>>> :)
>>>
>>> Thanks,
>>> Changwei
>>>
>>>>
>>>>>
>>>>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>>>>
>>>>> Just for append-dio
>>>>>
>>>>
>>>> If your patch is just for 'append-dio', I wonder that will have impact
>>>> on 'common-dio'.
>>>>
>>>> thanks,
>>>> Jun
>>>>
>>>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>>>> ---
>>>>>>>      fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>      1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>>> index d151632..a982cf6 100644
>>>>>>> --- a/fs/ocfs2/aops.c
>>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>>>>      	return ret;
>>>>>>>      }
>>>>>>>      
>>>>>>> +/*
>>>>>>> + * Will look for holes and unwritten extents in the range starting at
>>>>>>> + * pos for count bytes (inclusive).
>>>>>>> + */
>>>>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>>>>> +				       size_t count)
>>>>>>> +{
>>>>>>> +	int ret = 0;
>>>>>>> +	unsigned int extent_flags;
>>>>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>>>>> +	struct super_block *sb = inode->i_sb;
>>>>>>> +
>>>>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>>>>> +
>>>>>>> +	while (clusters) {
>>>>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>>>>> +					 &extent_flags);
>>>>>>> +		if (ret < 0) {
>>>>>>> +			mlog_errno(ret);
>>>>>>> +			goto out;
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>>>>> +			ret = 1;
>>>>>>> +			break;
>>>>>>> +		}
>>>>>>> +
>>>>>>> +		if (extent_len > clusters)
>>>>>>> +			extent_len = clusters;
>>>>>>> +
>>>>>>> +		clusters -= extent_len;
>>>>>>> +		cpos += extent_len;
>>>>>>> +	}
>>>>>>> +out:
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>>      static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>      {
>>>>>>>      	struct file *file = iocb->ki_filp;
>>>>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>      		return 0;
>>>>>>>      
>>>>>>>      	/* Fallback to buffered I/O if we do not support append dio. */
>>>>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>>>>> -	    !ocfs2_supports_append_dio(osb))
>>>>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>>>>> +						     iter->count)))
>>>>>> we should check error here, right?
>>>>>
>>>>> Accept this point.
>>>>>
>>>>> Thanks,
>>>>> Changwei
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> Jun
>>>>>>>      		return 0;
>>>>>>>      
>>>>>>>      	if (iov_iter_rw(iter) == READ)
>>>>>>>
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  9:25             ` Junxiao Bi
@ 2017-12-19  9:44               ` Changwei Ge
  2017-12-19 12:39                 ` alex chen
  0 siblings, 1 reply; 21+ messages in thread
From: Changwei Ge @ 2017-12-19  9:44 UTC (permalink / raw)
  To: ocfs2-devel

On 2017/12/19 17:30, Junxiao Bi wrote:
> On 12/19/2017 05:11 PM, Changwei Ge wrote:
>> Hi Junxiao,
>>
>> On 2017/12/19 16:15, Junxiao Bi wrote:
>>> Hi Changwei,
>>>
>>> On 12/19/2017 02:02 PM, Changwei Ge wrote:
>>>> On 2017/12/19 11:41, piaojun wrote:
>>>>> Hi Changwei,
>>>>>
>>>>> On 2017/12/19 11:05, Changwei Ge wrote:
>>>>>> Hi Jun,
>>>>>>
>>>>>> On 2017/12/19 9:48, piaojun wrote:
>>>>>>> Hi Changwei,
>>>>>>>
>>>>>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>>>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>>>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>>>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>>>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>>>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>>>>>> allocate space while doing dio.
>>>>>>>> So introduce file hole check function back into ocfs2.
>>>>>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>>>>>> back to buffer IO to allocate clusters.
>>>>>>>>
>>>>>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>>>>>
>>>>>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
>>>>>
>>>>> Why does dio need fall back to buffer-io when append-dio disabled?
>>>>> Could 'common-dio' on file hole go through direct io process? If not,
>>>>> could you please point out the necessity.
>>>> Hi Jun,
>>>>
>>>> The intention to make dio fall back to buffer io is to provide *an option* to users, which
>>>> is more stable and even fast.
>>> More memory will be consumed for some important user cases. Like if
>>> ocfs2 is using to store vm system image file, the file is highly sparse
>>> but never extended. If fall back buffer io, more memory will be consumed
>>> by page cache in dom0, that will cause less VM running on dom0 and
>>> performance issue.
>>
>> I admit your point above makes scene.
>> But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially when file is extremely
>> sparse. So for the worst case, virtual machine even can't run due to crash again and again.
> Can you please point out where those crash were? I would like take a
> look at them. I run ocfs2-test for mainline sometimes, and never find a
> dio crash. As i know, Eric also run the test regularly, he also didn't
> have a dio crash report.

Actually, we are running ocfs2-test, too.
I think why it can't detect some issue is that the target file is not sparse enough.

Weeks ago, John report a dio crash issue and provide a reproducer.

I manage to reproduce it easily. Although, Alex has provided a way to fix it, but I think it has something smells.
Also, his patch didn't handle journal related operation well. Moreover, it will bring extra several times of read
for a single time of write.
I was planning to improve his patch, but it can't be very elegant. So If I have to,otherwise I want to figure out
the root cause for the dio crash issue.

So we are still investigating the root cause why metadata needed is not estimated accurately.

It will be better if you can also take a glance at the dio crash issue John reported.

For now we already have some clues that extents estimated are enough actually but merged and declaimed during marking
extents written. We still need more test on it.

Thanks,
Changwei  

> 
>>
>> I think the most benefit DIO brings to VM is that data can be transferred to LUN as soon as possible,
>> thus no data could be missed.
> Right, another benefit.
> 
> Thanks,
> Junxiao.
>>
>>>
>>>>    From my perspective, current ocfs2 dio implementation especially around space allocation during
>>>> doing dio still needs more test and improvements.
>>>>
>>>> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
>>>> It rather makes ocfs2 configurable and flexible.
>>>>
>>>> Besides, do you still remember John's report about dio crash weeks ago?
>>> Looks like this is a workaround, why not fix the bug directly? If with
>>> this, people may disable append-dio by default to avoid dio issues. That
>>> will make it never stable. But it is a useful feature.
>>
>> Arguably, this patch just provides an extra option to users. It's up to ocfs2 users how to use ocfs2 for
>> their business. I think we should not limit ocfs2 users.
>>
>> Moreover, I agree that direct IO is a useful feature, but it is not mature enough yet.
>> We have to improve it, however, it needs time. I suppose we still have a short or long journey until that.
>> So before that we could provide a backup way.
>> This may look like kind of workaround, but I prefer to call it an extra option.
>>
>> Thanks,
>> Changwei
>>
>>>
>>> Thanks,
>>> Junxiao.
>>>
>>>>
>>>> I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
>>>> So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
>>>> It's an option not a mandatory action.
>>>>
>>>> Besides, append-dio feature is key to whether to allocate space with dio writing.
>>>> So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
>>>> :)
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>>>
>>>>>>
>>>>>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>>>>>
>>>>>> Just for append-dio
>>>>>>
>>>>>
>>>>> If your patch is just for 'append-dio', I wonder that will have impact
>>>>> on 'common-dio'.
>>>>>
>>>>> thanks,
>>>>> Jun
>>>>>
>>>>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>>>>> ---
>>>>>>>>       fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>       1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>>>> index d151632..a982cf6 100644
>>>>>>>> --- a/fs/ocfs2/aops.c
>>>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>>>>>       	return ret;
>>>>>>>>       }
>>>>>>>>       
>>>>>>>> +/*
>>>>>>>> + * Will look for holes and unwritten extents in the range starting at
>>>>>>>> + * pos for count bytes (inclusive).
>>>>>>>> + */
>>>>>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>>>>>> +				       size_t count)
>>>>>>>> +{
>>>>>>>> +	int ret = 0;
>>>>>>>> +	unsigned int extent_flags;
>>>>>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>>>>>> +	struct super_block *sb = inode->i_sb;
>>>>>>>> +
>>>>>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>>>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>>>>>> +
>>>>>>>> +	while (clusters) {
>>>>>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>>>>>> +					 &extent_flags);
>>>>>>>> +		if (ret < 0) {
>>>>>>>> +			mlog_errno(ret);
>>>>>>>> +			goto out;
>>>>>>>> +		}
>>>>>>>> +
>>>>>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>>>>>> +			ret = 1;
>>>>>>>> +			break;
>>>>>>>> +		}
>>>>>>>> +
>>>>>>>> +		if (extent_len > clusters)
>>>>>>>> +			extent_len = clusters;
>>>>>>>> +
>>>>>>>> +		clusters -= extent_len;
>>>>>>>> +		cpos += extent_len;
>>>>>>>> +	}
>>>>>>>> +out:
>>>>>>>> +	return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>       static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>>       {
>>>>>>>>       	struct file *file = iocb->ki_filp;
>>>>>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>>       		return 0;
>>>>>>>>       
>>>>>>>>       	/* Fallback to buffered I/O if we do not support append dio. */
>>>>>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>>>>>> -	    !ocfs2_supports_append_dio(osb))
>>>>>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>>>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>>>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>>>>>> +						     iter->count)))
>>>>>>> we should check error here, right?
>>>>>>
>>>>>> Accept this point.
>>>>>>
>>>>>> Thanks,
>>>>>> Changwei
>>>>>>
>>>>>>>
>>>>>>> thanks,
>>>>>>> Jun
>>>>>>>>       		return 0;
>>>>>>>>       
>>>>>>>>       	if (iov_iter_rw(iter) == READ)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
  2017-12-19  9:44               ` Changwei Ge
@ 2017-12-19 12:39                 ` alex chen
  0 siblings, 0 replies; 21+ messages in thread
From: alex chen @ 2017-12-19 12:39 UTC (permalink / raw)
  To: ocfs2-devel

Hi Changwei,

On 2017/12/19 17:44, Changwei Ge wrote:
> On 2017/12/19 17:30, Junxiao Bi wrote:
>> On 12/19/2017 05:11 PM, Changwei Ge wrote:
>>> Hi Junxiao,
>>>
>>> On 2017/12/19 16:15, Junxiao Bi wrote:
>>>> Hi Changwei,
>>>>
>>>> On 12/19/2017 02:02 PM, Changwei Ge wrote:
>>>>> On 2017/12/19 11:41, piaojun wrote:
>>>>>> Hi Changwei,
>>>>>>
>>>>>> On 2017/12/19 11:05, Changwei Ge wrote:
>>>>>>> Hi Jun,
>>>>>>>
>>>>>>> On 2017/12/19 9:48, piaojun wrote:
>>>>>>>> Hi Changwei,
>>>>>>>>
>>>>>>>> On 2017/12/18 20:06, Changwei Ge wrote:
>>>>>>>>> Before ocfs2 supporting allocating clusters while doing append-dio, all append
>>>>>>>>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>>>>>>>>> steps on a file hole, it will fall back to buffer io, too. But for current
>>>>>>>>> code, writing to file hole will leverage dio to allocate clusters. This is not
>>>>>>>>> right, since whether append-io is enabled tells the capability whether ocfs2 can
>>>>>>>>> allocate space while doing dio.
>>>>>>>>> So introduce file hole check function back into ocfs2.
>>>>>>>>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall
>>>>>>>>> back to buffer IO to allocate clusters.
>>>>>>>>>
>>>>>>>> 1. Do you mean that filling hole can't go with dio when append-dio is disabled?
>>>>>>>
>>>>>>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled.
>>>>>>
>>>>>> Why does dio need fall back to buffer-io when append-dio disabled?
>>>>>> Could 'common-dio' on file hole go through direct io process? If not,
>>>>>> could you please point out the necessity.
>>>>> Hi Jun,
>>>>>
>>>>> The intention to make dio fall back to buffer io is to provide *an option* to users, which
>>>>> is more stable and even fast.
>>>> More memory will be consumed for some important user cases. Like if
>>>> ocfs2 is using to store vm system image file, the file is highly sparse
>>>> but never extended. If fall back buffer io, more memory will be consumed
>>>> by page cache in dom0, that will cause less VM running on dom0 and
>>>> performance issue.
>>>
>>> I admit your point above makes scene.
>>> But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially when file is extremely
>>> sparse. So for the worst case, virtual machine even can't run due to crash again and again.
>> Can you please point out where those crash were? I would like take a
>> look at them. I run ocfs2-test for mainline sometimes, and never find a
>> dio crash. As i know, Eric also run the test regularly, he also didn't
>> have a dio crash report.
> 
> Actually, we are running ocfs2-test, too.
> I think why it can't detect some issue is that the target file is not sparse enough.
> 
> Weeks ago, John report a dio crash issue and provide a reproducer.
> 
> I manage to reproduce it easily. Although, Alex has provided a way to fix it, but I think it has something smells.
Um, I think it doesn't matter to split _mark_extent_written_ and _set_inode_size_ into different transactions in
my patch(ocfs2: check the metadate alloc before marking extent written). I think it is just a point that can be optimized.

> Also, his patch didn't handle journal related operation well. Moreover, it will bring extra several times of read
> for a single time of write.
It is necessary to bring extra reads for solving the crash problem.

> I was planning to improve his patch, but it can't be very elegant. So If I have to,otherwise I want to figure out
> the root cause for the dio crash issue.
> 
I hope you can send the patch which might be a draft and we can discuss it together.

> So we are still investigating the root cause why metadata needed is not estimated accurately.
> 
The root cause is that the extent tree may be merged during marking extents written, which is caused
by normal user write operation.

IMO, we should fix the crash problem directly instead of falling back to buffer IO.

Thanks,
Alex

> It will be better if you can also take a glance at the dio crash issue John reported.
> 
> For now we already have some clues that extents estimated are enough actually but merged and declaimed during marking
> extents written. We still need more test on it.
> 
> Thanks,
> Changwei  
> 
>>
>>>
>>> I think the most benefit DIO brings to VM is that data can be transferred to LUN as soon as possible,
>>> thus no data could be missed.
>> Right, another benefit.
>>
>> Thanks,
>> Junxiao.
>>>
>>>>
>>>>>    From my perspective, current ocfs2 dio implementation especially around space allocation during
>>>>> doing dio still needs more test and improvements.
>>>>>
>>>>> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature.
>>>>> It rather makes ocfs2 configurable and flexible.
>>>>>
>>>>> Besides, do you still remember John's report about dio crash weeks ago?
>>>> Looks like this is a workaround, why not fix the bug directly? If with
>>>> this, people may disable append-dio by default to avoid dio issues. That
>>>> will make it never stable. But it is a useful feature.
>>>
>>> Arguably, this patch just provides an extra option to users. It's up to ocfs2 users how to use ocfs2 for
>>> their business. I think we should not limit ocfs2 users.
>>>
>>> Moreover, I agree that direct IO is a useful feature, but it is not mature enough yet.
>>> We have to improve it, however, it needs time. I suppose we still have a short or long journey until that.
>>> So before that we could provide a backup way.
>>> This may look like kind of workaround, but I prefer to call it an extra option.
>>>
>>> Thanks,
>>> Changwei
>>>
>>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>>>
>>>>> I managed to reproduce this issue, so for now, I don't trust dio related code one hundred percents.
>>>>> So if something bad happens to dio writing with space allocation, we can still make ocfs2 fall back to buffer io
>>>>> It's an option not a mandatory action.
>>>>>
>>>>> Besides, append-dio feature is key to whether to allocate space with dio writing.
>>>>> So writing to file hole and enlarging file(appending file) should have the same reflection to append-dio feature.
>>>>> :)
>>>>>
>>>>> Thanks,
>>>>> Changwei
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'?
>>>>>>>
>>>>>>> Just for append-dio
>>>>>>>
>>>>>>
>>>>>> If your patch is just for 'append-dio', I wonder that will have impact
>>>>>> on 'common-dio'.
>>>>>>
>>>>>> thanks,
>>>>>> Jun
>>>>>>
>>>>>>>>> Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
>>>>>>>>> ---
>>>>>>>>>       fs/ocfs2/aops.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>>       1 file changed, 42 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>>>>> index d151632..a982cf6 100644
>>>>>>>>> --- a/fs/ocfs2/aops.c
>>>>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>>>>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>>>>>>>>       	return ret;
>>>>>>>>>       }
>>>>>>>>>       
>>>>>>>>> +/*
>>>>>>>>> + * Will look for holes and unwritten extents in the range starting at
>>>>>>>>> + * pos for count bytes (inclusive).
>>>>>>>>> + */
>>>>>>>>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>>>>>>>> +				       size_t count)
>>>>>>>>> +{
>>>>>>>>> +	int ret = 0;
>>>>>>>>> +	unsigned int extent_flags;
>>>>>>>>> +	u32 cpos, clusters, extent_len, phys_cpos;
>>>>>>>>> +	struct super_block *sb = inode->i_sb;
>>>>>>>>> +
>>>>>>>>> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>>>>>>>>> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>>>>>>>>> +
>>>>>>>>> +	while (clusters) {
>>>>>>>>> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
>>>>>>>>> +					 &extent_flags);
>>>>>>>>> +		if (ret < 0) {
>>>>>>>>> +			mlog_errno(ret);
>>>>>>>>> +			goto out;
>>>>>>>>> +		}
>>>>>>>>> +
>>>>>>>>> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>>>>>>>>> +			ret = 1;
>>>>>>>>> +			break;
>>>>>>>>> +		}
>>>>>>>>> +
>>>>>>>>> +		if (extent_len > clusters)
>>>>>>>>> +			extent_len = clusters;
>>>>>>>>> +
>>>>>>>>> +		clusters -= extent_len;
>>>>>>>>> +		cpos += extent_len;
>>>>>>>>> +	}
>>>>>>>>> +out:
>>>>>>>>> +	return ret;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>       static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>>>       {
>>>>>>>>>       	struct file *file = iocb->ki_filp;
>>>>>>>>> @@ -2429,8 +2467,10 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>>>>>>>>       		return 0;
>>>>>>>>>       
>>>>>>>>>       	/* Fallback to buffered I/O if we do not support append dio. */
>>>>>>>>> -	if (iocb->ki_pos + iter->count > i_size_read(inode) &&
>>>>>>>>> -	    !ocfs2_supports_append_dio(osb))
>>>>>>>>> +	if (!ocfs2_supports_append_dio(osb) &&
>>>>>>>>> +			(iocb->ki_pos + iter->count > i_size_read(inode) ||
>>>>>>>>> +			 ocfs2_check_range_for_holes(inode, iocb->ki_pos,
>>>>>>>>> +						     iter->count)))
>>>>>>>> we should check error here, right?
>>>>>>>
>>>>>>> Accept this point.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Changwei
>>>>>>>
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Jun
>>>>>>>>>       		return 0;
>>>>>>>>>       
>>>>>>>>>       	if (iov_iter_rw(iter) == READ)
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> .
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-12-19 12:39 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-18 12:06 [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing Changwei Ge
2017-12-18 21:53 ` Andrew Morton
2017-12-19  1:24   ` Joseph Qi
2017-12-19  1:27     ` Andrew Morton
2017-12-19  1:39       ` Joseph Qi
2017-12-19  3:03         ` Changwei Ge
2017-12-19  3:02     ` Changwei Ge
2017-12-19  3:00   ` Changwei Ge
2017-12-19  1:44 ` piaojun
2017-12-19  3:05   ` Changwei Ge
2017-12-19  3:40     ` piaojun
2017-12-19  6:02       ` Changwei Ge
2017-12-19  8:04         ` Junxiao Bi
2017-12-19  9:11           ` Changwei Ge
2017-12-19  9:25             ` Junxiao Bi
2017-12-19  9:44               ` Changwei Ge
2017-12-19 12:39                 ` alex chen
2017-12-19  2:35 ` Gang He
2017-12-19  5:50   ` Changwei Ge
2017-12-19  6:27     ` Gang He
2017-12-19  7:26       ` Changwei Ge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.