* [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
@ 2014-09-03  5:54 ` Junxiao Bi
  0 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-03  5:54 UTC (permalink / raw)
  To: david, akpm; +Cc: xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
introduced the PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation. __GFP_IO is cleared
when this flag is set, but since __GFP_FS implies __GFP_IO, it should be cleared as well. Otherwise the
allocation may still run into I/O, for example in the superblock shrinker.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Ming Lei <ming.lei@canonical.com>
---
 include/linux/sched.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..2fb2c47 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
-/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
+/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
+ * __GFP_FS is also cleared as it implies __GFP_IO.
+ */
 static inline gfp_t memalloc_noio_flags(gfp_t flags)
 {
 	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
-		flags &= ~__GFP_IO;
+		flags &= ~(__GFP_IO | __GFP_FS);
 	return flags;
 }
 
-- 
1.7.9.5
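
A minimal userspace sketch of what the one-line change does to an allocation mask. The bit values, the task_model struct and the two function names are assumptions made for illustration; they are not the kernel's definitions, and the real helper reads current->flags rather than taking a task argument.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; the real definitions live in
 * include/linux/gfp.h and include/linux/sched.h. */
typedef unsigned int gfp_t;
#define __GFP_WAIT        0x10u
#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define GFP_KERNEL        (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOIO  0x00080000u

/* Hypothetical stand-in for the task whose flags are consulted. */
struct task_model {
	unsigned int flags;
};

/* Behaviour before the patch: only __GFP_IO is masked out. */
gfp_t noio_flags_before(const struct task_model *t, gfp_t flags)
{
	assert(t != NULL);
	if (t->flags & PF_MEMALLOC_NOIO)
		flags &= ~__GFP_IO;
	return flags;
}

/* Behaviour with the patch: __GFP_FS is masked out as well. */
gfp_t noio_flags_after(const struct task_model *t, gfp_t flags)
{
	assert(t != NULL);
	if (t->flags & PF_MEMALLOC_NOIO)
		flags &= ~(__GFP_IO | __GFP_FS);
	return flags;
}
```

In this model, a GFP_KERNEL request from a PF_MEMALLOC_NOIO task keeps __GFP_FS before the patch, so reclaim is still allowed to call into the filesystem. With the patch only __GFP_WAIT is left.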


* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03  5:54 ` Junxiao Bi
@ 2014-09-03 12:20   ` Trond Myklebust
  -1 siblings, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2014-09-03 12:20 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: david, akpm, xuejiufei, ming.lei, Linux Kernel mailing list,
	linux-mm, Devel FS Linux

On Wed, Sep 3, 2014 at 1:54 AM, Junxiao Bi <junxiao.bi@oracle.com> wrote:
> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
> run into I/O, like in superblock shrinker.
>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: joyce.xue <xuejiufei@huawei.com>
> Cc: Ming Lei <ming.lei@canonical.com>
> ---
>  include/linux/sched.h |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5c2c885..2fb2c47 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>  #define used_math() tsk_used_math(current)
>
> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
> + * __GFP_FS is also cleared as it implies __GFP_IO.
> + */
>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>  {
>         if (unlikely(current->flags & PF_MEMALLOC_NOIO))
> -               flags &= ~__GFP_IO;
> +               flags &= ~(__GFP_IO | __GFP_FS);
>         return flags;
>  }
>

Shouldn't this be a stable fix? If it is needed, then it will affect
all kernels that define PF_MEMALLOC_NOIO.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03  5:54 ` Junxiao Bi
@ 2014-09-03 23:10   ` Andrew Morton
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2014-09-03 23:10 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: david, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On Wed,  3 Sep 2014 13:54:54 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:

> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
> run into I/O, like in superblock shrinker.

Is there an actual bug which inspired this fix?  If so, please describe
it.

I don't think it's accurate to say that __GFP_FS implies __GFP_IO. 
Where did that info come from?

And the superblock shrinker is a good example of why this shouldn't be
the case.  The main thing that code does is to reclaim clean fs objects
without performing IO.  AFAICT the proposed patch will significantly
weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
the kernel from reclaiming such objects?

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>  #define used_math() tsk_used_math(current)
>  
> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
> + * __GFP_FS is also cleared as it implies __GFP_IO.
> + */
>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>  {
>  	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
> -		flags &= ~__GFP_IO;
> +		flags &= ~(__GFP_IO | __GFP_FS);
>  	return flags;
>  }
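
The trade-off raised above can be sketched in userspace. This is a simplified model, assuming a shrinker that refuses to run when __GFP_FS is absent and otherwise frees clean objects without any I/O. cache_model and cache_scan_model are hypothetical names, and the bit value is illustrative.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int gfp_t;
#define __GFP_FS     0x80u   /* illustrative value */
#define SHRINK_STOP  (~0UL)  /* "did not scan" marker in this model */

/* Hypothetical cache holding only clean, reclaimable objects. */
struct cache_model {
	unsigned long clean_objects;
};

/*
 * Model of a filesystem cache shrinker: without __GFP_FS it bails out
 * to avoid recursing into the filesystem; with __GFP_FS it drops up to
 * nr_to_scan clean objects, which needs no I/O at all.
 */
unsigned long cache_scan_model(struct cache_model *c, gfp_t gfp_mask,
			       unsigned long nr_to_scan)
{
	unsigned long freed;

	assert(c != NULL);
	if (!(gfp_mask & __GFP_FS))
		return SHRINK_STOP;

	freed = nr_to_scan < c->clean_objects ? nr_to_scan : c->clean_objects;
	c->clean_objects -= freed;
	return freed;
}
```

Once __GFP_FS is stripped from every PF_MEMALLOC_NOIO allocation, the first branch is always taken for those tasks, so the clean objects stay in the cache even though freeing them would not have required I/O.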


* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03 23:10   ` Andrew Morton
@ 2014-09-04  2:08     ` Junxiao Bi
  -1 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-04  2:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: david, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On 09/04/2014 07:10 AM, Andrew Morton wrote:
> On Wed,  3 Sep 2014 13:54:54 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
> 
>> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
>> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
>> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
>> run into I/O, like in superblock shrinker.
> 
> Is there an actual bug which inspired this fix?  If so, please describe
> it.
> 
Yes, an ocfs2 deadlock bug is related to this. ocfs2 has a workqueue
that builds TCP connections and processes ocfs2 messages. For example,
when a new node comes up in the ocfs2 cluster, the workqueue tries to
build a connection to it. Because common networking code such as
sock_alloc() allocates memory with GFP_KERNEL, direct reclaim can be
triggered and call into the superblock shrinker when available memory
is low, even if PF_MEMALLOC_NOIO is set for the workqueue. To shrink
the inode cache, ocfs2 needs to release a cluster lock, and that
depends on the same workqueue, hence the deadlock. I am not sure
whether other cluster filesystems such as nfs have a similar issue;
could rpciod hang the same way as the ocfs2 workqueue?


> I don't think it's accurate to say that __GFP_FS implies __GFP_IO. 
> Where did that info come from?
__GFP_FS allows callbacks into the fs during memory allocation, and the
fs may do I/O regardless of whether __GFP_IO is set?
> 
> And the superblock shrinker is a good example of why this shouldn't be
> the case.  The main thing that code does is to reclaim clean fs objects
> without performing IO.  AFAICT the proposed patch will significantly
> weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
> the kernel from reclaiming such objects?
Even if the fs does not do I/O in the superblock shrinker, isn't it
possible for a fs process that cannot conveniently use GFP_NOFS to hold
some fs lock and then call back into the fs again?

PF_MEMALLOC_NOIO is only set for a few special processes, so I think it
won't affect much.

Thanks,
Junxiao.
> 
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>>  #define used_math() tsk_used_math(current)
>>  
>> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
>> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
>> + * __GFP_FS is also cleared as it implies __GFP_IO.
>> + */
>>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>>  {
>>  	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
>> -		flags &= ~__GFP_IO;
>> +		flags &= ~(__GFP_IO | __GFP_FS);
>>  	return flags;
>>  }
> 


* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03 12:20   ` Trond Myklebust
@ 2014-09-04  2:18     ` Junxiao Bi
  -1 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-04  2:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: david, akpm, xuejiufei, ming.lei, Linux Kernel mailing list,
	linux-mm, Devel FS Linux

On 09/03/2014 08:20 PM, Trond Myklebust wrote:
> On Wed, Sep 3, 2014 at 1:54 AM, Junxiao Bi <junxiao.bi@oracle.com> wrote:
>> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
>> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
>> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
>> run into I/O, like in superblock shrinker.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Cc: joyce.xue <xuejiufei@huawei.com>
>> Cc: Ming Lei <ming.lei@canonical.com>
>> ---
>>  include/linux/sched.h |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 5c2c885..2fb2c47 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>>  #define used_math() tsk_used_math(current)
>>
>> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
>> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
>> + * __GFP_FS is also cleared as it implies __GFP_IO.
>> + */
>>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>>  {
>>         if (unlikely(current->flags & PF_MEMALLOC_NOIO))
>> -               flags &= ~__GFP_IO;
>> +               flags &= ~(__GFP_IO | __GFP_FS);
>>         return flags;
>>  }
>>
> 
> Shouldn't this be a stable fix? If it is needed, then it will affect
> all kernels that define PF_MEMALLOC_NOIO.
Yes, it should be. An ocfs2 deadlock bug is related to this.

Thanks,
Junxiao.
> 


* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-04  2:08     ` Junxiao Bi
@ 2014-09-04  2:30       ` Andrew Morton
  -1 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2014-09-04  2:30 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: david, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On Thu, 04 Sep 2014 10:08:09 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:

> On 09/04/2014 07:10 AM, Andrew Morton wrote:
> > On Wed,  3 Sep 2014 13:54:54 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
> > 
> >> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
> >> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
> >> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
> >> run into I/O, like in superblock shrinker.
> > 
> > Is there an actual bug which inspired this fix?  If so, please describe
> > it.
> > 
> Yes, an ocfs2 deadlock bug is related to this, there is a workqueue in
> ocfs2 who is for building tcp connections and processing ocfs2 message.
> Like when an new node is up in ocfs2 cluster, the workqueue will try to
> build the connections to it, since there are some common code in
> networking like sock_alloc() using GFP_KERNEL to allocate memory, direct
> reclaim will be triggered and call into superblock shrinker if available
> memory is not enough even set PF_MEMALLOC_NOIO for the workqueue. To
> shrink the inode cache, ocfs2 needs release cluster lock and this
> depends on workqueue to do it, so cause the deadlock. Not sure whether
> there are similar issue for other cluster fs, like nfs, it is possible
> rpciod hung like the ocfs2 workqueue?

All this info should be in the changelog.

> 
> > I don't think it's accurate to say that __GFP_FS implies __GFP_IO. 
> > Where did that info come from?
> __GFP_FS allowed callback into fs during memory allocation, and fs may
> do io whatever __GFP_IO is set?

__GFP_FS and __GFP_IO are (or were) for communicating to vmscan: don't
enter the fs for writepage, don't write back swapcache.

I guess those concepts have grown over time without a ton of thought
going into it.  Yes, I suppose that if a filesystem's writepage is
called (for example) it expects that it will be able to perform
writeback and it won't check (or even be passed) the __GFP_IO setting.

So I guess we could say that !__GFP_FS && GFP_IO is not implemented and
shouldn't occur.

That being said, it still seems quite bad to disable VFS cache
shrinking for PF_MEMALLOC_NOIO allocation attempts.

> > 
> > And the superblock shrinker is a good example of why this shouldn't be
> > the case.  The main thing that code does is to reclaim clean fs objects
> > without performing IO.  AFAICT the proposed patch will significantly
> > weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
> > the kernel from reclaiming such objects?
> Even fs didn't do io in superblock shrinker, it is possible for a fs
> process who is not convenient to set GFP_NOFS holding some fs lock and
> call back fs again?
> 
> PF_MEMALLOC_NOIO is only set for some special processes. I think it
> won't affect much.

Maybe not now.  But once we add hacks like this, people say "goody" and
go and use them rather than exerting the effort to sort out their
deadlocks properly :( There will be more PF_MEMALLOC_NOIO users in
2019.

Dunno, I'd like to hear David's thoughts but perhaps it would be better
to find some way to continue to permit PF_MEMALLOC_NOIO to shrink VFS
caches for most filesystems and find some fs-specific fix for ocfs2. 
That would mean testing PF_MEMALLOC_NOIO directly I guess.
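
One possible reading of that suggestion, as a userspace sketch only. The fs_model struct, its reclaim_needs_worker field and fs_may_shrink() are hypothetical names invented here, and the bit values are illustrative; nothing like this is claimed to exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gfp_t;
#define __GFP_FS          0x80u        /* illustrative value */
#define PF_MEMALLOC_NOIO  0x00080000u  /* illustrative value */

/* Hypothetical stand-in for the allocating task. */
struct task_model {
	unsigned int flags;
};

/* Hypothetical per-filesystem property: true when freeing cached
 * objects depends on a worker that may itself be the allocator,
 * as described for ocfs2 in this thread. */
struct fs_model {
	bool reclaim_needs_worker;
};

/*
 * Decide whether cache shrinking may proceed. The generic __GFP_FS
 * rule is unchanged; only filesystems that opt in also test
 * PF_MEMALLOC_NOIO directly, so other filesystems keep shrinking
 * their caches for PF_MEMALLOC_NOIO allocations.
 */
bool fs_may_shrink(const struct task_model *t, const struct fs_model *fs,
		   gfp_t gfp_mask)
{
	assert(t != NULL && fs != NULL);
	if (!(gfp_mask & __GFP_FS))
		return false;
	if ((t->flags & PF_MEMALLOC_NOIO) && fs->reclaim_needs_worker)
		return false;
	return true;
}
```

The point of the design is that the restriction stays local to the filesystem that needs it, instead of being folded into the mask for every PF_MEMALLOC_NOIO allocation.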

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-04  2:30       ` Andrew Morton
  (?)
@ 2014-09-04  4:57         ` Junxiao Bi
  -1 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-04  4:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: david, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On 09/04/2014 10:30 AM, Andrew Morton wrote:
> On Thu, 04 Sep 2014 10:08:09 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
> 
>> On 09/04/2014 07:10 AM, Andrew Morton wrote:
>>> On Wed,  3 Sep 2014 13:54:54 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
>>>
>>>> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
>>>> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
>>>> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
>>>> run into I/O, like in superblock shrinker.
>>>
>>> Is there an actual bug which inspired this fix?  If so, please describe
>>> it.
>>>
>> Yes, an ocfs2 deadlock bug is related to this. ocfs2 has a workqueue
>> that builds tcp connections and processes ocfs2 messages. For example,
>> when a new node comes up in the ocfs2 cluster, the workqueue tries to
>> build connections to it. Because some common networking code, such as
>> sock_alloc(), allocates memory with GFP_KERNEL, direct reclaim is
>> triggered and calls into the superblock shrinker when available memory
>> is low, even with PF_MEMALLOC_NOIO set for the workqueue. To shrink the
>> inode cache, ocfs2 needs to release a cluster lock, which depends on the
>> same workqueue, hence the deadlock. I am not sure whether other cluster
>> filesystems have a similar issue; for nfs, could rpciod hang like the
>> ocfs2 workqueue?
> 
> All this info should be in the changelog.
> 
>>
>>> I don't think it's accurate to say that __GFP_FS implies __GFP_IO. 
>>> Where did that info come from?
>> __GFP_FS allows callbacks into the fs during memory allocation, and the
>> fs may do I/O whether or not __GFP_IO is set?
> 
> __GFP_FS and __GFP_IO are (or were) for communicating to vmscan: don't
> enter the fs for writepage, don't write back swapcache.
> 
> I guess those concepts have grown over time without a ton of thought
> going into it.  Yes, I suppose that if a filesystem's writepage is
> called (for example) it expects that it will be able to perform
> writeback and it won't check (or even be passed) the __GFP_IO setting.
> 
> So I guess we could say that !__GFP_FS && __GFP_IO is not implemented and
> shouldn't occur.
> 
> That being said, it still seems quite bad to disable VFS cache
> shrinking for PF_MEMALLOC_NOIO allocation attempts.
Even without this ocfs2 deadlock bug, the implementation of PF_MEMALLOC_NOIO
is wrong. See the deadlock cases described in its changelog below. Take the
"block device runtime resume" case: since __GFP_FS is not cleared, the
allocation can still enter fs writepage and deadlock.


From 21caf2fc1931b485483ddd254b634fa8f0099963 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Fri, 22 Feb 2013 16:34:08 -0800
Subject: [PATCH] mm: teach mm by current context info to not do I/O during
 memory allocation

This patch introduces PF_MEMALLOC_NOIO as a process flag (in the 'flags'
field of 'struct task_struct'), so that the flag can be set by one task to
avoid doing I/O inside memory allocation in the task's context.

The patch tries to solve one deadlock problem caused by block devices, and
the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
  GFP_KERNEL is called inside the runtime resume callback of any one of its
  ancestors (or the block device itself), the deadlock may be triggered
  inside the memory allocation since it might not complete until the block
  device becomes active and the involved page I/O finishes.  The situation
  was first pointed out by Alan Stern.  It is not a good approach to
  convert all GFP_KERNEL[1] in the path into GFP_NOIO because several
  subsystems may be involved (for example, PCI, USB and SCSI may be
  involved for a usb mass storage device, and network devices are involved
  too in the iSCSI case)

- during block device runtime suspend, because runtime resume needs to
  wait for completion of concurrent runtime suspend.

- during error handling of a usb mass storage device, a USB bus reset will
  be put on the device, so there shouldn't be any memory allocation with
  GFP_KERNEL during USB bus reset, otherwise a deadlock similar to the
  above may be triggered.  Unfortunately, any usb device may include one
  mass storage interface in theory, so it requires all usb interface
  drivers to handle the situation.  In fact, most usb drivers don't know
  how to handle bus reset on the device and don't provide .pre_reset() and
  .post_reset() callbacks at all, so USB core has to unbind and bind the
  driver for these devices.  So it is still not practical to resort to
  GFP_NOIO for solving the problem.
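
For readers unfamiliar with how a task scopes the flag, here is a minimal
user-space sketch of the save/restore pattern. It is only a model: the
`task_model` struct and `current_task` variable stand in for the kernel's
`current`, the bit value is illustrative, and `noio_active()` is a
hypothetical helper added for the example.

```c
/* User-space model: one fake task stands in for the kernel's `current`. */
#define PF_MEMALLOC_NOIO 0x00080000u	/* bit value is illustrative */

struct task_model {
	unsigned int flags;
};

static struct task_model current_task;

/* Enter a no-I/O section; returns the previous state of the flag. */
static unsigned int memalloc_noio_save(void)
{
	unsigned int old = current_task.flags & PF_MEMALLOC_NOIO;

	current_task.flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a no-I/O section, putting the flag back to what save() returned. */
static void memalloc_noio_restore(unsigned int old)
{
	current_task.flags = (current_task.flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Hypothetical helper for the example: is the task in a no-I/O section? */
static int noio_active(void)
{
	return (current_task.flags & PF_MEMALLOC_NOIO) != 0;
}
```

Because save() returns the old state rather than clearing the flag
unconditionally on exit, sections can nest: an inner restore leaves the
flag set if an outer section is still active.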

Thanks,
Junxiao.
> 
>>>
>>> And the superblock shrinker is a good example of why this shouldn't be
>>> the case.  The main thing that code does is to reclaim clean fs objects
>>> without performing IO.  AFAICT the proposed patch will significantly
>>> weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
>>> the kernel from reclaiming such objects?
>> Even if the fs does no I/O in the superblock shrinker, isn't it possible
>> for an fs process that cannot conveniently use GFP_NOFS to hold some fs
>> lock and then call back into the fs again?
>>
>> PF_MEMALLOC_NOIO is only set for some special processes. I think it
>> won't affect much.
> 
> Maybe not now.  But once we add hacks like this, people say "goody" and
> go and use them rather than exerting the effort to sort out their
> deadlocks properly :( There will be more PF_MEMALLOC_NOIO users in
> 2019.
> 
> Dunno, I'd like to hear David's thoughts but perhaps it would be better
> to find some way to continue to permit PF_MEMALLOC_NOIO to shrink VFS
> caches for most filesystems and find some fs-specific fix for ocfs2. 
> That would mean testing PF_MEMALLOC_NOIO directly I guess.
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-04  2:30       ` Andrew Morton
@ 2014-09-04  8:05         ` Anton Altaparmakov
  -1 siblings, 0 replies; 27+ messages in thread
From: Anton Altaparmakov @ 2014-09-04  8:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Junxiao Bi, david, xuejiufei, ming.lei, linux-kernel, linux-mm,
	linux-fsdevel

On 4 Sep 2014, at 03:30, Andrew Morton <akpm@linux-foundation.org> wrote:
> __GFP_FS and __GFP_IO are (or were) for communicating to vmscan: don't
> enter the fs for writepage, don't write back swapcache.
> 
> I guess those concepts have grown over time without a ton of thought
> going into it.  Yes, I suppose that if a filesystem's writepage is
> called (for example) it expects that it will be able to perform
> writeback and it won't check (or even be passed) the __GFP_IO setting.
> 
> So I guess we could say that !__GFP_FS && __GFP_IO is not implemented and
> shouldn't occur.
> 
> That being said, it still seems quite bad to disable VFS cache
> shrinking for PF_MEMALLOC_NOIO allocation attempts.

I think what it really boils down to is that file systems cannot allow recursion into _that_ file system so if VFS/VM shrinking could skip over all inodes/dentries/pages that are associated with the superblock of the volume for which the allocation is being done then that would be just fine.

An alternative would be that the file systems would need to be passed in a flag that will tell them that it is not safe to take locks and then file systems that need to take a lock could return with -EDEADLOCK and the VM can then skip over those entries and reclaim others.  Though I think it would be more efficient for the VFS/VM to simply not call into the file system that is doing the allocation as above...
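
A rough user-space sketch of the first idea (skip everything belonging to
the superblock the allocation is made for) might look like the following.
Nothing here is an existing kernel interface: `sb_model`,
`shrink_other_superblocks()` and the `alloc_sb` parameter are all
hypothetical names used only to illustrate the skipping logic.

```c
#include <stddef.h>

/* Hypothetical stand-in for a superblock with some reclaimable objects. */
struct sb_model {
	const char *name;
	long reclaimable;	/* objects this superblock could free */
};

/*
 * Walk all superblocks, skipping the one the allocation originated from,
 * and return how many objects were reclaimed.  `alloc_sb` may be NULL when
 * the allocation is not made on behalf of any file system.
 */
static long shrink_other_superblocks(struct sb_model *sbs, size_t n,
				     const struct sb_model *alloc_sb,
				     long nr_to_scan)
{
	long freed = 0;

	for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
		long take;

		if (&sbs[i] == alloc_sb)
			continue;	/* never recurse into the allocating fs */

		take = sbs[i].reclaimable;
		if (take > nr_to_scan - freed)
			take = nr_to_scan - freed;
		sbs[i].reclaimable -= take;
		freed += take;
	}
	return freed;
}
```

The open question this sketch glosses over is how reclaim would learn which
superblock an allocation belongs to; the model simply takes it as an
argument.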

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03 23:10   ` Andrew Morton
@ 2014-09-04  9:05     ` Dave Chinner
  -1 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2014-09-04  9:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Junxiao Bi, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On Wed, Sep 03, 2014 at 04:10:00PM -0700, Andrew Morton wrote:
> On Wed,  3 Sep 2014 13:54:54 +0800 Junxiao Bi <junxiao.bi@oracle.com> wrote:
> 
> > commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
> > introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
> > when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
> > run into I/O, like in superblock shrinker.
> 
> Is there an actual bug which inspired this fix?  If so, please describe
> it.
> 
> I don't think it's accurate to say that __GFP_FS implies __GFP_IO. 
> Where did that info come from?

Pretty damn clear to me:

#define GFP_ATOMIC      (__GFP_HIGH)
#define GFP_NOIO        (__GFP_WAIT)
#define GFP_NOFS        (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL      (__GFP_WAIT | __GFP_IO | __GFP_FS)

especially when you consider the layering of the subsystems that use
these contexts. i.e. KERNEL on top of FS on top of IO on top of
ATOMIC....

IOWs, asking for (__GFP_WAIT | __GFP_FS) reclaim context is
something outside the defined reclaim hierarchy. Filesystems
*depend* on being able to do IO to perform reclaim of dirty
objects, whether it be the page cache, inode cache or any other
filesystem cache that can hold dirty objects.
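
To make the layering concrete, here is a small user-space model of the
masks above. The bit values are illustrative, `gfp_in_hierarchy()` is a
hypothetical helper that only encodes the "FS sits on top of IO" rule, and
the two `noio_mask_*()` functions model the masking before and after the
proposed patch with the PF_MEMALLOC_NOIO check assumed to be true.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in include/linux/gfp.h. */
typedef unsigned int gfp_t;

#define __GFP_WAIT 0x10u
#define __GFP_HIGH 0x20u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u

#define GFP_ATOMIC (__GFP_HIGH)
#define GFP_NOIO   (__GFP_WAIT)
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* Hypothetical helper: does the mask respect "FS is layered on top of IO"? */
static bool gfp_in_hierarchy(gfp_t flags)
{
	/* __GFP_FS without __GFP_IO is the combination with no defined meaning. */
	if ((flags & __GFP_FS) && !(flags & __GFP_IO))
		return false;
	return true;
}

/* Masking as it stood before the patch: only __GFP_IO is cleared. */
static gfp_t noio_mask_before(gfp_t flags)
{
	return flags & ~__GFP_IO;
}

/* Masking as proposed by the patch: __GFP_FS is cleared as well. */
static gfp_t noio_mask_after(gfp_t flags)
{
	return flags & ~(__GFP_IO | __GFP_FS);
}
```

In this model, a GFP_KERNEL request masked the old way becomes
(__GFP_WAIT | __GFP_FS), which fails the hierarchy check, while the patched
masking turns it into plain GFP_NOIO.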

> And the superblock shrinker is a good example of why this shouldn't be
> the case.  The main thing that code does is to reclaim clean fs objects
> without performing IO.

Filesystem shrinkers do indeed perform IO from the superblock
shrinker and have for years. Even clean inodes can require IO before
they can be freed - e.g. on an orphan list, need truncation of
post-eof blocks, need to wait for ordered operations to complete
before it can be freed, etc.

IOWs, Ext4, btrfs and XFS all can issue and/or block on
arbitrary amounts of IO in the superblock shrinker context. XFS, in
particular, has been doing transactions and IO from the VFS inode
cache shrinker since it was first introduced....

> AFAICT the proposed patch will significantly
> weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
> the kernel from reclaiming such objects?

PF_MEMALLOC_NOIO is the anomalous case. It also has very few users,
who all happen to be working around very rare deadlocks caused by
vmalloc() hard coding GFP_KERNEL allocations deep in its stack. So
the impact of fixing this anomaly is going to be completely
unnoticeable...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-04  2:30       ` Andrew Morton
@ 2014-09-04  9:21         ` Dave Chinner
  -1 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2014-09-04  9:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Junxiao Bi, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On Wed, Sep 03, 2014 at 07:30:58PM -0700, Andrew Morton wrote:
> > PF_MEMALLOC_NOIO is only set for some special processes. I think it
> > won't affect much.
> 
> Maybe not now.  But once we add hacks like this, people say "goody" and
> go and use them rather than exerting the effort to sort out their
> deadlocks properly :( There will be more PF_MEMALLOC_NOIO users in
> 2019.

We got PF_MEMALLOC_NOIO because we failed to get vmalloc deadlocks
fixed. The reason vmalloc didn't get fixed?

"there will be more vmalloc users".

> Dunno, I'd like to hear David's thoughts but perhaps it would be better
> to find some way to continue to permit PF_MEMALLOC_NOIO to shrink VFS
> caches for most filesystems and find some fs-specific fix for ocfs2. 
> That would mean testing PF_MEMALLOC_NOIO directly I guess.

No special flags in the superblock shrinker, please. We have tens of
other filesystem shrinkers that might be impacted, too. If we do not
want filesystem shrinkers (note the plural) to run, the
shrink_control->gfp_mask needs to have __GFP_FS cleared from it when
it is first configured and so that context is constant across all
shrinker reclaim cases.

If you're really worried by changing PF_MEMALLOC_NOIO, then we can
introduce PF_MEMALLOC_NOFS and have the mm subsystem mask both flags
appropriately when setting the gfp_mask in the shrink_control
settings. But fundamentally, our reclaim hierarchy defines that NOIO
implies NOFS, and so we need to fix PF_MEMALLOC_NOIO anyway.
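
As a sketch of what masking both flags could look like, under the
assumption that a PF_MEMALLOC_NOFS task flag existed alongside
PF_MEMALLOC_NOIO: the function name `memalloc_mask_flags()`, the explicit
`task_flags` argument (instead of reading current->flags) and all bit
values are made up for this user-space example.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values, not the kernel's definitions. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u
#define GFP_NOIO   (__GFP_WAIT)
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOIO 0x1u
#define PF_MEMALLOC_NOFS 0x2u	/* hypothetical flag for this sketch */

/*
 * Apply the task's allocation context to a gfp mask.  NOIO implies NOFS,
 * so the NOIO case clears both bits; NOFS alone clears only __GFP_FS.
 */
static gfp_t memalloc_mask_flags(unsigned int task_flags, gfp_t flags)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		flags &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		flags &= ~__GFP_FS;
	return flags;
}
```

Whatever the task flags are, the result of masking GFP_KERNEL here is one
of GFP_KERNEL, GFP_NOFS or GFP_NOIO, so it never leaves the hierarchy.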

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-03  5:54 ` Junxiao Bi
@ 2014-09-04  9:23   ` Dave Chinner
  -1 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2014-09-04  9:23 UTC (permalink / raw)
  To: Junxiao Bi
  Cc: akpm, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On Wed, Sep 03, 2014 at 01:54:54PM +0800, Junxiao Bi wrote:
> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
> run into I/O, like in superblock shrinker.
> 
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> Cc: joyce.xue <xuejiufei@huawei.com>
> Cc: Ming Lei <ming.lei@canonical.com>
> ---
>  include/linux/sched.h |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5c2c885..2fb2c47 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>  #define used_math() tsk_used_math(current)
>  
> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
> + * __GFP_FS is also cleared as it implies __GFP_IO.
> + */
>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>  {
>  	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
> -		flags &= ~__GFP_IO;
> +		flags &= ~(__GFP_IO | __GFP_FS);
>  	return flags;
>  }

You also need to mask all the shrink_control->gfp_mask
initialisations in mm/vmscan.c. The current code only masks the page
reclaim gfp_mask, not those that are passed to the shrinkers.
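
Roughly, the point is that the mask has to be applied where the
shrink_control is set up, so that every shrinker sees the same context.
The following is a user-space model only, not the mm/vmscan.c code:
`shrink_control_model`, `fs_shrinker_scan()` and `run_fs_shrinker()` are
hypothetical, the task flags are passed in explicitly, and the bit values
are illustrative.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values, not the kernel's definitions. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOIO 0x1u
#define SHRINK_STOP      (~0UL)

/* Hypothetical stand-in for the control block handed to shrinkers. */
struct shrink_control_model {
	gfp_t gfp_mask;
	unsigned long nr_to_scan;
};

/* Model of the patched helper, taking the task flags as an argument. */
static gfp_t memalloc_noio_flags_model(unsigned int task_flags, gfp_t flags)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		flags &= ~(__GFP_IO | __GFP_FS);
	return flags;
}

/* A filesystem shrinker that refuses to run without __GFP_FS. */
static unsigned long fs_shrinker_scan(const struct shrink_control_model *sc)
{
	if (!(sc->gfp_mask & __GFP_FS))
		return SHRINK_STOP;
	return sc->nr_to_scan;	/* pretend everything scanned was freed */
}

/* Mask the gfp flags once, at the point the control block is initialised. */
static unsigned long run_fs_shrinker(unsigned int task_flags, gfp_t gfp_mask,
				     unsigned long nr_to_scan)
{
	struct shrink_control_model sc = {
		.gfp_mask = memalloc_noio_flags_model(task_flags, gfp_mask),
		.nr_to_scan = nr_to_scan,
	};

	return fs_shrinker_scan(&sc);
}
```

With the mask applied at initialisation, a PF_MEMALLOC_NOIO task in this
model never enters the filesystem shrinker, even when the caller asked for
GFP_KERNEL.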

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-04  9:23   ` Dave Chinner
@ 2014-09-05  2:32     ` Junxiao Bi
  -1 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-05  2:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: akpm, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On 09/04/2014 05:23 PM, Dave Chinner wrote:
> On Wed, Sep 03, 2014 at 01:54:54PM +0800, Junxiao Bi wrote:
>> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
>> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
>> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
>> run into I/O, like in superblock shrinker.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Cc: joyce.xue <xuejiufei@huawei.com>
>> Cc: Ming Lei <ming.lei@canonical.com>
>> ---
>>  include/linux/sched.h |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 5c2c885..2fb2c47 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>>  #define used_math() tsk_used_math(current)
>>  
>> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
>> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
>> + * __GFP_FS is also cleared as it implies __GFP_IO.
>> + */
>>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>>  {
>>  	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
>> -		flags &= ~__GFP_IO;
>> +		flags &= ~(__GFP_IO | __GFP_FS);
>>  	return flags;
>>  }
> 
> You also need to mask all the shrink_control->gfp_mask
> initialisations in mm/vmscan.c. The current code only masks the page
> reclaim gfp_mask, not those that are passed to the shrinkers.
Yes, some shrink_control->gfp_mask initialisations in vmscan.c are not
masked; they are in the functions listed below. Apart from these,
everything on the direct reclaim path seems to be masked by
memalloc_noio_flags().

-reclaim_clean_pages_from_list()
Used by alloc_contig_range(), which is invoked for hugetlb and CMA. For
hugetlb it should be safe, as only userspace uses it. I am not sure
about CMA.
David & Andrew, could you share your thoughts on whether CMA is affected?

-mem_cgroup_shrink_node_zone()
-try_to_free_mem_cgroup_pages()
These two are used by the memory cgroup code. As no kernel thread can be
assigned to such a cgroup, I think there is no need to mask them.

-balance_pgdat()
Used by kswapd, no need to mask.

-shrink_all_memory()
Used by hibernation, should be safe with GFP_FS/IO.
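
The reasoning in the list above rests on PF_MEMALLOC_NOIO being a
per-task flag: an unmasked gfp_mask only matters if the task reaching
that code has the flag set. Below is a minimal userspace sketch of that
scoping, not kernel code. The bit values, the current_task stand-in and
the reclaim_gfp_mask() helper are assumptions made for illustration; the
save/restore pair is modelled on the memalloc_noio_save() and
memalloc_noio_restore() helpers from the commit referenced in the patch
description and may not match the kernel line for line:

```c
#include <assert.h>

/* Stand-in type and bit values, for illustration only. */
typedef unsigned int gfp_t;
#define __GFP_IO		0x40u
#define __GFP_FS		0x80u
#define PF_MEMALLOC_NOIO	0x00080000u

/* Stand-in for the kernel's per-task state and the "current" pointer. */
struct task { unsigned int flags; };
static struct task current_task;
#define current (&current_task)

/* Enter a no-I/O section; return the previous state of the flag. */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = current->flags & PF_MEMALLOC_NOIO;

	current->flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the section, putting the flag back to its previous state. */
static inline void memalloc_noio_restore(unsigned int old)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Helper as changed by the patch under discussion. */
static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
	if (current->flags & PF_MEMALLOC_NOIO)
		flags &= ~(__GFP_IO | __GFP_FS);
	return flags;
}

/* Hypothetical reclaim entry point: the mask it would pass on. */
static inline gfp_t reclaim_gfp_mask(gfp_t requested)
{
	return memalloc_noio_flags(requested);
}
```

A task that never enters such a section, which is the assumption made
above for kswapd and for the hibernation path, sees its mask unchanged,
so masking there would be a no-op.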

Thanks,
Junxiao.
> 
> Cheers,
> 
> Dave.
> 



* Re: [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  2014-09-05  2:32     ` Junxiao Bi
@ 2014-09-05  5:13       ` Junxiao Bi
  -1 siblings, 0 replies; 27+ messages in thread
From: Junxiao Bi @ 2014-09-05  5:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: akpm, xuejiufei, ming.lei, linux-kernel, linux-mm, linux-fsdevel

On 09/05/2014 10:32 AM, Junxiao Bi wrote:
> On 09/04/2014 05:23 PM, Dave Chinner wrote:
>> On Wed, Sep 03, 2014 at 01:54:54PM +0800, Junxiao Bi wrote:
>>> commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation")
>>> introduces PF_MEMALLOC_NOIO flag to avoid doing I/O inside memory allocation, __GFP_IO is cleared
>>> when this flag is set, but __GFP_FS implies __GFP_IO, it should also be cleared. Or it may still
>>> run into I/O, like in superblock shrinker.
>>>
>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Cc: joyce.xue <xuejiufei@huawei.com>
>>> Cc: Ming Lei <ming.lei@canonical.com>
>>> ---
>>>  include/linux/sched.h |    6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index 5c2c885..2fb2c47 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -1936,11 +1936,13 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
>>>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>>>  #define used_math() tsk_used_math(current)
>>>  
>>> -/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
>>> +/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags
>>> + * __GFP_FS is also cleared as it implies __GFP_IO.
>>> + */
>>>  static inline gfp_t memalloc_noio_flags(gfp_t flags)
>>>  {
>>>  	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
>>> -		flags &= ~__GFP_IO;
>>> +		flags &= ~(__GFP_IO | __GFP_FS);
>>>  	return flags;
>>>  }
>>
>> You also need to mask all the shrink_control->gfp_mask
>> initialisations in mm/vmscan.c. The current code only masks the page
>> reclaim gfp_mask, not those that are passed to the shrinkers.
> Yes, some shrink_control->gfp_mask initialisations in vmscan.c are not
> masked; they are in the functions listed below. Apart from these,
> everything on the direct reclaim path seems to be masked by
> memalloc_noio_flags().
> 
> -reclaim_clean_pages_from_list()
> Used by alloc_contig_range(), which is invoked for hugetlb and CMA. For
> hugetlb it should be safe, as only userspace uses it. I am not sure
> about CMA.
> David & Andrew, could you share your thoughts on whether CMA is affected?
> 
Looking at CMA: it is used for devices that do not support
scatter/gather DMA, mainly embedded devices such as cameras. That should
not be the case for block devices, so I think this gfp_mask does not
need to be masked.

Thanks,
Junxiao.
> -mem_cgroup_shrink_node_zone()
> -try_to_free_mem_cgroup_pages()
> These two are used by the memory cgroup code. As no kernel thread can be
> assigned to such a cgroup, I think there is no need to mask them.
> 
> -balance_pgdat()
> Used by kswapd, no need to mask.
> 
> -shrink_all_memory()
> Used by hibernation, should be safe with GFP_FS/IO.
> 
> Thanks,
> Junxiao.
>>
>> Cheers,
>>
>> Dave.
>>
> 



end of thread, other threads:[~2014-09-05  5:14 UTC | newest]

Thread overview: 27+ messages
2014-09-03  5:54 [PATCH] mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set Junxiao Bi
2014-09-03  5:54 ` Junxiao Bi
2014-09-03 12:20 ` Trond Myklebust
2014-09-03 12:20   ` Trond Myklebust
2014-09-04  2:18   ` Junxiao Bi
2014-09-04  2:18     ` Junxiao Bi
2014-09-03 23:10 ` Andrew Morton
2014-09-03 23:10   ` Andrew Morton
2014-09-04  2:08   ` Junxiao Bi
2014-09-04  2:08     ` Junxiao Bi
2014-09-04  2:30     ` Andrew Morton
2014-09-04  2:30       ` Andrew Morton
2014-09-04  4:57       ` Junxiao Bi
2014-09-04  4:57         ` Junxiao Bi
2014-09-04  4:57         ` Junxiao Bi
2014-09-04  8:05       ` Anton Altaparmakov
2014-09-04  8:05         ` Anton Altaparmakov
2014-09-04  9:21       ` Dave Chinner
2014-09-04  9:21         ` Dave Chinner
2014-09-04  9:05   ` Dave Chinner
2014-09-04  9:05     ` Dave Chinner
2014-09-04  9:23 ` Dave Chinner
2014-09-04  9:23   ` Dave Chinner
2014-09-05  2:32   ` Junxiao Bi
2014-09-05  2:32     ` Junxiao Bi
2014-09-05  5:13     ` Junxiao Bi
2014-09-05  5:13       ` Junxiao Bi
