From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: How to favor memory allocations for WQ_MEM_RECLAIM threads?
Date: Wed, 8 Mar 2017 08:21:32 +1100
Message-ID: <20170307212132.GQ17542@dastard>
In-Reply-To: <20170307193659.GD31179@htj.duckdns.org>

On Tue, Mar 07, 2017 at 02:36:59PM -0500, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 07, 2017 at 01:15:04PM +0100, Michal Hocko wrote:
> > > The real problem here is that the XFS code has /no idea/ of what
> > > workqueue context it is operating in - the fact it is in a rescuer
> 
> I don't see how whether something is running off of a rescuer or not
> matters here.  The only thing workqueue guarantees is that there's
> gonna be at least one kworker thread executing work items from the
> workqueue.  Running on a rescuer doesn't necessarily indicate memory
> pressure condition.

That's news to me. In what situations do we run the rescuer thread
other than memory allocation failure when queuing work?

> > > thread is completely hidden from the executing context. It seems to
> > > me that the workqueue infrastructure's responsibility to tell memory
> > > reclaim that the rescuer thread needs special access to the memory
> > > reserves to allow the work it is running to allow forwards progress
> > > to be made. i.e.  setting PF_MEMALLOC on the rescuer thread or
> > > something similar...
> >
> > I am not sure an automatic access to memory reserves from the rescuer
> > context is safe. This sounds too easy to break (read consume all the
> > reserves) - note that we have almost 200 users of WQ_MEM_RECLAIM and
> > chances are some of them will not be careful with the memory
> > allocations. I agree it would be helpful to know that the current item
> > runs from the rescuer context, though. In such a case the implementation
> > can do what ever it takes to make a forward progress. If that is using
> > __GFP_MEMALLOC then be it but it would be at least explicit and well
> > thought through (I hope).
> 
> I don't think doing this automatically is a good idea.  xfs work items
> are free to mark itself PF_MEMALLOC while running tho.

I don't think that's a good idea to do unconditionally. It's quite
common to have IO intensive XFS workloads queue so much work that we
see several /thousand/ kworker threads running at once, even
on relatively small 16p systems.

> It makes sense
> to mark these cases explicitly anyway. 

Doing it on every work item we queue will lead to immediate depletion
of memory reserves under heavy IO loads.

> We can update workqueue code
> so that it automatically clears the flag after each work item
> completion to help.
> 
> > Tejun, would it be possible/reasonable to add current_is_wq_rescuer() API?
> 
> It's implementable for sure.  I'm just not sure how it'd help
> anything.  It's not a relevant information on anything.

Except to enable us to get closer to the "rescuer must make forwards
progress" guarantee. In this context, the rescuer is the only
context we should allow to dip into memory reserves. I'm happy if we
have to explicitly check for that and set PF_MEMALLOC ourselves 
(we do that for XFS kernel threads involved in memory reclaim),
but it's not something we should set automatically on every
IO completion work item we run....
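
A minimal userspace sketch of that "check, then set it ourselves" pattern
follows. It is a model, not kernel code: struct task_model,
model_is_wq_rescuer() and run_io_completion() are hypothetical names made
up for illustration, the boolean stands in for the proposed
current_is_wq_rescuer() check, and the flag only reuses the PF_MEMALLOC
name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; this model only borrows the kernel's name. */
#define PF_MEMALLOC 0x00000800u

/* Hypothetical stand-in for the executing thread's state. */
struct task_model {
	unsigned int flags;
	bool is_rescuer;	/* models the proposed rescuer check */
};

/* Assumption: models what current_is_wq_rescuer() would report. */
static bool model_is_wq_rescuer(const struct task_model *t)
{
	return t->is_rescuer;
}

typedef void (*work_fn)(struct task_model *t, void *arg);

/*
 * Run one work item. Only a rescuer that does not already have
 * PF_MEMALLOC set gets it for the duration of the item, and the
 * flag is dropped again afterwards. Ordinary kworkers never touch
 * the reserves, however many of them are running.
 */
static void run_io_completion(struct task_model *t, work_fn fn, void *arg)
{
	bool boosted = false;

	assert(t != NULL && fn != NULL);

	if (model_is_wq_rescuer(t) && !(t->flags & PF_MEMALLOC)) {
		t->flags |= PF_MEMALLOC;
		boosted = true;
	}

	fn(t, arg);

	/* Clear only what we set, so a caller's own flag survives. */
	if (boosted)
		t->flags &= ~PF_MEMALLOC;
}

/* Example work item: records whether it ran with PF_MEMALLOC set. */
static void record_flag(struct task_model *t, void *arg)
{
	*(bool *)arg = (t->flags & PF_MEMALLOC) != 0;
}
```

Running record_flag through run_io_completion() on a task with
is_rescuer false leaves the flag clear throughout; on a task with
is_rescuer true the work item sees the flag set, and it is clear again
once the item returns.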

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
