From: Michal Hocko <mhocko@kernel.org>
To: Mike Christie <mchristi@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, martin@urbackup.org,
    Damien.LeMoal@wdc.com
Subject: Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2
Date: Tue, 22 Oct 2019 18:33:10 +0200
Message-ID: <20191022163310.GS9379@dhcp22.suse.cz>
In-Reply-To: <5DAF2AA0.5030500@redhat.com>

On Tue 22-10-19 11:13:20, Mike Christie wrote:
> On 10/22/2019 06:24 AM, Michal Hocko wrote:
> > On Mon 21-10-19 16:41:37, Mike Christie wrote:
> >> There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
> >> and nbd that have userspace components that can run in the IO path. For
> >> example, iscsi and nbd's userspace daemons may need to recreate a socket
> >> and/or send IO on it, and dm-multipath's daemon multipathd may need to
> >> send IO to figure out the state of paths and re-set them up.
> >>
> >> In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
> >> memalloc_*_save/restore functions to control the allocation behavior,
> >> but for userspace we would end up hitting an allocation that ended up
> >> writing data back to the same device we are trying to allocate for.
> >
> > Which code paths are we talking about here? Any ioctl or is this a
> > general syscall path? Can we mark the process in a more generic way?
>
> It depends on the daemon. The common ones, for example iscsi and nbd,
> need network related calls like sendmsg, recvmsg, socket, etc.
> tcmu-runner could need the network ones and also read and write when it
> does IO to a FS or device. dm-multipath needs the sg io ioctls.

OK, so there is no clear kernel entry point that could be explicitly
annotated. This would imply a per task context. This is important
information. And I am wondering how those use cases ever worked in the
first place. This is not a minor detail.

> > E.g. we have PF_LESS_THROTTLE (used by nfsd). It doesn't affect the
> > reclaim recursion but it shows a pattern that doesn't really exhibit
> > too many internals. Maybe we need PF_IO_FLUSHER or similar?
>
> I am not familiar with PF_IO_FLUSHER. If it prevents the recursion
> problem then please send me details and I will look into it for the next
> posting.

PF_IO_FLUSHER doesn't exist. I just wanted to point out that, similarly
to PF_LESS_THROTTLE, it should be a more high level per task flag rather
than something as low level as a direct control of the gfp allocation
context. PF_LESS_THROTTLE simply says that the task is a part of the
reclaim process and therefore it shouldn't be subject to normal
throttling - whatever that means. PF_IO_FLUSHER would mean that the user
context is a part of the IO path and therefore there are certain reclaim
recursion restrictions.
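
To make the idea concrete, here is a toy userspace model of "per task
flag implies a restricted allocation context". It is not kernel code:
the bit values and the names task_model, effective_gfp and
PF_IO_FLUSHER_MODEL are made up for this sketch, and the real reclaim
rules may well need to be different.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define GFP_IO_BIT       0x1u  /* allocation may start physical IO */
#define GFP_FS_BIT       0x2u  /* allocation may call into the filesystem */
#define GFP_RECLAIM_BIT  0x4u  /* allocation may enter reclaim */
#define GFP_KERNEL_MODEL (GFP_IO_BIT | GFP_FS_BIT | GFP_RECLAIM_BIT)

/* Hypothetical high level flag: the task is a part of the IO path. */
#define PF_IO_FLUSHER_MODEL 0x1u

struct task_model {
	unsigned int flags;
};

/*
 * Narrow the requested mask according to the task's flag. A flagged task
 * keeps the ability to reclaim but loses IO and FS recursion, which is
 * what a GFP_NOIO-like context means in this model.
 */
static unsigned int effective_gfp(const struct task_model *t, unsigned int gfp)
{
	assert(t != NULL);
	if (t->flags & PF_IO_FLUSHER_MODEL)
		gfp &= ~(GFP_IO_BIT | GFP_FS_BIT);
	return gfp;
}
```

The point is that the caller only states what it is (a part of the IO
path), and the translation into allocation constraints stays inside the
kernel where it can change later.
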
> >> This patch allows the userspace daemons to set the PF_MEMALLOC* flags
> >> with prctl during their initialization so later allocations cannot
> >> call back into them.
> >
> > TBH I am not really happy to export these to the userspace. They are
> > an internal implementation detail and the userspace shouldn't really
>
> They care in these cases, because block/fs drivers must be able to make
> forward progress during writes. To meet this guarantee kernel block
> drivers use mempools and memalloc/GFP flags.
>
> For these userspace components of the block/fs drivers they already do
> things normal daemons do not to meet that guarantee like mlock their
> memory, disable oom killer, and preallocate resources they have control
> over. They have no control over reclaim like the kernel drivers do so
> it's easy for us to deadlock when memory gets low.

OK, fair enough. How much control do they really need, though? Is a
single PF_IO_FLUSHER as explained above (essentially implying a GFP_NOIO
context) sufficient?
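
For illustration, the daemon side of such an interface could be as small
as the sketch below. The option name PR_SET_IO_FLUSHER and its fallback
number are assumptions of this sketch and are not part of the patch
under discussion; mark_task_as_io_flusher is a made-up helper name. A
kernel that does not know the option, or a caller without sufficient
privileges, is expected to get an error back.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Assumed option: the name and the number are placeholders for this
 * sketch and are not defined by the patch under discussion.
 */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif

/*
 * Ask the kernel to treat the calling task as a part of the IO path.
 * Returns 0 on success or a negative errno value if the request is
 * refused, so the daemon can decide whether to continue without it.
 */
static int mark_task_as_io_flusher(void)
{
	if (prctl(PR_SET_IO_FLUSHER, 1UL, 0UL, 0UL, 0UL) == -1) {
		int err = errno;

		fprintf(stderr, "IO flusher opt-in failed: %s\n", strerror(err));
		return -err;
	}
	return 0;
}
```

A daemon would call this once during initialization, next to the mlock
and oom killer setup it already does, and would not need to know
anything about gfp flags.
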
--
Michal Hocko
SUSE Labs