Linux-man Archive on lore.kernel.org
* [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER
@ 2020-02-13 18:23 Mike Christie
  2020-02-13 20:08 ` Darrick J. Wong
  2020-02-14 12:54 ` Michal Hocko
  0 siblings, 2 replies; 5+ messages in thread
From: Mike Christie @ 2020-02-13 18:23 UTC
  To: linux-api, david, mhocko, masato.suzuki, damien.lemoal,
	darrick.wong, bvanassche, mtk.manpages, linux-man
  Cc: Mike Christie

This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
prctl commands added to the Linux kernel in 5.6 by commit:

commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
Author: Mike Christie <mchristi@redhat.com>
Date:   Mon Nov 11 18:19:00 2019 -0600

    prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
---

V3:
- Replace emulation device example.

V2:
- My initial patch for this was very bad. This version is almost 100%
taken word for word from Dave Chinner's review comments.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 man2/prctl.2 | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/man2/prctl.2 b/man2/prctl.2
index 720ec04e4..58d77bf2e 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -1381,6 +1381,30 @@ system call on Tru64).
 for information on versions and architectures.)
 Return unaligned access control bits, in the location pointed to by
 .IR "(unsigned int\ *) arg2" .
+.TP
+.B PR_SET_IO_FLUSHER (Since Linux 5.6)
+An IO_FLUSHER is a user process that the kernel uses to issue IO
+that cleans dirty page cache data and/or filesystem metadata. The
+kernel may need to clean this memory when under memory pressure in
+order to free it. This means there is potential for a memory reclaim
+recursion deadlock if the user process attempts to allocate memory
+and the kernel then blocks waiting for it to clean memory before it
+can make reclaim progress.
+
+The kernel avoids these recursion problems internally via a special
+process state that prevents recursive reclaim from issuing new IO.
+If \fIarg2\fP is 1, the \fBPR_SET_IO_FLUSHER\fP control allows a userspace
+process to set up this same process state and hence avoid the memory
+reclaim recursion deadlocks in the same manner the kernel avoids them.
+If \fIarg2\fP is 0, the process will clear the IO_FLUSHER state, and the
+default behavior will be used.
+
+Examples of IO_FLUSHER applications are FUSE daemons, SCSI device
+emulation daemons, etc.
+.TP
+.B PR_GET_IO_FLUSHER (Since Linux 5.6)
+Return as the function result 1 if the caller is in the IO_FLUSHER state and
+0 if not.
 .SH RETURN VALUE
 On success,
 .BR PR_GET_DUMPABLE ,
@@ -1395,6 +1419,7 @@ On success,
 .BR PR_GET_SPECULATION_CTRL ,
 .BR PR_MCE_KILL_GET ,
 .BR PR_CAP_AMBIENT + PR_CAP_AMBIENT_IS_SET ,
+.BR PR_GET_IO_FLUSHER ,
 and (if it returns)
 .BR PR_GET_SECCOMP
 return the nonnegative values described above.
-- 
2.21.0



* Re: [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER
  2020-02-13 18:23 [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER Mike Christie
@ 2020-02-13 20:08 ` Darrick J. Wong
  2020-02-13 20:14   ` Mike Christie
  2020-02-14 12:54 ` Michal Hocko
  1 sibling, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2020-02-13 20:08 UTC
  To: Mike Christie
  Cc: linux-api, david, mhocko, masato.suzuki, damien.lemoal,
	bvanassche, mtk.manpages, linux-man

On Thu, Feb 13, 2020 at 12:23:36PM -0600, Mike Christie wrote:
> This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
> prctl commands added to the linux kernel for 5.6 in commit:
> 
> commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Mon Nov 11 18:19:00 2019 -0600
> 
>     prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
> 
> V3:
> - Replace emulation device example.
> 
> V2:
> - My initial patch for this was very bad. This version is almost 100%
> taken word for word from Dave Chinner's review comments.
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
>  man2/prctl.2 | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index 720ec04e4..58d77bf2e 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -1381,6 +1381,30 @@ system call on Tru64).
>  for information on versions and architectures.)
>  Return unaligned access control bits, in the location pointed to by
>  .IR "(unsigned int\ *) arg2" .
> +.TP
> +.B PR_SET_IO_FLUSHER (Since Linux 5.6)
> +An IO_FLUSHER is a user process that the kernel uses to issue IO
> +that cleans dirty page cache data and/or filesystem metadata. The
> +kernel may need to clean this memory when under memory pressure in
> +order to free it. This means there is potential for a memory reclaim
> +recursion deadlock if the user process attempts to allocate memory
> +and the kernel then blocks waiting for it to clean memory before it
> +can make reclaim progress.
> +
> +The kernel avoids these recursion problems internally via a special
> +process state that prevents recursive reclaim from issuing new IO.
> +If \fIarg2\fP is 1, the \fPPR_SET_IO_FLUSHER\fP control allows a userspace
> +process to set up this same process state and hence avoid the memory
> +reclaim recursion deadlocks in the same manner the kernel avoids them.
> +If \fIarg2\fP is 0, the process will clear the IO_FLUSHER state, and the
> +default behavior will be used.

I forget, does a program have to have special capabilities (e.g.
CAP_SYS_ADMIN) to be able to PR_SET_IO_FLUSHER?

--D

> +Examples of IO_FLUSHER applications are FUSE daemons, SCSI device
> +emulation daemons, etc."
> +.TP
> +.B PR_GET_IO_FLUSHER (Since Linux 5.6)
> +Return as the function result 1 if the caller is in the IO_FLUSHER state and
> +0 if not.
>  .SH RETURN VALUE
>  On success,
>  .BR PR_GET_DUMPABLE ,
> @@ -1395,6 +1419,7 @@ On success,
>  .BR PR_GET_SPECULATION_CTRL ,
>  .BR PR_MCE_KILL_GET ,
>  .BR PR_CAP_AMBIENT + PR_CAP_AMBIENT_IS_SET ,
> +.BR PR_GET_IO_FLUSHER ,
>  and (if it returns)
>  .BR PR_GET_SECCOMP
>  return the nonnegative values described above.
> -- 
> 2.21.0
> 


* Re: [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER
  2020-02-13 20:08 ` Darrick J. Wong
@ 2020-02-13 20:14   ` Mike Christie
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Christie @ 2020-02-13 20:14 UTC
  To: Darrick J. Wong
  Cc: linux-api, david, mhocko, masato.suzuki, damien.lemoal,
	bvanassche, mtk.manpages, linux-man

On 02/13/2020 02:08 PM, Darrick J. Wong wrote:
> On Thu, Feb 13, 2020 at 12:23:36PM -0600, Mike Christie wrote:
>> This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
>> prctl commands added to the linux kernel for 5.6 in commit:
>>
>> commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
>> Author: Mike Christie <mchristi@redhat.com>
>> Date:   Mon Nov 11 18:19:00 2019 -0600
>>
>>     prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
>>
>> Signed-off-by: Mike Christie <mchristi@redhat.com>
>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>
>> V3:
>> - Replace emulation device example.
>>
>> V2:
>> - My initial patch for this was very bad. This version is almost 100%
>> taken word for word from Dave Chinner's review comments.
>>
>> Signed-off-by: Mike Christie <mchristi@redhat.com>
>> ---
>>  man2/prctl.2 | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/man2/prctl.2 b/man2/prctl.2
>> index 720ec04e4..58d77bf2e 100644
>> --- a/man2/prctl.2
>> +++ b/man2/prctl.2
>> @@ -1381,6 +1381,30 @@ system call on Tru64).
>>  for information on versions and architectures.)
>>  Return unaligned access control bits, in the location pointed to by
>>  .IR "(unsigned int\ *) arg2" .
>> +.TP
>> +.B PR_SET_IO_FLUSHER (Since Linux 5.6)
>> +An IO_FLUSHER is a user process that the kernel uses to issue IO
>> +that cleans dirty page cache data and/or filesystem metadata. The
>> +kernel may need to clean this memory when under memory pressure in
>> +order to free it. This means there is potential for a memory reclaim
>> +recursion deadlock if the user process attempts to allocate memory
>> +and the kernel then blocks waiting for it to clean memory before it
>> +can make reclaim progress.
>> +
>> +The kernel avoids these recursion problems internally via a special
>> +process state that prevents recursive reclaim from issuing new IO.
>> +If \fIarg2\fP is 1, the \fPPR_SET_IO_FLUSHER\fP control allows a userspace
>> +process to set up this same process state and hence avoid the memory
>> +reclaim recursion deadlocks in the same manner the kernel avoids them.
>> +If \fIarg2\fP is 0, the process will clear the IO_FLUSHER state, and the
>> +default behavior will be used.
> 
> I forget, does a program have to have special capabilities (e.g.
> CAP_SYS_ADMIN) to be able to PR_SET_IO_FLUSHER?

Yes, CAP_SYS_RESOURCE. I will add that info.
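
For readers following the thread, here is a minimal sketch of how a daemon
might call the two commands, given the CAP_SYS_RESOURCE requirement mentioned
above. The helper names (set_io_flusher, get_io_flusher,
enable_io_flusher_or_warn) are made up for illustration and are not part of
the patch. The fallback constants are the values from the kernel's
include/uapi/linux/prctl.h and are only used when the installed headers
predate Linux 5.6. The interpretation of EINVAL as "kernel too old" is an
assumption; it can also mean a bad argument.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallback definitions for build hosts whose headers predate Linux 5.6. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#endif
#ifndef PR_GET_IO_FLUSHER
#define PR_GET_IO_FLUSHER 58
#endif

/* Hypothetical helper: set (enable != 0) or clear (enable == 0) the
 * IO_FLUSHER state of the caller. Returns 0 on success, -1 with errno set.
 * The unused prctl arguments are passed as 0. */
int set_io_flusher(int enable)
{
    return prctl(PR_SET_IO_FLUSHER, enable ? 1UL : 0UL, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: returns 1 if the caller is in the IO_FLUSHER state,
 * 0 if not, and -1 with errno set on failure. */
int get_io_flusher(void)
{
    return prctl(PR_GET_IO_FLUSHER, 0UL, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: try to enter the IO_FLUSHER state and print a hint
 * on failure. Returns 0 on success, -1 with errno preserved. */
int enable_io_flusher_or_warn(void)
{
    if (set_io_flusher(1) == 0)
        return 0;

    int saved = errno;
    if (saved == EPERM)
        fprintf(stderr, "PR_SET_IO_FLUSHER: CAP_SYS_RESOURCE is required\n");
    else if (saved == EINVAL)
        fprintf(stderr, "PR_SET_IO_FLUSHER: rejected; kernel older than 5.6?\n");
    else
        fprintf(stderr, "PR_SET_IO_FLUSHER: %s\n", strerror(saved));
    errno = saved;
    return -1;
}
```

A FUSE or SCSI emulation daemon would presumably call
enable_io_flusher_or_warn() once at startup, before it starts servicing IO,
and decide for itself whether a failure is fatal.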


> 
> --D
> 
>> +Examples of IO_FLUSHER applications are FUSE daemons, SCSI device
>> +emulation daemons, etc."
>> +.TP
>> +.B PR_GET_IO_FLUSHER (Since Linux 5.6)
>> +Return as the function result 1 if the caller is in the IO_FLUSHER state and
>> +0 if not.
>>  .SH RETURN VALUE
>>  On success,
>>  .BR PR_GET_DUMPABLE ,
>> @@ -1395,6 +1419,7 @@ On success,
>>  .BR PR_GET_SPECULATION_CTRL ,
>>  .BR PR_MCE_KILL_GET ,
>>  .BR PR_CAP_AMBIENT + PR_CAP_AMBIENT_IS_SET ,
>> +.BR PR_GET_IO_FLUSHER ,
>>  and (if it returns)
>>  .BR PR_GET_SECCOMP
>>  return the nonnegative values described above.
>> -- 
>> 2.21.0
>>
> 



* Re: [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER
  2020-02-13 18:23 [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER Mike Christie
  2020-02-13 20:08 ` Darrick J. Wong
@ 2020-02-14 12:54 ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2020-02-14 12:54 UTC
  To: Mike Christie
  Cc: linux-api, david, masato.suzuki, damien.lemoal, darrick.wong,
	bvanassche, mtk.manpages, linux-man

On Thu 13-02-20 12:23:36, Mike Christie wrote:
> This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
> prctl commands added to the linux kernel for 5.6 in commit:
> 
> commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Mon Nov 11 18:19:00 2019 -0600
> 
>     prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
> 
> V3:
> - Replace emulation device example.
> 
> V2:
> - My initial patch for this was very bad. This version is almost 100%
> taken word for word from Dave Chinner's review comments.
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
>  man2/prctl.2 | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index 720ec04e4..58d77bf2e 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -1381,6 +1381,30 @@ system call on Tru64).
>  for information on versions and architectures.)
>  Return unaligned access control bits, in the location pointed to by
>  .IR "(unsigned int\ *) arg2" .
> +.TP
> +.B PR_SET_IO_FLUSHER (Since Linux 5.6)
> +An IO_FLUSHER is a user process that the kernel uses to issue IO
> +that cleans dirty page cache data and/or filesystem metadata. The
> +kernel may need to clean this memory when under memory pressure in
> +order to free it. This means there is potential for a memory reclaim
> +recursion deadlock if the user process attempts to allocate memory
> +and the kernel then blocks waiting for it to clean memory before it
> +can make reclaim progress.
> +
> +The kernel avoids these recursion problems internally via a special
> +process state that prevents recursive reclaim from issuing new IO.

I would refrain from describing the internal implementation. The
important part is that this flag tells the kernel that an IO_FLUSHER
process gets special treatment to work around the deadlock.

So any time a process is involved in the IO path and the kernel cannot
make forward progress without it, the flag should be set.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/1] prctl.2: doc PR_SET/GET_IO_FLUSHER
       [not found] <20200210221557.8021-1-mchristi@redhat.com>
@ 2020-02-11 14:17 ` Christian Brauner
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2020-02-11 14:17 UTC
  To: Mike Christie, mtk.manpages
  Cc: linux-api, david, mhocko, masato.suzuki, damien.lemoal,
	darrick.wong, bvanassche, linux-man

I think you've missed:
mtk.manpages@gmail.com
linux-man@vger.kernel.org

:)

Christian

On Mon, Feb 10, 2020 at 04:15:57PM -0600, Mike Christie wrote:
> This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
> prctl commands added to the linux kernel for 5.6 in commit:
> 
> commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Mon Nov 11 18:19:00 2019 -0600
> 
>     prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
> 
> V2:
> - My initial patch for this was very bad. This version is almost 100%
> taken word for word from Dave Chinner's review comments.
> 
> 
>  man2/prctl.2 | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index 720ec04e4..b481d186b 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -1381,6 +1381,30 @@ system call on Tru64).
>  for information on versions and architectures.)
>  Return unaligned access control bits, in the location pointed to by
>  .IR "(unsigned int\ *) arg2" .
> +.TP
> +.B PR_SET_IO_FLUSHER (Since Linux 5.6)
> +An IO_FLUSHER is a user process that the kernel uses to issue IO
> +that cleans dirty page cache data and/or filesystem metadata. The
> +kernel may need to clean this memory when under memory pressure in
> +order to free it. This means there is potential for a memory reclaim
> +recursion deadlock if the user process attempts to allocate memory
> +and the kernel then blocks waiting for it to clean memory before it
> +can make reclaim progress.
> +
> +The kernel avoids these recursion problems internally via a special
> +process state that prevents recursive reclaim from issuing new IO.
> +If \fIarg2\fP is 1, the \fPPR_SET_IO_FLUSHER\fP control allows a userspace
> +process to set up this same process state and hence avoid the memory
> +reclaim recursion deadlocks in the same manner the kernel avoids them.
> +If \fIarg2\fP is 0, the process will clear the IO_FLUSHER state, and the
> +default behavior will be used.
> +
> +Examples of IO_FLUSHER applications are FUSE daemons, zoned disk
> +emulation daemons, etc."
> +.TP
> +.B PR_GET_IO_FLUSHER (Since Linux 5.6)
> +Return as the function result 1 if the caller is in the IO_FLUSHER state and
> +0 if not.
>  .SH RETURN VALUE
>  On success,
>  .BR PR_GET_DUMPABLE ,
> @@ -1395,6 +1419,7 @@ On success,
>  .BR PR_GET_SPECULATION_CTRL ,
>  .BR PR_MCE_KILL_GET ,
>  .BR PR_CAP_AMBIENT + PR_CAP_AMBIENT_IS_SET ,
> +.BR PR_GET_IO_FLUSHER ,
>  and (if it returns)
>  .BR PR_GET_SECCOMP
>  return the nonnegative values described above.
> -- 
> 2.21.0
> 
