From: Michal Hocko <mhocko@kernel.org>
To: Shawn Landden <slandden@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC] EPOLL_KILLME: New flag to epoll_wait() that subscribes process to death row (new syscall)
Date: Thu, 2 Nov 2017 16:45:18 +0100
Message-ID: <20171102154518.fbd6pb533asd7wfo@dhcp22.suse.cz>
In-Reply-To: <20171101053244.5218-1-slandden@gmail.com>

[Always cc the linux-api mailing list when proposing user-visible API
 changes]

On Tue 31-10-17 22:32:44, Shawn Landden wrote:
> It is common for services to be stateless around their main event loop.
> If a process passes the EPOLL_KILLME flag to epoll_wait5() then it
> signals to the kernel that epoll_wait5() may not complete, and the kernel
> may send SIGKILL if resources get tight.
> 
> See my systemd patch: https://github.com/shawnl/systemd/tree/killme
> 
> Android uses this memory model for all programs, and having it in the
> kernel will enable integration with the page cache (not in this
> series).

I have to say I completely hate the idea. You are abusing epoll_wait5
for out-of-memory handling? Why is this syscall any more special than any
other one which sleeps and waits idle for an event? We do have per-task
oom_score_adj for that purpose.
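
For reference, a minimal userspace sketch of the oom_score_adj approach
mentioned above. It assumes the usual procfs interface
(/proc/self/oom_score_adj, accepting values from -1000 to 1000); the helper
name set_oom_score_adj() and its path parameter are hypothetical and only
serve the illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an oom_score_adj value to the given procfs
 * file. Pass "/proc/self/oom_score_adj" to adjust the calling process.
 * Higher values make the task a more likely OOM victim.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int set_oom_score_adj(const char *path, int adj)
{
	FILE *f;

	/* The interface accepts values from -1000 to 1000. */
	if (adj < -1000 || adj > 1000) {
		errno = EINVAL;
		return -1;
	}

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fprintf(f, "%d\n", adj) < 0) {
		fclose(f);
		return -1;
	}

	/* Buffered write errors may only surface when the stream is closed. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A stateless service could call set_oom_score_adj("/proc/self/oom_score_adj",
1000) before entering its idle loop and restore the previous value once it
has work to do. Raising the value normally needs no privilege, while lowering
it again may require CAP_SYS_RESOURCE.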

Besides that, the patch is simply wrong because

[...]
> @@ -1029,6 +1030,22 @@ bool out_of_memory(struct oom_control *oc)
>  		return true;
>  	}
>  
> +	/*
> +	 * Check death row.
> +	 */
> +	if (!list_empty(eventpoll_deathrow_list())) {
> +		struct list_head *l = eventpoll_deathrow_list();
> +		struct task_struct *ts = list_first_entry(l,
> +					 struct task_struct, se.deathrow);
> +
> +		pr_debug("Killing pid %u from EPOLL_KILLME death row.",
> +			ts->pid);
> +
> +		/* We use SIGKILL so as to cleanly interrupt ep_poll() */
> +		kill_pid(task_pid(ts), SIGKILL, 1);
> +		return true;
> +	}
> +

this doesn't reflect the oom domain (is this a memcg or a mempolicy/taskset
constrained OOM?). You might be killing tasks which are not in the target
domain.
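
The domain problem can be illustrated with a toy userspace model. This is
not kernel code: struct toy_task, pick_first() and pick_in_domain() are
hypothetical names, and a single integer memcg_id stands in for whatever
actually defines the OOM domain (memcg, mempolicy or cpuset).

```c
#include <stddef.h>

/* Toy stand-in for a task on the "death row" list. */
struct toy_task {
	int pid;
	int memcg_id;	/* assumed: the domain the task is charged to */
};

/* What the quoted hunk does: take the head of the global list. */
static const struct toy_task *pick_first(const struct toy_task *row, size_t n)
{
	return n ? &row[0] : NULL;
}

/* Domain-aware variant: only consider tasks inside the OOMing domain. */
static const struct toy_task *pick_in_domain(const struct toy_task *row,
					     size_t n, int oom_memcg_id)
{
	for (size_t i = 0; i < n; i++)
		if (row[i].memcg_id == oom_memcg_id)
			return &row[i];
	return NULL;	/* nothing eligible; fall back to normal selection */
}
```

With a list holding pid 100 in memcg 1 followed by pid 200 in memcg 2, an
OOM in memcg 2 makes pick_first() return pid 100, a task whose death frees
nothing in the constrained domain, while pick_in_domain() returns pid 200.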
-- 
Michal Hocko
SUSE Labs
