From: Suren Baghdasaryan
Date: Wed, 7 Jul 2021 23:39:34 -0700
Subject: Re: [PATCH 1/1] mm: introduce process_reap system call
To: Florian Weimer
Cc: Andrew Morton, Michal Hocko, Michal Hocko, David Rientjes,
    Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel,
    Minchan Kim, Christian Brauner, Christoph Hellwig, Oleg Nesterov,
    David Hildenbrand, Jann Horn, Shakeel Butt, Tim Murray, Linux API,
    linux-mm, LKML, kernel-team
References: <20210623192822.3072029-1-surenb@google.com>
    <87sg0qa22l.fsf@oldenburg.str.redhat.com>
    <87wnq1z7kl.fsf@oldenburg.str.redhat.com>
    <87zguxxrfl.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87zguxxrfl.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 7, 2021 at 11:15 PM Florian Weimer wrote:
>
> * Suren Baghdasaryan:
>
> > On Wed, Jul 7, 2021 at 10:41 PM Florian Weimer wrote:
> >>
> >> * Suren Baghdasaryan:
> >>
> >> > On Wed, Jul 7, 2021 at 2:47 AM Florian Weimer wrote:
> >> >>
> >> >> * Suren Baghdasaryan:
> >> >>
> >> >> > The API is as follows,
> >> >> >
> >> >> >     int process_reap(int pidfd, unsigned int flags);
> >> >> >
> >> >> > DESCRIPTION
> >> >> >     The process_reap() system call is used to free the memory of a
> >> >> >     dying process.
> >> >> >
> >> >> >     The pidfd selects the process referred to by the PID file
> >> >> >     descriptor.
> >> >> >     (See pidfd_open(2) for further information)
> >> >> >
> >> >> >     The flags argument is reserved for future use; currently, this
> >> >> >     argument must be specified as 0.
> >> >> >
> >> >> > RETURN VALUE
> >> >> >     On success, process_reap() returns 0. On error, -1 is returned
> >> >> >     and errno is set to indicate the error.
> >> >>
> >> >> I think the manual page should mention what it means for a process to be
> >> >> “dying”, and how to move a process to this state.
> >> >
> >> > Thanks for the suggestion, Florian! Would replacing "dying process"
> >> > with "process which was sent a SIGKILL signal" be sufficient?
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> > I think you are suggesting to use sigqueue() to deliver the signal and
> > perform the reaping when a special value accompanies it. This would be
> > somewhat similar to my early suggestion to use a flag in
> > pidfd_send_signal() (see:
> > https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> > reaping which has another advantage of operation on PIDFDs instead of
> > PIDs which can be recycled.
> > kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> > signal and return without blocking. Changing that behavior was
> > considered unacceptable in these discussions.
>
> Does this mean that you need two threads, one that sends SIGKILL, and
> one that calls process_reap? Given that sending SIGKILL is blocking
> with the existing interfaces?

Sending SIGKILL blocks only for as long as it takes to deliver the
signal; it does not wait for the recipient to process SIGKILL and
release its memory. When I said "blocking", I meant that kill() and
friends currently do not wait for SIGKILL to be processed, whereas
process_reap() blocks until the memory is released. Whether the
userspace caller invokes it right after sending SIGKILL, reclaiming the
memory synchronously, or spawns a separate thread to reclaim the memory
asynchronously is up to the caller. Both patterns are supported.

> Please also note that asynchronous deallocation of resources leads to
> bugs and can cause unrelated workloads to fail. For example, in some
> configurations, clone can fail with EAGAIN even in cases where the total
> number of tasks is clearly bounded because the kernel signals task exit
> to applications before all resources are deallocated. I'm worried that
> the new interface makes things quite a bit worse in this regard.

process_reap() releases memory synchronously; no kthreads are used. If
asynchronous release is required, userspace would need to spawn its own
thread and issue this syscall from it. I hope this addresses your
concerns, which I take to be about asynchronous deallocation within the
kernel.

Thanks!
>
> Thanks,
> Florian
>