From: Suren Baghdasaryan
Date: Wed, 7 Jul 2021 23:39:34 -0700
Subject: Re: [PATCH 1/1] mm: introduce process_reap system call
To: Florian Weimer
Cc: Andrew Morton, Michal Hocko, David Rientjes, Matthew Wilcox,
    Johannes Weiner, Roman Gushchin, Rik van Riel, Minchan Kim,
    Christian Brauner, Christoph Hellwig, Oleg Nesterov,
    David Hildenbrand, Jann Horn, Shakeel Butt, Tim Murray, Linux API,
    linux-mm, LKML, kernel-team
References: <20210623192822.3072029-1-surenb@google.com>
    <87sg0qa22l.fsf@oldenburg.str.redhat.com>
    <87wnq1z7kl.fsf@oldenburg.str.redhat.com>
    <87zguxxrfl.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87zguxxrfl.fsf@oldenburg.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 7, 2021 at 11:15 PM Florian Weimer wrote:
>
> * Suren Baghdasaryan:
>
> > On Wed, Jul 7, 2021 at 10:41 PM Florian Weimer wrote:
> >>
> >> * Suren Baghdasaryan:
> >>
> >> > On Wed, Jul 7, 2021 at 2:47 AM Florian Weimer wrote:
> >> >>
> >> >> * Suren Baghdasaryan:
> >> >>
> >> >> > The API is as follows,
> >> >> >
> >> >> >           int process_reap(int pidfd, unsigned int flags);
> >> >> >
> >> >> >         DESCRIPTION
> >> >> >           The process_reap() system call is used to free the memory of a
> >> >> >           dying process.
> >> >> >
> >> >> >           The pidfd selects the process referred to by the PID file
> >> >> >           descriptor.
> >> >> >           (See pidfd_open(2) for further information)
> >> >> >
> >> >> >           The flags argument is reserved for future use; currently, this
> >> >> >           argument must be specified as 0.
> >> >> >
> >> >> >         RETURN VALUE
> >> >> >           On success, process_reap() returns 0. On error, -1 is returned
> >> >> >           and errno is set to indicate the error.
> >> >>
> >> >> I think the manual page should mention what it means for a process to be
> >> >> “dying”, and how to move a process to this state.
> >> >
> >> > Thanks for the suggestion, Florian! Would replacing "dying process"
> >> > with "process which was sent a SIGKILL signal" be sufficient?
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> > I think you are suggesting to use sigqueue() to deliver the signal and
> > perform the reaping when a special value accompanies it. This would be
> > somewhat similar to my early suggestion to use a flag in
> > pidfd_send_signal() (see:
> > https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> > reaping which has another advantage of operation on PIDFDs instead of
> > PIDs which can be recycled.
> > kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> > signal and return without blocking. Changing that behavior was
> > considered unacceptable in these discussions.
>
> Does this mean that you need two threads, one that sends SIGKILL, and
> one that calls process_reap? Given that sending SIGKILL is blocking
> with the existing interfaces?

Sending SIGKILL blocks only until the signal has been delivered. It
does not wait for the recipient to process the SIGKILL or for its
memory to be released. When I said "blocking", I meant that the
current kill() and friends do not wait for SIGKILL to be processed.
process_reap() will block until the memory is released. Whether the
userspace caller invokes it right after sending SIGKILL, to reclaim
the memory synchronously, or spawns a separate thread to reclaim the
memory asynchronously is up to the user. Both patterns are supported.

> Please also note that asynchronous deallocation of resources leads to
> bugs and can cause unrelated workloads to fail. For example, in some
> configurations, clone can fail with EAGAIN even in cases where the total
> number of tasks is clearly bounded because the kernel signals task exit
> to applications before all resources are deallocated. I'm worried that
> the new interface makes things quite a bit worse in this regard.

process_reap() releases memory synchronously; no kthreads are used.
If asynchronous release is required, userspace would need to spawn a
userspace thread and issue this syscall from it.
I hope this addresses your concerns, which I think are about
asynchronous deallocation within the kernel.
Thanks!

>
> Thanks,
> Florian
>
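
To make the synchronous pattern concrete, here is a rough sketch of
"send SIGKILL, then reap" from a single thread. It is only an
illustration under stated assumptions: process_reap() is merely proposed
in this thread, so no syscall number is assumed, and the reap step is
passed in as a callback. The names reap_fn, process_reap_stub() and
kill_and_reap() are hypothetical; only pidfd_open(2) and
pidfd_send_signal(2) are existing system calls, and the sketch assumes
kernel headers that define their SYS_* numbers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shape of the proposed call, copied from the API description above. */
typedef int (*reap_fn)(int pidfd, unsigned int flags);

/*
 * HYPOTHETICAL stand-in for process_reap(). It only mirrors the
 * documented contract (flags must be 0) and otherwise reports ENOSYS,
 * because the real syscall is not assumed to exist.
 */
static int process_reap_stub(int pidfd, unsigned int flags)
{
	(void)pidfd;
	if (flags != 0) {
		errno = EINVAL;
		return -1;
	}
	errno = ENOSYS;
	return -1;
}

/*
 * Synchronous pattern: deliver SIGKILL through a pidfd, then call the
 * reap step from the same thread. pidfd_send_signal() returns once the
 * signal is delivered; the reap step is what would block until the
 * memory is released.
 */
static int kill_and_reap(pid_t pid, reap_fn reap)
{
	int pidfd, ret, saved_errno;

	assert(reap != NULL);

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	ret = (int)syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0);
	if (ret == 0)
		ret = reap(pidfd, 0);

	/* Preserve errno from the failing step across close(). */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;
	return ret;
}
```

For the asynchronous pattern, the caller would run kill_and_reap() (or
just the reap step) from a separate userspace thread; the kernel side
stays synchronous in both cases.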