Date: Fri, 13 Nov 2020 18:16:32 -0800
From: Andrew Morton
To: Suren Baghdasaryan
Cc: Michal Hocko, David Rientjes, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel, Christian Brauner, Oleg Nesterov, Tim Murray, linux-api@vger.kernel.org, linux-mm, LKML, kernel-team, Minchan Kim
Subject: Re: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process
Message-Id: <20201113181632.6d98489465430a987c96568d@linux-foundation.org>
References: <20201113173448.1863419-1-surenb@google.com> <20201113155539.64e0af5b60ad3145b018ab0d@linux-foundation.org> <20201113170032.7aa56ea273c900f97e6ccbdc@linux-foundation.org> <20201113171810.bebf66608b145cced85bf54c@linux-foundation.org>

On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan wrote:

> On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton wrote:
> >
> > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan wrote:
> >
> > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > signal in this fashion.
> > > > > >
> > > > > > And we do have the new process_madvise(MADV_PAGEOUT). It may need a
> > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > >
> > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > one of the options recently discussed in
> > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > > > . The thread describes some of the issues with that approach but if we
> > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > would be doable.
> > > >
> > > > Why would it be necessary to read /proc/pid/maps? I'd have thought
> > > > that a starting effort would be
> > > >
> > > >     madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > >
> > > > (after translation into process_madvise() speak). Which is equivalent
> > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > >
> > > Yep, this is very similar to option #3 in
> > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@mail.gmail.com
> > > and I actually have a tested prototype for that.
> >
> > Why is the `vector=NULL' needed? Can't `vector' point at a single iovec
> > which spans the whole address range?
>
> That would be the option #4 from the same discussion and the issues
> noted there are "process_madvise return value can't handle such a
> large number of bytes and there is MAX_RW_COUNT limit on max number of
> bytes one process_madvise call can handle".
> In my prototype I have a
> special handling for such "bulk operation" to work around the
> MAX_RW_COUNT limitation.

Ah, OK, return value. Maybe process_madvise() shouldn't have done that
and should have simply returned 0 on success, like madvise().

I guess a special "nuke whole address space" command is OK. But, again
in the search for generality, the ability to nuke very large amounts of
address space (but not the entire address space) would be better.

The process_madvise() return value issue could be addressed by adding a
process_madvise() mode which returns 0 on success.

And I guess the MAX_RW_COUNT issue is solvable by adding an
import_iovec() arg to say "don't check that". Along those lines.

It's all sounding a bit painful (but not *too* painful). But to
reiterate, I do think that adding the ability for a process to shoot
down a large amount of another process's memory is a lot more generally
useful than tying it to SIGKILL, agree?

> >
> > > If that's the
> > > preferred method then I can post it quite quickly.
> >
> > I assume you've tested that prototype. How did its usefulness compare
> > with this SIGKILL-based approach?
>
> Just to make sure I understand correctly your question, you are asking
> about performance comparison of:
>
>     // approach in this RFC
>     pidfd_send_signal(SIGKILL, SYNC_REAP_MM)
>
> vs
>
>     // option #4 in the previous RFC
>     kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);
>
> If so, I have results for the current RFC approach but the previous
> approach was testing on an older device, so don't have
> apples-to-apples comparison results at the moment. I can collect the
> data for fair comparison if desired, however I don't expect a
> noticeable performance difference since they both do pretty much the
> same thing (even on different devices my results are quite close). I
> think it's more a question of which API would be more appropriate.

OK.
I wouldn't expect performance to be very different (and things can be sped up if so), but the API usefulness might be an issue. Using process_madvise() (or similar) makes it a two-step operation, whereas tying it to SIGKILL&&TASK_UNINTERRUPTIBLE provides a more precise tool. Any thoughts on this?
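
For illustration only, here is a minimal C sketch of the arithmetic behind the two limits discussed above. It is not taken from either prototype. It assumes 4 KiB pages and uses the kernel's definition of MAX_RW_COUNT as (INT_MAX & PAGE_MASK). The `SKETCH_*` macros and `sketch_*` helpers are hypothetical names invented for this example, not kernel or libc APIs. The sketch shows how a caller might split a large range into chunks that each fit within one call's byte limit. It does not address the return value issue for a single iovec spanning (void *)0 to (void *)-1, which covers more bytes than an ssize_t can report.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumption: 4 KiB pages. The kernel defines MAX_RW_COUNT as
 * (INT_MAX & PAGE_MASK), which is 0x7ffff000 with this page size. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical helper: how many calls are needed to cover `len` bytes
 * if each call may handle at most SKETCH_MAX_RW_COUNT bytes. */
static size_t sketch_calls_needed(size_t len)
{
    return len / SKETCH_MAX_RW_COUNT + (len % SKETCH_MAX_RW_COUNT != 0);
}

/* Hypothetical helper: describe the next chunk of [*cursor, end) in `iov`
 * and advance *cursor past it. Returns 1 if a chunk was produced and
 * 0 once the range is exhausted. */
static int sketch_next_chunk(uintptr_t *cursor, uintptr_t end,
                             struct iovec *iov)
{
    assert(cursor != NULL && iov != NULL);

    if (*cursor >= end)
        return 0;

    size_t remaining = end - *cursor;
    size_t chunk = remaining < SKETCH_MAX_RW_COUNT ? remaining
                                                   : SKETCH_MAX_RW_COUNT;

    iov->iov_base = (void *)*cursor;
    iov->iov_len = chunk;
    *cursor += chunk;
    return 1;
}
```

A caller would loop on sketch_next_chunk() and issue one process_madvise() call per chunk. The sketch deliberately leaves the syscall out, since which advice values are accepted for a remote process depends on the kernel version.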