To: Milan Broz, Michal Hocko
Cc: linux-mm@kvack.org, Linux Kernel Mailing List, Mikulas Patocka
References: <70885d37-62b7-748b-29df-9e94f3291736@gmail.com> <20210108134140.GA9883@dhcp22.suse.cz> <9474cd07-676a-56ed-1942-5090e0b9a82f@suse.cz>
From: Vlastimil Babka
Subject: Re: Very slow unlockall()
Message-ID: <6eebb858-d517-b70d-9202-f4e84221ed89@suse.cz>
Date: Mon, 1 Feb 2021 19:55:23 +0100

On 2/1/21 7:00 PM, Milan Broz wrote:
> On 01/02/2021 14:08, Vlastimil Babka wrote:
>> On 1/8/21 3:39 PM, Milan Broz wrote:
>>> On 08/01/2021 14:41, Michal Hocko wrote:
>>>> On Wed 06-01-21 16:20:15, Milan Broz wrote:
>>>>> Hi,
>>>>>
>>>>> we use mlockall(MCL_CURRENT | MCL_FUTURE) / munlockall() in cryptsetup code
>>>>> and someone tried to use it with hardened memory allocator library.
>>>>>
>>>>> Execution time was increased to extreme (minutes) and as we found, the problem
>>>>> is in munlockall().
>>>>>
>>>>> Here is a plain reproducer for the core without any external code - it takes
>>>>> unlocking on Fedora rawhide kernel more than 30 seconds!
>>>>> I can reproduce it on 5.10 kernels and Linus' git.
>>>>>
>>>>> The reproducer below tries to mmap large amount memory with PROT_NONE (later never used).
>>>>> The real code of course does something more useful but the problem is the same.
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <fcntl.h>
>>>>> #include <sys/mman.h>
>>>>>
>>>>> int main (int argc, char *argv[])
>>>>> {
>>>>>         void *p = mmap(NULL, 1UL << 41, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

So, this is a 2TB memory area, but PROT_NONE means it's never actually populated,
although mlockall(MCL_CURRENT) should do that. Once you put PROT_READ |
PROT_WRITE there, the mlockall() starts taking ages.

So does that reflect your use case? munlockall() with large PROT_NONE areas? If
so, munlock_vma_pages_range() is indeed not optimized for that, but I would
expect such a scenario to be uncommon, so it's better to clarify first.

>>>>>
>>>>>         if (p == MAP_FAILED) return 1;
>>>>>
>>>>>         if (mlockall(MCL_CURRENT | MCL_FUTURE)) return 1;
>>>>>         printf("locked\n");
>>>>>
>>>>>         if (munlockall()) return 1;
>>>>>         printf("unlocked\n");
>>>>>
>>>>>         return 0;
>>>>> }
>
> ...
>
>>> Today's Linus git - 5.11.0-rc2+ in my testing x86_64 VM (no extensive kernel debug options):
>>>
>>> # time ./lock
>>> locked
>>> unlocked
>>>
>>> real    0m4.172s
>>> user    0m0.000s
>>> sys     0m4.172s
>>
>> The perf report would be more interesting from this configuration.
>
> ok, I cannot run perf on that particular VM but tried the latest Fedora stable
> kernel without debug options - 5.10.12-200.fc33.x86_64
>
> This is the report running reproducer above:
>
> time:
> real    0m6.123s
> user    0m0.099s
> sys     0m5.310s
>
> perf:
>
> # Total Lost Samples: 0
> #
> # Samples: 20K of event 'cycles'
> # Event count (approx.): 20397603279
> #
> # Overhead  Command  Shared Object      Symbol
> # ........  .......  .................  ............................
> #
>     47.26%  lock     [kernel.kallsyms]  [k] follow_page_mask
>     20.43%  lock     [kernel.kallsyms]  [k] munlock_vma_pages_range
>     15.92%  lock     [kernel.kallsyms]  [k] follow_page
>      7.40%  lock     [kernel.kallsyms]  [k] rcu_all_qs
>      5.87%  lock     [kernel.kallsyms]  [k] _cond_resched
>      3.08%  lock     [kernel.kallsyms]  [k] follow_huge_addr
>      0.01%  lock     [kernel.kallsyms]  [k] __update_load_avg_cfs_rq
>      0.01%  lock     [kernel.kallsyms]  [k] ____fput
>      0.01%  lock     [kernel.kallsyms]  [k] rmap_walk_file
>      0.00%  lock     [kernel.kallsyms]  [k] page_mapped
>      0.00%  lock     [kernel.kallsyms]  [k] native_irq_return_iret
>      0.00%  lock     [kernel.kallsyms]  [k] _raw_spin_lock_irq
>      0.00%  lock     [kernel.kallsyms]  [k] perf_iterate_ctx
>      0.00%  lock     [kernel.kallsyms]  [k] finish_task_switch
>      0.00%  perf     [kernel.kallsyms]  [k] native_sched_clock
>      0.00%  lock     [kernel.kallsyms]  [k] native_write_msr
>      0.00%  perf     [kernel.kallsyms]  [k] native_write_msr
>
>
> m.
>
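
For scale, here is a rough back-of-the-envelope sketch of why an unpopulated
2TB range can still cost seconds of system time. It is not from the thread and
rests on two assumptions: a 4096-byte base page, and an unlock loop that
advances one base page per iteration (which the follow_page/_cond_resched
entries in the profile suggest but do not prove). The helper names
pages_in_range() and ns_per_page() are hypothetical, made up for this
illustration.

```c
#include <stdint.h>

/*
 * Number of base pages covered by a mapping of `len` bytes, rounding up.
 * `page_size` is an assumption supplied by the caller (4096 on a typical
 * x86_64 configuration).
 */
uint64_t pages_in_range(uint64_t len, uint64_t page_size)
{
    return (len + page_size - 1) / page_size;
}

/*
 * Average cost per visited page in nanoseconds, given a total time in
 * seconds. When fed the whole-run "sys" time this is only an upper bound
 * for the unlock path, since that figure also covers mmap() and mlockall().
 */
double ns_per_page(double seconds, uint64_t pages)
{
    return seconds * 1e9 / (double)pages;
}
```

With the reproducer's length, pages_in_range(UINT64_C(1) << 41, 4096) gives
536870912 (2^29) page-sized steps. Dividing the reported 5.310s of sys time by
that count with ns_per_page() comes out at roughly 10ns per step, so under
these assumptions the total is plausibly dominated by the sheer number of
iterations rather than by any single expensive operation.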