Date: Thu, 28 Oct 2021 14:04:29 +0200
From: Sebastian Andrzej Siewior
To: Mel Gorman
Cc: linux-mm@kvack.org, Andrew Morton, Vlastimil Babka, Peter Zijlstra, Thomas Gleixner
Subject: Re: [RFC] mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT
Message-ID: <20211028120429.eqgmqmva7276jd5n@linutronix.de>
References: <20211026165100.ahz5bkx44lrrw5pt@linutronix.de> <20211027091212.GP3959@techsingularity.net>
In-Reply-To: <20211027091212.GP3959@techsingularity.net>

On 2021-10-27 10:12:12 [+0100], Mel Gorman wrote:
> On Tue, Oct 26, 2021 at 06:51:00PM +0200, Sebastian Andrzej Siewior wrote:
> > In https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
> > Mel wrote:
> >
> > | While I ack'd this, an RT application using THP is playing with fire,
> > | I know the RT extension for SLE explicitly disables it from being enabled
> > | at kernel config time. At minimum the critical regions should be mlocked
> > | followed by prctl to disable future THP faults that are non-deterministic,
> > | both from an allocation point of view, and a TLB access point of view. It's
> > | still reasonable to expect a smaller TLB reach for huge pages than
> > | base pages.
> >
> > With TRANSPARENT_HUGEPAGE enabled I haven't seen spikes > 100us in
> > cyclictest. I did have mlockall() enabled but nothing else.
> > PR_SET_THP_DISABLE remained unchanged (i.e. THP stayed enabled). Is
> > there anything I should run to stress this to be sure, or is
> > mlockall() enough to let THP do its thing while leaving the mlock()ed
> > application alone?
> >
> > Then Mel continued with:
> >
> > | It's a similar hazard with NUMA balancing, an RT application should either
> > | disable balancing globally or set a memory policy that forces it to be
> > | ignored. They should be doing this anyway to avoid non-deterministic
> > | memory access costs due to NUMA artifacts but it wouldn't surprise me
> > | if some applications got it wrong.
> >
> > Usually (often) RT applications are pinned. I would assume that on a
> > bigger box the RT tasks are at least pinned to a node. How bad can
> > this get in the worst case? cyclictest pins every thread to a CPU. I
> > could remove this for testing. What would be a good test to push this
> > to its limit?
> >
> > Cc: Mel Gorman
> > Signed-off-by: Sebastian Andrzej Siewior
>
> Somewhat tentative but
>
> Acked-by: Mel Gorman
>
> It's tentative because NUMA Balancing gets default-disabled on PREEMPT_RT
> but it's still possible to enable it, whereas THP is disabled entirely
> and can never be enabled. This is a little inconsistent and it would be
> preferable that they match, either by disabling NUMA_BALANCING entirely
> or by forbidding TRANSPARENT_HUGEPAGE_ALWAYS && PREEMPT_RT. I'm ok with
> either.

Oh. I can go either way depending on the input ;)

> There is the possibility that an RT application could use THP safely by
> using madvise() and mlock(). That way, THP is available but only if an
> application has explicit knowledge of THP and is smart enough to do it
> only during the initialisation phase with

Yes, that was my question. So if you have "always" and do mlockall() in
the application, and then other threads of that same application do
malloc()/free() of memory that the RT thread is not touching, bad
things can still happen, right? My understanding is that all threads
can be blocked in a page fault if there is some THP operation going on.

You suggest that the application uses THP by setting madvise() on the
relevant area and mlock()ing it afterwards, and then nothing bad can
happen: no defrag or other optimisation happens later, and the memory
area either uses hugepages after the madvise() or it doesn't. If so,
then this sounds good.
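
Just so we are talking about the same thing, this is roughly the
initialisation-phase pattern I read into that (a minimal sketch only;
the mapping size and the rt_buf/rt_init_memory() names are made up,
not from your mail):

#include <stddef.h>
#include <sys/mman.h>

#define RT_BUF_SIZE	(16UL << 20)	/* 16 MiB, hypothetical working set */

static void *rt_buf;

/* Run once, before the RT threads are started. */
static int rt_init_memory(void)
{
	/* Anonymous mapping for the RT working set. */
	rt_buf = mmap(NULL, RT_BUF_SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (rt_buf == MAP_FAILED)
		return -1;

	/* Opt only this region into THP ... */
	if (madvise(rt_buf, RT_BUF_SIZE, MADV_HUGEPAGE))
		return -1;

	/* ... then lock it, which also faults it in up front. */
	if (mlock(rt_buf, RT_BUF_SIZE))
		return -1;

	return 0;
}

int main(void)
{
	return rt_init_memory() ? 1 : 0;
}

(In a real application the buffer would of course be whatever memory
the RT thread actually works on, not a fixed 16 MiB mapping.)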

> diff --git a/mm/Kconfig b/mm/Kconfig
> index d16ba9249bc5..d6ccca216028 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -393,6 +393,7 @@ choice
> 
>  config TRANSPARENT_HUGEPAGE_ALWAYS
>  	bool "always"
> +	depends on !PREEMPT_RT
>  	help
>  	  Enabling Transparent Hugepage always, can increase the
>  	  memory footprint of applications without a guaranteed
> 
> There is the slight caveat that even then THP can have inconsistent
> latencies if it has a split THP with separate entries for base and
> huge pages. The responsibility would be on the person deploying the
> application to ensure a platform was suitable for both RT and using
> huge pages.

Split THP? You mean access latencies differ depending on whether the
memory is reached via the THP entry or via one of the many 4KiB
entries?
I'm more worried about mmap_lock being held while a THP operation is in
progress, so that a fault from the RT application has to wait until the
THP operation is done.

Sebastian