From: yulei zhang
Date: Thu, 3 Jun 2021 18:19:18 +0800
Subject: Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg
To: Shakeel Butt
Cc: Chris Down, Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner,
    Cgroups, benbjiang@tencent.com, Wanpeng Li, Yulei Zhang, Linux MM,
    Michal Hocko, Roman Gushchin

On Wed, Jun 2, 2021 at 11:39 PM Shakeel Butt wrote:
>
> On Wed, Jun 2, 2021 at 2:11 AM yulei zhang wrote:
> >
> > On Tue, Jun 1, 2021 at 10:45 PM Chris Down wrote:
> > >
> > > yulei zhang writes:
> > > >Yep, dynamically adjust the memory.high limits can ease the memory pressure
> > > >and postpone the global reclaim, but it can easily trigger the oom in
> > > >the cgroups,
> > >
> > > To go further on Shakeel's point, which I agree with, memory.high should
> > > _never_ result in memcg OOM. Even if the limit is breached dramatically, we
> > > don't OOM the cgroup. If you have a demonstration of memory.high resulting in
> > > cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)
> >
> > You are right, I mistook it for max. Shakeel means the throttling
> > during context switch which uses memory.high as threshold to calculate
> > the sleep time. Currently it only applies to cgroupv2. In this patchset
> > we explore another idea to throttle the memory usage, which rely on
> > setting an average allocation speed in memcg. We hope to suppress the
> > memory usage in low priority cgroups when it reaches the system
> > watermark and still keep the activities alive.
>
> I think you need to make the case: why should we add one more form of
> throttling? Basically why memory.high is not good for your use-case
> and the proposed solution works better. Though IMO it would be a hard
> sell.

Thanks. IMHO, there are differences between these two throttles.
memory.high is a per-memcg throttle that aims to limit the memory usage
of the tasks in the cgroup. The memory allocation speed throttle (MST),
by contrast, aims to avoid memory bursts in a cgroup that would trigger
global reclaim and affect the timing-sensitive workloads in other
cgroups.

For example, say we have two pods with memory overcommit enabled: one
runs online tasks and the other runs offline tasks. If we restrict the
memory usage of the offline pod with memory.high, it loses the benefit
of memory overcommit when the other workloads are idle. On the other
hand, if we don't limit its memory usage, a sudden burst of memory
operations can easily break the system watermark. With MST enabled in
this case, we can avoid direct reclaim and still leverage the
overcommit.
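
As a rough illustration of the idea described above (not the code in this patchset), the sketch below models an allocation speed throttle as a token bucket that only engages once free memory has dropped to a watermark. The class name `SpeedThrottle`, its parameters, and all page counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeedThrottle:
    """Toy model of an allocation speed throttle (hypothetical, not kernel code)."""

    rate_pages_per_sec: float  # assumed average allocation speed allowed under pressure
    burst_pages: float         # assumed largest burst the bucket absorbs without delay
    tokens: float = 0.0
    last_time: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so a small initial burst is not delayed.
        self.tokens = self.burst_pages

    def delay_for(self, now: float, pages: int,
                  free_pages: int, watermark_pages: int) -> float:
        """Return how many seconds the allocating task should sleep."""
        # Refill the bucket for the time that has passed since the last call.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst_pages,
                          self.tokens + elapsed * self.rate_pages_per_sec)
        self.last_time = now

        # Plenty of free memory: do not throttle, so overcommit stays usable.
        if free_pages > watermark_pages:
            return 0.0

        # Under pressure: charge the request and sleep off any deficit.
        self.tokens -= pages
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate_pages_per_sec


throttle = SpeedThrottle(rate_pages_per_sec=100.0, burst_pages=50.0)

# Free memory is above the watermark: a large burst goes through undelayed.
idle_delay = throttle.delay_for(0.0, 1000, free_pages=10000, watermark_pages=1000)

# Free memory is below the watermark: the burst allowance is spent first,
# and any allocation beyond it is delayed in proportion to the deficit.
first_delay = throttle.delay_for(0.0, 50, free_pages=500, watermark_pages=1000)
burst_delay = throttle.delay_for(0.0, 100, free_pages=500, watermark_pages=1000)
```

In this model the low-priority cgroup is never capped at a fixed usage, which is the contrast with memory.high drawn above: it only slows down while the system is close to the watermark, and runs unthrottled whenever other workloads leave memory free.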