From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05E6FC27C42 for ; Wed, 16 Aug 2023 22:36:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346866AbjHPWgO (ORCPT ); Wed, 16 Aug 2023 18:36:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346886AbjHPWgG (ORCPT ); Wed, 16 Aug 2023 18:36:06 -0400 Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66D6F1980 for ; Wed, 16 Aug 2023 15:36:04 -0700 (PDT) Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-99de884ad25so184100366b.3 for ; Wed, 16 Aug 2023 15:36:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1692225363; x=1692830163; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=HQmT7Uh6sWQfVX+jY524yxLYNHKGsl2fBek6097smRE=; b=HE9ACyuPPztqNyPxMJQxaWvmISJrT2O9wxlj6I3goqtgzDkkUS1kgXt+rhwzwpNVau vgAcjEFeg8oZ2Y7Qn54zx6x10dTczteNXnJCK8dJiC8IdA1AnK98T4udBeNNZDHmCxT8 TesHS7tlAyWIfqrRXQ5Bq0wKxcgC92lmxlUCOADGVHL1AYWe9PWf+1vxj7oSPUonsg4N 8bexPQjNfdrbICA/7tpvLEd7RzKCdazkK1XZJ5tw9Iull4sR7NqpzIELjVlqslZvIoz+ FNMUm9u69u6T8dHHa6ghSukbVowSf6hc6Z+mTkLJdHWWEHWESEj5KRZkM2SrzOr7eUp8 wLnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692225363; x=1692830163; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HQmT7Uh6sWQfVX+jY524yxLYNHKGsl2fBek6097smRE=; b=D9AyTC4OxvpZcTNNtM8CxI6Z2jYiBn3iu7JVSKnpIGgJODaMpFyI2ThtKabzZWChUf 9tut1LEsP0Scq9Z0/CxNI7YBIhkdcXiCqaTa/od0cqYv/zvJ9bHSB2Wl1BqiwHsFvu74 +R/7xEfGAQGhTDFF+DZpQFZYEtW2vjBjn8p2N5GG0Lb9SfexvWfp9fmJBoD6XiwJ0GWE 4pn4XJhDNfcHmYJcW+nMl6JmtFcd+uK8T028sePLNm103D9llKVl9pCNgoqugvOteavb YpxFUVWCHEGpgU79raRRKGt0BNEMxHdItEkJiN3Ac+QcCBsg73Jxi6d814lvhJod4gbE 0lQA== X-Gm-Message-State: AOJu0Yzbfq/rwvdjrAwejTY2Gf/bgNIZyIbGOnNIXTBV1mXQlRh5TqZ5 QRMhg064mpYiPgMBmZaGfzdZEPaVe5hpl0Sc48H0wNssmH+KSSEhwvezbQ== X-Google-Smtp-Source: AGHT+IEMFyFOdyFVoQT+XlqGRpULwvmyipNra1bP9KYmfqYJt1udtszGG/fZE+m1+7FS8iRCYv31nvmhnFivxhOxM1w= X-Received: by 2002:a17:907:78c2:b0:99a:e756:57bf with SMTP id kv2-20020a17090778c200b0099ae75657bfmr2349406ejc.7.1692225362689; Wed, 16 Aug 2023 15:36:02 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Yosry Ahmed Date: Wed, 16 Aug 2023 15:35:26 -0700 Message-ID: Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads To: Tejun Heo Cc: Shakeel Butt , Michal Hocko , Johannes Weiner , Roman Gushchin , Andrew Morton , Muchun Song , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ivan Babrou Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 16, 2023 at 12:08=E2=80=AFPM Tejun Heo wrote: > > Hello, > > On Wed, Aug 16, 2023 at 10:11:20AM -0700, Shakeel Butt wrote: > > These options are not white and black and there can be something in > > between but let me be very clear on what I don't want and would NACK. > > I'm not a big fan of interfaces with hidden states. What you're proposing > isn't strictly that but it's still a bit nasty. So, if we can get by with= out > doing that, that'd be great. Agreed. I will try to send patches soon implementing option (2) above, basically removing unified flushing. I will try to document any potential regressions that may happen and how we may fix them. Ideally we see no regressions. > > > I don't want a global sleepable lock which can be taken by potentially > > any application running on the system. We have seen similar global > > locks causing isolation and priority inversion issues in production. > > So, not another lock which needs to be taken under extreme condition > > (reading stats under OOM) by a high priority task (node controller) > > and might be held by a low priority task. > > Yeah, this is a real concern. Those priority inversions do occur and can = be > serious but causing serious problems under memory pressure usually requir= es > involving memory allocations and IOs. Here, it's just all CPU. So, at lea= st > in OOM conditions, this shouldn't be in the way (the system wouldn't have > anything else to do anyway). > > It is true that this still can lead to priority through CPU competition t= ho. > However, that problem isn't necessarily solved by what you're suggesting > either unless you want to restrict explicit flushing based on permissions > which is another can of worms. Right. Also in the case of a mutex, if we disable preemption while holding the mutex, this makes sure that whoever holding the mutex does not starve waiters. Essentially the difference would be that waiters will sleep with the mutex instead of spinning, but the mutex holder itself wouldn't sleep. I will make this a separate patch, just in case it's too controversial. Switching the spinlock to a mutex should not block removing unified flushing. > > My preference is not exposing this in user interface. This is mostly aris= ing > from internal implementation details and isn't what users necessarily car= e > about. There are many things we can do on the kernel side to make trade-o= ffs > among overhead, staleness and priority inversions. If we make this an > explicit userland interface behavior, we get locked into that semantics > which we'll likely regret in some future. > Yeah that's what I am trying to do here as well. I will try to follow up on this discussion with patches soon. Thanks everyone!