Date: Tue, 12 May 2020 18:56:45 -0400
From: Johannes Weiner
To: Roman Gushchin
Cc: Andrew Morton, Michal Hocko, linux-mm@kvack.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 06/19] mm: memcg/slab: obj_cgroup API
Message-ID: <20200512225645.GA488426@cmpxchg.org>
References: <20200422204708.2176080-1-guro@fb.com>
 <20200422204708.2176080-7-guro@fb.com>
 <20200507210314.GD161043@cmpxchg.org>
 <20200507222631.GA81857@carbon.dhcp.thefacebook.com>
In-Reply-To: <20200507222631.GA81857@carbon.dhcp.thefacebook.com>

On Thu, May 07, 2020 at 03:26:31PM -0700, Roman Gushchin wrote:
> On Thu, May 07, 2020 at 05:03:14PM -0400, Johannes Weiner wrote:
> > On Wed, Apr 22, 2020 at 01:46:55PM -0700, Roman Gushchin wrote:
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -257,6 +257,78 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
> > >  }
> > >
> > >  #ifdef CONFIG_MEMCG_KMEM
> > > +extern spinlock_t css_set_lock;
> > > +
> > > +static void obj_cgroup_release(struct percpu_ref *ref)
> > > +{
> > > +	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
> > > +	struct mem_cgroup *memcg;
> > > +	unsigned int nr_bytes;
> > > +	unsigned int nr_pages;
> > > +	unsigned long flags;
> > > +
> > > +	nr_bytes = atomic_read(&objcg->nr_charged_bytes);
> > > +	WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
> > > +	nr_pages = nr_bytes >> PAGE_SHIFT;
> >
> > What guarantees that we don't have a partial page in there at this
> > point? I guess any outstanding allocations would pin the objcg, so
> > when it's released all objects have been freed.
>
> Right, this is exactly the reason why there can't be a partial page
> at this point.
>
> > But if that's true, how can we have full pages remaining in there now?
>
> Imagine the following sequence:
> 1) CPU0: objcg == stock->cached_objcg
> 2) CPU1: we do a small allocation (e.g. 92 bytes), the page is charged
> 3) CPU1: a process from another memcg is allocating something, the stock
>    is flushed, objcg->nr_charged_bytes = PAGE_SIZE - 92
> 4) CPU0: we release this object, 92 bytes are added to stock->nr_bytes
> 5) CPU0: the stock is flushed, 92 bytes are added to objcg->nr_charged_bytes
>
> As a result, nr_charged_bytes == PAGE_SIZE. This page will be uncharged
> in obj_cgroup_release().
>
> I've double-checked this; it's actually pretty easy to trigger in real life.

Ah, so no outstanding allocations, but a full page split between the
percpu cache and objcg->nr_charged_bytes.

Would it simplify things if refill_obj_stock() drained on >= PAGE_SIZE
stock instead of > PAGE_SIZE?

Otherwise, the scenario above would be good to have as a comment, as
the drain on release is not self-explanatory.
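To make that concrete, the change would be confined to the flush check
at the tail of refill_obj_stock(). A minimal sketch, assuming the
function accumulates into stock->nr_bytes and flushes via
drain_obj_stock() as elsewhere in the series (those parts aren't quoted
above, so I'm going from memory of the patch):

	stock->nr_bytes += nr_bytes;

	/*
	 * Drain on a full page rather than only above one: a fully
	 * accumulated page goes back to the page counter right away
	 * and can never end up split between the percpu stock and
	 * objcg->nr_charged_bytes, as in the scenario above.
	 */
	if (stock->nr_bytes >= PAGE_SIZE)
		drain_obj_stock(stock);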
> > > +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
> > > +{
> > > +	struct mem_cgroup *memcg;
> > > +	unsigned int nr_pages, nr_bytes;
> > > +	int ret;
> > > +
> > > +	if (consume_obj_stock(objcg, size))
> > > +		return 0;
> > > +
> > > +	rcu_read_lock();
> > > +	memcg = obj_cgroup_memcg(objcg);
> > > +	css_get(&memcg->css);
> > > +	rcu_read_unlock();
> > > +
> > > +	nr_pages = size >> PAGE_SHIFT;
> > > +	nr_bytes = size & (PAGE_SIZE - 1);
> > > +
> > > +	if (nr_bytes)
> > > +		nr_pages += 1;
> > > +
> > > +	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> >
> > If consume_obj_stock() fails because some other memcg is cached,
> > should this try to consume the partial page in objcg->nr_charged_bytes
> > before getting more pages?
>
> We can definitely do it, but I'm not sure if it's good for performance.
>
> Dealing with nr_charged_bytes will require up to two atomic writes,
> so calling __memcg_kmem_charge() can be faster if the memcg is cached
> on the percpu stock.

Hm, but it's the slowpath. And sooner or later somebody has to deal
with the remaining memory in there.

> > > +	if (!ret && nr_bytes)
> > > +		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
> >
> > This will put the cgroup into the cache if the allocation resulted in
> > a partially free page.
> >
> > But if this was a page allocation, we may have objcg->nr_cache_bytes
> > from a previous subpage allocation that we should probably put back
> > into the stock.
>
> Yeah, we can do this, but I don't know if there will be any benefits.

It's mostly about understanding the code.

> Actually, we don't want to touch objcg->nr_cache_bytes too often, as
> it can become a contention point if there are many threads allocating
> in the memory cgroup.
>
> So maybe we want to do the opposite: relax it a bit, stop flushing
> it on every stock refill, and flush only if it exceeds a certain value.

That could be useful, yes.

> > It's not a common case; I'm just trying to understand what
> > objcg->nr_cache_bytes holds and when it does so.
>
> So it's actually a centralized leftover from the rounding of the actual
> charge to the page size.

It would be good to add code comments explaining this.

Thanks!
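P.S. For that last point, a comment along these lines next to the field
declaration would already help. A sketch only; the surrounding struct
layout is assumed from my reading of the patch:

struct obj_cgroup {
	...
	/*
	 * Centralized leftover from rounding object charges up to
	 * page size: bytes charged to the memcg at the page level but
	 * not yet handed out to objects.  Receives the remainder when
	 * a percpu stock caching this objcg is drained, and is
	 * uncharged as full pages in obj_cgroup_release().
	 */
	atomic_t nr_charged_bytes;
	...
};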