From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D91FC17441 for ; Sat, 9 Nov 2019 00:40:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F32592084D for ; Sat, 9 Nov 2019 00:40:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KpNhnpOO" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F32592084D Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4340A6B0003; Fri, 8 Nov 2019 19:40:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3BC306B0006; Fri, 8 Nov 2019 19:40:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 25E6E6B0007; Fri, 8 Nov 2019 19:40:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0096.hostedemail.com [216.40.44.96]) by kanga.kvack.org (Postfix) with ESMTP id 0C4DD6B0003 for ; Fri, 8 Nov 2019 19:40:55 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id B38AD121A for ; Sat, 9 Nov 2019 00:40:54 +0000 (UTC) X-FDA: 76134884028.05.cub47_7e96a405c60a X-HE-Tag: cub47_7e96a405c60a X-Filterd-Recvd-Size: 8259 Received: from mail-oi1-f194.google.com (mail-oi1-f194.google.com [209.85.167.194]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Sat, 9 Nov 2019 00:40:54 +0000 (UTC) Received: by mail-oi1-f194.google.com with SMTP id l20so6863511oie.10 for ; Fri, 08 Nov 2019 16:40:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=KTO0G4yUHoQ2566+748dLq9N6XJTos6t1RKXW6iu8LY=; b=KpNhnpOOa3qcfcHYXInuUVl0BAw8G0yyb4KfBM4SGSMaep5qbx/BvvcVNeLHgEyBuC cgq33G3YWscje+Ru56drwaaKZWAW1jJUAjWmEKUtdcK57zh5NbvjHm+3gMGtJKviD2ZE eXNb93Lq1q5GlDhkbP+563K4M2Wsnp8rNd8djXnQNH9RDp4pkGqu/3Am8RgK5j1I69yy +Litcz+iepJW5TharNy4btNYoPFeDFV0Cc/xxio5krZ4aZT96fmk5Wby1hZdngw2JOkW o5P41KAZnj7z/2wPIG9444KEbw5XFqdx8u1iF2nN9H4eAqD31DOkMEzE+E0ZkYbZDdjQ 1ldA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=KTO0G4yUHoQ2566+748dLq9N6XJTos6t1RKXW6iu8LY=; b=Qe0Go+A5tDKmcg5VBK2LWw1atN1nSLA6LXJ3FmDzoa92gjJxPXEiT9ZQ9ggwj3juyT FPwEHkt1zLSbv37iWmf5yrIGFdtettXfxPfw7J6lvV3Ot0Wo/BSioGsFhPB3VaX1kB4V R8WqAZ4jz/IqP81QGQ2+Lyk0eN77HzZWJTGfYSWlVSrAOPukJ46Vth7MoG2J37FM6MtW lCGY7uKicLNBoUgTEkjLxEn3Cu3brP3QsTu0wlXnbb9ZOWx2IVkSDm1qVKxPXdmWhS76 wLTWufcInuYG7LLm0OK1u7R/iuOcbhxtEyJdM6RAsltzHDrnHiqx4QJbf34Jj/YQfbms BsAA== X-Gm-Message-State: APjAAAXXhvqo/iA0kScM/fe3UCWoYR36bK1UiZdUh/4hcBVCFY4StSlu /z2QINzyLUvitmppQkt3TY6XIwg1ETrxJme4wgpRJw== X-Google-Smtp-Source: APXvYqzFG4ayJA9m/ZjeTPbIRTox5oAFY8AlkPBKYOaQszMhW9xvB2UIU21h0S1SiA5tbcofbulCmZpIu3gQ/Lkn4fE= X-Received: by 2002:aca:1101:: with SMTP id 1mr12894333oir.103.1573260052966; Fri, 08 Nov 2019 16:40:52 -0800 (PST) MIME-Version: 1.0 References: <20191030013701.39647-1-almasrymina@google.com> <20191030013701.39647-2-almasrymina@google.com> <9e10c273-f0ab-4173-570e-26c314b989fb@oracle.com> <3f30658c-0e3d-7d5c-4de9-1526b9bac3ce@oracle.com> In-Reply-To: <3f30658c-0e3d-7d5c-4de9-1526b9bac3ce@oracle.com> From: Mina Almasry Date: Fri, 8 Nov 2019 16:40:42 -0800 Message-ID: Subject: Re: [PATCH v8 2/9] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations To: Mike Kravetz Cc: shuah , open list , linux-mm@kvack.org, linux-kselftest@vger.kernel.org, cgroups@vger.kernel.org, Aneesh Kumar Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Nov 8, 2019 at 4:01 PM Mike Kravetz wrote: > > On 11/8/19 3:48 PM, Mina Almasry wrote: > > On Thu, Nov 7, 2019 at 4:57 PM Mike Kravetz wrote: > >> > >> On 10/29/19 6:36 PM, Mina Almasry wrote: > >>> @@ -22,27 +22,35 @@ struct hugetlb_cgroup; > >>> * Minimum page order trackable by hugetlb cgroup. > >>> * At least 3 pages are necessary for all the tracking information. > >>> */ > >>> -#define HUGETLB_CGROUP_MIN_ORDER 2 > >>> +#define HUGETLB_CGROUP_MIN_ORDER 3 > >> > >> Correct me if misremembering, but I think the reson you changed this was > >> so that you could use page[3].private. Correct? > >> In that case isn't page[3] the last page of an order 2 allocation? > >> If my understanding is correct, then leave HUGETLB_CGROUP_MIN_ORDER as is > >> and update the preceding comment to say that at least 4 pages are necessary. > >> > > > > Yes, I just misunderstood what MIN_ORDER means. I'll revert the code change. > > But, do update the comment please. > Will do. > > >>> @@ -85,18 +89,32 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup, > >>> int idx; > >>> > >>> for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) { > >>> - struct page_counter *counter = &h_cgroup->hugepage[idx]; > >>> struct page_counter *parent = NULL; > >> > >> Should we perhaps rename 'parent' to 'fault_parent' to be consistent? > > > > Yes that makes sense; will do. > > > >> That makes me think if perhaps the naming in the previous patch should > >> be more explicit. Make the existing names explicitly contin 'fault' as > >> the new names contain 'reservation'. > >> Just a thought. > >> > > > > You mean change the names of the actual user-facing files? I'm all for > > better names but that would break existing users that read/write the > > hugetlb_cgroup.2MB.usage_in_bytes/limit_in_bytes users, and so I would > > assume is a no-go. > > > > I was thinking about internal variables/definitions such as: > > +enum { > + /* Tracks hugetlb memory faulted in. */ > + HUGETLB_RES_USAGE, > + /* Tracks hugetlb memory reserved. */ > + HUGETLB_RES_RESERVATION_USAGE, > + /* Limit for hugetlb memory faulted in. */ > + HUGETLB_RES_LIMIT, > + /* Limit for hugetlb memory reserved. */ > + HUGETLB_RES_RESERVATION_LIMIT, > + /* Max usage for hugetlb memory faulted in. */ > + HUGETLB_RES_MAX_USAGE, > + /* Max usage for hugetlb memory reserved. */ > + HUGETLB_RES_RESERVATION_MAX_USAGE, > + /* Faulted memory accounting fail count. */ > + HUGETLB_RES_FAILCNT, > + /* Reserved memory accounting fail count. */ > + HUGETLB_RES_RESERVATION_FAILCNT, > + HUGETLB_RES_NULL, > + HUGETLB_RES_MAX, > +}; > > But, I guess the existing definitions (such as HUGETLB_RES_LIMIT) correspond > closely to the externally visible name. In that case, you should leave them > as is and ignore my comment. > > > >>> @@ -126,6 +144,26 @@ static void hugetlb_cgroup_css_free(struct cgroup_subsys_state *css) > >>> kfree(h_cgroup); > >>> } > >>> > >>> +static void hugetlb_cgroup_move_parent_reservation(int idx, > >>> + struct hugetlb_cgroup *h_cg) > >>> +{ > >>> + struct hugetlb_cgroup *parent = parent_hugetlb_cgroup(h_cg); > >>> + > >>> + /* Move the reservation counters. */ > >>> + if (!parent_hugetlb_cgroup(h_cg)) { > >>> + parent = root_h_cgroup; > >>> + /* root has no limit */ > >>> + page_counter_charge( > >>> + &root_h_cgroup->reserved_hugepage[idx], > >>> + page_counter_read( > >>> + hugetlb_cgroup_get_counter(h_cg, idx, true))); > >>> + } > >>> + > >>> + /* Take the pages off the local counter */ > >>> + page_counter_cancel( > >>> + hugetlb_cgroup_get_counter(h_cg, idx, true), > >>> + page_counter_read(hugetlb_cgroup_get_counter(h_cg, idx, true))); > >>> +} > >> > >> I know next to nothing about cgroups and am just comparing this to the > >> existing hugetlb_cgroup_move_parent() routine. hugetlb_cgroup_move_parent > >> updates the cgroup pointer in each page being moved. Do we need to do > >> something similar for reservations being moved (move pointer in reservation)? > >> > > > > Oh, good catch. Yes I need to be doing that. I should probably > > consolidate those routines so the code doesn't miss things like this. > > This might get a bit ugly/complicated? Seems like you will need to examine > all hugetlbfs inodes and vma's mapping those inodes. > Hmm yes on closer look it does seem like this is not straightforward. I'll write a test that does this reparenting so I can start running into the issue and poke for solutions. Off the top of my head, I think maybe we can just not reparent the hugetlb reservations - the hugetlb_cgroup stays alive until all its memory is uncharged. That shouldn't be too bad. Today, I think memcg doesn't reparent memory when it gets offlined. I'll poke at this a bit and come back with suggestions, you may want to hold off reviewing the rest of the patches until then. > -- > Mike Kravetz