In-Reply-To: <794398cc-07a4-d235-a0da-0246f5a09f6e@oracle.com>
From: Mina Almasry
Date: Fri, 27 Sep 2019 15:51:24 -0700
Subject: Re: [PATCH v5 0/7] hugetlb_cgroup: Add hugetlb_cgroup reservation limits
To: Mike Kravetz
Cc: Tejun Heo, David Rientjes, Aneesh Kumar, shuah, Shakeel Butt,
 Greg Thelen, Andrew Morton, khalid.aziz@oracle.com, open list,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
 cgroups@vger.kernel.org, Michal Koutný

On Fri, Sep 27, 2019 at 2:59 PM Mike Kravetz wrote:
>
> On 9/26/19 5:55 PM, Mina Almasry wrote:
> > Provided we keep the existing controller untouched, should the new
> > controller track:
> >
> > 1. only reservations, or
> > 2. both reservations and allocations for which no reservations exist
> > (such as the MAP_NORESERVE case)?
> >
> > I like the 'both' approach. Seems to me a counter like that would work
> > automatically regardless of whether the application is allocating
> > hugetlb memory with NORESERVE or not. NORESERVE allocations cannot cut
> > into reserved hugetlb pages, correct?
>
> Correct. One other easy way to allocate huge pages without reserves
> (that I know is used today) is via the fallocate system call.
>
> > If so, then applications that
> > allocate with NORESERVE will get sigbused when they hit their limit,
> > and applications that allocate without NORESERVE may get an error at
> > mmap time but will always be within their limits while they access the
> > mmap'd memory, correct?
>
> Correct. At page allocation time we can easily check to see if a reservation
> exists and not charge. For any specific page within a hugetlbfs file,
> a charge would happen at mmap time or allocation time.
>
> One exception (that I can think of) to this mmap(RESERVE) will not cause
> a SIGBUS rule is in the case of hole punch. If someone punches a hole in
> a file, not only do they remove pages associated with the file but the
> reservation information as well. Therefore, a subsequent fault will be
> the same as an allocation without reservation.
>

I don't think it causes a sigbus. This is the scenario, right:

1. Make cgroup with limit X bytes.
2. Task in cgroup mmaps a file with X bytes, causing the cgroup to get
   charged X bytes.
3. A hole of size Y is punched in the file, causing the cgroup to get
   uncharged Y bytes.
4. The task faults in memory from the hole, getting charged up to Y
   bytes again. But they will still be within their limits.

IIUC userspace only gets sigbus'd if the limit is lowered between
steps 3 and 4, and it's ok if it gets sigbus'd there in my opinion.

> I 'think' the code to remove/truncate a file will work correctly as it
> is today, but I need to think about this some more.
>
> > mmap'd memory, correct? So the 'both' counter seems like a one size
> > fits all.
> >
> > I think the only sticking point left is whether an added controller
> > can support both cgroup-v2 and cgroup-v1. If I could get confirmation
> > on that I'll provide a patchset.
>
> Sorry, but I can not provide cgroup expertise.
> --
> Mike Kravetz
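
To make the hole-punch scenario (steps 1 to 4) concrete, here is a toy
accounting model of the proposed 'both' counter. It is only a sketch of the
arithmetic discussed in this thread, not kernel code: the class
ToyHugetlbCgroup, the function run_scenario, and the "ok"/"SIGBUS" return
strings are all hypothetical names made up for illustration.

```python
class ToyHugetlbCgroup:
    """Hypothetical model of one counter covering reservations and
    unreserved allocations, with a byte limit."""

    def __init__(self, limit):
        self.limit = limit  # limit in bytes
        self.usage = 0      # bytes currently charged

    def charge(self, nbytes):
        """Charge nbytes if that stays within the limit; return success."""
        if self.usage + nbytes > self.limit:
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        self.usage -= nbytes


def run_scenario(x, y, lowered_limit=None):
    """Replay steps 1 to 4; optionally lower the limit between 3 and 4."""
    cg = ToyHugetlbCgroup(limit=x)   # step 1: cgroup with limit X
    if not cg.charge(x):             # step 2: mmap charges X bytes
        return "mmap error"
    cg.uncharge(y)                   # step 3: hole punch uncharges Y bytes
    if lowered_limit is not None:
        cg.limit = lowered_limit     # limit lowered between steps 3 and 4
    # step 4: fault in the hole; the reservation is gone, so the charge
    # happens at fault time and a failure is modelled as SIGBUS.
    return "ok" if cg.charge(y) else "SIGBUS"
```

With the limit untouched, run_scenario(8, 2) ends with usage back at X and
no fault failure. Lowering the limit below X between steps 3 and 4, as in
run_scenario(8, 2, lowered_limit=7), is the only path in this model that
produces the SIGBUS outcome.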