From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3b9effd9-4aba-e7ca-b3ca-6a474fd6469f@redhat.com>
Date: Wed, 13 Apr 2022 18:30:47 +0200
From: David Hildenbrand
Organization: Red Hat
To: Andy Lutomirski, Jason Gunthorpe
Cc: Sean Christopherson, Chao Peng, kvm list,
 Linux Kernel Mailing List, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
 Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
 Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
 Mike Rapoport, Steven Price, "Maciej S . Szmigiero", Vlastimil Babka,
 Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
 Dave Hansen, Andi Kleen, "Eric W. Biederman"
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
 <20220310140911.50924-5-chao.p.peng@linux.intel.com>
 <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com>
 <7ab689e7-e04d-5693-f899-d2d785b09892@redhat.com>
 <20220412143636.GG64706@ziepe.ca>
 <6f44ddf9-6755-4120-be8b-7a62f0abc0e0@www.fastmail.com>
In-Reply-To: <6f44ddf9-6755-4120-be8b-7a62f0abc0e0@www.fastmail.com>
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
X-Mailing-List: linux-kernel@vger.kernel.org

> > So this is another situation where the actual backend (TDX, SEV,
> > pKVM, pure software) makes a difference -- depending on exactly
> > what backend we're using, the memory may not be unmoveable.
> > It might even be swappable (in the potentially distant future).

Right. And on a system without swap we don't particularly care about
mlock, but we might (in most cases) care about fragmentation with
unmovable memory.

> > Anyway, here's a concrete proposal, with a bit of handwaving:

Thanks for investing some brainpower.

> > We add new cgroup limits:
> >
> >   memory.unmoveable
> >   memory.locked
> >
> > These can be set to an actual number or they can be set to the
> > special value ROOT_CAP. If they're set to ROOT_CAP, then anyone in
> > the cgroup with capable(CAP_SYS_RESOURCE) (i.e. the global
> > capability) can allocate unmovable or locked memory with this (and
> > potentially other) new APIs. If it's 0, then they can't. If it's
> > another value, then the memory can be allocated, charged to the
> > cgroup, up to the limit, with no particular capability needed.
> >
> > The default at boot is ROOT_CAP. Anyone who wants to configure it
> > differently is free to do so. This avoids introducing a DoS, makes
> > it easy to run tests without configuring cgroup, and lets serious
> > users set up their cgroups.

I wonder what the implications are for existing user space.

Assume we want to move page pinning (rdma, vfio, io_uring, ...) to the
new model. How can we be sure

a) We don't break existing user space
b) We don't open the doors unnoticed for the admin to go crazy on
   unmovable memory.

Any ideas?

> > Nothing is charged per mm.
> >
> > To make this fully sensible, we need to know what the backend is
> > for the private memory before allocating any so that we can charge
> > it accordingly.

Right, the support for migration and/or swap defines how to account.
-- 
Thanks,

David / dhildenb