From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75F07C636CD for ; Fri, 16 Jul 2021 18:59:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5935E61002 for ; Fri, 16 Jul 2021 18:59:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232240AbhGPTCp (ORCPT ); Fri, 16 Jul 2021 15:02:45 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:36828 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231335AbhGPTCn (ORCPT ); Fri, 16 Jul 2021 15:02:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626461988; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t3eOYXdWOcxLwyJeyCXZFcz5Qu68cSCzVAnpdB+6mag=; b=f7dhxgU2+3RRUlmbcK8zgD4ZrOTrxj1+IZyHNsVO8I/wlH2LgvfRzc5ms8+V3CvCQSvU/7 Zy+2vpanOs6TMGXCsCWRn6CpkRPjBSJ1Ur/9YIYj57H2ayqUWJFrpg91XcKBMg0AgFMEQi amPejuYRr15nm3kGl7fKyuojAGdAN94= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-578-o9o9uPxuPVmGn9LIWjQ3LQ-1; Fri, 16 Jul 2021 14:59:46 -0400 X-MC-Unique: o9o9uPxuPVmGn9LIWjQ3LQ-1 Received: by mail-qt1-f200.google.com with SMTP id w3-20020ac80ec30000b029024e8c2383c1so6923638qti.5 for ; Fri, 16 Jul 2021 11:59:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=t3eOYXdWOcxLwyJeyCXZFcz5Qu68cSCzVAnpdB+6mag=; b=XeyFTYOVDtB+KuUrt2QOtfQMDtLpC7oMGrGSKzWk7ukRtH8MtUoxnJvQAoUKItzMZ+ PyFHt/ecR3FfqzyvW1WMPGq45CB5tFqkdiwBXUW7EzxZ3ce6GCGlgp/oaXKPlXupFJYZ YtNAcs+fJcmmF3abPZlzMKzM7Ck/MLKYbcp4tRcKwXrvZCHZhImQt/2CJ8hhZZlgtBbI sUSkSDEWE57aiFEZGME4eyqwajvlQ61f3kt7ixOe9sa8T7hONvtth+Pcy9MoEyMkJV+9 M0bU/0jzaKuIw8WOOsmxZEVaUxhnTRIfYFbXQX6oFOJ0lNCz4WtZ5BVEN3y9Vofpw4+8 9txw== X-Gm-Message-State: AOAM531GEJfvWvutsTmA9bHGnL0i3LYQV+u+6YJzqHENYcczbOQhWSmx HNRGdmZOm+HiNTP2lw+kG+a+Kq6+ag8OTqBmfTV+jomS2ruRp2lCeD6dk7GdJaOKvXR7ofsxRhq Mymvc+MCYUXx0Iy/BDjypm+6F X-Received: by 2002:ae9:f106:: with SMTP id k6mr11116139qkg.274.1626461986576; Fri, 16 Jul 2021 11:59:46 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyWrEaDUK2aTHtHoRv6qDWayp0aBiec8jDx0iHrZK+d3fUrWhn8/UAlWoUjPwPpD0PA7q7H3g== X-Received: by 2002:ae9:f106:: with SMTP id k6mr11116115qkg.274.1626461986393; Fri, 16 Jul 2021 11:59:46 -0700 (PDT) Received: from llong.remote.csb ([2601:191:8500:76c0::cdbc]) by smtp.gmail.com with ESMTPSA id j7sm4290785qkd.21.2021.07.16.11.59.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 16 Jul 2021 11:59:45 -0700 (PDT) From: Waiman Long X-Google-Original-From: Waiman Long Subject: Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid partition root To: Waiman Long , Tejun Heo Cc: Zefan Li , Johannes Weiner , Jonathan Corbet , Shuah Khan , cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Andrew Morton , Roman Gushchin , Phil Auld , Peter Zijlstra , Juri Lelli References: <20210621184924.27493-1-longman@redhat.com> <20210621184924.27493-3-longman@redhat.com> <6ea1ac38-73e1-3f78-a5d2-a4c23bcd8dd1@redhat.com> Message-ID: <1bb119a1-d94a-6707-beac-e3ae5c03fae5@redhat.com> Date: Fri, 16 Jul 2021 14:59:44 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/16/21 2:44 PM, Waiman Long wrote: > On 7/5/21 1:51 PM, Tejun Heo wrote: >> Hello, Waiman. >> >> On Mon, Jun 28, 2021 at 09:06:50AM -0400, Waiman Long wrote: >>> The main reason for doing this is because normal cpuset control file >>> actions >>> are under the direct control of the cpuset code. So it is up to us >>> to decide >>> whether to grant it or deny it. Hotplug, on the other hand, is not >>> under the >>> control of cpuset code. It can't deny a hotplug operation. This is >>> the main >>> reason why the partition root error state was added in the first place. >> I have a difficult time convincing myself that this difference >> justifies the >> behavior difference and it keeps bothering me that there is a state >> which >> can be reached through one path but rejected by the other. I'll continue >> below. >> >>> Normally, users can set cpuset.cpus to whatever value they want even >>> though >>> they are not actually granted. However, turning on partition root is >>> under >>> more strict control. You can't turn on partition root if the CPUs >>> requested >>> cannot actually be granted. The problem with setting the state to just >>> partition error is that users may not be aware that the partition >>> creation >>> operation fails.  We can't assume all users will do the proper error >>> checking. I would rather let them know the operation fails rather than >>> relying on them doing the proper check afterward. >>> >>> Yes, I agree that it is a different philosophy than the original cpuset >>> code, but I thought one reason of doing cgroup v2 is to simplify the >>> interface and make it a bit more erorr-proof. Since partition root >>> creation >>> is a relatively rare operation, we can afford to make it more strict >>> than >>> the other operations. >> So, IMO, one of the reasons why cgroup1 interface was such a mess was >> because each piece of interaction was designed ad-hoc without regard >> to the >> overall consistency. One person feels a particular way of interacting >> with >> the interface is "correct" and does it that way and another person does >> another part in a different way. In the end, we ended up with a messy >> patchwork. >> >> One problematic aspect of cpuset in cgroup1 was the handling of failure >> modes, which was caused by the same exact approach - we wanted the >> interface >> to reject invalid configurations outright even though we didn't have the >> ability to prevent those configurations from occurring through other >> paths, >> which makes the failure mode more subtle by further obscuring them. >> >> I think a better approach would be having a clear signal and >> mechanism to >> watch the state and explicitly requiring users to verify and monitor the >> state transitions. > > Sorry for the late reply as I was busy with other works. > > I agree with you on principle. However, the reason why there are more > restrictions on enabling partition is because I want to avoid forcing > the users to always read back cpuset.partition.type to see if the > operation succeeds instead of just getting an error from the > operation. The former approach is more error prone. If you don't want > changes in existing behavior, I can relax the checking and allow them > to become an invalid partition if an illegal operation happens. > > Also there is now another cpuset patch to extend cpu isolation to > cgroup v1 [1]. I think it is better suit to the cgroup v2 partition > scheme, but cgroup v1 is still quite heavily out there. > > Please let me know what you want me to do and I will send out a v3 > version. Note that the current cpuset partition implementation have implemented some restrictions on when a partition can be enabled. However, I missed some corner cases in the original implementation that allow certain cpuset operations to make a partition invalid. I tried to plug those holes in this patchset. However, if maintaining backward compatibility is more important, I can leave those holes and update the documentation to make sure that people check cpuset.partition.type to confirm if their operation succeeds. Cheers, Longman From mboxrd@z Thu Jan 1 00:00:00 1970 From: Waiman Long Subject: Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid partition root Date: Fri, 16 Jul 2021 14:59:44 -0400 Message-ID: <1bb119a1-d94a-6707-beac-e3ae5c03fae5@redhat.com> References: <20210621184924.27493-1-longman@redhat.com> <20210621184924.27493-3-longman@redhat.com> <6ea1ac38-73e1-3f78-a5d2-a4c23bcd8dd1@redhat.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626461988; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t3eOYXdWOcxLwyJeyCXZFcz5Qu68cSCzVAnpdB+6mag=; b=f7dhxgU2+3RRUlmbcK8zgD4ZrOTrxj1+IZyHNsVO8I/wlH2LgvfRzc5ms8+V3CvCQSvU/7 Zy+2vpanOs6TMGXCsCWRn6CpkRPjBSJ1Ur/9YIYj57H2ayqUWJFrpg91XcKBMg0AgFMEQi amPejuYRr15nm3kGl7fKyuojAGdAN94= In-Reply-To: Content-Language: en-US List-ID: Content-Type: text/plain; charset="iso-8859-1"; format="flowed" To: Waiman Long , Tejun Heo Cc: Zefan Li , Johannes Weiner , Jonathan Corbet , Shuah Khan , cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Andrew Morton , Roman Gushchin , Phil Auld , Peter Zijlstra , Juri Lelli On 7/16/21 2:44 PM, Waiman Long wrote: > On 7/5/21 1:51 PM, Tejun Heo wrote: >> Hello, Waiman. >> >> On Mon, Jun 28, 2021 at 09:06:50AM -0400, Waiman Long wrote: >>> The main reason for doing this is because normal cpuset control file=20 >>> actions >>> are under the direct control of the cpuset code. So it is up to us=20 >>> to decide >>> whether to grant it or deny it. Hotplug, on the other hand, is not=20 >>> under the >>> control of cpuset code. It can't deny a hotplug operation. This is=20 >>> the main >>> reason why the partition root error state was added in the first place. >> I have a difficult time convincing myself that this difference=20 >> justifies the >> behavior difference and it keeps bothering me that there is a state=20 >> which >> can be reached through one path but rejected by the other. I'll continue >> below. >> >>> Normally, users can set cpuset.cpus to whatever value they want even=20 >>> though >>> they are not actually granted. However, turning on partition root is=20 >>> under >>> more strict control. You can't turn on partition root if the CPUs=20 >>> requested >>> cannot actually be granted. The problem with setting the state to just >>> partition error is that users may not be aware that the partition=20 >>> creation >>> operation fails.=C2=A0 We can't assume all users will do the proper err= or >>> checking. I would rather let them know the operation fails rather than >>> relying on them doing the proper check afterward. >>> >>> Yes, I agree that it is a different philosophy than the original cpuset >>> code, but I thought one reason of doing cgroup v2 is to simplify the >>> interface and make it a bit more erorr-proof. Since partition root=20 >>> creation >>> is a relatively rare operation, we can afford to make it more strict=20 >>> than >>> the other operations. >> So, IMO, one of the reasons why cgroup1 interface was such a mess was >> because each piece of interaction was designed ad-hoc without regard=20 >> to the >> overall consistency. One person feels a particular way of interacting=20 >> with >> the interface is "correct" and does it that way and another person does >> another part in a different way. In the end, we ended up with a messy >> patchwork. >> >> One problematic aspect of cpuset in cgroup1 was the handling of failure >> modes, which was caused by the same exact approach - we wanted the=20 >> interface >> to reject invalid configurations outright even though we didn't have the >> ability to prevent those configurations from occurring through other=20 >> paths, >> which makes the failure mode more subtle by further obscuring them. >> >> I think a better approach would be having a clear signal and=20 >> mechanism to >> watch the state and explicitly requiring users to verify and monitor the >> state transitions. > > Sorry for the late reply as I was busy with other works. > > I agree with you on principle. However, the reason why there are more=20 > restrictions on enabling partition is because I want to avoid forcing=20 > the users to always read back cpuset.partition.type to see if the=20 > operation succeeds instead of just getting an error from the=20 > operation. The former approach is more error prone. If you don't want=20 > changes in existing behavior, I can relax the checking and allow them=20 > to become an invalid partition if an illegal operation happens. > > Also there is now another cpuset patch to extend cpu isolation to=20 > cgroup v1 [1]. I think it is better suit to the cgroup v2 partition=20 > scheme, but cgroup v1 is still quite heavily out there. > > Please let me know what you want me to do and I will send out a v3=20 > version.=20 Note that the current cpuset partition implementation have implemented=20 some restrictions on when a partition can be enabled. However, I missed=20 some corner cases in the original implementation that allow certain=20 cpuset operations to make a partition invalid. I tried to plug those=20 holes in this patchset. However, if maintaining backward compatibility=20 is more important, I can leave those holes and update the documentation=20 to make sure that people check cpuset.partition.type to confirm if their=20 operation succeeds. Cheers, Longman