From: Arnd Bergmann
Date: Tue, 12 Jan 2021 15:22:12 +0100
References: <20210108105241.1757799-1-misono.tomohiro@jp.fujitsu.com>
 <20210108125410.GA84941@C02TD0UTHF1T.local>
Subject: Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
To: "misono.tomohiro@fujitsu.com"
Cc: Mark Rutland, Arnd Bergmann, Catalin Marinas, SoC Team,
 Olof Johansson, Will Deacon, Linux ARM
Content-Type: text/plain; charset="UTF-8"

On Tue, Jan 12, 2021 at 11:24 AM misono.tomohiro@fujitsu.com wrote:
> > On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland wrote:
>
> However, I don't know of any other processors having similar
> features at this point, and it is hard to provide a common abstraction
> interface. I would appreciate it if anyone has any information.

The specification you pointed to mentions the SPARC64 XIfx, so at a
minimum, a user interface should be designed to also work on whatever
register-level interface that provides.

> > > Secondly, the intended usage model appears to expose this to EL0 for
> > > direct access, and the code seems to depend on threads being pinned, but
> > > AFAICT this is not enforced and there is no provision for
> > > context-switch, thread migration, or interaction with ptrace. I fear
> > > this is going to be very fragile in practice, and that extending that
> > > support in future will require much more complexity than is currently
> > > apparent, with potentially invasive changes to arch code.
> >
> > Right, this is the main problem I see, too. I had not even realized
> > that this will have to tie in with user space threads in some form, but
> > you are right that once this has to interact with the CPU scheduler,
> > it all breaks down.
>
> This observation is right. I thought that adding context switch etc.
> support for implementation-defined registers would require core arch
> code changes, which is far less acceptable. So, I tried to confine the
> code changes to a module, with these restrictions.

My feeling is that keeping the code separate from where it would belong
in an operating system designed specifically for this feature ends up
being no better than rewriting the core scheduling code.
As Mark said, it may well be that neither approach would be sufficient
for an upstream merge. On the other hand, keeping the code in a separate
loadable module does make the most sense if we end up not merging it at
all, in which case this is the easiest to port between kernel versions.

> Regarding direct access from EL0, it is necessary for realizing fast
> synchronization, as this enables the synchronization logic in the user
> application to check whether all threads have reached the
> synchronization point without switching to the kernel.

Ok, I see.

> Also, it is common usage that each running thread is bound to one PE in
> multi-threaded HPC applications.

I think the expectation that all threads are bound to a physical CPU
makes sense for using this feature, but I think it would be necessary to
enforce that, e.g. by allowing threads to enable it only after they are
isolated to a non-shared CPU, and automatically disabling it if the CPU
isolation is changed.

For the user space interface, something based on process IDs seems to
make more sense to me than something based on CPU numbers. All of the
above does require some level of integration with the core kernel, of
course.

I think the next step would be to try to come up with a high-level user
interface design that has a chance to get merged, rather than addressing
the review comments for the current implementation.

Aside from the user interface question, it would be good to understand
the performance impact of the feature. As I understand it, the entire
purpose is to make things faster, so to put it in perspective compared
to the burden of adding an interface, there should be some numbers:
What are the kinds of applications that would use it in practice, and
how much faster are they compared to not having it?
       Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel