From: Arnd Bergmann <arnd@kernel.org>
To: "misono.tomohiro@fujitsu.com" <misono.tomohiro@fujitsu.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	SoC Team <soc@kernel.org>,  Olof Johansson <olof@lixom.net>,
	Will Deacon <will@kernel.org>,
	 Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
Date: Tue, 12 Jan 2021 15:22:12 +0100	[thread overview]
Message-ID: <CAK8P3a2BbCtZG=+fi1mtrhMtBKHcaZa=gvfiKJVdnsq6N3yG4A@mail.gmail.com> (raw)
In-Reply-To: <OSBPR01MB4582283E56A13D03A778150FE5AA0@OSBPR01MB4582.jpnprd01.prod.outlook.com>

On Tue, Jan 12, 2021 at 11:24 AM misono.tomohiro@fujitsu.com
<misono.tomohiro@fujitsu.com> wrote:
> > On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland@arm.com> wrote:
> However, I don't know of any other processors having similar
> features at this point, and it is hard to provide a common abstraction interface.
> I would appreciate it if anyone has any information.

The specification you pointed to mentions the SPARC64 XIfx, so
at a minimum, a user interface should be designed to also work on
whatever register-level interface that provides.

> > > Secondly, the intended usage model appears to expose this to EL0 for
> > > direct access, and the code seems to depend on threads being pinned, but
> > > AFAICT this is not enforced and there is no provision for
> > > context-switch, thread migration, or interaction with ptrace. I fear
> > > this is going to be very fragile in practice, and that extending that
> > > support in future will require much more complexity than is currently
> > > apparent, with potentially invasive changes to arch code.
> >
> > Right, this is the main problem I see, too. I had not even realized
> > that this will have to tie in with user space threads in some form, but
> > you are right that once this has to interact with the CPU scheduler,
> > it all breaks down.
>
> This observation is right. I thought that adding context-switch support etc.
> for implementation-defined registers would require core arch code changes,
> which would be far less acceptable. So I tried to confine the code changes
> to a module, with these restrictions.

My feeling is that having the code separate from where it would belong
in an operating system that was designed specifically for this feature
ends up being no better than rewriting the core scheduling code.

As Mark said, it may well be that neither approach would be sufficient
for an upstream merge. On the other hand, keeping the code in a
separate loadable module does make most sense if we end up
not merging it at all, in which case this is the easiest to port
between kernel versions.

> Regarding direct access from EL0, it is necessary for realizing fast synchronization,
> as it lets the synchronization logic in the user application check whether all threads
> have reached the synchronization point without switching to the kernel.

Ok, I see.
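
Just to spell out what I understand the fast path to be, here is an
ordinary C11 software barrier as a point of comparison (a sketch of my
own, not code from the patches): the hardware barrier would replace the
shared-counter polling with the implementation-defined registers, but
the property that matters is the same, i.e. nothing enters the kernel
while the threads wait.

#include <stdatomic.h>

/* count starts at nthreads, sense starts at 0 */
struct sw_barrier {
	atomic_int count;	/* threads still to arrive in this round */
	atomic_int sense;	/* flips each time the barrier completes */
	int nthreads;
};

static void sw_barrier_wait(struct sw_barrier *b)
{
	int my_sense = atomic_load(&b->sense);

	if (atomic_fetch_sub(&b->count, 1) == 1) {
		/* last arrival: re-arm the barrier and release the others */
		atomic_store(&b->count, b->nthreads);
		atomic_store(&b->sense, !my_sense);
	} else {
		/* spin entirely at EL0; no syscall on the fast path */
		while (atomic_load(&b->sense) == my_sense)
			;
	}
}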

> Also, it is common in multi-threaded HPC applications for each running
> thread to be bound to one PE.

I think the expectation that all threads are bound to a physical CPU
makes sense for using this feature, but I think it would be necessary
to enforce that, e.g. by allowing threads to enable it only after they
are isolated to a non-shared CPU, and by automatically disabling it
if the CPU isolation is changed.
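
To make the enforcement part concrete, the enable path could start with
a check along these lines (a sketch of mine, not something from the
posted series; the function name is made up):

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/sched.h>

/*
 * Illustrative only: refuse to enable the barrier window unless the
 * calling task is already pinned to exactly one CPU.  Reacting to a
 * later sched_setaffinity() would still need a hook into core code,
 * which is the harder part.
 */
static int hwb_require_pinned_task(void)
{
	if (cpumask_weight(current->cpus_ptr) != 1)
		return -EPERM;

	return 0;
}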

For the user space interface, something based on process IDs
seems to make more sense to me than something based on CPU
numbers. All of the above does require some level of integration
with the core kernel of course.
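
Just to illustrate what I mean, not as a concrete proposal, a request
based on thread IDs could look roughly like this (all of the names
below are made up and none of them exist in the posted patches or in
any UAPI header):

#include <linux/types.h>

struct hwb_alloc_req {
	__u64 tids_ptr;		/* user pointer to nr_threads __u32 thread IDs */
	__u32 nr_threads;	/* number of participating threads */
	__u32 barrier_id;	/* filled in by the kernel on success */
};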

I think the next step would be to try to come up with a high-level
user interface design that has a chance to get merged, rather than
addressing the review comments for the current implementation.

Aside from the user interface question, it would be good to
understand the performance impact of the feature.
As I understand it, the entire purpose is to make things faster, so
to put it in perspective compared to the burden of adding an
interface, there should be some numbers: What are the kinds of
applications that would use it in practice, and how much faster are
they compared to not having it?
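
Even a small microbenchmark along the lines of the sketch below (again
mine, not from the series), run once with pthread_barrier_wait() and
once with the hardware barrier for a few thread counts, would help put
the cost of the interface into perspective:

#include <time.h>

#define ITERS 1000000

/*
 * Time ITERS back-to-back barrier operations and return the average
 * cost in nanoseconds; barrier_wait() stands in for whichever
 * primitive is being measured.  Each participating thread runs this.
 */
static double bench_barrier(void (*barrier_wait)(void *), void *arg)
{
	struct timespec t0, t1;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)
		barrier_wait(arg);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / ITERS;
}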

       Arnd

Thread overview: 66+ messages

2021-01-08 10:52 [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver Misono Tomohiro
2021-01-08 10:52 ` [PATCH 01/10] soc: fujitsu: hwb: Add hardware barrier driver init/exit code Misono Tomohiro
2021-01-08 10:52 ` [PATCH 02/10] soc: fujtisu: hwb: Add open operation Misono Tomohiro
2021-01-08 10:52 ` [PATCH 03/10] soc: fujitsu: hwb: Add IOC_BB_ALLOC ioctl Misono Tomohiro
2021-01-08 13:22   ` Arnd Bergmann
2021-01-12 11:02     ` misono.tomohiro
2021-01-12 12:34       ` Arnd Bergmann
2021-01-08 10:52 ` [PATCH 04/10] soc: fujitsu: hwb: Add IOC_BW_ASSIGN ioctl Misono Tomohiro
2021-01-08 10:52 ` [PATCH 05/10] soc: fujitsu: hwb: Add IOC_BW_UNASSIGN ioctl Misono Tomohiro
2021-01-08 10:52 ` [PATCH 06/10] soc: fujitsu: hwb: Add IOC_BB_FREE ioctl Misono Tomohiro
2021-01-08 10:52 ` [PATCH 07/10] soc: fujitsu: hwb: Add IOC_GET_PE_INFO ioctl Misono Tomohiro
2021-01-08 10:52 ` [PATCH 08/10] soc: fujitsu: hwb: Add release operation Misono Tomohiro
2021-01-08 13:25   ` Arnd Bergmann
2021-01-12 10:38     ` misono.tomohiro
2021-01-08 10:52 ` [PATCH 09/10] soc: fujitsu: hwb: Add sysfs entry Misono Tomohiro
2021-01-08 13:27   ` Arnd Bergmann
2021-01-12 10:40     ` misono.tomohiro
2021-01-08 10:52 ` [PATCH 10/10] soc: fujitsu: hwb: Add Kconfig/Makefile to build fujitsu_hwb driver Misono Tomohiro
2021-01-08 12:54 ` [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver Mark Rutland
2021-01-08 14:23   ` Arnd Bergmann
2021-01-08 15:51     ` Mark Rutland
2021-01-12 10:24     ` misono.tomohiro
2021-01-12 14:22       ` Arnd Bergmann [this message]
2021-01-15 11:10         ` misono.tomohiro
2021-01-15 12:24           ` Arnd Bergmann
2021-01-19  5:30             ` misono.tomohiro
2021-02-18  9:49             ` misono.tomohiro
2021-03-01  7:53               ` misono.tomohiro
2021-03-02 11:06               ` Arnd Bergmann
2021-03-03 11:20                 ` misono.tomohiro
2021-03-03 13:33                   ` Arnd Bergmann
2021-03-04  7:03                     ` misono.tomohiro
2021-01-12 10:32   ` misono.tomohiro
