From: "misono.tomohiro@fujitsu.com" <misono.tomohiro@fujitsu.com>
To: 'Arnd Bergmann' <arnd@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	SoC Team <soc@kernel.org>, Olof Johansson <olof@lixom.net>,
	Will Deacon <will@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: RE: [RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver
Date: Tue, 19 Jan 2021 05:30:48 +0000
Message-ID: <OSBPR01MB4582B8BD611C66B999F3A3EDE5A30@OSBPR01MB4582.jpnprd01.prod.outlook.com>
In-Reply-To: <CAK8P3a0Oxvm=VTnPAwxSQ-od4Q0rqMq45633pMyTcV8Adet-Lw@mail.gmail.com>

> > > > Also, it is common for each running thread to be bound to one PE in
> > > > multi-threaded HPC applications.
> > >
> > > I think the expectation that all threads are bound to a physical CPU
> > > makes sense for using this feature, but I think it would be necessary
> > > to enforce that, e.g. by allowing threads to enable it only after they
> > > are isolated to a non-shared CPU, and automatically disabling it
> > > if the CPU isolation is changed.
> > >
> > > For the user space interface, something based on process IDs
> > > seems to make more sense to me than something based on CPU
> > > numbers. All of the above does require some level of integration
> > > with the core kernel of course.
> > >
> > > I think the next step would be to try to come up with a high-level
> > > user interface design that has a chance to get merged, rather than
> > > addressing the review comments for the current implementation.
> >
> > Understood. One question: although a high-level interface such as process
> > based control could solve several problems (i.e. access control/forced binding),
> > I cannot eliminate access to IMP-DEF registers from EL0, as I explained
> > above. Is that acceptable to you?
> 
> I think you will get different answers for that depending on who you ask ;-)
> 
> I'm generally ok with it, given that it will only affect a very small
> number of specialized applications that are already built for
> a specific microarchitecture for performance reasons. E.g. when
> using an arm64 BLAS library, you would use different versions
> of the same functions depending on CPU support for NEON,
> SVE, SVE2, Apple AMX (which also uses imp-def instructions),
> ARMv8.6 GEMM extensions, and likely a hand-optimized
> version for the A64FX pipeline. Having a version for A64FX with
> hardware barriers adds (at most) one more code path but hopefully
> does not add complexity to the common code.
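
The per-CPU dispatch described in the quoted paragraph can be sketched
roughly as below. This is only an illustration under stated assumptions:
the FEAT_* bits, the gemm_impl enum, select_gemm() and detect_features()
are hypothetical names invented for this sketch and are not taken from any
BLAS library. The only real interfaces used are getauxval() and the arm64
HWCAP bits. No interface for the A64FX barrier has been merged, so
detect_features() never reports FEAT_A64FX_HWB.

```c
#include <assert.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Hypothetical feature bits for this sketch; not taken from any library. */
enum {
	FEAT_NEON      = 1u << 0,
	FEAT_SVE       = 1u << 1,
	FEAT_SVE2      = 1u << 2,
	FEAT_A64FX_HWB = 1u << 3, /* assumed: hardware barrier usable by the thread */
	FEAT_ALL       = (1u << 4) - 1,
};

enum gemm_impl { GEMM_GENERIC, GEMM_NEON, GEMM_SVE, GEMM_SVE2, GEMM_A64FX_HWB };

/* Pick the most specific code path that the reported features allow. */
static enum gemm_impl select_gemm(unsigned int feats)
{
	assert((feats & ~(unsigned int)FEAT_ALL) == 0); /* reject unknown bits */

	if ((feats & FEAT_SVE) && (feats & FEAT_A64FX_HWB))
		return GEMM_A64FX_HWB; /* the one extra path for hardware barriers */
	if (feats & FEAT_SVE2)
		return GEMM_SVE2;
	if (feats & FEAT_SVE)
		return GEMM_SVE;
	if (feats & FEAT_NEON)
		return GEMM_NEON;
	return GEMM_GENERIC;
}

/* Query the kernel's hwcaps on arm64 Linux; report no features elsewhere. */
static unsigned int detect_features(void)
{
	unsigned int feats = 0;
#if defined(__linux__) && defined(__aarch64__)
	unsigned long hwcap = getauxval(AT_HWCAP);

	if (hwcap & HWCAP_ASIMD)
		feats |= FEAT_NEON;
#ifdef HWCAP_SVE
	if (hwcap & HWCAP_SVE)
		feats |= FEAT_SVE;
#endif
#ifdef HWCAP2_SVE2
	if (getauxval(AT_HWCAP2) & HWCAP2_SVE2)
		feats |= FEAT_SVE2;
#endif
	/* FEAT_A64FX_HWB is deliberately never set: no kernel interface exists. */
#endif
	return feats;
}
```

A caller would run select_gemm(detect_features()) once at start-up and
cache the result, so the common code only ever sees one selected path.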

Thanks. Btw, to be precise, A64FX doesn't use imp-def instructions.
It provides imp-def registers which can be accessed by system
register access instructions (msr/mrs).
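
For readers unfamiliar with mrs, a minimal sketch of a system register
read from user space follows. It uses the architected virtual counter
CNTVCT_EL0, which Linux normally makes readable from EL0, purely as a
stand-in. The A64FX imp-def barrier registers would be read and written
with the same kind of instruction, but their encodings are not given in
this message, so they are intentionally not shown. The function names are
made up for this sketch, and on non-arm64 builds the read simply returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Read CNTVCT_EL0 with mrs; the isb keeps the read from being speculated early. */
static inline uint64_t read_virtual_counter(void)
{
#if defined(__aarch64__)
	uint64_t val;

	__asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(val) : : "memory");
	return val;
#else
	return 0; /* not arm64: no such register, return a constant */
#endif
}

/* Ticks elapsed since an earlier read; the counter does not run backwards. */
static inline uint64_t ticks_since(uint64_t start)
{
	uint64_t now = read_virtual_counter();

	assert(now >= start);
	return now - start;
}
```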

> > > Aside from the user interface question, it would be good to
> > > understand the performance impact of the feature.
> > > As I understand it, the entire purpose is to make things faster, so
> > > to put it in perspective compared to the burden of adding an
> > > interface, there should be some numbers: What are the kinds of
> > > applications that would use it in practice, and how much faster are
> > > they compared to not having it?
> >
> > A microbenchmark shows that one synchronization across 12 PEs takes around
> > 250 ns with the hardware barrier, which is several times faster than a software
> > barrier (measuring only the core synchronization logic and excluding setup time).
> > I don't have application results at this point and will share them when I get some.
> 
> Thanks. That will be helpful indeed. Please also include information
> about what you are comparing against for the software barrier. E.g.
> Is that based on a futex() system call, or completely implemented
> in user space?

It is implemented entirely in user space using shared variables,
without any system call.
(As all PEs to be synchronized share the L3 cache, each synchronization
should result in accesses to L3.)
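
To illustrate what a user-space barrier built on shared variables can look
like, here is a minimal sense-reversing spin barrier using C11 atomics.
It is an assumption about the general technique only, not the code that
was actually benchmarked; all names (spin_barrier, spin_barrier_wait,
spin_barrier_selftest, MAX_THREADS) are invented for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_THREADS 16 /* arbitrary limit for the self-test below */

struct spin_barrier {
	atomic_int remaining; /* threads still to arrive in this round */
	atomic_bool sense;    /* flips every time the barrier opens */
	int nthreads;
};

static void spin_barrier_init(struct spin_barrier *b, int nthreads)
{
	atomic_init(&b->remaining, nthreads);
	atomic_init(&b->sense, false);
	b->nthreads = nthreads;
}

/* local_sense is per-thread state and must start out as false. */
static void spin_barrier_wait(struct spin_barrier *b, bool *local_sense)
{
	bool my_sense = !*local_sense;

	*local_sense = my_sense;
	if (atomic_fetch_sub_explicit(&b->remaining, 1, memory_order_acq_rel) == 1) {
		/* Last arrival: re-arm the counter, then release the waiters. */
		atomic_store_explicit(&b->remaining, b->nthreads, memory_order_relaxed);
		atomic_store_explicit(&b->sense, my_sense, memory_order_release);
	} else {
		while (atomic_load_explicit(&b->sense, memory_order_acquire) != my_sense)
			; /* spin on the shared flag, no system call involved */
	}
}

struct selftest {
	struct spin_barrier *b;
	atomic_int counter; /* bumped once per thread per round */
	atomic_int errors;  /* rounds in which a thread saw a wrong count */
	int nthreads;
	int rounds;
};

static void *selftest_worker(void *p)
{
	struct selftest *st = p;
	bool local_sense = false;

	for (int r = 0; r < st->rounds; r++) {
		atomic_fetch_add(&st->counter, 1);
		spin_barrier_wait(st->b, &local_sense);
		/* Everyone has arrived, so the count must be exact here. */
		if (atomic_load(&st->counter) != (r + 1) * st->nthreads)
			atomic_fetch_add(&st->errors, 1);
		/* Second barrier: nobody starts round r+1 before all have checked. */
		spin_barrier_wait(st->b, &local_sense);
	}
	return NULL;
}

/* Returns the number of ordering violations seen (0 if the barrier held). */
static int spin_barrier_selftest(int nthreads, int rounds)
{
	pthread_t tid[MAX_THREADS];
	struct spin_barrier b;
	struct selftest st;

	assert(nthreads >= 1 && nthreads <= MAX_THREADS);
	spin_barrier_init(&b, nthreads);
	st.b = &b;
	st.nthreads = nthreads;
	st.rounds = rounds;
	atomic_init(&st.counter, 0);
	atomic_init(&st.errors, 0);

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tid[i], NULL, selftest_worker, &st) != 0)
			abort(); /* a missing thread would leave the others spinning */
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&st.errors);
}
```

Every waiter polls the same flag and every arrival updates the same
counter, so with one thread pinned per PE the shared cache lines bounce
between cores through the shared cache on each synchronization. Calling
spin_barrier_selftest(4, 50) should return 0.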

Regards,
Tomohiro
