From: Palmer Dabbelt <palmer@dabbelt.com>
To: David.Laight@ACULAB.COM
Cc: robh@kernel.org, evan@rivosinc.com,
	Conor Dooley <conor@kernel.org>,
	Vineet Gupta <vineetg@rivosinc.com>,
	heiko@sntech.de, slewis@rivosinc.com, aou@eecs.berkeley.edu,
	krzysztof.kozlowski+dt@linaro.org,
	Paul Walmsley <paul.walmsley@sifive.com>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org
Subject: RE: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance
Date: Thu, 09 Feb 2023 08:51:22 -0800 (PST)
Message-ID: <mhng-8736b349-e27a-4372-81ca-3a25d2ec1e94@palmer-ri-x1c9>
In-Reply-To: <4bd24def02014939a87eb8430ba0070d@AcuMS.aculab.com>

On Wed, 08 Feb 2023 04:45:10 PST (-0800), David.Laight@ACULAB.COM wrote:
> From: Rob Herring
>> Sent: 07 February 2023 17:06
>>
>> On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
>> > From: Palmer Dabbelt <palmer@rivosinc.com>
>> >
>> > This key allows device trees to specify the performance of misaligned
>> > accesses to main memory regions from each CPU in the system.
>> >
>> > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>> > Signed-off-by: Evan Green <evan@rivosinc.com>
>> > ---
>> >
>> > (no changes since v1)
>> >
>> >  Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
>> >  1 file changed, 15 insertions(+)
>> >
>> > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > index c6720764e765..2c09bd6f2927 100644
>> > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > @@ -85,6 +85,21 @@ properties:
>> >      $ref: "/schemas/types.yaml#/definitions/string"
>> >      pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
>> >
>> > +  riscv,misaligned-access-performance:
>> > +    description:
>> > +      Identifies the performance of misaligned memory accesses to main memory
>> > +      regions.  There are three flavors of unaligned access performance: "emulated"
>> > +      means that misaligned accesses are emulated via software and thus
>> > +      extremely slow, "slow" means that misaligned accesses are supported by
>> > +      hardware but still slower than aligned access sequences, and "fast"
>> > +      means that misaligned accesses are as fast or faster than the
>> > +      corresponding aligned access sequences.
>> > +    $ref: "/schemas/types.yaml#/definitions/string"
>> > +    enum:
>> > +      - emulated
>> > +      - slow
>> > +      - fast
>>
>> I don't think this belongs in DT. (I'm not sure about a userspace
>> interface either.)

[Kind of answered below.]

>> Can't this be tested and determined at runtime? Do misaligned accesses
>> and compare the performance. We already do this for things like memcpy
>> or crypto implementation selection.

We've had a history of broken firmware emulation of misaligned accesses 
wreaking havoc.  We don't run into concrete bugs there because we avoid 
misaligned accesses as much as possible in the kernel, but I'd be 
worried that we'd trigger a lot of these when probing for misaligned 
accesses.
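
For reference, the sort of runtime comparison being suggested might look
roughly like the user-space sketch below.  The helper names (load_u64,
time_loads) are hypothetical and not part of this series; the sketch only
times a walk over a buffer at a given byte offset.  Note that on systems
with broken firmware emulation, the misaligned loads in a probe like this
are exactly what could trigger the problems described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Load a 64-bit value from a possibly misaligned address.  memcpy() keeps
 * this well-defined C; whether the compiler turns it into a single
 * (misaligned) load or a byte-by-byte sequence depends on target and flags.
 */
static uint64_t load_u64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Walk buf in 8-byte steps starting at `offset` (0 is aligned if buf is
 * 8-byte aligned, 1..7 is misaligned).  Stores the sum of the loaded
 * values in *sum and returns the elapsed CPU time in clock() ticks.
 */
static clock_t time_loads(const unsigned char *buf, size_t len,
			  size_t offset, uint64_t *sum)
{
	uint64_t acc = 0;
	clock_t start, end;

	assert(len >= offset + sizeof(uint64_t));

	start = clock();
	for (size_t i = offset; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t))
		acc += load_u64(buf + i);
	end = clock();

	*sum = acc;	/* keep the loads observable to the compiler */
	return end - start;
}
```

A caller would compare time_loads(buf, len, 0, &sum) against
time_loads(buf, len, 1, &sum) over a large buffer and several runs; a real
probe would also need to control for caching and CPU migration, which this
sketch does not attempt.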

> There is also a long discussion about misaligned accesses
> for loooongarch.
>
> Basically if you want to run a common kernel (and userspace)
> you have to default to compiling everything with -mno-strict-align
> so that the compiler generates byte accesses for anything
> marked 'packed' (etc).
>
> Run-time tests can optimise some hot-spots.
>
> In any case 'slow' is probably pointless - unless the accesses
> take more than 1 or 2 extra cycles.

[Also below.]

> Oh, and you really never, ever want to emulate them.

Unfortunately we're kind of stuck with this one: the specs used to 
require that misaligned accesses were supported and thus there's a bunch 
of firmwares that emulate them (and various misaligned accesses spread 
around, though they're kind of a mess).  The specs no longer require 
this support, but just dropping it from firmware will break binaries.

There have been some vague plans to dig out of this, but it'd require some
sort of firmware interface additions in order to turn off the emulation,
and that's going to take a while.  As it stands we've got a bunch of
users that just want to know when they can emit misaligned accesses.
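
As a rough illustration of what such a user might do with the three values
from the proposed binding, here is a minimal sketch.  The enum, the function
names, and the policy of emitting misaligned accesses only for "fast" are
all assumptions made for illustration; none of them come from this series.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of riscv,misaligned-access-performance. */
enum misaligned_perf {
	MISALIGNED_UNKNOWN,	/* property absent or unrecognized */
	MISALIGNED_EMULATED,	/* trapped and emulated in software */
	MISALIGNED_SLOW,	/* done in hardware, slower than aligned */
	MISALIGNED_FAST,	/* as fast or faster than aligned sequences */
};

/* Map the property string onto the enum above. */
static enum misaligned_perf parse_misaligned_perf(const char *s)
{
	assert(s != NULL);

	if (!strcmp(s, "emulated"))
		return MISALIGNED_EMULATED;
	if (!strcmp(s, "slow"))
		return MISALIGNED_SLOW;
	if (!strcmp(s, "fast"))
		return MISALIGNED_FAST;
	return MISALIGNED_UNKNOWN;
}

/*
 * Assumed policy: only rely on misaligned accesses when they are "fast";
 * anything else falls back to aligned or byte-wise sequences.
 */
static int may_emit_misaligned(enum misaligned_perf perf)
{
	return perf == MISALIGNED_FAST;
}
```

A more aggressive consumer could also accept "slow" for cold paths; treating
unknown the same as "emulated" is the conservative choice here.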

> Technically misaligned reads on (some) x86-64 CPUs are slower
> than aligned ones, but the difference is marginal.
> I've measured two 64-bit misaligned reads every clock.
> But it is consistently slower by much less than one clock
> per cache line.

The "fast" case is explicitly written to catch that flavor of 
implementation.

The "slow" one is a bit vaguer, but the general idea is to catch
implementations that end up with some sort of pipeline flush on
misaligned accesses.  We've got a lot of very small in-order processors
in RISC-V land, and while I haven't gotten around to benchmarking them
all, my guess is that the spec requirement for support resulted in some
simple implementations.

FWIW: I checked the c906 RTL and it's setting some exception-related
info on misaligned accesses, but I'd need to actually benchmark one to
know for sure and they're kind of a headache to deal with.

>
> 	David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
