All of lore.kernel.org
 help / color / mirror / Atom feed
* Locking for HW crypto accelerators
@ 2018-08-30 12:22 Krzysztof Kozlowski
  2018-08-30 12:56 ` Stephan Mueller
  2018-08-30 13:39 ` Herbert Xu
  0 siblings, 2 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2018-08-30 12:22 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, linux-crypto, linux-kernel, smueller

Hi,

I am trying to figure out necessary locking on the driver side of
crypto HW accelerator for symmetric hash (actually: CRC). I
implemented quite simple driver for shash_alg.

I looked at the docs, I looked at the crypto kcapi core code... and
there is nothing about necessary locking. kcapi does not perform it.

My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
only one HW set of registers for dealing with CRC. Or in other words,
only one queue of one element. :) I implemented all shash_alg
callbacks - init(), update(), final()... and also finup() (manually
calling update+final) and digest() (init+update+final).

Now imagine multiple user-space users of this crypto alg where all of
them call kcapi_md_digest() (so essentially init() -> update() ->
final()). It seems that kcapi does not perform any locking here so at
some point updates from different processes might be mixed with
finals:

Process A:             Process B:
init()
                       init()
update()
                       update()
final()
                       final()

My findings show that the requests are indeed being mixed with each other...

Should driver perform any weird locking here? Or maybe that is the
case of using ONLY the digest() callback (so no update, no final)
because my HW cannot track different kcapi requests?

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 12:22 Locking for HW crypto accelerators Krzysztof Kozlowski
@ 2018-08-30 12:56 ` Stephan Mueller
  2018-08-30 13:09   ` Krzysztof Kozlowski
  2018-08-30 13:39 ` Herbert Xu
  1 sibling, 1 reply; 9+ messages in thread
From: Stephan Mueller @ 2018-08-30 12:56 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Herbert Xu, David S. Miller, linux-crypto, linux-kernel

Am Donnerstag, 30. August 2018, 14:22:22 CEST schrieb Krzysztof Kozlowski:

Hi Krzysztof,

> Hi,
> 
> I am trying to figure out necessary locking on the driver side of
> crypto HW accelerator for symmetric hash (actually: CRC). I
> implemented quite simple driver for shash_alg.
> 
> I looked at the docs, I looked at the crypto kcapi core code... and
> there is nothing about necessary locking. kcapi does not perform it.
> 
> My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> only one HW set of registers for dealing with CRC. Or in other words,
> only one queue of one element. :) I implemented all shash_alg
> callbacks - init(), update(), final()... and also finup() (manually
> calling update+final) and digest() (init+update+final).
> 
> Now imagine multiple user-space users of this crypto alg where all of
> them call kcapi_md_digest() (so essentially init() -> update() ->
> final()). It seems that kcapi does not perform any locking here so at
> some point updates from different processes might be mixed with
> finals:
> 
> Process A:             Process B:
> init()
>                        init()
> update()
>                        update()
> final()
>                        final()
> 
> My findings show that the requests are indeed being mixed with each other...
> 
> Should driver perform any weird locking here? Or maybe that is the
> case of using ONLY the digest() callback (so no update, no final)
> because my HW cannot track different kcapi requests?

The hashing is performed on a buffer provided by the caller. E.g. it is the 
buffer pointed to by the ahash request or the shash desc structure. All 
operations of init/update/final operate on that memory.

If you have parallel requests, each caller has a private buffer that it 
provides to the kernel crypto API. This applies also to AF_ALG.

Thus, as long as the individual operations of init/update and final are atomic 
operations, there should be no locking necessary.

Thus, all your driver needs to guarantee is the atomicitcy of the init/update/
final operation in respect to your hardware state.

Ciao
Stephan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 12:56 ` Stephan Mueller
@ 2018-08-30 13:09   ` Krzysztof Kozlowski
  2018-08-30 13:19     ` Stephan Mueller
  2018-08-30 13:27     ` Kamil Konieczny
  0 siblings, 2 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2018-08-30 13:09 UTC (permalink / raw)
  To: smueller; +Cc: herbert, davem, linux-crypto, linux-kernel

On Thu, 30 Aug 2018 at 14:59, Stephan Mueller <smueller@chronox.de> wrote:
>
> Am Donnerstag, 30. August 2018, 14:22:22 CEST schrieb Krzysztof Kozlowski:
>
> Hi Krzysztof,
>
> > Hi,
> >
> > I am trying to figure out necessary locking on the driver side of
> > crypto HW accelerator for symmetric hash (actually: CRC). I
> > implemented quite simple driver for shash_alg.
> >
> > I looked at the docs, I looked at the crypto kcapi core code... and
> > there is nothing about necessary locking. kcapi does not perform it.
> >
> > My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> > only one HW set of registers for dealing with CRC. Or in other words,
> > only one queue of one element. :) I implemented all shash_alg
> > callbacks - init(), update(), final()... and also finup() (manually
> > calling update+final) and digest() (init+update+final).
> >
> > Now imagine multiple user-space users of this crypto alg where all of
> > them call kcapi_md_digest() (so essentially init() -> update() ->
> > final()). It seems that kcapi does not perform any locking here so at
> > some point updates from different processes might be mixed with
> > finals:
> >
> > Process A:             Process B:
> > init()
> >                        init()
> > update()
> >                        update()
> > final()
> >                        final()
> >
> > My findings show that the requests are indeed being mixed with each other...
> >
> > Should driver perform any weird locking here? Or maybe that is the
> > case of using ONLY the digest() callback (so no update, no final)
> > because my HW cannot track different kcapi requests?
>
> The hashing is performed on a buffer provided by the caller. E.g. it is the
> buffer pointed to by the ahash request or the shash desc structure. All
> operations of init/update/final operate on that memory.
>
> If you have parallel requests, each caller has a private buffer that it
> provides to the kernel crypto API. This applies also to AF_ALG.
>
> Thus, as long as the individual operations of init/update and final are atomic
> operations, there should be no locking necessary.
>
> Thus, all your driver needs to guarantee is the atomicitcy of the init/update/
> final operation in respect to your hardware state.

Thanks Stephan for hints. Let's assume the each of init, update and
final are atomic... but how about the relation between update and
final? I have two concurrent users in user-space but only one HW:

Process A:             Process B:
init() and set_key()
                       init() and different key
update(some_data)
                       update(different_data)
final()
                       final()

The final() from process A will now produce the result of hashing/CRC
of some_data and different_data (and even maybe mixed with init() for
different key). All because in the meantime process B added its own
data to the HW.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 13:09   ` Krzysztof Kozlowski
@ 2018-08-30 13:19     ` Stephan Mueller
  2018-08-30 13:54       ` Krzysztof Kozlowski
  2018-08-30 13:27     ` Kamil Konieczny
  1 sibling, 1 reply; 9+ messages in thread
From: Stephan Mueller @ 2018-08-30 13:19 UTC (permalink / raw)
  To: Krzysztof Kozlowski; +Cc: herbert, davem, linux-crypto, linux-kernel

Am Donnerstag, 30. August 2018, 15:09:05 CEST schrieb Krzysztof Kozlowski:

Hi Krzysztof,

> Thanks Stephan for hints. Let's assume the each of init, update and
> final are atomic... but how about the relation between update and
> final? I have two concurrent users in user-space but only one HW:
> 
> Process A:             Process B:
> init() and set_key()
>                        init() and different key
> update(some_data)
>                        update(different_data)
> final()
>                        final()
> 
> The final() from process A will now produce the result of hashing/CRC
> of some_data and different_data (and even maybe mixed with init() for
> different key). All because in the meantime process B added its own
> data to the HW.

The question is where is the state of the cipher operation kept that is 
produced with each of the init/update/final calls. Your answer implies that 
this state is kept in hardware.

But commonly the state is kept in software. Look at ahash_request for example, 
the __ctx pointer is intended to point to the memory the driver needs to store 
its state.

Pick a random driver implementation and search for ahash_request_ctx, for 
example.
> 
> Best regards,
> Krzysztof



Ciao
Stephan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 13:09   ` Krzysztof Kozlowski
  2018-08-30 13:19     ` Stephan Mueller
@ 2018-08-30 13:27     ` Kamil Konieczny
  2018-08-30 13:59       ` Krzysztof Kozlowski
  1 sibling, 1 reply; 9+ messages in thread
From: Kamil Konieczny @ 2018-08-30 13:27 UTC (permalink / raw)
  To: Krzysztof Kozlowski, smueller; +Cc: herbert, davem, linux-crypto, linux-kernel

On 30.08.2018 15:09, Krzysztof Kozlowski wrote:
> [...]
> Thanks Stephan for hints. Let's assume the each of init, update and
> final are atomic... but how about the relation between update and
> final? I have two concurrent users in user-space but only one HW:
> 
> Process A:             Process B:
> init() and set_key()
>                        init() and different key
> update(some_data)
>                        update(different_data)
> final()
>                        final()
> 
> The final() from process A will now produce the result of hashing/CRC
> of some_data and different_data (and even maybe mixed with init() for
> different key). All because in the meantime process B added its own
> data to the HW.
> 
> Best regards,
> Krzysztof
Can your hardware do export/import ?

If yes, you can use workqueue and guard HW with spinlock,
as in exynos hash in s5p-sss.c (or see other drivers).

-- 
Best regards,
Kamil Konieczny
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 12:22 Locking for HW crypto accelerators Krzysztof Kozlowski
  2018-08-30 12:56 ` Stephan Mueller
@ 2018-08-30 13:39 ` Herbert Xu
  2018-08-30 14:00   ` Krzysztof Kozlowski
  1 sibling, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2018-08-30 13:39 UTC (permalink / raw)
  To: Krzysztof Kozlowski; +Cc: David S. Miller, linux-crypto, linux-kernel, smueller

On Thu, Aug 30, 2018 at 02:22:22PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> I am trying to figure out necessary locking on the driver side of
> crypto HW accelerator for symmetric hash (actually: CRC). I
> implemented quite simple driver for shash_alg.
> 
> I looked at the docs, I looked at the crypto kcapi core code... and
> there is nothing about necessary locking. kcapi does not perform it.
> 
> My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> only one HW set of registers for dealing with CRC. Or in other words,
> only one queue of one element. :) I implemented all shash_alg
> callbacks - init(), update(), final()... and also finup() (manually
> calling update+final) and digest() (init+update+final).
> 
> Now imagine multiple user-space users of this crypto alg where all of
> them call kcapi_md_digest() (so essentially init() -> update() ->
> final()). It seems that kcapi does not perform any locking here so at
> some point updates from different processes might be mixed with
> finals:
> 
> Process A:             Process B:
> init()
>                        init()
> update()
>                        update()
> final()
>                        final()
> 
> My findings show that the requests are indeed being mixed with each other...

After each operation all state must be stored in the ahash_request
object.  The next operation should then load the state from the
request object into the hardware.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 13:19     ` Stephan Mueller
@ 2018-08-30 13:54       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2018-08-30 13:54 UTC (permalink / raw)
  To: smueller; +Cc: herbert, davem, linux-crypto, linux-kernel

On Thu, 30 Aug 2018 at 15:19, Stephan Mueller <smueller@chronox.de> wrote:
>
> Am Donnerstag, 30. August 2018, 15:09:05 CEST schrieb Krzysztof Kozlowski:
>
> Hi Krzysztof,
>
> > Thanks Stephan for hints. Let's assume the each of init, update and
> > final are atomic... but how about the relation between update and
> > final? I have two concurrent users in user-space but only one HW:
> >
> > Process A:             Process B:
> > init() and set_key()
> >                        init() and different key
> > update(some_data)
> >                        update(different_data)
> > final()
> >                        final()
> >
> > The final() from process A will now produce the result of hashing/CRC
> > of some_data and different_data (and even maybe mixed with init() for
> > different key). All because in the meantime process B added its own
> > data to the HW.
>
> The question is where is the state of the cipher operation kept that is
> produced with each of the init/update/final calls. Your answer implies that
> this state is kept in hardware.

Yes, that's correct. The HW once initialized to specific CRC
parameters (size, polynomial, initial value), will operate on this
till another init happens.

> But commonly the state is kept in software. Look at ahash_request for example,
> the __ctx pointer is intended to point to the memory the driver needs to store
> its state.
>
> Pick a random driver implementation and search for ahash_request_ctx, for
> example.

I see now some drivers are indeed saving and restoring state.

Thanks!

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 13:27     ` Kamil Konieczny
@ 2018-08-30 13:59       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2018-08-30 13:59 UTC (permalink / raw)
  To: k.konieczny; +Cc: smueller, herbert, davem, linux-crypto, linux-kernel

On Thu, 30 Aug 2018 at 15:27, Kamil Konieczny
<k.konieczny@partner.samsung.com> wrote:
>
> On 30.08.2018 15:09, Krzysztof Kozlowski wrote:
> > [...]
> > Thanks Stephan for hints. Let's assume the each of init, update and
> > final are atomic... but how about the relation between update and
> > final? I have two concurrent users in user-space but only one HW:
> >
> > Process A:             Process B:
> > init() and set_key()
> >                        init() and different key
> > update(some_data)
> >                        update(different_data)
> > final()
> >                        final()
> >
> > The final() from process A will now produce the result of hashing/CRC
> > of some_data and different_data (and even maybe mixed with init() for
> > different key). All because in the meantime process B added its own
> > data to the HW.
> >
> > Best regards,
> > Krzysztof
> Can your hardware do export/import ?
>
> If yes, you can use workqueue and guard HW with spinlock,
> as in exynos hash in s5p-sss.c (or see other drivers).

Yes, workqueue doing all necessary HW initialization would be the
solution here as well. The point is what later Herbert wrote - I need
to take care about the HW state because the requests even for shash
will be coming in random order.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Locking for HW crypto accelerators
  2018-08-30 13:39 ` Herbert Xu
@ 2018-08-30 14:00   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2018-08-30 14:00 UTC (permalink / raw)
  To: herbert; +Cc: davem, linux-crypto, linux-kernel, smueller

On Thu, 30 Aug 2018 at 15:39, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Thu, Aug 30, 2018 at 02:22:22PM +0200, Krzysztof Kozlowski wrote:
> > Hi,
> >
> > I am trying to figure out necessary locking on the driver side of
> > crypto HW accelerator for symmetric hash (actually: CRC). I
> > implemented quite simple driver for shash_alg.
> >
> > I looked at the docs, I looked at the crypto kcapi core code... and
> > there is nothing about necessary locking. kcapi does not perform it.
> >
> > My HW is quite similar to drivers/crypto/stm32/stm32_crc32.c so it has
> > only one HW set of registers for dealing with CRC. Or in other words,
> > only one queue of one element. :) I implemented all shash_alg
> > callbacks - init(), update(), final()... and also finup() (manually
> > calling update+final) and digest() (init+update+final).
> >
> > Now imagine multiple user-space users of this crypto alg where all of
> > them call kcapi_md_digest() (so essentially init() -> update() ->
> > final()). It seems that kcapi does not perform any locking here so at
> > some point updates from different processes might be mixed with
> > finals:
> >
> > Process A:             Process B:
> > init()
> >                        init()
> > update()
> >                        update()
> > final()
> >                        final()
> >
> > My findings show that the requests are indeed being mixed with each other...
>
> After each operation all state must be stored in the ahash_request
> object.  The next operation should then load the state from the
> request object into the hardware.

Thanks, that's the solution I was missing.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-08-30 14:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30 12:22 Locking for HW crypto accelerators Krzysztof Kozlowski
2018-08-30 12:56 ` Stephan Mueller
2018-08-30 13:09   ` Krzysztof Kozlowski
2018-08-30 13:19     ` Stephan Mueller
2018-08-30 13:54       ` Krzysztof Kozlowski
2018-08-30 13:27     ` Kamil Konieczny
2018-08-30 13:59       ` Krzysztof Kozlowski
2018-08-30 13:39 ` Herbert Xu
2018-08-30 14:00   ` Krzysztof Kozlowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.