Getting weird TPM error after rebasing my tree to security/next-general

linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Getting weird TPM error after rebasing my tree to security/next-general
@ 2019-01-18 14:25 Jarkko Sakkinen
  2019-01-18 22:09 ` James Bottomley
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-18 14:25 UTC (permalink / raw)
  To: linux-integrity, linux-security-module; +Cc: linux-kernel

I get this on a Geminilake NUC after rebasing my maintainer trees:

tpm tpm0: A TPM error (-1) occurred attempting the self test

I checked the latest commit ID from drivers/char/tpm to make sure
that I did not put anything broken to my last PR [1]. It works
without issues.

In addition [2] gives me an empty diff.

Something outside of the TPM driver must have happened that breaks
the driver. Any ideas?

[1] commit 9488585b21bef0df1217e510c7134905d1d376a7
[2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master drivers/char/tpm/

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-18 14:25 Getting weird TPM error after rebasing my tree to security/next-general Jarkko Sakkinen
@ 2019-01-18 22:09 ` James Bottomley
  2019-01-20 16:04   ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: James Bottomley @ 2019-01-18 22:09 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-integrity, linux-security-module; +Cc: linux-kernel

On Fri, 2019-01-18 at 16:25 +0200, Jarkko Sakkinen wrote:
> I get this on a Geminilake NUC after rebasing my maintainer trees:
> 
> tpm tpm0: A TPM error (-1) occurred attempting the self test
> 
> I checked the latest commit ID from drivers/char/tpm to make sure
> that I did not put anything broken to my last PR [1]. It works
> without issues.
> 
> In addition [2] gives me an empty diff.
> 
> Something outside of the TPM driver must have happened that breaks
> the driver. Any ideas?
> 
> [1] commit 9488585b21bef0df1217e510c7134905d1d376a7
> [2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master
> drivers/char/tpm/

I'm afraid you're going to have to bisect to find the offending in-
kernel commit, which is going to be painful since it seems to depend on
physical hardware.  My first instinct is that we're getting a zero
length read somewhere, but I still can't see anything in the merge
window that would cause that behaviour.

James


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-18 22:09 ` James Bottomley
@ 2019-01-20 16:04   ` Jarkko Sakkinen
  2019-01-22  1:02     ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-20 16:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-integrity, linux-security-module, linux-kernel

On Fri, Jan 18, 2019 at 02:09:18PM -0800, James Bottomley wrote:
> On Fri, 2019-01-18 at 16:25 +0200, Jarkko Sakkinen wrote:
> > I get this on a Geminilake NUC after rebasing my maintainer trees:
> > 
> > tpm tpm0: A TPM error (-1) occurred attempting the self test
> > 
> > I checked the latest commit ID from drivers/char/tpm to make sure
> > that I did not put anything broken to my last PR [1]. It works
> > without issues.
> > 
> > In addition [2] gives me an empty diff.
> > 
> > Something outside of the TPM driver must have happened that breaks
> > the driver. Any ideas?
> > 
> > [1] commit 9488585b21bef0df1217e510c7134905d1d376a7
> > [2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master
> > drivers/char/tpm/
> 
> I'm afraid you're going to have to bisect to find the offending in-
> kernel commit, which is going to be painful since it seems to depend on
> physical hardware.  My first instinct is that we're getting a zero
> length read somewhere, but I still can't see anything in the merge
> window that would cause that behaviour.

Yeah, I've started to bisect it (still 9 rounds to go).

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-20 16:04   ` Jarkko Sakkinen
@ 2019-01-22  1:02     ` Jarkko Sakkinen
  2019-01-22  2:58       ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-22  1:02 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-integrity, linux-security-module, linux-kernel

On Sun, Jan 20, 2019 at 06:04:13PM +0200, Jarkko Sakkinen wrote:
> On Fri, Jan 18, 2019 at 02:09:18PM -0800, James Bottomley wrote:
> > On Fri, 2019-01-18 at 16:25 +0200, Jarkko Sakkinen wrote:
> > > I get this on a Geminilake NUC after rebasing my maintainer trees:
> > > 
> > > tpm tpm0: A TPM error (-1) occurred attempting the self test
> > > 
> > > I checked the latest commit ID from drivers/char/tpm to make sure
> > > that I did not put anything broken to my last PR [1]. It works
> > > without issues.
> > > 
> > > In addition [2] gives me an empty diff.
> > > 
> > > Something outside of the TPM driver must have happened that breaks
> > > the driver. Any ideas?
> > > 
> > > [1] commit 9488585b21bef0df1217e510c7134905d1d376a7
> > > [2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master
> > > drivers/char/tpm/
> > 
> > I'm afraid you're going to have to bisect to find the offending in-
> > kernel commit, which is going to be painful since it seems to depend on
> > physical hardware.  My first instinct is that we're getting a zero
> > length read somewhere, but I still can't see anything in the merge
> > window that would cause that behaviour.
> 
> Yeah, I've started to bisect it (still 9 rounds to go).

Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.

Double-checked this after bisecting by compiling the kernel with these
commits as tip.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-22  1:02     ` Jarkko Sakkinen
@ 2019-01-22  2:58       ` Jason Gunthorpe
  2019-01-22 13:29         ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2019-01-22  2:58 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: James Bottomley, linux-integrity, linux-security-module, linux-kernel

On Tue, Jan 22, 2019 at 03:02:18AM +0200, Jarkko Sakkinen wrote:
> On Sun, Jan 20, 2019 at 06:04:13PM +0200, Jarkko Sakkinen wrote:
> > On Fri, Jan 18, 2019 at 02:09:18PM -0800, James Bottomley wrote:
> > > On Fri, 2019-01-18 at 16:25 +0200, Jarkko Sakkinen wrote:
> > > > I get this on a Geminilake NUC after rebasing my maintainer trees:
> > > > 
> > > > tpm tpm0: A TPM error (-1) occurred attempting the self test
> > > > 
> > > > I checked the latest commit ID from drivers/char/tpm to make sure
> > > > that I did not put anything broken to my last PR [1]. It works
> > > > without issues.
> > > > 
> > > > In addition [2] gives me an empty diff.
> > > > 
> > > > Something outside of the TPM driver must have happened that breaks
> > > > the driver. Any ideas?
> > > > 
> > > > [1] commit 9488585b21bef0df1217e510c7134905d1d376a7
> > > > [2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master
> > > > drivers/char/tpm/
> > > 
> > > I'm afraid you're going to have to bisect to find the offending in-
> > > kernel commit, which is going to be painful since it seems to depend on
> > > physical hardware.  My first instinct is that we're getting a zero
> > > length read somewhere, but I still can't see anything in the merge
> > > window that would cause that behaviour.
> > 
> > Yeah, I've started to bisect it (still 9 rounds to go).
> 
> Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
> preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.

This changes the IO access pattern in memcpy_to/fromio.. Presumably
CRB HW doesn't like the new 4 byte move? Swap each one in crb to
memcpy to confirm..

If the HW requires particular access patterns you can't use
memcpy_to/fromio

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-22  2:58       ` Jason Gunthorpe
@ 2019-01-22 13:29         ` Jarkko Sakkinen
  2019-01-22 18:26           ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-22 13:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Linus Torvalds
  Cc: James Bottomley, linux-integrity, linux-security-module, linux-kernel

On Mon, Jan 21, 2019 at 07:58:36PM -0700, Jason Gunthorpe wrote:
> On Tue, Jan 22, 2019 at 03:02:18AM +0200, Jarkko Sakkinen wrote:
> > On Sun, Jan 20, 2019 at 06:04:13PM +0200, Jarkko Sakkinen wrote:
> > > On Fri, Jan 18, 2019 at 02:09:18PM -0800, James Bottomley wrote:
> > > > On Fri, 2019-01-18 at 16:25 +0200, Jarkko Sakkinen wrote:
> > > > > I get this on a Geminilake NUC after rebasing my maintainer trees:
> > > > > 
> > > > > tpm tpm0: A TPM error (-1) occurred attempting the self test
> > > > > 
> > > > > I checked the latest commit ID from drivers/char/tpm to make sure
> > > > > that I did not put anything broken to my last PR [1]. It works
> > > > > without issues.
> > > > > 
> > > > > In addition [2] gives me an empty diff.
> > > > > 
> > > > > Something outside of the TPM driver must have happened that breaks
> > > > > the driver. Any ideas?
> > > > > 
> > > > > [1] commit 9488585b21bef0df1217e510c7134905d1d376a7
> > > > > [2] git diff 9488585b21bef0df1217e510c7134905d1d376a7 master
> > > > > drivers/char/tpm/
> > > > 
> > > > I'm afraid you're going to have to bisect to find the offending in-
> > > > kernel commit, which is going to be painful since it seems to depend on
> > > > physical hardware.  My first instinct is that we're getting a zero
> > > > length read somewhere, but I still can't see anything in the merge
> > > > window that would cause that behaviour.
> > > 
> > > Yeah, I've started to bisect it (still 9 rounds to go).
> > 
> > Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
> > preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.
> 
> This changes the IO access pattern in memcpy_to/fromio.. Presumably
> CRB HW doesn't like the new 4 byte move? Swap each one in crb to
> memcpy to confirm..
> 
> If the HW requires particular access patterns you can't use
> memcpy_to/fromio

Did not have time to look at the commit at all but your deduction
is correct. I know it without testing.

Memory controller will feed 1's on unaligned read from IO memory,
and as we can see from the TPM header, this change causes two of
those:

struct tpm_output_header {
	__be16	tag;
	__be32	length;
	__be32	return_code;
} __packed;

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-22 13:29         ` Jarkko Sakkinen
@ 2019-01-22 18:26           ` Linus Torvalds
  2019-01-23 15:36             ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-01-22 18:26 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Wed, Jan 23, 2019 at 2:29 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> > >
> > > Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
> > > preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.
> >
> > This changes the IO access pattern in memcpy_to/fromio.. Presumably
> > CRB HW doesn't like the new 4 byte move? Swap each one in crb to
> > memcpy to confirm..
> >
> > If the HW requires particular access patterns you can't use
> > memcpy_to/fromio
>
> Did not have time to look at the commit at all but your deduction
> is correct. I know it without testing.
>
> Memory controller will feed 1's on unaligned read from IO memory,
> and as we can see from the TPM header, this change causes two of
> those:

Funky. But how did it work before then?

The new memcpy_fromio() is designed to have _predictable_ access
patterns. Not necessarily the best, but at least consistent.

Prevously, we used whatever random "memcpy()" implementation we
happened to pick, which *could* be aligned (particularly "rep movsb" -
absolutely horrible performance for MMIO, but by doing IO one byte at
a time it was certainly aligned ;), but most of our x86 memcpy
implementations don't actually try all that hard to align the source.
And the manual version will actually copy things *backwards* for some
cases.

Is it just that this particular hardware always happened to trigger
the ERMS case (ie "rep movsb")?

Anyway, Jason is correct that if a device has particular IO pattern
requirements, you shouldn't use "memcpy_fromio()" and friends, but
it's interesting how it apparently *happened* to work before.

             Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-22 18:26           ` Linus Torvalds
@ 2019-01-23 15:36             ` Jarkko Sakkinen
  2019-01-23 18:43               ` Linus Torvalds
  2019-01-23 20:11               ` Jarkko Sakkinen
  0 siblings, 2 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-23 15:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Wed, Jan 23, 2019 at 07:26:42AM +1300, Linus Torvalds wrote:
> On Wed, Jan 23, 2019 at 2:29 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > > >
> > > > Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
> > > > preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.
> > >
> > > This changes the IO access pattern in memcpy_to/fromio.. Presumably
> > > CRB HW doesn't like the new 4 byte move? Swap each one in crb to
> > > memcpy to confirm..
> > >
> > > If the HW requires particular access patterns you can't use
> > > memcpy_to/fromio
> >
> > Did not have time to look at the commit at all but your deduction
> > is correct. I know it without testing.
> >
> > Memory controller will feed 1's on unaligned read from IO memory,
> > and as we can see from the TPM header, this change causes two of
> > those:
> 
> Funky. But how did it work before then?
> 
> The new memcpy_fromio() is designed to have _predictable_ access
> patterns. Not necessarily the best, but at least consistent.
> 
> Prevously, we used whatever random "memcpy()" implementation we
> happened to pick, which *could* be aligned (particularly "rep movsb" -
> absolutely horrible performance for MMIO, but by doing IO one byte at
> a time it was certainly aligned ;), but most of our x86 memcpy
> implementations don't actually try all that hard to align the source.
> And the manual version will actually copy things *backwards* for some
> cases.
> 
> Is it just that this particular hardware always happened to trigger
> the ERMS case (ie "rep movsb")?

This is the particular snippet in question:

memcpy_fromio(buf, priv->rsp, 6);
expected = be32_to_cpup((__be32 *) &buf[2]);
if (expected > count || expected < 6)
	return -EIO;

memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

I guess it did in the first memcpy_fromio operation since it is less
than a quad word, right? Not sure why the 2nd memcpy_fromio() operation
has worked, though.

> Anyway, Jason is correct that if a device has particular IO pattern
> requirements, you shouldn't use "memcpy_fromio()" and friends, but
> it's interesting how it apparently *happened* to work before.
> 
>              Linus

Sure, I'll prepare a fix ASAP.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-23 15:36             ` Jarkko Sakkinen
@ 2019-01-23 18:43               ` Linus Torvalds
  2019-01-29 13:20                 ` Jarkko Sakkinen
  2019-01-23 20:11               ` Jarkko Sakkinen
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-01-23 18:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
> >
> > Is it just that this particular hardware always happened to trigger
> > the ERMS case (ie "rep movsb")?
>
> This is the particular snippet in question:
>
> memcpy_fromio(buf, priv->rsp, 6);
> expected = be32_to_cpup((__be32 *) &buf[2]);
> if (expected > count || expected < 6)
>         return -EIO;

Ok, strange.

So what *used* to happen is that the memcpy_fromio() would just expand
as a "memcpy()", and in this case, gcc would then inline the memcpy().
In fact, gcc does it as a 4-byte access and a two-byte access from
what I can tell.

Which is actually exactly the same as memcpy_fromio() should do, just
using a different code sequence.

> memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

This one gets turned into an out-of-line "memcpy()" in the old world
order, which depending on size will do different things, but might be
a "rep movsb". Or it might be the software expansion that does
overlapping accesses and/or backwards copies.

In the new world order, it's the "memcpy_fromio()" that willdo first
4-byte accesses for the main bulk of the copy, and then end up with a
two-byte and single-byte move to pad out the end.

> I guess it did in the first memcpy_fromio operation since it is less
> than a quad word, right? Not sure why the 2nd memcpy_fromio() operation
> has worked, though.

The first one seems to do the same thing now as it used to do, so I
don't *think* it should have mattered.

The second one looks like it is unaligned (offset 6) and doing the
4-byte io reads would fail if that device needs aligned accesses. The
old memcpy() *might* have done it with a "rep movsb" that would just
work (?).

                 Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-23 15:36             ` Jarkko Sakkinen
  2019-01-23 18:43               ` Linus Torvalds
@ 2019-01-23 20:11               ` Jarkko Sakkinen
  1 sibling, 0 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-23 20:11 UTC (permalink / raw)
  To: Linus Torvalds, tomas.winkler
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Wed, Jan 23, 2019 at 05:36:38PM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 23, 2019 at 07:26:42AM +1300, Linus Torvalds wrote:
> > On Wed, Jan 23, 2019 at 2:29 AM Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > >
> > > > >
> > > > > Fails on commit 170d13ca3a2fdaaa0283399247631b76b441cca2. Still works on
> > > > > preceding commit a959dc88f9c8900296ccf13e2f3e1cbc555a8917.
> > > >
> > > > This changes the IO access pattern in memcpy_to/fromio.. Presumably
> > > > CRB HW doesn't like the new 4 byte move? Swap each one in crb to
> > > > memcpy to confirm..
> > > >
> > > > If the HW requires particular access patterns you can't use
> > > > memcpy_to/fromio
> > >
> > > Did not have time to look at the commit at all but your deduction
> > > is correct. I know it without testing.
> > >
> > > Memory controller will feed 1's on unaligned read from IO memory,
> > > and as we can see from the TPM header, this change causes two of
> > > those:
> > 
> > Funky. But how did it work before then?
> > 
> > The new memcpy_fromio() is designed to have _predictable_ access
> > patterns. Not necessarily the best, but at least consistent.
> > 
> > Prevously, we used whatever random "memcpy()" implementation we
> > happened to pick, which *could* be aligned (particularly "rep movsb" -
> > absolutely horrible performance for MMIO, but by doing IO one byte at
> > a time it was certainly aligned ;), but most of our x86 memcpy
> > implementations don't actually try all that hard to align the source.
> > And the manual version will actually copy things *backwards* for some
> > cases.
> > 
> > Is it just that this particular hardware always happened to trigger
> > the ERMS case (ie "rep movsb")?
> 
> This is the particular snippet in question:
> 
> memcpy_fromio(buf, priv->rsp, 6);
> expected = be32_to_cpup((__be32 *) &buf[2]);
> if (expected > count || expected < 6)
> 	return -EIO;
> 
> memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
> 
> I guess it did in the first memcpy_fromio operation since it is less
> than a quad word, right? Not sure why the 2nd memcpy_fromio() operation
> has worked, though.

And I wonder why 32-bit has worked before.

Tomas, you've been more involved with ME and fTPM runs there. Do you
have any clues where this could be rooted?

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-23 18:43               ` Linus Torvalds
@ 2019-01-29 13:20                 ` Jarkko Sakkinen
  2019-01-31 12:26                   ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-29 13:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> > >
> > > Is it just that this particular hardware always happened to trigger
> > > the ERMS case (ie "rep movsb")?
> >
> > This is the particular snippet in question:
> >
> > memcpy_fromio(buf, priv->rsp, 6);
> > expected = be32_to_cpup((__be32 *) &buf[2]);
> > if (expected > count || expected < 6)
> >         return -EIO;
> 
> Ok, strange.
> 
> So what *used* to happen is that the memcpy_fromio() would just expand
> as a "memcpy()", and in this case, gcc would then inline the memcpy().
> In fact, gcc does it as a 4-byte access and a two-byte access from
> what I can tell.

I verified, and it is exactly as you stated:

   0xffffffff814aaa33 <+51>:	mov    (%rax),%edx
   0xffffffff814aaa35 <+53>:	mov    %edx,0x0(%rbp)
   0xffffffff814aaa38 <+56>:	movzwl 0x4(%rax),%eax
   0xffffffff814aaa3c <+60>:	mov    %ax,0x4(%rbp)

And your new version does exactly the same thing to the first six bytes
(with different opcode, but the same memory access pattern).

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-29 13:20                 ` Jarkko Sakkinen
@ 2019-01-31 12:26                   ` Jarkko Sakkinen
  2019-01-31 16:04                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 12:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing, tomas.winkler

On Tue, Jan 29, 2019 at 03:20:16PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> > On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > > >
> > > > Is it just that this particular hardware always happened to trigger
> > > > the ERMS case (ie "rep movsb")?
> > >
> > > This is the particular snippet in question:
> > >
> > > memcpy_fromio(buf, priv->rsp, 6);
> > > expected = be32_to_cpup((__be32 *) &buf[2]);
> > > if (expected > count || expected < 6)
> > >         return -EIO;
> > 
> > Ok, strange.
> > 
> > So what *used* to happen is that the memcpy_fromio() would just expand
> > as a "memcpy()", and in this case, gcc would then inline the memcpy().
> > In fact, gcc does it as a 4-byte access and a two-byte access from
> > what I can tell.
> 
> I verified, and it is exactly as you stated:
> 
>    0xffffffff814aaa33 <+51>:	mov    (%rax),%edx
>    0xffffffff814aaa35 <+53>:	mov    %edx,0x0(%rbp)
>    0xffffffff814aaa38 <+56>:	movzwl 0x4(%rax),%eax
>    0xffffffff814aaa3c <+60>:	mov    %ax,0x4(%rbp)
> 
> And your new version does exactly the same thing to the first six bytes
> (with different opcode, but the same memory access pattern).

I think I have found the root cause:

memcpy_fromio(&__rsp_pa, &priv->regs_t->ctrl_rsp_pa, 8);

This is from crb_map_io(). This should be read as quad word.

I'll change it to ioread64() and see what happens. I don't know why it
even has used memcpy_fromio() in the first place. I guess, when I first
implemented the driver, I used that for no logical reason, and it has
worked since up until now.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 12:26                   ` Jarkko Sakkinen
@ 2019-01-31 16:04                     ` Jarkko Sakkinen
  2019-01-31 17:06                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 16:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing, tomas.winkler

On Thu, Jan 31, 2019 at 02:26:06PM +0200, Jarkko Sakkinen wrote:
> On Tue, Jan 29, 2019 at 03:20:16PM +0200, Jarkko Sakkinen wrote:
> > On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> > > On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> > > <jarkko.sakkinen@linux.intel.com> wrote:
> > > > >
> > > > > Is it just that this particular hardware always happened to trigger
> > > > > the ERMS case (ie "rep movsb")?
> > > >
> > > > This is the particular snippet in question:
> > > >
> > > > memcpy_fromio(buf, priv->rsp, 6);
> > > > expected = be32_to_cpup((__be32 *) &buf[2]);
> > > > if (expected > count || expected < 6)
> > > >         return -EIO;
> > > 
> > > Ok, strange.
> > > 
> > > So what *used* to happen is that the memcpy_fromio() would just expand
> > > as a "memcpy()", and in this case, gcc would then inline the memcpy().
> > > In fact, gcc does it as a 4-byte access and a two-byte access from
> > > what I can tell.
> > 
> > I verified, and it is exactly as you stated:
> > 
> >    0xffffffff814aaa33 <+51>:	mov    (%rax),%edx
> >    0xffffffff814aaa35 <+53>:	mov    %edx,0x0(%rbp)
> >    0xffffffff814aaa38 <+56>:	movzwl 0x4(%rax),%eax
> >    0xffffffff814aaa3c <+60>:	mov    %ax,0x4(%rbp)
> > 
> > And your new version does exactly the same thing to the first six bytes
> > (with different opcode, but the same memory access pattern).
> 
> I think I have found the root cause:
> 
> memcpy_fromio(&__rsp_pa, &priv->regs_t->ctrl_rsp_pa, 8);
> 
> This is from crb_map_io(). This should be read as quad word.
> 
> I'll change it to ioread64() and see what happens. I don't know why it
> even has used memcpy_fromio() in the first place. I guess, when I first
> implemented the driver, I used that for no logical reason, and it has
> worked since up until now.

No, cannot be it. If you couldn't read it in two dwords, then it would
have been always broken with 32-bit build.

Anyway, just in case, I will check what address it prints out.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 16:04                     ` Jarkko Sakkinen
@ 2019-01-31 17:06                       ` Jarkko Sakkinen
  2019-01-31 17:43                         ` Linus Torvalds
  2019-01-31 18:35                         ` Jarkko Sakkinen
  0 siblings, 2 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 17:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing, tomas.winkler

On Thu, Jan 31, 2019 at 06:04:37PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 31, 2019 at 02:26:06PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Jan 29, 2019 at 03:20:16PM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Jan 24, 2019 at 07:43:30AM +1300, Linus Torvalds wrote:
> > > > On Thu, Jan 24, 2019 at 4:36 AM Jarkko Sakkinen
> > > > <jarkko.sakkinen@linux.intel.com> wrote:
> > > > > >
> > > > > > Is it just that this particular hardware always happened to trigger
> > > > > > the ERMS case (ie "rep movsb")?
> > > > >
> > > > > This is the particular snippet in question:
> > > > >
> > > > > memcpy_fromio(buf, priv->rsp, 6);
> > > > > expected = be32_to_cpup((__be32 *) &buf[2]);
> > > > > if (expected > count || expected < 6)
> > > > >         return -EIO;
> > > > 
> > > > Ok, strange.
> > > > 
> > > > So what *used* to happen is that the memcpy_fromio() would just expand
> > > > as a "memcpy()", and in this case, gcc would then inline the memcpy().
> > > > In fact, gcc does it as a 4-byte access and a two-byte access from
> > > > what I can tell.
> > > 
> > > I verified, and it is exactly as you stated:
> > > 
> > >    0xffffffff814aaa33 <+51>:	mov    (%rax),%edx
> > >    0xffffffff814aaa35 <+53>:	mov    %edx,0x0(%rbp)
> > >    0xffffffff814aaa38 <+56>:	movzwl 0x4(%rax),%eax
> > >    0xffffffff814aaa3c <+60>:	mov    %ax,0x4(%rbp)
> > > 
> > > And your new version does exactly the same thing to the first six bytes
> > > (with different opcode, but the same memory access pattern).
> > 
> > I think I have found the root cause:
> > 
> > memcpy_fromio(&__rsp_pa, &priv->regs_t->ctrl_rsp_pa, 8);
> > 
> > This is from crb_map_io(). This should be read as quad word.
> > 
> > I'll change it to ioread64() and see what happens. I don't know why it
> > even has used memcpy_fromio() in the first place. I guess, when I first
> > implemented the driver, I used that for no logical reason, and it has
> > worked since up until now.
> 
> No, cannot be it. If you couldn't read it in two dwords, then it would
> have been always broken with 32-bit build.
> 
> Anyway, just in case, I will check what address it prints out.

Found something that *does* fix the issue. If I replace memcpy_*io()
calls with regular memcpy(), the driver works and all my tests pass.

/Jarkko


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 17:06                       ` Jarkko Sakkinen
@ 2019-01-31 17:43                         ` Linus Torvalds
  2019-01-31 18:47                           ` Jarkko Sakkinen
  2019-01-31 18:35                         ` Jarkko Sakkinen
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-01-31 17:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing, tomas.winkler

On Thu, Jan 31, 2019 at 9:06 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> Found something that *does* fix the issue. If I replace memcpy_*io()
> calls with regular memcpy(), the driver works and all my tests pass.

That's not surprising, since that's what we used to do. And it's
horribly wrong because "memcpy()" can do things that are horribly
wrong on IO accesses. Like doing them twice, but alternatively also
"copy one byte at a time" which generally works, but is horrendously
slow for IO.

Can you check *which* memcpy_*io() triggers the issue? Maybe by
"bisecting" them (first perhaps on a file-by-file basis, and then
within a file).

                 Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 17:06                       ` Jarkko Sakkinen
  2019-01-31 17:43                         ` Linus Torvalds
@ 2019-01-31 18:35                         ` Jarkko Sakkinen
  2019-01-31 18:51                           ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 18:35 UTC (permalink / raw)
  To: Linus Torvalds, tomas.winkler
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

On Thu, Jan 31, 2019 at 07:06:03PM +0200, Jarkko Sakkinen wrote:
> Found something that *does* fix the issue. If I replace memcpy_*io()
> calls with regular memcpy(), the driver works and all my tests pass.

OK, so the length of the response is not trashed, but only the error
code. The attached patch fully fixes the issue.

Here's the header again:

struct tpm_output_header {
	__be16	tag;
	__be32	length;
	__be32	return_code;
} __packed;

The first to fields *are* read correctly and the last field get 1's
(thus TPM error -1).

With the attached the patch things work properly, but still
unsatisfactory fix (return to old behavior because it seems to
work).

/Jarkko

[-- Attachment #2: 0001-tpm-tpm_crb-Revert-to-memcpy-for-copying-tail-of-the.patch --]
[-- Type: text/x-diff, Size: 901 bytes --]

From 8ff7eb66e1dc05daf1319bf8365e5a3434a90061 Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Date: Thu, 31 Jan 2019 20:16:00 +0200
Subject: [PATCH] tpm/tpm_crb: Revert to memcpy() for copying tail of the
 response

Revert the behavior before 170d13ca3a2f.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 drivers/char/tpm/tpm_crb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 36952ef98f90..003923ea50d2 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -299,7 +299,7 @@ static int crb_recv(struct tpm_chip *chip, u8 *buf, size_t count)
 	if (expected > count || expected < 6)
 		return -EIO;
 
-	memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
+	memcpy(&buf[6], &priv->rsp[6], expected - 6);
 
 	return expected;
 }
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 17:43                         ` Linus Torvalds
@ 2019-01-31 18:47                           ` Jarkko Sakkinen
  0 siblings, 0 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing, tomas.winkler

On Thu, Jan 31, 2019 at 09:43:42AM -0800, Linus Torvalds wrote:
> On Thu, Jan 31, 2019 at 9:06 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > Found something that *does* fix the issue. If I replace memcpy_*io()
> > calls with regular memcpy(), the driver works and all my tests pass.
> 
> That's not surprising, since that's what we used to do. And it's
> horribly wrong because "memcpy()" can do things that are horribly
> wrong on IO accesses. Like doing them twice, but alternatively also
> "copy one byte at a time" which generally works, but is horrendously
> slow for IO.

Yup, was just a sanity check.

> Can you check *which* memcpy_*io() triggers the issue? Maybe by
> "bisecting" them (first perhaps on a file-by-file basis, and then
> within a file).

Already did that. See my follow-up.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 18:35                         ` Jarkko Sakkinen
@ 2019-01-31 18:51                           ` Linus Torvalds
  2019-01-31 18:52                             ` Linus Torvalds
  2019-01-31 20:45                             ` Jarkko Sakkinen
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2019-01-31 18:51 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 10:35 AM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> OK, so the length of the response is not trashed, but only the error
> code. The attached patch fully fixes the issue.
>
> Here's the header again:
>
> struct tpm_output_header {
>         __be16  tag;
>         __be32  length;
>         __be32  return_code;
> } __packed;
>
> The first to fields *are* read correctly and the last field get 1's
> (thus TPM error -1).

Ok, so this makes sense, even though that patch is (I think) completely wrong.

What happens is that the 32-bit fields are mis-aligned: the "tag" is
obviously properly 16-bit aligned, but then both "length" and
"return_code" are 32-bit fields that are only aligned on a 16-bit
alignment.

What happens is that first you copy the two first fields:

        memcpy_fromio(buf, priv->rsp, 6);

which copies "tag" and "length", but it copies them by reading then as
a 4-byte and then 2-byte value (in that order). So it actually reads
'tag' and 'first two bytes of 'length', and then the second access
reads the last two bytes of 'length'

And it all works, because the accesses are aligned by address of
access, even though they are *not* aligned in the 'struct
tpm_output_header' fields.

But then later on, when you read 'return_code', and do

        memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

you now do a 4-byte memcpy at offset 6. So it does a 4-byte access,
bit it's not 4-byte aligned.

Honestly, memcpy() itself shouldn't have worked *either*, but you
lucked out. Gcc doesn't know that it's a 4-byte access, so it actually
calls out to the memcpy() routine. And that one happened to be "rep
movsb" on your machine. And that happened to work.

But it's really not supposed to work, and it really *wouldn't* have
worked if somebody disabled the rep-string functions.

In fact, we have another patch (that isn't applied) that makes even
the memcpy_erms() just call the sw version of memcpy() for short
copies (because "rep movsb" is slow for those cases). That would also
have broken your driver.

               Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 18:51                           ` Linus Torvalds
@ 2019-01-31 18:52                             ` Linus Torvalds
  2019-01-31 19:10                               ` Linus Torvalds
  2019-01-31 20:45                             ` Jarkko Sakkinen
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-01-31 18:52 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 10:51 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In fact, we have another patch (that isn't applied) that makes even
> the memcpy_erms() just call the sw version of memcpy() for short
> copies (because "rep movsb" is slow for those cases). That would also
> have broken your driver.

I think what I should do is to just make "memcpy_*io()" do the "align
naturally" thing.

Let me cook up a patch for you to test.

              Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 18:52                             ` Linus Torvalds
@ 2019-01-31 19:10                               ` Linus Torvalds
  2019-01-31 19:47                                 ` Winkler, Tomas
                                                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Linus Torvalds @ 2019-01-31 19:10 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

[-- Attachment #1: Type: text/plain, Size: 385 bytes --]

On Thu, Jan 31, 2019 at 10:52 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think what I should do is to just make "memcpy_*io()" do the "align
> naturally" thing.
>
> Let me cook up a patch for you to test.

Does this work for you?

I haven't tested it at all, but I verified that the generated code
seems to make at least some amount of sense.

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1595 bytes --]

 arch/x86/lib/iomem.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index 66894675f3c8..7b5b07f3d370 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -3,7 +3,7 @@
 #include <linux/io.h>
 
 /* Originally from i386/string.h */
-static __always_inline void __iomem_memcpy(void *to, const void *from, size_t n)
+static __always_inline void rep_movs(void *to, const void *from, size_t n)
 {
 	unsigned long d0, d1, d2;
 	asm volatile("rep ; movsl\n\t"
@@ -19,15 +19,36 @@ static __always_inline void __iomem_memcpy(void *to, const void *from, size_t n)
 		     : "memory");
 }
 
+#define movs(type,to,from) \
+	asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
+
 void memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
 {
-	__iomem_memcpy(to, (const void *)from, n);
+	/* Unaligned IO? */
+	if (n && unlikely(1 & (unsigned long)from)) {
+		movs("b", to, from);
+		n--;
+	}
+	if (n > 1 && unlikely(2 & (unsigned long)from)) {
+		movs("w", to, from);
+		n-=2;
+	}
+	rep_movs(to, (const void *)from, n);
 }
 EXPORT_SYMBOL(memcpy_fromio);
 
 void memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
 {
-	__iomem_memcpy((void *)to, (const void *) from, n);
+	/* Unaligned IO? */
+	if (n && unlikely(1 & (unsigned long)to)) {
+		movs("b", to, from);
+		n--;
+	}
+	if (n > 1 && unlikely(2 & (unsigned long)to)) {
+		movs("w", to, from);
+		n-=2;
+	}
+	rep_movs((void *)to, (const void *) from, n);
 }
 EXPORT_SYMBOL(memcpy_toio);
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 19:10                               ` Linus Torvalds
@ 2019-01-31 19:47                                 ` Winkler, Tomas
  2019-02-01  8:12                                   ` Jarkko Sakkinen
  2019-01-31 20:07                                 ` Winkler, Tomas
  2019-01-31 20:47                                 ` Jarkko Sakkinen
  2 siblings, 1 reply; 30+ messages in thread
From: Winkler, Tomas @ 2019-01-31 19:47 UTC (permalink / raw)
  To: Linus Torvalds, Jarkko Sakkinen
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing


> 
> On Thu, Jan 31, 2019 at 10:52 AM Linus Torvalds <torvalds@linux-
> foundation.org> wrote:
> >
> > I think what I should do is to just make "memcpy_*io()" do the "align
> > naturally" thing.
> >
> > Let me cook up a patch for you to test.
> 
> Does this work for you?
> 
> I haven't tested it at all, but I verified that the generated code seems to make
> at least some amount of sense.
> 
>                Linus

So dig into the spec and I think this is a bit relevant. 

TPM TCG according the spec requires that all buffer access is done sequentially from the start to end of the payload, 
Simply In case of skipping or going back the transaction is aborted. 
The write transactions should be 1 or power of 2. So in general 6 byte read should not work. But I'm sure our hw really obey this restriction.

Thanks
Tomas 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 19:10                               ` Linus Torvalds
  2019-01-31 19:47                                 ` Winkler, Tomas
@ 2019-01-31 20:07                                 ` Winkler, Tomas
  2019-01-31 20:47                                 ` Jarkko Sakkinen
  2 siblings, 0 replies; 30+ messages in thread
From: Winkler, Tomas @ 2019-01-31 20:07 UTC (permalink / raw)
  To: Linus Torvalds, Jarkko Sakkinen
  Cc: Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing



> -----Original Message-----
> From: Winkler, Tomas
> Sent: Thursday, January 31, 2019 21:47
> To: 'Linus Torvalds' <torvalds@linux-foundation.org>; Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; James Bottomley
> <James.Bottomley@hansenpartnership.com>; linux-integrity@vger.kernel.org;
> linux-security-module@vger.kernel.org; Linux List Kernel Mailing <linux-
> kernel@vger.kernel.org>
> Subject: RE: Getting weird TPM error after rebasing my tree to security/next-
> general
> 
> 
> >
> > On Thu, Jan 31, 2019 at 10:52 AM Linus Torvalds <torvalds@linux-
> > foundation.org> wrote:
> > >
> > > I think what I should do is to just make "memcpy_*io()" do the
> > > "align naturally" thing.
> > >
> > > Let me cook up a patch for you to test.
> >
> > Does this work for you?
> >
> > I haven't tested it at all, but I verified that the generated code
> > seems to make at least some amount of sense.
> >
> >                Linus
> 
> So dig into the spec and I think this is a bit relevant.
> 
> TPM TCG according the spec requires that all buffer access is done sequentially
> from the start to end of the payload, Simply In case of skipping or going back
> the transaction is aborted.
> The write transactions should be 1 or power of 2. So in general 6 byte read
> should not work. But I'm sure our hw really obey this restriction.

For alignment of register access the spec specifies:

Some instantiating hardware may have size and alignment restrictions when accessing any of the fields in this interface. Some hardware may require access to any of data within a field or buffer be performed by reads or writes which are naturally aligned. For this reason, software should access the fields and buffers defined in this interface using instructions which do not cause an access to cross an alignment boundary and should not use string move instructions

This is for register address, I'm not sure this spans also for request response buffer, but from the behavior looks like it applies there two.

 
Thanks
Tomas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 18:51                           ` Linus Torvalds
  2019-01-31 18:52                             ` Linus Torvalds
@ 2019-01-31 20:45                             ` Jarkko Sakkinen
  2019-02-01 18:04                               ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 20:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 10:51:03AM -0800, Linus Torvalds wrote:
> On Thu, Jan 31, 2019 at 10:35 AM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > OK, so the length of the response is not trashed, but only the error
> > code. The attached patch fully fixes the issue.
> >
> > Here's the header again:
> >
> > struct tpm_output_header {
> >         __be16  tag;
> >         __be32  length;
> >         __be32  return_code;
> > } __packed;
> >
> > The first to fields *are* read correctly and the last field get 1's
> > (thus TPM error -1).
> 
> Ok, so this makes sense, even though that patch is (I think) completely wrong.

I don't disagree on that :-) Just pinpointed the location where it
fails.

> What happens is that the 32-bit fields are mis-aligned: the "tag" is
> obviously properly 16-bit aligned, but then both "length" and
> "return_code" are 32-bit fields that are only aligned on a 16-bit
> alignment.
> 
> What happens is that first you copy the two first fields:
> 
>         memcpy_fromio(buf, priv->rsp, 6);
> 
> which copies "tag" and "length", but it copies them by reading then as
> a 4-byte and then 2-byte value (in that order). So it actually reads
> 'tag' and 'first two bytes of 'length', and then the second access
> reads the last two bytes of 'length'
> 
> And it all works, because the accesses are aligned by address of
> access, even though they are *not* aligned in the 'struct
> tpm_output_header' fields.

Right, they are still naturally aligned accesses.

> But then later on, when you read 'return_code', and do
> 
>         memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
> 
> you now do a 4-byte memcpy at offset 6. So it does a 4-byte access,
> bit it's not 4-byte aligned.
> 
> Honestly, memcpy() itself shouldn't have worked *either*, but you
> lucked out. Gcc doesn't know that it's a 4-byte access, so it actually
> calls out to the memcpy() routine. And that one happened to be "rep
> movsb" on your machine. And that happened to work.

I understand what you mean. Just surprised that this hasn't failed
before to anyone (the same driver has been even successfully used
on ARM64 with TrustZone based fTPM implementation). It has been in
for three years now.

> But it's really not supposed to work, and it really *wouldn't* have
> worked if somebody disabled the rep-string functions.
> 
> In fact, we have another patch (that isn't applied) that makes even
> the memcpy_erms() just call the sw version of memcpy() for short
> copies (because "rep movsb" is slow for those cases). That would also
> have broken your driver.
> 
>                Linus

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 19:10                               ` Linus Torvalds
  2019-01-31 19:47                                 ` Winkler, Tomas
  2019-01-31 20:07                                 ` Winkler, Tomas
@ 2019-01-31 20:47                                 ` Jarkko Sakkinen
  2019-01-31 21:58                                   ` Linus Torvalds
  2 siblings, 1 reply; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-01-31 20:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 11:10:20AM -0800, Linus Torvalds wrote:
> On Thu, Jan 31, 2019 at 10:52 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I think what I should do is to just make "memcpy_*io()" do the "align
> > naturally" thing.
> >
> > Let me cook up a patch for you to test.
> 
> Does this work for you?
> 
> I haven't tested it at all, but I verified that the generated code
> seems to make at least some amount of sense.

I'll try it first thing when I wake up tomorrow (11PM in Finland ATM).
Appreciate for taking time on this. Thank you.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 20:47                                 ` Jarkko Sakkinen
@ 2019-01-31 21:58                                   ` Linus Torvalds
  2019-01-31 23:31                                     ` Jerry Snitselaar
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-01-31 21:58 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: tomas.winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 12:47 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> I'll try it first thing when I wake up tomorrow (11PM in Finland ATM).

Thanks.

> Appreciate for taking time on this.

Hey, it was my commit that broke it for you. Even if it happened to
work before, and only did so by pure luck, it was a functional
regression.

I get very upset when other developers don't step up when *their*
changes break something, and I don't consider "it shouldn't have
worked in the first place" to be a valid excuse. You broke it, you'd
better fix it.

So I had better fix my own mess too, in order to not look too hypocritical.

And I was very aware that hardcoding the memcpy_*io() access patterns
might break something. I just _hoped_ it wouldn't, because we actually
ended up going back to the very original access patterns (but it was
from a long long time ago).

In fact, while it's slightly annoying, in many ways it's actually good
that we found breakage, and could pinpoint exactly *why* it broke.
That does validate the whole "we shouldn't just depend on the random
implementation detail of 'memcpy()'" argument.

So I'll wait to hear back whether that patch fixes things for you, but
I _think_ it will, and we'll be better off in the long range with this
whole thing.

              Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 21:58                                   ` Linus Torvalds
@ 2019-01-31 23:31                                     ` Jerry Snitselaar
  2019-02-01 11:40                                       ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Jerry Snitselaar @ 2019-01-31 23:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jarkko Sakkinen, tomas.winkler, Jason Gunthorpe, James Bottomley,
	linux-integrity, linux-security-module,
	Linux List Kernel Mailing

On Thu Jan 31 19, Linus Torvalds wrote:
>On Thu, Jan 31, 2019 at 12:47 PM Jarkko Sakkinen
><jarkko.sakkinen@linux.intel.com> wrote:
>>
>> I'll try it first thing when I wake up tomorrow (11PM in Finland ATM).
>
>Thanks.
>
>> Appreciate for taking time on this.
>
>Hey, it was my commit that broke it for you. Even if it happened to
>work before, and only did so by pure luck, it was a functional
>regression.
>
>I get very upset when other developers don't step up when *their*
>changes break something, and I don't consider "it shouldn't have
>worked in the first place" to be a valid excuse. You broke it, you'd
>better fix it.
>
>So I had better fix my own mess too, in order to not look too hypocritical.
>
>And I was very aware that hardcoding the memcpy_*io() access patterns
>might break something. I just _hoped_ it wouldn't, because we actually
>ended up going back to the very original access patterns (but it was
>from a long long time ago).
>
>In fact, while it's slightly annoying, in many ways it's actually good
>that we found breakage, and could pinpoint exactly *why* it broke.
>That does validate the whole "we shouldn't just depend on the random
>implementation detail of 'memcpy()'" argument.
>
>So I'll wait to hear back whether that patch fixes things for you, but
>I _think_ it will, and we'll be better off in the long range with this
>whole thing.
>
>              Linus

I just did a quick test here of booting and running a couple commands
(tpm2_getcap, tpm2_pcrlist), and the patch seems to work for me. I was
seeing the error during tpm_crb initialization without the patch.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 19:47                                 ` Winkler, Tomas
@ 2019-02-01  8:12                                   ` Jarkko Sakkinen
  0 siblings, 0 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-02-01  8:12 UTC (permalink / raw)
  To: Winkler, Tomas
  Cc: Linus Torvalds, Jason Gunthorpe, James Bottomley,
	linux-integrity, linux-security-module,
	Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 07:47:19PM +0000, Winkler, Tomas wrote:
> 
> > 
> > On Thu, Jan 31, 2019 at 10:52 AM Linus Torvalds <torvalds@linux-
> > foundation.org> wrote:
> > >
> > > I think what I should do is to just make "memcpy_*io()" do the "align
> > > naturally" thing.
> > >
> > > Let me cook up a patch for you to test.
> > 
> > Does this work for you?
> > 
> > I haven't tested it at all, but I verified that the generated code seems to make
> > at least some amount of sense.
> > 
> >                Linus
> 
> So dig into the spec and I think this is a bit relevant. 
> 
> TPM TCG according the spec requires that all buffer access is done
> sequentially from the start to end of the payload, Simply In case of
> skipping or going back the transaction is aborted.  The write
> transactions should be 1 or power of 2. So in general 6 byte read
> should not work. But I'm sure our hw really obey this restriction.

We could easily read the first 8 bytes instead of 6, and then check the
length. The benefit in doing that would be to be able to backport the
patch to stable kernels.

I'm fine with either solution.

Applying Linus' fix might anyway a good idea since given that tpm_crb
broke down, there is a finite probability that there might be some other
drivers that break down for similar reasons, but no one has noticed so
far.

Thus, I might even propose doing both...

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 23:31                                     ` Jerry Snitselaar
@ 2019-02-01 11:40                                       ` Jarkko Sakkinen
  0 siblings, 0 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-02-01 11:40 UTC (permalink / raw)
  To: Linus Torvalds, tomas.winkler, Jason Gunthorpe, James Bottomley,
	linux-integrity, linux-security-module,
	Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 04:31:36PM -0700, Jerry Snitselaar wrote:
> On Thu Jan 31 19, Linus Torvalds wrote:
> > On Thu, Jan 31, 2019 at 12:47 PM Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > > 
> > > I'll try it first thing when I wake up tomorrow (11PM in Finland ATM).
> > 
> > Thanks.
> > 
> > > Appreciate for taking time on this.
> > 
> > Hey, it was my commit that broke it for you. Even if it happened to
> > work before, and only did so by pure luck, it was a functional
> > regression.
> > 
> > I get very upset when other developers don't step up when *their*
> > changes break something, and I don't consider "it shouldn't have
> > worked in the first place" to be a valid excuse. You broke it, you'd
> > better fix it.
> > 
> > So I had better fix my own mess too, in order to not look too hypocritical.
> > 
> > And I was very aware that hardcoding the memcpy_*io() access patterns
> > might break something. I just _hoped_ it wouldn't, because we actually
> > ended up going back to the very original access patterns (but it was
> > from a long long time ago).
> > 
> > In fact, while it's slightly annoying, in many ways it's actually good
> > that we found breakage, and could pinpoint exactly *why* it broke.
> > That does validate the whole "we shouldn't just depend on the random
> > implementation detail of 'memcpy()'" argument.
> > 
> > So I'll wait to hear back whether that patch fixes things for you, but
> > I _think_ it will, and we'll be better off in the long range with this
> > whole thing.
> > 
> >              Linus
> 
> I just did a quick test here of booting and running a couple commands
> (tpm2_getcap, tpm2_pcrlist), and the patch seems to work for me. I was
> seeing the error during tpm_crb initialization without the patch.

Works for me too. Just submitted tpm_crb level fix too for the reasons
explained in accompanied commit message.

Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-01-31 20:45                             ` Jarkko Sakkinen
@ 2019-02-01 18:04                               ` Linus Torvalds
  2019-02-04 11:58                                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2019-02-01 18:04 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Tomas Winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Thu, Jan 31, 2019 at 12:45 PM Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> I understand what you mean. Just surprised that this hasn't failed
> before to anyone (the same driver has been even successfully used
> on ARM64 with TrustZone based fTPM implementation). It has been in
> for three years now.

Just to finish this thread off: it turns out that both ARM and ARM64
worked fine, because neither did a memcpy(), but had explicit IO copy
routines.

And in those explicit routines, 32-bit ARM did only byte accesses, and
64-bit ARM did 8-byte accesses for the bulk transfer part, but byte
accesses for the unaligned head and tail of the IO area.

So I think it's all good. x86 used to work by luck (either because all
machines that used that TPM chip always had ERMS, or because the
people who didn't have it never cared), and ARM just worked because it
would never do unaligned IO accesses anyway (well, I guess you can
force them with "readl()" on an unaligned address, but then you just
have yourself to blame).

           Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Getting weird TPM error after rebasing my tree to security/next-general
  2019-02-01 18:04                               ` Linus Torvalds
@ 2019-02-04 11:58                                 ` Jarkko Sakkinen
  0 siblings, 0 replies; 30+ messages in thread
From: Jarkko Sakkinen @ 2019-02-04 11:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tomas Winkler, Jason Gunthorpe, James Bottomley, linux-integrity,
	linux-security-module, Linux List Kernel Mailing

On Fri, Feb 01, 2019 at 10:04:35AM -0800, Linus Torvalds wrote:
> On Thu, Jan 31, 2019 at 12:45 PM Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > I understand what you mean. Just surprised that this hasn't failed
> > before to anyone (the same driver has been even successfully used
> > on ARM64 with TrustZone based fTPM implementation). It has been in
> > for three years now.
> 
> Just to finish this thread off: it turns out that both ARM and ARM64
> worked fine, because neither did a memcpy(), but had explicit IO copy
> routines.
> 
> And in those explicit routines, 32-bit ARM did only byte accesses, and
> 64-bit ARM did 8-byte accesses for the bulk transfer part, but byte
> accesses for the unaligned head and tail of the IO area.
> 
> So I think it's all good. x86 used to work by luck (either because all
> machines that used that TPM chip always had ERMS, or because the
> people who didn't have it never cared), and ARM just worked because it
> would never do unaligned IO accesses anyway (well, I guess you can
> force them with "readl()" on an unaligned address, but then you just
> have yourself to blame).
> 
>            Linus

OK, thanks for the summary. This kind of answered to my question. Should
be sufficient to include the tpm_crb fix to the 5.1 PR.

/Jarkko

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-02-04 11:58 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-18 14:25 Getting weird TPM error after rebasing my tree to security/next-general Jarkko Sakkinen
2019-01-18 22:09 ` James Bottomley
2019-01-20 16:04   ` Jarkko Sakkinen
2019-01-22  1:02     ` Jarkko Sakkinen
2019-01-22  2:58       ` Jason Gunthorpe
2019-01-22 13:29         ` Jarkko Sakkinen
2019-01-22 18:26           ` Linus Torvalds
2019-01-23 15:36             ` Jarkko Sakkinen
2019-01-23 18:43               ` Linus Torvalds
2019-01-29 13:20                 ` Jarkko Sakkinen
2019-01-31 12:26                   ` Jarkko Sakkinen
2019-01-31 16:04                     ` Jarkko Sakkinen
2019-01-31 17:06                       ` Jarkko Sakkinen
2019-01-31 17:43                         ` Linus Torvalds
2019-01-31 18:47                           ` Jarkko Sakkinen
2019-01-31 18:35                         ` Jarkko Sakkinen
2019-01-31 18:51                           ` Linus Torvalds
2019-01-31 18:52                             ` Linus Torvalds
2019-01-31 19:10                               ` Linus Torvalds
2019-01-31 19:47                                 ` Winkler, Tomas
2019-02-01  8:12                                   ` Jarkko Sakkinen
2019-01-31 20:07                                 ` Winkler, Tomas
2019-01-31 20:47                                 ` Jarkko Sakkinen
2019-01-31 21:58                                   ` Linus Torvalds
2019-01-31 23:31                                     ` Jerry Snitselaar
2019-02-01 11:40                                       ` Jarkko Sakkinen
2019-01-31 20:45                             ` Jarkko Sakkinen
2019-02-01 18:04                               ` Linus Torvalds
2019-02-04 11:58                                 ` Jarkko Sakkinen
2019-01-23 20:11               ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).