linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Babu Moger <babu.moger@amd.com>
To: linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org, shuah@kernel.org,
	jroedel@suse.de, ubizjak@gmail.com, viro@zeniv.linux.org.uk,
	jpa@git.mail.kapsi.fi, fenghua.yu@intel.com,
	kan.liang@linux.intel.com, akpm@linux-foundation.org,
	rppt@kernel.org, Fan_Yang@sjtu.edu.cn, anshuman.khandual@arm.com,
	b.thiel@posteo.de, jgross@suse.com, keescook@chromium.org,
	seanjc@google.com, mh@glandium.org, sashal@kernel.org,
	krisman@collabora.com, chang.seok.bae@intel.com,
	0x7f454c46@gmail.com, jhubbard@nvidia.com,
	sandipan@linux.ibm.com, ziy@nvidia.com,
	kirill.shutemov@linux.intel.com, suxingxing@loongson.cn,
	harish@linux.ibm.com, rong.a.chen@intel.com, linuxram@us.ibm.com,
	bauerman@linux.ibm.com, dave.kleikamp@oracle.com
Subject: x86/fpu/xsave: protection key test failures
Date: Tue, 25 May 2021 16:37:04 -0500	[thread overview]
Message-ID: <b2e0324a-9125-bb34-9e76-81817df27c48@amd.com> (raw)

Hi All,
We have been noticing protection key test failures on our systems here.
Shggy(Dave Kleikamp) from oracle reported to this problem couple of weeks
ago. Here the is the failure.

# ./protection_keys_64
has pkeys: 1
startup pkey_reg: 0000000055555550
test  0 PASSED (iteration 1)
test  1 PASSED (iteration 1)
test  2 PASSED (iteration 1)
test  3 PASSED (iteration 1)
test  4 PASSED (iteration 1)
test  5 PASSED (iteration 1)
test  6 PASSED (iteration 1)
test  7 PASSED (iteration 1)
test  8 PASSED (iteration 1)
test  9 PASSED (iteration 1)
test 10 PASSED (iteration 1)
test 11 PASSED (iteration 1)
test 12 PASSED (iteration 1)
test 13 PASSED (iteration 1)
protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
`pkey_reg == shadow_pkey_reg' failed.
Aborted (core dumped)

The test that is failing is "test_ptrace_of_child".

Sometimes it fails even earlier also. Sometimes(very rarely) it
passes when I insert few printfs. Most probably it fails with
test_ptrace_of_child.


In the test "test_ptrace_of_child", the parent thread attaches to the
child and modifies the data structure associated with protection key.
Verifies the access results based on the key permissions. While running
the test, the tool finds the key permission changes out of nowhere. The
test fails with assert  "assert(pkey_reg == shadow_pkey_reg);"


Investigation so far.
1. The test fails on AMD(Milan) systems. Test appears to pass on Intel
systems. Tested on old Intel system available here.

2. I was trying to see if the hardware(or firmware) is corrupting the pkru
register value.  At this point, it does not appear that way. I was able to
trace all the MSR writes to the application or kernel. At this point, it
does not appear to me as an hardware issue. I see that kernel appears to
do save/restore properly during the context switch. This value stays
default(value 0x55555554) in most cases unless the application changes the
default permissions. Only application that changes here is
"protection_keys".

3. I played with the test tool little bit. The behavior changes
drastically when I make minor changes.
For example, in the following code.

   void setup_handlers(void)
{
       signal(SIGCHLD, &sig_chld);
       setup_sigsegv_handler();
}

Just commenting the first line "signal(SIGCHLD, &sig_chld);" changes the
behavior drastically. I see some tests PASS after this change. The first
line appear to not do anything except some printing.

I still have not figured out completely what is going on with
setup_sigsegv_handler();. It seems very odd modifying some structures in
the signal handler. I am not sure about some of the offsets it is trying
to modify. I feel it may be messing up something there. I am not sure
though. Will have to investigate.

4. I took the traces(x86_fpu) while running test. It appears to show some
of the feature headers are not copied during the test. But I could not
figure out why it was happening. It appears to show not all the feature
headers are copied when the new child is created. When I printed the
feature bits, they all appear to be intact. Here is the trace.

 protection_keys-17350 [035] 59275.833511: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src:     x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst:     x86/fpu:
0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17351 [040] 59275.834274: x86_fpu_regs_activated:
x86/fpu: 0xffff93d722877800 load: 1 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834283: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17351 [040] 59275.834289: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834296: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834298: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834406: x86_fpu_before_save:  x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834406: x86_fpu_after_save:   x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834408: x86_fpu_before_save:  x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834408: x86_fpu_after_save:   x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834654: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834654: x86_fpu_dropped:      x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
                    auditd-1834  [036] 59275.834655:
x86_fpu_regs_activated: x86/fpu: 0xffff93d713fef800 load: 1 xfeatures: 202
xcomp_bv: 8000000000000207
 protection_keys-17350 [035] 59275.834665: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
           <...>-17285 [041] 59275.834679: x86_fpu_regs_activated:
x86/fpu: 0xffff93d716d0df40 load: 1 xfeatures: 202 xcomp_bv: 8000000000000207
   5. I instrumented parent child interactions using a separate standalone
application(without the signal handler), it appears to work just fine.  I
see PRKU register staying intact swiching from parent to child.


My suspicion at this point is towards the selftest tool protection_keys.c.
I will keep looking. Any feedback would be much appreciated to debug further.

Shaggy, Feel free to add if I missed something.

Thanks
Babu

             reply	other threads:[~2021-05-25 21:37 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-25 21:37 Babu Moger [this message]
2021-05-25 22:18 ` x86/fpu/xsave: protection key test failures Dave Hansen
2021-05-25 22:22   ` Dave Kleikamp
2021-05-25 22:28     ` Dave Hansen
2021-05-26  0:03   ` Babu Moger
2021-05-26  0:20     ` Dave Hansen
2021-05-26 15:25       ` Babu Moger
2021-05-26 15:50         ` Dave Kleikamp
2021-05-26 16:06         ` Dave Hansen
2021-05-26 17:03           ` Babu Moger
2021-05-26  0:36 ` Andy Lutomirski
2021-05-26 21:14   ` Babu Moger
2021-05-26 21:17     ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b2e0324a-9125-bb34-9e76-81817df27c48@amd.com \
    --to=babu.moger@amd.com \
    --cc=0x7f454c46@gmail.com \
    --cc=Fan_Yang@sjtu.edu.cn \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=b.thiel@posteo.de \
    --cc=bauerman@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.kleikamp@oracle.com \
    --cc=fenghua.yu@intel.com \
    --cc=harish@linux.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jhubbard@nvidia.com \
    --cc=jpa@git.mail.kapsi.fi \
    --cc=jroedel@suse.de \
    --cc=kan.liang@linux.intel.com \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=krisman@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linuxram@us.ibm.com \
    --cc=luto@kernel.org \
    --cc=mh@glandium.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rong.a.chen@intel.com \
    --cc=rppt@kernel.org \
    --cc=sandipan@linux.ibm.com \
    --cc=sashal@kernel.org \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=suxingxing@loongson.cn \
    --cc=tglx@linutronix.de \
    --cc=ubizjak@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).