Subject: [MODERATED] Re: eager FPU backport for 2.6.32
From: Paolo Bonzini
Message-ID: <8a9a93cd-f3a9-fb03-cd10-d3c72ee12129@redhat.com>
Date: Tue, 31 Jul 2018 10:15:52 +0200
To: speck@linutronix.de

On 31/07/2018 09:35, speck for Jiri Kosina wrote:
>
>> For the lucky souls who have to backport eager FPU support to 2.6.32,
>> and at the risk of being larted by Thomas :) here is my current list of
>> FPU patches on top of 2.6.32.  I tried to include also those that were
>> already in RHEL before I started the backport.
>
> Hi Paolo,
>
> we have quite divergent eager FPU switching backports to 3.0 and 4.4
> (where 4.4 is basically cherry-picked fixes from upstream, while 3.0 is
> mostly a divergent minimalistic implementation), and for *both* codestreams
> we're receiving reports of Oracle database segfaulting in a way that looks
> like register / memory corruption. We're not seeing any errors / reports
> with any other userspace exercising FPU.
>
> Have you guys at RH by chance received any such reports after switching
> older kernels to eager FPU switching?

No, but we had bugs with signals with the patch list I sent earlier, so
here is a list of extra patches that we added on top of that list.

a9241ea5fd709fc935dade130f4e3b2612bbe9e3
    x86/fpu: Don't reset thread.fpu_counter
1a2a7f4ec8e3a7ac582dac4d01fcc7e8acd3bb30
    x86/fpu: Don't do __thread_fpu_end() if use_eager_fpu()
9b6dba9e0798325dab427b9d60c61630ccc39b28
    x86: Merge simd_math_error() into math_error()
08a744c6bfded3d5fa66f94263f81773226113d1
    x86/fpu: Change math_error() to use unlazy_fpu(), kill (now) unused save_init_fpu()
731bd6a93a6e9172094a2322bd0ee964bb1f4d63
    x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU
14e153ef75eecae8fd0738ffb42120f4962a00cd
    x86, fpu: Introduce per-cpu in_kernel_fpu state
33a3ebdc077fd85f1bf4d4586eea579b297461ae
    x86, fpu: Don't abuse has_fpu in __kernel_fpu_begin/end()
4b2e762e2e53c721458a83d547b222178bb72a34
    x86/fpu: Always allow FPU in interrupt if use_eager_fpu()
e7f180dcd8ab48f18b20d7e8a7e9b39192bdf8e0
    x86/fpu: Change xstateregs_get()/set() to use ->xsave.i387 rather than ->fxsave
1d23c4518b1f3a03c278f23333149245c178d2a6
    x86/fpu: Factor out memset(xstate, 0) in fpu_finit() paths
fb14b4eadf73500d3b2104f031472a268562c047
    x86/fpu: Document user_fpu_begin()
8f4d81863ba4e8dfee93bd50840f1099a296251f
    x86/fpu: Introduce restore_init_xstate()
f893959b0898bd876673adbeb6798bdf25c034d7
    x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
d2d0ac9a4644e00120bb9b7427a512a99d2cacc5
    x86/fpu: Fold __drop_fpu() into its sole user
7575637ab293861a799f3bbafe0d8c597389f4e9
    x86, fpu: Fix math_state_restore() race with kernel_fpu_begin()
b85e67d1483c72b77d1bdc265aa8ba91590794c1
    x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
c88d47480d300eaad80c213d50c9bf6077fc49bc
    x86/fpu: Always restore_xinit_state() when use_eager_cpu()
ab6b52947545a5355154f64f449f97af9d05845f
    x86/fpu: Fix 32-bit signal frame handling
18ecb3bfa5a9f6fffbb3eeb4369f0b9463438ec0
    x86/fpu: Load xsave pointer *after* initialization
c447e76b4cabb49ddae8e49c5758f031f35d55fb (more or less redone from scratch)
    kvm/fpu: Enable eager restore kvm FPU for MPX

A reproducer is after my sig.  It runs 10,000 iterations, but on an
affected kernel it almost always fails within the first 3-4, and if it
doesn't, it fails within the first 100.

Also note that I didn't backport fully lazy FPU because it scared the
hell out of me. :)

Paolo

/*
 * The header names did not survive in the archived copy of this mail;
 * the set below is what the code needs in order to build.
 */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

/* Load the low 64 bits of XMM0 from v. */
void set(unsigned long long v)
{
        asm volatile("movsd %[v], %%XMM0" : : [v] "m" (v) : "xmm0");
}

/* Read back the low 64 bits of XMM0. */
unsigned long long get(void)
{
        unsigned long long v;

        /* v is written by the asm, so it has to be an output operand. */
        asm volatile("movsd %%XMM0, %[check]" : [check] "=m" (v));
        return v;
}

volatile int signal_cnt = 0;

/* SIGCLD handler: only counts deliveries; the debug printf is disabled. */
void sigcld(int s, siginfo_t *si, void *ctx)
{
        ucontext_t *uc = ctx;
        mcontext_t *mc = &uc->uc_mcontext;
        fpregset_t fpr = mc->fpregs;

        //printf("in signal handler, saved xmm0 is %llx %llx\n", fpr->_xmm[0].element[0], fpr->_xmm[0].element[1]);
        signal_cnt++;
}

void try(int j)
{
        int i, status;
        int rounds = 100;

        for (i = 0; i < rounds; i++) {
                unsigned long long correct = i + 0x5713AFDB2639ECA0ULL;
                unsigned long long actual;

                signal_cnt = 0;
                set(correct);
                /* The child exits at once, so SIGCLD arrives while XMM0 is live. */
                if (fork() == 0) {
                        exit(0);
                }
                actual = get();
                int x = signal_cnt;
                if (correct != actual) {
                        printf("xmm0 is different for %d, %d: %llx (expected %llx) signal_cnt=%d\n",
                               j, i, actual, correct, signal_cnt);
                        exit(1);
                }
        }

        /* Reap the children forked above. */
        for (i = 0; i < rounds; i++) {
                wait(&status);
        }
}

int main(void)
{
        int i;

        setvbuf(stdout, NULL, _IONBF, 0);

        struct sigaction sigact = { .sa_sigaction = sigcld, .sa_flags = SA_SIGINFO };
        sigaction(SIGCLD, &sigact, NULL);

        for (i = 0; i < 10000; i++) {
                if ((i % 10) == 0)
                        putchar(((i / 10) % 10) + '0');
                try(i);
                if ((i % 100) == 99)
                        printf(" %d\n", i + 1);
        }
        return 0;
}