From: Sergey Vlasov <vsu@altlinux.ru>
To: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org, stian@nixia.no
Subject: Re: timer + fpu stuff locks my console race
Date: Sat, 12 Jun 2004 17:44:14 +0400	[thread overview]
Message-ID: <20040612134413.GA3396@sirius.home> (raw)
In-Reply-To: <Pine.LNX.4.44.0406112308100.13607-100000@chimarrao.boston.redhat.com>

On Fri, 11 Jun 2004 23:50:25 -0400, Rik van Riel wrote:

> On Fri, 11 Jun 2004, Rik van Riel wrote:
> 
>> Reproduced here, on my test system running a 2.6 kernel.
>> I did get a kernel backtrace over serial console, though ;)
> 
> Now I'm not sure if the process is actually stuck in kernel
> space or if it's looping tightly through both kernel and
> user space...

Here is the culprit (include/asm-i386/i387.h):

#define __clear_fpu( tsk )					\
do {								\
	if ((tsk)->thread_info->status & TS_USEDFPU) {		\
		asm volatile("fwait");				\
		(tsk)->thread_info->status &= ~TS_USEDFPU;	\
		stts();						\
	}							\
} while (0)

This is called in flush_thread() (which is used in flush_old_exec()
and therefore in the sys_execve() path) and in restore_i387_fsave()
and restore_i387_fxsave() (which are reached from sys_sigreturn() and
sys_rt_sigreturn()).

The buggy code in Stian's program corrupts the FPU state - in
particular, it leaves some exception bits set in the FPU status
word.  In this state the next FP instruction (except non-waiting
instructions, like fnsave and fninit) will raise the FP error
exception (trap 16).  The "fwait" above happens to be that next
instruction.

The FP error handler do_coprocessor_error() calls math_error() for
the real work (both in arch/i386/kernel/traps.c).  math_error() calls
save_init_fpu(), which saves the FPU state in current->thread.i387 and
sets the TS flag; then math_error() queues a SIGFPE to the task and
returns.  If the fault comes from userspace, this is enough - on the
return path the pending signal will be noticed and delivered.
However, in this case the fault happens in kernel code, so execution
just resumes at the same point and the same fwait is executed again.

At this point, however, the TS flag is set, so we get another trap -
trap 7, device_not_available.  The trap handler calls
math_state_restore(), which clears the TS flag and reloads the FP
state from current->thread.i387.  Then it returns, and the faulting
instruction is restarted once more.  But it gets the same FP error
exception as the first time...

So the CPU is stuck handling endless faults in kernel mode.
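
To make the ping-pong between the two traps easier to follow, here is
a small userspace model of it.  This is only a sketch, not kernel
code: struct fpu_model, its fields and run_kernel_fwait() are names
made up for illustration, and the two handlers mimic only the effects
of math_error() and math_state_restore() described above.  A fault
limit stands in for "forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state involved; all names here are hypothetical. */
struct fpu_model {
	bool ts_flag;         /* CR0.TS: next FP instruction raises trap 7 */
	bool hw_exception;    /* exception bit pending in the live FPU status word */
	bool saved_exception; /* same bit in the copy kept in thread.i387 */
	bool sigfpe_pending;  /* SIGFPE queued but not yet delivered */
	int trap16_count;
	int trap7_count;
};

/* Effects of math_error(): save_init_fpu() plus queueing SIGFPE. */
static void trap16_coprocessor_error(struct fpu_model *m)
{
	m->saved_exception = m->hw_exception; /* state saved to thread.i387 */
	m->hw_exception = false;              /* FPU reinitialised by the save */
	m->ts_flag = true;                    /* stts() */
	m->sigfpe_pending = true;             /* never delivered: no return to user */
	m->trap16_count++;
}

/* Effects of math_state_restore(): clear TS, reload the saved state. */
static void trap7_device_not_available(struct fpu_model *m)
{
	m->ts_flag = false;
	m->hw_exception = m->saved_exception; /* pending exception comes back */
	m->trap7_count++;
}

/*
 * Execute one in-kernel fwait, restarting it after every fault.
 * Returns true if the fwait completes, false if max_faults faults
 * were taken without it completing.
 */
bool run_kernel_fwait(struct fpu_model *m, int max_faults)
{
	int faults = 0;

	assert(max_faults >= 0);
	for (;;) {
		if (!m->ts_flag && !m->hw_exception)
			return true;          /* fwait completes normally */
		if (faults == max_faults)
			return false;         /* give up: still faulting */
		if (m->ts_flag)
			trap7_device_not_available(m);
		else
			trap16_coprocessor_error(m);
		faults++;                     /* handler returns, fwait restarts */
	}
}
```

Starting from a model with only hw_exception set, run_kernel_fwait()
never completes: the two handlers alternate, beginning with trap 16,
and each one recreates the condition that triggers the other.  With a
clean state the fwait completes without taking any fault.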

How to fix this?  A quick and dirty fix is to remove the problematic
fwait from __clear_fpu(); 2.2.x kernels did not have it - it was
probably added in some 2.3.x release.

--- linux-2.6.6/include/asm-i386/i387.h.fp-lockup	2004-05-10 06:33:06 +0400
+++ linux-2.6.6/include/asm-i386/i387.h	2004-06-12 17:25:56 +0400
@@ -51,7 +51,6 @@
 #define __clear_fpu( tsk )					\
 do {								\
 	if ((tsk)->thread_info->status & TS_USEDFPU) {		\
-		asm volatile("fwait");				\
 		(tsk)->thread_info->status &= ~TS_USEDFPU;	\
 		stts();						\
 	}							\

In this case we will ignore a pending FP exception at execve() or
sigreturn() instead of raising SIGFPE (which was probably intended by
whoever put an fwait there).

If we want to be pedantic and care about such pending exceptions, we
should add a check for kernel addresses to do_coprocessor_error() and
call fixup_exception() there, like we do for protection faults, so
that the handler will not attempt to restart the failing instruction.
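
A rough userspace sketch of that dispatch, just to show the shape of
the idea.  Nothing here is the real kernel interface: regs_model,
fixup_entry, fixup_exception_model() and the action enum are
hypothetical stand-ins, the table is searched linearly, and what to
do when a kernel-mode fault has no table entry is left as a
placeholder value, since that policy is not decided above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one exception table entry. */
struct fixup_entry {
	unsigned long insn;  /* address of an instruction that may fault */
	unsigned long fixup; /* address to resume at instead of restarting */
};

/* Hypothetical stand-in for the saved register state. */
struct regs_model {
	unsigned long ip;
	bool user_mode;
};

enum fp_error_action {
	DELIVER_SIGFPE,         /* user fault: signal delivered on return path */
	RESUME_AT_FIXUP,        /* kernel fault with an entry: skip the insn */
	UNHANDLED_KERNEL_FAULT  /* kernel fault without an entry (placeholder) */
};

/* Redirect regs->ip to the fixup address if the table has an entry. */
static bool fixup_exception_model(struct regs_model *regs,
				  const struct fixup_entry *table, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].insn == regs->ip) {
			regs->ip = table[i].fixup;
			return true;
		}
	}
	return false;
}

/* Decide what an FP error handler would do under this scheme. */
enum fp_error_action coprocessor_error_model(struct regs_model *regs,
					     const struct fixup_entry *table,
					     size_t n)
{
	if (regs->user_mode)
		return DELIVER_SIGFPE;
	if (fixup_exception_model(regs, table, n))
		return RESUME_AT_FIXUP; /* faulting insn is not re-executed */
	return UNHANDLED_KERNEL_FAULT;
}
```

With an entry covering the fwait in __clear_fpu(), a kernel-mode FP
error would resume at the fixup address, so the fault loop cannot
start; user-mode faults are handled exactly as before.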

