linux-kernel.vger.kernel.org archive mirror
* Re: BUG: Global FPU corruption in 2.2
@ 2001-04-24  7:56 alad
  0 siblings, 0 replies; 33+ messages in thread
From: alad @ 2001-04-24  7:56 UTC (permalink / raw)
  To: David Konerding; +Cc: Ulrich Drepper, root, linux-kernel



Hi,
     I want to look into this problem. It seems to be very interesting. But I
was not following the thread from the beginning (and I mistakenly deleted all
these mails :( ..). I hope you won't mind answering the following questions...

1) Are you doing this on an MP or a uniprocessor machine?
2) I want to know how you are calling sys_ptrace(attach) and
sys_ptrace(detach), i.e. is it something like the following:

     for (;;) {
          sys_ptrace(attach to process);
          sys_wait4();
          sys_ptrace(detach from process);
     }

In short, what is the sequence of system calls you are using for attaching to
and detaching from the process?

3) Have you tried doing attach and detach only once? If not, can you please
try this and let me know whether doing attach and detach one time also
results in global FPU corruption. Please do not fork in the above process.
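
For reference, a user-space version of the loop asked about in question 2 might look roughly like the sketch below. It is not the pt program from the original report (that source is not in this message); the function names attach_detach_once() and attach_detach_loop() are made up for illustration, and the code assumes a Linux system with the usual glibc ptrace(2) wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * One attach / wait / detach cycle against an already running process.
 * Returns 0 on success, -1 if any step fails (errno is set by the call).
 * Hypothetical helper, not taken from the original pt program.
 */
int attach_detach_once(pid_t pid)
{
    int status;

    /* PTRACE_ATTACH makes the kernel send SIGSTOP to the target. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    /* Wait until the target has actually stopped. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    /* Let the target run again without delivering a signal. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;

    return 0;
}

/* Repeat the cycle up to "cycles" times; returns the number completed. */
int attach_detach_loop(pid_t pid, int cycles)
{
    int done = 0;

    while (done < cycles && attach_detach_once(pid) == 0)
        done++;
    return done;
}
```

Calling attach_detach_loop(pid, 1) would correspond to the "only once" experiment in question 3.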

---------

Whenever process A calls sys_ptrace(attach) on process B, sys_ptrace sends
SIGSTOP to process B.
Process B, in do_signal, then checks that it is being traced and does the
following:
     current->state = TASK_STOPPED;
     notify_parent(current,SIGCHLD);
     schedule();

So now, in schedule() --> __switch_to --> unlazy_fpu(), we do the following:
     if (current->flags & PF_USEDFPU)
          save_fpu();

In save_fpu() we do the following:
     fnsave current->tss.i387
     fwait;

I want to ask a question: is it possible that 'somehow' we were not able to
save the complete floating point state with fnsave, i.e. that
current->tss.i387 is 'invalid' after
     fnsave current->tss.i387
     fwait;

Thanks
Amol





David Konerding <dek_ml@konerding.com> on 04/23/2001 01:09:27 AM

To:   Ulrich Drepper <drepper@cygnus.com>
cc:   root@chaos.analogic.com, linux-kernel@vger.kernel.org (bcc: Amol Lad/HSS)

Subject:  Re: BUG: Global FPU corruption in 2.2




Ulrich Drepper wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
>
> > The kernel doesn't know if a process is going to use the FPU when
> > a new process is created. Only the user's code, i.e., the 'C' runtime
> > library knows.
>
> Maybe you should try to understand the kernel code and the features of
> the processor first.  The kernel can detect when the FPU is used for
> the first time.

OK, regardless of how the linux kernel actually manages the FPU for user-space
programs, does anybody have any comments on the original bug report?

>We have found that one of our programs can cause system-wide
>corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
>run this program, the FPU gives bad results to all subsequent
>processes.

>We see this problem on dual 550MHz Xeons with 1GB RAM.  We have 64 of
>these things, and we see the problem on every node we try (dozens).
>We don't have other SMPs handy.  Uniprocessors, including other PIIIs,
>don't seem to be affected.

>Below are two programs we use to produce the behavior.  The first
>program, pi, repeatedly spawns 10 parallel computations of pi.  When
>all is well, each process prints pi as it completes.

>The second program, pt, repeatedly attaches to and detaches from
>another process.  Run pt against the root pi process until the output
>of pi begins to look wrong.  Then kill everything and run pi by itself
>again.  It will no longer produce good results.  We find that the FPU
>persistently gives bad results until we reboot.

I tried this on my dual PIII-600 running 2.2.19 and got exactly the behavior
described. If it is a bug in the linux kernel (I can see nothing wrong with
the source code provided), I would suspect problems with SMP and ptrace,
somehow causing the wrong FP registers to be returned to a process after the
scheduler restarted it.  It's very interesting that the PI program works fine
until you run PT, but after you run PT, PI is screwed until reboot.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 16:56     ` Christian Ehrhardt
@ 2001-04-24 20:15       ` Michal Jaegermann
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Jaegermann @ 2001-04-24 20:15 UTC (permalink / raw)
  To: linux-kernel

On Tue, Apr 24, 2001 at 06:56:32PM +0200, Christian Ehrhardt wrote:
> On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote:
> > ptrace only operates on processes that are stopped. So there are no
> > locking issues - we've synchronized on a much higher level than a
> > spinlock or semaphore.
> 
> This is only true for requests other than PTRACE_ATTACH and
> PTRACE_ATTACH is exactly what I'm worried about.

May I remind everybody that at the beginning of this thread I posted
another example, from an SMP Alpha, of FPU problems.  It certainly
was not exactly like the one under discussion, but it had a similar
"smell" to it.

Reproducing this Alpha example seems to require processors with a
rather fast clock, and this hardware version is not yet very widely
available.

  Michal

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 19:17   ` Victor Zandy
@ 2001-04-24 19:51     ` Alan Cox
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Cox @ 2001-04-24 19:51 UTC (permalink / raw)
  To: Victor Zandy; +Cc: Alan Cox, linux-kernel

> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> > The preferable one for performance is certainly to backport the 2.4 changes
> 
> Is it any more substantial than changing all uses of the ptrace flags
> to the new variable?

It affects asm blocks and offsets on some ports. It's not too bad, though.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 18:37 ` Alan Cox
@ 2001-04-24 19:17   ` Victor Zandy
  2001-04-24 19:51     ` Alan Cox
  0 siblings, 1 reply; 33+ messages in thread
From: Victor Zandy @ 2001-04-24 19:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> The preferable one for performance is certainly to backport the 2.4 changes

Is it any more substantial than changing all uses of the ptrace flags
to the new variable?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 18:21 Victor Zandy
@ 2001-04-24 18:37 ` Alan Cox
  2001-04-24 19:17   ` Victor Zandy
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2001-04-24 18:37 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel

> >         child->flags |= PF_PTRACED; 
> > 
> > without waiting for the child to have stopped. 
> 
> I can see how this could cause PF_USEDFPU to be cleared inadvertently,
> but I do not have any ideas for testing this.  Is it clear that this
> is the source of the problem?

There is no guarantee that |= is implemented atomically - in fact it's quite
likely to read

		get child->flags
		or PF_PTRACED
		write child->flags

and a PF_USEDFPU on another processor at the same instant -would- end up being
lost.
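
To make the lost update concrete, here is a small single-threaded sketch that replays one unlucky interleaving by hand. It is only an illustration: the function name racy_attach() is made up, the two "CPUs" are just comments on ordinary statements, and the bit values are assumed for the example and should be checked against include/linux/sched.h.

```c
#include <assert.h>

/* Illustrative bit values; the real ones live in include/linux/sched.h. */
#define PF_PTRACED 0x00000010u
#define PF_USEDFPU 0x00100000u

/*
 * Replay one bad interleaving of two CPUs touching the same flags word.
 * cpu0 performs "flags |= PF_PTRACED" as load / or / store, and cpu1 sets
 * PF_USEDFPU between cpu0's load and cpu0's store.
 */
unsigned int racy_attach(unsigned int flags)
{
    unsigned int cpu0_tmp;

    cpu0_tmp = flags;          /* cpu0: get child->flags             */
    flags |= PF_USEDFPU;       /* cpu1: the task starts using the FPU */
    cpu0_tmp |= PF_PTRACED;    /* cpu0: or PF_PTRACED                */
    flags = cpu0_tmp;          /* cpu0: write child->flags           */

    return flags;              /* cpu1's PF_USEDFPU is gone          */
}
```

Starting from flags == 0, the result has PF_PTRACED set and PF_USEDFPU clear, even though both updates "happened".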

There are two fixes

1.	Make all the ops atomic (foo_bit())
2.	Split the flags

The preferable one for performance is certainly to backport the 2.4 changes
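
A rough user-space sketch of the two fixes follows. It is not kernel code: C11 atomics stand in for the kernel's atomic bit operations, the struct and function names are invented for the example, and the bit values (including PT_PTRACED for the split-out word) are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit values, assumed for this sketch. */
#define PF_PTRACED 0x00000010u
#define PF_USEDFPU 0x00100000u
#define PT_PTRACED 0x00000001u

/* Fix 1: keep one word, but make every update an atomic read-modify-write. */
struct task_atomic {
    atomic_uint flags;
};

void attach_atomic(struct task_atomic *t)
{
    /* A single indivisible OR; a concurrent update cannot be overwritten. */
    atomic_fetch_or(&t->flags, PF_PTRACED);
}

/* Fix 2: give ptrace state its own word, so the tracer never writes flags. */
struct task_split {
    unsigned int flags;    /* written only by the task itself */
    unsigned int ptrace;   /* written by the tracer           */
};

/*
 * Replay the same load / or / store interleaving as before, but with the
 * tracer working on the separate ptrace word. Returns the final flags.
 */
unsigned int replay_with_split_flags(struct task_split *t)
{
    unsigned int cpu0_tmp;

    cpu0_tmp = t->ptrace;                 /* cpu0: load ptrace word    */
    t->flags |= PF_USEDFPU;               /* cpu1: touches only flags  */
    t->ptrace = cpu0_tmp | PT_PTRACED;    /* cpu0: store ptrace word   */

    return t->flags;                      /* PF_USEDFPU survives       */
}
```

The split version needs no atomic instructions on the context switch path, which is presumably why it is the cheaper option.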


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
@ 2001-04-24 18:21 Victor Zandy
  2001-04-24 18:37 ` Alan Cox
  0 siblings, 1 reply; 33+ messages in thread
From: Victor Zandy @ 2001-04-24 18:21 UTC (permalink / raw)
  To: linux-kernel


Linus Torvalds writes:
> Ahh.. This actually _does_ look like a race on "current->flags": 
> PTRACE_ATTACH will do a 
> 
>         child->flags |= PF_PTRACED; 
> 
> without waiting for the child to have stopped. 

I can see how this could cause PF_USEDFPU to be cleared inadvertently,
but I do not have any ideas for testing this.  Is it clear that this
is the source of the problem?

What would be involved in backporting the split ptrace flags to 2.2?
Are there other solutions?

Vic

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 16:47 ` Christian Ehrhardt
@ 2001-04-24 18:09   ` Victor Zandy
  0 siblings, 0 replies; 33+ messages in thread
From: Victor Zandy @ 2001-04-24 18:09 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: linux-kernel

"Christian Ehrhardt" <ehrhardt@mathematik.uni-ulm.de> writes:
> Victor: Could you try to reproduce the system wide corruption if you
> add an explicit call to stts(); at the very end of __switch_to?
> This should prevent the FPU corruption from spreading.

After adding this call, I cannot reproduce the global corruption.
There is still occasional local corruption of individual pi processes
while pt is running.

Vic





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 16:10   ` Linus Torvalds
  2001-04-24 16:25     ` Alan Cox
@ 2001-04-24 16:56     ` Christian Ehrhardt
  2001-04-24 20:15       ` Michal Jaegermann
  1 sibling, 1 reply; 33+ messages in thread
From: Christian Ehrhardt @ 2001-04-24 16:56 UTC (permalink / raw)
  To: linux-kernel

On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote:
> ptrace only operates on processes that are stopped. So there are no
> locking issues - we've synchronized on a much higher level than a
> spinlock or semaphore.

This is only true for requests other than PTRACE_ATTACH and
PTRACE_ATTACH is exactly what I'm worried about.

   regards   Christian

-- 
THAT'S ALL FOLKS!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 13:05 Victor Zandy
  2001-04-24 16:24 ` Linus Torvalds
@ 2001-04-24 16:47 ` Christian Ehrhardt
  2001-04-24 18:09   ` Victor Zandy
  1 sibling, 1 reply; 33+ messages in thread
From: Christian Ehrhardt @ 2001-04-24 16:47 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel

On Tue, Apr 24, 2001 at 08:05:15AM -0500, Victor Zandy wrote:
> 
> He found that PF_USEDFPU was always set before the machine was broken.
> After he found that it was set about 70% of the time.

If I'm not mistaken, this actually can cause GLOBAL FPU corruption.
Here's why:

Assume for a moment that we lose the PF_USEDFPU flag of one
process. This not only means that the current process won't have its
state saved, it also means that the next process won't have the TS bit
set. This in turn means that this new process won't get PF_USEDFPU set
and suddenly we have a second process with a corrupted FPU state.

Victor: Could you try to reproduce the system wide corruption if you
add an explicit call to stts(); at the very end of __switch_to?
This should prevent the FPU corruption from spreading.

NOTE: This is just to prove my theory; it is not meant
to be a fix for the actual problem.
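
The chain of events can be played through in a toy model. The sketch below is a plain C simulation, not kernel code: one double stands in for the whole FPU register file, an int stands in for the TS bit, and all struct and function names are invented. The PF_USEDFPU value is assumed.

```c
#include <assert.h>

#define PF_USEDFPU 0x00100000u   /* illustrative value */

struct task { unsigned int flags; double saved_fpu; };
struct cpu  { int ts; double fpu; };

/* Models the trap taken on the first FPU use while TS is set. */
static double *fpu_access(struct cpu *c, struct task *t)
{
    if (c->ts) {
        c->ts = 0;                  /* clts                      */
        c->fpu = t->saved_fpu;      /* restore the task's state  */
        t->flags |= PF_USEDFPU;
    }
    return &c->fpu;
}

/* Models the FPU part of switching away from "prev". */
static void switch_from(struct cpu *c, struct task *prev, int force_stts)
{
    if (prev->flags & PF_USEDFPU) {     /* unlazy_fpu()          */
        prev->saved_fpu = c->fpu;       /* save the state        */
        prev->flags &= ~PF_USEDFPU;
        c->ts = 1;                      /* stts                  */
    }
    if (force_stts)
        c->ts = 1;                      /* the extra test stts() */
}

/* Task A (state 1.0) loses its flag; what does task B (state 2.0) see? */
double value_seen_by_next_task(int force_stts)
{
    struct cpu c = { 1, 0.0 };
    struct task a = { 0u, 1.0 };
    struct task b = { 0u, 2.0 };

    fpu_access(&c, &a);             /* A traps and loads its 1.0   */
    a.flags &= ~PF_USEDFPU;         /* the race: A's flag is lost  */
    switch_from(&c, &a, force_stts);
    return *fpu_access(&c, &b);     /* B expects its own 2.0       */
}
```

Without the extra stts(), B runs on A's registers and never gets PF_USEDFPU itself, so the damage moves on to whoever runs next. With it, B traps and reloads its own state; only the task that lost the flag can still lose state.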

   regards   Christian Ehrhardt

-- 

THAT'S ALL FOLKS!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 16:10   ` Linus Torvalds
@ 2001-04-24 16:25     ` Alan Cox
  2001-04-24 16:56     ` Christian Ehrhardt
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Cox @ 2001-04-24 16:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> >1.) If I'm not mistaken switch_to changes current->flags without
> >atomic operations and without any locks and sys_ptrace changes
> >child->flags only protected by the big kernel lock.
> 
> ptrace only operates on processes that are stopped. So there are no
> locking issues - we've synchronized on a much higher level than a
> spinlock or semaphore.

In the 2.2 case the ptrace flags themselves are in the same flag set as
the PF_ flags. In 2.4 that was fixed. That means there are some bizarre cases
where current->flags might not be handled perfectly.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-24 13:05 Victor Zandy
@ 2001-04-24 16:24 ` Linus Torvalds
  2001-04-24 16:47 ` Christian Ehrhardt
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2001-04-24 16:24 UTC (permalink / raw)
  To: linux-kernel

[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has
  been fixed already and there's something else going on. Worth a look ]

In article <cpxu23etpmc.fsf@goat.cs.wisc.edu>,
Victor Zandy  <zandy@cs.wisc.edu> wrote:
>
>Someone else here traced the process flags of a FP-intensive program
>on a machine before and after it is put in the faulty FPU state.  He
>periodically sampled /proc/pid/stat while the program was running.
>
>He found that PF_USEDFPU was always set before the machine was broken.
>After he found that it was set about 70% of the time.

[ Looks closer at the ptrace synchronization ]

Ahh.. This actually _does_ look like a race on "current->flags":
PTRACE_ATTACH will do a

	child->flags |= PF_PTRACED;

without waiting for the child to have stopped.

(Aside: thinking more about the stopping logic - I'm not actually sure
the ptrace synchronization is complete wrt scheduling, as there will be
a window when the process has set the task state to TASK_STOPPED but
hasn't actually yet scheduled away. Oh, well).

All other ptrace operations (not counting killing the child) will check
that the child is quiescent.  But PTRACE_ATTACH will not, as we're just
setting up the stopping.

In 2.4.x, this bug doesn't happen because "flags" was split up into
"current->ptrace" and "current->flags".  Exactly because of locking
concerns.

			Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-23 16:11 ` Christian Ehrhardt
@ 2001-04-24 16:10   ` Linus Torvalds
  2001-04-24 16:25     ` Alan Cox
  2001-04-24 16:56     ` Christian Ehrhardt
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2001-04-24 16:10 UTC (permalink / raw)
  To: linux-kernel

In article <20010423161148.6465.qmail@theseus.mathematik.uni-ulm.de>,
Christian Ehrhardt <ehrhardt@mathematik.uni-ulm.de> wrote:
>
>1.) If I'm not mistaken switch_to changes current->flags without
>atomic operations and without any locks and sys_ptrace changes
>child->flags only protected by the big kernel lock.

ptrace only operates on processes that are stopped. So there are no
locking issues - we've synchronized on a much higher level than a
spinlock or semaphore.

That said, it does look like 2.2.x has a real bug, and maybe the ptrace
task stopping synchronization is broken..

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
@ 2001-04-24 13:05 Victor Zandy
  2001-04-24 16:24 ` Linus Torvalds
  2001-04-24 16:47 ` Christian Ehrhardt
  0 siblings, 2 replies; 33+ messages in thread
From: Victor Zandy @ 2001-04-24 13:05 UTC (permalink / raw)
  To: linux-kernel


Someone else here traced the process flags of a FP-intensive program
on a machine before and after it is put in the faulty FPU state.  He
periodically sampled /proc/pid/stat while the program was running.

He found that PF_USEDFPU was always set before the machine was broken.
Afterwards, he found that it was set about 70% of the time.
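
For anyone who wants to repeat the sampling, something along these lines should pull the flags word out of a /proc/pid/stat line. This is a sketch, not the tool that was actually used: the function names are made up, the PF_USEDFPU value is assumed for 2.2, and the only thing relied on is that flags is the ninth field, after the parenthesised command name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PF_USEDFPU 0x00100000u   /* illustrative value */

/*
 * Extract the flags field (the 9th field) from one line of /proc/<pid>/stat.
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
int stat_line_flags(const char *line, unsigned long *flags)
{
    /* comm is wrapped in parentheses and may contain spaces or ')'. */
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return -1;

    /* After comm: state ppid pgrp session tty_nr tpgid flags ... */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %lu", flags) != 1)
        return -1;
    return 0;
}

/* Returns 1 if PF_USEDFPU is set, 0 if it is clear, -1 on a parse error. */
int usedfpu_set(const char *line)
{
    unsigned long flags;

    if (stat_line_flags(line, &flags) != 0)
        return -1;
    return (flags & PF_USEDFPU) != 0;
}
```

Reading the file in a loop while the FP-intensive program runs and counting how often usedfpu_set() returns 1 would give the kind of percentage quoted above.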

Vic




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
@ 2001-04-24  5:33 alad
  0 siblings, 0 replies; 33+ messages in thread
From: alad @ 2001-04-24  5:33 UTC (permalink / raw)
  To: Erik Paulson; +Cc: Christian Ehrhardt, linux-kernel, zandy








Erik Paulson <epaulson@cs.wisc.edu> on 04/24/2001 01:14:27 AM

To:   Christian Ehrhardt <ehrhardt@mathematik.uni-ulm.de>
cc:   linux-kernel@vger.kernel.org, zandy@cs.wisc.edu (bcc: Amol Lad/HSS)

Subject:  Re: BUG: Global FPU corruption in 2.2




On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wrote:
> On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
> >
> > We have found that one of our programs can cause system-wide
> > corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> > run this program, the FPU gives bad results to all subsequent
> > processes.
>
<...>
>
> 3.) It might be interesting to know if the problem can be triggered:
> a) If pi doesn't fork, i.e. just one process calculating pi and
> another one doing the attach/detach.

Yes, we are still able to reproduce it without calling fork (the new
program just calls do_pi() a bunch of times, and then we attach and detach
to that process)

> b) If pi doesn't do FPU Operations, i.e. only the children call do_pi.
>

You seem to need to attach and detach to a program using the FPU:
running pt on a process that is just busy-looping over some integer adds,
while running pi on the machine at the same time but not attaching to it,
does not seem to affect the floating point state.

>>>> Well, during context switching, the call to unlazy_fpu() does the following:
        if (current->flags & PF_USEDFPU)
          save_fpu();

Somebody earlier pointed out the possible race when, in sys_ptrace, we modify
child->flags at the time of attach.
It still looks strange that it is software that is causing the problem, as
the code to handle the FPU looks pretty clean.
Still, can we check current->flags when the problem occurs?


Amol


-Erik






^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-19 16:05 Victor Zandy
                   ` (3 preceding siblings ...)
  2001-04-23 16:11 ` Christian Ehrhardt
@ 2001-04-23 18:44 ` Erik Paulson
  4 siblings, 0 replies; 33+ messages in thread
From: Erik Paulson @ 2001-04-23 18:44 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: linux-kernel, zandy

On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wrote:
> On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
> > 
> > We have found that one of our programs can cause system-wide
> > corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> > run this program, the FPU gives bad results to all subsequent
> > processes.
> 
<...>
> 
> 3.) It might be interesting to know if the problem can be triggered:
> a) If pi doesn't fork, i.e. just one process calculating pi and
> another one doing the attach/detach.

Yes, we are still able to reproduce it without calling fork (the new
program just calls do_pi() a bunch of times, and then we attach and detach
to that process)

> b) If pi doesn't do FPU Operations, i.e. only the children call do_pi.
> 

You seem to need to attach and detach to a program using the FPU:
running pt on a process that is just busy-looping over some integer adds,
while running pi on the machine at the same time but not attaching to it,
does not seem to affect the floating point state.

-Erik


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-19 16:05 Victor Zandy
                   ` (2 preceding siblings ...)
  2001-04-22 20:59 ` kees
@ 2001-04-23 16:11 ` Christian Ehrhardt
  2001-04-24 16:10   ` Linus Torvalds
  2001-04-23 18:44 ` Erik Paulson
  4 siblings, 1 reply; 33+ messages in thread
From: Christian Ehrhardt @ 2001-04-23 16:11 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel

On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
> 
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> run this program, the FPU gives bad results to all subsequent
> processes.

A few comments, not sure if they will help very much:

1.) If I'm not mistaken switch_to changes current->flags without
atomic operations and without any locks and sys_ptrace changes
child->flags only protected by the big kernel lock.
I could imagine that this causes local corruption on an SMP machine
and this is something that changed in 2.4 kernels, but I don't see
how this can corrupt FPU state globally. Maybe there is something else.

2.) I guess a single finit (as proposed by someone else in this thread)
won't assure that both FPUs are in a sane state.

3.) It might be interesting to know if the problem can be triggered:
a) If pi doesn't fork, i.e. just one process calculating pi and
another one doing the attach/detach.
b) If pi doesn't do FPU Operations, i.e. only the children call do_pi.

    regards    Christian

-- 
THAT'S ALL FOLKS!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-19 16:05 Victor Zandy
  2001-04-19 20:18 ` Michal Jaegermann
  2001-04-20 18:50 ` Victor Zandy
@ 2001-04-22 20:59 ` kees
  2001-04-23 16:11 ` Christian Ehrhardt
  2001-04-23 18:44 ` Erik Paulson
  4 siblings, 0 replies; 33+ messages in thread
From: kees @ 2001-04-22 20:59 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel

Hello,

Linux 2.2.19 SMP, confirm report. Even games are going weird after
running this test, (my wife is complaining :-))

Have to reboot.

Kees



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-22 18:39           ` David Konerding
@ 2001-04-22 18:59             ` Alan Cox
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Cox @ 2001-04-22 18:59 UTC (permalink / raw)
  To: David Konerding; +Cc: Ulrich Drepper, root, linux-kernel

> OK, regardless of how the linux kernel actually manages the FPU for user-space
> programs, does anybody have any comments on the original bug report?

Complete mystification.

> >of pi begins to look wrong.  Then kill everything and run pi by itself
> >again.  It will no longer produce good results.  We find that the FPU
> >persistently gives bad results until we reboot.
> 
> I tried this on my dual PIII-600 running 2.2.19 and got exactly the behavior
> described.

This is the most odd bit of all. The processor state for the FPU is per task
private and each task initializes its own FPU state. In terms of FPU state
itself I don't currently see what there is that can be left behind


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 21:44         ` Ulrich Drepper
  2001-04-22  1:46           ` Richard B. Johnson
@ 2001-04-22 18:39           ` David Konerding
  2001-04-22 18:59             ` Alan Cox
  1 sibling, 1 reply; 33+ messages in thread
From: David Konerding @ 2001-04-22 18:39 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: root, linux-kernel

Ulrich Drepper wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
>
> > The kernel doesn't know if a process is going to use the FPU when
> > a new process is created. Only the user's code, i.e., the 'C' runtime
> > library knows.
>
> Maybe you should try to understand the kernel code and the features of
> the processor first.  The kernel can detect when the FPU is used for
> the first time.

OK, regardless of how the linux kernel actually manages the FPU for user-space
programs, does anybody have any comments on the original bug report?

>We have found that one of our programs can cause system-wide
>corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
>run this program, the FPU gives bad results to all subsequent
>processes.

>We see this problem on dual 550MHz Xeons with 1GB RAM.  We have 64 of
>these things, and we see the problem on every node we try (dozens).
>We don't have other SMPs handy.  Uniprocessors, including other PIIIs,
>don't seem to be affected.

>Below are two programs we use to produce the behavior.  The first
>program, pi, repeatedly spawns 10 parallel computations of pi.  When
>all is well, each process prints pi as it completes.

>The second program, pt, repeatedly attaches to and detaches from
>another process.  Run pt against the root pi process until the output
>of pi begins to look wrong.  Then kill everything and run pi by itself
>again.  It will no longer produce good results.  We find that the FPU
>persistently gives bad results until we reboot.

I tried this on my dual PIII-600 running 2.2.19 and got exactly the behavior
described. If it is a bug in the linux kernel (I can see nothing wrong with
the source code provided), I would suspect problems with SMP and ptrace,
somehow causing the wrong FP registers to be returned to a process after the
scheduler restarted it.  It's very interesting that the PI program works fine
until you run PT, but after you run PT, PI is screwed until reboot.




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-22  2:18             ` Alan Cox
@ 2001-04-22  2:30               ` Richard B. Johnson
  0 siblings, 0 replies; 33+ messages in thread
From: Richard B. Johnson @ 2001-04-22  2:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ulrich Drepper, linux-kernel

On Sun, 22 Apr 2001, Alan Cox wrote:

> > Only if it traps on the esc op-code --and if it does, we are in a
> > world of hurt for performance. There is no other way that the kernel
> 
> FPU lazy task switch exceptions are a feature of X86 hardware. Have been for
> a very very long time.
> 

Hmmm.. Okay I stand corrected. Guess I haven't checked for a very very
long time.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-22  1:46           ` Richard B. Johnson
@ 2001-04-22  2:18             ` Alan Cox
  2001-04-22  2:30               ` Richard B. Johnson
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2001-04-22  2:18 UTC (permalink / raw)
  To: root; +Cc: Ulrich Drepper, linux-kernel

> Only if it traps on the esc op-code --and if it does, we are in a
> world of hurt for performance. There is no other way that the kernel

FPU lazy task switch exceptions are a feature of X86 hardware. Have been for
a very very long time.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 21:44         ` Ulrich Drepper
@ 2001-04-22  1:46           ` Richard B. Johnson
  2001-04-22  2:18             ` Alan Cox
  2001-04-22 18:39           ` David Konerding
  1 sibling, 1 reply; 33+ messages in thread
From: Richard B. Johnson @ 2001-04-22  1:46 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

On 20 Apr 2001, Ulrich Drepper wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
> 
> > The kernel doesn't know if a process is going to use the FPU when
> > a new process is created. Only the user's code, i.e., the 'C' runtime
> > library knows.
> 
> Maybe you should try to understand the kernel code and the features of
> the processor first.  The kernel can detect when the FPU is used for
> the first time.
> 

Only if it traps on the esc op-code --and if it does, we are in a
world of hurt for performance. There is no other way that the kernel
can 'protect' on a per-process basis since the FPU executes instructions
in "process-owned" address space, and addresses "process-owned" data.

I'll have to check this out. Of course it traps on all such instructions
if we have a '386 (so the FPU can be emulated), but that was never a
performance issue.
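
A minimal sketch of the hardware rule in question, assuming the CR0 bit
layout given in the x86 manuals (MP is bit 1, EM is bit 2, TS is bit 3).
The helper names are hypothetical and do not come from any kernel source;
they only model when an FP instruction raises the device-not-available
exception (#NM, vector 7):

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 bit positions as documented for x86. */
#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_EM (1u << 2)   /* emulation: FP opcodes always trap */
#define CR0_TS (1u << 3)   /* task switched: FP state may be stale */

/* Hypothetical helper: does an ESC (x87) opcode raise #NM? */
static bool esc_raises_nm(uint32_t cr0)
{
	return (cr0 & (CR0_EM | CR0_TS)) != 0;
}

/* Hypothetical helper: WAIT/FWAIT raises #NM only if MP and TS are both set. */
static bool fwait_raises_nm(uint32_t cr0)
{
	return (cr0 & (CR0_MP | CR0_TS)) == (CR0_MP | CR0_TS);
}
```

With EM set (the '386-without-FPU case) every ESC opcode traps so it can be
emulated; with only TS set, the first ESC opcode after a task switch traps
once and later ones run at full speed after the handler clears TS.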


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.




* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:37       ` Richard B. Johnson
  2001-04-20 20:20         ` Victor Zandy
@ 2001-04-20 21:44         ` Ulrich Drepper
  2001-04-22  1:46           ` Richard B. Johnson
  2001-04-22 18:39           ` David Konerding
  1 sibling, 2 replies; 33+ messages in thread
From: Ulrich Drepper @ 2001-04-20 21:44 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

"Richard B. Johnson" <root@chaos.analogic.com> writes:

> The kernel doesn't know if a process is going to use the FPU when
> a new process is created. Only the user's code, i.e., the 'C' runtime
> library knows.

Maybe you should try to understand the kernel code and the features of
the processor first.  The kernel can detect when the FPU is used for
the first time.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------


* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:37       ` Richard B. Johnson
@ 2001-04-20 20:20         ` Victor Zandy
  2001-04-20 21:44         ` Ulrich Drepper
  1 sibling, 0 replies; 33+ messages in thread
From: Victor Zandy @ 2001-04-20 20:20 UTC (permalink / raw)
  To: root; +Cc: Ulrich Drepper, linux-kernel, pcroth, epaulson


It looks to me like the kernel sets a trap for FP operations when a
process is switched in.  Then when the process executes an FP op, the
kernel clears the trap and either loads the FP context or initializes
it, depending on whether it is the process' first FP operation.  So no
help is needed from anything in user space.
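
The sequence described above can be sketched as a toy model. This is an
illustration only: the struct and function names are hypothetical, the
"registers" are reduced to one value plus a control word, and none of it is
taken from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

struct fpu_regs { double st0; unsigned short cw; };

struct task {
	bool used_fpu;           /* has this task ever executed an FP op? */
	struct fpu_regs saved;   /* per-task save area */
};

struct cpu {
	bool ts;                 /* models CR0.TS: the next FP op traps */
	struct task *fpu_owner;  /* task whose state is live in regs */
	struct fpu_regs regs;
};

/* Switching away from prev: save its state only if it is live, arm the trap. */
static void switch_from(struct cpu *c, struct task *prev)
{
	if (prev != NULL && c->fpu_owner == prev) {
		prev->saved = c->regs;
		c->fpu_owner = NULL;
	}
	c->ts = true;
}

/* Trap handler: clear the trap, then restore or initialize the context. */
static void fp_trap(struct cpu *c, struct task *cur)
{
	c->ts = false;
	if (cur->used_fpu) {
		c->regs = cur->saved;
	} else {
		c->regs.st0 = 0.0;    /* stands in for a full register init */
		c->regs.cw = 0x037F;  /* x87 control word after FINIT */
		cur->used_fpu = true;
	}
	c->fpu_owner = cur;
}

/* Any FP op goes through the trap check first. */
static void fp_store(struct cpu *c, struct task *cur, double v)
{
	if (c->ts)
		fp_trap(c, cur);
	c->regs.st0 = v;
}

static double fp_load(struct cpu *c, struct task *cur)
{
	if (c->ts)
		fp_trap(c, cur);
	return c->regs.st0;
}
```

A task that never calls fp_store() or fp_load() never gets a context
initialized or saved, which is the point of doing it lazily.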

Vic

"Richard B. Johnson" <root@chaos.analogic.com> writes:
> On 20 Apr 2001, Ulrich Drepper wrote:
> 
> > "Richard B. Johnson" <root@chaos.analogic.com> writes:
> > 
> > > If it "fixes" it, there is no problem with the FPU, but with the
> > > 'C' runtime library which doesn't initialize the FPU to a known
> > > state before it uses it.
> > 
> > It's the kernel which initializes the FPU.  This was always the case
> > and necessary to implement the fast lazy FPU saving/restoring.
> > Processes which never use the FPU never initialize it.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> The kernel doesn't know if a process is going to use the FPU when
> a new process is created. Only the user's code, i.e., the 'C' runtime
> library knows. If the user is using 'asm' or whatever, the user must



* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:20     ` Victor Zandy
@ 2001-04-20 19:44       ` Richard B. Johnson
  0 siblings, 0 replies; 33+ messages in thread
From: Richard B. Johnson @ 2001-04-20 19:44 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel, pcroth, epaulson

On 20 Apr 2001, Victor Zandy wrote:

> 
> No dice.  Your program does not fix the problem.
> 
> If it were a hardware problem, I would expect the problem to occur
> under 2.4.2 as well as 2.2.*, and I would be surprised that we can
> consistently produce the behavior across our 64 node cluster.  But we
> are keeping the possibility in mind.
> 
> Thanks for your suggestions.
> 
> Vic
> 

Then, if the FPU is fine, you have just proven that the storage
where the FPU context is saved gets overwritten. Further, once the
initial write occurs, all subsequent fnsave/frstor operations also
encounter the same spurious write. --OR some continuously-running
floating-point code has sneaked into the kernel.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.




* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:23     ` Ulrich Drepper
@ 2001-04-20 19:37       ` Richard B. Johnson
  2001-04-20 20:20         ` Victor Zandy
  2001-04-20 21:44         ` Ulrich Drepper
  0 siblings, 2 replies; 33+ messages in thread
From: Richard B. Johnson @ 2001-04-20 19:37 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Victor Zandy, linux-kernel, pcroth, epaulson

On 20 Apr 2001, Ulrich Drepper wrote:

> "Richard B. Johnson" <root@chaos.analogic.com> writes:
> 
> > If it "fixes" it, there is no problem with the FPU, but with the
> > 'C' runtime library which doesn't initialize the FPU to a known
> > state before it uses it.
> 
> It's the kernel which initializes the FPU.  This was always the case
> and necessary to implement the fast lazy FPU saving/restoring.
> Processes which never use the FPU never initialize it.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The kernel doesn't know if a process is going to use the FPU when
a new process is created. Only the user's code, i.e., the 'C' runtime
library knows. If the user is using 'asm' or whatever, the user must
initialize the FPU before using it, otherwise, the user doesn't know
anything about its state and the results ... (let's see, what was at
TOS, errm, is this a NAN?). The results are indeterminate.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.




* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:07   ` Richard B. Johnson
  2001-04-20 19:20     ` Victor Zandy
@ 2001-04-20 19:23     ` Ulrich Drepper
  2001-04-20 19:37       ` Richard B. Johnson
  1 sibling, 1 reply; 33+ messages in thread
From: Ulrich Drepper @ 2001-04-20 19:23 UTC (permalink / raw)
  To: root; +Cc: Victor Zandy, linux-kernel, pcroth, epaulson

"Richard B. Johnson" <root@chaos.analogic.com> writes:

> If it "fixes" it, there is no problem with the FPU, but with the
> 'C' runtime library which doesn't initialize the FPU to a known
> state before it uses it.

It's the kernel which initializes the FPU.  This was always the case
and necessary to implement the fast lazy FPU saving/restoring.
Processes which never use the FPU never initialize it.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------


* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 19:07   ` Richard B. Johnson
@ 2001-04-20 19:20     ` Victor Zandy
  2001-04-20 19:44       ` Richard B. Johnson
  2001-04-20 19:23     ` Ulrich Drepper
  1 sibling, 1 reply; 33+ messages in thread
From: Victor Zandy @ 2001-04-20 19:20 UTC (permalink / raw)
  To: root; +Cc: linux-kernel, pcroth, epaulson


No dice.  Your program does not fix the problem.

If it were a hardware problem, I would expect the problem to occur
under 2.4.2 as well as 2.2.*, and I would be surprised that we can
consistently produce the behavior across our 64 node cluster.  But we
are keeping the possibility in mind.

Thanks for your suggestions.

Vic


* Re: BUG: Global FPU corruption in 2.2
  2001-04-20 18:50 ` Victor Zandy
@ 2001-04-20 19:07   ` Richard B. Johnson
  2001-04-20 19:20     ` Victor Zandy
  2001-04-20 19:23     ` Ulrich Drepper
  0 siblings, 2 replies; 33+ messages in thread
From: Richard B. Johnson @ 2001-04-20 19:07 UTC (permalink / raw)
  To: Victor Zandy; +Cc: linux-kernel, pcroth, epaulson

On 20 Apr 2001, Victor Zandy wrote:

> 
> Victor Zandy <zandy@cs.wisc.edu> writes:
> > We have found that one of our programs can cause system-wide
> > corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> > run this program, the FPU gives bad results to all subsequent
> > processes.
> 
> We have now tested 2.4.2 and 2.2.19.
> 
> 2.2.19 has the same problem.
> 
> 2.4.2 does not seem to be affected.  Unfortunately, we really need a
> working 2.2 kernel at this time.
> 
> We also patched the 2.2.19 kernel with the PIII patch found in
> /pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre13/PIII-10.bz2
> on ftp.kernel.org.  Same problem.
> 
> Does anyone have any ideas for us?
> 
> Thanks.
> 
> Vic

Just for kicks, do whatever is necessary to "break" the fpu. Then run
this program:

int  main()
{
        __asm__("finit\n");
        return 0;
}

If it "fixes" it, there is no problem with the FPU, but with the
'C' runtime library which doesn't initialize the FPU to a known
state before it uses it. It is possible for the kernel to work
around the 'C' library problem by clearing the FPU after every
fork(). The last time I checked (years ago), 'finit' was executed
during the fork. Maybe it isn't anymore because it takes many
machine-cycles to complete.

If this doesn't "fix" it, then your hardware may have a problem
like overheating, etc., (loose heatsink?).


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.




* Re: BUG: Global FPU corruption in 2.2
  2001-04-19 16:05 Victor Zandy
  2001-04-19 20:18 ` Michal Jaegermann
@ 2001-04-20 18:50 ` Victor Zandy
  2001-04-20 19:07   ` Richard B. Johnson
  2001-04-22 20:59 ` kees
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Victor Zandy @ 2001-04-20 18:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: pcroth, epaulson


Victor Zandy <zandy@cs.wisc.edu> writes:
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> run this program, the FPU gives bad results to all subsequent
> processes.

We have now tested 2.4.2 and 2.2.19.

2.2.19 has the same problem.

2.4.2 does not seem to be affected.  Unfortunately, we really need a
working 2.2 kernel at this time.

We also patched the 2.2.19 kernel with the PIII patch found in
/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre13/PIII-10.bz2
on ftp.kernel.org.  Same problem.

Does anyone have any ideas for us?

Thanks.

Vic



* Re: BUG: Global FPU corruption in 2.2
  2001-04-19 16:05 Victor Zandy
@ 2001-04-19 20:18 ` Michal Jaegermann
  2001-04-20 18:50 ` Victor Zandy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Michal Jaegermann @ 2001-04-19 20:18 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2054 bytes --]

On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
> 
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17.
....
> 
> We see this problem on dual 550MHz Xeons with 1GB RAM.

Hm, I started to wonder if this is not somewhat related to a recent
report I got.  "The victim" was running 2.2.19 (basically) on an SMP
Alpha UP2000+ with two 800 MHz processors.  He managed to reduce the
problem to a rather small test case and I attach sources,  Makefile and
a "loop.sh" driver as a shar archive if you want to have a closer look.

This "loop.sh" simply fires off triplets of "harry" processes in a loop.
The guy hit by this apparently gets random floating point exceptions
starting with roughly the sixth process, and the intervals between later
bombs vary.  I also have 'strace' outputs from failing processes but
they are not telling very much.  'gdb' is also not very illuminating:

Program received signal SIGFPE, Arithmetic exception.
0x1200010a8 in vadd_ (a=0x11fff21e4, ia=0x120003294, b=0x11fff7004, 
    ib=0x120003294, c=0x11fffbe20, ic=0x120003294, n=0x11ffffc70) at vadd.f:99
99               C(CI) = A(AI) + B(BI)
Current language:  auto; currently fortran

(gdb) p *ia
$10 = 1
(gdb) p *ib
$11 = 1
(gdb) p *ic
$12 = 1
(gdb) p *n
Cannot access memory at address 0x4
(gdb) p *(0x11ffffc70)
$13 = 1024

(gdb) info locals
n = (PTR TO -> ( integer )) 0x4
__g77_expr_0 = 10


He tells me that he is getting that on two different machines he has
around.

The trouble is that I tried to reproduce this with different hardware,
kernels, compilers and libraries and I failed even on SMP; but I only had
access to a box with 667 MHz processors.  OTOH he is now running
2.4.3-ac9 plus Andrea Arcangeli's patches for rw semaphores
on Alpha and he reports that the problem went away (and, hopefully,
nothing else will crop up :-).

Can anybody offer an insight into what that may really be?  It may be,
of course, totally unrelated to this report from Victor Zandy.

  Michal
  michal@harddata.com


[-- Attachment #2: fpbomb.shar --]
[-- Type: application/x-shar, Size: 12565 bytes --]


* BUG: Global FPU corruption in 2.2
@ 2001-04-19 16:05 Victor Zandy
  2001-04-19 20:18 ` Michal Jaegermann
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Victor Zandy @ 2001-04-19 16:05 UTC (permalink / raw)
  To: linux-kernel


We have found that one of our programs can cause system-wide
corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
run this program, the FPU gives bad results to all subsequent
processes.

We see this problem on dual 550MHz Xeons with 1GB RAM.  We have 64 of
these things, and we see the problem on every node we try (dozens).
We don't have other SMPs handy.  Uniprocessors, including other PIIIs,
don't seem to be affected.

While we prepare to test for the problem on more recent 2.2 and 2.4
kernels, we would appreciate hearing from anyone who may have insight
into it.

Below are two programs we use to produce the behavior.  The first
program, pi, repeatedly spawns 10 parallel computations of pi.  When
all is well, each process prints pi as it completes.

The second program, pt, repeatedly attaches to and detaches from
another process.  Run pt against the root pi process until the output
of pi begins to look wrong.  Then kill everything and run pi by itself
again.  It will no longer produce good results.  We find that the FPU
persistently gives bad results until we reboot.

Here is the sort of thing we see:

BEFORE                  AFTER
--------------------------------------
c36% ./pi               c36% ./pi        
[3883]                  [4069]           
3.141593                6865157.146714   
3.141593                inf              
3.141593                81705.277947     
3.141593                4.742524         
3.141593                nan              
3.141593                585.810296       
3.141593                inf              
3.141593                4.578857         
3.141593                nan              
3.141593                4.578857         

I am not currently subscribed to linux-kernel.  I'll be checking the
web archives, but please CC replies to me.

Thanks!

Vic Zandy

/* pi.c: gcc -g -o pi pi.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <errno.h>

static double
do_pi()
{
	double sum=0.0;
	double x=1.0;
	double s=1.0;
	double pi;

	while (x <= 10000000.0)	{
		sum += (1.0/pow(x, 3.0))*s;
		s = -s;
		x += 2.0;
	}
	pi = pow(sum*32.0, 1.0/3.0);
	return pi;
}

int
main( int argc, char* argv[] )
{
	int i;
	int pid;
	int m = 1000;   /* runs */
	int n = 10;     /* procs per run */

	pid = getpid();
	fprintf(stderr, "[%d]\n", pid);
	while (m-- > 0) {
	     for (i = 1; i < n; i++)
		  if (!fork())
		       break;
	     fprintf(stderr, "%f\n", do_pi());
	     if (getpid() != pid)
		  return 0;
	     while (waitpid(0, 0, WNOHANG) > 0)
		  ;
	}
	return 0;
}
/* end of pi.c */
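
For anyone wondering why do_pi() yields pi: the alternating series
1 - 1/3^3 + 1/5^3 - 1/7^3 + ... converges to pi^3/32, so multiplying the sum
by 32 and taking the cube root recovers pi. Below is a minimal sketch of a
sanity check built on that identity. The function names, the PI_REF constant
and the tolerances are assumptions chosen for illustration; it compares
against pi^3 directly so that it needs nothing from libm.

```c
#include <stdbool.h>

#define PI_REF 3.14159265358979323846

/* Same series as do_pi(), with a caller-chosen cutoff; returns 32 * sum. */
static double series_times_32(double limit)
{
	double sum = 0.0;
	double s = 1.0;
	double x;

	for (x = 1.0; x <= limit; x += 2.0) {
		sum += s / (x * x * x);
		s = -s;
	}
	return sum * 32.0;
}

/* True only if v is within tol of ref; NaN and inf both fail the test. */
static bool close_to(double v, double ref, double tol)
{
	double d = v - ref;

	if (d < 0.0)
		d = -d;
	return d < tol;   /* any comparison with NaN is false */
}
```

On a healthy FPU, close_to(series_times_32(1000.0), PI_REF * PI_REF * PI_REF,
1e-6) should hold, while values like those in the AFTER column above
(4.578857, inf, nan) fail close_to(v, PI_REF, 1e-5).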

/* pt.c: gcc -g -o pt pt.c */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string.h>
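#include <errno.h>
#include <sys/ptrace.h>	/* declares ptrace() and the PTRACE_* requests */

/* ptrace() wrapper that prints a message and exits on failure. */
long
dptrace(int req, pid_t pid, void *addr, void *data)
{
	char buf[64];
	long rv;
	int failed;

	/* PEEK requests return the fetched word, which may legitimately
	   be -1, so only errno can signal an error for them. */
	errno = 0;
	rv = ptrace(req, pid, addr, data);
	if (req == PTRACE_PEEKUSER || req == PTRACE_PEEKTEXT)
		failed = (errno != 0);
	else
		failed = (rv < 0);
	if (failed) {
		snprintf(buf, sizeof buf, "ptrace (req=%d)", req);
		perror(buf);
		exit(1);
	}
	return rv;
}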

int
main(int argc, char *argv[])
{
	int pid;
	char buf[1024];
	int n;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s PID\n", argv[0]);
		exit(1);
	}
	pid = atoi(argv[1]);
	while (1) {
		dptrace(PTRACE_ATTACH, pid, 0, 0);
		waitpid(pid, 0, 0);
		dptrace(PTRACE_DETACH, pid, 0, 0);
		fprintf(stderr, ".");
	}
	return 0;
}
/* end of pt.c */




Thread overview: 33+ messages
2001-04-24  7:56 BUG: Global FPU corruption in 2.2 alad
  -- strict thread matches above, loose matches on Subject: below --
2001-04-24 18:21 Victor Zandy
2001-04-24 18:37 ` Alan Cox
2001-04-24 19:17   ` Victor Zandy
2001-04-24 19:51     ` Alan Cox
2001-04-24 13:05 Victor Zandy
2001-04-24 16:24 ` Linus Torvalds
2001-04-24 16:47 ` Christian Ehrhardt
2001-04-24 18:09   ` Victor Zandy
2001-04-24  8:56 alad
2001-04-24  5:33 alad
2001-04-19 16:05 Victor Zandy
2001-04-19 20:18 ` Michal Jaegermann
2001-04-20 18:50 ` Victor Zandy
2001-04-20 19:07   ` Richard B. Johnson
2001-04-20 19:20     ` Victor Zandy
2001-04-20 19:44       ` Richard B. Johnson
2001-04-20 19:23     ` Ulrich Drepper
2001-04-20 19:37       ` Richard B. Johnson
2001-04-20 20:20         ` Victor Zandy
2001-04-20 21:44         ` Ulrich Drepper
2001-04-22  1:46           ` Richard B. Johnson
2001-04-22  2:18             ` Alan Cox
2001-04-22  2:30               ` Richard B. Johnson
2001-04-22 18:39           ` David Konerding
2001-04-22 18:59             ` Alan Cox
2001-04-22 20:59 ` kees
2001-04-23 16:11 ` Christian Ehrhardt
2001-04-24 16:10   ` Linus Torvalds
2001-04-24 16:25     ` Alan Cox
2001-04-24 16:56     ` Christian Ehrhardt
2001-04-24 20:15       ` Michal Jaegermann
2001-04-23 18:44 ` Erik Paulson
