On Tue, 16 Mar 2010, Carlos O'Donell wrote:

> After some of my own testing I think this is all MMU related, but I
> can't prove it yet. I'm pouring through as much kernel code as I can
> right now to determine what is going wrong at the time of the clone,
> and I see at least one bug that I'm investigating regarding return
> addresses.

I've attached another version of the minifail test program.  In
this one, the parent and thread both monitor the location 0x4000001c
in the stack region allocated to the thread.  If a problem is
detected, they drop core with an illegal instruction.  If the
child of the fork sees a nonzero value in the above location
when the fork call returns, it sleeps for ten seconds.
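
For anyone reading without the attachment, here is a minimal sketch of
the same kind of check.  It is not the attached minifail program: it
watches a local variable on the thread's stack rather than the fixed
address 0x4000001c, it counts failures instead of dropping core, and
the child exits with a status instead of sleeping.  Apart from
thread_run, every name in it (run_trial, STACK_SIZE, MAGIC, watched)
is made up for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* assumed size, not from minifail */
#define MAGIC      0x5a5a5a5au     /* arbitrary marker value */

static _Atomic(volatile unsigned *) watched; /* address of the watched word */
static atomic_int stop_flag;
static atomic_int thread_errors;

/* Thread side: keep a word on this thread's stack and monitor it. */
static void *thread_run(void *arg)
{
    volatile unsigned word = MAGIC;
    (void)arg;
    atomic_store(&watched, &word);          /* publish the stack location */
    while (!atomic_load(&stop_flag)) {
        if (word != MAGIC)
            atomic_fetch_add(&thread_errors, 1);
    }
    return NULL;
}

/* Parent side: fork repeatedly while the thread runs.  Returns the number
 * of mismatches seen by parent, thread or forked children; -1 on setup
 * errors. */
int run_trial(int forks)
{
    int failures = 0;
    pthread_t tid;
    pthread_attr_t attr;
    volatile unsigned *p;
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    atomic_store(&watched, (volatile unsigned *)NULL);
    atomic_store(&stop_flag, 0);
    atomic_store(&thread_errors, 0);

    /* Run the thread on the region we mapped ourselves. */
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, STACK_SIZE);
    if (pthread_create(&tid, &attr, thread_run, NULL) != 0) {
        pthread_attr_destroy(&attr);
        munmap(stack, STACK_SIZE);
        return -1;
    }
    pthread_attr_destroy(&attr);

    while ((p = atomic_load(&watched)) == NULL)
        sched_yield();                      /* wait for thread_run to start */

    for (int i = 0; i < forks; i++) {
        int status;
        pid_t pid = fork();
        if (pid == 0)                       /* child: check its copy */
            _exit(*p == MAGIC ? 0 : 1);
        if (pid < 0) {
            failures = -1;
            break;
        }
        if (waitpid(pid, &status, 0) != pid ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;                     /* child saw a bad value */
        if (*p != MAGIC)
            failures++;                     /* parent sees a bad value */
    }

    atomic_store(&stop_flag, 1);
    pthread_join(tid, NULL);
    munmap(stack, STACK_SIZE);
    return failures < 0 ? -1 : failures + atomic_load(&thread_errors);
}
```

On a kernel that handles the fork correctly, run_trial() should return
zero however many forks are requested.  Build with -pthread.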

When corruption occurs and core is dropped on my c3750 (UP 32-bit
kernel), both the parent and thread have undergone many iterations
of their respective monitor loops.  The forked child always reports
seeing a nonzero value at the stack location.  The before value
in the core dump was zero (i.e., thread_run had not started).

I added an illegal instruction abort to the child.  In this case,
the thread_run loop counter was 48085 when the page was copied and
the before value was zero.

One thought that has crossed my mind is that the memory pages allocated
for the stack region used by the thread are somehow getting interchanged
between parent and child by the fork operation.  The corruption shows up
fairly late, as both the parent and thread are already executing post-fork
when it is detected.  Possibly, that delay is part of the bug.

I have looked at entry.S and pacache.S quite a bit and it's not obvious
how this could happen, although I must admit to not fully understanding
the tmp alias code.  I tend to think the bug is in the core mm code.

I see a few possible cleanups to entry.S.  For example, we never killed
the misnamed macros (DEP, DEPI and EXTR).  But I don't think these are
the problem.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)