* switch_mm() can fail to load ldt on SMP

From: James Washer @ 2001-07-24 22:41 UTC
  To: linux-kernel



We've run into a small bug in switch_mm() which results in a process
running with a 'stale' ldt.

The following timeline shows how the bug can occur:

Process A begins its life on cpu 0.
Process A is context switched OFF of cpu 0 and replaced by
        the idle task (perhaps because of a slow system call).
Process A next runs on cpu 1, where it calls modify_ldt()
        to establish an ldt.
Process A is context switched OFF of cpu 1
Process A is selected to run on cpu 0.

If cpu 0 has been continually running the idle task, then when
switch_mm() compares the next and prev mm_struct pointers, it will
find they are equal and NOT call load_LDT().

Process A will begin running with the ldt pointing at default_ldt rather
than at its private ldt (mm->context.segments).
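
For reference, the relevant logic (paraphrased from the 2.4-era
include/asm-i386/mmu_context.h, with details trimmed) looks roughly
like this:

	static inline void switch_mm(struct mm_struct *prev,
				     struct mm_struct *next,
				     struct task_struct *tsk, unsigned cpu)
	{
		if (prev != next) {
			/* real switch: reload page tables, and the ldt
			 * only when it differs from prev's */
			set_bit(cpu, &next->cpu_vm_mask);
			load_cr3(next->pgd);
			if (prev->context.segments != next->context.segments)
				load_LDT(next);
		} else {
			/* prev == next: the cpu has been idling with
			 * this mm still loaded, so nothing is reloaded
			 * -- including an ldt that another cpu created
			 * in the meantime. */
		}
	}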



We have two possible fixes for this problem, and welcome comments as to
which would be 'best'.

The first fix would be to patch switch_mm() so that, when the next and
prev mm pointers are equal, it checks whether mm->context.segments
is non-null and, if so, calls load_LDT(). This will unfortunately lead
to many unnecessary calls to load_LDT(). An enhanced version of this
fix would involve introducing a bit array into the mm_struct, one
bit per cpu. When write_ldt() first allocates the ldt for this mm_struct,
it would set all bits. Subsequently, in switch_mm(), we could
introduce a test such as
        if (next->context.segments &&
            test_and_clear_bit(cpu, &next->ldtupdate))
                load_LDT(next);
so that as each cpu switches to this mm_struct, a load_LDT() is
guaranteed.
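
Concretely, the write_ldt() side of the enhanced version might look
like this (a sketch only: "ldtupdate" is a made-up field name, and we
assume no more than BITS_PER_LONG cpus so the mask fits in one long):

	/* in write_ldt() (arch/i386/kernel/ldt.c), right after the ldt
	 * is first allocated and installed for this mm: */
	mm->ldtupdate = ~0UL;		/* every cpu must (re)load it */
	clear_bit(smp_processor_id(), &mm->ldtupdate);
	load_LDT(mm);			/* this cpu is now up to date */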

The second fix would introduce yet another IPI. When write_ldt() first
creates a new ldt, it would send an IPI to all other cpu's, passing the
mm pointer as an arg. If the other cpu's were using the mm, they would
then call load_LDT().
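
A sketch of that approach using the existing smp_call_function()
(the handler name is ours, and this is untested):

	/* runs on every other cpu */
	static void reload_ldt_ipi(void *info)
	{
		struct mm_struct *mm = (struct mm_struct *) info;

		/* only cpus actually running this mm need to act */
		if (current->active_mm == mm)
			load_LDT(mm);
	}

	/* in write_ldt(), after first creating the ldt: */
	smp_call_function(reload_ldt_ipi, mm, 1, 1);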


Which fix is better depends on the system and application. On a system
with hundreds of processes sharing the same mm_struct, the first fix
will result in quite a few calls to load_LDT(). On a system with a large
number of cpus, and short lived programs using segments, the IPI will
be wasteful.


Please respond with comments as to which patch seems best to you, or
alternative patches if you have suggestions.

thanks

 Judy Barkal and Jim Washer
	jbarkal@us.ibm.com   washer@us.ibm.com


 p.s. This problem presented itself in a small application linked to
 libpthread (for those who don't know, some versions of libpthread use
 segmentation on i386 platforms). Occasionally, the application would
 die from a SIGSEGV.
 Investigation found that it was dying when the kernel attempted to
 restore the gs register. Further investigation showed that the ldt 
 entry in the gdt pointed to the default_ldt, leading to the 
 segmentation violation.
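
 For the curious, the trick libpthread plays looks roughly like this
 (a minimal standalone sketch for 2.4-era i386, not the actual
 LinuxThreads code):

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <asm/ldt.h>		/* struct modify_ldt_ldt_s */

	static int thread_self;		/* data the segment points at */

	int main(void)
	{
		struct modify_ldt_ldt_s ldt;
		/* selector: entry 0, table indicator = LDT, RPL 3 */
		unsigned short sel = (0 << 3) | 4 | 3;

		memset(&ldt, 0, sizeof(ldt));
		ldt.entry_number = 0;
		ldt.base_addr    = (unsigned long) &thread_self;
		ldt.limit        = sizeof(thread_self);
		ldt.seg_32bit    = 1;

		if (syscall(SYS_modify_ldt, 1, &ldt, sizeof(ldt)) != 0)
			return 1;
		__asm__ volatile("movw %w0, %%gs" : : "q" (sel));
		/* %gs:0 now addresses thread_self; with a stale ldt,
		 * reloading %gs is exactly what faults. */
		return 0;
	}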





* Re: switch_mm() can fail to load ldt on SMP

From: Linus Torvalds @ 2001-07-25  0:37 UTC
  To: washer, linux-kernel

In article <200107242241.PAA05423@crg8.sequent.com> you write:
>
>We've run into a small bug in switch_mm() which results in a process
>running with a 'stale' ldt.

Good job.

>The first fix would be to patch switch_mm() so that, when the next and
>prev mm pointers are equal, it checks whether mm->context.segments
>is non-null and, if so, calls load_LDT(). This will unfortunately lead
>to many unnecessary calls to load_LDT(). An enhanced version of this
>fix would involve introducing a bit array into the mm_struct, one
>bit per cpu. When write_ldt() first allocates the ldt for this mm_struct,
>it would set all bits. Subsequently, in switch_mm(), we could
>introduce a test such as
>        if (next->context.segments &&
>            test_and_clear_bit(cpu, &next->ldtupdate))
>                load_LDT(next);

This is actually how the "mmu_context" struct is meant to be used: it's
there exactly for per-architecture context bits, and when I did the x86
part I incorrectly thought that the x86 doesn't have any MMU context.
You're obviously right that it has context, and part of the context is
just the list of CPU's that have seen the new LDT.
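
For reference, the i386 mm_context_t currently carries only the ldt
pointer; the idea is to grow it with the per-cpu bits (the field name
below is illustrative):

	/* include/asm-i386/mmu.h, roughly: */
	typedef struct {
		void *segments;			/* private ldt, if any */
		unsigned long ldtinvalid;	/* proposed: cpus that
						 * still need load_LDT() */
	} mm_context_t;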

>Which fix is better depends on the system and application. On a system
>with hundreds of processes sharing the same mm_struct, the first fix
>will result in quite a few calls to load_LDT(). On a system with a large
>number of cpus, and short lived programs using segments, the IPI will
>be wasteful.

Done right, you should have
 - processes with a NULL segment have mm->context.ldtinvalid = 0
 - setldt() sets all bits in "mm->context.ldtinvalid"
 - switch_mm() does (in the CONFIG_SMP "else {" part)
	if (test_and_clear_bit(cpu, &next->context.ldtinvalid))
		load_LDT(next);

which has _no_ extra LDT loads except when required, and only adds one
bit clear-and-test for the one case that needs it (SMP with same mm's).
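
Put together, something like this (an untested sketch, assuming no
more than BITS_PER_LONG cpus):

	/* in write_ldt()/setldt(), once the new ldt is installed: */
	mm->context.ldtinvalid = ~0UL;
	clear_bit(smp_processor_id(), &mm->context.ldtinvalid);
	load_LDT(mm);		/* this cpu is already up to date */

	/* in switch_mm(), the CONFIG_SMP "else {" (prev == next) part: */
	if (test_and_clear_bit(cpu, &next->context.ldtinvalid))
		load_LDT(next);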

Would you like to code this up, test it, and send it to me?

Btw, good debugging!

		Linus "lazy is my middle name" Torvalds

