linux-kernel.vger.kernel.org archive mirror
* desc v0.61 found a 2.5 kernel bug
@ 2003-04-27 21:09 Chuck Ebbert
  2003-04-28 10:34 ` Gabriel Paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2003-04-27 21:09 UTC (permalink / raw)
  To: linux-kernel



desc v0.61 running on Linux 2.5.68-rel:

 GDT at c0306300, 32 entries:

GDT# 12: base:00000000 limit:ffffffff  flags:c09b <P:1 DPL:0 32-bit Code>
GDT# 13: base:00000000 limit:ffffffff  flags:c093 <P:1 DPL:0 RW Data>
GDT# 14: base:00000000 limit:ffffffff  flags:c0fb <P:1 DPL:3 32-bit Code>
GDT# 15: base:00000000 limit:ffffffff  flags:c0f3 <P:1 DPL:3 RW Data>
GDT# 16: base:c0353800 limit:000eb     flags:008b <P:1 DPL:0 Busy TSS>

    TSS at c0353800, 236 bytes:

   CS:0000 <GDT#00,RPL0>   EIP:00000000   eflags:00000000
  SS0:0068 <GDT#13,RPL0>  ESP0:c2806000
   SS:0000 <GDT#00,RPL0>   ESP:00000000
   DS:0000 <GDT#00,RPL0>  ES:0000 <GDT#00,RPL0>
   FS:0000 <GDT#00,RPL0>  GS:0000 <GDT#00,RPL0>
  LDT:0011 <GDT#02,RPL1>   CR3:00000000
      ^^^^                     ^^^^^^^^


 The LDT in the kernel's TSS is wrong -- it's shifted right by three
bits and should be 0088 <GDT entry #17, RPL 0>
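
The selector arithmetic can be checked with a short sketch. It assumes only the standard x86 selector layout (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0); the function name `decode_selector` is made up for illustration and is not part of desc.

```python
def decode_selector(sel):
    """Split a 16-bit x86 segment selector into (index, table, rpl)."""
    index = sel >> 3                       # bits 15..3: descriptor table index
    table = "LDT" if sel & 0x4 else "GDT"  # bit 2: table indicator (TI)
    rpl = sel & 0x3                        # bits 1..0: requested privilege level
    return index, table, rpl


# Values taken from the TSS dump: SS0, the LDT field as found, and as expected.
for name, sel in (("SS0", 0x0068), ("LDT found", 0x0011), ("LDT expected", 0x0088)):
    index, table, rpl = decode_selector(sel)
    print(f"{name:13} {sel:04x} -> {table}#{index:02d}, RPL{rpl}")

# The stored value is the expected selector shifted right by three bits,
# i.e. the bare GDT index 17 was written instead of the selector 17 << 3.
print(hex(0x0088 >> 3))
```

Read this way, 0011 decodes as GDT entry 2 with RPL 1, which is what desc printed, while 0088 decodes as GDT entry 17 with RPL 0.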

 And shouldn't CR3 be initialized in case anyone actually wants to
switch back to the kernel TSS?


------
 Chuck
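
For readers unfamiliar with the flags column in the GDT dump, here is a rough decoding sketch. It assumes the 16-bit value printed by desc is the descriptor's access byte in the low byte with the G and D/B bits in the top two bits; that reading is consistent with the annotations in the dump but is an assumption about the tool's output format. The names `decode_flags` and `describe` are hypothetical.

```python
def decode_flags(flags):
    """Pick apart a 16-bit flags word as printed in the dump (assumed layout)."""
    access = flags & 0xFF
    return {
        "present": (access >> 7) & 1,   # P bit
        "dpl": (access >> 5) & 3,       # descriptor privilege level
        "s": (access >> 4) & 1,         # 1 = code/data segment, 0 = system segment
        "type": access & 0xF,           # segment type
        "g": (flags >> 15) & 1,         # granularity: limit counted in 4 KiB pages
        "d": (flags >> 14) & 1,         # default operation size: 32-bit
    }


def describe(flags):
    f = decode_flags(flags)
    if f["s"]:
        kind = "Code" if f["type"] & 0x8 else "Data"
    else:
        # 32-bit system descriptor types defined by the architecture
        kind = {0x2: "LDT", 0x9: "Available TSS", 0xB: "Busy TSS"}.get(f["type"], "System")
    return f"P:{f['present']} DPL:{f['dpl']} {kind}"


for flags in (0xC09B, 0xC093, 0xC0FB, 0xC0F3, 0x008B):
    print(f"{flags:04x} {describe(flags)}")

# The TSS descriptor is byte-granular (G = 0), so limit 0xeb means 0xeb + 1 bytes.
print(0xEB + 1)
```

The last line matches the "236 bytes" reported for the TSS.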


* Re: desc v0.61 found a 2.5 kernel bug
  2003-04-27 21:09 desc v0.61 found a 2.5 kernel bug Chuck Ebbert
@ 2003-04-28 10:34 ` Gabriel Paubert
  0 siblings, 0 replies; 9+ messages in thread
From: Gabriel Paubert @ 2003-04-28 10:34 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

On Sun, Apr 27, 2003 at 05:09:04PM -0400, Chuck Ebbert wrote:
> 
> 
> desc v0.61 running on Linux 2.5.68-rel:
> 
>  GDT at c0306300, 32 entries:
> 
> GDT# 12: base:00000000 limit:ffffffff  flags:c09b <P:1 DPL:0 32-bit Code>
> GDT# 13: base:00000000 limit:ffffffff  flags:c093 <P:1 DPL:0 RW Data>
> GDT# 14: base:00000000 limit:ffffffff  flags:c0fb <P:1 DPL:3 32-bit Code>
> GDT# 15: base:00000000 limit:ffffffff  flags:c0f3 <P:1 DPL:3 RW Data>
> GDT# 16: base:c0353800 limit:000eb     flags:008b <P:1 DPL:0 Busy TSS>
> 
>     TSS at c0353800, 236 bytes:
> 
>    CS:0000 <GDT#00,RPL0>   EIP:00000000   eflags:00000000
>   SS0:0068 <GDT#13,RPL0>  ESP0:c2806000
>    SS:0000 <GDT#00,RPL0>   ESP:00000000
>    DS:0000 <GDT#00,RPL0>  ES:0000 <GDT#00,RPL0>
>    FS:0000 <GDT#00,RPL0>  GS:0000 <GDT#00,RPL0>
>   LDT:0011 <GDT#02,RPL1>   CR3:00000000
>       ^^^^                     ^^^^^^^^
> 
> 
>  The LDT in the kernel's TSS is wrong -- it's shifted right by three
> bits and should be 0088 <GDT entry #17, RPL 0>

It would only be used if we ever performed a hardware task switch
back to the kernel's default TSS. However, it's clearly wrong.

> 
>  And shouldn't CR3 be initialized in case anyone actually wants to
> switch back to the kernel TSS?

For now, no: the only task gate ever taken (double fault) never
returns. You don't want to update the TSS's CR3 field on every
switch_to(), so you would have to do it in the task gate return
path, as well as having a correct LDT field.

However, returning from a task gate is so fraught with races wrt
segment registers that the best thing to do is to avoid it. Read up on
the details of how segment registers are reloaded on a hardware task
switch to convince yourself.

	Gabriel

> 
> 
> ------
>  Chuck


* Re: desc v0.61 found a 2.5 kernel bug
  2003-05-11  3:50 Chuck Ebbert
@ 2003-05-11 17:22 ` paubert
  0 siblings, 0 replies; 9+ messages in thread
From: paubert @ 2003-05-11 17:22 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

On Sat, May 10, 2003 at 11:50:07PM -0400, Chuck Ebbert wrote:
> Gabriel Paubert wrote:
> 
> > The devil is in the details: you have to edit the TSS, clear the busy bit
> > of the previous TSS, LTR, clear the busy bit of the debug TSS, restore
> > many registers from the previous TSS image, switch to the kernel stack of
> > the interrupted process, push a lot of stuff on the stack to be used by iret
> > (depending on whether you return to kernel/user/v86 modes). All of this in the 
> > right order, of course (and after having cleared your own NT flag).
> 
>  And this is the way to do it right, but...

And you don't need to keep cr3 in the TSS.
> 
> > Doable I believe but not simple, and there is still the TS issue.
> 
>  I finally realized the TS problem is basically unsolvable.  There is no
> way to know what the value was before a switch happened.

I believe the TS value can be inferred from the thread flags except
between kernel_fpu_begin() and kernel_fpu_end().
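
A toy model of that inference, as a sketch only. It assumes a lazy-FPU scheme in which CR0.TS is clear exactly while the current thread owns the FPU; `used_fpu` stands in for the relevant thread flag and `in_kernel_fpu_section` for "between kernel_fpu_begin() and kernel_fpu_end()". Both parameter names are hypothetical and this is not the kernel's code.

```python
CR0_TS = 0x8  # bit 3 of CR0: Task Switched


def inferred_ts(used_fpu, in_kernel_fpu_section):
    """Guess the CR0.TS value that was live before a hardware task switch.

    Returns 0 or CR0_TS, or None when the thread flags alone cannot tell.
    """
    if in_kernel_fpu_section:
        # The kernel is using the FPU itself here, so the thread flag
        # no longer describes the state of CR0.TS.
        return None
    # Assumed invariant: TS is clear only while the thread owns the FPU.
    return 0 if used_fpu else CR0_TS


print(inferred_ts(used_fpu=True, in_kernel_fpu_section=False))
print(inferred_ts(used_fpu=False, in_kernel_fpu_section=False))
print(inferred_ts(used_fpu=True, in_kernel_fpu_section=True))
```

The `None` case is the window in which a debug trap would still leave TS ambiguous.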
 
>  (BTW some other Free kernel has interesting things in its descriptor
> tables: DPL 1 execute-only code segments, conforming code, expand-down
> data, multiple LDTs etc...  It uncovered a bug in my code, too.)

Interesting, but using 3 privilege levels is not very portable, and
you'll need another per process stack.

Multiple LDTs: how can this be useful, and what are the semantics? 
There are enough problems with LDT eating up vmalloc space (I believe 
I have a solution to that particular problem).

	Gabriel.


* Re: desc v0.61 found a 2.5 kernel bug
@ 2003-05-11  3:50 Chuck Ebbert
  2003-05-11 17:22 ` paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2003-05-11  3:50 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linux-kernel

Gabriel Paubert wrote:

> The devil is in the details: you have to edit the TSS, clear the busy bit
> of the previous TSS, LTR, clear the busy bit of the debug TSS, restore
> many registers from the previous TSS image, switch to the kernel stack of
> the interrupted process, push a lot of stuff on the stack to be used by iret
> (depending on whether you return to kernel/user/v86 modes). All of this in the 
> right order, of course (and after having cleared your own NT flag).

 And this is the way to do it right, but...

> Doable I believe but not simple, and there is still the TS issue.

 I finally realized the TS problem is basically unsolvable.  There is no
way to know what the value was before a switch happened.


 (BTW some other Free kernel has interesting things in its descriptor
tables: DPL 1 execute-only code segments, conforming code, expand-down
data, multiple LDTs etc...  It uncovered a bug in my code, too.)


* Re: desc v0.61 found a 2.5 kernel bug
@ 2003-05-09  8:58 Chuck Ebbert
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Ebbert @ 2003-05-09  8:58 UTC (permalink / raw)
  To: paubert; +Cc: linux-kernel

paubert wrote:

>>   invalid FS,GS -> 0
>>      "    DS,ES -> __USER_DS
>>           CS,SS -> panic?
>
> It's still racy on SMP if a thread with the same MM is modifying the LDT
> between the time you check whether the selectors are valid and the iret
> instruction restoring the previous stack.

 Probably nothing can be done about that, either.  Handling an invalid segment
with another hardware task doesn't help, since the trap occurs in the context
of the new task and there's no way to tell what happened by then.

>> 
>>  Bad things can happen if a debug fault happens in certain places... for now
>> the solution is to only support int3 breakpoints and avoid those places.
>
> Can you elaborate a bit, in which places?

 I never even implemented the above checks; there is just a comment in the code
where they belong. It ran for five days that way, then generated a string
of segfaults while trying to shut down.

>> 
>>  Given the above, I hope to be able to put int3 instructions in either
>> kernel or user code and get snapshots of CPU state in the kernel TSS.
>
> And what about the little bit called TS in CR0, which is always set by
> a task switch?

 Forgot all about that one.  Maybe pushing cs:eip and flags onto the kernel's
stack and returning to an iret in the kernel task would work?


* Re: desc v0.61 found a 2.5 kernel bug
  2003-04-30 20:08 Chuck Ebbert
@ 2003-05-08 22:54 ` paubert
  0 siblings, 0 replies; 9+ messages in thread
From: paubert @ 2003-05-08 22:54 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

On Wed, Apr 30, 2003 at 04:08:05PM -0400, Chuck Ebbert wrote:
[Sorry for the delay; I've been extremely busy with other things]
> 
>  Looks like the only clean way is to follow the TSS back link and manually
> validate the segment registers before returning:
> 
>   invalid FS,GS -> 0
>      "    DS,ES -> __USER_DS
>           CS,SS -> panic?

It's still racy on SMP if a thread with the same MM is modifying the LDT
between the time you check whether the selectors are valid and the iret
instruction restoring the previous stack.

> 
>  Bad things can happen if a debug fault happens in certain places... for now
> the solution is to only support int3 breakpoints and avoid those places.

Can you elaborate a bit? In which places?

> 
>  Given the above, I hope to be able to put int3 instructions in either
> kernel or user code and get snapshots of CPU state in the kernel TSS.

And what about the little bit called TS in CR0, which is always set by
a task switch? That's one bit of state which will always be set when
the debug interrupt returns, and the current FPU code will be
confused by this AFAICT. Things become even more interesting if you
want to allow debug traps in the kernel routines using the FPU,
between kernel_fpu_begin() and kernel_fpu_end().

	Gabriel



* Re: desc v0.61 found a 2.5 kernel bug
@ 2003-04-30 20:08 Chuck Ebbert
  2003-05-08 22:54 ` paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2003-04-30 20:08 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linux-kernel

Gabriel Paubert wrote:

>>   I want to write a TSS-based debug exception handler that just does
>> an iret when it gets invoked.  For now it looks easier to just keep
>> CR3 up-to-date on every switch.
>
> It seems cr3 is in the same cache line as esp0 for a 32 byte cache line, 
> so it's not that big a deal, but I'd still try to avoid this.

 There's no easy way of fixing this up in the handler, so that's the plan
for now.  It also puts more info in the TSS dump right away.

> Currently %fs and %gs are lazily cleaned up when switching processes
> using the standard fixup mechanism, %ds and %es are cleaned up if
> necessary when popping them off the stack in the return to user
> mode path (the one which ends up in iret). There is no way to recover
> from bad user %cs/%ss, the process simply exits in the iret fixup.

 Looks like the only clean way is to follow the TSS back link and manually
validate the segment registers before returning:

  invalid FS,GS -> 0
     "    DS,ES -> __USER_DS
          CS,SS -> panic?
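
The policy in that table could be sketched as follows. This is an illustration, not the handler: `is_valid` is a hypothetical predicate standing in for whatever descriptor-table check the handler would perform, and `USER_DS` is derived from the GDT dump earlier in the thread (entry 15, RPL 3) rather than taken from kernel headers.

```python
USER_DS = (15 << 3) | 3  # GDT#15 with RPL 3, i.e. 0x7b


class UnrecoverableSelector(Exception):
    """Raised for a bad CS or SS, the 'panic?' row of the table."""


def sanitize_saved_segments(saved, is_valid):
    """Return a copy of the saved selectors with invalid ones replaced."""
    fixed = dict(saved)
    for reg in ("cs", "ss"):
        if not is_valid(saved[reg]):
            raise UnrecoverableSelector(reg)   # nothing sensible to substitute
    for reg in ("ds", "es"):
        if not is_valid(saved[reg]):
            fixed[reg] = USER_DS               # fall back to the flat user data segment
    for reg in ("fs", "gs"):
        if not is_valid(saved[reg]):
            fixed[reg] = 0                     # null selector
    return fixed


# Example: pretend only the null selector and the two flat user segments are valid.
valid = lambda sel: sel in (0, 0x73, 0x7B)
saved = {"cs": 0x73, "ss": 0x7B, "ds": 0x0F, "es": 0x7B, "fs": 0x17, "gs": 0}
print(sanitize_saved_segments(saved, valid))
```

Any such check is only as good as the moment it runs; a selector that passes here can still be invalidated before the final iret.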

 Bad things can happen if a debug fault happens in certain places... for now
the solution is to only support int3 breakpoints and avoid those places.

 Given the above, I hope to be able to put int3 instructions in either
kernel or user code and get snapshots of CPU state in the kernel TSS.
------
 Chuck


* Re: desc v0.61 found a 2.5 kernel bug
  2003-04-30  2:33 Chuck Ebbert
@ 2003-04-30 17:10 ` Gabriel Paubert
  0 siblings, 0 replies; 9+ messages in thread
From: Gabriel Paubert @ 2003-04-30 17:10 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel

On Tue, Apr 29, 2003 at 10:33:09PM -0400, Chuck Ebbert wrote:
> Gabriel Paubert wrote:
> 
> >>  And shouldn't CR3 be initialized in case anyone actually wants to
> >> switch back to the kernel TSS?
> >
> > For now no, since the only task gate ever taken (double fault), never
> > returns (you don't want to update the TSS's CR3 field on every 
> > switch_to() so you would have to do it in the task gate return 
> > path, as well as having a correct LDT field).
> 
>   I want to write a TSS-based debug exception handler that just does
> an iret when it gets invoked.  For now it looks easier to just keep
> CR3 up-to-date on every switch.

It seems cr3 is in the same cache line as esp0 for a 32 byte cache line, 
so it's not that big a deal, but I'd still try to avoid this.
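
The cache line claim can be checked from the architectural layout of the 32-bit TSS, where the leading fields are 32-bit slots (16-bit selectors are padded). The sketch below assumes that layout and uses the TSS address from the desc dump at the start of the thread; `same_cache_line` is a helper invented for the example.

```python
# Leading fields of the 32-bit x86 TSS, each occupying a 4-byte slot.
LEADING_FIELDS = ["back_link", "esp0", "ss0", "esp1", "ss1", "esp2", "ss2", "cr3"]
OFFSETS = {name: 4 * i for i, name in enumerate(LEADING_FIELDS)}


def same_cache_line(base, off_a, off_b, line_size=32):
    """True if base+off_a and base+off_b fall in the same cache line."""
    return (base + off_a) // line_size == (base + off_b) // line_size


TSS_BASE = 0xC0353800  # "TSS at c0353800" in the dump

print(OFFSETS["esp0"], OFFSETS["cr3"])
print(same_cache_line(TSS_BASE, OFFSETS["esp0"], OFFSETS["cr3"]))
# A TSS that is not 32-byte aligned can split the two fields across lines.
print(same_cache_line(TSS_BASE + 8, OFFSETS["esp0"], OFFSETS["cr3"]))
```

So the statement holds for a 32-byte-aligned TSS such as the one in the dump, and depends on that alignment.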

> 
> > However, returning from a task gate is so much fraught with races wrt 
> > segment registers that the best thing to do is to avoid it.
> 
>  Even with interrupts off?

Yes. Consider the following:

	create an LDT entry 
	load the segment to %fs
	clear the LDT entry (or mark it non present), 
		-> %fs is now stale but still marked valid
	...(no task switch)
	Interrupt handled through task gate
		-> stale selector written to TSS
	...(interrupt handler)
	iret-> TS/NP/SF exception when loading segments in the
		new task (I believe it can't be GP)

Of course on an SMP machine with shared LDT, there are even more
ways of triggering segment related exceptions.
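
The sequence can be mimicked with a toy model of one segment register and its hidden descriptor cache. This is a deliberately simplified sketch: the `Cpu` class, its methods and the `Fault` exception are all invented for illustration, and a real task switch saves and reloads far more state.

```python
class Fault(Exception):
    """Stands in for the segment-load exception raised by the CPU."""


class Cpu:
    def __init__(self):
        self.ldt = {}           # index -> descriptor; a missing key means "not present"
        self.fs_selector = 0    # visible part of %fs
        self.fs_cached = None   # hidden descriptor cache, filled at load time

    def load_fs(self, selector):
        if selector == 0:       # the null selector always loads
            self.fs_selector, self.fs_cached = 0, None
            return
        index = selector >> 3
        if index not in self.ldt:
            raise Fault(f"selector {selector:#06x} is not valid")
        self.fs_selector, self.fs_cached = selector, self.ldt[index]

    def save_to_tss(self):
        # Only the selector is written to the TSS; the cached descriptor is lost.
        return {"fs": self.fs_selector}

    def load_from_tss(self, tss):
        # The selector is validated again against the current table.
        self.load_fs(tss["fs"])


cpu = Cpu()
cpu.ldt[1] = "data segment"           # create an LDT entry
cpu.load_fs((1 << 3) | 0x4 | 0x3)     # load it into %fs (LDT, RPL 3)
del cpu.ldt[1]                        # clear the entry: %fs is stale but still usable
tss = cpu.save_to_tss()               # interrupt through task gate: stale selector saved
try:
    cpu.load_from_tss(tss)            # iret back: the reload is re-checked and faults
except Fault as exc:
    print("fault on return:", exc)
```

Note that interrupts never appear in the model: the stale selector is created entirely by the interrupted task before the task gate is taken.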

Currently %fs and %gs are lazily cleaned up when switching processes
using the standard fixup mechanism, %ds and %es are cleaned up if
necessary when popping them off the stack in the return to user
mode path (the one which ends up in iret). There is no way to recover
from bad user %cs/%ss, the process simply exits in the iret fixup.

But this works only because you can put a specific fixup for each
instruction which loads a given segment register (or two for iret).
In an iret from a task gate, you don't have this fine grained control
(all registers are loaded at once and then checked one by one)
and the return address is unpredictable, so the fixup mechanism is out.
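
To illustrate why the fixup mechanism needs a known instruction address, here is a minimal sketch of a table keyed by faulting instruction address and searched by exact match. The addresses and the names `EXTABLE` and `search_fixup` are made up for the example; only the idea of a sorted (instruction, fixup) table comes from the discussion.

```python
import bisect

# Hypothetical (faulting instruction address, fixup address) pairs, sorted by instruction.
EXTABLE = sorted([
    (0xC0105A10, 0xC02F0000),   # e.g. a "pop %ds" in the return-to-user path
    (0xC0105A11, 0xC02F0010),   # e.g. a "pop %es"
    (0xC0105A20, 0xC02F0020),   # e.g. the final iret
])


def search_fixup(table, fault_addr):
    """Return the fixup address registered for fault_addr, or None."""
    insns = [insn for insn, _ in table]
    pos = bisect.bisect_left(insns, fault_addr)
    if pos < len(table) and table[pos][0] == fault_addr:
        return table[pos][1]
    return None


print(hex(search_fixup(EXTABLE, 0xC0105A11)))   # known instruction: fixup found
print(search_fixup(EXTABLE, 0xC0123456))        # arbitrary return address: no entry
```

A fault taken while loading segments on return from a task gate is reported at whatever address the interrupted task was executing, which is the second case.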

This does not mean that there is no way to safely return from an 
interrupt handled through a task gate, but it's not simple (you 
don't want to change the existing lazy cleanup mechanism which is 
about as simple and low overhead as it gets for the common cases).

	Gabriel



* Re: desc v0.61 found a 2.5 kernel bug
@ 2003-04-30  2:33 Chuck Ebbert
  2003-04-30 17:10 ` Gabriel Paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Ebbert @ 2003-04-30  2:33 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linux-kernel

Gabriel Paubert wrote:

>>  And shouldn't CR3 be initialized in case anyone actually wants to
>> switch back to the kernel TSS?
>
> For now no, since the only task gate ever taken (double fault), never
> returns (you don't want to update the TSS's CR3 field on every 
> switch_to() so you would have to do it in the task gate return 
> path, as well as having a correct LDT field).

  I want to write a TSS-based debug exception handler that just does
an iret when it gets invoked.  For now it looks easier to just keep
CR3 up-to-date on every switch.

> However, returning from a task gate is so much fraught with races wrt 
> segment registers that the best thing to do is to avoid it.

 Even with interrupts off?


------
 Chuck



Thread overview: 9+ messages
2003-04-27 21:09 desc v0.61 found a 2.5 kernel bug Chuck Ebbert
2003-04-28 10:34 ` Gabriel Paubert
2003-04-30  2:33 Chuck Ebbert
2003-04-30 17:10 ` Gabriel Paubert
2003-04-30 20:08 Chuck Ebbert
2003-05-08 22:54 ` paubert
2003-05-09  8:58 Chuck Ebbert
2003-05-11  3:50 Chuck Ebbert
2003-05-11 17:22 ` paubert
