All of lore.kernel.org
 help / color / mirror / Atom feed
* NPTL/TLS segment flipping code problem
@ 2005-01-07 10:16 Jan Beulich
  2005-01-08  2:44 ` Ian Pratt
  2005-01-11 16:32 ` Keir Fraser
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Beulich @ 2005-01-07 10:16 UTC (permalink / raw)
  To: xen-devel

Looking at this code (2.0.2), it appears to have a couple of problems I
could not find mentioned on the mailing list archive:

(1) If base is zero in an expand-up segment, the conversion will yield
an expand-down segment covering the whole 4Gb, thus providing a
mechanism to obtain access to XEN space.

(2) If a malicious program accesses memory at a small negative offset
from gs:0 and the access extends into the positive range, the access
will gp-fault with either descriptor setting, thus leading to an endless
loop of flipping between the two states.

(3) Since escaped opcodes (those starting with 0F) aren't handled,
accessing mm/xmm data in __thread variables (along with other
specialized operations on such variable the compiler might generate) is
going to kill the program. Of course, it is similarly problematic that
SIB addressing still isn't implemented, but that's at least stated so in
the code.

(4) In the no-mod-r/m handling of the decoder, the byte case is handled
incorrectly: The address it deals with is still a 32-byte (or 16-byte,
but 16-bit addressing isn't handled anyway) one. There simply must not
be a 'case 1' there, and the insn_decode table should be changed
accordingly.

(5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more
correct) left an access to the no longer existing positive_access
parameter of fixup_seg in (at least) one of the DPRINTK-s.

Jan


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
  2005-01-07 10:16 NPTL/TLS segment flipping code problem Jan Beulich
@ 2005-01-08  2:44 ` Ian Pratt
  2005-01-09  6:13   ` Nuno Silva
  2005-01-11 16:32 ` Keir Fraser
  1 sibling, 1 reply; 10+ messages in thread
From: Ian Pratt @ 2005-01-08  2:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Ian.Pratt

> Looking at this code (2.0.2), it appears to have a couple of problems I
> could not find mentioned on the mailing list archive:

Thanks, we'll look into these.

The whole TLS workaround is a huge pain in the butt, and has
pretty horrible performance anyhow. The segment flipping approach
doesn't buy us much as +ve and -ve accesses seem to be
interleaved. It at least enables some simplifications to the
instruction decoder (though as you point out, we need to extend
it some).

We really need to look into producing a suitably patched glibc
rpm. Likewise, we need to prevent gcc from generating -ve offsets
for tls as the default.

Ian

> (1) If base is zero in an expand-up segment, the conversion will yield
> an expand-down segment covering the whole 4Gb, thus providing a
> mechanism to obtain access to XEN space.
> 
> (2) If a malicious program accesses memory at a small negative offset
> from gs:0 and the access extends into the positive range, the access
> will gp-fault with either descriptor setting, thus leading to an endless
> loop of flipping between the two states.
> 
> (3) Since escaped opcodes (those starting with 0F) aren't handled,
> accessing mm/xmm data in __thread variables (along with other
> specialized operations on such variable the compiler might generate) is
> going to kill the program. Of course, it is similarly problematic that
> SIB addressing still isn't implemented, but that's at least stated so in
> the code.
> 
> (4) In the no-mod-r/m handling of the decoder, the byte case is handled
> incorrectly: The address it deals with is still a 32-byte (or 16-byte,
> but 16-bit addressing isn't handled anyway) one. There simply must not
> be a 'case 1' there, and the insn_decode table should be changed
> accordingly.
> 
> (5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more
> correct) left an access to the no longer existing positive_access
> parameter of fixup_seg in (at least) one of the DPRINTK-s.
> 
> Jan
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xen-devel



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
  2005-01-08  2:44 ` Ian Pratt
@ 2005-01-09  6:13   ` Nuno Silva
  2005-01-09  9:13     ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Nuno Silva @ 2005-01-09  6:13 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Jan Beulich, xen-devel


Hi!

Ian Pratt wrote:

[..]

> We really need to look into producing a suitably patched glibc
> rpm. Likewise, we need to prevent gcc from generating -ve offsets
> for tls as the default.

You can workaround that in userspace with:
LD_ASSUME_KERNEL=2.4.9

Set it in /etc/environment or in your kernel command line. The later 
depends on your setup: IMMV.

Regards,
Nuno Silva



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
  2005-01-09  6:13   ` Nuno Silva
@ 2005-01-09  9:13     ` Keir Fraser
  0 siblings, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2005-01-09  9:13 UTC (permalink / raw)
  To: Nuno Silva; +Cc: Ian Pratt, Jan Beulich, xen-devel


I don't think that would get us all the way. It would stop ld.so
linking to TLS glibc after booting, but:
 1. GCC can now emit -ve offset accesses directly in application
    binaries. So the problem may not be entirely solvable at
    link-time. 
 2. Still need to deal with linking of 'init' process, which is run
    before you get a chance to set LD_ASSUME_KERNEL. 

Maybe there's a workaround for 2 less invasive than deleting /lib/tls,
but I don't know what it is.

 -- Keir

> 
> Hi!
> 
> Ian Pratt wrote:
> 
> [..]
> 
> > We really need to look into producing a suitably patched glibc
> > rpm. Likewise, we need to prevent gcc from generating -ve offsets
> > for tls as the default.
> 
> You can workaround that in userspace with:
> LD_ASSUME_KERNEL=2.4.9
> 
> Set it in /etc/environment or in your kernel command line. The later 
> depends on your setup: IMMV.
> 
> Regards,
> Nuno Silva
> 
> 
> 
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xen-devel



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
  2005-01-07 10:16 NPTL/TLS segment flipping code problem Jan Beulich
  2005-01-08  2:44 ` Ian Pratt
@ 2005-01-11 16:32 ` Keir Fraser
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2005-01-11 16:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> Looking at this code (2.0.2), it appears to have a couple of problems I
> could not find mentioned on the mailing list archive:
> 
> (1) If base is zero in an expand-up segment, the conversion will yield
> an expand-down segment covering the whole 4Gb, thus providing a
> mechanism to obtain access to XEN space.

No it won't. When we flip to expands-down we set the limit to
    (-(base & PAGE_MASK) >> 12) - 1
 == (-(0 & PAGE_MASK) >> 12) - 1
 == 0 - 1
 == 0xfffff

This is a *zero-length* grows-down segment, exactly as we require for
safety. [Yes, you can have a zero-length grows-down segment, even
though it is impossible to have a zero-length grows-up segment -- I
tested on real silicon.]

> (2) If a malicious program accesses memory at a small negative offset
> from gs:0 and the access extends into the positive range, the access
> will gp-fault with either descriptor setting, thus leading to an endless
> loop of flipping between the two states.

Harmless. The guest OS will still execute (e.g., to service timer
interrupts) and will be able to preempt the malicious program when it
has received its timeslice. So the program cannot take over the
machine -- it can only receive the same amount of CPU as a user-space
infinite CPU loop.

> (3) Since escaped opcodes (those starting with 0F) aren't handled,
> accessing mm/xmm data in __thread variables (along with other
> specialized operations on such variable the compiler might generate) is
> going to kill the program. Of course, it is similarly problematic that
> SIB addressing still isn't implemented, but that's at least stated so in
> the code.

Yes there are restrictions in what it supports. But we have a DPRINTK
for those cases so we can fix them up as/if they occur.

> (4) In the no-mod-r/m handling of the decoder, the byte case is handled
> incorrectly: The address it deals with is still a 32-byte (or 16-byte,
> but 16-bit addressing isn't handled anyway) one. There simply must not
> be a 'case 1' there, and the insn_decode table should be changed
> accordingly.

You mean the following fragment?

    if ( !(decode & HAS_MODRM) )
    {
        switch ( decode & 7 )
        {
        case 1:
            offset = (long)(*(char *)pb);
            goto skip_modrm;

It is correct -- sign extends the 8-bit offset to a signed long as we
require. The instruction only contains a single-byte offset -- it
isn't stored as 16 or 32 bits.

> (5, minor) The change from 2.0.1 to 2.0.2 (making the code a lot more
> correct) left an access to the no longer existing positive_access
> parameter of fixup_seg in (at least) one of the DPRINTK-s.

Now fixed in the new 2.0.3 release. The FC3 issue that some poeple
(including Rik) were seeing is also now fixed.

 -- Keir


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
       [not found] <s1e539bb.038@emea1-mh.id2.novell.com>
@ 2005-01-12 15:01 ` Keir Fraser
  0 siblings, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2005-01-12 15:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir.Fraser, xen-devel

> >Your analysis of these opcodes (A1 and A3) is correct, but both have
> >code 'O|4' in the insn_decode table, so they don't go thru 'case 1:'.
> 
> >
> >A0 and A2 have code 'O|1' which are instructions:
> > MOV moffs8,AL   ;   MOV AL,moffs8
> >These have a single-byte offset incoded within the instruction.
> >The 'case 1:' *is* needed for A0 and A2.
> 
> Hmm, I'm sorry, I actually meant A0 and A2. Just look at what the
> assembler generates for 
> 
> 	movb	symbol, %al
> 	movb	%al, symbol

Oops. Sorry - yes, you win. :-)

I'll get rid of 'case 1'.

 -- Keir


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
@ 2005-01-12 14:52 Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2005-01-12 14:52 UTC (permalink / raw)
  To: Keir.Fraser; +Cc: xen-devel

>Your analysis of these opcodes (A1 and A3) is correct, but both have
>code 'O|4' in the insn_decode table, so they don't go thru 'case 1:'.

>
>A0 and A2 have code 'O|1' which are instructions:
> MOV moffs8,AL   ;   MOV AL,moffs8
>These have a single-byte offset incoded within the instruction.
>The 'case 1:' *is* needed for A0 and A2.

Hmm, I'm sorry, I actually meant A0 and A2. Just look at what the
assembler generates for 

	movb	symbol, %al
	movb	%al, symbol

(gas -al=movb.l -o movb.o movb.s):

GAS LISTING movb.s 			page 1


   1 0000 A0000000 		movb	symbol, %al
   1      00
   2 0005 A2000000 		movb	%al, symbol
   2      00

Jan



-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
       [not found] <s1e52551.013@emea1-mh.id2.novell.com>
@ 2005-01-12 13:58 ` Keir Fraser
  0 siblings, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2005-01-12 13:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir.Fraser, xen-devel

> >No it won't. When we flip to expands-down we set the limit to
> >    (-(base & PAGE_MASK) >> 12) - 1
> > == (-(0 & PAGE_MASK) >> 12) - 1
> > == 0 - 1
> > == 0xfffff
> >
> >This is a *zero-length* grows-down segment, exactly as we require for
> >safety. [Yes, you can have a zero-length grows-down segment, even
> >though it is impossible to have a zero-length grows-up segment -- I
> >tested on real silicon.]
> 
> Oh, right, I didn't connect the -1 done after the flip: label. But it
> seems pointless to flip a segment with a base of zero; the gp fault must
> have had a different reason (i.e. by flipping it to a zero-length
> segment you'll only cause another gp fault right away when restarting
> the instruction).

We could improve the flipping code by detecting accesses that overlap
Xen's private area and simply propagate the GPF in that case. That
would stop us flipping to a zero-length segment. 

Modifying that code again is not high on our priority list,
though. Our main aim is compatibility with non-buggy programs. If
there is a minuscule chance of buggy programs looping instead of
seg-faulting, we can accept that for now. :-)

> >Harmless. The guest OS will still execute (e.g., to service timer
> >interrupts) and will be able to preempt the malicious program when it
> >has received its timeslice. So the program cannot take over the
> >machine -- it can only receive the same amount of CPU as a user-space
> >infinite CPU loop.
> 
> Not exactly: if a user mode process runs en infinite loop, it'll not
> block out interrupts for any more than a single instruction. For the
> mentioned scenario, (physical) interrupt latency would significantly
> increase.

Well, maybe the path thru the segment-flip code is a few hundred
instructions. I bet there are plenty of syscall paths of *at least*
that length that don't contain a yield point. The major cost of the
flipping is actually the privilege-level changes (ring 3 to 0 to 3).
An application can trivially cause lots fo those, even with no
seg-flipping. 

> >Yes there are restrictions in what it supports. But we have a DPRINTK
> >for those cases so we can fix them up as/if they occur.
> 
> Except that the DPRINTK expands to nothing in the published sources, so
> one would never know what caused a spurious SEGV seen in a client OSes
> app without re-running the whole thing with a version of XEN where these
> DPRINTKs are actually doing something.

I agree it is not the best situation. But to fix it we would need a
much more complex decoder, and a comprehensive test suite
(implementing the decoder without testing every instruction it can
decode would just lead to latent bugs in Xen that could be worse than
having no decoding ability for those instructions at all). It's just
more effort than we want to commit to -- the main purpose of the
seg-flipping is to allow existing distros to initially boot and allow
the sysadmin to move /lib/tls out of the way.

> >It is correct -- sign extends the 8-bit offset to a signed long as we
> >require. The instruction only contains a single-byte offset -- it
> >isn't stored as 16 or 32 bits.
> 
> I don't think so. The instructions this refers to are (opcodes A1 and
> A3)

Your analysis of these opcodes (A1 and A3) is correct, but both have
code 'O|4' in the insn_decode table, so they don't go thru 'case 1:'. 

A0 and A2 have code 'O|1' which are instructions:
 MOV moffs8,AL   ;   MOV AL,moffs8
These have a single-byte offset incoded within the instruction.
The 'case 1:' *is* needed for A0 and A2.

> Having a second look at this reveals another dangerous thing: The code
> here directly derefences pb, whereas all other instances of references
> through ip correctly use get_user.

Yes, this needs fixing.

> Finally, in the case I'm still missing something here and the 'case 1'
> is needed, then an operation like (long)*(char *) is rather dangerous,
> because you depend on the (implementation defined) signedness of char.
> You'd want to code (long)*(signed char *) instead.

Yes, this also needs fixing. :-)

 Thanks,
 Keir


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NPTL/TLS segment flipping code problem
@ 2005-01-12 13:25 Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2005-01-12 13:25 UTC (permalink / raw)
  To: Keir.Fraser; +Cc: xen-devel

>> Looking at this code (2.0.2), it appears to have a couple of problems
I
>> could not find mentioned on the mailing list archive:
>> 
>> (1) If base is zero in an expand-up segment, the conversion will
yield
>> an expand-down segment covering the whole 4Gb, thus providing a
>> mechanism to obtain access to XEN space.
>
>No it won't. When we flip to expands-down we set the limit to
>    (-(base & PAGE_MASK) >> 12) - 1
> == (-(0 & PAGE_MASK) >> 12) - 1
> == 0 - 1
> == 0xfffff
>
>This is a *zero-length* grows-down segment, exactly as we require for
>safety. [Yes, you can have a zero-length grows-down segment, even
>though it is impossible to have a zero-length grows-up segment -- I
>tested on real silicon.]

Oh, right, I didn't connect the -1 done after the flip: label. But it
seems pointless to flip a segment with a base of zero; the gp fault must
have had a different reason (i.e. by flipping it to a zero-length
segment you'll only cause another gp fault right away when restarting
the instruction).

>> (2) If a malicious program accesses memory at a small negative
offset
>> from gs:0 and the access extends into the positive range, the
access
>> will gp-fault with either descriptor setting, thus leading to an
endless
>> loop of flipping between the two states.
>
>Harmless. The guest OS will still execute (e.g., to service timer
>interrupts) and will be able to preempt the malicious program when it
>has received its timeslice. So the program cannot take over the
>machine -- it can only receive the same amount of CPU as a user-space
>infinite CPU loop.

Not exactly: if a user mode process runs en infinite loop, it'll not
block out interrupts for any more than a single instruction. For the
mentioned scenario, (physical) interrupt latency would significantly
increase.

>> (3) Since escaped opcodes (those starting with 0F) aren't handled,
>> accessing mm/xmm data in __thread variables (along with other
>> specialized operations on such variable the compiler might generate)
is
>> going to kill the program. Of course, it is similarly problematic
that
>> SIB addressing still isn't implemented, but that's at least stated
so in
>> the code.
>
>Yes there are restrictions in what it supports. But we have a DPRINTK
>for those cases so we can fix them up as/if they occur.

Except that the DPRINTK expands to nothing in the published sources, so
one would never know what caused a spurious SEGV seen in a client OSes
app without re-running the whole thing with a version of XEN where these
DPRINTKs are actually doing something.

>> (4) In the no-mod-r/m handling of the decoder, the byte case is
handled
>> incorrectly: The address it deals with is still a 32-byte (or
16-byte,
>> but 16-bit addressing isn't handled anyway) one. There simply must
not
>> be a 'case 1' there, and the insn_decode table should be changed
>> accordingly.
>
>You mean the following fragment?
>
>    if ( !(decode & HAS_MODRM) )
>    {
>        switch ( decode & 7 )
>        {
>        case 1:
>            offset = (long)(*(char *)pb);
>            goto skip_modrm;

Yes.

>It is correct -- sign extends the 8-bit offset to a signed long as we
>require. The instruction only contains a single-byte offset -- it
>isn't stored as 16 or 32 bits.

I don't think so. The instructions this refers to are (opcodes A1 and
A3)

	movb	symbol, %al
	movb	%al, symbol

Clearly, they take a full (16-/32-bit) address but operate on a byte at
that address (they would be useless if they operated only on an 8-bit
sign-extended address).
Having a second look at this reveals another dangerous thing: The code
here directly derefences pb, whereas all other instances of references
through ip correctly use get_user.
Finally, in the case I'm still missing something here and the 'case 1'
is needed, then an operation like (long)*(char *) is rather dangerous,
because you depend on the (implementation defined) signedness of char.
You'd want to code (long)*(signed char *) instead.

Jan


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: NPTL/TLS segment flipping code problem
@ 2005-01-09  9:16 James Harper
  0 siblings, 0 replies; 10+ messages in thread
From: James Harper @ 2005-01-09  9:16 UTC (permalink / raw)
  To: Keir Fraser, Nuno Silva; +Cc: Ian Pratt, Jan Beulich, xen-devel

> Maybe there's a workaround for 2 less invasive than deleting /lib/tls,
> but I don't know what it is.

Under Debian at least, deleting /lib/tls is only a temporary measure as
if the right packages are updated, it is re-created again. Maybe a
redirect could fix that though.

James


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-01-12 15:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-07 10:16 NPTL/TLS segment flipping code problem Jan Beulich
2005-01-08  2:44 ` Ian Pratt
2005-01-09  6:13   ` Nuno Silva
2005-01-09  9:13     ` Keir Fraser
2005-01-11 16:32 ` Keir Fraser
2005-01-09  9:16 James Harper
2005-01-12 13:25 Jan Beulich
     [not found] <s1e52551.013@emea1-mh.id2.novell.com>
2005-01-12 13:58 ` Keir Fraser
2005-01-12 14:52 Jan Beulich
     [not found] <s1e539bb.038@emea1-mh.id2.novell.com>
2005-01-12 15:01 ` Keir Fraser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.