linux-kernel.vger.kernel.org archive mirror
* Re: Assignment of GDT entries
@ 2006-09-15  7:55 Mikael Pettersson
  2006-09-15  8:20 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Mikael Pettersson @ 2006-09-15  7:55 UTC
  To: acahalan, jeremy; +Cc: ak, arjan, ebiederm, linux-kernel, mingo, torvalds, zach

On Wed, 13 Sep 2006 23:11:05 -0700, Jeremy Fitzhardinge wrote:
>Albert Cahalan wrote:
>> We actually have an ABI problem right now because of this.
>> Note that i386 and x86_64 use different GDT slots.
>>
>> As far as I can tell, users need to hard-code the mapping
>> from TLS slot to segment number. They use 0,1,2 to ask the
>> kernel to set things up (via set_thread_area), but can't
>> just pop that into %fs or %gs.
>
>That's not true at all.  The program I posted earlier in this thread 
>uses set_thread_area() to allocate a GDT slot, and it works on both 
>native 32 bit and 32-under-64.

The i386 TLS API has three components:

(1) set_thread_area(entry_number == -1):
    allocates and sets up the first available TLS entry and
    copies the chosen GDT index back to user-space
(2) set_thread_area(6 <= entry_number && entry_number <= 8):
    allocates and sets up the indicated GDT entry
(3) get_thread_area(6 <= entry_number && entry_number <= 8):
    retrieves the contents of the indicated GDT entry

Only (1) works in x86-64's ia32 emulation; the other two fail
with EINVAL because x86-64 only accepts GDT indices 12 to 14
for TLS entries. glibc only uses (1).

If you move the i386 TLS GDT entries to other indices then you
break (2) and (3) on i386 as well.

It's not difficult to design a better i386 TLS API that avoids
requiring user-space to know the actual GDT indices (just use
logical TLS indices and always copy the GDT index to user-space).
But unfortunately that doesn't help us now because the TLS GDT
indices must remain fixed as long as the current API is supported.

I _personally_ could certainly handle a post-2.6.18 kernel where
the improved API (new syscalls) is in place, the GDT indices have
been moved, and consequently components (2) and (3) of the old API
are broken. However, this still implies breaking binary compatibility,
which is not something to be done lightly.

(What's _really_ sad is that the implementation of the i386 TLS API
internally operates on logical TLS indices, it's just the syscall
interface that insists on requiring actual GDT indices from user-space.)

/Mikael

* Re: Assignment of GDT entries
  2006-09-15  7:55 Assignment of GDT entries Mikael Pettersson
@ 2006-09-15  8:20 ` Jeremy Fitzhardinge
  2006-09-15  8:58   ` Mikael Pettersson
  0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-15  8:20 UTC
  To: Mikael Pettersson
  Cc: acahalan, ak, arjan, ebiederm, linux-kernel, mingo, torvalds, zach

Mikael Pettersson wrote:
> The i386 TLS API has three components:
>
> (1) set_thread_area(entry_number == -1):
>     allocates and sets up the first available TLS entry and
>     copies the chosen GDT index back to user-space
> (2) set_thread_area(6 <= entry_number && entry_number <= 8):
>     allocates and sets up the indicated GDT entry
> (3) get_thread_area(6 <= entry_number && entry_number <= 8):
>     retrieves the contents of the indicated GDT entry
>
> Only (1) works in x86-64's ia32 emulation; the other two fail
> with EINVAL because x86-64 only accepts GDT indices 12 to 14
> for TLS entries. glibc only uses (1).
>
> If you move the i386 TLS GDT entries to other indices then you
> break (2) and (3) on i386 as well.
>   

(2) and (3) are always OK if you pass them the result of (1), i.e. to
update or read back a previously allocated descriptor.  Neither is useful
without having done (1) first.  The fact that 32-on-32 and 32-on-64
differ here means that nothing can (and apparently nothing does) depend
on hardcoded knowledge of the TLS descriptor indices anyway.

> It's not difficult to design a better i386 TLS API that avoids
> requiring user-space to know the actual GDT indices (just use
> logical TLS indices and always copy the GDT index to user-space).
> But unfortunately that doesn't help us
>   

You still need the real indices to construct a selector to put into a
segment register, i.e. to actually do something useful.  Changing the API
to use abstract "TLS indices" would also require a call to return the
"TLS base", which hardly seems like an improvement.

Also, there's no inherent reason why the TLS indices should be
contiguous; it happens to be true, but there's nothing useful userspace
can do with that knowledge.  Allowing them to be discontiguous may be
helpful, for example, in packing the most used TLS entries (i.e. #1) into
a hot cache line, while putting the lesser-used ones elsewhere.  The 
current API could deal with this without needing to change.

    J

* Re: Assignment of GDT entries
  2006-09-15  8:20 ` Jeremy Fitzhardinge
@ 2006-09-15  8:58   ` Mikael Pettersson
  2006-09-15 18:27     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Mikael Pettersson @ 2006-09-15  8:58 UTC
  To: Jeremy Fitzhardinge
  Cc: Mikael Pettersson, acahalan, ak, arjan, ebiederm, linux-kernel,
	mingo, torvalds, zach

Jeremy Fitzhardinge writes:
 > Mikael Pettersson wrote:
 > > The i386 TLS API has three components:
 > >
 > > (1) set_thread_area(entry_number == -1):
 > >     allocates and sets up the first available TLS entry and
 > >     copies the chosen GDT index back to user-space
 > > (2) set_thread_area(6 <= entry_number && entry_number <= 8):
 > >     allocates and sets up the indicated GDT entry
 > > (3) get_thread_area(6 <= entry_number && entry_number <= 8):
 > >     retrieves the contents of the indicated GDT entry
 > >
 > > Only (1) works in x86-64's ia32 emulation; the other two fail
 > > with EINVAL because x86-64 only accepts GDT indices 12 to 14
 > > for TLS entries. glibc only uses (1).
 > >
 > > If you move the i386 TLS GDT entries to other indices then you
 > > break (2) and (3) on i386 as well.
 > >   
 > 
 > (2) and (3) are always OK if you pass them the result of (1), i.e. to
 > update or read back a previously allocated descriptor.  Neither is useful
 > without having done (1) first.

In the real world a process' state is influenced by code I have little
control over, usually glibc and other libraries, and fork(). Using (3)
I can inspect parts of my process' state that I did not initialize myself.

 >  The fact that 32-on-32 and 32-on-64 
 > differ here means that nothing can (and apparently nothing does) depend
 > on hardcoded knowledge of the TLS descriptor indices anyway.

No, it means that x86-64's ia32 emulation was implemented by someone
who either didn't realize the difference, or didn't care (because
"only glibc matters").

 > > It's not difficult to design a better i386 TLS API that avoids
 > > requiring user-space to know the actual GDT indices (just use
 > > logical TLS indices and always copy the GDT index to user-space).
 > > But unfortunately that doesn't help us
 > >   
 > 
 > You still need the real indices to construct a selector to put into a
 > segment register, i.e. to actually do something useful.

Sure.

 >  Changing the API 
 > to use abstract "TLS indices" would also require a call to return the
 > "TLS base", which hardly seems like an improvement.

The TLS base can obviously be zero.

User-space asks to access TLS #n (for allocs #n can be -1).
The kernel maps that to GDT index #m.
The kernel stores #m in the user-space buffer.
User-space maps #m to a selector.

 > Also, there's no inherent reason why the TLS indices should be
 > contiguous; it happens to be true, but there's nothing useful userspace
 > can do with that knowledge.  Allowing them to be discontiguous may be
 > helpful, for example, in packing the most used TLS entries (i.e. #1) into
 > a hot cache line, while putting the lesser-used ones elsewhere.  The 
 > current API could deal with this without needing to change.

I have said nothing that would prevent the use of sparse TLS GDT indices.

Look, I'm not saying the current API is perfect, far from it. But it does
have valid usage modes which are broken in x86-64's ia32 emulation, and
will break on i386 if you reallocate the TLS GDT indices. This is a fact.

This is why I'm asking that if you change things (thus breaking binary
compatibility even more), that a corrected API be placed in new syscalls.

That is, instead of forcing user-space to do

   uname
   if (version >= 2.6.N)
      call {set,get}_thread_area with new-style parameters
   else
      call {set,get}_thread_area with old-style parameters

it should do

   call new_{set,get}_thread_area with new-style parameters
   if (ENOSYS)
     call old_{set,get}_thread_area with old-style parameters

/Mikael

* Re: Assignment of GDT entries
  2006-09-15  8:58   ` Mikael Pettersson
@ 2006-09-15 18:27     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-15 18:27 UTC
  To: Mikael Pettersson
  Cc: acahalan, ak, arjan, ebiederm, linux-kernel, mingo, torvalds, zach

Mikael Pettersson wrote:
>  >  Changing the API 
>  > to use abstract "TLS indices" would also require a call to return the
>  > "TLS base", which hardly seems like an improvement.
>
> The TLS base can obviously be zero.
>
> User-space asks to access TLS #n (for allocs #n can be -1).
> The kernel maps that to GDT index #m.
> The kernel stores #m in the user-space buffer.
> User-space maps #m to a selector.
>   

I'm missing why this is a substantial improvement over the current 
interface (or functionally different at all).  What does this proposal 
let you do that the current one doesn't?

> Look, I'm not saying the current API is perfect, far from it. But it does
> have valid usage modes which are broken in x86-64's ia32 emulation, and
> will break on i386 if you reallocate the TLS GDT indices. This is a fact.
>   

Hm, well it's a "fact" in that they use different segment descriptors,
but you'd be hard pressed to say that was a breakage.  set_thread_area 
was added in 2.5.29 (Jul 2002), and x86-64 added support in 2.5.43 (Oct 
2002), so the current behaviour is pretty much as it has always been.  
If you have a program that expects something different, you either wrote 
it in Jul-Oct 2002, or you made an unsustainable assumption about how 
set_thread_area() works.

You seem to have a specific use-case in mind; do you have a program 
which would like to use a new interface?  Would you mind spelling it 
out, and describe why the current interface doesn't work for you?

    J

* Re: Assignment of GDT entries
  2006-09-14  0:25         ` Zachary Amsden
  2006-09-14  1:40           ` Stephen Rothwell
@ 2006-09-14 13:03           ` Alan Cox
  1 sibling, 0 replies; 32+ messages in thread
From: Alan Cox @ 2006-09-14 13:03 UTC
  To: Zachary Amsden
  Cc: Jeremy Fitzhardinge, Arjan van de Ven, Linus Torvalds,
	Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Linux Kernel Mailing List, Michael A Fetterman

On Wed, 2006-09-13 at 17:25 -0700, Zachary Amsden wrote:
> that makes use of APM or PnP facilities.  There is the possibility 
> however, that such a program could sleep, run the idle thread, which 
> makes a call into some of these BIOS facilities, and then reschedules 
> the same program thread - which means FS/GS never get reloaded, thus 
> maintaining their corrupted values.  It is worth fixing, just not a high 
> priority.  I had a patch that fixed both APM and PnP at one time, but it 
> is covered with mold and now looks like a science experiment.  Shall I 
> apply disinfectant?

I think that would be useful, or just post up the mouldy one for someone
else to rework. If someone is hitting that kind of bug it's going to be
pretty horrible to track down.


* Re: Assignment of GDT entries
  2006-09-14  7:12       ` Albert Cahalan
@ 2006-09-14  7:24         ` Zachary Amsden
  0 siblings, 0 replies; 32+ messages in thread
From: Zachary Amsden @ 2006-09-14  7:24 UTC
  To: Albert Cahalan
  Cc: Eric W. Biederman, torvalds, jeremy, mingo, ak, arjan, linux-kernel

Albert Cahalan wrote:
>>
>> There are only 32 possible GDT entries in 32-bit i386 Linux, and only
>> three of them are usable for userspace.  You can't find out which slots
>> are in use, but you can cause one to be allocated and returned to you.
>> This seems like a perfectly reasonable API to me, why do you think it is
>> so ugly?
>
> Eh, "returned to you" doesn't work for me. I need to
> figure out what other code (not written by me) uses.

I don't understand.  Why do you need to figure that out?  You need a 
selector, you ask for one, and you get assigned one.  It is that 
simple.  You can't figure out what other code uses, and the kernel has 
no way to tell you, because that is an application level allocation 
problem, not a kernel responsibility.  The kernel has no visibility into 
userspace intentions regarding segment usage.

> I may need to "borrow" a slot if all three slots are in
> use. Without using evil knowledge of the GDT, how
> am I to do that? I don't know what slots might have
> been allocated by other libraries.

What kind of libraries are you using?  Unless this is really, really, 
special purpose, they are going to allocate at most one, and that is 
only if you use TLS libraries.

If all three slots are in use (i.e. your allocation fails), you'll have 
to allocate an LDT selector, just like wine:

void wine_ldt_init_fs( unsigned short sel, const LDT_ENTRY *entry )
{
    if ((sel & ~3) == (global_fs_sel & ~3))
    {
#ifdef __linux__
        struct modify_ldt_s ldt_info;
        int ret;

        ldt_info.entry_number = sel >> 3;
        fill_modify_ldt_struct( &ldt_info, entry );
        if ((ret = set_thread_area( &ldt_info )) < 0) perror( "set_thread_area" );
#elif defined(__APPLE__)
        int ret = thread_set_user_ldt( wine_ldt_get_base(entry),
                                       wine_ldt_get_limit(entry), 0 );
        if (ret == -1) perror( "thread_set_user_ldt" );
        else assert( ret == global_fs_sel );
#endif  /* __APPLE__ */
    }
    else  /* LDT selector */
    {
        internal_set_entry( sel, entry );  /* <---- just like this */
    }
    wine_set_fs( sel );
}

Zach

* Re: Assignment of GDT entries
  2006-09-14  6:28     ` Zachary Amsden
@ 2006-09-14  7:12       ` Albert Cahalan
  2006-09-14  7:24         ` Zachary Amsden
  0 siblings, 1 reply; 32+ messages in thread
From: Albert Cahalan @ 2006-09-14  7:12 UTC
  To: Zachary Amsden
  Cc: Eric W. Biederman, torvalds, jeremy, mingo, ak, arjan, linux-kernel

On 9/14/06, Zachary Amsden <zach@vmware.com> wrote:
> Albert Cahalan wrote:

> > So basically it's not allowed to just grab the 3rd slot?
>
> You can, but you should be prepared for it to fail as well.

Without knowing details of the kernel's GDT, how?

> > What if I want to find out what is already in use?
> > Am I supposed to iterate over all 8191 possible
> > GDT entries? How do I even tell how many slots
> > are available without using them all up?
>
> There are only 32 possible GDT entries in 32-bit i386 Linux, and only
> three of them are usable for userspace.  You can't find out which slots
> are in use, but you can cause one to be allocated and returned to you.
> This seems like a perfectly reasonable API to me, why do you think it is
> so ugly?

Eh, "returned to you" doesn't work for me. I need to
figure out what other code (not written by me) uses.

I may need to "borrow" a slot if all three slots are in
use. Without using evil knowledge of the GDT, how
am I to do that? I don't know what slots might have
been allocated by other libraries.

* Re: Assignment of GDT entries
  2006-09-14  6:19   ` Albert Cahalan
  2006-09-14  6:28     ` Zachary Amsden
@ 2006-09-14  6:29     ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-14  6:29 UTC
  To: Albert Cahalan
  Cc: Eric W. Biederman, torvalds, mingo, ak, arjan, zach, linux-kernel

Albert Cahalan wrote:
> So if I grabbed the first two slots before glibc got to
> mess with them, glibc wouldn't break horribly?

glibc would be happy with anything it got; if you grabbed all 3 TLS 
slots it would probably be upset.

> If I grabbed one slot and glibc grabbed another, Wine
> would be OK with the third instead of the second?

Presumably.

> So basically it's not allowed to just grab the 3rd slot?
Eh?  You mean there's no "allocate and return TLS slot #N" operation?  
No, but all the TLS slots should be interchangeable.  Once you've got 
your entry numbers and worked out your selector values, you can just use 
them.

> What if I want to find out what is already in use?
> Am I supposed to iterate over all 8191 possible
> GDT entries? How do I even tell how many slots
> are available without using them all up?
The kernel reserves 3 slots in the GDT for usermode use, which are 
per-thread.  If you want more segment descriptors, you can always 
allocate an LDT.

> Eeeeeeew. Well this was documented exactly nowhere.
> The man page is even vague about entry_number,

man set_thread_area has this as paragraph 2:

       When  set_thread_area() is passed an entry_number of -1, it uses a free
       TLS entry. If set_thread_area() finds a free TLS entry,  the  value  of
       u_info->entry_number  is  set  upon  return  to  show  which  entry was
       changed.

which seems pretty clear to me.  A quick run with strace on any binary 
shows this in action:

    set_thread_area({entry_number:-1 -> 6, base_addr:0xb7fb06c0,
    limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
    limit_in_pages:1, seg_not_present:0, useable:1}) = 0

    J

* Re: Assignment of GDT entries
  2006-09-14  6:19   ` Albert Cahalan
@ 2006-09-14  6:28     ` Zachary Amsden
  2006-09-14  7:12       ` Albert Cahalan
  2006-09-14  6:29     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 32+ messages in thread
From: Zachary Amsden @ 2006-09-14  6:28 UTC
  To: Albert Cahalan
  Cc: Eric W. Biederman, torvalds, jeremy, mingo, ak, arjan, linux-kernel

Albert Cahalan wrote:
> Eeeeeew.
>
> So if I grabbed the first two slots before glibc got to
> mess with them, glibc wouldn't break horribly?
> If I grabbed one slot and glibc grabbed another, Wine
> would be OK with the third instead of the second?

Glibc should allocate a slot the same way Wine does.  Glibc just usually
gets its slot allocated first.

>
> So basically it's not allowed to just grab the 3rd slot?

You can, but you should be prepared for it to fail as well.

>
> What if I want to find out what is already in use?
> Am I supposed to iterate over all 8191 possible
> GDT entries? How do I even tell how many slots
> are available without using them all up?

There are only 32 possible GDT entries in 32-bit i386 Linux, and only 
three of them are usable for userspace.  You can't find out which slots 
are in use, but you can cause one to be allocated and returned to you.  
This seems like a perfectly reasonable API to me, why do you think it is 
so ugly?

Zach

* Re: Assignment of GDT entries
  2006-09-14  4:44 ` Eric W. Biederman
@ 2006-09-14  6:19   ` Albert Cahalan
  2006-09-14  6:28     ` Zachary Amsden
  2006-09-14  6:29     ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 32+ messages in thread
From: Albert Cahalan @ 2006-09-14  6:19 UTC
  To: Eric W. Biederman; +Cc: torvalds, jeremy, mingo, ak, arjan, zach, linux-kernel

On 9/14/06, Eric W. Biederman <ebiederm@xmission.com> wrote:

> I agree that the difference is annoying.
>
> However I just wrote a user space implementation of fork that
> is capable of copying a process from an i386 only kernel to a x86_64
> kernel, and executing there without having to detect the kernel type.
>
> It didn't take hacks to accomplish that.
>
> The basic syscall is:
> int set_thread_area (struct user_desc *u_info);
> struct user_desc {
>         unsigned int  entry_number;
>         unsigned long base_addr;
>         unsigned int  limit;
>         unsigned int  seg_32bit:1;
>         unsigned int  contents:2;
>         unsigned int  read_exec_only:1;
>         unsigned int  limit_in_pages:1;
>         unsigned int  seg_not_present:1;
>         unsigned int  useable:1;
> };
>
> If entry_number is -1 the kernel finds a free gdt entry and
> sets up the segment and returns with entry_number set to the
> segment number.

Eeeeeew.

So if I grabbed the first two slots before glibc got to
mess with them, glibc wouldn't break horribly?
If I grabbed one slot and glibc grabbed another, Wine
would be OK with the third instead of the second?

So basically it's not allowed to just grab the 3rd slot?

What if I want to find out what is already in use?
Am I supposed to iterate over all 8191 possible
GDT entries? How do I even tell how many slots
are available without using them all up?

Eeeeeeew. Well this was documented exactly nowhere.
The man page is even vague about entry_number,
meaning I had to dig in the kernel source (AMD manual
by my side) to find if that was a GDT slot or TLS slot,
as array index or byte offset, with or without the low bits
all set up for loading into the segment register, loaded for
me or not, etc.

* Re: Assignment of GDT entries
  2006-09-14  3:23 Albert Cahalan
@ 2006-09-14  6:11 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-14  6:11 UTC
  To: Albert Cahalan; +Cc: torvalds, mingo, ak, ebiederm, arjan, zach, linux-kernel

Albert Cahalan wrote:
> We actually have an ABI problem right now because of this.
> Note that i386 and x86_64 use different GDT slots.
>
> As far as I can tell, users need to hard-code the mapping
> from TLS slot to segment number. They use 0,1,2 to ask the
> kernel to set things up (via set_thread_area), but can't
> just pop that into %fs or %gs.

That's not true at all.  The program I posted earlier in this thread 
uses set_thread_area() to allocate a GDT slot, and it works on both 
native 32 bit and 32-under-64.  The entry_number field in the struct 
user_desc is an actual entry number, so you can easily construct a 
selector from it.

> Typical hacks that result from this:
>
> call uname() and look for "x86_64"
> see if the addresses of local variables exceed 0xbfffffff
> examine /proc/1/maps
> check for a /lib64 directory
> change SSE register 8 in a signal handler frame and see if it sticks
> checksum the vdso code
> ...
>
> Please save us from these foul hacks.

Er, that all looks completely unnecessary.

    J

* Re: Assignment of GDT entries
  2006-09-13 18:58 Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2006-09-13 21:21 ` Linus Torvalds
@ 2006-09-14  6:00 ` Andi Kleen
  3 siblings, 0 replies; 32+ messages in thread
From: Andi Kleen @ 2006-09-14  6:00 UTC
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ingo Molnar, Eric W. Biederman, Arjan van de Ven,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman

On Wednesday 13 September 2006 20:58, Jeremy Fitzhardinge wrote:
> What's the rationale for the current assignment of GDT entries?  In
> particular, this section:

AFAIK it was mostly for APM and various BIOS bugs.  IIRC Wine had 
some special requirements at some point too, but I can't remember them right
now. On x86-64 I use all GDT entries, although there are a few special 
ordering restrictions due to the semantics of SYSCALL. I ignored Wine too and 
so far nobody has complained, so whatever requirements they have they can't 
be that important.

> I'm asking because I'd like to use one of these entries for the PDA
> descriptor, so that it is on the same cache line as the TLS
> descriptors.  That way, the entry/exit segment register reloads would
> still only need to touch two GDT cache lines.  Would there be a real
> problem in doing this?

The only way to find out would be to do it. It's quite possible that all 
the systems with APM BIOS that needed it are long beyond their MTBF.

-Andi

* Re: Assignment of GDT entries
  2006-09-14  4:06 Albert Cahalan
@ 2006-09-14  4:44 ` Eric W. Biederman
  2006-09-14  6:19   ` Albert Cahalan
  0 siblings, 1 reply; 32+ messages in thread
From: Eric W. Biederman @ 2006-09-14  4:44 UTC
  To: Albert Cahalan; +Cc: torvalds, jeremy, mingo, ak, arjan, zach, linux-kernel

"Albert Cahalan" <acahalan@gmail.com> writes:

> I think that would be a lower chance, not a greater chance.
> Reasons why an app might care:
>
> a. identify a 64-bit kernel
> b. far jumps between 32-bit and 64-bit code
> c. reload of ds/es after a string operation on thread-private data
>
> Perhaps i386 should change to match x86_64.

I agree that the difference is annoying.

However I just wrote a user space implementation of fork that
is capable of copying a process from an i386 only kernel to a x86_64
kernel, and executing there without having to detect the kernel type.

It didn't take hacks to accomplish that.

The basic syscall is:
int set_thread_area (struct user_desc *u_info);
struct user_desc {
	unsigned int  entry_number;
	unsigned long base_addr;
	unsigned int  limit;
	unsigned int  seg_32bit:1;
	unsigned int  contents:2;
	unsigned int  read_exec_only:1;
	unsigned int  limit_in_pages:1;
	unsigned int  seg_not_present:1;
	unsigned int  useable:1;
};

If entry_number is -1 the kernel finds a free gdt entry and
sets up the segment and returns with entry_number set to the
segment number.

Eric

* Re: Assignment of GDT entries
@ 2006-09-14  4:06 Albert Cahalan
  2006-09-14  4:44 ` Eric W. Biederman
  0 siblings, 1 reply; 32+ messages in thread
From: Albert Cahalan @ 2006-09-14  4:06 UTC
  To: torvalds, jeremy, mingo, ak, ebiederm, arjan, zach, linux-kernel

Jeremy Fitzhardinge writes:
> Zachary Amsden wrote:

>> I believe 9,10,11 are reserved for future users like yourself or
>> expanded TLS segments.  I think a bank of 3 TLS segments in the
>> GDT is working fine now (does NPTL even use more than one?).
>
> Nope.  And there's a comment that wine uses one more.  I think
> the third is completely unused.

I use the third. The sucky thing is that I need to determine if
the kernel is 64-bit to know what I must load into the segment
register. Fortunately this code is not yet out in the wild, so
you can still fix the ABI situation for me at least.

>>> Otherwise line 1 would be ideal for putting 3 TLS, kernel+user
>>> code+data and PDA into, thereby making 99.999% of GDT descriptor
>>> uses come from one cache line.
>>
>> That change is visible to userspace, unfortunately.
>
> Don't think it matters much.  32-bit processes on x86-64 seem
> perfectly happy with the TLS being in a different place.

Heh. I wish. Well, OK, but only because I detect the kernel!

> I think the ABI is defined in terms of "use the selector for
> the entry that set_thread_area/clone returns", and so is not
> a constant.  But I agree it would be better not to.
>
> Hm, moving user cs/ds would be pretty visible too... Hm, and
> it would have a greater chance of breaking stuff if they changed,
> compared to moving the TLS...

I think that would be a lower chance, not a greater chance.
Reasons why an app might care:

a. identify a 64-bit kernel
b. far jumps between 32-bit and 64-bit code
c. reload of ds/es after a string operation on thread-private data

Perhaps i386 should change to match x86_64.

* Re: Assignment of GDT entries
@ 2006-09-14  3:23 Albert Cahalan
  2006-09-14  6:11 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Albert Cahalan @ 2006-09-14  3:23 UTC
  To: torvalds, jeremy, mingo, ak, ebiederm, arjan, zach, linux-kernel

Linus Torvalds writes:
> On Wed, 13 Sep 2006, Jeremy Fitzhardinge wrote:

>> So does this mean that moving the user-visible cs/ds isn't
>> likely to break stuff, if it has been done before?
>
> Yes. I _think_ we could do it. It's been done before, and nobody noticed.
>
> That said, it may actually be that programs have since become much more
> aware of segments, for a rather perverse reason: the TLS stuff. Old
> programs are all very much coded and compiled for a totally flat model,
> and as such they really don't know _anything_ about segments. But with
> more TLS stuff, it's possible that a modern threaded program is at least
> aware of _some_ of it.

We actually have an ABI problem right now because of this.
Note that i386 and x86_64 use different GDT slots.

As far as I can tell, users need to hard-code the mapping
from TLS slot to segment number. They use 0,1,2 to ask the
kernel to set things up (via set_thread_area), but can't
just pop that into %fs or %gs.

So a 32-bit app using set_thread_area can work on i386 or x86_64,
but not both. I guess glibc gets %gs set up free via clone() with
the right flags, and thus does not need to determine the kernel.
For anything involving set_thread_area though, it gets nasty.

Typical hacks that result from this:

call uname() and look for "x86_64"
see if the addresses of local variables exceed 0xbfffffff
examine /proc/1/maps
check for a /lib64 directory
change SSE register 8 in a signal handler frame and see if it sticks
checksum the vdso code
...

Please save us from these foul hacks.

* Re: Assignment of GDT entries
  2006-09-14  0:25         ` Zachary Amsden
@ 2006-09-14  1:40           ` Stephen Rothwell
  2006-09-14 13:03           ` Alan Cox
  1 sibling, 0 replies; 32+ messages in thread
From: Stephen Rothwell @ 2006-09-14  1:40 UTC
  To: Zachary Amsden
  Cc: Alan Cox, Jeremy Fitzhardinge, Arjan van de Ven, Linus Torvalds,
	Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Linux Kernel Mailing List, Michael A Fetterman

On Wed, 13 Sep 2006 17:25:53 -0700 Zachary Amsden <zach@vmware.com> wrote:
>
> It is worth fixing, just not a high 
> priority.  I had a patch that fixed both APM and PnP at one time, but it 
> is covered with mold and now looks like a science experiment.  Shall I 
> apply disinfectant?

Yes, please.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

* Re: Assignment of GDT entries
  2006-09-13 21:35       ` Alan Cox
@ 2006-09-14  0:25         ` Zachary Amsden
  2006-09-14  1:40           ` Stephen Rothwell
  2006-09-14 13:03           ` Alan Cox
  0 siblings, 2 replies; 32+ messages in thread
From: Zachary Amsden @ 2006-09-14  0:25 UTC
  To: Alan Cox
  Cc: Jeremy Fitzhardinge, Arjan van de Ven, Linus Torvalds,
	Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Linux Kernel Mailing List, Michael A Fetterman

Alan Cox wrote:
> On Wed, 2006-09-13 at 13:59 -0700, Zachary Amsden wrote:
>   
>> TLS #3 overlaps BIOS 0x40, but code which calls borken APM / PnP BIOS 
>> and sets up protected mode 0x40 GDT segment does so by swapping out the 
>> TLS segment with the identity simulation of physical 0x400 offset, 
>> swapping it back afterwards.  Short of bugs in that code (which there 
>> are, btw), you shouldn't need to be concerned with it.
>>     
>
> Care to elucidate ?
>   

I believe the current max use case for GDT descriptors is Wine.  Wine 
compiled against TLS glibc uses entry zero for libc, and allocates 
another GDT entry for the first thread created by NTDLL (although I have 
no idea why, since there is fallback code to use LDT allocation instead, 
and all subsequent allocations happen via the LDT -  perhaps some kernel 
mode DLL thing insists on having the first thread in the GDT?)  DOSemu 
by the way, only uses the LDT.

But there is no reason userspace can't allocate 3 TLS descriptors in the 
GDT per thread.  If it did, the overlap between TLS #3 and 0x40 
(descriptor #8, the real mode BIOS simulation of physical address 0x400, 
the BIOS data area) causes a problem.  Fortunately, APM and PnP take care to fix this by 
swapping in and out the descriptors.  Unfortunately, they don't get it 
quite right.

Selected code snippets (PnP):

        /*
         * PnP BIOSes are generally not terribly re-entrant.
         * Also, don't rely on them to save everything correctly.
         */
        if(pnp_bios_is_utter_crap)
                return PNP_FUNCTION_NOT_SUPPORTED;

        cpu = get_cpu();
        save_desc_40 = get_cpu_gdt_table(cpu)[0x40 / 8];
        get_cpu_gdt_table(cpu)[0x40 / 8] = bad_bios_desc;   <---- set up fake BIOS descriptor for 0x400

        /* On some boxes IRQ's during PnP BIOS calls are deadly.  */
        spin_lock_irqsave(&pnp_bios_lock, flags);

...  now inline assembler

                "pushl %%fs\n\t"
                "pushl %%gs\n\t"
                "pushfl\n\t"
                "movl %%esp, pnp_bios_fault_esp\n\t"
                "movl $1f, pnp_bios_fault_eip\n\t"
                "lcall %5,%6\n\t"
                "1:popfl\n\t"
                "popl %%gs\n\t"   <---- (**)
                "popl %%fs\n\t"    <---- (**)

... now restore original GDT descriptor back

        spin_unlock_irqrestore(&pnp_bios_lock, flags);

        get_cpu_gdt_table(cpu)[0x40 / 8] = save_desc_40;
        put_cpu();


But it is too late - damage is already done (at **), since %fs or %gs 
could have had a reference to TLS descriptor #3, and they get reloaded 
_before_ the GDT is restored.  Thus any userspace process that uses TLS 
descriptor #3 in FS or GS and makes a BIOS call to PnP may get corrupted 
data loaded into the hidden state of FS / GS selectors.
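A toy user-space model of that ordering problem may help (this is not kernel code; the names, the 16-entry table and the base-only "descriptors" are simplifications assumed for illustration). A segment register keeps the copy of the descriptor it took when it was loaded, so popping %gs while entry 8 still holds the BIOS descriptor leaves the BIOS base cached after the GDT entry is put back. One possible fix, selected with the hypothetical `restore_first` flag, is to restore the GDT entry before the pop.

```c
#include <stdint.h>

/* Toy model, not kernel code: a descriptor is reduced to its base address,
 * and a segment register caches that base at load time. */
struct seg_reg {
    uint16_t selector;
    uint32_t cached_base;   /* the "hidden" state filled in on load */
};

static uint32_t gdt_base[16];   /* hypothetical 16-entry GDT, base only */

static void load_seg(struct seg_reg *r, uint16_t sel)
{
    r->selector = sel;
    r->cached_base = gdt_base[sel >> 3];   /* index = selector >> 3 */
}

/* Mimics the PnP sequence: user has TLS #3 (index 8, selector 0x43) in %gs,
 * entry 8 is swapped for the BIOS descriptor, %gs is popped, and entry 8
 * is restored.  restore_first = 1 restores the entry before the pop. */
static uint32_t bios_call_model(uint32_t user_tls_base, int restore_first)
{
    struct seg_reg gs;

    gdt_base[8] = user_tls_base;
    load_seg(&gs, 0x43);

    uint32_t saved = gdt_base[8];       /* save_desc_40 */
    uint16_t saved_sel = gs.selector;   /* "pushl %%gs" */
    gdt_base[8] = 0x400;                /* stand-in for bad_bios_desc */

    if (restore_first)
        gdt_base[8] = saved;
    load_seg(&gs, saved_sel);           /* "popl %%gs" */
    if (!restore_first)
        gdt_base[8] = saved;            /* too late: %gs already cached 0x400 */

    return gs.cached_base;
}
```

With the original ordering the model returns the BIOS base (0x400) even though the GDT entry has been restored; with the entry restored first it returns the user's TLS base.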

APM has a similar problem.  Both are easily fixable, but there has been 
too much flux in this area recently to get a stable patch for these 
problems, and the problems are exceedingly unlikely, since I don't know 
of a single userspace program using TLS descriptor #3, much less one 
that makes use of APM or PnP facilities.  There is the possibility 
however, that such a program could sleep, run the idle thread, which 
makes a call into some of these BIOS facilities, and then reschedules 
the same program thread - which means FS/GS never get reloaded, thus 
maintaining their corrupted values.  It is worth fixing, just not a high 
priority.  I had a patch that fixed both APM and PnP at one time, but it 
is covered with mold and now looks like a science experiment.  Shall I 
apply disinfectant?

Zach

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 22:05     ` Linus Torvalds
@ 2006-09-13 22:22       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 22:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Eric W. Biederman, Arjan van de Ven,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman

Linus Torvalds wrote:
> So I'd not be surprised if moving the TLS segments around would break 
> something. 
>   

I don't think so.  32-bit code running on x86-64 has different TLS 
selectors, and everything seems to work there...

> That said, numbers talk, bullshit walks. If the above just works a lot 
> better for all modern CPU's that all have 64-byte cachelines (because now 
> _everything_ is in that bigger cacheline), and if you can show that with 
> numbers, and nothing breaks in practice, then hey..
>   

My goal would be to do a minimal change which packs all the useful stuff 
together in a 64-byte line.  Ideally it would just use two 32-byte 
lines, but I don't think that's as important.

Caching effects are pretty hard to measure anyway, and with something as 
deeply x86-microarchitectural as this, I could imagine lots of other CPU 
cleverness which could obscure any simple measurement.  But packing 
things into a line certainly can't hurt.

I'll put something together, and see how it goes...

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 21:47   ` Jeremy Fitzhardinge
@ 2006-09-13 22:05     ` Linus Torvalds
  2006-09-13 22:22       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2006-09-13 22:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Andi Kleen, Eric W. Biederman, Arjan van de Ven,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman



On Wed, 13 Sep 2006, Jeremy Fitzhardinge wrote:
> 
> So does this mean that moving the user-visible cs/ds isn't likely to break
> stuff, if it has been done before?

Yes. I _think_ we could do it. It's been done before, and nobody noticed.

That said, it may actually be that programs have since become much more 
aware of segments, for a rather perverse reason: the TLS stuff. Old 
programs are all very much coded and compiled for a totally flat model, 
and as such they really don't know _anything_ about segments. But with 
more TLS stuff, it's possible that a modern threaded program is at least 
aware of _some_ of it. 

In other words - I _suspect_ we can move things around, but it would 
require some rather heavy testing, at least. Especially programs like Wine 
might react badly.

> > And segment #8 (ie 0x40) is special (TLS segment #3), of course. Anybody who
> > wants to emulate windows or use the BIOS needs to use that for their "common
> > BIOS area" thing, iirc.
> 
> Do you mean that something like dosemu/Wine needs to be able to use GDT #8?
> Or is it only used in kernel code?

Both. I think the APM BIOS callbacks use GDT#8 too. As long as it's not 
one of the really _core_ kernel segments, that's ok (you can swap it 
around and nobody will care). But it would be a total disaster (I suspect) 
if GDT#8 was the kernel code segment, for example. Suddenly the "switch 
things around temporarily" is not as trivial any more, and involves nasty 
nasty things.

[ BUT! I haven't ever really had much to do with those BIOS callbacks, and 
  I'm too lazy to check, so this is all from memory. ]

> > See above. The kernel and user segments have to be moved as a block of four,
> > and obviously we'd like to keep them in the same cacheline too. Also, the
> > cacheline that contains segment #8/0x40 is not available,
> 
> Why's that?  That cacheline (assuming 64 byte line size) already contains the
> user/kernel/cs/ds descriptors.

Right. That's what I'm saying. We should move them all together, and we 
should keep them as aligned as they are now. 

> I'm thinking of putting together a patch to change the descriptor use to:
> 
>    8  - TLS #1
>    9  - TLS #2
>    10 - TLS #3

So I'd not be surprised if moving the TLS segments around would break 
something. 

>    11 - Kernel PDA

But you keep the four basic ones in the same place:

>    12 - Kernel CS
>    13 - Kernel DS
>    14 - User CS
>    15 - User DS

So that's obviously ok at least for _those_.

> Alternatively, maybe:
> 
>    0  - NULL
>    1  - Kernel PDA
>    2  - Kernel CS
>    3  - Kernel DS
>    4  - User CS
>    5  - User DS
>    6  - TLS #1
>    7  - TLS #2
>      
> which moves the user cs/ds, but avoids #8.

I don't like that one, exactly because now the four most common segments 
(which get accessed for all system calls) are no longer in the same 
32-byte cacheline.
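To make the cache-line argument concrete, here is a small sketch, assuming the GDT itself starts on a cache-line boundary (the helper names are hypothetical). With 8-byte descriptors, entry i lies in line i * 8 / line_size.

```c
#include <stdbool.h>

#define DESC_SIZE 8u   /* bytes per GDT descriptor */

/* Line number (counted from the GDT base) that holds entry idx,
 * assuming the GDT base is line_bytes-aligned. */
static unsigned gdt_line(unsigned idx, unsigned line_bytes)
{
    return idx * DESC_SIZE / line_bytes;
}

/* True if entries first .. first+count-1 all sit in one cache line. */
static bool same_line(unsigned first, unsigned count, unsigned line_bytes)
{
    return gdt_line(first, line_bytes) == gdt_line(first + count - 1, line_bytes);
}
```

With 32-byte lines, entries 12-15 (bytes 96-127) share line 3, while entries 2-5 (bytes 16-47) straddle lines 0 and 1. With 64-byte lines both blocks fit in a single line, which is the trade-off being weighed here.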

[ Unless we start playing games with offsetting the GDT or something.. 
  Quite frankly, I'd rather keep it simple and obvious. ]

Now, most systems have a 64-byte cacheline these days (and some have a 
split 128-byte one), and maybe we'll never go back to the "good old days" 
with 32-byte lines, so maybe this is a total non-issue. But fitting in the 
same 32-byte aligned thing would still count as a "good thing" in my book.

That said, numbers talk, bullshit walks. If the above just works a lot 
better for all modern CPU's that all have 64-byte cachelines (because now 
_everything_ is in that bigger cacheline), and if you can show that with 
numbers, and nothing breaks in practice, then hey..

			Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 21:21 ` Linus Torvalds
@ 2006-09-13 21:47   ` Jeremy Fitzhardinge
  2006-09-13 22:05     ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 21:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Eric W. Biederman, Arjan van de Ven,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman

Linus Torvalds wrote:
> These _used_ to be the "user CS/DS" respectively, but that got changed 
> around by me when I did the "sysenter" support.
>   

So does this mean that moving the user-visible cs/ds isn't likely to 
break stuff, if it has been done before?

> The sysenter logic (or, more properly, the sysexit one) requires that the 
> user code segment number is the same as the kernel code segment +2 (ie 
> "+16" in actual selector term). And the user data segment needs to be +3.
>   

Yep, I'm aware of that constraint.

> And segment #8 (ie 0x40) is special (TLS segment #3), of course. 
> Anybody who wants to emulate windows or use the BIOS needs to use that for 
> their "common BIOS area" thing, iirc.
>   

Do you mean that something like dosemu/Wine needs to be able to use GDT 
#8?  Or is it only used in kernel code?

> See above. The kernel and user segments have to be moved as a block of 
> four, and obviously we'd like to keep them in the same cacheline too. 
> Also, the cacheline that contains segment #8/0x40 is not available,

Why's that?  That cacheline (assuming 64 byte line size) already 
contains the user/kernel/cs/ds descriptors.

I'm thinking of putting together a patch to change the descriptor use to:

    8  - TLS #1
    9  - TLS #2
    10 - TLS #3
    11 - Kernel PDA
    12 - Kernel CS
    13 - Kernel DS
    14 - User CS
    15 - User DS
      

This has the advantage of leaving the user cs/ds unchanged.  From what 
people have said so far, this should be OK, other than making the heavily 
used TLS #1 share the BIOS common area entry number.  If this needs to 
be usable by userspace for something special, then making it TLS #1 
won't fly...

Alternatively, maybe:

    0  - NULL
    1  - Kernel PDA
    2  - Kernel CS
    3  - Kernel DS
    4  - User CS
    5  - User DS
    6  - TLS #1
    7  - TLS #2
      

which moves the user cs/ds, but avoids #8.

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 20:59     ` Zachary Amsden
  2006-09-13 21:15       ` Jeremy Fitzhardinge
@ 2006-09-13 21:35       ` Alan Cox
  2006-09-14  0:25         ` Zachary Amsden
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Cox @ 2006-09-13 21:35 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Jeremy Fitzhardinge, Arjan van de Ven, Linus Torvalds,
	Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Linux Kernel Mailing List, Michael A Fetterman

On Wed, 2006-09-13 at 13:59 -0700, Zachary Amsden wrote:
> TLS #3 overlaps BIOS 0x40, but code which calls borken APM / PnP BIOS 
> and sets up protected mode 0x40 GDT segment does so by swapping out the 
> TLS segment with the identity simulation of physical 0x400 offset, 
> swapping it back afterwards.  Short of bugs in that code (which there 
> are, btw), you shouldn't need to be concerned with it.

Care to elucidate ?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 18:58 Jeremy Fitzhardinge
  2006-09-13 19:16 ` Arjan van de Ven
  2006-09-13 19:55 ` linux-os (Dick Johnson)
@ 2006-09-13 21:21 ` Linus Torvalds
  2006-09-13 21:47   ` Jeremy Fitzhardinge
  2006-09-14  6:00 ` Andi Kleen
  3 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2006-09-13 21:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Andi Kleen, Eric W. Biederman, Arjan van de Ven,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman



On Wed, 13 Sep 2006, Jeremy Fitzhardinge wrote:
> *
> *   4 - unused			<==== new cacheline
> *   5 - unused

These _used_ to be the "user CS/DS" respectively, but that got changed 
around by me when I did the "sysenter" support.

The sysenter logic (or, more properly, the sysexit one) requires that the 
user code segment number is the same as the kernel code segment +2 (ie 
"+16" in actual selector term). And the user data segment needs to be +3.

So with sysenter, we needed a block of four contiguous segments: kernel 
code, kernel data, user code, user data (in that order).
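A minimal sketch of that constraint, assuming the 32-bit SYSENTER/SYSEXIT behaviour documented in Intel's manuals: SYSENTER loads CS from the IA32_SYSENTER_CS MSR and SS from that value + 8, while SYSEXIT loads CS from + 16 and SS from + 24, with RPL forced to 3. The enum names and `sysexit_layout_ok()` are illustrative; only the indices 12-15 come from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* GDT indices of the current i386 layout, as quoted in this thread. */
enum {
    GDT_KERNEL_CS = 12,
    GDT_KERNEL_DS = 13,
    GDT_USER_CS   = 14,
    GDT_USER_DS   = 15,
};

/* GDT selectors: index * 8, TI = 0, RPL 0 for kernel and 3 for user. */
#define KERNEL_SEL(idx) ((uint16_t)((idx) * 8))
#define USER_SEL(idx)   ((uint16_t)((idx) * 8 + 3))

/* Check that a candidate layout matches what SYSENTER/SYSEXIT derive
 * from a single SYSENTER_CS value (= the kernel CS selector). */
static bool sysexit_layout_ok(unsigned kcs, unsigned kds, unsigned ucs, unsigned uds)
{
    uint16_t sysenter_cs = KERNEL_SEL(kcs);

    return KERNEL_SEL(kds) == sysenter_cs + 8 &&          /* kernel SS/DS */
           USER_SEL(ucs)   == ((sysenter_cs + 16) | 3) &&  /* user CS */
           USER_SEL(uds)   == ((sysenter_cs + 24) | 3);    /* user SS/DS */
}
```

The current layout (12-15, selectors 0x60/0x68/0x73/0x7b) passes, and so does any other run of four consecutive entries in the same order; splitting the kernel and user pairs apart, for example 12/13 with 4/5, does not.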

There are other possible things to do, but what we did was to move the 
user segments up to just above the kernel ones (which we left in place).

> *   6 - TLS segment #1			[ glibc's TLS segment ]
> *   7 - TLS segment #2			[ Wine's %fs Win32 segment ]
> *   8 - TLS segment #3
> *   9 - reserved
> *  10 - reserved
> *  11 - reserved

These are really reserved. I think we left them that way on purpose, so 
that if we wanted to, we could allow more contiguous per-thread 
state. And segment #8 (ie 0x40) is special (TLS segment #3), of course. 
Anybody who wants to emulate windows or use the BIOS needs to use that for 
their "common BIOS area" thing, iirc.

I think it's generally a good idea to keep the low segment reserved (or at 
least free to use for whatever user code), since if there are any special 
magic segment descriptor numbers, they tend to be in that low range. The 
#8/0x40 thing is just an example.

> What are entries 1-3 and 9-11 reserved for?  Must they be unused for some
> reason, or is there some proposed use that has not been implemented yet?
> 
> Also, is there a particular reason kernel GDT entries start at 12?  Would
> there be a problem in using either 4 or 5 for a kernel GDT descriptor?

See above. The kernel and user segments have to be moved as a block of 
four, and obviously we'd like to keep them in the same cacheline too. 
Also, the cacheline that contains segment #8/0x40 is not available, so 
that together with keeping low segments for user space explains why it's 
at segment numbers #12-15 (selectors 0x60/0x68/0x73/0x7b).

But I don't think anything but 0x40 is "set in stone".

		Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 20:59     ` Zachary Amsden
@ 2006-09-13 21:15       ` Jeremy Fitzhardinge
  2006-09-13 21:35       ` Alan Cox
  1 sibling, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 21:15 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Arjan van de Ven, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Eric W. Biederman, Linux Kernel Mailing List,
	Michael A Fetterman, Alan Cox

Zachary Amsden wrote:
> I believe 9,10,11 are reserved for future users like yourself or 
> expanded TLS segments.  I think a bank of 3 TLS segments in the GDT is 
> working fine now (does NPTL even use more than one?).

Nope.  And there's a comment that wine uses one more.  I think the third 
is completely unused.

Does this mean that "reserved" is actually synonymous with "unused" in 
asm/segment.h?

>> Otherwise line 1 would be ideal for putting 3 TLS, kernel+user 
>> code+data and PDA into, thereby making 99.999% of GDT descriptor uses 
>> come from one cache line.
>
> That change is visible to userspace, unfortunately.

Don't think it matters much.  32-bit processes on x86-64 seem perfectly 
happy with the TLS being in a different place.  I think the ABI is 
defined in terms of "use the selector for the entry that 
set_thread_area/clone returns", and so is not a constant.  But I agree 
it would be better not to.
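A sketch of what that ABI looks like from user space, assuming an i386 build (the helper names are hypothetical; `struct user_desc` comes from `<asm/ldt.h>` and `SYS_set_thread_area` from `<sys/syscall.h>`). The selector is computed from whatever index the kernel writes back into `entry_number`, so the same code works whether that index is 6-8 (native i386) or 12-14 (32-bit code on x86-64). It loads %fs because i386 glibc already uses %gs for its own TLS.

```c
#include <stdint.h>
#if defined(__i386__)
#include <asm/ldt.h>       /* struct user_desc */
#include <sys/syscall.h>   /* SYS_set_thread_area */
#include <unistd.h>        /* syscall() */
#endif

/* Selector for a user-visible GDT entry: index * 8, TI = 0, RPL = 3. */
static uint16_t tls_selector(unsigned entry_number)
{
    return (uint16_t)(entry_number * 8 + 3);
}

#if defined(__i386__)
/* Hypothetical helper: let the kernel pick a free TLS slot for `base`,
 * then load the matching selector into %fs. Returns 0 on success. */
static int load_fs_tls(void *base)
{
    struct user_desc d = {0};

    d.entry_number   = (unsigned int)-1;   /* "any free TLS entry" */
    d.base_addr      = (unsigned int)(uintptr_t)base;
    d.limit          = 0xfffff;            /* 4 GiB with page granularity */
    d.seg_32bit      = 1;
    d.limit_in_pages = 1;
    d.useable        = 1;

    if (syscall(SYS_set_thread_area, &d) != 0)
        return -1;

    /* The kernel wrote the chosen GDT index back into d.entry_number. */
    uint16_t sel = tls_selector(d.entry_number);
    __asm__ volatile ("movw %0, %%fs" : : "r"(sel));
    return 0;
}
#endif
```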

Hm, moving user cs/ds would be pretty visible too... Hm, and it would 
have a greater chance of breaking stuff if they changed, compared to 
moving the TLS...

So is there any reason for "kernel entries start at 12"?  If there's no 
reason for it, then we can pack everything useful into 1-5.

>> But anyway, what breaks if I put the PDA in 11?
>
> Nothing.

OK then.

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 20:20   ` Jeremy Fitzhardinge
@ 2006-09-13 20:59     ` Zachary Amsden
  2006-09-13 21:15       ` Jeremy Fitzhardinge
  2006-09-13 21:35       ` Alan Cox
  0 siblings, 2 replies; 32+ messages in thread
From: Zachary Amsden @ 2006-09-13 20:59 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Arjan van de Ven, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Eric W. Biederman, Linux Kernel Mailing List,
	Michael A Fetterman, Alan Cox

Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
>> I don't know the exact details on these; I do know that several GDT
>> entries tend to be used by BIOSes in their APM implementations and thus
>> are better off not being used. That might be the underlying reason
>> here....
>>   
>
> Hm, I see.
>
> Also, thinking about this a bit more, it would be most helpful to move 
> the PDA descriptor onto the same cache line as the other descriptors 
> used in the kernel - ie, somewhere in the range of 8-15 (assuming 64 
> byte line size):
>
> *   8 - TLS segment #3
> *   9 - reserved
> *  10 - reserved
> *  11 - reserved
> *
> *  ------- start of kernel segments:
> *
> *  12 - kernel code segment
> *  13 - kernel data segment
> *  14 - default user CS
> *  15 - default user DS
>
> This seems pretty wasteful of the GDT cache line, since the 
> kernel+user cs/ds share a cache line with 3 reserved entries and 
> the never-used TLS #3 descriptor.    If it were OK to put the PDA in 
> one of 9,10,11, then that would be

TLS #3 overlaps BIOS 0x40, but code which calls borken APM / PnP BIOS 
and sets up protected mode 0x40 GDT segment does so by swapping out the 
TLS segment with the identity simulation of physical 0x400 offset, 
swapping it back afterwards.  Short of bugs in that code (which there 
are, btw), you shouldn't need to be concerned with it.

I believe 9,10,11 are reserved for future users like yourself or 
expanded TLS segments.  I think a bank of 3 TLS segments in the GDT is 
working fine now (does NPTL even use more than one?).

> good.  Unfortunately the next cache line is clogged up with PNP and 
> APM stuff, which I presume is not movable.

Totally movable, actually, just means breaking module dependencies.

>
> In fact, if we assume that "reserved" means "unusable", it looks like 
> none of the GDT's cache lines can be freed up to lay out the most 
> commonly used descriptors into a single cache line:
>
>    line 0: NULL descriptor, 3 reserved, 2 unused, 2 TLS
>    line 1: 1 TLS, 3 reserved, kernel+user code+data
>    line 2: TSS, LDT, PNPBIOS, APMBIOS
>    line 3: APMBIOS, ESPFIX, 4 unused, doublefault TSS
>
> Otherwise line 1 would be ideal for putting 3 TLS, kernel+user 
> code+data and PDA into, thereby making 99.999% of GDT descriptor uses 
> come from one cache line.

That change is visible to userspace, unfortunately.

>
> But anyway, what breaks if I put the PDA in 11?

Nothing.

Zach

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 20:08   ` Jeremy Fitzhardinge
@ 2006-09-13 20:32     ` linux-os (Dick Johnson)
  0 siblings, 0 replies; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-09-13 20:32 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Arjan van de Ven, Zachary Amsden, Linux Kernel Mailing List,
	Michael A Fetterman


On Wed, 13 Sep 2006, Jeremy Fitzhardinge wrote:

> linux-os (Dick Johnson) wrote:
>> The entries 1 through 3 are used during the boot sequence, see
>> setup.S, search for "gdt" around line 983.
>>
>
> OK, but that's an early GDT used during boot, which shouldn't have any
> bearing on the GDT of the running kernel.
>
>> I can't imagine a reason why you'd want to do this.
>>
>
> I'm looking at packing all the descriptors together so they share a
> cache line, and therefore reduce the likelihood of a cache miss when
> loading a segment register.
>
>    J

You can certainly see if what you do works, but the last time I
looked, you need the linear address-space mapped by these to load
a new GDT, which needs to be untranslated in the TLB (unity mapped).
You can also search the kernel to see if they are required to
get into and get out of VM86 mode, for the dosemu users. Basically,
you need to change to what you want, see if it boots, then test
everything that might use these entries. It's scary and a lot of
work, probably the reason why nobody's bothered to muck with the
GDT. There is also a "specific" entry, used for pseudo-real mode
to access the BIOS for some APM stuff. I don't remember the number,
but that shouldn't be changed because the BIOS hard-codes it for
setting segments.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.66 BogoMips).
New book: http://www.AbominableFirebug.com/

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 19:16 ` Arjan van de Ven
  2006-09-13 20:00   ` Alan Cox
@ 2006-09-13 20:20   ` Jeremy Fitzhardinge
  2006-09-13 20:59     ` Zachary Amsden
  1 sibling, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 20:20 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman

Arjan van de Ven wrote:
> I don't know the exact details on these; I do know that several GDT
> entries tend to be used by BIOSes in their APM implementations and thus
> are better of not being used. That might be the underlying reason
> here....
>   

Hm, I see.

Also, thinking about this a bit more, it would be most helpful to move 
the PDA descriptor onto the same cache line as the other descriptors 
used in the kernel - ie, somewhere in the range of 8-15 (assuming 64 
byte line size):

 *   8 - TLS segment #3
 *   9 - reserved
 *  10 - reserved
 *  11 - reserved
 *
 *  ------- start of kernel segments:
 *
 *  12 - kernel code segment
 *  13 - kernel data segment
 *  14 - default user CS
 *  15 - default user DS

This seems pretty wasteful of the GDT cache line, since the kernel+user 
cs/ds share a cache line with 3 reserved entries and the never-used 
TLS #3 descriptor.    If it were OK to put the PDA in one of 9,10,11, 
then that would be good.  Unfortunately the next cache line is clogged 
up with PNP and APM stuff, which I presume is not movable.

In fact, if we assume that "reserved" means "unusable", it looks like 
none of the GDT's cache lines can be freed up to lay out the most 
commonly used descriptors into a single cache line:

    line 0: NULL descriptor, 3 reserved, 2 unused, 2 TLS
    line 1: 1 TLS, 3 reserved, kernel+user code+data
    line 2: TSS, LDT, PNPBIOS, APMBIOS
    line 3: APMBIOS, ESPFIX, 4 unused, doublefault TSS

Otherwise line 1 would be ideal for putting 3 TLS, kernel+user code+data 
and PDA into, thereby making 99.999% of GDT descriptor uses come from 
one cache line.

But anyway, what breaks if I put the PDA in 11?

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 19:55 ` linux-os (Dick Johnson)
@ 2006-09-13 20:08   ` Jeremy Fitzhardinge
  2006-09-13 20:32     ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 20:08 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Arjan van de Ven, Zachary Amsden, Linux Kernel Mailing List,
	Michael A Fetterman

linux-os (Dick Johnson) wrote:
> The entries 1 through 3 are used during the boot sequence, see
> setup.S, search for "gdt" around line 983.
>   

OK, but that's an early GDT used during boot, which shouldn't have any 
bearing on the GDT of the running kernel.

> I can't imagine a reason why you'd want to do this.
>   

I'm looking at packing all the descriptors together so they share a 
cache line, and therefore reduce the likelihood of a cache miss when 
loading a segment register.

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 20:00   ` Alan Cox
@ 2006-09-13 20:02     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 20:02 UTC (permalink / raw)
  To: Alan Cox
  Cc: Arjan van de Ven, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Eric W. Biederman, Zachary Amsden, Linux Kernel Mailing List,
	Michael A Fetterman

Alan Cox wrote:
> On Wed, 2006-09-13 at 21:16 +0200, Arjan van de Ven wrote:
>   
>> I don't know the exact details on these; I do know that several GDT
>> entries tend to be used by BIOSes in their APM implementations and thus
>> are better off not being used.
>>     
>
> That's 0x40, which tends to get used as if it was a real mode base for BIOS
> accesses even via the protected mode interface.
>   

Do you mean descriptor entry 8?  Because that's TLS #3, not reserved...

    J

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 19:16 ` Arjan van de Ven
@ 2006-09-13 20:00   ` Alan Cox
  2006-09-13 20:02     ` Jeremy Fitzhardinge
  2006-09-13 20:20   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Cox @ 2006-09-13 20:00 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jeremy Fitzhardinge, Linus Torvalds, Ingo Molnar, Andi Kleen,
	Eric W. Biederman, Zachary Amsden, Linux Kernel Mailing List,
	Michael A Fetterman

On Wed, 2006-09-13 at 21:16 +0200, Arjan van de Ven wrote:
> I don't know the exact details on these; I do know that several GDT
> entries tend to be used by BIOSes in their APM implementations and thus
> are better off not being used.

That's 0x40, which tends to get used as if it was a real mode base for BIOS
accesses even via the protected mode interface.

Alan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 18:58 Jeremy Fitzhardinge
  2006-09-13 19:16 ` Arjan van de Ven
@ 2006-09-13 19:55 ` linux-os (Dick Johnson)
  2006-09-13 20:08   ` Jeremy Fitzhardinge
  2006-09-13 21:21 ` Linus Torvalds
  2006-09-14  6:00 ` Andi Kleen
  3 siblings, 1 reply; 32+ messages in thread
From: linux-os (Dick Johnson) @ 2006-09-13 19:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Arjan van de Ven, Zachary Amsden, Linux Kernel Mailing List,
	Michael A Fetterman


On Wed, 13 Sep 2006, Jeremy Fitzhardinge wrote:

> What's the rationale for the current assignment of GDT entries?  In
> particular, this section:
>
> *   0 - null
> *   1 - reserved
> *   2 - reserved
> *   3 - reserved
> *
> *   4 - unused			<==== new cacheline
> *   5 - unused
> *
> *  ------- start of TLS (Thread-Local Storage) segments:
> *
> *   6 - TLS segment #1			[ glibc's TLS segment ]
> *   7 - TLS segment #2			[ Wine's %fs Win32 segment ]
> *   8 - TLS segment #3
> *   9 - reserved
> *  10 - reserved
> *  11 - reserved
>
>
> What are entries 1-3 and 9-11 reserved for?  Must they be unused for
> some reason, or is there some proposed use that has not been implemented yet?
>

In the ix86, the first descriptor in the GDT is not used. There are
TWO 32-bit words for each GDT entry. The GDT numbers are indices from
the first entry; the processor multiplies them by 8, the size of a GDT
entry, when they are used to set segment registers. This table is only
accessed by the CPU when the LGDT instruction is executed. When a segment
register is set, its invisible part caches the information extracted
from the GDT, so no further access is necessary. This means that
it has nothing to do with cache-lines.
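The selector arithmetic described here can be sketched as follows (the helper names are illustrative, not kernel API). The low three bits of a selector hold the RPL and the table indicator, and the remaining 13 bits are the index, so the descriptor sits at index * 8 from the GDT base; the 13-bit index field is also where the 8192-entry ceiling on the GDT comes from. Note that each load of a segment register makes the CPU read that 8-byte descriptor from memory, which is why its cache-line placement is being discussed at all.

```c
#include <stdint.h>

/* x86 segment selector layout (16 bits):
 *   bits 15..3  descriptor index (13 bits, hence at most 8192 entries)
 *   bit  2      table indicator, TI (0 = GDT, 1 = LDT)
 *   bits 1..0   requested privilege level, RPL */
#define DESC_SIZE       8u            /* one descriptor = two 32-bit words */
#define MAX_DESCRIPTORS (1u << 13)    /* 8192 */

static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

static unsigned sel_index(uint16_t sel) { return sel >> 3; }
static unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Byte offset of the descriptor from the GDT base: index * DESC_SIZE,
 * i.e. the selector with its TI and RPL bits cleared. */
static unsigned desc_offset(uint16_t sel) { return sel_index(sel) * DESC_SIZE; }
```

For example, the BIOS selector 0x40 decodes to index 8 at byte offset 64, and the user code selector 0x73 decodes to index 14 with RPL 3.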

The entries 1 through 3 are used during the boot sequence, see
setup.S, search for "gdt" around line 983.

> Also, is there a particular reason kernel GDT entries start at 12?
> Would there be a problem in using either 4 or 5 for a kernel GDT descriptor?
>
> I'm asking because I'd like to use one of these entries for the PDA
> descriptor, so that it is on the same cache line as the TLS
> descriptors.  That way, the entry/exit segment register reloads would
> still only need to touch two GDT cache lines.  Would there be a real
> problem in doing this?
>

You can add other GDT entries up to 8192 if you want. If you set the
base, limit, type, etc., to something that's different than the
kernel DS, SS, CS, etc., then you need to reload the segment registers
and if the base is different, the code offset will be WRONG so you
will need to tell the linker the new relocation information.

I can't imagine a reason why you'd want to do this.

> Thanks,
>    J
>

Cheers,
Dick Johnson

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Assignment of GDT entries
  2006-09-13 18:58 Jeremy Fitzhardinge
@ 2006-09-13 19:16 ` Arjan van de Ven
  2006-09-13 20:00   ` Alan Cox
  2006-09-13 20:20   ` Jeremy Fitzhardinge
  2006-09-13 19:55 ` linux-os (Dick Johnson)
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 32+ messages in thread
From: Arjan van de Ven @ 2006-09-13 19:16 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Zachary Amsden, Linux Kernel Mailing List, Michael A Fetterman

On Wed, 2006-09-13 at 11:58 -0700, Jeremy Fitzhardinge wrote:
> What's the rationale for the current assignment of GDT entries?  In 
> particular, this section:
> 
>  *   0 - null
>  *   1 - reserved
>  *   2 - reserved
>  *   3 - reserved
>  *
>  *   4 - unused			<==== new cacheline
>  *   5 - unused
>  *
>  *  ------- start of TLS (Thread-Local Storage) segments:
>  *
>  *   6 - TLS segment #1			[ glibc's TLS segment ]
>  *   7 - TLS segment #2			[ Wine's %fs Win32 segment ]
>  *   8 - TLS segment #3
>  *   9 - reserved
>  *  10 - reserved
>  *  11 - reserved
> 
> 
> What are entries 1-3 and 9-11 reserved for?  Must they be unused for 
> some reason, or is there some proposed use that has not been implemented yet?


I don't know the exact details on these; I do know that several GDT
entries tend to be used by BIOSes in their APM implementations and thus
are better off not being used. That might be the underlying reason
here...




* Assignment of GDT entries
@ 2006-09-13 18:58 Jeremy Fitzhardinge
  2006-09-13 19:16 ` Arjan van de Ven
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Jeremy Fitzhardinge @ 2006-09-13 18:58 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, Andi Kleen, Eric W. Biederman,
	Arjan van de Ven, Zachary Amsden
  Cc: Linux Kernel Mailing List, Michael A Fetterman

What's the rationale for the current assignment of GDT entries?  In 
particular, this section:

 *   0 - null
 *   1 - reserved
 *   2 - reserved
 *   3 - reserved
 *
 *   4 - unused			<==== new cacheline
 *   5 - unused
 *
 *  ------- start of TLS (Thread-Local Storage) segments:
 *
 *   6 - TLS segment #1			[ glibc's TLS segment ]
 *   7 - TLS segment #2			[ Wine's %fs Win32 segment ]
 *   8 - TLS segment #3
 *   9 - reserved
 *  10 - reserved
 *  11 - reserved
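
Each i386 descriptor is 8 bytes, so the "new cacheline" marker at entry 4
is consistent with a 32-byte cache line (four descriptors per line),
assuming the GDT itself starts on a line boundary. A small sketch for
mapping entries to lines follows; the function names and the line-size
parameter are illustrative assumptions, not kernel code.

```c
/* Each GDT descriptor is 8 bytes on i386. */
#define GDT_DESC_SIZE 8u

/* Index of the cache line (relative to a line-aligned GDT) holding
 * descriptor 'entry'. 'line_size' is an assumption: 32 bytes matches
 * the "new cacheline" marker at entry 4; many CPUs use 64 bytes. */
unsigned gdt_cache_line(unsigned entry, unsigned line_size)
{
    return (entry * GDT_DESC_SIZE) / line_size;
}

/* Number of descriptors that share one cache line. */
unsigned gdt_entries_per_line(unsigned line_size)
{
    return line_size / GDT_DESC_SIZE;
}
```

With 32-byte lines, entries 4-7 share a line, so TLS #1 and #2 (entries 6
and 7) sit alongside the unused entries 4 and 5, TLS #3 (entry 8) begins
the next line, and the kernel entries starting at 12 begin the line after
that.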


What are entries 1-3 and 9-11 reserved for?  Must they be unused for 
some reason, or is there some proposed use that has not been implemented yet?

Also, is there a particular reason kernel GDT entries start at 12?  
Would there be a problem in using either 4 or 5 for a kernel GDT descriptor?

I'm asking because I'd like to use one of these entries for the PDA 
descriptor, so that it is on the same cache line as the TLS 
descriptors.  That way, the entry/exit segment register reloads would 
still only need to touch two GDT cache lines.  Would there be a real 
problem in doing this?
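
The two-cache-line argument can be checked by counting the distinct lines
touched by a set of descriptor loads. This sketch assumes, hypothetically,
that kernel entry and exit load the kernel and user CS/DS descriptors at
entries 12-15, user space's TLS descriptor at entry 6, and the PDA
descriptor; gdt_lines_touched is an illustrative helper, not a kernel
function.

```c
#include <stdbool.h>

#define GDT_DESC_SIZE   8u
#define GDT_MAX_ENTRIES 8192u   /* 13-bit selector index */

/* Count the distinct cache lines touched when loading the given GDT
 * entries. Assumes a line-aligned GDT. Returns 0 if line_size is smaller
 * than one descriptor or any entry is out of range. */
unsigned gdt_lines_touched(const unsigned *entries, unsigned n,
                           unsigned line_size)
{
    /* At most one line per entry when line_size >= GDT_DESC_SIZE. */
    bool seen[GDT_MAX_ENTRIES] = { false };
    unsigned count = 0;

    if (line_size < GDT_DESC_SIZE)
        return 0;
    for (unsigned i = 0; i < n; i++) {
        if (entries[i] >= GDT_MAX_ENTRIES)
            return 0;
        unsigned line = entries[i] * GDT_DESC_SIZE / line_size;
        if (!seen[line]) {
            seen[line] = true;
            count++;
        }
    }
    return count;
}
```

With 32-byte lines, putting the PDA at entry 5 keeps the set
{12, 13, 14, 15, 6, 5} on two lines, while a hypothetical slot such as 16
would add a third.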

Thanks,
    J




Thread overview: 32+ messages
2006-09-15  7:55 Assignment of GDT entries Mikael Pettersson
2006-09-15  8:20 ` Jeremy Fitzhardinge
2006-09-15  8:58   ` Mikael Pettersson
2006-09-15 18:27     ` Jeremy Fitzhardinge
  -- strict thread matches above, loose matches on Subject: below --
2006-09-14  4:06 Albert Cahalan
2006-09-14  4:44 ` Eric W. Biederman
2006-09-14  6:19   ` Albert Cahalan
2006-09-14  6:28     ` Zachary Amsden
2006-09-14  7:12       ` Albert Cahalan
2006-09-14  7:24         ` Zachary Amsden
2006-09-14  6:29     ` Jeremy Fitzhardinge
2006-09-14  3:23 Albert Cahalan
2006-09-14  6:11 ` Jeremy Fitzhardinge
2006-09-13 18:58 Jeremy Fitzhardinge
2006-09-13 19:16 ` Arjan van de Ven
2006-09-13 20:00   ` Alan Cox
2006-09-13 20:02     ` Jeremy Fitzhardinge
2006-09-13 20:20   ` Jeremy Fitzhardinge
2006-09-13 20:59     ` Zachary Amsden
2006-09-13 21:15       ` Jeremy Fitzhardinge
2006-09-13 21:35       ` Alan Cox
2006-09-14  0:25         ` Zachary Amsden
2006-09-14  1:40           ` Stephen Rothwell
2006-09-14 13:03           ` Alan Cox
2006-09-13 19:55 ` linux-os (Dick Johnson)
2006-09-13 20:08   ` Jeremy Fitzhardinge
2006-09-13 20:32     ` linux-os (Dick Johnson)
2006-09-13 21:21 ` Linus Torvalds
2006-09-13 21:47   ` Jeremy Fitzhardinge
2006-09-13 22:05     ` Linus Torvalds
2006-09-13 22:22       ` Jeremy Fitzhardinge
2006-09-14  6:00 ` Andi Kleen
