RE: possible CPU bug and request for Intel contacts

* RE: possible CPU bug and request for Intel contacts
@ 2005-01-26  1:38 Seth, Rohit
  2005-02-13 20:10 ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Seth, Rohit @ 2005-01-26  1:38 UTC (permalink / raw)
  To: Kirill Korotaev
  Cc: Linus Torvalds, Ingo Molnar, Saxena, Sunil, Pallipadi, Venkatesh,
	Andrey Savochkin, linux-kernel

Kirill Korotaev <mailto:dev@sw.ru> wrote on Tuesday, January 25, 2005
6:12 AM:

> Hello Rohit,
> 
>>> BTW, can you explain why making pages non-global is the cure? Is it
>>> safe workaround for this bug?

>> There is a boundary condition that can have non-global pages
>> containing the CR3 load to also hit this issue on affected PIII. 
>> Though for this to happen, mov to cr3 has to be the very last
>> instruction on a page. And the page following that page (containing
>> CR3 load) has to have different mapping between user and kernel
>> spaces. 

> but in our case "mov %edx, %cr3" is not the last instruction on a
> page. 
> It is in the middle of it.

So, in this scenario (where trampoline code is mapped by non global
page), we will not hit this issue.

> Well, another remark is that after cr3 load there are only few
> instructions before the "call system_call_table(%edx)" which
> references 
> the page with different user and kernel mappings.
> 
> also, this bug can be cured via inserting about 20 simple operations
> between cr3 load and call to the page with overlapping mappings.
> 

This is not a recommended solution.

> I'm just trying to understand is it the bug referenced in E80 or not
> and 
> is it safe to use non-global mappings as a cure.

Our analysis has shown that this is E80 issue.  In this 4G-4G kernel
context, we are safe to use non-global mapping as a workaround for this
issue. (Or we can use any of the other recommendations given in the spec
update except rdtsc with global pages).

We have also seen that inserting rdtsc instruction is not a workaround.
We will update the spec update with this information.

On a little different note, while running the 4G-4G kernel on our
machine, we saw occasional hangs.  Those are root caused to the fact
that this kernel was first chaging the stack pointer from virtual stack
to kernel and then changing the CR3 to that of kernel.  Any interrupt
between these two instructions will result in those hangs as the
interruption handler will execute with user's CR3(as the kernel thinks
that it is already in kernel because of the value of esp).  Swapping the
order, first loading the CR3 with kernel and then switching the stack to
kernel fixes this issue.  Venki will generate that patch and send to
lkml.

Thanks, rohit

^ permalink raw reply	[flat|nested] 11+ messages in thread