* Question about switch_mm function
@ 2015-01-28 16:26 Sreejith M M
  2015-03-25 13:30 ` Sreejith M M
  0 siblings, 1 reply; 10+ messages in thread
From: Sreejith M M @ 2015-01-28 16:26 UTC (permalink / raw)
  To: kernelnewbies

Hi,

I am trying to understand how scheduling differs between processes and
threads (tasks that belong to the same process).

My understanding is that when the kernel switches to a task that belongs
to the same process, it does not have to clear or replace the page global
directory and other memory-related state.

But in the switch_mm function, some code is placed under #ifdef CONFIG_SMP.
What is its significance? The code is below
(http://lxr.free-electrons.com/source/arch/x86/include/asm/mmu_context.h#L37).
From what I can tell, this code flushes the TLB, reloads the page table
directory, and so on in multiprocessor mode (obviously), but I believe it
may never be executed.

Can anyone help me understand what this part of the function is supposed to do?

#ifdef CONFIG_SMP
        else {
                this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
                BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);

                if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
                        /*
                         * On established mms, the mm_cpumask is only changed
                         * from irq context, from ptep_clear_flush() while in
                         * lazy tlb mode, and here. Irqs are blocked during
                         * schedule, protecting us from simultaneous changes.
                         */
                        cpumask_set_cpu(cpu, mm_cpumask(next));
                        /*
                         * We were in lazy tlb mode and leave_mm disabled
                         * tlb flush IPI delivery. We must reload CR3
                         * to make sure to use no freed page tables.
                         */
                        load_cr3(next->pgd);
                        trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
                        load_LDT_nolock(&next->context);
                }
        }
#endif


-- 
Regards,
Sreejith


* Question about switch_mm function
  2015-01-28 16:26 Question about switch_mm function Sreejith M M
@ 2015-03-25 13:30 ` Sreejith M M
  2015-03-25 16:00   ` Rajat Sharma
  0 siblings, 1 reply; 10+ messages in thread
From: Sreejith M M @ 2015-03-25 13:30 UTC (permalink / raw)
  To: kernelnewbies

Hi,

Can someone please help answer this?

-- 
Regards,
Sreejith

* Question about switch_mm function
  2015-03-25 13:30 ` Sreejith M M
@ 2015-03-25 16:00   ` Rajat Sharma
  2015-03-25 16:05     ` Sreejith M M
  0 siblings, 1 reply; 10+ messages in thread
From: Rajat Sharma @ 2015-03-25 16:00 UTC (permalink / raw)
  To: kernelnewbies

This code handles a context switch from a kernel thread back to a
user-mode thread. At that point the TLB entries may not be valid
translations for the user-mode thread and may not correspond to the user
process's pgd; they can be master kernel page table translations left over
from the kernel thread's execution.

-Rajat

* Question about switch_mm function
  2015-03-25 16:00   ` Rajat Sharma
@ 2015-03-25 16:05     ` Sreejith M M
  2015-03-25 17:25       ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 10+ messages in thread
From: Sreejith M M @ 2015-03-25 16:05 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Mar 25, 2015 at 9:30 PM, Rajat Sharma <fs.rajat@gmail.com> wrote:
> This code is handling context switch from a kernel thread back to user mode
> thread so TLB entries are invalid translation for user mode thread and do
> not correspond to user process pgd. It is Master kernel page table
> translation as a result of kernel thread execution.
>
> -Rajat

Hi Rajat,

If that is the case, why is this code put under the CONFIG_SMP switch?


-- 
Regards,
Sreejith


* Question about switch_mm function
  2015-03-25 16:05     ` Sreejith M M
@ 2015-03-25 17:25       ` Valdis.Kletnieks at vt.edu
  2015-03-25 17:31         ` Sreejith M M
  0 siblings, 1 reply; 10+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2015-03-25 17:25 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 25 Mar 2015 21:35:22 +0530, Sreejith M M said:

> > This code is handling context switch from a kernel thread back to user mode
> > thread so TLB entries are invalid translation for user mode thread and do
> > not correspond to user process pgd. It is Master kernel page table
> > translation as a result of kernel thread execution.
> >
> > -Rajat
> Hi Rajat,
>
> If that is the case, why this code is put under CONFIG_SMP switch?

Vastly simplified because I'm lazy :)

If you look at the code, it's poking the status on *other* CPUs.  That's why
the cpumask() stuff.

If you're on a single execution unit, you don't have to tell the other
CPU about the change in state, because there isn't an other CPU.
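
To make the cpumask point a bit more concrete, here is a minimal userspace
sketch, not kernel code. It assumes a toy mm whose mask is a plain 32-bit
bitmap. The names toy_mm, TOY_NR_CPUS and the toy_* helpers are invented for
this illustration and only loosely mirror mm_cpumask(), cpumask_set_cpu()
and cpumask_test_cpu().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4   /* hypothetical CPU count for this toy model */

/* Toy stand-in for an mm: bit n set means CPU n may cache its translations. */
struct toy_mm {
    uint32_t cpumask;
};

static inline void toy_cpumask_set_cpu(int cpu, struct toy_mm *mm)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mm->cpumask |= (UINT32_C(1) << cpu);
}

static inline void toy_cpumask_clear_cpu(int cpu, struct toy_mm *mm)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    mm->cpumask &= ~(UINT32_C(1) << cpu);
}

static inline int toy_cpumask_test_cpu(int cpu, const struct toy_mm *mm)
{
    return (mm->cpumask >> cpu) & 1u;
}

/*
 * Count how many *other* CPUs a flush started on 'self' would have to
 * interrupt. With a single CPU this is always 0, which is the reason the
 * bookkeeping only matters on SMP.
 */
static inline int toy_count_flush_targets(int self, const struct toy_mm *mm)
{
    int targets = 0;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (cpu != self && toy_cpumask_test_cpu(cpu, mm))
            targets++;
    return targets;
}
```

With CPUs 0 and 2 set in the mask, a flush started on CPU 0 has one target;
once CPU 2 is cleared from the mask there is nobody left to notify.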

* Question about switch_mm function
  2015-03-25 17:25       ` Valdis.Kletnieks at vt.edu
@ 2015-03-25 17:31         ` Sreejith M M
  2015-03-25 17:33           ` Rajat Sharma
  0 siblings, 1 reply; 10+ messages in thread
From: Sreejith M M @ 2015-03-25 17:31 UTC (permalink / raw)
  To: kernelnewbies

On Wed, Mar 25, 2015 at 10:55 PM,  <Valdis.Kletnieks@vt.edu> wrote:
> On Wed, 25 Mar 2015 21:35:22 +0530, Sreejith M M said:
>
>> > This code is handling context switch from a kernel thread back to user mode
>> > thread so TLB entries are invalid translation for user mode thread and do
>> > not correspond to user process pgd. It is Master kernel page table
>> > translation as a result of kernel thread execution.
>> >
>> > -Rajat
>> Hi Rajat,
>>
>> If that is the case, why this code is put under CONFIG_SMP switch?
>
> Vastly simplified because I'm lazy :)
>
> If you look at the code, it's poking the status on *other* CPUs.  That's why
> the cpumask() stuff.
>
> If you're on a single execution unit, you don't have to tell the other
> CPU about the change in state, because there isn't an other CPU.

Can you come out of lazy mode and explain this a bit more, since I am a
newbie? Or tell me what else I should know before I try to understand
this code.

-- 
Regards,
Sreejith


* Question about switch_mm function
  2015-03-25 17:31         ` Sreejith M M
@ 2015-03-25 17:33           ` Rajat Sharma
  2015-03-25 19:13             ` Rajat Sharma
  0 siblings, 1 reply; 10+ messages in thread
From: Rajat Sharma @ 2015-03-25 17:33 UTC (permalink / raw)
  To: kernelnewbies

Valdis is talking about lazy tlb flush, not him being lazy. Otherwise he
wouldn't have replied at all :)

* Question about switch_mm function
  2015-03-25 17:33           ` Rajat Sharma
@ 2015-03-25 19:13             ` Rajat Sharma
  2015-03-25 19:25               ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 10+ messages in thread
From: Rajat Sharma @ 2015-03-25 19:13 UTC (permalink / raw)
  To: kernelnewbies


Okay, a bit more detail. I admit I had to dig a bit more to find this
out. After all, we are all newbies :)

On an SMP system, there is an optimization called lazy TLB mode for
kernel threads. Follow the steps:

1. Assume that some of the CPUs are executing a multithreaded user-mode
application, so they all share the same mm and page tables.
2. Now let's say another CPU changes the physical page frame assigned to
a user-mode linear address, for example while processing a system call
on behalf of the user-mode process (putting data in a user-mode buffer,
etc.). It needs to invalidate the old TLB entry for this linear address
in its local TLB.
3. Since the application is multithreaded, the other CPUs sharing the
same page table may still have the old translation for that linear
address in their TLBs.
4. Normally we would invalidate the TLB entries on all CPUs sharing this
page table.
5. Now suppose some of the participating CPUs are running a kernel
thread, which does not want to be bothered by this change because it has
nothing to do with user-mode TLB entries. Such a CPU is put in a
"do not disturb" mode called lazy TLB mode.
6. TLB invalidation on all CPUs executing a kernel thread is deferred
until the kernel thread is finished.
7. At this point, when the CPU switches from the kernel thread back to
the user-mode process, the invalidation is done. That is the code you
are referring to.
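
The steps above can be sketched as a small userspace model. This is only a
simulation under stated assumptions, not the kernel implementation: every
toy_* name is made up for the example, the cpumask is a plain bitmap, and
"flush" and "reload CR3" are just counters. The opt-out in toy_flush_ipi()
follows the comment in the quoted code ("leave_mm disabled tlb flush IPI
delivery").

```c
#include <assert.h>
#include <stddef.h>

enum toy_tlbstate { TOY_TLBSTATE_OK, TOY_TLBSTATE_LAZY };

struct toy_mm {
    unsigned int cpumask;      /* bit n set: CPU n receives flush IPIs */
};

struct toy_cpu {
    int id;
    enum toy_tlbstate state;
    struct toy_mm *active_mm;  /* mm whose page tables this CPU last used */
    int tlb_flushes;           /* flushes done in response to an IPI */
    int cr3_reloads;           /* reloads done when switching back */
};

/* Step 5: the CPU starts running a kernel thread and goes lazy. */
static inline void toy_enter_lazy_tlb(struct toy_cpu *cpu)
{
    cpu->state = TOY_TLBSTATE_LAZY;
}

/* What a CPU does when a flush IPI for 'mm' arrives. */
static inline void toy_flush_ipi(struct toy_cpu *cpu, struct toy_mm *mm)
{
    if (cpu->active_mm != mm)
        return;
    if (cpu->state == TOY_TLBSTATE_OK)
        cpu->tlb_flushes++;                  /* step 4: flush right away */
    else
        mm->cpumask &= ~(1u << cpu->id);     /* step 6: opt out of IPIs */
}

/* Steps 2-4: 'self' changed a mapping. Returns the number of IPIs sent. */
static inline int toy_flush_others(int self, struct toy_cpu *cpus, size_t n,
                                   struct toy_mm *mm)
{
    int sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpus[i].id == self || !(mm->cpumask & (1u << cpus[i].id)))
            continue;
        toy_flush_ipi(&cpus[i], mm);
        sent++;
    }
    return sent;
}

/* Step 7: models the prev == next branch from the original question. */
static inline void toy_switch_back(struct toy_cpu *cpu, struct toy_mm *next)
{
    cpu->state = TOY_TLBSTATE_OK;
    assert(cpu->active_mm == next);          /* stands in for BUG_ON() */
    if (!(next->cpumask & (1u << cpu->id))) {
        next->cpumask |= 1u << cpu->id;
        cpu->cr3_reloads++;                  /* stands in for load_cr3() */
    }
}
```

In this model a lazy CPU that was never interrupted keeps its bit in the
mask, so toy_switch_back() does no reload at all. The reload only happens
when a flush arrived while the CPU was lazy, which is what the
cpumask_test_cpu() check in the real code is guarding.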

In case you wonder where the invalidation happens: invalidation is an
arch-specific step. The simplest way is to flush all TLB entries and let
the TLB fill up again over time. That is why it is costly and why an
optimization like lazy TLB mode pays off. On x86 it is done by reloading
cr3.
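
As a rough illustration of why reloading cr3 acts as a flush, here is a toy
software TLB. It is a sketch only: the structures and the toy_* names are
hypothetical, and it assumes the usual x86 behaviour that a cr3 write drops
non-global entries while global (kernel) entries survive, ignoring PCID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TLB_ENTRIES 8   /* arbitrary size for the toy TLB */

struct toy_tlb_entry {
    bool valid;
    bool global;            /* global entries survive a cr3 reload */
    unsigned long vpn;      /* virtual page number */
    unsigned long pfn;      /* physical frame number */
};

struct toy_cpu_mmu {
    unsigned long cr3;      /* physical address of the current pgd */
    struct toy_tlb_entry tlb[TOY_TLB_ENTRIES];
};

/* Model of a cr3 write: switch page tables and drop non-global entries. */
static inline void toy_load_cr3(struct toy_cpu_mmu *mmu, unsigned long pgd_phys)
{
    assert(mmu != NULL);
    mmu->cr3 = pgd_phys;
    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
        if (!mmu->tlb[i].global)
            mmu->tlb[i].valid = false;
}

/* Count the entries that are still usable. */
static inline size_t toy_tlb_valid_entries(const struct toy_cpu_mmu *mmu)
{
    size_t count = 0;

    for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
        if (mmu->tlb[i].valid)
            count++;
    return count;
}
```

After toy_load_cr3() every user translation has to be refilled from the page
tables on the next access, which is the cost that lazy TLB mode tries to
avoid paying on CPUs that are only running kernel threads.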

http://stackoverflow.com/questions/1090218/what-does-this-little-bit-of-x86-doing-with-cr3

-Rajat


* Question about switch_mm function
  2015-03-25 19:13             ` Rajat Sharma
@ 2015-03-25 19:25               ` Valdis.Kletnieks at vt.edu
  2015-03-25 19:39                 ` Rajat Sharma
  0 siblings, 1 reply; 10+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2015-03-25 19:25 UTC (permalink / raw)
  To: kernelnewbies

On Wed, 25 Mar 2015 12:13:55 -0700, Rajat Sharma said:

> Okay bit more details, I admit I had to dig through bit more to find
> this out. After all, we all are newbies :)

And you probably learned 3 times more while digging than if I had spelled it
out for you :)

* Question about switch_mm function
  2015-03-25 19:25               ` Valdis.Kletnieks at vt.edu
@ 2015-03-25 19:39                 ` Rajat Sharma
  0 siblings, 0 replies; 10+ messages in thread
From: Rajat Sharma @ 2015-03-25 19:39 UTC (permalink / raw)
  To: kernelnewbies

On Mar 25, 2015 12:26 PM, <Valdis.Kletnieks@vt.edu> wrote:
>
> On Wed, 25 Mar 2015 12:13:55 -0700, Rajat Sharma said:
>
> > Okay bit more details, I admit I had to dig through bit more to find
> > this out. After all, we all are newbies :)
>
> And you probably learned 3 times more while digging than if I had spelled
> it out for you :)

Completely agree :)