* Thread scheduling in 2.6 kernels
@ 2011-02-23 14:16 Mandeep Sandhu
  2011-02-23 17:44 ` Mulyadi Santosa
  0 siblings, 1 reply; 8+ messages in thread
From: Mandeep Sandhu @ 2011-02-23 14:16 UTC (permalink / raw)
  To: kernelnewbies

Hi Guys,

I'm posting on this list after a very long time...so please excuse me if
my question is OT.

I had some doubts on how threads are scheduled in 2.6 kernels.

A little background of my "issue" first.

We're working on a MIPS-based embedded system, running a fairly old
Linux 2.6.22 kernel (with a vendor-provided BSP). We write UI
applications on top of this system using Qt. One of the applications
is a photo-viewer-like app which fetches image data from the n/w and
displays the images on the screen in a grid-like manner. The app shows
images in a "paged" manner, so going from one page to another requires
animating the grid with a new set of images.

Now the task of decoding the incoming image data is very CPU intensive
and so is the animation part. So we used to see the animation suffer
when images (for the next page) were being decoded. So we thought of
offloading the decoding part to a low-priority thread. The main thread
mostly does 2 main operations - animation (CPU intensive) and
n/w i/o (blocking IO). The decoder thread was given the lowest
priority and the main GUI thread was given the highest priority (this
is Qt thread priority and not pthread priority, but Qt internally maps
such priorities to OS-specific values).

This helped improve the animation, but 2 implementations that we tried
were giving very different and unexpected results:

1. In our first implementation the decoder thread was always running,
i.e it was busy-looping even when there was no data to work on. BUT,
we were observing that it was getting a chance to run _only_ when the
main thread was idle! So it almost looked like once the main thread
finished animating (a CPU intensive task) the other thread was getting
a chance to run and we used to see all the enqueued images decoded in
a "batch". In this implementation the animation was completely smooth
as no decoding operation was hampering it.

2. In our 2nd implementation,  since busy looping was not a good idea,
we changed the decoder thread to sleep (blocking on a mutex) when
there was no job to be done. Surprisingly, in this implementation the
animation was suffering again. We could see the decoder thread run
in-between the main thread's run (which was animating the UI) which
could possibly explain the poor animation performance.

So my questions:

- In the second implementation, why was the low-priority thread
running in between the main thread's runs, while with the busy-loop it
ran only when the main thread was idle? One of my colleagues suggested
that since busy-looping is CPU bound, the kernel might be giving it a
"nice" value penalty, causing its priority to change dynamically.

- Is there still a concept of fixed process timeslices for scheduling
of processes?

- How can I find out if the kernel supports NPTL (kernel managed
threads) or plain old linux threads (user-space managed threads)?

I was trying to find out more info about the threads when the app is
running (using "ps" etc), but it seems that the most useful options
are not available in the busybox ver that I'm using.

Any other way to get more thread related info about a running application?

And again, sorry if this is OT.

Thanks,
-mandeep

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-23 14:16 Thread scheduling in 2.6 kernels Mandeep Sandhu
@ 2011-02-23 17:44 ` Mulyadi Santosa
  2011-02-24 13:47   ` Mandeep Sandhu
  0 siblings, 1 reply; 8+ messages in thread
From: Mulyadi Santosa @ 2011-02-23 17:44 UTC (permalink / raw)
  To: kernelnewbies

Hi Mandeep

Quite long questions you have below...but I'll try to summarize and answer....

Btw, your problem description is great....I believe it helps (at least
/me) to get a sense what you gonna do, what you've done and how it
really works. A nice example for every one of us....

On Wed, Feb 23, 2011 at 21:16, Mandeep Sandhu
<mandeepsandhu.chd@gmail.com> wrote:

> We're working in an MIPS based embedded system, running a fairly old

OK, I take a bold note here. I have only worked with 32-bit x86, so
what I am going to say might be completely wrong when it is brought to
the MIPS realm.

> Linux 2.6.22 kernel (with vendor provided BSP). We write UI

I remember vaguely that CFS (Completely Fair Scheduler) was merged
right after 2.6.22 (in 2.6.23) and improved further after that...I
can't recall exactly what the changes are...

In fact, the latest "200 lines famous patch" also affects how scheduling works...

> applications on top of this system using QT. One of the applications
> is a photo-viewer like app which fetches image data from the n/w and
> displays them on the screen in a grid-like manner. The app shows
> images in a "paged" manner so going from one page to another requires
> animating the grid with a new set of images.
>
> Now the task of decoding the incoming image data is very CPU intensive
> and so is the animation part. So we used to see the animation suffer
> when images (for next page were being decoded). So we though of
> offloading the decoding part to a low priority thread. The main thread
> mostly used to do 2 main operations - animation (CPU intensive) and
> n/w i/o (blocking IO).

Why not shift the network I/O to the decoder thread? Or IMHO,
better...another separate thread? That way CPU computation and I/O
could overlap each other.

> The decoder thread was given the lowest
> priority and the main GUI thread was given the highest priority (this
> is QT thread priority and not pthread priority, but QT internally maps
> such priorities to OS specific values)

one is lowest, the other is highest? hmmmm if we put that back in the
pre-CFS era, that could mean a very different time slice assignment...or
in simpler words...kind of a bad idea. I think if it's using nice values,
it's better if the difference is around 5, or 10 at most.

> This helped improve the animation, but 2 implementations that we tried
> were giving very different and unexpected results:
>
> 1. In our first implementation the decoder thread was always running,
> i.e it was busy-looping even when there was no data to work on. BUT,
> we were observing that it was getting a chance to run _only_ when the
> main thread was idle! So it almost looked like once the main thread
> finished animating (a CPU intensive task) the other thread was getting
> a chance to run and we used to see all the enqueued images decoded in
> a "batch". In this implementation the animation was completely smooth
> as no decoding operation was hampering it.

wait, so the decoder just "eats" the content of the buffer without being
signaled first? in other words, it just works all the time?

> 2. In our 2nd implementation, since busy looping was not a good idea,
> we changed the decoder thread to sleep (blocking on a mutex) when
> there was no job to be done.

OK...better....or signal...or any other kind of IPC that suits you....

> Surprisingly, in this implementation the
> animation was suffering again. We could see the decoder thread run
> in-between the main thread's run (which was animating the UI) which
> could possibly explain the poor animation performance.

I think this is the problem and that's why I proposed to isolate the
network I/O into a separate thread. It's like ping pong: the main thread
pushes new data while the decoder thread waits...it is then woken
up...decodes...and the main thread waits....

Technically it is called priority inversion...if I understood your
situation correctly.

> - Is there still a concept of fixed process timeslices for scheduling
> of processes?

Fixed? I don't think so. CFS kind of uses a "delta", i.e. if the current
task has run for x and the other, which is waiting, is at y, then for the
next round, the others deserve some kind of weighted x-y.
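
To make the "weighted" part a bit more concrete, here is a minimal C
sketch of CFS-style accounting. It is an illustration only, not kernel
code: the helper name charge_vruntime is made up, and only three entries
of the kernel's nice-to-weight table are reproduced (nice 0 = 1024,
nice 10 = 110, nice 19 = 15). It describes mainline CFS (2.6.23 and
later), so it may not apply to a kernel that carries another scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Load weights used by mainline CFS for a few nice levels
 * (a subset of the kernel's nice-to-weight table). */
enum {
    WEIGHT_NICE_0  = 1024,
    WEIGHT_NICE_10 = 110,
    WEIGHT_NICE_19 = 15
};

/* Hypothetical helper: virtual runtime charged for delta_ns of real
 * CPU time. A nice-0 task is charged 1:1; a task with a lower weight
 * is charged more, so it falls behind and is picked less often. */
uint64_t charge_vruntime(uint64_t delta_ns, unsigned weight)
{
    assert(weight > 0);
    return delta_ns * WEIGHT_NICE_0 / weight;
}
```

The scheduler then simply prefers the runnable task with the smallest
accumulated virtual runtime, which is why there is no fixed timeslice
per task in this model.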

> - How can I find out if the kernel supports NPTL (kernel managed
> threads) or plain old linux threads (user-space managed threads)?

I think this trick might work: check /proc/<pid>/maps or use pmap.
NPTL ones usually map libtls in their process address space
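
Another way to ask the question, sketched here in C under the assumption
that the C library is glibc: confstr() with _CS_GNU_LIBPTHREAD_VERSION
reports the threading implementation as a string (typically something
like "NPTL 2.11.1" or "linuxthreads-0.10"). The function name
threading_library is made up for this example, and other C libraries
such as uClibc may not support the query at all, which is why the sketch
falls back to -1.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Copy the threading library name/version into buf.
 * Returns 0 on success, -1 if the C library cannot answer. */
int threading_library(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    /* confstr() returns the size needed, including the trailing NUL;
     * 0 means the name is unknown to this C library. */
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    return (needed > 0 && needed <= len) ? 0 : -1;
#else
    return -1;
#endif
}
```

Where a getconf binary is installed, "getconf GNU_LIBPTHREAD_VERSION"
prints the same string from the shell.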

> I was trying to find out more info about the threads when the app is
> running (using "ps" etc), but it seems that the most useful options
> are not available in the busybox ver that I'm using.

so, no coreutils/util-linux/util-linux-ng?

> Any other way to get more thread related info about a running application?

everything under /proc/<pid>? have you checked that?

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-23 17:44 ` Mulyadi Santosa
@ 2011-02-24 13:47   ` Mandeep Sandhu
  2011-02-24 17:27     ` Mulyadi Santosa
  2011-02-28  1:23     ` Sri Ram Vemulpali
  0 siblings, 2 replies; 8+ messages in thread
From: Mandeep Sandhu @ 2011-02-24 13:47 UTC (permalink / raw)
  To: kernelnewbies

> Quite long questions you have below...but I'll try to summarize and answer....

I did try to be as concise as possible! :)

>
> Btw, your problem description is great....I believe it helps (at least
> /me) to get a sense what you gonna do, what you've done and how it
> really works. A nice example for every one of us....

Thanks

>
>> We're working in an MIPS based embedded system, running a fairly old
>
> OK, I take a bold note here. I only have in touch with x86 32 bit, so
> what I am going to say might be completely wrong it is brought to MIPS
> realm.

No probs...even I'm no expert in MIPS (rather my first time with MIPS
as well!:))

The only thing that I found which _might_ be pertinent to our
discussion was that the multi-threading option for MIPS  was disabled
("MIPS MT options (Disable multithreading support.)" ). Since this is
a vendor provided config option I have not changed it. So no processor
MT support for apps.

>
>> Linux 2.6.22 kernel (with vendor provided BSP). We write UI
>
> I remember vaguely that CFS (Complete Fair Scheduler) was improved
> somewhere after 2.6.22 version...I couldn't recall exactly what
> changes they are...

The vendor provided linux kernel has the "Staircase Deadline"
scheduler patched into it...so no CFS here...

>
> In fact, the latest "200 lines famous patch" also affect how scheduling works...

Yeah I read about it (though I couldn't grasp how the thing actually
works)...I have the user-space variant of this soln running on my
ubuntu box :)

>
> Why not shifting the network I/O to the decoder threat? or IMHO,
> better...another separate thread? So each other could
> overlap...between CPU computation and I/O.

We have tested running the app with just the decoding bit disabled in
the decoder thread. The animation is pretty smooth...though thats also
because there's not much to do w/o the images! :)

QT handles n/w i/o pretty well, in a non-blocking, async
manner...though I'm not sure if it is internally using separate
threads for doing so...will have to find out.

>
> one is lowest, latter is highest? hmmmm if we put that back to pre CFS
> era, that could mean a very different time slice assignment...or in
> simpler word...kinda bad idea. I think if it's using nice value, it's
> better if the difference is around 5 or 10 by maximum.

The idea of assigning 2 extreme pri's was to ensure that the decode
thread never interferes with the main thread while animation is
going on. It's almost like the main thread needs "real-time" priority
while it's doing animation...and goes back to normal priority when
idle! :)

I think SD sched uses nice values...I'm also not certain whether the
QT wrappers are assigning "nice" values when one tries to set priority
to a thread...will have to check and get back.

>
> wait, so decoder just "eat" the content of the buffer without being
> signaled before? in other word, it just work all the time?

I'm not sure I follow your question here.

The main thread _copies_ raw data rx'ed from the n/w and adds it to a
"job queue" of the decoder thread...a fxn in the decoder thread simply
checks if there are any jobs in the queue...if there is...it accesses
the data (which was copied earlier when adding the job) and decodes
the image...

This is where we had the 2 types of implementations...i.e. in one...this
job queue is checked continuously like:

while(true) {
    if (job-queue is NOT empty) {
       // do decode
    }
}

And in the second implementation:

while(true) {
    if (job-queue is NOT empty) {
        // do decode
    } else {
        // wait for main thread to signal us when a new job is available
    }
}

The "waiting" (in 2nd implementation) is done via thread
synchronization primitives available in QT
(http://doc.qt.nokia.com/4.6/qwaitcondition.html)
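
For readers without Qt at hand, the second loop can be sketched with
plain pthreads. This is only an assumed equivalent, not our actual code:
the job queue is reduced to a counter, and all names (job_queue,
queue_push, queue_shutdown, decoder_main) are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical job queue, reduced to counters for the sketch. */
struct job_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int  pending;   /* jobs waiting to be decoded */
    int  decoded;   /* jobs finished so far */
    bool shutdown;  /* set by the main thread to stop the worker */
};

void queue_init(struct job_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->pending = 0;
    q->decoded = 0;
    q->shutdown = false;
}

/* Main thread: add one job and wake the decoder. */
void queue_push(struct job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Main thread: ask the decoder to exit once the queue is drained. */
void queue_shutdown(struct job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Decoder thread body: sleeps while there is nothing to do. */
void *decoder_main(void *arg)
{
    struct job_queue *q = arg;
    assert(q != NULL);

    pthread_mutex_lock(&q->lock);
    for (;;) {
        /* Re-check the condition in a loop: wakeups can be spurious. */
        while (q->pending == 0 && !q->shutdown)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->pending == 0)
            break;                      /* shutdown and nothing left */
        q->pending--;
        pthread_mutex_unlock(&q->lock);
        /* ... decode one image here, without holding the lock ... */
        pthread_mutex_lock(&q->lock);
        q->decoded++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}
```

The point of the design is that the decoder holds the lock only while
touching the queue, never while decoding, so the main thread is not
blocked in queue_push() for the length of a decode.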

>
>
> I think this is the problem and that's why I proposed to isolate the
> network I/O into separate thread. It's like ping pong, main thread
> push new data, decoder thread wait...it is then woken
> up..decoding...main thread waits....
>
> Technically it is called priority inversion..if I got it correctly
> about your situation.

Hmmm...n/w io doesn't seem to be affecting animation perf of main
thread (as pointed above)...it's just that when the decoder thread has
a job to do..I need it to be preempted by the main thread so it can
complete its animation w/o the other thread taking away precious CPU
cycles...

I'm going to try "renice"-ing the decoder thread to a higher value
and see if it changes the behaviour in the 2nd implementation (where
we don't busy-loop)...
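
A rough C sketch of doing that from inside the thread itself, instead of
with the renice command. It relies on Linux-specific behaviour (the nice
value is per thread when setpriority() is given a kernel thread ID), and
the function names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller (the number that shows up under
 * /proc/<pid>/task). */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Set the nice value of the calling thread only.
 * Returns 0 on success, -1 on error (errno is set). */
int set_current_thread_nice(int nice_value)
{
    assert(nice_value >= -20 && nice_value <= 19);
    return setpriority(PRIO_PROCESS, (id_t)current_tid(), nice_value);
}

/* Read the nice value back. getpriority() can legally return -1,
 * so errno is used to tell an error from a real value. */
int get_current_thread_nice(int *nice_value)
{
    assert(nice_value != NULL);
    errno = 0;
    int n = getpriority(PRIO_PROCESS, (id_t)current_tid());
    if (n == -1 && errno != 0)
        return -1;
    *nice_value = n;
    return 0;
}
```

Calling set_current_thread_nice(19) at the top of the decoder thread's
run function would be the in-process equivalent of renicing it to the
lowest priority; raising the nice value needs no special privileges.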

>
> Fixed? I don't think so. CFS is kinda using "delta" i.e if current
> task runs for x and other which is waiting is y, then for the next
> round, others deserve some kind of weighted x-y.

SD sched, I think, assigns a fixed quota of runtime (= timeslice?) and
if the process uses up this quota...its priority is reduced to the
next level....
>
>> - How can I find out if the kernel supports NPTL (kernel managed
>> threads) or plain old linux threads (user-space managed threads)?
>
> I think this trick might work: Check /proc/<pid>/maps or use pmap.
> NPTL ones usually maps libtls in its process address space

pmap's not available! :(

and I couldn't see libtls mapped in this process's addr space (is it
really libtls? why would we have a TLS library for NPTL?...isn't libtls
used for SSL communications?)

>
> so, no coreutils/util-linux/util-linux-ng?

coreutils is there.....but most commands are stripped down/lightweight
versions of the originals! :)

>
>> Any other way to get more thread related info about a running application?
>
> everything under /proc/<pid>? have you checked that?

This helped a little!

I can see the threads spawned by the main thread under
"/proc/<pid>/task". This dir lists pid's of all the threads started by
the parent proc...and contents of individual dir (pids) is same as
"/proc/<pid>"...

Here I could find out my decoder thread's ID...but again contents of
that dir does not show info like priority/nice value etc...
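
For what it's worth, the "stat" file in each of those directories should
carry the numbers: per the proc man page, field 18 is the kernel
priority and field 19 the nice value (fields 40 and 41 hold the
real-time priority and the scheduling policy). A small C sketch for
pulling them out when the ps on the box is too stripped down; the
function name read_prio_nice is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse "priority" (field 18) and "nice" (field 19) from a file such
 * as /proc/<pid>/task/<tid>/stat.
 * Returns 0 on success, -1 on error. */
int read_prio_nice(const char *stat_path, long *priority, long *nice_value)
{
    char line[1024];
    assert(stat_path != NULL && priority != NULL && nice_value != NULL);

    FILE *f = fopen(stat_path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    /* Field 2 is "(comm)" and may contain spaces, so skip past the
     * last ')' before splitting on whitespace. */
    char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    int field = 2;
    int found = 0;
    for (char *tok = strtok(p + 1, " \n"); tok != NULL;
         tok = strtok(NULL, " \n")) {
        field++;                /* first token after ')' is field 3 */
        if (field == 18) {
            *priority = strtol(tok, NULL, 10);
            found++;
        } else if (field == 19) {
            *nice_value = strtol(tok, NULL, 10);
            found++;
            break;
        }
    }
    return found == 2 ? 0 : -1;
}
```

Pointing it at /proc/<pid>/task/<tid>/stat for the decoder thread's ID
shows whether the Qt priority setting actually changed the nice value.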

Thanks again for your inputs. I'll keep posting my findings
here...till I get a satisfactory soln to this issue.

Regards,
-mandeep


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-24 13:47   ` Mandeep Sandhu
@ 2011-02-24 17:27     ` Mulyadi Santosa
  2011-02-25  5:57       ` Mandeep Sandhu
  2011-02-28  1:23     ` Sri Ram Vemulpali
  1 sibling, 1 reply; 8+ messages in thread
From: Mulyadi Santosa @ 2011-02-24 17:27 UTC (permalink / raw)
  To: kernelnewbies

Hi Mandeep

Just quick reply 1st...I couldn't reply longer right now...

On Thu, Feb 24, 2011 at 20:47, Mandeep Sandhu
<mandeepsandhu.chd@gmail.com> wrote:
> and i couldn't see libtls mapped in this process's addr space (is it
> really libtls? why would we have TLS library for NPTL?...isn't libtls
> used for SSL communications?)

Nope, tls here refers to "thread".....

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-24 17:27     ` Mulyadi Santosa
@ 2011-02-25  5:57       ` Mandeep Sandhu
  2011-02-25 14:26         ` Mulyadi Santosa
  0 siblings, 1 reply; 8+ messages in thread
From: Mandeep Sandhu @ 2011-02-25  5:57 UTC (permalink / raw)
  To: kernelnewbies

> Nope, tls here refers to "thread".....

Ok, got it...you're referring to Thread Local Storage. I think you
meant for me to search "/lib/tls" _path_ in the addr space mappings of
my main process, right?

Googling also shows that libtls is the TLS library

I did not see any tls dir entries in my process' addr map:

$ cat /proc/<my apps' pid>/maps | grep tls
$

As a matter of fact, there's no /lib/tls dir at all!

But I'm not sure if this means that NPTL is NOT supported....because I
can see the individual threads under the 'task' of my apps' /proc
listing...

Regards,
-mandeep



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-25  5:57       ` Mandeep Sandhu
@ 2011-02-25 14:26         ` Mulyadi Santosa
  0 siblings, 0 replies; 8+ messages in thread
From: Mulyadi Santosa @ 2011-02-25 14:26 UTC (permalink / raw)
  To: kernelnewbies

Hi Mandeep :)

On Fri, Feb 25, 2011 at 12:57, Mandeep Sandhu
<mandeepsandhu.chd@gmail.com> wrote:
>> Nope, tls here refers to "thread".....
>
> Ok, got it...you're referring to Thread Local Storage. I think you
> meant for me to search "/lib/tls" _path_ in the addr space mappings of
> my main process, right?

> Googling also shows that libtls is the TLS library
>
> I did not see any tls dir entries in my process' addr map:
>
> $ cat /proc/<my apps' pid>/maps | grep tls
> $

Here's mine, in x86:
$ cat /proc/self/maps
001e8000-0033b000 r-xp 00000000 08:08 133713
/lib/tls/i686/cmov/libc-2.11.1.so
0033b000-0033c000 ---p 00153000 08:08 133713
/lib/tls/i686/cmov/libc-2.11.1.so
0033c000-0033e000 r--p 00153000 08:08 133713
/lib/tls/i686/cmov/libc-2.11.1.so
0033e000-0033f000 rw-p 00155000 08:08 133713
/lib/tls/i686/cmov/libc-2.11.1.so
.....



> As a matter of fact, there's no /lib/tls dir at all!

OK, now that's something I don't know why...

> But I'm not sure if this means that NPTL is NOT supported....because I
> can see the individual threads under the 'task' of my apps' /proc
> listing...

it could mean that it uses the old LinuxThreads implementation...

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-24 13:47   ` Mandeep Sandhu
  2011-02-24 17:27     ` Mulyadi Santosa
@ 2011-02-28  1:23     ` Sri Ram Vemulpali
  2011-02-28 11:38       ` Mandeep Sandhu
  1 sibling, 1 reply; 8+ messages in thread
From: Sri Ram Vemulpali @ 2011-02-28  1:23 UTC (permalink / raw)
  To: kernelnewbies

Hi Mandeep,

         What preemption level have you set for your kernel? Check
that, and find out from the third party who provided the scheduler
which algorithm it uses and how it modifies the nice values.

If the thread scheduling policy is set to SCHED_OTHER then the third
party scheduler is being used. If you set the thread sched policy to
SCHED_FIFO for both the decoder and the rendering thread and give the
rendering thread the higher priority, that will do it for you. The
decoder thread can then stay in its busy loop. But why not create a
notifier for the decoder thread, so that it wakes up only when data is
available?
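
As a rough illustration of the SCHED_FIFO suggestion, a thread can be
moved to that policy with pthread_setschedparam(). This is a sketch
only: the function name make_thread_fifo is made up, and the call
normally fails with EPERM unless the process is privileged.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Put one thread under SCHED_FIFO at the given static priority.
 * Returns 0 on success or an errno value such as EPERM. */
int make_thread_fifo(pthread_t thread, int rt_priority)
{
    struct sched_param sp = { 0 };

    assert(rt_priority >= sched_get_priority_min(SCHED_FIFO) &&
           rt_priority <= sched_get_priority_max(SCHED_FIFO));
    sp.sched_priority = rt_priority;
    return pthread_setschedparam(thread, SCHED_FIFO, &sp);
}
```

One caution on a single core: a runnable SCHED_FIFO thread is always
chosen ahead of SCHED_OTHER threads, so a decoder that busy-loops under
SCHED_FIFO can starve every normal task on the box.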

Also, you need to tune your thread nr time and policies based on the bit
rate of the data you are rendering. If you can run both threads, rendering
and decoding, in intervals matched to the bit rate, that creates a smooth
picture. That's the catch.

Are you using multiple cores to do the job, or a single core?

--Sri.




-- 
Regards,
Sri.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Thread scheduling in 2.6 kernels
  2011-02-28  1:23     ` Sri Ram Vemulpali
@ 2011-02-28 11:38       ` Mandeep Sandhu
  0 siblings, 0 replies; 8+ messages in thread
From: Mandeep Sandhu @ 2011-02-28 11:38 UTC (permalink / raw)
  To: kernelnewbies

> What is the preemptive level you have set for your kernel,

As mentioned in my first mail, I have set the following options:

- "Preemption Model" option as  - Preemptible Kernel (Low-Latency Desktop).

This, I think, means that even the kernel can be preempted (involuntarily)

- "Preempt The Big Kernel Lock"

> Check that one, and find out from your third party who provided
> scheduler, the algorithm, and how it modifies the nice values.

The scheduler being used here is Con Kolivas' "Staircase Deadline"
scheduler. It uses a priority matrix, where each process is placed at
its "static prio" position in the matrix. Here's a short description
of the SD design (taken from the patch file):

+Design description
+==================
+
+SD works off the principle of providing each task a quota of runtime that it is
+allowed to run at a number of priority levels determined by its static priority
+(ie. its nice level). If the task uses up its quota it has its priority
+decremented to the next level determined by a priority matrix. Once every
+runtime quota has been consumed of every priority level, a task is queued on the
+"expired" array. When no other tasks exist with quota, the expired array is
+activated and fresh quotas are handed out. This is all done in O(1).

>
> If the thread scheduling policy was set to SCHED_OTHER than the third
> party scheduler is been used. If you set thread schd policy to

I'm not sure what you mean here. SCHED_OTHER is the default sched
policy used for normal processes (unless explicitly changed). I think
irrespective of what sched policy is being set, there's only 1
scheduler available for use, i.e. in my case, the SD scheduler. CMIIW.

> SCHED_FIFO for both decoder and rendering thread and set rendering
> thread to higher priority it will do for you. The other decoder thread
> can be in busy loop. Why do not create a notifier for decoder thread,
> so that it will wake up only when data is available.

Well, I tried something similar and that seemed to work fairly well!

I set the scheduling policy of the decoder thread to "SCHED_BATCH".
Now I'm observing that the main render/GUI thread completes its
animation and then the decoder gets a chance to run (batch mode
processing).
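
For anyone wanting to try the same thing, here is a minimal sketch of
switching a thread to SCHED_BATCH with the plain Linux call. It is not
the code from our app: the function name make_thread_batch is made up,
and on Linux the "pid" argument of sched_setscheduler() is really a
thread ID, with 0 meaning the calling thread.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* glibc only exposes SCHED_BATCH when _GNU_SOURCE is defined;
 * 3 is the value used by the Linux kernel headers. */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif

/* Switch one thread to SCHED_BATCH. tid is a kernel thread ID, or 0
 * for the calling thread. Returns 0 on success, -1 on error (errno
 * is set). */
int make_thread_batch(pid_t tid)
{
    struct sched_param sp = { 0 };  /* static priority must be 0 here */

    assert(tid >= 0);
    return sched_setscheduler(tid, SCHED_BATCH, &sp);
}
```

The simplest use is to call make_thread_batch(0) as the first statement
of the decoder thread's own run function, so that only that thread
changes policy.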

We're not busy-looping. Rather we're making the decoder thread wait on
a job-queue. It'll sleep as long as the job-queue is empty.

>
> Also, you need to tune your thread nr time and policies based on bit
> rate of data you are rendering. If you can run in interims of bit rate
> time both the threads, rendering and decoding, that creates a smooth
> picture. Thats the catch.

Don't quite follow you here...what is "nr time" ? I don't quite
understand what is the significance of "bit rate" for static images?
Also note that these images (JPEG) are quite small in dimensions (~
200 x 150). The memory bandwidth available from the main memory (DRAM)
to the video-rendering subsystem is quite high (~2.6Gbps), so that
won't be a bottleneck.

For me the trick to solving this issue was to NOT do decoding while
the animation was going on. Even a single decode op used to make the
animation suffer, as it had a fairly strict timing requirement (not hard
real-time, but close). So forcing the decoder thread to sort of
"pause" decoding while animation is in progress helped.

>
> Are you using multi core to do the job or single core.

Single core. The processor has multi-threading support but that
support is disabled in the kernel config. Since this was something set
by the vendor, I'm not changing it.

Thanks,
-mandeep

>
> --Sri.
>
> On Thu, Feb 24, 2011 at 8:47 AM, Mandeep Sandhu
> <mandeepsandhu.chd@gmail.com> wrote:
>>> Quite long questions you have below...but I'll try to summarize and answer....
>>
>> I did try to be as concise as possible! :)
>>
>>>
>>> Btw, your problem description is great....I believe it helps (at least
>>> /me) to get a sense what you gonna do, what you've done and how it
>>> really works. A nice example for every one of us....
>>
>> Thanks
>>
>>>
>>>> We're working in an MIPS based embedded system, running a fairly old
>>>
>>> OK, I take a bold note here. I only have in touch with x86 32 bit, so
>>> what I am going to say might be completely wrong it is brought to MIPS
>>> realm.
>>
>> No probs...even I'm no expert in MIPS (rather my first time with MIPS
>> as well!:))
>>
>> The only thing that I found which _might_ be pertinent to our
>> discussion was that the multi-threading option for MIPS was disabled
>> ("MIPS MT options (Disable multithreading support.)"). Since this is
>> a vendor provided config option I have not changed it. So no processor
>> MT support for apps.
>>
>>>
>>>> Linux 2.6.22 kernel (with vendor provided BSP). We write UI
>>>
>>> I remember vaguely that CFS (Completely Fair Scheduler) was improved
>>> somewhere after the 2.6.22 version...I can't recall exactly what
>>> the changes were...
>>
>> The vendor provided linux kernel has the "Staircase Deadline"
>> scheduler patched into it...so no CFS here...
>>
>>>
>>> In fact, the latest "200 lines famous patch" also affects how scheduling works...
>>
>> Yeah I read about it (though I couldn't grasp how the thing actually
>> works)...I have the user-space variant of this soln running on my
>> ubuntu box :)
>>
>>>
>>> Why not shift the network I/O to the decoder thread? Or IMHO,
>>> better...another separate thread? So they could
>>> overlap...between CPU computation and I/O.
>>
>> We have tested running the app with just the decoding bit disabled in
>> the decoder thread. The animation is pretty smooth...though thats also
>> because there's not much to do w/o the images! :)
>>
>> QT handles n/w i/o pretty well, in a non-blocking, async
>> manner...though I'm not sure if it is internally using separate
>> threads for doing so...will have to find out.
>>
>>>
>>> one is lowest, latter is highest? hmmmm if we put that back to pre CFS
>>> era, that could mean a very different time slice assignment...or in
>>> simpler word...kinda bad idea. I think if it's using nice value, it's
>>> better if the difference is around 5 or 10 by maximum.
>>
>> The idea of assigning 2 extreme pri's was to ensure that the decode
>> thread never interferes with the main thread while animation is
>> going on. It's almost like the main thread needs "real-time" priority
>> while it's doing animation...and goes back to normal priority when
>> idle! :)
>>
>> I think SD sched uses nice values...I'm also not certain whether the
>> QT wrappers are assigning "nice" values when one tries to set priority
>> to a thread...will have to check and get back.
>>
>>>
>>> wait, so decoder just "eat" the content of the buffer without being
>>> signaled before? in other word, it just work all the time?
>>
>> I'm not sure i follow your question here.
>>
>> The main thread _copies_ raw data rx'ed from the n/w and adds it to a
>> "job queue" of the decoder thread...a fxn in the decoder thread simply
>> checks if there are any jobs in the queue...if there is...it accesses
>> the data (which was copied earlier when adding the job) and decodes
>> the image...
>>
>> This is where we had the 2 types of implementations...i.e in one...this
>> job queue is checked continuously like:
>>
>> while(true) {
>>     if (job-queue is NOT empty) {
>>         // do decode
>>     }
>> }
>>
>> And in the second implementation:
>>
>> while(true) {
>>     if (job-queue is NOT empty) {
>>         // do decode
>>     } else {
>>         // wait for main thread to signal us when a new job is available
>>     }
>> }
>>
>> The "waiting" (in 2nd implementation) is done via thread
>> synchronization primitives available in QT
>> (http://doc.qt.nokia.com/4.6/qwaitcondition.html)
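
For comparison, the second (waiting) implementation quoted above can be
sketched with plain pthreads. This is an assumption-laden stand-in for
the QWaitCondition-based code, not the application's actual code; the
names job, job_queue, queue_push and queue_pop_blocking are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* One unit of decode work; data points at the copied raw image bytes. */
struct job {
    struct job *next;
    void       *data;
};

/* FIFO shared between the main thread (producer) and the decoder. */
struct job_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;   /* signalled whenever a job is added */
    struct job     *head;
    struct job     *tail;
};

void queue_init(struct job_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
}

/* Main thread: append a job and wake the decoder if it is waiting. */
void queue_push(struct job_queue *q, struct job *j)
{
    j->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Decoder thread: sleep (no busy-loop) until a job is available. */
struct job *queue_pop_blocking(struct job_queue *q)
{
    struct job *j;

    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    j = q->head;
    q->head = j->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return j;
}
```

While the queue is empty the decoder is blocked inside
pthread_cond_wait() and is not runnable, so it does not compete with the
main thread for the CPU the way the busy-looping variant does.
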
>>
>>>
>>>
>>> I think this is the problem and that's why I proposed to isolate the
>>> network I/O into separate thread. It's like ping pong, main thread
>>> push new data, decoder thread wait...it is then woken
>>> up..decoding...main thread waits....
>>>
>>> Technically it is called priority inversion..if I got it correctly
>>> about your situation.
>>
>> Hmmm...n/w io doesn't seem to be affecting animation perf of main
>> thread (as pointed above)...it's just that when the decoder thread has
>> a job to do..I need it to be preempted by the main thread so it can
>> complete its animation w/o the other thread taking away precious CPU
>> cycles...
>>
>> I'm going to try "renice"-ing the decoder thread to a higher value
>> and see if it changes the behaviour in the 2nd implementation (where
>> we don't busy-loop)...
>>
>>>
>>> Fixed? I don't think so. CFS is kinda using "delta" i.e if current
>>> task runs for x and other which is waiting is y, then for the next
>>> round, others deserve some kind of weighted x-y.
>>
>> SD sched, i think, assigns a fixed quota of runtime (= timeslice?) and
>> if the process uses up this quota...its priority is reduced to the
>> next level....
>>>
>>>> - How can I find out if the kernel supports NPTL (kernel managed
>>>> threads) or plain old linux threads (user-space managed threads)?
>>>
>>> I think this trick might work: Check /proc/<pid>/maps or use pmap.
>>> NPTL ones usually maps libtls in its process address space
>>
>> pmap's not available! :(
>>
>> and i couldn't see libtls mapped in this process's addr space (is it
>> really libtls? why would we have TLS library for NPTL?...isn't libtls
>> used for SSL communications?)
>>
>>>
>>> so, no coreutils/util-linux/util-linux-ng?
>>
>> coreutils is there.....but most commands are stripped down/lightweight
>> versions of the originals! :)
>>
>>>
>>>> Any other way to get more thread related info about a running application?
>>>
>>> everything under /proc/<pid>? have you checked that?
>>
>> This helped a little!
>>
>> I can see the threads spawned by the main thread under
>> "/proc/<pid>/task". This dir lists pid's of all the threads started by
>> the parent proc...and contents of individual dir (pids) is same as
>> "/proc/<pid>"...
>>
>> Here I could find out my decoder thread's ID...but again contents of
>> that dir does not show info like priority/nice value etc...
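
On the priority/nice question: per proc(5), the "stat" file in each
/proc/<pid>/task/<tid> directory carries the priority as field 18 and
the nice value as field 19. A small sketch for pulling them out,
assuming the vendor kernel exposes the standard stat layout (not
verified on this 2.6.22 BSP); the function names are made up.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse "priority" (field 18) and "nice" (field 19) from one line in
 * /proc/<pid>/stat format. Returns 0 on success, -1 on failure.
 */
int parse_stat_prio(const char *line, long *prio, long *nice_val)
{
    /* comm (field 2) may contain spaces, so resume after the last ')'. */
    const char *p = strrchr(line, ')');
    int n;

    if (p == NULL)
        return -1;

    /* Skip state (field 3) and fields 4..17, then read fields 18, 19. */
    n = sscanf(p + 1,
               " %*c %*s %*s %*s %*s %*s %*s %*s"
               " %*s %*s %*s %*s %*s %*s %*s %ld %ld",
               prio, nice_val);
    return n == 2 ? 0 : -1;
}

/* Read the values for one thread of a process; -1 if unavailable. */
int read_thread_prio(int pid, int tid, long *prio, long *nice_val)
{
    char path[64];
    char line[1024];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "/proc/%d/task/%d/stat", pid, tid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fgets(line, sizeof line, f) != NULL;
    fclose(f);
    return ok ? parse_stat_prio(line, prio, nice_val) : -1;
}
```

Calling read_thread_prio() with the app's pid and the decoder thread's
ID from /proc/<pid>/task would show whether the Qt priority setting
actually changed the nice value.
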
>>
>> Thanks again for your inputs. I'll keep posting my findings
>> here...till I get a satisfactory soln to this issue.
>>
>> Regards,
>> -mandeep
>>
>>>
>>> --
>>> regards,
>>>
>>> Mulyadi Santosa
>>> Freelance Linux trainer and consultant
>>>
>>> blog: the-hydra.blogspot.com
>>> training: mulyaditraining.blogspot.com
>>>
>>
>>
>
>
>
> --
> Regards,
> Sri.
>

end of thread, other threads:[~2011-02-28 11:38 UTC | newest]

Thread overview: 8+ messages
2011-02-23 14:16 Thread scheduling in 2.6 kernels Mandeep Sandhu
2011-02-23 17:44 ` Mulyadi Santosa
2011-02-24 13:47   ` Mandeep Sandhu
2011-02-24 17:27     ` Mulyadi Santosa
2011-02-25  5:57       ` Mandeep Sandhu
2011-02-25 14:26         ` Mulyadi Santosa
2011-02-28  1:23     ` Sri Ram Vemulpali
2011-02-28 11:38       ` Mandeep Sandhu
