* High CPU usage of scheduler? @ 2012-04-26 22:08 Dave Johansen 2012-04-27 3:10 ` Yong Zhang 0 siblings, 1 reply; 9+ messages in thread From: Dave Johansen @ 2012-04-26 22:08 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 3318 bytes --] I am looking into moving an application from RHEL 5 to RHEL 6 and I noticed an unexpected increase in CPU usage. A little digging has led me to believe that the scheduler may be the culprit. I created the attached test_select_work.c file to test this out. I compiled it with the following command on RHEL 5: cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread -o test_select_work I then played with the parameters until the execution time per iteration was about 1 ms on a Dell Precision m6500. I got the following result on RHEL 5: ./test_select_work 1000 10000 300 4 time_per_iteration: min: 911.5 us avg: 913.7 us max: 917.1 us stddev: 2.4 us ./test_select_work 1000 10000 300 8 time_per_iteration: min: 1802.6 us avg: 1803.9 us max: 1809.1 us stddev: 2.1 us ./test_select_work 1000 10000 300 40 time_per_iteration: min: 7580.4 us avg: 8567.3 us max: 9022.0 us stddev: 299.6 us And the following on RHEL 6: ./test_select_work 1000 10000 300 4 time_per_iteration: min: 914.6 us avg: 975.7 us max: 1034.5 us stddev: 50.0 us ./test_select_work 1000 10000 300 8 time_per_iteration: min: 1683.9 us avg: 1771.8 us max: 1810.8 us stddev: 43.4 us ./test_select_work 1000 10000 300 40 time_per_iteration: min: 7997.1 us avg: 8709.1 us max: 9061.8 us stddev: 310.0 us On both versions, these results were about what I expected with the average amount of time per iteration scaling relatively linearly. I then recompiled with -DSLEEP_TYPE=1 and got the following results on RHEL 5: ./test_select_work 1000 10000 300 4 time_per_iteration: min: 1803.3 us avg: 1902.8 us max: 2001.5 us stddev: 113.8 us ./test_select_work 1000 10000 300 8 time_per_iteration: min: 1997.1 us avg: 2002.0 us max: 2010.8 us stddev: 5.0 us ./test_select_work 1000 10000 300 40 time_per_iteration: min: 6958.4 us avg: 8397.9 us max: 9423.7 us stddev: 619.7 us And the following results on RHEL 6: ./test_select_work 1000 10000 300 4 time_per_iteration: min: 2107.1 us avg: 2143.1 us max: 2177.7 us stddev: 30.3 us ./test_select_work 1000 10000 300 8 time_per_iteration: min: 2903.3 us avg: 2903.8 us max: 2904.3 us stddev: 0.3 us ./test_select_work 1000 10000 300 40 time_per_iteration: min: 8877.7.1 us avg: 9016.3 us max: 9112.6 us stddev: 62.9 us On RHEL 5, the results were about what I expected (4 threads taking twice as long because of the 1 ms sleep but the 8 threads taking the same amount of time since each thread is now sleeping for about half the time, and a still fairly linear increase with the 40 threads). However, with RHEL 6, the time taken with 4 threads increased by about 15% more than the expected doubling and the 8 thread case increased by about 45% more than the expected slight increase. The increase in the 4 thread case seems to be that RHEL 6 is actually sleeping for a handful of microseconds more than 1 ms while RHEL 5 is only sleep about 900 us, but this doesn't explain the unexpectedly large increase in the 8 and 40 thread cases. I saw similar types of behaviour with all 3 -DSLEEP_TYPE values. I also tried playing with the scheduler parameters in sysctl, but nothing seemed to have a significant impact on the results. Any ideas on how I can further diagnose this issue? Thanks, Dave [-- Attachment #2: test_select_work.c --] [-- Type: text/x-csrc, Size: 4624 bytes --] #include <float.h> #include <math.h> #include <poll.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/select.h> #include <sys/time.h> // Uncomment to use select inside the loop of each of the threads #define SLEEP_TYPE 1 // Uncomment to use poll inside the loop of each of the threads //#define SLEEP_TYPE 2 // Uncomment to use usleep inside the loop of each of the threads //#define SLEEP_TYPE 3 struct thread_info { int sleep_time; int num_iterations; int work_size; }; void *do_test(void *arg) { const struct thread_info *tinfo = (struct thread_info *)arg; const int sleep_time = tinfo->sleep_time; const int num_iterations = tinfo->num_iterations; const int work_size = tinfo->work_size; #if SLEEP_TYPE == 1 // Data for calling select struct timeval ts; #elif SLEEP_TYPE == 2 // Data for calling poll struct pollfd pfd; #endif // Data for doing work int *buf; int pseed; int inum, bnum; // Data for tracking the time struct timeval before, after; long long *diff; buf = calloc(work_size, sizeof(int)); diff = malloc(sizeof(unsigned long long)); #if SLEEP_TYPE == 2 // Initialize the poll data pfd.fd = 0; pfd.events = 0; #endif // Get the time before starting the processing gettimeofday(&before, NULL); // Do the requested number of iterations for (inum=0; inum<num_iterations; ++inum) { #if SLEEP_TYPE == 1 ts.tv_sec = 0; ts.tv_usec = sleep_time; select(0, 0, 0, 0, &ts); #elif SLEEP_TYPE == 2 poll(&pfd, 1, sleep_time / 1000); #elif SLEEP_TYPE == 3 usleep(sleep_time); #else // Get rid of warning about unused variable (void)sleep_time; #endif // Fill in a buffer with random numbers (taken from latt.c by Jens Axboe <jens.axboe@oracle.com>) pseed = 1; for (bnum=0; bnum<work_size; ++bnum) { pseed = pseed * 1103515245 + 12345; buf[bnum] = (pseed / 65536) % 32768; } } // Get the time after starting the processing gettimeofday(&after, NULL); // Calculate the delta time *diff = 1000000LL * (after.tv_sec - before.tv_sec); *diff += after.tv_usec - before.tv_usec; // Clean up the data free(buf); return diff; } int main(int argc, char **argv) { if (argc < 4) { printf("Usage: %s <sleep_time> <num_iterations> <work_size> <num_threads>\n", argv[0]); return -1; } struct thread_info tinfo; int s, tnum, num_threads; pthread_attr_t attr; pthread_t *threads; long long *res; long long *times; // Get the parameters tinfo.sleep_time = atoi(argv[1]); tinfo.num_iterations = atoi(argv[2]); tinfo.work_size = atoi(argv[3]) * 1024; num_threads = atoi(argv[4]); // Initialize the thread creation attributes s = pthread_attr_init(&attr); if (s != 0) { printf("Error initializing thread attributes\n"); return -2; } // Allocate the memory to track the threads threads = calloc(num_threads, sizeof(pthread_t)); times = calloc(num_threads, sizeof(unsigned long long)); if (threads == NULL) { printf("Error allocating memory to track threads\n"); return -3; } // Start all of the threads for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_create(&threads[tnum], &attr, &do_test, &tinfo); if (s != 0) { printf("Error starting thread\n"); return -4; } } // Clean up the thread creation attributes s = pthread_attr_destroy(&attr); if (s != 0) { printf("Error cleaning up thread attributes\n"); return -5; } // Wait for all the threads to finish for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_join(threads[tnum], (void **)(&res)); if (s != 0) { printf("Error waiting for thread\n"); return -6; } // Save the time times[tnum] = *res; // And clean it up free(res); } // Calculate the min, max, and average times float min_time = FLT_MAX; float max_time = -FLT_MAX; float avg_time = 0; for (tnum=0; tnum<num_threads; ++tnum) { if (times[tnum] < min_time) min_time = times[tnum]; if (times[tnum] > max_time) max_time = times[tnum]; avg_time += (times[tnum] - avg_time) / (float)(tnum + 1); } // Calculate the standard deviation of the time float stddev_time = 0; if (num_threads > 1) { for (tnum=0; tnum<num_threads; ++tnum) stddev_time += pow(times[tnum] - avg_time, 2); stddev_time = sqrtf(stddev_time / (num_threads - 1)); } // Print out the statistics of the times printf("time_per_iteration: min: %.1f us avg: %.1f us max: %.1f us stddev: %.1f us\n", min_time / tinfo.num_iterations, avg_time / tinfo.num_iterations, max_time / tinfo.num_iterations, stddev_time / tinfo.num_iterations); // Clean up the allocated threads free(threads); return 0; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-04-26 22:08 High CPU usage of scheduler? Dave Johansen @ 2012-04-27 3:10 ` Yong Zhang 2012-04-27 15:23 ` Dave Johansen 0 siblings, 1 reply; 9+ messages in thread From: Yong Zhang @ 2012-04-27 3:10 UTC (permalink / raw) To: Dave Johansen; +Cc: linux-kernel On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: > I am looking into moving an application from RHEL 5 to RHEL 6 and I > noticed an unexpected increase in CPU usage. A little digging has led > me to believe that the scheduler may be the culprit. > > I created the attached test_select_work.c file to test this out. I > compiled it with the following command on RHEL 5: > > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread > -o test_select_work Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? If not, could you please try to boot the one which enable high resolution timer with 'highres=off' to see if things change? Thanks, Yong > > I then played with the parameters until the execution time per > iteration was about 1 ms on a Dell Precision m6500. > > I got the following result on RHEL 5: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 911.5 us avg: 913.7 us max: 917.1 us stddev: 2.4 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 1802.6 us avg: 1803.9 us max: 1809.1 us > stddev: 2.1 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 7580.4 us avg: 8567.3 us max: 9022.0 us > stddev: 299.6 us > > And the following on RHEL 6: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 914.6 us avg: 975.7 us max: 1034.5 us > stddev: 50.0 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 1683.9 us avg: 1771.8 us max: 1810.8 us > stddev: 43.4 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 7997.1 us avg: 8709.1 us max: 9061.8 us > stddev: 310.0 us > > On both versions, these results were about what I expected with the > average amount of time per iteration scaling relatively linearly. I > then recompiled with -DSLEEP_TYPE=1 and got the following results on > RHEL 5: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 1803.3 us avg: 1902.8 us max: 2001.5 us > stddev: 113.8 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 1997.1 us avg: 2002.0 us max: 2010.8 us > stddev: 5.0 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 6958.4 us avg: 8397.9 us max: 9423.7 us > stddev: 619.7 us > > And the following results on RHEL 6: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 2107.1 us avg: 2143.1 us max: 2177.7 us > stddev: 30.3 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 2903.3 us avg: 2903.8 us max: 2904.3 us > stddev: 0.3 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 8877.7.1 us avg: 9016.3 us max: 9112.6 us > stddev: 62.9 us > > On RHEL 5, the results were about what I expected (4 threads taking > twice as long because of the 1 ms sleep but the 8 threads taking the > same amount of time since each thread is now sleeping for about half > the time, and a still fairly linear increase with the 40 threads). > > However, with RHEL 6, the time taken with 4 threads increased by about > 15% more than the expected doubling and the 8 thread case increased by > about 45% more than the expected slight increase. The increase in the > 4 thread case seems to be that RHEL 6 is actually sleeping for a > handful of microseconds more than 1 ms while RHEL 5 is only sleep > about 900 us, but this doesn't explain the unexpectedly large increase > in the 8 and 40 thread cases. > > I saw similar types of behaviour with all 3 -DSLEEP_TYPE values. I > also tried playing with the scheduler parameters in sysctl, but > nothing seemed to have a significant impact on the results. Any ideas > on how I can further diagnose this issue? > > Thanks, > Dave > #include <float.h> > #include <math.h> > #include <poll.h> > #include <pthread.h> > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <sys/select.h> > #include <sys/time.h> > > > // Uncomment to use select inside the loop of each of the threads > #define SLEEP_TYPE 1 > // Uncomment to use poll inside the loop of each of the threads > //#define SLEEP_TYPE 2 > // Uncomment to use usleep inside the loop of each of the threads > //#define SLEEP_TYPE 3 > > > struct thread_info { > int sleep_time; > int num_iterations; > int work_size; > }; > > void *do_test(void *arg) > { > const struct thread_info *tinfo = (struct thread_info *)arg; > > const int sleep_time = tinfo->sleep_time; > const int num_iterations = tinfo->num_iterations; > const int work_size = tinfo->work_size; > > #if SLEEP_TYPE == 1 > // Data for calling select > struct timeval ts; > #elif SLEEP_TYPE == 2 > // Data for calling poll > struct pollfd pfd; > #endif > // Data for doing work > int *buf; > int pseed; > int inum, bnum; > // Data for tracking the time > struct timeval before, after; > long long *diff; > > buf = calloc(work_size, sizeof(int)); > diff = malloc(sizeof(unsigned long long)); > > #if SLEEP_TYPE == 2 > // Initialize the poll data > pfd.fd = 0; > pfd.events = 0; > #endif > > // Get the time before starting the processing > gettimeofday(&before, NULL); > > // Do the requested number of iterations > for (inum=0; inum<num_iterations; ++inum) { > #if SLEEP_TYPE == 1 > ts.tv_sec = 0; > ts.tv_usec = sleep_time; > select(0, 0, 0, 0, &ts); > #elif SLEEP_TYPE == 2 > poll(&pfd, 1, sleep_time / 1000); > #elif SLEEP_TYPE == 3 > usleep(sleep_time); > #else > // Get rid of warning about unused variable > (void)sleep_time; > #endif > > // Fill in a buffer with random numbers (taken from latt.c by Jens Axboe <jens.axboe@oracle.com>) > pseed = 1; > for (bnum=0; bnum<work_size; ++bnum) { > pseed = pseed * 1103515245 + 12345; > buf[bnum] = (pseed / 65536) % 32768; > } > } > > // Get the time after starting the processing > gettimeofday(&after, NULL); > > // Calculate the delta time > *diff = 1000000LL * (after.tv_sec - before.tv_sec); > *diff += after.tv_usec - before.tv_usec; > > // Clean up the data > free(buf); > > return diff; > } > > int main(int argc, char **argv) > { > if (argc < 4) { > printf("Usage: %s <sleep_time> <num_iterations> <work_size> <num_threads>\n", argv[0]); > return -1; > } > > struct thread_info tinfo; > int s, tnum, num_threads; > pthread_attr_t attr; > pthread_t *threads; > long long *res; > long long *times; > > // Get the parameters > tinfo.sleep_time = atoi(argv[1]); > tinfo.num_iterations = atoi(argv[2]); > tinfo.work_size = atoi(argv[3]) * 1024; > num_threads = atoi(argv[4]); > > // Initialize the thread creation attributes > s = pthread_attr_init(&attr); > if (s != 0) { > printf("Error initializing thread attributes\n"); > return -2; > } > > // Allocate the memory to track the threads > threads = calloc(num_threads, sizeof(pthread_t)); > times = calloc(num_threads, sizeof(unsigned long long)); > if (threads == NULL) { > printf("Error allocating memory to track threads\n"); > return -3; > } > > // Start all of the threads > for (tnum=0; tnum<num_threads; ++tnum) { > s = pthread_create(&threads[tnum], &attr, &do_test, &tinfo); > > if (s != 0) { > printf("Error starting thread\n"); > return -4; > } > } > > // Clean up the thread creation attributes > s = pthread_attr_destroy(&attr); > if (s != 0) { > printf("Error cleaning up thread attributes\n"); > return -5; > } > > // Wait for all the threads to finish > for (tnum=0; tnum<num_threads; ++tnum) { > s = pthread_join(threads[tnum], (void **)(&res)); > > if (s != 0) { > printf("Error waiting for thread\n"); > return -6; > } > > // Save the time > times[tnum] = *res; > > // And clean it up > free(res); > } > > // Calculate the min, max, and average times > float min_time = FLT_MAX; > float max_time = -FLT_MAX; > float avg_time = 0; > for (tnum=0; tnum<num_threads; ++tnum) { > if (times[tnum] < min_time) > min_time = times[tnum]; > if (times[tnum] > max_time) > max_time = times[tnum]; > avg_time += (times[tnum] - avg_time) / (float)(tnum + 1); > } > // Calculate the standard deviation of the time > float stddev_time = 0; > if (num_threads > 1) { > for (tnum=0; tnum<num_threads; ++tnum) > stddev_time += pow(times[tnum] - avg_time, 2); > stddev_time = sqrtf(stddev_time / (num_threads - 1)); > } > > // Print out the statistics of the times > printf("time_per_iteration: min: %.1f us avg: %.1f us max: %.1f us stddev: %.1f us\n", > min_time / tinfo.num_iterations, > avg_time / tinfo.num_iterations, > max_time / tinfo.num_iterations, > stddev_time / tinfo.num_iterations); > > // Clean up the allocated threads > free(threads); > > return 0; > } -- Only stand for myself ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-04-27 3:10 ` Yong Zhang @ 2012-04-27 15:23 ` Dave Johansen 2012-04-30 17:29 ` Dave Johansen ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Dave Johansen @ 2012-04-27 15:23 UTC (permalink / raw) To: Yong Zhang; +Cc: linux-kernel On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: > > On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: > > I am looking into moving an application from RHEL 5 to RHEL 6 and I > > noticed an unexpected increase in CPU usage. A little digging has led > > me to believe that the scheduler may be the culprit. > > > > I created the attached test_select_work.c file to test this out. I > > compiled it with the following command on RHEL 5: > > > > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread > > -o test_select_work > > Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? > > If not, could you please try to boot the one which enable high resolution > timer with 'highres=off' to see if things change? Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the 'highres=off' in grub and got the following results: ./test_select_work 1000 10000 300 4 time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us stddev: 15.0 us ./test_select_work 1000 10000 300 8 time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us stddev: 60.6 us ./test_select_work 1000 10000 300 40 time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us stddev: 57.5 us Any other info that might be helpful? Thanks, Dave ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-04-27 15:23 ` Dave Johansen @ 2012-04-30 17:29 ` Dave Johansen 2012-04-30 18:50 ` Dave Johansen 2012-05-01 5:11 ` Dave Johansen 2 siblings, 0 replies; 9+ messages in thread From: Dave Johansen @ 2012-04-30 17:29 UTC (permalink / raw) To: Yong Zhang; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 4840 bytes --] On Fri, Apr 27, 2012 at 8:23 AM, Dave Johansen <davejohansen@gmail.com> wrote: > On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: >> >> On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: >> > I am looking into moving an application from RHEL 5 to RHEL 6 and I >> > noticed an unexpected increase in CPU usage. A little digging has led >> > me to believe that the scheduler may be the culprit. >> > >> > I created the attached test_select_work.c file to test this out. I >> > compiled it with the following command on RHEL 5: >> > >> > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread >> > -o test_select_work >> >> Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? >> >> If not, could you please try to boot the one which enable high resolution >> timer with 'highres=off' to see if things change? > > Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the > 'highres=off' in grub and got the following results: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us > stddev: 15.0 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us > stddev: 60.6 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us > stddev: 57.5 us > > Any other info that might be helpful? > > Thanks, > Dave I made some improvements to the program to make comparisons a bit easier and the standard deviation a bit more meaningful. I have attached the updated program and script I used to run it. I also made a git repo to track the code and it's available at git://github.com/daveisfera/test_sleep.git I also ran the tests on a Dell Vostro 200 (dual core CPU) with several different OS versions. I realize that several of these will have different patches applied and won't be "pure kernel code", but this was the simplest way that I could run the tests on different versions of the kernel and get comparisons. I've attached the .svg files output by gnuplot and the data that was used to generate the plots. I apologize for having so many attachments, but I figured that this was the best way to convey the information. All of these tests were run with the following command with the executable that was built on Ubuntu 11.10 (except for CentOS 6.2 which had gcc available on it so I re-compiled it): ./run_test 1000 10 1000 250 8 4 The first interesting results is with Ubuntu 7.10. usleep seems to behave as expected, but select and poll took significantly longer than expected. Then with CentOS 5.6, the opposite is true, so I'm not sure what result can be drawn from those conclusions. I tried running the Ubuntu 8.04-4 LiveCD (because that's when the CFS was introduced), but it wouldn't boot on the Dell Vostro 200. CentOS 6.2 is where things start to get interesting, because a 10-15% increase in the execution time is seen with select and poll. With usleep, that penalty doesn't seem to be present for low numbers of threads but seems to reach similar levels once the number of threads is 2x the number of cores. Ubuntu 11.10, Ubuntu 12.04, and Fedora 16 all show results basically in line with CentOS 6.2 but with usleep always being a comparable penalty the other sleep methods. The one exception to this is that on Fedora 16, poll is lower than the other sleep methods with 3 threads. One final test that I did was to install the PPA on Ubuntu 11.10 that has BFS instead of CFS. Info about it can be found at: https://launchpad.net/~chogydan/+archive/ppa It doesn't seem to be suffering from this sleep issue (or at least in a different way) because the results for 1-3 threads are at 2 ms as expected (basically no penalty), but then at 4 threads it's about 20% higher than the no sleep methods and it seems to maintain more of a penalty at higher numbers of threads. I also didn't see the system CPU usage when running the previous test with BFS (but this might just be an accounting thing since CFS and BFS account for CPU usage differently). I know that these results aren't 100% conclusive, but they seem to indicate to me that something about the sleeping mechanism in the kernel is using more CPU than I expect it to and that this is the cause of the increased CPU usage. It doesn't appear to be the scheduler itself because then a similar sort of penalty would have been seen with sched_yield. It appears to me that it is something to do with the sleep mechanism's interaction with the scheduler (particularly the CFS more than the BFS). Is there any more data I can gather or tests that I can run that can help diagnose this problem? Thanks in advance for any help, Dave [-- Attachment #2: test_sleep.c --] [-- Type: text/x-csrc, Size: 7372 bytes --] #include <float.h> #include <math.h> #include <poll.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/select.h> #include <sys/time.h> // The different type of sleep that are supported enum sleep_type { SLEEP_TYPE_NONE, SLEEP_TYPE_SELECT, SLEEP_TYPE_POLL, SLEEP_TYPE_USLEEP, SLEEP_TYPE_YIELD, }; // Function type for doing work with a sleep typedef long long *(*work_func)(const int sleep_time, const int num_iterations, const int work_size); // Information passed to the thread struct thread_info { int sleep_time; int num_iterations; int work_size; work_func func; }; // In order to make SLEEP_TYPE a run-time parameter function pointers are used. // The function pointer could have been to the sleep function being used, but // then that would mean an extra function call inside of the "work loop" and I // wanted to keep the measurements as tight as possible and the extra work being // done to be as small/controlled as possible so instead the work is declared as // a seriees of macros that are called in all of the sleep functions. The code // is a bit uglier this way, but I believe it results in a more accurate test. // Fill in a buffer with random numbers (taken from latt.c by Jens Axboe <jens.axboe@oracle.com>) #define DECLARE_WORK() \ int *buf; \ int pseed; \ int inum, bnum; \ struct timeval before, after; \ long long *diff; \ buf = calloc(work_size, sizeof(int)); \ diff = malloc(sizeof(long long)); \ gettimeofday(&before, NULL) #define DO_WORK(SLEEP_FUNC) \ for (inum=0; inum<num_iterations; ++inum) { \ SLEEP_FUNC \ \ pseed = 1; \ for (bnum=0; bnum<work_size; ++bnum) { \ pseed = pseed * 1103515245 + 12345; \ buf[bnum] = (pseed / 65536) % 32768; \ } \ } \ #define FINISH_WORK() \ gettimeofday(&after, NULL); \ *diff = 1000000LL * (after.tv_sec - before.tv_sec); \ *diff += after.tv_usec - before.tv_usec; \ free(buf); \ return diff long long *do_work_nosleep(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK(); FINISH_WORK(); } long long *do_work_select(const int sleep_time, const int num_iterations, const int work_size) { struct timeval ts; DECLARE_WORK(); DO_WORK( ts.tv_sec = 0; ts.tv_usec = sleep_time; select(0, 0, 0, 0, &ts); ); FINISH_WORK(); } long long *do_work_poll(const int sleep_time, const int num_iterations, const int work_size) { struct pollfd pfd; const int sleep_time_ms = sleep_time / 1000; DECLARE_WORK(); pfd.fd = 0; pfd.events = 0; DO_WORK( poll(&pfd, 1, sleep_time_ms); ); FINISH_WORK(); } long long *do_work_usleep(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); DO_WORK( usleep(sleep_time); ); FINISH_WORK(); } long long *do_work_yield(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK( sched_yield(); ); FINISH_WORK(); } void *do_test(void *arg) { const struct thread_info *tinfo = (struct thread_info *)arg; // Call the function to do the work return (*tinfo->func)(tinfo->sleep_time, tinfo->num_iterations, tinfo->work_size); } int main(int argc, char **argv) { if (argc <= 6) { printf("Usage: %s <sleep_time> <outer_iterations> <inner_iterations> <work_size> <num_threads> <sleep_type>\n", argv[0]); printf(" outer_iterations: Number of iterations for each thread (used to calculate statistics)\n"); printf(" inner_iterations: Number of work/sleep cycles performed in each thread (used to improve consistency/observability))\n"); printf(" work_size: Number of array elements (in kb) that are filled with psuedo-random numbers\n"); printf(" num_threads: Number of threads to spawn and perform work/sleep cycles in\n"); printf(" sleep_type: 0=none 1=select 2=poll 3=usleep 4=yield\n"); return -1; } struct thread_info tinfo; int outer_iterations; int sleep_type; int s, inum, tnum, num_threads; pthread_attr_t attr; pthread_t *threads; long long *res; long long *times; // Get the parameters tinfo.sleep_time = atoi(argv[1]); outer_iterations = atoi(argv[2]); tinfo.num_iterations = atoi(argv[3]); tinfo.work_size = atoi(argv[4]) * 1024; num_threads = atoi(argv[5]); sleep_type = atoi(argv[6]); switch (sleep_type) { case SLEEP_TYPE_NONE: tinfo.func = &do_work_nosleep; break; case SLEEP_TYPE_SELECT: tinfo.func = &do_work_select; break; case SLEEP_TYPE_POLL: tinfo.func = &do_work_poll; break; case SLEEP_TYPE_USLEEP: tinfo.func = &do_work_usleep; break; case SLEEP_TYPE_YIELD: tinfo.func = &do_work_yield; break; default: printf("Invalid sleep type: %d\n", sleep_type); return -7; } // Initialize the thread creation attributes s = pthread_attr_init(&attr); if (s != 0) { printf("Error initializing thread attributes\n"); return -2; } // Allocate the memory to track the threads threads = calloc(num_threads, sizeof(pthread_t)); times = calloc(num_threads, sizeof(unsigned long long)); if (threads == NULL) { printf("Error allocating memory to track threads\n"); return -3; } // Calculate the statistics of the processing float min_time = FLT_MAX; float max_time = -FLT_MAX; float avg_time = 0; float prev_avg_time = 0; float stddev_time = 0; // Perform the requested number of outer iterations for (inum=0; inum<outer_iterations; ++inum) { // Start all of the threads for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_create(&threads[tnum], &attr, &do_test, &tinfo); if (s != 0) { printf("Error starting thread\n"); return -4; } } // Clean up the thread creation attributes s = pthread_attr_destroy(&attr); if (s != 0) { printf("Error cleaning up thread attributes\n"); return -5; } // Wait for all the threads to finish for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_join(threads[tnum], (void **)(&res)); if (s != 0) { printf("Error waiting for thread\n"); return -6; } // Save the time times[tnum] = *res; // And clean it up free(res); } // Update the statistics for (tnum=0; tnum<num_threads; ++tnum) { if (times[tnum] < min_time) min_time = times[tnum]; if (times[tnum] > max_time) max_time = times[tnum]; avg_time += (times[tnum] - avg_time) / (float)(tnum + 1); stddev_time += (times[tnum] - prev_avg_time) * (times[tnum] - avg_time); prev_avg_time = avg_time; } } // Finish the calculation of the standard deviation stddev_time = sqrtf(stddev_time / ((outer_iterations * num_threads) - 1)); // Print out the statistics of the times printf("time_per_iteration: min: %.1f us avg: %.1f us max: %.1f us stddev: %.1f us\n", min_time / tinfo.num_iterations, avg_time / tinfo.num_iterations, max_time / tinfo.num_iterations, stddev_time / tinfo.num_iterations); // Clean up the allocated threads free(threads); return 0; } [-- Attachment #3: centos_5.6.svg --] [-- Type: image/svg+xml, Size: 16791 bytes --] [-- Attachment #4: centos_6.2.svg --] [-- Type: image/svg+xml, Size: 16776 bytes --] [-- Attachment #5: fedora_16.svg --] [-- Type: image/svg+xml, Size: 16536 bytes --] [-- Attachment #6: results.tgz --] [-- Type: application/x-gzip, Size: 9128 bytes --] [-- Attachment #7: ubuntu_7.10.svg --] [-- Type: image/svg+xml, Size: 16776 bytes --] [-- Attachment #8: ubuntu_11.10.svg --] [-- Type: image/svg+xml, Size: 16556 bytes --] [-- Attachment #9: ubuntu_11.10_bfs.svg --] [-- Type: image/svg+xml, Size: 16564 bytes --] [-- Attachment #10: ubuntu_12.04.svg --] [-- Type: image/svg+xml, Size: 16539 bytes --] [-- Attachment #11: run_test --] [-- Type: application/octet-stream, Size: 746 bytes --] #!/bin/bash if [ $# -ne 6 ]; then echo "Usage: `basename $0` <sleep_time> <outer_iterations> <inner_iterations> <work_size> <max_num_threads> <max_sleep_type>" echo " max_num_threads: The highest value used for num_threads in the results" echo " max_sleep_type: The highest value used for sleep_type in the results" exit -1 fi # Run through the requested number of SLEEP_TYPE values for i in $(seq 0 $6) do # Remove any existing results file rm -f "results_$i.txt" # Run through the requested number of threads for j in $(seq 1 $5) do # Print which settings are about to be run echo "sleep_type: $i num_threads: $j" # Run the test and save it to the results file ./test_sleep $1 $2 $3 $4 $j $i >> "results_$i.txt" done done ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-04-27 15:23 ` Dave Johansen 2012-04-30 17:29 ` Dave Johansen @ 2012-04-30 18:50 ` Dave Johansen 2012-05-01 5:11 ` Dave Johansen 2 siblings, 0 replies; 9+ messages in thread From: Dave Johansen @ 2012-04-30 18:50 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 4746 bytes --] On Fri, Apr 27, 2012 at 8:23 AM, Dave Johansen <davejohansen@gmail.com> wrote: > On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: >> >> On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: >> > I am looking into moving an application from RHEL 5 to RHEL 6 and I >> > noticed an unexpected increase in CPU usage. A little digging has led >> > me to believe that the scheduler may be the culprit. >> > >> > I created the attached test_select_work.c file to test this out. I >> > compiled it with the following command on RHEL 5: >> > >> > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread >> > -o test_select_work >> >> Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? >> >> If not, could you please try to boot the one which enable high resolution >> timer with 'highres=off' to see if things change? > > Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the > 'highres=off' in grub and got the following results: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us > stddev: 15.0 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us > stddev: 60.6 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us > stddev: 57.5 us > > Any other info that might be helpful? > > Thanks, > Dave I made some improvements to the program to make comparisons a bit easier and the standard deviation a bit more meaningful. I have attached the updated program and script I used to run it. I also made a git repo to track the code and it's available at: git://github.com/daveisfera/test_sleep.git I also ran the tests on a Dell Vostro 200 (dual core CPU) with several different OS versions. I realize that several of these will have different patches applied and won't be "pure kernel code", but this was the simplest way that I could run the tests on different versions of the kernel and get comparisons. The plots are available in the bugzilla I opened on this issue: https://bugzilla.redhat.com/show_bug.cgi?id=812148 All of these tests were run with the following command with the executable that was built on Ubuntu 11.10 (except for CentOS 6.2 which had gcc available on it so I re-compiled it): ./run_test 1000 10 1000 250 8 4 The first interesting results is with Ubuntu 7.10. usleep seems to behave as expected, but select and poll took significantly longer than expected. Then with CentOS 5.6, the opposite is true, so I'm not sure what result can be drawn from those conclusions. I tried running the Ubuntu 8.04-4 LiveCD (because that's when the CFS was introduced), but it wouldn't boot on the Dell Vostro 200. CentOS 6.2 is where things start to get interesting, because a 10-15% increase in the execution time is seen with select and poll. With usleep, that penalty doesn't seem to be present for low numbers of threads but seems to reach similar levels once the number of threads is 2x the number of cores. Ubuntu 11.10, Ubuntu 12.04, and Fedora 16 all show results basically in line with CentOS 6.2 but with usleep always being a comparable penalty the other sleep methods. The one exception to this is that on Fedora 16, poll is lower than the other sleep methods with 3 threads. One final test that I did was to install the PPA on Ubuntu 11.10 that has BFS instead of CFS. Info about it can be found at: https://launchpad.net/~chogydan/+archive/ppa It doesn't seem to be suffering from this sleep issue (or at least in a different way) because the results for 1-3 threads are at 2 ms as expected (basically no penalty), but then at 4 threads it's about 20% higher than the no sleep methods and it seems to maintain more of a penalty at higher numbers of threads. I also didn't see the system CPU usage when running the previous test with BFS (but this might just be an accounting thing since CFS and BFS account for CPU usage differently). I know that these results aren't 100% conclusive, but they seem to indicate to me that something about the sleeping mechanism in the kernel is using more CPU than I expect it to and that this is the cause of the increased CPU usage. It doesn't appear to be the scheduler itself because then a similar sort of penalty would have been seen with sched_yield. It appears to me that it is something to do with the sleep mechanism's interaction with the scheduler (particularly the CFS more than the BFS). Is there any more data I can gather or tests that I can run that can help diagnose this problem? Thanks in advance for any help, Dave [-- Attachment #2: run_test --] [-- Type: application/octet-stream, Size: 746 bytes --] #!/bin/bash if [ $# -ne 6 ]; then echo "Usage: `basename $0` <sleep_time> <outer_iterations> <inner_iterations> <work_size> <max_num_threads> <max_sleep_type>" echo " max_num_threads: The highest value used for num_threads in the results" echo " max_sleep_type: The highest value used for sleep_type in the results" exit -1 fi # Run through the requested number of SLEEP_TYPE values for i in $(seq 0 $6) do # Remove any existing results file rm -f "results_$i.txt" # Run through the requested number of threads for j in $(seq 1 $5) do # Print which settings are about to be run echo "sleep_type: $i num_threads: $j" # Run the test and save it to the results file ./test_sleep $1 $2 $3 $4 $j $i >> "results_$i.txt" done done [-- Attachment #3: test_sleep.c --] [-- Type: text/x-csrc, Size: 7372 bytes --] #include <float.h> #include <math.h> #include <poll.h> #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/select.h> #include <sys/time.h> // The different type of sleep that are supported enum sleep_type { SLEEP_TYPE_NONE, SLEEP_TYPE_SELECT, SLEEP_TYPE_POLL, SLEEP_TYPE_USLEEP, SLEEP_TYPE_YIELD, }; // Function type for doing work with a sleep typedef long long *(*work_func)(const int sleep_time, const int num_iterations, const int work_size); // Information passed to the thread struct thread_info { int sleep_time; int num_iterations; int work_size; work_func func; }; // In order to make SLEEP_TYPE a run-time parameter function pointers are used. // The function pointer could have been to the sleep function being used, but // then that would mean an extra function call inside of the "work loop" and I // wanted to keep the measurements as tight as possible and the extra work being // done to be as small/controlled as possible so instead the work is declared as // a seriees of macros that are called in all of the sleep functions. The code // is a bit uglier this way, but I believe it results in a more accurate test. // Fill in a buffer with random numbers (taken from latt.c by Jens Axboe <jens.axboe@oracle.com>) #define DECLARE_WORK() \ int *buf; \ int pseed; \ int inum, bnum; \ struct timeval before, after; \ long long *diff; \ buf = calloc(work_size, sizeof(int)); \ diff = malloc(sizeof(long long)); \ gettimeofday(&before, NULL) #define DO_WORK(SLEEP_FUNC) \ for (inum=0; inum<num_iterations; ++inum) { \ SLEEP_FUNC \ \ pseed = 1; \ for (bnum=0; bnum<work_size; ++bnum) { \ pseed = pseed * 1103515245 + 12345; \ buf[bnum] = (pseed / 65536) % 32768; \ } \ } \ #define FINISH_WORK() \ gettimeofday(&after, NULL); \ *diff = 1000000LL * (after.tv_sec - before.tv_sec); \ *diff += after.tv_usec - before.tv_usec; \ free(buf); \ return diff long long *do_work_nosleep(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK(); FINISH_WORK(); } long long *do_work_select(const int sleep_time, const int num_iterations, const int work_size) { struct timeval ts; DECLARE_WORK(); DO_WORK( ts.tv_sec = 0; ts.tv_usec = sleep_time; select(0, 0, 0, 0, &ts); ); FINISH_WORK(); } long long *do_work_poll(const int sleep_time, const int num_iterations, const int work_size) { struct pollfd pfd; const int sleep_time_ms = sleep_time / 1000; DECLARE_WORK(); pfd.fd = 0; pfd.events = 0; DO_WORK( poll(&pfd, 1, sleep_time_ms); ); FINISH_WORK(); } long long *do_work_usleep(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); DO_WORK( usleep(sleep_time); ); FINISH_WORK(); } long long *do_work_yield(const int sleep_time, const int num_iterations, const int work_size) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK( sched_yield(); ); FINISH_WORK(); } void *do_test(void *arg) { const struct thread_info *tinfo = (struct thread_info *)arg; // Call the function to do the work return (*tinfo->func)(tinfo->sleep_time, tinfo->num_iterations, tinfo->work_size); } int main(int argc, char **argv) { if (argc <= 6) { printf("Usage: %s <sleep_time> <outer_iterations> <inner_iterations> <work_size> <num_threads> <sleep_type>\n", argv[0]); printf(" outer_iterations: Number of iterations for each thread (used to calculate statistics)\n"); printf(" inner_iterations: Number of work/sleep cycles performed in each thread (used to improve consistency/observability))\n"); printf(" work_size: Number of array elements (in kb) that are filled with psuedo-random numbers\n"); printf(" num_threads: Number of threads to spawn and perform work/sleep cycles in\n"); printf(" sleep_type: 0=none 1=select 2=poll 3=usleep 4=yield\n"); return -1; } struct thread_info tinfo; int outer_iterations; int sleep_type; int s, inum, tnum, num_threads; pthread_attr_t attr; pthread_t *threads; long long *res; long long *times; // Get the parameters tinfo.sleep_time = atoi(argv[1]); outer_iterations = atoi(argv[2]); tinfo.num_iterations = atoi(argv[3]); tinfo.work_size = atoi(argv[4]) * 1024; num_threads = atoi(argv[5]); sleep_type = atoi(argv[6]); switch (sleep_type) { case SLEEP_TYPE_NONE: tinfo.func = &do_work_nosleep; break; case SLEEP_TYPE_SELECT: tinfo.func = &do_work_select; break; case SLEEP_TYPE_POLL: tinfo.func = &do_work_poll; break; case SLEEP_TYPE_USLEEP: tinfo.func = &do_work_usleep; break; case SLEEP_TYPE_YIELD: tinfo.func = &do_work_yield; break; default: printf("Invalid sleep type: %d\n", sleep_type); return -7; } // Initialize the thread creation attributes s = pthread_attr_init(&attr); if (s != 0) { printf("Error initializing thread attributes\n"); return -2; } // Allocate the memory to track the threads threads = calloc(num_threads, sizeof(pthread_t)); times = calloc(num_threads, sizeof(unsigned long long)); if (threads == NULL) { printf("Error allocating memory to track threads\n"); return -3; } // Calculate the statistics of the processing float min_time = FLT_MAX; float max_time = -FLT_MAX; float avg_time = 0; float prev_avg_time = 0; float stddev_time = 0; // Perform the requested number of outer iterations for (inum=0; inum<outer_iterations; ++inum) { // Start all of the threads for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_create(&threads[tnum], &attr, &do_test, &tinfo); if (s != 0) { printf("Error starting thread\n"); return -4; } } // Clean up the thread creation attributes s = pthread_attr_destroy(&attr); if (s != 0) { printf("Error cleaning up thread attributes\n"); return -5; } // Wait for all the threads to finish for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_join(threads[tnum], (void **)(&res)); if (s != 0) { printf("Error waiting for thread\n"); return -6; } // Save the time times[tnum] = *res; // And clean it up free(res); } // Update the statistics for (tnum=0; tnum<num_threads; ++tnum) { if (times[tnum] < min_time) min_time = times[tnum]; if (times[tnum] > max_time) max_time = times[tnum]; avg_time += (times[tnum] - avg_time) / (float)(tnum + 1); stddev_time += (times[tnum] - prev_avg_time) * (times[tnum] - avg_time); prev_avg_time = avg_time; } } // Finish the calculation of the standard deviation stddev_time = sqrtf(stddev_time / ((outer_iterations * num_threads) - 1)); // Print out the statistics of the times printf("time_per_iteration: min: %.1f us avg: %.1f us max: %.1f us stddev: %.1f us\n", min_time / tinfo.num_iterations, avg_time / tinfo.num_iterations, max_time / tinfo.num_iterations, stddev_time / tinfo.num_iterations); // Clean up the allocated threads free(threads); return 0; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-04-27 15:23 ` Dave Johansen 2012-04-30 17:29 ` Dave Johansen 2012-04-30 18:50 ` Dave Johansen @ 2012-05-01 5:11 ` Dave Johansen 2012-05-01 17:28 ` Dave Johansen 2 siblings, 1 reply; 9+ messages in thread From: Dave Johansen @ 2012-05-01 5:11 UTC (permalink / raw) To: linux-kernel On Fri, Apr 27, 2012 at 8:23 AM, Dave Johansen <davejohansen@gmail.com> wrote: > On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: >> >> On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: >> > I am looking into moving an application from RHEL 5 to RHEL 6 and I >> > noticed an unexpected increase in CPU usage. A little digging has led >> > me to believe that the scheduler may be the culprit. >> > >> > I created the attached test_select_work.c file to test this out. I >> > compiled it with the following command on RHEL 5: >> > >> > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread >> > -o test_select_work >> >> Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? >> >> If not, could you please try to boot the one which enable high resolution >> timer with 'highres=off' to see if things change? > > Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the > 'highres=off' in grub and got the following results: > > ./test_select_work 1000 10000 300 4 > time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us > stddev: 15.0 us > ./test_select_work 1000 10000 300 8 > time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us > stddev: 60.6 us > ./test_select_work 1000 10000 300 40 > time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us > stddev: 57.5 us > > Any other info that might be helpful? > > Thanks, > Dave I made some improvements to the program to make comparisons a bit easier and the standard deviation a bit more meaningful. It is available in a git repo at git://github.com/daveisfera/test_sleep.git I tried sending results from running this updated program on several OS versions, but it must have been rejected by the filters. The info can be found in the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=812148 Is there any more data I can gather or tests that I can run that can help diagnose this problem? Thanks in advance for any help, Dave ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-05-01 5:11 ` Dave Johansen @ 2012-05-01 17:28 ` Dave Johansen 2012-05-07 18:12 ` Dave Johansen 0 siblings, 1 reply; 9+ messages in thread From: Dave Johansen @ 2012-05-01 17:28 UTC (permalink / raw) To: linux-kernel On Mon, Apr 30, 2012 at 10:11 PM, Dave Johansen <davejohansen@gmail.com> wrote: > On Fri, Apr 27, 2012 at 8:23 AM, Dave Johansen <davejohansen@gmail.com> wrote: >> On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: >>> >>> On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: >>> > I am looking into moving an application from RHEL 5 to RHEL 6 and I >>> > noticed an unexpected increase in CPU usage. A little digging has led >>> > me to believe that the scheduler may be the culprit. >>> > >>> > I created the attached test_select_work.c file to test this out. I >>> > compiled it with the following command on RHEL 5: >>> > >>> > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread >>> > -o test_select_work >>> >>> Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? >>> >>> If not, could you please try to boot the one which enable high resolution >>> timer with 'highres=off' to see if things change? >> >> Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the >> 'highres=off' in grub and got the following results: >> >> ./test_select_work 1000 10000 300 4 >> time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us >> stddev: 15.0 us >> ./test_select_work 1000 10000 300 8 >> time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us >> stddev: 60.6 us >> ./test_select_work 1000 10000 300 40 >> time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us >> stddev: 57.5 us >> >> Any other info that might be helpful? >> >> Thanks, >> Dave > > I made some improvements to the program to make comparisons a bit > easier and the standard deviation a bit more meaningful. It is > available in a git repo at git://github.com/daveisfera/test_sleep.git > > I tried sending results from running this updated program on several > OS versions, but it must have been rejected by the filters. The info > can be found in the bugzilla: > https://bugzilla.redhat.com/show_bug.cgi?id=812148 > > Is there any more data I can gather or tests that I can run that can > help diagnose this problem? > > Thanks in advance for any help, > Dave I also added the images and descriptions to the stackexchange question, so they can be more easily viewed there: http://unix.stackexchange.com/questions/37391/high-cpu-usage-with-cfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-05-01 17:28 ` Dave Johansen @ 2012-05-07 18:12 ` Dave Johansen 2012-08-02 21:45 ` Aaron Scamehorn 0 siblings, 1 reply; 9+ messages in thread From: Dave Johansen @ 2012-05-07 18:12 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 3062 bytes --] On Tue, May 1, 2012 at 10:28 AM, Dave Johansen <davejohansen@gmail.com> wrote: > > On Mon, Apr 30, 2012 at 10:11 PM, Dave Johansen <davejohansen@gmail.com> wrote: > > On Fri, Apr 27, 2012 at 8:23 AM, Dave Johansen <davejohansen@gmail.com> wrote: > >> On Thu, Apr 26, 2012 at 8:10 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: > >>> > >>> On Thu, Apr 26, 2012 at 03:08:51PM -0700, Dave Johansen wrote: > >>> > I am looking into moving an application from RHEL 5 to RHEL 6 and I > >>> > noticed an unexpected increase in CPU usage. A little digging has led > >>> > me to believe that the scheduler may be the culprit. > >>> > > >>> > I created the attached test_select_work.c file to test this out. I > >>> > compiled it with the following command on RHEL 5: > >>> > > >>> > cc test_select_work.c -O2 -DSLEEP_TYPE=0 -Wall -Wextra -lm -lpthread > >>> > -o test_select_work > >>> > >>> Hmm...Do both RHEL 5 and RHEL 6 have high resolution timer enabled? > >>> > >>> If not, could you please try to boot the one which enable high resolution > >>> timer with 'highres=off' to see if things change? > >> > >> Yes, RHEL 6 has CONFIG_HIGH_RES_TIMERS=y. I rebooted and used the > >> 'highres=off' in grub and got the following results: > >> > >> ./test_select_work 1000 10000 300 4 > >> time_per_iteration: min: 3130.1 us avg: 3152.2 us max: 3162.2 us > >> stddev: 15.0 us > >> ./test_select_work 1000 10000 300 8 > >> time_per_iteration: min: 4314.6 us avg: 4407.9 us max: 4496.3 us > >> stddev: 60.6 us > >> ./test_select_work 1000 10000 300 40 > >> time_per_iteration: min: 8901.7 us avg: 9056.5 us max: 9121.3 us > >> stddev: 57.5 us > >> > >> Any other info that might be helpful? > >> > >> Thanks, > >> Dave > > > > I made some improvements to the program to make comparisons a bit > > easier and the standard deviation a bit more meaningful. It is > > available in a git repo at git://github.com/daveisfera/test_sleep.git > > > > I tried sending results from running this updated program on several > > OS versions, but it must have been rejected by the filters. The info > > can be found in the bugzilla: > > https://bugzilla.redhat.com/show_bug.cgi?id=812148 > > > > Is there any more data I can gather or tests that I can run that can > > help diagnose this problem? > > > > Thanks in advance for any help, > > Dave > > I also added the images and descriptions to the stackexchange > question, so they can be more easily viewed there: > http://unix.stackexchange.com/questions/37391/high-cpu-usage-with-cfs Here's an updated version that measures the user and system time of each thread and also outputs those results. It also has a correction for how the mean and standard deviation are updated inside of the outer_iterations loop. The updated results are available here: http://unix.stackexchange.com/questions/37391/high-cpu-usage-with-cfs and here: https://bugzilla.redhat.com/show_bug.cgi?id=812148 Any other tests I can run to help diagnose the problem? Thanks, Dave [-- Attachment #2: test_sleep.c --] [-- Type: text/x-csrc, Size: 11284 bytes --] #include <limits.h> #include <math.h> #include <poll.h> #include <pthread.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <unistd.h> #include <sys/select.h> #include <sys/syscall.h> #include <sys/time.h> // Apparently GLIBC doesn't provide a wrapper for this function so provide it here #ifndef HAS_GETTID pid_t gettid(void) { return syscall(SYS_gettid); } #endif // The different type of sleep that are supported enum sleep_type { SLEEP_TYPE_NONE, SLEEP_TYPE_SELECT, SLEEP_TYPE_POLL, SLEEP_TYPE_USLEEP, SLEEP_TYPE_YIELD, SLEEP_TYPE_PTHREAD_COND, SLEEP_TYPE_NANOSLEEP, }; // Information returned by the processing thread struct thread_res { long long clock; long long user; long long sys; }; // Function type for doing work with a sleep typedef struct thread_res *(*work_func)(const int pid, const int sleep_time, const int num_iterations, const int work_size); // Information passed to the thread struct thread_info { pid_t pid; int sleep_time; int num_iterations; int work_size; work_func func; }; inline void get_thread_times(pid_t pid, pid_t tid, unsigned long long *utime, unsigned long long *stime) { char filename[FILENAME_MAX]; FILE *f; sprintf(filename, "/proc/%d/task/%d/stat", pid, tid); f = fopen(filename, "r"); if (f == NULL) { *utime = 0; *stime = 0; return; } fscanf(f, "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %Lu %Lu", utime, stime); fclose(f); } // In order to make SLEEP_TYPE a run-time parameter function pointers are used. // The function pointer could have been to the sleep function being used, but // then that would mean an extra function call inside of the "work loop" and I // wanted to keep the measurements as tight as possible and the extra work being // done to be as small/controlled as possible so instead the work is declared as // a seriees of macros that are called in all of the sleep functions. The code // is a bit uglier this way, but I believe it results in a more accurate test. // Fill in a buffer with random numbers (taken from latt.c by Jens Axboe <jens.axboe@oracle.com>) #define DECLARE_FUNC(NAME) struct thread_res *do_work_##NAME(const int pid, const int sleep_time, const int num_iterations, const int work_size) #define DECLARE_WORK() \ int *buf; \ int pseed; \ int inum, bnum; \ pid_t tid; \ struct timeval clock_before, clock_after; \ unsigned long long user_before, user_after; \ unsigned long long sys_before, sys_after; \ struct thread_res *diff; \ tid = gettid(); \ buf = malloc(work_size * sizeof(*buf)); \ diff = malloc(sizeof(*diff)); \ get_thread_times(pid, tid, &user_before, &sys_before); \ gettimeofday(&clock_before, NULL) #define DO_WORK(SLEEP_FUNC) \ for (inum=0; inum<num_iterations; ++inum) { \ SLEEP_FUNC \ \ pseed = 1; \ for (bnum=0; bnum<work_size; ++bnum) { \ pseed = pseed * 1103515245 + 12345; \ buf[bnum] = (pseed / 65536) % 32768; \ } \ } \ #define FINISH_WORK() \ gettimeofday(&clock_after, NULL); \ get_thread_times(pid, tid, &user_after, &sys_after); \ diff->clock = 1000000LL * (clock_after.tv_sec - clock_before.tv_sec); \ diff->clock += clock_after.tv_usec - clock_before.tv_usec; \ diff->user = user_after - user_before; \ diff->sys = sys_after - sys_before; \ free(buf); \ return diff DECLARE_FUNC(nosleep) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK(); FINISH_WORK(); } DECLARE_FUNC(select) { struct timeval ts; DECLARE_WORK(); DO_WORK( ts.tv_sec = 0; ts.tv_usec = sleep_time; select(0, 0, 0, 0, &ts); ); FINISH_WORK(); } DECLARE_FUNC(poll) { struct pollfd pfd; const int sleep_time_ms = sleep_time / 1000; DECLARE_WORK(); pfd.fd = 0; pfd.events = 0; DO_WORK( poll(&pfd, 1, sleep_time_ms); ); FINISH_WORK(); } DECLARE_FUNC(usleep) { DECLARE_WORK(); DO_WORK( usleep(sleep_time); ); FINISH_WORK(); } DECLARE_FUNC(yield) { DECLARE_WORK(); // Let the compiler know that sleep_time isn't used in this function (void)sleep_time; DO_WORK( sched_yield(); ); FINISH_WORK(); } DECLARE_FUNC(pthread_cond) { pthread_cond_t cond = PTHREAD_COND_INITIALIZER; pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; struct timespec ts; const int sleep_time_ns = sleep_time * 1000; DECLARE_WORK(); pthread_mutex_lock(&mutex); DO_WORK( clock_gettime(CLOCK_REALTIME, &ts); ts.tv_nsec += sleep_time_ns; if (ts.tv_nsec >= 1000000000) { ts.tv_sec += 1; ts.tv_nsec -= 1000000000; } pthread_cond_timedwait(&cond, &mutex, &ts); ); pthread_mutex_unlock(&mutex); pthread_cond_destroy(&cond); pthread_mutex_destroy(&mutex); FINISH_WORK(); } DECLARE_FUNC(nanosleep) { struct timespec req, rem; const int sleep_time_ns = sleep_time * 1000; DECLARE_WORK(); DO_WORK( req.tv_sec = 0; req.tv_nsec = sleep_time_ns; nanosleep(&req, &rem); ); FINISH_WORK(); } void *do_test(void *arg) { const struct thread_info *tinfo = (struct thread_info *)arg; // Call the function to do the work return (*tinfo->func)(tinfo->pid, tinfo->sleep_time, tinfo->num_iterations, tinfo->work_size); } struct thread_res_stats { double min; double max; double avg; double stddev; double prev_avg; }; #ifdef LLONG_MAX #define THREAD_RES_STATS_INITIALIZER {LLONG_MAX, LLONG_MIN, 0, 0, 0} #else #define THREAD_RES_STATS_INITIALIZER {LONG_MAX, LONG_MIN, 0, 0, 0} #endif void update_stats(struct thread_res_stats *stats, long long value, int num_samples, int num_iterations, double scale_to_usecs) { // Calculate the average time per iteration double value_per_iteration = value * scale_to_usecs / num_iterations; // Update the max and min if (value_per_iteration < stats->min) stats->min = value_per_iteration; if (value_per_iteration > stats->max) stats->max = value_per_iteration; // Update the average stats->avg += (value_per_iteration - stats->avg) / (double)(num_samples); // Update the standard deviation stats->stddev += (value_per_iteration - stats->prev_avg) * (value_per_iteration - stats->avg); // And record the current average for use in the next update stats->prev_avg= stats->avg; } void print_stats(const char *name, const struct thread_res_stats *stats) { printf("%s: min: %.1f us avg: %.1f us max: %.1f us stddev: %.1f us\n", name, stats->min, stats->avg, stats->max, stats->stddev); } int main(int argc, char **argv) { if (argc <= 6) { printf("Usage: %s <sleep_time> <outer_iterations> <inner_iterations> <work_size> <num_threads> <sleep_type>\n", argv[0]); printf(" outer_iterations: Number of iterations for each thread (used to calculate statistics)\n"); printf(" inner_iterations: Number of work/sleep cycles performed in each thread (used to improve consistency/observability))\n"); printf(" work_size: Number of array elements (in kb) that are filled with psuedo-random numbers\n"); printf(" num_threads: Number of threads to spawn and perform work/sleep cycles in\n"); printf(" sleep_type: 0=none 1=select 2=poll 3=usleep 4=yield 5=pthread_cond 6=nanosleep\n"); return -1; } struct thread_info tinfo; int outer_iterations; int sleep_type; int s, inum, tnum, num_samples, num_threads; pthread_attr_t attr; pthread_t *threads; struct thread_res *res; struct thread_res **times; // Track the stats for each of the measurements struct thread_res_stats stats_clock = THREAD_RES_STATS_INITIALIZER; struct thread_res_stats stats_user = THREAD_RES_STATS_INITIALIZER; struct thread_res_stats stats_sys = THREAD_RES_STATS_INITIALIZER; // Calculate the conversion factor from clock_t to seconds const long clocks_per_sec = sysconf(_SC_CLK_TCK); const double clocks_to_usec = 1000000 / (double)clocks_per_sec; // Get the parameters tinfo.pid = getpid(); tinfo.sleep_time = atoi(argv[1]); outer_iterations = atoi(argv[2]); tinfo.num_iterations = atoi(argv[3]); tinfo.work_size = atoi(argv[4]) * 1024; num_threads = atoi(argv[5]); sleep_type = atoi(argv[6]); switch (sleep_type) { case SLEEP_TYPE_NONE: tinfo.func = &do_work_nosleep; break; case SLEEP_TYPE_SELECT: tinfo.func = &do_work_select; break; case SLEEP_TYPE_POLL: tinfo.func = &do_work_poll; break; case SLEEP_TYPE_USLEEP: tinfo.func = &do_work_usleep; break; case SLEEP_TYPE_YIELD: tinfo.func = &do_work_yield; break; case SLEEP_TYPE_PTHREAD_COND: tinfo.func = &do_work_pthread_cond; break; case SLEEP_TYPE_NANOSLEEP: tinfo.func = &do_work_nanosleep; break; default: printf("Invalid sleep type: %d\n", sleep_type); return -7; } // Initialize the thread creation attributes s = pthread_attr_init(&attr); if (s != 0) { printf("Error initializing thread attributes\n"); return -2; } // Allocate the memory to track the threads threads = calloc(num_threads, sizeof(*threads)); times = calloc(num_threads, sizeof(*times)); if (threads == NULL) { printf("Error allocating memory to track threads\n"); return -3; } // Initialize the number of samples num_samples = 0; // Perform the requested number of outer iterations for (inum=0; inum<outer_iterations; ++inum) { // Start all of the threads for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_create(&threads[tnum], &attr, &do_test, &tinfo); if (s != 0) { printf("Error starting thread\n"); return -4; } } // Wait for all the threads to finish for (tnum=0; tnum<num_threads; ++tnum) { s = pthread_join(threads[tnum], (void **)(&res)); if (s != 0) { printf("Error waiting for thread\n"); return -6; } // Save the result for processing when they're all done times[tnum] = res; } // For each of the threads for (tnum=0; tnum<num_threads; ++tnum) { // Increment the number of samples in the statistics ++num_samples; // Update the statistics with this measurement update_stats(&stats_clock, times[tnum]->clock, num_samples, tinfo.num_iterations, 1); update_stats(&stats_user, times[tnum]->user, num_samples, tinfo.num_iterations, clocks_to_usec); update_stats(&stats_sys, times[tnum]->sys, num_samples, tinfo.num_iterations, clocks_to_usec); // And clean it up free(times[tnum]); } } // Clean up the thread creation attributes s = pthread_attr_destroy(&attr); if (s != 0) { printf("Error cleaning up thread attributes\n"); return -5; } // Finish the calculation of the standard deviation stats_clock.stddev = sqrtf(stats_clock.stddev / (num_samples - 1)); stats_user.stddev = sqrtf(stats_user.stddev / (num_samples - 1)); stats_sys.stddev = sqrtf(stats_sys.stddev / (num_samples - 1)); // Print out the statistics of the times print_stats("gettimeofday_per_iteration", &stats_clock); print_stats("utime_per_iteration", &stats_user); print_stats("stime_per_iteration", &stats_sys); // Clean up the allocated threads and times free(threads); free(times); return 0; } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: High CPU usage of scheduler? 2012-05-07 18:12 ` Dave Johansen @ 2012-08-02 21:45 ` Aaron Scamehorn 0 siblings, 0 replies; 9+ messages in thread From: Aaron Scamehorn @ 2012-08-02 21:45 UTC (permalink / raw) To: linux-kernel Dave Johansen <davejohansen <at> gmail.com> writes: > > Here's an updated version that measures the user and system time of > each thread and also outputs those results. It also has a correction > for how the mean and standard deviation are updated inside of the > outer_iterations loop. > > The updated results are available here: > http://unix.stackexchange.com/questions/37391/high-cpu-usage-with-cfs > and here: > https://bugzilla.redhat.com/show_bug.cgi?id=812148 > > Any other tests I can run to help diagnose the problem? > > Thanks, > Dave > > Attachment (test_sleep.c): text/x-csrc, 11 KiB I tried to take a look at that bugzilla, but Redhat is saying that it is "an internal bugzilla, and not visible to the public." Have there been any updates on this? Thanks, Aaron ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-08-02 21:50 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-26 22:08 High CPU usage of scheduler? Dave Johansen 2012-04-27 3:10 ` Yong Zhang 2012-04-27 15:23 ` Dave Johansen 2012-04-30 17:29 ` Dave Johansen 2012-04-30 18:50 ` Dave Johansen 2012-05-01 5:11 ` Dave Johansen 2012-05-01 17:28 ` Dave Johansen 2012-05-07 18:12 ` Dave Johansen 2012-08-02 21:45 ` Aaron Scamehorn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).