* VM: qsbench numbers
From: Lorenzo Allegrucci
Date: 2001-11-04 14:11 UTC
To: linux-kernel
Cc: Linus Torvalds

I begin with the latest Linus kernel; three runs each, with kswapd CPU
time appended.

Linux-2.4.14-pre8:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.270u 7.330s 2:33.29 50.6% 0+0k 0+0io 19670pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.090u 6.890s 2:32.29 50.5% 0+0k 0+0io 18337pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.510u 6.660s 2:29.29 51.6% 0+0k 0+0io 18463pf+0w
0:01 kswapd

Double swap space (from 200M to 400M):

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.510u 6.390s 2:24.39 53.2% 0+0k 0+0io 17902pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.600u 7.600s 2:56.97 44.1% 0+0k 0+0io 23599pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.370u 7.340s 2:50.26 45.6% 0+0k 0+0io 22295pf+0w
0:03 kswapd

This is interesting: runs 2 and 3 are slower even with more swap space,
and the new VM seems to have lost its proverbial stability of
performance.  Old results below, for performance and behaviour
comparison.

Linux-2.4.14-pre7:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 224 (qsbench).
17.770u 3.160s 1:19.95 26.1% 0+0k 0+0io 13294pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 226 (qsbench).
26.030u 15.530s 1:39.39 41.8% 0+0k 0+0io 13283pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 228 (qsbench).
29.350u 41.360s 2:27.63 47.8% 0+0k 0+0io 15214pf+0w
0:12 kswapd

Double swap space:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.530u 2.920s 2:16.35 53.8% 0+0k 0+0io 17575pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.510u 3.160s 2:19.79 52.7% 0+0k 0+0io 17639pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.540u 3.270s 2:17.39 53.7% 0+0k 0+0io 17544pf+0w
0:01 kswapd

Linux-2.4.14-pre6:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 224 (qsbench).
69.890u 3.430s 2:12.48 55.3% 0+0k 0+0io 16374pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 226 (qsbench).
69.550u 2.990s 2:11.31 55.2% 0+0k 0+0io 15374pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
Out of Memory: Killed process 228 (qsbench).
69.480u 3.100s 2:13.33 54.4% 0+0k 0+0io 15950pf+0w
0:01 kswapd

Linux-2.4.14-pre5:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.340u 3.450s 2:13.62 55.2% 0+0k 0+0io 16829pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.590u 2.940s 2:15.48 54.2% 0+0k 0+0io 17182pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.140u 3.480s 2:14.66 54.6% 0+0k 0+0io 17122pf+0w
0:01 kswapd

2.4.14-pre5 has the best VM for qsbench :)

Linux-2.4.13:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.260u 2.150s 2:20.68 52.1% 0+0k 0+0io 20173pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.020u 2.050s 2:18.78 52.6% 0+0k 0+0io 20353pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.810u 2.080s 2:19.50 52.2% 0+0k 0+0io 20413pf+0w
0:06 kswapd

Linux-2.4.11:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.020u 1.650s 2:20.74 51.6% 0+0k 0+0io 10652pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.070u 1.650s 2:21.51 51.3% 0+0k 0+0io 10499pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.790u 1.670s 2:21.01 51.3% 0+0k 0+0io 10641pf+0w
0:04 kswapd

Linux-2.4.10:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.410u 1.870s 2:45.25 43.7% 0+0k 0+0io 16088pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.910u 1.840s 2:45.16 44.0% 0+0k 0+0io 16338pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.310u 1.910s 2:45.20 44.3% 0+0k 0+0io 16211pf+0w
0:03 kswapd

Linux-2.4.13-ac4:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.800u 3.470s 3:04.15 40.3% 0+0k 0+0io 13916pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.530u 3.930s 3:13.90 38.9% 0+0k 0+0io 14101pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.260u 3.640s 3:03.54 40.8% 0+0k 0+0io 13047pf+0w
0:08 kswapd

--
Lorenzo
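[For reference, the runs above amount to a simple driver like the sketch
below.  The three repeated invocations match the commands shown; how the
kswapd CPU time was collected is not stated in the message, so reading it
from ps is an assumption.]

#!/bin/sh
# Repeat the qsbench run three times with the parameters used above.
for i in 1 2 3; do
	time ./qsbench -n 90000000 -p 1 -s 140175100
done
# Accumulated kswapd CPU time, assumed to be read from ps (hypothetical).
ps -C kswapd -o pid,comm,time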
* Re: VM: qsbench numbers
From: Linus Torvalds
Date: 2001-11-04 17:18 UTC
To: Lorenzo Allegrucci
Cc: linux-kernel

On Sun, 4 Nov 2001, Lorenzo Allegrucci wrote:
>
> I begin with the latest Linus kernel; three runs each, with kswapd CPU
> time appended.

It's interesting how your numbers decrease with more swap-space.  That,
together with the fact that the "more swap space" case also degrades the
second time around, seems to imply that we leave swap-cache pages around
after they aren't used.

Does "free" after a run has completed imply that there's still lots of
swap used?  We _should_ have gotten rid of it at "free_swap_and_cache()"
time, but if we missed it..

What happens if you make the "vm_swap_full()" define in <linux/swap.h> be
unconditionally defined to "1"?  That should make us be more aggressive
about freeing those swap-cache pages, and it would be interesting to see
if it also stabilizes your numbers.

		Linus
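[Concretely, the suggested experiment amounts to something like the
change sketched below against a 2.4.14-era include/linux/swap.h.  The
stock definition quoted in the comment is an assumption about that tree
and may differ slightly; this is a sketch, not a verbatim patch.]

/* include/linux/swap.h (sketch)
 *
 * Assumed stock definition: swap counts as "full" once more than half
 * of it is in use, roughly
 *
 *	#define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)
 *
 * For the experiment, make the test unconditionally true, so that pages
 * which are both in memory and in the swap cache are always candidates
 * for having their swap slot and cache page freed right away:
 */
#define vm_swap_full() 1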
* Re: VM: qsbench numbers
From: Lorenzo Allegrucci
Date: 2001-11-04 21:17 UTC
To: Linus Torvalds
Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4279 bytes --]

At 09.18 04/11/01 -0800, Linus Torvalds wrote:
>
>On Sun, 4 Nov 2001, Lorenzo Allegrucci wrote:
>>
>> I begin with the latest Linus kernel; three runs each, with kswapd CPU
>> time appended.
>
>It's interesting how your numbers decrease with more swap-space.  That,
>together with the fact that the "more swap space" case also degrades the
>second time around, seems to imply that we leave swap-cache pages around
>after they aren't used.
>
>Does "free" after a run has completed imply that there's still lots of
>swap used?  We _should_ have gotten rid of it at "free_swap_and_cache()"
>time, but if we missed it..

lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        255984      16760     239224          0       1092       8008
-/+ buffers/cache:       7660     248324
Swap:       195512          0     195512
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.590u 7.640s 2:31.06 51.7% 0+0k 0+0io 19036pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        255984       6008     249976          0        100       1096
-/+ buffers/cache:       4812     251172
Swap:       195512       5080     190432

and with more swap..

lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        255984      13488     242496          0        532       5360
-/+ buffers/cache:       7596     248388
Swap:       390592          0     390592
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.180u 7.650s 2:43.22 47.6% 0+0k 0+0io 21019pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        255984       6596     249388          0        108       1116
-/+ buffers/cache:       5372     250612
Swap:       390592       5576     385016
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
71.030u 7.040s 2:49.45 46.0% 0+0k 0+0io 22734pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        255984       8808     247176          0        108       1152
-/+ buffers/cache:       7548     248436
Swap:       390592       7948     382644

>What happens if you make the "vm_swap_full()" define in <linux/swap.h> be
>unconditionally defined to "1"?

lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        256000      16772     239228          0       1104       8008
-/+ buffers/cache:       7660     248340
Swap:       195512          0     195512
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.530u 7.290s 2:33.26 50.7% 0+0k 0+0io 19689pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        256000       5132     250868          0        116       1144
-/+ buffers/cache:       3872     252128
Swap:       195512       3748     191764

..and now with 400M of swap:

lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        256000      13096     242904          0        504       4904
-/+ buffers/cache:       7688     248312
Swap:       390592          0     390592
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.830u 7.100s 2:29.52 52.1% 0+0k 0+0io 18488pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        256000       4980     251020          0        108       1132
-/+ buffers/cache:       3740     252260
Swap:       390592       3840     386752
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
70.560u 6.840s 2:28.66 52.0% 0+0k 0+0io 18203pf+0w
lenstra:~/src/qsort> free
             total       used       free     shared    buffers     cached
Mem:        256000       5044     250956          0        108       1112
-/+ buffers/cache:       3824     252176
Swap:       390592       3896     386696

Performance improved and numbers stabilized.

>That should make us be more aggressive
>about freeing those swap-cache pages, and it would be interesting to see
>if it also stabilizes your numbers.
>
>		Linus

I attach qsbench.c

[-- Attachment #2: Type: text/plain, Size: 2449 bytes --]

/*
 * Copyright (C) 2001 Lorenzo Allegrucci (lenstra@tiscalinet.it)
 * Licensed under the GPL
 */

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_PROCS 1024

/**
 * quick_sort - Sort in the range [l, r]
 */
void quick_sort(int a[], int l, int r)
{
	int i, j, p, tmp;
	int m, min, max;

	i = l;
	j = r;
	m = (l + r) >> 1;

	if (a[m] >= a[l]) {
		max = a[m];
		min = a[l];
	} else {
		max = a[l];
		min = a[m];
	}
	if (a[r] >= max)
		p = max;
	else {
		if (a[r] >= min)
			p = a[r];
		else
			p = min;
	}

	do {
		while (a[i] < p)
			i++;
		while (p < a[j])
			j--;
		if (i <= j) {
			tmp = a[i];
			a[i] = a[j];
			a[j] = tmp;
			i++;
			j--;
		}
	} while (i <= j);

	if (l < j)
		quick_sort(a, l, j);
	if (i < r)
		quick_sort(a, i, r);
}

void do_qsort(int n, int s)
{
	int *a, i, errors = 0;

	if ((a = malloc(sizeof(int) * n)) == NULL) {
		perror("malloc");
		exit(1);
	}
	srand(s);
	//printf("seed = %d\n", s);
	for (i = 0; i < n; i++)
		a[i] = rand();
	quick_sort(a, 0, n - 1);
	//printf("verify... "); fflush(stdout);
	for (i = 0; i < n - 1; i++)
		if (a[i] > a[i + 1])
			errors++;
	//printf("done.\n");
	if (errors)
		fprintf(stderr, "WARNING: %d errors.\n", errors);
	free(a);
	exit(0);
}

void start_procs(int n, int p, int s)
{
	int i, pid[MAX_PROCS];
	int status;

	if (p > MAX_PROCS)
		p = MAX_PROCS;

	for (i = 0; i < p; i++) {
		pid[i] = fork();
		if (pid[i] == 0)
			do_qsort(n, s);
		else if (pid[i] < 0)
			perror("fork");
	}

	for (i = 0; i < p; i++)
		waitpid(pid[i], &status, 0);
}

void usage(void)
{
	fprintf(stderr, "Usage: qs [-h] [-n nr_elems] [-p nr_procs]"
			" [-s seed]\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	char *n = "1000000", *p = "1", *s = "1";
	int nr_elems, nr_procs, seed;
	int c;

	if (argc == 1)
		usage();

	while (1) {
		c = getopt(argc, argv, "hn:p:s:V");
		if (c == -1)
			break;

		switch (c) {
		case 'h':
			usage();
		case 'n':
			n = optarg;
			break;
		case 'p':
			p = optarg;
			break;
		case 's':
			s = optarg;
			break;
		case 'V':
			printf("Version 0.93\n");
			return 1;
		case '?':
			return 1;
		}
	}

	nr_elems = atoi(n);
	nr_procs = atoi(p);
	seed = atoi(s);

	start_procs(nr_elems, nr_procs, seed);

	return 0;
}

[-- Attachment #3: Type: text/plain, Size: 13 bytes --]

--
Lorenzo
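[The attached qsbench.c is an ordinary single C file; a plausible way to
build and invoke it, matching the runs above, is sketched below.  The
exact compiler flags used for the reported numbers are not stated, so -O2
is an assumption.]

gcc -O2 -o qsbench qsbench.c
./qsbench -n 90000000 -p 1 -s 140175100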
* Re: VM: qsbench numbers
From: Linus Torvalds
Date: 2001-11-05 1:03 UTC
To: Lorenzo Allegrucci
Cc: linux-kernel

On Sun, 4 Nov 2001, Lorenzo Allegrucci wrote:
> >
> >Does "free" after a run has completed imply that there's still lots of
> >swap used?  We _should_ have gotten rid of it at "free_swap_and_cache()"
> >time, but if we missed it..
>
> 70.590u 7.640s 2:31.06 51.7% 0+0k 0+0io 19036pf+0w
> lenstra:~/src/qsort> free
>              total       used       free     shared    buffers     cached
> Mem:        255984       6008     249976          0        100       1096
> -/+ buffers/cache:       4812     251172
> Swap:       195512       5080     190432

That's not a noticeable amount, and is perfectly explainable by simply
having daemons that got swapped out with truly inactive pages.  So a
swapcache leak does not seem to be the reason for the unstable numbers.

> >What happens if you make the "vm_swap_full()" define in <linux/swap.h> be
> >unconditionally defined to "1"?
>
> 70.530u 7.290s 2:33.26 50.7% 0+0k 0+0io 19689pf+0w
> 70.830u 7.100s 2:29.52 52.1% 0+0k 0+0io 18488pf+0w
> 70.560u 6.840s 2:28.66 52.0% 0+0k 0+0io 18203pf+0w
>
> Performance improved and numbers stabilized.

Indeed.

Mind doing some more tests?  In particular, the "vm_swap_full()" macro is
only used in two places: mm/memory.c and mm/swapfile.c.  Are you willing
to test _which_ one it is (or whether it is both together) that seems to
bring on the unstable numbers?

		Linus
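[In practice, restricting the override to a single file can be done
without touching the header, e.g. by redefining the macro near the top of
just one of the two files.  A sketch only; placement after the #include
lines is assumed:]

/* At the top of mm/memory.c (or mm/swapfile.c), after the #includes:
 * override the <linux/swap.h> definition for this file only, so the
 * other user of vm_swap_full() keeps the stock behaviour.
 */
#undef vm_swap_full
#define vm_swap_full() 1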
* Re: VM: qsbench numbers
From: Lorenzo Allegrucci
Date: 2001-11-05 15:30 UTC
To: linux-kernel

[Forgot to CC linux-kernel people, sorry]

At 17.03 04/11/01 -0800, you wrote:
>
>On Sun, 4 Nov 2001, Lorenzo Allegrucci wrote:
>> >
>> >Does "free" after a run has completed imply that there's still lots of
>> >swap used?  We _should_ have gotten rid of it at "free_swap_and_cache()"
>> >time, but if we missed it..
>>
>> 70.590u 7.640s 2:31.06 51.7% 0+0k 0+0io 19036pf+0w
>> lenstra:~/src/qsort> free
>>              total       used       free     shared    buffers     cached
>> Mem:        255984       6008     249976          0        100       1096
>> -/+ buffers/cache:       4812     251172
>> Swap:       195512       5080     190432
>
>That's not a noticeable amount, and is perfectly explainable by simply
>having daemons that got swapped out with truly inactive pages.  So a
>swapcache leak does not seem to be the reason for the unstable numbers.
>
>> >What happens if you make the "vm_swap_full()" define in <linux/swap.h> be
>> >unconditionally defined to "1"?
>>
>> 70.530u 7.290s 2:33.26 50.7% 0+0k 0+0io 19689pf+0w
>> 70.830u 7.100s 2:29.52 52.1% 0+0k 0+0io 18488pf+0w
>> 70.560u 6.840s 2:28.66 52.0% 0+0k 0+0io 18203pf+0w
>>
>> Performance improved and numbers stabilized.
>
>Indeed.
>
>Mind doing some more tests?  In particular, the "vm_swap_full()" macro is
>only used in two places: mm/memory.c and mm/swapfile.c.  Are you willing
>to test _which_ one it is (or whether it is both together) that seems to
>bring on the unstable numbers?

mm/memory.c:
#undef vm_swap_full()
#define vm_swap_full() 1

swap=200M
70.480u 7.440s 2:35.74 50.0% 0+0k 0+0io 19897pf+0w
70.640u 7.280s 2:28.87 52.3% 0+0k 0+0io 18453pf+0w
70.750u 7.170s 2:36.26 49.8% 0+0k 0+0io 19719pf+0w

swap=400M
70.120u 6.940s 2:29.55 51.5% 0+0k 0+0io 18598pf+0w
70.160u 7.320s 2:37.34 49.2% 0+0k 0+0io 19720pf+0w
70.020u 11.310s 3:15.09 41.6% 0+0k 0+0io 28330pf+0w

mm/memory.c:
/* #undef vm_swap_full() */
/* #define vm_swap_full() 1 */

mm/swapfile.c:
#undef vm_swap_full()
#define vm_swap_full() 1

swap=200M
69.610u 7.830s 2:33.47 50.4% 0+0k 0+0io 19630pf+0w
70.260u 7.810s 2:54.06 44.8% 0+0k 0+0io 22816pf+0w
70.420u 7.420s 2:42.71 47.8% 0+0k 0+0io 20655pf+0w

swap=400M
70.240u 6.980s 2:40.37 48.1% 0+0k 0+0io 20437pf+0w
70.430u 6.450s 2:25.36 52.8% 0+0k 0+0io 18400pf+0w
70.270u 6.420s 2:25.52 52.7% 0+0k 0+0io 18267pf+0w
70.850u 6.530s 2:35.82 49.6% 0+0k 0+0io 19481pf+0w

These above are bad numbers, but the worst is still to come..
I repeated the earlier test of making the "vm_swap_full()" define in
<linux/swap.h> unconditionally defined to "1":

swap=200M
70.510u 7.510s 2:33.91 50.6% 0+0k 0+0io 19584pf+0w
70.100u 7.620s 2:42.20 47.9% 0+0k 0+0io 20562pf+0w
69.840u 7.910s 2:51.61 45.3% 0+0k 0+0io 22541pf+0w
70.370u 7.910s 2:52.06 45.4% 0+0k 0+0io 22793pf+0w

swap=400M
70.560u 7.580s 2:37.38 49.6% 0+0k 0+0io 19962pf+0w
70.120u 7.560s 2:45.04 47.0% 0+0k 0+0io 20403pf+0w
70.390u 7.130s 2:29.82 51.7% 0+0k 0+0io 18159pf+0w  <-
70.080u 7.190s 2:29.63 51.6% 0+0k 0+0io 18580pf+0w  <-
70.300u 6.810s 2:29.70 51.5% 0+0k 0+0io 18267pf+0w  <-
69.770u 7.670s 2:49.68 45.6% 0+0k 0+0io 20980pf+0w

Well, numbers are unstable again.  Either I made some error patching the
kernel, or my previous test:

>> 70.530u 7.290s 2:33.26 50.7% 0+0k 0+0io 19689pf+0w
>> 70.830u 7.100s 2:29.52 52.1% 0+0k 0+0io 18488pf+0w
>> 70.560u 6.840s 2:28.66 52.0% 0+0k 0+0io 18203pf+0w

was a statistical "fluctuation".  I don't know, and I would be very
thankful if somebody could confirm or deny these results.

--
Lorenzo