linux-kernel.vger.kernel.org archive mirror
* 2.5.40-mm1
@ 2002-10-01  9:32 Andrew Morton
From: Andrew Morton @ 2002-10-01  9:32 UTC (permalink / raw)
  To: lkml, linux-mm


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.40/2.5.40-mm1/

Mainly a resync.

- A few minor problems in the per-cpu-pages code have been fixed.

- Updated dcache RCU code.

- Significant brain surgery on the SARD patch.

- Decreased the disk scheduling tunable `fifo_batch' from 32 to 16 to
  improve disk read latency.

- Updated ext3 htree patch from Ted.

- Included a patch from Mala Anand which _should_ speed up kernel<->userspace
  memory copies for Intel ia32 hardware.  But I can't measure any difference
  with poorly-aligned pagecache copies.


-scsi_hack.patch
-might_sleep-2.patch
-slab-fix.patch
-hugetlb-doc.patch
-get_user_pages-PG_reserved.patch
-move_one_page_fix.patch
-zab-list_heads.patch
-remove-gfp_nfs.patch
-buddyinfo.patch
-free_area.patch
-per-node-kswapd.patch
-topology-api.patch
-topology_fixes.patch

 Merged

+misc.patch

 Trivia

+ioperm-fix.patch

 Fix the sys_ioperm() might-sleep-while-atomic bug

-sard.patch
+bd-sard.patch

 Somewhat rewritten to not key everything off minors and majors - use
 pointers instead.

+bio-get-nr-vecs.patch

 use bio_get_nr_vecs in fs/mpage.c

+dio-nr-segs.patch

 use bio_get_nr_vecs in fs/direct-io.c

-per-node-zone_normal.patch
+per-node-mem_map.patch

 Renamed

+free_area_init-cleanup.patch

 Clean up some mm init code.

+intel-user-copy.patch

 Supposedly faster copy_*_user.



ext3-dxdir.patch
  ext3 htree

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

misc.patch
  misc

write-deadlock.patch
  Fix the generic_file_write-from-same-mmapped-page deadlock

ioperm-fix.patch
  sys_ioperm() atomicity fix

radix_tree_gang_lookup.patch
  radix tree gang lookup

truncate_inode_pages.patch
  truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
  Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
  Add kswapd_steal to /proc/vmstat

iowait.patch
  I/O wait statistics

bd-sard.patch

dio-bio-add-page.patch
  Use bio_add_page() in direct-io.c

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4

swapoff-deadlock.patch
  Fix a tmpfs swapoff deadlock

dirty-and-uptodate.patch
  page state cleanup

shmem_rename.patch
  shmem_rename() directory link count fix

dirent-size.patch
  tmpfs: show a non-zero size for directories

tmpfs-trivia.patch
  tmpfs: small fixlets

per-zone-vm.patch
  separate the kswapd and direct reclaim code paths

swsusp-feature.patch
  add shrink_all_memory() for swsusp

bio-get-nr-vecs.patch
  use bio_get_nr_vecs() in fs/mpage.c

dio-nr-segs.patch
  Use bio_get_nr_vecs() in direct-io.c

remove-page-virtual.patch
  remove page->virtual for !WANT_PAGE_VIRTUAL

dirty-memory-clamp.patch
  sterner dirty-memory clamping

mempool-wakeup-fix.patch
  Fix for stuck tasks in mempool_alloc()

remove-write_mapping_buffers.patch
  Remove write_mapping_buffers

buffer_boundary-scheduling.patch
  IO scheduling for indirect blocks

ll_rw_block-cleanup.patch
  cleanup ll_rw_block()

lseek-ext2_readdir.patch
  remove lock_kernel() from ext2_readdir()

discontig-no-contig_page_data.patch
  undefine contig_page_data for discontigmem

per-node-mem_map.patch
  ia32 NUMA: per-node ZONE_NORMAL

alloc_pages_node-cleanup.patch
  alloc_pages_node cleanup

free_area_init-cleanup.patch
  free_area_init_node cleanup

batched-slab-asap.patch
  batched slab shrinking

akpm-deadline.patch
  deadline scheduler tweaks

rmqueue_bulk.patch
  bulk page allocator

free_pages_bulk.patch
  Bulk page freeing function

hot_cold_pages.patch
  Hot/Cold pages and zone->lock amortisation

readahead-cold-pages.patch
  Use cache-cold pages for pagecache reads.

pagevec-hot-cold-hint.patch
  hot/cold hints for truncate and page reclaim

intel-user-copy.patch

read_barrier_depends.patch
  extended barrier primitives

rcu_ltimer.patch
  RCU core

dcache_rcu.patch
  Use RCU for dcache

* Re: 2.5.40-mm1
@ 2002-10-09 23:20 Mala Anand
From: Mala Anand @ 2002-10-09 23:20 UTC (permalink / raw)
  To: akpm, lkml, linux-mm; +Cc: Bill Hartner


>Andrew Morton wrote:

>So.  Patch is a huge win as-is.  For the PIII it looks like we need
>to enable it at all alignments except mod32.  And we need to test
>with aligned dest, unaligned source.

Pentium III (Coppermine) 997 MHz 2-way
Read from pagecache to user buffer, misaligning the source.
Size of each copy is 262144 bytes and the number of iterations for
each test is 16384.
      Patch++ - uses copy_user_int if size > 64
      Patch - uses copy_user_int if size > 64, or src and dst
              are not aligned on an 8 byte boundary
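
A minimal sketch of the two selection rules described above, written as
plain C predicates. Only the name copy_user_int and the thresholds (64
bytes, 8-byte boundary) come from this message; the helper names are
hypothetical, and reading "src and dst are not aligned" as "either
pointer is not 8-byte aligned" is an assumption, not something the
patch itself is quoted as doing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the pointer sits on an 8-byte boundary. */
static bool is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* "Patch++" rule: take the copy_user_int path only for copies > 64 bytes. */
static bool patchpp_uses_int_copy(size_t n)
{
    return n > 64;
}

/*
 * "Patch" rule: take the copy_user_int path for copies > 64 bytes, or
 * when the buffers are not 8-byte aligned.  ASSUMPTION: "not aligned"
 * is read here as "either src or dst is misaligned".
 */
static bool patch_uses_int_copy(const void *dst, const void *src, size_t n)
{
    return n > 64 || !is_aligned8(dst) || !is_aligned8(src);
}
```

Under this reading the two rules only differ for copies of 64 bytes or
less, where "Patch" still takes the integer-copy path if a buffer is
misaligned.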

dst aligned on a 4k boundary and src misaligned:

          2.5.40        2.5.40+patch    2.5.40+patch++
Align     throughput    throughput      throughput
(bytes)   KB/sec        KB/sec          KB/sec
0         275592        281356          285567
1         124266        197361
2         120157        200270
4         125935        197558
8         157244        156655          162189
16        167296        173202          173702
32        283731        285222          290810
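
For reference, the KB/sec figures relate to the test parameters stated
above (262144-byte copies, 16384 iterations) roughly as follows. This is
a sketch, not the benchmark's actual code: the function names are
hypothetical, and 1 KB = 1024 bytes is an assumption.

```c
#include <stdint.h>

/* Total data moved by one test run, in KB (1 KB = 1024 bytes assumed). */
static uint64_t total_kb(uint64_t copy_size, uint64_t iterations)
{
    return copy_size * iterations / 1024;
}

/* Throughput in KB/sec given the measured wall-clock time of the run. */
static double throughput_kb_per_sec(uint64_t copy_size, uint64_t iterations,
                                    double elapsed_sec)
{
    return (double)total_kb(copy_size, iterations) / elapsed_sec;
}
```

With the stated parameters each run moves 4194304 KB (4 GB), so a run
that took 2 seconds would report 2097152 KB/sec.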

Looks like the patch can be used for all the above tested
alignments on Pentium III.

>Can you please do some P4 testing?

P4 Xeon CPU 1.50 GHz 4-way - hyperthreading disabled
Src is aligned and dst is misaligned as follows:

 Dst      2.5.40        2.5.40+patch    2.5.40+patch++
Align     throughput    throughput      throughput
(bytes)   KB/sec        KB/sec          KB/sec
  0       1360071       1314783         912359
  1        323674        340447
  2        329202        336425
  4        512955        693170
  8        523223        615097         506641
 12        517184        558701         553700
 16        966598        872080         932736
 32        846937        838514         845178

I see too much variance in the test results so I ran
each test 3 times. I tried increasing the iterations
but it did not reduce the variance.

Dst is aligned and src is misaligned as follows:

 Src      2.5.40        2.5.40+patch
Align     throughput    throughput
(bytes)   KB/sec        KB/sec
  0       1275372       1029815
  1        529907        511815
  2        534811        530850
  4        643196        627013
  8        568000        626676
 12        574468        658793
 16        631707        635979
 32        741485        592938
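
One way to read a row of these tables is as a percentage change of the
patched kernel against plain 2.5.40. The helper below is a hypothetical
illustration, not part of the benchmark:

```c
/* Percentage change of the patched throughput relative to the baseline.
 * Positive means the patched kernel was faster. */
static double percent_change(double baseline_kb_s, double patched_kb_s)
{
    return 100.0 * (patched_kb_s - baseline_kb_s) / baseline_kb_s;
}
```

For the 0-byte row above (1275372 vs 1029815 KB/sec) this gives roughly
-19%.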

Since there is 5 - 10% variance in these test results, I am not
sure whether we can use this data to validate the patch. I will try
to run this on another Pentium 4 machine.
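
A sketch of one way to quantify run-to-run variation across repeated
runs of the same test, taking the spread (max - min) as a percentage of
the mean. How the 5 - 10% figure above was actually derived is not
stated, so this metric and the function name are assumptions.

```c
#include <stddef.h>

/* Spread of repeated throughput measurements as a percentage of their
 * mean: 100 * (max - min) / mean.  Returns 0 for an empty input. */
static double relative_spread_pct(const double *runs, size_t n)
{
    if (n == 0)
        return 0.0;

    double min = runs[0], max = runs[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (runs[i] < min)
            min = runs[i];
        if (runs[i] > max)
            max = runs[i];
        sum += runs[i];
    }

    double mean = sum / (double)n;
    return mean > 0.0 ? 100.0 * (max - min) / mean : 0.0;
}
```

If the spread between runs of the same kernel is comparable to the
difference between the 2.5.40 and 2.5.40+patch columns, the comparison
for that row is inconclusive.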

However, I have seen that using floating point registers instead of
integer registers on Pentium 4 improves performance to a greater extent
on some alignments. I need to do more testing and then I will create a
patch for Pentium 4.

Regards,
    Mala


   Mala Anand
   IBM Linux Technology Center - Kernel Performance
   E-mail:manand@us.ibm.com
   http://www-124.ibm.com/developerworks/opensource/linuxperf
   http://www-124.ibm.com/developerworks/projects/linuxperf
   Phone:838-8088; Tie-line:678-8088




