linux-kernel.vger.kernel.org archive mirror
* 2.5.73-mm2
@ 2003-06-28  3:21 Andrew Morton
  2003-06-28  8:56 ` 2.5.73-mm2 William Lee Irwin III
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Andrew Morton @ 2003-06-28  3:21 UTC (permalink / raw)
  To: linux-kernel, linux-mm


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.73/2.5.73-mm2/

Just bits and pieces.





Changes since 2.5.73-mm1:


 linus.patch

 Latest Linus tree

-show_stack-fix.patch
-pci-1.patch
-pci-2.patch
-pci-3.patch
-pci-4.patch
-pci-5.patch
-alsa-pnp-fix.patch
-setscheduler-fix.patch
-ide_setting_sem-fix.patch
-misc6.patch
-AT_SECURE-auxv-entry.patch
-common-kernel-DSO-name.patch
-get_unmapped_area-speedup.patch
-d_invalidate-fix.patch
-nfs_unlink-d_count-fix.patch
-hpfs-d_count-fix.patch
-smbfs-oops-workaround.patch
-enable-cardbus-bursting.patch
-n_tty-column-counting-fix.patch
-numa-normalised-node-load.patch
-enable-local-apic-on-p4.patch
-knfsd-umount-fix.patch
-getrlimit-ifdef-fix.patch
-amd64-monotonic-clock.patch

 Merged

+kgdb-ga-docco-fixes.patch

 kgdb documentation fixes

+pppoe-revert.patch

 Back out broken pppoe changes until it gets fixed up.

+move_vma-VM_LOCKED-fix.patch

 Fix the mremap use-after-free fix

+ipcsem-speedup.patch

 Speed up sysv semaphore operations

+feral-fix.patch

 Fix the linux_isp qlogic driver for non-modular builds.

-cfq-2.patch

 Dropped.  I'm firming up the IO scheduler rework for a merge and the -mm
 CFQ implementation is way out of date.

+blk-batching-cleanups.patch

 Minor touchups

+truncate-pagefault-race-fix-fix.patch

 Tighten up the MP synchronisation

-security-vm_enough_memory.patch
+security_vm_enough_memory.patch

 New version

+nbd-remove-blksize-bits.patch
+nbd-kobject-oops-fix.patch
+nbd-paranioa-cleanups.patch
+nbd-locking-fixes.patch

 More NBD work

-nr_running-speedup.patch

 Dropped.  It seemed to have no net benefit.

+lowmem_page_address-cleanup.patch

 Simplify lowmem_page_address()

+numa-memory-reporting-fix.patch

 Fix NUMA memory reporting (needs more work)

+syslog-efault-reporting.patch

 check copy_*_user return values

+acpismp-fix.patch

 Fix `acpismp=force'

+div64-cleanup.patch

 Consolidate and fix div64 code

+init_timer-debug-trap.patch

 Debug code to catch people running init_timer() against a running timer.

+dvd-ram-rw-fix.patch

 ide-scsi RW mount fix

+mixcomwd-update.patch
+arc-rimi-race-fix.patch

 minor fixups

+slab-drain-all-objects-fix.patch

 Make slab free up all the objects when destroying caches.

+ext3-remove-version.patch

 Remove the version information from ext3.

+cdrom-eject-hang-fix.patch

 Fix mount-time hangs caused by CDROM eject commands.





All 138 patches:


linus.patch
  cset-1.1348.16.4-to-1.1516.txt.gz

mm.patch
  add -mmN to EXTRAVERSION

kgdb-ga.patch
  kgdb stub for ia32 (George Anzinger's one)

kgdb-use-ggdb.patch

kgdb-ga-docco-fixes.patch
  kgdb doc. edits/corrections

HZ-100.patch

handle-no-readpage-2.patch
  check for presence of readpage() in the readahead code

pppoe-revert.patch
  PPPOE reversion

config_spinline.patch
  uninline spinlocks for profiling accuracy.

ppc64-fixes-2.patch
  Make ppc64 compile

ppc64-bat-initialisation-fix.patch
  ppc64: BAT initialisation fix

ppc64-pci-update.patch

ppc64-reloc_hide.patch

ppc64-semaphore-reimplementation.patch
  ppc64: use the ia32 semaphore implementation

sym-do-160.patch
  make the SYM driver do 160 MB/sec

x86_64-fixes.patch
  x86_64 fixes

irqreturn-snd-via-fix.patch
  via sound irqreturn fix

config-PAGE_OFFSET.patch
  Configurable kernel/user memory split

lru_cache_add-check.patch
  lru_cache_add debug check

delay-ksoftirqd-fallback.patch
  Try harder in IRQ context before falling back to ksoftirqd

fb-image-depth-fix.patch
  fbdev image depth fix

move_vma-VM_LOCKED-fix.patch
  move_vma() make_pages_present() fix

ds-09-vicam-usercopy-fix.patch
  vicam usercopy fix

buffer-debug.patch
  buffer.c debugging

reiserfs-unmapped-buffer-fix.patch
  Fix reiserfs BUG

e100-use-after-free-fix.patch

3-unmap-page-debugging.patch
  page unmapping debug patch

VM_RESERVED-check.patch
  VM_RESERVED check

ipcsem-speedup.patch
  ipc semaphore optimization

rcu-stats.patch
  RCU statistics reporting

mtrr-hang-fix.patch
  Fix mtrr-related hang

reslabify-pgds-and-pmds.patch
  re-slabify i386 pgd's and pmd's

linux-isp.patch

isp-update-1.patch

isp-remove-pci_detect.patch

feral-fix.patch
  linux-isp fix

list_del-debug.patch
  list_del debug check

airo-schedule-fix.patch
  airo.c: don't sleep in atomic regions

resurrect-batch_requests.patch
  bring back the batch_requests function

kblockd.patch
  Create `kblockd' workqueue

cfq-infrastructure.patch

elevator-completion-api.patch
  elevator completion API

as-iosched.patch
  anticipatory I/O scheduler
  AS: pgbench improvement
  AS: discrete read fifo batches
  AS sync/async batches
  AS: hash removal fix
  AS jumbo patch (for SCSI and TCQ)
  AS: fix stupid thinko
  AS: no batch-antic-limit
  AS: autotune write batches
  AS: divide by zero fix
  AS: more HZ != 1000 fixes
  AS: update_write_batch tuning
  AS locking
  AS HZ fixes

as-double-free-and-debug.patch
  AS: fix a leak + more debugging

as-fix-seek-estimation.patch
  AS: maybe repair performance drop of random read O_DIRECT

as-fix-seeky-loads.patch
  AS: fix IBM's seek load

unplug-use-kblockd.patch
  Use kblockd for running request queues

per-queue-nr_requests.patch
  per queue nr_requests

blk-invert-watermarks.patch
  blk_congestion_wait threshold cleanup

blk-as-hint.patch
  blk-as-hint

get_request_wait-oom-fix.patch
  handle OOM in get_request_wait().

blk-fair-batches.patch
  blk-fair-batches

blk-fair-batches-2.patch
  blk fair batches #2

generic-io-contexts.patch
  generic io contexts

blk-request-batching.patch
  block request batching

get_io_context-fix.patch
  get_io_context fixes

blk-allocation-commentary.patch
  block allocation comments

blk-batching-throttle-fix.patch
  blk batch requests fix

blk-batching-cleanups.patch
  block batching cleanups

print-build-options-on-oops.patch
  print a few config options on oops

mmap-prefault.patch
  prefault of executable mmaps

bio-debug-trap.patch
  BIO debugging patch

sound-irq-hack.patch

show_task-free-stack-fix.patch
  show_task() fix and cleanup

put_task_struct-debug.patch

ia32-mknod64.patch
  mknod64 for ia32

ext2-64-bit-special-inodes.patch
  ext2: support for 64-bit device nodes

ext3-64-bit-special-inodes.patch
  ext3: support for 64-bit device nodes

64-bit-dev_t-kdev_t.patch
  64-bit dev_t and kdev_t

oops-dump-preceding-code.patch
  i386 oops output: dump preceding code

lockmeter.patch

invalidate_mmap_range.patch
  Interface to invalidate regions of mmaps

aio-mm-refcounting-fix.patch
  fix /proc mm_struct refcounting bug

aio-01-retry.patch
  AIO: Core retry infrastructure

io_submit_one-EINVAL-fix.patch
  Fix aio process hang on EINVAL

aio-02-lockpage_wq.patch
  AIO: Async page wait

aio-03-fs_read.patch
  AIO: Filesystem aio read

aio-04-buffer_wq.patch
  AIO: Async buffer wait

aio-05-fs_write.patch
  AIO: Filesystem aio write

aio-05-fs_write-fix.patch

aio-06-bread_wq.patch
  AIO: Async block read

aio-06-bread_wq-fix.patch

aio-07-ext2getblk_wq.patch
  AIO: Async get block for ext2

O_SYNC-speedup-2.patch
  speed up O_SYNC writes

aio-09-o_sync.patch
  aio O_SYNC

aio-10-BUG-fix.patch
  AIO: fix a BUG

aio-11-workqueue-flush.patch
  AIO: flush workqueues before destroying ioctx'es

aio-12-readahead.patch
  AIO: readahead fixes

lock_buffer_wq-fix.patch
  lock_buffer_wq fix

unuse_mm-locked.patch
  AIO: hold the context lock across unuse_mm

aio-take-task_lock.patch
  From: Suparna Bhattacharya <suparna@in.ibm.com>
  Subject: Re: 2.5.72-mm1 - Under heavy testing with AIO,.. vmstat seems to blow the kernel

vfsmount_lock.patch
  From: Maneesh Soni <maneesh@in.ibm.com>
  Subject: [patch 1/2] vfsmount_lock

sched-hot-balancing-fix.patch
  fix for CPU scheduler load distribution

truncate-pagefault-race-fix.patch
  Fix vmtruncate race and distributed filesystem race

truncate-pagefault-race-fix-fix.patch
  Make sure truncate fix has no race

sleepometer.patch
  sleep instrumentation

time-goes-backwards.patch
  demonstrate do_gettimeofday() going backwards

skip-apic-ids-on-boot.patch
  skip apicids on boot

printk-oops-mangle-fix.patch
  disentangle printk's whilst oopsing on SMP

20-odirect_enable.patch

21-odirect_cruft.patch

22-read_proc.patch

23-write_proc.patch

24-commit_proc.patch

25-odirect.patch

nfs-O_DIRECT-always-enabled.patch
  Force CONFIG_NFS_DIRECTIO

seqcount-locking.patch
  i_size atomic access: infrastructure

i_size-atomic-access.patch
  i_size atomic access

aha152x-oops-fix.patch
  aha152X oops fixes

security_vm_enough_memory.patch
  Security hook for vm_enough_memory

nbd-cleanups.patch
  NBD: cosmetic cleanups

nbd-enhanced-diagnostics.patch
  nbd: enhanced diagnostics support

nbd-remove-blksize-bits.patch
  nbd: remove unneeded blksize_bits field

nbd-kobject-oops-fix.patch
  nbd: initialise the embedded kobject

nbd-paranioa-cleanups.patch
  nbd: cleanup PARANOIA usage & code

nbd-locking-fixes.patch
  nbd: fix locking issues with ioctl UI

pcmcia-event-20030623-1.patch

pcmcia-event-20030623-2.patch

pcmcia-event-20030623-3.patch

pcmcia-event-20030623-4.patch

pcmcia-event-20030623-5.patch

pcmcia-event-20030623-6.patch

sym2-bus_addr-fix.patch
  sym53c8xx_2 bus_addr fix

lost-tick-speedstep-fix.patch
  Fix lost tick detection for speedstep

sym2-remove-broken-bios-check.patch
  remove a bogus check in sym2 driver

rename-timer.patch
  timer cleanups

lowmem_page_address-cleanup.patch
  cleanup and generalise lowmem_page_address

numa-memory-reporting-fix.patch
  fix NUMA memory reporting ... again

syslog-efault-reporting.patch
  Fix syslog(2) EFAULT reporting

acpismp-fix.patch
  ACPI_HT_ONLY acpismp=force

div64-cleanup.patch
  Kill div64.h dupes and parenthesize do_div() parameters

init_timer-debug-trap.patch
  init_timer debugging

dvd-ram-rw-fix.patch
  2.5.73 can't mount DVD-RAM via ide-scsi

mixcomwd-update.patch
  Remove check_region and MOD_*_USE_COUNT from mixcomwd.c

arc-rimi-race-fix.patch
  Remove racy check_mem_region() call from arc-rimi.c

slab-drain-all-objects-fix.patch
  kmem_cache_destroy() forgets to drain all objects

ext3-remove-version.patch
  ext3: remove the version number

cdrom-eject-hang-fix.patch
  cdrom eject scribbles on the request flags





* Re: 2.5.73-mm2
  2003-06-28  3:21 2.5.73-mm2 Andrew Morton
@ 2003-06-28  8:56 ` William Lee Irwin III
  2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-06-28  8:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Fri, Jun 27, 2003 at 08:21:30PM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.73/2.5.73-mm2/
> Just bits and pieces.

This could almost be sent through rusty, but I think in general BKL
removal patches should go through whatever wringer -mm et al provide.


-- wli

Remove spurious BKL acquisitions in /proc/. The BKL is not required to
access nr_threads for reporting, and get_locks_status() takes it
internally, wrapping all operations with it.


diff -prauN wli-2.5.73-3/fs/proc/proc_misc.c wli-2.5.73-4/fs/proc/proc_misc.c
--- wli-2.5.73-3/fs/proc/proc_misc.c	2003-06-23 10:29:54.000000000 -0700
+++ wli-2.5.73-4/fs/proc/proc_misc.c	2003-06-23 10:32:25.000000000 -0700
@@ -497,11 +497,10 @@ static int ds1286_read_proc(char *page, 
 static int locks_read_proc(char *page, char **start, off_t off,
 				 int count, int *eof, void *data)
 {
-	int len;
-	lock_kernel();
-	len = get_locks_status(page, start, off, count);
-	unlock_kernel();
-	if (len < count) *eof = 1;
+	int len = get_locks_status(page, start, off, count);
+
+	if (len < count)
+		*eof = 1;
 	return len;
 }
 
diff -prauN wli-2.5.73-3/fs/proc/root.c wli-2.5.73-4/fs/proc/root.c
--- wli-2.5.73-3/fs/proc/root.c	2003-06-22 11:33:07.000000000 -0700
+++ wli-2.5.73-4/fs/proc/root.c	2003-06-23 10:32:25.000000000 -0700
@@ -81,11 +81,13 @@ void __init proc_root_init(void)
 
 static struct dentry *proc_root_lookup(struct inode * dir, struct dentry * dentry)
 {
-	if (dir->i_ino == PROC_ROOT_INO) { /* check for safety... */
-		lock_kernel();
+	/*
+	 * nr_threads is actually protected by the tasklist_lock;
+	 * however, it's conventional to do reads, especially for
+	 * reporting, without any locking whatsoever.
+	 */
+	if (dir->i_ino == PROC_ROOT_INO) /* check for safety... */
 		dir->i_nlink = proc_root.nlink + nr_threads;
-		unlock_kernel();
-	}
 
 	if (!proc_lookup(dir, dentry)) {
 		return NULL;

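For readers unfamiliar with the read_proc convention the patch relies on, here
is a minimal userspace model (hypothetical code, not taken from the kernel):
a handler that produces fewer bytes than requested sets *eof, which is why the
lock_kernel()/unlock_kernel() pair could be dropped without touching the eof
logic at all.

```c
#include <assert.h>

/* Hypothetical userspace model of the /proc read_proc eof convention:
 * produce up to `count` bytes of report data; a short result signals
 * end-of-file to the caller.  `available` stands in for however much
 * data get_locks_status() would have generated. */
static int model_read_proc(int available, int count, int *eof)
{
	int len = (available < count) ? available : count;

	if (len < count)
		*eof = 1;	/* short read: nothing more to report */
	return len;
}
```

The locking removed by the patch sat entirely outside this logic, so the
observable behaviour of the handler is unchanged.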

* Re: 2.5.73-mm2
  2003-06-28  3:21 2.5.73-mm2 Andrew Morton
  2003-06-28  8:56 ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-28 15:54 ` William Lee Irwin III
  2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
                     ` (2 more replies)
  2003-06-29 19:04 ` [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927 Adrian Bunk
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-06-28 15:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Fri, Jun 27, 2003 at 08:21:30PM -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.73/2.5.73-mm2/
> Just bits and pieces.

Here's highpmd. This allocates L2 pagetables from highmem, decreasing
the per-process lowmem overhead on CONFIG_HIGHMEM64G from 20KB to 8KB.
Some attempts were made to update non-i386 architectures to the new
API's, though they're entirely untested. It's been tested for a while
in -wli on i386 machines, both lowmem and highmem boxen.

-- wli
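The 20KB-to-8KB figure can be sanity-checked with back-of-the-envelope
arithmetic.  This is a sketch under assumptions not stated in the patch: PAE's
three user pgd entries each point at a 4KB pmd page, and the remaining ~8KB of
per-process lowmem (kernel stack and the like) is untouched by CONFIG_HIGHPMD.

```c
#include <assert.h>

/* Illustrative arithmetic only; the 8KB baseline and the PAE layout
 * (3 user pmd pages of 4KB each) are assumptions for this sketch. */
enum { PMD_PAGE_KB = 4, USER_PMDS = 3 };

static int lowmem_overhead_kb(int baseline_kb, int highpmd)
{
	/* CONFIG_HIGHPMD moves the user pmd pages out of lowmem entirely,
	 * leaving only the baseline pinned there. */
	return baseline_kb + (highpmd ? 0 : USER_PMDS * PMD_PAGE_KB);
}
```

With an assumed 8KB baseline this gives 20KB without the patch and 8KB with
it, matching the numbers quoted above.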

diff -prauN mm2-2.5.73-1/arch/i386/Kconfig mm2-2.5.73-2/arch/i386/Kconfig
--- mm2-2.5.73-1/arch/i386/Kconfig	2003-06-28 03:09:46.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/Kconfig	2003-06-28 03:11:37.000000000 -0700
@@ -765,6 +765,15 @@ config HIGHPTE
 	  low memory.  Setting this option will put user-space page table
 	  entries in high memory.
 
+config HIGHPMD
+	bool "Allocate 2nd-level pagetables from highmem"
+	depends on HIGHMEM64G
+	help
+	  The VM uses one pmd entry for each pagetable page of physical
+	  memory allocated. For systems with extreme amounts of highmem,
+	  this cannot be tolerated. Setting this option will put
+	  userspace 2nd-level pagetables in highmem.
+
 config MATH_EMULATION
 	bool "Math emulation"
 	---help---
diff -prauN mm2-2.5.73-1/arch/i386/kernel/vm86.c mm2-2.5.73-2/arch/i386/kernel/vm86.c
--- mm2-2.5.73-1/arch/i386/kernel/vm86.c	2003-06-22 11:32:33.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/kernel/vm86.c	2003-06-28 03:11:37.000000000 -0700
@@ -144,12 +144,14 @@ static void mark_screen_rdonly(struct ta
 		pgd_clear(pgd);
 		goto out;
 	}
-	pmd = pmd_offset(pgd, 0xA0000);
-	if (pmd_none(*pmd))
+	pmd = pmd_offset_map(pgd, 0xA0000);
+	if (pmd_none(*pmd)) {
+		pmd_unmap(pmd);
 		goto out;
-	if (pmd_bad(*pmd)) {
+	} else if (pmd_bad(*pmd)) {
 		pmd_ERROR(*pmd);
 		pmd_clear(pmd);
+		pmd_unmap(pmd);
 		goto out;
 	}
 	pte = mapped = pte_offset_map(pmd, 0xA0000);
@@ -159,6 +161,7 @@ static void mark_screen_rdonly(struct ta
 		pte++;
 	}
 	pte_unmap(mapped);
+	pmd_unmap(pmd);
 out:
 	spin_unlock(&tsk->mm->page_table_lock);
 	preempt_enable();
diff -prauN mm2-2.5.73-1/arch/i386/mm/fault.c mm2-2.5.73-2/arch/i386/mm/fault.c
--- mm2-2.5.73-1/arch/i386/mm/fault.c	2003-06-28 03:09:46.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/fault.c	2003-06-28 03:28:31.000000000 -0700
@@ -253,6 +253,7 @@ no_context:
 	printk(" printing eip:\n");
 	printk("%08lx\n", regs->eip);
 	asm("movl %%cr3,%0":"=r" (page));
+#ifndef CONFIG_HIGHPMD /* Oh boy. Error reporting is going to blow major goats. */
 	page = ((unsigned long *) __va(page))[address >> 22];
 	printk(KERN_ALERT "*pde = %08lx\n", page);
 	/*
@@ -268,7 +269,14 @@ no_context:
 		page = ((unsigned long *) __va(page))[address >> PAGE_SHIFT];
 		printk(KERN_ALERT "*pte = %08lx\n", page);
 	}
-#endif
+#endif /* !CONFIG_HIGHPTE */
+#else	/* CONFIG_HIGHPMD */
+	printk(KERN_ALERT "%%cr3 = 0x%lx\n", page);
+	/* Mask off flag bits. It should end up 32B-aligned. */
+	page &= ~(PTRS_PER_PGD*sizeof(pgd_t) - 1);
+	printk(KERN_ALERT "*pdpte = 0x%Lx\n",
+			pgd_val(((pgd_t *)__va(page))[address >> PGDIR_SHIFT]));
+#endif /* CONFIG_HIGHPMD */
 	die("Oops", regs, error_code);
 	bust_spinlocks(0);
 	do_exit(SIGKILL);
@@ -336,8 +344,8 @@ vmalloc_fault:
 		 * and redundant with the set_pmd() on non-PAE.
 		 */
 
-		pmd = pmd_offset(pgd, address);
-		pmd_k = pmd_offset(pgd_k, address);
+		pmd = pmd_offset_kernel(pgd, address);
+		pmd_k = pmd_offset_kernel(pgd_k, address);
 		if (!pmd_present(*pmd_k))
 			goto no_context;
 		set_pmd(pmd, *pmd_k);
diff -prauN mm2-2.5.73-1/arch/i386/mm/hugetlbpage.c mm2-2.5.73-2/arch/i386/mm/hugetlbpage.c
--- mm2-2.5.73-1/arch/i386/mm/hugetlbpage.c	2003-06-22 11:33:17.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/hugetlbpage.c	2003-06-28 03:11:37.000000000 -0700
@@ -87,8 +87,8 @@ static pte_t *huge_pte_alloc(struct mm_s
 	pmd_t *pmd = NULL;
 
 	pgd = pgd_offset(mm, addr);
-	pmd = pmd_alloc(mm, pgd, addr);
-	return (pte_t *) pmd;
+	pmd = pmd_alloc_map(mm, pgd, addr);
+	return (pte_t *)pmd;
 }
 
 static pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
@@ -97,8 +97,8 @@ static pte_t *huge_pte_offset(struct mm_
 	pmd_t *pmd = NULL;
 
 	pgd = pgd_offset(mm, addr);
-	pmd = pmd_offset(pgd, addr);
-	return (pte_t *) pmd;
+	pmd = pmd_offset_map(pgd, addr);
+	return (pte_t *)pmd;
 }
 
 static void set_huge_pte(struct mm_struct *mm, struct vm_area_struct *vma, struct page *page, pte_t * page_table, int write_access)
@@ -145,6 +145,8 @@ int copy_hugetlb_page_range(struct mm_st
 		ptepage = pte_page(entry);
 		get_page(ptepage);
 		set_pte(dst_pte, entry);
+		pmd_unmap(dst_pte);
+		pmd_unmap_nested(src_pte);
 		dst->rss += (HPAGE_SIZE / PAGE_SIZE);
 		addr += HPAGE_SIZE;
 	}
@@ -182,6 +184,7 @@ follow_hugetlb_page(struct mm_struct *mm
 
 			get_page(page);
 			pages[i] = page;
+			pmd_unmap(pte);
 		}
 
 		if (vmas)
@@ -271,6 +274,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 		page += ((address & ~HPAGE_MASK) >> PAGE_SHIFT);
 		get_page(page);
 	}
+	pmd_unmap(pmd);
 	return page;
 }
 #endif
@@ -314,6 +318,7 @@ void unmap_hugepage_range(struct vm_area
 		page = pte_page(*pte);
 		huge_page_release(page);
 		pte_clear(pte);
+		pmd_unmap(pte);
 	}
 	mm->rss -= (end - start) >> PAGE_SHIFT;
 	flush_tlb_range(vma, start, end);
@@ -358,16 +363,19 @@ int hugetlb_prefault(struct address_spac
 			page = alloc_hugetlb_page();
 			if (!page) {
 				ret = -ENOMEM;
+				pmd_unmap(pte);
 				goto out;
 			}
 			ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC);
 			unlock_page(page);
 			if (ret) {
 				free_huge_page(page);
+				pmd_unmap(pte);
 				goto out;
 			}
 		}
 		set_huge_pte(mm, vma, page, pte, vma->vm_flags & VM_WRITE);
+		pmd_unmap(pte);
 	}
 out:
 	spin_unlock(&mm->page_table_lock);
diff -prauN mm2-2.5.73-1/arch/i386/mm/init.c mm2-2.5.73-2/arch/i386/mm/init.c
--- mm2-2.5.73-1/arch/i386/mm/init.c	2003-06-28 03:09:46.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/init.c	2003-06-28 03:22:00.000000000 -0700
@@ -59,10 +59,10 @@ static pmd_t * __init one_md_table_init(
 #ifdef CONFIG_X86_PAE
 	pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
 	set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
-	if (pmd_table != pmd_offset(pgd, 0)) 
+	if (pmd_table != pmd_offset_kernel(pgd, 0)) 
 		BUG();
 #else
-	pmd_table = pmd_offset(pgd, 0);
+	pmd_table = pmd_offset_kernel(pgd, 0);
 #endif
 
 	return pmd_table;
@@ -113,7 +113,7 @@ static void __init page_table_range_init
 		if (pgd_none(*pgd)) 
 			one_md_table_init(pgd);
 
-		pmd = pmd_offset(pgd, vaddr);
+		pmd = pmd_offset_kernel(pgd, vaddr);
 		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end); pmd++, pmd_idx++) {
 			if (pmd_none(*pmd)) 
 				one_page_table_init(pmd);
@@ -194,7 +194,7 @@ pte_t *kmap_pte;
 pgprot_t kmap_prot;
 
 #define kmap_get_fixmap_pte(vaddr)					\
-	pte_offset_kernel(pmd_offset(pgd_offset_k(vaddr), (vaddr)), (vaddr))
+	pte_offset_kernel(pmd_offset_kernel(pgd_offset_k(vaddr), (vaddr)), (vaddr))
 
 void __init kmap_init(void)
 {
@@ -218,7 +218,7 @@ void __init permanent_kmaps_init(pgd_t *
 	page_table_range_init(vaddr, vaddr + PAGE_SIZE*LAST_PKMAP, pgd_base);
 
 	pgd = swapper_pg_dir + pgd_index(vaddr);
-	pmd = pmd_offset(pgd, vaddr);
+	pmd = pmd_offset_kernel(pgd, vaddr);
 	pte = pte_offset_kernel(pmd, vaddr);
 	pkmap_page_table = pte;	
 }
@@ -513,20 +513,9 @@ void __init mem_init(void)
 }
 
 kmem_cache_t *pgd_cache;
-kmem_cache_t *pmd_cache;
 
 void __init pgtable_cache_init(void)
 {
-	if (PTRS_PER_PMD > 1) {
-		pmd_cache = kmem_cache_create("pmd",
-					PTRS_PER_PMD*sizeof(pmd_t),
-					0,
-					SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
-					pmd_ctor,
-					NULL);
-		if (!pmd_cache)
-			panic("pgtable_cache_init(): cannot create pmd cache");
-	}
 	pgd_cache = kmem_cache_create("pgd",
 				PTRS_PER_PGD*sizeof(pgd_t),
 				0,
diff -prauN mm2-2.5.73-1/arch/i386/mm/ioremap.c mm2-2.5.73-2/arch/i386/mm/ioremap.c
--- mm2-2.5.73-1/arch/i386/mm/ioremap.c	2003-06-22 11:32:38.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/ioremap.c	2003-06-28 03:11:37.000000000 -0700
@@ -82,7 +82,7 @@ static int remap_area_pages(unsigned lon
 	spin_lock(&init_mm.page_table_lock);
 	do {
 		pmd_t *pmd;
-		pmd = pmd_alloc(&init_mm, dir, address);
+		pmd = pmd_alloc_kernel(&init_mm, dir, address);
 		error = -ENOMEM;
 		if (!pmd)
 			break;
diff -prauN mm2-2.5.73-1/arch/i386/mm/pageattr.c mm2-2.5.73-2/arch/i386/mm/pageattr.c
--- mm2-2.5.73-1/arch/i386/mm/pageattr.c	2003-06-28 03:09:46.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/pageattr.c	2003-06-28 03:12:16.000000000 -0700
@@ -23,7 +23,7 @@ static inline pte_t *lookup_address(unsi
 	pmd_t *pmd;
 	if (pgd_none(*pgd))
 		return NULL;
-	pmd = pmd_offset(pgd, address); 	       
+	pmd = pmd_offset_kernel(pgd, address); 	       
 	if (pmd_none(*pmd))
 		return NULL;
 	if (pmd_large(*pmd))
@@ -79,7 +79,7 @@ static void set_pmd_pte(pte_t *kpte, uns
 		pgd_t *pgd;
 		pmd_t *pmd;
 		pgd = (pgd_t *)page_address(page) + pgd_index(address);
-		pmd = pmd_offset(pgd, address);
+		pmd = pmd_offset_kernel(pgd, address);
 		set_pte_atomic((pte_t *)pmd, pte);
 	}
 	spin_unlock_irqrestore(&pgd_lock, flags);
@@ -92,7 +92,7 @@ static void set_pmd_pte(pte_t *kpte, uns
 static inline void revert_page(struct page *kpte_page, unsigned long address)
 {
 	pte_t *linear = (pte_t *) 
-		pmd_offset(pgd_offset(&init_mm, address), address);
+		pmd_offset_kernel(pgd_offset_k(address), address);
 	set_pmd_pte(linear,  address,
 		    pfn_pte((__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT,
 			    PAGE_KERNEL_LARGE));
diff -prauN mm2-2.5.73-1/arch/i386/mm/pgtable.c mm2-2.5.73-2/arch/i386/mm/pgtable.c
--- mm2-2.5.73-1/arch/i386/mm/pgtable.c	2003-06-28 03:09:46.000000000 -0700
+++ mm2-2.5.73-2/arch/i386/mm/pgtable.c	2003-06-28 08:20:29.000000000 -0700
@@ -70,7 +70,7 @@ static void set_pte_pfn(unsigned long va
 		BUG();
 		return;
 	}
-	pmd = pmd_offset(pgd, vaddr);
+	pmd = pmd_offset_kernel(pgd, vaddr);
 	if (pmd_none(*pmd)) {
 		BUG();
 		return;
@@ -110,7 +110,7 @@ void set_pmd_pfn(unsigned long vaddr, un
 		printk ("set_pmd_pfn: pgd_none\n");
 		return; /* BUG(); */
 	}
-	pmd = pmd_offset(pgd, vaddr);
+	pmd = pmd_offset_kernel(pgd, vaddr);
 	set_pmd(pmd, pfn_pmd(pfn, flags));
 	/*
 	 * It's enough to flush this one mapping.
@@ -152,11 +152,6 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-void pmd_ctor(void *pmd, kmem_cache_t *cache, unsigned long flags)
-{
-	memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -203,6 +198,12 @@ void pgd_dtor(void *pgd, kmem_cache_t *c
 	spin_unlock_irqrestore(&pgd_lock, flags);
 }
 
+#ifdef CONFIG_HIGHPMD
+#define	GFP_PMD		(__GFP_REPEAT|__GFP_HIGHMEM|GFP_KERNEL)
+#else
+#define GFP_PMD		(__GFP_REPEAT|GFP_KERNEL)
+#endif
+
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
@@ -212,16 +213,17 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 		return pgd;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+		struct page *pmd = alloc_page(GFP_PMD);
 		if (!pmd)
 			goto out_oom;
-		set_pgd(&pgd[i], __pgd(1 + __pa((u64)((u32)pmd))));
+		clear_highpage(pmd);
+		set_pgd(&pgd[i], __pgd(1ULL | (u64)page_to_pfn(pmd) << PAGE_SHIFT));
 	}
 	return pgd;
 
 out_oom:
 	for (i--; i >= 0; i--)
-		kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+		__free_page(pgd_page(pgd[i]));
 	kmem_cache_free(pgd_cache, pgd);
 	return NULL;
 }
@@ -233,7 +235,7 @@ void pgd_free(pgd_t *pgd)
 	/* in the PAE case user pgd entries are overwritten before usage */
 	if (PTRS_PER_PMD > 1)
 		for (i = 0; i < USER_PTRS_PER_PGD; ++i)
-			kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+			__free_page(pgd_page(pgd[i]));
 	/* in the non-PAE case, clear_page_tables() clears user pgd entries */
 	kmem_cache_free(pgd_cache, pgd);
 }
diff -prauN mm2-2.5.73-1/arch/sparc/mm/srmmu.c mm2-2.5.73-2/arch/sparc/mm/srmmu.c
--- mm2-2.5.73-1/arch/sparc/mm/srmmu.c	2003-06-22 11:32:56.000000000 -0700
+++ mm2-2.5.73-2/arch/sparc/mm/srmmu.c	2003-06-28 03:11:37.000000000 -0700
@@ -2180,7 +2180,7 @@ void __init ld_mmu_srmmu(void)
 
 	BTFIXUPSET_CALL(pte_pfn, srmmu_pte_pfn, BTFIXUPCALL_NORM);
 	BTFIXUPSET_CALL(pmd_page, srmmu_pmd_page, BTFIXUPCALL_NORM);
-	BTFIXUPSET_CALL(pgd_page, srmmu_pgd_page, BTFIXUPCALL_NORM);
+	BTFIXUPSET_CALL(__pgd_page, srmmu_pgd_page, BTFIXUPCALL_NORM);
 
 	BTFIXUPSET_SETHI(none_mask, 0xF0000000);
 
diff -prauN mm2-2.5.73-1/arch/sparc/mm/sun4c.c mm2-2.5.73-2/arch/sparc/mm/sun4c.c
--- mm2-2.5.73-1/arch/sparc/mm/sun4c.c	2003-06-22 11:33:06.000000000 -0700
+++ mm2-2.5.73-2/arch/sparc/mm/sun4c.c	2003-06-28 03:11:37.000000000 -0700
@@ -2252,5 +2252,5 @@ void __init ld_mmu_sun4c(void)
 
 	/* These should _never_ get called with two level tables. */
 	BTFIXUPSET_CALL(pgd_set, sun4c_pgd_set, BTFIXUPCALL_NOP);
-	BTFIXUPSET_CALL(pgd_page, sun4c_pgd_page, BTFIXUPCALL_RETO0);
+	BTFIXUPSET_CALL(__pgd_page, sun4c_pgd_page, BTFIXUPCALL_RETO0);
 }
diff -prauN mm2-2.5.73-1/drivers/char/drm/drm_memory.h mm2-2.5.73-2/drivers/char/drm/drm_memory.h
--- mm2-2.5.73-1/drivers/char/drm/drm_memory.h	2003-06-22 11:32:35.000000000 -0700
+++ mm2-2.5.73-2/drivers/char/drm/drm_memory.h	2003-06-28 03:11:37.000000000 -0700
@@ -123,7 +123,7 @@ static inline unsigned long
 drm_follow_page (void *vaddr)
 {
 	pgd_t *pgd = pgd_offset_k((unsigned long) vaddr);
-	pmd_t *pmd = pmd_offset(pgd, (unsigned long) vaddr);
+	pmd_t *pmd = pmd_offset_kernel(pgd, (unsigned long)vaddr);
 	pte_t *ptep = pte_offset_kernel(pmd, (unsigned long) vaddr);
 	return pte_pfn(*ptep) << PAGE_SHIFT;
 }
diff -prauN mm2-2.5.73-1/fs/exec.c mm2-2.5.73-2/fs/exec.c
--- mm2-2.5.73-1/fs/exec.c	2003-06-28 03:09:53.000000000 -0700
+++ mm2-2.5.73-2/fs/exec.c	2003-06-28 03:11:37.000000000 -0700
@@ -304,10 +304,10 @@ void put_dirty_page(struct task_struct *
 	if (!pte_chain)
 		goto out_sig;
 	spin_lock(&tsk->mm->page_table_lock);
-	pmd = pmd_alloc(tsk->mm, pgd, address);
+	pmd = pmd_alloc_map(tsk->mm, pgd, address);
 	if (!pmd)
 		goto out;
-	pte = pte_alloc_map(tsk->mm, pmd, address);
+	pte = pte_alloc_map(tsk->mm, &pmd, address);
 	if (!pte)
 		goto out;
 	if (!pte_none(*pte)) {
@@ -319,6 +319,7 @@ void put_dirty_page(struct task_struct *
 	set_pte(pte, pte_mkdirty(pte_mkwrite(mk_pte(page, prot))));
 	pte_chain = page_add_rmap(page, pte, pte_chain);
 	pte_unmap(pte);
+	pmd_unmap(pmd);
 	tsk->mm->rss++;
 	spin_unlock(&tsk->mm->page_table_lock);
 
@@ -326,6 +327,8 @@ void put_dirty_page(struct task_struct *
 	pte_chain_free(pte_chain);
 	return;
 out:
+	if (pmd)
+		pmd_unmap(pmd);
 	spin_unlock(&tsk->mm->page_table_lock);
 out_sig:
 	__free_page(page);
diff -prauN mm2-2.5.73-1/include/asm-alpha/pgtable.h mm2-2.5.73-2/include/asm-alpha/pgtable.h
--- mm2-2.5.73-1/include/asm-alpha/pgtable.h	2003-06-22 11:32:38.000000000 -0700
+++ mm2-2.5.73-2/include/asm-alpha/pgtable.h	2003-06-28 08:20:41.000000000 -0700
@@ -229,9 +229,11 @@ pmd_page_kernel(pmd_t pmd)
 #define pmd_page(pmd)	(mem_map + ((pmd_val(pmd) & _PFN_MASK) >> 32))
 #endif
 
-extern inline unsigned long pgd_page(pgd_t pgd)
+extern inline unsigned long __pgd_page(pgd_t pgd)
 { return PAGE_OFFSET + ((pgd_val(pgd) & _PFN_MASK) >> (32-PAGE_SHIFT)); }
 
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
+
 extern inline int pte_none(pte_t pte)		{ return !pte_val(pte); }
 extern inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_VALID; }
 extern inline void pte_clear(pte_t *ptep)	{ pte_val(*ptep) = 0; }
@@ -280,7 +282,7 @@ extern inline pte_t pte_mkyoung(pte_t pt
 /* Find an entry in the second-level page table.. */
 extern inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
 {
-	return (pmd_t *) pgd_page(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
+	return (pmd_t *)__pgd_page(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1));
 }
 
 /* Find an entry in the third-level page table.. */
diff -prauN mm2-2.5.73-1/include/asm-arm/pgtable.h mm2-2.5.73-2/include/asm-arm/pgtable.h
--- mm2-2.5.73-1/include/asm-arm/pgtable.h	2003-06-22 11:32:38.000000000 -0700
+++ mm2-2.5.73-2/include/asm-arm/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -125,6 +125,11 @@ extern struct page *empty_zero_page;
 
 /* Find an entry in the second-level page table.. */
 #define pmd_offset(dir, addr)	((pmd_t *)(dir))
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
 
 /* Find an entry in the third-level page table.. */
 #define __pte_index(addr)	(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
diff -prauN mm2-2.5.73-1/include/asm-arm26/pgtable.h mm2-2.5.73-2/include/asm-arm26/pgtable.h
--- mm2-2.5.73-1/include/asm-arm26/pgtable.h	2003-06-22 11:32:32.000000000 -0700
+++ mm2-2.5.73-2/include/asm-arm26/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -189,6 +189,12 @@ extern struct page *empty_zero_page;
 #define pte_unmap(pte)                  do { } while (0)
 #define pte_unmap_nested(pte)           do { } while (0)
 
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pgd, addr)			do { } while (0)
+#define pmd_unmap_nested(pgd, addr)		do { } while (0)
+
 
 #define _PAGE_PRESENT   0x01
 #define _PAGE_READONLY  0x02
diff -prauN mm2-2.5.73-1/include/asm-h8300/pgtable.h mm2-2.5.73-2/include/asm-h8300/pgtable.h
--- mm2-2.5.73-1/include/asm-h8300/pgtable.h	2003-06-22 11:32:42.000000000 -0700
+++ mm2-2.5.73-2/include/asm-h8300/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -15,6 +15,11 @@ typedef pte_t *pte_addr_t;
 #define pgd_clear(pgdp)
 #define kern_addr_valid(addr)	(1)
 #define	pmd_offset(a, b)	((void *)0)
+#define pmd_offset_kernel(a,b)		pmd_offset(a,b)
+#define pmd_offset_map(a,b)		pmd_offset(a,b)
+#define pmd_offset_map_nested(a,b)	pmd_offset(a,b)
+#define pmd_unmap(pmd)			do { } while (0)
+#define pmd_unmap_nested(pmd)		do { } while (0)
 
 #define PAGE_NONE		__pgprot(0)    /* these mean nothing to NO_MM */
 #define PAGE_SHARED		__pgprot(0)    /* these mean nothing to NO_MM */
diff -prauN mm2-2.5.73-1/include/asm-i386/kmap_types.h mm2-2.5.73-2/include/asm-i386/kmap_types.h
--- mm2-2.5.73-1/include/asm-i386/kmap_types.h	2003-06-22 11:33:01.000000000 -0700
+++ mm2-2.5.73-2/include/asm-i386/kmap_types.h	2003-06-28 03:11:37.000000000 -0700
@@ -17,14 +17,16 @@ D(3)	KM_USER0,
 D(4)	KM_USER1,
 D(5)	KM_BIO_SRC_IRQ,
 D(6)	KM_BIO_DST_IRQ,
-D(7)	KM_PTE0,
-D(8)	KM_PTE1,
-D(9)	KM_PTE2,
-D(10)	KM_IRQ0,
-D(11)	KM_IRQ1,
-D(12)	KM_SOFTIRQ0,
-D(13)	KM_SOFTIRQ1,
-D(14)	KM_TYPE_NR
+D(7)	KM_PMD0,
+D(8)	KM_PMD1,
+D(9)	KM_PTE0,
+D(10)	KM_PTE1,
+D(11)	KM_PTE2,
+D(12)	KM_IRQ0,
+D(13)	KM_IRQ1,
+D(14)	KM_SOFTIRQ0,
+D(15)	KM_SOFTIRQ1,
+D(16)	KM_TYPE_NR
 };
 
 #undef D
diff -prauN mm2-2.5.73-1/include/asm-i386/pgalloc.h mm2-2.5.73-2/include/asm-i386/pgalloc.h
--- mm2-2.5.73-1/include/asm-i386/pgalloc.h	2003-06-22 11:32:31.000000000 -0700
+++ mm2-2.5.73-2/include/asm-i386/pgalloc.h	2003-06-28 08:06:24.000000000 -0700
@@ -46,6 +46,7 @@ static inline void pte_free(struct page 
  */
 
 #define pmd_alloc_one(mm, addr)		({ BUG(); ((pmd_t *)2); })
+#define pmd_alloc_one_kernel(mm, addr)	({ BUG(); ((pmd_t *)2); })
 #define pmd_free(x)			do { } while (0)
 #define __pmd_free_tlb(tlb,x)		do { } while (0)
 #define pgd_populate(mm, pmd, pte)	BUG()
diff -prauN mm2-2.5.73-1/include/asm-i386/pgtable-2level.h mm2-2.5.73-2/include/asm-i386/pgtable-2level.h
--- mm2-2.5.73-1/include/asm-i386/pgtable-2level.h	2003-06-22 11:32:55.000000000 -0700
+++ mm2-2.5.73-2/include/asm-i386/pgtable-2level.h	2003-06-28 03:11:37.000000000 -0700
@@ -48,13 +48,15 @@ static inline int pgd_present(pgd_t pgd)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
 #define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)
 
-#define pgd_page(pgd) \
-((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+
+#define pmd_offset_map(pgd, addr)		({ (pmd_t *)(pgd); })
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset_map(pgd, addr)
+#define pmd_offset_kernel(pgd, addr)		pmd_offset_map(pgd, addr)
+
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
 
-static inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
-{
-	return (pmd_t *) dir;
-}
 #define ptep_get_and_clear(xp)	__pte(xchg(&(xp)->pte_low, 0))
 #define pte_same(a, b)		((a).pte_low == (b).pte_low)
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
diff -prauN mm2-2.5.73-1/include/asm-i386/pgtable-3level.h mm2-2.5.73-2/include/asm-i386/pgtable-3level.h
--- mm2-2.5.73-1/include/asm-i386/pgtable-3level.h	2003-06-28 03:09:54.000000000 -0700
+++ mm2-2.5.73-2/include/asm-i386/pgtable-3level.h	2003-06-28 08:21:14.000000000 -0700
@@ -64,12 +64,32 @@ static inline void set_pte(pte_t *ptep, 
  */
 static inline void pgd_clear (pgd_t * pgd) { }
 
-#define pgd_page(pgd) \
-((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+	return pgd_val(pgd) >> PAGE_SHIFT;
+}
+
+#define pgd_page(pgd)		pfn_to_page(pgd_pfn(pgd))
+
+#define pmd_offset_kernel(pgd, addr)					\
+	((pmd_t *)__va(pgd_val(*(pgd)) & PAGE_MASK) + pmd_index(addr))
 
 /* Find an entry in the second-level page table.. */
-#define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \
-			pmd_index(address))
+#ifdef CONFIG_HIGHPMD
+#define __pmd_offset(pgd, addr, type)					\
+	((pmd_t *)kmap_atomic(pgd_page(*(pgd)), type) + pmd_index(addr))
+#define __pmd_unmap(pmd, type)		kunmap_atomic(pmd, type)
+#else
+#define __pmd_offset(pgd, addr, type)					\
+	((pmd_t *)__va(pgd_val(*(pgd)) & PAGE_MASK) + pmd_index(addr))
+#define __pmd_unmap(pmd, type)		do { } while (0)
+#endif
+
+#define pmd_offset_map(pgd, addr)		__pmd_offset(pgd, addr, KM_PMD0)
+#define pmd_offset_map_nested(pgd, addr)	__pmd_offset(pgd, addr, KM_PMD1)
+
+#define pmd_unmap(pmd)				__pmd_unmap(pmd, KM_PMD0)
+#define pmd_unmap_nested(pmd)			__pmd_unmap(pmd, KM_PMD1)
 
 static inline pte_t ptep_get_and_clear(pte_t *ptep)
 {
diff -prauN mm2-2.5.73-1/include/asm-i386/pgtable.h mm2-2.5.73-2/include/asm-i386/pgtable.h
--- mm2-2.5.73-1/include/asm-i386/pgtable.h	2003-06-28 03:09:54.000000000 -0700
+++ mm2-2.5.73-2/include/asm-i386/pgtable.h	2003-06-28 08:18:44.000000000 -0700
@@ -33,11 +33,9 @@
 extern unsigned long empty_zero_page[1024];
 extern pgd_t swapper_pg_dir[1024];
 extern kmem_cache_t *pgd_cache;
-extern kmem_cache_t *pmd_cache;
 extern spinlock_t pgd_lock;
 extern struct list_head pgd_list;
 
-void pmd_ctor(void *, kmem_cache_t *, unsigned long);
 void pgd_ctor(void *, kmem_cache_t *, unsigned long);
 void pgd_dtor(void *, kmem_cache_t *, unsigned long);
 void pgtable_cache_init(void);
diff -prauN mm2-2.5.73-1/include/asm-ia64/pgtable.h mm2-2.5.73-2/include/asm-ia64/pgtable.h
--- mm2-2.5.73-1/include/asm-ia64/pgtable.h	2003-06-22 11:32:39.000000000 -0700
+++ mm2-2.5.73-2/include/asm-ia64/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -257,7 +257,8 @@ ia64_phys_addr_valid (unsigned long addr
 #define pgd_bad(pgd)			(!ia64_phys_addr_valid(pgd_val(pgd)))
 #define pgd_present(pgd)		(pgd_val(pgd) != 0UL)
 #define pgd_clear(pgdp)			(pgd_val(*(pgdp)) = 0UL)
-#define pgd_page(pgd)			((unsigned long) __va(pgd_val(pgd) & _PFN_MASK))
+#define __pgd_page(pgd)			((unsigned long)__va(pgd_val(pgd) & _PFN_MASK))
+#define pgd_page(pgd)			virt_to_page(__pgd_page(pgd))
 
 /*
  * The following have defined behavior only work if pte_present() is true.
@@ -326,7 +327,13 @@ pgd_offset (struct mm_struct *mm, unsign
 
 /* Find an entry in the second-level page table.. */
 #define pmd_offset(dir,addr) \
-	((pmd_t *) pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
+	((pmd_t *)__pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
+
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
 
 /*
  * Find an entry in the third-level page table.  This looks more complicated than it
diff -prauN mm2-2.5.73-1/include/asm-m68k/motorola_pgtable.h mm2-2.5.73-2/include/asm-m68k/motorola_pgtable.h
--- mm2-2.5.73-1/include/asm-m68k/motorola_pgtable.h	2003-06-22 11:32:57.000000000 -0700
+++ mm2-2.5.73-2/include/asm-m68k/motorola_pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -115,6 +115,7 @@ extern inline void pgd_set(pgd_t * pgdp,
 #define __pte_page(pte) ((unsigned long)__va(pte_val(pte) & PAGE_MASK))
 #define __pmd_page(pmd) ((unsigned long)__va(pmd_val(pmd) & _TABLE_MASK))
 #define __pgd_page(pgd) ((unsigned long)__va(pgd_val(pgd) & _TABLE_MASK))
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
 
 
 #define pte_none(pte)		(!pte_val(pte))
diff -prauN mm2-2.5.73-1/include/asm-m68knommu/pgtable.h mm2-2.5.73-2/include/asm-m68knommu/pgtable.h
--- mm2-2.5.73-1/include/asm-m68knommu/pgtable.h	2003-06-22 11:32:56.000000000 -0700
+++ mm2-2.5.73-2/include/asm-m68knommu/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -21,7 +21,12 @@ typedef pte_t *pte_addr_t;
 #define pgd_bad(pgd)		(0)
 #define pgd_clear(pgdp)
 #define kern_addr_valid(addr)	(1)
-#define	pmd_offset(a, b)	((void *)0)
+#define	pmd_offset(a, b)		((void *)0)
+#define	pmd_offset_kernel(a, b)		pmd_offset(a, b)
+#define	pmd_offset_map(a, b)		pmd_offset(a, b)
+#define	pmd_offset_map_nested(a, b)	pmd_offset(a, b)
+#define pmd_unmap(pmd)			do { } while (0)
+#define pmd_unmap_nested(pmd)		do { } while (0)
 
 #define PAGE_NONE	__pgprot(0)
 #define PAGE_SHARED	__pgprot(0)
diff -prauN mm2-2.5.73-1/include/asm-mips64/pgtable.h mm2-2.5.73-2/include/asm-mips64/pgtable.h
--- mm2-2.5.73-1/include/asm-mips64/pgtable.h	2003-06-28 03:09:55.000000000 -0700
+++ mm2-2.5.73-2/include/asm-mips64/pgtable.h	2003-06-28 03:16:19.000000000 -0700
@@ -155,11 +155,13 @@ extern pmd_t empty_bad_pmd_table[2*PAGE_
 #define pmd_page(pmd)		(pfn_to_page(pmd_phys(pmd) >> PAGE_SHIFT))
 #define pmd_page_kernel(pmd)	pmd_val(pmd)
 
-static inline unsigned long pgd_page(pgd_t pgd)
+static inline unsigned long __pgd_page(pgd_t pgd)
 {
 	return pgd_val(pgd);
 }
 
+#define pgd_page(pgd)		virt_to_page(__pgd_page(pgd))
+
 static inline int pte_none(pte_t pte)
 {
 	return !(pte_val(pte) & ~_PAGE_GLOBAL);
@@ -397,7 +399,7 @@ static inline pte_t pte_modify(pte_t pte
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pgd_t * dir, unsigned long address)
 {
-	return (pmd_t *) pgd_page(*dir) +
+	return (pmd_t *)__pgd_page(*dir) +
 	       ((address >> PMD_SHIFT) & (PTRS_PER_PMD - 1));
 }
 
diff -prauN mm2-2.5.73-1/include/asm-parisc/pgtable.h mm2-2.5.73-2/include/asm-parisc/pgtable.h
--- mm2-2.5.73-1/include/asm-parisc/pgtable.h	2003-06-22 11:33:15.000000000 -0700
+++ mm2-2.5.73-2/include/asm-parisc/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -242,7 +242,8 @@ extern unsigned long *empty_zero_page;
 
 
 #ifdef __LP64__
-#define pgd_page(pgd) ((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define __pgd_page(pgd) ((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
 
 /* For 64 bit we have three level tables */
 
@@ -339,11 +340,17 @@ extern inline pte_t pte_modify(pte_t pte
 
 #ifdef __LP64__
 #define pmd_offset(dir,address) \
-((pmd_t *) pgd_page(*(dir)) + (((address)>>PMD_SHIFT) & (PTRS_PER_PMD-1)))
+((pmd_t *)__pgd_page(*(dir)) + (((address)>>PMD_SHIFT) & (PTRS_PER_PMD-1)))
 #else
 #define pmd_offset(dir,addr) ((pmd_t *) dir)
 #endif
 
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
+
 /* Find an entry in the third-level page table.. */ 
 #define pte_index(address) (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE-1))
 #define pte_offset_kernel(pmd, address) \
diff -prauN mm2-2.5.73-1/include/asm-ppc/pgtable.h mm2-2.5.73-2/include/asm-ppc/pgtable.h
--- mm2-2.5.73-1/include/asm-ppc/pgtable.h	2003-06-22 11:32:37.000000000 -0700
+++ mm2-2.5.73-2/include/asm-ppc/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -370,8 +370,9 @@ static inline int pgd_bad(pgd_t pgd)		{ 
 static inline int pgd_present(pgd_t pgd)	{ return 1; }
 #define pgd_clear(xp)				do { } while (0)
 
-#define pgd_page(pgd) \
+#define __pgd_page(pgd) \
 	((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
 
 /*
  * The following only work if pte_present() is true.
diff -prauN mm2-2.5.73-1/include/asm-ppc64/pgtable.h mm2-2.5.73-2/include/asm-ppc64/pgtable.h
--- mm2-2.5.73-1/include/asm-ppc64/pgtable.h	2003-06-22 11:33:18.000000000 -0700
+++ mm2-2.5.73-2/include/asm-ppc64/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -190,7 +190,8 @@ extern unsigned long empty_zero_page[PAG
 #define pgd_bad(pgd)		((pgd_val(pgd)) == 0)
 #define pgd_present(pgd)	(pgd_val(pgd) != 0UL)
 #define pgd_clear(pgdp)		(pgd_val(*(pgdp)) = 0UL)
-#define pgd_page(pgd)		(__bpn_to_ba(pgd_val(pgd))) 
+#define __pgd_page(pgd)		(__bpn_to_ba(pgd_val(pgd))) 
+#define pgd_page(pgd)		virt_to_page(__pgd_page(pgd))
 
 /* 
  * Find an entry in a page-table-directory.  We combine the address region 
@@ -203,12 +204,18 @@ extern unsigned long empty_zero_page[PAG
 
 /* Find an entry in the second-level page table.. */
 #define pmd_offset(dir,addr) \
-  ((pmd_t *) pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
+  ((pmd_t *)__pgd_page(*(dir)) + (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1)))
 
 /* Find an entry in the third-level page table.. */
 #define pte_offset_kernel(dir,addr) \
   ((pte_t *) pmd_page_kernel(*(dir)) + (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
 
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
+
 #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
 #define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
 #define pte_unmap(pte)			do { } while(0)
diff -prauN mm2-2.5.73-1/include/asm-s390/pgtable.h mm2-2.5.73-2/include/asm-s390/pgtable.h
--- mm2-2.5.73-1/include/asm-s390/pgtable.h	2003-06-22 11:33:07.000000000 -0700
+++ mm2-2.5.73-2/include/asm-s390/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -613,6 +613,7 @@ static inline pte_t mk_pte_phys(unsigned
 /* to find an entry in a page-table-directory */
 #define pgd_index(address) ((address >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
 #define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address))
+#define pgd_page(pgd)	virt_to_page(pgd_page_kernel(pgd))
 
 /* to find an entry in a kernel page-table-directory */
 #define pgd_offset_k(address) pgd_offset(&init_mm, address)
@@ -634,6 +635,12 @@ extern inline pmd_t * pmd_offset(pgd_t *
 
 #endif /* __s390x__ */
 
+#define pmd_offset_kernel(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)					do { } while (0)
+#define pmd_unmap_nested(pmd)				do { } while (0)
+
 /* Find an entry in the third-level page table.. */
 #define pte_index(address) (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE-1))
 #define pte_offset_kernel(pmd, address) \
diff -prauN mm2-2.5.73-1/include/asm-sh/pgtable-2level.h mm2-2.5.73-2/include/asm-sh/pgtable-2level.h
--- mm2-2.5.73-1/include/asm-sh/pgtable-2level.h	2003-06-22 11:33:32.000000000 -0700
+++ mm2-2.5.73-2/include/asm-sh/pgtable-2level.h	2003-06-28 03:11:37.000000000 -0700
@@ -48,8 +48,9 @@ static inline void pgd_clear (pgd_t * pg
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
 #define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)
 
-#define pgd_page(pgd) \
+#define __pgd_page(pgd) \
 ((unsigned long) __va(pgd_val(pgd) & PAGE_MASK))
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
 
 static inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
 {
diff -prauN mm2-2.5.73-1/include/asm-sparc/pgtable.h mm2-2.5.73-2/include/asm-sparc/pgtable.h
--- mm2-2.5.73-1/include/asm-sparc/pgtable.h	2003-06-22 11:32:56.000000000 -0700
+++ mm2-2.5.73-2/include/asm-sparc/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -202,10 +202,11 @@ extern unsigned long empty_zero_page;
 /*
  */
 BTFIXUPDEF_CALL_CONST(struct page *, pmd_page, pmd_t)
-BTFIXUPDEF_CALL_CONST(unsigned long, pgd_page, pgd_t)
+BTFIXUPDEF_CALL_CONST(unsigned long, __pgd_page, pgd_t)
 
 #define pmd_page(pmd) BTFIXUP_CALL(pmd_page)(pmd)
-#define pgd_page(pgd) BTFIXUP_CALL(pgd_page)(pgd)
+#define __pgd_page(pgd) BTFIXUP_CALL(__pgd_page)(pgd)
+#define pgd_page(pgd)	virt_to_page(__pgd_page(pgd))
 
 BTFIXUPDEF_SETHI(none_mask)
 BTFIXUPDEF_CALL_CONST(int, pte_present, pte_t)
@@ -352,6 +353,11 @@ extern __inline__ pte_t pte_modify(pte_t
 /* Find an entry in the second-level page table.. */
 BTFIXUPDEF_CALL(pmd_t *, pmd_offset, pgd_t *, unsigned long)
 #define pmd_offset(dir,addr) BTFIXUP_CALL(pmd_offset)(dir,addr)
+#define pmd_offset_kernel(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)					do { } while (0)
+#define pmd_unmap_nested(pmd)				do { } while (0)
 
 /* Find an entry in the third-level page table.. */ 
 BTFIXUPDEF_CALL(pte_t *, pte_offset_kernel, pmd_t *, unsigned long)
diff -prauN mm2-2.5.73-1/include/asm-sparc64/pgtable.h mm2-2.5.73-2/include/asm-sparc64/pgtable.h
--- mm2-2.5.73-1/include/asm-sparc64/pgtable.h	2003-06-22 11:32:31.000000000 -0700
+++ mm2-2.5.73-2/include/asm-sparc64/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -228,7 +228,8 @@ static inline pte_t pte_modify(pte_t ori
 	(pgd_val(*(pgdp)) = (__pa((unsigned long) (pmdp)) >> 11UL))
 #define __pmd_page(pmd)			((unsigned long) __va((pmd_val(pmd)<<11UL)))
 #define pmd_page(pmd) 			virt_to_page((void *)__pmd_page(pmd))
-#define pgd_page(pgd)			((unsigned long) __va((pgd_val(pgd)<<11UL)))
+#define __pgd_page(pgd)			((unsigned long) __va((pgd_val(pgd)<<11UL)))
+#define pgd_page(pgd)			virt_to_page(__pgd_page(pgd))
 #define pte_none(pte) 			(!pte_val(pte))
 #define pte_present(pte)		(pte_val(pte) & _PAGE_PRESENT)
 #define pte_clear(pte)			(pte_val(*(pte)) = 0UL)
@@ -270,8 +271,13 @@ static inline pte_t pte_modify(pte_t ori
 #define pgd_offset_k(address) pgd_offset(&init_mm, address)
 
 /* Find an entry in the second-level page table.. */
-#define pmd_offset(dir, address)	((pmd_t *) pgd_page(*(dir)) + \
+#define pmd_offset(dir, address)	((pmd_t *)__pgd_page(*(dir)) + \
 					((address >> PMD_SHIFT) & (REAL_PTRS_PER_PMD-1)))
+#define pmd_offset_kernel(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)					do { } while (0)
+#define pmd_unmap_nested(pmd)				do { } while (0)
 
 /* Find an entry in the third-level page table.. */
 #define pte_index(dir, address)	((pte_t *) __pmd_page(*(dir)) + \
diff -prauN mm2-2.5.73-1/include/asm-v850/pgtable.h mm2-2.5.73-2/include/asm-v850/pgtable.h
--- mm2-2.5.73-1/include/asm-v850/pgtable.h	2003-06-22 11:32:58.000000000 -0700
+++ mm2-2.5.73-2/include/asm-v850/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -13,6 +13,11 @@ typedef pte_t *pte_addr_t;
 #define pgd_clear(pgdp)		((void)0)
 
 #define	pmd_offset(a, b)	((void *)0)
+#define pmd_offset_kernel(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)	pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)				do { } while (0)
+#define pmd_unmap_nested(pmd)			do { } while (0)
 
 #define kern_addr_valid(addr)	(1)
 
diff -prauN mm2-2.5.73-1/include/asm-x86_64/pgtable.h mm2-2.5.73-2/include/asm-x86_64/pgtable.h
--- mm2-2.5.73-1/include/asm-x86_64/pgtable.h	2003-06-28 03:09:57.000000000 -0700
+++ mm2-2.5.73-2/include/asm-x86_64/pgtable.h	2003-06-28 03:11:37.000000000 -0700
@@ -98,8 +98,9 @@ static inline void set_pml4(pml4_t *dst,
 	pml4_val(*dst) = pml4_val(val); 
 }
 
-#define pgd_page(pgd) \
+#define __pgd_page(pgd) \
 ((unsigned long) __va(pgd_val(pgd) & PHYSICAL_PAGE_MASK))
+#define pgd_page(pgd)		virt_to_page(__pgd_page(pgd))
 
 #define ptep_get_and_clear(xp)	__pte(xchg(&(xp)->pte, 0))
 #define pte_same(a, b)		((a).pte == (b).pte)
@@ -332,8 +333,13 @@ static inline pgd_t *current_pgd_offset_
 #define pmd_page(pmd)		(pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
 
 #define pmd_index(address) (((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
-#define pmd_offset(dir, address) ((pmd_t *) pgd_page(*(dir)) + \
+#define pmd_offset(dir, address) ((pmd_t *)__pgd_page(*(dir)) + \
 			pmd_index(address))
+#define pmd_offset_kernel(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map(pgd, addr)			pmd_offset(pgd, addr)
+#define pmd_offset_map_nested(pgd, addr)		pmd_offset(pgd, addr)
+#define pmd_unmap(pmd)					do { } while (0)
+#define pmd_unmap_nested(pmd)				do { } while (0)
 #define pmd_none(x)	(!pmd_val(x))
 #define pmd_present(x)	(pmd_val(x) & _PAGE_PRESENT)
 #define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
diff -prauN mm2-2.5.73-1/include/linux/mm.h mm2-2.5.73-2/include/linux/mm.h
--- mm2-2.5.73-1/include/linux/mm.h	2003-06-28 03:09:57.000000000 -0700
+++ mm2-2.5.73-2/include/linux/mm.h	2003-06-28 03:11:37.000000000 -0700
@@ -426,8 +426,9 @@ extern void invalidate_mmap_range(struct
 				  loff_t const holelen);
 extern int vmtruncate(struct inode * inode, loff_t offset);
 extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
+pmd_t *FASTCALL(__pmd_alloc_kernel(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
-extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
+pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t **pmd, unsigned long address));
 extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot);
 extern int handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma, unsigned long address, int write_access);
 extern int make_pages_present(unsigned long addr, unsigned long end);
@@ -488,12 +489,11 @@ static inline int set_page_dirty(struct 
  * inlining and the symmetry break with pte_alloc_map() that does all
  * of this out-of-line.
  */
-static inline pmd_t *pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
-{
-	if (pgd_none(*pgd))
-		return __pmd_alloc(mm, pgd, address);
-	return pmd_offset(pgd, address);
-}
+#define pmd_alloc_map(mm, pgd, addr)				\
+	(pgd_none(*(pgd))? __pmd_alloc(mm,pgd,addr): pmd_offset_map(pgd,addr))
+
+#define pmd_alloc_kernel(mm, pgd, addr)				\
+	(pgd_none(*(pgd))? __pmd_alloc_kernel(mm,pgd,addr): pmd_offset_kernel(pgd,addr))
 
 extern void free_area_init(unsigned long * zones_size);
 extern void free_area_init_node(int nid, pg_data_t *pgdat, struct page *pmap,
diff -prauN mm2-2.5.73-1/mm/fremap.c mm2-2.5.73-2/mm/fremap.c
--- mm2-2.5.73-1/mm/fremap.c	2003-06-22 11:32:31.000000000 -0700
+++ mm2-2.5.73-2/mm/fremap.c	2003-06-28 03:11:37.000000000 -0700
@@ -67,11 +67,11 @@ int install_page(struct mm_struct *mm, s
 	pgd = pgd_offset(mm, addr);
 	spin_lock(&mm->page_table_lock);
 
-	pmd = pmd_alloc(mm, pgd, addr);
+	pmd = pmd_alloc_map(mm, pgd, addr);
 	if (!pmd)
 		goto err_unlock;
 
-	pte = pte_alloc_map(mm, pmd, addr);
+	pte = pte_alloc_map(mm, &pmd, addr);
 	if (!pte)
 		goto err_unlock;
 
@@ -82,6 +82,7 @@ int install_page(struct mm_struct *mm, s
 	set_pte(pte, mk_pte(page, prot));
 	pte_chain = page_add_rmap(page, pte, pte_chain);
 	pte_unmap(pte);
+	pmd_unmap(pmd);
 	if (flush)
 		flush_tlb_page(vma, addr);
 	update_mmu_cache(vma, addr, *pte);
diff -prauN mm2-2.5.73-1/mm/memory.c mm2-2.5.73-2/mm/memory.c
--- mm2-2.5.73-1/mm/memory.c	2003-06-28 03:09:58.000000000 -0700
+++ mm2-2.5.73-2/mm/memory.c	2003-06-28 08:35:17.000000000 -0700
@@ -103,6 +103,7 @@ static inline void free_one_pmd(struct m
 static inline void free_one_pgd(struct mmu_gather *tlb, pgd_t * dir)
 {
 	pmd_t * pmd, * md, * emd;
+	struct page *page;
 
 	if (pgd_none(*dir))
 		return;
@@ -111,7 +112,8 @@ static inline void free_one_pgd(struct m
 		pgd_clear(dir);
 		return;
 	}
-	pmd = pmd_offset(dir, 0);
+	page = pgd_page(*dir);
+	pmd = pmd_offset_map(dir, 0);
 	pgd_clear(dir);
 	/*
 	 * Beware if changing the loop below.  It once used int j,
@@ -128,7 +130,8 @@ static inline void free_one_pgd(struct m
 	 */
 	for (md = pmd, emd = pmd + PTRS_PER_PMD; md < emd; md++)
 		free_one_pmd(tlb,md);
-	pmd_free_tlb(tlb, pmd);
+	pmd_unmap(pmd);
+	pmd_free_tlb(tlb, page);
 }
 
 /*
@@ -148,30 +151,40 @@ void clear_page_tables(struct mmu_gather
 	} while (--nr);
 }
 
-pte_t * pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
+/*
+ * error return happens with pmd unmapped
+ */
+pte_t *pte_alloc_map(struct mm_struct *mm, pmd_t **pmd, unsigned long address)
 {
-	if (!pmd_present(*pmd)) {
+	if (!pmd_present(**pmd)) {
+		pgd_t *pgd;
 		struct page *new;
 
+		pmd_unmap(*pmd);
 		spin_unlock(&mm->page_table_lock);
 		new = pte_alloc_one(mm, address);
 		spin_lock(&mm->page_table_lock);
-		if (!new)
+		if (!new) {
+			*pmd = NULL;
 			return NULL;
+		}
+
+		pgd = pgd_offset(mm, address);
+		*pmd = pmd_offset_map(pgd, address);
 
 		/*
 		 * Because we dropped the lock, we should re-check the
 		 * entry, as somebody else could have populated it..
 		 */
-		if (pmd_present(*pmd)) {
+		if (pmd_present(**pmd)) {
 			pte_free(new);
 			goto out;
 		}
 		pgtable_add_rmap(new, mm, address);
-		pmd_populate(mm, pmd, new);
+		pmd_populate(mm, *pmd, new);
 	}
 out:
-	return pte_offset_map(pmd, address);
+	return pte_offset_map(*pmd, address);
 }
 
 pte_t * pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
@@ -256,11 +269,10 @@ skip_copy_pmd_range:	address = (address 
 			continue;
 		}
 
-		src_pmd = pmd_offset(src_pgd, address);
-		dst_pmd = pmd_alloc(dst, dst_pgd, address);
+		dst_pmd = pmd_alloc_map(dst, dst_pgd, address);
 		if (!dst_pmd)
 			goto nomem;
-
+		src_pmd = pmd_offset_map_nested(src_pgd, address);
 		do {
 			pte_t * src_pte, * dst_pte;
 		
@@ -273,15 +285,20 @@ skip_copy_pmd_range:	address = (address 
 				pmd_clear(src_pmd);
 skip_copy_pte_range:
 				address = (address + PMD_SIZE) & PMD_MASK;
-				if (address >= end)
+				if (address >= end) {
+					pmd_unmap(dst_pmd);
+					pmd_unmap_nested(src_pmd);
 					goto out;
+				}
 				goto cont_copy_pmd_range;
 			}
 
-			dst_pte = pte_alloc_map(dst, dst_pmd, address);
+			pmd_unmap_nested(src_pmd);
+			dst_pte = pte_alloc_map(dst, &dst_pmd, address);
 			if (!dst_pte)
 				goto nomem;
 			spin_lock(&src->page_table_lock);	
+			src_pmd = pmd_offset_map_nested(src_pgd, address);
 			src_pte = pte_offset_map_nested(src_pmd, address);
 			do {
 				pte_t pte = *src_pte;
@@ -348,6 +365,8 @@ skip_copy_pte_range:
 				 */
 				pte_unmap_nested(src_pte);
 				pte_unmap(dst_pte);
+				pmd_unmap_nested(src_pmd);
+				pmd_unmap(dst_pmd);
 				spin_unlock(&src->page_table_lock);	
 				spin_unlock(&dst->page_table_lock);	
 				pte_chain = pte_chain_alloc(GFP_KERNEL);
@@ -355,12 +374,16 @@ skip_copy_pte_range:
 				if (!pte_chain)
 					goto nomem;
 				spin_lock(&src->page_table_lock);
+				dst_pmd = pmd_offset_map(dst_pgd, address);
+				src_pmd = pmd_offset_map_nested(src_pgd, address);
 				dst_pte = pte_offset_map(dst_pmd, address);
 				src_pte = pte_offset_map_nested(src_pmd,
 								address);
 cont_copy_pte_range_noset:
 				address += PAGE_SIZE;
 				if (address >= end) {
+					pmd_unmap(dst_pmd);
+					pmd_unmap_nested(src_pmd);
 					pte_unmap_nested(src_pte);
 					pte_unmap(dst_pte);
 					goto out_unlock;
@@ -376,6 +399,8 @@ cont_copy_pmd_range:
 			src_pmd++;
 			dst_pmd++;
 		} while ((unsigned long)src_pmd & PMD_TABLE_MASK);
+		pmd_unmap_nested(src_pmd-1);
+		pmd_unmap(dst_pmd-1);
 	}
 out_unlock:
 	spin_unlock(&src->page_table_lock);
@@ -451,7 +476,7 @@ zap_pmd_range(struct mmu_gather *tlb, pg
 		pgd_clear(dir);
 		return;
 	}
-	pmd = pmd_offset(dir, address);
+	pmd = pmd_offset_map(dir, address);
 	end = address + size;
 	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
 		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
@@ -460,6 +485,7 @@ zap_pmd_range(struct mmu_gather *tlb, pg
 		address = (address + PMD_SIZE) & PMD_MASK; 
 		pmd++;
 	} while (address < end);
+	pmd_unmap(pmd - 1);
 }
 
 void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
@@ -641,20 +667,27 @@ follow_page(struct mm_struct *mm, unsign
 	if (pgd_none(*pgd) || pgd_bad(*pgd))
 		goto out;
 
-	pmd = pmd_offset(pgd, address);
+	pmd = pmd_offset_map(pgd, address);
 	if (pmd_none(*pmd))
-		goto out;
-	if (pmd_huge(*pmd))
-		return follow_huge_pmd(mm, address, pmd, write);
-	if (pmd_bad(*pmd))
-		goto out;
+		goto out_unmap;
+	if (pmd_bad(*pmd)) {
+		pmd_ERROR(*pmd);
+		pmd_clear(pmd);
+		goto out_unmap;
+	}
+	if (pmd_huge(*pmd)) {
+		struct page *page = follow_huge_pmd(mm, address, pmd, write);
+		pmd_unmap(pmd);
+		return page;
+	}
 
 	ptep = pte_offset_map(pmd, address);
 	if (!ptep)
-		goto out;
+		goto out_unmap;
 
 	pte = *ptep;
 	pte_unmap(ptep);
+	pmd_unmap(pmd);
 	if (pte_present(pte)) {
 		if (!write || (pte_write(pte) && pte_dirty(pte))) {
 			pfn = pte_pfn(pte);
@@ -665,6 +698,9 @@ follow_page(struct mm_struct *mm, unsign
 
 out:
 	return NULL;
+out_unmap:
+	pmd_unmap(pmd);
+	goto out;
 }
 
 /* 
@@ -723,7 +759,7 @@ int get_user_pages(struct task_struct *t
 			pgd = pgd_offset_k(pg);
 			if (!pgd)
 				return i ? : -EFAULT;
-			pmd = pmd_offset(pgd, pg);
+			pmd = pmd_offset_kernel(pgd, pg);
 			if (!pmd)
 				return i ? : -EFAULT;
 			pte = pte_offset_kernel(pmd, pg);
@@ -815,8 +851,8 @@ static void zeromap_pte_range(pte_t * pt
 	} while (address && (address < end));
 }
 
-static inline int zeromap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address,
-                                    unsigned long size, pgprot_t prot)
+static inline int zeromap_pmd_range(struct mm_struct *mm, pmd_t **pmd,
+			unsigned long address, unsigned long size, pgprot_t prot)
 {
 	unsigned long end;
 
@@ -825,13 +861,13 @@ static inline int zeromap_pmd_range(stru
 	if (end > PGDIR_SIZE)
 		end = PGDIR_SIZE;
 	do {
-		pte_t * pte = pte_alloc_map(mm, pmd, address);
+		pte_t *pte = pte_alloc_map(mm, pmd, address);
 		if (!pte)
 			return -ENOMEM;
 		zeromap_pte_range(pte, address, end - address, prot);
 		pte_unmap(pte);
 		address = (address + PMD_SIZE) & PMD_MASK;
-		pmd++;
+		(*pmd)++;
 	} while (address && (address < end));
 	return 0;
 }
@@ -851,13 +887,14 @@ int zeromap_page_range(struct vm_area_st
 
 	spin_lock(&mm->page_table_lock);
 	do {
-		pmd_t *pmd = pmd_alloc(mm, dir, address);
+		pmd_t *pmd = pmd_alloc_map(mm, dir, address);
 		error = -ENOMEM;
 		if (!pmd)
 			break;
-		error = zeromap_pmd_range(mm, pmd, address, end - address, prot);
+		error = zeromap_pmd_range(mm, &pmd, address, end - address, prot);
 		if (error)
 			break;
+		pmd_unmap(pmd - 1);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
@@ -892,8 +929,9 @@ static inline void remap_pte_range(pte_t
 	} while (address && (address < end));
 }
 
-static inline int remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned long address, unsigned long size,
-	unsigned long phys_addr, pgprot_t prot)
+static inline int remap_pmd_range(struct mm_struct *mm, pmd_t **pmd,
+				unsigned long address, unsigned long size,
+				unsigned long phys_addr, pgprot_t prot)
 {
 	unsigned long base, end;
 
@@ -904,13 +942,13 @@ static inline int remap_pmd_range(struct
 		end = PGDIR_SIZE;
 	phys_addr -= address;
 	do {
-		pte_t * pte = pte_alloc_map(mm, pmd, base + address);
+		pte_t *pte = pte_alloc_map(mm, pmd, base + address);
 		if (!pte)
 			return -ENOMEM;
 		remap_pte_range(pte, base + address, end - address, address + phys_addr, prot);
 		pte_unmap(pte);
 		address = (address + PMD_SIZE) & PMD_MASK;
-		pmd++;
+		(*pmd)++;
 	} while (address && (address < end));
 	return 0;
 }
@@ -932,13 +970,14 @@ int remap_page_range(struct vm_area_stru
 
 	spin_lock(&mm->page_table_lock);
 	do {
-		pmd_t *pmd = pmd_alloc(mm, dir, from);
+		pmd_t *pmd = pmd_alloc_map(mm, dir, from);
 		error = -ENOMEM;
 		if (!pmd)
 			break;
-		error = remap_pmd_range(mm, pmd, from, end - from, phys_addr + from, prot);
+		error = remap_pmd_range(mm, &pmd, from, end - from, phys_addr + from, prot);
 		if (error)
 			break;
+		pmd_unmap(pmd - 1);
 		from = (from + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (from && (from < end));
@@ -1008,6 +1047,7 @@ static int do_wp_page(struct mm_struct *
 		 * data, but for the moment just pretend this is OOM.
 		 */
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 		printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n",
 				address);
 		goto oom;
@@ -1022,11 +1062,13 @@ static int do_wp_page(struct mm_struct *
 			establish_pte(vma, address, page_table,
 				pte_mkyoung(pte_mkdirty(pte_mkwrite(pte))));
 			pte_unmap(page_table);
+			pmd_unmap(pmd);
 			ret = VM_FAULT_MINOR;
 			goto out;
 		}
 	}
 	pte_unmap(page_table);
+	pmd_unmap(pmd);
 
 	/*
 	 * Ok, we need to copy. Oh, well..
@@ -1046,6 +1088,7 @@ static int do_wp_page(struct mm_struct *
 	 * Re-check the pte - we dropped the lock
 	 */
 	spin_lock(&mm->page_table_lock);
+	pmd = pmd_offset_map(pgd_offset(mm, address), address);
 	page_table = pte_offset_map(pmd, address);
 	if (pte_same(*page_table, pte)) {
 		if (PageReserved(old_page))
@@ -1059,6 +1102,7 @@ static int do_wp_page(struct mm_struct *
 		new_page = old_page;
 	}
 	pte_unmap(page_table);
+	pmd_unmap(pmd);
 	page_cache_release(new_page);
 	page_cache_release(old_page);
 	ret = VM_FAULT_MINOR;
@@ -1227,6 +1271,7 @@ static int do_swap_page(struct mm_struct
 	struct pte_chain *pte_chain = NULL;
 
 	pte_unmap(page_table);
+	pmd_unmap(pmd);
 	spin_unlock(&mm->page_table_lock);
 	page = lookup_swap_cache(entry);
 	if (!page) {
@@ -1238,12 +1283,14 @@ static int do_swap_page(struct mm_struct
 			 * we released the page table lock.
 			 */
 			spin_lock(&mm->page_table_lock);
+			pmd = pmd_offset_map(pgd_offset(mm, address), address);
 			page_table = pte_offset_map(pmd, address);
 			if (pte_same(*page_table, orig_pte))
 				ret = VM_FAULT_OOM;
 			else
 				ret = VM_FAULT_MINOR;
 			pte_unmap(page_table);
+			pmd_unmap(pmd);
 			spin_unlock(&mm->page_table_lock);
 			goto out;
 		}
@@ -1266,9 +1313,11 @@ static int do_swap_page(struct mm_struct
 	 * released the page table lock.
 	 */
 	spin_lock(&mm->page_table_lock);
+	pmd = pmd_offset_map(pgd_offset(mm, address), address);
 	page_table = pte_offset_map(pmd, address);
 	if (!pte_same(*page_table, orig_pte)) {
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 		spin_unlock(&mm->page_table_lock);
 		unlock_page(page);
 		page_cache_release(page);
@@ -1294,6 +1343,7 @@ static int do_swap_page(struct mm_struct
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(vma, address, pte);
+	pmd_unmap(pmd);
 	pte_unmap(page_table);
 	spin_unlock(&mm->page_table_lock);
 out:
@@ -1319,11 +1369,13 @@ do_anonymous_page(struct mm_struct *mm, 
 	pte_chain = pte_chain_alloc(GFP_ATOMIC);
 	if (!pte_chain) {
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 		spin_unlock(&mm->page_table_lock);
 		pte_chain = pte_chain_alloc(GFP_KERNEL);
 		if (!pte_chain)
 			goto no_mem;
 		spin_lock(&mm->page_table_lock);
+		pmd = pmd_offset_map(pgd_offset(mm, addr), addr);
 		page_table = pte_offset_map(pmd, addr);
 	}
 		
@@ -1334,6 +1386,7 @@ do_anonymous_page(struct mm_struct *mm, 
 	if (write_access) {
 		/* Allocate our own private page. */
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 		spin_unlock(&mm->page_table_lock);
 
 		page = alloc_page(GFP_HIGHUSER);
@@ -1342,9 +1395,11 @@ do_anonymous_page(struct mm_struct *mm, 
 		clear_user_highpage(page, addr);
 
 		spin_lock(&mm->page_table_lock);
+		pmd = pmd_offset_map(pgd_offset(mm, addr), addr);
 		page_table = pte_offset_map(pmd, addr);
 
 		if (!pte_none(*page_table)) {
+			pmd_unmap(pmd);
 			pte_unmap(page_table);
 			page_cache_release(page);
 			spin_unlock(&mm->page_table_lock);
@@ -1360,6 +1415,7 @@ do_anonymous_page(struct mm_struct *mm, 
 	set_pte(page_table, entry);
 	/* ignores ZERO_PAGE */
 	pte_chain = page_add_rmap(page, page_table, pte_chain);
+	pmd_unmap(pmd);
 	pte_unmap(page_table);
 
 	/* No need to invalidate - it was non-present before */
@@ -1402,6 +1458,7 @@ do_no_page(struct mm_struct *mm, struct 
 		return do_anonymous_page(mm, vma, page_table,
 					pmd, write_access, address);
 	pte_unmap(page_table);
+	pmd_unmap(pmd);
 
 	mapping = vma->vm_file->f_dentry->d_inode->i_mapping;
 	sequence = atomic_read(&mapping->truncate_count);
@@ -1446,6 +1503,7 @@ retry:
 		page_cache_release(new_page);
 		goto retry;
 	}
+	pmd = pmd_offset_map(pgd_offset(mm, address), address);
 	page_table = pte_offset_map(pmd, address);
 
 	/*
@@ -1468,9 +1526,11 @@ retry:
 		set_pte(page_table, entry);
 		pte_chain = page_add_rmap(new_page, page_table, pte_chain);
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 	} else {
 		/* One of our sibling threads was faster, back out. */
 		pte_unmap(page_table);
+		pmd_unmap(pmd);
 		page_cache_release(new_page);
 		spin_unlock(&mm->page_table_lock);
 		ret = VM_FAULT_MINOR;
@@ -1514,6 +1574,7 @@ static int do_file_page(struct mm_struct
 	pgoff = pte_to_pgoff(*pte);
 
 	pte_unmap(pte);
+	pmd_unmap(pmd);
 	spin_unlock(&mm->page_table_lock);
 
 	err = vma->vm_ops->populate(vma, address & PAGE_MASK, PAGE_SIZE, vma->vm_page_prot, pgoff, 0);
@@ -1574,6 +1635,7 @@ static inline int handle_pte_fault(struc
 	entry = pte_mkyoung(entry);
 	establish_pte(vma, address, pte, entry);
 	pte_unmap(pte);
+	pmd_unmap(pmd);
 	spin_unlock(&mm->page_table_lock);
 	return VM_FAULT_MINOR;
 }
@@ -1600,10 +1662,10 @@ int handle_mm_fault(struct mm_struct *mm
 	 * and the SMP-safe atomic PTE updates.
 	 */
 	spin_lock(&mm->page_table_lock);
-	pmd = pmd_alloc(mm, pgd, address);
+	pmd = pmd_alloc_map(mm, pgd, address);
 
 	if (pmd) {
-		pte_t * pte = pte_alloc_map(mm, pmd, address);
+		pte_t *pte = pte_alloc_map(mm, &pmd, address);
 		if (pte)
 			return handle_pte_fault(mm, vma, address, write_access, pte, pmd);
 	}
@@ -1640,7 +1702,30 @@ pmd_t *__pmd_alloc(struct mm_struct *mm,
 	}
 	pgd_populate(mm, pgd, new);
 out:
-	return pmd_offset(pgd, address);
+	return pmd_offset_map(pgd, address);
+}
+
+pmd_t *__pmd_alloc_kernel(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
+{
+	pmd_t *new;
+
+	spin_unlock(&mm->page_table_lock);
+	new = pmd_alloc_one_kernel(mm, address);
+	spin_lock(&mm->page_table_lock);
+	if (!new)
+		return NULL;
+
+	/*
+	 * Because we dropped the lock, we should re-check the
+	 * entry, as somebody else could have populated it..
+	 */
+	if (pgd_present(*pgd)) {
+		pmd_free(new);
+		goto out;
+	}
+	pgd_populate(mm, pgd, new);
+out:
+	return pmd_offset_kernel(pgd, address);
 }
 
 int make_pages_present(unsigned long addr, unsigned long end)
@@ -1672,7 +1757,7 @@ struct page * vmalloc_to_page(void * vma
 	pte_t *ptep, pte;
   
 	if (!pgd_none(*pgd)) {
-		pmd = pmd_offset(pgd, addr);
+		pmd = pmd_offset_map(pgd, addr);
 		if (!pmd_none(*pmd)) {
 			preempt_disable();
 			ptep = pte_offset_map(pmd, addr);
@@ -1682,6 +1767,7 @@ struct page * vmalloc_to_page(void * vma
 			pte_unmap(ptep);
 			preempt_enable();
 		}
+		pmd_unmap(pmd);
 	}
 	return page;
 }
diff -prauN mm2-2.5.73-1/mm/mprotect.c mm2-2.5.73-2/mm/mprotect.c
--- mm2-2.5.73-1/mm/mprotect.c	2003-06-28 03:09:58.000000000 -0700
+++ mm2-2.5.73-2/mm/mprotect.c	2003-06-28 03:11:37.000000000 -0700
@@ -73,7 +73,7 @@ change_pmd_range(pgd_t *pgd, unsigned lo
 		pgd_clear(pgd);
 		return;
 	}
-	pmd = pmd_offset(pgd, address);
+	pmd = pmd_offset_map(pgd, address);
 	address &= ~PGDIR_MASK;
 	end = address + size;
 	if (end > PGDIR_SIZE)
@@ -83,6 +83,7 @@ change_pmd_range(pgd_t *pgd, unsigned lo
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address && (address < end));
+	pmd_unmap(pmd - 1);
 }
 
 static void
diff -prauN mm2-2.5.73-1/mm/mremap.c mm2-2.5.73-2/mm/mremap.c
--- mm2-2.5.73-1/mm/mremap.c	2003-06-28 03:09:58.000000000 -0700
+++ mm2-2.5.73-2/mm/mremap.c	2003-06-28 03:11:37.000000000 -0700
@@ -38,7 +38,7 @@ static pte_t *get_one_pte_map_nested(str
 		goto end;
 	}
 
-	pmd = pmd_offset(pgd, addr);
+	pmd = pmd_offset_map_nested(pgd, addr);
 	if (pmd_none(*pmd))
 		goto end;
 	if (pmd_bad(*pmd)) {
@@ -53,6 +53,7 @@ static pte_t *get_one_pte_map_nested(str
 		pte = NULL;
 	}
 end:
+	pmd_unmap_nested(pmd);
 	return pte;
 }
 
@@ -61,12 +62,15 @@ static inline int page_table_present(str
 {
 	pgd_t *pgd;
 	pmd_t *pmd;
+	int ret;
 
 	pgd = pgd_offset(mm, addr);
 	if (pgd_none(*pgd))
 		return 0;
-	pmd = pmd_offset(pgd, addr);
-	return pmd_present(*pmd);
+	pmd = pmd_offset_map(pgd, addr);
+	ret = pmd_present(*pmd);
+	pmd_unmap(pmd);
+	return ret != 0;
 }
 #else
 #define page_table_present(mm, addr)	(1)
@@ -77,9 +81,10 @@ static inline pte_t *alloc_one_pte_map(s
 	pmd_t *pmd;
 	pte_t *pte = NULL;
 
-	pmd = pmd_alloc(mm, pgd_offset(mm, addr), addr);
+	pmd = pmd_alloc_map(mm, pgd_offset(mm, addr), addr);
 	if (pmd)
-		pte = pte_alloc_map(mm, pmd, addr);
+		pte = pte_alloc_map(mm, &pmd, addr);
+	pmd_unmap(pmd);
 	return pte;
 }
 
diff -prauN mm2-2.5.73-1/mm/msync.c mm2-2.5.73-2/mm/msync.c
--- mm2-2.5.73-1/mm/msync.c	2003-06-22 11:32:42.000000000 -0700
+++ mm2-2.5.73-2/mm/msync.c	2003-06-28 03:11:37.000000000 -0700
@@ -82,7 +82,7 @@ static inline int filemap_sync_pmd_range
 		pgd_clear(pgd);
 		return 0;
 	}
-	pmd = pmd_offset(pgd, address);
+	pmd = pmd_offset_map(pgd, address);
 	if ((address & PGDIR_MASK) != (end & PGDIR_MASK))
 		end = (address & PGDIR_MASK) + PGDIR_SIZE;
 	error = 0;
@@ -91,6 +91,7 @@ static inline int filemap_sync_pmd_range
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address && (address < end));
+	pmd_unmap(pmd - 1);
 	return error;
 }
 
diff -prauN mm2-2.5.73-1/mm/slab.c mm2-2.5.73-2/mm/slab.c
--- mm2-2.5.73-1/mm/slab.c	2003-06-28 03:09:58.000000000 -0700
+++ mm2-2.5.73-2/mm/slab.c	2003-06-28 03:27:00.000000000 -0700
@@ -2717,7 +2717,7 @@ void ptrinfo(unsigned long addr)
 			printk("No pgd.\n");
 			break;
 		}
-		pmd = pmd_offset(pgd, addr);
+		pmd = pmd_offset_kernel(pgd, addr);
 		if (pmd_none(*pmd)) {
 			printk("No pmd.\n");
 			break;
diff -prauN mm2-2.5.73-1/mm/swapfile.c mm2-2.5.73-2/mm/swapfile.c
--- mm2-2.5.73-1/mm/swapfile.c	2003-06-28 03:09:58.000000000 -0700
+++ mm2-2.5.73-2/mm/swapfile.c	2003-06-28 03:11:37.000000000 -0700
@@ -448,7 +448,7 @@ static int unuse_pgd(struct vm_area_stru
 		pgd_clear(dir);
 		return 0;
 	}
-	pmd = pmd_offset(dir, address);
+	pmd = pmd_offset_map(dir, address);
 	offset = address & PGDIR_MASK;
 	address &= ~PGDIR_MASK;
 	end = address + size;
@@ -463,6 +463,7 @@ static int unuse_pgd(struct vm_area_stru
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address && (address < end));
+	pmd_unmap(pmd - 1);
 	return 0;
 }
 
diff -prauN mm2-2.5.73-1/mm/vmalloc.c mm2-2.5.73-2/mm/vmalloc.c
--- mm2-2.5.73-1/mm/vmalloc.c	2003-06-22 11:32:56.000000000 -0700
+++ mm2-2.5.73-2/mm/vmalloc.c	2003-06-28 03:11:37.000000000 -0700
@@ -70,7 +70,7 @@ static void unmap_area_pmd(pgd_t *dir, u
 		return;
 	}
 
-	pmd = pmd_offset(dir, address);
+	pmd = pmd_offset_kernel(dir, address);
 	address &= ~PGDIR_MASK;
 	end = address + size;
 	if (end > PGDIR_SIZE)
@@ -159,7 +159,7 @@ int map_vm_area(struct vm_struct *area, 
 	dir = pgd_offset_k(address);
 	spin_lock(&init_mm.page_table_lock);
 	do {
-		pmd_t *pmd = pmd_alloc(&init_mm, dir, address);
+		pmd_t *pmd = pmd_alloc_kernel(&init_mm, dir, address);
 		if (!pmd) {
 			err = -ENOMEM;
 			break;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: 2.5.73-mm2
  2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-28 16:08   ` Christoph Hellwig
  2003-06-28 20:49     ` 2.5.73-mm2 William Lee Irwin III
  2003-06-29  0:34     ` 2.5.73-mm2 Martin J. Bligh
  2003-06-28 23:00   ` 2.5.73-mm2 Andrew Morton
  2003-07-02  3:11   ` 2.5.73-mm2 William Lee Irwin III
  2 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2003-06-28 16:08 UTC (permalink / raw)
  To: William Lee Irwin III, Andrew Morton, linux-kernel, linux-mm

On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
> +config HIGHPMD
> +	bool "Allocate 2nd-level pagetables from highmem"
> +	depends on HIGHMEM64G
> +	help
> +	  The VM uses one pmd entry for each pagetable page of physical
> +	  memory allocated. For systems with extreme amounts of highmem,
> +	  this cannot be tolerated. Setting this option will put
> +	  userspace 2nd-level pagetables in highmem.

Does this make sense for !HIGHPTE?  In fact does it make sense to
carry along HIGHPTE as an option still? ..

> +#ifndef CONFIG_HIGHPMD /* Oh boy. Error reporting is going to blow major goats. */

Any chance you can rearrange the code to avoid the ifndef in favour
of an ifdef?

>  		set_pte(dst_pte, entry);
> +		pmd_unmap(dst_pte);
> +		pmd_unmap_nested(src_pte);

<Lots more pmd_unmap* calls snipped>

Looks like you changed some API so that pmds are now returned mapped?
It might make sense to change their names into foo_map then so the
breakage is at the API level if someone misses updates for the changes.

> +#ifdef CONFIG_HIGHPMD
> +#define	GFP_PMD		(__GFP_REPEAT|__GFP_HIGHMEM|GFP_KERNEL)
> +#else
> +#define GFP_PMD		(__GFP_REPEAT|GFP_KERNEL)
> +#endif

So what?  Do you want to use a space or tab after the #define? :)

Also, given that GFP_PMD is used just once, it's arguable whether it makes
sense to get rid of the definition and use the expanded values directly.


Otherwise the patch looks fine to me and should allow us to get some more
free lowmem on those insanely big 32bit machines.. :)


* Re: 2.5.73-mm2
  2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
@ 2003-06-28 20:49     ` William Lee Irwin III
  2003-06-29  0:34     ` 2.5.73-mm2 Martin J. Bligh
  1 sibling, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-06-28 20:49 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 3005 bytes --]

On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
>> +config HIGHPMD
>> +	bool "Allocate 2nd-level pagetables from highmem"
>> +	depends on HIGHMEM64G
>> +	help
>> +	  The VM uses one pmd entry for each pagetable page of physical
>> +	  memory allocated. For systems with extreme amounts of highmem,
>> +	  this cannot be tolerated. Setting this option will put
>> +	  userspace 2nd-level pagetables in highmem.

On Sat, Jun 28, 2003 at 05:08:37PM +0100, Christoph Hellwig wrote:
> Does this make sense for !HIGHPTE?  In fact does it make sense to
> carry along HIGHPTE as an option still? ..

It's both possible and functional, if that's the question. I basically
decided I'd enforce the minimum policy, though it is possible to
entangle the two. I'll make it depends on HIGHMEM64G && HIGHPTE for now.


On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
>> +#ifndef CONFIG_HIGHPMD /* Oh boy. Error reporting is going to blow major goats. */

On Sat, Jun 28, 2003 at 05:08:37PM +0100, Christoph Hellwig wrote:
> Any chance you can rearrange the code to avoid the ifndef in favour
> of an ifdef?

Okay.


On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
>>  		set_pte(dst_pte, entry);
>> +		pmd_unmap(dst_pte);
>> +		pmd_unmap_nested(src_pte);

On Sat, Jun 28, 2003 at 05:08:37PM +0100, Christoph Hellwig wrote:
> <Lots more pmd_unmap* calls snipped>
> Looks like you changed some API so that pmds are now returned mapped?
> It might make sense to change their names into foo_map then so the
> breakage is at the API level if someone misses updates for the changes.

It's entirely analogous to pte_offset_map()/pte_alloc_map(). I stubbed
out the things for all architectures already, so it should only be a
question of additional functionality for 32-bit architectures
supporting PAE-like mechanisms. It looks like some mips64, m68k, and
other bits fell out of my tree. They're restored in the real content of
this followup with matching definition counts for all arches.


On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
>> +#ifdef CONFIG_HIGHPMD
>> +#define	GFP_PMD		(__GFP_REPEAT|__GFP_HIGHMEM|GFP_KERNEL)
>> +#else
>> +#define GFP_PMD		(__GFP_REPEAT|GFP_KERNEL)
>> +#endif

On Sat, Jun 28, 2003 at 05:08:37PM +0100, Christoph Hellwig wrote:
> So what?  Do you want to use a space or tab after the #define? :)
> Also, given that GFP_PMD is used just once, it's arguable whether it makes
> sense to get rid of the definition and use the expanded values directly.
> Otherwise the patch looks fine to me and should allow to get some more
> free lowmem on those insanely big 32bit machines.. :)

I guess if I'm going to bother at all, shoving both GFP_PMD and GFP_PTE
into gfp.h and updating all arches to cut out the duplicated #ifdefs
would be the way to do it. I have a feeling that won't fly, so it's
ripped out of the following:

(compressed and MIME-attached; unfortunately some MTAs are barfing on it)

-- wli

[-- Attachment #2: hpmd-2.5.73-mm2-2.gz --]
[-- Type: application/octet-stream, Size: 14021 bytes --]


* Re: 2.5.73-mm2
  2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
  2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
@ 2003-06-28 23:00   ` Andrew Morton
  2003-06-28 23:11     ` 2.5.73-mm2 William Lee Irwin III
  2003-07-02  3:11   ` 2.5.73-mm2 William Lee Irwin III
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2003-06-28 23:00 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
>  Here's highpmd.

I taught patch-scripts a new trick:

check_patch()
{
	if grep "^+.*[ 	]$" $P/patches/$1.patch
	then
		echo warning: $1 adds trailing whitespace
	fi
}


+       if (pmd_table != pmd_offset_kernel(pgd, 0)) 
+       pmd = pmd_offset_kernel(pgd, address);         
+#define __pgd_page(pgd)                (__bpn_to_ba(pgd_val(pgd))) 
warning: highpmd adds trailing whitespace

You're far from the worst.   There's some editor out there which
adds trailing tabs all over the place.  I edited the diff.

What architectures has this been tested on?


* Re: 2.5.73-mm2
  2003-06-28 23:00   ` 2.5.73-mm2 Andrew Morton
@ 2003-06-28 23:11     ` William Lee Irwin III
  2003-06-29 12:45       ` 2.5.73-mm2 Zwane Mwaikambo
  0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-06-28 23:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>>  Here's highpmd.

On Sat, Jun 28, 2003 at 04:00:13PM -0700, Andrew Morton wrote:
> I taught patch-scripts a new trick:
> check_patch()
> {
> 	if grep "^+.*[ 	]$" $P/patches/$1.patch
> 	then
> 		echo warning: $1 adds trailing whitespace
> 	fi
> }

William Lee Irwin III <wli@holomorphy.com> wrote:
> +       if (pmd_table != pmd_offset_kernel(pgd, 0)) 
> +       pmd = pmd_offset_kernel(pgd, address);         
> +#define __pgd_page(pgd)                (__bpn_to_ba(pgd_val(pgd))) 

On Sat, Jun 28, 2003 at 04:00:13PM -0700, Andrew Morton wrote:
> warning: highpmd adds trailing whitespace
> You're far from the worst.   There's some editor out there which
> adds trailing tabs all over the place.  I edited the diff.

This is not my editor; it is either a manual screwup or cut and paste
(inside vi(1) and/or ed(1) buffers) of code which did so beforehand.

Thanks for cleaning that up for me; it's one of my own largest pet
peeves and I'm not terribly pleased to hear I've committed it (though
I'll not go so far as to shoot the messenger).


On Sat, Jun 28, 2003 at 04:00:13PM -0700, Andrew Morton wrote:
> What architectures has this been tested on?

i386 only, CONFIG_HIGHMEM64G with various combinations of highpte &
highpmd, and nohighmem. No CONFIG_HIGHMEM4G or non-i386 machines that
can run 2.5.x are within my grasp (obviously CONFIG_HIGHMEM4G machines
could, I just don't have them, and the discontig code barfs on mem=).


-- wli


* Re: 2.5.73-mm2
  2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
  2003-06-28 20:49     ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-29  0:34     ` Martin J. Bligh
  2003-06-29  2:18       ` 2.5.73-mm2 William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2003-06-29  0:34 UTC (permalink / raw)
  To: Christoph Hellwig, William Lee Irwin III, Andrew Morton,
	linux-kernel, linux-mm

--Christoph Hellwig <hch@infradead.org> wrote (on Saturday, June 28, 2003 17:08:37 +0100):

> On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
>> +config HIGHPMD
>> +	bool "Allocate 2nd-level pagetables from highmem"
>> +	depends on HIGHMEM64G
>> +	help
>> +	  The VM uses one pmd entry for each pagetable page of physical
>> +	  memory allocated. For systems with extreme amounts of highmem,
>> +	  this cannot be tolerated. Setting this option will put
>> +	  userspace 2nd-level pagetables in highmem.
> 
> Does this make sense for !HIGHPTE?  In fact does it make sense to
> carry along HIGHPTE as an option still? ..

Last time I measured it, it had about a 10% overhead in kernel time.
Seems like a good thing to keep as an option to me. Bill said he
had some other code to alleviate the overhead, but I don't think
it's merged ... I'd rather see UKVA (permanently map the pagetables
on a per-process basis) merged before it becomes "not an option" -
that gets rid of all the kmapping.
 
M.


* Re: 2.5.73-mm2
  2003-06-29  0:34     ` 2.5.73-mm2 Martin J. Bligh
@ 2003-06-29  2:18       ` William Lee Irwin III
  2003-06-29  3:07         ` 2.5.73-mm2 Martin J. Bligh
  0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-06-29  2:18 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Christoph Hellwig, Andrew Morton, linux-kernel, linux-mm

On Sat, Jun 28, 2003 at 05:34:05PM -0700, Martin J. Bligh wrote:
> Last time I measured it, it had about a 10% overhead in kernel time.
> Seems like a good thing to keep as an option to me. Bill said he
> had some other code to alleviate the overhead, but I don't think
> it's merged ... I'd rather see UKVA (permanently map the pagetables
> on a per-process basis) merged before it becomes "not an option" -
> that gets rid of all the kmapping.

There are several orthogonal things going on here. One is dropping the
hooks in the right places to get various concrete tasks done. Another
is general resource scalability vs. raw overhead tradeoffs. The last
one is gathering a wide enough repertoire of core hooks that arches can
use "advanced" techniques like recursive pagetables when they require
various kinds of intervention by the kernel to use.

This is just another set of hooks we'll need for our end goal, with a
fully functional implementation. It has direct applications and is
completely usable now for resource scalability albeit with some
overhead. Things are all headed in the appropriate directions; the
hooks do not conflict with and do not require any core modifications
whatsoever in order to use in combination with recursive pagetables;
they can simply recover information from already-available places and
transparently replace the highpmd and highpte arch code.

I can work directly with Dave to arrange a proper demonstration of this
(i.e. fully functional implementation) if need be. I've largely avoided
interceding in recursive pagetable mechanics in order not to duplicate
work.


-- wli


* Re: 2.5.73-mm2
  2003-06-29  2:18       ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-29  3:07         ` Martin J. Bligh
  0 siblings, 0 replies; 23+ messages in thread
From: Martin J. Bligh @ 2003-06-29  3:07 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Christoph Hellwig, Andrew Morton, linux-kernel, linux-mm

--William Lee Irwin III <wli@holomorphy.com> wrote (on Saturday, June 28, 2003 19:18:09 -0700):

> On Sat, Jun 28, 2003 at 05:34:05PM -0700, Martin J. Bligh wrote:
>> Last time I measured it, it had about a 10% overhead in kernel time.
>> Seems like a good thing to keep as an option to me. Bill said he
>> had some other code to alleviate the overhead, but I don't think
>> it's merged ... I'd rather see UKVA (permanently map the pagetables
>> on a per-process basis) merged before it becomes "not an option" -
>> that gets rid of all the kmapping.
> 
> There are several orthogonal things going on here. One is dropping the
> hooks in the right places to get various concrete tasks done. Another
> is general resource scalability vs. raw overhead tradeoffs. The last
> one is gathering a wide enough repertoire of core hooks that arches can
> use "advanced" techniques like recursive pagetables when they require
> various kinds of intervention by the kernel to use.
> 
> This is just another set of hooks we'll need for our end goal, with a
> fully functional implementation. It has direct applications and is
> completely usable now for resource scalability albeit with some
> overhead. Things are all headed in the appropriate directions; the
> hooks do not conflict with and do not require any core modifications
> whatsoever in order to use in combination with recursive pagetables;
> they can simply recover information from already-available places and
> transparently replace the highpmd and highpte arch code.
> 
> I can work directly with Dave to arrange a proper demonstration of this
> (i.e. fully functional implementation) if need be. I've largely avoided
> interceding in recursive pagetable mechanics in order not to duplicate
> work.

Right, I'm not against what you're doing - I'm totally for it. My only
concern was that whilst it has some overhead, it should stay as a config
option (which you did). That lets people make the call of overhead vs
resource scaling.

Your patch is fine - just the talk of removing the config option scared 
me ;-)

M.




* Re: 2.5.73-mm2
  2003-06-28 23:11     ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-29 12:45       ` Zwane Mwaikambo
  0 siblings, 0 replies; 23+ messages in thread
From: Zwane Mwaikambo @ 2003-06-29 12:45 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel, linux-mm

On Sat, 28 Jun 2003, William Lee Irwin III wrote:

> On Sat, Jun 28, 2003 at 04:00:13PM -0700, Andrew Morton wrote:
> > What architectures has this been tested on?
> 
> i386 only, CONFIG_HIGHMEM64G with various combinations of highpte &
> highpmd, and nohighmem. No CONFIG_HIGHMEM4G or non-i386 machines that
> can run 2.5.x are within my grasp (obviously CONFIG_HIGHMEM4G machines
> could, I just don't have them, and the discontig code barfs on mem=).

It comes up fine on a CONFIG_HIGHMEM4G (16G) box.

-- 
function.linuxpower.ca


* [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927
  2003-06-28  3:21 2.5.73-mm2 Andrew Morton
  2003-06-28  8:56 ` 2.5.73-mm2 William Lee Irwin III
  2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
@ 2003-06-29 19:04 ` Adrian Bunk
  2003-06-29 21:04   ` Ralf Baechle
  2003-07-01  0:39 ` 2.5.73-mm2 William Lee Irwin III
  2003-07-01  5:56 ` 2.5.73-mm2 William Lee Irwin III
  4 siblings, 1 reply; 23+ messages in thread
From: Adrian Bunk @ 2003-06-29 19:04 UTC (permalink / raw)
  To: Andrew Morton, ralf; +Cc: linux-kernel, trivial

The following problem seems to come from Linus' tree:

I got an error at the final linking with CONFIG_TC35815 enabled since
the variables tc_readl and tc_writel are not available.

The only place where they are defined is arch/mips/pci/ops-jmr3927.c, so 
I assume the following was intended:


--- linux-2.5.73-mm2/drivers/net/Kconfig.old	2003-06-28 11:14:16.000000000 +0200
+++ linux-2.5.73-mm2/drivers/net/Kconfig	2003-06-29 20:55:16.000000000 +0200
@@ -1397,7 +1397,7 @@
 
 config TC35815
 	tristate "TOSHIBA TC35815 Ethernet support"
-	depends on NET_PCI && PCI
+	depends on NET_PCI && PCI && TOSHIBA_JMR3927
 
 config DGRS
 	tristate "Digi Intl. RightSwitch SE-X support"


cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927
  2003-06-29 19:04 ` [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927 Adrian Bunk
@ 2003-06-29 21:04   ` Ralf Baechle
  0 siblings, 0 replies; 23+ messages in thread
From: Ralf Baechle @ 2003-06-29 21:04 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, linux-kernel, trivial

On Sun, Jun 29, 2003 at 09:04:32PM +0200, Adrian Bunk wrote:

> The following problem seems to come from Linus' tree:
> 
> I got an error at the final linking with CONFIG_TC35815 enabled since
> the variables tc_readl and tc_writel are not available.
> 
> The only place where they are defined is arch/mips/pci/ops-jmr3927.c, so 
> I assume the following was intended:

Not really intended, but it makes sense as this particular board seems
to be the only user of that chip - which has already vanished from
Toshiba's pages anyway ...

  Ralf


* Re: 2.5.73-mm2
  2003-06-28  3:21 2.5.73-mm2 Andrew Morton
                   ` (2 preceding siblings ...)
  2003-06-29 19:04 ` [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927 Adrian Bunk
@ 2003-07-01  0:39 ` William Lee Irwin III
  2003-07-01  2:14   ` 2.5.73-mm2 Andrew Morton
  2003-07-01 10:46   ` 2.5.73-mm2 Hugh Dickins
  2003-07-01  5:56 ` 2.5.73-mm2 William Lee Irwin III
  4 siblings, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-01  0:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

On Fri, Jun 27, 2003 at 08:21:30PM -0700, Andrew Morton wrote:
> Just bits and pieces.

It was suggested during my last round of OOM killer fixes that one of
my patches, which just checked nr_free_buffer_pages() > 0, should also
consider userspace (i.e. reclaimable at will) memory free.

This patch implements that suggestion. Lightly tested, and expected to
fall within the "relatively trivial" category.

We're still not out of hot water here yet, since the minimum thresholds
will still send all processes into perpetual torpor under persistent
low memory conditions while fooling the OOM heuristics. But this is at
least better than complete ignorance of ZONE_NORMAL exhaustion.


-- wli


diff -prauN wli-2.5.73-31/include/linux/mm.h wli-2.5.73-32/include/linux/mm.h
--- wli-2.5.73-31/include/linux/mm.h	2003-06-29 01:39:42.000000000 -0700
+++ wli-2.5.73-32/include/linux/mm.h	2003-06-30 16:42:28.000000000 -0700
@@ -605,7 +605,8 @@ static inline struct vm_area_struct * fi
 
 extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
 
-extern unsigned int nr_used_zone_pages(void);
+unsigned int nr_used_low_pages(void);
+unsigned int nr_used_zone_pages(void);
 
 extern struct page * vmalloc_to_page(void *addr);
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
diff -prauN wli-2.5.73-31/mm/oom_kill.c wli-2.5.73-32/mm/oom_kill.c
--- wli-2.5.73-31/mm/oom_kill.c	2003-06-22 11:32:55.000000000 -0700
+++ wli-2.5.73-32/mm/oom_kill.c	2003-06-30 16:46:49.000000000 -0700
@@ -217,9 +217,9 @@ void out_of_memory(void)
 	unsigned long now, since;
 
 	/*
-	 * Enough swap space left?  Not OOM.
+	 * Enough swap space and ZONE_NORMAL left?  Not OOM.
 	 */
-	if (nr_swap_pages > 0)
+	if (nr_swap_pages > 0 && nr_free_buffer_pages() + nr_used_low_pages() > 0)
 		return;
 
 	spin_lock(&oom_lock);
diff -prauN wli-2.5.73-31/mm/page_alloc.c wli-2.5.73-32/mm/page_alloc.c
--- wli-2.5.73-31/mm/page_alloc.c	2003-06-23 10:53:46.000000000 -0700
+++ wli-2.5.73-32/mm/page_alloc.c	2003-06-30 17:06:20.000000000 -0700
@@ -738,17 +738,6 @@ unsigned int nr_free_pages(void)
 }
 EXPORT_SYMBOL(nr_free_pages);
 
-unsigned int nr_used_zone_pages(void)
-{
-	unsigned int pages = 0;
-	struct zone *zone;
-
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
-
-	return pages;
-}
-
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
@@ -782,6 +771,28 @@ static unsigned int nr_free_zone_pages(i
 	return sum;
 }
 
+static unsigned int __nr_used_zone_pages(int offset)
+{
+	struct zone *zone;
+	unsigned int sum = 0;
+
+	for_each_zone(zone)
+		if (zone - zone->zone_pgdat->node_zones <= offset)
+			sum += zone->nr_active + zone->nr_inactive;
+
+	return sum;
+}
+
+unsigned int nr_used_zone_pages(void)
+{
+	return __nr_used_zone_pages(GFP_HIGHUSER & GFP_ZONEMASK);
+}
+
+unsigned int nr_used_low_pages(void)
+{
+	return __nr_used_zone_pages(GFP_USER & GFP_ZONEMASK);
+}
+
 /*
  * Amount of free RAM allocatable within ZONE_DMA and ZONE_NORMAL
  */


* Re: 2.5.73-mm2
  2003-07-01  0:39 ` 2.5.73-mm2 William Lee Irwin III
@ 2003-07-01  2:14   ` Andrew Morton
  2003-07-01  2:46     ` 2.5.73-mm2 William Lee Irwin III
  2003-07-01 10:46   ` 2.5.73-mm2 Hugh Dickins
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2003-07-01  2:14 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>
>  @@ -217,9 +217,9 @@ void out_of_memory(void)
>   	unsigned long now, since;
>   
>   	/*
>  -	 * Enough swap space left?  Not OOM.
>  +	 * Enough swap space and ZONE_NORMAL left?  Not OOM.
>   	 */
>  -	if (nr_swap_pages > 0)
>  +	if (nr_swap_pages > 0 && nr_free_buffer_pages() + nr_used_low_pages() > 0)
>   		return;

a) if someone is trying to allocate some ZONE_DMA pages and there are
   still swappable or free ZONE_NORMAL pages, nobody gets killed.

b) If there are free ZONE_NORMAL pages then why on earth did we call
   out_of_memory()?  Does nr_free_buffer_pages() ever return non-zero in
   here?  It will do so for a ZONE_DMA allocation, but you're not doing
   them...

Generally, I'm thinking that this test should just be removed.  It is
the responsibility of try_to_free_pages() to work out whether the
allocation can succeed.

If try_to_free_pages() calls out_of_memory() when there are still
swappable, reclaimable or free pages in the relevant zones then
try_to_free_pages() goofed, and needs mending.  out_of_memory()
shouldn't be cleaning up after try_to_free_pages()'s mistakes.

I have a bad feeling that it _will_ goof.  A long time ago I looked
at the amount of scanning we're doing in there and decided that it
was way overkill and reduced it by a lot.  I may have gone overboard.  

So how about we take that test out and see how things get along?




* Re: 2.5.73-mm2
  2003-07-01  2:14   ` 2.5.73-mm2 Andrew Morton
@ 2003-07-01  2:46     ` William Lee Irwin III
  0 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-01  2:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

William Lee Irwin III <wli@holomorphy.com> wrote:
>>  @@ -217,9 +217,9 @@ void out_of_memory(void)
>>   	unsigned long now, since;
>>   
>>   	/*
>>  -	 * Enough swap space left?  Not OOM.
>>  +	 * Enough swap space and ZONE_NORMAL left?  Not OOM.
>>   	 */
>>  -	if (nr_swap_pages > 0)
>>  +	if (nr_swap_pages > 0 && nr_free_buffer_pages() + nr_used_low_pages() > 0)
>>   		return;

On Mon, Jun 30, 2003 at 07:14:56PM -0700, Andrew Morton wrote:
> a) if someone is trying to allocate some ZONE_DMA pages and there are
>    still swappable or free ZONE_NORMAL pages, nobody gets killed.

This is yet another problem for the method above. =(


On Mon, Jun 30, 2003 at 07:14:56PM -0700, Andrew Morton wrote:
> b) If there are free ZONE_NORMAL pages then why on earth did we call
>    out_of_memory()?  Does nr_free_buffer_pages() ever return non-zero in
>    here?  It will do so for a ZONE_DMA allocation, but you're not doing
>    them...

Allocations will enter this path if free memory is below the minimum
page thresholds, since the allocation is, in effect, artificially
failed. Basically, with this in place the system is more likely to
livelock than to go on killing sprees. There's a small amount of
empirical evidence suggesting it avoids livelocking in some common
scenarios, though that really isn't good enough for this kind of affair.


On Mon, Jun 30, 2003 at 07:14:56PM -0700, Andrew Morton wrote:
> Generally, I'm thinking that this test should just be removed.  It is
> the responsibility of try_to_free_pages() to work out whether the
> allocation can succeed.
> If try_to_free_pages() calls out_of_memory() when there are still
> swappable, reclaimable or free pages in the relevant zones then
> try_to_free_pages() goofed, and needs mending.  out_of_memory()
> shouldn't be cleaning up after try_to_free_pages()'s mistakes.
> I have a bad feeling that it _will_ goof.  A long time ago I looked
> at the amount of scanning we're doing in there and decided that it
> was way overkill and reduced it by a lot.  I may have gone overboard.  
> So how's about I and thy take that test out, see how things get along?

I'm not particularly attached to the method, only the result, so I'm game.


-- wli


* Re: 2.5.73-mm2
  2003-06-28  3:21 2.5.73-mm2 Andrew Morton
                   ` (3 preceding siblings ...)
  2003-07-01  0:39 ` 2.5.73-mm2 William Lee Irwin III
@ 2003-07-01  5:56 ` William Lee Irwin III
  4 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-01  5:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

[-- Attachment #1: brief message --]
[-- Type: text/plain, Size: 563 bytes --]

On Fri, Jun 27, 2003 at 08:21:30PM -0700, Andrew Morton wrote:
> Just bits and pieces.

And here is cpumask_t. This enables architectures with NR_CPUS >
BITS_PER_LONG to utilize all those cpus. Tested on ppc64.

This unfortunately has not undergone compile testing for all
architectures, so some amount of source-level breakage is implied.
However, the fixups required should be very simple once some kind
of compile testing is done.

This patch is sent as a compressed MIME attachment to keep various
MTAs from barfing on messages of this size.


-- wli

[-- Attachment #2: cpumask_t-2.5.73-mm2-1.bz2 --]
[-- Type: application/octet-stream, Size: 36623 bytes --]


* Re: 2.5.73-mm2
  2003-07-01  0:39 ` 2.5.73-mm2 William Lee Irwin III
  2003-07-01  2:14   ` 2.5.73-mm2 Andrew Morton
@ 2003-07-01 10:46   ` Hugh Dickins
  2003-07-01 10:51     ` 2.5.73-mm2 William Lee Irwin III
  1 sibling, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2003-07-01 10:46 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, 30 Jun 2003, William Lee Irwin III wrote:
> 
> It was suggested during my last round of OOM killer fixes that one of
> my patches, which just checked nr_free_buffer_pages() > 0, should also
> consider userspace (i.e. reclaimable at will) memory free.

If you pursued it, wouldn't your patch also need to change
nr_free_buffer_pages() to do what you think it does, count
the free lowmem pages?  It, and nr_free_pagecache_pages(),
and nr_free_zone_pages(), are horribly badly named.  They
count present_pages-pages_high, they don't count free pages:
okay for initialization estimates, useless for anything dynamic.

Hugh

p.s. any chance of some more imaginative Subject lines :-?



* Re: 2.5.73-mm2
  2003-07-01 10:46   ` 2.5.73-mm2 Hugh Dickins
@ 2003-07-01 10:51     ` William Lee Irwin III
  2003-07-01 11:08       ` 2.5.73-mm2 Hugh Dickins
  0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-01 10:51 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, 30 Jun 2003, William Lee Irwin III wrote:
>> It was suggested during my last round of OOM killer fixes that one of
>> my patches, which just checked nr_free_buffer_pages() > 0, should also
>> consider userspace (i.e. reclaimable at will) memory free.

On Tue, Jul 01, 2003 at 11:46:34AM +0100, Hugh Dickins wrote:
> If you pursued it, wouldn't your patch also need to change
> nr_free_buffer_pages() to do what you think it does, count
> the free lowmem pages?  It, and nr_free_pagecache_pages(),
> and nr_free_zone_pages(), are horribly badly named.  They
> count present_pages-pages_high, they don't count free pages:
> okay for initialization estimates, useless for anything dynamic.
> Hugh
> p.s. any chance of some more imaginative Subject lines :-?

Well, I was mostly looking for getting handed back 0 when lowmem is
empty; I actually did realize they didn't give entirely accurate counts
of free lowmem pages.


-- wli


* Re: 2.5.73-mm2
  2003-07-01 10:51     ` 2.5.73-mm2 William Lee Irwin III
@ 2003-07-01 11:08       ` Hugh Dickins
  2003-07-01 11:08         ` 2.5.73-mm2 William Lee Irwin III
  0 siblings, 1 reply; 23+ messages in thread
From: Hugh Dickins @ 2003-07-01 11:08 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel, linux-mm

On Tue, 1 Jul 2003, William Lee Irwin III wrote:
> On Tue, Jul 01, 2003 at 11:46:34AM +0100, Hugh Dickins wrote:
> > If you pursued it, wouldn't your patch also need to change
> > nr_free_buffer_pages() to do what you think it does, count
> > the free lowmem pages?  It, and nr_free_pagecache_pages(),
> > and nr_free_zone_pages(), are horribly badly named.  They
> > count present_pages-pages_high, they don't count free pages:
> > okay for initialization estimates, useless for anything dynamic.
> 
> Well, I was mostly looking for getting handed back 0 when lowmem is
> empty; I actually did realize they didn't give entirely accurate counts
> of free lowmem pages.

I'm not pleading for complete accuracy, but nr_free_buffer_pages()
will never hand back 0 (if your system managed to boot).
It's a static count of present_pages (adjusted), not of
free pages.  Or am I misreading nr_free_zone_pages()?

Hugh



* Re: 2.5.73-mm2
  2003-07-01 11:08       ` 2.5.73-mm2 Hugh Dickins
@ 2003-07-01 11:08         ` William Lee Irwin III
  2003-07-01 12:39           ` 2.5.73-mm2 Nikita Danilov
  0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-01 11:08 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm

On Tue, 1 Jul 2003, William Lee Irwin III wrote:
>> Well, I was mostly looking for getting handed back 0 when lowmem is
>> empty; I actually did realize they didn't give entirely accurate counts
>> of free lowmem pages.

On Tue, Jul 01, 2003 at 12:08:03PM +0100, Hugh Dickins wrote:
> I'm not pleading for complete accuracy, but nr_free_buffer_pages()
> will never hand back 0 (if your system managed to boot).
> It's a static count of present_pages (adjusted), not of
> free pages.  Or am I misreading nr_free_zone_pages()?

You're right. Wow, that's even worse than I suspected.


-- wli


* Re: 2.5.73-mm2
  2003-07-01 11:08         ` 2.5.73-mm2 William Lee Irwin III
@ 2003-07-01 12:39           ` Nikita Danilov
  0 siblings, 0 replies; 23+ messages in thread
From: Nikita Danilov @ 2003-07-01 12:39 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, linux-mm

William Lee Irwin III writes:
 > On Tue, 1 Jul 2003, William Lee Irwin III wrote:
 > >> Well, I was mostly looking for getting handed back 0 when lowmem is
 > >> empty; I actually did realize they didn't give entirely accurate counts
 > >> of free lowmem pages.
 > 
 > On Tue, Jul 01, 2003 at 12:08:03PM +0100, Hugh Dickins wrote:
 > > I'm not pleading for complete accuracy, but nr_free_buffer_pages()
 > > will never hand back 0 (if your system managed to boot).
 > > It's a static count of present_pages (adjusted), not of
 > > free pages.  Or am I misreading nr_free_zone_pages()?
 > 
 > You're right. Wow, that's even more worse than I suspected.
 > 

Another thing is that if one boots with mem=X, nr_free_pagecache_pages()
returns X. However, part of X (occupied by the kernel image, etc.) is
not part of any zone. As a result, the zones actually contain fewer
pages than nr_free_pagecache_pages() reports. With X small enough
(comparable to the kernel image size, for example) this can confuse
balance_dirty_pages() badly enough that throttling never starts, and
the VM will oom_kill().

 > 
 > -- wli

Nikita.


* Re: 2.5.73-mm2
  2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
  2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
  2003-06-28 23:00   ` 2.5.73-mm2 Andrew Morton
@ 2003-07-02  3:11   ` William Lee Irwin III
  2 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2003-07-02  3:11 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

On Fri, Jun 27, 2003 at 08:21:30PM -0700, Andrew Morton wrote:
>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.73/2.5.73-mm2/
>> Just bits and pieces.

On Sat, Jun 28, 2003 at 08:54:36AM -0700, William Lee Irwin III wrote:
> Here's highpmd. This allocates L2 pagetables from highmem, decreasing
> the per-process lowmem overhead on CONFIG_HIGHMEM64G from 20KB to 8KB.
> Some attempts were made to update non-i386 architectures to the new
> API's, though they're entirely untested. It's been tested for a while
> in -wli on i386 machines, both lowmem and highmem boxen.

Here's highpmd again, but with the bash-shared-mappings oops fixed.
Some missing s/pmd_alloc()/pmd_alloc_map()/ conversions in non-i386
code are also included in the update.

Included as a MIME attachment to keep MTAs from barfing on its size.


-- wli

[-- Attachment #2: highpmd-2.5.73-mm2-2.bz2 --]
[-- Type: application/octet-stream, Size: 14399 bytes --]


end of thread, newest: ~2003-07-02  2:57 UTC

Thread overview: 23+ messages
2003-06-28  3:21 2.5.73-mm2 Andrew Morton
2003-06-28  8:56 ` 2.5.73-mm2 William Lee Irwin III
2003-06-28 15:54 ` 2.5.73-mm2 William Lee Irwin III
2003-06-28 16:08   ` 2.5.73-mm2 Christoph Hellwig
2003-06-28 20:49     ` 2.5.73-mm2 William Lee Irwin III
2003-06-29  0:34     ` 2.5.73-mm2 Martin J. Bligh
2003-06-29  2:18       ` 2.5.73-mm2 William Lee Irwin III
2003-06-29  3:07         ` 2.5.73-mm2 Martin J. Bligh
2003-06-28 23:00   ` 2.5.73-mm2 Andrew Morton
2003-06-28 23:11     ` 2.5.73-mm2 William Lee Irwin III
2003-06-29 12:45       ` 2.5.73-mm2 Zwane Mwaikambo
2003-07-02  3:11   ` 2.5.73-mm2 William Lee Irwin III
2003-06-29 19:04 ` [patch] 2.5.73-mm2: let CONFIG_TC35815 depend on CONFIG_TOSHIBA_JMR3927 Adrian Bunk
2003-06-29 21:04   ` Ralf Baechle
2003-07-01  0:39 ` 2.5.73-mm2 William Lee Irwin III
2003-07-01  2:14   ` 2.5.73-mm2 Andrew Morton
2003-07-01  2:46     ` 2.5.73-mm2 William Lee Irwin III
2003-07-01 10:46   ` 2.5.73-mm2 Hugh Dickins
2003-07-01 10:51     ` 2.5.73-mm2 William Lee Irwin III
2003-07-01 11:08       ` 2.5.73-mm2 Hugh Dickins
2003-07-01 11:08         ` 2.5.73-mm2 William Lee Irwin III
2003-07-01 12:39           ` 2.5.73-mm2 Nikita Danilov
2003-07-01  5:56 ` 2.5.73-mm2 William Lee Irwin III
