linux-kernel.vger.kernel.org archive mirror
* Re: [patch 00/33] 2.6.20-stable review
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
@ 2007-04-26 16:48   ` David Lang
  2007-04-26 17:30     ` Greg KH
  2007-04-26 16:54   ` [patch 01/33] knfsd: Use a spinlock to protect sk_info_authunix Greg KH
                     ` (35 subsequent siblings)
  36 siblings, 1 reply; 48+ messages in thread
From: David Lang @ 2007-04-26 16:48 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, stable

any idea why there are so many more -stable patches for 2.6.20? this is the 10th 
-stable series, and most of them have been dozens of patches.

is there a new team reporting and fixing bugs? or were there just more small
problems found in 2.6.20 than normal? or something else?

David Lang

On Thu, 26 Apr 2007, Greg KH wrote:

> This is the start of the stable review cycle for the 2.6.20.10 release.
> There are 33 patches in this series, all will be posted as a response to
> this one.  If anyone has any issues with these being applied, please let
> us know.  If anyone is a maintainer of the proper subsystem, and wants
> to add a Signed-off-by: line to the patch, please respond with it.
>
> These patches are sent out with a number of different people on the Cc:
> line.  If you wish to be a reviewer, please email stable@kernel.org to
> add your name to the list.  If you want to be off the reviewer list,
> also email us.
>
> Responses should be made by April 29, 00:00:00 UTC.  Anything received
> after that time might be too late.
>
> thanks,
>
> the -stable release team
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [patch 00/33] 2.6.20-stable review
@ 2007-04-26 16:54 ` Greg KH
  2007-04-26 16:48   ` David Lang
                     ` (36 more replies)
  0 siblings, 37 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan

This is the start of the stable review cycle for the 2.6.20.10 release.
There are 33 patches in this series, all will be posted as a response to
this one.  If anyone has any issues with these being applied, please let
us know.  If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Cc:
line.  If you wish to be a reviewer, please email stable@kernel.org to
add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by April 29, 00:00:00 UTC.  Anything received
after that time might be too late.

thanks,

the -stable release team

* [patch 01/33] knfsd: Use a spinlock to protect sk_info_authunix
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
  2007-04-26 16:48   ` David Lang
@ 2007-04-26 16:54   ` Greg KH
  2007-04-26 16:55   ` [patch 02/33] IB/mthca: Fix data corruption after FMR unmap on Sinai Greg KH
                     ` (34 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:54 UTC (permalink / raw)
  To: linux-kernel, stable, Andrew Morton
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, alan, Gabriel Barazer, nfs, Greg Banks,
	Neil Brown

[-- Attachment #1: knfsd-use-a-spinlock-to-protect-sk_info_authunix.patch --]
[-- Type: text/plain, Size: 2146 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: NeilBrown <neilb@suse.de>

sk_info_authunix is not being protected properly so the object that
it points to can be cache_put twice, leading to corruption.

We borrow svsk->sk_defer_lock to provide the protection.  We should probably
rename that lock to have a more generic name - later.

Thanks to Gabriel for reporting this.

Cc: Greg Banks <gnb@melbourne.sgi.com>
Cc: Gabriel Barazer <gabriel@oxeva.fr>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/sunrpc/svcauth_unix.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -383,7 +383,10 @@ void svcauth_unix_purge(void)
 static inline struct ip_map *
 ip_map_cached_get(struct svc_rqst *rqstp)
 {
-	struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
+	struct ip_map *ipm;
+	struct svc_sock *svsk = rqstp->rq_sock;
+	spin_lock_bh(&svsk->sk_defer_lock);
+	ipm = svsk->sk_info_authunix;
 	if (ipm != NULL) {
 		if (!cache_valid(&ipm->h)) {
 			/*
@@ -391,12 +394,14 @@ ip_map_cached_get(struct svc_rqst *rqstp
 			 * remembered, e.g. by a second mount from the
 			 * same IP address.
 			 */
-			rqstp->rq_sock->sk_info_authunix = NULL;
+			svsk->sk_info_authunix = NULL;
+			spin_unlock_bh(&svsk->sk_defer_lock);
 			cache_put(&ipm->h, &ip_map_cache);
 			return NULL;
 		}
 		cache_get(&ipm->h);
 	}
+	spin_unlock_bh(&svsk->sk_defer_lock);
 	return ipm;
 }
 
@@ -405,9 +410,15 @@ ip_map_cached_put(struct svc_rqst *rqstp
 {
 	struct svc_sock *svsk = rqstp->rq_sock;
 
-	if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
-		svsk->sk_info_authunix = ipm;	/* newly cached, keep the reference */
-	else
+	spin_lock_bh(&svsk->sk_defer_lock);
+	if (svsk->sk_sock->type == SOCK_STREAM &&
+	    svsk->sk_info_authunix == NULL) {
+		/* newly cached, keep the reference */
+		svsk->sk_info_authunix = ipm;
+		ipm = NULL;
+	}
+	spin_unlock_bh(&svsk->sk_defer_lock);
+	if (ipm)
 		cache_put(&ipm->h, &ip_map_cache);
 }
 

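The get/put protocol that the knfsd patch introduces can be sketched in userspace C. This is a minimal sketch, assuming POSIX threads and C11 atomics: `struct entry`, `struct conn`, `cached_get()` and `cached_put()` are hypothetical stand-ins for `struct ip_map`, `struct svc_sock`, `ip_map_cached_get()` and `ip_map_cached_put()`, and a pthread mutex plays the role of sk_defer_lock. The SOCK_STREAM check from the real put path is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted cache entry (stands in for struct ip_map). */
struct entry {
	atomic_int refcount;
	atomic_bool valid;
};

/* Hypothetical per-connection cache slot (stands in for struct svc_sock). */
struct conn {
	pthread_mutex_t lock;	/* plays the role of sk_defer_lock */
	struct entry *cached;	/* plays the role of sk_info_authunix */
};

static void entry_get(struct entry *e) { atomic_fetch_add(&e->refcount, 1); }

/* A real cache would free the entry when the count reaches zero. */
static void entry_put(struct entry *e) { atomic_fetch_sub(&e->refcount, 1); }

/* Return a new reference to the cached entry, or NULL.  A stale entry is
 * detached from the slot while the lock is held, so only one thread can
 * ever drop the reference that the slot owned. */
static struct entry *cached_get(struct conn *c)
{
	struct entry *e;

	pthread_mutex_lock(&c->lock);
	e = c->cached;
	if (e != NULL) {
		if (!atomic_load(&e->valid)) {
			c->cached = NULL;
			pthread_mutex_unlock(&c->lock);
			entry_put(e);	/* drop the slot's reference outside the lock */
			return NULL;
		}
		entry_get(e);
	}
	pthread_mutex_unlock(&c->lock);
	return e;
}

/* Hand a reference back: cache it if the slot is empty, otherwise drop it. */
static void cached_put(struct conn *c, struct entry *e)
{
	assert(e != NULL);
	pthread_mutex_lock(&c->lock);
	if (c->cached == NULL) {
		c->cached = e;	/* the slot now owns the caller's reference */
		e = NULL;
	}
	pthread_mutex_unlock(&c->lock);
	if (e != NULL)
		entry_put(e);
}
```

As in the patch, the slot is tested and updated only under the lock, while the actual put happens after unlocking; a second thread that finds the slot already emptied simply gets NULL instead of putting the same reference again.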
-- 

* [patch 02/33] IB/mthca: Fix data corruption after FMR unmap on Sinai
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
  2007-04-26 16:48   ` David Lang
  2007-04-26 16:54   ` [patch 01/33] knfsd: Use a spinlock to protect sk_info_authunix Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 03/33] HID: zeroing of bytes in output fields is bogus Greg KH
                     ` (33 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, mst, general,
	Michael S. Tsirkin, Roland Dreier

[-- Attachment #1: ib-mthca-fix-data-corruption-after-fmr-unmap-on-sinai.patch --]
[-- Type: text/plain, Size: 1379 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Michael S. Tsirkin <mst@dev.mellanox.co.il>

In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different.  This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.

Fix by re-applying adjust_key() after masking the key.

Thanks to Or Gerlitz for reproducing the problem, and to Ariel Shahar for
help with debugging.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/infiniband/hw/mthca/mthca_mr.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -751,6 +751,7 @@ void mthca_arbel_fmr_unmap(struct mthca_
 
 	key = arbel_key_to_hw_index(fmr->ibmr.lkey);
 	key &= dev->limits.num_mpts - 1;
+	key = adjust_key(dev, key);
 	fmr->ibmr.lkey = fmr->ibmr.rkey = arbel_hw_index_to_key(key);
 
 	fmr->maps = 0;

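The bit 3 / bit 23 invariant from the mthca commit message can be checked in isolation. This is a sketch only: `sinai_adjust_key()` is modeled on what adjust_key() in mthca_mr.c appears to do when the Sinai optimization is on (copy bit 3 into bit 23 and keep the low 23 bits), the flag check and the arbel_key_to_hw_index() rotation are omitted, and the table size of 1 << 17 MPTs is a hypothetical value chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copy bit 3 of the key into bit 23, keeping the low 23 bits (modeled on
 * adjust_key() with the Sinai throughput optimization enabled). */
static uint32_t sinai_adjust_key(uint32_t key)
{
	return ((key << 20) & 0x800000u) | (key & 0x7fffffu);
}

/* The promise the driver makes to the HCA firmware. */
static bool sinai_key_ok(uint32_t key)
{
	return ((key >> 3) & 1u) == ((key >> 23) & 1u);
}

/* Mask a key index down to a power-of-two MPT table size, as the unmap
 * path does; "fixed" re-applies the adjustment as the patch does. */
static uint32_t unmap_key_index(uint32_t key, uint32_t num_mpts, bool fixed)
{
	assert(num_mpts != 0 && (num_mpts & (num_mpts - 1)) == 0);
	key &= num_mpts - 1;
	return fixed ? sinai_adjust_key(key) : key;
}
```

With a key of 0x800008 (bits 3 and 23 both set), masking alone yields 0x8, which breaks the invariant; re-applying the adjustment restores 0x800008.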
-- 

* [patch 03/33] HID: zeroing of bytes in output fields is bogus
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (2 preceding siblings ...)
  2007-04-26 16:55   ` [patch 02/33] IB/mthca: Fix data corruption after FMR unmap on Sinai Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 04/33] KVM: MMU: Fix guest writes to nonpae pde Greg KH
                     ` (32 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Jiri Kosina

[-- Attachment #1: hid-zeroing-of-bytes-in-output-fields-is-bogus.patch --]
[-- Type: text/plain, Size: 1609 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Jiri Kosina <jkosina@suse.cz>

HID: zeroing of bytes in output fields is bogus

This patch removes bogus zeroing of unused bits in output reports,
introduced in Simon's patch in commit d4ae650a.
According to the specification, any sane device should not care
about values of unused bits.

What is worse, the zeroing is done in a broken way and might clear
bits in output reports that are actually _used_, because it indexes
data[] from the start of the report and ignores the field's offset.
A device with multiple fields, each holding a single 1-bit value,
shows why this is bogus: the second call of hid_output_field() would
clear the first bit of the report, which had already been set up.

This patch will break LEDs on SpaceNavigator, because this device
is broken and takes into account the bits which it shouldn't touch.
The quirk for this particular device will be provided in a separate
patch.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/hid/hid-core.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -876,10 +876,6 @@ static void hid_output_field(struct hid_
 	unsigned size = field->report_size;
 	unsigned n;
 
-	/* make sure the unused bits in the last byte are zeros */
-	if (count > 0 && size > 0)
-		data[(count*size-1)/8] = 0;
-
 	for (n = 0; n < count; n++) {
 		if (field->logical_minimum < 0)	/* signed values */
 			implement(data, offset + n * size, size, s32ton(field->value[n], size));

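The failure described in the HID commit message can be reproduced with a small userspace model. `put_bits()` is a simplified, hypothetical stand-in for hid-core's implement() (LSB-first bit packing is assumed), and `output_field()` repeats the removed zeroing when `buggy_zeroing` is set. With two 1-bit fields at bit offsets 0 and 1, both set to 1, the old code leaves the byte at 0x02 and the fixed code at 0x03.

```c
#include <assert.h>
#include <stdint.h>

/* Write `size` bits of `value` at bit position `offset`, least significant
 * bit first.  A simplified, hypothetical stand-in for implement(). */
static void put_bits(uint8_t *report, unsigned offset, unsigned size,
		     uint32_t value)
{
	assert(size <= 32);
	for (unsigned i = 0; i < size; i++) {
		unsigned bit = offset + i;
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		if ((value >> i) & 1u)
			report[bit / 8] |= mask;
		else
			report[bit / 8] &= (uint8_t)~mask;
	}
}

/* Emit a field of `count` values of `size` bits each, starting at bit
 * `offset`.  With buggy_zeroing set it repeats the removed code, which
 * indexes the report from byte 0 and ignores the field's offset. */
static void output_field(uint8_t *report, unsigned offset, unsigned count,
			 unsigned size, const uint32_t *values, int buggy_zeroing)
{
	if (buggy_zeroing && count > 0 && size > 0)
		report[(count * size - 1) / 8] = 0;
	for (unsigned n = 0; n < count; n++)
		put_bits(report, offset + n * size, size, values[n]);
}
```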
-- 

* [patch 04/33] KVM: MMU: Fix guest writes to nonpae pde
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (3 preceding siblings ...)
  2007-04-26 16:55   ` [patch 03/33] HID: zeroing of bytes in output fields is bogus Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 05/33] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram Greg KH
                     ` (31 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, kvm-devel, Avi Kivity,
	Ingo Molnar

[-- Attachment #1: kvm-mmu-fix-guest-writes-to-nonpae-pde.patch --]
[-- Type: text/plain, Size: 3154 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Avi Kivity <avi@qumranet.com>

KVM shadow page tables are always in pae mode, regardless of the guest
setting.  This means that a guest pde (mapping 4MB of memory) is mapped
to two shadow pdes (mapping 2MB each).

When the guest writes to a pte or pde, we intercept the write and emulate it.
We also remove any shadowed mappings corresponding to the write.  Since the
mmu did not account for the doubling in the number of pdes, it removed the
wrong entry, resulting in a mismatch between shadow page tables and guest
page tables, followed shortly by guest memory corruption.

This patch fixes the problem by detecting the special case of writing to
a non-pae pde and adjusting the address and number of shadow pdes zapped
accordingly.

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/kvm/mmu.c |   47 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 12 deletions(-)

--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -1093,22 +1093,40 @@ out:
 	return r;
 }
 
+static void mmu_pre_write_zap_pte(struct kvm_vcpu *vcpu,
+				  struct kvm_mmu_page *page,
+				  u64 *spte)
+{
+	u64 pte;
+	struct kvm_mmu_page *child;
+
+	pte = *spte;
+	if (is_present_pte(pte)) {
+		if (page->role.level == PT_PAGE_TABLE_LEVEL)
+			rmap_remove(vcpu, spte);
+		else {
+			child = page_header(pte & PT64_BASE_ADDR_MASK);
+			mmu_page_remove_parent_pte(vcpu, child, spte);
+		}
+	}
+	*spte = 0;
+}
+
 void kvm_mmu_pre_write(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *page;
-	struct kvm_mmu_page *child;
 	struct hlist_node *node, *n;
 	struct hlist_head *bucket;
 	unsigned index;
 	u64 *spte;
-	u64 pte;
 	unsigned offset = offset_in_page(gpa);
 	unsigned pte_size;
 	unsigned page_offset;
 	unsigned misaligned;
 	int level;
 	int flooded = 0;
+	int npte;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __FUNCTION__, gpa, bytes);
 	if (gfn == vcpu->last_pt_write_gfn) {
@@ -1144,22 +1162,27 @@ void kvm_mmu_pre_write(struct kvm_vcpu *
 		}
 		page_offset = offset;
 		level = page->role.level;
+		npte = 1;
 		if (page->role.glevels == PT32_ROOT_LEVEL) {
-			page_offset <<= 1;          /* 32->64 */
+			page_offset <<= 1;	/* 32->64 */
+			/*
+			 * A 32-bit pde maps 4MB while the shadow pdes map
+			 * only 2MB.  So we need to double the offset again
+			 * and zap two pdes instead of one.
+			 */
+			if (level == PT32_ROOT_LEVEL) {
+				page_offset &= ~7; /* kill rounding error */
+				page_offset <<= 1;
+				npte = 2;
+			}
 			page_offset &= ~PAGE_MASK;
 		}
 		spte = __va(page->page_hpa);
 		spte += page_offset / sizeof(*spte);
-		pte = *spte;
-		if (is_present_pte(pte)) {
-			if (level == PT_PAGE_TABLE_LEVEL)
-				rmap_remove(vcpu, spte);
-			else {
-				child = page_header(pte & PT64_BASE_ADDR_MASK);
-				mmu_page_remove_parent_pte(vcpu, child, spte);
-			}
+		while (npte--) {
+			mmu_pre_write_zap_pte(vcpu, page, spte);
+			++spte;
 		}
-		*spte = 0;
 	}
 }
 

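The offset arithmetic in the KVM non-PAE pde patch can be modeled on its own. `shadow_offset()` is a hypothetical helper that mirrors the page_offset/npte computation in kvm_mmu_pre_write() for a 32-bit guest, assuming 4KB pages; it leaves out the misaligned-write and flooding checks that precede that code in the real function.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumes 4KB pages, as on i386 */

/*
 * Map the byte offset of a guest write inside a 32-bit guest page-table
 * page to the byte offset inside the PAE-format shadow page, and report
 * how many 8-byte shadow entries must be zapped.
 */
static unsigned shadow_offset(unsigned offset, int is_guest_pde, unsigned *npte)
{
	unsigned page_offset;

	assert(offset < SKETCH_PAGE_SIZE);
	page_offset = offset << 1;	/* 4-byte guest entry -> 8-byte shadow entry */
	*npte = 1;
	if (is_guest_pde) {
		/* One 4MB guest pde is shadowed by two 2MB pdes. */
		page_offset &= ~7u;	/* drop the rounding error of a misaligned write */
		page_offset <<= 1;
		*npte = 2;
	}
	return page_offset & (SKETCH_PAGE_SIZE - 1);	/* same as &= ~PAGE_MASK */
}
```

For example, a write to the second guest pde (byte offset 0x4) lands at shadow byte offset 0x10 and zaps two entries (indexes 2 and 3), while a write at the same offset in a guest page table maps to shadow offset 0x8 and a single entry.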
-- 

* [patch 05/33] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (4 preceding siblings ...)
  2007-04-26 16:55   ` [patch 04/33] KVM: MMU: Fix guest writes to nonpae pde Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 06/33] holepunch: fix shmem_truncate_range punching too far Greg KH
                     ` (30 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, kvm-devel, Avi Kivity,
	Ingo Molnar

[-- Attachment #1: kvm-mmu-fix-host-memory-corruption-on-i386-with-4gb-ram.patch --]
[-- Type: text/plain, Size: 1574 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Avi Kivity <avi@qumranet.com>

PAGE_MASK is an unsigned long, so using it to mask physical addresses on
i386 (which are 64 bits wide) leads to truncation.  This can result in
page->private of unrelated memory pages being modified, with disastrous
results.

Fix by not using PAGE_MASK for physical addresses; instead calculate
the correct value directly from PAGE_SIZE.  Also fix a similar BUG_ON().

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/kvm/mmu.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/kvm/mmu.c
+++ b/drivers/kvm/mmu.c
@@ -131,7 +131,7 @@ static int dbg = 1;
 	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & PAGE_MASK)
+#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
 #define PT64_DIR_BASE_ADDR_MASK \
 	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 
@@ -406,8 +406,8 @@ static void rmap_write_protect(struct kv
 			spte = desc->shadow_ptes[0];
 		}
 		BUG_ON(!spte);
-		BUG_ON((*spte & PT64_BASE_ADDR_MASK) !=
-		       page_to_pfn(page) << PAGE_SHIFT);
+		BUG_ON((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT
+		       != page_to_pfn(page));
 		BUG_ON(!(*spte & PT_PRESENT_MASK));
 		BUG_ON(!(*spte & PT_WRITABLE_MASK));
 		rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);

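The truncation fixed in this patch can be reproduced on any host by modelling i386's 32-bit unsigned long with uint32_t. The macro and function names below are hypothetical; only the two mask expressions follow the old and new PT64_BASE_ADDR_MASK definitions. A shadow pte pointing at physical page 0x123456000 loses bit 32 under the old mask.

```c
#include <assert.h>
#include <stdint.h>

/* Model i386, where unsigned long (and hence PAGE_MASK) is 32 bits wide. */
#define I386_PAGE_SIZE	4096u
#define I386_PAGE_MASK	((uint32_t)~(I386_PAGE_SIZE - 1))	/* 0xFFFFF000 */

/* Old definition: the 32-bit mask is zero-extended to 64 bits, so it also
 * clears physical address bits 32..51. */
#define OLD_PT64_BASE_ADDR_MASK	(((1ULL << 52) - 1) & I386_PAGE_MASK)

/* Fixed definition: build the page-offset mask in 64 bits. */
#define NEW_PT64_BASE_ADDR_MASK	(((1ULL << 52) - 1) & ~(uint64_t)(I386_PAGE_SIZE - 1))

static_assert(OLD_PT64_BASE_ADDR_MASK == 0xFFFFF000ULL, "old mask is truncated");
static_assert(NEW_PT64_BASE_ADDR_MASK == 0xFFFFFFFFFF000ULL, "new mask keeps bits 12..51");

/* Extract the physical page address from a shadow pte. */
static uint64_t spte_addr_old(uint64_t spte) { return spte & OLD_PT64_BASE_ADDR_MASK; }
static uint64_t spte_addr_new(uint64_t spte) { return spte & NEW_PT64_BASE_ADDR_MASK; }
```

The BUG_ON() change in the same patch avoids the analogous problem on the other side of the comparison by shifting the masked address down to a pfn instead of shifting the pfn up.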
-- 

* [patch 06/33] holepunch: fix shmem_truncate_range punching too far
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (5 preceding siblings ...)
  2007-04-26 16:55   ` [patch 05/33] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 07/33] holepunch: fix shmem_truncate_range punch locking Greg KH
                     ` (29 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Miklos Szeredi, Hugh Dickins

[-- Attachment #1: holepunch-fix-shmem_truncate_range-punching-too-far.patch --]
[-- Type: text/plain, Size: 3674 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Hugh Dickins <hugh@veritas.com>

Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered
in rare circumstances, because shmem_truncate_range() erroneously
removes partially truncated directory pages at the end of the range:
later reclaim on pages pointing to these removed directories triggers
the BUG.  Indeed, and it can also cause data loss beyond the hole.

Fix this as in the patch proposed by Miklos, but distinguish between
"limit" (how far we need to search: ignore truncation's next_index
optimization in the holepunch case - if there are races it's more
consistent to act on the whole range specified) and "upper_limit"
(how far we can free directory pages: generally we must be careful
to keep partially punched pages, but can relax at end of file -
i_size being held stable by i_mutex).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 mm/shmem.c |   32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -481,7 +481,8 @@ static void shmem_truncate_range(struct 
 	long nr_swaps_freed = 0;
 	int offset;
 	int freed;
-	int punch_hole = 0;
+	int punch_hole;
+	unsigned long upper_limit;
 
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 	idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
@@ -492,11 +493,18 @@ static void shmem_truncate_range(struct 
 	info->flags |= SHMEM_TRUNCATE;
 	if (likely(end == (loff_t) -1)) {
 		limit = info->next_index;
+		upper_limit = SHMEM_MAX_INDEX;
 		info->next_index = idx;
+		punch_hole = 0;
 	} else {
-		limit = (end + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-		if (limit > info->next_index)
-			limit = info->next_index;
+		if (end + 1 >= inode->i_size) {	/* we may free a little more */
+			limit = (inode->i_size + PAGE_CACHE_SIZE - 1) >>
+							PAGE_CACHE_SHIFT;
+			upper_limit = SHMEM_MAX_INDEX;
+		} else {
+			limit = (end + 1) >> PAGE_CACHE_SHIFT;
+			upper_limit = limit;
+		}
 		punch_hole = 1;
 	}
 
@@ -520,10 +528,10 @@ static void shmem_truncate_range(struct 
 	 * If there are no indirect blocks or we are punching a hole
 	 * below indirect blocks, nothing to be done.
 	 */
-	if (!topdir || (punch_hole && (limit <= SHMEM_NR_DIRECT)))
+	if (!topdir || limit <= SHMEM_NR_DIRECT)
 		goto done2;
 
-	BUG_ON(limit <= SHMEM_NR_DIRECT);
+	upper_limit -= SHMEM_NR_DIRECT;
 	limit -= SHMEM_NR_DIRECT;
 	idx = (idx > SHMEM_NR_DIRECT)? (idx - SHMEM_NR_DIRECT): 0;
 	offset = idx % ENTRIES_PER_PAGE;
@@ -543,7 +551,7 @@ static void shmem_truncate_range(struct 
 		if (*dir) {
 			diroff = ((idx - ENTRIES_PER_PAGEPAGE/2) %
 				ENTRIES_PER_PAGEPAGE) / ENTRIES_PER_PAGE;
-			if (!diroff && !offset) {
+			if (!diroff && !offset && upper_limit >= stage) {
 				*dir = NULL;
 				nr_pages_to_free++;
 				list_add(&middir->lru, &pages_to_free);
@@ -570,9 +578,11 @@ static void shmem_truncate_range(struct 
 			}
 			stage = idx + ENTRIES_PER_PAGEPAGE;
 			middir = *dir;
-			*dir = NULL;
-			nr_pages_to_free++;
-			list_add(&middir->lru, &pages_to_free);
+			if (upper_limit >= stage) {
+				*dir = NULL;
+				nr_pages_to_free++;
+				list_add(&middir->lru, &pages_to_free);
+			}
 			shmem_dir_unmap(dir);
 			cond_resched();
 			dir = shmem_dir_map(middir);
@@ -598,7 +608,7 @@ static void shmem_truncate_range(struct 
 		}
 		if (offset)
 			offset = 0;
-		else if (subdir && !page_private(subdir)) {
+		else if (subdir && upper_limit - idx >= ENTRIES_PER_PAGE) {
 			dir[diroff] = NULL;
 			nr_pages_to_free++;
 			list_add(&subdir->lru, &pages_to_free);

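The distinction between "limit" and "upper_limit" in the holepunch commit message can be sketched as follows. This is a simplified model, assuming 4KB pages and a hypothetical directory fan-out of 512 entries; it ignores the SHMEM_NR_DIRECT offset and the truncation (end == -1) path. `punch_limits()` follows the patched branch of shmem_truncate_range(), and `may_free_dir_page()` expresses the rule that a directory page is freed only if it lies wholly inside the punched range.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1LL << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_INDEX	(~0UL)		/* stands in for SHMEM_MAX_INDEX */
#define SKETCH_ENTRIES_PER_PAGE	512UL		/* hypothetical directory fan-out */

struct punch_limits {
	unsigned long limit;		/* how far to search for entries */
	unsigned long upper_limit;	/* how far directory pages may be freed */
};

/* Limits for punching a hole ending at byte `end` (inclusive) in a file
 * of `i_size` bytes. */
static struct punch_limits punch_limits(long long end, long long i_size)
{
	struct punch_limits l;

	if (end + 1 >= i_size) {
		/* Hole reaches EOF: i_size is held stable by i_mutex. */
		l.limit = (unsigned long)((i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
		l.upper_limit = SKETCH_MAX_INDEX;
	} else {
		/* Round down: a partially punched last page is not freed here. */
		l.limit = (unsigned long)((end + 1) >> SKETCH_PAGE_SHIFT);
		l.upper_limit = l.limit;
	}
	return l;
}

/* A directory page covering [idx, idx + ENTRIES_PER_PAGE) may be freed only
 * if it starts on a boundary (offset 0) and ends within upper_limit. */
static bool may_free_dir_page(unsigned long idx, unsigned offset,
			      unsigned long upper_limit)
{
	assert(idx <= upper_limit);
	return offset == 0 && upper_limit - idx >= SKETCH_ENTRIES_PER_PAGE;
}
```

Punching the first 8KB of a 64MB file gives limit == upper_limit == 2, so no directory page is freed; punching through to end of file lifts upper_limit to the maximum, which is what lets the patch "relax at end of file".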
-- 

* [patch 07/33] holepunch: fix shmem_truncate_range punch locking
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (6 preceding siblings ...)
  2007-04-26 16:55   ` [patch 06/33] holepunch: fix shmem_truncate_range punching too far Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 08/33] holepunch: fix disconnected pages after second truncate Greg KH
                     ` (28 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Miklos Szeredi, Hugh Dickins

[-- Attachment #1: holepunch-fix-shmem_truncate_range-punch-locking.patch --]
[-- Type: text/plain, Size: 7128 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Hugh Dickins <hugh@veritas.com>

Miklos Szeredi observes that during truncation of shmem page directories,
info->lock is released to improve latency (after lowering i_size and
next_index to exclude races); but this is quite wrong for holepunching,
which receives no such protection from i_size or next_index, and is left
vulnerable to races with shmem_unuse, shmem_getpage and shmem_writepage.

Hold info->lock throughout when holepunching?  No, any user could prevent
rescheduling for far too long.  Instead take info->lock just when needed:
in shmem_free_swp when removing the swap entries, and whenever removing
a directory page from the level above.  But so long as we remove before
scanning, we can safely skip taking the lock at the lower levels, except
at misaligned start and end of the hole.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/shmem.c |   96 ++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 23 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -402,26 +402,38 @@ static swp_entry_t *shmem_swp_alloc(stru
 /*
  * shmem_free_swp - free some swap entries in a directory
  *
- * @dir:   pointer to the directory
- * @edir:  pointer after last entry of the directory
+ * @dir:        pointer to the directory
+ * @edir:       pointer after last entry of the directory
+ * @punch_lock: pointer to spinlock when needed for the holepunch case
  */
-static int shmem_free_swp(swp_entry_t *dir, swp_entry_t *edir)
+static int shmem_free_swp(swp_entry_t *dir, swp_entry_t *edir,
+						spinlock_t *punch_lock)
 {
+	spinlock_t *punch_unlock = NULL;
 	swp_entry_t *ptr;
 	int freed = 0;
 
 	for (ptr = dir; ptr < edir; ptr++) {
 		if (ptr->val) {
+			if (unlikely(punch_lock)) {
+				punch_unlock = punch_lock;
+				punch_lock = NULL;
+				spin_lock(punch_unlock);
+				if (!ptr->val)
+					continue;
+			}
 			free_swap_and_cache(*ptr);
 			*ptr = (swp_entry_t){0};
 			freed++;
 		}
 	}
+	if (punch_unlock)
+		spin_unlock(punch_unlock);
 	return freed;
 }
 
-static int shmem_map_and_free_swp(struct page *subdir,
-		int offset, int limit, struct page ***dir)
+static int shmem_map_and_free_swp(struct page *subdir, int offset,
+		int limit, struct page ***dir, spinlock_t *punch_lock)
 {
 	swp_entry_t *ptr;
 	int freed = 0;
@@ -431,7 +443,8 @@ static int shmem_map_and_free_swp(struct
 		int size = limit - offset;
 		if (size > LATENCY_LIMIT)
 			size = LATENCY_LIMIT;
-		freed += shmem_free_swp(ptr+offset, ptr+offset+size);
+		freed += shmem_free_swp(ptr+offset, ptr+offset+size,
+							punch_lock);
 		if (need_resched()) {
 			shmem_swp_unmap(ptr);
 			if (*dir) {
@@ -482,6 +495,8 @@ static void shmem_truncate_range(struct 
 	int offset;
 	int freed;
 	int punch_hole;
+	spinlock_t *needs_lock;
+	spinlock_t *punch_lock;
 	unsigned long upper_limit;
 
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
@@ -495,6 +510,7 @@ static void shmem_truncate_range(struct 
 		limit = info->next_index;
 		upper_limit = SHMEM_MAX_INDEX;
 		info->next_index = idx;
+		needs_lock = NULL;
 		punch_hole = 0;
 	} else {
 		if (end + 1 >= inode->i_size) {	/* we may free a little more */
@@ -505,6 +521,7 @@ static void shmem_truncate_range(struct 
 			limit = (end + 1) >> PAGE_CACHE_SHIFT;
 			upper_limit = limit;
 		}
+		needs_lock = &info->lock;
 		punch_hole = 1;
 	}
 
@@ -521,7 +538,7 @@ static void shmem_truncate_range(struct 
 		size = limit;
 		if (size > SHMEM_NR_DIRECT)
 			size = SHMEM_NR_DIRECT;
-		nr_swaps_freed = shmem_free_swp(ptr+idx, ptr+size);
+		nr_swaps_freed = shmem_free_swp(ptr+idx, ptr+size, needs_lock);
 	}
 
 	/*
@@ -531,6 +548,19 @@ static void shmem_truncate_range(struct 
 	if (!topdir || limit <= SHMEM_NR_DIRECT)
 		goto done2;
 
+	/*
+	 * The truncation case has already dropped info->lock, and we're safe
+	 * because i_size and next_index have already been lowered, preventing
+	 * access beyond.  But in the punch_hole case, we still need to take
+	 * the lock when updating the swap directory, because there might be
+	 * racing accesses by shmem_getpage(SGP_CACHE), shmem_unuse_inode or
+	 * shmem_writepage.  However, whenever we find we can remove a whole
+	 * directory page (not at the misaligned start or end of the range),
+	 * we first NULLify its pointer in the level above, and then have no
+	 * need to take the lock when updating its contents: needs_lock and
+	 * punch_lock (either pointing to info->lock or NULL) manage this.
+	 */
+
 	upper_limit -= SHMEM_NR_DIRECT;
 	limit -= SHMEM_NR_DIRECT;
 	idx = (idx > SHMEM_NR_DIRECT)? (idx - SHMEM_NR_DIRECT): 0;
@@ -552,7 +582,13 @@ static void shmem_truncate_range(struct 
 			diroff = ((idx - ENTRIES_PER_PAGEPAGE/2) %
 				ENTRIES_PER_PAGEPAGE) / ENTRIES_PER_PAGE;
 			if (!diroff && !offset && upper_limit >= stage) {
-				*dir = NULL;
+				if (needs_lock) {
+					spin_lock(needs_lock);
+					*dir = NULL;
+					spin_unlock(needs_lock);
+					needs_lock = NULL;
+				} else
+					*dir = NULL;
 				nr_pages_to_free++;
 				list_add(&middir->lru, &pages_to_free);
 			}
@@ -578,8 +614,16 @@ static void shmem_truncate_range(struct 
 			}
 			stage = idx + ENTRIES_PER_PAGEPAGE;
 			middir = *dir;
+			if (punch_hole)
+				needs_lock = &info->lock;
 			if (upper_limit >= stage) {
-				*dir = NULL;
+				if (needs_lock) {
+					spin_lock(needs_lock);
+					*dir = NULL;
+					spin_unlock(needs_lock);
+					needs_lock = NULL;
+				} else
+					*dir = NULL;
 				nr_pages_to_free++;
 				list_add(&middir->lru, &pages_to_free);
 			}
@@ -588,31 +632,37 @@ static void shmem_truncate_range(struct 
 			dir = shmem_dir_map(middir);
 			diroff = 0;
 		}
+		punch_lock = needs_lock;
 		subdir = dir[diroff];
-		if (subdir && page_private(subdir)) {
+		if (subdir && !offset && upper_limit-idx >= ENTRIES_PER_PAGE) {
+			if (needs_lock) {
+				spin_lock(needs_lock);
+				dir[diroff] = NULL;
+				spin_unlock(needs_lock);
+				punch_lock = NULL;
+			} else
+				dir[diroff] = NULL;
+			nr_pages_to_free++;
+			list_add(&subdir->lru, &pages_to_free);
+		}
+		if (subdir && page_private(subdir) /* has swap entries */) {
 			size = limit - idx;
 			if (size > ENTRIES_PER_PAGE)
 				size = ENTRIES_PER_PAGE;
 			freed = shmem_map_and_free_swp(subdir,
-						offset, size, &dir);
+					offset, size, &dir, punch_lock);
 			if (!dir)
 				dir = shmem_dir_map(middir);
 			nr_swaps_freed += freed;
-			if (offset)
+			if (offset || punch_lock) {
 				spin_lock(&info->lock);
-			set_page_private(subdir, page_private(subdir) - freed);
-			if (offset)
+				set_page_private(subdir,
+					page_private(subdir) - freed);
 				spin_unlock(&info->lock);
-			if (!punch_hole)
-				BUG_ON(page_private(subdir) > offset);
-		}
-		if (offset)
-			offset = 0;
-		else if (subdir && upper_limit - idx >= ENTRIES_PER_PAGE) {
-			dir[diroff] = NULL;
-			nr_pages_to_free++;
-			list_add(&subdir->lru, &pages_to_free);
+			} else
+				BUG_ON(page_private(subdir) != freed);
 		}
+		offset = 0;
 	}
 done1:
 	shmem_dir_unmap(dir);

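The lock-avoidance idea in the patched shmem_free_swp() (take the lock only once an entry is actually found, recheck under it, and hold it until the end) can be sketched with a pthread mutex standing in for info->lock. `free_slots()` and its `unsigned long` slots are hypothetical stand-ins for shmem_free_swp() and swp_entry_t, and clearing a slot stands in for free_swap_and_cache().

```c
#include <assert.h>
#include <pthread.h>

/*
 * Clear the nonzero slots in [dir, edir) and return how many were cleared.
 * If punch_lock is non-NULL, it is taken lazily just before the first
 * nonzero slot, the slot is rechecked under it, and it is held until the
 * end.  A NULL punch_lock means the caller has already unhooked this range
 * from the level above, so no lock is needed.
 */
static int free_slots(unsigned long *dir, unsigned long *edir,
		      pthread_mutex_t *punch_lock)
{
	pthread_mutex_t *punch_unlock = NULL;
	int freed = 0;

	assert(dir <= edir);
	for (unsigned long *ptr = dir; ptr < edir; ptr++) {
		/* Unlocked peek, as in the kernel; concurrent userspace code
		 * would need atomic slots for this to be well defined. */
		if (*ptr) {
			if (punch_lock) {
				punch_unlock = punch_lock;
				punch_lock = NULL;
				pthread_mutex_lock(punch_unlock);
				if (!*ptr)
					continue;	/* raced: already cleared */
			}
			*ptr = 0;	/* stands in for free_swap_and_cache() */
			freed++;
		}
	}
	if (punch_unlock)
		pthread_mutex_unlock(punch_unlock);
	return freed;
}
```

A range with no entries never touches the lock at all, which is the common case the patch is trying to keep cheap.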
-- 

* [patch 08/33] holepunch: fix disconnected pages after second truncate
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (7 preceding siblings ...)
  2007-04-26 16:55   ` [patch 07/33] holepunch: fix shmem_truncate_range punch locking Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 09/33] holepunch: fix mmap_sem i_mutex deadlock Greg KH
                     ` (27 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Miklos Szeredi, Hugh Dickins

[-- Attachment #1: holepunch-fix-disconnected-pages-after-second-truncate.patch --]
[-- Type: text/plain, Size: 1899 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Hugh Dickins <hugh@veritas.com>

shmem_truncate_range has its own truncate_inode_pages_range, to free any
pages racily instantiated while it was in progress: a SHMEM_PAGEIN flag
is set when this might have happened.  But holepunching gets no chance
to clear that flag at the start of vmtruncate_range, so the flag is almost
always set (unless a truncate came just before), and holepunch almost
always does this second truncate_inode_pages_range.

shmem holepunch has unlikely swap<->file races hereabouts whatever we do
(without a fuller rework than is fit for this release): I was going to
skip the second truncate in the punch_hole case, but Miklos points out
that would make holepunch correctness more vulnerable to swapoff.  So
keep the second truncate, but follow it by an unmap_mapping_range to
eliminate the disconnected pages (freed from pagecache while still
mapped in userspace) that it might have left behind.
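The failure mode described above, pages dropped from the pagecache while a userspace mapping still references them, can be illustrated with a toy user-space model. This is a sketch under stated assumptions: `struct toy_page`, `toy_cache[]`, `toy_pte[]` and the `toy_*` functions are hypothetical stand-ins and say nothing about how shmem actually tracks pages. It only shows why a truncate that drops cache references must be followed by an unmap that drops the mapping references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model: NPAGES, struct toy_page, toy_cache[] and toy_pte[] are
 * hypothetical stand-ins, not kernel structures. */
#define NPAGES 8

struct toy_page {
	int refcount;	/* one reference per cache slot or pte using it */
};

static struct toy_page *toy_cache[NPAGES];	/* the file's "pagecache" */
static struct toy_page *toy_pte[NPAGES];	/* a userspace mapping of it */

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/* A racing fault: instantiate a page in the cache and map it. */
static void toy_fault_in(size_t idx)
{
	struct toy_page *p = malloc(sizeof(*p));

	if (!p)
		abort();
	p->refcount = 2;		/* held by the cache and by the pte */
	toy_cache[idx] = p;
	toy_pte[idx] = p;
}

/* Analogue of truncate_inode_pages_range: drops cache references only. */
static void toy_truncate_range(size_t start, size_t end)
{
	for (size_t i = start; i < end; i++) {
		if (toy_cache[i]) {
			toy_put_page(toy_cache[i]);
			toy_cache[i] = NULL;
		}
	}
}

/* Analogue of unmap_mapping_range: drops the userspace references. */
static void toy_unmap_range(size_t start, size_t end)
{
	for (size_t i = start; i < end; i++) {
		if (toy_pte[i]) {
			toy_put_page(toy_pte[i]);
			toy_pte[i] = NULL;
		}
	}
}

/* Pages still mapped but gone from the cache are "disconnected". */
static size_t toy_count_disconnected(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (toy_pte[i] && !toy_cache[i])
			n++;
	return n;
}
```

In this model, two pages faulted in and then truncated leave two disconnected pages until the unmap step runs, which mirrors why the patch adds unmap_mapping_range after the second truncate in the punch_hole case.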

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/shmem.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -674,8 +674,16 @@ done2:
 		 * generic_delete_inode did it, before we lowered next_index.
 		 * Also, though shmem_getpage checks i_size before adding to
 		 * cache, no recheck after: so fix the narrow window there too.
+		 *
+		 * Recalling truncate_inode_pages_range and unmap_mapping_range
+		 * every time for punch_hole (which never got a chance to clear
+		 * SHMEM_PAGEIN at the start of vmtruncate_range) is expensive,
+		 * yet hardly ever necessary: try to optimize them out later.
 		 */
 		truncate_inode_pages_range(inode->i_mapping, start, end);
+		if (punch_hole)
+			unmap_mapping_range(inode->i_mapping, start,
+							end - start, 1);
 	}
 
 	spin_lock(&info->lock);

-- 


* [patch 09/33] holepunch: fix mmap_sem i_mutex deadlock
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (8 preceding siblings ...)
  2007-04-26 16:55   ` [patch 08/33] holepunch: fix disconnected pages after second truncate Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 10/33] Fix sparc64 SBUS IOMMU allocator Greg KH
                     ` (26 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Miklos Szeredi, Hugh Dickins

[-- Attachment #1: holepunch-fix-mmap_sem-i_mutex-deadlock.patch --]
[-- Type: text/plain, Size: 2207 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Hugh Dickins <hugh@veritas.com>

sys_madvise has down_write of mmap_sem, then madvise_remove calls
vmtruncate_range, which takes i_mutex and i_alloc_sem: that ordering is
wrong, and deadlocks can easily be devised from it.

Make madvise_remove drop mmap_sem while calling vmtruncate_range.  Luckily,
since madvise_remove doesn't split or merge vmas, it's easy to handle
this case with a NULL prev, without restructuring sys_madvise.  (It is
sad to retake mmap_sem when it's unlikely to be needed, and down_read
is certainly sufficient for MADV_REMOVE, unlike the other madvise advice
values.)
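The "NULL prev" protocol can be sketched in user space with pthread mutexes. This is an illustration under assumptions, not the kernel code: `outer_lock` plays mmap_sem, `inner_lock` plays i_mutex, and `struct toy_vma`, `toy_find_vma` and `toy_madvise` are hypothetical. A handler that must take the inner lock first drops the outer lock and sets `*prev = NULL`. The caller then re-looks up the next vma by address instead of following `prev`, because the list may have changed while the lock was dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy types; outer_lock plays mmap_sem, inner_lock i_mutex. */
struct toy_vma {
	unsigned long start, end;
	int drops_lock;		/* models madvise_remove */
	int visited;
};

static struct toy_vma toy_vmas[] = {
	{ 0, 10, 0, 0 }, { 10, 20, 1, 0 }, { 20, 30, 0, 0 },
};
#define TOY_NVMAS (sizeof(toy_vmas) / sizeof(toy_vmas[0]))

static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;

/* Like find_vma: the first vma whose end lies above addr. */
static struct toy_vma *toy_find_vma(unsigned long addr)
{
	for (size_t i = 0; i < TOY_NVMAS; i++)
		if (addr < toy_vmas[i].end)
			return &toy_vmas[i];
	return NULL;
}

static struct toy_vma *toy_next(struct toy_vma *v)
{
	size_t i = (size_t)(v - toy_vmas) + 1;

	return i < TOY_NVMAS ? &toy_vmas[i] : NULL;
}

/* Called with outer_lock held; returns with it held again. */
static void toy_advise_one(struct toy_vma *v, struct toy_vma **prev)
{
	v->visited++;
	if (!v->drops_lock) {
		*prev = v;
		return;
	}
	*prev = NULL;			/* tell the caller we dropped the lock */
	pthread_mutex_unlock(&outer_lock);	/* inner must not nest in outer */
	pthread_mutex_lock(&inner_lock);
	/* ... the truncate work would happen here ... */
	pthread_mutex_unlock(&inner_lock);
	pthread_mutex_lock(&outer_lock);
}

static void toy_madvise(unsigned long start, unsigned long end)
{
	struct toy_vma *vma, *prev;

	pthread_mutex_lock(&outer_lock);
	vma = toy_find_vma(start);
	while (vma && vma->start < end) {
		unsigned long tmp = vma->end;	/* read before the lock may drop */

		toy_advise_one(vma, &prev);
		start = tmp;
		if (start >= end)
			break;
		/* With prev == NULL the list may have changed: look up again. */
		vma = prev ? toy_next(prev) : toy_find_vma(start);
	}
	pthread_mutex_unlock(&outer_lock);
}
```

This single-threaded demo only shows the control flow. Calling `toy_madvise(0, 30)` visits each of the three toy vmas once, including the one after the handler that dropped the lock.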

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/madvise.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -159,9 +159,10 @@ static long madvise_remove(struct vm_are
 				unsigned long start, unsigned long end)
 {
 	struct address_space *mapping;
-        loff_t offset, endoff;
+	loff_t offset, endoff;
+	int error;
 
-	*prev = vma;
+	*prev = NULL;	/* tell sys_madvise we drop mmap_sem */
 
 	if (vma->vm_flags & (VM_LOCKED|VM_NONLINEAR|VM_HUGETLB))
 		return -EINVAL;
@@ -180,7 +181,12 @@ static long madvise_remove(struct vm_are
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 	endoff = (loff_t)(end - vma->vm_start - 1)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-	return  vmtruncate_range(mapping->host, offset, endoff);
+
+	/* vmtruncate_range needs to take i_mutex and i_alloc_sem */
+	up_write(&current->mm->mmap_sem);
+	error = vmtruncate_range(mapping->host, offset, endoff);
+	down_write(&current->mm->mmap_sem);
+	return error;
 }
 
 static long
@@ -315,12 +321,15 @@ asmlinkage long sys_madvise(unsigned lon
 		if (error)
 			goto out;
 		start = tmp;
-		if (start < prev->vm_end)
+		if (prev && start < prev->vm_end)
 			start = prev->vm_end;
 		error = unmapped_error;
 		if (start >= end)
 			goto out;
-		vma = prev->vm_next;
+		if (prev)
+			vma = prev->vm_next;
+		else	/* madvise_remove dropped mmap_sem */
+			vma = find_vma(current->mm, start);
 	}
 out:
 	up_write(&current->mm->mmap_sem);

-- 


* [patch 10/33] Fix sparc64 SBUS IOMMU allocator
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (9 preceding siblings ...)
  2007-04-26 16:55   ` [patch 09/33] holepunch: fix mmap_sem i_mutex deadlock Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 11/33] Fix qlogicpti DMA unmapping Greg KH
                     ` (25 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, David S. Miller

[-- Attachment #1: fix-sparc64-sbus-iommu-allocator.patch --]
[-- Type: text/plain, Size: 23906 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: David Miller <davem@davemloft.net>

[SPARC64]: Fix SBUS IOMMU allocation code.

There are several IOMMU allocator bugs.  Instead of trying to fix this
overly complicated code, just mirror the PCI IOMMU arena allocator,
which is very stable and well stress-tested.

I tried to make the code as identical as possible so we can switch
sun4u PCI and SBUS over to a common piece of IOMMU code.  All that
will be needed are two callbacks: one to do a full IOMMU flush and one
to do a streaming buffer flush.
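The arena scheme that the new sbus_arena_alloc follows can be sketched in user-space C. It does a first-fit bitmap search from a hint, and on reaching the end it performs one full flush and rescans from zero up to the old hint. This is a sketch, not the kernel code. The names are hypothetical, plain bit helpers stand in for find_next_zero_bit/__set_bit/__clear_bit, and `flush_all` is an assumed stand-in for the "full IOMMU flush" callback mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define ARENA_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical names throughout; flush_all stands in for a full IOMMU flush. */
struct arena {
	unsigned long *map;	/* one bit per IOMMU page table entry */
	unsigned long hint;	/* where the next search starts */
	unsigned long limit;	/* number of entries */
	unsigned long nflush;	/* how many wrap-around flushes happened */
	void (*flush_all)(void *ctx);	/* may be NULL in this sketch */
	void *ctx;
};

static int arena_test(const unsigned long *map, unsigned long i)
{
	return (map[i / ARENA_BITS_PER_WORD] >> (i % ARENA_BITS_PER_WORD)) & 1UL;
}

static void arena_set(unsigned long *map, unsigned long i)
{
	map[i / ARENA_BITS_PER_WORD] |= 1UL << (i % ARENA_BITS_PER_WORD);
}

static void arena_clear(unsigned long *map, unsigned long i)
{
	map[i / ARENA_BITS_PER_WORD] &= ~(1UL << (i % ARENA_BITS_PER_WORD));
}

static int arena_init(struct arena *a, unsigned long limit,
		      void (*flush_all)(void *), void *ctx)
{
	size_t words = (limit + ARENA_BITS_PER_WORD - 1) / ARENA_BITS_PER_WORD;

	a->map = calloc(words ? words : 1, sizeof(unsigned long));
	if (!a->map)
		return -1;
	a->hint = 0;
	a->limit = limit;
	a->nflush = 0;
	a->flush_all = flush_all;
	a->ctx = ctx;
	return 0;
}

/* First fit from the hint; on hitting the end, flush once and rescan
 * [0, old hint).  Returns the first entry index, or -1 if nothing fits. */
static long arena_alloc(struct arena *a, unsigned long npages)
{
	unsigned long n, i, end, start = a->hint, limit = a->limit;
	int pass = 0;

again:
	n = start;
	while (n < limit && arena_test(a->map, n))
		n++;			/* like find_next_zero_bit */
	end = n + npages;
	if (end >= limit) {		/* same conservative test as the patch */
		if (pass < 1) {
			limit = start;
			start = 0;
			a->nflush++;	/* flush before reusing freed entries */
			if (a->flush_all)
				a->flush_all(a->ctx);
			pass++;
			goto again;
		}
		return -1;
	}
	for (i = n; i < end; i++) {
		if (arena_test(a->map, i)) {
			start = i + 1;
			goto again;
		}
	}
	for (i = n; i < end; i++)
		arena_set(a->map, i);
	a->hint = end;
	return (long)n;
}

static void arena_free(struct arena *a, unsigned long base, unsigned long npages)
{
	for (unsigned long i = base; i < base + npages; i++) {
		assert(arena_test(a->map, i));	/* catch double frees */
		arena_clear(a->map, i);
	}
}

static void arena_destroy(struct arena *a)
{
	free(a->map);
	a->map = NULL;
}
```

With a 16-entry arena, three 4-entry allocations return 0, 4 and 8. After freeing entry 0, the next 4-entry allocation wraps, flushes once, and reuses entry 0. One design consequence is that frees never flush individually. Stale translations are flushed in bulk only when the search wraps, which is where the patch calls __iommu_flushall.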

This patch gets rid of a lot of hangs and mysterious crashes on SBUS
sparc64 systems, at least for me.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/sparc64/kernel/sbus.c |  566 ++++++++++++++++++---------------------------
 1 file changed, 235 insertions(+), 331 deletions(-)

--- a/arch/sparc64/kernel/sbus.c
+++ b/arch/sparc64/kernel/sbus.c
@@ -24,48 +24,25 @@
 
 #include "iommu_common.h"
 
-/* These should be allocated on an SMP_CACHE_BYTES
- * aligned boundary for optimal performance.
- *
- * On SYSIO, using an 8K page size we have 1GB of SBUS
- * DMA space mapped.  We divide this space into equally
- * sized clusters. We allocate a DMA mapping from the
- * cluster that matches the order of the allocation, or
- * if the order is greater than the number of clusters,
- * we try to allocate from the last cluster.
- */
-
-#define NCLUSTERS	8UL
-#define ONE_GIG		(1UL * 1024UL * 1024UL * 1024UL)
-#define CLUSTER_SIZE	(ONE_GIG / NCLUSTERS)
-#define CLUSTER_MASK	(CLUSTER_SIZE - 1)
-#define CLUSTER_NPAGES	(CLUSTER_SIZE >> IO_PAGE_SHIFT)
 #define MAP_BASE	((u32)0xc0000000)
 
+struct sbus_iommu_arena {
+	unsigned long	*map;
+	unsigned int	hint;
+	unsigned int	limit;
+};
+
 struct sbus_iommu {
-/*0x00*/spinlock_t		lock;
+	spinlock_t		lock;
 
-/*0x08*/iopte_t			*page_table;
-/*0x10*/unsigned long		strbuf_regs;
-/*0x18*/unsigned long		iommu_regs;
-/*0x20*/unsigned long		sbus_control_reg;
-
-/*0x28*/volatile unsigned long	strbuf_flushflag;
-
-	/* If NCLUSTERS is ever decresed to 4 or lower,
-	 * you must increase the size of the type of
-	 * these counters.  You have been duly warned. -DaveM
-	 */
-/*0x30*/struct {
-		u16	next;
-		u16	flush;
-	} alloc_info[NCLUSTERS];
-
-	/* The lowest used consistent mapping entry.  Since
-	 * we allocate consistent maps out of cluster 0 this
-	 * is relative to the beginning of closter 0.
-	 */
-/*0x50*/u32		lowest_consistent_map;
+	struct sbus_iommu_arena	arena;
+
+	iopte_t			*page_table;
+	unsigned long		strbuf_regs;
+	unsigned long		iommu_regs;
+	unsigned long		sbus_control_reg;
+
+	volatile unsigned long	strbuf_flushflag;
 };
 
 /* Offsets from iommu_regs */
@@ -91,19 +68,6 @@ static void __iommu_flushall(struct sbus
 		tag += 8UL;
 	}
 	upa_readq(iommu->sbus_control_reg);
-
-	for (entry = 0; entry < NCLUSTERS; entry++) {
-		iommu->alloc_info[entry].flush =
-			iommu->alloc_info[entry].next;
-	}
-}
-
-static void iommu_flush(struct sbus_iommu *iommu, u32 base, unsigned long npages)
-{
-	while (npages--)
-		upa_writeq(base + (npages << IO_PAGE_SHIFT),
-			   iommu->iommu_regs + IOMMU_FLUSH);
-	upa_readq(iommu->sbus_control_reg);
 }
 
 /* Offsets from strbuf_regs */
@@ -156,178 +120,115 @@ static void sbus_strbuf_flush(struct sbu
 		       base, npages);
 }
 
-static iopte_t *alloc_streaming_cluster(struct sbus_iommu *iommu, unsigned long npages)
+/* Based largely upon the ppc64 iommu allocator.  */
+static long sbus_arena_alloc(struct sbus_iommu *iommu, unsigned long npages)
 {
-	iopte_t *iopte, *limit, *first, *cluster;
-	unsigned long cnum, ent, nent, flush_point, found;
-
-	cnum = 0;
-	nent = 1;
-	while ((1UL << cnum) < npages)
-		cnum++;
-	if(cnum >= NCLUSTERS) {
-		nent = 1UL << (cnum - NCLUSTERS);
-		cnum = NCLUSTERS - 1;
-	}
-	iopte  = iommu->page_table + (cnum * CLUSTER_NPAGES);
-
-	if (cnum == 0)
-		limit = (iommu->page_table +
-			 iommu->lowest_consistent_map);
-	else
-		limit = (iopte + CLUSTER_NPAGES);
-
-	iopte += ((ent = iommu->alloc_info[cnum].next) << cnum);
-	flush_point = iommu->alloc_info[cnum].flush;
-
-	first = iopte;
-	cluster = NULL;
-	found = 0;
-	for (;;) {
-		if (iopte_val(*iopte) == 0UL) {
-			found++;
-			if (!cluster)
-				cluster = iopte;
+	struct sbus_iommu_arena *arena = &iommu->arena;
+	unsigned long n, i, start, end, limit;
+	int pass;
+
+	limit = arena->limit;
+	start = arena->hint;
+	pass = 0;
+
+again:
+	n = find_next_zero_bit(arena->map, limit, start);
+	end = n + npages;
+	if (unlikely(end >= limit)) {
+		if (likely(pass < 1)) {
+			limit = start;
+			start = 0;
+			__iommu_flushall(iommu);
+			pass++;
+			goto again;
 		} else {
-			/* Used cluster in the way */
-			cluster = NULL;
-			found = 0;
+			/* Scanned the whole thing, give up. */
+			return -1;
 		}
+	}
 
-		if (found == nent)
-			break;
-
-		iopte += (1 << cnum);
-		ent++;
-		if (iopte >= limit) {
-			iopte = (iommu->page_table + (cnum * CLUSTER_NPAGES));
-			ent = 0;
-
-			/* Multiple cluster allocations must not wrap */
-			cluster = NULL;
-			found = 0;
+	for (i = n; i < end; i++) {
+		if (test_bit(i, arena->map)) {
+			start = i + 1;
+			goto again;
 		}
-		if (ent == flush_point)
-			__iommu_flushall(iommu);
-		if (iopte == first)
-			goto bad;
 	}
 
-	/* ent/iopte points to the last cluster entry we're going to use,
-	 * so save our place for the next allocation.
-	 */
-	if ((iopte + (1 << cnum)) >= limit)
-		ent = 0;
-	else
-		ent = ent + 1;
-	iommu->alloc_info[cnum].next = ent;
-	if (ent == flush_point)
-		__iommu_flushall(iommu);
-
-	/* I've got your streaming cluster right here buddy boy... */
-	return cluster;
-
-bad:
-	printk(KERN_EMERG "sbus: alloc_streaming_cluster of npages(%ld) failed!\n",
-	       npages);
-	return NULL;
+	for (i = n; i < end; i++)
+		__set_bit(i, arena->map);
+
+	arena->hint = end;
+
+	return n;
 }
 
-static void free_streaming_cluster(struct sbus_iommu *iommu, u32 base, unsigned long npages)
+static void sbus_arena_free(struct sbus_iommu_arena *arena, unsigned long base, unsigned long npages)
 {
-	unsigned long cnum, ent, nent;
-	iopte_t *iopte;
+	unsigned long i;
 
-	cnum = 0;
-	nent = 1;
-	while ((1UL << cnum) < npages)
-		cnum++;
-	if(cnum >= NCLUSTERS) {
-		nent = 1UL << (cnum - NCLUSTERS);
-		cnum = NCLUSTERS - 1;
-	}
-	ent = (base & CLUSTER_MASK) >> (IO_PAGE_SHIFT + cnum);
-	iopte = iommu->page_table + ((base - MAP_BASE) >> IO_PAGE_SHIFT);
-	do {
-		iopte_val(*iopte) = 0UL;
-		iopte += 1 << cnum;
-	} while(--nent);
-
-	/* If the global flush might not have caught this entry,
-	 * adjust the flush point such that we will flush before
-	 * ever trying to reuse it.
-	 */
-#define between(X,Y,Z)	(((Z) - (Y)) >= ((X) - (Y)))
-	if (between(ent, iommu->alloc_info[cnum].next, iommu->alloc_info[cnum].flush))
-		iommu->alloc_info[cnum].flush = ent;
-#undef between
+	for (i = base; i < (base + npages); i++)
+		__clear_bit(i, arena->map);
 }
 
-/* We allocate consistent mappings from the end of cluster zero. */
-static iopte_t *alloc_consistent_cluster(struct sbus_iommu *iommu, unsigned long npages)
+static void sbus_iommu_table_init(struct sbus_iommu *iommu, unsigned int tsbsize)
 {
-	iopte_t *iopte;
+	unsigned long tsbbase, order, sz, num_tsb_entries;
 
-	iopte = iommu->page_table + (1 * CLUSTER_NPAGES);
-	while (iopte > iommu->page_table) {
-		iopte--;
-		if (!(iopte_val(*iopte) & IOPTE_VALID)) {
-			unsigned long tmp = npages;
-
-			while (--tmp) {
-				iopte--;
-				if (iopte_val(*iopte) & IOPTE_VALID)
-					break;
-			}
-			if (tmp == 0) {
-				u32 entry = (iopte - iommu->page_table);
+	num_tsb_entries = tsbsize / sizeof(iopte_t);
 
-				if (entry < iommu->lowest_consistent_map)
-					iommu->lowest_consistent_map = entry;
-				return iopte;
-			}
-		}
+	/* Setup initial software IOMMU state. */
+	spin_lock_init(&iommu->lock);
+
+	/* Allocate and initialize the free area map.  */
+	sz = num_tsb_entries / 8;
+	sz = (sz + 7UL) & ~7UL;
+	iommu->arena.map = kzalloc(sz, GFP_KERNEL);
+	if (!iommu->arena.map) {
+		prom_printf("PCI_IOMMU: Error, kmalloc(arena.map) failed.\n");
+		prom_halt();
+	}
+	iommu->arena.limit = num_tsb_entries;
+
+	/* Now allocate and setup the IOMMU page table itself.  */
+	order = get_order(tsbsize);
+	tsbbase = __get_free_pages(GFP_KERNEL, order);
+	if (!tsbbase) {
+		prom_printf("IOMMU: Error, gfp(tsb) failed.\n");
+		prom_halt();
 	}
-	return NULL;
+	iommu->page_table = (iopte_t *)tsbbase;
+	memset(iommu->page_table, 0, tsbsize);
 }
 
-static void free_consistent_cluster(struct sbus_iommu *iommu, u32 base, unsigned long npages)
+static inline iopte_t *alloc_npages(struct sbus_iommu *iommu, unsigned long npages)
 {
-	iopte_t *iopte = iommu->page_table + ((base - MAP_BASE) >> IO_PAGE_SHIFT);
+	long entry;
 
-	if ((iopte - iommu->page_table) == iommu->lowest_consistent_map) {
-		iopte_t *walk = iopte + npages;
-		iopte_t *limit;
+	entry = sbus_arena_alloc(iommu, npages);
+	if (unlikely(entry < 0))
+		return NULL;
 
-		limit = iommu->page_table + CLUSTER_NPAGES;
-		while (walk < limit) {
-			if (iopte_val(*walk) != 0UL)
-				break;
-			walk++;
-		}
-		iommu->lowest_consistent_map =
-			(walk - iommu->page_table);
-	}
+	return iommu->page_table + entry;
+}
 
-	while (npages--)
-		*iopte++ = __iopte(0UL);
+static inline void free_npages(struct sbus_iommu *iommu, dma_addr_t base, unsigned long npages)
+{
+	sbus_arena_free(&iommu->arena, base >> IO_PAGE_SHIFT, npages);
 }
 
 void *sbus_alloc_consistent(struct sbus_dev *sdev, size_t size, dma_addr_t *dvma_addr)
 {
-	unsigned long order, first_page, flags;
 	struct sbus_iommu *iommu;
 	iopte_t *iopte;
+	unsigned long flags, order, first_page;
 	void *ret;
 	int npages;
 
-	if (size <= 0 || sdev == NULL || dvma_addr == NULL)
-		return NULL;
-
 	size = IO_PAGE_ALIGN(size);
 	order = get_order(size);
 	if (order >= 10)
 		return NULL;
+
 	first_page = __get_free_pages(GFP_KERNEL|__GFP_COMP, order);
 	if (first_page == 0UL)
 		return NULL;
@@ -336,108 +237,121 @@ void *sbus_alloc_consistent(struct sbus_
 	iommu = sdev->bus->iommu;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	iopte = alloc_consistent_cluster(iommu, size >> IO_PAGE_SHIFT);
-	if (iopte == NULL) {
-		spin_unlock_irqrestore(&iommu->lock, flags);
+	iopte = alloc_npages(iommu, size >> IO_PAGE_SHIFT);
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	if (unlikely(iopte == NULL)) {
 		free_pages(first_page, order);
 		return NULL;
 	}
 
-	/* Ok, we're committed at this point. */
-	*dvma_addr = MAP_BASE +	((iopte - iommu->page_table) << IO_PAGE_SHIFT);
+	*dvma_addr = (MAP_BASE +
+		      ((iopte - iommu->page_table) << IO_PAGE_SHIFT));
 	ret = (void *) first_page;
 	npages = size >> IO_PAGE_SHIFT;
+	first_page = __pa(first_page);
 	while (npages--) {
-		*iopte++ = __iopte(IOPTE_VALID | IOPTE_CACHE | IOPTE_WRITE |
-				   (__pa(first_page) & IOPTE_PAGE));
+		iopte_val(*iopte) = (IOPTE_VALID | IOPTE_CACHE |
+				     IOPTE_WRITE |
+				     (first_page & IOPTE_PAGE));
+		iopte++;
 		first_page += IO_PAGE_SIZE;
 	}
-	iommu_flush(iommu, *dvma_addr, size >> IO_PAGE_SHIFT);
-	spin_unlock_irqrestore(&iommu->lock, flags);
 
 	return ret;
 }
 
 void sbus_free_consistent(struct sbus_dev *sdev, size_t size, void *cpu, dma_addr_t dvma)
 {
-	unsigned long order, npages;
 	struct sbus_iommu *iommu;
-
-	if (size <= 0 || sdev == NULL || cpu == NULL)
-		return;
+	iopte_t *iopte;
+	unsigned long flags, order, npages;
 
 	npages = IO_PAGE_ALIGN(size) >> IO_PAGE_SHIFT;
 	iommu = sdev->bus->iommu;
+	iopte = iommu->page_table +
+		((dvma - MAP_BASE) >> IO_PAGE_SHIFT);
+
+	spin_lock_irqsave(&iommu->lock, flags);
+
+	free_npages(iommu, dvma - MAP_BASE, npages);
 
-	spin_lock_irq(&iommu->lock);
-	free_consistent_cluster(iommu, dvma, npages);
-	iommu_flush(iommu, dvma, npages);
-	spin_unlock_irq(&iommu->lock);
+	spin_unlock_irqrestore(&iommu->lock, flags);
 
 	order = get_order(size);
 	if (order < 10)
 		free_pages((unsigned long)cpu, order);
 }
 
-dma_addr_t sbus_map_single(struct sbus_dev *sdev, void *ptr, size_t size, int dir)
+dma_addr_t sbus_map_single(struct sbus_dev *sdev, void *ptr, size_t sz, int direction)
 {
-	struct sbus_iommu *iommu = sdev->bus->iommu;
-	unsigned long npages, pbase, flags;
-	iopte_t *iopte;
-	u32 dma_base, offset;
-	unsigned long iopte_bits;
+	struct sbus_iommu *iommu;
+	iopte_t *base;
+	unsigned long flags, npages, oaddr;
+	unsigned long i, base_paddr;
+	u32 bus_addr, ret;
+	unsigned long iopte_protection;
+
+	iommu = sdev->bus->iommu;
 
-	if (dir == SBUS_DMA_NONE)
+	if (unlikely(direction == SBUS_DMA_NONE))
 		BUG();
 
-	pbase = (unsigned long) ptr;
-	offset = (u32) (pbase & ~IO_PAGE_MASK);
-	size = (IO_PAGE_ALIGN(pbase + size) - (pbase & IO_PAGE_MASK));
-	pbase = (unsigned long) __pa(pbase & IO_PAGE_MASK);
+	oaddr = (unsigned long)ptr;
+	npages = IO_PAGE_ALIGN(oaddr + sz) - (oaddr & IO_PAGE_MASK);
+	npages >>= IO_PAGE_SHIFT;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	npages = size >> IO_PAGE_SHIFT;
-	iopte = alloc_streaming_cluster(iommu, npages);
-	if (iopte == NULL)
-		goto bad;
-	dma_base = MAP_BASE + ((iopte - iommu->page_table) << IO_PAGE_SHIFT);
-	npages = size >> IO_PAGE_SHIFT;
-	iopte_bits = IOPTE_VALID | IOPTE_STBUF | IOPTE_CACHE;
-	if (dir != SBUS_DMA_TODEVICE)
-		iopte_bits |= IOPTE_WRITE;
-	while (npages--) {
-		*iopte++ = __iopte(iopte_bits | (pbase & IOPTE_PAGE));
-		pbase += IO_PAGE_SIZE;
-	}
-	npages = size >> IO_PAGE_SHIFT;
+	base = alloc_npages(iommu, npages);
 	spin_unlock_irqrestore(&iommu->lock, flags);
 
-	return (dma_base | offset);
+	if (unlikely(!base))
+		BUG();
 
-bad:
-	spin_unlock_irqrestore(&iommu->lock, flags);
-	BUG();
-	return 0;
+	bus_addr = (MAP_BASE +
+		    ((base - iommu->page_table) << IO_PAGE_SHIFT));
+	ret = bus_addr | (oaddr & ~IO_PAGE_MASK);
+	base_paddr = __pa(oaddr & IO_PAGE_MASK);
+
+	iopte_protection = IOPTE_VALID | IOPTE_STBUF | IOPTE_CACHE;
+	if (direction != SBUS_DMA_TODEVICE)
+		iopte_protection |= IOPTE_WRITE;
+
+	for (i = 0; i < npages; i++, base++, base_paddr += IO_PAGE_SIZE)
+		iopte_val(*base) = iopte_protection | base_paddr;
+
+	return ret;
 }
 
-void sbus_unmap_single(struct sbus_dev *sdev, dma_addr_t dma_addr, size_t size, int direction)
+void sbus_unmap_single(struct sbus_dev *sdev, dma_addr_t bus_addr, size_t sz, int direction)
 {
 	struct sbus_iommu *iommu = sdev->bus->iommu;
-	u32 dma_base = dma_addr & IO_PAGE_MASK;
-	unsigned long flags;
+	iopte_t *base;
+	unsigned long flags, npages, i;
 
-	size = (IO_PAGE_ALIGN(dma_addr + size) - dma_base);
+	if (unlikely(direction == SBUS_DMA_NONE))
+		BUG();
+
+	npages = IO_PAGE_ALIGN(bus_addr + sz) - (bus_addr & IO_PAGE_MASK);
+	npages >>= IO_PAGE_SHIFT;
+	base = iommu->page_table +
+		((bus_addr - MAP_BASE) >> IO_PAGE_SHIFT);
+
+	bus_addr &= IO_PAGE_MASK;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	free_streaming_cluster(iommu, dma_base, size >> IO_PAGE_SHIFT);
-	sbus_strbuf_flush(iommu, dma_base, size >> IO_PAGE_SHIFT, direction);
+	sbus_strbuf_flush(iommu, bus_addr, npages, direction);
+	for (i = 0; i < npages; i++)
+		iopte_val(base[i]) = 0UL;
+	free_npages(iommu, bus_addr - MAP_BASE, npages);
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
 #define SG_ENT_PHYS_ADDRESS(SG)	\
 	(__pa(page_address((SG)->page)) + (SG)->offset)
 
-static inline void fill_sg(iopte_t *iopte, struct scatterlist *sg, int nused, int nelems, unsigned long iopte_bits)
+static inline void fill_sg(iopte_t *iopte, struct scatterlist *sg,
+			   int nused, int nelems, unsigned long iopte_protection)
 {
 	struct scatterlist *dma_sg = sg;
 	struct scatterlist *sg_end = sg + nelems;
@@ -462,7 +376,7 @@ static inline void fill_sg(iopte_t *iopt
 			for (;;) {
 				unsigned long tmp;
 
-				tmp = (unsigned long) SG_ENT_PHYS_ADDRESS(sg);
+				tmp = SG_ENT_PHYS_ADDRESS(sg);
 				len = sg->length;
 				if (((tmp ^ pteval) >> IO_PAGE_SHIFT) != 0UL) {
 					pteval = tmp & IO_PAGE_MASK;
@@ -478,7 +392,7 @@ static inline void fill_sg(iopte_t *iopt
 				sg++;
 			}
 
-			pteval = ((pteval & IOPTE_PAGE) | iopte_bits);
+			pteval = iopte_protection | (pteval & IOPTE_PAGE);
 			while (len > 0) {
 				*iopte++ = __iopte(pteval);
 				pteval += IO_PAGE_SIZE;
@@ -509,103 +423,111 @@ static inline void fill_sg(iopte_t *iopt
 	}
 }
 
-int sbus_map_sg(struct sbus_dev *sdev, struct scatterlist *sg, int nents, int dir)
+int sbus_map_sg(struct sbus_dev *sdev, struct scatterlist *sglist, int nelems, int direction)
 {
-	struct sbus_iommu *iommu = sdev->bus->iommu;
-	unsigned long flags, npages;
-	iopte_t *iopte;
+	struct sbus_iommu *iommu;
+	unsigned long flags, npages, iopte_protection;
+	iopte_t *base;
 	u32 dma_base;
 	struct scatterlist *sgtmp;
 	int used;
-	unsigned long iopte_bits;
-
-	if (dir == SBUS_DMA_NONE)
-		BUG();
 
 	/* Fast path single entry scatterlists. */
-	if (nents == 1) {
-		sg->dma_address =
+	if (nelems == 1) {
+		sglist->dma_address =
 			sbus_map_single(sdev,
-					(page_address(sg->page) + sg->offset),
-					sg->length, dir);
-		sg->dma_length = sg->length;
+					(page_address(sglist->page) + sglist->offset),
+					sglist->length, direction);
+		sglist->dma_length = sglist->length;
 		return 1;
 	}
 
-	npages = prepare_sg(sg, nents);
+	iommu = sdev->bus->iommu;
+
+	if (unlikely(direction == SBUS_DMA_NONE))
+		BUG();
+
+	npages = prepare_sg(sglist, nelems);
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	iopte = alloc_streaming_cluster(iommu, npages);
-	if (iopte == NULL)
-		goto bad;
-	dma_base = MAP_BASE + ((iopte - iommu->page_table) << IO_PAGE_SHIFT);
+	base = alloc_npages(iommu, npages);
+	spin_unlock_irqrestore(&iommu->lock, flags);
+
+	if (unlikely(base == NULL))
+		BUG();
+
+	dma_base = MAP_BASE +
+		((base - iommu->page_table) << IO_PAGE_SHIFT);
 
 	/* Normalize DVMA addresses. */
-	sgtmp = sg;
-	used = nents;
+	used = nelems;
 
+	sgtmp = sglist;
 	while (used && sgtmp->dma_length) {
 		sgtmp->dma_address += dma_base;
 		sgtmp++;
 		used--;
 	}
-	used = nents - used;
+	used = nelems - used;
+
+	iopte_protection = IOPTE_VALID | IOPTE_STBUF | IOPTE_CACHE;
+	if (direction != SBUS_DMA_TODEVICE)
+		iopte_protection |= IOPTE_WRITE;
 
-	iopte_bits = IOPTE_VALID | IOPTE_STBUF | IOPTE_CACHE;
-	if (dir != SBUS_DMA_TODEVICE)
-		iopte_bits |= IOPTE_WRITE;
+	fill_sg(base, sglist, used, nelems, iopte_protection);
 
-	fill_sg(iopte, sg, used, nents, iopte_bits);
 #ifdef VERIFY_SG
-	verify_sglist(sg, nents, iopte, npages);
+	verify_sglist(sglist, nelems, base, npages);
 #endif
-	spin_unlock_irqrestore(&iommu->lock, flags);
 
 	return used;
-
-bad:
-	spin_unlock_irqrestore(&iommu->lock, flags);
-	BUG();
-	return 0;
 }
 
-void sbus_unmap_sg(struct sbus_dev *sdev, struct scatterlist *sg, int nents, int direction)
+void sbus_unmap_sg(struct sbus_dev *sdev, struct scatterlist *sglist, int nelems, int direction)
 {
-	unsigned long size, flags;
 	struct sbus_iommu *iommu;
-	u32 dvma_base;
-	int i;
+	iopte_t *base;
+	unsigned long flags, i, npages;
+	u32 bus_addr;
 
-	/* Fast path single entry scatterlists. */
-	if (nents == 1) {
-		sbus_unmap_single(sdev, sg->dma_address, sg->dma_length, direction);
-		return;
-	}
+	if (unlikely(direction == SBUS_DMA_NONE))
+		BUG();
 
-	dvma_base = sg[0].dma_address & IO_PAGE_MASK;
-	for (i = 0; i < nents; i++) {
-		if (sg[i].dma_length == 0)
+	iommu = sdev->bus->iommu;
+
+	bus_addr = sglist->dma_address & IO_PAGE_MASK;
+
+	for (i = 1; i < nelems; i++)
+		if (sglist[i].dma_length == 0)
 			break;
-	}
 	i--;
-	size = IO_PAGE_ALIGN(sg[i].dma_address + sg[i].dma_length) - dvma_base;
+	npages = (IO_PAGE_ALIGN(sglist[i].dma_address + sglist[i].dma_length) -
+		  bus_addr) >> IO_PAGE_SHIFT;
+
+	base = iommu->page_table +
+		((bus_addr - MAP_BASE) >> IO_PAGE_SHIFT);
 
-	iommu = sdev->bus->iommu;
 	spin_lock_irqsave(&iommu->lock, flags);
-	free_streaming_cluster(iommu, dvma_base, size >> IO_PAGE_SHIFT);
-	sbus_strbuf_flush(iommu, dvma_base, size >> IO_PAGE_SHIFT, direction);
+	sbus_strbuf_flush(iommu, bus_addr, npages, direction);
+	for (i = 0; i < npages; i++)
+		iopte_val(base[i]) = 0UL;
+	free_npages(iommu, bus_addr - MAP_BASE, npages);
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
-void sbus_dma_sync_single_for_cpu(struct sbus_dev *sdev, dma_addr_t base, size_t size, int direction)
+void sbus_dma_sync_single_for_cpu(struct sbus_dev *sdev, dma_addr_t bus_addr, size_t sz, int direction)
 {
-	struct sbus_iommu *iommu = sdev->bus->iommu;
-	unsigned long flags;
+	struct sbus_iommu *iommu;
+	unsigned long flags, npages;
+
+	iommu = sdev->bus->iommu;
 
-	size = (IO_PAGE_ALIGN(base + size) - (base & IO_PAGE_MASK));
+	npages = IO_PAGE_ALIGN(bus_addr + sz) - (bus_addr & IO_PAGE_MASK);
+	npages >>= IO_PAGE_SHIFT;
+	bus_addr &= IO_PAGE_MASK;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	sbus_strbuf_flush(iommu, base & IO_PAGE_MASK, size >> IO_PAGE_SHIFT, direction);
+	sbus_strbuf_flush(iommu, bus_addr, npages, direction);
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
@@ -613,23 +535,25 @@ void sbus_dma_sync_single_for_device(str
 {
 }
 
-void sbus_dma_sync_sg_for_cpu(struct sbus_dev *sdev, struct scatterlist *sg, int nents, int direction)
+void sbus_dma_sync_sg_for_cpu(struct sbus_dev *sdev, struct scatterlist *sglist, int nelems, int direction)
 {
-	struct sbus_iommu *iommu = sdev->bus->iommu;
-	unsigned long flags, size;
-	u32 base;
-	int i;
+	struct sbus_iommu *iommu;
+	unsigned long flags, npages, i;
+	u32 bus_addr;
+
+	iommu = sdev->bus->iommu;
 
-	base = sg[0].dma_address & IO_PAGE_MASK;
-	for (i = 0; i < nents; i++) {
-		if (sg[i].dma_length == 0)
+	bus_addr = sglist[0].dma_address & IO_PAGE_MASK;
+	for (i = 0; i < nelems; i++) {
+		if (!sglist[i].dma_length)
 			break;
 	}
 	i--;
-	size = IO_PAGE_ALIGN(sg[i].dma_address + sg[i].dma_length) - base;
+	npages = (IO_PAGE_ALIGN(sglist[i].dma_address + sglist[i].dma_length)
+		  - bus_addr) >> IO_PAGE_SHIFT;
 
 	spin_lock_irqsave(&iommu->lock, flags);
-	sbus_strbuf_flush(iommu, base, size >> IO_PAGE_SHIFT, direction);
+	sbus_strbuf_flush(iommu, bus_addr, npages, direction);
 	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
@@ -1104,7 +1028,7 @@ static void __init sbus_iommu_init(int _
 	struct linux_prom64_registers *pr;
 	struct device_node *dp;
 	struct sbus_iommu *iommu;
-	unsigned long regs, tsb_base;
+	unsigned long regs;
 	u64 control;
 	int i;
 
@@ -1132,14 +1056,6 @@ static void __init sbus_iommu_init(int _
 
 	memset(iommu, 0, sizeof(*iommu));
 
-	/* We start with no consistent mappings. */
-	iommu->lowest_consistent_map = CLUSTER_NPAGES;
-
-	for (i = 0; i < NCLUSTERS; i++) {
-		iommu->alloc_info[i].flush = 0;
-		iommu->alloc_info[i].next = 0;
-	}
-
 	/* Setup spinlock. */
 	spin_lock_init(&iommu->lock);
 
@@ -1159,25 +1075,13 @@ static void __init sbus_iommu_init(int _
 	       sbus->portid, regs);
 
 	/* Setup for TSB_SIZE=7, TBW_SIZE=0, MMU_DE=1, MMU_EN=1 */
+	sbus_iommu_table_init(iommu, IO_TSB_SIZE);
+
 	control = upa_readq(iommu->iommu_regs + IOMMU_CONTROL);
 	control = ((7UL << 16UL)	|
 		   (0UL << 2UL)		|
 		   (1UL << 1UL)		|
 		   (1UL << 0UL));
-
-	/* Using the above configuration we need 1MB iommu page
-	 * table (128K ioptes * 8 bytes per iopte).  This is
-	 * page order 7 on UltraSparc.
-	 */
-	tsb_base = __get_free_pages(GFP_ATOMIC, get_order(IO_TSB_SIZE));
-	if (tsb_base == 0UL) {
-		prom_printf("sbus_iommu_init: Fatal error, cannot alloc TSB table.\n");
-		prom_halt();
-	}
-
-	iommu->page_table = (iopte_t *) tsb_base;
-	memset(iommu->page_table, 0, IO_TSB_SIZE);
-
 	upa_writeq(control, iommu->iommu_regs + IOMMU_CONTROL);
 
 	/* Clean out any cruft in the IOMMU using
@@ -1195,7 +1099,7 @@ static void __init sbus_iommu_init(int _
 	upa_readq(iommu->sbus_control_reg);
 
 	/* Give the TSB to SYSIO. */
-	upa_writeq(__pa(tsb_base), iommu->iommu_regs + IOMMU_TSBBASE);
+	upa_writeq(__pa(iommu->page_table), iommu->iommu_regs + IOMMU_TSBBASE);
 
 	/* Setup streaming buffer, DE=1 SB_EN=1 */
 	control = (1UL << 1UL) | (1UL << 0UL);

-- 


* [patch 11/33] Fix qlogicpti DMA unmapping
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (10 preceding siblings ...)
  2007-04-26 16:55   ` [patch 10/33] Fix sparc64 SBUS IOMMU allocator Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 12/33] Fix compat sys_ipc() on sparc64 Greg KH
                     ` (24 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, David S. Miller

[-- Attachment #1: fix-qlogicpti-dma-unmapping.patch --]
[-- Type: text/plain, Size: 978 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: David Miller <davem@davemloft.net>

[SCSI] QLOGICPTI: Do not unmap DMA unless we actually mapped something.

We only map DMA when cmd->request_bufflen is non-zero for non-sg
buffers, so we should make the same check when unmapping.
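The rule that map and unmap must be guarded by the same condition can be shown with a small counting model. This is a sketch only: `toy_map_single`, `toy_unmap_single` and `struct toy_cmd` are hypothetical stand-ins for sbus_map_single/sbus_unmap_single and the SCSI command, and they merely count live mappings. Without the guard in `toy_finish_cmd`, a zero-length command would trip the assertion in the unmap path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SBUS map/unmap pair: they only count
 * live mappings so the map/unmap balance can be checked. */
static int toy_live_mappings;

static unsigned int toy_map_single(size_t len)
{
	(void)len;
	toy_live_mappings++;
	return 0x1000u;			/* arbitrary fake bus address */
}

static void toy_unmap_single(unsigned int bus_addr, size_t len)
{
	(void)bus_addr;
	(void)len;
	assert(toy_live_mappings > 0);	/* unmapping something never mapped */
	toy_live_mappings--;
}

struct toy_cmd {
	size_t request_bufflen;
	unsigned int dma_addr;
};

static void toy_start_cmd(struct toy_cmd *c)
{
	if (c->request_bufflen)		/* map only when there is data */
		c->dma_addr = toy_map_single(c->request_bufflen);
}

static void toy_finish_cmd(struct toy_cmd *c)
{
	if (c->request_bufflen)		/* the same test as at map time */
		toy_unmap_single(c->dma_addr, c->request_bufflen);
}
```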

Based upon a report from Pasi Pirhonen.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/scsi/qlogicpti.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -1281,7 +1281,7 @@ static struct scsi_cmnd *qlogicpti_intr_
 				      (struct scatterlist *)Cmnd->request_buffer,
 				      Cmnd->use_sg,
 				      Cmnd->sc_data_direction);
-		} else {
+		} else if (Cmnd->request_bufflen) {
 			sbus_unmap_single(qpti->sdev,
 					  (__u32)((unsigned long)Cmnd->SCp.ptr),
 					  Cmnd->request_bufflen,

-- 


* [patch 12/33] Fix compat sys_ipc() on sparc64
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (11 preceding siblings ...)
  2007-04-26 16:55   ` [patch 11/33] Fix qlogicpti DMA unmapping Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 13/33] Fix bogus inline directive in sparc64 PCI code Greg KH
                     ` (23 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, David S. Miller

[-- Attachment #1: fix-compat-sys_ipc-on-sparc64.patch --]
[-- Type: text/plain, Size: 2170 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: David Miller <davem@davemloft.net>

The 32-bit syscall trampoline for sys_ipc() on sparc64
was sign-extending various arguments, which is bogus when
using compat_sys_ipc(), since that function expects zero-extended
copies of all the arguments.

Among other things, this bug breaks the sparc64 kernel when it
is built with gcc-4.2.x.
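The following small sketch makes the difference concrete. It shows what happens to a 32-bit argument whose bit 31 is set, such as a user pointer above 2 GB, under sign-extension and under zero-extension into a 64-bit register. The helper names are hypothetical. They model only the register contents, not the sparc64 trampoline itself.

```c
#include <stdint.h>

/* Model of a SIGN-style stub: the low 32 bits are sign-extended, so bit 31
 * is copied into the upper half.  (Converting values above INT32_MAX to
 * int32_t is implementation-defined; GCC and Clang wrap modulo 2^32.) */
static uint64_t sign_extend32(uint32_t v)
{
	return (uint64_t)(int64_t)(int32_t)v;
}

/* What compat_sys_ipc() expects: the upper 32 bits are zero. */
static uint64_t zero_extend32(uint32_t v)
{
	return (uint64_t)v;
}
```

For 0x80001000, the sign-extended value is 0xFFFFFFFF80001000, which is not the address the 32-bit task passed. The zero-extended value preserves it.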

[SPARC64]: Fix arg passing to compat_sys_ipc().

Do not sign-extend args using the sys32_ipc stub; that is
buggy and unnecessary.

Based upon an excellent report by Mikael Pettersson.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/sparc64/kernel/sys32.S   |    1 -
 arch/sparc64/kernel/systbls.S |    2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

--- a/arch/sparc64/kernel/sys32.S
+++ b/arch/sparc64/kernel/sys32.S
@@ -91,7 +91,6 @@ SIGN1(sys32_select, compat_sys_select, %
 SIGN1(sys32_mkdir, sys_mkdir, %o1)
 SIGN3(sys32_futex, compat_sys_futex, %o1, %o2, %o5)
 SIGN1(sys32_sysfs, compat_sys_sysfs, %o0)
-SIGN3(sys32_ipc, compat_sys_ipc, %o1, %o2, %o3)
 SIGN2(sys32_sendfile, compat_sys_sendfile, %o0, %o1)
 SIGN2(sys32_sendfile64, compat_sys_sendfile64, %o0, %o1)
 SIGN1(sys32_prctl, sys_prctl, %o0)
--- a/arch/sparc64/kernel/systbls.S
+++ b/arch/sparc64/kernel/systbls.S
@@ -62,7 +62,7 @@ sys_call_table32:
 /*200*/	.word sys32_ssetmask, sys_sigsuspend, compat_sys_newlstat, sys_uselib, compat_sys_old_readdir
 	.word sys32_readahead, sys32_socketcall, sys32_syslog, sys32_lookup_dcookie, sys32_fadvise64
 /*210*/	.word sys32_fadvise64_64, sys32_tgkill, sys32_waitpid, sys_swapoff, sys32_sysinfo
-	.word sys32_ipc, sys32_sigreturn, sys_clone, sys32_ioprio_get, compat_sys_adjtimex
+	.word compat_sys_ipc, sys32_sigreturn, sys_clone, sys32_ioprio_get, compat_sys_adjtimex
 /*220*/	.word sys32_sigprocmask, sys_ni_syscall, sys32_delete_module, sys_ni_syscall, sys32_getpgid
 	.word sys32_bdflush, sys32_sysfs, sys_nis_syscall, sys32_setfsuid16, sys32_setfsgid16
 /*230*/	.word sys32_select, compat_sys_time, sys32_splice, compat_sys_stime, compat_sys_statfs64

-- 

* [patch 13/33] Fix bogus inline directive in sparc64 PCI code
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (12 preceding siblings ...)
  2007-04-26 16:55   ` [patch 12/33] Fix compat sys_ipc() on sparc64 Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:55   ` [patch 14/33] Fix errors in tcp_memcalculations Greg KH
                     ` (22 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, Tom spot Callaway,
	David S. Miller

[-- Attachment #1: fix-bogus-inline-directive-in-sparc64-pci-code.patch --]
[-- Type: text/plain, Size: 1049 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Tom "spot" Callaway <tcallawa@redhat.com>

[SPARC64]: Fix inline directive in pci_iommu.c

While building a test kernel for the new esp driver (against
git-current), I hit this bug. Trivial fix: put the inline specifier
in the right place, before the return type. :)

Signed-off-by: Tom "spot" Callaway <tcallawa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/sparc64/kernel/pci_iommu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/sparc64/kernel/pci_iommu.c
+++ b/arch/sparc64/kernel/pci_iommu.c
@@ -64,7 +64,7 @@ static void __iommu_flushall(struct pci_
 #define IOPTE_IS_DUMMY(iommu, iopte)	\
 	((iopte_val(*iopte) & IOPTE_PAGE) == (iommu)->dummy_page_pa)
 
-static void inline iopte_make_dummy(struct pci_iommu *iommu, iopte_t *iopte)
+static inline void iopte_make_dummy(struct pci_iommu *iommu, iopte_t *iopte)
 {
 	unsigned long val = iopte_val(*iopte);
 

-- 

* [patch 14/33] Fix errors in tcp_memcalculations.
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (13 preceding siblings ...)
  2007-04-26 16:55   ` [patch 13/33] Fix bogus inline directive in sparc64 PCI code Greg KH
@ 2007-04-26 16:55   ` Greg KH
  2007-04-26 16:56   ` [patch 15/33] Fix netpoll UDP input path Greg KH
                     ` (21 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, John Heffner,
	David S. Miller

[-- Attachment #1: fix-errors-in-tcp_mem-calculations.patch --]
[-- Type: text/plain, Size: 1804 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: John Heffner <jheffner@psc.edu>

In 2.6.18 a change was made to the tcp_mem[] calculations,
but it causes regressions for some users in kernels up to 2.6.20.

The following fix by John Heffner, from the pending 2.6.21 tree,
smooths out the calculation and fixes the problem for
these users.

[TCP]: Fix tcp_mem[] initialization.

Change tcp_mem initialization function.  The fraction of total memory
is now a continuous function of memory size, and independent of page
size.
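To get a feel for the numbers, the arithmetic in the hunk below can be transcribed into userspace C, parameterised on `nr_all_pages` and `PAGE_SHIFT`. The helper names are hypothetical, but the shifts and constants are taken from the patch. Working through it, the pressure threshold comes out to min(MB, 256)/512 of total memory, which is 1/2 from 256 MB upward. The threshold has a floor of 128 pages, and the same fraction results for any page size.

```c
/* Hypothetical userspace transcription of the patch's tcp_mem[] arithmetic.
 * Valid for page_shift between 11 and 20; all results are in pages. */
static unsigned long tcp_pressure_limit(unsigned long nr_all_pages,
					unsigned int page_shift)
{
	unsigned long cap = 1UL << (28 - page_shift);  /* pages in 256 MB */
	unsigned long limit;

	/* memory size in MB, capped at 256 */
	limit = (nr_all_pages < cap ? nr_all_pages : cap) >> (20 - page_shift);
	/* capped MB * total MB, rescaled back into pages */
	limit = (limit * (nr_all_pages >> (20 - page_shift))) >> (page_shift - 11);
	return limit > 128UL ? limit : 128UL;          /* floor of 128 pages */
}

static void tcp_mem_thresholds(unsigned long nr_all_pages,
			       unsigned int page_shift, unsigned long mem[3])
{
	unsigned long limit = tcp_pressure_limit(nr_all_pages, page_shift);

	mem[0] = limit / 4 * 3;  /* low threshold */
	mem[1] = limit;          /* pressure threshold */
	mem[2] = mem[0] * 2;     /* upper limit */
}
```

For example, 1 GB of 4 KB pages (`nr_all_pages` = 262144) gives a limit of 131072 pages, so tcp_mem becomes {98304, 131072, 196608} pages. For 64 MB, the result is {1536, 2048, 3072}.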

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/ipv4/tcp.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2457,11 +2457,18 @@ void __init tcp_init(void)
 		sysctl_max_syn_backlog = 128;
 	}
 
-	/* Allow no more than 3/4 kernel memory (usually less) allocated to TCP */
-	sysctl_tcp_mem[0] = (1536 / sizeof (struct inet_bind_hashbucket)) << order;
-	sysctl_tcp_mem[1] = sysctl_tcp_mem[0] * 4 / 3;
+	/* Set the pressure threshold to be a fraction of global memory that
+	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of
+	 * memory, with a floor of 128 pages.
+	 */
+	limit = min(nr_all_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
+	limit = (limit * (nr_all_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = max(limit, 128UL);
+	sysctl_tcp_mem[0] = limit / 4 * 3;
+	sysctl_tcp_mem[1] = limit;
 	sysctl_tcp_mem[2] = sysctl_tcp_mem[0] * 2;
 
+	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
 	max_share = min(4UL*1024*1024, limit);
 

-- 

* [patch 15/33] Fix netpoll UDP input path
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (14 preceding siblings ...)
  2007-04-26 16:55   ` [patch 14/33] Fix errors in tcp_memcalculations Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 16/33] Fix IRDA oopser Greg KH
                     ` (20 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, Aubrey.Li,
	David S. Miller

[-- Attachment #1: fix-netpoll-udp-input-path.patch --]
[-- Type: text/plain, Size: 1210 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Aubrey.Li <aubreylee@gmail.com>

The netpoll UDP input handler needs to pull up the UDP headers
and handle receive checksum offloading properly, just like
the normal UDP input path does; otherwise we get corrupted
checksums.

[NET]: Fix UDP checksum issue in net poll mode.

In netpoll mode, the current checksum function doesn't account for
packets that are padded to reach a minimum length. I believe that is
what caused my test case to fail. The following patch fixes this
issue.
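The sketch below illustrates the rule the fix applies. Ethernet pads short frames to a 46-byte minimum payload. A small IPv4/UDP datagram can therefore arrive with trailing bytes that the IP header's total length does not count, and those bytes must be dropped before the UDP checksum is checked. The function is a hypothetical plain-C model that works on a raw IPv4 header. In the kernel, `pskb_trim_rcsum()` does the trimming on the skb and also keeps the receive-checksum state consistent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: return the true length of the IPv4 datagram at the
 * start of buf, or 0 if the header is inconsistent with what was received
 * (roughly the cases where __netpoll_rx() bails out with "goto out"). */
static size_t ipv4_true_len(const uint8_t *buf, size_t buflen)
{
	size_t ihl, tot_len;

	if (buflen < 20)
		return 0;                              /* no room for a minimal header */
	ihl = (size_t)(buf[0] & 0x0f) * 4;         /* header length in bytes */
	tot_len = ((size_t)buf[2] << 8) | buf[3];  /* total length, big-endian */
	if (ihl < 20 || buflen < tot_len || tot_len < ihl)
		return 0;
	return tot_len;  /* anything after this is link-layer padding */
}
```

A 28-byte datagram (20-byte IP header plus 8-byte UDP header) padded to 46 bytes yields 28. The 18 padding bytes are what the checksum must not cover.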

Signed-off-by: Aubrey.Li <aubreylee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/core/netpoll.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -471,6 +471,13 @@ int __netpoll_rx(struct sk_buff *skb)
 	if (skb->len < len || len < iph->ihl*4)
 		goto out;
 
+	/*
+	 * Our transport medium may have padded the buffer out.
+	 * Now We trim to the true length of the frame.
+	 */
+	if (pskb_trim_rcsum(skb, len))
+		goto out;
+
 	if (iph->protocol != IPPROTO_UDP)
 		goto out;
 

-- 

* [patch 16/33] Fix IRDA oopser
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (15 preceding siblings ...)
  2007-04-26 16:56   ` [patch 15/33] Fix netpoll UDP input path Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 17/33] cache_k8_northbridges() overflows beyond allocation Greg KH
                     ` (19 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, bunk, Olaf Kirch,
	Samuel Ortiz, David S. Miller

[-- Attachment #1: fix-irda-oops-er.patch --]
[-- Type: text/plain, Size: 1657 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Olaf Kirch <olaf.kirch@oracle.com>

This fixes an oops due to incorrect socket orphaning in the
IrDA stack.

[IrDA]: Correctly handle socket errors

This patch fixes an oops first reported in mid-2006 (see
http://lkml.org/lkml/2006/8/29/358). The cause of this bug report is that
when an error is signalled on the socket, irda_recvmsg_stream returns
without removing a local wait_queue variable from the socket's sk_sleep
queue. This causes havoc further down the road.
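The underlying rule is that an on-stack wait entry must be removed from the queue on every exit path, including the error path. Here is a hypothetical userspace model, with a bare doubly linked list standing in for `sk_sleep` and `remove_waiter()` playing the role of `finish_wait()`:

```c
#include <stddef.h>

/* Hypothetical, minimal doubly linked wait list standing in for sk_sleep. */
struct waiter {
	struct waiter *prev, *next;
};

static void queue_init(struct waiter *head) { head->prev = head->next = head; }

static int queue_empty(const struct waiter *head) { return head->next == head; }

static void add_waiter(struct waiter *head, struct waiter *w)
{
	w->next = head->next;
	w->prev = head;
	head->next->prev = w;
	head->next = w;
}

static void remove_waiter(struct waiter *w)
{
	w->prev->next = w->next;
	w->next->prev = w->prev;
	w->prev = w->next = w;
}

/* One pass of a receive wait: the on-stack entry is removed before
 * returning on every path, including when a socket error is pending. */
static int wait_once(struct waiter *queue, int sock_err)
{
	struct waiter wait;
	int ret = 0;

	add_waiter(queue, &wait);
	if (sock_err)
		ret = sock_err;     /* record the error, but do not bail out yet */
	/* ... otherwise this is where the task would sleep ... */
	remove_waiter(&wait);   /* like finish_wait(), reached on every path */
	return ret;
}
```

If `wait_once()` returned directly when `sock_err` was set, `queue` would keep a pointer into a stack frame that no longer exists. That is the kind of havoc described above.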

In response to this problem, a patch was made that invoked sock_orphan on
the socket when receiving a disconnect indication. This is not a good fix,
as this sets sk_sleep to NULL, causing applications sleeping in recvmsg
(and other places) to oops.

This is against the latest net-2.6 and should be considered for -stable
inclusion.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/irda/af_irda.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -138,7 +138,6 @@ static void irda_disconnect_indication(v
 		sk->sk_shutdown |= SEND_SHUTDOWN;
 
 		sk->sk_state_change(sk);
-                sock_orphan(sk);
 		release_sock(sk);
 
 		/* Close our TSAP.
@@ -1446,7 +1445,7 @@ static int irda_recvmsg_stream(struct ki
 			 */
 			ret = sock_error(sk);
 			if (ret)
-				break;
+				;
 			else if (sk->sk_shutdown & RCV_SHUTDOWN)
 				;
 			else if (noblock)

-- 

* [patch 17/33] cache_k8_northbridges() overflows beyond allocation
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (16 preceding siblings ...)
  2007-04-26 16:56   ` [patch 16/33] Fix IRDA oopser Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 18/33] exec.c: fix coredump to pipe problem and obscure "security hole" Greg KH
                     ` (18 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Andi Kleen, Badari Pulavarty

[-- Attachment #1: cache_k8_northbridges-overflows-beyond-allocation.patch --]
[-- Type: text/plain, Size: 1104 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Badari Pulavarty <pbadari@gmail.com>

cache_k8_northbridges() overflows beyond allocation

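cache_k8_northbridges() stores config values in the wrong slots of
flush_words and also overflows beyond the allocation, causing slab
verification failures. Because i is post-incremented when the device
is stored in k8_northbridges[], the config read for device n lands in
flush_words[n + 1]: slot 0 is never filled and the last read is one
entry past the end.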
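The indexing mistake is easy to reproduce outside the kernel. The sketch below records which `flush_words` slot each device's config read lands in, under both the old and the new statement order. The function name is hypothetical, and the sketch assumes `flush_words` has one entry per northbridge.

```c
#include <stddef.h>

/* Record, for each of ndev devices, the flush_words index its config
 * read would be stored at.  buggy != 0 models the old statement order. */
static size_t record_flush_slots(size_t ndev, int buggy, size_t *slot_out)
{
	size_t i = 0;

	for (size_t d = 0; d < ndev; d++) {
		if (buggy) {
			i++;               /* k8_northbridges[i++] = dev; advances i first */
			slot_out[d] = i;   /* ... so &flush_words[i] is one slot too far */
		} else {
			slot_out[d] = i++; /* k8_northbridges[i] = dev; &flush_words[i++] */
		}
	}
	return i;   /* number of devices stored in both cases */
}
```

With three northbridges, the old order uses slots 1, 2 and 3, so slot 0 is left unset and slot 3 is one past the end. The new order uses slots 0, 1 and 2.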

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/x86_64/kernel/k8.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86_64/kernel/k8.c
+++ b/arch/x86_64/kernel/k8.c
@@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
 	dev = NULL;
 	i = 0;
 	while ((dev = next_k8_northbridge(dev)) != NULL) {
-		k8_northbridges[i++] = dev;
-		pci_read_config_dword(dev, 0x9c, &flush_words[i]);
+		k8_northbridges[i] = dev;
+		pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
 	}
 	k8_northbridges[i] = NULL;
 	return 0;

-- 

* [patch 18/33] exec.c: fix coredump to pipe problem and obscure "security hole"
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (17 preceding siblings ...)
  2007-04-26 16:56   ` [patch 17/33] cache_k8_northbridges() overflows beyond allocation Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 19/33] NFS: Fix an Oops in nfs_setattr() Greg KH
                     ` (17 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable, git-commits-head
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Alan Cox

[-- Attachment #1: exec.c-fix-coredump-to-pipe-problem-and-obscure-security-hole.patch --]
[-- Type: text/plain, Size: 2918 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Alan Cox <alan@lxorguk.ukuu.org.uk>

exec.c: fix coredump to pipe problem and obscure "security hole"

The patch checks for "|" in the pattern, not the output, and doesn't
append a pid to a piped name (as it is a program name, not a file).

This also fixes a very obscure security corner case.  If you happen to
have chosen a core pattern that starts with the program name (%e), then,
as the code stands, a user can run a program called "|myevilhack": the
expanded core name begins with "|" and is treated as a pipe command.
I doubt anyone does this.
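Below is a much-simplified, hypothetical model of the new logic. It decides "pipe or file" from the first character of the pattern, before any expansion takes place. As a result, nothing a user controls can turn a file name into a pipe command, including the executable name substituted for `%e`. Only `%e` is expanded here, whereas the real `format_corename()` handles many more specifiers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, much-simplified format_corename(): expands only %e
 * (the executable name) and decides "pipe or file" from the PATTERN.
 * outsz must be at least 1. */
static int format_name(char *out, size_t outsz, const char *pattern,
		       const char *comm)
{
	int ispipe = (pattern[0] == '|');  /* decided before any expansion */
	size_t n = 0;

	for (const char *p = pattern; *p && n + 1 < outsz; p++) {
		if (p[0] == '%' && p[1] == 'e') {
			int rc = snprintf(out + n, outsz - n, "%s", comm);
			if (rc < 0 || (size_t)rc >= outsz - n) {
				n = outsz - 1;  /* truncated: stop at buffer end */
				break;
			}
			n += (size_t)rc;
			p++;                  /* skip the 'e' */
		} else {
			out[n++] = *p;
		}
	}
	out[n] = '\0';
	return ispipe;
}
```

With the pattern `%e.core` and a program named `|myevilhack`, the expanded name still starts with `|`. However, `format_name()` returns 0, so the caller opens a file rather than running a helper.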

Signed-off-by: Alan Cox <alan@redhat.com>
Confirmed-by: Christopher S. Aker <caker@theshore.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/exec.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1246,13 +1246,17 @@ EXPORT_SYMBOL(set_binfmt);
  * name into corename, which must have space for at least
  * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator.
  */
-static void format_corename(char *corename, const char *pattern, long signr)
+static int format_corename(char *corename, const char *pattern, long signr)
 {
 	const char *pat_ptr = pattern;
 	char *out_ptr = corename;
 	char *const out_end = corename + CORENAME_MAX_SIZE;
 	int rc;
 	int pid_in_pattern = 0;
+	int ispipe = 0;
+
+	if (*pattern == '|')
+		ispipe = 1;
 
 	/* Repeat as long as we have more pattern to process and more output
 	   space */
@@ -1343,8 +1347,8 @@ static void format_corename(char *corena
 	 *
 	 * If core_pattern does not include a %p (as is the default)
 	 * and core_uses_pid is set, then .%pid will be appended to
-	 * the filename */
-	if (!pid_in_pattern
+	 * the filename. Do not do this for piped commands. */
+	if (!ispipe && !pid_in_pattern
             && (core_uses_pid || atomic_read(&current->mm->mm_users) != 1)) {
 		rc = snprintf(out_ptr, out_end - out_ptr,
 			      ".%d", current->tgid);
@@ -1352,8 +1356,9 @@ static void format_corename(char *corena
 			goto out;
 		out_ptr += rc;
 	}
-      out:
+out:
 	*out_ptr = 0;
+	return ispipe;
 }
 
 static void zap_process(struct task_struct *start)
@@ -1504,16 +1509,15 @@ int do_coredump(long signr, int exit_cod
 	 * uses lock_kernel()
 	 */
  	lock_kernel();
-	format_corename(corename, core_pattern, signr);
+	ispipe = format_corename(corename, core_pattern, signr);
 	unlock_kernel();
- 	if (corename[0] == '|') {
+ 	if (ispipe) {
 		/* SIGPIPE can happen, but it's just never processed */
  		if(call_usermodehelper_pipe(corename+1, NULL, NULL, &file)) {
  			printk(KERN_INFO "Core dump to %s pipe failed\n",
 			       corename);
  			goto fail_unlock;
  		}
-		ispipe = 1;
  	} else
  		file = filp_open(corename,
 				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,

-- 

* [patch 19/33] NFS: Fix an Oops in nfs_setattr()
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (18 preceding siblings ...)
  2007-04-26 16:56   ` [patch 18/33] exec.c: fix coredump to pipe problem and obscure "security hole" Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 20/33] x86: Dont probe for DDC on VBE1.2 Greg KH
                     ` (16 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Trond Myklebust

[-- Attachment #1: nfs-fix-an-oops-in-nfs_setattr.patch --]
[-- Type: text/plain, Size: 1410 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix an Oops in nfs_setattr()

It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...
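The guard pattern itself is simple, and a hypothetical userspace helper built on the POSIX `S_ISREG()` macro shows the intent. It assumes, as the patch does, that only regular files have NFS writes pending. Directories, symlinks and other inode types therefore skip the writeback step.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Hypothetical helper: decide whether the data-writeback step applies.
 * Only regular files are assumed to have pending writes to flush. */
static int needs_data_writeback(mode_t mode)
{
	return S_ISREG(mode);
}
```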

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/nfs/dir.c   |    3 ++-
 fs/nfs/inode.c |    6 ++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1659,7 +1659,8 @@ go_ahead:
 	 * ... prune child dentries and writebacks if needed.
 	 */
 	if (atomic_read(&old_dentry->d_count) > 1) {
-		nfs_wb_all(old_inode);
+		if (S_ISREG(old_inode->i_mode))
+			nfs_wb_all(old_inode);
 		shrink_dcache_parent(old_dentry);
 	}
 	nfs_inode_return_delegation(old_inode);
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -334,8 +334,10 @@ nfs_setattr(struct dentry *dentry, struc
 	lock_kernel();
 	nfs_begin_data_update(inode);
 	/* Write all dirty data */
-	filemap_write_and_wait(inode->i_mapping);
-	nfs_wb_all(inode);
+	if (S_ISREG(inode->i_mode)) {
+		filemap_write_and_wait(inode->i_mapping);
+		nfs_wb_all(inode);
+	}
 	/*
 	 * Return any delegations if we're going to change ACLs
 	 */

-- 

* [patch 20/33] x86: Dont probe for DDC on VBE1.2
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (19 preceding siblings ...)
  2007-04-26 16:56   ` [patch 19/33] NFS: Fix an Oops in nfs_setattr() Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 21/33] vt: fix potential race in VT_WAITACTIVE handler Greg KH
                     ` (15 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Andi Kleen, Zwane Mwaikambo

[-- Attachment #1: x86-don-t-probe-for-ddc-on-vbe1.2.patch --]
[-- Type: text/plain, Size: 2969 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Zwane Mwaikambo <zwane@infradead.org>

[PATCH] x86: Don't probe for DDC on VBE1.2

VBE 1.2 doesn't support function 15h (DDC), resulting in a 'hang' while
uncompressing the kernel with some video cards. Make sure we check the VBE
version before fiddling around with DDC.
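In C terms, the check the assembly performs looks roughly like this. VBE function 4F00h fills a controller-information block whose BCD-encoded version word sits at offset 4, and the patch reads it with `movw 4(%di), %ax`. VBE 1.2 therefore reports 0x0102 and VBE 2.0 reports 0x0200. The helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical: read the BCD VBE version word (little-endian) at offset 4
 * of the controller-information block filled by function 4F00h. */
static uint16_t vbe_version(const uint8_t *info)
{
	return (uint16_t)(info[4] | (info[5] << 8));
}

/* Only attempt the DDC/EDID call (function 4F15h) on VBE 2.0 or later. */
static int vbe_may_probe_ddc(uint16_t version)
{
	return version >= 0x0200;
}
```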

http://bugzilla.kernel.org/show_bug.cgi?id=1458

Opened: 2003-10-30 09:12 Last update: 2007-02-13 22:03

Many thanks to Tobias Hain for help in testing and investigating the bug.
Tested on:

i386, Chips & Technologies 65548 VESA VBE 1.2
CONFIG_VIDEO_SELECT=Y
CONFIG_FIRMWARE_EDID=Y

Untested on x86_64.

Signed-off-by: Zwane Mwaikambo <zwane@infradead.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/i386/boot/video.S   |   14 ++++++++++++++
 arch/x86_64/boot/video.S |   14 ++++++++++++++
 2 files changed, 28 insertions(+)

--- a/arch/i386/boot/video.S
+++ b/arch/i386/boot/video.S
@@ -571,6 +571,16 @@ setr1:	lodsw
 	jmp	_m_s
 
 check_vesa:
+#ifdef CONFIG_FIRMWARE_EDID
+	leaw	modelist+1024, %di
+	movw	$0x4f00, %ax
+	int	$0x10
+	cmpw	$0x004f, %ax
+	jnz	setbad
+
+	movw	4(%di), %ax
+	movw	%ax, vbe_version
+#endif
 	leaw	modelist+1024, %di
 	subb	$VIDEO_FIRST_VESA>>8, %bh
 	movw	%bx, %cx			# Get mode information structure
@@ -1945,6 +1955,9 @@ store_edid:
 	rep
 	stosl
 
+	cmpw	$0x0200, vbe_version		# only do EDID on >= VBE2.0
+	jl	no_edid
+
 	pushw   %es				# save ES
 	xorw    %di, %di                        # Report Capability
 	pushw   %di
@@ -1987,6 +2000,7 @@ do_restore:	.byte	0	# Screen contents al
 svga_prefix:	.byte	VIDEO_FIRST_BIOS>>8	# Default prefix for BIOS modes
 graphic_mode:	.byte	0	# Graphic mode with a linear frame buffer
 dac_size:	.byte	6	# DAC bit depth
+vbe_version:	.word	0	# VBE bios version
 
 # Status messages
 keymsg:		.ascii	"Press <RETURN> to see video modes available, "
--- a/arch/x86_64/boot/video.S
+++ b/arch/x86_64/boot/video.S
@@ -571,6 +571,16 @@ setr1:	lodsw
 	jmp	_m_s
 
 check_vesa:
+#ifdef CONFIG_FIRMWARE_EDID
+	leaw	modelist+1024, %di
+	movw	$0x4f00, %ax
+	int	$0x10
+	cmpw	$0x004f, %ax
+	jnz	setbad
+
+	movw	4(%di), %ax
+	movw	%ax, vbe_version
+#endif
 	leaw	modelist+1024, %di
 	subb	$VIDEO_FIRST_VESA>>8, %bh
 	movw	%bx, %cx			# Get mode information structure
@@ -1945,6 +1955,9 @@ store_edid:
 	rep
 	stosl
 
+	cmpw	$0x0200, vbe_version		# only do EDID on >= VBE2.0
+	jl	no_edid
+
 	pushw   %es				# save ES
 	xorw    %di, %di                        # Report Capability
 	pushw   %di
@@ -1987,6 +2000,7 @@ do_restore:	.byte	0	# Screen contents al
 svga_prefix:	.byte	VIDEO_FIRST_BIOS>>8	# Default prefix for BIOS modes
 graphic_mode:	.byte	0	# Graphic mode with a linear frame buffer
 dac_size:	.byte	6	# DAC bit depth
+vbe_version:	.word	0	# VBE bios version
 
 # Status messages
 keymsg:		.ascii	"Press <RETURN> to see video modes available, "

-- 

* [patch 21/33] vt: fix potential race in VT_WAITACTIVE handler
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (20 preceding siblings ...)
  2007-04-26 16:56   ` [patch 20/33] x86: Dont probe for DDC on VBE1.2 Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 22/33] 3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling Greg KH
                     ` (14 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Antonino Daplas,
	Michal Januszewski

[-- Attachment #1: vt-fix-potential-race-in-vt_waitactive-handler.patch --]
[-- Type: text/plain, Size: 1685 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Michal Januszewski <spock@gentoo.org>

[PATCH] vt: fix potential race in VT_WAITACTIVE handler

On a multiprocessor machine the VT_WAITACTIVE ioctl call may return 0 if
fg_console has already been updated in redraw_screen() but the console
switch itself hasn't been completed.  Fix this by checking fg_console in
vt_waitactive() with the console semaphore held.
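The same idea can be shown in a hypothetical userspace model, with a pthread mutex standing in for the console semaphore. The switcher holds the lock across the whole switch. A reader that compares `fg_console` under the same lock can therefore only see the new value once the switch is complete.

```c
#include <pthread.h>

/* Hypothetical model: the lock is held across the whole switch. */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static int fg_console;
static int switch_complete = 1;

static void switch_console(int vt)
{
	pthread_mutex_lock(&console_lock);
	fg_console = vt;          /* like redraw_screen() updating fg_console */
	switch_complete = 0;
	/* ... the rest of the switch happens here, still under the lock ... */
	switch_complete = 1;
	pthread_mutex_unlock(&console_lock);
}

/* Like the patched vt_waitactive() check: compare under the lock, so a
 * match can never be observed while the switch is half done. */
static int vt_is_active(int vt)
{
	int active;

	pthread_mutex_lock(&console_lock);
	active = (vt == fg_console);
	pthread_mutex_unlock(&console_lock);
	return active;
}
```

Build with `-pthread` on toolchains that need it.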

Signed-off-by: Michal Januszewski <spock@gentoo.org>
Acked-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/char/vt_ioctl.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

--- a/drivers/char/vt_ioctl.c
+++ b/drivers/char/vt_ioctl.c
@@ -1038,10 +1038,22 @@ int vt_waitactive(int vt)
 
 	add_wait_queue(&vt_activate_queue, &wait);
 	for (;;) {
-		set_current_state(TASK_INTERRUPTIBLE);
 		retval = 0;
-		if (vt == fg_console)
+
+		/*
+		 * Synchronize with redraw_screen(). By acquiring the console
+		 * semaphore we make sure that the console switch is completed
+		 * before we return. If we didn't wait for the semaphore, we
+		 * could return at a point where fg_console has already been
+		 * updated, but the console switch hasn't been completed.
+		 */
+		acquire_console_sem();
+		set_current_state(TASK_INTERRUPTIBLE);
+		if (vt == fg_console) {
+			release_console_sem();
 			break;
+		}
+		release_console_sem();
 		retval = -EINTR;
 		if (signal_pending(current))
 			break;

-- 

* [patch 22/33] 3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (21 preceding siblings ...)
  2007-04-26 16:56   ` [patch 21/33] vt: fix potential race in VT_WAITACTIVE handler Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 23/33] fix bogon in /dev/mem mmaping on nommu Greg KH
                     ` (13 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, adam radford,
	James Bottomley

[-- Attachment #1: 3w-xxxx-fix-oops-caused-by-incorrect-request_sense-handling.patch --]
[-- Type: text/plain, Size: 1738 bytes --]


-stable review patch.  If anyone has any objections, please let us know.

------------------
From: James Bottomley <James.Bottomley@steeleye.com>

3w-xxxx emulates a REQUEST_SENSE response by simply returning nothing.
Unfortunately, it assumes that the REQUEST_SENSE command is
implemented with use_sg == 0, which is no longer the case.  The oops
occurs because it clears the scatterlist in request_buffer instead
of the data buffer that the scatterlist describes.

This is fixed by using tw_transfer_internal() to transfer correctly to
the scatterlist.
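Here is a simplified sketch of what the fixed code has to do. It builds 18 bytes of fixed-format sense data (response code 0x70, additional sense length 10, all other fields zero, i.e. NO SENSE). It then copies those bytes into whatever segments describe the command's buffer, rather than `memset()`ing the scatterlist itself. `struct seg` and both helpers are hypothetical, and the driver's `tw_transfer_internal()` may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather segment. */
struct seg {
	uint8_t *buf;
	size_t len;
};

/* Build 18 bytes of fixed-format sense data reporting NO SENSE. */
static void build_no_sense(uint8_t sense[18])
{
	memset(sense, 0, 18);
	sense[0] = 0x70;   /* response code: fixed format */
	sense[7] = 10;     /* additional sense length: 18 - 8 */
}

/* Copy src into the segments in order, returning the bytes copied. */
static size_t copy_to_segments(struct seg *segs, size_t nsegs,
			       const uint8_t *src, size_t len)
{
	size_t done = 0;

	for (size_t i = 0; i < nsegs && done < len; i++) {
		size_t n = segs[i].len;

		if (n > len - done)
			n = len - done;
		memcpy(segs[i].buf, src + done, n);
		done += n;
	}
	return done;
}
```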

Acked-by: adam radford <aradford@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/scsi/3w-xxxx.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/scsi/3w-xxxx.c
+++ b/drivers/scsi/3w-xxxx.c
@@ -1864,10 +1864,17 @@ static int tw_scsiop_read_write(TW_Devic
 /* This function will handle the request sense scsi command */
 static int tw_scsiop_request_sense(TW_Device_Extension *tw_dev, int request_id)
 {
+	char request_buffer[18];
+
 	dprintk(KERN_NOTICE "3w-xxxx: tw_scsiop_request_sense()\n");
 
-	/* For now we just zero the request buffer */
-	memset(tw_dev->srb[request_id]->request_buffer, 0, tw_dev->srb[request_id]->request_bufflen);
+	memset(request_buffer, 0, sizeof(request_buffer));
+	request_buffer[0] = 0x70; /* Immediate fixed format */
+	request_buffer[7] = 10;	/* minimum size per SPC: 18 bytes */
+	/* leave all other fields zero, giving effectively NO_SENSE return */
+	tw_transfer_internal(tw_dev, request_id, request_buffer,
+			     sizeof(request_buffer));
+
 	tw_dev->state[request_id] = TW_S_COMPLETED;
 	tw_state_request_finish(tw_dev, request_id);
 

-- 

* [patch 23/33] fix bogon in /dev/mem mmaping on nommu
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (22 preceding siblings ...)
  2007-04-26 16:56   ` [patch 22/33] 3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 24/33] fix OOM killing processes wrongly thought MPOL_BIND Greg KH
                     ` (12 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, dhowells, benh

[-- Attachment #1: fix-bogon-in-dev-mem-mmap-ing-on-nommu.patch --]
[-- Type: text/plain, Size: 957 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

While digging through my MAP_FIXED changes, I found a rather obvious
bug in the /dev/mem mmap implementation for nommu archs: get_unmapped_area()
is expected to return an address, not a pfn.
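`pgoff` counts pages, while the return value of `get_unmapped_area()` is a byte address. The page offset therefore has to be shifted left by `PAGE_SHIFT`. Here is a trivial sketch (the helper name is hypothetical):

```c
/* Hypothetical helper: turn a page offset (the page frame number, for
 * /dev/mem) into the byte address get_unmapped_area() must return. */
static unsigned long pgoff_to_addr(unsigned long pgoff, unsigned int page_shift)
{
	return pgoff << page_shift;
}
```

With 4 KB pages, page frame 0x100 is address 0x100000, whereas the old code would have returned 0x100 itself.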

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/char/mem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -248,7 +248,7 @@ static unsigned long get_unmapped_area_m
 {
 	if (!valid_mmap_phys_addr_range(pgoff, len))
 		return (unsigned long) -EINVAL;
-	return pgoff;
+	return pgoff << PAGE_SHIFT;
 }
 
 /* can't do an in-place private mapping if there's no MMU */

-- 

* [patch 24/33] fix OOM killing processes wrongly thought MPOL_BIND
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (23 preceding siblings ...)
  2007-04-26 16:56   ` [patch 23/33] fix bogon in /dev/mem mmaping on nommu Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 25/33] Fix possible NULL pointer access in 8250 serial driver Greg KH
                     ` (11 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, bill.irwin, hugh, kamezawa.hiroyu,
	clameter

[-- Attachment #1: fix-oom-killing-processes-wrongly-thought-mpol_bind.patch --]
[-- Type: text/plain, Size: 1023 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Hugh Dickins <hugh@veritas.com>

I only have CONFIG_NUMA=y for build testing, and was surprised, when trying
a memhog, to see lots of other processes killed with "No available memory
(MPOL_BIND)".  memhog is killed correctly once we initialize the nodemask in
constrained_alloc().
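
The failure mode can be modelled in user space. This is a hedged sketch, not the kernel code: a 64-bit integer stands in for nodemask_t, and policy_constrained_sketch() is a hypothetical helper that assumes the rest of constrained_alloc() sets a bit for each node with memory, clears the bits of nodes reachable through the zonelist, and reports a memory-policy constraint if any bit is left. Stale bits from an uninitialized mask survive the clearing loop and make an unconstrained allocation look like MPOL_BIND.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, up to 64 nodes. */
typedef unsigned long long nodemask_sketch_t;

/*
 * 'nodes' models whatever was on the stack before the loops run;
 * 'clear_first' models the nodes_clear() call added by the patch.
 * Returns true when the allocation would be treated as constrained
 * by a memory policy such as MPOL_BIND.
 */
static bool policy_constrained_sketch(nodemask_sketch_t nodes, bool clear_first,
				      const bool *has_memory, int nr_nodes,
				      const int *zonelist_nodes, int nr_zones)
{
	assert(nr_nodes <= 64);

	if (clear_first)
		nodes = 0;

	/* Mark every node that has memory. */
	for (int n = 0; n < nr_nodes; n++)
		if (has_memory[n])
			nodes |= 1ULL << n;

	/* Unmark every node the zonelist lets us allocate from. */
	for (int z = 0; z < nr_zones; z++)
		nodes &= ~(1ULL << zonelist_nodes[z]);

	/* Anything left over looks like a node we were not allowed to use. */
	return nodes != 0;
}
```

On a single-node machine whose zonelist covers node 0, a stale bit 15 (0x8000) makes the sketch report a constraint unless the mask is cleared first.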

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Irwin <bill.irwin@oracle.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/oom_kill.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -176,6 +176,8 @@ static inline int constrained_alloc(stru
 	struct zone **z;
 	nodemask_t nodes;
 	int node;
+
+	nodes_clear(nodes);
 	/* node has memory ? */
 	for_each_online_node(node)
 		if (NODE_DATA(node)->node_present_pages)

-- 

* [patch 25/33] Fix possible NULL pointer access in 8250 serial driver
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (24 preceding siblings ...)
  2007-04-26 16:56   ` [patch 24/33] fix OOM killing processes wrongly thought MPOL_BIND Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:56   ` [patch 26/33] page migration: fix NR_FILE_PAGES accounting Greg KH
                     ` (10 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, izumi2005, rmk, kaneshige.kenji

[-- Attachment #1: fix-possible-null-pointer-access-in-8250-serial-driver.patch --]
[-- Type: text/plain, Size: 4121 bytes --]


-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Taku Izumi <izumi2005@soft.fujitsu.com>

I encountered the following kernel panic.  The cause of this problem was a
NULL pointer access in check_modem_status() in 8250.c.  I confirmed that this
problem is fixed by the attached patch, but I don't know whether this is the
correct fix.

sadc[4378]: NaT consumption 2216203124768 [1]
Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan
container button sg e100 eepro100 mii ehci_hcd ohci_hcd

Pid: 4378, CPU 0, comm: sadc
psr : 00001210085a2010 ifs : 8000000000000289 ip : [<a000000100482071>]
Not tainted
ip is at check_modem_status+0xf1/0x360
unat: 0000000000000000 pfs : 0000000000000289 rsc : 0000000000000003
rnat: 800000000000cc18 bsps: 0000000000000000 pr : 0000000000aa6a99
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100481fb0 b6 : a0000001004822e0 b7 : a000000100477f20
f6 : 1003e2222222222222222 f7 : 0ffdba200000000000000
f8 : 100018000000000000000 f9 : 10002a000000000000000
f10 : 0fffdccccccccc8c00000 f11 : 1003e0000000000000000
r1 : a000000100b9af40 r2 : 0000000000000008 r3 : a000000100ad4e21
r8 : 00000000000000bb r9 : 0000000000000001 r10 : 0000000000000000
r11 : a000000100ad4d58 r12 : e0000000037b7df0 r13 : e0000000037b0000
r14 : 0000000000000001 r15 : 0000000000000018 r16 : a000000100ad4d6c
r17 : 0000000000000000 r18 : 0000000000000000 r19 : 0000000000000000
r20 : a00000010099bc88 r21 : 00000000000000bb r22 : 00000000000000bb
r23 : c003fffffc0ff3fe r24 : c003fffffc000000 r25 : 00000000000ff3fe
r26 : a0000001009b7ad0 r27 : 0000000000000001 r28 : a0000001009b7ad8
r29 : 0000000000000000 r30 : a0000001009b7ad0 r31 : a0000001009b7ad0

Call Trace:
[<a000000100013940>] show_stack+0x40/0xa0
sp=e0000000037b7810 bsp=e0000000037b1118
[<a0000001000145a0>] show_regs+0x840/0x880
sp=e0000000037b79e0 bsp=e0000000037b10c0
[<a0000001000368e0>] die+0x1c0/0x2c0
sp=e0000000037b79e0 bsp=e0000000037b1078
[<a000000100036a30>] die_if_kernel+0x50/0x80
sp=e0000000037b7a00 bsp=e0000000037b1048
[<a000000100037c40>] ia64_fault+0x11e0/0x1300
sp=e0000000037b7a00 bsp=e0000000037b0fe8
[<a00000010000bdc0>] ia64_leave_kernel+0x0/0x280
sp=e0000000037b7c20 bsp=e0000000037b0fe8
[<a000000100482070>] check_modem_status+0xf0/0x360
sp=e0000000037b7df0 bsp=e0000000037b0fa0
[<a000000100482300>] serial8250_get_mctrl+0x20/0xa0
sp=e0000000037b7df0 bsp=e0000000037b0f80
[<a000000100478170>] uart_read_proc+0x250/0x860
sp=e0000000037b7df0 bsp=e0000000037b0ee0
[<a0000001001c16d0>] proc_file_read+0x1d0/0x4c0
sp=e0000000037b7e10 bsp=e0000000037b0e80
[<a0000001001394b0>] vfs_read+0x1b0/0x300
sp=e0000000037b7e20 bsp=e0000000037b0e30
[<a000000100139cd0>] sys_read+0x70/0xe0
sp=e0000000037b7e20 bsp=e0000000037b0db0
[<a00000010000bc20>] ia64_ret_from_syscall+0x0/0x20
sp=e0000000037b7e30 bsp=e0000000037b0db0
[<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
sp=e0000000037b8000 bsp=e0000000037b0db0


Fix the possible NULL pointer access in check_modem_status() in 8250.c.
check_modem_status() accesses the 'info' member of the uart_port structure,
but that member is not initialized until uart_open() is called, and
check_modem_status() can be reached through /proc/tty/driver/serial before
uart_open() has run.
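
The shape of the guard can be shown with a reduced model. This is an illustrative sketch only: the *_sketch types are hypothetical and keep just the fields needed here, and the status bit values are assumed to match UART_MSR_TERI (0x04) and UART_MSR_ANY_DELTA (0x0F) from <linux/serial_reg.h>. Nothing reachable through port->info is touched until the port has been opened.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror <linux/serial_reg.h>. */
#define MSR_TERI_SKETCH      0x04
#define MSR_ANY_DELTA_SKETCH 0x0F

/* Hypothetical reduced types: 'info' stays NULL until the port is opened. */
struct uart_info_sketch {
	unsigned int wakeups;	/* stands in for waking delta_msr_wait */
};

struct uart_port_sketch {
	struct uart_info_sketch *info;
	unsigned int rng;	/* ring-indicator event count */
};

static unsigned int check_modem_status_sketch(struct uart_port_sketch *port,
					      unsigned int status, int msi_enabled)
{
	assert(port != NULL);

	/* Only account delta events once open() has set up port->info. */
	if ((status & MSR_ANY_DELTA_SKETCH) && msi_enabled && port->info != NULL) {
		if (status & MSR_TERI_SKETCH)
			port->rng++;
		port->info->wakeups++;
	}
	return status;
}
```

A read of /proc/tty/driver/serial on a never-opened port corresponds to calling the sketch with info == NULL: the status is still returned, but no counters are touched.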

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Taku Izumi <izumi2005@soft.fujitsu.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/serial/8250.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -1289,7 +1289,8 @@ static unsigned int check_modem_status(s
 {
 	unsigned int status = serial_in(up, UART_MSR);
 
-	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI) {
+	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI &&
+	    up->port.info != NULL) {
 		if (status & UART_MSR_TERI)
 			up->port.icount.rng++;
 		if (status & UART_MSR_DDSR)

-- 

* [patch 26/33] page migration: fix NR_FILE_PAGES accounting
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (25 preceding siblings ...)
  2007-04-26 16:56   ` [patch 25/33] Fix possible NULL pointer access in 8250 serial driver Greg KH
@ 2007-04-26 16:56   ` Greg KH
  2007-04-26 16:57   ` [patch 27/33] Taskstats fix the structure members alignment issue Greg KH
                     ` (9 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:56 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, solo, mbligh, clameter

[-- Attachment #1: page-migration-fix-nr_file_pages-accounting.patch --]
[-- Type: text/plain, Size: 1520 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Christoph Lameter <clameter@sgi.com>

NR_FILE_PAGES must be accounted for depending on the zone that the page
belongs to.  If we replace the page in the radix tree then we may have to
shift the count to another zone.
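
A minimal sketch of the bookkeeping, using hypothetical per-zone counters rather than the kernel's zone_page_state machinery. When the replacement page lives in a different zone, one file page moves from the old zone's count to the new zone's count, and the total stays the same:

```c
#include <assert.h>

#define NR_ZONES_SKETCH 3	/* hypothetical, e.g. DMA, Normal, HighMem */

struct zone_sketch {
	long nr_file_pages;
};

struct page_sketch {
	int zone;		/* index of the zone the page frame lives in */
};

static struct zone_sketch zones_sketch[NR_ZONES_SKETCH];

/* Move one page's NR_FILE_PAGES contribution from the old page's zone
 * to the new page's zone (a net no-op if both are the same zone). */
static void migrate_file_page_count(const struct page_sketch *page,
				    const struct page_sketch *newpage)
{
	assert(zones_sketch[page->zone].nr_file_pages > 0);
	zones_sketch[page->zone].nr_file_pages--;
	zones_sketch[newpage->zone].nr_file_pages++;
}
```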

Suggested-by: Ethan Solomita <solo@google.com>
Cc: Martin Bligh <mbligh@mbligh.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 mm/migrate.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -297,7 +297,7 @@ static int migrate_page_move_mapping(str
 	void **pslot;
 
 	if (!mapping) {
-		/* Anonymous page */
+		/* Anonymous page without mapping */
 		if (page_count(page) != 1)
 			return -EAGAIN;
 		return 0;
@@ -333,6 +333,19 @@ static int migrate_page_move_mapping(str
 	 */
 	__put_page(page);
 
+	/*
+	 * If moved to a different zone then also account
+	 * the page for that zone. Other VM counters will be
+	 * taken care of when we establish references to the
+	 * new page and drop references to the old page.
+	 *
+	 * Note that anonymous pages are accounted for
+	 * via NR_FILE_PAGES and NR_ANON_PAGES if they
+	 * are mapped to swap space.
+	 */
+	__dec_zone_page_state(page, NR_FILE_PAGES);
+	__inc_zone_page_state(newpage, NR_FILE_PAGES);
+
 	write_unlock_irq(&mapping->tree_lock);
 
 	return 0;

-- 

* [patch 27/33] Taskstats fix the structure members alignment issue
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (26 preceding siblings ...)
  2007-04-26 16:56   ` [patch 26/33] page migration: fix NR_FILE_PAGES accounting Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 28/33] reiserfs: fix xattr root locking/refcount bug Greg KH
                     ` (8 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, nagar, balbir, jlan, balbir

[-- Attachment #1: taskstats-fix-the-structure-members-alignment-issue.patch --]
[-- Type: text/plain, Size: 7124 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Balbir Singh <balbir@in.ibm.com>

We broke the 8-byte alignment of taskstats members with the CSA patches.
In the current kernel, the taskstats structure is not suitable for use by
32-bit applications on a 64-bit kernel.

On x86_64

Offsets of taskstats' members (64 bit kernel, 64 bit application)

@taskstats'offsetof[@taskstats'indices] = (
        0,      # version
        4,      # ac_exitcode
        8,      # ac_flag
        9,      # ac_nice
        16,     # cpu_count
        24,     # cpu_delay_total
        32,     # blkio_count
        40,     # blkio_delay_total
        48,     # swapin_count
        56,     # swapin_delay_total
        64,     # cpu_run_real_total
        72,     # cpu_run_virtual_total
        80,     # ac_comm
        112,    # ac_sched
        113,    # ac_pad
        116,    # ac_uid
        120,    # ac_gid
        124,    # ac_pid
        128,    # ac_ppid
        132,    # ac_btime
        136,    # ac_etime
        144,    # ac_utime
        152,    # ac_stime
        160,    # ac_minflt
        168,    # ac_majflt
        176,    # coremem
        184,    # virtmem
        192,    # hiwater_rss
        200,    # hiwater_vm
        208,    # read_char
        216,    # write_char
        224,    # read_syscalls
        232,    # write_syscalls
        240,    # read_bytes
        248,    # write_bytes
        256,    # cancelled_write_bytes
    );

Offsets of taskstats' members (64 bit kernel, 32 bit application)

@taskstats'offsetof[@taskstats'indices] = (
        0,      # version
        4,      # ac_exitcode
        8,      # ac_flag
        9,      # ac_nice
        12,     # cpu_count
        20,     # cpu_delay_total
        28,     # blkio_count
        36,     # blkio_delay_total
        44,     # swapin_count
        52,     # swapin_delay_total
        60,     # cpu_run_real_total
        68,     # cpu_run_virtual_total
        76,     # ac_comm
        108,    # ac_sched
        109,    # ac_pad
        112,    # ac_uid
        116,    # ac_gid
        120,    # ac_pid
        124,    # ac_ppid
        128,    # ac_btime
        132,    # ac_etime
        140,    # ac_utime
        148,    # ac_stime
        156,    # ac_minflt
        164,    # ac_majflt
        172,    # coremem
        180,    # virtmem
        188,    # hiwater_rss
        196,    # hiwater_vm
        204,    # read_char
        212,    # write_char
        220,    # read_syscalls
        228,    # write_syscalls
        236,    # read_bytes
        244,    # write_bytes
        252,    # cancelled_write_bytes
    );

One way to solve the problem without rearranging structure members is to
add padding.  The patch adds __attribute__((aligned(8))) to selected
taskstats structure members so that 32-bit applications using taskstats
can work with a 64-bit kernel.

Using __attribute__((packed)) instead would break the 64-bit alignment of
members.
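
The effect of the attribute can be checked with just the leading members of the structure. This sketch relies on the GCC/Clang aligned attribute and C11 static_assert; the ts_head_* names are hypothetical. Without the attribute, i386 aligns a 64-bit member to 4 bytes, so cpu_count lands at offset 12 there but at 16 on x86_64, which is the mismatch in the first pair of tables. With aligned(8), both ABIs agree on 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading members of struct taskstats, before the fix. */
struct ts_head_v3 {
	uint16_t version;
	uint32_t ac_exitcode;
	uint8_t  ac_flag;
	uint8_t  ac_nice;
	uint64_t cpu_count;	/* 16 on x86_64, 12 on i386 */
};

/* The same members with the fix applied. */
struct ts_head_v4 {
	uint16_t version;
	uint32_t ac_exitcode;
	uint8_t  ac_flag;
	uint8_t  ac_nice;
	uint64_t cpu_count __attribute__((aligned(8)));	/* 16 on both */
};

static_assert(offsetof(struct ts_head_v4, cpu_count) == 16,
	      "cpu_count must sit at offset 16 for every ABI");
```

Building the same file with gcc -m32 and printing offsetof(struct ts_head_v3, cpu_count) should show the old offset of 12, while the static_assert on the fixed layout keeps passing.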

The fix was tested on x86_64. After the fix, we got

Offsets of taskstats' members (64 bit kernel, 64 bit application)

@taskstats'offsetof[@taskstats'indices] = (
        0,      # version
        4,      # ac_exitcode
        8,      # ac_flag
        9,      # ac_nice
        16,     # cpu_count
        24,     # cpu_delay_total
        32,     # blkio_count
        40,     # blkio_delay_total
        48,     # swapin_count
        56,     # swapin_delay_total
        64,     # cpu_run_real_total
        72,     # cpu_run_virtual_total
        80,     # ac_comm
        112,    # ac_sched
        113,    # ac_pad
        120,    # ac_uid
        124,    # ac_gid
        128,    # ac_pid
        132,    # ac_ppid
        136,    # ac_btime
        144,    # ac_etime
        152,    # ac_utime
        160,    # ac_stime
        168,    # ac_minflt
        176,    # ac_majflt
        184,    # coremem
        192,    # virtmem
        200,    # hiwater_rss
        208,    # hiwater_vm
        216,    # read_char
        224,    # write_char
        232,    # read_syscalls
        240,    # write_syscalls
        248,    # read_bytes
        256,    # write_bytes
        264,    # cancelled_write_bytes
    );

Offsets of taskstats' members (64 bit kernel, 32 bit application)

@taskstats'offsetof[@taskstats'indices] = (
        0,      # version
        4,      # ac_exitcode
        8,      # ac_flag
        9,      # ac_nice
        16,     # cpu_count
        24,     # cpu_delay_total
        32,     # blkio_count
        40,     # blkio_delay_total
        48,     # swapin_count
        56,     # swapin_delay_total
        64,     # cpu_run_real_total
        72,     # cpu_run_virtual_total
        80,     # ac_comm
        112,    # ac_sched
        113,    # ac_pad
        120,    # ac_uid
        124,    # ac_gid
        128,    # ac_pid
        132,    # ac_ppid
        136,    # ac_btime
        144,    # ac_etime
        152,    # ac_utime
        160,    # ac_stime
        168,    # ac_minflt
        176,    # ac_majflt
        184,    # coremem
        192,    # virtmem
        200,    # hiwater_rss
        208,    # hiwater_vm
        216,    # read_char
        224,    # write_char
        232,    # read_syscalls
        240,    # write_syscalls
        248,    # read_bytes
        256,    # write_bytes
        264,    # cancelled_write_bytes
    );

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/taskstats.h |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/include/linux/taskstats.h
+++ b/include/linux/taskstats.h
@@ -31,7 +31,7 @@
  */
 
 
-#define TASKSTATS_VERSION	3
+#define TASKSTATS_VERSION	4
 #define TS_COMM_LEN		32	/* should be >= TASK_COMM_LEN
 					 * in linux/sched.h */
 
@@ -66,7 +66,7 @@ struct taskstats {
 	/* Delay waiting for cpu, while runnable
 	 * count, delay_total NOT updated atomically
 	 */
-	__u64	cpu_count;
+	__u64	cpu_count __attribute__((aligned(8)));
 	__u64	cpu_delay_total;
 
 	/* Following four fields atomically updated using task->delays->lock */
@@ -101,14 +101,17 @@ struct taskstats {
 
 	/* Basic Accounting Fields start */
 	char	ac_comm[TS_COMM_LEN];	/* Command name */
-	__u8	ac_sched;		/* Scheduling discipline */
+	__u8	ac_sched __attribute__((aligned(8)));
+					/* Scheduling discipline */
 	__u8	ac_pad[3];
-	__u32	ac_uid;			/* User ID */
+	__u32	ac_uid __attribute__((aligned(8)));
+					/* User ID */
 	__u32	ac_gid;			/* Group ID */
 	__u32	ac_pid;			/* Process ID */
 	__u32	ac_ppid;		/* Parent process ID */
 	__u32	ac_btime;		/* Begin time [sec since 1970] */
-	__u64	ac_etime;		/* Elapsed time [usec] */
+	__u64	ac_etime __attribute__((aligned(8)));
+					/* Elapsed time [usec] */
 	__u64	ac_utime;		/* User CPU time [usec] */
 	__u64	ac_stime;		/* SYstem CPU time [usec] */
 	__u64	ac_minflt;		/* Minor Page Fault Count */

-- 

* [patch 28/33] reiserfs: fix xattr root locking/refcount bug
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (27 preceding siblings ...)
  2007-04-26 16:57   ` [patch 27/33] Taskstats fix the structure members alignment issue Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 29/33] hwmon/w83627ehf: Fix the fan5 clock divider write Greg KH
                     ` (7 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable, torvalds
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, akpm, alan, a.righi, vs, jeffm, zam, edward

[-- Attachment #1: reiserfs-fix-xattr-root-locking-refcount-bug.patch --]
[-- Type: text/plain, Size: 5336 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Jeff Mahoney <jeffm@suse.com>

The listxattr() and getxattr() operations are only protected by a read
lock.  As a result, if these operations run in parallel, a race condition
exists where the xattr_root ends up being cached twice, which leaks a
reference and triggers a BUG() on umount.

This patch merges get_xa_root(), __get_xa_root() and create_xa_root()
into a single get_xa_root() function that takes the appropriate lock around
the entire critical section.
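
The locking pattern can be sketched in user space with a pthread mutex standing in for privroot's i_mutex. The *_sketch names are hypothetical and the refcount is a plain integer rather than a dentry reference. Because the check for a cached root, its creation, and the caching all happen under one lock, two concurrent callers can no longer both create and cache the root, which is what leaked the extra reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct xa_root_sketch {
	int refcount;
};

struct sb_sketch {
	pthread_mutex_t lock;		/* plays the role of privroot's i_mutex */
	struct xa_root_sketch *cached;	/* plays the role of xattr_root */
	struct xa_root_sketch storage;
	int creations;			/* how many times the root was created */
};

/* Hypothetical stand-in for get_xa_root(): look up, create and cache
 * the root inside a single critical section. */
static struct xa_root_sketch *get_root_sketch(struct sb_sketch *sb)
{
	struct xa_root_sketch *root;

	pthread_mutex_lock(&sb->lock);
	if (sb->cached == NULL) {
		sb->storage.refcount = 1;	/* reference held by the cache */
		sb->cached = &sb->storage;
		sb->creations++;
	}
	root = sb->cached;
	root->refcount++;			/* reference handed to the caller */
	assert(root->refcount >= 2);
	pthread_mutex_unlock(&sb->lock);
	return root;
}
```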

Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it>

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Andrea Righi <a.righi@cineca.it>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Edward Shishkin <edward@namesys.com>
Cc: Alex Zarochentsev <zam@namesys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/reiserfs/xattr.c |   92 +++++++++++++---------------------------------------
 1 file changed, 24 insertions(+), 68 deletions(-)

--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -54,82 +54,48 @@
 static struct reiserfs_xattr_handler *find_xattr_handler_prefix(const char
 								*prefix);
 
-static struct dentry *create_xa_root(struct super_block *sb)
+/* Returns the dentry referring to the root of the extended attribute
+ * directory tree. If it has already been retrieved, it is used. If it
+ * hasn't been created and the flags indicate creation is allowed, we
+ * attempt to create it. On error, we return a pointer-encoded error.
+ */
+static struct dentry *get_xa_root(struct super_block *sb, int flags)
 {
 	struct dentry *privroot = dget(REISERFS_SB(sb)->priv_root);
 	struct dentry *xaroot;
 
 	/* This needs to be created at mount-time */
 	if (!privroot)
-		return ERR_PTR(-EOPNOTSUPP);
+		return ERR_PTR(-ENODATA);
 
-	xaroot = lookup_one_len(XAROOT_NAME, privroot, strlen(XAROOT_NAME));
-	if (IS_ERR(xaroot)) {
+	mutex_lock(&privroot->d_inode->i_mutex);
+	if (REISERFS_SB(sb)->xattr_root) {
+		xaroot = dget(REISERFS_SB(sb)->xattr_root);
 		goto out;
-	} else if (!xaroot->d_inode) {
-		int err;
-		mutex_lock(&privroot->d_inode->i_mutex);
-		err =
-		    privroot->d_inode->i_op->mkdir(privroot->d_inode, xaroot,
-						   0700);
-		mutex_unlock(&privroot->d_inode->i_mutex);
-
-		if (err) {
-			dput(xaroot);
-			dput(privroot);
-			return ERR_PTR(err);
-		}
-		REISERFS_SB(sb)->xattr_root = dget(xaroot);
 	}
 
-      out:
-	dput(privroot);
-	return xaroot;
-}
-
-/* This will return a dentry, or error, refering to the xa root directory.
- * If the xa root doesn't exist yet, the dentry will be returned without
- * an associated inode. This dentry can be used with ->mkdir to create
- * the xa directory. */
-static struct dentry *__get_xa_root(struct super_block *s)
-{
-	struct dentry *privroot = dget(REISERFS_SB(s)->priv_root);
-	struct dentry *xaroot = NULL;
-
-	if (IS_ERR(privroot) || !privroot)
-		return privroot;
-
 	xaroot = lookup_one_len(XAROOT_NAME, privroot, strlen(XAROOT_NAME));
 	if (IS_ERR(xaroot)) {
 		goto out;
 	} else if (!xaroot->d_inode) {
-		dput(xaroot);
-		xaroot = NULL;
-		goto out;
+		int err = -ENODATA;
+		if (flags == 0 || flags & XATTR_CREATE)
+			err = privroot->d_inode->i_op->mkdir(privroot->d_inode,
+			                                     xaroot, 0700);
+		if (err) {
+			dput(xaroot);
+			xaroot = ERR_PTR(err);
+			goto out;
+		}
 	}
-
-	REISERFS_SB(s)->xattr_root = dget(xaroot);
+	REISERFS_SB(sb)->xattr_root = dget(xaroot);
 
       out:
+	mutex_unlock(&privroot->d_inode->i_mutex);
 	dput(privroot);
 	return xaroot;
 }
 
-/* Returns the dentry (or NULL) referring to the root of the extended
- * attribute directory tree. If it has already been retrieved, it is used.
- * Otherwise, we attempt to retrieve it from disk. It may also return
- * a pointer-encoded error.
- */
-static inline struct dentry *get_xa_root(struct super_block *s)
-{
-	struct dentry *dentry = dget(REISERFS_SB(s)->xattr_root);
-
-	if (!dentry)
-		dentry = __get_xa_root(s);
-
-	return dentry;
-}
-
 /* Opens the directory corresponding to the inode's extended attribute store.
  * If flags allow, the tree to the directory may be created. If creation is
  * prohibited, -ENODATA is returned. */
@@ -138,21 +104,11 @@ static struct dentry *open_xa_dir(const 
 	struct dentry *xaroot, *xadir;
 	char namebuf[17];
 
-	xaroot = get_xa_root(inode->i_sb);
-	if (IS_ERR(xaroot)) {
+	xaroot = get_xa_root(inode->i_sb, flags);
+	if (IS_ERR(xaroot))
 		return xaroot;
-	} else if (!xaroot) {
-		if (flags == 0 || flags & XATTR_CREATE) {
-			xaroot = create_xa_root(inode->i_sb);
-			if (IS_ERR(xaroot))
-				return xaroot;
-		}
-		if (!xaroot)
-			return ERR_PTR(-ENODATA);
-	}
 
 	/* ok, we have xaroot open */
-
 	snprintf(namebuf, sizeof(namebuf), "%X.%X",
 		 le32_to_cpu(INODE_PKEY(inode)->k_objectid),
 		 inode->i_generation);
@@ -821,7 +777,7 @@ int reiserfs_delete_xattrs(struct inode 
 
 	/* Leftovers besides . and .. -- that's not good. */
 	if (dir->d_inode->i_nlink <= 2) {
-		root = get_xa_root(inode->i_sb);
+		root = get_xa_root(inode->i_sb, XATTR_REPLACE);
 		reiserfs_write_lock_xattrs(inode->i_sb);
 		err = vfs_rmdir(root->d_inode, dir);
 		reiserfs_write_unlock_xattrs(inode->i_sb);

-- 

* [patch 29/33] hwmon/w83627ehf: Fix the fan5 clock divider write
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (28 preceding siblings ...)
  2007-04-26 16:57   ` [patch 28/33] reiserfs: fix xattr root locking/refcount bug Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 30/33] ALSA: intel8x0 - Fix speaker output after S2RAM Greg KH
                     ` (6 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Jean Delvare

[-- Attachment #1: hwmon-w83627ehf-fix-the-fan5-clock-divider-write.patch --]
[-- Type: text/plain, Size: 1715 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Jean Delvare <khali@linux-fr.org>

Users have been complaining about the w83627ehf driver flooding their
logs with debug messages like:

w83627ehf 9191-0a10: Increasing fan 4 clock divider from 64 to 128

or:

w83627ehf 9191-0290: Increasing fan 4 clock divider from 4 to 8

The reason is that we failed to actually write the LSB of the encoded
clock divider value for that fan, causing the next read to report the
same old value again and again.

Additionally, the fan number was reported off by one (fan5 showed up as
"fan 4"), making the bug harder to find.
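
The register layout can be checked in isolation. The following sketch packs the 3-bit divider exponent for fan5 the way the corrected hunk does: bits 1:0 go into register bits 3:2, bit 2 goes into bit 7, and the bits under mask 0x73 are preserved. unpack_fan5_div() is an assumed inverse inferred from that write, not copied from the driver, and pack_fan5_div_old() reproduces the old shift of 3.

```c
#include <assert.h>
#include <stdint.h>

/* Corrected packing: div bits 1:0 -> reg bits 3:2, div bit 2 -> reg bit 7. */
static uint8_t pack_fan5_div(uint8_t old_reg, uint8_t div)
{
	assert(div <= 0x07);
	return (uint8_t)((old_reg & 0x73)
			 | ((div & 0x03) << 2)
			 | ((div & 0x04) << 5));
}

/* Old packing: the shift of 3 puts div bit 0 into reg bit 3 and div
 * bit 1 into reg bit 4, so reg bit 2 (the LSB) is never written. */
static uint8_t pack_fan5_div_old(uint8_t old_reg, uint8_t div)
{
	assert(div <= 0x07);
	return (uint8_t)((old_reg & 0x73)
			 | ((div & 0x03) << 3)
			 | ((div & 0x04) << 5));
}

/* Assumed inverse of the corrected layout. */
static uint8_t unpack_fan5_div(uint8_t reg)
{
	return (uint8_t)(((reg >> 2) & 0x03) | ((reg >> 5) & 0x04));
}
```

With the old shift, requesting exponent 7 (divider 128) reads back as 6 (divider 64), which matches the repeated "from 64 to 128" message.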

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/hwmon/w83627ehf.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/hwmon/w83627ehf.c
+++ b/drivers/hwmon/w83627ehf.c
@@ -389,7 +389,7 @@ static void w83627ehf_write_fan_div(stru
 		break;
 	case 4:
 		reg = (w83627ehf_read_value(client, W83627EHF_REG_DIODE) & 0x73)
-		    | ((data->fan_div[4] & 0x03) << 3)
+		    | ((data->fan_div[4] & 0x03) << 2)
 		    | ((data->fan_div[4] & 0x04) << 5);
 		w83627ehf_write_value(client, W83627EHF_REG_DIODE, reg);
 		break;
@@ -453,9 +453,9 @@ static struct w83627ehf_data *w83627ehf_
 			   time */
 			if (data->fan[i] == 0xff
 			 && data->fan_div[i] < 0x07) {
-			 	dev_dbg(&client->dev, "Increasing fan %d "
+			 	dev_dbg(&client->dev, "Increasing fan%d "
 					"clock divider from %u to %u\n",
-					i, div_from_reg(data->fan_div[i]),
+					i + 1, div_from_reg(data->fan_div[i]),
 					div_from_reg(data->fan_div[i] + 1));
 				data->fan_div[i]++;
 				w83627ehf_write_fan_div(client, i);

-- 

* [patch 30/33] ALSA: intel8x0 - Fix speaker output after S2RAM
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (29 preceding siblings ...)
  2007-04-26 16:57   ` [patch 29/33] hwmon/w83627ehf: Fix the fan5 clock divider write Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 31/33] AGPGART: intel_agp: fix G965 GTT size detect Greg KH
                     ` (5 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Takashi Iwai, Tommi Kyntola,
	Jaroslav Kysela

[-- Attachment #1: alsa-intel8x0-fix-speaker-output-after-s2ram.patch --]
[-- Type: text/plain, Size: 1041 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Tommi Kyntola <tommi.kyntola@ray.fi>

[ALSA] intel8x0 - Fix speaker output after S2RAM

Fixed the mute speaker problem after S2RAM on some laptops:
	http://bugme.osdl.org/show_bug.cgi?id=6181

Signed-off-by: Tommi Kyntola <tommi.kyntola@ray.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 sound/pci/intel8x0.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/sound/pci/intel8x0.c
+++ b/sound/pci/intel8x0.c
@@ -2489,7 +2489,10 @@ static int intel8x0_suspend(struct pci_d
 	}
 	pci_disable_device(pci);
 	pci_save_state(pci);
-	pci_set_power_state(pci, pci_choose_state(pci, state));
+	/* The call below may disable built-in speaker on some laptops
+	 * after S2RAM.  So, don't touch it.
+	 */
+	/* pci_set_power_state(pci, pci_choose_state(pci, state)); */
 	return 0;
 }
 

-- 

* [patch 31/33] AGPGART: intel_agp: fix G965 GTT size detect
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (30 preceding siblings ...)
  2007-04-26 16:57   ` [patch 30/33] ALSA: intel8x0 - Fix speaker output after S2RAM Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 32/33] cfq-iosched: fix alias + front merge bug Greg KH
                     ` (4 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Wang Zhenyu

[-- Attachment #1: agpgart-intel_agp-fix-g965-gtt-size-detect.patch --]
[-- Type: text/plain, Size: 1078 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Wang Zhenyu <zhenyu.z.wang@intel.com>

[AGPGART] intel_agp: fix G965 GTT size detect

On G965, I810_PGETBL_CTL is an MMIO offset, but we wrongly treated it
as a PCI config space offset when detecting the GTT size.  This one-line
patch fixes that.

Signed-off-by: Wang Zhenyu <zhenyu.z.wang@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/char/agp/intel-agp.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -405,9 +405,8 @@ static void intel_i830_init_gtt_entries(
 
 	if (IS_I965) {
 		u32 pgetbl_ctl;
+		pgetbl_ctl = readl(intel_i830_private.registers+I810_PGETBL_CTL);
 
-		pci_read_config_dword(agp_bridge->dev, I810_PGETBL_CTL,
-				      &pgetbl_ctl);
 		/* The 965 has a field telling us the size of the GTT,
 		 * which may be larger than what is necessary to map the
 		 * aperture.

-- 

* [patch 32/33] cfq-iosched: fix alias + front merge bug
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (31 preceding siblings ...)
  2007-04-26 16:57   ` [patch 31/33] AGPGART: intel_agp: fix G965 GTT size detect Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 16:57   ` [patch 33/33] Revert "adjust legacy IDE resource setting (v2)" Greg KH
                     ` (3 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Jens Axboe

[-- Attachment #1: cfq-iosched-fix-alias-front-merge-bug.patch --]
[-- Type: text/plain, Size: 1762 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Jens Axboe <jens.axboe@oracle.com>

There's a really rare and obscure bug in CFQ that causes a crash in
cfq_dispatch_insert() due to rq == NULL. One example of that is seen
here:

http://lkml.org/lkml/2007/4/15/41

Neil correctly diagnosed how this can happen; read that analysis
here:

http://lkml.org/lkml/2007/4/25/57

This looks like it requires md to trigger, although it should
potentially also be possible to trigger with O_DIRECT (at least if you
edit the kernel and doctor some of the unplug calls).

The fix is to move the ->next_rq update to the point where we add a
request to the rbtree.  This removes the possibility of a request
existing in the rbtree without ->next_rq being correctly updated.
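
The invariant behind the fix can be sketched with a toy queue: whatever inserts a request also refreshes the cached next_rq in the same step, so no request can be queued while next_rq is still NULL or stale. The types are hypothetical, and choose_req_sketch() simply prefers the lower sector; the real cfq_choose_req() applies a more involved policy.

```c
#include <assert.h>
#include <stddef.h>

struct rq_sketch {
	unsigned long sector;
};

struct queue_sketch {
	struct rq_sketch *items[16];
	size_t count;
	struct rq_sketch *next_rq;	/* cached best candidate to dispatch */
};

/* Hypothetical policy: prefer the request with the lower sector. */
static struct rq_sketch *choose_req_sketch(struct rq_sketch *a,
					   struct rq_sketch *b)
{
	if (a == NULL)
		return b;
	if (b == NULL)
		return a;
	return (b->sector < a->sector) ? b : a;
}

/* Insert and refresh next_rq together, so a queued request always
 * has a valid next_rq alongside it. */
static void add_rq_sketch(struct queue_sketch *q, struct rq_sketch *rq)
{
	assert(q->count < sizeof(q->items) / sizeof(q->items[0]));
	q->items[q->count++] = rq;
	q->next_rq = choose_req_sketch(q->next_rq, rq);
	assert(q->next_rq != NULL);	/* mirrors the BUG_ON in the patch */
}
```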

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 block/cfq-iosched.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -462,6 +462,12 @@ static void cfq_add_rq_rb(struct request
 
 	if (!cfq_cfqq_on_rr(cfqq))
 		cfq_add_cfqq_rr(cfqd, cfqq);
+
+	/*
+	 * check if this request is a better next-serve candidate
+	 */
+	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
+	BUG_ON(!cfqq->next_rq);
 }
 
 static inline void
@@ -1623,12 +1629,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, s
 		cfqq->meta_pending++;
 
 	/*
-	 * check if this request is a better next-serve candidate)) {
-	 */
-	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
-	BUG_ON(!cfqq->next_rq);
-
-	/*
 	 * we never wait for an async request and we don't allow preemption
 	 * of an async request. so just return early
 	 */

-- 

* [patch 33/33] Revert "adjust legacy IDE resource setting (v2)"
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (32 preceding siblings ...)
  2007-04-26 16:57   ` [patch 32/33] cfq-iosched: fix alias + front merge bug Greg KH
@ 2007-04-26 16:57   ` Greg KH
  2007-04-26 17:01   ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (2 subsequent siblings)
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 16:57 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, torvalds, akpm, alan, Bartlomiej Zolnierkiewicz

[-- Attachment #1: revert-adjust-legacy-ide-resource-setting.patch --]
[-- Type: text/plain, Size: 3261 bytes --]

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Revert "adjust legacy IDE resource setting (v2)"

This reverts commit ed8ccee0918ad063a4741c0656fda783e02df627.

It causes a hang on boot for some users, and we don't yet know why:

http://bugzilla.kernel.org/show_bug.cgi?id=7562

http://lkml.org/lkml/2007/4/20/404
http://lkml.org/lkml/2007/3/25/113

Just revert it for 2.6.21-final; having a broken X server is somewhat
better than an unbootable system.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/pci/probe.c |   45 +++++++++++++--------------------------------
 1 file changed, 13 insertions(+), 32 deletions(-)

--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -639,34 +639,7 @@ static void pci_read_irq(struct pci_dev 
 	dev->irq = irq;
 }
 
-static void change_legacy_io_resource(struct pci_dev * dev, unsigned index,
-                                      unsigned start, unsigned end)
-{
-	unsigned base = start & PCI_BASE_ADDRESS_IO_MASK;
-	unsigned len = (end | ~PCI_BASE_ADDRESS_IO_MASK) - base + 1;
-
-	/*
-	 * Some X versions get confused when the BARs reported through
-	 * /sys or /proc differ from those seen in config space, thus
-	 * try to update the config space values, too.
-	 */
-	if (!(pci_resource_flags(dev, index) & IORESOURCE_IO))
-		printk(KERN_WARNING "%s: cannot adjust BAR%u (not I/O)\n",
-		       pci_name(dev), index);
-	else if (pci_resource_len(dev, index) != len)
-		printk(KERN_WARNING "%s: cannot adjust BAR%u (size %04X)\n",
-		       pci_name(dev), index, (unsigned)pci_resource_len(dev, index));
-	else {
-		printk(KERN_INFO "%s: trying to change BAR%u from %04X to %04X\n",
-		       pci_name(dev), index,
-		       (unsigned)pci_resource_start(dev, index), base);
-		pci_write_config_dword(dev, PCI_BASE_ADDRESS_0 + index * 4, base);
-	}
-	pci_resource_start(dev, index) = start;
-	pci_resource_end(dev, index)   = end;
-	pci_resource_flags(dev, index) =
-		IORESOURCE_IO | IORESOURCE_PCI_FIXED | PCI_BASE_ADDRESS_SPACE_IO;
-}
+#define LEGACY_IO_RESOURCE	(IORESOURCE_IO | IORESOURCE_PCI_FIXED)
 
 /**
  * pci_setup_device - fill in class and map information of a device
@@ -719,12 +692,20 @@ static int pci_setup_device(struct pci_d
 			u8 progif;
 			pci_read_config_byte(dev, PCI_CLASS_PROG, &progif);
 			if ((progif & 1) == 0) {
-				change_legacy_io_resource(dev, 0, 0x1F0, 0x1F7);
-				change_legacy_io_resource(dev, 1, 0x3F6, 0x3F6);
+				dev->resource[0].start = 0x1F0;
+				dev->resource[0].end = 0x1F7;
+				dev->resource[0].flags = LEGACY_IO_RESOURCE;
+				dev->resource[1].start = 0x3F6;
+				dev->resource[1].end = 0x3F6;
+				dev->resource[1].flags = LEGACY_IO_RESOURCE;
 			}
 			if ((progif & 4) == 0) {
-				change_legacy_io_resource(dev, 2, 0x170, 0x177);
-				change_legacy_io_resource(dev, 3, 0x376, 0x376);
+				dev->resource[2].start = 0x170;
+				dev->resource[2].end = 0x177;
+				dev->resource[2].flags = LEGACY_IO_RESOURCE;
+				dev->resource[3].start = 0x376;
+				dev->resource[3].end = 0x376;
+				dev->resource[3].flags = LEGACY_IO_RESOURCE;
 			}
 		}
 		break;

-- 


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (33 preceding siblings ...)
  2007-04-26 16:57   ` [patch 33/33] Revert "adjust legacy IDE resource setting (v2)" Greg KH
@ 2007-04-26 17:01   ` Greg KH
  2007-04-26 20:29   ` Chuck Ebbert
  2007-04-27 10:15   ` Wu, Bryan
  36 siblings, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-26 17:01 UTC (permalink / raw)
  To: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Thu, Apr 26, 2007 at 09:54:45AM -0700, Greg KH wrote:
> This is the start of the stable review cycle for the 2.6.20.10 release.
> There are 33 patches in this series, all will be posted as a response to
> this one.

A full all-in-one patch can be found at:
	kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.20.10-rc1.gz

thanks,

greg k-h


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-26 16:48   ` David Lang
@ 2007-04-26 17:30     ` Greg KH
  2007-04-26 17:45       ` [stable] " Chris Wright
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2007-04-26 17:30 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel, stable, cebbert

On Thu, Apr 26, 2007 at 09:48:22AM -0700, David Lang wrote:
>  any idea why there are so many more -stable patches for 2.6.20? this is the 
>  10th -stable series, and most of them have been dozens of patches.
> 
>  is there a new team reporting and fixing bugs? or were there just more small 
>  problems found in 2.6.20 then normal? or something else?

I think it's entirely due to the awesome effort by Chuck Ebbert of Red
Hat.  He has been digging through all of the applied patches to Linus's
tree and been forwarding them on to the stable team.  Without his effort
and help, we would not have so many patches in these releases.

I know I personally owe him at least one beer if I run into him at some
conference, and I think all other users of the stable tree do too.

thanks,

greg k-h


* Re: [stable] [patch 00/33] 2.6.20-stable review
  2007-04-26 17:30     ` Greg KH
@ 2007-04-26 17:45       ` Chris Wright
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wright @ 2007-04-26 17:45 UTC (permalink / raw)
  To: Greg KH; +Cc: David Lang, linux-kernel, cebbert, stable

* Greg KH (gregkh@suse.de) wrote:
> On Thu, Apr 26, 2007 at 09:48:22AM -0700, David Lang wrote:
> >  any idea why there are so many more -stable patches for 2.6.20? this is the 
> >  10th -stable series, and most of them have been dozens of patches.
> > 
> >  is there a new team reporting and fixing bugs? or were there just more small 
> >  problems found in 2.6.20 then normal? or something else?
> 
> I think it's entirely due to the awesome effort by Chuck Ebbert of Red
> Hat.  He has been digging through all of the applied patches to Linus's
> tree and been forwarding them on to the stable team.  Without his effort
> and help, we would not have so many patches in these releases.

Agreed, Chuck has been really helpful.  AFAIK, he sees it as time
well spent to make sure -stable is carrying patches so Fedora doesn't
have to.  Great mantra.

BTW, here are some stats (so it's not that far off from normal):

[chrisw@sequoia linux-2.6-stable]$ for x in $(seq 12 20); do echo linux-2.6.$x.y $(git-rev-list v2.6.$x..linux-2.6.$x.y | wc -l); done
linux-2.6.12.y 53
linux-2.6.13.y 44
linux-2.6.14.y 96
linux-2.6.15.y 110
linux-2.6.16.y 788
linux-2.6.17.y 191
linux-2.6.18.y 240
linux-2.6.19.y 189
linux-2.6.20.y 235


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (34 preceding siblings ...)
  2007-04-26 17:01   ` [patch 00/33] 2.6.20-stable review Greg KH
@ 2007-04-26 20:29   ` Chuck Ebbert
  2007-04-27 10:15   ` Wu, Bryan
  36 siblings, 0 replies; 48+ messages in thread
From: Chuck Ebbert @ 2007-04-26 20:29 UTC (permalink / raw)
  To: Greg KH, Karsten Keil
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, torvalds, akpm, alan

Greg KH wrote:
> This is the start of the stable review cycle for the 2.6.20.10 release.
> There are 33 patches in this series, all will be posted as a response to
> this one.  If anyone has any issues with these being applied, please let
> us know.  If anyone is a maintainer of the proper subsystem, and wants
> to add a Signed-off-by: line to the patch, please respond with it.
> 

Karsten, did you submit the ISDN patch?



* Re: [patch 00/33] 2.6.20-stable review
  2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
                     ` (35 preceding siblings ...)
  2007-04-26 20:29   ` Chuck Ebbert
@ 2007-04-27 10:15   ` Wu, Bryan
  2007-04-27 11:05     ` Jesper Juhl
  2007-04-27 15:13     ` Greg KH
  36 siblings, 2 replies; 48+ messages in thread
From: Wu, Bryan @ 2007-04-27 10:15 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Thu, 2007-04-26 at 09:54 -0700, Greg KH wrote:
> This is the start of the stable review cycle for the 2.6.20.10 release.
> There are 33 patches in this series, all will be posted as a response to
> this one.  If anyone has any issues with these being applied, please let
> us know.  If anyone is a maintainer of the proper subsystem, and wants
> to add a Signed-off-by: line to the patch, please respond with it.
> 
> These patches are sent out with a number of different people on the Cc:
> line.  If you wish to be a reviewer, please email stable@kernel.org to
> add your name to the list.  If you want to be off the reviewer list,
> also email us.

Hi Greg:

I am just wondering: is there any rule for stable kernel version
releases?

AFAIK, 2.6.x kernels are all stable releases, and 2.6.x.y is for stable
tree bug fixing and long-term support. But I found that 2.6.16.y has had
49 updates, which is more active than other stable releases such as
2.6.17 and 2.6.19. It looks like 2.6.16 is a long-long-term support
version, and even-numbered 2.6.x kernels are more active than
odd-numbered ones.

You know, for some customers' products, they want to use a stable,
long-term-supported kernel instead of the latest one.

Could you please give us some idea about this rule?

Thanks
-Bryan Wu


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-27 10:15   ` Wu, Bryan
@ 2007-04-27 11:05     ` Jesper Juhl
  2007-04-27 13:47       ` Chuck Ebbert
  2007-04-27 15:13     ` Greg KH
  1 sibling, 1 reply; 48+ messages in thread
From: Jesper Juhl @ 2007-04-27 11:05 UTC (permalink / raw)
  To: bryan.wu
  Cc: Greg KH, linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On 27/04/07, Wu, Bryan <bryan.wu@analog.com> wrote:
> On Thu, 2007-04-26 at 09:54 -0700, Greg KH wrote:
> > This is the start of the stable review cycle for the 2.6.20.10 release.
> > There are 33 patches in this series, all will be posted as a response to
> > this one.  If anyone has any issues with these being applied, please let
> > us know.  If anyone is a maintainer of the proper subsystem, and wants
> > to add a Signed-off-by: line to the patch, please respond with it.
> >
> > These patches are sent out with a number of different people on the Cc:
> > line.  If you wish to be a reviewer, please email stable@kernel.org to
> > add your name to the list.  If you want to be off the reviewer list,
> > also email us.
>
> Hi Greg:
>
> I am just wondering that is there any rule for stable kernel version
> release?
>
> AFAIK, 2.6.x kernels are all stable release and 2.6.x.y is for stable
> tree bug fixing and long term supporting. But I found 2.6.16.y got 49
> version updating, it is more active than other stable release such as
> 2.6.17 and 2.6.19. It looks like 2.6.16 is a long-long term supporting
> version and even number 2.6.x kernel is more active than odd number
> 2.6.x kernel.
>
> You know for some customer's product, they want to use the stable and
> long term support kernel instead to use the latest one.
>
> Could you please give us some idea about this regular?
>

2.6.16.y is special in that Adrian Bunk took it upon himself to
maintain that branch more or less indefinitely. But that's not how
-stable normally works; it's Adrian's own project.

The normal way -stable works is that it tracks the latest 2.6.x kernel
that has been released.
Now that 2.6.21 has been released, a final flush of the patch queue
against 2.6.20 is done (that will be 2.6.20.10), and then -stable will
switch to 2.6.21.y. When 2.6.22 comes out, a final 2.6.21.y is made, and
then it's off to track 2.6.22.

The rules for what is suitable for a -stable release etc. are written in
Documentation/stable_kernel_rules.txt

I believe the above reflects reality - if I've said something wrong I
assume Greg will correct me :)

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-27 11:05     ` Jesper Juhl
@ 2007-04-27 13:47       ` Chuck Ebbert
  0 siblings, 0 replies; 48+ messages in thread
From: Chuck Ebbert @ 2007-04-27 13:47 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: bryan.wu, Greg KH, linux-kernel, stable, Justin Forbes,
	Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap, Dave Jones,
	Chuck Wolber, Chris Wedgwood, Michael Krufky, torvalds, akpm,
	alan

Jesper Juhl wrote:
> 
> The normal way -stable works is that it tracks the latest 2.6.x kernel
> that has been released.
> Now that 2.6.21 has been released, a final flush of the patch queue
> against 2.6.20 is done, that will be 2.6.20.10, and then -stable will
> switch to 2.6.21.y, when 2.6.22 comes out a final 2.6.21.y is made and
> then it's off to track 2.6.22
> 

The older stable version is usually maintained for a while after the next
major release comes out. How long that is can vary. I would personally like
to see 2.6.20 continue for a while, at least until 2.6.21.x is judged stable
enough (that could be a long time).


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-27 10:15   ` Wu, Bryan
  2007-04-27 11:05     ` Jesper Juhl
@ 2007-04-27 15:13     ` Greg KH
  2007-04-28  4:21       ` Bryan WU
  1 sibling, 1 reply; 48+ messages in thread
From: Greg KH @ 2007-04-27 15:13 UTC (permalink / raw)
  To: Wu, Bryan
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote:
> 
> You know for some customer's product, they want to use the stable and
> long term support kernel instead to use the latest one. 

Then they should get that support from a vendor, not from the kernel.org
releases :)

thanks,

greg k-h


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-27 15:13     ` Greg KH
@ 2007-04-28  4:21       ` Bryan WU
  2007-04-28  5:48         ` Greg KH
  0 siblings, 1 reply; 48+ messages in thread
From: Bryan WU @ 2007-04-28  4:21 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Fri, 2007-04-27 at 08:13 -0700, Greg KH wrote:
> On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote:
> > 
> > You know for some customer's product, they want to use the stable and
> > long term support kernel instead to use the latest one. 
> 
> Then they should get that support from a vendor, not from the kernel.org
> releases :)
> 

Yeah, but we are the vendor as you mentioned. -:))

If we want to release a kernel for customer product development, how
should we choose the stable version? Currently, we always follow the kernel
release cycle/rules and give customers the latest stable version.

Thank you Greg
-Bryan



* Re: [patch 00/33] 2.6.20-stable review
  2007-04-28  4:21       ` Bryan WU
@ 2007-04-28  5:48         ` Greg KH
  2007-04-28  6:46           ` Bryan WU
  0 siblings, 1 reply; 48+ messages in thread
From: Greg KH @ 2007-04-28  5:48 UTC (permalink / raw)
  To: Bryan WU
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Sat, Apr 28, 2007 at 12:21:24PM +0800, Bryan WU wrote:
> On Fri, 2007-04-27 at 08:13 -0700, Greg KH wrote:
> > On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote:
> > > 
> > > You know for some customer's product, they want to use the stable and
> > > long term support kernel instead to use the latest one. 
> > 
> > Then they should get that support from a vendor, not from the kernel.org
> > releases :)
> > 
> 
> Yeah, but we are the vendor as you mentioned. -:))

Ah, then you already know what to do :)

> If we wanna to release a kernel to customer product development, how to
> choose the stable version?

That's up to you.

> Currently, we always followed the kernel release cycle/rules and give
> customer the latest stable version.

Ok, then what has really changed here?  We've been doing this .y release
thing (also called -stable) for about 2 years now, nothing is different
this week from last.

Confused,

greg k-h


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-28  5:48         ` Greg KH
@ 2007-04-28  6:46           ` Bryan WU
  2007-04-28  7:01             ` Greg KH
  2007-04-28 16:24             ` Linus Torvalds
  0 siblings, 2 replies; 48+ messages in thread
From: Bryan WU @ 2007-04-28  6:46 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Fri, 2007-04-27 at 22:48 -0700, Greg KH wrote:
> On Sat, Apr 28, 2007 at 12:21:24PM +0800, Bryan WU wrote:
> > On Fri, 2007-04-27 at 08:13 -0700, Greg KH wrote:
> > > On Fri, Apr 27, 2007 at 06:15:54PM +0800, Wu, Bryan wrote:
> > > > 
> > > > You know for some customer's product, they want to use the stable and
> > > > long term support kernel instead to use the latest one. 
> > > 
> > > Then they should get that support from a vendor, not from the kernel.org
> > > releases :)
> > > 
> > 
> > Yeah, but we are the vendor as you mentioned. -:))
> 
> Ah, then you already know what to do :)
> 
> > If we wanna to release a kernel to customer product development, how to
> > choose the stable version?
> 
> That's up to you.
> 
> > Currently, we always followed the kernel release cycle/rules and give
> > customer the latest stable version.
> 
> Ok, then what has really changed here?  We've been doing this .y release
> thing (also called -stable) for about 2 years now, nothing is different
> this week from last.
> 
> Confused,

It's clear to me. Thanks.

You know, because kernel development is so active and so many stable
versions are released, it is very hard to decide which version to use for
mass production, especially for embedded systems that are not upgraded
often.

I know it is an old topic; sorry for the confusion.
Thanks
-Bryan


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-28  6:46           ` Bryan WU
@ 2007-04-28  7:01             ` Greg KH
  2007-04-28 16:24             ` Linus Torvalds
  1 sibling, 0 replies; 48+ messages in thread
From: Greg KH @ 2007-04-28  7:01 UTC (permalink / raw)
  To: Bryan WU
  Cc: linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, torvalds, akpm,
	alan

On Sat, Apr 28, 2007 at 02:46:16PM +0800, Bryan WU wrote:
> 
> You know, because the kernel development is so active and so many stable
> versions release, it is very hard to decide use which version for mass
> production, especially some embedded systems which does not often
> upgrade. 

It is difficult.  But you need to just spend the time and test and
verify that the kernel you pick works for your needs.

And then, after choosing that, you need to monitor the newer releases
for bugfixes and security updates that affect your customers, and then
backport them and provide them to your customers in a method by which
they can easily and securely update their machines.

It's not the simplest business to be in, that's for sure, but then
again, it's no different from any other OS provider, except the fact
that we are moving faster, and providing more features and fixes than
anyone ever has before in the history of computing :)

Welcome to the community, if there's anything we can do to help out,
please let us know, we are all in this together for the long haul.

greg k-h


* Re: [patch 00/33] 2.6.20-stable review
  2007-04-28  6:46           ` Bryan WU
  2007-04-28  7:01             ` Greg KH
@ 2007-04-28 16:24             ` Linus Torvalds
  1 sibling, 0 replies; 48+ messages in thread
From: Linus Torvalds @ 2007-04-28 16:24 UTC (permalink / raw)
  To: Bryan WU
  Cc: Greg KH, linux-kernel, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, akpm, alan



On Sat, 28 Apr 2007, Bryan WU wrote:
> 
> You know, because the kernel development is so active and so many stable
> versions release, it is very hard to decide use which version for mass
> production, especially some embedded systems which does not often
> upgrade. 

I actually think that one of the advantages (at least that was the _plan_) 
of having kernel releases every two-to-three months is that for vendors 
who don't upgrade very often, you should always have a choice of a few 
kernels to decide on - you can simply decide to go with a less recent 
kernel that you've been testing for a while. 

And with a fairly short release cycle, even if you decide that "hey, we 
just don't know enough about the latest kernel, so let's go with the <n-1> 
release", you won't be _totally_ behind the times. Yeah, you'll be using 
something older, but it will be just two months older, not "totally 
ancient".

In other words, the fact that the kernel developers cut releases fairly 
often should mean that vendors can much more easily decide on their own 
release cycle _independently_ of the kernel release cycle, because at any 
point in time, you always have *some* kernel release that isn't horribly 
old, and that you can have a few months of knowledge about.

Compare that to a release cycle of every two years or something (eg a 
major gcc release), where if you're unlucky, you have the choice between 
"recent and all the features, but it's not seen a lot of testing yet", or 
"really quite old and stable, and we'll look bad for packaging it when 
people know there is a much more recent release".

In that kind of situation, a downstream vendor just doesn't have a lot of 
good choices: delay the release until you know more about the package, or 
ship a really old version, or ship a new and scary one. All three are "big 
choices", and you can just pray you do the right one.

In contrast, the kernel release cycle has been geared to making those 
choices "small and inconsequential". Right now, you can basically choose 
between any of 
 - 2.6.19.7
 - 2.6.20.10
 - 2.6.21.1
and none of them are horribly old, and you can base your choice on just 
how adventurous you are _and_ on being familiar with two of them.

For example, I think 2.6.20 was a good release, and so my gut feel is that 
2.6.20.10 is probably preferable over the 2.6.19-based one, but if you 
want to live on the edge, you'd pick 2.6.21.1, and if you want to go for 
having a _loong_ time of being comfy with something, you might decide to 
go with 2.6.16.49 that Adrian has been maintaining.

In other words, having tight development releases just makes all these 
choices easier. There's more choice, for sure, and that could feel scary, 
but at the same time, it should result in less of a "choice between two 
evils" kind of situation, and more of a "I *can* make an informed choice 
if I just spend some effort on it".

At least that was the plan. From everything I've heard, most people are 
pretty happy with the 2.6.x development model. You cannot please 
everybody, but the release frequency means that developers feel like they 
can work on relevant stuff all the time, and vendors can always choose 
something that is known stable and not horribly ancient.

			Linus


Thread overview: 48+ messages
2007-04-26 16:54 ` [patch 00/33] 2.6.20-stable review Greg KH
2007-04-26 16:48   ` David Lang
2007-04-26 17:30     ` Greg KH
2007-04-26 17:45       ` [stable] " Chris Wright
2007-04-26 16:54   ` [patch 01/33] knfsd: Use a spinlock to protect sk_info_authunix Greg KH
2007-04-26 16:55   ` [patch 02/33] IB/mthca: Fix data corruption after FMR unmap on Sinai Greg KH
2007-04-26 16:55   ` [patch 03/33] HID: zeroing of bytes in output fields is bogus Greg KH
2007-04-26 16:55   ` [patch 04/33] KVM: MMU: Fix guest writes to nonpae pde Greg KH
2007-04-26 16:55   ` [patch 05/33] KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram Greg KH
2007-04-26 16:55   ` [patch 06/33] holepunch: fix shmem_truncate_range punching too far Greg KH
2007-04-26 16:55   ` [patch 07/33] holepunch: fix shmem_truncate_range punch locking Greg KH
2007-04-26 16:55   ` [patch 08/33] holepunch: fix disconnected pages after second truncate Greg KH
2007-04-26 16:55   ` [patch 09/33] holepunch: fix mmap_sem i_mutex deadlock Greg KH
2007-04-26 16:55   ` [patch 10/33] Fix sparc64 SBUS IOMMU allocator Greg KH
2007-04-26 16:55   ` [patch 11/33] Fix qlogicpti DMA unmapping Greg KH
2007-04-26 16:55   ` [patch 12/33] Fix compat sys_ipc() on sparc64 Greg KH
2007-04-26 16:55   ` [patch 13/33] Fix bogus inline directive in sparc64 PCI code Greg KH
2007-04-26 16:55   ` [patch 14/33] Fix errors in tcp_memcalculations Greg KH
2007-04-26 16:56   ` [patch 15/33] Fix netpoll UDP input path Greg KH
2007-04-26 16:56   ` [patch 16/33] Fix IRDA oopser Greg KH
2007-04-26 16:56   ` [patch 17/33] cache_k8_northbridges() overflows beyond allocation Greg KH
2007-04-26 16:56   ` [patch 18/33] exec.c: fix coredump to pipe problem and obscure "security hole" Greg KH
2007-04-26 16:56   ` [patch 19/33] NFS: Fix an Oops in nfs_setattr() Greg KH
2007-04-26 16:56   ` [patch 20/33] x86: Dont probe for DDC on VBE1.2 Greg KH
2007-04-26 16:56   ` [patch 21/33] vt: fix potential race in VT_WAITACTIVE handler Greg KH
2007-04-26 16:56   ` [patch 22/33] 3w-xxxx: fix oops caused by incorrect REQUEST_SENSE handling Greg KH
2007-04-26 16:56   ` [patch 23/33] fix bogon in /dev/mem mmaping on nommu Greg KH
2007-04-26 16:56   ` [patch 24/33] fix OOM killing processes wrongly thought MPOL_BIND Greg KH
2007-04-26 16:56   ` [patch 25/33] Fix possible NULL pointer access in 8250 serial driver Greg KH
2007-04-26 16:56   ` [patch 26/33] page migration: fix NR_FILE_PAGES accounting Greg KH
2007-04-26 16:57   ` [patch 27/33] Taskstats fix the structure members alignment issue Greg KH
2007-04-26 16:57   ` [patch 28/33] reiserfs: fix xattr root locking/refcount bug Greg KH
2007-04-26 16:57   ` [patch 29/33] hwmon/w83627ehf: Fix the fan5 clock divider write Greg KH
2007-04-26 16:57   ` [patch 30/33] ALSA: intel8x0 - Fix speaker output after S2RAM Greg KH
2007-04-26 16:57   ` [patch 31/33] AGPGART: intel_agp: fix G965 GTT size detect Greg KH
2007-04-26 16:57   ` [patch 32/33] cfq-iosched: fix alias + front merge bug Greg KH
2007-04-26 16:57   ` [patch 33/33] Revert "adjust legacy IDE resource setting (v2)" Greg KH
2007-04-26 17:01   ` [patch 00/33] 2.6.20-stable review Greg KH
2007-04-26 20:29   ` Chuck Ebbert
2007-04-27 10:15   ` Wu, Bryan
2007-04-27 11:05     ` Jesper Juhl
2007-04-27 13:47       ` Chuck Ebbert
2007-04-27 15:13     ` Greg KH
2007-04-28  4:21       ` Bryan WU
2007-04-28  5:48         ` Greg KH
2007-04-28  6:46           ` Bryan WU
2007-04-28  7:01             ` Greg KH
2007-04-28 16:24             ` Linus Torvalds
