* [ 00/73] 3.2.25-stable review
@ 2012-07-31  4:43 Ben Hutchings
  2012-07-31  4:43 ` [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes Ben Hutchings
                   ` (74 more replies)
  0 siblings, 75 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan

This is the start of the stable review cycle for the 3.2.25 release.
There are 73 patches in this series, which will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Aug  2 10:00:00 UTC 2012.
Anything received after that time might be too late.

A combined patch relative to 3.2.24 will be posted as an additional
response to this, and the diffstat can be found below.

Ben.

-------------
 Makefile                                     |    4 +-
 arch/arm/mach-omap2/opp.c                    |    3 +-
 arch/powerpc/include/asm/reg.h               |    3 +-
 arch/powerpc/kernel/ftrace.c                 |   12 +-
 arch/s390/kernel/processor.c                 |    2 +
 arch/s390/kernel/smp.c                       |    3 -
 arch/x86/kernel/microcode_core.c             |   31 +++--
 arch/x86/pci/fixup.c                         |   17 +++
 block/blk-core.c                             |    6 +-
 block/blk-exec.c                             |    2 +-
 block/blk-sysfs.c                            |    4 +-
 block/blk-throttle.c                         |    4 +-
 block/blk.h                                  |    2 +-
 drivers/acpi/ac.c                            |    4 +-
 drivers/gpu/drm/nouveau/nva3_copy.fuc        |    4 +-
 drivers/gpu/drm/nouveau/nva3_copy.fuc.h      |   94 +++++++++++++-
 drivers/gpu/drm/nouveau/nvc0_copy.fuc.h      |   87 ++++++++++++-
 drivers/gpu/drm/radeon/atombios_dp.c         |   10 +-
 drivers/gpu/drm/radeon/radeon_connectors.c   |   35 ++++--
 drivers/gpu/drm/radeon/radeon_cursor.c       |    8 +-
 drivers/gpu/drm/radeon/radeon_object.c       |    3 +-
 drivers/iommu/amd_iommu.c                    |   10 +-
 drivers/media/video/cx25821/cx25821-core.c   |    3 -
 drivers/media/video/cx25821/cx25821.h        |    2 +-
 drivers/mmc/host/sdhci-pci.c                 |    1 +
 drivers/net/ethernet/realtek/r8169.c         |    1 +
 drivers/net/wireless/mwifiex/cfg80211.c      |    4 +-
 drivers/net/wireless/rt2x00/rt2800usb.c      |   23 +++-
 drivers/net/wireless/rtlwifi/rtl8192de/phy.c |    6 +-
 drivers/scsi/hosts.c                         |    7 +-
 drivers/scsi/libsas/sas_expander.c           |   47 +++----
 drivers/scsi/scsi.c                          |    8 +-
 drivers/scsi/scsi_error.c                    |   14 +++
 drivers/scsi/scsi_lib.c                      |   43 +++----
 drivers/scsi/scsi_priv.h                     |    1 -
 drivers/scsi/scsi_scan.c                     |    3 +
 drivers/scsi/scsi_sysfs.c                    |   46 ++++---
 drivers/target/iscsi/iscsi_target.c          |   22 +---
 drivers/target/iscsi/iscsi_target_core.h     |    2 -
 drivers/target/iscsi/iscsi_target_login.c    |   60 +--------
 drivers/target/target_core_cdb.c             |   43 +++++--
 drivers/target/target_core_transport.c       |   10 ++
 drivers/usb/core/devio.c                     |   10 +-
 drivers/usb/gadget/u_ether.c                 |   12 +-
 drivers/usb/serial/option.c                  |    8 +-
 fs/btrfs/async-thread.c                      |    9 +-
 fs/btrfs/disk-io.c                           |    5 +-
 fs/cifs/cifssmb.c                            |   30 +++++
 fs/ext4/balloc.c                             |    3 +-
 fs/ext4/bitmap.c                             |   12 +-
 fs/ext4/ext4.h                               |    6 +-
 fs/ext4/ialloc.c                             |    3 +-
 fs/ext4/inode.c                              |   41 ++++--
 fs/ext4/resize.c                             |    5 +
 fs/ext4/super.c                              |  174 ++++++++++++++++++--------
 fs/hugetlbfs/inode.c                         |    3 +-
 fs/locks.c                                   |    6 +-
 fs/nfs/internal.h                            |    2 +-
 fs/nfs/write.c                               |    4 +-
 fs/udf/super.c                               |    2 +-
 include/linux/blkdev.h                       |    1 +
 include/linux/cpu.h                          |    5 +-
 include/linux/cpuset.h                       |   47 +++----
 include/linux/fs.h                           |   11 +-
 include/linux/init_task.h                    |    8 ++
 include/linux/migrate.h                      |   23 +++-
 include/linux/mmzone.h                       |    2 +
 include/linux/sched.h                        |    3 +-
 include/target/target_core_base.h            |    1 +
 kernel/cpuset.c                              |   43 ++-----
 kernel/fork.c                                |    3 +
 kernel/power/hibernate.c                     |    6 +
 kernel/power/suspend.c                       |    3 +
 kernel/sched.c                               |   86 +++++++++++--
 kernel/sched_fair.c                          |    2 +-
 kernel/time/tick-sched.c                     |    1 +
 kernel/workqueue.c                           |   38 +++++-
 mm/compaction.c                              |    4 +-
 mm/filemap.c                                 |   11 +-
 mm/hugetlb.c                                 |   13 +-
 mm/memory-failure.c                          |    2 +-
 mm/memory_hotplug.c                          |    2 +-
 mm/mempolicy.c                               |   30 +++--
 mm/migrate.c                                 |  171 +++++++++++++++++--------
 mm/page_alloc.c                              |  118 +++++++++++------
 mm/slab.c                                    |   13 +-
 mm/slub.c                                    |   40 +++---
 mm/vmscan.c                                  |  114 ++++++++++++++---
 sound/pci/hda/patch_hdmi.c                   |   12 +-
 sound/pci/hda/patch_realtek.c                |    1 +
 sound/soc/soc-dapm.c                         |   15 ++-
 91 files changed, 1268 insertions(+), 590 deletions(-)

-- 
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.



* [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 02/73] mm: compaction: allow compaction to isolate dirty pages Ben Hutchings
                   ` (73 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Mel Gorman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 938929f14cb595f43cd1a4e63e22d36cab1e4a1f upstream.

Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 .
        Large machines with 1TB or more of RAM take a long time to boot
        without this patch and may spew out soft lockup warnings.

When min_free_kbytes is updated, some pageblocks are marked
MIGRATE_RESERVE.  Ordinarily, this work is unnoticeable as it happens early
in boot, but on large machines with 1TB of memory, it has been reported
to delay boot times, probably due to the NUMA distances involved.

The bulk of the work is due to calling pageblock_is_reserved() an
unnecessary number of times and accessing far more struct page metadata
than is necessary.  This patch significantly reduces the amount of work
done by setup_zone_migrate_reserve(), improving boot times on 1TB machines.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/page_alloc.c |   40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 516ab62..671e6c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3388,25 +3388,33 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 		if (page_to_nid(page) != zone_to_nid(zone))
 			continue;
 
-		/* Blocks with reserved pages will never free, skip them. */
-		block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
-		if (pageblock_is_reserved(pfn, block_end_pfn))
-			continue;
-
 		block_migratetype = get_pageblock_migratetype(page);
 
-		/* If this block is reserved, account for it */
-		if (reserve > 0 && block_migratetype == MIGRATE_RESERVE) {
-			reserve--;
-			continue;
-		}
+		/* Only test what is necessary when the reserves are not met */
+		if (reserve > 0) {
+			/*
+			 * Blocks with reserved pages will never free, skip
+			 * them.
+			 */
+			block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
+			if (pageblock_is_reserved(pfn, block_end_pfn))
+				continue;
 
-		/* Suitable for reserving if this block is movable */
-		if (reserve > 0 && block_migratetype == MIGRATE_MOVABLE) {
-			set_pageblock_migratetype(page, MIGRATE_RESERVE);
-			move_freepages_block(zone, page, MIGRATE_RESERVE);
-			reserve--;
-			continue;
+			/* If this block is reserved, account for it */
+			if (block_migratetype == MIGRATE_RESERVE) {
+				reserve--;
+				continue;
+			}
+
+			/* Suitable for reserving if this block is movable */
+			if (block_migratetype == MIGRATE_MOVABLE) {
+				set_pageblock_migratetype(page,
+							MIGRATE_RESERVE);
+				move_freepages_block(zone, page,
+							MIGRATE_RESERVE);
+				reserve--;
+				continue;
+			}
 		}
 
 		/*




* [ 02/73] mm: compaction: allow compaction to isolate dirty pages
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
  2012-07-31  4:43 ` [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 03/73] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage Ben Hutchings
                   ` (72 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Andrea Arcangeli, Rik van Riel,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit a77ebd333cd810d7b680d544be88c875131c2bd3 upstream.

Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
	information by reducing LRU list churning had the side-effect of
	reducing THP allocation success rates. This was part of a series
	to restore the success rates while preserving the reclaim fix.

Short summary: There are severe stalls when a USB stick using VFAT is
used with THP enabled, and they are reduced by this series.  If you are
experiencing this problem, please test and report back; considering I
have seen complaints from openSUSE and Fedora users on this as well as a
few private mails, I'm guessing it's a widespread issue.  This is a new
type of USB-related stall because it is due to synchronous compaction
writing, whereas in the past the big problem was dirty pages reaching
the end of the LRU and being written by reclaim.

I am cc'ing Andrew this time as this series would replace
mm-do-not-stall-in-synchronous-compaction-for-thp-allocations.patch.
I'm also cc'ing Dave Jones as he might have merged that patch to Fedora
for wider testing; ideally it would be reverted and replaced by this
series.

That said, the later patches could really do with some review.  If this
series is not the answer then a new direction needs to be discussed
because as it is, the stalls are unacceptable as the results in this
leader show.

For testers that try backporting this to 3.1, it won't work because
there is a non-obvious dependency on not writing back pages in direct
reclaim so you need those patches too.

Changelog since V5
o Rebase to 3.2-rc5
o Tidy up the changelogs a bit

Changelog since V4
o Added reviewed-bys, credited Andrea properly for sync-light
o Allow dirty pages without mappings to be considered for migration
o Bound the number of pages freed for compaction
o Isolate PageReclaim pages on their own LRU list

This is against 3.2-rc5 and follows on from discussions on "mm: Do
not stall in synchronous compaction for THP allocations" and "[RFC
PATCH 0/5] Reduce compaction-related stalls". Initially, the proposed
patch eliminated stalls due to compaction which sometimes resulted in
user-visible interactivity problems on browsers by simply never using
sync compaction. The downside was that THP allocation success rates
were lower because dirty pages were not being migrated, as reported by
Andrea. His approach to fixing this was NAKed on the grounds that it
reverted fixes from Rik that had reduced the number of pages reclaimed,
because the excessive reclaim severely impacted his workloads' performance.

This series attempts to reconcile the requirements of maximising THP
usage without stalling in a user-visible fashion due to compaction and
without cheating by reclaiming an excessive number of pages.

Patch 1 partially reverts commit 39deaf85 to allow migration to isolate
	dirty pages. This is because migration can move some dirty
	pages without blocking.

Patch 2 notes that the /proc/sys/vm/compact_memory handler is not using
	synchronous compaction when it should be. This is unrelated
	to the reported stalls but is worth fixing.

Patch 3 checks if we isolated a compound page during lumpy scan and
	accounts for it properly. For the most part, this affects
	tracing so it's unrelated to the stalls but worth fixing.

Patch 4 notes that it is possible to abort reclaim early for compaction
	and return 0 to the page allocator potentially entering the
	"may oom" path. This has not been observed in practice but
	the rest of the series potentially makes it easier to happen.

Patch 5 adds a sync parameter to the migratepage callback and gives
	the callback responsibility for migrating the page without
	blocking if sync==false. For example, fallback_migrate_page
	will not call writepage if sync==false. This increases the
	number of pages that can be handled by asynchronous compaction
	thereby reducing stalls.

Patch 6 restores filter-awareness to isolate_lru_page for migration.
	In practice, it means that pages under writeback and pages
	without a ->migratepage callback will not be isolated
	for migration.

Patch 7 avoids calling direct reclaim if compaction is deferred but
	makes sure that compaction is only deferred if sync
	compaction was used.

Patch 8 introduces a sync-light migration mechanism that sync compaction
	uses (sketched after this list). The objective is to allow some
	stalls but to not call ->writepage, which can lead to significant
	user-visible stalls.

Patch 9 notes that while we want to abort reclaim ASAP to allow
	compaction to go ahead, we currently leave only a very small window
	of opportunity for compaction to run. This patch allows more pages
	to be freed by reclaim but bounds the number to a reasonable
	level based on the high watermark on each zone.

Patch 10 allows slabs to be shrunk even after compaction_ready() is
	true for one zone. This is to avoid a problem whereby a single
	small zone can abort reclaim even though no pages have been
	reclaimed and no suitably large zone is in a usable state.

Patch 11 fixes a problem with the rate of page scanning. As reclaim
	rarely stalls on pages under writeback, scan rates are very high.
	This is particularly true for direct reclaim, which is not calling
	writepage. The vmstat figures implied that much of this was busy
	work with PageReclaim pages marked for immediate reclaim. This
	patch is a prototype that moves these pages to their own LRU list.
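
As a rough sketch of the sync-light mechanism referenced in patch 8:
upstream it took the shape of a three-state migration mode rather than
a plain sync flag. The following is based on the upstream migrate_mode
definition; the exact 3.2 backport may differ slightly:

	/*
	 * MIGRATE_ASYNC means never block
	 * MIGRATE_SYNC_LIGHT allows blocking on most operations but not
	 *	->writepage as the potential stall time is too significant
	 * MIGRATE_SYNC will block when migrating pages
	 */
	enum migrate_mode {
		MIGRATE_ASYNC,
		MIGRATE_SYNC_LIGHT,
		MIGRATE_SYNC,
	};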

This has been tested and other than 2 USB keys getting trashed,
nothing horrible fell out. That said, I am a bit unhappy with the
rescue logic in patch 11 but did not find a better way around it. It
does significantly reduce scan rates and System CPU time indicating
it is the right direction to take.

What is of critical importance is that stalls due to compaction
are massively reduced even though sync compaction was still
allowed. Testing from people complaining about stalls copying to USB
sticks with THP enabled is particularly welcome.

The following tests all involve THP usage and USB keys in some
way. Each test follows this type of pattern

1. Read from some fast storage, be it raw device or file. Each time
   the copy finishes, start again until the test ends
2. Write a large file to a filesystem on a USB stick. Each time the copy
   finishes, start again until the test ends
3. When memory is low, start an alloc process that creates a mapping
   the size of physical memory to stress THP allocation. This is the
   "real" part of the test and the part that is meant to trigger
   stalls when THP is enabled. Copying continues in the background.
4. Record the CPU usage and time to execute of the alloc process
5. Record the number of THP allocs and fallbacks as well as the number of THP
   pages in use at the end of the test just before alloc exited
6. Run the test 5 times to get an idea of variability
7. Between each run, sync is run and caches dropped and the test
   waits until nr_dirty is a small number to avoid interference
   or caching between iterations that would skew the figures.

The individual tests were then

writebackCPDeviceBasevfat
	Disable THP, read from a raw device (sda), vfat on USB stick
writebackCPDeviceBaseext4
	Disable THP, read from a raw device (sda), ext4 on USB stick
writebackCPDevicevfat
	THP enabled, read from a raw device (sda), vfat on USB stick
writebackCPDeviceext4
	THP enabled, read from a raw device (sda), ext4 on USB stick
writebackCPFilevfat
	THP enabled, read from a file on fast storage and USB, both vfat
writebackCPFileext4
	THP enabled, read from a file on fast storage and USB, both ext4

The kernels tested were

3.1		3.1
vanilla		3.2-rc5
freemore	Patches 1-10
immediate	Patches 1-11
andrea		The 8 patches Andrea posted as a basis of comparison

The results are very long unfortunately. I'll start with the case
where we are not using THP at all

writebackCPDeviceBasevfat
                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1         andrea-v2r1
System Time         1.28 (    0.00%)   54.49 (-4143.46%)   48.63 (-3687.69%)    4.69 ( -265.11%)   51.88 (-3940.81%)
+/-                 0.06 (    0.00%)    2.45 (-4305.55%)    4.75 (-8430.57%)    7.46 (-13282.76%)    4.76 (-8440.70%)
User Time           0.09 (    0.00%)    0.05 (   40.91%)    0.06 (   29.55%)    0.07 (   15.91%)    0.06 (   27.27%)
+/-                 0.02 (    0.00%)    0.01 (   45.39%)    0.02 (   25.07%)    0.00 (   77.06%)    0.01 (   52.24%)
Elapsed Time      110.27 (    0.00%)   56.38 (   48.87%)   49.95 (   54.70%)   11.77 (   89.33%)   53.43 (   51.54%)
+/-                 7.33 (    0.00%)    3.77 (   48.61%)    4.94 (   32.63%)    6.71 (    8.50%)    4.76 (   35.03%)
THP Active          0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)
+/-                 0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)
Fault Alloc         0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)
+/-                 0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)
Fault Fallback      0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)
+/-                 0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)    0.00 (    0.00%)

The THP figures are obviously all 0 because THP was disabled. The
main thing to watch is the elapsed times and how they compare to
times when THP is enabled later. It's also important to note that
elapsed time is improved by this series as System CPU time is much
reduced.

writebackCPDevicevfat

                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1         andrea-v2r1
System Time         1.22 (    0.00%)   13.89 (-1040.72%)   46.40 (-3709.20%)    4.44 ( -264.37%)   47.37 (-3789.33%)
+/-                 0.06 (    0.00%)   22.82 (-37635.56%)    3.84 (-6249.44%)    6.48 (-10618.92%)    6.60 (-10818.53%)
User Time           0.06 (    0.00%)    0.06 (   -6.90%)    0.05 (   17.24%)    0.05 (   13.79%)    0.04 (   31.03%)
+/-                 0.01 (    0.00%)    0.01 (   33.33%)    0.01 (   33.33%)    0.01 (   39.14%)    0.01 (   25.46%)
Elapsed Time     10445.54 (    0.00%) 2249.92 (   78.46%)   70.06 (   99.33%)   16.59 (   99.84%)  472.43 (   95.48%)
+/-               643.98 (    0.00%)  811.62 (  -26.03%)   10.02 (   98.44%)    7.03 (   98.91%)   59.99 (   90.68%)
THP Active         15.60 (    0.00%)   35.20 (  225.64%)   65.00 (  416.67%)   70.80 (  453.85%)   62.20 (  398.72%)
+/-                18.48 (    0.00%)   51.29 (  277.59%)   15.99 (   86.52%)   37.91 (  205.18%)   22.02 (  119.18%)
Fault Alloc       121.80 (    0.00%)   76.60 (   62.89%)  155.40 (  127.59%)  181.20 (  148.77%)  286.60 (  235.30%)
+/-                73.51 (    0.00%)   61.11 (   83.12%)   34.89 (   47.46%)   31.88 (   43.36%)   68.13 (   92.68%)
Fault Fallback    881.20 (    0.00%)  926.60 (   -5.15%)  847.60 (    3.81%)  822.00 (    6.72%)  716.60 (   18.68%)
+/-                73.51 (    0.00%)   61.26 (   16.67%)   34.89 (   52.54%)   31.65 (   56.94%)   67.75 (    7.84%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)       3540.88   1945.37    716.04     64.97   1937.03
Total Elapsed Time (seconds)              52417.33  11425.90    501.02    230.95   2520.28

The first thing to note is the "Elapsed Time" for the vanilla kernels
of 2249 seconds versus 56 with THP disabled, which might explain the
reports of USB stalls with THP enabled. Applying the patches brings
performance in line with THP-disabled performance while isolating
pages for immediate reclaim from the LRU cuts down System CPU time.

The "Fault Alloc" success rate figures are also improved. The vanilla
kernel only managed to allocate 76.6 pages on average over the course
of 5 iterations where as applying the series allocated 181.20 on
average albeit it is well within variance. It's worth noting that
applies the series at least descreases the amount of variance which
implies an improvement.

Andrea's series had a higher success rate for THP allocations but
at a severe cost to elapsed time, which is still better than vanilla
but much worse than disabling THP altogether. One can bring my
series close to Andrea's by removing this check:

        /*
         * If compaction is deferred for high-order allocations, it is because
         * sync compaction recently failed. In this is the case and the caller
         * has requested the system not be heavily disrupted, fail the
         * allocation now instead of entering direct reclaim
         */
        if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
                goto nopage;

I didn't include a patch that removed the above check because hurting
overall performance to improve the THP figure is not what the average
user wants. It's something to consider though if someone really wants
to maximise THP usage no matter what it does to the workload initially.

This is a summary of the vmstat figures from the same test.

                                       3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
Page Ins                                  3257266139  1111844061    17263623    10901575   161423219
Page Outs                                   81054922    30364312     3626530     3657687     8753730
Swap Ins                                        3294        2851        6560        4964        4592
Swap Outs                                     390073      528094      620197      790912      698285
Direct pages scanned                      1077581700  3024951463  1764930052   115140570  5901188831
Kswapd pages scanned                        34826043     7112868     2131265     1686942     1893966
Kswapd pages reclaimed                      28950067     4911036     1246044      966475     1497726
Direct pages reclaimed                     805148398   280167837     3623473     2215044    40809360
Kswapd efficiency                                83%         69%         58%         57%         79%
Kswapd velocity                              664.399     622.521    4253.852    7304.360     751.490
Direct efficiency                                74%          9%          0%          1%          0%
Direct velocity                            20557.737  264745.137 3522673.849  498551.938 2341481.435
Percentage direct scans                          96%         99%         99%         98%         99%
Page writes by reclaim                        722646      529174      620319      791018      699198
Page writes file                              332573        1080         122         106         913
Page writes anon                              390073      528094      620197      790912      698285
Page reclaim immediate                             0  2552514720  1635858848   111281140  5478375032
Page rescued immediate                             0           0           0       87848           0
Slabs scanned                                  23552       23552        9216        8192        9216
Direct inode steals                              231           0           0           0           0
Kswapd inode steals                                0           0           0           0           0
Kswapd skipped wait                            28076         786           0          61           6
THP fault alloc                                  609         383         753         906        1433
THP collapse alloc                                12           6           0           0           6
THP splits                                       536         211         456         593        1136
THP fault fallback                              4406        4633        4263        4110        3583
THP collapse fail                                120         127           0           0           4
Compaction stalls                               1810         728         623         779        3200
Compaction success                               196          53          60          80         123
Compaction failures                             1614         675         563         699        3077
Compaction pages moved                        193158       53545      243185      333457      226688
Compaction move failure                         9952        9396       16424       23676       45070
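
(For reference, the derived figures above appear to be computed from the
raw counters as follows; this is an inference from the numbers, not
something stated in the report:

	efficiency = pages reclaimed / pages scanned
	velocity   = pages scanned / total elapsed seconds

For example, kswapd on 3.1.0-vanilla: 28950067 / 34826043 ~= 83% and
34826043 / 52417.33 ~= 664.399 pages scanned per second, matching the
table.)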

The main things to look at are

1. Page In/out figures are much reduced by the series.

2. Direct page scanning is incredibly high (264745.137 pages scanned
   per second on the vanilla kernel) but isolating PageReclaim pages
   on their own list reduces the number of pages scanned significantly.

3. The fact that "Page rescued immediate" is a positive number implies
   that we sometimes race removing pages from the LRU_IMMEDIATE list
   that need to be put back on a normal LRU but it happens only for
   0.07% of the pages marked for immediate reclaim.

writebackCPDeviceext4
                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1         andrea-v2r1
System Time         1.51 (    0.00%)    1.77 (  -17.66%)    1.46 (    2.92%)    1.15 (   23.77%)    1.89 (  -25.63%)
+/-                 0.27 (    0.00%)    0.67 ( -148.52%)    0.33 (  -22.76%)    0.30 (  -11.15%)    0.19 (   30.16%)
User Time           0.03 (    0.00%)    0.04 (  -37.50%)    0.05 (  -62.50%)    0.07 ( -112.50%)    0.04 (  -18.75%)
+/-                 0.01 (    0.00%)    0.02 ( -146.64%)    0.02 (  -97.91%)    0.02 (  -75.59%)    0.02 (  -63.30%)
Elapsed Time      124.93 (    0.00%)  114.49 (    8.36%)   96.77 (   22.55%)   27.48 (   78.00%)  205.70 (  -64.65%)
+/-                20.20 (    0.00%)   74.39 ( -268.34%)   59.88 ( -196.48%)    7.72 (   61.79%)   25.03 (  -23.95%)
THP Active        161.80 (    0.00%)   83.60 (   51.67%)  141.20 (   87.27%)   84.60 (   52.29%)   82.60 (   51.05%)
+/-                71.95 (    0.00%)   43.80 (   60.88%)   26.91 (   37.40%)   59.02 (   82.03%)   52.13 (   72.45%)
Fault Alloc       471.40 (    0.00%)  228.60 (   48.49%)  282.20 (   59.86%)  225.20 (   47.77%)  388.40 (   82.39%)
+/-                88.07 (    0.00%)   87.42 (   99.26%)   73.79 (   83.78%)  109.62 (  124.47%)   82.62 (   93.81%)
Fault Fallback    531.60 (    0.00%)  774.60 (  -45.71%)  720.80 (  -35.59%)  777.80 (  -46.31%)  614.80 (  -15.65%)
+/-                88.07 (    0.00%)   87.26 (    0.92%)   73.79 (   16.22%)  109.62 (  -24.47%)   82.29 (    6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)         50.22     33.76     30.65     24.14    128.45
Total Elapsed Time (seconds)               1113.73   1132.19   1029.45    759.49   1707.26

Similar test but the USB stick is using ext4 instead of vfat. As
ext4 does not use writepage for migration, the large stalls due to
compaction when THP is enabled are not observed. Still, isolating
PageReclaim pages on their own list helped completion time largely
by reducing the number of pages scanned by direct reclaim, although
time spent in congestion_wait could also be a factor.

Again, Andrea's series had far higher success rates for THP allocation
at the cost of elapsed time. I didn't look too closely but a quick
look at the vmstat figures tells me kswapd reclaimed 8 times more pages
than the patch series and direct reclaim reclaimed roughly three times
as many pages. It follows that if memory is aggressively reclaimed,
there will be more available for THP.

writebackCPFilevfat
                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1         andrea-v2r1
System Time         1.76 (    0.00%)   29.10 (-1555.52%)   46.01 (-2517.18%)    4.79 ( -172.35%)   54.89 (-3022.53%)
+/-                 0.14 (    0.00%)   25.61 (-18185.17%)    2.15 (-1434.83%)    6.60 (-4610.03%)    9.75 (-6863.76%)
User Time           0.05 (    0.00%)    0.07 (  -45.83%)    0.05 (   -4.17%)    0.06 (  -29.17%)    0.06 (  -16.67%)
+/-                 0.02 (    0.00%)    0.02 (   20.11%)    0.02 (   -3.14%)    0.01 (   31.58%)    0.01 (   47.41%)
Elapsed Time     22520.79 (    0.00%) 1082.85 (   95.19%)   73.30 (   99.67%)   32.43 (   99.86%)  291.84 (  98.70%)
+/-              7277.23 (    0.00%)  706.29 (   90.29%)   19.05 (   99.74%)   17.05 (   99.77%)  125.55 (   98.27%)
THP Active         83.80 (    0.00%)   12.80 (   15.27%)   15.60 (   18.62%)   13.00 (   15.51%)    0.80 (    0.95%)
+/-                66.81 (    0.00%)   20.19 (   30.22%)    5.92 (    8.86%)   15.06 (   22.54%)    1.17 (    1.75%)
Fault Alloc       171.00 (    0.00%)   67.80 (   39.65%)   97.40 (   56.96%)  125.60 (   73.45%)  133.00 (   77.78%)
+/-                82.91 (    0.00%)   30.69 (   37.02%)   53.91 (   65.02%)   55.05 (   66.40%)   21.19 (   25.56%)
Fault Fallback    832.00 (    0.00%)  935.20 (  -12.40%)  906.00 (   -8.89%)  877.40 (   -5.46%)  870.20 (   -4.59%)
+/-                82.91 (    0.00%)   30.69 (   62.98%)   54.01 (   34.86%)   55.05 (   33.60%)   20.91 (   74.78%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)       7229.81    928.42    704.52     80.68   1330.76
Total Elapsed Time (seconds)             112849.04   5618.69    571.11    360.54   1664.28

In this case, the test is reading/writing only from filesystems but as
it's vfat, it's slow due to calling writepage during compaction. Little
to observe really - the time to complete the test goes way down
with the series applied and THP allocation success rates go up in
comparison to 3.2-rc5.  The success rates are lower than 3.1.0 but
the elapsed time for that kernel is abysmal so it is not really a
sensible comparison.

As before, Andrea's series allocates more THPs at the cost of overall
performance.

writebackCPFileext4
                   3.1.0-vanilla         rc5-vanilla       freemore-v6r1        isolate-v6r1         andrea-v2r1
System Time         1.51 (    0.00%)    1.77 (  -17.66%)    1.46 (    2.92%)    1.15 (   23.77%)    1.89 (  -25.63%)
+/-                 0.27 (    0.00%)    0.67 ( -148.52%)    0.33 (  -22.76%)    0.30 (  -11.15%)    0.19 (   30.16%)
User Time           0.03 (    0.00%)    0.04 (  -37.50%)    0.05 (  -62.50%)    0.07 ( -112.50%)    0.04 (  -18.75%)
+/-                 0.01 (    0.00%)    0.02 ( -146.64%)    0.02 (  -97.91%)    0.02 (  -75.59%)    0.02 (  -63.30%)
Elapsed Time      124.93 (    0.00%)  114.49 (    8.36%)   96.77 (   22.55%)   27.48 (   78.00%)  205.70 (  -64.65%)
+/-                20.20 (    0.00%)   74.39 ( -268.34%)   59.88 ( -196.48%)    7.72 (   61.79%)   25.03 (  -23.95%)
THP Active        161.80 (    0.00%)   83.60 (   51.67%)  141.20 (   87.27%)   84.60 (   52.29%)   82.60 (   51.05%)
+/-                71.95 (    0.00%)   43.80 (   60.88%)   26.91 (   37.40%)   59.02 (   82.03%)   52.13 (   72.45%)
Fault Alloc       471.40 (    0.00%)  228.60 (   48.49%)  282.20 (   59.86%)  225.20 (   47.77%)  388.40 (   82.39%)
+/-                88.07 (    0.00%)   87.42 (   99.26%)   73.79 (   83.78%)  109.62 (  124.47%)   82.62 (   93.81%)
Fault Fallback    531.60 (    0.00%)  774.60 (  -45.71%)  720.80 (  -35.59%)  777.80 (  -46.31%)  614.80 (  -15.65%)
+/-                88.07 (    0.00%)   87.26 (    0.92%)   73.79 (   16.22%)  109.62 (  -24.47%)   82.29 (    6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds)         50.22     33.76     30.65     24.14    128.45
Total Elapsed Time (seconds)               1113.73   1132.19   1029.45    759.49   1707.26

Same type of story - elapsed times go down. In this case, allocation
success rates are roughly the same. As before, Andrea's has higher
success rates but takes a lot longer.

Overall the series does reduce latencies, and while the tests are
inherently racy as alloc competes with the cp processes, the variability
was included. The THP allocation rates are not as high as they could
be, but that is because raising them would require more aggressive
reclaim and compaction, impacting overall performance.

This patch:

Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
it was meaningless to pick the page and re-add it to the LRU list.

What was missed during review is that asynchronous migration moves dirty
pages if their ->migratepage callback is migrate_page() because these can
be moved without blocking.  This potentially impacted hugepage allocation
success rates by a factor depending on how many dirty pages are in the
system.

This patch partially reverts 39deaf85 to allow migration to isolate dirty
pages again.  This increases how much compaction disrupts the LRU but that
is addressed later in the series.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/compaction.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index e6670c3..396ea2b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,9 +349,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 		}
 
-		if (!cc->sync)
-			mode |= ISOLATE_CLEAN;
-
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, mode, 0) != 0)
 			continue;




* [ 03/73] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
  2012-07-31  4:43 ` [ 01/73] mm: reduce the amount of work done when updating min_free_kbytes Ben Hutchings
  2012-07-31  4:43 ` [ 02/73] mm: compaction: allow compaction to isolate dirty pages Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 04/73] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred Ben Hutchings
                   ` (71 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream.

Stable note: Not tracked in Bugzilla. A fix aimed at preserving page
	aging information by reducing LRU list churning had the side-effect
	of reducing THP allocation success rates. This was part of a series
	to restore the success rates while preserving the reclaim fix.

Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time.  Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates.  Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check:

	if (PageDirty(page) && !sync &&
		mapping->a_ops->migratepage != migrate_page)
			rc = -EBUSY;

This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking.  This
patch updates the ->migratepage callback with a "sync" parameter.  It is
the responsibility of the callback to fail gracefully if migration would
block.
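
To illustrate the updated contract, a minimal hypothetical callback
under the new signature might look like this (my_fs_migrate_page is an
invented name for illustration only; the real conversions are in the
diff below):

	#ifdef CONFIG_MIGRATION
	/*
	 * With the new "sync" parameter, a callback that cannot migrate
	 * a dirty page without blocking must fail gracefully in the
	 * async case instead of stalling compaction.
	 */
	static int my_fs_migrate_page(struct address_space *mapping,
				struct page *newpage, struct page *page,
				bool sync)
	{
		if (PageDirty(page) && !sync)
			return -EBUSY;	/* would block; caller may retry */
		return migrate_page(mapping, newpage, page, sync);
	}
	#endif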

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/btrfs/disk-io.c      |    4 +-
 fs/hugetlbfs/inode.c    |    3 +-
 fs/nfs/internal.h       |    2 +-
 fs/nfs/write.c          |    4 +-
 include/linux/fs.h      |    9 ++--
 include/linux/migrate.h |    2 +-
 mm/migrate.c            |  129 +++++++++++++++++++++++++++++++++--------------
 7 files changed, 106 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f99a099..1375494 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
-			struct page *newpage, struct page *page)
+			struct page *newpage, struct page *page, bool sync)
 {
 	/*
 	 * we can't safely write a btree page from here,
@@ -887,7 +887,7 @@ static int btree_migratepage(struct address_space *mapping,
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 #endif
 
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e425ad9..06fd460 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -583,7 +583,8 @@ static int hugetlbfs_set_page_dirty(struct page *page)
 }
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
-				struct page *newpage, struct page *page)
+				struct page *newpage, struct page *page,
+				bool sync)
 {
 	int rc;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5ee9253..114398a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -332,7 +332,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-		struct page *, struct page *);
+		struct page *, struct page *, bool);
 #else
 #define nfs_migrate_page NULL
 #endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 0c38852..889e98b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1688,7 +1688,7 @@ out_error:
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-		struct page *page)
+		struct page *page, bool sync)
 {
 	/*
 	 * If PagePrivate is set, then the page is currently associated with
@@ -1703,7 +1703,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
 
 	nfs_fscache_release_page(page, GFP_KERNEL);
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 #endif
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a7409bc..b92b73d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -609,9 +609,12 @@ struct address_space_operations {
 			loff_t offset, unsigned long nr_segs);
 	int (*get_xip_mem)(struct address_space *, pgoff_t, int,
 						void **, unsigned long *);
-	/* migrate the contents of a page to the specified target */
+	/*
+	 * migrate the contents of a page to the specified target. If sync
+	 * is false, it must not block.
+	 */
 	int (*migratepage) (struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, bool);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
@@ -2537,7 +2540,7 @@ extern int generic_check_addressable(unsigned, u64);
 
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
-				struct page *, struct page *);
+				struct page *, struct page *, bool);
 #else
 #define buffer_migrate_page NULL
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..14e6d2a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,7 +11,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
 extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, bool);
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
 			bool sync);
diff --git a/mm/migrate.c b/mm/migrate.c
index fc39198..4e86f3b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -216,6 +216,55 @@ out:
 	pte_unmap_unlock(ptep, ptl);
 }
 
+#ifdef CONFIG_BLOCK
+/* Returns true if all buffers are successfully locked */
+static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+{
+	struct buffer_head *bh = head;
+
+	/* Simple case, sync compaction */
+	if (sync) {
+		do {
+			get_bh(bh);
+			lock_buffer(bh);
+			bh = bh->b_this_page;
+
+		} while (bh != head);
+
+		return true;
+	}
+
+	/* async case, we cannot block on lock_buffer so use trylock_buffer */
+	do {
+		get_bh(bh);
+		if (!trylock_buffer(bh)) {
+			/*
+			 * We failed to lock the buffer and cannot stall in
+			 * async migration. Release the taken locks
+			 */
+			struct buffer_head *failed_bh = bh;
+			put_bh(failed_bh);
+			bh = head;
+			while (bh != failed_bh) {
+				unlock_buffer(bh);
+				put_bh(bh);
+				bh = bh->b_this_page;
+			}
+			return false;
+		}
+
+		bh = bh->b_this_page;
+	} while (bh != head);
+	return true;
+}
+#else
+static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
+								bool sync)
+{
+	return true;
+}
+#endif /* CONFIG_BLOCK */
+
 /*
  * Replace the page in the mapping.
  *
@@ -225,7 +274,8 @@ out:
  * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
  */
 static int migrate_page_move_mapping(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page,
+		struct buffer_head *head, bool sync)
 {
 	int expected_count;
 	void **pslot;
@@ -255,6 +305,19 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	}
 
 	/*
+	 * In the async migration case of moving a page with buffers, lock the
+	 * buffers using trylock before the mapping is moved. If the mapping
+	 * was moved, we later failed to lock the buffers and could not move
+	 * the mapping back due to an elevated page count, we would have to
+	 * block waiting on other references to be dropped.
+	 */
+	if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+		page_unfreeze_refs(page, expected_count);
+		spin_unlock_irq(&mapping->tree_lock);
+		return -EAGAIN;
+	}
+
+	/*
 	 * Now we know that no one else is looking at the page.
 	 */
 	get_page(newpage);	/* add cache reference */
@@ -409,13 +472,13 @@ EXPORT_SYMBOL(fail_migrate_page);
  * Pages are locked upon entry and exit.
  */
 int migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page, bool sync)
 {
 	int rc;
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
 
 	if (rc)
 		return rc;
@@ -432,28 +495,28 @@ EXPORT_SYMBOL(migrate_page);
  * exist.
  */
 int buffer_migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page, bool sync)
 {
 	struct buffer_head *bh, *head;
 	int rc;
 
 	if (!page_has_buffers(page))
-		return migrate_page(mapping, newpage, page);
+		return migrate_page(mapping, newpage, page, sync);
 
 	head = page_buffers(page);
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
 
 	if (rc)
 		return rc;
 
-	bh = head;
-	do {
-		get_bh(bh);
-		lock_buffer(bh);
-		bh = bh->b_this_page;
-
-	} while (bh != head);
+	/*
+	 * In the async case, migrate_page_move_mapping locked the buffers
+	 * with an IRQ-safe spinlock held. In the sync case, the buffers
+	 * need to be locked now
+	 */
+	if (sync)
+		BUG_ON(!buffer_migrate_lock_buffers(head, sync));
 
 	ClearPagePrivate(page);
 	set_page_private(newpage, page_private(page));
@@ -530,10 +593,13 @@ static int writeout(struct address_space *mapping, struct page *page)
  * Default handling if a filesystem does not provide a migration function.
  */
 static int fallback_migrate_page(struct address_space *mapping,
-	struct page *newpage, struct page *page)
+	struct page *newpage, struct page *page, bool sync)
 {
-	if (PageDirty(page))
+	if (PageDirty(page)) {
+		if (!sync)
+			return -EBUSY;
 		return writeout(mapping, page);
+	}
 
 	/*
 	 * Buffers may be managed in a filesystem specific way.
@@ -543,7 +609,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, sync);
 }
 
 /*
@@ -579,29 +645,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 
 	mapping = page_mapping(page);
 	if (!mapping)
-		rc = migrate_page(mapping, newpage, page);
-	else {
+		rc = migrate_page(mapping, newpage, page, sync);
+	else if (mapping->a_ops->migratepage)
 		/*
-		 * Do not writeback pages if !sync and migratepage is
-		 * not pointing to migrate_page() which is nonblocking
-		 * (swapcache/tmpfs uses migratepage = migrate_page).
+		 * Most pages have a mapping and most filesystems provide a
+		 * migratepage callback. Anonymous pages are part of swap
+		 * space which also has its own migratepage callback. This
+		 * is the most common path for page migration.
 		 */
-		if (PageDirty(page) && !sync &&
-		    mapping->a_ops->migratepage != migrate_page)
-			rc = -EBUSY;
-		else if (mapping->a_ops->migratepage)
-			/*
-			 * Most pages have a mapping and most filesystems
-			 * should provide a migration function. Anonymous
-			 * pages are part of swap space which also has its
-			 * own migration function. This is the most common
-			 * path for page migration.
-			 */
-			rc = mapping->a_ops->migratepage(mapping,
-							newpage, page);
-		else
-			rc = fallback_migrate_page(mapping, newpage, page);
-	}
+		rc = mapping->a_ops->migratepage(mapping,
+						newpage, page, sync);
+	else
+		rc = fallback_migrate_page(mapping, newpage, page, sync);
 
 	if (rc) {
 		newpage->mapping = NULL;




* [ 04/73] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (2 preceding siblings ...)
  2012-07-31  4:43 ` [ 03/73] mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 05/73] mm: compaction: make isolate_lru_page() filter-aware again Ben Hutchings
                   ` (70 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Minchan Kim, Rik van Riel,
	Andrea Arcangeli, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 66199712e9eef5aede09dbcd9dfff87798a66917 upstream.

Stable note: Not tracked in Bugzilla. This was part of a series that
	reduced interactivity stalls experienced when THP was enabled.

If compaction is deferred, direct reclaim is used to try to free enough
pages for the allocation to succeed.  For small high-orders, this has a
reasonable chance of success.  However, if the caller has specified
__GFP_NO_KSWAPD to limit the disruption to the system, it makes more sense
to fail the allocation rather than stall the caller in direct reclaim.
This patch skips direct reclaim if compaction is deferred and the caller
specifies __GFP_NO_KSWAPD.

Async compaction only considers a subset of pages, so it is possible for
compaction to be deferred prematurely and direct reclaim to be skipped
even in cases where it should be used.  To compensate for this, this patch
also defers compaction only if sync compaction failed.
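
In effect, the slowpath ordering becomes the following (a paraphrase of
the hunks below, not the literal code):

	/* first pass: async compaction */
	page = __alloc_pages_direct_compact(..., sync_migration /* false */,
					&deferred_compaction,
					&did_some_progress);
	if (page)
		goto got_pg;
	sync_migration = true;

	/* compaction deferred and caller asked for low disruption: fail */
	if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
		goto nopage;

	/* otherwise direct reclaim, then a second, synchronous
	   compaction pass */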

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/page_alloc.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9404b38..cb5723c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1981,14 +1981,20 @@ static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress,
-	bool sync_migration)
+	int migratetype, bool sync_migration,
+	bool *deferred_compaction,
+	unsigned long *did_some_progress)
 {
 	struct page *page;
 
-	if (!order || compaction_deferred(preferred_zone))
+	if (!order)
 		return NULL;
 
+	if (compaction_deferred(preferred_zone)) {
+		*deferred_compaction = true;
+		return NULL;
+	}
+
 	current->flags |= PF_MEMALLOC;
 	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
 						nodemask, sync_migration);
@@ -2016,7 +2022,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		 * but not enough to satisfy watermarks.
 		 */
 		count_vm_event(COMPACTFAIL);
-		defer_compaction(preferred_zone);
+
+		/*
+		 * As async compaction considers a subset of pageblocks, only
+		 * defer if the failure was a sync compaction failure.
+		 */
+		if (sync_migration)
+			defer_compaction(preferred_zone);
 
 		cond_resched();
 	}
@@ -2028,8 +2040,9 @@ static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress,
-	bool sync_migration)
+	int migratetype, bool sync_migration,
+	bool *deferred_compaction,
+	unsigned long *did_some_progress)
 {
 	return NULL;
 }
@@ -2179,6 +2192,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
 	bool sync_migration = false;
+	bool deferred_compaction = false;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -2259,12 +2273,22 @@ rebalance:
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress,
-					sync_migration);
+					migratetype, sync_migration,
+					&deferred_compaction,
+					&did_some_progress);
 	if (page)
 		goto got_pg;
 	sync_migration = true;
 
+	/*
+	 * If compaction is deferred for high-order allocations, it is because
+	 * sync compaction recently failed. In this is the case and the caller
+	 * has requested the system not be heavily disrupted, fail the
+	 * allocation now instead of entering direct reclaim
+	 */
+	if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
+		goto nopage;
+
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
@@ -2328,8 +2352,9 @@ rebalance:
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress,
-					sync_migration);
+					migratetype, sync_migration,
+					&deferred_compaction,
+					&did_some_progress);
 		if (page)
 			goto got_pg;
 	}




* [ 05/73] mm: compaction: make isolate_lru_page() filter-aware again
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (3 preceding siblings ...)
  2012-07-31  4:43 ` [ 04/73] mm: page allocator: do not call direct reclaim for THP allocations while compaction is deferred Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 06/73] mm: compaction: introduce sync-light migration for use by compaction Ben Hutchings
                   ` (69 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit c82449352854ff09e43062246af86bdeb628f0c3 upstream.

Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
	information by reducing LRU list churning had the side-effect of
	reducing THP allocation success rates. This was part of a series
	to restore the success rates while preserving the reclaim fix.

Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
it was meaningless to pick the page and re-add it to the LRU list.  This
had to be partially reverted because some dirty pages can be migrated by
compaction without blocking.

This patch updates "mm: compaction: make isolate_lru_page" by skipping
over pages that migration has no chance of moving, to minimise LRU
disruption.
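
To make the filter concrete, here is a minimal userspace model of the
skip decision (illustrative only: the struct and names are made up, and
the real check below also allows dirty pages without a mapping):

#include <stdbool.h>
#include <stdio.h>

struct page_state {
	bool writeback;
	bool dirty;
	bool has_migratepage_aop;	/* mapping provides ->migratepage */
};

/* true if an async-migration caller should skip this page */
static bool skip_for_async_migrate(const struct page_state *p)
{
	if (p->writeback)
		return true;		/* waiting is all we could do */
	if (p->dirty && !p->has_migratepage_aop)
		return true;		/* would have to write it back */
	return false;
}

int main(void)
{
	struct page_state clean = { false, false, false };
	struct page_state dirty_nfs = { false, true, true };

	printf("clean page skipped? %d\n", skip_for_async_migrate(&clean));
	printf("dirty, ->migratepage? %d\n", skip_for_async_migrate(&dirty_nfs));
	return 0;
}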

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/linux/mmzone.h |    2 ++
 mm/compaction.c        |    3 +++
 mm/vmscan.c            |   35 +++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 42e544c..2038b90 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -177,6 +177,8 @@ struct lruvec {
 #define ISOLATE_CLEAN		((__force isolate_mode_t)0x4)
 /* Isolate unmapped file */
 #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
+/* Isolate for asynchronous migration */
+#define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
 
 /* LRU Isolation modes. */
 typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/mm/compaction.c b/mm/compaction.c
index d31e64b..fb29158 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -349,6 +349,9 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 			continue;
 		}
 
+		if (!cc->sync)
+			mode |= ISOLATE_ASYNC_MIGRATE;
+
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, mode, 0) != 0)
 			continue;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cb68c53..efbcab1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1075,8 +1075,39 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
 
 	ret = -EBUSY;
 
-	if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
-		return ret;
+	/*
+	 * To minimise LRU disruption, the caller can indicate that it only
+	 * wants to isolate pages it will be able to operate on without
+	 * blocking - clean pages for the most part.
+	 *
+	 * ISOLATE_CLEAN means that only clean pages should be isolated. This
+	 * is used by reclaim when it cannot write to backing storage
+	 *
+	 * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants pages
+	 * that it is possible to migrate without blocking
+	 */
+	if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
+		/* All the caller can do on PageWriteback is block */
+		if (PageWriteback(page))
+			return ret;
+
+		if (PageDirty(page)) {
+			struct address_space *mapping;
+
+			/* ISOLATE_CLEAN means only clean pages */
+			if (mode & ISOLATE_CLEAN)
+				return ret;
+
+			/*
+			 * Only pages without mappings or that have a
+			 * ->migratepage callback are possible to migrate
+			 * without blocking
+			 */
+			mapping = page_mapping(page);
+			if (mapping && !mapping->a_ops->migratepage)
+				return ret;
+		}
+	}
 
 	if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
 		return ret;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 06/73] mm: compaction: introduce sync-light migration for use by compaction
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (4 preceding siblings ...)
  2012-07-31  4:43 ` [ 05/73] mm: compaction: make isolate_lru_page() filter-aware again Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31 16:42   ` Herton Ronaldo Krzesinski
  2012-07-31  4:43 ` [ 07/73] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available Ben Hutchings
                   ` (68 subsequent siblings)
  74 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream.

Stable note: Not tracked in Bugzilla. This was part of a series that
	reduced interactivity stalls experienced when THP was enabled.
	These stalls were particularly noticeable when copying data
	to a USB stick, but the experiences for users varied a lot.

This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
that avoids writing back pages to backing storage.  Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.

This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.
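
As a rough userspace sketch of the resulting policy (the enum mirrors
the one added in include/linux/migrate.h below; compaction_mode() is a
made-up helper standing in for the call site in mm/compaction.c):

#include <stdio.h>

enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/* compaction maps its sync flag to sync-light, never full sync */
static enum migrate_mode compaction_mode(int sync)
{
	return sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

int main(void)
{
	printf("async compaction -> %d\n", compaction_mode(0));
	printf("sync compaction  -> %d\n", compaction_mode(1));
	printf("memory hotplug   -> %d\n", MIGRATE_SYNC);
	return 0;
}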

[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/btrfs/disk-io.c      |    5 +--
 fs/hugetlbfs/inode.c    |    2 +-
 fs/nfs/internal.h       |    2 +-
 fs/nfs/write.c          |    4 +--
 include/linux/fs.h      |    6 ++--
 include/linux/migrate.h |   23 +++++++++++---
 mm/compaction.c         |    2 +-
 mm/memory-failure.c     |    2 +-
 mm/memory_hotplug.c     |    2 +-
 mm/mempolicy.c          |    2 +-
 mm/migrate.c            |   78 ++++++++++++++++++++++++++---------------------
 11 files changed, 76 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1375494..d852566 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
-			struct page *newpage, struct page *page, bool sync)
+			struct page *newpage, struct page *page,
+			enum migrate_mode mode)
 {
 	/*
 	 * we can't safely write a btree page from here,
@@ -887,7 +888,7 @@ static int btree_migratepage(struct address_space *mapping,
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-	return migrate_page(mapping, newpage, page, sync);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 06fd460..1e85a7a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -584,7 +584,7 @@ static int hugetlbfs_set_page_dirty(struct page *page)
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
 				struct page *newpage, struct page *page,
-				bool sync)
+				enum migrate_mode mode)
 {
 	int rc;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 114398a..8102db9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -332,7 +332,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-		struct page *, struct page *, bool);
+		struct page *, struct page *, enum migrate_mode);
 #else
 #define nfs_migrate_page NULL
 #endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 889e98b..834f0fe 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1688,7 +1688,7 @@ out_error:
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-		struct page *page, bool sync)
+		struct page *page, enum migrate_mode mode)
 {
 	/*
 	 * If PagePrivate is set, then the page is currently associated with
@@ -1703,7 +1703,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
 
 	nfs_fscache_release_page(page, GFP_KERNEL);
 
-	return migrate_page(mapping, newpage, page, sync);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b92b73d..e694bd4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -525,6 +525,7 @@ enum positive_aop_returns {
 struct page;
 struct address_space;
 struct writeback_control;
+enum migrate_mode;
 
 struct iov_iter {
 	const struct iovec *iov;
@@ -614,7 +615,7 @@ struct address_space_operations {
 	 * is false, it must not block.
 	 */
 	int (*migratepage) (struct address_space *,
-			struct page *, struct page *, bool);
+			struct page *, struct page *, enum migrate_mode);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
@@ -2540,7 +2541,8 @@ extern int generic_check_addressable(unsigned, u64);
 
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
-				struct page *, struct page *, bool);
+				struct page *, struct page *,
+				enum migrate_mode);
 #else
 #define buffer_migrate_page NULL
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 14e6d2a..eaf8674 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -6,18 +6,31 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
+/*
+ * MIGRATE_ASYNC means never block
+ * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
+ *	on most operations but not ->writepage as the potential stall time
+ *	is too significant
+ * MIGRATE_SYNC will block when migrating pages
+ */
+enum migrate_mode {
+	MIGRATE_ASYNC,
+	MIGRATE_SYNC_LIGHT,
+	MIGRATE_SYNC,
+};
+
 #ifdef CONFIG_MIGRATION
 #define PAGE_MIGRATION 1
 
 extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
-			struct page *, struct page *, bool);
+			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
-			bool sync);
+			enum migrate_mode mode);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
-			bool sync);
+			enum migrate_mode mode);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -36,10 +49,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
-		bool sync) { return -ENOSYS; }
+		enum migrate_mode mode) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
-		bool sync) { return -ENOSYS; }
+		enum migrate_mode mode) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index fb29158..71a58f6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -557,7 +557,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
 				(unsigned long)cc, false,
-				cc->sync);
+				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 06d3479..56080ea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1557,7 +1557,7 @@ int soft_offline_page(struct page *page, int flags)
 					    page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
-								0, true);
+							0, MIGRATE_SYNC);
 		if (ret) {
 			putback_lru_pages(&pagelist);
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2168489..6629faf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -809,7 +809,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		}
 		/* this function returns # of failed pages */
 		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
+							true, MIGRATE_SYNC);
 		if (ret)
 			putback_lru_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e3d58f0..06b145f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -942,7 +942,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_node_page, dest,
-								false, true);
+							false, MIGRATE_SYNC);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index 4e86f3b..9871a56 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -218,12 +218,13 @@ out:
 
 #ifdef CONFIG_BLOCK
 /* Returns true if all buffers are successfully locked */
-static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
+static bool buffer_migrate_lock_buffers(struct buffer_head *head,
+							enum migrate_mode mode)
 {
 	struct buffer_head *bh = head;
 
 	/* Simple case, sync compaction */
-	if (sync) {
+	if (mode != MIGRATE_ASYNC) {
 		do {
 			get_bh(bh);
 			lock_buffer(bh);
@@ -259,7 +260,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
 }
 #else
 static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
-								bool sync)
+							enum migrate_mode mode)
 {
 	return true;
 }
@@ -275,7 +276,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
  */
 static int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page,
-		struct buffer_head *head, bool sync)
+		struct buffer_head *head, enum migrate_mode mode)
 {
 	int expected_count;
 	void **pslot;
@@ -311,7 +312,8 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	 * the mapping back due to an elevated page count, we would have to
 	 * block waiting on other references to be dropped.
 	 */
-	if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
+	if (mode == MIGRATE_ASYNC && head &&
+			!buffer_migrate_lock_buffers(head, mode)) {
 		page_unfreeze_refs(page, expected_count);
 		spin_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
@@ -472,13 +474,14 @@ EXPORT_SYMBOL(fail_migrate_page);
  * Pages are locked upon entry and exit.
  */
 int migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page, bool sync)
+		struct page *newpage, struct page *page,
+		enum migrate_mode mode)
 {
 	int rc;
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
-	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
+	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
 
 	if (rc)
 		return rc;
@@ -495,17 +498,17 @@ EXPORT_SYMBOL(migrate_page);
  * exist.
  */
 int buffer_migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page, bool sync)
+		struct page *newpage, struct page *page, enum migrate_mode mode)
 {
 	struct buffer_head *bh, *head;
 	int rc;
 
 	if (!page_has_buffers(page))
-		return migrate_page(mapping, newpage, page, sync);
+		return migrate_page(mapping, newpage, page, mode);
 
 	head = page_buffers(page);
 
-	rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
+	rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
 
 	if (rc)
 		return rc;
@@ -515,8 +518,8 @@ int buffer_migrate_page(struct address_space *mapping,
 	 * with an IRQ-safe spinlock held. In the sync case, the buffers
 	 * need to be locked now
 	 */
-	if (sync)
-		BUG_ON(!buffer_migrate_lock_buffers(head, sync));
+	if (mode != MIGRATE_ASYNC)
+		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
 
 	ClearPagePrivate(page);
 	set_page_private(newpage, page_private(page));
@@ -593,10 +596,11 @@ static int writeout(struct address_space *mapping, struct page *page)
  * Default handling if a filesystem does not provide a migration function.
  */
 static int fallback_migrate_page(struct address_space *mapping,
-	struct page *newpage, struct page *page, bool sync)
+	struct page *newpage, struct page *page, enum migrate_mode mode)
 {
 	if (PageDirty(page)) {
-		if (!sync)
+		/* Only writeback pages in full synchronous migration */
+		if (mode != MIGRATE_SYNC)
 			return -EBUSY;
 		return writeout(mapping, page);
 	}
@@ -609,7 +613,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
 
-	return migrate_page(mapping, newpage, page, sync);
+	return migrate_page(mapping, newpage, page, mode);
 }
 
 /*
@@ -624,7 +628,7 @@ static int fallback_migrate_page(struct address_space *mapping,
  *  == 0 - success
  */
 static int move_to_new_page(struct page *newpage, struct page *page,
-					int remap_swapcache, bool sync)
+				int remap_swapcache, enum migrate_mode mode)
 {
 	struct address_space *mapping;
 	int rc;
@@ -645,7 +649,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 
 	mapping = page_mapping(page);
 	if (!mapping)
-		rc = migrate_page(mapping, newpage, page, sync);
+		rc = migrate_page(mapping, newpage, page, mode);
 	else if (mapping->a_ops->migratepage)
 		/*
 		 * Most pages have a mapping and most filesystems provide a
@@ -654,9 +658,9 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 		 * is the most common path for page migration.
 		 */
 		rc = mapping->a_ops->migratepage(mapping,
-						newpage, page, sync);
+						newpage, page, mode);
 	else
-		rc = fallback_migrate_page(mapping, newpage, page, sync);
+		rc = fallback_migrate_page(mapping, newpage, page, mode);
 
 	if (rc) {
 		newpage->mapping = NULL;
@@ -671,7 +675,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 }
 
 static int __unmap_and_move(struct page *page, struct page *newpage,
-				int force, bool offlining, bool sync)
+			int force, bool offlining, enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
 	int remap_swapcache = 1;
@@ -680,7 +684,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	struct anon_vma *anon_vma = NULL;
 
 	if (!trylock_page(page)) {
-		if (!force || !sync)
+		if (!force || mode == MIGRATE_ASYNC)
 			goto out;
 
 		/*
@@ -726,10 +730,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 	if (PageWriteback(page)) {
 		/*
-		 * For !sync, there is no point retrying as the retry loop
-		 * is expected to be too short for PageWriteback to be cleared
+		 * Only in the case of a full synchronous migration is it
+		 * necessary to wait for PageWriteback. In the async case,
+		 * the retry loop is too short and in the sync-light case,
+		 * the overhead of stalling is too much
 		 */
-		if (!sync) {
+		if (mode != MIGRATE_SYNC) {
 			rc = -EBUSY;
 			goto uncharge;
 		}
@@ -800,7 +806,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 skip_unmap:
 	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page, remap_swapcache, sync);
+		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
 
 	if (rc && remap_swapcache)
 		remove_migration_ptes(page, page);
@@ -823,7 +829,8 @@ out:
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, bool offlining, bool sync)
+			struct page *page, int force, bool offlining,
+			enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -843,7 +850,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 		if (unlikely(split_huge_page(page)))
 			goto out;
 
-	rc = __unmap_and_move(page, newpage, force, offlining, sync);
+	rc = __unmap_and_move(page, newpage, force, offlining, mode);
 out:
 	if (rc != -EAGAIN) {
 		/*
@@ -891,7 +898,8 @@ out:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, bool offlining, bool sync)
+				int force, bool offlining,
+				enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -904,7 +912,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	rc = -EAGAIN;
 
 	if (!trylock_page(hpage)) {
-		if (!force || !sync)
+		if (!force || mode != MIGRATE_SYNC)
 			goto out;
 		lock_page(hpage);
 	}
@@ -915,7 +923,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 
 	if (!page_mapped(hpage))
-		rc = move_to_new_page(new_hpage, hpage, 1, sync);
+		rc = move_to_new_page(new_hpage, hpage, 1, mode);
 
 	if (rc)
 		remove_migration_ptes(hpage, hpage);
@@ -958,7 +966,7 @@ out:
  */
 int migrate_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
-		bool sync)
+		enum migrate_mode mode)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -979,7 +987,7 @@ int migrate_pages(struct list_head *from,
 
 			rc = unmap_and_move(get_new_page, private,
 						page, pass > 2, offlining,
-						sync);
+						mode);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1009,7 +1017,7 @@ out:
 
 int migrate_huge_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
-		bool sync)
+		enum migrate_mode mode)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -1026,7 +1034,7 @@ int migrate_huge_pages(struct list_head *from,
 
 			rc = unmap_and_move_huge_page(get_new_page,
 					private, page, pass > 2, offlining,
-					sync);
+					mode);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1155,7 +1163,7 @@ set_status:
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0, true);
+				(unsigned long)pm, 0, MIGRATE_SYNC);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 07/73] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (5 preceding siblings ...)
  2012-07-31  4:43 ` [ 06/73] mm: compaction: introduce sync-light migration for use by compaction Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 08/73] mm: vmscan: do not OOM if aborting reclaim to start compaction Ben Hutchings
                   ` (67 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit fe4b1b244bdb96136855f2c694071cb09d140766 upstream.

Stable note: Not tracked in Bugzilla. THP and compaction were found to
	aggressively reclaim pages and stall systems under different
	situations, and this was addressed piecemeal over time. This patch
	addresses a problem where the fix regressed THP allocation
	success rates.

In commit e0887c19 ("vmscan: limit direct reclaim for higher order
allocations"), Rik noted that reclaim was too aggressive when THP was
enabled.  In his initial patch he used the number of free pages to decide
if reclaim should abort for compaction.  My feedback was that reclaim and
compaction should be using the same logic when deciding if reclaim should
be aborted.

Unfortunately, this had the effect of reducing THP success rates when the
workload included something like streaming reads that continually
allocated pages.  The window during which compaction could run and return
a THP was too small.

This patch combines Rik's two patches together.  compaction_suitable() is
still used to decide if reclaim should be aborted to allow compaction.
However, it will also ensure that there is a reasonable buffer of
free pages available.  This improves upon the THP allocation success rates
but bounds the number of pages that are freed for compaction.
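
A minimal sketch of the free-page buffer this aims for, with assumed
zone numbers (the kernel derives balance_gap from
KSWAPD_ZONE_BALANCE_GAP_RATIO and tests the result with
zone_watermark_ok_safe()):

#include <stdio.h>

int main(void)
{
	unsigned long low_wmark = 1024, high_wmark = 1536; /* assumed */
	unsigned long present_pages = 262144;	/* 1GB of 4K pages */
	unsigned long ratio = 100;		/* balance gap divisor */
	int order = 9;				/* THP-sized request */

	unsigned long balance_gap = (present_pages + ratio - 1) / ratio;
	if (balance_gap > low_wmark)
		balance_gap = low_wmark;

	/* reclaim continues until free pages clear this watermark */
	unsigned long watermark = high_wmark + balance_gap +
				  (2UL << order);
	printf("free-page target before aborting: %lu pages\n", watermark);
	return 0;
}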

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |   44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2119,6 +2119,42 @@ restart:
 	throttle_vm_writeout(sc->gfp_mask);
 }
 
+/* Returns true if compaction should go ahead for a high-order request */
+static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+{
+	unsigned long balance_gap, watermark;
+	bool watermark_ok;
+
+	/* Do not consider compaction for orders reclaim is meant to satisfy */
+	if (sc->order <= PAGE_ALLOC_COSTLY_ORDER)
+		return false;
+
+	/*
+	 * Compaction takes time to run and there are potentially other
+	 * callers using the pages just freed. Continue reclaiming until
+	 * there is a buffer of free pages available to give compaction
+	 * a reasonable chance of completing and allocating the page
+	 */
+	balance_gap = min(low_wmark_pages(zone),
+		(zone->present_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+			KSWAPD_ZONE_BALANCE_GAP_RATIO);
+	watermark = high_wmark_pages(zone) + balance_gap + (2UL << sc->order);
+	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
+
+	/*
+	 * If compaction is deferred, reclaim up to a point where
+	 * compaction will have a chance of success when re-enabled
+	 */
+	if (compaction_deferred(zone))
+		return watermark_ok;
+
+	/* If compaction is not ready to start, keep reclaiming */
+	if (!compaction_suitable(zone, sc->order))
+		return false;
+
+	return watermark_ok;
+}
+
 /*
  * This is the direct reclaim path, for page-allocating processes.  We only
  * try to reclaim pages from zones which will satisfy the caller's allocation
@@ -2136,8 +2172,8 @@ restart:
  * scan then give up on it.
  *
  * This function returns true if a zone is being reclaimed for a costly
- * high-order allocation and compaction is either ready to begin or deferred.
- * This indicates to the caller that it should retry the allocation or fail.
+ * high-order allocation and compaction is ready to begin. This indicates to
+ * the caller that it should retry the allocation or fail.
  */
 static bool shrink_zones(int priority, struct zonelist *zonelist,
 					struct scan_control *sc)
@@ -2171,9 +2207,7 @@ static bool shrink_zones(int priority, s
 				 * noticable problem, like transparent huge page
 				 * allocations.
 				 */
-				if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-					(compaction_suitable(zone, sc->order) ||
-					 compaction_deferred(zone))) {
+				if (compaction_ready(zone, sc)) {
 					should_abort_reclaim = true;
 					continue;
 				}



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 08/73] mm: vmscan: do not OOM if aborting reclaim to start compaction
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (6 preceding siblings ...)
  2012-07-31  4:43 ` [ 07/73] mm: vmscan: when reclaiming for compaction, ensure there are sufficient free pages available Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 09/73] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone Ben Hutchings
                   ` (66 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 7335084d446b83cbcb15da80497d03f0c1dc9e21 upstream.

Stable note: Not tracked in Bugzilla. This patch makes later patches
	easier to apply but otherwise has little to justify it. The
	problem it fixes was never observed but the source of the
	theoretical problem did not exist for very long.

During direct reclaim it is possible that reclaim will be aborted so that
compaction can be attempted to satisfy a high-order allocation.  If this
decision is made before any pages are reclaimed, it is possible that 0 is
returned to the page allocator potentially triggering an OOM.  This has
not been observed but it is a possibility so this patch addresses it.
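
A simplified model of the return-value contract involved (illustrative
names only; the allocator treats a non-zero return from reclaim as
progress):

#include <stdio.h>

static unsigned long reclaim_result(int aborted_for_compaction,
				    unsigned long nr_reclaimed)
{
	if (nr_reclaimed)
		return nr_reclaimed;
	if (aborted_for_compaction)
		return 1;	/* report progress; compaction comes next */
	return 0;		/* genuinely no progress; OOM is fair game */
}

int main(void)
{
	printf("%lu\n", reclaim_result(1, 0));	/* 1: no OOM */
	printf("%lu\n", reclaim_result(0, 0));	/* 0: may OOM */
	return 0;
}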

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2284,6 +2284,7 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	bool should_abort_reclaim;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2295,7 +2296,8 @@ static unsigned long do_try_to_free_page
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token(sc->mem_cgroup);
-		if (shrink_zones(priority, zonelist, sc))
+		should_abort_reclaim = shrink_zones(priority, zonelist, sc);
+		if (should_abort_reclaim)
 			break;
 
 		/*
@@ -2363,6 +2365,10 @@ out:
 	if (oom_killer_disabled)
 		return 0;
 
+	/* Aborting reclaim to try compaction? don't OOM, then */
+	if (should_abort_reclaim)
+		return 1;
+
 	/* top priority shrink_zones still had more to do? don't OOM, then */
 	if (scanning_global_lru(sc) && !all_unreclaimable(zonelist, sc))
 		return 1;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 09/73] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (7 preceding siblings ...)
  2012-07-31  4:43 ` [ 08/73] mm: vmscan: do not OOM if aborting reclaim to start compaction Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 10/73] vmscan: promote shared file mapped pages Ben Hutchings
                   ` (65 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Rik van Riel, Andrea Arcangeli,
	Minchan Kim, Dave Jones, Jan Kara, Andy Isaacson, Nai Xia,
	Johannes Weiner

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 0cee34fd72c582b4f8ad8ce00645b75fb4168199 upstream.

Stable note: Not tracked in Bugzilla. THP and compaction were found to
	aggressively reclaim pages and stall systems under different
	situations, and this was addressed piecemeal over time.

If compaction can proceed for a given zone, shrink_zones() does not
reclaim any more pages from it.  After commit [e0c2327: vmscan: abort
reclaim/compaction if compaction can proceed], do_try_to_free_pages()
tries to finish as soon as possible once one zone can compact.

This was intended to prevent slabs being shrunk unnecessarily but there
are side-effects.  One is that a small zone that is ready for compaction
will abort reclaim even if the chances of successfully allocating a THP
from that zone is small.  It also means that reclaim can return too early
even though sc->nr_to_reclaim pages were not reclaimed.

This partially reverts the commit until it is proven that slabs are really
being shrunk unnecessarily but preserves the check to return 1 to avoid
OOM if reclaim was aborted prematurely.

[aarcange@redhat.com: This patch replaces a revert from Andrea]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2173,7 +2173,8 @@ static inline bool compaction_ready(stru
  *
  * This function returns true if a zone is being reclaimed for a costly
  * high-order allocation and compaction is ready to begin. This indicates to
- * the caller that it should retry the allocation or fail.
+ * the caller that it should consider retrying the allocation instead of
+ * further reclaim.
  */
 static bool shrink_zones(int priority, struct zonelist *zonelist,
 					struct scan_control *sc)
@@ -2182,7 +2183,7 @@ static bool shrink_zones(int priority, s
 	struct zone *zone;
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
-	bool should_abort_reclaim = false;
+	bool aborted_reclaim = false;
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					gfp_zone(sc->gfp_mask), sc->nodemask) {
@@ -2208,7 +2209,7 @@ static bool shrink_zones(int priority, s
 				 * allocations.
 				 */
 				if (compaction_ready(zone, sc)) {
-					should_abort_reclaim = true;
+					aborted_reclaim = true;
 					continue;
 				}
 			}
@@ -2230,7 +2231,7 @@ static bool shrink_zones(int priority, s
 		shrink_zone(priority, zone, sc);
 	}
 
-	return should_abort_reclaim;
+	return aborted_reclaim;
 }
 
 static bool zone_reclaimable(struct zone *zone)
@@ -2284,7 +2285,7 @@ static unsigned long do_try_to_free_page
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
-	bool should_abort_reclaim;
+	bool aborted_reclaim;
 
 	get_mems_allowed();
 	delayacct_freepages_start();
@@ -2296,9 +2297,7 @@ static unsigned long do_try_to_free_page
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token(sc->mem_cgroup);
-		should_abort_reclaim = shrink_zones(priority, zonelist, sc);
-		if (should_abort_reclaim)
-			break;
+		aborted_reclaim = shrink_zones(priority, zonelist, sc);
 
 		/*
 		 * Don't shrink slabs when reclaiming memory from
@@ -2365,8 +2364,8 @@ out:
 	if (oom_killer_disabled)
 		return 0;
 
-	/* Aborting reclaim to try compaction? don't OOM, then */
-	if (should_abort_reclaim)
+	/* Aborted reclaim to try compaction? don't OOM, then */
+	if (aborted_reclaim)
 		return 1;
 
 	/* top priority shrink_zones still had more to do? don't OOM, then */



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 10/73] vmscan: promote shared file mapped pages
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (8 preceding siblings ...)
  2012-07-31  4:43 ` [ 09/73] mm: vmscan: check if reclaim should really abort even if compaction_ready() is true for one zone Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 11/73] vmscan: activate executable pages after first usage Ben Hutchings
                   ` (64 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Konstantin Khlebnikov, Pekka Enberg,
	Minchan Kim, KAMEZAWA Hiroyuki, Wu Fengguang, Johannes Weiner,
	Nick Piggin, Mel Gorman, Shaohua Li, Rik van Riel

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

commit 34dbc67a644f11ab3475d822d72e25409911e760 upstream.

Stable note: Not tracked in Bugzilla. There were reports of shared
	mapped pages being unfairly reclaimed in comparison to older kernels.
	This is being addressed over time. The specific workload being
	addressed here is described in paragraph four and, while paragraph
	five says it did not help performance as such, it made a difference
	to major page faults. I'm aware of at least one bug for a large
	vendor that was due to increased major faults.

Commit 645747462435 ("vmscan: detect mapped file pages used only once")
greatly decreases the lifetime of singly-used mapped file pages.
Unfortunately it also decreases the lifetime of all shared mapped file
pages, because after commit bf3f3bc5e7347 ("mm: don't mark_page_accessed
in fault path") the page-fault handler does not mark the page active or
even referenced.

Thus page_check_references() activates a file page only if it was used
twice while it sat on the inactive list, whereas it activates anon pages
after the first access.  The inactive list can be small enough that the
reclaimer accidentally throws away a widely used page if it wasn't used
twice within a short period.

After this patch page_check_references() also activates a file-mapped page
on the first inactive-list scan if the page is already used multiple times
via several ptes.
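
A simplified model of the resulting decision (illustrative only, not
the kernel function):

#include <stdio.h>

enum page_references { PAGEREF_KEEP, PAGEREF_ACTIVATE };

static enum page_references check_refs(int referenced_ptes,
				       int referenced_page /* PG_referenced */)
{
	if (!referenced_ptes)
		return PAGEREF_KEEP;	/* reclaim path elided */
	/* used twice on the list, or shared via several ptes */
	if (referenced_page || referenced_ptes > 1)
		return PAGEREF_ACTIVATE;
	return PAGEREF_KEEP;
}

int main(void)
{
	printf("single pte, first use: %d\n", check_refs(1, 0)); /* KEEP */
	printf("shared via two ptes:   %d\n", check_refs(2, 0)); /* ACTIVATE */
	return 0;
}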

I found this while trying to fix a degradation in rhel6 (~2.6.32) relative
to rhel5 (~2.6.18).  It is a complete mess with >100 web/mail/spam/ftp
containers: they share all their files but there are a lot of anonymous
pages: ~500mb of shared file-mapped memory and 15-20Gb of non-shared
anonymous memory.  In this situation major page faults are very costly,
because all containers share the same page.  Under my load the kernel
created disproportionate pressure on the file memory compared with the
anonymous memory; they equalised only if I raised swappiness up to 150 =)

These patches actually didn't help a lot with my problem, but I saw a
noticeable (10-20 times) reduction in the count and average time of
major page faults in file-mapped areas.

Actually both patches are fixes for commit v2.6.33-5448-g6457474: it was
aimed at one scenario (singly-used pages) but breaks the logic in other
scenarios (shared and/or executable pages).

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 11adc89..753c1e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -715,7 +715,7 @@ static enum page_references page_check_references(struct page *page,
 		 */
 		SetPageReferenced(page);
 
-		if (referenced_page)
+		if (referenced_page || referenced_ptes > 1)
 			return PAGEREF_ACTIVATE;
 
 		return PAGEREF_KEEP;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 11/73] vmscan: activate executable pages after first usage
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (9 preceding siblings ...)
  2012-07-31  4:43 ` [ 10/73] vmscan: promote shared file mapped pages Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 12/73] mm/vmscan.c: consider swap space when deciding whether to continue reclaim Ben Hutchings
                   ` (63 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Konstantin Khlebnikov, Pekka Enberg,
	Minchan Kim, KAMEZAWA Hiroyuki, Wu Fengguang, Johannes Weiner,
	Nick Piggin, Mel Gorman, Shaohua Li, Rik van Riel

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

commit c909e99364c8b6ca07864d752950b6b4ecf6bef4 upstream.

Stable note: Not tracked in Bugzilla. There were reports of shared
	mapped pages being unfairly reclaimed in comparison to older kernels.
	This is being addressed over time.

Logic added in commit 8cab4754d24a0 ("vmscan: make mapped executable pages
the first class citizen") was noticeably weakened in commit
645747462435d84 ("vmscan: detect mapped file pages used only once").

Currently these pages can become "first class citizens" only after a second
usage.  After this patch page_check_references() will activate them after
the first usage, and executable code gets an even better chance to stay in
memory.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 753c1e6..753a2dc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -718,6 +718,12 @@ static enum page_references page_check_references(struct page *page,
 		if (referenced_page || referenced_ptes > 1)
 			return PAGEREF_ACTIVATE;
 
+		/*
+		 * Activate file-backed executable pages after first usage.
+		 */
+		if (vm_flags & VM_EXEC)
+			return PAGEREF_ACTIVATE;
+
 		return PAGEREF_KEEP;
 	}
 



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 12/73] mm/vmscan.c: consider swap space when deciding whether to continue reclaim
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (10 preceding siblings ...)
  2012-07-31  4:43 ` [ 11/73] vmscan: activate executable pages after first usage Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 13/73] mm: test PageSwapBacked in lumpy reclaim Ben Hutchings
                   ` (62 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Minchan Kim, KOSAKI Motohiro, Mel Gorman,
	Rik van Riel, Johannes Weiner, Andrea Arcangeli

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Minchan Kim <minchan@kernel.org>

commit 86cfd3a45042ab242d47f3935a02811a402beab6 upstream.

Stable note: Not tracked in Bugzilla. This patch reduces kswapd CPU
	usage on swapless systems with high anonymous memory usage.

It's pointless to continue reclaiming when we have no swap space and lots
of anon pages in the inactive list.

Without this patch, when swap is disabled it is possible to keep trying
to reclaim when there are only anonymous pages in the system, even
though that will not make any progress.
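
A minimal sketch of the adjusted should_continue_reclaim() test
(assumed numbers; only the swap-awareness is modelled):

#include <stdbool.h>
#include <stdio.h>

static bool should_continue(unsigned long nr_reclaimed, int order,
			    unsigned long inactive_file,
			    unsigned long inactive_anon,
			    long nr_swap_pages)
{
	unsigned long pages_for_compaction = 2UL << order;
	unsigned long inactive = inactive_file;

	if (nr_swap_pages > 0)	/* anon is only reclaimable with swap */
		inactive += inactive_anon;

	return nr_reclaimed < pages_for_compaction &&
	       inactive > pages_for_compaction;
}

int main(void)
{
	/* swapless and anon-heavy: stop, even though anon is huge */
	printf("%d\n", should_continue(100, 9, 512, 1 << 20, 0)); /* 0 */
	printf("%d\n", should_continue(100, 9, 512, 1 << 20, 1)); /* 1 */
	return 0;
}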

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 974162c..b935e6f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2000,8 +2000,9 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
-	inactive_lru_pages = zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON) +
-				zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+	inactive_lru_pages = zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+	if (nr_swap_pages > 0)
+		inactive_lru_pages += zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
 	if (sc->nr_reclaimed < pages_for_compaction &&
 			inactive_lru_pages > pages_for_compaction)
 		return true;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 13/73] mm: test PageSwapBacked in lumpy reclaim
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (11 preceding siblings ...)
  2012-07-31  4:43 ` [ 12/73] mm/vmscan.c: consider swap space when deciding whether to continue reclaim Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 14/73] mm: vmscan: convert global reclaim to per-memcg LRU lists Ben Hutchings
                   ` (61 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Hugh Dickins, KOSAKI Motohiro, Minchan Kim,
	Mel Gorman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 043bcbe5ec51e0478ef2b44acef17193e01d7f70 upstream.

Stable note: Not tracked in Bugzilla. There were reports of shared
	mapped pages being unfairly reclaimed in comparison to older kernels.
	This is being addressed over time. Even though the subject
	refers to lumpy reclaim, it impacts compaction as well.

Lumpy reclaim does well to stop at a PageAnon when there's no swap, but
better is to stop at any PageSwapBacked, which includes shmem/tmpfs too.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b935e6f..8a4e767 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1166,7 +1166,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			 * anon page which don't already have a swap slot is
 			 * pointless.
 			 */
-			if (nr_swap_pages <= 0 && PageAnon(cursor_page) &&
+			if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
 			    !PageSwapCache(cursor_page))
 				break;
 



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 14/73] mm: vmscan: convert global reclaim to per-memcg LRU lists
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (12 preceding siblings ...)
  2012-07-31  4:43 ` [ 13/73] mm: test PageSwapBacked in lumpy reclaim Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 15/73] cpuset: mm: reduce large amounts of memory barrier related damage v3 Ben Hutchings
                   ` (60 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Johannes Weiner, Mel Gorman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <jweiner@redhat.com>

commit b95a2f2d486d0d768a92879c023a03757b9c7e58 upstream - WARNING: this is a substitute patch.

Stable note: Not tracked in Bugzilla. This is a partial backport of an
	upstream commit addressing a completely different issue
	that accidentally contained an important fix. The workload
	this patch helps is memcached when IO is started in the
	background. memcached should stay resident but without this patch
	it gets swapped. Sometimes this manifests as a drop in throughput
	but mostly it was observed through /proc/vmstat.

Commit [246e87a9: memcg: fix get_scan_count() for small targets] was meant
to fix a problem whereby small scan targets on memcg were ignored, causing
priority to rise too sharply. It forced scanning to take place if the
target was small, memcg or kswapd.

From the time it was introduced it caused excessive reclaim by kswapd
with workloads being pushed to swap that previously would have stayed
resident. This was accidentally fixed in commit [b95a2f2d: mm: vmscan:
convert global reclaim to per-memcg LRU lists] by making it harder for
kswapd to force scan small targets but that patchset is not suitable for
backporting. This was later changed again by commit [90126375: mm/vmscan:
push lruvec pointer into get_scan_count()] into a format that looks
like it would be a straight-forward backport but there is a subtle
difference due to the use of lruvecs.

The impact of the accidental fix is to make it harder for kswapd to force
scan small targets by taking zone->all_unreclaimable into account. This
patch is the closest equivalent available based on what is backported.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/vmscan.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1911,7 +1911,8 @@ static void get_scan_count(struct zone *
 	 * latencies, so it's better to scan a minimum amount there as
 	 * well.
 	 */
-	if (scanning_global_lru(sc) && current_is_kswapd())
+	if (scanning_global_lru(sc) && current_is_kswapd() &&
+	    zone->all_unreclaimable)
 		force_scan = true;
 	if (!scanning_global_lru(sc))
 		force_scan = true;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 15/73] cpuset: mm: reduce large amounts of memory barrier related damage v3
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (13 preceding siblings ...)
  2012-07-31  4:43 ` [ 14/73] mm: vmscan: convert global reclaim to per-memcg LRU lists Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 16/73] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Ben Hutchings
                   ` (59 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mel Gorman, Miao Xie, David Rientjes,
	Peter Zijlstra, Christoph Lameter

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit cc9a6c8776615f9c194ccf0b63a0aa5628235545 upstream.

Stable note:  Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This
	is part of a series of patches that reduce page allocator overhead.

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating a large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.
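
For illustration, a userspace rendering of the read-side retry pattern
(a hypothetical simplification: the real seqcount_t uses an odd/even
protocol with memory barriers so readers also wait out writes that are
still in progress):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int seq;	/* bumped around nodemask updates */
static unsigned long mems_allowed = 0x3;

static unsigned int get_mems_allowed(void)
{
	return atomic_load(&seq);
}

static int put_mems_allowed(unsigned int s)
{
	return atomic_load(&seq) == s;	/* false if a writer intervened */
}

static void set_mems_allowed(unsigned long nodes)
{
	atomic_fetch_add(&seq, 1);	/* write side: begin update */
	mems_allowed = nodes;
	atomic_fetch_add(&seq, 1);	/* write side: end update */
}

int main(void)
{
	unsigned int s;
	unsigned long snapshot;

	do {
		s = get_mems_allowed();
		snapshot = mems_allowed;	/* allocate against this */
	} while (!put_mems_allowed(s));		/* retry on racing update */

	printf("stable nodemask: %#lx\n", snapshot);
	set_mems_allowed(0x1);
	return 0;
}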

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
[bwh: Forward-ported from 3.0 to 3.2: apply the upstream changes
 to get_any_partial()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -89,42 +89,33 @@ extern void rebuild_sched_domains(void);
 extern void cpuset_print_task_mems_allowed(struct task_struct *p);
 
 /*
- * reading current mems_allowed and mempolicy in the fastpath must protected
- * by get_mems_allowed()
+ * get_mems_allowed is required when making decisions involving mems_allowed
+ * such as during page allocation. mems_allowed can be updated in parallel
+ * and depending on the new value an operation can fail potentially causing
+ * process failure. A retry loop with get_mems_allowed and put_mems_allowed
+ * prevents these artificial failures.
  */
-static inline void get_mems_allowed(void)
+static inline unsigned int get_mems_allowed(void)
 {
-	current->mems_allowed_change_disable++;
+	return read_seqcount_begin(&current->mems_allowed_seq);
+}
 
-	/*
-	 * ensure that reading mems_allowed and mempolicy happens after the
-	 * update of ->mems_allowed_change_disable.
-	 *
-	 * the write-side task finds ->mems_allowed_change_disable is not 0,
-	 * and knows the read-side task is reading mems_allowed or mempolicy,
-	 * so it will clear old bits lazily.
-	 */
-	smp_mb();
-}
-
-static inline void put_mems_allowed(void)
-{
-	/*
-	 * ensure that reading mems_allowed and mempolicy before reducing
-	 * mems_allowed_change_disable.
-	 *
-	 * the write-side task will know that the read-side task is still
-	 * reading mems_allowed or mempolicy, don't clears old bits in the
-	 * nodemask.
-	 */
-	smp_mb();
-	--ACCESS_ONCE(current->mems_allowed_change_disable);
+/*
+ * If this returns false, the operation that took place after get_mems_allowed
+ * may have failed. It is up to the caller to retry the operation if
+ * appropriate.
+ */
+static inline bool put_mems_allowed(unsigned int seq)
+{
+	return !read_seqcount_retry(&current->mems_allowed_seq, seq);
 }
 
 static inline void set_mems_allowed(nodemask_t nodemask)
 {
 	task_lock(current);
+	write_seqcount_begin(&current->mems_allowed_seq);
 	current->mems_allowed = nodemask;
+	write_seqcount_end(&current->mems_allowed_seq);
 	task_unlock(current);
 }
 
@@ -234,12 +225,14 @@ static inline void set_mems_allowed(node
 {
 }
 
-static inline void get_mems_allowed(void)
+static inline unsigned int get_mems_allowed(void)
 {
+	return 0;
 }
 
-static inline void put_mems_allowed(void)
+static inline bool put_mems_allowed(unsigned int seq)
 {
+	return true;
 }
 
 #endif /* !CONFIG_CPUSETS */
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -30,6 +30,13 @@ extern struct fs_struct init_fs;
 #define INIT_THREADGROUP_FORK_LOCK(sig)
 #endif
 
+#ifdef CONFIG_CPUSETS
+#define INIT_CPUSET_SEQ							\
+	.mems_allowed_seq = SEQCNT_ZERO,
+#else
+#define INIT_CPUSET_SEQ
+#endif
+
 #define INIT_SIGNALS(sig) {						\
 	.nr_threads	= 1,						\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
@@ -193,6 +200,7 @@ extern struct cred init_cred;
 	INIT_FTRACE_GRAPH						\
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
+	INIT_CPUSET_SEQ							\
 }
 
 
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1481,7 +1481,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_CPUSETS
 	nodemask_t mems_allowed;	/* Protected by alloc_lock */
-	int mems_allowed_change_disable;
+	seqcount_t mems_allowed_seq;	/* Seqence no to catch updates */
 	int cpuset_mem_spread_rotor;
 	int cpuset_slab_spread_rotor;
 #endif
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -964,7 +964,6 @@ static void cpuset_change_task_nodemask(
 {
 	bool need_loop;
 
-repeat:
 	/*
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
@@ -983,45 +982,19 @@ repeat:
 	 */
 	need_loop = task_has_mempolicy(tsk) ||
 			!nodes_intersects(*newmems, tsk->mems_allowed);
-	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
-	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
 
-	/*
-	 * ensure checking ->mems_allowed_change_disable after setting all new
-	 * allowed nodes.
-	 *
-	 * the read-side task can see an nodemask with new allowed nodes and
-	 * old allowed nodes. and if it allocates page when cpuset clears newly
-	 * disallowed ones continuous, it can see the new allowed bits.
-	 *
-	 * And if setting all new allowed nodes is after the checking, setting
-	 * all new allowed nodes and clearing newly disallowed ones will be done
-	 * continuous, and the read-side task may find no node to alloc page.
-	 */
-	smp_mb();
-
-	/*
-	 * Allocation of memory is very fast, we needn't sleep when waiting
-	 * for the read-side.
-	 */
-	while (need_loop && ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
-		task_unlock(tsk);
-		if (!task_curr(tsk))
-			yield();
-		goto repeat;
-	}
+	if (need_loop)
+		write_seqcount_begin(&tsk->mems_allowed_seq);
 
-	/*
-	 * ensure checking ->mems_allowed_change_disable before clearing all new
-	 * disallowed nodes.
-	 *
-	 * if clearing newly disallowed bits before the checking, the read-side
-	 * task may find no node to alloc page.
-	 */
-	smp_mb();
+	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
+	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
 
 	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP2);
 	tsk->mems_allowed = *newmems;
+
+	if (need_loop)
+		write_seqcount_end(&tsk->mems_allowed_seq);
+
 	task_unlock(tsk);
 }
 
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -979,6 +979,9 @@ static int copy_signal(unsigned long clo
 #ifdef CONFIG_CGROUPS
 	init_rwsem(&sig->threadgroup_fork_lock);
 #endif
+#ifdef CONFIG_CPUSETS
+	seqcount_init(&tsk->mems_allowed_seq);
+#endif
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -500,10 +500,13 @@ struct page *__page_cache_alloc(gfp_t gf
 	struct page *page;
 
 	if (cpuset_do_page_mem_spread()) {
-		get_mems_allowed();
-		n = cpuset_mem_spread_node();
-		page = alloc_pages_exact_node(n, gfp, 0);
-		put_mems_allowed();
+		unsigned int cpuset_mems_cookie;
+		do {
+			cpuset_mems_cookie = get_mems_allowed();
+			n = cpuset_mem_spread_node();
+			page = alloc_pages_exact_node(n, gfp, 0);
+		} while (!put_mems_allowed(cpuset_mems_cookie) && !page);
+
 		return page;
 	}
 	return alloc_pages(gfp, 0);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -532,14 +532,16 @@ static struct page *dequeue_huge_page_vm
 				struct vm_area_struct *vma,
 				unsigned long address, int avoid_reserve)
 {
-	struct page *page = NULL;
+	struct page *page;
 	struct mempolicy *mpol;
 	nodemask_t *nodemask;
 	struct zonelist *zonelist;
 	struct zone *zone;
 	struct zoneref *z;
+	unsigned int cpuset_mems_cookie;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
 	zonelist = huge_zonelist(vma, address,
 					htlb_alloc_mask, &mpol, &nodemask);
 	/*
@@ -566,10 +568,15 @@ static struct page *dequeue_huge_page_vm
 			}
 		}
 	}
-err:
+
 	mpol_cond_put(mpol);
-	put_mems_allowed();
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
 	return page;
+
+err:
+	mpol_cond_put(mpol);
+	return NULL;
 }
 
 static void update_and_free_page(struct hstate *h, struct page *page)
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1843,18 +1843,24 @@ struct page *
 alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr, int node)
 {
-	struct mempolicy *pol = get_vma_policy(current, vma, addr);
+	struct mempolicy *pol;
 	struct zonelist *zl;
 	struct page *page;
+	unsigned int cpuset_mems_cookie;
+
+retry_cpuset:
+	pol = get_vma_policy(current, vma, addr);
+	cpuset_mems_cookie = get_mems_allowed();
 
-	get_mems_allowed();
 	if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
 		mpol_cond_put(pol);
 		page = alloc_page_interleave(gfp, order, nid);
-		put_mems_allowed();
+		if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+			goto retry_cpuset;
+
 		return page;
 	}
 	zl = policy_zonelist(gfp, pol, node);
@@ -1865,7 +1871,8 @@ alloc_pages_vma(gfp_t gfp, int order, st
 		struct page *page =  __alloc_pages_nodemask(gfp, order,
 						zl, policy_nodemask(gfp, pol));
 		__mpol_put(pol);
-		put_mems_allowed();
+		if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+			goto retry_cpuset;
 		return page;
 	}
 	/*
@@ -1873,7 +1880,8 @@ alloc_pages_vma(gfp_t gfp, int order, st
 	 */
 	page = __alloc_pages_nodemask(gfp, order, zl,
 				      policy_nodemask(gfp, pol));
-	put_mems_allowed();
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
 	return page;
 }
 
@@ -1900,11 +1908,14 @@ struct page *alloc_pages_current(gfp_t g
 {
 	struct mempolicy *pol = current->mempolicy;
 	struct page *page;
+	unsigned int cpuset_mems_cookie;
 
 	if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
 		pol = &default_policy;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+
 	/*
 	 * No reference counting needed for current->mempolicy
 	 * nor system default_policy
@@ -1915,7 +1926,10 @@ struct page *alloc_pages_current(gfp_t g
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_zonelist(gfp, pol, numa_node_id()),
 				policy_nodemask(gfp, pol));
-	put_mems_allowed();
+
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
+
 	return page;
 }
 EXPORT_SYMBOL(alloc_pages_current);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2282,8 +2282,9 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
-	struct page *page;
+	struct page *page = NULL;
 	int migratetype = allocflags_to_migratetype(gfp_mask);
+	unsigned int cpuset_mems_cookie;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2302,15 +2303,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 	if (unlikely(!zonelist->_zonerefs->zone))
 		return NULL;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+
 	/* The preferred zone is used for statistics later */
 	first_zones_zonelist(zonelist, high_zoneidx,
 				nodemask ? : &cpuset_current_mems_allowed,
 				&preferred_zone);
-	if (!preferred_zone) {
-		put_mems_allowed();
-		return NULL;
-	}
+	if (!preferred_zone)
+		goto out;
 
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
@@ -2320,9 +2321,19 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
-	put_mems_allowed();
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+
+out:
+	/*
+	 * When updating a task's mems_allowed, it is possible to race with
+	 * parallel threads in such a way that an allocation can fail while
+	 * the mask is being updated. If a page allocation is about to fail,
+	 * check if the cpuset changed during allocation and if so, retry.
+	 */
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
@@ -2546,13 +2557,15 @@ void si_meminfo_node(struct sysinfo *val
 bool skip_free_areas_node(unsigned int flags, int nid)
 {
 	bool ret = false;
+	unsigned int cpuset_mems_cookie;
 
 	if (!(flags & SHOW_MEM_FILTER_NODES))
 		goto out;
 
-	get_mems_allowed();
-	ret = !node_isset(nid, cpuset_current_mems_allowed);
-	put_mems_allowed();
+	do {
+		cpuset_mems_cookie = get_mems_allowed();
+		ret = !node_isset(nid, cpuset_current_mems_allowed);
+	} while (!put_mems_allowed(cpuset_mems_cookie));
 out:
 	return ret;
 }
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3267,12 +3267,10 @@ static void *alternate_node_alloc(struct
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
 	nid_alloc = nid_here = numa_mem_id();
-	get_mems_allowed();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_slab_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	put_mems_allowed();
 	if (nid_alloc != nid_here)
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
@@ -3295,14 +3293,17 @@ static void *fallback_alloc(struct kmem_
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *obj = NULL;
 	int nid;
+	unsigned int cpuset_mems_cookie;
 
 	if (flags & __GFP_THISNODE)
 		return NULL;
 
-	get_mems_allowed();
-	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+
 retry:
 	/*
 	 * Look through allowed nodes for objects available
@@ -3355,7 +3356,9 @@ retry:
 			}
 		}
 	}
-	put_mems_allowed();
+
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !obj))
+		goto retry_cpuset;
 	return obj;
 }
 
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1582,6 +1582,7 @@ static struct page *get_any_partial(stru
 	struct zone *zone;
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *object;
+	unsigned int cpuset_mems_cookie;
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -1605,23 +1606,32 @@ static struct page *get_any_partial(stru
 			get_cycles() % 1024 > s->remote_node_defrag_ratio)
 		return NULL;
 
-	get_mems_allowed();
-	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
-	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
-		struct kmem_cache_node *n;
+	do {
+		cpuset_mems_cookie = get_mems_allowed();
+		zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+			struct kmem_cache_node *n;
 
-		n = get_node(s, zone_to_nid(zone));
+			n = get_node(s, zone_to_nid(zone));
 
-		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
-				n->nr_partial > s->min_partial) {
-			object = get_partial_node(s, n, c);
-			if (object) {
-				put_mems_allowed();
-				return object;
+			if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
+					n->nr_partial > s->min_partial) {
+				object = get_partial_node(s, n, c);
+				if (object) {
+					/*
+					 * Return the object even if
+					 * put_mems_allowed indicated that
+					 * the cpuset mems_allowed was
+					 * updated in parallel. It's a
+					 * harmless race between the alloc
+					 * and the cpuset update.
+					 */
+					put_mems_allowed(cpuset_mems_cookie);
+					return object;
+				}
 			}
 		}
-	}
-	put_mems_allowed();
+	} while (!put_mems_allowed(cpuset_mems_cookie));
 #endif
 	return NULL;
 }
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2295,7 +2295,6 @@ static unsigned long do_try_to_free_page
 	unsigned long writeback_threshold;
 	bool aborted_reclaim;
 
-	get_mems_allowed();
 	delayacct_freepages_start();
 
 	if (scanning_global_lru(sc))
@@ -2359,7 +2358,6 @@ static unsigned long do_try_to_free_page
 
 out:
 	delayacct_freepages_end();
-	put_mems_allowed();
 
 	if (sc->nr_reclaimed)
 		return sc->nr_reclaimed;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 16/73] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (14 preceding siblings ...)
  2012-07-31  4:43 ` [ 15/73] cpuset: mm: reduce large amounts of memory barrier related damage v3 Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 17/73] [SCSI] Fix NULL dereferences in scsi_cmd_to_driver Ben Hutchings
                   ` (58 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Konstantin Khlebnikov, Mel Gorman, David Rientjes

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khlebnikov@openvz.org>

commit b1c12cbcd0a02527c180a862e8971e249d3b347d upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This
	is part of a series of patches that reduce page allocator overhead.

Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce
large amounts of memory barrier related damage v3").

Local variable "page" can be uninitialized if the nodemask from the vma
policy does not intersect with the nodemask from the cpuset.  Even if
that never happens, it is better to initialize the variable explicitly
than to risk a kernel oops in a weird corner case.

mm/hugetlb.c: In function `alloc_huge_page':
mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
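
To illustrate the hazard, a hedged sketch (hypothetical names, not the
real mm/hugetlb.c code) of how an early-exit path can return a pointer
that was never assigned:

struct page { int dummy; };
static struct page a_page;

static struct page *dequeue_sketch(int masks_intersect)
{
	struct page *page;		/* note: no initializer */

	if (masks_intersect)
		page = &a_page;		/* the only assignment */

	/*
	 * When the branch is never taken, this returns stack garbage;
	 * initializing page to NULL closes the hole.
	 */
	return page;
}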

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/hugetlb.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cd65cb1..5a16423 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -532,7 +532,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 				struct vm_area_struct *vma,
 				unsigned long address, int avoid_reserve)
 {
-	struct page *page;
+	struct page *page = NULL;
 	struct mempolicy *mpol;
 	nodemask_t *nodemask;
 	struct zonelist *zonelist;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 17/73] [SCSI] Fix NULL dereferences in scsi_cmd_to_driver
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (15 preceding siblings ...)
  2012-07-31  4:43 ` [ 16/73] mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 18/73] sched/nohz: Fix rq->cpu_load[] calculations Ben Hutchings
                   ` (57 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Mark Rustad, Marcus Dennis, James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Rustad <mark.d.rustad@intel.com>

commit 222a806af830fda34ad1f6bc991cd226916de060 upstream.

Avoid crashing if the private_data pointer happens to be NULL.  This
has sometimes been seen when a host reset happens, notably when there
are many LUNs:

host3: Assigned Port ID 0c1601
scsi host3: libfc: Host reset succeeded on port (0c1601)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000350
IP: [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0
<snip>
Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0)
Stack:
 000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0
 0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80
 00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80
Call Trace:
 [<ffffffff8105b590>] ? lock_timer_base+0x70/0x70
 [<ffffffff81352fbe>] scsi_eh_tur+0x3e/0xc0
 [<ffffffff81353a36>] scsi_eh_test_devices+0x76/0x170
 [<ffffffff81354125>] scsi_eh_host_reset+0x85/0x160
 [<ffffffff81354291>] scsi_eh_ready_devs+0x91/0x110
 [<ffffffff813543fd>] scsi_unjam_host+0xed/0x1f0
 [<ffffffff813546a8>] scsi_error_handler+0x1a8/0x200
 [<ffffffff81354500>] ? scsi_unjam_host+0x1f0/0x1f0
 [<ffffffff8106ec3e>] kthread+0x9e/0xb0
 [<ffffffff81509264>] kernel_thread_helper+0x4/0x10
 [<ffffffff8106eba0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81509260>] ? gs_change+0x13/0x13
Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00
 <48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c
RIP  [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0
 RSP <ffff88030920dc50>
CR2: 0000000000000350


Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -785,7 +785,13 @@ static void scsi_done(struct scsi_cmnd *
 /* Move this to a header if it becomes more generally useful */
 static struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
 {
-	return *(struct scsi_driver **)cmd->request->rq_disk->private_data;
+	struct scsi_driver **sdp;
+
+	sdp = (struct scsi_driver **)cmd->request->rq_disk->private_data;
+	if (!sdp)
+		return NULL;
+
+	return *sdp;
 }
 
 /**



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 18/73] sched/nohz: Fix rq->cpu_load[] calculations
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (16 preceding siblings ...)
  2012-07-31  4:43 ` [ 17/73] [SCSI] Fix NULL dereferences in scsi_cmd_to_driver Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 19/73] sched/nohz: Fix rq->cpu_load calculations some more Ben Hutchings
                   ` (56 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Peter Zijlstra, Venkatesh Pallipadi, Ingo Molnar

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 556061b00c9f2fd6a5524b6bde823ef12f299ecf upstream.

While investigating why the load-balancer was doing funny things, I
found that the rq->cpu_load[] tables were completely screwy... a bit
more digging revealed that the updates that got through were missing
ticks followed by a catch-up of 2 ticks.

The catch-up assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle, etc.); this means that
especially the higher indices were significantly lower than they ought
to be.

The reason for this is that it is not correct to compare against
jiffies on every jiffy on any cpu other than the cpu that updates
jiffies.

This patch kludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.
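
For context, a simplified sketch of the per-index cpu_load[] averaging
(hedged: the real kernel/sched.c code also decays missed ticks and
rounds rising loads up), showing why spurious "idle" catch-up ticks
drag the slow-moving higher indices down:

#define CPU_LOAD_IDX_MAX	5

static void update_cpu_load_sketch(unsigned long cpu_load[CPU_LOAD_IDX_MAX],
				   unsigned long this_load)
{
	int i, scale;

	cpu_load[0] = this_load;	/* index 0: instantaneous load */
	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale)
		/* index i moves toward this_load by 1/scale per tick */
		cpu_load[i] = (cpu_load[i] * (scale - 1) + this_load) / scale;
	/*
	 * Feeding this_load == 0 for "missed" ticks that were not
	 * actually idle underestimates every index.
	 */
}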

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames and context; keep functions static]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1887,7 +1887,7 @@ static void double_rq_unlock(struct rq *
 
 static void update_sysctl(void);
 static int get_update_sysctl_factor(void);
-static void update_cpu_load(struct rq *this_rq);
+static void update_idle_cpu_load(struct rq *this_rq);
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
@@ -3855,22 +3855,13 @@ decay_load_missed(unsigned long load, un
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-static void update_cpu_load(struct rq *this_rq)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+			      unsigned long pending_updates)
 {
-	unsigned long this_load = this_rq->load.weight;
-	unsigned long curr_jiffies = jiffies;
-	unsigned long pending_updates;
 	int i, scale;
 
 	this_rq->nr_load_updates++;
 
-	/* Avoid repeated calls on same jiffy, when moving in and out of idle */
-	if (curr_jiffies == this_rq->last_load_update_tick)
-		return;
-
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-
 	/* Update our load: */
 	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
@@ -3895,9 +3886,45 @@ static void update_cpu_load(struct rq *t
 	sched_avg_update(this_rq);
 }
 
+/*
+ * Called from nohz_idle_balance() to update the load ratings before doing the
+ * idle balance.
+ */
+static void update_idle_cpu_load(struct rq *this_rq)
+{
+	unsigned long curr_jiffies = jiffies;
+	unsigned long load = this_rq->load.weight;
+	unsigned long pending_updates;
+
+	/*
+	 * Bloody broken means of dealing with nohz, but better than nothing..
+	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
+	 * update and see 0 difference the one time and 2 the next, even though
+	 * we ticked at roughtly the same rate.
+	 *
+	 * Hence we only use this from nohz_idle_balance() and skip this
+	 * nonsense when called from the scheduler_tick() since that's
+	 * guaranteed a stable rate.
+	 */
+	if (load || curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
+
+	__update_cpu_load(this_rq, load, pending_updates);
+}
+
+/*
+ * Called from scheduler_tick()
+ */
 static void update_cpu_load_active(struct rq *this_rq)
 {
-	update_cpu_load(this_rq);
+	/*
+	 * See the mess in update_idle_cpu_load().
+	 */
+	this_rq->last_load_update_tick = jiffies;
+	__update_cpu_load(this_rq, this_rq->load.weight, 1);
 
 	calc_load_account_active(this_rq);
 }
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -4735,7 +4735,7 @@ static void nohz_idle_balance(int this_c
 
 		raw_spin_lock_irq(&this_rq->lock);
 		update_rq_clock(this_rq);
-		update_cpu_load(this_rq);
+		update_idle_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 
 		rebalance_domains(balance_cpu, CPU_IDLE);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 19/73] sched/nohz: Fix rq->cpu_load calculations some more
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (17 preceding siblings ...)
  2012-07-31  4:43 ` [ 18/73] sched/nohz: Fix rq->cpu_load[] calculations Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 20/73] powerpc/ftrace: Fix assembly trampoline register usage Ben Hutchings
                   ` (55 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Peter Zijlstra, Venkatesh Pallipadi, Ingo Molnar

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 5aaa0b7a2ed5b12692c9ffb5222182bd558d3146 upstream.

Follow-up to commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations"): while that fixed the busy case, it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq->cpu_load[]
array.  This closes the hole where either there was no nohz load
balance pass during the nohz period, or there was a 'significant'
amount of idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick so as not to insert any
accidental 0-load periods while busy, and we try to catch up from the
nohz idle balance and the nohz exit.  Both of these are still prone to
missing a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames and context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -145,6 +145,7 @@ extern unsigned long this_cpu_load(void)
 
 
 extern void calc_global_load(unsigned long ticks);
+extern void update_cpu_load_nohz(void);
 
 extern unsigned long get_parent_ip(unsigned long addr);
 
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3886,25 +3886,32 @@ static void __update_cpu_load(struct rq
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_NO_HZ
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we cannot use the delta approach from the regular tick since that
+ * would seriously skew the load calculation. However we'll make do for those
+ * updates happening while idle (nohz_idle_balance) or coming out of idle
+ * (tick_nohz_idle_exit).
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
 /*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
 static void update_idle_cpu_load(struct rq *this_rq)
 {
-	unsigned long curr_jiffies = jiffies;
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
 	unsigned long load = this_rq->load.weight;
 	unsigned long pending_updates;
 
 	/*
-	 * Bloody broken means of dealing with nohz, but better than nothing..
-	 * jiffies is updated by one cpu, another cpu can drift wrt the jiffy
-	 * update and see 0 difference the one time and 2 the next, even though
-	 * we ticked at roughtly the same rate.
-	 *
-	 * Hence we only use this from nohz_idle_balance() and skip this
-	 * nonsense when called from the scheduler_tick() since that's
-	 * guaranteed a stable rate.
+	 * bail if there's load or we're actually up-to-date.
 	 */
 	if (load || curr_jiffies == this_rq->last_load_update_tick)
 		return;
@@ -3916,12 +3923,38 @@ static void update_idle_cpu_load(struct
 }
 
 /*
+ * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ */
+void update_cpu_load_nohz(void)
+{
+	struct rq *this_rq = this_rq();
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	if (curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	raw_spin_lock(&this_rq->lock);
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	if (pending_updates) {
+		this_rq->last_load_update_tick = curr_jiffies;
+		/*
+		 * We were idle, this means load 0, the current load might be
+		 * !0 due to remote wakeups and the sort.
+		 */
+		__update_cpu_load(this_rq, 0, pending_updates);
+	}
+	raw_spin_unlock(&this_rq->lock);
+}
+#endif /* CONFIG_NO_HZ */
+
+/*
  * Called from scheduler_tick()
  */
 static void update_cpu_load_active(struct rq *this_rq)
 {
 	/*
-	 * See the mess in update_idle_cpu_load().
+	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
 	__update_cpu_load(this_rq, this_rq->load.weight, 1);
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -549,6 +549,7 @@ void tick_nohz_restart_sched_tick(void)
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
+	update_cpu_load_nohz();
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	/*



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 20/73] powerpc/ftrace: Fix assembly trampoline register usage
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (18 preceding siblings ...)
  2012-07-31  4:43 ` [ 19/73] sched/nohz: Fix rq->cpu_load calculations some more Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 21/73] cx25821: Remove bad strcpy to read-only char* Ben Hutchings
                   ` (54 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, roger blofeld, Benjamin Herrenschmidt

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: roger blofeld <blofeldus@yahoo.com>

commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.

Just like the module loader, ftrace needs to be updated to use r12
instead of r11 with newer versions of gcc.

Signed-off-by: Roger Blofeld <blofeldus@yahoo.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/powerpc/kernel/ftrace.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 6f33296..91b46b7 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -240,9 +240,9 @@ __ftrace_make_nop(struct module *mod,
 
 	/*
 	 * On PPC32 the trampoline looks like:
-	 *  0x3d, 0x60, 0x00, 0x00  lis r11,sym@ha
-	 *  0x39, 0x6b, 0x00, 0x00  addi r11,r11,sym@l
-	 *  0x7d, 0x69, 0x03, 0xa6  mtctr r11
+	 *  0x3d, 0x80, 0x00, 0x00  lis r12,sym@ha
+	 *  0x39, 0x8c, 0x00, 0x00  addi r12,r12,sym@l
+	 *  0x7d, 0x89, 0x03, 0xa6  mtctr r12
 	 *  0x4e, 0x80, 0x04, 0x20  bctr
 	 */
 
@@ -257,9 +257,9 @@ __ftrace_make_nop(struct module *mod,
 	pr_devel(" %08x %08x ", jmp[0], jmp[1]);
 
 	/* verify that this is what we expect it to be */
-	if (((jmp[0] & 0xffff0000) != 0x3d600000) ||
-	    ((jmp[1] & 0xffff0000) != 0x396b0000) ||
-	    (jmp[2] != 0x7d6903a6) ||
+	if (((jmp[0] & 0xffff0000) != 0x3d800000) ||
+	    ((jmp[1] & 0xffff0000) != 0x398c0000) ||
+	    (jmp[2] != 0x7d8903a6) ||
 	    (jmp[3] != 0x4e800420)) {
 		printk(KERN_ERR "Not a trampoline\n");
 		return -EINVAL;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 21/73] cx25821: Remove bad strcpy to read-only char*
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (19 preceding siblings ...)
  2012-07-31  4:43 ` [ 20/73] powerpc/ftrace: Fix assembly trampoline register usage Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 22/73] x86: Fix boot on Twinhead H12Y Ben Hutchings
                   ` (53 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Ezequiel Garcia, Radek Masin

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ezequiel Garcia <elezegarcia@gmail.com>

commit 380e99fc44d79bc35af9ff1d3316ef4027ce775e upstream.

The strcpy was being used to set the name of the board.  Since the
destination char * was read-only and the name is set statically at
compile time, this was both wrong and redundant.

The type is changed from char * to const char * to prevent future
errors.

Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/media/video/cx25821/cx25821-core.c |    3 ---
 drivers/media/video/cx25821/cx25821.h      |    2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/media/video/cx25821/cx25821-core.c b/drivers/media/video/cx25821/cx25821-core.c
index 83c1aa6..f11f6f0 100644
--- a/drivers/media/video/cx25821/cx25821-core.c
+++ b/drivers/media/video/cx25821/cx25821-core.c
@@ -904,9 +904,6 @@ static int cx25821_dev_setup(struct cx25821_dev *dev)
 	list_add_tail(&dev->devlist, &cx25821_devlist);
 	mutex_unlock(&cx25821_devlist_mutex);
 
-	strcpy(cx25821_boards[UNKNOWN_BOARD].name, "unknown");
-	strcpy(cx25821_boards[CX25821_BOARD].name, "cx25821");
-
 	if (dev->pci->device != 0x8210) {
 		pr_info("%s(): Exiting. Incorrect Hardware device = 0x%02x\n",
 			__func__, dev->pci->device);
diff --git a/drivers/media/video/cx25821/cx25821.h b/drivers/media/video/cx25821/cx25821.h
index b9aa801..029f293 100644
--- a/drivers/media/video/cx25821/cx25821.h
+++ b/drivers/media/video/cx25821/cx25821.h
@@ -187,7 +187,7 @@ enum port {
 };
 
 struct cx25821_board {
-	char *name;
+	const char *name;
 	enum port porta;
 	enum port portb;
 	enum port portc;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 22/73] x86: Fix boot on Twinhead H12Y
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (20 preceding siblings ...)
  2012-07-31  4:43 ` [ 21/73] cx25821: Remove bad strcpy to read-only char* Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 23/73] r8169: RxConfig hack for the 8168evl Ben Hutchings
                   ` (52 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Alan Cox, Ingo Molnar

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Cox <alan@linux.intel.com>

commit 80b3e557371205566a71e569fbfcce5b11f92dbe upstream.

Despite lots of investigation into why this is needed, we neither know
the cause nor have an elegant cure.  The only answer found on this
laptop is to mark the problem region as used so that Linux doesn't put
anything there.

Currently all affected users add reserve= command-line options, and
anyone who doesn't know this has to find the magic page that documents
it.  Automate it instead.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Tested-and-bugfixed-by: Arne Fitzenreiter <arne@fitzenreiter.de>
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=10231
Link: http://lkml.kernel.org/r/20120515174347.5109.94551.stgit@bluebook
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/x86/pci/fixup.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index d0e6e40..5dd467b 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -519,3 +519,20 @@ static void sb600_disable_hpet_bar(struct pci_dev *dev)
 	}
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATI, 0x4385, sb600_disable_hpet_bar);
+
+/*
+ * Twinhead H12Y needs us to block out a region otherwise we map devices
+ * there and any access kills the box.
+ *
+ *   See: https://bugzilla.kernel.org/show_bug.cgi?id=10231
+ *
+ * Match off the LPC and svid/sdid (older kernels lose the bridge subvendor)
+ */
+static void __devinit twinhead_reserve_killing_zone(struct pci_dev *dev)
+{
+        if (dev->subsystem_vendor == 0x14FF && dev->subsystem_device == 0xA003) {
+                pr_info("Reserving memory on Twinhead H12Y\n");
+                request_mem_region(0xFFB00000, 0x100000, "twinhead");
+        }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x27B9, twinhead_reserve_killing_zone);



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 23/73] r8169: RxConfig hack for the 8168evl.
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (21 preceding siblings ...)
  2012-07-31  4:43 ` [ 22/73] x86: Fix boot on Twinhead H12Y Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 24/73] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps Ben Hutchings
                   ` (51 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, françois romieu, David S. Miller

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: françois romieu <romieu@fr.zoreil.com>

commit eb2dc35d99028b698cdedba4f5522bc43e576bd2 upstream.

The 8168evl (RTL_GIGA_MAC_VER_34) based Gigabyte GA-990FXA motherboards
are very prone to NETDEV watchdog problems without this change. See
https://bugzilla.kernel.org/show_bug.cgi?id=42899 for instance.

I don't know why it *works*. It's depressingly effective though.

For the record:
- the problem may go along with IOMMU (AMD-Vi) errors, but that really
  looks like a red herring.
- the patch sets the RX_MULTI_EN bit. If the 8168c doc is any guide,
  the chipset now fetches several Rx descriptors at a time.
- long ago the driver ignored the RX_MULTI_EN bit.
  e542a2269f232d61270ceddd42b73a4348dee2bb changed the RxConfig
  settings.  Whatever the problem is, it's now labeled a regression.
- Realtek's own driver can identify two different 8168evl devices
  (CFG_METHOD_16 and CFG_METHOD_17) where the r8169 driver only
  sees one. It sucks.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/ethernet/realtek/r8169.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 7260aa7..d7a04e0 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3894,6 +3894,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_22:
 	case RTL_GIGA_MAC_VER_23:
 	case RTL_GIGA_MAC_VER_24:
+	case RTL_GIGA_MAC_VER_34:
 		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	default:



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 24/73] cifs: when CONFIG_HIGHMEM is set, serialize the  read/write kmaps
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (22 preceding siblings ...)
  2012-07-31  4:43 ` [ 23/73] r8169: RxConfig hack for the 8168evl Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 25/73] wireless: rt2x00: rt2800usb add more devices ids Ben Hutchings
                   ` (50 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Jeff Layton, Jian Li, Steve French

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jeff Layton <jlayton@redhat.com>

commit 3cf003c08be785af4bee9ac05891a15bcbff856a upstream.

Jian found that when he ran fsx on a 32-bit arch with a large wsize,
the process and one of the bdi writeback kthreads would sometimes
deadlock with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/cifs/cifssmb.c |   30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -89,6 +89,32 @@ static struct {
 /* Forward declarations */
 static void cifs_readv_complete(struct work_struct *work);
 
+#ifdef CONFIG_HIGHMEM
+/*
+ * On arches that have high memory, kmap address space is limited. By
+ * serializing the kmap operations on those arches, we ensure that we don't
+ * end up with a bunch of threads in writeback with partially mapped page
+ * arrays, stuck waiting for kmap to come back. That situation prevents
+ * progress and can deadlock.
+ */
+static DEFINE_MUTEX(cifs_kmap_mutex);
+
+static inline void
+cifs_kmap_lock(void)
+{
+	mutex_lock(&cifs_kmap_mutex);
+}
+
+static inline void
+cifs_kmap_unlock(void)
+{
+	mutex_unlock(&cifs_kmap_mutex);
+}
+#else /* !CONFIG_HIGHMEM */
+#define cifs_kmap_lock() do { ; } while(0)
+#define cifs_kmap_unlock() do { ; } while(0)
+#endif /* CONFIG_HIGHMEM */
+
 /* Mark as invalid, all open files on tree connections since they
    were closed when session to server was lost */
 static void mark_open_files_invalid(struct cifs_tcon *pTcon)
@@ -1540,6 +1566,7 @@ cifs_readv_receive(struct TCP_Server_Inf
 	eof_index = eof ? (eof - 1) >> PAGE_CACHE_SHIFT : 0;
 	cFYI(1, "eof=%llu eof_index=%lu", eof, eof_index);
 
+	cifs_kmap_lock();
 	list_for_each_entry_safe(page, tpage, &rdata->pages, lru) {
 		if (remaining >= PAGE_CACHE_SIZE) {
 			/* enough data to fill the page */
@@ -1589,6 +1616,7 @@ cifs_readv_receive(struct TCP_Server_Inf
 			page_cache_release(page);
 		}
 	}
+	cifs_kmap_unlock();
 
 	/* issue the read if we have any iovecs left to fill */
 	if (rdata->nr_iov > 1) {
@@ -2171,6 +2199,7 @@ cifs_async_writev(struct cifs_writedata
 	iov[0].iov_base = smb;
 
 	/* marshal up the pages into iov array */
+	cifs_kmap_lock();
 	wdata->bytes = 0;
 	for (i = 0; i < wdata->nr_pages; i++) {
 		iov[i + 1].iov_len = min(inode->i_size -
@@ -2179,6 +2208,7 @@ cifs_async_writev(struct cifs_writedata
 		iov[i + 1].iov_base = kmap(wdata->pages[i]);
 		wdata->bytes += iov[i + 1].iov_len;
 	}
+	cifs_kmap_unlock();
 
 	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
 



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 25/73] wireless: rt2x00: rt2800usb add more devices ids
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (23 preceding siblings ...)
  2012-07-31  4:43 ` [ 24/73] cifs: when CONFIG_HIGHMEM is set, serialize the read/write kmaps Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 26/73] wireless: rt2x00: rt2800usb more devices were identified Ben Hutchings
                   ` (49 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Xose Vazquez Perez, Gertjan van Wingerde,
	John W. Linville

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xose Vazquez Perez <xose.vazquez@gmail.com>

commit 63b376411173c343bbcb450f95539da91f079e0c upstream.

They were taken from ralink drivers:
2011_0719_RT3070_RT3370_RT5370_RT5372_Linux_STA_V2.5.0.3_DPO
2012_03_22_RT5572_Linux_STA_v2.6.0.0_DPO

0x1eda,0x2210 RT3070 Airties

0x083a,0xb511 RT3370 Panasonic
0x0471,0x20dd RT3370 Philips

0x1690,0x0764 RT35xx Askey
0x0df6,0x0065 RT35xx Sitecom
0x0df6,0x0066 RT35xx Sitecom
0x0df6,0x0068 RT35xx Sitecom

0x2001,0x3c1c RT5370 DLink
0x2001,0x3c1d RT5370 DLink

2001 is D-Link, not Alpha

Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: drop the 5372 devices]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/wireless/rt2x00/rt2800usb.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -876,6 +876,7 @@ static struct usb_device_id rt2800usb_de
 	{ USB_DEVICE(0x1482, 0x3c09) },
 	/* AirTies */
 	{ USB_DEVICE(0x1eda, 0x2012) },
+	{ USB_DEVICE(0x1eda, 0x2210) },
 	{ USB_DEVICE(0x1eda, 0x2310) },
 	/* Allwin */
 	{ USB_DEVICE(0x8516, 0x2070) },
@@ -1088,6 +1089,10 @@ static struct usb_device_id rt2800usb_de
 #ifdef CONFIG_RT2800USB_RT33XX
 	/* Belkin */
 	{ USB_DEVICE(0x050d, 0x945b) },
+	/* Panasonic */
+	{ USB_DEVICE(0x083a, 0xb511) },
+	/* Philips */
+	{ USB_DEVICE(0x0471, 0x20dd) },
 	/* Ralink */
 	{ USB_DEVICE(0x148f, 0x3370) },
 	{ USB_DEVICE(0x148f, 0x8070) },
@@ -1099,6 +1104,7 @@ static struct usb_device_id rt2800usb_de
 	{ USB_DEVICE(0x8516, 0x3572) },
 	/* Askey */
 	{ USB_DEVICE(0x1690, 0x0744) },
+	{ USB_DEVICE(0x1690, 0x0764) },
 	/* Cisco */
 	{ USB_DEVICE(0x167b, 0x4001) },
 	/* EnGenius */
@@ -1113,6 +1119,9 @@ static struct usb_device_id rt2800usb_de
 	/* Sitecom */
 	{ USB_DEVICE(0x0df6, 0x0041) },
 	{ USB_DEVICE(0x0df6, 0x0062) },
+	{ USB_DEVICE(0x0df6, 0x0065) },
+	{ USB_DEVICE(0x0df6, 0x0066) },
+	{ USB_DEVICE(0x0df6, 0x0068) },
 	/* Toshiba */
 	{ USB_DEVICE(0x0930, 0x0a07) },
 	/* Zinwell */
@@ -1122,6 +1131,9 @@ static struct usb_device_id rt2800usb_de
 	/* Azurewave */
 	{ USB_DEVICE(0x13d3, 0x3329) },
 	{ USB_DEVICE(0x13d3, 0x3365) },
+	/* D-Link */
+	{ USB_DEVICE(0x2001, 0x3c1c) },
+	{ USB_DEVICE(0x2001, 0x3c1d) },
 	/* Ralink */
 	{ USB_DEVICE(0x148f, 0x5370) },
 	{ USB_DEVICE(0x148f, 0x5372) },



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 26/73] wireless: rt2x00: rt2800usb more devices were identified
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (24 preceding siblings ...)
  2012-07-31  4:43 ` [ 25/73] wireless: rt2x00: rt2800usb add more devices ids Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 27/73] rt2800usb: 2001:3c17 is an RT3370 device Ben Hutchings
                   ` (48 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Xose Vazquez Perez, Gertjan van Wingerde,
	John W. Linville

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xose Vazquez Perez <xose.vazquez@gmail.com>

commit e828b9fb4f6c3513950759d5fb902db5bd054048 upstream.

found in 2012_03_22_RT5572_Linux_STA_v2.6.0.0_DPO

RT3070:
(0x2019,0x5201)  Planex Communications, Inc. RT8070
(0x7392,0x4085)  2L Central Europe BV 8070
7392 is Edimax

RT35xx:
(0x1690,0x0761) Askey
was Fujitsu Stylistic 550, but 1690 is Askey

Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/wireless/rt2x00/rt2800usb.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index 5851be7..5601302 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -992,6 +992,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* DVICO */
 	{ USB_DEVICE(0x0fe9, 0xb307) },
 	/* Edimax */
+	{ USB_DEVICE(0x7392, 0x4085) },
 	{ USB_DEVICE(0x7392, 0x7711) },
 	{ USB_DEVICE(0x7392, 0x7717) },
 	{ USB_DEVICE(0x7392, 0x7718) },
@@ -1067,6 +1068,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* Philips */
 	{ USB_DEVICE(0x0471, 0x200f) },
 	/* Planex */
+	{ USB_DEVICE(0x2019, 0x5201) },
 	{ USB_DEVICE(0x2019, 0xab25) },
 	{ USB_DEVICE(0x2019, 0xed06) },
 	/* Quanta */
@@ -1150,6 +1152,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x8516, 0x3572) },
 	/* Askey */
 	{ USB_DEVICE(0x1690, 0x0744) },
+	{ USB_DEVICE(0x1690, 0x0761) },
 	{ USB_DEVICE(0x1690, 0x0764) },
 	/* Cisco */
 	{ USB_DEVICE(0x167b, 0x4001) },
@@ -1235,12 +1238,8 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x07d1, 0x3c0b) },
 	{ USB_DEVICE(0x07d1, 0x3c17) },
 	{ USB_DEVICE(0x2001, 0x3c17) },
-	/* Edimax */
-	{ USB_DEVICE(0x7392, 0x4085) },
 	/* Encore */
 	{ USB_DEVICE(0x203d, 0x14a1) },
-	/* Fujitsu Stylistic 550 */
-	{ USB_DEVICE(0x1690, 0x0761) },
 	/* Gemtek */
 	{ USB_DEVICE(0x15a9, 0x0010) },
 	/* Gigabyte */
@@ -1261,7 +1260,6 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x05a6, 0x0101) },
 	{ USB_DEVICE(0x1d4d, 0x0010) },
 	/* Planex */
-	{ USB_DEVICE(0x2019, 0x5201) },
 	{ USB_DEVICE(0x2019, 0xab24) },
 	/* Qcom */
 	{ USB_DEVICE(0x18e8, 0x6259) },



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 27/73] rt2800usb: 2001:3c17 is an RT3370 device
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (25 preceding siblings ...)
  2012-07-31  4:43 ` [ 26/73] wireless: rt2x00: rt2800usb more devices were identified Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one Ben Hutchings
                   ` (47 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Albert Pool, Gertjan van Wingerde,
	John W. Linville

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Albert Pool <albertpool@solcon.nl>

commit 8fd9d059af12786341dec5a688e607bcdb372238 upstream.

D-Link DWA-123 rev A1

Signed-off-by: Albert Pool<albertpool@solcon.nl>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/wireless/rt2x00/rt2800usb.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index bf78317..20a5040 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -1137,6 +1137,8 @@ static struct usb_device_id rt2800usb_device_table[] = {
 #ifdef CONFIG_RT2800USB_RT33XX
 	/* Belkin */
 	{ USB_DEVICE(0x050d, 0x945b) },
+	/* D-Link */
+	{ USB_DEVICE(0x2001, 0x3c17) },
 	/* Panasonic */
 	{ USB_DEVICE(0x083a, 0xb511) },
 	/* Philips */
@@ -1237,7 +1239,6 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* D-Link */
 	{ USB_DEVICE(0x07d1, 0x3c0b) },
 	{ USB_DEVICE(0x07d1, 0x3c17) },
-	{ USB_DEVICE(0x2001, 0x3c17) },
 	/* Encore */
 	{ USB_DEVICE(0x203d, 0x14a1) },
 	/* Gemtek */



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (26 preceding siblings ...)
  2012-07-31  4:43 ` [ 27/73] rt2800usb: 2001:3c17 is an RT3370 device Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-08-01  1:56   ` Herton Ronaldo Krzesinski
  2012-07-31  4:43 ` [ 29/73] usb: gadget: Fix g_ether interface link status Ben Hutchings
                   ` (46 subsequent siblings)
  74 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Nishanth Menon, Steve Sakoman,
	Tony Lindgren, Kevin Hilman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nishanth Menon <nm@ti.com>

commit b110547e586eb5825bc1d04aa9147bff83b57672 upstream.

Commit 9fa2df6b90786301b175e264f5fa9846aba81a65
(ARM: OMAP2+: OPP: allow OPP enumeration to continue if device is not present)
makes the logic:
for (i = 0; i < opp_def_size; i++) {
	<snip>
	if (!oh || !oh->od) {
		<snip>
		continue;
	}
<snip>
opp_def++;
}

In short, the moment we hit a "bad" OPP, we end up looping through the
list comparing against the same bad OPP definition pointer for the rest
of the iteration count.  Instead, increment opp_def in the for loop
itself, so that continue can be used without much thought and we always
check the next OPP definition pointer :)

Cc: Steve Sakoman <steve@sakoman.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/arm/mach-omap2/opp.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mach-omap2/opp.c b/arch/arm/mach-omap2/opp.c
index de6d464..d8f6dbf 100644
--- a/arch/arm/mach-omap2/opp.c
+++ b/arch/arm/mach-omap2/opp.c
@@ -53,7 +53,7 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
 	omap_table_init = 1;
 
 	/* Lets now register with OPP library */
-	for (i = 0; i < opp_def_size; i++) {
+	for (i = 0; i < opp_def_size; i++, opp_def++) {
 		struct omap_hwmod *oh;
 		struct device *dev;
 
@@ -86,7 +86,6 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
 					__func__, opp_def->freq,
 					opp_def->hwmod_name, i, r);
 		}
-		opp_def++;
 	}
 
 	return 0;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 29/73] usb: gadget: Fix g_ether interface link status
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (27 preceding siblings ...)
  2012-07-31  4:43 ` [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 30/73] ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr Ben Hutchings
                   ` (45 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Kevin Cernekee, Felipe Balbi

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kevin Cernekee <cernekee@gmail.com>

commit 31bde1ceaa873bcaecd49e829bfabceacc4c512d upstream.

A "usb0" interface that has never been connected to a host has an unknown
operstate, and therefore the IFF_RUNNING flag is (incorrectly) asserted
when queried by ifconfig, ifplugd, etc.  This is a result of calling
netif_carrier_off() too early in the probe function; it should be called
after register_netdev().

Similar problems have been fixed in many other drivers, e.g.:

    e826eafa6 (bonding: Call netif_carrier_off after register_netdevice)
    0d672e9f8 (drivers/net: Call netif_carrier_off at the end of the probe)
    6a3c869a6 (cxgb4: fix reported state of interfaces without link)

Fix is to move netif_carrier_off() to the end of the function.
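
A minimal sketch of the intended ordering (error handling elided;
everything except the two calls shown is hypothetical):

	status = register_netdev(net);
	if (!status)
		/* operstate is tracked only after registration, so the
		 * carrier-off transition is recorded correctly here */
		netif_carrier_off(net);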

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/usb/gadget/u_ether.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index 47cf48b..5b46f02 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -798,12 +798,6 @@ int gether_setup_name(struct usb_gadget *g, u8 ethaddr[ETH_ALEN],
 
 	SET_ETHTOOL_OPS(net, &ops);
 
-	/* two kinds of host-initiated state changes:
-	 *  - iff DATA transfer is active, carrier is "on"
-	 *  - tx queueing enabled if open *and* carrier is "on"
-	 */
-	netif_carrier_off(net);
-
 	dev->gadget = g;
 	SET_NETDEV_DEV(net, &g->dev);
 	SET_NETDEV_DEVTYPE(net, &gadget_type);
@@ -817,6 +811,12 @@ int gether_setup_name(struct usb_gadget *g, u8 ethaddr[ETH_ALEN],
 		INFO(dev, "HOST MAC %pM\n", dev->host_mac);
 
 		the_dev = dev;
+
+		/* two kinds of host-initiated state changes:
+		 *  - iff DATA transfer is active, carrier is "on"
+		 *  - tx queueing enabled if open *and* carrier is "on"
+		 */
+		netif_carrier_off(net);
 	}
 
 	return status;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 30/73] ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (28 preceding siblings ...)
  2012-07-31  4:43 ` [ 29/73] usb: gadget: Fix g_ether interface link status Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 31/73] ftrace: Disable function tracing during suspend/resume and hibernation, again Ben Hutchings
                   ` (44 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Theodore Tso

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream.

Make it possible for ext4_count_free() to operate on arbitrary
character buffers and not just on data embedded in buffer_heads.
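
For reference, the counting scheme is a 4-bit table lookup: each nibble
of the bitmap indexes a table holding its number of zero (free) bits.
A stand-alone sketch of what the reworked function body does:

	static const int nibblemap[] =
		{4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};

	unsigned int count_free(const char *bitmap, unsigned int numchars)
	{
		unsigned int i, sum = 0;

		for (i = 0; i < numchars; i++)
			sum += nibblemap[bitmap[i] & 0xf] +
				nibblemap[(bitmap[i] >> 4) & 0xf];
		return sum;
	}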

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/ext4/balloc.c |    3 ++-
 fs/ext4/bitmap.c |    8 +++-----
 fs/ext4/ext4.h   |    2 +-
 fs/ext4/ialloc.c |    3 ++-
 4 files changed, 8 insertions(+), 8 deletions(-)

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -557,7 +557,8 @@ ext4_fsblk_t ext4_count_free_clusters(st
 		if (bitmap_bh == NULL)
 			continue;
 
-		x = ext4_count_free(bitmap_bh, sb->s_blocksize);
+		x = ext4_count_free(bitmap_bh->b_data,
+				    EXT4_BLOCKS_PER_GROUP(sb) / 8);
 		printk(KERN_DEBUG "group %u: stored = %d, counted = %u\n",
 			i, ext4_free_group_clusters(sb, gdp), x);
 		bitmap_count += x;
--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -15,15 +15,13 @@
 
 static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};
 
-unsigned int ext4_count_free(struct buffer_head *map, unsigned int numchars)
+unsigned int ext4_count_free(char *bitmap, unsigned int numchars)
 {
 	unsigned int i, sum = 0;
 
-	if (!map)
-		return 0;
 	for (i = 0; i < numchars; i++)
-		sum += nibblemap[map->b_data[i] & 0xf] +
-			nibblemap[(map->b_data[i] >> 4) & 0xf];
+		sum += nibblemap[bitmap[i] & 0xf] +
+			nibblemap[(bitmap[i] >> 4) & 0xf];
 	return sum;
 }
 
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1757,7 +1757,7 @@ struct mmpd_data {
 # define NORET_AND	noreturn,
 
 /* bitmap.c */
-extern unsigned int ext4_count_free(struct buffer_head *, unsigned);
+extern unsigned int ext4_count_free(char *bitmap, unsigned numchars);
 
 /* balloc.c */
 extern unsigned int ext4_block_group(struct super_block *sb,
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1057,7 +1057,8 @@ unsigned long ext4_count_free_inodes(str
 		if (!bitmap_bh)
 			continue;
 
-		x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8);
+		x = ext4_count_free(bitmap_bh->b_data,
+				    EXT4_INODES_PER_GROUP(sb) / 8);
 		printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
 			(unsigned long) i, ext4_free_inodes_count(sb, gdp), x);
 		bitmap_count += x;



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 31/73] ftrace: Disable function tracing during suspend/resume and hibernation, again
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (29 preceding siblings ...)
  2012-07-31  4:43 ` [ 30/73] ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 32/73] x86, microcode: microcode_core.c simple_strtoul cleanup Ben Hutchings
                   ` (43 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Srivatsa S. Bhat, Rafael J. Wysocki

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>

commit 443772d408a25af62498793f6f805ce3c559309a upstream.

If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to a triple fault during resume from suspend,
ultimately ending up in a reboot instead of a resume (or, on some
machines, a total refusal to come out of the suspended state).

This issue was explained in more detail in commit f42ac38c59e0a03d (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e84eebb (tracing: allow tracing of
suspend/resume & hibernation code again).  Unfortunately, since things
are not yet robust enough to allow tracing of the low-level
suspend/resume functions, suspend/resume is still broken when ftrace is
enabled.

So fix this by disabling function tracing during suspend/resume & hibernation.
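
The pattern applied below simply brackets the device PM transitions
(a sketch of the suspend path; the hibernation paths are analogous):

	suspend_console();
	ftrace_stop();			/* no function tracing from here */
	error = dpm_suspend_start(PMSG_SUSPEND);
	/* ... enter the sleep state and resume ... */
	dpm_resume_end(PMSG_RESUME);
	ftrace_start();			/* tracing is safe again */
	resume_console();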

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 kernel/power/hibernate.c |    6 ++++++
 kernel/power/suspend.c   |    3 +++
 2 files changed, 9 insertions(+)

--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -367,6 +367,7 @@ int hibernation_snapshot(int platform_mo
 	}
 
 	suspend_console();
+	ftrace_stop();
 	pm_restrict_gfp_mask();
 	error = dpm_suspend(PMSG_FREEZE);
 	if (error)
@@ -392,6 +393,7 @@ int hibernation_snapshot(int platform_mo
 	if (error || !in_suspend)
 		pm_restore_gfp_mask();
 
+	ftrace_start();
 	resume_console();
 	dpm_complete(msg);
 
@@ -496,6 +498,7 @@ int hibernation_restore(int platform_mod
 
 	pm_prepare_console();
 	suspend_console();
+	ftrace_stop();
 	pm_restrict_gfp_mask();
 	error = dpm_suspend_start(PMSG_QUIESCE);
 	if (!error) {
@@ -503,6 +506,7 @@ int hibernation_restore(int platform_mod
 		dpm_resume_end(PMSG_RECOVER);
 	}
 	pm_restore_gfp_mask();
+	ftrace_start();
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -529,6 +533,7 @@ int hibernation_platform_enter(void)
 
 	entering_platform_hibernation = true;
 	suspend_console();
+	ftrace_stop();
 	error = dpm_suspend_start(PMSG_HIBERNATE);
 	if (error) {
 		if (hibernation_ops->recover)
@@ -572,6 +577,7 @@ int hibernation_platform_enter(void)
  Resume_devices:
 	entering_platform_hibernation = false;
 	dpm_resume_end(PMSG_RESTORE);
+	ftrace_start();
 	resume_console();
 
  Close:
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -25,6 +25,7 @@
 #include <linux/export.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
+#include <linux/ftrace.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -220,6 +221,7 @@ int suspend_devices_and_enter(suspend_st
 			goto Close;
 	}
 	suspend_console();
+	ftrace_stop();
 	suspend_test_start();
 	error = dpm_suspend_start(PMSG_SUSPEND);
 	if (error) {
@@ -239,6 +241,7 @@ int suspend_devices_and_enter(suspend_st
 	suspend_test_start();
 	dpm_resume_end(PMSG_RESUME);
 	suspend_test_finish("resume devices");
+	ftrace_start();
 	resume_console();
  Close:
 	if (suspend_ops->end)



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 32/73] x86, microcode: microcode_core.c simple_strtoul cleanup
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (30 preceding siblings ...)
  2012-07-31  4:43 ` [ 31/73] ftrace: Disable function tracing during suspend/resume and hibernation, again Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface Ben Hutchings
                   ` (42 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Shuah Khan, Borislav Petkov, H. Peter Anvin

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shuah Khan <shuahkhan@gmail.com>

commit e826abd523913f63eb03b59746ffb16153c53dc4 upstream.

Change reload_for_cpu() in kernel/microcode_core.c to call kstrtoul()
instead of the obsolete simple_strtoul().
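
The practical difference (a generic sketch, not the exact driver code):
kstrtoul() rejects trailing garbage and reports failure through its
return value, instead of requiring a separate end-pointer check:

	unsigned long val;
	int err;

	err = kstrtoul(buf, 0, &val);	/* "1\n" -> 1, "1x" -> -EINVAL */
	if (err)
		return err;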

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1336324264.2897.9.camel@lorien2
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/x86/kernel/microcode_core.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index c9bda6d..fbdfc69 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -299,12 +299,11 @@ static ssize_t reload_store(struct device *dev,
 {
 	unsigned long val;
 	int cpu = dev->id;
-	int ret = 0;
-	char *end;
+	ssize_t ret = 0;
 
-	val = simple_strtoul(buf, &end, 0);
-	if (end == buf)
-		return -EINVAL;
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
 
 	if (val == 1) {
 		get_online_cpus();



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (31 preceding siblings ...)
  2012-07-31  4:43 ` [ 32/73] x86, microcode: microcode_core.c simple_strtoul cleanup Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-08-03  9:04   ` Sven Joachim
  2012-07-31  4:43 ` [ 34/73] usbdevfs: Correct amount of data copied to user in processcompl_compat Ben Hutchings
                   ` (41 subsequent siblings)
  74 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Borislav Petkov,
	Henrique de Moraes Holschuh, Peter Zijlstra, H. Peter Anvin

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <borislav.petkov@amd.com>

commit c9fc3f778a6a215ace14ee556067c73982b6d40f upstream.

Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And yet we have such an interface, with which we can
end up with different microcode versions applied on different cores of
an otherwise homogeneous (same family, model, stepping) system.

So turn off the possibility of doing that per core and allow it only
system-wide.

This is a minimal fix which we'd like to see in stable too, thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:

$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...

and disable the interface on the other cores:

$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument

Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.

A more generic fix will follow.

Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/x86/kernel/microcode_core.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index fbdfc69..24b852b 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -298,19 +298,31 @@ static ssize_t reload_store(struct device *dev,
 			    const char *buf, size_t size)
 {
 	unsigned long val;
-	int cpu = dev->id;
-	ssize_t ret = 0;
+	int cpu;
+	ssize_t ret = 0, tmp_ret;
+
+	/* allow reload only from the BSP */
+	if (boot_cpu_data.cpu_index != dev->id)
+		return -EINVAL;
 
 	ret = kstrtoul(buf, 0, &val);
 	if (ret)
 		return ret;
 
-	if (val == 1) {
-		get_online_cpus();
-		if (cpu_online(cpu))
-			ret = reload_for_cpu(cpu);
-		put_online_cpus();
+	if (val != 1)
+		return size;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		tmp_ret = reload_for_cpu(cpu);
+		if (tmp_ret != 0)
+			pr_warn("Error reloading microcode on CPU %d\n", cpu);
+
+		/* save retval of the first encountered reload error */
+		if (!ret)
+			ret = tmp_ret;
 	}
+	put_online_cpus();
 
 	if (!ret)
 		ret = size;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 34/73] usbdevfs: Correct amount of data copied to user in processcompl_compat
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (32 preceding siblings ...)
  2012-07-31  4:43 ` [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 35/73] ASoC: dapm: Fix locking during codec shutdown Ben Hutchings
                   ` (40 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Hans de Goede, Alan Stern, Greg Kroah-Hartman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hans de Goede <hdegoede@redhat.com>

commit 2102e06a5f2e414694921f23591f072a5ba7db9f upstream.

Iso data buffers may have holes in them if some packets were short, so
for iso URBs we should always copy the entire buffer, just like the
regular processcompl() does.
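
A sketch of the resulting length selection (mirroring the hunk below):

	unsigned int len;

	if (urb->number_of_packets > 0)	/* isochronous URB */
		len = urb->transfer_buffer_length; /* whole buffer, holes included */
	else
		len = urb->actual_length;
	if (copy_to_user(as->userbuffer, urb->transfer_buffer, len))
		return -EFAULT;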

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/usb/core/devio.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index e0f1079..62679bc 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -1604,10 +1604,14 @@ static int processcompl_compat(struct async *as, void __user * __user *arg)
 	void __user *addr = as->userurb;
 	unsigned int i;
 
-	if (as->userbuffer && urb->actual_length)
-		if (copy_to_user(as->userbuffer, urb->transfer_buffer,
-				 urb->actual_length))
+	if (as->userbuffer && urb->actual_length) {
+		if (urb->number_of_packets > 0)		/* Isochronous */
+			i = urb->transfer_buffer_length;
+		else					/* Non-Isoc */
+			i = urb->actual_length;
+		if (copy_to_user(as->userbuffer, urb->transfer_buffer, i))
 			return -EFAULT;
+	}
 	if (put_user(as->status, &userurb->status))
 		return -EFAULT;
 	if (put_user(urb->actual_length, &userurb->actual_length))



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 35/73] ASoC: dapm: Fix locking during codec shutdown
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (33 preceding siblings ...)
  2012-07-31  4:43 ` [ 34/73] usbdevfs: Correct amount of data copied to user in processcompl_compat Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31 16:11   ` Herton Ronaldo Krzesinski
  2012-07-31  4:43 ` [ 36/73] ext4: fix overhead calculation used by ext4_statfs() Ben Hutchings
                   ` (39 subsequent siblings)
  74 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Liam Girdwood, Misael Lopez Cruz, Mark Brown

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Liam Girdwood <lrg@ti.com>

commit 01005a729a17ab419f61a366e22f3419e7a2c3fe upstream.

Codec shutdown performs a DAPM power sequence that might cause
conflicts and/or race conditions if another stream power event is
running simultaneously. Use the card's dapm mutex to protect against
any potential race between the two.
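
The shape of the fix (a sketch): the shutdown walk now takes the same
lock that stream power events take, so the two sequences cannot
interleave:

	mutex_lock(&card->dapm_mutex);
	/* ... walk the widget list and power down this context ... */
	mutex_unlock(&card->dapm_mutex);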

Signed-off-by: Misael Lopez Cruz <misael.lopez@ti.com>
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 sound/soc/soc-dapm.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 5be4f9a..114f2af 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -3537,10 +3537,13 @@ EXPORT_SYMBOL_GPL(snd_soc_dapm_free);
 
 static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
 {
+	struct snd_soc_card *card = dapm->card;
 	struct snd_soc_dapm_widget *w;
 	LIST_HEAD(down_list);
 	int powerdown = 0;
 
+	mutex_lock(&card->dapm_mutex);
+
 	list_for_each_entry(w, &dapm->card->widgets, list) {
 		if (w->dapm != dapm)
 			continue;
@@ -3563,6 +3566,8 @@ static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
 			snd_soc_dapm_set_bias_level(dapm,
 						    SND_SOC_BIAS_STANDBY);
 	}
+
+	mutex_unlock(&card->dapm_mutex);
 }
 
 /*



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 36/73] ext4: fix overhead calculation used by ext4_statfs()
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (34 preceding siblings ...)
  2012-07-31  4:43 ` [ 35/73] ASoC: dapm: Fix locking during codec shutdown Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 37/73] udf: Improve table length check to avoid possible overflow Ben Hutchings
                   ` (38 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Theodore Tso

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream.

Commit f975d6bcc7a introduced a bug which caused ext4_statfs() to
miscalculate the number of file system overhead blocks. This causes the
f_blocks field in the statfs structure to be larger than it should be,
which in turn causes "df" to report both the total number of data
blocks in the file system and the number of data blocks used as larger
than they should be.
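
As a made-up numeric example: on a file system with 1,000,000 blocks of
which 20,000 are overhead, statfs should report

	f_blocks = 1,000,000 - 20,000 = 980,000

With the overhead undercounted, f_blocks comes out closer to 1,000,000
and "df" inflates both the total and the used counts accordingly.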

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
I put the call to ext4_calculate_overhead() in ext4_group_add(); is
it the right place?

Ben.
---
 fs/ext4/bitmap.c |    4 --
 fs/ext4/ext4.h   |    4 +-
 fs/ext4/resize.c |    7 ++-
 fs/ext4/super.c  |  174 ++++++++++++++++++++++++++++++++++++++----------------
 4 files changed, 132 insertions(+), 57 deletions(-)

--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -11,8 +11,6 @@
 #include <linux/jbd2.h>
 #include "ext4.h"
 
-#ifdef EXT4FS_DEBUG
-
 static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};
 
 unsigned int ext4_count_free(char *bitmap, unsigned int numchars)
@@ -25,5 +23,3 @@ unsigned int ext4_count_free(char *bitma
 	return sum;
 }
 
-#endif  /*  EXT4FS_DEBUG  */
-
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1123,8 +1123,7 @@ struct ext4_sb_info {
 	unsigned long s_desc_per_block;	/* Number of group descriptors per block */
 	ext4_group_t s_groups_count;	/* Number of groups in the fs */
 	ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */
-	unsigned long s_overhead_last;  /* Last calculated overhead */
-	unsigned long s_blocks_last;    /* Last seen block count */
+	unsigned long s_overhead;  /* # of fs overhead clusters */
 	unsigned int s_cluster_ratio;	/* Number of blocks per cluster */
 	unsigned int s_cluster_bits;	/* log2 of s_cluster_ratio */
 	loff_t s_bitmap_maxbytes;	/* max bytes for bitmap files */
@@ -1925,6 +1924,7 @@ extern int ext4_group_extend(struct supe
 				ext4_fsblk_t n_blocks_count);
 
 /* super.c */
+extern int ext4_calculate_overhead(struct super_block *sb);
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -952,6 +952,11 @@ int ext4_group_add(struct super_block *s
 			   &sbi->s_flex_groups[flex_group].free_inodes);
 	}
 
+	/*
+	 * Update the fs overhead information
+	 */
+	ext4_calculate_overhead(sb);
+
 	ext4_handle_dirty_super(handle, sb);
 
 exit_journal:
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3083,6 +3083,114 @@ static void ext4_destroy_lazyinit_thread
 	kthread_stop(ext4_lazyinit_task);
 }
 
+/*
+ * Note: calculating the overhead so we can be compatible with
+ * historical BSD practice is quite difficult in the face of
+ * clusters/bigalloc.  This is because multiple metadata blocks from
+ * different block group can end up in the same allocation cluster.
+ * Calculating the exact overhead in the face of clustered allocation
+ * requires either O(all block bitmaps) in memory or O(number of block
+ * groups**2) in time.  We will still calculate the superblock for
+ * older file systems --- and if we come across with a bigalloc file
+ * system with zero in s_overhead_clusters the estimate will be close to
+ * correct especially for very large cluster sizes --- but for newer
+ * file systems, it's better to calculate this figure once at mkfs
+ * time, and store it in the superblock.  If the superblock value is
+ * present (even for non-bigalloc file systems), we will use it.
+ */
+static int count_overhead(struct super_block *sb, ext4_group_t grp,
+			  char *buf)
+{
+	struct ext4_sb_info	*sbi = EXT4_SB(sb);
+	struct ext4_group_desc	*gdp;
+	ext4_fsblk_t		first_block, last_block, b;
+	ext4_group_t		i, ngroups = ext4_get_groups_count(sb);
+	int			s, j, count = 0;
+
+	first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
+		(grp * EXT4_BLOCKS_PER_GROUP(sb));
+	last_block = first_block + EXT4_BLOCKS_PER_GROUP(sb) - 1;
+	for (i = 0; i < ngroups; i++) {
+		gdp = ext4_get_group_desc(sb, i, NULL);
+		b = ext4_block_bitmap(sb, gdp);
+		if (b >= first_block && b <= last_block) {
+			ext4_set_bit(EXT4_B2C(sbi, b - first_block), buf);
+			count++;
+		}
+		b = ext4_inode_bitmap(sb, gdp);
+		if (b >= first_block && b <= last_block) {
+			ext4_set_bit(EXT4_B2C(sbi, b - first_block), buf);
+			count++;
+		}
+		b = ext4_inode_table(sb, gdp);
+		if (b >= first_block && b + sbi->s_itb_per_group <= last_block)
+			for (j = 0; j < sbi->s_itb_per_group; j++, b++) {
+				int c = EXT4_B2C(sbi, b - first_block);
+				ext4_set_bit(c, buf);
+				count++;
+			}
+		if (i != grp)
+			continue;
+		s = 0;
+		if (ext4_bg_has_super(sb, grp)) {
+			ext4_set_bit(s++, buf);
+			count++;
+		}
+		for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) {
+			ext4_set_bit(EXT4_B2C(sbi, s++), buf);
+			count++;
+		}
+	}
+	if (!count)
+		return 0;
+	return EXT4_CLUSTERS_PER_GROUP(sb) -
+		ext4_count_free(buf, EXT4_CLUSTERS_PER_GROUP(sb) / 8);
+}
+
+/*
+ * Compute the overhead and stash it in sbi->s_overhead
+ */
+int ext4_calculate_overhead(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+	ext4_fsblk_t overhead = 0;
+	char *buf = (char *) get_zeroed_page(GFP_KERNEL);
+
+	if (!buf)
+		return -ENOMEM;
+	memset(buf, 0, PAGE_SIZE);
+
+	/*
+	 * Compute the overhead (FS structures).  This is constant
+	 * for a given filesystem unless the number of block groups
+	 * changes so we cache the previous value until it does.
+	 */
+
+	/*
+	 * All of the blocks before first_data_block are overhead
+	 */
+	overhead = EXT4_B2C(sbi, le32_to_cpu(es->s_first_data_block));
+
+	/*
+	 * Add the overhead found in each block group
+	 */
+	for (i = 0; i < ngroups; i++) {
+		int blks;
+
+		blks = count_overhead(sb, i, buf);
+		overhead += blks;
+		if (blks)
+			memset(buf, 0, PAGE_SIZE);
+		cond_resched();
+	}
+	sbi->s_overhead = overhead;
+	smp_wmb();
+	free_page((unsigned long) buf);
+	return 0;
+}
+
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
 	char *orig_data = kstrdup(data, GFP_KERNEL);
@@ -3695,6 +3803,18 @@ static int ext4_fill_super(struct super_
 
 no_journal:
 	/*
+	 * Get the # of file system overhead blocks from the
+	 * superblock if present.
+	 */
+	if (es->s_overhead_clusters)
+		sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
+	else {
+		ret = ext4_calculate_overhead(sb);
+		if (ret)
+			goto failed_mount_wq;
+	}
+
+	/*
 	 * The maximum number of concurrent works can be high and
 	 * concurrency isn't really necessary.  Limit it to 1.
 	 */
@@ -4568,67 +4688,21 @@ restore_opts:
 	return err;
 }
 
-/*
- * Note: calculating the overhead so we can be compatible with
- * historical BSD practice is quite difficult in the face of
- * clusters/bigalloc.  This is because multiple metadata blocks from
- * different block group can end up in the same allocation cluster.
- * Calculating the exact overhead in the face of clustered allocation
- * requires either O(all block bitmaps) in memory or O(number of block
- * groups**2) in time.  We will still calculate the superblock for
- * older file systems --- and if we come across with a bigalloc file
- * system with zero in s_overhead_clusters the estimate will be close to
- * correct especially for very large cluster sizes --- but for newer
- * file systems, it's better to calculate this figure once at mkfs
- * time, and store it in the superblock.  If the superblock value is
- * present (even for non-bigalloc file systems), we will use it.
- */
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_super_block *es = sbi->s_es;
-	struct ext4_group_desc *gdp;
+	ext4_fsblk_t overhead = 0;
 	u64 fsid;
 	s64 bfree;
 
-	if (test_opt(sb, MINIX_DF)) {
-		sbi->s_overhead_last = 0;
-	} else if (es->s_overhead_clusters) {
-		sbi->s_overhead_last = le32_to_cpu(es->s_overhead_clusters);
-	} else if (sbi->s_blocks_last != ext4_blocks_count(es)) {
-		ext4_group_t i, ngroups = ext4_get_groups_count(sb);
-		ext4_fsblk_t overhead = 0;
-
-		/*
-		 * Compute the overhead (FS structures).  This is constant
-		 * for a given filesystem unless the number of block groups
-		 * changes so we cache the previous value until it does.
-		 */
-
-		/*
-		 * All of the blocks before first_data_block are
-		 * overhead
-		 */
-		overhead = EXT4_B2C(sbi, le32_to_cpu(es->s_first_data_block));
-
-		/*
-		 * Add the overhead found in each block group
-		 */
-		for (i = 0; i < ngroups; i++) {
-			gdp = ext4_get_group_desc(sb, i, NULL);
-			overhead += ext4_num_overhead_clusters(sb, i, gdp);
-			cond_resched();
-		}
-		sbi->s_overhead_last = overhead;
-		smp_wmb();
-		sbi->s_blocks_last = ext4_blocks_count(es);
-	}
+	if (!test_opt(sb, MINIX_DF))
+		overhead = sbi->s_overhead;
 
 	buf->f_type = EXT4_SUPER_MAGIC;
 	buf->f_bsize = sb->s_blocksize;
-	buf->f_blocks = (ext4_blocks_count(es) -
-			 EXT4_C2B(sbi, sbi->s_overhead_last));
+	buf->f_blocks = ext4_blocks_count(es) - EXT4_C2B(sbi, sbi->s_overhead);
 	bfree = percpu_counter_sum_positive(&sbi->s_freeclusters_counter) -
 		percpu_counter_sum_positive(&sbi->s_dirtyclusters_counter);
 	/* prevent underflow in case that few free space is available */



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 37/73] udf: Improve table length check to avoid possible overflow
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (35 preceding siblings ...)
  2012-07-31  4:43 ` [ 36/73] ext4: fix overhead calculation used by ext4_statfs() Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 38/73] powerpc: Add "memory" attribute for mfmsr() Ben Hutchings
                   ` (37 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Jan Kara

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 57b9655d01ef057a523e810d29c37ac09b80eead upstream.

When a partition table length is corrupted to be close to 1 << 32, the
check for its length may overflow on 32-bit systems and we will think
the length is valid. Later on the kernel can crash trying to read beyond
end of buffer. Fix the check to avoid possible overflow.
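
The overflow-safe idiom, using the same types as the function below:

	/* buggy: on 32-bit the sum can wrap and compare as small */
	if (sizeof(*lvd) + table_len > sb->s_blocksize)

	/* fixed: move the constant to the trusted side; safe because
	 * sizeof(*lvd) never exceeds the block size */
	if (table_len > sb->s_blocksize - sizeof(*lvd))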

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/udf/super.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index 8a75838..dcbf987 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1340,7 +1340,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 	BUG_ON(ident != TAG_IDENT_LVD);
 	lvd = (struct logicalVolDesc *)bh->b_data;
 	table_len = le32_to_cpu(lvd->mapTableLength);
-	if (sizeof(*lvd) + table_len > sb->s_blocksize) {
+	if (table_len > sb->s_blocksize - sizeof(*lvd)) {
 		udf_err(sb, "error loading logical volume descriptor: "
 			"Partition table too long (%u > %lu)\n", table_len,
 			sb->s_blocksize - sizeof(*lvd));



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 38/73] powerpc: Add "memory" attribute for mfmsr()
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (36 preceding siblings ...)
  2012-07-31  4:43 ` [ 37/73] udf: Improve table length check to avoid possible overflow Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 39/73] mwifiex: correction in mcs index check Ben Hutchings
                   ` (36 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Tiejun Chen, Benjamin Herrenschmidt

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tiejun Chen <tiejun.chen@windriver.com>

commit b416c9a10baae6a177b4f9ee858b8d309542fbef upstream.

Add "memory" attribute in inline assembly language as a compiler
barrier to make sure 4.6.x GCC don't reorder mfmsr().
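
For reference, a "memory" clobber tells GCC that the asm statement may
read or write memory, so surrounding memory accesses cannot be
reordered across it (mirroring the hunk below):

	unsigned long rval;

	asm volatile("mfmsr %0" : "=r" (rval) : : "memory");

Without the clobber, GCC 4.6.x was free to move the mfmsr() relative to
code that depends on the MSR being read at that exact point.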

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/powerpc/include/asm/reg.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 2baeb7c..6386086 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1025,7 +1025,8 @@
 /* Macros for setting and retrieving special purpose registers */
 #ifndef __ASSEMBLY__
 #define mfmsr()		({unsigned long rval; \
-			asm volatile("mfmsr %0" : "=r" (rval)); rval;})
+			asm volatile("mfmsr %0" : "=r" (rval) : \
+						: "memory"); rval;})
 #ifdef CONFIG_PPC_BOOK3S_64
 #define __mtmsrd(v, l)	asm volatile("mtmsrd %0," __stringify(l) \
 				     : : "r" (v) : "memory")



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 39/73] mwifiex: correction in mcs index check
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (37 preceding siblings ...)
  2012-07-31  4:43 ` [ 38/73] powerpc: Add "memory" attribute for mfmsr() Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 40/73] USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces Ben Hutchings
                   ` (35 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Amitkumar Karwar, Bing Zhao, John W. Linville

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Amitkumar Karwar <akarwar@marvell.com>

commit fe020120cb863ba918c6d603345342a880272c4d upstream.

The mwifiex driver supports 2x2 chips as well, hence the valid MCS
values are 0 to 15. The check on the MCS index is corrected in this
patch.

For example: if 40MHz is enabled and mcs index is 11, "iw link"
command would show "tx bitrate: 108.0 MBit/s" without this patch.
Now it shows "tx bitrate: 108.0 MBit/s MCS 11 40Mhz" with the patch.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/wireless/mwifiex/cfg80211.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c
index 5c7fd18..76b5c0f 100644
--- a/drivers/net/wireless/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/mwifiex/cfg80211.c
@@ -634,9 +634,9 @@ mwifiex_dump_station_info(struct mwifiex_private *priv,
 
 	/*
 	 * Bit 0 in tx_htinfo indicates that current Tx rate is 11n rate. Valid
-	 * MCS index values for us are 0 to 7.
+	 * MCS index values for us are 0 to 15.
 	 */
-	if ((priv->tx_htinfo & BIT(0)) && (priv->tx_rate < 8)) {
+	if ((priv->tx_htinfo & BIT(0)) && (priv->tx_rate < 16)) {
 		sinfo->txrate.mcs = priv->tx_rate;
 		sinfo->txrate.flags |= RATE_INFO_FLAGS_MCS;
 		/* 40MHz rate */



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 40/73] USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (38 preceding siblings ...)
  2012-07-31  4:43 ` [ 39/73] mwifiex: correction in mcs index check Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 41/73] USB: option: add ZTE MF821D Ben Hutchings
                   ` (34 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Andrew Bird (Sphere Systems), David S. Miller

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Andrew Bird (Sphere Systems)" <ajb@spheresystems.co.uk>

commit f264ddea0109bf7ce8aab920d64a637e830ace5b upstream.

These interfaces need to be handled by the QMI/WWAN driver

Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/usb/serial/option.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 6815701..836cfa9 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -903,8 +903,10 @@ static const struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 0xff),
 	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
-	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff) },
-	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1058, 0xff, 0xff, 0xff) },



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 41/73] USB: option: add ZTE MF821D
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (39 preceding siblings ...)
  2012-07-31  4:43 ` [ 40/73] USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 42/73] target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE Ben Hutchings
                   ` (33 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Bjørn Mork,
	Thomas Schäfer, Greg Kroah-Hartman

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjørn Mork <bjorn@mork.no>

commit 09110529780890804b22e997ae6b4fe3f0b3b158 upstream.

Sold by O2 (Telefónica Germany) under the name "LTE4G"

Tested-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/usb/serial/option.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 2b0c88d..08ff9b8 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -936,6 +936,8 @@ static const struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 0xff),
 	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0326, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff),
 	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff),



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 42/73] target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (40 preceding siblings ...)
  2012-07-31  4:43 ` [ 41/73] USB: option: add ZTE MF821D Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 43/73] target: Add range checking to UNMAP emulation Ben Hutchings
                   ` (32 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Roland Dreier, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roland Dreier <roland@purestorage.com>

commit e2397c704429025bc6b331a970f699e52f34283e upstream.

Many SCSI commands are defined to return a CHECK CONDITION / ILLEGAL
REQUEST with ASC set to LOGICAL BLOCK ADDRESS OUT OF RANGE if the
initiator sends a command that accesses a too-big LBA.  Add an enum
value and case entries so that target code can return this status.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/target_core_transport.c |   10 ++++++++++
 include/target/target_core_base.h      |    1 +
 2 files changed, 11 insertions(+)

--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1820,6 +1820,7 @@ static void transport_generic_request_fa
 	case TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE:
 	case TCM_UNKNOWN_MODE_PAGE:
 	case TCM_WRITE_PROTECTED:
+	case TCM_ADDRESS_OUT_OF_RANGE:
 	case TCM_CHECK_CONDITION_ABORT_CMD:
 	case TCM_CHECK_CONDITION_UNIT_ATTENTION:
 	case TCM_CHECK_CONDITION_NOT_READY:
@@ -4496,6 +4497,15 @@ int transport_send_check_condition_and_s
 		/* WRITE PROTECTED */
 		buffer[offset+SPC_ASC_KEY_OFFSET] = 0x27;
 		break;
+	case TCM_ADDRESS_OUT_OF_RANGE:
+		/* CURRENT ERROR */
+		buffer[offset] = 0x70;
+		buffer[offset+SPC_ADD_SENSE_LEN_OFFSET] = 10;
+		/* ILLEGAL REQUEST */
+		buffer[offset+SPC_SENSE_KEY_OFFSET] = ILLEGAL_REQUEST;
+		/* LOGICAL BLOCK ADDRESS OUT OF RANGE */
+		buffer[offset+SPC_ASC_KEY_OFFSET] = 0x21;
+		break;
 	case TCM_CHECK_CONDITION_UNIT_ATTENTION:
 		/* CURRENT ERROR */
 		buffer[offset] = 0x70;
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -157,6 +157,7 @@ enum tcm_sense_reason_table {
 	TCM_CHECK_CONDITION_UNIT_ATTENTION	= 0x0e,
 	TCM_CHECK_CONDITION_NOT_READY		= 0x0f,
 	TCM_RESERVATION_CONFLICT		= 0x10,
+	TCM_ADDRESS_OUT_OF_RANGE		= 0x11,
 };
 
 struct se_obj {



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 43/73] target: Add range checking to UNMAP emulation
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (41 preceding siblings ...)
  2012-07-31  4:43 ` [ 42/73] target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 44/73] target: Fix reading of data length fields for UNMAP commands Ben Hutchings
                   ` (31 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Roland Dreier, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roland Dreier <roland@purestorage.com>

commit 2594e29865c291db162313187612cd9f14538f33 upstream.

When processing an UNMAP command, we need to make sure that the number
of blocks we're asked to UNMAP does not exceed our reported maximum
number of blocks per UNMAP, and that the range of blocks we're
unmapping doesn't go past the end of the device.
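
Concretely (a made-up example): on a device whose highest LBA is 999
(i.e. dev->transport->get_blocks() returns 999 for 1000 blocks),
unmapping lba = 990, range = 10 touches blocks 990..999 and is
accepted, since 990 + 10 is not greater than 999 + 1; lba = 995,
range = 10 would run past the end and is rejected.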

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/target_core_cdb.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1145,6 +1145,18 @@ int target_emulate_unmap(struct se_task
 		pr_debug("UNMAP: Using lba: %llu and range: %u\n",
 				 (unsigned long long)lba, range);
 
+		if (range > dev->se_sub_dev->se_dev_attrib.max_unmap_lba_count) {
+			cmd->scsi_sense_reason = TCM_INVALID_PARAMETER_LIST;
+			ret = -EINVAL;
+			goto err;
+		}
+
+		if (lba + range > dev->transport->get_blocks(dev) + 1) {
+			cmd->scsi_sense_reason = TCM_ADDRESS_OUT_OF_RANGE;
+			ret = -EINVAL;
+			goto err;
+		}
+
 		ret = dev->transport->do_discard(dev, lba, range);
 		if (ret < 0) {
 			pr_err("blkdev_issue_discard() failed: %d\n",



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 44/73] target: Fix reading of data length fields for UNMAP commands
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (42 preceding siblings ...)
  2012-07-31  4:43 ` [ 43/73] target: Add range checking to UNMAP emulation Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 45/73] target: Fix possible integer underflow in UNMAP emulation Ben Hutchings
                   ` (30 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Roland Dreier, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roland Dreier <roland@purestorage.com>

commit 1a5fa4576ec8a462313c7516b31d7453481ddbe8 upstream.

The UNMAP DATA LENGTH and UNMAP BLOCK DESCRIPTOR DATA LENGTH fields
are in the unmap descriptor (the payload transferred to our data-out
buffer), not in the CDB itself.  Read them from the correct place in
target_emulate_unmap().

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/target_core_cdb.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1114,7 +1114,6 @@ int target_emulate_unmap(struct se_task
 	struct se_cmd *cmd = task->task_se_cmd;
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *buf, *ptr = NULL;
-	unsigned char *cdb = &cmd->t_task_cdb[0];
 	sector_t lba;
 	unsigned int size = cmd->data_length, range;
 	int ret = 0, offset;
@@ -1130,11 +1129,12 @@ int target_emulate_unmap(struct se_task
 	/* First UNMAP block descriptor starts at 8 byte offset */
 	offset = 8;
 	size -= 8;
-	dl = get_unaligned_be16(&cdb[0]);
-	bd_dl = get_unaligned_be16(&cdb[2]);
 
 	buf = transport_kmap_data_sg(cmd);
 
+	dl = get_unaligned_be16(&buf[0]);
+	bd_dl = get_unaligned_be16(&buf[2]);
+
 	ptr = &buf[offset];
 	pr_debug("UNMAP: Sub: %s Using dl: %hu bd_dl: %hu size: %hu"
 		" ptr: %p\n", dev->transport->name, dl, bd_dl, size, ptr);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 45/73] target: Fix possible integer underflow in UNMAP emulation
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (43 preceding siblings ...)
  2012-07-31  4:43 ` [ 44/73] target: Fix reading of data length fields for UNMAP commands Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 46/73] target: Check number of unmap descriptors against our limit Ben Hutchings
                   ` (29 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Roland Dreier, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roland Dreier <roland@purestorage.com>

commit b7fc7f3777582dea85156a821d78a522a0c083aa upstream.

It's possible for an initiator to send us an UNMAP command with a
descriptor that is less than 8 bytes; in that case it's really bad for
us to set an unsigned int to that value, subtract 8 from it, and then
use that as a limit for our loop (since the value will wrap around to
a huge positive value).

Fix this by making size be signed and only looping if size >= 16 (i.e.
if we have at least a full descriptor available).

Also remove offset as an obfuscated name for the constant 8.
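
A two-line illustration of the hazard (stand-alone, not the driver
code):

	unsigned int size = 4;	/* short descriptor from the initiator */
	size -= 8;		/* wraps around to 4294967292 */

so a subsequent "while (size)" loop would run with an effectively
unbounded count, reading far past the buffer. With a signed size and
the "size >= 16" guard, the loop body is simply never entered.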

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/target_core_cdb.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1115,9 +1115,10 @@ int target_emulate_unmap(struct se_task
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *buf, *ptr = NULL;
 	sector_t lba;
-	unsigned int size = cmd->data_length, range;
-	int ret = 0, offset;
-	unsigned short dl, bd_dl;
+	int size = cmd->data_length;
+	u32 range;
+	int ret = 0;
+	int dl, bd_dl;
 
 	if (!dev->transport->do_discard) {
 		pr_err("UNMAP emulation not supported for: %s\n",
@@ -1126,20 +1127,19 @@ int target_emulate_unmap(struct se_task
 		return -ENOSYS;
 	}
 
-	/* First UNMAP block descriptor starts at 8 byte offset */
-	offset = 8;
-	size -= 8;
-
 	buf = transport_kmap_data_sg(cmd);
 
 	dl = get_unaligned_be16(&buf[0]);
 	bd_dl = get_unaligned_be16(&buf[2]);
 
-	ptr = &buf[offset];
-	pr_debug("UNMAP: Sub: %s Using dl: %hu bd_dl: %hu size: %hu"
+	size = min(size - 8, bd_dl);
+
+	/* First UNMAP block descriptor starts at 8 byte offset */
+	ptr = &buf[8];
+	pr_debug("UNMAP: Sub: %s Using dl: %u bd_dl: %u size: %u"
 		" ptr: %p\n", dev->transport->name, dl, bd_dl, size, ptr);
 
-	while (size) {
+	while (size >= 16) {
 		lba = get_unaligned_be64(&ptr[0]);
 		range = get_unaligned_be32(&ptr[8]);
 		pr_debug("UNMAP: Using lba: %llu and range: %u\n",



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 46/73] target: Check number of unmap descriptors against our limit
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (44 preceding siblings ...)
  2012-07-31  4:43 ` [ 45/73] target: Fix possible integer underflow in UNMAP emulation Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 47/73] s390/idle: fix sequence handling vs cpu hotplug Ben Hutchings
                   ` (28 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Roland Dreier, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roland Dreier <roland@purestorage.com>

commit 7409a6657aebf8be74c21d0eded80709b27275cb upstream.

Fail UNMAP commands that have more than our reported limit on unmap
descriptors.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/target_core_cdb.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1133,6 +1133,11 @@ int target_emulate_unmap(struct se_task
 	bd_dl = get_unaligned_be16(&buf[2]);
 
 	size = min(size - 8, bd_dl);
+	if (size / 16 > dev->se_sub_dev->se_dev_attrib.max_unmap_block_desc_count) {
+		cmd->scsi_sense_reason = TCM_INVALID_PARAMETER_LIST;
+		ret = -EINVAL;
+		goto err;
+	}
 
 	/* First UNMAP block descriptor starts at 8 byte offset */
 	ptr = &buf[8];



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 47/73] s390/idle: fix sequence handling vs cpu hotplug
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (45 preceding siblings ...)
  2012-07-31  4:43 ` [ 46/73] target: Check number of unmap descriptors against our limit Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 48/73] rtlwifi: rtl8192de: Fix phy-based version calculation Ben Hutchings
                   ` (27 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Heiko Carstens, Martin Schwidefsky

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <heiko.carstens@de.ibm.com>

commit 0008204ffe85d23382d6fd0f971f3f0fbe70bae2 upstream.

The s390 idle accounting code uses a sequence counter which gets used
when the per cpu idle statistics get updated and read.

The read side assumes that a result is valid only if the sequence
counter was even and did not change while all the values were being
read.
On cpu hotplug however the per cpu data structure gets initialized via
a cpu hotplug notifier on CPU_ONLINE.
CPU_ONLINE however is too late, since the onlined cpu is already
running and might access the per cpu data. In the worst case the data
structure gets initialized while an idle thread is updating its idle
statistics, which leaves the sequence counter odd after the update.

As a result user space tools like top, which access /proc/stat in
order to get idle stats, will busy loop waiting for the sequence
counter to become even again, which will not happen until the queried
cpu updates its idle statistics again. And even then the sequence
counter will only hold an even value for a couple of cpu cycles.
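
For reference, the read-side convention follows the generic kernel
seqcount pattern (a sketch; the s390 code open-codes an equivalent with
its own sequence field):

	unsigned int seq;
	u64 idle_time;

	do {
		seq = read_seqcount_begin(&idle->seqcount);
		idle_time = idle->idle_time;
	} while (read_seqcount_retry(&idle->seqcount, seq));

A counter that is odd, or that changed mid-read, forces a retry; a
counter left permanently odd therefore spins the reader, which is
exactly the busy loop described above.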

Fix this by moving the initialization of the per cpu idle statistics
to cpu_init(). I prefer this solution over changing the notifier to
CPU_UP_PREPARE, which would be a different way of solving the problem.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 arch/s390/kernel/processor.c |    2 ++
 arch/s390/kernel/smp.c       |    3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -26,12 +26,14 @@ static DEFINE_PER_CPU(struct cpuid, cpu_
 void __cpuinit cpu_init(void)
 {
 	struct cpuid *id = &per_cpu(cpu_id, smp_processor_id());
+	struct s390_idle_data *idle = &__get_cpu_var(s390_idle);
 
 	get_cpu_id(id);
 	atomic_inc(&init_mm.mm_count);
 	current->active_mm = &init_mm;
 	BUG_ON(current->mm);
 	enter_lazy_tlb(&init_mm, current);
+	memset(idle, 0, sizeof(*idle));
 }
 
 /*
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -1020,14 +1020,11 @@ static int __cpuinit smp_cpu_notify(stru
 	unsigned int cpu = (unsigned int)(long)hcpu;
 	struct cpu *c = &per_cpu(cpu_devices, cpu);
 	struct sys_device *s = &c->sysdev;
-	struct s390_idle_data *idle;
 	int err = 0;
 
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
-		idle = &per_cpu(s390_idle, cpu);
-		memset(idle, 0, sizeof(struct s390_idle_data));
 		err = sysfs_create_group(&s->kobj, &cpu_online_attr_group);
 		break;
 	case CPU_DEAD:



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 48/73] rtlwifi: rtl8192de: Fix phy-based version calculation
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (46 preceding siblings ...)
  2012-07-31  4:43 ` [ 47/73] s390/idle: fix sequence handling vs cpu hotplug Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:43 ` [ 49/73] workqueue: perform cpu down operations from low priority cpu_notifier() Ben Hutchings
                   ` (26 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Forest Bond, Larry Finger, John W. Linville

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Forest Bond <forest.bond@rapidrollout.com>

commit f1b00f4dab29b57bdf1bc03ef12020b280fd2a72 upstream.

Commit d83579e2a50ac68389e6b4c58b845c702cf37516 incorporated some
changes from the vendor driver that made it newly important that the
calculated hardware version correctly include the CHIP_92D bit, as all
of the IS_92D_* macros were changed to depend on it.  However, this bit
was being unset for dual-mac, dual-phy devices.  The vendor driver
behavior was modified to not do this, but unfortunately this change was
not picked up along with the others.  This caused scanning in the 2.4GHz
band to be broken, and possibly other bugs as well.

This patch brings the version calculation logic into parity with the
vendor driver in this regard, and in doing so fixes the regression.
However, the version calculation code in general continues to be largely
incoherent and messy, and needs to be cleaned up.

Signed-off-by: Forest Bond <forest.bond@rapidrollout.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/wireless/rtlwifi/rtl8192de/phy.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192de/phy.c b/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
index 18380a7..4420312 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
@@ -3345,21 +3345,21 @@ void rtl92d_phy_config_macphymode_info(struct ieee80211_hw *hw)
 	switch (rtlhal->macphymode) {
 	case DUALMAC_SINGLEPHY:
 		rtlphy->rf_type = RF_2T2R;
-		rtlhal->version |= CHIP_92D_SINGLEPHY;
+		rtlhal->version |= RF_TYPE_2T2R;
 		rtlhal->bandset = BAND_ON_BOTH;
 		rtlhal->current_bandtype = BAND_ON_2_4G;
 		break;
 
 	case SINGLEMAC_SINGLEPHY:
 		rtlphy->rf_type = RF_2T2R;
-		rtlhal->version |= CHIP_92D_SINGLEPHY;
+		rtlhal->version |= RF_TYPE_2T2R;
 		rtlhal->bandset = BAND_ON_BOTH;
 		rtlhal->current_bandtype = BAND_ON_2_4G;
 		break;
 
 	case DUALMAC_DUALPHY:
 		rtlphy->rf_type = RF_1T1R;
-		rtlhal->version &= (~CHIP_92D_SINGLEPHY);
+		rtlhal->version &= RF_TYPE_1T1R;
 		/* Now we let MAC0 run on 5G band. */
 		if (rtlhal->interfaceindex == 0) {
 			rtlhal->bandset = BAND_ON_5G;




* [ 49/73] workqueue: perform cpu down operations from low priority cpu_notifier()
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (47 preceding siblings ...)
  2012-07-31  4:43 ` [ 48/73] rtlwifi: rtl8192de: Fix phy-based version calculation Ben Hutchings
@ 2012-07-31  4:43 ` Ben Hutchings
  2012-07-31  4:44 ` [ 50/73] ALSA: hda - Add support for Realtek ALC282 Ben Hutchings
                   ` (25 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:43 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Tejun Heo, Rafael J. Wysocki

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 6575820221f7a4dd6eadecf7bf83cdd154335eda upstream.

Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE, which is higher priority than normal notifiers.  This
is to ensure that workqueue is up and running while bringing up a CPU,
before other notifiers try to use workqueue on the CPU.

Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
with workqueue offlining running with higher priority, because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management and without
explicitly detaching the existing workers.

However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.

While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.

Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.

Workqueue cpu hotplug operations will soon go through further cleanup.
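
As an illustration of why the split works: notifier chain callbacks run
in descending priority order, so a default priority-0 notifier now runs
before the workqueue down callback on the down path.  A hypothetical
driver-side sketch (my_work and the callback are made up for the
example, not part of the patch):

    #include <linux/cpu.h>
    #include <linux/notifier.h>
    #include <linux/workqueue.h>

    static struct work_struct my_work;  /* assumed initialized elsewhere */

    /* registered at the default priority 0 */
    static int my_cpu_callback(struct notifier_block *nfb,
                               unsigned long action, void *hcpu)
    {
            unsigned int cpu = (unsigned int)(long)hcpu;

            if ((action & ~CPU_TASKS_FROZEN) == CPU_DOWN_PREPARE)
                    /*
                     * Still safe: workqueue_cpu_down_callback runs at
                     * priority -5, i.e. after this notifier, so work
                     * queued here is executed by workers bound to @cpu.
                     */
                    queue_work_on(cpu, system_wq, &my_work);

            return NOTIFY_OK;
    }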

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 include/linux/cpu.h |    5 +++--
 kernel/workqueue.c  |   38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 3 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -66,8 +66,9 @@ enum {
 	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
 	CPU_PRI_MIGRATION	= 10,
-	/* prepare workqueues for other notifiers */
-	CPU_PRI_WORKQUEUE	= 5,
+	/* bring up workqueues before normal notifiers and down after */
+	CPU_PRI_WORKQUEUE_UP	= 5,
+	CPU_PRI_WORKQUEUE_DOWN	= -5,
 };
 
 #define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3586,6 +3586,41 @@ static int __devinit workqueue_cpu_callb
 	return notifier_from_errno(0);
 }
 
+/*
+ * Workqueues should be brought up before normal priority CPU notifiers.
+ * This will be registered high priority CPU notifier.
+ */
+static int __devinit workqueue_cpu_up_callback(struct notifier_block *nfb,
+					       unsigned long action,
+					       void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_CANCELED:
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		return workqueue_cpu_callback(nfb, action, hcpu);
+	}
+	return NOTIFY_OK;
+}
+
+/*
+ * Workqueues should be brought down after normal priority CPU notifiers.
+ * This will be registered as low priority CPU notifier.
+ */
+static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
+						 unsigned long action,
+						 void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_PREPARE:
+	case CPU_DYING:
+	case CPU_POST_DEAD:
+		return workqueue_cpu_callback(nfb, action, hcpu);
+	}
+	return NOTIFY_OK;
+}
+
 #ifdef CONFIG_SMP
 
 struct work_for_cpu {
@@ -3779,7 +3814,8 @@ static int __init init_workqueues(void)
 	unsigned int cpu;
 	int i;
 
-	cpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);
+	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
+	cpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
 
 	/* initialize gcwqs */
 	for_each_gcwq_cpu(cpu) {




* [ 50/73] ALSA: hda - Add support for Realtek ALC282
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (48 preceding siblings ...)
  2012-07-31  4:43 ` [ 49/73] workqueue: perform cpu down operations from low priority cpu_notifier() Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 51/73] iommu/amd: Fix hotplug with iommu=pt Ben Hutchings
                   ` (24 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, David Henningsson, Ray Chen, Takashi Iwai

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Henningsson <david.henningsson@canonical.com>

commit 4e01ec636e64707d202a1ca21a47bbc6d53085b7 upstream.

This codec has a separate dmic path (a separate dmic-only ADC),
and thus it looks mostly like ALC275.

BugLink: https://bugs.launchpad.net/bugs/1025377
Tested-by: Ray Chen <ray.chen@canonical.com>
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 sound/pci/hda/patch_realtek.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index a5b0b50..aef3139 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6988,6 +6988,7 @@ static const struct hda_codec_preset snd_hda_preset_realtek[] = {
 	{ .id = 0x10ec0275, .name = "ALC275", .patch = patch_alc269 },
 	{ .id = 0x10ec0276, .name = "ALC276", .patch = patch_alc269 },
 	{ .id = 0x10ec0280, .name = "ALC280", .patch = patch_alc269 },
+	{ .id = 0x10ec0282, .name = "ALC282", .patch = patch_alc269 },
 	{ .id = 0x10ec0861, .rev = 0x100340, .name = "ALC660",
 	  .patch = patch_alc861 },
 	{ .id = 0x10ec0660, .name = "ALC660-VD", .patch = patch_alc861vd },




* [ 51/73] iommu/amd: Fix hotplug with iommu=pt
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (49 preceding siblings ...)
  2012-07-31  4:44 ` [ 50/73] ALSA: hda - Add support for Realtek ALC282 Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 52/73] drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns Ben Hutchings
                   ` (23 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Joerg Roedel

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <joerg.roedel@amd.com>

commit 2c9195e990297068d0f1f1bd8e2f1d09538009da upstream.

Device hotplug with iommu=pt did not work because newly added
devices are not put into the pt_domain. Fix this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
[bwh: Backported to 3.2: do not use iommu_dev_data::passthrough]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/iommu/amd_iommu.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1865,6 +1865,11 @@ static int device_change_notifier(struct
 
 		iommu_init_device(dev);
 
+		if (iommu_pass_through) {
+			attach_device(dev, pt_domain);
+			break;
+		}
+
 		domain = domain_for_device(dev);
 
 		/* allocate a protection domain if a device is added */
@@ -1880,10 +1885,7 @@ static int device_change_notifier(struct
 		list_add_tail(&dma_domain->list, &iommu_pd_list);
 		spin_unlock_irqrestore(&iommu_pd_list_lock, flags);
 
-		if (!iommu_pass_through)
-			dev->archdata.dma_ops = &amd_iommu_dma_ops;
-		else
-			dev->archdata.dma_ops = &nommu_dma_ops;
+		dev->archdata.dma_ops = &amd_iommu_dma_ops;
 
 		break;
 	case BUS_NOTIFY_DEL_DEVICE:




* [ 52/73] drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns.
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (50 preceding siblings ...)
  2012-07-31  4:44 ` [ 51/73] iommu/amd: Fix hotplug with iommu=pt Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 53/73] ALSA: hda - Turn on PIN_OUT from hdmi playback prepare Ben Hutchings
                   ` (22 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Michel Dänzer, Alex Deucher, Dave Airlie

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michel Dänzer <michel.daenzer@amd.com>

commit f60ec4c7df043df81e62891ac45383d012afe0da upstream.

This could previously fail if either of the enabled displays was using a
horizontal resolution that is a multiple of 128, and only the leftmost column
of the cursor was (supposed to be) visible at the right edge of that display.

The solution is to move the cursor one pixel to the left in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33183

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/radeon/radeon_cursor.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c b/drivers/gpu/drm/radeon/radeon_cursor.c
index 42acc64..711e95a 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -262,8 +262,14 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 				if (!(cursor_end & 0x7f))
 					w--;
 			}
-			if (w <= 0)
+			if (w <= 0) {
 				w = 1;
+				cursor_end = x - xorigin + w;
+				if (!(cursor_end & 0x7f)) {
+					x--;
+					WARN_ON_ONCE(x < 0);
+				}
+			}
 		}
 	}
 




* [ 53/73] ALSA: hda - Turn on PIN_OUT from hdmi playback prepare.
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (51 preceding siblings ...)
  2012-07-31  4:44 ` [ 52/73] drm/radeon: Try harder to avoid HW cursor ending on a multiple of 128 columns Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 54/73] block: add blk_queue_dead() Ben Hutchings
                   ` (21 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Dylan Reid, Takashi Iwai

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dylan Reid <dgreid@chromium.org>

commit 9e76e6d031482194a5b24d8e9ab88063fbd6b4b5 upstream.

Turn on the pin widget's PIN_OUT bit from playback prepare. The pin is
enabled in open, but is disabled in hdmi_init_pin which is called during
system resume.  This causes a system suspend/resume during playback to
mute HDMI/DP. Enabling the pin in prepare instead of open allows calling
snd_pcm_prepare after a system resume to restore audio.

Signed-off-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 sound/pci/hda/patch_hdmi.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 0b4a1ea..641408d 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -876,7 +876,6 @@ static int hdmi_pcm_open(struct hda_pcm_stream *hinfo,
 	struct hdmi_spec_per_pin *per_pin;
 	struct hdmi_eld *eld;
 	struct hdmi_spec_per_cvt *per_cvt = NULL;
-	int pinctl;
 
 	/* Validate hinfo */
 	pin_idx = hinfo_to_pin_index(spec, hinfo);
@@ -912,11 +911,6 @@ static int hdmi_pcm_open(struct hda_pcm_stream *hinfo,
 	snd_hda_codec_write(codec, per_pin->pin_nid, 0,
 			    AC_VERB_SET_CONNECT_SEL,
 			    mux_idx);
-	pinctl = snd_hda_codec_read(codec, per_pin->pin_nid, 0,
-				    AC_VERB_GET_PIN_WIDGET_CONTROL, 0);
-	snd_hda_codec_write(codec, per_pin->pin_nid, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL,
-			    pinctl | PIN_OUT);
 	snd_hda_spdif_ctls_assign(codec, pin_idx, per_cvt->cvt_nid);
 
 	/* Initially set the converter's capabilities */
@@ -1153,11 +1147,17 @@ static int generic_hdmi_playback_pcm_prepare(struct hda_pcm_stream *hinfo,
 	struct hdmi_spec *spec = codec->spec;
 	int pin_idx = hinfo_to_pin_index(spec, hinfo);
 	hda_nid_t pin_nid = spec->pins[pin_idx].pin_nid;
+	int pinctl;
 
 	hdmi_set_channel_count(codec, cvt_nid, substream->runtime->channels);
 
 	hdmi_setup_audio_infoframe(codec, pin_idx, substream);
 
+	pinctl = snd_hda_codec_read(codec, pin_nid, 0,
+				    AC_VERB_GET_PIN_WIDGET_CONTROL, 0);
+	snd_hda_codec_write(codec, pin_nid, 0,
+			    AC_VERB_SET_PIN_WIDGET_CONTROL, pinctl | PIN_OUT);
+
 	return hdmi_setup_stream(codec, cvt_nid, pin_nid, stream_tag, format);
 }
 




* [ 54/73] block: add blk_queue_dead()
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (52 preceding siblings ...)
  2012-07-31  4:44 ` [ 53/73] ALSA: hda - Turn on PIN_OUT from hdmi playback prepare Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 55/73] [SCSI] Fix device removal NULL pointer dereference Ben Hutchings
                   ` (20 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Tejun Heo, Jens Axboe

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 34f6055c80285e4efb3f602a9119db75239744dc upstream.

There are a number of open-coded QUEUE_FLAG_DEAD tests.  Add a
blk_queue_dead() macro and use it.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 block/blk-core.c       |    6 +++---
 block/blk-exec.c       |    2 +-
 block/blk-sysfs.c      |    4 ++--
 block/blk-throttle.c   |    4 ++--
 block/blk.h            |    2 +-
 include/linux/blkdev.h |    1 +
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 435af23..b5ed4f4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -608,7 +608,7 @@ EXPORT_SYMBOL(blk_init_allocated_queue_node);
 
 int blk_get_queue(struct request_queue *q)
 {
-	if (likely(!test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (likely(!blk_queue_dead(q))) {
 		kobject_get(&q->kobj);
 		return 0;
 	}
@@ -755,7 +755,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 	const bool is_sync = rw_is_sync(rw_flags) != 0;
 	int may_queue;
 
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
 	may_queue = elv_may_queue(q, rw_flags);
@@ -875,7 +875,7 @@ static struct request *get_request_wait(struct request_queue *q, int rw_flags,
 		struct io_context *ioc;
 		struct request_list *rl = &q->rq;
 
-		if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+		if (unlikely(blk_queue_dead(q)))
 			return NULL;
 
 		prepare_to_wait_exclusive(&rl->wait[is_sync], &wait,
diff --git a/block/blk-exec.c b/block/blk-exec.c
index a1ebceb..6053285 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,7 +50,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 {
 	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
 
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (unlikely(blk_queue_dead(q))) {
 		rq->errors = -ENXIO;
 		if (rq->end_io)
 			rq->end_io(rq, rq->errors);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e7f9f65..f0b2ca8 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -425,7 +425,7 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	if (!entry->show)
 		return -EIO;
 	mutex_lock(&q->sysfs_lock);
-	if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)) {
+	if (blk_queue_dead(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
@@ -447,7 +447,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 
 	q = container_of(kobj, struct request_queue, kobj);
 	mutex_lock(&q->sysfs_lock);
-	if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)) {
+	if (blk_queue_dead(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 4553245..5eed6a7 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -310,7 +310,7 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	struct request_queue *q = td->queue;
 
 	/* no throttling for dead queue */
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
 	rcu_read_lock();
@@ -335,7 +335,7 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	spin_lock_irq(q->queue_lock);
 
 	/* Make sure @q is still alive */
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (unlikely(blk_queue_dead(q))) {
 		kfree(tg);
 		return NULL;
 	}
diff --git a/block/blk.h b/block/blk.h
index 3f6551b..e38691d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -85,7 +85,7 @@ static inline struct request *__elv_next_request(struct request_queue *q)
 			q->flush_queue_delayed = 1;
 			return NULL;
 		}
-		if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags) ||
+		if (unlikely(blk_queue_dead(q)) ||
 		    !q->elevator->ops->elevator_dispatch_fn(q, 0))
 			return NULL;
 	}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8a6b51b..783f97c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -481,6 +481,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
+#define blk_queue_dead(q)	test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
 	test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)




* [ 55/73] [SCSI] Fix device removal NULL pointer dereference
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (53 preceding siblings ...)
  2012-07-31  4:44 ` [ 54/73] block: add blk_queue_dead() Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 56/73] [SCSI] Avoid dangling pointer in scsi_requeue_command() Ben Hutchings
                   ` (19 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Bart Van Assche, Junichi Nomura,
	Mike Christie, Tejun Heo, James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bvanassche@acm.org>

commit 67bd94130015c507011af37858989b199c52e1de upstream.

Use blk_queue_dead() to test whether the queue is dead instead
of !sdev. Since scsi_prep_fn() may be invoked concurrently with
__scsi_remove_device(), keep the queuedata (sdev) pointer in
__scsi_remove_device(). This patch fixes a kernel oops that
can be triggered by USB device removal. See also
http://www.spinics.net/lists/linux-scsi/msg56254.html.

Other changes included in this patch:
- Swap the blk_cleanup_queue() and kfree() calls in
  scsi_host_dev_release() to make that code easier to grasp.
- Remove the queue dead check from scsi_run_queue() since the
  queue state can change anyway at any point in that function
  where the queue lock is not held.
- Remove the queue dead check from the start of scsi_request_fn()
  since it is redundant with the scsi_device_online() check.

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/hosts.c      |    7 ++++---
 drivers/scsi/scsi_lib.c   |   32 ++++----------------------------
 drivers/scsi/scsi_priv.h  |    1 -
 drivers/scsi/scsi_sysfs.c |    5 +----
 4 files changed, 9 insertions(+), 36 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 2b6a03d..593085a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -290,6 +290,7 @@ static void scsi_host_dev_release(struct device *dev)
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct device *parent = dev->parent;
 	struct request_queue *q;
+	void *queuedata;
 
 	scsi_proc_hostdir_rm(shost->hostt);
 
@@ -299,9 +300,9 @@ static void scsi_host_dev_release(struct device *dev)
 		destroy_workqueue(shost->work_q);
 	q = shost->uspace_req_q;
 	if (q) {
-		kfree(q->queuedata);
-		q->queuedata = NULL;
-		scsi_free_queue(q);
+		queuedata = q->queuedata;
+		blk_cleanup_queue(q);
+		kfree(queuedata);
 	}
 
 	scsi_destroy_command_freelist(shost);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9f00c12..4acf5c2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -406,10 +406,6 @@ static void scsi_run_queue(struct request_queue *q)
 	LIST_HEAD(starved_list);
 	unsigned long flags;
 
-	/* if the device is dead, sdev will be NULL, so no queue to run */
-	if (!sdev)
-		return;
-
 	shost = sdev->host;
 	if (scsi_target(sdev)->single_lun)
 		scsi_single_lun_run(sdev);
@@ -1371,16 +1367,16 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
  * may be changed after request stacking drivers call the function,
  * regardless of taking lock or not.
  *
- * When scsi can't dispatch I/Os anymore and needs to kill I/Os
- * (e.g. !sdev), scsi needs to return 'not busy'.
- * Otherwise, request stacking drivers may hold requests forever.
+ * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
+ * needs to return 'not busy'. Otherwise, request stacking drivers
+ * may hold requests forever.
  */
 static int scsi_lld_busy(struct request_queue *q)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 
-	if (!sdev)
+	if (blk_queue_dead(q))
 		return 0;
 
 	shost = sdev->host;
@@ -1491,12 +1487,6 @@ static void scsi_request_fn(struct request_queue *q)
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if (!sdev) {
-		while ((req = blk_peek_request(q)) != NULL)
-			scsi_kill_request(req, q);
-		return;
-	}
-
 	if(!get_device(&sdev->sdev_gendev))
 		/* We must be tearing the block queue down already */
 		return;
@@ -1698,20 +1688,6 @@ struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 	return q;
 }
 
-void scsi_free_queue(struct request_queue *q)
-{
-	unsigned long flags;
-
-	WARN_ON(q->queuedata);
-
-	/* cause scsi_request_fn() to kill all non-finished requests */
-	spin_lock_irqsave(q->queue_lock, flags);
-	q->request_fn(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
-	blk_cleanup_queue(q);
-}
-
 /*
  * Function:    scsi_block_requests()
  *
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index cbfe5df..291db6e 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -85,7 +85,6 @@ extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
-extern void scsi_free_queue(struct request_queue *q);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
 struct request_queue;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 5747478..9aa578a 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -972,11 +972,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
 
-	/* cause the request function to reject all I/O requests */
-	sdev->request_queue->queuedata = NULL;
-
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_cleanup_queue(sdev->request_queue);
 	put_device(dev);
 }
 




* [ 56/73] [SCSI] Avoid dangling pointer in scsi_requeue_command()
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (54 preceding siblings ...)
  2012-07-31  4:44 ` [ 55/73] [SCSI] Fix device removal NULL pointer dereference Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 57/73] [SCSI] fix hot unplug vs async scan race Ben Hutchings
                   ` (18 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Bart Van Assche, Mike Christie, Tejun Heo,
	James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bvanassche@acm.org>

commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream.

When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device.  If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.

Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
[jejb: enhance comment and add commit log for stable]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/scsi_lib.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4acf5c2..0e52ff0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -479,15 +479,26 @@ void scsi_requeue_run_queue(struct work_struct *work)
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
+	struct scsi_device *sdev = cmd->device;
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	/*
+	 * We need to hold a reference on the device to avoid the queue being
+	 * killed after the unlock and before scsi_run_queue is invoked which
+	 * may happen because scsi_unprep_request() puts the command which
+	 * releases its reference on the device.
+	 */
+	get_device(&sdev->sdev_gendev);
+
 	spin_lock_irqsave(q->queue_lock, flags);
 	scsi_unprep_request(req);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
+
+	put_device(&sdev->sdev_gendev);
 }
 
 void scsi_next_command(struct scsi_cmnd *cmd)




* [ 57/73] [SCSI] fix hot unplug vs async scan race
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (55 preceding siblings ...)
  2012-07-31  4:44 ` [ 56/73] [SCSI] Avoid dangling pointer in scsi_requeue_command() Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 58/73] [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Ben Hutchings
                   ` (17 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Dan Williams, Dariusz Majchrzak, James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream.

The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a
  [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
  [<ffffffff8107de15>] ? module_refcount+0x89/0xa0
  [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145

...teach scsi_sysfs_add_devices() to check for deleted devices before
trying to add them, and teach scsi_remove_target() how to remove targets
that have not been added via device_add().

Reported-by: Dariusz Majchrzak <dariusz.majchrzak@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/scsi_scan.c  |    3 +++
 drivers/scsi/scsi_sysfs.c |   41 ++++++++++++++++++++++++++---------------
 2 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 2e5fe58..f55e5f1 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1717,6 +1717,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 	shost_for_each_device(sdev, shost) {
+		/* target removed before the device could be added */
+		if (sdev->sdev_state == SDEV_DEL)
+			continue;
 		if (!scsi_host_scan_allowed(shost) ||
 		    scsi_sysfs_add_sdev(sdev) != 0)
 			__scsi_remove_device(sdev);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index d19d7e9..093d4f6 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1005,7 +1005,6 @@ static void __scsi_remove_target(struct scsi_target *starget)
 	struct scsi_device *sdev;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	starget->reap_ref++;
  restart:
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
@@ -1019,14 +1018,6 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	scsi_target_reap(starget);
-}
-
-static int __remove_child (struct device * dev, void * data)
-{
-	if (scsi_is_target_device(dev))
-		__scsi_remove_target(to_scsi_target(dev));
-	return 0;
 }
 
 /**
@@ -1039,14 +1030,34 @@ static int __remove_child (struct device * dev, void * data)
  */
 void scsi_remove_target(struct device *dev)
 {
-	if (scsi_is_target_device(dev)) {
-		__scsi_remove_target(to_scsi_target(dev));
-		return;
+	struct Scsi_Host *shost = dev_to_shost(dev->parent);
+	struct scsi_target *starget, *found;
+	unsigned long flags;
+
+ restart:
+	found = NULL;
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_for_each_entry(starget, &shost->__targets, siblings) {
+		if (starget->state == STARGET_DEL)
+			continue;
+		if (starget->dev.parent == dev || &starget->dev == dev) {
+			found = starget;
+			found->reap_ref++;
+			break;
+		}
 	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	get_device(dev);
-	device_for_each_child(dev, NULL, __remove_child);
-	put_device(dev);
+	if (found) {
+		__scsi_remove_target(found);
+		scsi_target_reap(found);
+		/* in the case where @dev has multiple starget children,
+		 * continue removing.
+		 *
+		 * FIXME: does such a case exist?
+		 */
+		goto restart;
+	}
 }
 EXPORT_SYMBOL(scsi_remove_target);
 




* [ 58/73] [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (56 preceding siblings ...)
  2012-07-31  4:44 ` [ 57/73] [SCSI] fix hot unplug vs async scan race Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 59/73] [SCSI] libsas: continue revalidation Ben Hutchings
                   ` (16 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Dan Williams, Tom Jackson, James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream.

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain.  When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING while
->host_busy is non-zero, so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.  Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/scsi_error.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d0f71e5..804f632 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1687,6 +1687,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * requests are started.
 	 */
 	scsi_run_host_queues(shost);
+
+	/*
+	 * if eh is active and host_eh_scheduled is pending we need to re-run
+	 * recovery.  we do this check after scsi_run_host_queues() to allow
+	 * everything pent up since the last eh run a chance to make forward
+	 * progress before we sync again.  Either we'll immediately re-run
+	 * recovery or scsi_device_unbusy() will wake us again when these
+	 * pending commands complete.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (shost->host_eh_scheduled)
+		if (scsi_host_set_state(shost, SHOST_RECOVERY))
+			WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
 /**




* [ 59/73] [SCSI] libsas: continue revalidation
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (57 preceding siblings ...)
  2012-07-31  4:44 ` [ 58/73] [SCSI] fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 60/73] [SCSI] libsas: fix sas_discover_devices return code handling Ben Hutchings
                   ` (15 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Dan Williams, James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream.

Continue running revalidation until no more broadcast devices are
discovered.  Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/libsas/sas_expander.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index af659cc..63c5742 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2114,9 +2114,7 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 	struct domain_device *dev = NULL;
 
 	res = sas_find_bcast_dev(port_dev, &dev);
-	if (res)
-		goto out;
-	if (dev) {
+	while (res == 0 && dev) {
 		struct expander_device *ex = &dev->ex_dev;
 		int i = 0, phy_id;
 
@@ -2128,8 +2126,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 			res = sas_rediscover(dev, phy_id);
 			i = phy_id + 1;
 		} while (i < ex->num_phys);
+
+		dev = NULL;
+		res = sas_find_bcast_dev(port_dev, &dev);
 	}
-out:
 	return res;
 }
 




* [ 60/73] [SCSI] libsas: fix sas_discover_devices return code handling
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (58 preceding siblings ...)
  2012-07-31  4:44 ` [ 59/73] [SCSI] libsas: continue revalidation Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 61/73] iscsi-target: Drop bogus struct file usage for iSCSI/SCTP Ben Hutchings
                   ` (14 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Dan Williams, Dan Melnic, Jack Wang,
	James Bottomley

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@intel.com>

commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream.

commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev(), which is non-zero on failure, with that of
sas_ex_join_wide_port(), which just indicates short-circuiting discovery
on already established ports.  The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'.  Convert it to bool and stop
returning its result up the stack.

Tested-by: Dan Melnic <dan.melnic@amd.com>
Reported-by: Dan Melnic <dan.melnic@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/scsi/libsas/sas_expander.c |   39 +++++++++++-------------------------
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 63c5742..879dbbe 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -868,7 +868,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 }
 
 /* See if this phy is part of a wide port */
-static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
+static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 {
 	struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id];
 	int i;
@@ -884,11 +884,11 @@ static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 			sas_port_add_phy(ephy->port, phy->phy);
 			phy->port = ephy->port;
 			phy->phy_state = PHY_DEVICE_DISCOVERED;
-			return 0;
+			return true;
 		}
 	}
 
-	return -ENODEV;
+	return false;
 }
 
 static struct domain_device *sas_ex_discover_expander(
@@ -1030,8 +1030,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 		return res;
 	}
 
-	res = sas_ex_join_wide_port(dev, phy_id);
-	if (!res) {
+	if (sas_ex_join_wide_port(dev, phy_id)) {
 		SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 			    phy_id, SAS_ADDR(ex_phy->attached_sas_addr));
 		return res;
@@ -1077,8 +1076,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 			if (SAS_ADDR(ex->ex_phy[i].attached_sas_addr) ==
 			    SAS_ADDR(child->sas_addr)) {
 				ex->ex_phy[i].phy_state= PHY_DEVICE_DISCOVERED;
-				res = sas_ex_join_wide_port(dev, i);
-				if (!res)
+				if (sas_ex_join_wide_port(dev, i))
 					SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 						    i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));
 
@@ -1943,32 +1941,20 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 {
 	struct ex_phy *ex_phy = &dev->ex_dev.ex_phy[phy_id];
 	struct domain_device *child;
-	bool found = false;
-	int res, i;
+	int res;
 
 	SAS_DPRINTK("ex %016llx phy%d new device attached\n",
 		    SAS_ADDR(dev->sas_addr), phy_id);
 	res = sas_ex_phy_discover(dev, phy_id);
 	if (res)
-		goto out;
-	/* to support the wide port inserted */
-	for (i = 0; i < dev->ex_dev.num_phys; i++) {
-		struct ex_phy *ex_phy_temp = &dev->ex_dev.ex_phy[i];
-		if (i == phy_id)
-			continue;
-		if (SAS_ADDR(ex_phy_temp->attached_sas_addr) ==
-		    SAS_ADDR(ex_phy->attached_sas_addr)) {
-			found = true;
-			break;
-		}
-	}
-	if (found) {
-		sas_ex_join_wide_port(dev, phy_id);
+		return res;
+
+	if (sas_ex_join_wide_port(dev, phy_id))
 		return 0;
-	}
+
 	res = sas_ex_discover_devices(dev, phy_id);
-	if (!res)
-		goto out;
+	if (res)
+		return res;
 	list_for_each_entry(child, &dev->ex_dev.children, siblings) {
 		if (SAS_ADDR(child->sas_addr) ==
 		    SAS_ADDR(ex_phy->attached_sas_addr)) {
@@ -1978,7 +1964,6 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 			break;
 		}
 	}
-out:
 	return res;
 }
 




* [ 61/73] iscsi-target: Drop bogus struct file usage for iSCSI/SCTP
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (59 preceding siblings ...)
  2012-07-31  4:44 ` [ 60/73] [SCSI] libsas: fix sas_discover_devices return code handling Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 62/73] mmc: sdhci-pci: CaFe has broken card detection Ben Hutchings
                   ` (13 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Al Viro, Andy Grover,
	Hannes Reinecke, Christoph Hellwig, Nicholas Bellinger

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@ZenIV.linux.org.uk>

commit bf6932f44a7b3fa7e2246a8b18a44670e5eab6c2 upstream.

From Al Viro:

	BTW, speaking of struct file treatment related to sockets -
        there's this piece of code in iscsi:
        /*
         * The SCTP stack needs struct socket->file.
         */
        if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
            (np->np_network_transport == ISCSI_SCTP_UDP)) {
                if (!new_sock->file) {
                        new_sock->file = kzalloc(
                                        sizeof(struct file), GFP_KERNEL);

For one thing, as far as I can see it's not true - sctp does *not* depend on
socket->file being non-NULL; it does, in one place, check socket->file->f_flags
for O_NONBLOCK, but there it treats NULL socket->file as "flag not set".
Which is the case here anyway - the fake struct file created in
__iscsi_target_login_thread() (and in iscsi_target_setup_login_socket(), with
the same excuse) do *not* get that flag set.

Moreover, it's a bloody serious violation of a bunch of asserts in VFS;
all struct file instances should come from filp_cachep, via get_empty_filp()
(or alloc_file(), which is a wrapper for it).  FWIW, I'm very tempted to
do this and be done with the entire mess:

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Grover <agrover@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/target/iscsi/iscsi_target.c       |   22 ++---------
 drivers/target/iscsi/iscsi_target_core.h  |    2 -
 drivers/target/iscsi/iscsi_target_login.c |   60 ++---------------------------
 3 files changed, 6 insertions(+), 78 deletions(-)

--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -427,19 +427,8 @@ int iscsit_reset_np_thread(
 
 int iscsit_del_np_comm(struct iscsi_np *np)
 {
-	if (!np->np_socket)
-		return 0;
-
-	/*
-	 * Some network transports allocate their own struct sock->file,
-	 * see  if we need to free any additional allocated resources.
-	 */
-	if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-		kfree(np->np_socket->file);
-		np->np_socket->file = NULL;
-	}
-
-	sock_release(np->np_socket);
+	if (np->np_socket)
+		sock_release(np->np_socket);
 	return 0;
 }
 
@@ -4105,13 +4094,8 @@ int iscsit_close_connection(
 	kfree(conn->conn_ops);
 	conn->conn_ops = NULL;
 
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	conn->thread_set = NULL;
 
 	pr_debug("Moving to TARG_CONN_STATE_FREE.\n");
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -224,7 +224,6 @@ enum iscsi_timer_flags_table {
 /* Used for struct iscsi_np->np_flags */
 enum np_flags_table {
 	NPF_IP_NETWORK		= 0x00,
-	NPF_SCTP_STRUCT_FILE	= 0x01 /* Bugfix */
 };
 
 /* Used for struct iscsi_np->np_thread_state */
@@ -511,7 +510,6 @@ struct iscsi_conn {
 	u16			local_port;
 	int			net_size;
 	u32			auth_id;
-#define CONNFLAG_SCTP_STRUCT_FILE			0x01
 	u32			conn_flags;
 	/* Used for iscsi_tx_login_rsp() */
 	u32			login_itt;
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -793,22 +793,6 @@ int iscsi_target_setup_login_socket(
 	}
 	np->np_socket = sock;
 	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!sock->file) {
-			sock->file = kzalloc(sizeof(struct file), GFP_KERNEL);
-			if (!sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				ret = -ENOMEM;
-				goto fail;
-			}
-			np->np_flags |= NPF_SCTP_STRUCT_FILE;
-		}
-	}
-	/*
 	 * Setup the np->np_sockaddr from the passed sockaddr setup
 	 * in iscsi_target_configfs.c code..
 	 */
@@ -857,21 +841,15 @@ int iscsi_target_setup_login_socket(
 
 fail:
 	np->np_socket = NULL;
-	if (sock) {
-		if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-			kfree(sock->file);
-			sock->file = NULL;
-		}
-
+	if (sock)
 		sock_release(sock);
-	}
 	return ret;
 }
 
 static int __iscsi_target_login_thread(struct iscsi_np *np)
 {
 	u8 buffer[ISCSI_HDR_LEN], iscsi_opcode, zero_tsih = 0;
-	int err, ret = 0, ip_proto, sock_type, set_sctp_conn_flag, stop;
+	int err, ret = 0, ip_proto, sock_type, stop;
 	struct iscsi_conn *conn = NULL;
 	struct iscsi_login *login;
 	struct iscsi_portal_group *tpg = NULL;
@@ -882,7 +860,6 @@ static int __iscsi_target_login_thread(s
 	struct sockaddr_in6 sock_in6;
 
 	flush_signals(current);
-	set_sctp_conn_flag = 0;
 	sock = np->np_socket;
 	ip_proto = np->np_ip_proto;
 	sock_type = np->np_sock_type;
@@ -907,35 +884,12 @@ static int __iscsi_target_login_thread(s
 		spin_unlock_bh(&np->np_thread_lock);
 		goto out;
 	}
-	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!new_sock->file) {
-			new_sock->file = kzalloc(
-					sizeof(struct file), GFP_KERNEL);
-			if (!new_sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				sock_release(new_sock);
-				/* Get another socket */
-				return 1;
-			}
-			set_sctp_conn_flag = 1;
-		}
-	}
-
 	iscsi_start_login_thread_timer(np);
 
 	conn = kzalloc(sizeof(struct iscsi_conn), GFP_KERNEL);
 	if (!conn) {
 		pr_err("Could not allocate memory for"
 			" new connection\n");
-		if (set_sctp_conn_flag) {
-			kfree(new_sock->file);
-			new_sock->file = NULL;
-		}
 		sock_release(new_sock);
 		/* Get another socket */
 		return 1;
@@ -945,9 +899,6 @@ static int __iscsi_target_login_thread(s
 	conn->conn_state = TARG_CONN_STATE_FREE;
 	conn->sock = new_sock;
 
-	if (set_sctp_conn_flag)
-		conn->conn_flags |= CONNFLAG_SCTP_STRUCT_FILE;
-
 	pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n");
 	conn->conn_state = TARG_CONN_STATE_XPT_UP;
 
@@ -1195,13 +1146,8 @@ old_sess_out:
 		iscsi_release_param_list(conn->param_list);
 		conn->param_list = NULL;
 	}
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	kfree(conn);
 
 	if (tpg) {




* [ 62/73] mmc: sdhci-pci: CaFe has broken card detection
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (60 preceding siblings ...)
  2012-07-31  4:44 ` [ 61/73] iscsi-target: Drop bogus struct file usage for iSCSI/SCTP Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 63/73] ext4: don't let i_reserved_meta_blocks go negative Ben Hutchings
                   ` (12 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Daniel Drake, Chris Ball

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Drake <dsd@laptop.org>

commit 55fc05b7414274f17795cd0e8a3b1546f3649d5e upstream.

At http://dev.laptop.org/ticket/11980 we have determined that the
Marvell CaFe SDHCI controller reports bad card presence during
resume. It reports that no card is present even when it is.
This is a regression -- resume worked back around 2.6.37.

Around 400ms after resuming, a "card inserted" interrupt is
generated, at which point it starts reporting presence.

Work around this hardware oddity by setting the
SDHCI_QUIRK_BROKEN_CARD_DETECTION flag.
Thanks to Chris Ball for helping with diagnosis.

Signed-off-by: Daniel Drake <dsd@laptop.org>
[stable@: please apply to 3.0+]
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/mmc/host/sdhci-pci.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
index 69ef0be..504da71 100644
--- a/drivers/mmc/host/sdhci-pci.c
+++ b/drivers/mmc/host/sdhci-pci.c
@@ -157,6 +157,7 @@ static const struct sdhci_pci_fixes sdhci_ene_714 = {
 static const struct sdhci_pci_fixes sdhci_cafe = {
 	.quirks		= SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER |
 			  SDHCI_QUIRK_NO_BUSY_IRQ |
+			  SDHCI_QUIRK_BROKEN_CARD_DETECTION |
 			  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
 };
 




* [ 63/73] ext4: don't let i_reserved_meta_blocks go negative
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (61 preceding siblings ...)
  2012-07-31  4:44 ` [ 62/73] mmc: sdhci-pci: CaFe has broken card detection Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 64/73] ext4: undo ext4_calc_metadata_amount if we fail to claim space Ben Hutchings
                   ` (11 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Brian Foster, Theodore Tso

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream.

If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks.  In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch().  Warn if this
occurs and cap i_allocated_meta_blocks to i_reserved_meta_blocks to
avoid this problem.

This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:

Mar 28 08:58:02 localhost kernel: [  171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [  171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost

270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair.  The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.
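
Since the counters involved are unsigned, "going negative" really means
wrapping around to a huge value, which is what throws the dirty-clusters
accounting out of whack.  A minimal standalone sketch of the wraparound
(plain userspace C, not the ext4 code):

    #include <stdio.h>

    int main(void)
    {
            unsigned int reserved = 1;      /* ei->i_reserved_meta_blocks */
            unsigned int allocated = 3;     /* ei->i_allocated_meta_blocks */

            reserved -= allocated;          /* wraps modulo 2^32 */
            printf("%u\n", reserved);       /* prints 4294967294, not -2 */
            return 0;
    }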

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/ext4/inode.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a533a18..25f809d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -346,6 +346,15 @@ void ext4_da_update_reserve_space(struct inode *inode,
 		used = ei->i_reserved_data_blocks;
 	}
 
+	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
+		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
+			 "with only %d reserved metadata blocks\n", __func__,
+			 inode->i_ino, ei->i_allocated_meta_blocks,
+			 ei->i_reserved_meta_blocks);
+		WARN_ON(1);
+		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
+	}
+
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
 	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 64/73] ext4: undo ext4_calc_metadata_amount if we fail to claim space
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (62 preceding siblings ...)
  2012-07-31  4:44 ` [ 63/73] ext4: don't let i_reserved_meta_blocks go negative Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 65/73] ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements Ben Hutchings
                   ` (10 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Theodore Tso, Brian Foster

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <tytso@mit.edu>

commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream.

The function ext4_calc_metadata_amount() has side effects, although
it's not obvious from its name.  So if we fail to claim space,
regardless of whether we retry the claim or return an error, we
need to undo these side effects.

Otherwise we can end up incorrectly calculating the number of metadata
blocks needed for the operation, which was responsible for an xfstests
failure for test #271 when using an ext2 file system with delalloc
enabled.
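
The save/restore pattern the patch adopts, reduced to a standalone
toy in C (calc() stands in for ext4_calc_metadata_amount(); the
cached fields mirror i_da_metadata_calc_len and
i_da_metadata_calc_last_lblock, everything else is made up):

#include <stdio.h>

static int calc_len;			/* cached side-effect state */
static int calc_last;

static int calc(int lblock)
{
	calc_len++;			/* side effect */
	calc_last = lblock;		/* side effect */
	return calc_len;
}

static int claim(int need)
{
	(void)need;
	return -1;			/* pretend the claim always fails */
}

int main(void)
{
	int save_len = calc_len;	/* snapshot before the call */
	int save_last = calc_last;

	if (claim(calc(42))) {
		calc_len = save_len;	/* undo the side effects */
		calc_last = save_last;
	}
	printf("len=%d last=%d\n", calc_len, calc_last);
	return 0;
}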

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/ext4/inode.c |   32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 25f809d..89b59cb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1182,6 +1182,17 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	unsigned int md_needed;
 	int ret;
+	ext4_lblk_t save_last_lblock;
+	int save_len;
+
+	/*
+	 * We will charge metadata quota at writeout time; this saves
+	 * us from metadata over-estimation, though we may go over by
+	 * a small amount in the end.  Here we just reserve for data.
+	 */
+	ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
+	if (ret)
+		return ret;
 
 	/*
 	 * recalculate the amount of metadata blocks to reserve
@@ -1190,32 +1201,31 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
 	 */
 repeat:
 	spin_lock(&ei->i_block_reservation_lock);
+	/*
+	 * ext4_calc_metadata_amount() has side effects, which we have
+	 * to be prepared to undo if we fail to claim space.
+	 */
+	save_len = ei->i_da_metadata_calc_len;
+	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
 	md_needed = EXT4_NUM_B2C(sbi,
 				 ext4_calc_metadata_amount(inode, lblock));
 	trace_ext4_da_reserve_space(inode, md_needed);
-	spin_unlock(&ei->i_block_reservation_lock);
 
 	/*
-	 * We will charge metadata quota at writeout time; this saves
-	 * us from metadata over-estimation, though we may go over by
-	 * a small amount in the end.  Here we just reserve for data.
-	 */
-	ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
-	if (ret)
-		return ret;
-	/*
 	 * We do still charge estimated metadata to the sb though;
 	 * we cannot afford to run out of free blocks.
 	 */
 	if (ext4_claim_free_clusters(sbi, md_needed + 1, 0)) {
-		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
+		ei->i_da_metadata_calc_len = save_len;
+		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
+		spin_unlock(&ei->i_block_reservation_lock);
 		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
 			yield();
 			goto repeat;
 		}
+		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
 		return -ENOSPC;
 	}
-	spin_lock(&ei->i_block_reservation_lock);
 	ei->i_reserved_data_blocks++;
 	ei->i_reserved_meta_blocks += md_needed;
 	spin_unlock(&ei->i_block_reservation_lock);



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 65/73] ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (63 preceding siblings ...)
  2012-07-31  4:44 ` [ 64/73] ext4: undo ext4_calc_metadata_amount if we fail to claim space Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 66/73] locks: fix checking of fcntl_setlease argument Ben Hutchings
                   ` (9 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Mark Brown, Liam Girdwood

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Brown <broonie@opensource.wolfsonmicro.com>

commit 0ff97ebf0804d2e519d578fcb4db03f104d2ca8c upstream.

Ever since the DAPM performance improvements we've been marking all widgets
as not dirty after each DAPM run.  Since _PRE and _POST events aren't part
of the DAPM graph, this has rendered them non-functional: they will never
be marked dirty again and thus will never run again.

Fix this by skipping them when marking widgets as not dirty.
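
A toy model of the fix in standalone C (the widget ids are
invented): widgets outside the graph are never re-dirtied by any
graph walk, so they have to survive the wholesale clearing:

#include <stdio.h>

enum widget_id { WIDGET_PRE, WIDGET_POST, WIDGET_MIXER };

struct widget {
	enum widget_id id;
	int dirty;
};

int main(void)
{
	struct widget w[] = {
		{ WIDGET_PRE, 1 }, { WIDGET_MIXER, 1 }, { WIDGET_POST, 1 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(w) / sizeof(w[0]); i++) {
		switch (w[i].id) {
		case WIDGET_PRE:
		case WIDGET_POST:
			break;		/* keep these dirty forever */
		default:
			w[i].dirty = 0;	/* the graph re-dirties these */
		}
	}
	for (i = 0; i < sizeof(w) / sizeof(w[0]); i++)
		printf("widget %u dirty=%d\n", i, w[i].dirty);
	return 0;
}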

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 sound/soc/soc-dapm.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index f7a13f7..025060b 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -1598,7 +1598,15 @@ static int dapm_power_widgets(struct snd_soc_dapm_context *dapm, int event)
 	}
 
 	list_for_each_entry(w, &card->widgets, list) {
-		list_del_init(&w->dirty);
+		switch (w->id) {
+		case snd_soc_dapm_pre:
+		case snd_soc_dapm_post:
+			/* These widgets always need to be powered */
+			break;
+		default:
+			list_del_init(&w->dirty);
+			break;
+		}
 
 		if (w->power) {
 			d = w->dapm;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 66/73] locks: fix checking of fcntl_setlease argument
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (64 preceding siblings ...)
  2012-07-31  4:44 ` [ 65/73] ASoC: dapm: Fix _PRE and _POST events for DAPM performance improvements Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 67/73] ACPI/AC: prevent OOPS on some boxes due to missing power_supply_register() return value check Ben Hutchings
                   ` (8 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, J. Bruce Fields, J. Bruce Fields

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "J. Bruce Fields" <bfields@fieldses.org>

commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream.

The only checks of the long argument passed to fcntl(fd, F_SETLEASE, ...)
are done after converting the long to an int.  Thus some illegal values
may slip through and cause problems in later code.

[ They actually *don't* cause problems in mainline, as of Dave Jones's
  commit 8d657eb3b438 "Remove easily user-triggerable BUG from
  generic_setlease", but we should fix this anyway.  And this patch will
  be necessary to fix real bugs on earlier kernels. ]
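
A standalone demonstration of the bug class (assumes 64-bit longs;
check_as_int() and check_as_long() are stand-ins for assign_type()
before and after this patch):

#include <stdio.h>
#include <fcntl.h>

static int check_as_int(int type)
{
	return type == F_RDLCK || type == F_WRLCK || type == F_UNLCK;
}

static int check_as_long(long type)
{
	return type == F_RDLCK || type == F_WRLCK || type == F_UNLCK;
}

int main(void)
{
	/* low 32 bits spell a valid lease type, the full long does not */
	long arg = 0x100000000L | F_UNLCK;

	printf("as int:  %s\n", check_as_int(arg) ? "accepted" : "rejected");
	printf("as long: %s\n", check_as_long(arg) ? "accepted" : "rejected");
	return 0;
}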

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/locks.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index fce6238..82c3533 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -308,7 +308,7 @@ static int flock_make_lock(struct file *filp, struct file_lock **lock,
 	return 0;
 }
 
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock *fl, long type)
 {
 	switch (type) {
 	case F_RDLCK:
@@ -445,7 +445,7 @@ static const struct lock_manager_operations lease_manager_ops = {
 /*
  * Initialize a lease, use the default lock manager operations
  */
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, long type, struct file_lock *fl)
  {
 	if (assign_type(fl, type) != 0)
 		return -EINVAL;
@@ -463,7 +463,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
 }
 
 /* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lock *lease_alloc(struct file *filp, long type)
 {
 	struct file_lock *fl = locks_alloc_lock();
 	int error = -ENOMEM;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 67/73] ACPI/AC: prevent OOPS on some boxes due to missing power_supply_register() return value check
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (65 preceding siblings ...)
  2012-07-31  4:44 ` [ 66/73] locks: fix checking of fcntl_setlease argument Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 68/73] drm/radeon: fix bo creation retry path Ben Hutchings
                   ` (7 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Lan Tianyu, Len Brown

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lan Tianyu <tianyu.lan@intel.com>

commit f197ac13f6eeb351b31250b9ab7d0da17434ea36 upstream.

In ac.c, power_supply_register()'s return value is not checked.

As a result, the driver's add() ops may return success
even though the device failed to initialize.

For example, some BIOSes describe two ACADs in the same DSDT.
The second ACAD device will fail to register,
but the ACPI driver's add() ops still returns successfully.
The ACPI device will then receive an ACPI notification and cause
an OOPS.

https://bugzilla.redhat.com/show_bug.cgi?id=772730

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/acpi/ac.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
index 6512b20..d1fcbc0 100644
--- a/drivers/acpi/ac.c
+++ b/drivers/acpi/ac.c
@@ -292,7 +292,9 @@ static int acpi_ac_add(struct acpi_device *device)
 	ac->charger.properties = ac_props;
 	ac->charger.num_properties = ARRAY_SIZE(ac_props);
 	ac->charger.get_property = get_ac_property;
-	power_supply_register(&ac->device->dev, &ac->charger);
+	result = power_supply_register(&ac->device->dev, &ac->charger);
+	if (result)
+		goto end;
 
 	printk(KERN_INFO PREFIX "%s [%s] (%s)\n",
 	       acpi_device_name(device), acpi_device_bid(device),



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 68/73] drm/radeon: fix bo creation retry path
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (66 preceding siblings ...)
  2012-07-31  4:44 ` [ 67/73] ACPI/AC: prevent OOPS on some boxes due to missing power_supply_register() return value check Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 69/73] drm/radeon: fix non-relevant error message Ben Hutchings
                   ` (6 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Jerome Glisse, Michel Dänzer,
	Christian König, Dave Airlie

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jerome Glisse <jglisse@redhat.com>

commit d1c7871ddb1f588b8eb35affd9ee1a3d5e11cd0c upstream.

The retry label was in the wrong place in the function, leading
to a memory leak.
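
The shape of the leak, reduced to a standalone C sketch (the
placement-failure logic is made up; compare the corrected label
placement in the patch below):

#include <stdlib.h>

struct bo { int domain; };

static int place(struct bo *bo, int domain)
{
	static int attempts;

	(void)bo; (void)domain;
	return ++attempts < 3 ? -1 : 0;	/* fail twice, then succeed */
}

int main(void)
{
	struct bo *bo;
	int domain = 1;
retry:
	bo = malloc(sizeof(*bo));	/* re-run on every retry: leak */
	if (!bo)
		return 1;
	if (place(bo, domain)) {
		domain = 0;		/* fall back to another domain */
		goto retry;		/* previous bo is never freed */
	}
	free(bo);
	return 0;
}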

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/radeon/radeon_object.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -117,7 +117,6 @@ int radeon_bo_create(struct radeon_devic
 		return -ENOMEM;
 	}
 
-retry:
 	bo = kzalloc(sizeof(struct radeon_bo), GFP_KERNEL);
 	if (bo == NULL)
 		return -ENOMEM;
@@ -130,6 +129,8 @@ retry:
 	bo->gem_base.driver_private = NULL;
 	bo->surface_reg = -1;
 	INIT_LIST_HEAD(&bo->list);
+
+retry:
 	radeon_ttm_placement_from_domain(bo, domain);
 	/* Kernel allocation are uninterruptible */
 	mutex_lock(&rdev->vram_mutex);



^ permalink raw reply	[flat|nested] 94+ messages in thread

* [ 69/73] drm/radeon: fix non-relevant error message
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (67 preceding siblings ...)
  2012-07-31  4:44 ` [ 68/73] drm/radeon: fix bo creation retry path Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 70/73] drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2) Ben Hutchings
                   ` (5 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Jerome Glisse, Dave Airlie

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jerome Glisse <jglisse@redhat.com>

commit 8d1c702aa0b2c4b22b0742b72a1149d91690674b upstream.

We want to print "link status query failed" only if it's
an unexpected failure.  If we query to see whether we need
link training, it might be because there is nothing
connected, in which case the link status query is entitled
to fail.

To avoid printing a failure message when the failure is
expected, move the message to the proper place.
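
A toy model of the change in standalone C (the helpers are
invented): the shared query helper stays silent, and only the
callers for which a failure is unexpected -- the link training
loops -- print the error:

#include <stdio.h>

static int get_link_status(int connected)
{
	return connected ? 0 : -1;	/* fails when nothing is plugged in */
}

static int needs_link_train(int connected)
{
	/* expected to fail when unplugged: no message here */
	return get_link_status(connected) == 0;
}

static void train_link(int connected)
{
	if (get_link_status(connected) < 0)
		fprintf(stderr, "displayport link status failed\n");
}

int main(void)
{
	needs_link_train(0);		/* silent: failure is expected */
	train_link(1);			/* would report a real failure */
	return 0;
}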

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/radeon/atombios_dp.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
index 0355536..7712cf5 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -22,6 +22,7 @@
  *
  * Authors: Dave Airlie
  *          Alex Deucher
+ *          Jerome Glisse
  */
 #include "drmP.h"
 #include "radeon_drm.h"
@@ -654,7 +655,6 @@ static bool radeon_dp_get_link_status(struct radeon_connector *radeon_connector,
 	ret = radeon_dp_aux_native_read(radeon_connector, DP_LANE0_1_STATUS,
 					link_status, DP_LINK_STATUS_SIZE, 100);
 	if (ret <= 0) {
-		DRM_ERROR("displayport link status failed\n");
 		return false;
 	}
 
@@ -833,8 +833,10 @@ static int radeon_dp_link_train_cr(struct radeon_dp_link_train_info *dp_info)
 		else
 			mdelay(dp_info->rd_interval * 4);
 
-		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status))
+		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status)) {
+			DRM_ERROR("displayport link status failed\n");
 			break;
+		}
 
 		if (dp_clock_recovery_ok(dp_info->link_status, dp_info->dp_lane_count)) {
 			clock_recovery = true;
@@ -896,8 +898,10 @@ static int radeon_dp_link_train_ce(struct radeon_dp_link_train_info *dp_info)
 		else
 			mdelay(dp_info->rd_interval * 4);
 
-		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status))
+		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status)) {
+			DRM_ERROR("displayport link status failed\n");
 			break;
+		}
 
 		if (dp_channel_eq_ok(dp_info->link_status, dp_info->dp_lane_count)) {
 			channel_eq = true;



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 70/73] drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2)
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (68 preceding siblings ...)
  2012-07-31  4:44 ` [ 69/73] drm/radeon: fix non-relevant error message Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 71/73] drm/radeon: on hotplug force link training to happen (v2) Ben Hutchings
                   ` (4 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Jerome Glisse, Alex Deucher, Dave Airlie

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jerome Glisse <jglisse@redhat.com>

commit 266dcba541a1ef7e5d82d9e67c67fde2910636e8 upstream.

No need to retrain the link for passive adapters.

v2: agd5f
- no passive DP to VGA adapters, update comments
- assign radeon_connector_atom_dig after we are sure
  we have a digital connector as analog connectors
  have different private data.
- get new sink type before checking for retrain.  No
  need to check if it's no longer a DP connection.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/radeon/radeon_connectors.c |   29 ++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index 2914c57..3524f17 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -64,14 +64,27 @@ void radeon_connector_hotplug(struct drm_connector *connector)
 
 	/* just deal with DP (not eDP) here. */
 	if (connector->connector_type == DRM_MODE_CONNECTOR_DisplayPort) {
-		int saved_dpms = connector->dpms;
-
-		/* Only turn off the display it it's physically disconnected */
-		if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd))
-			drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
-		else if (radeon_dp_needs_link_train(radeon_connector))
-			drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
-		connector->dpms = saved_dpms;
+		struct radeon_connector_atom_dig *dig_connector =
+			radeon_connector->con_priv;
+
+		/* if existing sink type was not DP no need to retrain */
+		if (dig_connector->dp_sink_type != CONNECTOR_OBJECT_ID_DISPLAYPORT)
+			return;
+
+		/* first get sink type as it may be reset after (un)plug */
+		dig_connector->dp_sink_type = radeon_dp_getsinktype(radeon_connector);
+		/* don't do anything if sink is not display port, i.e.,
+		 * passive dp->(dvi|hdmi) adaptor
+		 */
+		if (dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_DISPLAYPORT) {
+			int saved_dpms = connector->dpms;
+			/* Only turn off the display if it's physically disconnected */
+			if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd))
+				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+			else if (radeon_dp_needs_link_train(radeon_connector))
+				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
+			connector->dpms = saved_dpms;
+		}
 	}
 }
 



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 71/73] drm/radeon: on hotplug force link training to happen (v2)
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (69 preceding siblings ...)
  2012-07-31  4:44 ` [ 70/73] drm/radeon: fix hotplug of DP to DVI|HDMI passive adapters (v2) Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 72/73] Btrfs: call the ordered free operation without any locks held Ben Hutchings
                   ` (3 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, Jerome Glisse, Alex Deucher, Dave Airlie

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jerome Glisse <jglisse@redhat.com>

commit ca2ccde5e2f24a792caa4cca919fc5c6f65d1887 upstream.

To have DP behave like VGA/DVI we need to retrain the link
on hotplug.  For this to happen we need to force link
training by setting the connector's dpms state to off
before turning it back on again.

v2: agd5f
- drop the dp_get_link_status() change in atombios_dp.c
  for now.  We still need the dpms OFF change.
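
The forced OFF is needed because the helper returns early when the
requested state matches the recorded one.  A toy model in standalone
C (the DPMS values match drm_mode.h; the helper body is, of course,
reduced):

#include <stdio.h>

#define DRM_MODE_DPMS_ON	0
#define DRM_MODE_DPMS_OFF	3

static int recorded_dpms = DRM_MODE_DPMS_ON;

static void set_dpms(int mode)
{
	if (mode == recorded_dpms)
		return;			/* no-op: state unchanged */
	recorded_dpms = mode;
	printf("state change, dpms=%d (ON retrains the link)\n", mode);
}

int main(void)
{
	set_dpms(DRM_MODE_DPMS_ON);		/* ignored: already ON */
	recorded_dpms = DRM_MODE_DPMS_OFF;	/* the patch's trick */
	set_dpms(DRM_MODE_DPMS_ON);		/* now actually retrains */
	return 0;
}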

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/radeon/radeon_connectors.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index 3524f17..895e628 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -79,10 +79,16 @@ void radeon_connector_hotplug(struct drm_connector *connector)
 		if (dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_DISPLAYPORT) {
 			int saved_dpms = connector->dpms;
 			/* Only turn off the display if it's physically disconnected */
-			if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd))
+			if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd)) {
 				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
-			else if (radeon_dp_needs_link_train(radeon_connector))
+			} else if (radeon_dp_needs_link_train(radeon_connector)) {
+				/* set it to OFF so that drm_helper_connector_dpms()
+				 * won't return immediately since the current state
+				 * is ON at this point.
+				 */
+				connector->dpms = DRM_MODE_DPMS_OFF;
 				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
+			}
 			connector->dpms = saved_dpms;
 		}
 	}



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 72/73] Btrfs: call the ordered free operation without any locks held
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (70 preceding siblings ...)
  2012-07-31  4:44 ` [ 71/73] drm/radeon: on hotplug force link training to happen (v2) Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  4:44 ` [ 73/73] nouveau: Fix alignment requirements on src and dst addresses Ben Hutchings
                   ` (2 subsequent siblings)
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Chris Mason

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chris Mason <chris.mason@fusionio.com>

commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream.

Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock held.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
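
The general shape of the pattern, as a standalone pthread sketch
(the types are invented; iput() is what makes sleeping possible in
the real callback):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct work {
	struct work *next;
	void (*free_fn)(struct work *);
};

static void free_work(struct work *w)
{
	printf("freeing %p with the lock dropped\n", (void *)w);
	free(w);
}

static void run_ordered(struct work **head)
{
	pthread_mutex_lock(&lock);
	while (*head) {
		struct work *w = *head;

		*head = w->next;		/* unlink under the lock */
		pthread_mutex_unlock(&lock);	/* callback runs unlocked */
		w->free_fn(w);
		pthread_mutex_lock(&lock);	/* re-take for the next item */
	}
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	struct work *b = malloc(sizeof(*b));
	struct work *a = malloc(sizeof(*a));

	if (!a || !b)
		return 1;
	a->next = b;
	a->free_fn = free_work;
	b->next = NULL;
	b->free_fn = free_work;
	run_ordered(&a);
	return 0;
}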

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/btrfs/async-thread.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 4270414..58b7d14 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -206,10 +206,17 @@ static noinline void run_ordered_completions(struct btrfs_workers *workers,
 
 		work->ordered_func(work);
 
-		/* now take the lock again and call the freeing code */
+		/* now take the lock again and drop our item from the list */
 		spin_lock(&workers->order_lock);
 		list_del(&work->order_list);
+		spin_unlock(&workers->order_lock);
+
+		/*
+		 * we don't want to call the ordered free functions
+		 * with the lock held though
+		 */
 		work->ordered_free(work);
+		spin_lock(&workers->order_lock);
 	}
 
 	spin_unlock(&workers->order_lock);



^ permalink raw reply related	[flat|nested] 94+ messages in thread

* [ 73/73] nouveau: Fix alignment requirements on src and dst addresses
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (71 preceding siblings ...)
  2012-07-31  4:44 ` [ 72/73] Btrfs: call the ordered free operation without any locks held Ben Hutchings
@ 2012-07-31  4:44 ` Ben Hutchings
  2012-07-31  5:00 ` [ 00/73] 3.2.25-stable review Ben Hutchings
  2012-08-01 12:55 ` Steven Rostedt
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  4:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: torvalds, akpm, alan, Maarten Lankhorst

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>

commit ce806a30470bcd846d148bf39d46de3ad7748228 upstream.

Linear copy works by adding the offset to the buffer address,
which may end up not being 16-byte aligned.

Some tests I've written for prime_pcopy show that the engine
handles this correctly, so the restriction on the lowest 4 bits
of the address can be lifted safely.

The comments added were generated by envyas, I think because I
used a newer version.
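
For context, each dispatch-table entry carries a mask of value bits
that must not be set (the ~0x... notation in the .fuc source; it
assembles to the 0x0000000f words visible in the data diff).  An
illustrative model in standalone C, not the actual microcode:

#include <stdint.h>
#include <stdio.h>

static int method_valid(uint32_t value, uint32_t forbidden_bits)
{
	return (value & forbidden_bits) == 0;
}

int main(void)
{
	uint32_t addr = 0x10000008;	/* only 8-byte aligned */

	/* 0x0000000f demanded 16-byte alignment; 0x00000000 accepts all */
	printf("old mask: %s\n",
	       method_valid(addr, 0x0000000f) ? "ok" : "rejected");
	printf("new mask: %s\n",
	       method_valid(addr, 0x00000000) ? "ok" : "rejected");
	return 0;
}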

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
[bwh: Backported to 3.2: no # prefixes in nva3_copy.fuc]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/gpu/drm/nouveau/nva3_copy.fuc   |    4 +-
 drivers/gpu/drm/nouveau/nva3_copy.fuc.h |   94 +++++++++++++++++++++++++++++--
 drivers/gpu/drm/nouveau/nvc0_copy.fuc.h |   87 ++++++++++++++++++++++++++--
 3 files changed, 175 insertions(+), 10 deletions(-)

--- a/drivers/gpu/drm/nouveau/nva3_copy.fuc
+++ b/drivers/gpu/drm/nouveau/nva3_copy.fuc
@@ -118,9 +118,9 @@ dispatch_dma:
 // mthd 0x030c-0x0340, various stuff
 .b16 0xc3 14
 .b32 ctx_src_address_high           ~0x000000ff
-.b32 ctx_src_address_low            ~0xfffffff0
+.b32 ctx_src_address_low            ~0xffffffff
 .b32 ctx_dst_address_high           ~0x000000ff
-.b32 ctx_dst_address_low            ~0xfffffff0
+.b32 ctx_dst_address_low            ~0xffffffff
 .b32 ctx_src_pitch                  ~0x0007ffff
 .b32 ctx_dst_pitch                  ~0x0007ffff
 .b32 ctx_xcnt                       ~0x0000ffff
--- a/drivers/gpu/drm/nouveau/nva3_copy.fuc.h
+++ b/drivers/gpu/drm/nouveau/nva3_copy.fuc.h
@@ -1,37 +1,72 @@
-uint32_t nva3_pcopy_data[] = {
+u32 nva3_pcopy_data[] = {
+/* 0x0000: ctx_object */
 	0x00000000,
+/* 0x0004: ctx_dma */
+/* 0x0004: ctx_dma_query */
 	0x00000000,
+/* 0x0008: ctx_dma_src */
 	0x00000000,
+/* 0x000c: ctx_dma_dst */
 	0x00000000,
+/* 0x0010: ctx_query_address_high */
 	0x00000000,
+/* 0x0014: ctx_query_address_low */
 	0x00000000,
+/* 0x0018: ctx_query_counter */
 	0x00000000,
+/* 0x001c: ctx_src_address_high */
 	0x00000000,
+/* 0x0020: ctx_src_address_low */
 	0x00000000,
+/* 0x0024: ctx_src_pitch */
 	0x00000000,
+/* 0x0028: ctx_src_tile_mode */
 	0x00000000,
+/* 0x002c: ctx_src_xsize */
 	0x00000000,
+/* 0x0030: ctx_src_ysize */
 	0x00000000,
+/* 0x0034: ctx_src_zsize */
 	0x00000000,
+/* 0x0038: ctx_src_zoff */
 	0x00000000,
+/* 0x003c: ctx_src_xoff */
 	0x00000000,
+/* 0x0040: ctx_src_yoff */
 	0x00000000,
+/* 0x0044: ctx_src_cpp */
 	0x00000000,
+/* 0x0048: ctx_dst_address_high */
 	0x00000000,
+/* 0x004c: ctx_dst_address_low */
 	0x00000000,
+/* 0x0050: ctx_dst_pitch */
 	0x00000000,
+/* 0x0054: ctx_dst_tile_mode */
 	0x00000000,
+/* 0x0058: ctx_dst_xsize */
 	0x00000000,
+/* 0x005c: ctx_dst_ysize */
 	0x00000000,
+/* 0x0060: ctx_dst_zsize */
 	0x00000000,
+/* 0x0064: ctx_dst_zoff */
 	0x00000000,
+/* 0x0068: ctx_dst_xoff */
 	0x00000000,
+/* 0x006c: ctx_dst_yoff */
 	0x00000000,
+/* 0x0070: ctx_dst_cpp */
 	0x00000000,
+/* 0x0074: ctx_format */
 	0x00000000,
+/* 0x0078: ctx_swz_const0 */
 	0x00000000,
+/* 0x007c: ctx_swz_const1 */
 	0x00000000,
+/* 0x0080: ctx_xcnt */
 	0x00000000,
+/* 0x0084: ctx_ycnt */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -63,6 +98,7 @@ uint32_t nva3_pcopy_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+/* 0x0100: dispatch_table */
 	0x00010000,
 	0x00000000,
 	0x00000000,
@@ -73,6 +109,7 @@ uint32_t nva3_pcopy_data[] = {
 	0x00010162,
 	0x00000000,
 	0x00030060,
+/* 0x0128: dispatch_dma */
 	0x00010170,
 	0x00000000,
 	0x00010170,
@@ -118,11 +155,11 @@ uint32_t nva3_pcopy_data[] = {
 	0x0000001c,
 	0xffffff00,
 	0x00000020,
-	0x0000000f,
+	0x00000000,
 	0x00000048,
 	0xffffff00,
 	0x0000004c,
-	0x0000000f,
+	0x00000000,
 	0x00000024,
 	0xfff80000,
 	0x00000050,
@@ -146,7 +183,8 @@ uint32_t nva3_pcopy_data[] = {
 	0x00000800,
 };
 
-uint32_t nva3_pcopy_code[] = {
+u32 nva3_pcopy_code[] = {
+/* 0x0000: main */
 	0x04fe04bd,
 	0x3517f000,
 	0xf10010fe,
@@ -158,23 +196,31 @@ uint32_t nva3_pcopy_code[] = {
 	0x17f11031,
 	0x27f01200,
 	0x0012d003,
+/* 0x002f: spin */
 	0xf40031f4,
 	0x0ef40028,
+/* 0x0035: ih */
 	0x8001cffd,
 	0xf40812c4,
 	0x21f4060b,
+/* 0x0041: ih_no_chsw */
 	0x0412c472,
 	0xf4060bf4,
+/* 0x004a: ih_no_cmd */
 	0x11c4c321,
 	0x4001d00c,
+/* 0x0052: swctx */
 	0x47f101f8,
 	0x4bfe7700,
 	0x0007fe00,
 	0xf00204b9,
 	0x01f40643,
 	0x0604fa09,
+/* 0x006b: swctx_load */
 	0xfa060ef4,
+/* 0x006e: swctx_done */
 	0x03f80504,
+/* 0x0072: chsw */
 	0x27f100f8,
 	0x23cf1400,
 	0x1e3fc800,
@@ -183,18 +229,22 @@ uint32_t nva3_pcopy_code[] = {
 	0x1e3af052,
 	0xf00023d0,
 	0x24d00147,
+/* 0x0093: chsw_no_unload */
 	0xcf00f880,
 	0x3dc84023,
 	0x220bf41e,
 	0xf40131f4,
 	0x57f05221,
 	0x0367f004,
+/* 0x00a8: chsw_load_ctx_dma */
 	0xa07856bc,
 	0xb6018068,
 	0x87d00884,
 	0x0162b600,
+/* 0x00bb: chsw_finish_load */
 	0xf0f018f4,
 	0x23d00237,
+/* 0x00c3: dispatch */
 	0xf100f880,
 	0xcf190037,
 	0x33cf4032,
@@ -202,6 +252,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x1024b607,
 	0x010057f1,
 	0x74bd64bd,
+/* 0x00dc: dispatch_loop */
 	0x58005658,
 	0x50b60157,
 	0x0446b804,
@@ -211,6 +262,7 @@ uint32_t nva3_pcopy_code[] = {
 	0xb60276bb,
 	0x57bb0374,
 	0xdf0ef400,
+/* 0x0100: dispatch_valid_mthd */
 	0xb60246bb,
 	0x45bb0344,
 	0x01459800,
@@ -220,31 +272,41 @@ uint32_t nva3_pcopy_code[] = {
 	0xb0014658,
 	0x1bf40064,
 	0x00538009,
+/* 0x0127: dispatch_cmd */
 	0xf4300ef4,
 	0x55f90132,
 	0xf40c01f4,
+/* 0x0132: dispatch_invalid_bitfield */
 	0x25f0250e,
+/* 0x0135: dispatch_illegal_mthd */
 	0x0125f002,
+/* 0x0138: dispatch_error */
 	0x100047f1,
 	0xd00042d0,
 	0x27f04043,
 	0x0002d040,
+/* 0x0148: hostirq_wait */
 	0xf08002cf,
 	0x24b04024,
 	0xf71bf400,
+/* 0x0154: dispatch_done */
 	0x1d0027f1,
 	0xd00137f0,
 	0x00f80023,
+/* 0x0160: cmd_nop */
+/* 0x0162: cmd_pm_trigger */
 	0x27f100f8,
 	0x34bd2200,
 	0xd00233f0,
 	0x00f80023,
+/* 0x0170: cmd_dma */
 	0x012842b7,
 	0xf00145b6,
 	0x43801e39,
 	0x0040b701,
 	0x0644b606,
 	0xf80043d0,
+/* 0x0189: cmd_exec_set_format */
 	0xf030f400,
 	0xb00001b0,
 	0x01b00101,
@@ -256,20 +318,26 @@ uint32_t nva3_pcopy_code[] = {
 	0x70b63847,
 	0x0232f401,
 	0x94bd84bd,
+/* 0x01b4: ncomp_loop */
 	0xb60f4ac4,
 	0xb4bd0445,
+/* 0x01bc: bpc_loop */
 	0xf404a430,
 	0xa5ff0f18,
 	0x00cbbbc0,
 	0xf40231f4,
+/* 0x01ce: cmp_c0 */
 	0x1bf4220e,
 	0x10c7f00c,
 	0xf400cbbb,
+/* 0x01da: cmp_c1 */
 	0xa430160e,
 	0x0c18f406,
 	0xbb14c7f0,
 	0x0ef400cb,
+/* 0x01e9: cmp_zero */
 	0x80c7f107,
+/* 0x01ed: bpc_next */
 	0x01c83800,
 	0xb60180b6,
 	0xb5b801b0,
@@ -280,6 +348,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x98110680,
 	0x68fd2008,
 	0x0502f400,
+/* 0x0216: dst_xcnt */
 	0x75fd64bd,
 	0x1c078000,
 	0xf10078fd,
@@ -304,6 +373,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x980056d0,
 	0x56d01f06,
 	0x1030f440,
+/* 0x0276: cmd_exec_set_surface_tiled */
 	0x579800f8,
 	0x6879c70a,
 	0xb66478c7,
@@ -311,9 +381,11 @@ uint32_t nva3_pcopy_code[] = {
 	0x0e76b060,
 	0xf0091bf4,
 	0x0ef40477,
+/* 0x0291: xtile64 */
 	0x027cf00f,
 	0xfd1170b6,
 	0x77f00947,
+/* 0x029d: xtileok */
 	0x0f5a9806,
 	0xfd115b98,
 	0xb7f000ab,
@@ -371,6 +443,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x67d00600,
 	0x0060b700,
 	0x0068d004,
+/* 0x0382: cmd_exec_set_surface_linear */
 	0x6cf000f8,
 	0x0260b702,
 	0x0864b602,
@@ -381,13 +454,16 @@ uint32_t nva3_pcopy_code[] = {
 	0xb70067d0,
 	0x98040060,
 	0x67d00957,
+/* 0x03ab: cmd_exec_wait */
 	0xf900f800,
 	0xf110f900,
 	0xb6080007,
+/* 0x03b6: loop */
 	0x01cf0604,
 	0x0114f000,
 	0xfcfa1bf4,
 	0xf800fc10,
+/* 0x03c5: cmd_exec_query */
 	0x0d34c800,
 	0xf5701bf4,
 	0xf103ab21,
@@ -417,6 +493,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x47f10153,
 	0x44b60800,
 	0x0045d006,
+/* 0x0438: query_counter */
 	0x03ab21f5,
 	0x080c47f1,
 	0x980644b6,
@@ -439,11 +516,13 @@ uint32_t nva3_pcopy_code[] = {
 	0x47f10153,
 	0x44b60800,
 	0x0045d006,
+/* 0x0492: cmd_exec */
 	0x21f500f8,
 	0x3fc803ab,
 	0x0e0bf400,
 	0x018921f5,
 	0x020047f1,
+/* 0x04a7: cmd_exec_no_format */
 	0xf11e0ef4,
 	0xb6081067,
 	0x77f00664,
@@ -451,19 +530,24 @@ uint32_t nva3_pcopy_code[] = {
 	0x981c0780,
 	0x67d02007,
 	0x4067d000,
+/* 0x04c2: cmd_exec_init_src_surface */
 	0x32f444bd,
 	0xc854bd02,
 	0x0bf4043f,
 	0x8221f50a,
 	0x0a0ef403,
+/* 0x04d4: src_tiled */
 	0x027621f5,
+/* 0x04db: cmd_exec_init_dst_surface */
 	0xf40749f0,
 	0x57f00231,
 	0x083fc82c,
 	0xf50a0bf4,
 	0xf4038221,
+/* 0x04ee: dst_tiled */
 	0x21f50a0e,
 	0x49f00276,
+/* 0x04f5: cmd_exec_kick */
 	0x0057f108,
 	0x0654b608,
 	0xd0210698,
@@ -473,6 +557,8 @@ uint32_t nva3_pcopy_code[] = {
 	0xc80054d0,
 	0x0bf40c3f,
 	0xc521f507,
+/* 0x0519: cmd_exec_done */
+/* 0x051b: cmd_wrcache_flush */
 	0xf100f803,
 	0xbd220027,
 	0x0133f034,
--- a/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h
+++ b/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h
@@ -1,34 +1,65 @@
-uint32_t nvc0_pcopy_data[] = {
+u32 nvc0_pcopy_data[] = {
+/* 0x0000: ctx_object */
 	0x00000000,
+/* 0x0004: ctx_query_address_high */
 	0x00000000,
+/* 0x0008: ctx_query_address_low */
 	0x00000000,
+/* 0x000c: ctx_query_counter */
 	0x00000000,
+/* 0x0010: ctx_src_address_high */
 	0x00000000,
+/* 0x0014: ctx_src_address_low */
 	0x00000000,
+/* 0x0018: ctx_src_pitch */
 	0x00000000,
+/* 0x001c: ctx_src_tile_mode */
 	0x00000000,
+/* 0x0020: ctx_src_xsize */
 	0x00000000,
+/* 0x0024: ctx_src_ysize */
 	0x00000000,
+/* 0x0028: ctx_src_zsize */
 	0x00000000,
+/* 0x002c: ctx_src_zoff */
 	0x00000000,
+/* 0x0030: ctx_src_xoff */
 	0x00000000,
+/* 0x0034: ctx_src_yoff */
 	0x00000000,
+/* 0x0038: ctx_src_cpp */
 	0x00000000,
+/* 0x003c: ctx_dst_address_high */
 	0x00000000,
+/* 0x0040: ctx_dst_address_low */
 	0x00000000,
+/* 0x0044: ctx_dst_pitch */
 	0x00000000,
+/* 0x0048: ctx_dst_tile_mode */
 	0x00000000,
+/* 0x004c: ctx_dst_xsize */
 	0x00000000,
+/* 0x0050: ctx_dst_ysize */
 	0x00000000,
+/* 0x0054: ctx_dst_zsize */
 	0x00000000,
+/* 0x0058: ctx_dst_zoff */
 	0x00000000,
+/* 0x005c: ctx_dst_xoff */
 	0x00000000,
+/* 0x0060: ctx_dst_yoff */
 	0x00000000,
+/* 0x0064: ctx_dst_cpp */
 	0x00000000,
+/* 0x0068: ctx_format */
 	0x00000000,
+/* 0x006c: ctx_swz_const0 */
 	0x00000000,
+/* 0x0070: ctx_swz_const1 */
 	0x00000000,
+/* 0x0074: ctx_xcnt */
 	0x00000000,
+/* 0x0078: ctx_ycnt */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -63,6 +94,7 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+/* 0x0100: dispatch_table */
 	0x00010000,
 	0x00000000,
 	0x00000000,
@@ -111,11 +143,11 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000010,
 	0xffffff00,
 	0x00000014,
-	0x0000000f,
+	0x00000000,
 	0x0000003c,
 	0xffffff00,
 	0x00000040,
-	0x0000000f,
+	0x00000000,
 	0x00000018,
 	0xfff80000,
 	0x00000044,
@@ -139,7 +171,8 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000800,
 };
 
-uint32_t nvc0_pcopy_code[] = {
+u32 nvc0_pcopy_code[] = {
+/* 0x0000: main */
 	0x04fe04bd,
 	0x3517f000,
 	0xf10010fe,
@@ -151,15 +184,20 @@ uint32_t nvc0_pcopy_code[] = {
 	0x17f11031,
 	0x27f01200,
 	0x0012d003,
+/* 0x002f: spin */
 	0xf40031f4,
 	0x0ef40028,
+/* 0x0035: ih */
 	0x8001cffd,
 	0xf40812c4,
 	0x21f4060b,
+/* 0x0041: ih_no_chsw */
 	0x0412c4ca,
 	0xf5070bf4,
+/* 0x004b: ih_no_cmd */
 	0xc4010221,
 	0x01d00c11,
+/* 0x0053: swctx */
 	0xf101f840,
 	0xfe770047,
 	0x47f1004b,
@@ -188,8 +226,11 @@ uint32_t nvc0_pcopy_code[] = {
 	0xf00204b9,
 	0x01f40643,
 	0x0604fa09,
+/* 0x00c3: swctx_load */
 	0xfa060ef4,
+/* 0x00c6: swctx_done */
 	0x03f80504,
+/* 0x00ca: chsw */
 	0x27f100f8,
 	0x23cf1400,
 	0x1e3fc800,
@@ -198,18 +239,22 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1e3af053,
 	0xf00023d0,
 	0x24d00147,
+/* 0x00eb: chsw_no_unload */
 	0xcf00f880,
 	0x3dc84023,
 	0x090bf41e,
 	0xf40131f4,
+/* 0x00fa: chsw_finish_load */
 	0x37f05321,
 	0x8023d002,
+/* 0x0102: dispatch */
 	0x37f100f8,
 	0x32cf1900,
 	0x0033cf40,
 	0x07ff24e4,
 	0xf11024b6,
 	0xbd010057,
+/* 0x011b: dispatch_loop */
 	0x5874bd64,
 	0x57580056,
 	0x0450b601,
@@ -219,6 +264,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0xbb0f08f4,
 	0x74b60276,
 	0x0057bb03,
+/* 0x013f: dispatch_valid_mthd */
 	0xbbdf0ef4,
 	0x44b60246,
 	0x0045bb03,
@@ -229,24 +275,33 @@ uint32_t nvc0_pcopy_code[] = {
 	0x64b00146,
 	0x091bf400,
 	0xf4005380,
+/* 0x0166: dispatch_cmd */
 	0x32f4300e,
 	0xf455f901,
 	0x0ef40c01,
+/* 0x0171: dispatch_invalid_bitfield */
 	0x0225f025,
+/* 0x0174: dispatch_illegal_mthd */
+/* 0x0177: dispatch_error */
 	0xf10125f0,
 	0xd0100047,
 	0x43d00042,
 	0x4027f040,
+/* 0x0187: hostirq_wait */
 	0xcf0002d0,
 	0x24f08002,
 	0x0024b040,
+/* 0x0193: dispatch_done */
 	0xf1f71bf4,
 	0xf01d0027,
 	0x23d00137,
+/* 0x019f: cmd_nop */
 	0xf800f800,
+/* 0x01a1: cmd_pm_trigger */
 	0x0027f100,
 	0xf034bd22,
 	0x23d00233,
+/* 0x01af: cmd_exec_set_format */
 	0xf400f800,
 	0x01b0f030,
 	0x0101b000,
@@ -258,20 +313,26 @@ uint32_t nvc0_pcopy_code[] = {
 	0x3847c701,
 	0xf40170b6,
 	0x84bd0232,
+/* 0x01da: ncomp_loop */
 	0x4ac494bd,
 	0x0445b60f,
+/* 0x01e2: bpc_loop */
 	0xa430b4bd,
 	0x0f18f404,
 	0xbbc0a5ff,
 	0x31f400cb,
 	0x220ef402,
+/* 0x01f4: cmp_c0 */
 	0xf00c1bf4,
 	0xcbbb10c7,
 	0x160ef400,
+/* 0x0200: cmp_c1 */
 	0xf406a430,
 	0xc7f00c18,
 	0x00cbbb14,
+/* 0x020f: cmp_zero */
 	0xf1070ef4,
+/* 0x0213: bpc_next */
 	0x380080c7,
 	0x80b601c8,
 	0x01b0b601,
@@ -283,6 +344,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1d08980e,
 	0xf40068fd,
 	0x64bd0502,
+/* 0x023c: dst_xcnt */
 	0x800075fd,
 	0x78fd1907,
 	0x1057f100,
@@ -307,15 +369,18 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1c069800,
 	0xf44056d0,
 	0x00f81030,
+/* 0x029c: cmd_exec_set_surface_tiled */
 	0xc7075798,
 	0x78c76879,
 	0x0380b664,
 	0xb06077c7,
 	0x1bf40e76,
 	0x0477f009,
+/* 0x02b7: xtile64 */
 	0xf00f0ef4,
 	0x70b6027c,
 	0x0947fd11,
+/* 0x02c3: xtileok */
 	0x980677f0,
 	0x5b980c5a,
 	0x00abfd0e,
@@ -374,6 +439,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0xb70067d0,
 	0xd0040060,
 	0x00f80068,
+/* 0x03a8: cmd_exec_set_surface_linear */
 	0xb7026cf0,
 	0xb6020260,
 	0x57980864,
@@ -384,12 +450,15 @@ uint32_t nvc0_pcopy_code[] = {
 	0x0060b700,
 	0x06579804,
 	0xf80067d0,
+/* 0x03d1: cmd_exec_wait */
 	0xf900f900,
 	0x0007f110,
 	0x0604b608,
+/* 0x03dc: loop */
 	0xf00001cf,
 	0x1bf40114,
 	0xfc10fcfa,
+/* 0x03eb: cmd_exec_query */
 	0xc800f800,
 	0x1bf40d34,
 	0xd121f570,
@@ -419,6 +488,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0x0153f026,
 	0x080047f1,
 	0xd00644b6,
+/* 0x045e: query_counter */
 	0x21f50045,
 	0x47f103d1,
 	0x44b6080c,
@@ -442,11 +512,13 @@ uint32_t nvc0_pcopy_code[] = {
 	0x080047f1,
 	0xd00644b6,
 	0x00f80045,
+/* 0x04b8: cmd_exec */
 	0x03d121f5,
 	0xf4003fc8,
 	0x21f50e0b,
 	0x47f101af,
 	0x0ef40200,
+/* 0x04cd: cmd_exec_no_format */
 	0x1067f11e,
 	0x0664b608,
 	0x800177f0,
@@ -454,18 +526,23 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1d079819,
 	0xd00067d0,
 	0x44bd4067,
+/* 0x04e8: cmd_exec_init_src_surface */
 	0xbd0232f4,
 	0x043fc854,
 	0xf50a0bf4,
 	0xf403a821,
+/* 0x04fa: src_tiled */
 	0x21f50a0e,
 	0x49f0029c,
+/* 0x0501: cmd_exec_init_dst_surface */
 	0x0231f407,
 	0xc82c57f0,
 	0x0bf4083f,
 	0xa821f50a,
 	0x0a0ef403,
+/* 0x0514: dst_tiled */
 	0x029c21f5,
+/* 0x051b: cmd_exec_kick */
 	0xf10849f0,
 	0xb6080057,
 	0x06980654,
@@ -475,7 +552,9 @@ uint32_t nvc0_pcopy_code[] = {
 	0x54d00546,
 	0x0c3fc800,
 	0xf5070bf4,
+/* 0x053f: cmd_exec_done */
 	0xf803eb21,
+/* 0x0541: cmd_wrcache_flush */
 	0x0027f100,
 	0xf034bd22,
 	0x23d00133,



^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [ 00/73] 3.2.25-stable review
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (72 preceding siblings ...)
  2012-07-31  4:44 ` [ 73/73] nouveau: Fix alignment requirements on src and dst addresses Ben Hutchings
@ 2012-07-31  5:00 ` Ben Hutchings
  2012-08-01 12:55 ` Steven Rostedt
  74 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31  5:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: stable, torvalds, akpm, alan

diff --git a/Makefile b/Makefile
index 80bb4fd..89b88d0 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 3
 PATCHLEVEL = 2
-SUBLEVEL = 24
-EXTRAVERSION =
+SUBLEVEL = 25
+EXTRAVERSION = -rc1
 NAME = Saber-toothed Squirrel
 
 # *DOCUMENTATION*
diff --git a/arch/arm/mach-omap2/opp.c b/arch/arm/mach-omap2/opp.c
index 9262a6b..6e75ae3 100644
--- a/arch/arm/mach-omap2/opp.c
+++ b/arch/arm/mach-omap2/opp.c
@@ -53,7 +53,7 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
 	omap_table_init = 1;
 
 	/* Lets now register with OPP library */
-	for (i = 0; i < opp_def_size; i++) {
+	for (i = 0; i < opp_def_size; i++, opp_def++) {
 		struct omap_hwmod *oh;
 		struct device *dev;
 
@@ -86,7 +86,6 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
 					__func__, opp_def->freq,
 					opp_def->hwmod_name, i, r);
 		}
-		opp_def++;
 	}
 
 	return 0;
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 559da19..578e5a0 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1016,7 +1016,8 @@
 /* Macros for setting and retrieving special purpose registers */
 #ifndef __ASSEMBLY__
 #define mfmsr()		({unsigned long rval; \
-			asm volatile("mfmsr %0" : "=r" (rval)); rval;})
+			asm volatile("mfmsr %0" : "=r" (rval) : \
+						: "memory"); rval;})
 #ifdef CONFIG_PPC_BOOK3S_64
 #define __mtmsrd(v, l)	asm volatile("mtmsrd %0," __stringify(l) \
 				     : : "r" (v) : "memory")
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index bf99cfa..6324008 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -245,9 +245,9 @@ __ftrace_make_nop(struct module *mod,
 
 	/*
 	 * On PPC32 the trampoline looks like:
-	 *  0x3d, 0x60, 0x00, 0x00  lis r11,sym@ha
-	 *  0x39, 0x6b, 0x00, 0x00  addi r11,r11,sym@l
-	 *  0x7d, 0x69, 0x03, 0xa6  mtctr r11
+	 *  0x3d, 0x80, 0x00, 0x00  lis r12,sym@ha
+	 *  0x39, 0x8c, 0x00, 0x00  addi r12,r12,sym@l
+	 *  0x7d, 0x89, 0x03, 0xa6  mtctr r12
 	 *  0x4e, 0x80, 0x04, 0x20  bctr
 	 */
 
@@ -262,9 +262,9 @@ __ftrace_make_nop(struct module *mod,
 	pr_devel(" %08x %08x ", jmp[0], jmp[1]);
 
 	/* verify that this is what we expect it to be */
-	if (((jmp[0] & 0xffff0000) != 0x3d600000) ||
-	    ((jmp[1] & 0xffff0000) != 0x396b0000) ||
-	    (jmp[2] != 0x7d6903a6) ||
+	if (((jmp[0] & 0xffff0000) != 0x3d800000) ||
+	    ((jmp[1] & 0xffff0000) != 0x398c0000) ||
+	    (jmp[2] != 0x7d8903a6) ||
 	    (jmp[3] != 0x4e800420)) {
 		printk(KERN_ERR "Not a trampoline\n");
 		return -EINVAL;
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 6e0073e..07c7bf4 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -26,12 +26,14 @@ static DEFINE_PER_CPU(struct cpuid, cpu_id);
 void __cpuinit cpu_init(void)
 {
 	struct cpuid *id = &per_cpu(cpu_id, smp_processor_id());
+	struct s390_idle_data *idle = &__get_cpu_var(s390_idle);
 
 	get_cpu_id(id);
 	atomic_inc(&init_mm.mm_count);
 	current->active_mm = &init_mm;
 	BUG_ON(current->mm);
 	enter_lazy_tlb(&init_mm, current);
+	memset(idle, 0, sizeof(*idle));
 }
 
 /*
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 3ea8728..1df64a8 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -1020,14 +1020,11 @@ static int __cpuinit smp_cpu_notify(struct notifier_block *self,
 	unsigned int cpu = (unsigned int)(long)hcpu;
 	struct cpu *c = &per_cpu(cpu_devices, cpu);
 	struct sys_device *s = &c->sysdev;
-	struct s390_idle_data *idle;
 	int err = 0;
 
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
-		idle = &per_cpu(s390_idle, cpu);
-		memset(idle, 0, sizeof(struct s390_idle_data));
 		err = sysfs_create_group(&s->kobj, &cpu_online_attr_group);
 		break;
 	case CPU_DEAD:
diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
index 563a09d..29c95d7 100644
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -297,20 +297,31 @@ static ssize_t reload_store(struct sys_device *dev,
 			    const char *buf, size_t size)
 {
 	unsigned long val;
-	int cpu = dev->id;
-	int ret = 0;
-	char *end;
+	int cpu;
+	ssize_t ret = 0, tmp_ret;
 
-	val = simple_strtoul(buf, &end, 0);
-	if (end == buf)
+	/* allow reload only from the BSP */
+	if (boot_cpu_data.cpu_index != dev->id)
 		return -EINVAL;
 
-	if (val == 1) {
-		get_online_cpus();
-		if (cpu_online(cpu))
-			ret = reload_for_cpu(cpu);
-		put_online_cpus();
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val != 1)
+		return size;
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		tmp_ret = reload_for_cpu(cpu);
+		if (tmp_ret != 0)
+			pr_warn("Error reloading microcode on CPU %d\n", cpu);
+
+		/* save retval of the first encountered reload error */
+		if (!ret)
+			ret = tmp_ret;
 	}
+	put_online_cpus();
 
 	if (!ret)
 		ret = size;
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 6dd8955..0951b81 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -521,3 +521,20 @@ static void sb600_disable_hpet_bar(struct pci_dev *dev)
 	}
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATI, 0x4385, sb600_disable_hpet_bar);
+
+/*
+ * Twinhead H12Y needs us to block out a region otherwise we map devices
+ * there and any access kills the box.
+ *
+ *   See: https://bugzilla.kernel.org/show_bug.cgi?id=10231
+ *
+ * Match off the LPC and svid/sdid (older kernels lose the bridge subvendor)
+ */
+static void __devinit twinhead_reserve_killing_zone(struct pci_dev *dev)
+{
+        if (dev->subsystem_vendor == 0x14FF && dev->subsystem_device == 0xA003) {
+                pr_info("Reserving memory on Twinhead H12Y\n");
+                request_mem_region(0xFFB00000, 0x100000, "twinhead");
+        }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x27B9, twinhead_reserve_killing_zone);
diff --git a/block/blk-core.c b/block/blk-core.c
index 15de223..49d9e91 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(blk_init_allocated_queue);
 
 int blk_get_queue(struct request_queue *q)
 {
-	if (likely(!test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (likely(!blk_queue_dead(q))) {
 		kobject_get(&q->kobj);
 		return 0;
 	}
@@ -754,7 +754,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
 	const bool is_sync = rw_is_sync(rw_flags) != 0;
 	int may_queue;
 
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
 	may_queue = elv_may_queue(q, rw_flags);
@@ -874,7 +874,7 @@ static struct request *get_request_wait(struct request_queue *q, int rw_flags,
 		struct io_context *ioc;
 		struct request_list *rl = &q->rq;
 
-		if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+		if (unlikely(blk_queue_dead(q)))
 			return NULL;
 
 		prepare_to_wait_exclusive(&rl->wait[is_sync], &wait,
diff --git a/block/blk-exec.c b/block/blk-exec.c
index a1ebceb..6053285 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,7 +50,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 {
 	int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
 
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (unlikely(blk_queue_dead(q))) {
 		rq->errors = -ENXIO;
 		if (rq->end_io)
 			rq->end_io(rq, rq->errors);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e7f9f65..f0b2ca8 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -425,7 +425,7 @@ queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
 	if (!entry->show)
 		return -EIO;
 	mutex_lock(&q->sysfs_lock);
-	if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)) {
+	if (blk_queue_dead(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
@@ -447,7 +447,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 
 	q = container_of(kobj, struct request_queue, kobj);
 	mutex_lock(&q->sysfs_lock);
-	if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)) {
+	if (blk_queue_dead(q)) {
 		mutex_unlock(&q->sysfs_lock);
 		return -ENOENT;
 	}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 4553245..5eed6a7 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -310,7 +310,7 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	struct request_queue *q = td->queue;
 
 	/* no throttling for dead queue */
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+	if (unlikely(blk_queue_dead(q)))
 		return NULL;
 
 	rcu_read_lock();
@@ -335,7 +335,7 @@ static struct throtl_grp * throtl_get_tg(struct throtl_data *td)
 	spin_lock_irq(q->queue_lock);
 
 	/* Make sure @q is still alive */
-	if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+	if (unlikely(blk_queue_dead(q))) {
 		kfree(tg);
 		return NULL;
 	}
diff --git a/block/blk.h b/block/blk.h
index 3f6551b..e38691d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -85,7 +85,7 @@ static inline struct request *__elv_next_request(struct request_queue *q)
 			q->flush_queue_delayed = 1;
 			return NULL;
 		}
-		if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags) ||
+		if (unlikely(blk_queue_dead(q)) ||
 		    !q->elevator->ops->elevator_dispatch_fn(q, 0))
 			return NULL;
 	}
diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
index 6512b20..d1fcbc0 100644
--- a/drivers/acpi/ac.c
+++ b/drivers/acpi/ac.c
@@ -292,7 +292,9 @@ static int acpi_ac_add(struct acpi_device *device)
 	ac->charger.properties = ac_props;
 	ac->charger.num_properties = ARRAY_SIZE(ac_props);
 	ac->charger.get_property = get_ac_property;
-	power_supply_register(&ac->device->dev, &ac->charger);
+	result = power_supply_register(&ac->device->dev, &ac->charger);
+	if (result)
+		goto end;
 
 	printk(KERN_INFO PREFIX "%s [%s] (%s)\n",
 	       acpi_device_name(device), acpi_device_bid(device),
diff --git a/drivers/gpu/drm/nouveau/nva3_copy.fuc b/drivers/gpu/drm/nouveau/nva3_copy.fuc
index eaf35f8..d894731 100644
--- a/drivers/gpu/drm/nouveau/nva3_copy.fuc
+++ b/drivers/gpu/drm/nouveau/nva3_copy.fuc
@@ -118,9 +118,9 @@ dispatch_dma:
 // mthd 0x030c-0x0340, various stuff
 .b16 0xc3 14
 .b32 ctx_src_address_high           ~0x000000ff
-.b32 ctx_src_address_low            ~0xfffffff0
+.b32 ctx_src_address_low            ~0xffffffff
 .b32 ctx_dst_address_high           ~0x000000ff
-.b32 ctx_dst_address_low            ~0xfffffff0
+.b32 ctx_dst_address_low            ~0xffffffff
 .b32 ctx_src_pitch                  ~0x0007ffff
 .b32 ctx_dst_pitch                  ~0x0007ffff
 .b32 ctx_xcnt                       ~0x0000ffff
diff --git a/drivers/gpu/drm/nouveau/nva3_copy.fuc.h b/drivers/gpu/drm/nouveau/nva3_copy.fuc.h
index 2731de2..e2a0e88 100644
--- a/drivers/gpu/drm/nouveau/nva3_copy.fuc.h
+++ b/drivers/gpu/drm/nouveau/nva3_copy.fuc.h
@@ -1,37 +1,72 @@
-uint32_t nva3_pcopy_data[] = {
+u32 nva3_pcopy_data[] = {
+/* 0x0000: ctx_object */
 	0x00000000,
+/* 0x0004: ctx_dma */
+/* 0x0004: ctx_dma_query */
 	0x00000000,
+/* 0x0008: ctx_dma_src */
 	0x00000000,
+/* 0x000c: ctx_dma_dst */
 	0x00000000,
+/* 0x0010: ctx_query_address_high */
 	0x00000000,
+/* 0x0014: ctx_query_address_low */
 	0x00000000,
+/* 0x0018: ctx_query_counter */
 	0x00000000,
+/* 0x001c: ctx_src_address_high */
 	0x00000000,
+/* 0x0020: ctx_src_address_low */
 	0x00000000,
+/* 0x0024: ctx_src_pitch */
 	0x00000000,
+/* 0x0028: ctx_src_tile_mode */
 	0x00000000,
+/* 0x002c: ctx_src_xsize */
 	0x00000000,
+/* 0x0030: ctx_src_ysize */
 	0x00000000,
+/* 0x0034: ctx_src_zsize */
 	0x00000000,
+/* 0x0038: ctx_src_zoff */
 	0x00000000,
+/* 0x003c: ctx_src_xoff */
 	0x00000000,
+/* 0x0040: ctx_src_yoff */
 	0x00000000,
+/* 0x0044: ctx_src_cpp */
 	0x00000000,
+/* 0x0048: ctx_dst_address_high */
 	0x00000000,
+/* 0x004c: ctx_dst_address_low */
 	0x00000000,
+/* 0x0050: ctx_dst_pitch */
 	0x00000000,
+/* 0x0054: ctx_dst_tile_mode */
 	0x00000000,
+/* 0x0058: ctx_dst_xsize */
 	0x00000000,
+/* 0x005c: ctx_dst_ysize */
 	0x00000000,
+/* 0x0060: ctx_dst_zsize */
 	0x00000000,
+/* 0x0064: ctx_dst_zoff */
 	0x00000000,
+/* 0x0068: ctx_dst_xoff */
 	0x00000000,
+/* 0x006c: ctx_dst_yoff */
 	0x00000000,
+/* 0x0070: ctx_dst_cpp */
 	0x00000000,
+/* 0x0074: ctx_format */
 	0x00000000,
+/* 0x0078: ctx_swz_const0 */
 	0x00000000,
+/* 0x007c: ctx_swz_const1 */
 	0x00000000,
+/* 0x0080: ctx_xcnt */
 	0x00000000,
+/* 0x0084: ctx_ycnt */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -63,6 +98,7 @@ uint32_t nva3_pcopy_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+/* 0x0100: dispatch_table */
 	0x00010000,
 	0x00000000,
 	0x00000000,
@@ -73,6 +109,7 @@ uint32_t nva3_pcopy_data[] = {
 	0x00010162,
 	0x00000000,
 	0x00030060,
+/* 0x0128: dispatch_dma */
 	0x00010170,
 	0x00000000,
 	0x00010170,
@@ -118,11 +155,11 @@ uint32_t nva3_pcopy_data[] = {
 	0x0000001c,
 	0xffffff00,
 	0x00000020,
-	0x0000000f,
+	0x00000000,
 	0x00000048,
 	0xffffff00,
 	0x0000004c,
-	0x0000000f,
+	0x00000000,
 	0x00000024,
 	0xfff80000,
 	0x00000050,
@@ -146,7 +183,8 @@ uint32_t nva3_pcopy_data[] = {
 	0x00000800,
 };
 
-uint32_t nva3_pcopy_code[] = {
+u32 nva3_pcopy_code[] = {
+/* 0x0000: main */
 	0x04fe04bd,
 	0x3517f000,
 	0xf10010fe,
@@ -158,23 +196,31 @@ uint32_t nva3_pcopy_code[] = {
 	0x17f11031,
 	0x27f01200,
 	0x0012d003,
+/* 0x002f: spin */
 	0xf40031f4,
 	0x0ef40028,
+/* 0x0035: ih */
 	0x8001cffd,
 	0xf40812c4,
 	0x21f4060b,
+/* 0x0041: ih_no_chsw */
 	0x0412c472,
 	0xf4060bf4,
+/* 0x004a: ih_no_cmd */
 	0x11c4c321,
 	0x4001d00c,
+/* 0x0052: swctx */
 	0x47f101f8,
 	0x4bfe7700,
 	0x0007fe00,
 	0xf00204b9,
 	0x01f40643,
 	0x0604fa09,
+/* 0x006b: swctx_load */
 	0xfa060ef4,
+/* 0x006e: swctx_done */
 	0x03f80504,
+/* 0x0072: chsw */
 	0x27f100f8,
 	0x23cf1400,
 	0x1e3fc800,
@@ -183,18 +229,22 @@ uint32_t nva3_pcopy_code[] = {
 	0x1e3af052,
 	0xf00023d0,
 	0x24d00147,
+/* 0x0093: chsw_no_unload */
 	0xcf00f880,
 	0x3dc84023,
 	0x220bf41e,
 	0xf40131f4,
 	0x57f05221,
 	0x0367f004,
+/* 0x00a8: chsw_load_ctx_dma */
 	0xa07856bc,
 	0xb6018068,
 	0x87d00884,
 	0x0162b600,
+/* 0x00bb: chsw_finish_load */
 	0xf0f018f4,
 	0x23d00237,
+/* 0x00c3: dispatch */
 	0xf100f880,
 	0xcf190037,
 	0x33cf4032,
@@ -202,6 +252,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x1024b607,
 	0x010057f1,
 	0x74bd64bd,
+/* 0x00dc: dispatch_loop */
 	0x58005658,
 	0x50b60157,
 	0x0446b804,
@@ -211,6 +262,7 @@ uint32_t nva3_pcopy_code[] = {
 	0xb60276bb,
 	0x57bb0374,
 	0xdf0ef400,
+/* 0x0100: dispatch_valid_mthd */
 	0xb60246bb,
 	0x45bb0344,
 	0x01459800,
@@ -220,31 +272,41 @@ uint32_t nva3_pcopy_code[] = {
 	0xb0014658,
 	0x1bf40064,
 	0x00538009,
+/* 0x0127: dispatch_cmd */
 	0xf4300ef4,
 	0x55f90132,
 	0xf40c01f4,
+/* 0x0132: dispatch_invalid_bitfield */
 	0x25f0250e,
+/* 0x0135: dispatch_illegal_mthd */
 	0x0125f002,
+/* 0x0138: dispatch_error */
 	0x100047f1,
 	0xd00042d0,
 	0x27f04043,
 	0x0002d040,
+/* 0x0148: hostirq_wait */
 	0xf08002cf,
 	0x24b04024,
 	0xf71bf400,
+/* 0x0154: dispatch_done */
 	0x1d0027f1,
 	0xd00137f0,
 	0x00f80023,
+/* 0x0160: cmd_nop */
+/* 0x0162: cmd_pm_trigger */
 	0x27f100f8,
 	0x34bd2200,
 	0xd00233f0,
 	0x00f80023,
+/* 0x0170: cmd_dma */
 	0x012842b7,
 	0xf00145b6,
 	0x43801e39,
 	0x0040b701,
 	0x0644b606,
 	0xf80043d0,
+/* 0x0189: cmd_exec_set_format */
 	0xf030f400,
 	0xb00001b0,
 	0x01b00101,
@@ -256,20 +318,26 @@ uint32_t nva3_pcopy_code[] = {
 	0x70b63847,
 	0x0232f401,
 	0x94bd84bd,
+/* 0x01b4: ncomp_loop */
 	0xb60f4ac4,
 	0xb4bd0445,
+/* 0x01bc: bpc_loop */
 	0xf404a430,
 	0xa5ff0f18,
 	0x00cbbbc0,
 	0xf40231f4,
+/* 0x01ce: cmp_c0 */
 	0x1bf4220e,
 	0x10c7f00c,
 	0xf400cbbb,
+/* 0x01da: cmp_c1 */
 	0xa430160e,
 	0x0c18f406,
 	0xbb14c7f0,
 	0x0ef400cb,
+/* 0x01e9: cmp_zero */
 	0x80c7f107,
+/* 0x01ed: bpc_next */
 	0x01c83800,
 	0xb60180b6,
 	0xb5b801b0,
@@ -280,6 +348,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x98110680,
 	0x68fd2008,
 	0x0502f400,
+/* 0x0216: dst_xcnt */
 	0x75fd64bd,
 	0x1c078000,
 	0xf10078fd,
@@ -304,6 +373,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x980056d0,
 	0x56d01f06,
 	0x1030f440,
+/* 0x0276: cmd_exec_set_surface_tiled */
 	0x579800f8,
 	0x6879c70a,
 	0xb66478c7,
@@ -311,9 +381,11 @@ uint32_t nva3_pcopy_code[] = {
 	0x0e76b060,
 	0xf0091bf4,
 	0x0ef40477,
+/* 0x0291: xtile64 */
 	0x027cf00f,
 	0xfd1170b6,
 	0x77f00947,
+/* 0x029d: xtileok */
 	0x0f5a9806,
 	0xfd115b98,
 	0xb7f000ab,
@@ -371,6 +443,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x67d00600,
 	0x0060b700,
 	0x0068d004,
+/* 0x0382: cmd_exec_set_surface_linear */
 	0x6cf000f8,
 	0x0260b702,
 	0x0864b602,
@@ -381,13 +454,16 @@ uint32_t nva3_pcopy_code[] = {
 	0xb70067d0,
 	0x98040060,
 	0x67d00957,
+/* 0x03ab: cmd_exec_wait */
 	0xf900f800,
 	0xf110f900,
 	0xb6080007,
+/* 0x03b6: loop */
 	0x01cf0604,
 	0x0114f000,
 	0xfcfa1bf4,
 	0xf800fc10,
+/* 0x03c5: cmd_exec_query */
 	0x0d34c800,
 	0xf5701bf4,
 	0xf103ab21,
@@ -417,6 +493,7 @@ uint32_t nva3_pcopy_code[] = {
 	0x47f10153,
 	0x44b60800,
 	0x0045d006,
+/* 0x0438: query_counter */
 	0x03ab21f5,
 	0x080c47f1,
 	0x980644b6,
@@ -439,11 +516,13 @@ uint32_t nva3_pcopy_code[] = {
 	0x47f10153,
 	0x44b60800,
 	0x0045d006,
+/* 0x0492: cmd_exec */
 	0x21f500f8,
 	0x3fc803ab,
 	0x0e0bf400,
 	0x018921f5,
 	0x020047f1,
+/* 0x04a7: cmd_exec_no_format */
 	0xf11e0ef4,
 	0xb6081067,
 	0x77f00664,
@@ -451,19 +530,24 @@ uint32_t nva3_pcopy_code[] = {
 	0x981c0780,
 	0x67d02007,
 	0x4067d000,
+/* 0x04c2: cmd_exec_init_src_surface */
 	0x32f444bd,
 	0xc854bd02,
 	0x0bf4043f,
 	0x8221f50a,
 	0x0a0ef403,
+/* 0x04d4: src_tiled */
 	0x027621f5,
+/* 0x04db: cmd_exec_init_dst_surface */
 	0xf40749f0,
 	0x57f00231,
 	0x083fc82c,
 	0xf50a0bf4,
 	0xf4038221,
+/* 0x04ee: dst_tiled */
 	0x21f50a0e,
 	0x49f00276,
+/* 0x04f5: cmd_exec_kick */
 	0x0057f108,
 	0x0654b608,
 	0xd0210698,
@@ -473,6 +557,8 @@ uint32_t nva3_pcopy_code[] = {
 	0xc80054d0,
 	0x0bf40c3f,
 	0xc521f507,
+/* 0x0519: cmd_exec_done */
+/* 0x051b: cmd_wrcache_flush */
 	0xf100f803,
 	0xbd220027,
 	0x0133f034,
diff --git a/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h b/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h
index 4199038..9e87036 100644
--- a/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h
+++ b/drivers/gpu/drm/nouveau/nvc0_copy.fuc.h
@@ -1,34 +1,65 @@
-uint32_t nvc0_pcopy_data[] = {
+u32 nvc0_pcopy_data[] = {
+/* 0x0000: ctx_object */
 	0x00000000,
+/* 0x0004: ctx_query_address_high */
 	0x00000000,
+/* 0x0008: ctx_query_address_low */
 	0x00000000,
+/* 0x000c: ctx_query_counter */
 	0x00000000,
+/* 0x0010: ctx_src_address_high */
 	0x00000000,
+/* 0x0014: ctx_src_address_low */
 	0x00000000,
+/* 0x0018: ctx_src_pitch */
 	0x00000000,
+/* 0x001c: ctx_src_tile_mode */
 	0x00000000,
+/* 0x0020: ctx_src_xsize */
 	0x00000000,
+/* 0x0024: ctx_src_ysize */
 	0x00000000,
+/* 0x0028: ctx_src_zsize */
 	0x00000000,
+/* 0x002c: ctx_src_zoff */
 	0x00000000,
+/* 0x0030: ctx_src_xoff */
 	0x00000000,
+/* 0x0034: ctx_src_yoff */
 	0x00000000,
+/* 0x0038: ctx_src_cpp */
 	0x00000000,
+/* 0x003c: ctx_dst_address_high */
 	0x00000000,
+/* 0x0040: ctx_dst_address_low */
 	0x00000000,
+/* 0x0044: ctx_dst_pitch */
 	0x00000000,
+/* 0x0048: ctx_dst_tile_mode */
 	0x00000000,
+/* 0x004c: ctx_dst_xsize */
 	0x00000000,
+/* 0x0050: ctx_dst_ysize */
 	0x00000000,
+/* 0x0054: ctx_dst_zsize */
 	0x00000000,
+/* 0x0058: ctx_dst_zoff */
 	0x00000000,
+/* 0x005c: ctx_dst_xoff */
 	0x00000000,
+/* 0x0060: ctx_dst_yoff */
 	0x00000000,
+/* 0x0064: ctx_dst_cpp */
 	0x00000000,
+/* 0x0068: ctx_format */
 	0x00000000,
+/* 0x006c: ctx_swz_const0 */
 	0x00000000,
+/* 0x0070: ctx_swz_const1 */
 	0x00000000,
+/* 0x0074: ctx_xcnt */
 	0x00000000,
+/* 0x0078: ctx_ycnt */
 	0x00000000,
 	0x00000000,
 	0x00000000,
@@ -63,6 +94,7 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000000,
 	0x00000000,
 	0x00000000,
+/* 0x0100: dispatch_table */
 	0x00010000,
 	0x00000000,
 	0x00000000,
@@ -111,11 +143,11 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000010,
 	0xffffff00,
 	0x00000014,
-	0x0000000f,
+	0x00000000,
 	0x0000003c,
 	0xffffff00,
 	0x00000040,
-	0x0000000f,
+	0x00000000,
 	0x00000018,
 	0xfff80000,
 	0x00000044,
@@ -139,7 +171,8 @@ uint32_t nvc0_pcopy_data[] = {
 	0x00000800,
 };
 
-uint32_t nvc0_pcopy_code[] = {
+u32 nvc0_pcopy_code[] = {
+/* 0x0000: main */
 	0x04fe04bd,
 	0x3517f000,
 	0xf10010fe,
@@ -151,15 +184,20 @@ uint32_t nvc0_pcopy_code[] = {
 	0x17f11031,
 	0x27f01200,
 	0x0012d003,
+/* 0x002f: spin */
 	0xf40031f4,
 	0x0ef40028,
+/* 0x0035: ih */
 	0x8001cffd,
 	0xf40812c4,
 	0x21f4060b,
+/* 0x0041: ih_no_chsw */
 	0x0412c4ca,
 	0xf5070bf4,
+/* 0x004b: ih_no_cmd */
 	0xc4010221,
 	0x01d00c11,
+/* 0x0053: swctx */
 	0xf101f840,
 	0xfe770047,
 	0x47f1004b,
@@ -188,8 +226,11 @@ uint32_t nvc0_pcopy_code[] = {
 	0xf00204b9,
 	0x01f40643,
 	0x0604fa09,
+/* 0x00c3: swctx_load */
 	0xfa060ef4,
+/* 0x00c6: swctx_done */
 	0x03f80504,
+/* 0x00ca: chsw */
 	0x27f100f8,
 	0x23cf1400,
 	0x1e3fc800,
@@ -198,18 +239,22 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1e3af053,
 	0xf00023d0,
 	0x24d00147,
+/* 0x00eb: chsw_no_unload */
 	0xcf00f880,
 	0x3dc84023,
 	0x090bf41e,
 	0xf40131f4,
+/* 0x00fa: chsw_finish_load */
 	0x37f05321,
 	0x8023d002,
+/* 0x0102: dispatch */
 	0x37f100f8,
 	0x32cf1900,
 	0x0033cf40,
 	0x07ff24e4,
 	0xf11024b6,
 	0xbd010057,
+/* 0x011b: dispatch_loop */
 	0x5874bd64,
 	0x57580056,
 	0x0450b601,
@@ -219,6 +264,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0xbb0f08f4,
 	0x74b60276,
 	0x0057bb03,
+/* 0x013f: dispatch_valid_mthd */
 	0xbbdf0ef4,
 	0x44b60246,
 	0x0045bb03,
@@ -229,24 +275,33 @@ uint32_t nvc0_pcopy_code[] = {
 	0x64b00146,
 	0x091bf400,
 	0xf4005380,
+/* 0x0166: dispatch_cmd */
 	0x32f4300e,
 	0xf455f901,
 	0x0ef40c01,
+/* 0x0171: dispatch_invalid_bitfield */
 	0x0225f025,
+/* 0x0174: dispatch_illegal_mthd */
+/* 0x0177: dispatch_error */
 	0xf10125f0,
 	0xd0100047,
 	0x43d00042,
 	0x4027f040,
+/* 0x0187: hostirq_wait */
 	0xcf0002d0,
 	0x24f08002,
 	0x0024b040,
+/* 0x0193: dispatch_done */
 	0xf1f71bf4,
 	0xf01d0027,
 	0x23d00137,
+/* 0x019f: cmd_nop */
 	0xf800f800,
+/* 0x01a1: cmd_pm_trigger */
 	0x0027f100,
 	0xf034bd22,
 	0x23d00233,
+/* 0x01af: cmd_exec_set_format */
 	0xf400f800,
 	0x01b0f030,
 	0x0101b000,
@@ -258,20 +313,26 @@ uint32_t nvc0_pcopy_code[] = {
 	0x3847c701,
 	0xf40170b6,
 	0x84bd0232,
+/* 0x01da: ncomp_loop */
 	0x4ac494bd,
 	0x0445b60f,
+/* 0x01e2: bpc_loop */
 	0xa430b4bd,
 	0x0f18f404,
 	0xbbc0a5ff,
 	0x31f400cb,
 	0x220ef402,
+/* 0x01f4: cmp_c0 */
 	0xf00c1bf4,
 	0xcbbb10c7,
 	0x160ef400,
+/* 0x0200: cmp_c1 */
 	0xf406a430,
 	0xc7f00c18,
 	0x00cbbb14,
+/* 0x020f: cmp_zero */
 	0xf1070ef4,
+/* 0x0213: bpc_next */
 	0x380080c7,
 	0x80b601c8,
 	0x01b0b601,
@@ -283,6 +344,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1d08980e,
 	0xf40068fd,
 	0x64bd0502,
+/* 0x023c: dst_xcnt */
 	0x800075fd,
 	0x78fd1907,
 	0x1057f100,
@@ -307,15 +369,18 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1c069800,
 	0xf44056d0,
 	0x00f81030,
+/* 0x029c: cmd_exec_set_surface_tiled */
 	0xc7075798,
 	0x78c76879,
 	0x0380b664,
 	0xb06077c7,
 	0x1bf40e76,
 	0x0477f009,
+/* 0x02b7: xtile64 */
 	0xf00f0ef4,
 	0x70b6027c,
 	0x0947fd11,
+/* 0x02c3: xtileok */
 	0x980677f0,
 	0x5b980c5a,
 	0x00abfd0e,
@@ -374,6 +439,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0xb70067d0,
 	0xd0040060,
 	0x00f80068,
+/* 0x03a8: cmd_exec_set_surface_linear */
 	0xb7026cf0,
 	0xb6020260,
 	0x57980864,
@@ -384,12 +450,15 @@ uint32_t nvc0_pcopy_code[] = {
 	0x0060b700,
 	0x06579804,
 	0xf80067d0,
+/* 0x03d1: cmd_exec_wait */
 	0xf900f900,
 	0x0007f110,
 	0x0604b608,
+/* 0x03dc: loop */
 	0xf00001cf,
 	0x1bf40114,
 	0xfc10fcfa,
+/* 0x03eb: cmd_exec_query */
 	0xc800f800,
 	0x1bf40d34,
 	0xd121f570,
@@ -419,6 +488,7 @@ uint32_t nvc0_pcopy_code[] = {
 	0x0153f026,
 	0x080047f1,
 	0xd00644b6,
+/* 0x045e: query_counter */
 	0x21f50045,
 	0x47f103d1,
 	0x44b6080c,
@@ -442,11 +512,13 @@ uint32_t nvc0_pcopy_code[] = {
 	0x080047f1,
 	0xd00644b6,
 	0x00f80045,
+/* 0x04b8: cmd_exec */
 	0x03d121f5,
 	0xf4003fc8,
 	0x21f50e0b,
 	0x47f101af,
 	0x0ef40200,
+/* 0x04cd: cmd_exec_no_format */
 	0x1067f11e,
 	0x0664b608,
 	0x800177f0,
@@ -454,18 +526,23 @@ uint32_t nvc0_pcopy_code[] = {
 	0x1d079819,
 	0xd00067d0,
 	0x44bd4067,
+/* 0x04e8: cmd_exec_init_src_surface */
 	0xbd0232f4,
 	0x043fc854,
 	0xf50a0bf4,
 	0xf403a821,
+/* 0x04fa: src_tiled */
 	0x21f50a0e,
 	0x49f0029c,
+/* 0x0501: cmd_exec_init_dst_surface */
 	0x0231f407,
 	0xc82c57f0,
 	0x0bf4083f,
 	0xa821f50a,
 	0x0a0ef403,
+/* 0x0514: dst_tiled */
 	0x029c21f5,
+/* 0x051b: cmd_exec_kick */
 	0xf10849f0,
 	0xb6080057,
 	0x06980654,
@@ -475,7 +552,9 @@ uint32_t nvc0_pcopy_code[] = {
 	0x54d00546,
 	0x0c3fc800,
 	0xf5070bf4,
+/* 0x053f: cmd_exec_done */
 	0xf803eb21,
+/* 0x0541: cmd_wrcache_flush */
 	0x0027f100,
 	0xf034bd22,
 	0x23d00133,
diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
index 552b436..3254d51 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -22,6 +22,7 @@
  *
  * Authors: Dave Airlie
  *          Alex Deucher
+ *          Jerome Glisse
  */
 #include "drmP.h"
 #include "radeon_drm.h"
@@ -634,7 +635,6 @@ static bool radeon_dp_get_link_status(struct radeon_connector *radeon_connector,
 	ret = radeon_dp_aux_native_read(radeon_connector, DP_LANE0_1_STATUS,
 					link_status, DP_LINK_STATUS_SIZE, 100);
 	if (ret <= 0) {
-		DRM_ERROR("displayport link status failed\n");
 		return false;
 	}
 
@@ -812,8 +812,10 @@ static int radeon_dp_link_train_cr(struct radeon_dp_link_train_info *dp_info)
 		else
 			mdelay(dp_info->rd_interval * 4);
 
-		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status))
+		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status)) {
+			DRM_ERROR("displayport link status failed\n");
 			break;
+		}
 
 		if (dp_clock_recovery_ok(dp_info->link_status, dp_info->dp_lane_count)) {
 			clock_recovery = true;
@@ -875,8 +877,10 @@ static int radeon_dp_link_train_ce(struct radeon_dp_link_train_info *dp_info)
 		else
 			mdelay(dp_info->rd_interval * 4);
 
-		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status))
+		if (!radeon_dp_get_link_status(dp_info->radeon_connector, dp_info->link_status)) {
+			DRM_ERROR("displayport link status failed\n");
 			break;
+		}
 
 		if (dp_channel_eq_ok(dp_info->link_status, dp_info->dp_lane_count)) {
 			channel_eq = true;
diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
index 4a4493f..87d494d 100644
--- a/drivers/gpu/drm/radeon/radeon_connectors.c
+++ b/drivers/gpu/drm/radeon/radeon_connectors.c
@@ -64,14 +64,33 @@ void radeon_connector_hotplug(struct drm_connector *connector)
 
 	/* just deal with DP (not eDP) here. */
 	if (connector->connector_type == DRM_MODE_CONNECTOR_DisplayPort) {
-		int saved_dpms = connector->dpms;
-
-		/* Only turn off the display it it's physically disconnected */
-		if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd))
-			drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
-		else if (radeon_dp_needs_link_train(radeon_connector))
-			drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
-		connector->dpms = saved_dpms;
+		struct radeon_connector_atom_dig *dig_connector =
+			radeon_connector->con_priv;
+
+		/* if existing sink type was not DP no need to retrain */
+		if (dig_connector->dp_sink_type != CONNECTOR_OBJECT_ID_DISPLAYPORT)
+			return;
+
+		/* first get sink type as it may be reset after (un)plug */
+		dig_connector->dp_sink_type = radeon_dp_getsinktype(radeon_connector);
+		/* don't do anything if sink is not display port, i.e.,
+		 * passive dp->(dvi|hdmi) adaptor
+		 */
+		if (dig_connector->dp_sink_type == CONNECTOR_OBJECT_ID_DISPLAYPORT) {
+			int saved_dpms = connector->dpms;
+			/* Only turn off the display if it's physically disconnected */
+			if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd)) {
+				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+			} else if (radeon_dp_needs_link_train(radeon_connector)) {
+				/* set it to OFF so that drm_helper_connector_dpms()
+				 * won't return immediately since the current state
+				 * is ON at this point.
+				 */
+				connector->dpms = DRM_MODE_DPMS_OFF;
+				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_ON);
+			}
+			connector->dpms = saved_dpms;
+		}
 	}
 }
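
The hotplug hunk above works around drm_helper_connector_dpms() returning early when the requested mode matches connector->dpms: it records OFF before asking for ON so the helper actually reprograms the link. A standalone C sketch of that force-a-transition idiom (the types and helper are illustrative stand-ins, not the DRM API):

enum dpms_mode { DPMS_ON, DPMS_OFF };

struct connector {
	enum dpms_mode dpms;
};

/* Stand-in for drm_helper_connector_dpms(): a no-op whenever the
 * requested mode equals the recorded one. */
static void helper_dpms(struct connector *c, enum dpms_mode mode)
{
	if (c->dpms == mode)
		return;
	c->dpms = mode;
	/* ... program the hardware ... */
}

static void force_link_retrain(struct connector *c)
{
	enum dpms_mode saved = c->dpms;

	c->dpms = DPMS_OFF;	/* make the ON request a real transition */
	helper_dpms(c, DPMS_ON);
	c->dpms = saved;	/* restore the externally visible state */
}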
 
diff --git a/drivers/gpu/drm/radeon/radeon_cursor.c b/drivers/gpu/drm/radeon/radeon_cursor.c
index 986d608..2132109 100644
--- a/drivers/gpu/drm/radeon/radeon_cursor.c
+++ b/drivers/gpu/drm/radeon/radeon_cursor.c
@@ -257,8 +257,14 @@ int radeon_crtc_cursor_move(struct drm_crtc *crtc,
 				if (!(cursor_end & 0x7f))
 					w--;
 			}
-			if (w <= 0)
+			if (w <= 0) {
 				w = 1;
+				cursor_end = x - xorigin + w;
+				if (!(cursor_end & 0x7f)) {
+					x--;
+					WARN_ON_ONCE(x < 0);
+				}
+			}
 		}
 	}
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index f3ae607..39497c7 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -117,7 +117,6 @@ int radeon_bo_create(struct radeon_device *rdev,
 		return -ENOMEM;
 	}
 
-retry:
 	bo = kzalloc(sizeof(struct radeon_bo), GFP_KERNEL);
 	if (bo == NULL)
 		return -ENOMEM;
@@ -130,6 +129,8 @@ retry:
 	bo->gem_base.driver_private = NULL;
 	bo->surface_reg = -1;
 	INIT_LIST_HEAD(&bo->list);
+
+retry:
 	radeon_ttm_placement_from_domain(bo, domain);
 	/* Kernel allocation are uninterruptible */
 	mutex_lock(&rdev->vram_mutex);
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a1b8caa..0f074e0 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1865,6 +1865,11 @@ static int device_change_notifier(struct notifier_block *nb,
 
 		iommu_init_device(dev);
 
+		if (iommu_pass_through) {
+			attach_device(dev, pt_domain);
+			break;
+		}
+
 		domain = domain_for_device(dev);
 
 		/* allocate a protection domain if a device is added */
@@ -1880,10 +1885,7 @@ static int device_change_notifier(struct notifier_block *nb,
 		list_add_tail(&dma_domain->list, &iommu_pd_list);
 		spin_unlock_irqrestore(&iommu_pd_list_lock, flags);
 
-		if (!iommu_pass_through)
-			dev->archdata.dma_ops = &amd_iommu_dma_ops;
-		else
-			dev->archdata.dma_ops = &nommu_dma_ops;
+		dev->archdata.dma_ops = &amd_iommu_dma_ops;
 
 		break;
 	case BUS_NOTIFY_DEL_DEVICE:
diff --git a/drivers/media/video/cx25821/cx25821-core.c b/drivers/media/video/cx25821/cx25821-core.c
index a7fa38f..e572ce5 100644
--- a/drivers/media/video/cx25821/cx25821-core.c
+++ b/drivers/media/video/cx25821/cx25821-core.c
@@ -914,9 +914,6 @@ static int cx25821_dev_setup(struct cx25821_dev *dev)
 	list_add_tail(&dev->devlist, &cx25821_devlist);
 	mutex_unlock(&cx25821_devlist_mutex);
 
-	strcpy(cx25821_boards[UNKNOWN_BOARD].name, "unknown");
-	strcpy(cx25821_boards[CX25821_BOARD].name, "cx25821");
-
 	if (dev->pci->device != 0x8210) {
 		pr_info("%s(): Exiting. Incorrect Hardware device = 0x%02x\n",
 			__func__, dev->pci->device);
diff --git a/drivers/media/video/cx25821/cx25821.h b/drivers/media/video/cx25821/cx25821.h
index 2d2d009..bf54360 100644
--- a/drivers/media/video/cx25821/cx25821.h
+++ b/drivers/media/video/cx25821/cx25821.h
@@ -187,7 +187,7 @@ enum port {
 };
 
 struct cx25821_board {
-	char *name;
+	const char *name;
 	enum port porta;
 	enum port portb;
 	enum port portc;
diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
index 6878a94..83b51b5 100644
--- a/drivers/mmc/host/sdhci-pci.c
+++ b/drivers/mmc/host/sdhci-pci.c
@@ -148,6 +148,7 @@ static const struct sdhci_pci_fixes sdhci_ene_714 = {
 static const struct sdhci_pci_fixes sdhci_cafe = {
 	.quirks		= SDHCI_QUIRK_NO_SIMULT_VDD_AND_POWER |
 			  SDHCI_QUIRK_NO_BUSY_IRQ |
+			  SDHCI_QUIRK_BROKEN_CARD_DETECTION |
 			  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
 };
 
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 9e61d6b..ed1be8a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3770,6 +3770,7 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
 	case RTL_GIGA_MAC_VER_22:
 	case RTL_GIGA_MAC_VER_23:
 	case RTL_GIGA_MAC_VER_24:
+	case RTL_GIGA_MAC_VER_34:
 		RTL_W32(RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
 		break;
 	default:
diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c
index 01dcb1a..727c129 100644
--- a/drivers/net/wireless/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/mwifiex/cfg80211.c
@@ -545,9 +545,9 @@ mwifiex_dump_station_info(struct mwifiex_private *priv,
 
 	/*
 	 * Bit 0 in tx_htinfo indicates that current Tx rate is 11n rate. Valid
-	 * MCS index values for us are 0 to 7.
+	 * MCS index values for us are 0 to 15.
 	 */
-	if ((priv->tx_htinfo & BIT(0)) && (priv->tx_rate < 8)) {
+	if ((priv->tx_htinfo & BIT(0)) && (priv->tx_rate < 16)) {
 		sinfo->txrate.mcs = priv->tx_rate;
 		sinfo->txrate.flags |= RATE_INFO_FLAGS_MCS;
 		/* 40MHz rate */
diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index 0ffa111..bdf960b 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -876,6 +876,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x1482, 0x3c09) },
 	/* AirTies */
 	{ USB_DEVICE(0x1eda, 0x2012) },
+	{ USB_DEVICE(0x1eda, 0x2210) },
 	{ USB_DEVICE(0x1eda, 0x2310) },
 	/* Allwin */
 	{ USB_DEVICE(0x8516, 0x2070) },
@@ -945,6 +946,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* DVICO */
 	{ USB_DEVICE(0x0fe9, 0xb307) },
 	/* Edimax */
+	{ USB_DEVICE(0x7392, 0x4085) },
 	{ USB_DEVICE(0x7392, 0x7711) },
 	{ USB_DEVICE(0x7392, 0x7717) },
 	{ USB_DEVICE(0x7392, 0x7718) },
@@ -1020,6 +1022,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* Philips */
 	{ USB_DEVICE(0x0471, 0x200f) },
 	/* Planex */
+	{ USB_DEVICE(0x2019, 0x5201) },
 	{ USB_DEVICE(0x2019, 0xab25) },
 	{ USB_DEVICE(0x2019, 0xed06) },
 	/* Quanta */
@@ -1088,6 +1091,12 @@ static struct usb_device_id rt2800usb_device_table[] = {
 #ifdef CONFIG_RT2800USB_RT33XX
 	/* Belkin */
 	{ USB_DEVICE(0x050d, 0x945b) },
+	/* D-Link */
+	{ USB_DEVICE(0x2001, 0x3c17) },
+	/* Panasonic */
+	{ USB_DEVICE(0x083a, 0xb511) },
+	/* Philips */
+	{ USB_DEVICE(0x0471, 0x20dd) },
 	/* Ralink */
 	{ USB_DEVICE(0x148f, 0x3370) },
 	{ USB_DEVICE(0x148f, 0x8070) },
@@ -1099,6 +1108,8 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x8516, 0x3572) },
 	/* Askey */
 	{ USB_DEVICE(0x1690, 0x0744) },
+	{ USB_DEVICE(0x1690, 0x0761) },
+	{ USB_DEVICE(0x1690, 0x0764) },
 	/* Cisco */
 	{ USB_DEVICE(0x167b, 0x4001) },
 	/* EnGenius */
@@ -1113,6 +1124,9 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* Sitecom */
 	{ USB_DEVICE(0x0df6, 0x0041) },
 	{ USB_DEVICE(0x0df6, 0x0062) },
+	{ USB_DEVICE(0x0df6, 0x0065) },
+	{ USB_DEVICE(0x0df6, 0x0066) },
+	{ USB_DEVICE(0x0df6, 0x0068) },
 	/* Toshiba */
 	{ USB_DEVICE(0x0930, 0x0a07) },
 	/* Zinwell */
@@ -1122,6 +1136,9 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* Azurewave */
 	{ USB_DEVICE(0x13d3, 0x3329) },
 	{ USB_DEVICE(0x13d3, 0x3365) },
+	/* D-Link */
+	{ USB_DEVICE(0x2001, 0x3c1c) },
+	{ USB_DEVICE(0x2001, 0x3c1d) },
 	/* Ralink */
 	{ USB_DEVICE(0x148f, 0x5370) },
 	{ USB_DEVICE(0x148f, 0x5372) },
@@ -1163,13 +1180,8 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	/* D-Link */
 	{ USB_DEVICE(0x07d1, 0x3c0b) },
 	{ USB_DEVICE(0x07d1, 0x3c17) },
-	{ USB_DEVICE(0x2001, 0x3c17) },
-	/* Edimax */
-	{ USB_DEVICE(0x7392, 0x4085) },
 	/* Encore */
 	{ USB_DEVICE(0x203d, 0x14a1) },
-	/* Fujitsu Stylistic 550 */
-	{ USB_DEVICE(0x1690, 0x0761) },
 	/* Gemtek */
 	{ USB_DEVICE(0x15a9, 0x0010) },
 	/* Gigabyte */
@@ -1190,7 +1202,6 @@ static struct usb_device_id rt2800usb_device_table[] = {
 	{ USB_DEVICE(0x05a6, 0x0101) },
 	{ USB_DEVICE(0x1d4d, 0x0010) },
 	/* Planex */
-	{ USB_DEVICE(0x2019, 0x5201) },
 	{ USB_DEVICE(0x2019, 0xab24) },
 	/* Qcom */
 	{ USB_DEVICE(0x18e8, 0x6259) },
diff --git a/drivers/net/wireless/rtlwifi/rtl8192de/phy.c b/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
index 2cf4c5f..de9faa9 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192de/phy.c
@@ -3462,21 +3462,21 @@ void rtl92d_phy_config_macphymode_info(struct ieee80211_hw *hw)
 	switch (rtlhal->macphymode) {
 	case DUALMAC_SINGLEPHY:
 		rtlphy->rf_type = RF_2T2R;
-		rtlhal->version |= CHIP_92D_SINGLEPHY;
+		rtlhal->version |= RF_TYPE_2T2R;
 		rtlhal->bandset = BAND_ON_BOTH;
 		rtlhal->current_bandtype = BAND_ON_2_4G;
 		break;
 
 	case SINGLEMAC_SINGLEPHY:
 		rtlphy->rf_type = RF_2T2R;
-		rtlhal->version |= CHIP_92D_SINGLEPHY;
+		rtlhal->version |= RF_TYPE_2T2R;
 		rtlhal->bandset = BAND_ON_BOTH;
 		rtlhal->current_bandtype = BAND_ON_2_4G;
 		break;
 
 	case DUALMAC_DUALPHY:
 		rtlphy->rf_type = RF_1T1R;
-		rtlhal->version &= (~CHIP_92D_SINGLEPHY);
+		rtlhal->version &= RF_TYPE_1T1R;
 		/* Now we let MAC0 run on 5G band. */
 		if (rtlhal->interfaceindex == 0) {
 			rtlhal->bandset = BAND_ON_5G;
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 351dc0b..ee77a58 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -287,6 +287,7 @@ static void scsi_host_dev_release(struct device *dev)
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct device *parent = dev->parent;
 	struct request_queue *q;
+	void *queuedata;
 
 	scsi_proc_hostdir_rm(shost->hostt);
 
@@ -296,9 +297,9 @@ static void scsi_host_dev_release(struct device *dev)
 		destroy_workqueue(shost->work_q);
 	q = shost->uspace_req_q;
 	if (q) {
-		kfree(q->queuedata);
-		q->queuedata = NULL;
-		scsi_free_queue(q);
+		queuedata = q->queuedata;
+		blk_cleanup_queue(q);
+		kfree(queuedata);
 	}
 
 	scsi_destroy_command_freelist(shost);
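
The scsi_host_dev_release() change above saves q->queuedata, lets blk_cleanup_queue() finish draining, and only then frees the data, since the queue may still reference it during teardown. A minimal userspace sketch of that save-teardown-free ordering (all names are hypothetical stand-ins for the block-layer calls):

#include <stdio.h>
#include <stdlib.h>

struct queue {
	void *queuedata;		/* private data of the queue's user */
};

/* Stand-in for blk_cleanup_queue(): draining may still look at
 * q->queuedata, so the data must outlive this call. */
static void cleanup_queue(struct queue *q)
{
	if (q->queuedata)
		printf("draining, private data still valid\n");
}

static void release_queue(struct queue *q)
{
	void *queuedata = q->queuedata;	/* save the pointer first */

	cleanup_queue(q);	/* teardown may dereference queuedata */
	free(queuedata);	/* free only after the queue is gone */
}

int main(void)
{
	struct queue q = { .queuedata = malloc(16) };

	release_queue(&q);
	return 0;
}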
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index e48ba4b..dbe3568 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -774,7 +774,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 }
 
 /* See if this phy is part of a wide port */
-static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
+static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 {
 	struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id];
 	int i;
@@ -790,11 +790,11 @@ static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
 			sas_port_add_phy(ephy->port, phy->phy);
 			phy->port = ephy->port;
 			phy->phy_state = PHY_DEVICE_DISCOVERED;
-			return 0;
+			return true;
 		}
 	}
 
-	return -ENODEV;
+	return false;
 }
 
 static struct domain_device *sas_ex_discover_expander(
@@ -932,8 +932,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 		return res;
 	}
 
-	res = sas_ex_join_wide_port(dev, phy_id);
-	if (!res) {
+	if (sas_ex_join_wide_port(dev, phy_id)) {
 		SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 			    phy_id, SAS_ADDR(ex_phy->attached_sas_addr));
 		return res;
@@ -978,8 +977,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
 			if (SAS_ADDR(ex->ex_phy[i].attached_sas_addr) ==
 			    SAS_ADDR(child->sas_addr)) {
 				ex->ex_phy[i].phy_state= PHY_DEVICE_DISCOVERED;
-				res = sas_ex_join_wide_port(dev, i);
-				if (!res)
+				if (sas_ex_join_wide_port(dev, i))
 					SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
 						    i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));
 
@@ -1849,32 +1847,20 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 {
 	struct ex_phy *ex_phy = &dev->ex_dev.ex_phy[phy_id];
 	struct domain_device *child;
-	bool found = false;
-	int res, i;
+	int res;
 
 	SAS_DPRINTK("ex %016llx phy%d new device attached\n",
 		    SAS_ADDR(dev->sas_addr), phy_id);
 	res = sas_ex_phy_discover(dev, phy_id);
 	if (res)
-		goto out;
-	/* to support the wide port inserted */
-	for (i = 0; i < dev->ex_dev.num_phys; i++) {
-		struct ex_phy *ex_phy_temp = &dev->ex_dev.ex_phy[i];
-		if (i == phy_id)
-			continue;
-		if (SAS_ADDR(ex_phy_temp->attached_sas_addr) ==
-		    SAS_ADDR(ex_phy->attached_sas_addr)) {
-			found = true;
-			break;
-		}
-	}
-	if (found) {
-		sas_ex_join_wide_port(dev, phy_id);
+		return res;
+
+	if (sas_ex_join_wide_port(dev, phy_id))
 		return 0;
-	}
+
 	res = sas_ex_discover_devices(dev, phy_id);
-	if (!res)
-		goto out;
+	if (res)
+		return res;
 	list_for_each_entry(child, &dev->ex_dev.children, siblings) {
 		if (SAS_ADDR(child->sas_addr) ==
 		    SAS_ADDR(ex_phy->attached_sas_addr)) {
@@ -1884,7 +1870,6 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
 			break;
 		}
 	}
-out:
 	return res;
 }
 
@@ -1983,9 +1968,7 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 	struct domain_device *dev = NULL;
 
 	res = sas_find_bcast_dev(port_dev, &dev);
-	if (res)
-		goto out;
-	if (dev) {
+	while (res == 0 && dev) {
 		struct expander_device *ex = &dev->ex_dev;
 		int i = 0, phy_id;
 
@@ -1997,8 +1980,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 			res = sas_rediscover(dev, phy_id);
 			i = phy_id + 1;
 		} while (i < ex->num_phys);
+
+		dev = NULL;
+		res = sas_find_bcast_dev(port_dev, &dev);
 	}
-out:
 	return res;
 }
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2aeb2e9..831db24 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -785,7 +785,13 @@ static void scsi_done(struct scsi_cmnd *cmd)
 /* Move this to a header if it becomes more generally useful */
 static struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
 {
-	return *(struct scsi_driver **)cmd->request->rq_disk->private_data;
+	struct scsi_driver **sdp;
+
+	sdp = (struct scsi_driver **)cmd->request->rq_disk->private_data;
+	if (!sdp)
+		return NULL;
+
+	return *sdp;
 }
 
 /**
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index dc6131e..456b131 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1673,6 +1673,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * requests are started.
 	 */
 	scsi_run_host_queues(shost);
+
+	/*
+	 * if eh is active and host_eh_scheduled is pending we need to re-run
+	 * recovery.  we do this check after scsi_run_host_queues() to allow
+	 * everything pent up since the last eh run a chance to make forward
+	 * progress before we sync again.  Either we'll immediately re-run
+	 * recovery or scsi_device_unbusy() will wake us again when these
+	 * pending commands complete.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (shost->host_eh_scheduled)
+		if (scsi_host_set_state(shost, SHOST_RECOVERY))
+			WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
 /**
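
The block added to scsi_restart_operations() re-checks host_eh_scheduled under host_lock after the queues have been run, so a recovery request that raced in during the restart triggers another pass instead of being lost. A rough pthread sketch of the recheck-under-lock pattern (the state machine is simplified to two states):

#include <pthread.h>
#include <stdbool.h>

enum host_state { HOST_RUNNING, HOST_RECOVERY };

struct host {
	pthread_mutex_t lock;
	bool eh_scheduled;		/* another recovery was requested */
	enum host_state state;
};

/* Called at the end of one recovery pass, after restarting I/O.
 * Re-checking the flag under the lock closes the window where a
 * request for recovery raced in while the queues were running. */
static void restart_operations(struct host *h)
{
	/* ... kick the request queues here ... */

	pthread_mutex_lock(&h->lock);
	if (h->eh_scheduled)
		h->state = HOST_RECOVERY;	/* go around again */
	pthread_mutex_unlock(&h->lock);
}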
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f0ab58e..6c4b620 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -406,10 +406,6 @@ static void scsi_run_queue(struct request_queue *q)
 	LIST_HEAD(starved_list);
 	unsigned long flags;
 
-	/* if the device is dead, sdev will be NULL, so no queue to run */
-	if (!sdev)
-		return;
-
 	shost = sdev->host;
 	if (scsi_target(sdev)->single_lun)
 		scsi_single_lun_run(sdev);
@@ -483,15 +479,26 @@ void scsi_requeue_run_queue(struct work_struct *work)
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
+	struct scsi_device *sdev = cmd->device;
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	/*
+	 * We need to hold a reference on the device to avoid the queue being
+	 * killed after the unlock and before scsi_run_queue is invoked which
+	 * may happen because scsi_unprep_request() puts the command which
+	 * releases its reference on the device.
+	 */
+	get_device(&sdev->sdev_gendev);
+
 	spin_lock_irqsave(q->queue_lock, flags);
 	scsi_unprep_request(req);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 	scsi_run_queue(q);
+
+	put_device(&sdev->sdev_gendev);
 }
 
 void scsi_next_command(struct scsi_cmnd *cmd)
@@ -1374,16 +1381,16 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
  * may be changed after request stacking drivers call the function,
  * regardless of taking lock or not.
  *
- * When scsi can't dispatch I/Os anymore and needs to kill I/Os
- * (e.g. !sdev), scsi needs to return 'not busy'.
- * Otherwise, request stacking drivers may hold requests forever.
+ * When scsi can't dispatch I/Os anymore and needs to kill I/Os scsi
+ * needs to return 'not busy'. Otherwise, request stacking drivers
+ * may hold requests forever.
  */
 static int scsi_lld_busy(struct request_queue *q)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
 
-	if (!sdev)
+	if (blk_queue_dead(q))
 		return 0;
 
 	shost = sdev->host;
@@ -1494,12 +1501,6 @@ static void scsi_request_fn(struct request_queue *q)
 	struct scsi_cmnd *cmd;
 	struct request *req;
 
-	if (!sdev) {
-		while ((req = blk_peek_request(q)) != NULL)
-			scsi_kill_request(req, q);
-		return;
-	}
-
 	if(!get_device(&sdev->sdev_gendev))
 		/* We must be tearing the block queue down already */
 		return;
@@ -1701,20 +1702,6 @@ struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 	return q;
 }
 
-void scsi_free_queue(struct request_queue *q)
-{
-	unsigned long flags;
-
-	WARN_ON(q->queuedata);
-
-	/* cause scsi_request_fn() to kill all non-finished requests */
-	spin_lock_irqsave(q->queue_lock, flags);
-	q->request_fn(q);
-	spin_unlock_irqrestore(q->queue_lock, flags);
-
-	blk_cleanup_queue(q);
-}
-
 /*
  * Function:    scsi_block_requests()
  *
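
The scsi_requeue_command() hunk pins the device with get_device() because scsi_unprep_request() can drop the command's reference, which might otherwise be the last one keeping the queue alive. A compact sketch of holding a temporary reference across such a window, using C11 atomics (names hypothetical):

#include <stdatomic.h>
#include <stdlib.h>

struct device {
	atomic_int refcount;
};

static void get_device(struct device *d)
{
	atomic_fetch_add(&d->refcount, 1);
}

static void put_device(struct device *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1)
		free(d);		/* last reference gone */
}

/* The requeue path may itself drop a reference (the command's), so a
 * temporary one guarantees the device, and the queue it owns, outlives
 * the final queue poke. */
static void requeue_command(struct device *dev)
{
	get_device(dev);
	/* ... unprep, requeue, run the queue; may call put_device() ... */
	put_device(dev);
}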
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 5b475d0..d58adca 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -85,7 +85,6 @@ extern void scsi_next_command(struct scsi_cmnd *cmd);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
-extern void scsi_free_queue(struct request_queue *q);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
 struct request_queue;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 6e7ea4a..a48b59c 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1710,6 +1710,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 	shost_for_each_device(sdev, shost) {
+		/* target removed before the device could be added */
+		if (sdev->sdev_state == SDEV_DEL)
+			continue;
 		if (!scsi_host_scan_allowed(shost) ||
 		    scsi_sysfs_add_sdev(sdev) != 0)
 			__scsi_remove_device(sdev);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 04c2a27..bb7c482 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -971,11 +971,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		sdev->host->hostt->slave_destroy(sdev);
 	transport_destroy_device(dev);
 
-	/* cause the request function to reject all I/O requests */
-	sdev->request_queue->queuedata = NULL;
-
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_cleanup_queue(sdev->request_queue);
 	put_device(dev);
 }
 
@@ -1000,7 +997,6 @@ static void __scsi_remove_target(struct scsi_target *starget)
 	struct scsi_device *sdev;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	starget->reap_ref++;
  restart:
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
@@ -1014,14 +1010,6 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	scsi_target_reap(starget);
-}
-
-static int __remove_child (struct device * dev, void * data)
-{
-	if (scsi_is_target_device(dev))
-		__scsi_remove_target(to_scsi_target(dev));
-	return 0;
 }
 
 /**
@@ -1034,14 +1022,34 @@ static int __remove_child (struct device * dev, void * data)
  */
 void scsi_remove_target(struct device *dev)
 {
-	if (scsi_is_target_device(dev)) {
-		__scsi_remove_target(to_scsi_target(dev));
-		return;
+	struct Scsi_Host *shost = dev_to_shost(dev->parent);
+	struct scsi_target *starget, *found;
+	unsigned long flags;
+
+ restart:
+	found = NULL;
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_for_each_entry(starget, &shost->__targets, siblings) {
+		if (starget->state == STARGET_DEL)
+			continue;
+		if (starget->dev.parent == dev || &starget->dev == dev) {
+			found = starget;
+			found->reap_ref++;
+			break;
+		}
 	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	get_device(dev);
-	device_for_each_child(dev, NULL, __remove_child);
-	put_device(dev);
+	if (found) {
+		__scsi_remove_target(found);
+		scsi_target_reap(found);
+		/* in the case where @dev has multiple starget children,
+		 * continue removing.
+		 *
+		 * FIXME: does such a case exist?
+		 */
+		goto restart;
+	}
 }
 EXPORT_SYMBOL(scsi_remove_target);
 
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 0842cc7..2ff1255 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -427,19 +427,8 @@ int iscsit_reset_np_thread(
 
 int iscsit_del_np_comm(struct iscsi_np *np)
 {
-	if (!np->np_socket)
-		return 0;
-
-	/*
-	 * Some network transports allocate their own struct sock->file,
-	 * see  if we need to free any additional allocated resources.
-	 */
-	if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-		kfree(np->np_socket->file);
-		np->np_socket->file = NULL;
-	}
-
-	sock_release(np->np_socket);
+	if (np->np_socket)
+		sock_release(np->np_socket);
 	return 0;
 }
 
@@ -4105,13 +4094,8 @@ int iscsit_close_connection(
 	kfree(conn->conn_ops);
 	conn->conn_ops = NULL;
 
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	conn->thread_set = NULL;
 
 	pr_debug("Moving to TARG_CONN_STATE_FREE.\n");
diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 7da2d6a..0f68197 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -224,7 +224,6 @@ enum iscsi_timer_flags_table {
 /* Used for struct iscsi_np->np_flags */
 enum np_flags_table {
 	NPF_IP_NETWORK		= 0x00,
-	NPF_SCTP_STRUCT_FILE	= 0x01 /* Bugfix */
 };
 
 /* Used for struct iscsi_np->np_thread_state */
@@ -511,7 +510,6 @@ struct iscsi_conn {
 	u16			local_port;
 	int			net_size;
 	u32			auth_id;
-#define CONNFLAG_SCTP_STRUCT_FILE			0x01
 	u32			conn_flags;
 	/* Used for iscsi_tx_login_rsp() */
 	u32			login_itt;
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index bd2adec..2ec5339 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -793,22 +793,6 @@ int iscsi_target_setup_login_socket(
 	}
 	np->np_socket = sock;
 	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!sock->file) {
-			sock->file = kzalloc(sizeof(struct file), GFP_KERNEL);
-			if (!sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				ret = -ENOMEM;
-				goto fail;
-			}
-			np->np_flags |= NPF_SCTP_STRUCT_FILE;
-		}
-	}
-	/*
 	 * Setup the np->np_sockaddr from the passed sockaddr setup
 	 * in iscsi_target_configfs.c code..
 	 */
@@ -857,21 +841,15 @@ int iscsi_target_setup_login_socket(
 
 fail:
 	np->np_socket = NULL;
-	if (sock) {
-		if (np->np_flags & NPF_SCTP_STRUCT_FILE) {
-			kfree(sock->file);
-			sock->file = NULL;
-		}
-
+	if (sock)
 		sock_release(sock);
-	}
 	return ret;
 }
 
 static int __iscsi_target_login_thread(struct iscsi_np *np)
 {
 	u8 buffer[ISCSI_HDR_LEN], iscsi_opcode, zero_tsih = 0;
-	int err, ret = 0, ip_proto, sock_type, set_sctp_conn_flag, stop;
+	int err, ret = 0, ip_proto, sock_type, stop;
 	struct iscsi_conn *conn = NULL;
 	struct iscsi_login *login;
 	struct iscsi_portal_group *tpg = NULL;
@@ -882,7 +860,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 	struct sockaddr_in6 sock_in6;
 
 	flush_signals(current);
-	set_sctp_conn_flag = 0;
 	sock = np->np_socket;
 	ip_proto = np->np_ip_proto;
 	sock_type = np->np_sock_type;
@@ -907,35 +884,12 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 		spin_unlock_bh(&np->np_thread_lock);
 		goto out;
 	}
-	/*
-	 * The SCTP stack needs struct socket->file.
-	 */
-	if ((np->np_network_transport == ISCSI_SCTP_TCP) ||
-	    (np->np_network_transport == ISCSI_SCTP_UDP)) {
-		if (!new_sock->file) {
-			new_sock->file = kzalloc(
-					sizeof(struct file), GFP_KERNEL);
-			if (!new_sock->file) {
-				pr_err("Unable to allocate struct"
-						" file for SCTP\n");
-				sock_release(new_sock);
-				/* Get another socket */
-				return 1;
-			}
-			set_sctp_conn_flag = 1;
-		}
-	}
-
 	iscsi_start_login_thread_timer(np);
 
 	conn = kzalloc(sizeof(struct iscsi_conn), GFP_KERNEL);
 	if (!conn) {
 		pr_err("Could not allocate memory for"
 			" new connection\n");
-		if (set_sctp_conn_flag) {
-			kfree(new_sock->file);
-			new_sock->file = NULL;
-		}
 		sock_release(new_sock);
 		/* Get another socket */
 		return 1;
@@ -945,9 +899,6 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
 	conn->conn_state = TARG_CONN_STATE_FREE;
 	conn->sock = new_sock;
 
-	if (set_sctp_conn_flag)
-		conn->conn_flags |= CONNFLAG_SCTP_STRUCT_FILE;
-
 	pr_debug("Moving to TARG_CONN_STATE_XPT_UP.\n");
 	conn->conn_state = TARG_CONN_STATE_XPT_UP;
 
@@ -1195,13 +1146,8 @@ old_sess_out:
 		iscsi_release_param_list(conn->param_list);
 		conn->param_list = NULL;
 	}
-	if (conn->sock) {
-		if (conn->conn_flags & CONNFLAG_SCTP_STRUCT_FILE) {
-			kfree(conn->sock->file);
-			conn->sock->file = NULL;
-		}
+	if (conn->sock)
 		sock_release(conn->sock);
-	}
 	kfree(conn);
 
 	if (tpg) {
diff --git a/drivers/target/target_core_cdb.c b/drivers/target/target_core_cdb.c
index 93b9406..717a8d4 100644
--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1114,11 +1114,11 @@ int target_emulate_unmap(struct se_task *task)
 	struct se_cmd *cmd = task->task_se_cmd;
 	struct se_device *dev = cmd->se_dev;
 	unsigned char *buf, *ptr = NULL;
-	unsigned char *cdb = &cmd->t_task_cdb[0];
 	sector_t lba;
-	unsigned int size = cmd->data_length, range;
-	int ret = 0, offset;
-	unsigned short dl, bd_dl;
+	int size = cmd->data_length;
+	u32 range;
+	int ret = 0;
+	int dl, bd_dl;
 
 	if (!dev->transport->do_discard) {
 		pr_err("UNMAP emulation not supported for: %s\n",
@@ -1127,24 +1127,41 @@ int target_emulate_unmap(struct se_task *task)
 		return -ENOSYS;
 	}
 
-	/* First UNMAP block descriptor starts at 8 byte offset */
-	offset = 8;
-	size -= 8;
-	dl = get_unaligned_be16(&cdb[0]);
-	bd_dl = get_unaligned_be16(&cdb[2]);
-
 	buf = transport_kmap_data_sg(cmd);
 
-	ptr = &buf[offset];
-	pr_debug("UNMAP: Sub: %s Using dl: %hu bd_dl: %hu size: %hu"
+	dl = get_unaligned_be16(&buf[0]);
+	bd_dl = get_unaligned_be16(&buf[2]);
+
+	size = min(size - 8, bd_dl);
+	if (size / 16 > dev->se_sub_dev->se_dev_attrib.max_unmap_block_desc_count) {
+		cmd->scsi_sense_reason = TCM_INVALID_PARAMETER_LIST;
+		ret = -EINVAL;
+		goto err;
+	}
+
+	/* First UNMAP block descriptor starts at 8 byte offset */
+	ptr = &buf[8];
+	pr_debug("UNMAP: Sub: %s Using dl: %u bd_dl: %u size: %u"
 		" ptr: %p\n", dev->transport->name, dl, bd_dl, size, ptr);
 
-	while (size) {
+	while (size >= 16) {
 		lba = get_unaligned_be64(&ptr[0]);
 		range = get_unaligned_be32(&ptr[8]);
 		pr_debug("UNMAP: Using lba: %llu and range: %u\n",
 				 (unsigned long long)lba, range);
 
+		if (range > dev->se_sub_dev->se_dev_attrib.max_unmap_lba_count) {
+			cmd->scsi_sense_reason = TCM_INVALID_PARAMETER_LIST;
+			ret = -EINVAL;
+			goto err;
+		}
+
+		if (lba + range > dev->transport->get_blocks(dev) + 1) {
+			cmd->scsi_sense_reason = TCM_ADDRESS_OUT_OF_RANGE;
+			ret = -EINVAL;
+			goto err;
+		}
+
 		ret = dev->transport->do_discard(dev, lba, range);
 		if (ret < 0) {
 			pr_err("blkdev_issue_discard() failed: %d\n",
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5660916..94c03d2 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1820,6 +1820,7 @@ static void transport_generic_request_failure(struct se_cmd *cmd)
 	case TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE:
 	case TCM_UNKNOWN_MODE_PAGE:
 	case TCM_WRITE_PROTECTED:
+	case TCM_ADDRESS_OUT_OF_RANGE:
 	case TCM_CHECK_CONDITION_ABORT_CMD:
 	case TCM_CHECK_CONDITION_UNIT_ATTENTION:
 	case TCM_CHECK_CONDITION_NOT_READY:
@@ -4496,6 +4497,15 @@ int transport_send_check_condition_and_sense(
 		/* WRITE PROTECTED */
 		buffer[offset+SPC_ASC_KEY_OFFSET] = 0x27;
 		break;
+	case TCM_ADDRESS_OUT_OF_RANGE:
+		/* CURRENT ERROR */
+		buffer[offset] = 0x70;
+		buffer[offset+SPC_ADD_SENSE_LEN_OFFSET] = 10;
+		/* ILLEGAL REQUEST */
+		buffer[offset+SPC_SENSE_KEY_OFFSET] = ILLEGAL_REQUEST;
+		/* LOGICAL BLOCK ADDRESS OUT OF RANGE */
+		buffer[offset+SPC_ASC_KEY_OFFSET] = 0x21;
+		break;
 	case TCM_CHECK_CONDITION_UNIT_ATTENTION:
 		/* CURRENT ERROR */
 		buffer[offset] = 0x70;
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index f6ff837..a9df218 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -1555,10 +1555,14 @@ static int processcompl_compat(struct async *as, void __user * __user *arg)
 	void __user *addr = as->userurb;
 	unsigned int i;
 
-	if (as->userbuffer && urb->actual_length)
-		if (copy_to_user(as->userbuffer, urb->transfer_buffer,
-				 urb->actual_length))
+	if (as->userbuffer && urb->actual_length) {
+		if (urb->number_of_packets > 0)		/* Isochronous */
+			i = urb->transfer_buffer_length;
+		else					/* Non-Isoc */
+			i = urb->actual_length;
+		if (copy_to_user(as->userbuffer, urb->transfer_buffer, i))
 			return -EFAULT;
+	}
 	if (put_user(as->status, &userurb->status))
 		return -EFAULT;
 	if (put_user(urb->actual_length, &userurb->actual_length))
diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index 29c854b..4e1f0aa 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -796,12 +796,6 @@ int gether_setup(struct usb_gadget *g, u8 ethaddr[ETH_ALEN])
 
 	SET_ETHTOOL_OPS(net, &ops);
 
-	/* two kinds of host-initiated state changes:
-	 *  - iff DATA transfer is active, carrier is "on"
-	 *  - tx queueing enabled if open *and* carrier is "on"
-	 */
-	netif_carrier_off(net);
-
 	dev->gadget = g;
 	SET_NETDEV_DEV(net, &g->dev);
 	SET_NETDEV_DEVTYPE(net, &gadget_type);
@@ -815,6 +809,12 @@ int gether_setup(struct usb_gadget *g, u8 ethaddr[ETH_ALEN])
 		INFO(dev, "HOST MAC %pM\n", dev->host_mac);
 
 		the_dev = dev;
+
+		/* two kinds of host-initiated state changes:
+		 *  - iff DATA transfer is active, carrier is "on"
+		 *  - tx queueing enabled if open *and* carrier is "on"
+		 */
+		netif_carrier_off(net);
 	}
 
 	return status;
diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 5971c95..d89aac1 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -932,8 +932,12 @@ static const struct usb_device_id option_ids[] = {
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0165, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0167, 0xff, 0xff, 0xff),
 	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
-	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff) },
-	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff) },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x0326, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1008, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1010, 0xff, 0xff, 0xff),
+	  .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1012, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1057, 0xff, 0xff, 0xff) },
 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1058, 0xff, 0xff, 0xff) },
diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 0b39458..03321e5 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -206,10 +206,17 @@ static noinline int run_ordered_completions(struct btrfs_workers *workers,
 
 		work->ordered_func(work);
 
-		/* now take the lock again and call the freeing code */
+		/* now take the lock again and drop our item from the list */
 		spin_lock(&workers->order_lock);
 		list_del(&work->order_list);
+		spin_unlock(&workers->order_lock);
+
+		/*
+		 * we don't want to call the ordered free functions
+		 * with the lock held though
+		 */
 		work->ordered_free(work);
+		spin_lock(&workers->order_lock);
 	}
 
 	spin_unlock(&workers->order_lock);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f44b392..6b2a724 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
-			struct page *newpage, struct page *page)
+			struct page *newpage, struct page *page,
+			enum migrate_mode mode)
 {
 	/*
 	 * we can't safely write a btree page from here,
@@ -887,7 +888,7 @@ static int btree_migratepage(struct address_space *mapping,
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 6aa7457..c858a29 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -89,6 +89,32 @@ static struct {
 /* Forward declarations */
 static void cifs_readv_complete(struct work_struct *work);
 
+#ifdef CONFIG_HIGHMEM
+/*
+ * On arches that have high memory, kmap address space is limited. By
+ * serializing the kmap operations on those arches, we ensure that we don't
+ * end up with a bunch of threads in writeback with partially mapped page
+ * arrays, stuck waiting for kmap to come back. That situation prevents
+ * progress and can deadlock.
+ */
+static DEFINE_MUTEX(cifs_kmap_mutex);
+
+static inline void
+cifs_kmap_lock(void)
+{
+	mutex_lock(&cifs_kmap_mutex);
+}
+
+static inline void
+cifs_kmap_unlock(void)
+{
+	mutex_unlock(&cifs_kmap_mutex);
+}
+#else /* !CONFIG_HIGHMEM */
+#define cifs_kmap_lock() do { ; } while(0)
+#define cifs_kmap_unlock() do { ; } while(0)
+#endif /* CONFIG_HIGHMEM */
+
 /* Mark as invalid, all open files on tree connections since they
    were closed when session to server was lost */
 static void mark_open_files_invalid(struct cifs_tcon *pTcon)
@@ -1540,6 +1566,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 	eof_index = eof ? (eof - 1) >> PAGE_CACHE_SHIFT : 0;
 	cFYI(1, "eof=%llu eof_index=%lu", eof, eof_index);
 
+	cifs_kmap_lock();
 	list_for_each_entry_safe(page, tpage, &rdata->pages, lru) {
 		if (remaining >= PAGE_CACHE_SIZE) {
 			/* enough data to fill the page */
@@ -1589,6 +1616,7 @@ cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid)
 			page_cache_release(page);
 		}
 	}
+	cifs_kmap_unlock();
 
 	/* issue the read if we have any iovecs left to fill */
 	if (rdata->nr_iov > 1) {
@@ -2171,6 +2199,7 @@ cifs_async_writev(struct cifs_writedata *wdata)
 	iov[0].iov_base = smb;
 
 	/* marshal up the pages into iov array */
+	cifs_kmap_lock();
 	wdata->bytes = 0;
 	for (i = 0; i < wdata->nr_pages; i++) {
 		iov[i + 1].iov_len = min(inode->i_size -
@@ -2179,6 +2208,7 @@ cifs_async_writev(struct cifs_writedata *wdata)
 		iov[i + 1].iov_base = kmap(wdata->pages[i]);
 		wdata->bytes += iov[i + 1].iov_len;
 	}
+	cifs_kmap_unlock();
 
 	cFYI(1, "async write at %llu %u bytes", wdata->offset, wdata->bytes);
 
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 914bf9e..d6970f7 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -557,7 +557,8 @@ ext4_fsblk_t ext4_count_free_clusters(struct super_block *sb)
 		if (bitmap_bh == NULL)
 			continue;
 
-		x = ext4_count_free(bitmap_bh, sb->s_blocksize);
+		x = ext4_count_free(bitmap_bh->b_data,
+				    EXT4_BLOCKS_PER_GROUP(sb) / 8);
 		printk(KERN_DEBUG "group %u: stored = %d, counted = %u\n",
 			i, ext4_free_group_clusters(sb, gdp), x);
 		bitmap_count += x;
diff --git a/fs/ext4/bitmap.c b/fs/ext4/bitmap.c
index fa3af81..bbde5d5 100644
--- a/fs/ext4/bitmap.c
+++ b/fs/ext4/bitmap.c
@@ -11,21 +11,15 @@
 #include <linux/jbd2.h>
 #include "ext4.h"
 
-#ifdef EXT4FS_DEBUG
-
 static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};
 
-unsigned int ext4_count_free(struct buffer_head *map, unsigned int numchars)
+unsigned int ext4_count_free(char *bitmap, unsigned int numchars)
 {
 	unsigned int i, sum = 0;
 
-	if (!map)
-		return 0;
 	for (i = 0; i < numchars; i++)
-		sum += nibblemap[map->b_data[i] & 0xf] +
-			nibblemap[(map->b_data[i] >> 4) & 0xf];
+		sum += nibblemap[bitmap[i] & 0xf] +
+			nibblemap[(bitmap[i] >> 4) & 0xf];
 	return sum;
 }
 
-#endif  /*  EXT4FS_DEBUG  */
-
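
After this change ext4_count_free() takes a raw char bitmap; each nibblemap entry is the count of zero bits in its 4-bit index, so summing two lookups per byte counts free (zero) bits. A runnable check of that table trick:

#include <assert.h>
#include <stdio.h>

/* nibblemap[v] == number of 0-bits in the 4-bit value v */
static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0};

static unsigned count_free(const unsigned char *bitmap, unsigned numchars)
{
	unsigned i, sum = 0;

	for (i = 0; i < numchars; i++)
		sum += nibblemap[bitmap[i] & 0xf] +
		       nibblemap[(bitmap[i] >> 4) & 0xf];
	return sum;
}

int main(void)
{
	unsigned char bm[] = { 0x00, 0xff, 0xf0 };

	/* 8 free bits + 0 free bits + 4 free bits */
	assert(count_free(bm, 3) == 12);
	printf("free bits: %u\n", count_free(bm, 3));
	return 0;
}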
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7b1cd5c..8cb184c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1123,8 +1123,7 @@ struct ext4_sb_info {
 	unsigned long s_desc_per_block;	/* Number of group descriptors per block */
 	ext4_group_t s_groups_count;	/* Number of groups in the fs */
 	ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */
-	unsigned long s_overhead_last;  /* Last calculated overhead */
-	unsigned long s_blocks_last;    /* Last seen block count */
+	unsigned long s_overhead;  /* # of fs overhead clusters */
 	unsigned int s_cluster_ratio;	/* Number of blocks per cluster */
 	unsigned int s_cluster_bits;	/* log2 of s_cluster_ratio */
 	loff_t s_bitmap_maxbytes;	/* max bytes for bitmap files */
@@ -1757,7 +1756,7 @@ struct mmpd_data {
 # define NORET_AND	noreturn,
 
 /* bitmap.c */
-extern unsigned int ext4_count_free(struct buffer_head *, unsigned);
+extern unsigned int ext4_count_free(char *bitmap, unsigned numchars);
 
 /* balloc.c */
 extern unsigned int ext4_block_group(struct super_block *sb,
@@ -1925,6 +1924,7 @@ extern int ext4_group_extend(struct super_block *sb,
 				ext4_fsblk_t n_blocks_count);
 
 /* super.c */
+extern int ext4_calculate_overhead(struct super_block *sb);
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 8fb6844..6266799 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1057,7 +1057,8 @@ unsigned long ext4_count_free_inodes(struct super_block *sb)
 		if (!bitmap_bh)
 			continue;
 
-		x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8);
+		x = ext4_count_free(bitmap_bh->b_data,
+				    EXT4_INODES_PER_GROUP(sb) / 8);
 		printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
 			(unsigned long) i, ext4_free_inodes_count(sb, gdp), x);
 		bitmap_count += x;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3ce7613..8b01f9f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -277,6 +277,15 @@ void ext4_da_update_reserve_space(struct inode *inode,
 		used = ei->i_reserved_data_blocks;
 	}
 
+	if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) {
+		ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d "
+			 "with only %d reserved metadata blocks\n", __func__,
+			 inode->i_ino, ei->i_allocated_meta_blocks,
+			 ei->i_reserved_meta_blocks);
+		WARN_ON(1);
+		ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks;
+	}
+
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
 	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
@@ -1102,6 +1111,17 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	unsigned int md_needed;
 	int ret;
+	ext4_lblk_t save_last_lblock;
+	int save_len;
+
+	/*
+	 * We will charge metadata quota at writeout time; this saves
+	 * us from metadata over-estimation, though we may go over by
+	 * a small amount in the end.  Here we just reserve for data.
+	 */
+	ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
+	if (ret)
+		return ret;
 
 	/*
 	 * recalculate the amount of metadata blocks to reserve
@@ -1110,32 +1130,31 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock)
 	 */
 repeat:
 	spin_lock(&ei->i_block_reservation_lock);
+	/*
+	 * ext4_calc_metadata_amount() has side effects, which we have
+	 * to be prepared to undo if we fail to claim space.

+	 */
+	save_len = ei->i_da_metadata_calc_len;
+	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
 	md_needed = EXT4_NUM_B2C(sbi,
 				 ext4_calc_metadata_amount(inode, lblock));
 	trace_ext4_da_reserve_space(inode, md_needed);
-	spin_unlock(&ei->i_block_reservation_lock);
 
 	/*
-	 * We will charge metadata quota at writeout time; this saves
-	 * us from metadata over-estimation, though we may go over by
-	 * a small amount in the end.  Here we just reserve for data.
-	 */
-	ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
-	if (ret)
-		return ret;
-	/*
 	 * We do still charge estimated metadata to the sb though;
 	 * we cannot afford to run out of free blocks.
 	 */
 	if (ext4_claim_free_clusters(sbi, md_needed + 1, 0)) {
-		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
+		ei->i_da_metadata_calc_len = save_len;
+		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
+		spin_unlock(&ei->i_block_reservation_lock);
 		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
 			yield();
 			goto repeat;
 		}
+		dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
 		return -ENOSPC;
 	}
-	spin_lock(&ei->i_block_reservation_lock);
 	ei->i_reserved_data_blocks++;
 	ei->i_reserved_meta_blocks += md_needed;
 	spin_unlock(&ei->i_block_reservation_lock);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 996780a..4eac337 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -952,6 +952,11 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
 			   &sbi->s_flex_groups[flex_group].free_inodes);
 	}
 
+	/*
+	 * Update the fs overhead information
+	 */
+	ext4_calculate_overhead(sb);
+
 	ext4_handle_dirty_super(handle, sb);
 
 exit_journal:
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a93486e..a071348 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3083,6 +3083,114 @@ static void ext4_destroy_lazyinit_thread(void)
 	kthread_stop(ext4_lazyinit_task);
 }
 
+/*
+ * Note: calculating the overhead so we can be compatible with
+ * historical BSD practice is quite difficult in the face of
+ * clusters/bigalloc.  This is because multiple metadata blocks from
+ * different block group can end up in the same allocation cluster.
+ * Calculating the exact overhead in the face of clustered allocation
+ * requires either O(all block bitmaps) in memory or O(number of block
+ * groups**2) in time.  We will still calculate the superblock for
+ * older file systems --- and if we come across with a bigalloc file
+ * system with zero in s_overhead_clusters the estimate will be close to
+ * correct especially for very large cluster sizes --- but for newer
+ * file systems, it's better to calculate this figure once at mkfs
+ * time, and store it in the superblock.  If the superblock value is
+ * present (even for non-bigalloc file systems), we will use it.
+ */
+static int count_overhead(struct super_block *sb, ext4_group_t grp,
+			  char *buf)
+{
+	struct ext4_sb_info	*sbi = EXT4_SB(sb);
+	struct ext4_group_desc	*gdp;
+	ext4_fsblk_t		first_block, last_block, b;
+	ext4_group_t		i, ngroups = ext4_get_groups_count(sb);
+	int			s, j, count = 0;
+
+	first_block = le32_to_cpu(sbi->s_es->s_first_data_block) +
+		(grp * EXT4_BLOCKS_PER_GROUP(sb));
+	last_block = first_block + EXT4_BLOCKS_PER_GROUP(sb) - 1;
+	for (i = 0; i < ngroups; i++) {
+		gdp = ext4_get_group_desc(sb, i, NULL);
+		b = ext4_block_bitmap(sb, gdp);
+		if (b >= first_block && b <= last_block) {
+			ext4_set_bit(EXT4_B2C(sbi, b - first_block), buf);
+			count++;
+		}
+		b = ext4_inode_bitmap(sb, gdp);
+		if (b >= first_block && b <= last_block) {
+			ext4_set_bit(EXT4_B2C(sbi, b - first_block), buf);
+			count++;
+		}
+		b = ext4_inode_table(sb, gdp);
+		if (b >= first_block && b + sbi->s_itb_per_group <= last_block)
+			for (j = 0; j < sbi->s_itb_per_group; j++, b++) {
+				int c = EXT4_B2C(sbi, b - first_block);
+				ext4_set_bit(c, buf);
+				count++;
+			}
+		if (i != grp)
+			continue;
+		s = 0;
+		if (ext4_bg_has_super(sb, grp)) {
+			ext4_set_bit(s++, buf);
+			count++;
+		}
+		for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) {
+			ext4_set_bit(EXT4_B2C(sbi, s++), buf);
+			count++;
+		}
+	}
+	if (!count)
+		return 0;
+	return EXT4_CLUSTERS_PER_GROUP(sb) -
+		ext4_count_free(buf, EXT4_CLUSTERS_PER_GROUP(sb) / 8);
+}
+
+/*
+ * Compute the overhead and stash it in sbi->s_overhead
+ */
+int ext4_calculate_overhead(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+	ext4_fsblk_t overhead = 0;
+	char *buf = (char *) get_zeroed_page(GFP_KERNEL);
+
+	if (!buf)
+		return -ENOMEM;
+
+	/*
+	 * Compute the overhead (FS structures).  This is constant
+	 * for a given filesystem unless the number of block groups
+	 * changes so we cache the previous value until it does.
+	 */
+
+	/*
+	 * All of the blocks before first_data_block are overhead
+	 */
+	overhead = EXT4_B2C(sbi, le32_to_cpu(es->s_first_data_block));
+
+	/*
+	 * Add the overhead found in each block group
+	 */
+	for (i = 0; i < ngroups; i++) {
+		int blks;
+
+		blks = count_overhead(sb, i, buf);
+		overhead += blks;
+		if (blks)
+			memset(buf, 0, PAGE_SIZE);
+		cond_resched();
+	}
+	sbi->s_overhead = overhead;
+	smp_wmb();
+	free_page((unsigned long) buf);
+	return 0;
+}
+
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
 	char *orig_data = kstrdup(data, GFP_KERNEL);
@@ -3695,6 +3803,18 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 
 no_journal:
 	/*
+	 * Get the # of file system overhead blocks from the
+	 * superblock if present.
+	 */
+	if (es->s_overhead_clusters)
+		sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
+	else {
+		ret = ext4_calculate_overhead(sb);
+		if (ret)
+			goto failed_mount_wq;
+	}
+
+	/*
 	 * The maximum number of concurrent works can be high and
 	 * concurrency isn't really necessary.  Limit it to 1.
 	 */
@@ -4568,67 +4688,21 @@ restore_opts:
 	return err;
 }
 
-/*
- * Note: calculating the overhead so we can be compatible with
- * historical BSD practice is quite difficult in the face of
- * clusters/bigalloc.  This is because multiple metadata blocks from
- * different block group can end up in the same allocation cluster.
- * Calculating the exact overhead in the face of clustered allocation
- * requires either O(all block bitmaps) in memory or O(number of block
- * groups**2) in time.  We will still calculate the superblock for
- * older file systems --- and if we come across with a bigalloc file
- * system with zero in s_overhead_clusters the estimate will be close to
- * correct especially for very large cluster sizes --- but for newer
- * file systems, it's better to calculate this figure once at mkfs
- * time, and store it in the superblock.  If the superblock value is
- * present (even for non-bigalloc file systems), we will use it.
- */
 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_super_block *es = sbi->s_es;
-	struct ext4_group_desc *gdp;
+	ext4_fsblk_t overhead = 0;
 	u64 fsid;
 	s64 bfree;
 
-	if (test_opt(sb, MINIX_DF)) {
-		sbi->s_overhead_last = 0;
-	} else if (es->s_overhead_clusters) {
-		sbi->s_overhead_last = le32_to_cpu(es->s_overhead_clusters);
-	} else if (sbi->s_blocks_last != ext4_blocks_count(es)) {
-		ext4_group_t i, ngroups = ext4_get_groups_count(sb);
-		ext4_fsblk_t overhead = 0;
-
-		/*
-		 * Compute the overhead (FS structures).  This is constant
-		 * for a given filesystem unless the number of block groups
-		 * changes so we cache the previous value until it does.
-		 */
-
-		/*
-		 * All of the blocks before first_data_block are
-		 * overhead
-		 */
-		overhead = EXT4_B2C(sbi, le32_to_cpu(es->s_first_data_block));
-
-		/*
-		 * Add the overhead found in each block group
-		 */
-		for (i = 0; i < ngroups; i++) {
-			gdp = ext4_get_group_desc(sb, i, NULL);
-			overhead += ext4_num_overhead_clusters(sb, i, gdp);
-			cond_resched();
-		}
-		sbi->s_overhead_last = overhead;
-		smp_wmb();
-		sbi->s_blocks_last = ext4_blocks_count(es);
-	}
+	if (!test_opt(sb, MINIX_DF))
+		overhead = sbi->s_overhead;
 
 	buf->f_type = EXT4_SUPER_MAGIC;
 	buf->f_bsize = sb->s_blocksize;
-	buf->f_blocks = (ext4_blocks_count(es) -
-			 EXT4_C2B(sbi, sbi->s_overhead_last));
+	buf->f_blocks = ext4_blocks_count(es) - EXT4_C2B(sbi, sbi->s_overhead);
 	bfree = percpu_counter_sum_positive(&sbi->s_freeclusters_counter) -
 		percpu_counter_sum_positive(&sbi->s_dirtyclusters_counter);
 	/* prevent underflow in case that few free space is available */
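
The count_overhead() helper added above marks the cluster of every metadata
block in a scratch bitmap and then counts clusters with at least one bit set,
so metadata blocks sharing a cluster are charged only once.  The counting idea
reduced to a compilable sketch (the geometry constants and block list are made
up for illustration):

#include <stdio.h>

#define BLOCKS_PER_GROUP   64
#define CLUSTER_RATIO       4	/* blocks per cluster (bigalloc) */
#define CLUSTERS_PER_GROUP (BLOCKS_PER_GROUP / CLUSTER_RATIO)

static unsigned count_overhead_clusters(const unsigned *meta_blocks,
					unsigned n)
{
	unsigned char bitmap[CLUSTERS_PER_GROUP] = { 0 };
	unsigned i, clusters = 0;

	/* mark the cluster holding each metadata block (EXT4_B2C) */
	for (i = 0; i < n; i++)
		bitmap[meta_blocks[i] / CLUSTER_RATIO] = 1;

	/* a cluster with any metadata in it counts exactly once */
	for (i = 0; i < CLUSTERS_PER_GROUP; i++)
		clusters += bitmap[i];
	return clusters;
}

int main(void)
{
	/* block bitmap (0), inode bitmap (5), inode table (8-11):
	 * six metadata blocks, but only three distinct clusters */
	unsigned meta[] = { 0, 5, 8, 9, 10, 11 };

	printf("%u overhead clusters\n", count_overhead_clusters(meta, 6));
	return 0;
}
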
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ebc2f4d..0aa424a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -569,7 +569,8 @@ static int hugetlbfs_set_page_dirty(struct page *page)
 }
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
-				struct page *newpage, struct page *page)
+				struct page *newpage, struct page *page,
+				enum migrate_mode mode)
 {
 	int rc;
 
diff --git a/fs/locks.c b/fs/locks.c
index 6a64f15..fcc50ab 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -308,7 +308,7 @@ static int flock_make_lock(struct file *filp, struct file_lock **lock,
 	return 0;
 }
 
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock *fl, long type)
 {
 	switch (type) {
 	case F_RDLCK:
@@ -445,7 +445,7 @@ static const struct lock_manager_operations lease_manager_ops = {
 /*
  * Initialize a lease, use the default lock manager operations
  */
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, long type, struct file_lock *fl)
  {
 	if (assign_type(fl, type) != 0)
 		return -EINVAL;
@@ -463,7 +463,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
 }
 
 /* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lock *lease_alloc(struct file *filp, long type)
 {
 	struct file_lock *fl = locks_alloc_lock();
 	int error = -ENOMEM;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 3f4d957..68b3f20 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -330,7 +330,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-		struct page *, struct page *);
+		struct page *, struct page *, enum migrate_mode);
 #else
 #define nfs_migrate_page NULL
 #endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4efd421..c6e523a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1711,7 +1711,7 @@ out_error:
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-		struct page *page)
+		struct page *page, enum migrate_mode mode)
 {
 	/*
 	 * If PagePrivate is set, then the page is currently associated with
@@ -1726,7 +1726,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
 
 	nfs_fscache_release_page(page, GFP_KERNEL);
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 270e135..516b7f0 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1285,7 +1285,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
 	BUG_ON(ident != TAG_IDENT_LVD);
 	lvd = (struct logicalVolDesc *)bh->b_data;
 	table_len = le32_to_cpu(lvd->mapTableLength);
-	if (sizeof(*lvd) + table_len > sb->s_blocksize) {
+	if (table_len > sb->s_blocksize - sizeof(*lvd)) {
 		udf_err(sb, "error loading logical volume descriptor: "
 			"Partition table too long (%u > %lu)\n", table_len,
 			sb->s_blocksize - sizeof(*lvd));
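
The rewritten bounds check closes an integer-overflow bypass: with 32-bit size
arithmetic, sizeof(*lvd) + table_len can wrap to a small value for a large
on-disk table_len, while table_len > sb->s_blocksize - sizeof(*lvd) cannot
wrap because the subtrahend is a small compile-time constant.  A minimal
demonstration, modelling the 32-bit case with uint32_t (the 440-byte header
size is illustrative):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t blocksize = 2048;
	uint32_t hdr = 440;			/* sizeof(*lvd), say */
	uint32_t table_len = UINT32_MAX - 100;	/* from disk, untrusted */

	/* broken form: hdr + table_len wraps to a small value */
	if (hdr + table_len > blocksize)
		puts("sum form: rejected");
	else
		puts("sum form: ACCEPTED (overflow!)");

	/* fixed form: no addition, so no wraparound */
	if (table_len > blocksize - hdr)
		puts("subtract form: rejected");
	return 0;
}
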
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0ed1eb0..ff039f0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -481,6 +481,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define blk_queue_tagged(q)	test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
+#define blk_queue_dead(q)	test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
 	test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 6cb60fd..c692acc 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -66,8 +66,9 @@ enum {
 	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
 	CPU_PRI_MIGRATION	= 10,
-	/* prepare workqueues for other notifiers */
-	CPU_PRI_WORKQUEUE	= 5,
+	/* bring up workqueues before normal notifiers and down after */
+	CPU_PRI_WORKQUEUE_UP	= 5,
+	CPU_PRI_WORKQUEUE_DOWN	= -5,
 };
 
 #define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index e9eaec5..7a7e5fd 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -89,42 +89,33 @@ extern void rebuild_sched_domains(void);
 extern void cpuset_print_task_mems_allowed(struct task_struct *p);
 
 /*
- * reading current mems_allowed and mempolicy in the fastpath must protected
- * by get_mems_allowed()
+ * get_mems_allowed is required when making decisions involving mems_allowed
+ * such as during page allocation. mems_allowed can be updated in parallel
+ * and, depending on the new value, an operation can fail, potentially causing
+ * a process failure. A retry loop with get_mems_allowed and put_mems_allowed
+ * prevents these artificial failures.
  */
-static inline void get_mems_allowed(void)
+static inline unsigned int get_mems_allowed(void)
 {
-	current->mems_allowed_change_disable++;
-
-	/*
-	 * ensure that reading mems_allowed and mempolicy happens after the
-	 * update of ->mems_allowed_change_disable.
-	 *
-	 * the write-side task finds ->mems_allowed_change_disable is not 0,
-	 * and knows the read-side task is reading mems_allowed or mempolicy,
-	 * so it will clear old bits lazily.
-	 */
-	smp_mb();
+	return read_seqcount_begin(&current->mems_allowed_seq);
 }
 
-static inline void put_mems_allowed(void)
+/*
+ * If this returns false, the operation that took place after get_mems_allowed
+ * may have failed. It is up to the caller to retry the operation if
+ * appropriate.
+ */
+static inline bool put_mems_allowed(unsigned int seq)
 {
-	/*
-	 * ensure that reading mems_allowed and mempolicy before reducing
-	 * mems_allowed_change_disable.
-	 *
-	 * the write-side task will know that the read-side task is still
-	 * reading mems_allowed or mempolicy, don't clears old bits in the
-	 * nodemask.
-	 */
-	smp_mb();
-	--ACCESS_ONCE(current->mems_allowed_change_disable);
+	return !read_seqcount_retry(&current->mems_allowed_seq, seq);
 }
 
 static inline void set_mems_allowed(nodemask_t nodemask)
 {
 	task_lock(current);
+	write_seqcount_begin(&current->mems_allowed_seq);
 	current->mems_allowed = nodemask;
+	write_seqcount_end(&current->mems_allowed_seq);
 	task_unlock(current);
 }
 
@@ -234,12 +225,14 @@ static inline void set_mems_allowed(nodemask_t nodemask)
 {
 }
 
-static inline void get_mems_allowed(void)
+static inline unsigned int get_mems_allowed(void)
 {
+	return 0;
 }
 
-static inline void put_mems_allowed(void)
+static inline bool put_mems_allowed(unsigned int seq)
 {
+	return true;
 }
 
 #endif /* !CONFIG_CPUSETS */
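
The new get_mems_allowed()/put_mems_allowed() pair is a standard seqcount read
loop: sample the sequence, read the protected data, and retry if the sequence
moved in the meantime.  A compilable userspace analogue with C11 atomics
standing in for the kernel's seqcount primitives (memory ordering simplified;
all names illustrative):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned seq;		/* even = no writer active */
static unsigned long mems_allowed = 0xf;

static unsigned read_begin(void)
{
	unsigned s;

	while ((s = atomic_load_explicit(&seq, memory_order_acquire)) & 1)
		;	/* writer in progress, wait for an even value */
	return s;
}

static int read_retry(unsigned s)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&seq, memory_order_relaxed) != s;
}

int main(void)
{
	unsigned cookie;
	unsigned long snapshot;

	do {
		cookie = read_begin();		/* get_mems_allowed() */
		snapshot = mems_allowed;	/* use the mask */
	} while (read_retry(cookie));		/* !put_mems_allowed(cookie) */

	printf("stable snapshot: %#lx\n", snapshot);
	return 0;
}
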
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 43d36b7..29b6353 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -525,6 +525,7 @@ enum positive_aop_returns {
 struct page;
 struct address_space;
 struct writeback_control;
+enum migrate_mode;
 
 struct iov_iter {
 	const struct iovec *iov;
@@ -609,9 +610,12 @@ struct address_space_operations {
 			loff_t offset, unsigned long nr_segs);
 	int (*get_xip_mem)(struct address_space *, pgoff_t, int,
 						void **, unsigned long *);
-	/* migrate the contents of a page to the specified target */
+	/*
+	 * migrate the contents of a page to the specified target. If
+	 * migrate_mode is MIGRATE_ASYNC, it must not block.
+	 */
 	int (*migratepage) (struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, enum migrate_mode);
 	int (*launder_page) (struct page *);
 	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
 					unsigned long);
@@ -2586,7 +2590,8 @@ extern int generic_check_addressable(unsigned, u64);
 
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
-				struct page *, struct page *);
+				struct page *, struct page *,
+				enum migrate_mode);
 #else
 #define buffer_migrate_page NULL
 #endif
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 32574ee..df53fdf 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -30,6 +30,13 @@ extern struct fs_struct init_fs;
 #define INIT_THREADGROUP_FORK_LOCK(sig)
 #endif
 
+#ifdef CONFIG_CPUSETS
+#define INIT_CPUSET_SEQ							\
+	.mems_allowed_seq = SEQCNT_ZERO,
+#else
+#define INIT_CPUSET_SEQ
+#endif
+
 #define INIT_SIGNALS(sig) {						\
 	.nr_threads	= 1,						\
 	.wait_chldexit	= __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
@@ -193,6 +200,7 @@ extern struct cred init_cred;
 	INIT_FTRACE_GRAPH						\
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
+	INIT_CPUSET_SEQ							\
 }
 
 
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e39aeec..eaf8674 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -6,18 +6,31 @@
 
 typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
+/*
+ * MIGRATE_ASYNC means never block
+ * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
+ *	on most operations but not ->writepage as the potential stall time
+ *	is too significant
+ * MIGRATE_SYNC will block when migrating pages
+ */
+enum migrate_mode {
+	MIGRATE_ASYNC,
+	MIGRATE_SYNC_LIGHT,
+	MIGRATE_SYNC,
+};
+
 #ifdef CONFIG_MIGRATION
 #define PAGE_MIGRATION 1
 
 extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
-			struct page *, struct page *);
+			struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
-			bool sync);
+			enum migrate_mode mode);
 extern int migrate_huge_pages(struct list_head *l, new_page_t x,
 			unsigned long private, bool offlining,
-			bool sync);
+			enum migrate_mode mode);
 
 extern int fail_migrate_page(struct address_space *,
 			struct page *, struct page *);
@@ -36,10 +49,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
-		bool sync) { return -ENOSYS; }
+		enum migrate_mode mode) { return -ENOSYS; }
 static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
 		unsigned long private, bool offlining,
-		bool sync) { return -ENOSYS; }
+		enum migrate_mode mode) { return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
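
The enum replaces the old bool sync with three points on a blocking spectrum.
For reference, this is how the callers converted later in this patch map onto
it (the helper functions below are illustrative, not kernel interfaces):

enum migrate_mode {
	MIGRATE_ASYNC,		/* never block */
	MIGRATE_SYNC_LIGHT,	/* may block, but no ->writepage */
	MIGRATE_SYNC,		/* may block, including writeback */
};

/* compaction: sync migration selects light sync, not full sync */
static enum migrate_mode compaction_mode(int sync)
{
	return sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

/* hotplug, soft-offline, move_pages: correctness paths want full sync */
static enum migrate_mode offline_mode(void)
{
	return MIGRATE_SYNC;
}
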
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 905b1e1..25842b6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -173,6 +173,8 @@ static inline int is_unevictable_lru(enum lru_list l)
 #define ISOLATE_CLEAN		((__force isolate_mode_t)0x4)
 /* Isolate unmapped file */
 #define ISOLATE_UNMAPPED	((__force isolate_mode_t)0x8)
+/* Isolate for asynchronous migration */
+#define ISOLATE_ASYNC_MIGRATE	((__force isolate_mode_t)0x10)
 
 /* LRU Isolation modes. */
 typedef unsigned __bitwise__ isolate_mode_t;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5afa2a3..d336c35 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -145,6 +145,7 @@ extern unsigned long this_cpu_load(void);
 
 
 extern void calc_global_load(unsigned long ticks);
+extern void update_cpu_load_nohz(void);
 
 extern unsigned long get_parent_ip(unsigned long addr);
 
@@ -1481,7 +1482,7 @@ struct task_struct {
 #endif
 #ifdef CONFIG_CPUSETS
 	nodemask_t mems_allowed;	/* Protected by alloc_lock */
-	int mems_allowed_change_disable;
+	seqcount_t mems_allowed_seq;	/* Sequence no to catch updates */
 	int cpuset_mem_spread_rotor;
 	int cpuset_slab_spread_rotor;
 #endif
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 94bbec3..6ee550e 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -157,6 +157,7 @@ enum tcm_sense_reason_table {
 	TCM_CHECK_CONDITION_UNIT_ATTENTION	= 0x0e,
 	TCM_CHECK_CONDITION_NOT_READY		= 0x0f,
 	TCM_RESERVATION_CONFLICT		= 0x10,
+	TCM_ADDRESS_OUT_OF_RANGE		= 0x11,
 };
 
 struct se_obj {
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 0b1712d..46a1d3c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -964,7 +964,6 @@ static void cpuset_change_task_nodemask(struct task_struct *tsk,
 {
 	bool need_loop;
 
-repeat:
 	/*
 	 * Allow tasks that have access to memory reserves because they have
 	 * been OOM killed to get memory anywhere.
@@ -983,45 +982,19 @@ repeat:
 	 */
 	need_loop = task_has_mempolicy(tsk) ||
 			!nodes_intersects(*newmems, tsk->mems_allowed);
-	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
-	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
 
-	/*
-	 * ensure checking ->mems_allowed_change_disable after setting all new
-	 * allowed nodes.
-	 *
-	 * the read-side task can see an nodemask with new allowed nodes and
-	 * old allowed nodes. and if it allocates page when cpuset clears newly
-	 * disallowed ones continuous, it can see the new allowed bits.
-	 *
-	 * And if setting all new allowed nodes is after the checking, setting
-	 * all new allowed nodes and clearing newly disallowed ones will be done
-	 * continuous, and the read-side task may find no node to alloc page.
-	 */
-	smp_mb();
+	if (need_loop)
+		write_seqcount_begin(&tsk->mems_allowed_seq);
 
-	/*
-	 * Allocation of memory is very fast, we needn't sleep when waiting
-	 * for the read-side.
-	 */
-	while (need_loop && ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
-		task_unlock(tsk);
-		if (!task_curr(tsk))
-			yield();
-		goto repeat;
-	}
-
-	/*
-	 * ensure checking ->mems_allowed_change_disable before clearing all new
-	 * disallowed nodes.
-	 *
-	 * if clearing newly disallowed bits before the checking, the read-side
-	 * task may find no node to alloc page.
-	 */
-	smp_mb();
+	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
+	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
 
 	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP2);
 	tsk->mems_allowed = *newmems;
+
+	if (need_loop)
+		write_seqcount_end(&tsk->mems_allowed_seq);
+
 	task_unlock(tsk);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 79ee71f..222457a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -979,6 +979,9 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 #ifdef CONFIG_CGROUPS
 	init_rwsem(&sig->threadgroup_fork_lock);
 #endif
+#ifdef CONFIG_CPUSETS
+	seqcount_init(&tsk->mems_allowed_seq);
+#endif
 
 	sig->oom_adj = current->signal->oom_adj;
 	sig->oom_score_adj = current->signal->oom_score_adj;
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 7c0d578..013bd2e 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -367,6 +367,7 @@ int hibernation_snapshot(int platform_mode)
 	}
 
 	suspend_console();
+	ftrace_stop();
 	pm_restrict_gfp_mask();
 	error = dpm_suspend(PMSG_FREEZE);
 	if (error)
@@ -392,6 +393,7 @@ int hibernation_snapshot(int platform_mode)
 	if (error || !in_suspend)
 		pm_restore_gfp_mask();
 
+	ftrace_start();
 	resume_console();
 	dpm_complete(msg);
 
@@ -496,6 +498,7 @@ int hibernation_restore(int platform_mode)
 
 	pm_prepare_console();
 	suspend_console();
+	ftrace_stop();
 	pm_restrict_gfp_mask();
 	error = dpm_suspend_start(PMSG_QUIESCE);
 	if (!error) {
@@ -503,6 +506,7 @@ int hibernation_restore(int platform_mode)
 		dpm_resume_end(PMSG_RECOVER);
 	}
 	pm_restore_gfp_mask();
+	ftrace_start();
 	resume_console();
 	pm_restore_console();
 	return error;
@@ -529,6 +533,7 @@ int hibernation_platform_enter(void)
 
 	entering_platform_hibernation = true;
 	suspend_console();
+	ftrace_stop();
 	error = dpm_suspend_start(PMSG_HIBERNATE);
 	if (error) {
 		if (hibernation_ops->recover)
@@ -572,6 +577,7 @@ int hibernation_platform_enter(void)
  Resume_devices:
 	entering_platform_hibernation = false;
 	dpm_resume_end(PMSG_RESTORE);
+	ftrace_start();
 	resume_console();
 
  Close:
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 4953dc0..af48faa 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -25,6 +25,7 @@
 #include <linux/export.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
+#include <linux/ftrace.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -220,6 +221,7 @@ int suspend_devices_and_enter(suspend_state_t state)
 			goto Close;
 	}
 	suspend_console();
+	ftrace_stop();
 	suspend_test_start();
 	error = dpm_suspend_start(PMSG_SUSPEND);
 	if (error) {
@@ -239,6 +241,7 @@ int suspend_devices_and_enter(suspend_state_t state)
 	suspend_test_start();
 	dpm_resume_end(PMSG_RESUME);
 	suspend_test_finish("resume devices");
+	ftrace_start();
 	resume_console();
  Close:
 	if (suspend_ops->end)
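
The ftrace_stop()/ftrace_start() calls added above bracket the device-PM
phase, with the start call placed on the common exit path so that success and
error returns both re-enable tracing.  A toy model of that bracket discipline
(underscore-suffixed stubs, none of them the real PM functions):

#include <stdio.h>

static void suspend_console_(void)   { puts("console suspended"); }
static void resume_console_(void)    { puts("console resumed"); }
static void ftrace_stop_(void)       { puts("tracing off"); }
static void ftrace_start_(void)      { puts("tracing on"); }
static int  dpm_suspend_start_(void) { return -1; /* simulate failure */ }
static void dpm_resume_end_(void)    {}

static int suspend_devices(void)
{
	int error;

	suspend_console_();
	ftrace_stop_();

	error = dpm_suspend_start_();
	if (error)
		goto resume;	/* never return with tracing off */

	/* ... enter the low-power state here ... */

resume:
	dpm_resume_end_();
	ftrace_start_();	/* reached on success and failure alike */
	resume_console_();
	return error;
}

int main(void)
{
	return suspend_devices() ? 1 : 0;
}
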
diff --git a/kernel/sched.c b/kernel/sched.c
index 52ac69b..9cd8ca7 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1887,7 +1887,7 @@ static void double_rq_unlock(struct rq *rq1, struct rq *rq2)
 
 static void update_sysctl(void);
 static int get_update_sysctl_factor(void);
-static void update_cpu_load(struct rq *this_rq);
+static void update_idle_cpu_load(struct rq *this_rq);
 
 static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
@@ -3855,22 +3855,13 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-static void update_cpu_load(struct rq *this_rq)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+			      unsigned long pending_updates)
 {
-	unsigned long this_load = this_rq->load.weight;
-	unsigned long curr_jiffies = jiffies;
-	unsigned long pending_updates;
 	int i, scale;
 
 	this_rq->nr_load_updates++;
 
-	/* Avoid repeated calls on same jiffy, when moving in and out of idle */
-	if (curr_jiffies == this_rq->last_load_update_tick)
-		return;
-
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-
 	/* Update our load: */
 	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
@@ -3895,9 +3886,78 @@ static void update_cpu_load(struct rq *this_rq)
 	sched_avg_update(this_rq);
 }
 
+#ifdef CONFIG_NO_HZ
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we cannot use the delta approach from the regular tick since that
+ * would seriously skew the load calculation. However we'll make do for those
+ * updates happening while idle (nohz_idle_balance) or coming out of idle
+ * (tick_nohz_idle_exit).
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
+/*
+ * Called from nohz_idle_balance() to update the load ratings before doing the
+ * idle balance.
+ */
+static void update_idle_cpu_load(struct rq *this_rq)
+{
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long load = this_rq->load.weight;
+	unsigned long pending_updates;
+
+	/*
+	 * bail if there's load or we're actually up-to-date.
+	 */
+	if (load || curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
+
+	__update_cpu_load(this_rq, load, pending_updates);
+}
+
+/*
+ * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ */
+void update_cpu_load_nohz(void)
+{
+	struct rq *this_rq = this_rq();
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	if (curr_jiffies == this_rq->last_load_update_tick)
+		return;
+
+	raw_spin_lock(&this_rq->lock);
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	if (pending_updates) {
+		this_rq->last_load_update_tick = curr_jiffies;
+		/*
+		 * We were idle, this means load 0. The current load might be
+		 * !0 due to remote wakeups and the like.
+		 */
+		__update_cpu_load(this_rq, 0, pending_updates);
+	}
+	raw_spin_unlock(&this_rq->lock);
+}
+#endif /* CONFIG_NO_HZ */
+
+/*
+ * Called from scheduler_tick()
+ */
 static void update_cpu_load_active(struct rq *this_rq)
 {
-	update_cpu_load(this_rq);
+	/*
+	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
+	 */
+	this_rq->last_load_update_tick = jiffies;
+	__update_cpu_load(this_rq, this_rq->load.weight, 1);
 
 	calc_load_account_active(this_rq);
 }
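
__update_cpu_load() keeps cpu_load[i] as exponential moving averages with
per-index weight (2^i - 1)/2^i, and the new pending_updates argument replays
ticks missed while idle as zero-load observations.  A worked sketch of that
decay (the kernel uses a precomputed degrade_factor table; this does it the
slow way):

#include <stdio.h>

#define CPU_LOAD_IDX_MAX 5

static void update_cpu_load(unsigned long cpu_load[CPU_LOAD_IDX_MAX],
			    unsigned long this_load,
			    unsigned long pending_updates)
{
	int i;

	cpu_load[0] = this_load;	/* idx 0 tracks instantaneous load */
	for (i = 1; i < CPU_LOAD_IDX_MAX; i++) {
		unsigned long old = cpu_load[i], scale = 1UL << i;
		unsigned long j;

		/* replay the missed ticks as zero-load observations */
		for (j = 1; j < pending_updates; j++)
			old = old * (scale - 1) / scale;

		/* then fold in the current observation once */
		cpu_load[i] = (old * (scale - 1) + this_load) / scale;
	}
}

int main(void)
{
	unsigned long load[CPU_LOAD_IDX_MAX] = { 0, 1024, 1024, 1024, 1024 };

	update_cpu_load(load, 0, 8);	/* 8 jiffies spent idle */
	printf("cpu_load[1] after 8 idle ticks: %lu\n", load[1]);
	return 0;
}
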
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8a39fa3..66e4576 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -4735,7 +4735,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
 
 		raw_spin_lock_irq(&this_rq->lock);
 		update_rq_clock(this_rq);
-		update_cpu_load(this_rq);
+		update_idle_cpu_load(this_rq);
 		raw_spin_unlock_irq(&this_rq->lock);
 
 		rebalance_domains(balance_cpu, CPU_IDLE);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9955ebd..793548c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -549,6 +549,7 @@ void tick_nohz_restart_sched_tick(void)
 	/* Update jiffies first */
 	select_nohz_load_balancer(0);
 	tick_do_update_jiffies64(now);
+	update_cpu_load_nohz();
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 	/*
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7947e16..a650bee 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3586,6 +3586,41 @@ static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 	return notifier_from_errno(0);
 }
 
+/*
+ * Workqueues should be brought up before normal priority CPU notifiers.
+ * This will be registered as a high priority CPU notifier.
+ */
+static int __devinit workqueue_cpu_up_callback(struct notifier_block *nfb,
+					       unsigned long action,
+					       void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_UP_PREPARE:
+	case CPU_UP_CANCELED:
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		return workqueue_cpu_callback(nfb, action, hcpu);
+	}
+	return NOTIFY_OK;
+}
+
+/*
+ * Workqueues should be brought down after normal priority CPU notifiers.
+ * This will be registered as a low priority CPU notifier.
+ */
+static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
+						 unsigned long action,
+						 void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_PREPARE:
+	case CPU_DYING:
+	case CPU_POST_DEAD:
+		return workqueue_cpu_callback(nfb, action, hcpu);
+	}
+	return NOTIFY_OK;
+}
+
 #ifdef CONFIG_SMP
 
 struct work_for_cpu {
@@ -3779,7 +3814,8 @@ static int __init init_workqueues(void)
 	unsigned int cpu;
 	int i;
 
-	cpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);
+	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
+	cpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
 
 	/* initialize gcwqs */
 	for_each_gcwq_cpu(cpu) {
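
The split works because notifier chains dispatch in descending priority order:
at +5 the up callback runs before normal (priority 0) notifiers on CPU_UP
events, and at -5 the down callback runs after them on CPU_DOWN events.  A toy
model of the priority-sorted chain (illustrative; the real callbacks also
filter by event, as the switch statements above do):

#include <stdio.h>

struct notifier {
	int priority;
	const char *name;
	struct notifier *next;
};

static struct notifier *chain;

/* notifier chains are kept sorted by descending priority */
static void register_notifier(struct notifier *n)
{
	struct notifier **p = &chain;

	while (*p && (*p)->priority > n->priority)
		p = &(*p)->next;
	n->next = *p;
	*p = n;
}

static void call_chain(const char *event)
{
	struct notifier *n;

	for (n = chain; n; n = n->next)
		printf("%s: %s\n", event, n->name);
}

int main(void)
{
	struct notifier wq_up   = {  5, "workqueue_cpu_up_callback"   };
	struct notifier normal  = {  0, "some_normal_notifier"        };
	struct notifier wq_down = { -5, "workqueue_cpu_down_callback" };

	register_notifier(&wq_up);
	register_notifier(&normal);
	register_notifier(&wq_down);

	/* workqueue first on the way up, last on the way down */
	call_chain("CPU_UP_PREPARE");
	return 0;
}
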
diff --git a/mm/compaction.c b/mm/compaction.c
index 50f1c60..46973fb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -372,7 +372,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		}
 
 		if (!cc->sync)
-			mode |= ISOLATE_CLEAN;
+			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		/* Try isolate the page */
 		if (__isolate_lru_page(page, mode, 0) != 0)
@@ -577,7 +577,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
 				(unsigned long)cc, false,
-				cc->sync);
+				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 03c5b0e..556858c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -500,10 +500,13 @@ struct page *__page_cache_alloc(gfp_t gfp)
 	struct page *page;
 
 	if (cpuset_do_page_mem_spread()) {
-		get_mems_allowed();
-		n = cpuset_mem_spread_node();
-		page = alloc_pages_exact_node(n, gfp, 0);
-		put_mems_allowed();
+		unsigned int cpuset_mems_cookie;
+		do {
+			cpuset_mems_cookie = get_mems_allowed();
+			n = cpuset_mem_spread_node();
+			page = alloc_pages_exact_node(n, gfp, 0);
+		} while (!put_mems_allowed(cpuset_mems_cookie) && !page);
+
 		return page;
 	}
 	return alloc_pages(gfp, 0);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c535b0..b1e1bad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -538,8 +538,10 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 	struct zonelist *zonelist;
 	struct zone *zone;
 	struct zoneref *z;
+	unsigned int cpuset_mems_cookie;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
 	zonelist = huge_zonelist(vma, address,
 					htlb_alloc_mask, &mpol, &nodemask);
 	/*
@@ -566,10 +568,15 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 			}
 		}
 	}
-err:
+
 	mpol_cond_put(mpol);
-	put_mems_allowed();
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
 	return page;
+
+err:
+	mpol_cond_put(mpol);
+	return NULL;
 }
 
 static void update_and_free_page(struct hstate *h, struct page *page)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 06d3479..56080ea 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1557,7 +1557,7 @@ int soft_offline_page(struct page *page, int flags)
 					    page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
-								0, true);
+							0, MIGRATE_SYNC);
 		if (ret) {
 			putback_lru_pages(&pagelist);
 			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2168489..6629faf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -809,7 +809,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		}
 		/* this function returns # of failed pages */
 		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
-								true, true);
+							true, MIGRATE_SYNC);
 		if (ret)
 			putback_lru_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b26aae2..c0007f9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -942,7 +942,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_node_page, dest,
-								false, true);
+							false, MIGRATE_SYNC);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
@@ -1843,18 +1843,24 @@ struct page *
 alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr, int node)
 {
-	struct mempolicy *pol = get_vma_policy(current, vma, addr);
+	struct mempolicy *pol;
 	struct zonelist *zl;
 	struct page *page;
+	unsigned int cpuset_mems_cookie;
+
+retry_cpuset:
+	pol = get_vma_policy(current, vma, addr);
+	cpuset_mems_cookie = get_mems_allowed();
 
-	get_mems_allowed();
 	if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
 		unsigned nid;
 
 		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
 		mpol_cond_put(pol);
 		page = alloc_page_interleave(gfp, order, nid);
-		put_mems_allowed();
+		if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+			goto retry_cpuset;
+
 		return page;
 	}
 	zl = policy_zonelist(gfp, pol, node);
@@ -1865,7 +1871,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 		struct page *page =  __alloc_pages_nodemask(gfp, order,
 						zl, policy_nodemask(gfp, pol));
 		__mpol_put(pol);
-		put_mems_allowed();
+		if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+			goto retry_cpuset;
 		return page;
 	}
 	/*
@@ -1873,7 +1880,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	 */
 	page = __alloc_pages_nodemask(gfp, order, zl,
 				      policy_nodemask(gfp, pol));
-	put_mems_allowed();
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
 	return page;
 }
 
@@ -1900,11 +1908,14 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 {
 	struct mempolicy *pol = current->mempolicy;
 	struct page *page;
+	unsigned int cpuset_mems_cookie;
 
 	if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
 		pol = &default_policy;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+
 	/*
 	 * No reference counting needed for current->mempolicy
 	 * nor system default_policy
@@ -1915,7 +1926,10 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
 		page = __alloc_pages_nodemask(gfp, order,
 				policy_zonelist(gfp, pol, numa_node_id()),
 				policy_nodemask(gfp, pol));
-	put_mems_allowed();
+
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
+
 	return page;
 }
 EXPORT_SYMBOL(alloc_pages_current);
diff --git a/mm/migrate.c b/mm/migrate.c
index 177aca4..180d97f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -220,6 +220,56 @@ out:
 	pte_unmap_unlock(ptep, ptl);
 }
 
+#ifdef CONFIG_BLOCK
+/* Returns true if all buffers are successfully locked */
+static bool buffer_migrate_lock_buffers(struct buffer_head *head,
+							enum migrate_mode mode)
+{
+	struct buffer_head *bh = head;
+
+	/* Simple case, sync compaction */
+	if (mode != MIGRATE_ASYNC) {
+		do {
+			get_bh(bh);
+			lock_buffer(bh);
+			bh = bh->b_this_page;
+
+		} while (bh != head);
+
+		return true;
+	}
+
+	/* async case, we cannot block on lock_buffer so use trylock_buffer */
+	do {
+		get_bh(bh);
+		if (!trylock_buffer(bh)) {
+			/*
+			 * We failed to lock the buffer and cannot stall in
+			 * async migration. Release the taken locks
+			 */
+			struct buffer_head *failed_bh = bh;
+			put_bh(failed_bh);
+			bh = head;
+			while (bh != failed_bh) {
+				unlock_buffer(bh);
+				put_bh(bh);
+				bh = bh->b_this_page;
+			}
+			return false;
+		}
+
+		bh = bh->b_this_page;
+	} while (bh != head);
+	return true;
+}
+#else
+static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
+							enum migrate_mode mode)
+{
+	return true;
+}
+#endif /* CONFIG_BLOCK */
+
 /*
  * Replace the page in the mapping.
  *
@@ -229,7 +279,8 @@ out:
  * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
  */
 static int migrate_page_move_mapping(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page,
+		struct buffer_head *head, enum migrate_mode mode)
 {
 	int expected_count;
 	void **pslot;
@@ -259,6 +310,20 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	}
 
 	/*
+	 * In the async migration case of moving a page with buffers, lock the
+	 * buffers using trylock before the mapping is moved. If the mapping
+	 * were moved first and we then failed to lock the buffers, we could
+	 * not move the mapping back due to an elevated page count and would
+	 * have to block waiting on other references to be dropped.
+	 */
+	if (mode == MIGRATE_ASYNC && head &&
+			!buffer_migrate_lock_buffers(head, mode)) {
+		page_unfreeze_refs(page, expected_count);
+		spin_unlock_irq(&mapping->tree_lock);
+		return -EAGAIN;
+	}
+
+	/*
 	 * Now we know that no one else is looking at the page.
 	 */
 	get_page(newpage);	/* add cache reference */
@@ -415,13 +480,14 @@ EXPORT_SYMBOL(fail_migrate_page);
  * Pages are locked upon entry and exit.
  */
 int migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page,
+		enum migrate_mode mode)
 {
 	int rc;
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
 
 	if (rc)
 		return rc;
@@ -438,28 +504,28 @@ EXPORT_SYMBOL(migrate_page);
  * exist.
  */
 int buffer_migrate_page(struct address_space *mapping,
-		struct page *newpage, struct page *page)
+		struct page *newpage, struct page *page, enum migrate_mode mode)
 {
 	struct buffer_head *bh, *head;
 	int rc;
 
 	if (!page_has_buffers(page))
-		return migrate_page(mapping, newpage, page);
+		return migrate_page(mapping, newpage, page, mode);
 
 	head = page_buffers(page);
 
-	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
 
 	if (rc)
 		return rc;
 
-	bh = head;
-	do {
-		get_bh(bh);
-		lock_buffer(bh);
-		bh = bh->b_this_page;
-
-	} while (bh != head);
+	/*
+	 * In the async case, migrate_page_move_mapping locked the buffers
+	 * with an IRQ-safe spinlock held. In the sync case, the buffers
+	 * need to be locked now
+	 */
+	if (mode != MIGRATE_ASYNC)
+		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
 
 	ClearPagePrivate(page);
 	set_page_private(newpage, page_private(page));
@@ -536,10 +602,14 @@ static int writeout(struct address_space *mapping, struct page *page)
  * Default handling if a filesystem does not provide a migration function.
  */
 static int fallback_migrate_page(struct address_space *mapping,
-	struct page *newpage, struct page *page)
+	struct page *newpage, struct page *page, enum migrate_mode mode)
 {
-	if (PageDirty(page))
+	if (PageDirty(page)) {
+		/* Only writeback pages in full synchronous migration */
+		if (mode != MIGRATE_SYNC)
+			return -EBUSY;
 		return writeout(mapping, page);
+	}
 
 	/*
 	 * Buffers may be managed in a filesystem specific way.
@@ -549,7 +619,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, mode);
 }
 
 /*
@@ -564,7 +634,7 @@ static int fallback_migrate_page(struct address_space *mapping,
  *  == 0 - success
  */
 static int move_to_new_page(struct page *newpage, struct page *page,
-					int remap_swapcache, bool sync)
+				int remap_swapcache, enum migrate_mode mode)
 {
 	struct address_space *mapping;
 	int rc;
@@ -585,29 +655,18 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 
 	mapping = page_mapping(page);
 	if (!mapping)
-		rc = migrate_page(mapping, newpage, page);
-	else {
+		rc = migrate_page(mapping, newpage, page, mode);
+	else if (mapping->a_ops->migratepage)
 		/*
-		 * Do not writeback pages if !sync and migratepage is
-		 * not pointing to migrate_page() which is nonblocking
-		 * (swapcache/tmpfs uses migratepage = migrate_page).
+		 * Most pages have a mapping and most filesystems provide a
+		 * migratepage callback. Anonymous pages are part of swap
+		 * space which also has its own migratepage callback. This
+		 * is the most common path for page migration.
 		 */
-		if (PageDirty(page) && !sync &&
-		    mapping->a_ops->migratepage != migrate_page)
-			rc = -EBUSY;
-		else if (mapping->a_ops->migratepage)
-			/*
-			 * Most pages have a mapping and most filesystems
-			 * should provide a migration function. Anonymous
-			 * pages are part of swap space which also has its
-			 * own migration function. This is the most common
-			 * path for page migration.
-			 */
-			rc = mapping->a_ops->migratepage(mapping,
-							newpage, page);
-		else
-			rc = fallback_migrate_page(mapping, newpage, page);
-	}
+		rc = mapping->a_ops->migratepage(mapping,
+						newpage, page, mode);
+	else
+		rc = fallback_migrate_page(mapping, newpage, page, mode);
 
 	if (rc) {
 		newpage->mapping = NULL;
@@ -622,7 +681,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
 }
 
 static int __unmap_and_move(struct page *page, struct page *newpage,
-				int force, bool offlining, bool sync)
+			int force, bool offlining, enum migrate_mode mode)
 {
 	int rc = -EAGAIN;
 	int remap_swapcache = 1;
@@ -631,7 +690,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	struct anon_vma *anon_vma = NULL;
 
 	if (!trylock_page(page)) {
-		if (!force || !sync)
+		if (!force || mode == MIGRATE_ASYNC)
 			goto out;
 
 		/*
@@ -677,10 +736,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 	if (PageWriteback(page)) {
 		/*
-		 * For !sync, there is no point retrying as the retry loop
-		 * is expected to be too short for PageWriteback to be cleared
+		 * Only in the case of a full synchronous migration is it
+		 * necessary to wait for PageWriteback. In the async case,
+		 * the retry loop is too short and in the sync-light case,
+		 * the overhead of stalling is too much.
 		 */
-		if (!sync) {
+		if (mode != MIGRATE_SYNC) {
 			rc = -EBUSY;
 			goto uncharge;
 		}
@@ -751,7 +812,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 
 skip_unmap:
 	if (!page_mapped(page))
-		rc = move_to_new_page(newpage, page, remap_swapcache, sync);
+		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
 
 	if (rc && remap_swapcache)
 		remove_migration_ptes(page, page);
@@ -774,7 +835,8 @@ out:
  * to the newly allocated page in newpage.
  */
 static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, bool offlining, bool sync)
+			struct page *page, int force, bool offlining,
+			enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -792,7 +854,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 		if (unlikely(split_huge_page(page)))
 			goto out;
 
-	rc = __unmap_and_move(page, newpage, force, offlining, sync);
+	rc = __unmap_and_move(page, newpage, force, offlining, mode);
 out:
 	if (rc != -EAGAIN) {
 		/*
@@ -840,7 +902,8 @@ out:
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
 				unsigned long private, struct page *hpage,
-				int force, bool offlining, bool sync)
+				int force, bool offlining,
+				enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -853,7 +916,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	rc = -EAGAIN;
 
 	if (!trylock_page(hpage)) {
-		if (!force || !sync)
+		if (!force || mode != MIGRATE_SYNC)
 			goto out;
 		lock_page(hpage);
 	}
@@ -864,7 +927,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
 
 	if (!page_mapped(hpage))
-		rc = move_to_new_page(new_hpage, hpage, 1, sync);
+		rc = move_to_new_page(new_hpage, hpage, 1, mode);
 
 	if (rc)
 		remove_migration_ptes(hpage, hpage);
@@ -907,7 +970,7 @@ out:
  */
 int migrate_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
-		bool sync)
+		enum migrate_mode mode)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -928,7 +991,7 @@ int migrate_pages(struct list_head *from,
 
 			rc = unmap_and_move(get_new_page, private,
 						page, pass > 2, offlining,
-						sync);
+						mode);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -958,7 +1021,7 @@ out:
 
 int migrate_huge_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private, bool offlining,
-		bool sync)
+		enum migrate_mode mode)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -975,7 +1038,7 @@ int migrate_huge_pages(struct list_head *from,
 
 			rc = unmap_and_move_huge_page(get_new_page,
 					private, page, pass > 2, offlining,
-					sync);
+					mode);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1104,7 +1167,7 @@ set_status:
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node,
-				(unsigned long)pm, 0, true);
+				(unsigned long)pm, 0, MIGRATE_SYNC);
 		if (err)
 			putback_lru_pages(&pagelist);
 	}
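
buffer_migrate_lock_buffers() is an all-or-nothing trylock: in MIGRATE_ASYNC
mode it either acquires every buffer lock or releases whatever it already
took, so async migration never sleeps in lock_buffer().  The same shape over a
plain array (a pthread sketch of the kernel's circular buffer list, not kernel
code):

#include <pthread.h>
#include <stdbool.h>

/* Try to lock all mutexes; on failure unlock the ones already taken.
 * Mirrors the async branch of buffer_migrate_lock_buffers() above. */
static bool trylock_all(pthread_mutex_t *locks, int n)
{
	int i, j;

	for (i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&locks[i]) != 0) {
			/* roll back: release everything taken so far */
			for (j = 0; j < i; j++)
				pthread_mutex_unlock(&locks[j]);
			return false;
		}
	}
	return true;
}
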
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 485be89..065dbe8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1886,14 +1886,20 @@ static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress,
-	bool sync_migration)
+	int migratetype, bool sync_migration,
+	bool *deferred_compaction,
+	unsigned long *did_some_progress)
 {
 	struct page *page;
 
-	if (!order || compaction_deferred(preferred_zone))
+	if (!order)
 		return NULL;
 
+	if (compaction_deferred(preferred_zone)) {
+		*deferred_compaction = true;
+		return NULL;
+	}
+
 	current->flags |= PF_MEMALLOC;
 	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
 						nodemask, sync_migration);
@@ -1921,7 +1927,13 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 		 * but not enough to satisfy watermarks.
 		 */
 		count_vm_event(COMPACTFAIL);
-		defer_compaction(preferred_zone);
+
+		/*
+		 * As async compaction considers a subset of pageblocks, only
+		 * defer if the failure was a sync compaction failure.
+		 */
+		if (sync_migration)
+			defer_compaction(preferred_zone);
 
 		cond_resched();
 	}
@@ -1933,8 +1945,9 @@ static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, unsigned long *did_some_progress,
-	bool sync_migration)
+	int migratetype, bool sync_migration,
+	bool *deferred_compaction,
+	unsigned long *did_some_progress)
 {
 	return NULL;
 }
@@ -2084,6 +2097,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
 	bool sync_migration = false;
+	bool deferred_compaction = false;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -2164,12 +2178,22 @@ rebalance:
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress,
-					sync_migration);
+					migratetype, sync_migration,
+					&deferred_compaction,
+					&did_some_progress);
 	if (page)
 		goto got_pg;
 	sync_migration = true;
 
+	/*
+	 * If compaction is deferred for high-order allocations, it is because
+	 * sync compaction recently failed. If this is the case and the caller
+	 * has requested the system not be heavily disrupted, fail the
+	 * allocation now instead of entering direct reclaim.
+	 */
+	if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
+		goto nopage;
+
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
@@ -2232,8 +2256,9 @@ rebalance:
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, &did_some_progress,
-					sync_migration);
+					migratetype, sync_migration,
+					&deferred_compaction,
+					&did_some_progress);
 		if (page)
 			goto got_pg;
 	}
@@ -2257,8 +2282,9 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	struct zone *preferred_zone;
-	struct page *page;
+	struct page *page = NULL;
 	int migratetype = allocflags_to_migratetype(gfp_mask);
+	unsigned int cpuset_mems_cookie;
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2277,15 +2303,15 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (unlikely(!zonelist->_zonerefs->zone))
 		return NULL;
 
-	get_mems_allowed();
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+
 	/* The preferred zone is used for statistics later */
 	first_zones_zonelist(zonelist, high_zoneidx,
 				nodemask ? : &cpuset_current_mems_allowed,
 				&preferred_zone);
-	if (!preferred_zone) {
-		put_mems_allowed();
-		return NULL;
-	}
+	if (!preferred_zone)
+		goto out;
 
 	/* First allocation attempt */
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
@@ -2295,9 +2321,19 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
-	put_mems_allowed();
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+
+out:
+	/*
+	 * When updating a task's mems_allowed, it is possible to race with
+	 * parallel threads in such a way that an allocation can fail while
+	 * the mask is being updated. If a page allocation is about to fail,
+	 * check if the cpuset changed during allocation and if so, retry.
+	 */
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
+		goto retry_cpuset;
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages_nodemask);
@@ -2521,13 +2557,15 @@ void si_meminfo_node(struct sysinfo *val, int nid)
 bool skip_free_areas_node(unsigned int flags, int nid)
 {
 	bool ret = false;
+	unsigned int cpuset_mems_cookie;
 
 	if (!(flags & SHOW_MEM_FILTER_NODES))
 		goto out;
 
-	get_mems_allowed();
-	ret = !node_isset(nid, cpuset_current_mems_allowed);
-	put_mems_allowed();
+	do {
+		cpuset_mems_cookie = get_mems_allowed();
+		ret = !node_isset(nid, cpuset_current_mems_allowed);
+	} while (!put_mems_allowed(cpuset_mems_cookie));
 out:
 	return ret;
 }
@@ -3407,25 +3445,33 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 		if (page_to_nid(page) != zone_to_nid(zone))
 			continue;
 
-		/* Blocks with reserved pages will never free, skip them. */
-		block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
-		if (pageblock_is_reserved(pfn, block_end_pfn))
-			continue;
-
 		block_migratetype = get_pageblock_migratetype(page);
 
-		/* If this block is reserved, account for it */
-		if (reserve > 0 && block_migratetype == MIGRATE_RESERVE) {
-			reserve--;
-			continue;
-		}
+		/* Only test what is necessary when the reserves are not met */
+		if (reserve > 0) {
+			/*
+			 * Blocks with reserved pages will never free, skip
+			 * them.
+			 */
+			block_end_pfn = min(pfn + pageblock_nr_pages, end_pfn);
+			if (pageblock_is_reserved(pfn, block_end_pfn))
+				continue;
 
-		/* Suitable for reserving if this block is movable */
-		if (reserve > 0 && block_migratetype == MIGRATE_MOVABLE) {
-			set_pageblock_migratetype(page, MIGRATE_RESERVE);
-			move_freepages_block(zone, page, MIGRATE_RESERVE);
-			reserve--;
-			continue;
+			/* If this block is reserved, account for it */
+			if (block_migratetype == MIGRATE_RESERVE) {
+				reserve--;
+				continue;
+			}
+
+			/* Suitable for reserving if this block is movable */
+			if (block_migratetype == MIGRATE_MOVABLE) {
+				set_pageblock_migratetype(page,
+							MIGRATE_RESERVE);
+				move_freepages_block(zone, page,
+							MIGRATE_RESERVE);
+				reserve--;
+				continue;
+			}
 		}
 
 		/*
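
The slowpath changes above thread a deferred_compaction flag out of the first
(async) compaction attempt so that __GFP_NO_KSWAPD callers, in practice THP
allocations, can fail fast instead of entering direct reclaim.  The resulting
decision order, condensed into a compilable sketch (every function and the
flag value below are illustrative stubs, not the real allocator):

#include <stddef.h>

struct page { int dummy; };
#define GFP_NO_KSWAPD_ 0x400000u	/* illustrative flag value */

static struct page *direct_compact(int order, int sync, int *deferred)
{ (void)order; *deferred = !sync; return NULL; }
static struct page *direct_reclaim(int order)
{ (void)order; return NULL; }

static struct page *slowpath(unsigned gfp_mask, int order)
{
	int deferred_compaction = 0;
	struct page *page;

	/* 1: async compaction first, noting whether it was deferred */
	page = direct_compact(order, 0, &deferred_compaction);
	if (page)
		return page;

	/*
	 * 2: deferral means sync compaction failed recently; callers
	 * that asked for low disruption give up here rather than
	 * entering direct reclaim.
	 */
	if (deferred_compaction && (gfp_mask & GFP_NO_KSWAPD_))
		return NULL;

	/* 3: direct reclaim, then 4: one sync compaction attempt */
	page = direct_reclaim(order);
	if (page)
		return page;
	return direct_compact(order, 1, &deferred_compaction);
}
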
diff --git a/mm/slab.c b/mm/slab.c
index 83311c9a..cd3ab93 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3267,12 +3267,10 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
 	if (in_interrupt() || (flags & __GFP_THISNODE))
 		return NULL;
 	nid_alloc = nid_here = numa_mem_id();
-	get_mems_allowed();
 	if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
 		nid_alloc = cpuset_slab_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	put_mems_allowed();
 	if (nid_alloc != nid_here)
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
@@ -3295,14 +3293,17 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *obj = NULL;
 	int nid;
+	unsigned int cpuset_mems_cookie;
 
 	if (flags & __GFP_THISNODE)
 		return NULL;
 
-	get_mems_allowed();
-	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
+retry_cpuset:
+	cpuset_mems_cookie = get_mems_allowed();
+	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+
 retry:
 	/*
 	 * Look through allowed nodes for objects available
@@ -3355,7 +3356,9 @@ retry:
 			}
 		}
 	}
-	put_mems_allowed();
+
+	if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !obj))
+		goto retry_cpuset;
 	return obj;
 }
 
diff --git a/mm/slub.c b/mm/slub.c
index af47188..5710788 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1582,6 +1582,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
 	struct zone *zone;
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *object;
+	unsigned int cpuset_mems_cookie;
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -1605,23 +1606,32 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags,
 			get_cycles() % 1024 > s->remote_node_defrag_ratio)
 		return NULL;
 
-	get_mems_allowed();
-	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
-	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
-		struct kmem_cache_node *n;
-
-		n = get_node(s, zone_to_nid(zone));
-
-		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
-				n->nr_partial > s->min_partial) {
-			object = get_partial_node(s, n, c);
-			if (object) {
-				put_mems_allowed();
-				return object;
+	do {
+		cpuset_mems_cookie = get_mems_allowed();
+		zonelist = node_zonelist(slab_node(current->mempolicy), flags);
+		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+			struct kmem_cache_node *n;
+
+			n = get_node(s, zone_to_nid(zone));
+
+			if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
+					n->nr_partial > s->min_partial) {
+				object = get_partial_node(s, n, c);
+				if (object) {
+					/*
+					 * Return the object even if
+					 * put_mems_allowed indicated that
+					 * the cpuset mems_allowed was
+					 * updated in parallel. It's a
+					 * harmless race between the alloc
+					 * and the cpuset update.
+					 */
+					put_mems_allowed(cpuset_mems_cookie);
+					return object;
+				}
 			}
 		}
-	}
-	put_mems_allowed();
+	} while (!put_mems_allowed(cpuset_mems_cookie));
 #endif
 	return NULL;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8342119..48febd7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -715,7 +715,13 @@ static enum page_references page_check_references(struct page *page,
 		 */
 		SetPageReferenced(page);
 
-		if (referenced_page)
+		if (referenced_page || referenced_ptes > 1)
+			return PAGEREF_ACTIVATE;
+
+		/*
+		 * Activate file-backed executable pages after first usage.
+		 */
+		if (vm_flags & VM_EXEC)
 			return PAGEREF_ACTIVATE;
 
 		return PAGEREF_KEEP;
@@ -1061,8 +1067,39 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file)
 
 	ret = -EBUSY;
 
-	if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page)))
-		return ret;
+	/*
+	 * To minimise LRU disruption, the caller can indicate that it only
+	 * wants to isolate pages it will be able to operate on without
+	 * blocking - clean pages for the most part.
+	 *
+	 * ISOLATE_CLEAN means that only clean pages should be isolated. This
+	 * is used by reclaim when it cannot write to backing storage.
+	 *
+	 * ISOLATE_ASYNC_MIGRATE is used to indicate that it only wants pages
+	 * that can be migrated without blocking.
+	 */
+	if (mode & (ISOLATE_CLEAN|ISOLATE_ASYNC_MIGRATE)) {
+		/* All the caller can do on PageWriteback is block */
+		if (PageWriteback(page))
+			return ret;
+
+		if (PageDirty(page)) {
+			struct address_space *mapping;
+
+			/* ISOLATE_CLEAN means only clean pages */
+			if (mode & ISOLATE_CLEAN)
+				return ret;
+
+			/*
+			 * Only pages without mappings or that have a
+			 * ->migratepage callback are possible to migrate
+			 * without blocking
+			 */
+			mapping = page_mapping(page);
+			if (mapping && !mapping->a_ops->migratepage)
+				return ret;
+		}
+	}
 
 	if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
 		return ret;
@@ -1178,7 +1215,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 			 * anon page which don't already have a swap slot is
 			 * pointless.
 			 */
-			if (nr_swap_pages <= 0 && PageAnon(cursor_page) &&
+			if (nr_swap_pages <= 0 && PageSwapBacked(cursor_page) &&
 			    !PageSwapCache(cursor_page))
 				break;
 
@@ -1874,7 +1911,8 @@ static void get_scan_count(struct zone *zone, struct scan_control *sc,
 	 * latencies, so it's better to scan a minimum amount there as
 	 * well.
 	 */
-	if (scanning_global_lru(sc) && current_is_kswapd())
+	if (scanning_global_lru(sc) && current_is_kswapd() &&
+	    zone->all_unreclaimable)
 		force_scan = true;
 	if (!scanning_global_lru(sc))
 		force_scan = true;
@@ -2012,8 +2050,9 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
-	inactive_lru_pages = zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON) +
-				zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+	inactive_lru_pages = zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+	if (nr_swap_pages > 0)
+		inactive_lru_pages += zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
 	if (sc->nr_reclaimed < pages_for_compaction &&
 			inactive_lru_pages > pages_for_compaction)
 		return true;
@@ -2088,6 +2127,42 @@ restart:
 	throttle_vm_writeout(sc->gfp_mask);
 }
 
+/* Returns true if compaction should go ahead for a high-order request */
+static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+{
+	unsigned long balance_gap, watermark;
+	bool watermark_ok;
+
+	/* Do not consider compaction for orders reclaim is meant to satisfy */
+	if (sc->order <= PAGE_ALLOC_COSTLY_ORDER)
+		return false;
+
+	/*
+	 * Compaction takes time to run and there are potentially other
+	 * callers using the pages just freed. Continue reclaiming until
+	 * there is a buffer of free pages available to give compaction
+	 * a reasonable chance of completing and allocating the page
+	 */
+	balance_gap = min(low_wmark_pages(zone),
+		(zone->present_pages + KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+			KSWAPD_ZONE_BALANCE_GAP_RATIO);
+	watermark = high_wmark_pages(zone) + balance_gap + (2UL << sc->order);
+	watermark_ok = zone_watermark_ok_safe(zone, 0, watermark, 0, 0);
+
+	/*
+	 * If compaction is deferred, reclaim up to a point where
+	 * compaction will have a chance of success when re-enabled
+	 */
+	if (compaction_deferred(zone))
+		return watermark_ok;
+
+	/* If compaction is not ready to start, keep reclaiming */
+	if (!compaction_suitable(zone, sc->order))
+		return false;
+
+	return watermark_ok;
+}
+
 /*
  * This is the direct reclaim path, for page-allocating processes.  We only
  * try to reclaim pages from zones which will satisfy the caller's allocation
@@ -2105,8 +2180,9 @@ restart:
  * scan then give up on it.
  *
  * This function returns true if a zone is being reclaimed for a costly
- * high-order allocation and compaction is either ready to begin or deferred.
- * This indicates to the caller that it should retry the allocation or fail.
+ * high-order allocation and compaction is ready to begin. This indicates to
+ * the caller that it should consider retrying the allocation instead of
+ * further reclaim.
  */
 static bool shrink_zones(int priority, struct zonelist *zonelist,
 					struct scan_control *sc)
@@ -2115,7 +2191,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
 	struct zone *zone;
 	unsigned long nr_soft_reclaimed;
 	unsigned long nr_soft_scanned;
-	bool should_abort_reclaim = false;
+	bool aborted_reclaim = false;
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					gfp_zone(sc->gfp_mask), sc->nodemask) {
@@ -2140,10 +2216,8 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
 				 * noticeable problem, like transparent huge page
 				 * allocations.
 				 */
-				if (sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-					(compaction_suitable(zone, sc->order) ||
-					 compaction_deferred(zone))) {
-					should_abort_reclaim = true;
+				if (compaction_ready(zone, sc)) {
+					aborted_reclaim = true;
 					continue;
 				}
 			}
@@ -2165,7 +2239,7 @@ static bool shrink_zones(int priority, struct zonelist *zonelist,
 		shrink_zone(priority, zone, sc);
 	}
 
-	return should_abort_reclaim;
+	return aborted_reclaim;
 }
 
 static bool zone_reclaimable(struct zone *zone)
@@ -2219,8 +2293,8 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	unsigned long writeback_threshold;
+	bool aborted_reclaim;
 
-	get_mems_allowed();
 	delayacct_freepages_start();
 
 	if (scanning_global_lru(sc))
@@ -2230,8 +2304,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token(sc->mem_cgroup);
-		if (shrink_zones(priority, zonelist, sc))
-			break;
+		aborted_reclaim = shrink_zones(priority, zonelist, sc);
 
 		/*
 		 * Don't shrink slabs when reclaiming memory from
@@ -2285,7 +2358,6 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 
 out:
 	delayacct_freepages_end();
-	put_mems_allowed();
 
 	if (sc->nr_reclaimed)
 		return sc->nr_reclaimed;
@@ -2298,6 +2370,10 @@ out:
 	if (oom_killer_disabled)
 		return 0;
 
+	/* Aborted reclaim to try compaction? don't OOM, then */
+	if (aborted_reclaim)
+		return 1;
+
 	/* top priority shrink_zones still had more to do? don't OOM, then */
 	if (scanning_global_lru(sc) && !all_unreclaimable(zonelist, sc))
 		return 1;
diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index c505fd5..c119f33 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -868,7 +868,6 @@ static int hdmi_pcm_open(struct hda_pcm_stream *hinfo,
 	struct hdmi_spec_per_pin *per_pin;
 	struct hdmi_eld *eld;
 	struct hdmi_spec_per_cvt *per_cvt = NULL;
-	int pinctl;
 
 	/* Validate hinfo */
 	pin_idx = hinfo_to_pin_index(spec, hinfo);
@@ -904,11 +903,6 @@ static int hdmi_pcm_open(struct hda_pcm_stream *hinfo,
 	snd_hda_codec_write(codec, per_pin->pin_nid, 0,
 			    AC_VERB_SET_CONNECT_SEL,
 			    mux_idx);
-	pinctl = snd_hda_codec_read(codec, per_pin->pin_nid, 0,
-				    AC_VERB_GET_PIN_WIDGET_CONTROL, 0);
-	snd_hda_codec_write(codec, per_pin->pin_nid, 0,
-			    AC_VERB_SET_PIN_WIDGET_CONTROL,
-			    pinctl | PIN_OUT);
 	snd_hda_spdif_ctls_assign(codec, pin_idx, per_cvt->cvt_nid);
 
 	/* Initially set the converter's capabilities */
@@ -1147,11 +1141,17 @@ static int generic_hdmi_playback_pcm_prepare(struct hda_pcm_stream *hinfo,
 	struct hdmi_spec *spec = codec->spec;
 	int pin_idx = hinfo_to_pin_index(spec, hinfo);
 	hda_nid_t pin_nid = spec->pins[pin_idx].pin_nid;
+	int pinctl;
 
 	hdmi_set_channel_count(codec, cvt_nid, substream->runtime->channels);
 
 	hdmi_setup_audio_infoframe(codec, pin_idx, substream);
 
+	pinctl = snd_hda_codec_read(codec, pin_nid, 0,
+				    AC_VERB_GET_PIN_WIDGET_CONTROL, 0);
+	snd_hda_codec_write(codec, pin_nid, 0,
+			    AC_VERB_SET_PIN_WIDGET_CONTROL, pinctl | PIN_OUT);
+
 	return hdmi_setup_stream(codec, cvt_nid, pin_nid, stream_tag, format);
 }
 
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 5f096a5..191fd78 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5989,6 +5989,7 @@ static const struct hda_codec_preset snd_hda_preset_realtek[] = {
 	{ .id = 0x10ec0275, .name = "ALC275", .patch = patch_alc269 },
 	{ .id = 0x10ec0276, .name = "ALC276", .patch = patch_alc269 },
 	{ .id = 0x10ec0280, .name = "ALC280", .patch = patch_alc269 },
+	{ .id = 0x10ec0282, .name = "ALC282", .patch = patch_alc269 },
 	{ .id = 0x10ec0861, .rev = 0x100340, .name = "ALC660",
 	  .patch = patch_alc861 },
 	{ .id = 0x10ec0660, .name = "ALC660-VD", .patch = patch_alc861vd },
diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index 90e93bf..ab21ead 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -1381,7 +1381,15 @@ static int dapm_power_widgets(struct snd_soc_dapm_context *dapm, int event)
 	}
 
 	list_for_each_entry(w, &card->widgets, list) {
-		list_del_init(&w->dirty);
+		switch (w->id) {
+		case snd_soc_dapm_pre:
+		case snd_soc_dapm_post:
+			/* These widgets always need to be powered */
+			break;
+		default:
+			list_del_init(&w->dirty);
+			break;
+		}
 
 		if (w->power) {
 			d = w->dapm;
@@ -2966,10 +2974,13 @@ EXPORT_SYMBOL_GPL(snd_soc_dapm_free);
 
 static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
 {
+	struct snd_soc_card *card = dapm->card;
 	struct snd_soc_dapm_widget *w;
 	LIST_HEAD(down_list);
 	int powerdown = 0;
 
+	mutex_lock(&card->dapm_mutex);
+
 	list_for_each_entry(w, &dapm->card->widgets, list) {
 		if (w->dapm != dapm)
 			continue;
@@ -2992,6 +3003,8 @@ static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
 			snd_soc_dapm_set_bias_level(dapm,
 						    SND_SOC_BIAS_STANDBY);
 	}
+
+	mutex_unlock(&card->dapm_mutex);
 }
 
 /*



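A note on the mm/slab.c and mm/slub.c hunks above: get_mems_allowed() now
returns a sequence cookie, and put_mems_allowed() returns false when the
cpuset's mems_allowed changed while the allocation was in flight, so the
allocators retry instead of synchronising with the cpuset update.  A
minimal sketch of the read-side pattern (illustrative only;
alloc_from_allowed_nodes() is a stand-in for the zonelist walk, not a
real kernel function):

	unsigned int cookie;
	void *obj;

	do {
		/* open the read section, like read_seqcount_begin() */
		cookie = get_mems_allowed();
		obj = alloc_from_allowed_nodes(flags);
		/*
		 * put_mems_allowed(cookie) returns true when mems_allowed
		 * stayed stable.  Retry only when it changed and nothing
		 * was allocated; an object obtained across a concurrent
		 * cpuset update is still valid (the "harmless race" in
		 * the slub comment above).
		 */
	} while (!put_mems_allowed(cookie) && !obj);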

* Re: [ 35/73] ASoC: dapm: Fix locking during codec shutdown
  2012-07-31  4:43 ` [ 35/73] ASoC: dapm: Fix locking during codec shutdown Ben Hutchings
@ 2012-07-31 16:11   ` Herton Ronaldo Krzesinski
  2012-07-31 16:13     ` Mark Brown
  0 siblings, 1 reply; 94+ messages in thread
From: Herton Ronaldo Krzesinski @ 2012-07-31 16:11 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, torvalds, akpm, alan, Liam Girdwood,
	Misael Lopez Cruz, Mark Brown

On Tue, Jul 31, 2012 at 05:43:45AM +0100, Ben Hutchings wrote:
> 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Liam Girdwood <lrg@ti.com>
> 
> commit 01005a729a17ab419f61a366e22f3419e7a2c3fe upstream.
> 
> Codec shutdown performs a DAPM power sequence that might cause conflicts
> and/or race conditions if another stream power event is running simultaneously.
> Use card's dapm mutex to protect any potential race condition between them.
> 
> Signed-off-by: Misael Lopez Cruz <misael.lopez@ti.com>
> Signed-off-by: Liam Girdwood <lrg@ti.com>
> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>  sound/soc/soc-dapm.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
> index 5be4f9a..114f2af 100644
> --- a/sound/soc/soc-dapm.c
> +++ b/sound/soc/soc-dapm.c
> @@ -3537,10 +3537,13 @@ EXPORT_SYMBOL_GPL(snd_soc_dapm_free);
>  
>  static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
>  {
> +	struct snd_soc_card *card = dapm->card;
>  	struct snd_soc_dapm_widget *w;
>  	LIST_HEAD(down_list);
>  	int powerdown = 0;
>  
> +	mutex_lock(&card->dapm_mutex);
> +

Hi, this doesn't build on 3.2:

linux-stable/sound/soc/soc-dapm.c: In function 'soc_dapm_shutdown_codec':
linux-stable/sound/soc/soc-dapm.c:2982:18: error: 'struct snd_soc_card' has no member named 'dapm_mutex'
linux-stable/sound/soc/soc-dapm.c:3007:20: error: 'struct snd_soc_card' has no member named 'dapm_mutex'

Looking at it, I'm not sure the fix is needed on 3.2, and introducing
dapm_mutex would require several changes.

>  	list_for_each_entry(w, &dapm->card->widgets, list) {
>  		if (w->dapm != dapm)
>  			continue;
> @@ -3563,6 +3566,8 @@ static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
>  			snd_soc_dapm_set_bias_level(dapm,
>  						    SND_SOC_BIAS_STANDBY);
>  	}
> +
> +	mutex_unlock(&card->dapm_mutex);
>  }
>  
>  /*
> 
> 

-- 
[]'s
Herton


* Re: [ 35/73] ASoC: dapm: Fix locking during codec shutdown
  2012-07-31 16:11   ` Herton Ronaldo Krzesinski
@ 2012-07-31 16:13     ` Mark Brown
  2012-07-31 23:20       ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Mark Brown @ 2012-07-31 16:13 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: Ben Hutchings, linux-kernel, stable, torvalds, akpm, alan,
	Liam Girdwood, Misael Lopez Cruz

On Tue, Jul 31, 2012 at 01:11:01PM -0300, Herton Ronaldo Krzesinski wrote:

> Hi, this doesn't build on 3.2:

> linux-stable/sound/soc/soc-dapm.c: In function 'soc_dapm_shutdown_codec':
> linux-stable/sound/soc/soc-dapm.c:2982:18: error: 'struct snd_soc_card' has no member named 'dapm_mutex'
> linux-stable/sound/soc/soc-dapm.c:3007:20: error: 'struct snd_soc_card' has no member named 'dapm_mutex'

> Looking at it, I'm not sure the fix is needed on 3.2, and introducing
> dapm_mutex would require several changes.

Yes, this is irrelevant on v3.2.


* Re: [ 06/73] mm: compaction: introduce sync-light migration for use by compaction
  2012-07-31  4:43 ` [ 06/73] mm: compaction: introduce sync-light migration for use by compaction Ben Hutchings
@ 2012-07-31 16:42   ` Herton Ronaldo Krzesinski
  2012-07-31 17:00     ` Mel Gorman
  0 siblings, 1 reply; 94+ messages in thread
From: Herton Ronaldo Krzesinski @ 2012-07-31 16:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, torvalds, akpm, alan, Mel Gorman,
	Rik van Riel, Andrea Arcangeli, Minchan Kim, Dave Jones,
	Jan Kara, Andy Isaacson, Nai Xia, Johannes Weiner

On Tue, Jul 31, 2012 at 05:43:16AM +0100, Ben Hutchings wrote:
> 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Mel Gorman <mgorman@suse.de>
> 
> commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream.

We also need to pick up the recent fix dc32f63453f56d07a1073a697dcd843dd3098c09 after
applying this one.

> 
> Stable note: Not tracked in Bugzilla. This was part of a series that
> 	reduced interactivity stalls experienced when THP was enabled.
> 	These stalls were particularly noticeable when copying data
> 	to a USB stick but the experiences for users varied a lot.
> 
> This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
> mode that avoids writing back pages to backing storage.  Async compaction
> maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
> For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
> used.
> 
> This avoids sync compaction stalling for an excessive length of time,
> particularly when copying files to a USB stick where there might be a
> large number of dirty pages backed by a filesystem that does not support
> ->writepages.
> 
> [aarcange@redhat.com: This patch is heavily based on Andrea's work]
> [akpm@linux-foundation.org: fix fs/nfs/write.c build]
> [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>
> Cc: Dave Jones <davej@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Andy Isaacson <adi@hexapodia.org>
> Cc: Nai Xia <nai.xia@gmail.com>
> Cc: Johannes Weiner <jweiner@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>  fs/btrfs/disk-io.c      |    5 +--
>  fs/hugetlbfs/inode.c    |    2 +-
>  fs/nfs/internal.h       |    2 +-
>  fs/nfs/write.c          |    4 +--
>  include/linux/fs.h      |    6 ++--
>  include/linux/migrate.h |   23 +++++++++++---
>  mm/compaction.c         |    2 +-
>  mm/memory-failure.c     |    2 +-
>  mm/memory_hotplug.c     |    2 +-
>  mm/mempolicy.c          |    2 +-
>  mm/migrate.c            |   78 ++++++++++++++++++++++++++---------------------
>  11 files changed, 76 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1375494..d852566 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
>  
>  #ifdef CONFIG_MIGRATION
>  static int btree_migratepage(struct address_space *mapping,
> -			struct page *newpage, struct page *page, bool sync)
> +			struct page *newpage, struct page *page,
> +			enum migrate_mode mode)
>  {
>  	/*
>  	 * we can't safely write a btree page from here,
> @@ -887,7 +888,7 @@ static int btree_migratepage(struct address_space *mapping,
>  	if (page_has_private(page) &&
>  	    !try_to_release_page(page, GFP_KERNEL))
>  		return -EAGAIN;
> -	return migrate_page(mapping, newpage, page, sync);
> +	return migrate_page(mapping, newpage, page, mode);
>  }
>  #endif
>  
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 06fd460..1e85a7a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -584,7 +584,7 @@ static int hugetlbfs_set_page_dirty(struct page *page)
>  
>  static int hugetlbfs_migrate_page(struct address_space *mapping,
>  				struct page *newpage, struct page *page,
> -				bool sync)
> +				enum migrate_mode mode)
>  {
>  	int rc;
>  
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 114398a..8102db9 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -332,7 +332,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
>  
>  #ifdef CONFIG_MIGRATION
>  extern int nfs_migrate_page(struct address_space *,
> -		struct page *, struct page *, bool);
> +		struct page *, struct page *, enum migrate_mode);
>  #else
>  #define nfs_migrate_page NULL
>  #endif
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 889e98b..834f0fe 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1688,7 +1688,7 @@ out_error:
>  
>  #ifdef CONFIG_MIGRATION
>  int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
> -		struct page *page, bool sync)
> +		struct page *page, enum migrate_mode mode)
>  {
>  	/*
>  	 * If PagePrivate is set, then the page is currently associated with
> @@ -1703,7 +1703,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
>  
>  	nfs_fscache_release_page(page, GFP_KERNEL);
>  
> -	return migrate_page(mapping, newpage, page, sync);
> +	return migrate_page(mapping, newpage, page, mode);
>  }
>  #endif
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b92b73d..e694bd4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -525,6 +525,7 @@ enum positive_aop_returns {
>  struct page;
>  struct address_space;
>  struct writeback_control;
> +enum migrate_mode;
>  
>  struct iov_iter {
>  	const struct iovec *iov;
> @@ -614,7 +615,7 @@ struct address_space_operations {
>  	 * is false, it must not block.
>  	 */
>  	int (*migratepage) (struct address_space *,
> -			struct page *, struct page *, bool);
> +			struct page *, struct page *, enum migrate_mode);
>  	int (*launder_page) (struct page *);
>  	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
>  					unsigned long);
> @@ -2540,7 +2541,8 @@ extern int generic_check_addressable(unsigned, u64);
>  
>  #ifdef CONFIG_MIGRATION
>  extern int buffer_migrate_page(struct address_space *,
> -				struct page *, struct page *, bool);
> +				struct page *, struct page *,
> +				enum migrate_mode);
>  #else
>  #define buffer_migrate_page NULL
>  #endif
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 14e6d2a..eaf8674 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -6,18 +6,31 @@
>  
>  typedef struct page *new_page_t(struct page *, unsigned long private, int **);
>  
> +/*
> + * MIGRATE_ASYNC means never block
> + * MIGRATE_SYNC_LIGHT in the current implementation means to allow blocking
> + *	on most operations but not ->writepage as the potential stall time
> + *	is too significant
> + * MIGRATE_SYNC will block when migrating pages
> + */
> +enum migrate_mode {
> +	MIGRATE_ASYNC,
> +	MIGRATE_SYNC_LIGHT,
> +	MIGRATE_SYNC,
> +};
> +
>  #ifdef CONFIG_MIGRATION
>  #define PAGE_MIGRATION 1
>  
>  extern void putback_lru_pages(struct list_head *l);
>  extern int migrate_page(struct address_space *,
> -			struct page *, struct page *, bool);
> +			struct page *, struct page *, enum migrate_mode);
>  extern int migrate_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
> -			bool sync);
> +			enum migrate_mode mode);
>  extern int migrate_huge_pages(struct list_head *l, new_page_t x,
>  			unsigned long private, bool offlining,
> -			bool sync);
> +			enum migrate_mode mode);
>  
>  extern int fail_migrate_page(struct address_space *,
>  			struct page *, struct page *);
> @@ -36,10 +49,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
>  static inline void putback_lru_pages(struct list_head *l) {}
>  static inline int migrate_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
> -		bool sync) { return -ENOSYS; }
> +		enum migrate_mode mode) { return -ENOSYS; }
>  static inline int migrate_huge_pages(struct list_head *l, new_page_t x,
>  		unsigned long private, bool offlining,
> -		bool sync) { return -ENOSYS; }
> +		enum migrate_mode mode) { return -ENOSYS; }
>  
>  static inline int migrate_prep(void) { return -ENOSYS; }
>  static inline int migrate_prep_local(void) { return -ENOSYS; }
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fb29158..71a58f6 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -557,7 +557,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  		nr_migrate = cc->nr_migratepages;
>  		err = migrate_pages(&cc->migratepages, compaction_alloc,
>  				(unsigned long)cc, false,
> -				cc->sync);
> +				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
>  		update_nr_listpages(cc);
>  		nr_remaining = cc->nr_migratepages;
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 06d3479..56080ea 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1557,7 +1557,7 @@ int soft_offline_page(struct page *page, int flags)
>  					    page_is_file_cache(page));
>  		list_add(&page->lru, &pagelist);
>  		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
> -								0, true);
> +							0, MIGRATE_SYNC);
>  		if (ret) {
>  			putback_lru_pages(&pagelist);
>  			pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 2168489..6629faf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -809,7 +809,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		}
>  		/* this function returns # of failed pages */
>  		ret = migrate_pages(&source, hotremove_migrate_alloc, 0,
> -								true, true);
> +							true, MIGRATE_SYNC);
>  		if (ret)
>  			putback_lru_pages(&source);
>  	}
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index e3d58f0..06b145f 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -942,7 +942,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
>  
>  	if (!list_empty(&pagelist)) {
>  		err = migrate_pages(&pagelist, new_node_page, dest,
> -								false, true);
> +							false, MIGRATE_SYNC);
>  		if (err)
>  			putback_lru_pages(&pagelist);
>  	}
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4e86f3b..9871a56 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -218,12 +218,13 @@ out:
>  
>  #ifdef CONFIG_BLOCK
>  /* Returns true if all buffers are successfully locked */
> -static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
> +static bool buffer_migrate_lock_buffers(struct buffer_head *head,
> +							enum migrate_mode mode)
>  {
>  	struct buffer_head *bh = head;
>  
>  	/* Simple case, sync compaction */
> -	if (sync) {
> +	if (mode != MIGRATE_ASYNC) {
>  		do {
>  			get_bh(bh);
>  			lock_buffer(bh);
> @@ -259,7 +260,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync)
>  }
>  #else
>  static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
> -								bool sync)
> +							enum migrate_mode mode)
>  {
>  	return true;
>  }
> @@ -275,7 +276,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
>   */
>  static int migrate_page_move_mapping(struct address_space *mapping,
>  		struct page *newpage, struct page *page,
> -		struct buffer_head *head, bool sync)
> +		struct buffer_head *head, enum migrate_mode mode)
>  {
>  	int expected_count;
>  	void **pslot;
> @@ -311,7 +312,8 @@ static int migrate_page_move_mapping(struct address_space *mapping,
>  	 * the mapping back due to an elevated page count, we would have to
>  	 * block waiting on other references to be dropped.
>  	 */
> -	if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) {
> +	if (mode == MIGRATE_ASYNC && head &&
> +			!buffer_migrate_lock_buffers(head, mode)) {
>  		page_unfreeze_refs(page, expected_count);
>  		spin_unlock_irq(&mapping->tree_lock);
>  		return -EAGAIN;
> @@ -472,13 +474,14 @@ EXPORT_SYMBOL(fail_migrate_page);
>   * Pages are locked upon entry and exit.
>   */
>  int migrate_page(struct address_space *mapping,
> -		struct page *newpage, struct page *page, bool sync)
> +		struct page *newpage, struct page *page,
> +		enum migrate_mode mode)
>  {
>  	int rc;
>  
>  	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
>  
> -	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync);
> +	rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode);
>  
>  	if (rc)
>  		return rc;
> @@ -495,17 +498,17 @@ EXPORT_SYMBOL(migrate_page);
>   * exist.
>   */
>  int buffer_migrate_page(struct address_space *mapping,
> -		struct page *newpage, struct page *page, bool sync)
> +		struct page *newpage, struct page *page, enum migrate_mode mode)
>  {
>  	struct buffer_head *bh, *head;
>  	int rc;
>  
>  	if (!page_has_buffers(page))
> -		return migrate_page(mapping, newpage, page, sync);
> +		return migrate_page(mapping, newpage, page, mode);
>  
>  	head = page_buffers(page);
>  
> -	rc = migrate_page_move_mapping(mapping, newpage, page, head, sync);
> +	rc = migrate_page_move_mapping(mapping, newpage, page, head, mode);
>  
>  	if (rc)
>  		return rc;
> @@ -515,8 +518,8 @@ int buffer_migrate_page(struct address_space *mapping,
>  	 * with an IRQ-safe spinlock held. In the sync case, the buffers
>  	 * need to be locked now
>  	 */
> -	if (sync)
> -		BUG_ON(!buffer_migrate_lock_buffers(head, sync));
> +	if (mode != MIGRATE_ASYNC)
> +		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
>  
>  	ClearPagePrivate(page);
>  	set_page_private(newpage, page_private(page));
> @@ -593,10 +596,11 @@ static int writeout(struct address_space *mapping, struct page *page)
>   * Default handling if a filesystem does not provide a migration function.
>   */
>  static int fallback_migrate_page(struct address_space *mapping,
> -	struct page *newpage, struct page *page, bool sync)
> +	struct page *newpage, struct page *page, enum migrate_mode mode)
>  {
>  	if (PageDirty(page)) {
> -		if (!sync)
> +		/* Only writeback pages in full synchronous migration */
> +		if (mode != MIGRATE_SYNC)
>  			return -EBUSY;
>  		return writeout(mapping, page);
>  	}
> @@ -609,7 +613,7 @@ static int fallback_migrate_page(struct address_space *mapping,
>  	    !try_to_release_page(page, GFP_KERNEL))
>  		return -EAGAIN;
>  
> -	return migrate_page(mapping, newpage, page, sync);
> +	return migrate_page(mapping, newpage, page, mode);
>  }
>  
>  /*
> @@ -624,7 +628,7 @@ static int fallback_migrate_page(struct address_space *mapping,
>   *  == 0 - success
>   */
>  static int move_to_new_page(struct page *newpage, struct page *page,
> -					int remap_swapcache, bool sync)
> +				int remap_swapcache, enum migrate_mode mode)
>  {
>  	struct address_space *mapping;
>  	int rc;
> @@ -645,7 +649,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  
>  	mapping = page_mapping(page);
>  	if (!mapping)
> -		rc = migrate_page(mapping, newpage, page, sync);
> +		rc = migrate_page(mapping, newpage, page, mode);
>  	else if (mapping->a_ops->migratepage)
>  		/*
>  		 * Most pages have a mapping and most filesystems provide a
> @@ -654,9 +658,9 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  		 * is the most common path for page migration.
>  		 */
>  		rc = mapping->a_ops->migratepage(mapping,
> -						newpage, page, sync);
> +						newpage, page, mode);
>  	else
> -		rc = fallback_migrate_page(mapping, newpage, page, sync);
> +		rc = fallback_migrate_page(mapping, newpage, page, mode);
>  
>  	if (rc) {
>  		newpage->mapping = NULL;
> @@ -671,7 +675,7 @@ static int move_to_new_page(struct page *newpage, struct page *page,
>  }
>  
>  static int __unmap_and_move(struct page *page, struct page *newpage,
> -				int force, bool offlining, bool sync)
> +			int force, bool offlining, enum migrate_mode mode)
>  {
>  	int rc = -EAGAIN;
>  	int remap_swapcache = 1;
> @@ -680,7 +684,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  	struct anon_vma *anon_vma = NULL;
>  
>  	if (!trylock_page(page)) {
> -		if (!force || !sync)
> +		if (!force || mode == MIGRATE_ASYNC)
>  			goto out;
>  
>  		/*
> @@ -726,10 +730,12 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  
>  	if (PageWriteback(page)) {
>  		/*
> -		 * For !sync, there is no point retrying as the retry loop
> -		 * is expected to be too short for PageWriteback to be cleared
> +		 * Only in the case of a full synchronous migration is it
> +		 * necessary to wait for PageWriteback. In the async case,
> +		 * the retry loop is too short and in the sync-light case,
> +		 * the overhead of stalling is too much
>  		 */
> -		if (!sync) {
> +		if (mode != MIGRATE_SYNC) {
>  			rc = -EBUSY;
>  			goto uncharge;
>  		}
> @@ -800,7 +806,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>  
>  skip_unmap:
>  	if (!page_mapped(page))
> -		rc = move_to_new_page(newpage, page, remap_swapcache, sync);
> +		rc = move_to_new_page(newpage, page, remap_swapcache, mode);
>  
>  	if (rc && remap_swapcache)
>  		remove_migration_ptes(page, page);
> @@ -823,7 +829,8 @@ out:
>   * to the newly allocated page in newpage.
>   */
>  static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> -			struct page *page, int force, bool offlining, bool sync)
> +			struct page *page, int force, bool offlining,
> +			enum migrate_mode mode)
>  {
>  	int rc = 0;
>  	int *result = NULL;
> @@ -843,7 +850,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
>  		if (unlikely(split_huge_page(page)))
>  			goto out;
>  
> -	rc = __unmap_and_move(page, newpage, force, offlining, sync);
> +	rc = __unmap_and_move(page, newpage, force, offlining, mode);
>  out:
>  	if (rc != -EAGAIN) {
>  		/*
> @@ -891,7 +898,8 @@ out:
>   */
>  static int unmap_and_move_huge_page(new_page_t get_new_page,
>  				unsigned long private, struct page *hpage,
> -				int force, bool offlining, bool sync)
> +				int force, bool offlining,
> +				enum migrate_mode mode)
>  {
>  	int rc = 0;
>  	int *result = NULL;
> @@ -904,7 +912,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	rc = -EAGAIN;
>  
>  	if (!trylock_page(hpage)) {
> -		if (!force || !sync)
> +		if (!force || mode != MIGRATE_SYNC)
>  			goto out;
>  		lock_page(hpage);
>  	}
> @@ -915,7 +923,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	try_to_unmap(hpage, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>  
>  	if (!page_mapped(hpage))
> -		rc = move_to_new_page(new_hpage, hpage, 1, sync);
> +		rc = move_to_new_page(new_hpage, hpage, 1, mode);
>  
>  	if (rc)
>  		remove_migration_ptes(hpage, hpage);
> @@ -958,7 +966,7 @@ out:
>   */
>  int migrate_pages(struct list_head *from,
>  		new_page_t get_new_page, unsigned long private, bool offlining,
> -		bool sync)
> +		enum migrate_mode mode)
>  {
>  	int retry = 1;
>  	int nr_failed = 0;
> @@ -979,7 +987,7 @@ int migrate_pages(struct list_head *from,
>  
>  			rc = unmap_and_move(get_new_page, private,
>  						page, pass > 2, offlining,
> -						sync);
> +						mode);
>  
>  			switch(rc) {
>  			case -ENOMEM:
> @@ -1009,7 +1017,7 @@ out:
>  
>  int migrate_huge_pages(struct list_head *from,
>  		new_page_t get_new_page, unsigned long private, bool offlining,
> -		bool sync)
> +		enum migrate_mode mode)
>  {
>  	int retry = 1;
>  	int nr_failed = 0;
> @@ -1026,7 +1034,7 @@ int migrate_huge_pages(struct list_head *from,
>  
>  			rc = unmap_and_move_huge_page(get_new_page,
>  					private, page, pass > 2, offlining,
> -					sync);
> +					mode);
>  
>  			switch(rc) {
>  			case -ENOMEM:
> @@ -1155,7 +1163,7 @@ set_status:
>  	err = 0;
>  	if (!list_empty(&pagelist)) {
>  		err = migrate_pages(&pagelist, new_page_node,
> -				(unsigned long)pm, 0, true);
> +				(unsigned long)pm, 0, MIGRATE_SYNC);
>  		if (err)
>  			putback_lru_pages(&pagelist);
>  	}
> 
> 

-- 
[]'s
Herton


* Re: [ 06/73] mm: compaction: introduce sync-light migration for use by compaction
  2012-07-31 16:42   ` Herton Ronaldo Krzesinski
@ 2012-07-31 17:00     ` Mel Gorman
  2012-07-31 17:03       ` Mel Gorman
  0 siblings, 1 reply; 94+ messages in thread
From: Mel Gorman @ 2012-07-31 17:00 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: Ben Hutchings, linux-kernel, stable, torvalds, akpm, alan,
	Rik van Riel, Andrea Arcangeli, Minchan Kim, Dave Jones,
	Jan Kara, Andy Isaacson, Nai Xia, Johannes Weiner

On Tue, Jul 31, 2012 at 01:42:04PM -0300, Herton Ronaldo Krzesinski wrote:
> On Tue, Jul 31, 2012 at 05:43:16AM +0100, Ben Hutchings wrote:
> > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Mel Gorman <mgorman@suse.de>
> > 
> > commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream.
> 
> We also need to pick up the recent fix dc32f63453f56d07a1073a697dcd843dd3098c09 after
> applying this one.
> 

mel@machina:~/git-public/linux-2.6 > git remote update
Fetching linux-next
Fetching stable
Fetching net-next
mel@machina:~/git-public/linux-2.6 > git show dc32f63453f56d07a1073a697dcd843dd3098c09
fatal: bad object dc32f63453f56d07a1073a697dcd843dd3098c09

What commit is this, where did it come from and why is it needed?

-- 
Mel Gorman
SUSE Labs


* Re: [ 06/73] mm: compaction: introduce sync-light migration for use by compaction
  2012-07-31 17:00     ` Mel Gorman
@ 2012-07-31 17:03       ` Mel Gorman
  2012-07-31 23:12         ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Mel Gorman @ 2012-07-31 17:03 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: Ben Hutchings, linux-kernel, stable, torvalds, akpm, alan,
	Rik van Riel, Andrea Arcangeli, Minchan Kim, Dave Jones,
	Jan Kara, Andy Isaacson, Nai Xia, Johannes Weiner

On Tue, Jul 31, 2012 at 06:00:51PM +0100, Mel Gorman wrote:
> On Tue, Jul 31, 2012 at 01:42:04PM -0300, Herton Ronaldo Krzesinski wrote:
> > On Tue, Jul 31, 2012 at 05:43:16AM +0100, Ben Hutchings wrote:
> > > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > ------------------
> > > 
> > > From: Mel Gorman <mgorman@suse.de>
> > > 
> > > commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream.
> > 
> > We also need to pick up the recent fix dc32f63453f56d07a1073a697dcd843dd3098c09 after
> > applying this one.
> > 
> 
> mel@machina:~/git-public/linux-2.6 > git remote update
> Fetching linux-next
> Fetching stable
> Fetching net-next
> mel@machina:~/git-public/linux-2.6 > git show dc32f63453f56d07a1073a697dcd843dd3098c09
> fatal: bad object dc32f63453f56d07a1073a697dcd843dd3098c09
> 
> What commit is this, where did it come from and why is it needed?
> 

Bah, I'm an idiot. Yes, this patch should be included as well.

-- 
Mel Gorman
SUSE Labs


* Re: [ 06/73] mm: compaction: introduce sync-light migration for use by compaction
  2012-07-31 17:03       ` Mel Gorman
@ 2012-07-31 23:12         ` Ben Hutchings
  0 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31 23:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Herton Ronaldo Krzesinski, linux-kernel, stable, torvalds, akpm,
	alan, Rik van Riel, Andrea Arcangeli, Minchan Kim, Dave Jones,
	Jan Kara, Andy Isaacson, Nai Xia, Johannes Weiner


On Tue, 2012-07-31 at 18:03 +0100, Mel Gorman wrote:
> On Tue, Jul 31, 2012 at 06:00:51PM +0100, Mel Gorman wrote:
> > On Tue, Jul 31, 2012 at 01:42:04PM -0300, Herton Ronaldo Krzesinski wrote:
> > > On Tue, Jul 31, 2012 at 05:43:16AM +0100, Ben Hutchings wrote:
> > > > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > > > 
> > > > ------------------
> > > > 
> > > > From: Mel Gorman <mgorman@suse.de>
> > > > 
> > > > commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream.
> > > 
> > > We also need to pick up the recent fix dc32f63453f56d07a1073a697dcd843dd3098c09 after
> > > applying this one.
> > > 
> > 
> > mel@machina:~/git-public/linux-2.6 > git remote update
> > Fetching linux-next
> > Fetching stable
> > Fetching net-next
> > mel@machina:~/git-public/linux-2.6 > git show dc32f63453f56d07a1073a697dcd843dd3098c09
> > fatal: bad object dc32f63453f56d07a1073a697dcd843dd3098c09
> > 
> > What commit is this, where did it come from and why is it needed?
> > 
> 
> Bah, I'm an idiot. Yes, this patch should be included as well.

OK, I've added this.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner


* Re: [ 35/73] ASoC: dapm: Fix locking during codec shutdown
  2012-07-31 16:13     ` Mark Brown
@ 2012-07-31 23:20       ` Ben Hutchings
  0 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-07-31 23:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Herton Ronaldo Krzesinski, linux-kernel, stable, torvalds, akpm,
	alan, Liam Girdwood, Misael Lopez Cruz


On Tue, 2012-07-31 at 17:13 +0100, Mark Brown wrote:
> On Tue, Jul 31, 2012 at 01:11:01PM -0300, Herton Ronaldo Krzesinski wrote:
> 
> > Hi, this doesn't build on 3.2:
> 
> > linux-stable/sound/soc/soc-dapm.c: In function 'soc_dapm_shutdown_codec':
> > linux-stable/sound/soc/soc-dapm.c:2982:18: error: 'struct snd_soc_card' has no member named 'dapm_mutex'
> > linux-stable/sound/soc/soc-dapm.c:3007:20: error: 'struct snd_soc_card' has no member named 'dapm_mutex'
> 
> > Looking at it, I'm not sure the fix is needed on 3.2, and introducing
> > dapm_mutex would require several changes.
> 
> Yes, this is irrelevant on v3.2.

OK, I've dropped this.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner


* Re: [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one
  2012-07-31  4:43 ` [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one Ben Hutchings
@ 2012-08-01  1:56   ` Herton Ronaldo Krzesinski
  2012-08-01  2:36     ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Herton Ronaldo Krzesinski @ 2012-08-01  1:56 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, torvalds, akpm, alan, Nishanth Menon,
	Steve Sakoman, Tony Lindgren, Kevin Hilman

On Tue, Jul 31, 2012 at 05:43:38AM +0100, Ben Hutchings wrote:
> 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Nishanth Menon <nm@ti.com>
> 
> commit b110547e586eb5825bc1d04aa9147bff83b57672 upstream.

This change is unneeded in 3.2, but doesn't do any harm either... it only
fixes the code because of the continue added in 9fa2df6b (ARM:
OMAP2+: OPP: allow OPP enumeration to continue if device is not present),
a change which 3.2 doesn't have. It's a no-op for 3.2 anyway, so either
way is fine, applying or not; just commenting on it.

> 
> Commit 9fa2df6b90786301b175e264f5fa9846aba81a65
> (ARM: OMAP2+: OPP: allow OPP enumeration to continue if device is not present)
> makes the logic:
> for (i = 0; i < opp_def_size; i++) {
> 	<snip>
> 	if (!oh || !oh->od) {
> 		<snip>
> 		continue;
> 	}
> <snip>
> opp_def++;
> }
> 
> In short, the moment we hit a "Bad OPP", we end up looping the list
> comparing against the bad opp definition pointer for the rest of the
> iteration count. Instead, increment opp_def in the for loop itself
> and allow continue to be used in code without much thought so that
> we check the next set of OPP definition pointers :)
> 
> Cc: Steve Sakoman <steve@sakoman.com>
> Cc: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Nishanth Menon <nm@ti.com>
> Signed-off-by: Kevin Hilman <khilman@ti.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>  arch/arm/mach-omap2/opp.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/opp.c b/arch/arm/mach-omap2/opp.c
> index de6d464..d8f6dbf 100644
> --- a/arch/arm/mach-omap2/opp.c
> +++ b/arch/arm/mach-omap2/opp.c
> @@ -53,7 +53,7 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
>  	omap_table_init = 1;
>  
>  	/* Lets now register with OPP library */
> -	for (i = 0; i < opp_def_size; i++) {
> +	for (i = 0; i < opp_def_size; i++, opp_def++) {
>  		struct omap_hwmod *oh;
>  		struct device *dev;
>  
> @@ -86,7 +86,6 @@ int __init omap_init_opp_table(struct omap_opp_def *opp_def,
>  					__func__, opp_def->freq,
>  					opp_def->hwmod_name, i, r);
>  		}
> -		opp_def++;
>  	}
>  
>  	return 0;
> 
> 

-- 
[]'s
Herton


* Re: [ 28/73] ARM: OMAP2+: OPP: Fix to ensure check of right oppdef after bad one
  2012-08-01  1:56   ` Herton Ronaldo Krzesinski
@ 2012-08-01  2:36     ` Ben Hutchings
  0 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-08-01  2:36 UTC (permalink / raw)
  To: Herton Ronaldo Krzesinski
  Cc: linux-kernel, stable, torvalds, akpm, alan, Nishanth Menon,
	Steve Sakoman, Tony Lindgren, Kevin Hilman


On Tue, 2012-07-31 at 22:56 -0300, Herton Ronaldo Krzesinski wrote:
> On Tue, Jul 31, 2012 at 05:43:38AM +0100, Ben Hutchings wrote:
> > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Nishanth Menon <nm@ti.com>
> > 
> > commit b110547e586eb5825bc1d04aa9147bff83b57672 upstream.
> 
> This change is uneeded in 3.2, but doesn't do any harm either... it just
> seems to fix the code because of the continue added in 9fa2df6b (ARM:
> OMAP2+: OPP: allow OPP enumeration to continue if device is not present),
> change which 3.2 doesn't have. A noop for 3.2 anyway, so either way it's
> fine, applying or not, just commenting on it.
[...]

I'll drop it.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner


* Re: [ 00/73] 3.2.25-stable review
  2012-07-31  4:43 [ 00/73] 3.2.25-stable review Ben Hutchings
                   ` (73 preceding siblings ...)
  2012-07-31  5:00 ` [ 00/73] 3.2.25-stable review Ben Hutchings
@ 2012-08-01 12:55 ` Steven Rostedt
  2012-08-05 22:26   ` Ben Hutchings
  74 siblings, 1 reply; 94+ messages in thread
From: Steven Rostedt @ 2012-08-01 12:55 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman
  Cc: linux-kernel, stable, torvalds, akpm, alan, Jens Axboe,
	Tejun Heo, Vivek Goyal

On Tue, 2012-07-31 at 05:43 +0100, Ben Hutchings wrote:
> This is the start of the stable review cycle for the 3.2.25 release.
> There are 73 patches in this series, which will be posted as responses
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu Aug  2 10:00:00 UTC 2012.
> Anything received after that time might be too late.
> 
> A combined patch relative to 3.2.24 will be posted as an additional
> response to this, and the diffstat can be found below.

I tested this against configs I normally test against my own code, and I
hit this bug:

[   37.030215] =============================================================================
[   37.031170] BUG blkdev_queue: Poison overwritten
[   37.031170] -----------------------------------------------------------------------------
[   37.031170] 
[   37.031170] INFO: 0xffff8800757287b0-0xffff8800757287b0. First byte 0x6a instead of 0x6b
[   37.031170] INFO: Allocated in blk_alloc_queue_node+0x25/0x1fb age=3399 cpu=1 pid=1
[   37.031170]  __slab_alloc+0x2e4/0x365
[   37.031170]  kmem_cache_alloc_node+0x92/0x18d
[   37.031170]  blk_alloc_queue_node+0x25/0x1fb
[   37.031170]  blk_init_queue_node+0x24/0x5c
[   37.031170]  blk_init_queue+0x11/0x13
[   37.031170]  floppy_init+0x78/0x5d9
[   37.031170]  do_one_initcall+0x7f/0x140
[   37.092043]  kernel_init+0xc9/0x143
[   37.092043]  kernel_thread_helper+0x4/0x10
[   37.092043] INFO: Freed in blk_release_queue+0x86/0x8b age=78 cpu=0 pid=1
[   37.092043]  __slab_free+0x38/0x377
[   37.092043]  kmem_cache_free+0xf7/0x155
[   37.092043]  blk_release_queue+0x86/0x8b
[   37.092043]  kobject_cleanup+0xc4/0xeb
[   37.092043]  kobject_release+0xd/0xf
[   37.092043]  kobject_put+0x4a/0x4f
[   37.092043]  blk_cleanup_queue+0x159/0x162
[   37.092043]  floppy_init+0x5b6/0x5d9
[   37.092043]  do_one_initcall+0x7f/0x140
[   37.092043]  kernel_init+0xc9/0x143
[   37.092043]  kernel_thread_helper+0x4/0x10
[   37.092043] INFO: Slab 0xffffea0001d5ca00 objects=9 used=9 fp=0x          (null) flags=0x100000000004080
[   37.092043] INFO: Object 0xffff880075728000 @offset=0 fp=0xffff88007572e700
[...]
[   37.092043] Object ffff880075728b80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[   37.092043] Object ffff880075728b90: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
[   37.092043] Redzone ffff880075728b98: bb bb bb bb bb bb bb bb                          ........
[   37.092043] Padding ffff880075728cd8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
[   37.092043] Pid: 1, comm: swapper/0 Not tainted 3.2.0-test+ #1
[   37.092043] Call Trace:
[   37.092043]  [<ffffffff8114d18b>] ? print_section+0x3d/0x3f
[   37.092043]  [<ffffffff8114db29>] print_trailer+0x10a/0x113
[   37.092043]  [<ffffffff8114df53>] check_bytes_and_report+0xb1/0xea
[   37.092043]  [<ffffffff8114e050>] check_object+0xc4/0x1fc
[   37.092043]  [<ffffffff813f7c63>] ? blk_alloc_queue_node+0x25/0x1fb
[   37.092043]  [<ffffffff813f7c63>] ? blk_alloc_queue_node+0x25/0x1fb
[   37.092043]  [<ffffffff819e465e>] alloc_debug_processing+0xa7/0x14a
[   37.092043]  [<ffffffff819e4fb1>] __slab_alloc+0x2e4/0x365
[   37.092043]  [<ffffffff813f7c63>] ? blk_alloc_queue_node+0x25/0x1fb
[   37.092043]  [<ffffffff810b113d>] ? trace_hardirqs_on+0xd/0xf
[   37.092043]  [<ffffffff8114fd71>] kmem_cache_alloc_node+0x92/0x18d
[   37.092043]  [<ffffffff810b10f9>] ? trace_hardirqs_on_caller+0x12f/0x166
[   37.092043]  [<ffffffff813f7c63>] ? blk_alloc_queue_node+0x25/0x1fb
[   37.092043]  [<ffffffff813f7c63>] blk_alloc_queue_node+0x25/0x1fb
[   37.092043]  [<ffffffff813f7e4a>] blk_alloc_queue+0x11/0x13
[   37.092043]  [<ffffffff81550489>] brd_alloc+0x79/0x185
[   37.092043]  [<ffffffff82264a24>] brd_init+0xc6/0x19c
[   37.092043]  [<ffffffff8226495e>] ? floppy_init+0x5d9/0x5d9
[   37.092043]  [<ffffffff8100020f>] do_one_initcall+0x7f/0x140
[   37.092043]  [<ffffffff82234c44>] kernel_init+0xc9/0x143
[   37.092043]  [<ffffffff81a14874>] kernel_thread_helper+0x4/0x10
[   37.092043]  [<ffffffff81a0c5f4>] ? retint_restore_args+0x13/0x13
[   37.092043]  [<ffffffff82234b7b>] ? start_kernel+0x3af/0x3af
[   37.092043]  [<ffffffff81a14870>] ? gs_change+0x13/0x13
[   37.092043] FIX blkdev_queue: Restoring 0xffff8800757287b0-0xffff8800757287b0=0x6b

I then checked against 3.2.24 and it bugged too, as well as vanilla 3.2.
I then checked against 3.3 and it did not bug. I kicked off a 'reverse'
bisect with ktest (bad is good and good is bad) and found the fix:

commit 3f9a5aabd0a9fe0e0cd308506f48963d79169aa7
Author: Vivek Goyal <vgoyal@redhat.com>
floppy: Cleanup disk->queue before calling put_disk() if add_disk() was never called

I applied it against v3.2.24 and it solved the bug. I did not apply it
against 25-rc1, but I'm pretty sure it should work for that too.
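
For reference, the same reverse bisect can be done with stock git by
deliberately swapping the terms -- a rough sketch, assuming the bug
reproduces on v3.2 and not on v3.3:

	git bisect start
	git bisect bad v3.3	# no bug here, i.e. the fix is present
	git bisect good v3.2	# bug reproduces, i.e. the fix is absent
	# then at each step: build/boot the triggering config and mark
	# "good" if the poison overwrite still fires, "bad" if it doesn't

git converges on the first commit where the bug disappears, which is the
fix.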

I haven't checked 3.0 if that has an issue with my config.

Anyway, please apply this patch to stable 3.2. Either for 25 or for 26.

Thanks!

-- Steve

commit 3f9a5aabd0a9fe0e0cd308506f48963d79169aa7
Author: Vivek Goyal <vgoyal@redhat.com>
Date:   Wed Feb 8 20:03:38 2012 +0100

    floppy: Cleanup disk->queue before calling put_disk() if add_disk() was never called
    
    add_disk() takes gendisk reference on request queue. If driver failed during
    initialization and never called add_disk() then that extra reference is not
    taken. That reference is put in put_disk(). floppy driver allocates the
    disk, allocates queue, sets disk->queue and then realizes that floppy
    controller is not present. It tries to tear down everything and tries to
    put a reference down in put_disk() which was never taken.
    
    In such error cases cleanup disk->queue before calling put_disk() so that
    we never try to put down a reference which was never taken in first place.
    
    Reported-and-tested-by: Suresh Jayaraman <sjayaraman@suse.com>
    Tested-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de>
    Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 510fb10..401ba78 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4368,8 +4368,14 @@ out_unreg_blkdev:
 out_put_disk:
 	while (dr--) {
 		del_timer_sync(&motor_off_timer[dr]);
-		if (disks[dr]->queue)
+		if (disks[dr]->queue) {
 			blk_cleanup_queue(disks[dr]->queue);
+			/*
+			 * put_disk() is not paired with add_disk() and
+			 * will put queue reference one extra time. fix it.
+			 */
+			disks[dr]->queue = NULL;
+		}
 		put_disk(disks[dr]);
 	}
 	return err;





* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-07-31  4:43 ` [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface Ben Hutchings
@ 2012-08-03  9:04   ` Sven Joachim
  2012-08-03  9:43     ` Borislav Petkov
  0 siblings, 1 reply; 94+ messages in thread
From: Sven Joachim @ 2012-08-03  9:04 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, torvalds, akpm, alan, Borislav Petkov,
	Henrique de Moraes Holschuh, Peter Zijlstra, H. Peter Anvin

On 2012-07-31 06:43 +0200, Ben Hutchings wrote:

> 3.2-stable review patch.  If anyone has any objections, please let me know.

Alas, this does not build if CONFIG_SMP is unset:

,----
| arch/x86/kernel/microcode_core.c: In function 'reload_store':
| arch/x86/kernel/microcode_core.c:304:19: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
`----
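
In case it helps anyone reproduce, a rough recipe (untested sketch;
assumes a starting .config that has CONFIG_MICROCODE enabled):

	sed -i 's/^CONFIG_SMP=y/# CONFIG_SMP is not set/' .config
	yes '' | make oldconfig
	make arch/x86/kernel/microcode_core.o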

Cheers,
       Sven


> From: Borislav Petkov <borislav.petkov@amd.com>
>
> commit c9fc3f778a6a215ace14ee556067c73982b6d40f upstream.
>
> Microcode reloading in a per-core manner is a very bad idea for both
> major x86 vendors. And the thing is, we have such interface with which
> we can end up with different microcode versions applied on different
> cores of an otherwise homogeneous wrt (family,model,stepping) system.
>
> So turn off the possibility of doing that per core and allow it only
> system-wide.
>
> This is a minimal fix which we'd like to see in stable too thus the
> more-or-less arbitrary decision to allow system-wide reloading only on
> the BSP:
>
> $ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
> ...
>
> and disable the interface on the other cores:
>
> $ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
> -bash: echo: write error: Invalid argument
>
> Also, allowing the reload only from one CPU (the BSP in
> that case) doesn't allow the reload procedure to degenerate
> into an O(n^2) deal when triggering reloads from all
> /sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
> simultaneously.
>
> A more generic fix will follow.
>
> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
> Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>  arch/x86/kernel/microcode_core.c |   26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c
> index fbdfc69..24b852b 100644
> --- a/arch/x86/kernel/microcode_core.c
> +++ b/arch/x86/kernel/microcode_core.c
> @@ -298,19 +298,31 @@ static ssize_t reload_store(struct device *dev,
>  			    const char *buf, size_t size)
>  {
>  	unsigned long val;
> -	int cpu = dev->id;
> -	ssize_t ret = 0;
> +	int cpu;
> +	ssize_t ret = 0, tmp_ret;
> +
> +	/* allow reload only from the BSP */
> +	if (boot_cpu_data.cpu_index != dev->id)
> +		return -EINVAL;
>  
>  	ret = kstrtoul(buf, 0, &val);
>  	if (ret)
>  		return ret;
>  
> -	if (val == 1) {
> -		get_online_cpus();
> -		if (cpu_online(cpu))
> -			ret = reload_for_cpu(cpu);
> -		put_online_cpus();
> +	if (val != 1)
> +		return size;
> +
> +	get_online_cpus();
> +	for_each_online_cpu(cpu) {
> +		tmp_ret = reload_for_cpu(cpu);
> +		if (tmp_ret != 0)
> +			pr_warn("Error reloading microcode on CPU %d\n", cpu);
> +
> +		/* save retval of the first encountered reload error */
> +		if (!ret)
> +			ret = tmp_ret;
>  	}
> +	put_online_cpus();
>  
>  	if (!ret)
>  		ret = size;


* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-03  9:04   ` Sven Joachim
@ 2012-08-03  9:43     ` Borislav Petkov
  2012-08-03 12:27       ` Borislav Petkov
  0 siblings, 1 reply; 94+ messages in thread
From: Borislav Petkov @ 2012-08-03  9:43 UTC (permalink / raw)
  To: Sven Joachim
  Cc: Ben Hutchings, linux-kernel, stable, torvalds, akpm, alan,
	Henrique de Moraes Holschuh, Peter Zijlstra, H. Peter Anvin

On Fri, Aug 03, 2012 at 11:04:06AM +0200, Sven Joachim wrote:
> On 2012-07-31 06:43 +0200, Ben Hutchings wrote:
> 
> > 3.2-stable review patch.  If anyone has any objections, please let me know.
> 
> Alas, this does not build if CONFIG_SMP is unset:
> 
> ,----
> | arch/x86/kernel/microcode_core.c: In function 'reload_store':
> | arch/x86/kernel/microcode_core.c:304:19: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
> `----

Crap. :-(

3.2 still has this:

<arch/x86/include/asm/processor.h>:
...
#ifdef CONFIG_SMP
        /* number of cores as seen by the OS: */
        u16                     booted_cores;
        /* Physical processor id: */
        u16                     phys_proc_id;
        /* Core id: */
        u16                     cpu_core_id;
        /* Compute unit id */
        u8                      compute_unit_id;
        /* Index into per_cpu list: */
        u16                     cpu_index;
#endif
        u32                     microcode;
} __attribute__((__aligned__(SMP_CACHE_BYTES)));
---

which got removed by

commit 141168c36cdee3ff23d9c7700b0edc47cb65479f
Author: Kevin Winchester <kjwinchester@gmail.com>
Date:   Tue Dec 20 20:52:22 2011 -0400

    x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'

Ben, you might want to backport this one too... I'll run a couple of 3.2
builds with it on top of 3.2 to verify nothing else breaks.
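
Just for illustration -- not the route we took -- a UP-safe variant of
the BSP check would have had to guard the cpu_index access itself,
roughly like this (an untested sketch against 3.2):

	int bsp = 0;	/* on !SMP builds the boot CPU is CPU 0 */

#ifdef CONFIG_SMP
	bsp = boot_cpu_data.cpu_index;
#endif

	if (dev->id != bsp)
		return -EINVAL;

Backporting the struct cleanup instead avoids sprinkling ifdefs like
that around.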

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-03  9:43     ` Borislav Petkov
@ 2012-08-03 12:27       ` Borislav Petkov
  2012-08-04 15:41         ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Borislav Petkov @ 2012-08-03 12:27 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Sven Joachim, linux-kernel, stable, torvalds, akpm, alan,
	Henrique de Moraes Holschuh, Peter Zijlstra, H. Peter Anvin,
	Kevin Winchester

On Fri, Aug 03, 2012 at 11:43:14AM +0200, Borislav Petkov wrote:
> On Fri, Aug 03, 2012 at 11:04:06AM +0200, Sven Joachim wrote:
> > On 2012-07-31 06:43 +0200, Ben Hutchings wrote:
> > 
> > > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > 
> > Alas, this does not build if CONFIG_SMP is unset:
> > 
> > ,----
> > | arch/x86/kernel/microcode_core.c: In function 'reload_store':
> > | arch/x86/kernel/microcode_core.c:304:19: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
> > `----
> 
> Crap. :-(
> 
> 3.2 still has this:
> 
> <arch/x86/include/asm/processor.h>:
> [...]
> 
> which got removed by
> 
> commit 141168c36cdee3ff23d9c7700b0edc47cb65479f
> Author: Kevin Winchester <kjwinchester@gmail.com>
> Date:   Tue Dec 20 20:52:22 2011 -0400
> 
>     x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
> 
> Ben, you might want to backport this one too... I'll run a couple of 3.2
> builds with it on top of 3.2 to verify nothing else breaks.

Ok, 141168c36cdee3ff23d9c7700b0edc47cb65479f doesn't apply cleanly to
3.2-stable, as expected. I've attached a partly backported version. Why
partly? Well, the original commit broke an UP build in mainline, which
got fixed later by

commit 3f806e50981825fa56a7f1938f24c0680816be45
Author: Borislav Petkov <bp@alien8.de>
Date:   Fri Feb 3 20:18:01 2012 +0100

    x86/mce/AMD: Fix UP build error
    
    141168c36cde ("x86: Simplify code by removing a !SMP #ifdefs
    from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs
    around code touching struct cpuinfo_x86 members but also caused
    the following build error with Randy's randconfigs:
    
    mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map'
---

which restored part of what the original patch had removed.

So I've taken out the parts that introduce the breakage from the
backport.
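
To be concrete about which parts those are: the danger spot is the
mce_amd.c code that ends up touching cpu_llc_shared_map (through
cpu_llc_shared_mask()), and that mask only exists on CONFIG_SMP builds.
From memory, and purely as an illustration rather than the literal
hunk, the shape of the code that must keep its guard is:

#ifdef CONFIG_SMP
	/* cpu_llc_shared_mask() is SMP-only in 3.2; keep the ifdef */
	if (cpu_data(cpu).cpu_core_id && shared_bank[bank]) {
		unsigned int first = cpumask_first(cpu_llc_shared_mask(cpu));

		/* the real code symlinks this CPU's threshold bank to
		 * CPU "first"'s bank here; elided in this sketch */
	}
#endif

so hunks of that shape stay out of the backport below.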

And the attached version survives a bunch of smoke tests like
all{yes,no,mod}config builds on 32 and 64-bit.

@Sven: it should fix the issue on your box too.

HTH.

--
From 5e2fe6b301f5f969f25e4404a6b9d069957b8aeb Mon Sep 17 00:00:00 2001
From: Kevin Winchester <kjwinchester@gmail.com>
Date: Tue, 20 Dec 2011 20:52:22 -0400
Subject: [PATCH] x86: Simplify code by removing a !SMP #ifdefs from 'struct
 cpuinfo_x86'

Upstream commit: 141168c36cdee3ff23d9c7700b0edc47cb65479f

Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space.  However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files.  The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:

	   text    data     bss     dec     hex filename
	4737168	 506459	 972040	6215667	 5ed7f3	vmlinux.o.before
	4737444	 506459	 972040	6215943	 5ed907	vmlinux.o.after

for a difference of 276 bytes for an example UP config.

If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/processor.h     | 2 --
 arch/x86/kernel/amd_nb.c             | 8 ++------
 arch/x86/kernel/cpu/amd.c            | 2 --
 arch/x86/kernel/cpu/common.c         | 5 -----
 arch/x86/kernel/cpu/intel.c          | 2 --
 arch/x86/kernel/cpu/mcheck/mce.c     | 2 --
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 5 +----
 arch/x86/kernel/cpu/proc.c           | 4 +---
 drivers/edac/sb_edac.c               | 2 --
 drivers/hwmon/coretemp.c             | 7 +++----
 10 files changed, 7 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index bb3ee3629a0f..f7c89e231c6c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -99,7 +99,6 @@ struct cpuinfo_x86 {
 	u16			apicid;
 	u16			initial_apicid;
 	u16			x86_clflush_size;
-#ifdef CONFIG_SMP
 	/* number of cores as seen by the OS: */
 	u16			booted_cores;
 	/* Physical processor id: */
@@ -110,7 +109,6 @@ struct cpuinfo_x86 {
 	u8			compute_unit_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
-#endif
 	u32			microcode;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index bae1efe6d515..be16854591cc 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -154,16 +154,14 @@ int amd_get_subcaches(int cpu)
 {
 	struct pci_dev *link = node_to_amd_nb(amd_get_nb_id(cpu))->link;
 	unsigned int mask;
-	int cuid = 0;
+	int cuid;
 
 	if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
 		return 0;
 
 	pci_read_config_dword(link, 0x1d4, &mask);
 
-#ifdef CONFIG_SMP
 	cuid = cpu_data(cpu).compute_unit_id;
-#endif
 	return (mask >> (4 * cuid)) & 0xf;
 }
 
@@ -172,7 +170,7 @@ int amd_set_subcaches(int cpu, int mask)
 	static unsigned int reset, ban;
 	struct amd_northbridge *nb = node_to_amd_nb(amd_get_nb_id(cpu));
 	unsigned int reg;
-	int cuid = 0;
+	int cuid;
 
 	if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING) || mask > 0xf)
 		return -EINVAL;
@@ -190,9 +188,7 @@ int amd_set_subcaches(int cpu, int mask)
 		pci_write_config_dword(nb->misc, 0x1b8, reg & ~0x180000);
 	}
 
-#ifdef CONFIG_SMP
 	cuid = cpu_data(cpu).compute_unit_id;
-#endif
 	mask <<= 4 * cuid;
 	mask |= (0xf ^ (1 << cuid)) << 26;
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 3524e1f5e960..ff8557e2e101 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -148,7 +148,6 @@ static void __cpuinit init_amd_k6(struct cpuinfo_x86 *c)
 
 static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_SMP
 	/* calling is from identify_secondary_cpu() ? */
 	if (!c->cpu_index)
 		return;
@@ -192,7 +191,6 @@ static void __cpuinit amd_k7_smp_check(struct cpuinfo_x86 *c)
 
 valid_k7:
 	;
-#endif
 }
 
 static void __cpuinit init_amd_k7(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index aa003b13a831..ca93cc79fbc6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -676,9 +676,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	if (this_cpu->c_early_init)
 		this_cpu->c_early_init(c);
 
-#ifdef CONFIG_SMP
 	c->cpu_index = 0;
-#endif
 	filter_cpuid_features(c, false);
 
 	setup_smep(c);
@@ -764,10 +762,7 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
 		c->apicid = c->initial_apicid;
 # endif
 #endif
-
-#ifdef CONFIG_X86_HT
 		c->phys_proc_id = c->initial_apicid;
-#endif
 	}
 
 	setup_smep(c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 523131213f08..3e6ff6cbf42a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -181,7 +181,6 @@ static void __cpuinit trap_init_f00f_bug(void)
 
 static void __cpuinit intel_smp_check(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_SMP
 	/* calling is from identify_secondary_cpu() ? */
 	if (!c->cpu_index)
 		return;
@@ -198,7 +197,6 @@ static void __cpuinit intel_smp_check(struct cpuinfo_x86 *c)
 		WARN_ONCE(1, "WARNING: SMP operation may be unreliable"
 				    "with B stepping processors.\n");
 	}
-#endif
 }
 
 static void __cpuinit intel_workarounds(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b0f1271e3138..3b678770dba5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -119,9 +119,7 @@ void mce_setup(struct mce *m)
 	m->time = get_seconds();
 	m->cpuvendor = boot_cpu_data.x86_vendor;
 	m->cpuid = cpuid_eax(1);
-#ifdef CONFIG_SMP
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
-#endif
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
 }
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 445a61c39dff..d4444be98912 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -65,11 +65,9 @@ struct threshold_bank {
 };
 static DEFINE_PER_CPU(struct threshold_bank * [NR_BANKS], threshold_banks);
 
-#ifdef CONFIG_SMP
 static unsigned char shared_bank[NR_BANKS] = {
 	0, 0, 0, 0, 1
 };
-#endif
 
 static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
 
@@ -227,10 +225,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 			if (!block)
 				per_cpu(bank_map, cpu) |= (1 << bank);
-#ifdef CONFIG_SMP
+
 			if (shared_bank[bank] && c->cpu_core_id)
 				break;
-#endif
 
 			memset(&b, 0, sizeof(b));
 			b.cpu			= cpu;
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 14b23140e81f..8022c6681485 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -64,12 +64,10 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
 	struct cpuinfo_x86 *c = v;
-	unsigned int cpu = 0;
+	unsigned int cpu;
 	int i;
 
-#ifdef CONFIG_SMP
 	cpu = c->cpu_index;
-#endif
 	seq_printf(m, "processor\t: %u\n"
 		   "vendor_id\t: %s\n"
 		   "cpu family\t: %d\n"
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 18a129391c0f..0db57b594c62 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1609,11 +1609,9 @@ static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val,
 		mce->cpuvendor, mce->cpuid, mce->time,
 		mce->socketid, mce->apicid);
 
-#ifdef CONFIG_SMP
 	/* Only handle if it is the right mc controller */
 	if (cpu_data(mce->cpu).phys_proc_id != pvt->sbridge_dev->mc)
 		return NOTIFY_DONE;
-#endif
 
 	smp_rmb();
 	if ((pvt->mce_out + 1) % MCE_LOG_LEN == pvt->mce_in) {
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 0790c98e294e..19b4412ed534 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -57,16 +57,15 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
 #define TOTAL_ATTRS		(MAX_CORE_ATTRS + 1)
 #define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
 
-#ifdef CONFIG_SMP
 #define TO_PHYS_ID(cpu)		cpu_data(cpu).phys_proc_id
 #define TO_CORE_ID(cpu)		cpu_data(cpu).cpu_core_id
+#define TO_ATTR_NO(cpu)		(TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO)
+
+#ifdef CONFIG_SMP
 #define for_each_sibling(i, cpu)	for_each_cpu(i, cpu_sibling_mask(cpu))
 #else
-#define TO_PHYS_ID(cpu)		(cpu)
-#define TO_CORE_ID(cpu)		(cpu)
 #define for_each_sibling(i, cpu)	for (i = 0; false; )
 #endif
-#define TO_ATTR_NO(cpu)		(TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO)
 
 /*
  * Per-Core Temperature Data
-- 
1.7.11.rc1


-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-03 12:27       ` Borislav Petkov
@ 2012-08-04 15:41         ` Ben Hutchings
  2012-08-04 16:07           ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-08-04 15:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sven Joachim, linux-kernel, stable, torvalds, akpm, alan,
	Henrique de Moraes Holschuh, Peter Zijlstra, H. Peter Anvin,
	Kevin Winchester


On Fri, 2012-08-03 at 14:27 +0200, Borislav Petkov wrote:
> [...]
> 
> Ok, 141168c36cdee3ff23d9c7700b0edc47cb65479f doesn't apply cleanly to
> 3.2-stable, as expected. I've attached a partly backported version. Why
> partly? Well, it broke an UP build in mainline which got fixed later by
> 
> commit 3f806e50981825fa56a7f1938f24c0680816be45
> Author: Borislav Petkov <bp@alien8.de>
> Date:   Fri Feb 3 20:18:01 2012 +0100
> 
>     x86/mce/AMD: Fix UP build error
>     
>     141168c36cde ("x86: Simplify code by removing a !SMP #ifdefs
>     from 'struct cpuinfo_x86'") removed a bunch of CONFIG_SMP ifdefs
>     around code touching struct cpuinfo_x86 members but also caused
>     the following build error with Randy's randconfigs:
>     
>     mce_amd.c:(.cpuinit.text+0x4723): undefined reference to `cpu_llc_shared_map'
> ---
> 
> > which restored part of what the original patch had removed.
> 
> So I've taken out the parts that introduce the breakage from the
> backport.
[...]

Thanks everyone for working this out.

If you combine multiple mainline commits like this, the new commit
message should refer to all of them.  I've fixed that up this time.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner



* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-04 15:41         ` Ben Hutchings
@ 2012-08-04 16:07           ` Henrique de Moraes Holschuh
  2012-08-04 17:23             ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Henrique de Moraes Holschuh @ 2012-08-04 16:07 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Borislav Petkov, Sven Joachim, linux-kernel, stable, torvalds,
	akpm, alan, Peter Zijlstra, H. Peter Anvin, Kevin Winchester

On Sat, 04 Aug 2012, Ben Hutchings wrote:
> [...]
> 
> Thanks everyone for working this out.
> 
> If you combine multiple mainline commits like this, the new commit
> message should refer to all of them.  I've fixed that up this time.

Ben, the backport is also needed on 3.0 and 3.4, do you have your patch
queue available for download/pull somewhere?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-04 16:07           ` Henrique de Moraes Holschuh
@ 2012-08-04 17:23             ` Ben Hutchings
  2012-08-05  9:21               ` Borislav Petkov
  0 siblings, 1 reply; 94+ messages in thread
From: Ben Hutchings @ 2012-08-04 17:23 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Borislav Petkov, Sven Joachim, linux-kernel, stable, torvalds,
	akpm, alan, Peter Zijlstra, H. Peter Anvin, Kevin Winchester


On Sat, 2012-08-04 at 13:07 -0300, Henrique de Moraes Holschuh wrote:
[...]
> > 
> > Thanks everyone for working this out.
> > 
> > If you combine multiple mainline commits like this, the new commit
> > message should refer to all of them.  I've fixed that up this time.
> 
> Ben, the backport is also needed on 3.0 and 3.4, do you have your patch
> queue available for download/pull somewhere?

This is in v3.2.26, tagged in git
<git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y.git>.
I'll wait for Greg to generate tarballs etc. before sending the
announcement.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
                                                         - Carolyn Scheppner



* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-04 17:23             ` Ben Hutchings
@ 2012-08-05  9:21               ` Borislav Petkov
  2012-08-05 18:56                 ` Ben Hutchings
  0 siblings, 1 reply; 94+ messages in thread
From: Borislav Petkov @ 2012-08-05  9:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Henrique de Moraes Holschuh, Sven Joachim, linux-kernel, stable,
	torvalds, akpm, alan, Peter Zijlstra, H. Peter Anvin,
	Kevin Winchester

On Sat, Aug 04, 2012 at 06:23:41PM +0100, Ben Hutchings wrote:

[ … ]

> > > Thanks everyone for working this out.
> > >
> > > If you combine multiple mainline commits like this, the new commit
> > > message should refer to all of them. I've fixed that up this time.

Thanks.

> > Ben, the backport is also needed on 3.0 and 3.4, do you have your patch
> > queue available for download/pull somewhere?
> 
> This is in v3.2.26, tagged in git
> <git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y.git>.
> I'll wait for Greg to generate tarballs etc. before sending the
> announcement.

Ok, guys.

Please let me know whether I should send the backported patches for 3.0
and 3.4 to Greg, or whether you are doing this.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


* Re: [ 33/73] x86, microcode: Sanitize per-cpu microcode reloading interface
  2012-08-05  9:21               ` Borislav Petkov
@ 2012-08-05 18:56                 ` Ben Hutchings
  0 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-08-05 18:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Henrique de Moraes Holschuh, Sven Joachim, linux-kernel, stable,
	torvalds, akpm, alan, Peter Zijlstra, H. Peter Anvin,
	Kevin Winchester


On Sun, 2012-08-05 at 11:21 +0200, Borislav Petkov wrote:
> On Sat, Aug 04, 2012 at 06:23:41PM +0100, Ben Hutchings wrote:
> 
> [ … ]
> 
> > > > Thanks everyone for working this out.
> > > >
> > > > If you combine multiple mainline commits like this, the new commit
> > > > message should refer to all of them. I've fixed that up this time.
> 
> Thanks.
> 
> > > Ben, the backport is also needed on 3.0 and 3.4, do you have your patch
> > > queue available for download/pull somewhere?
> > 
> > This is in v3.2.26, tagged in git
> > <git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y.git>.
> > I'll wait for Greg to generate tarballs etc. before sending the
> > announcement.
> 
> Ok, guys.
> 
> Please let me know whether I should send the backported patches for 3.0
> and 3.4 to Greg, or whether you are doing this.

Please do that.  They will both need commit
e826abd523913f63eb03b59746ffb16153c53dc4 ('x86, microcode:
microcode_core.c simple_strtoul cleanup'), and 3.0 also needs the !SMP
fix.
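
For context, that prerequisite is the commit the sanitized
reload_store() is written against: it converts the old unchecked string
parse to kstrtoul().  Roughly -- this is a sketch reconstructed from
the surrounding context, not the literal diff of e826abd:

-	val = simple_strtoul(buf, &end, 0);
-	if (end == buf)
-		return -EINVAL;
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;

which is why the reloading fix wants it applied first.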

Ben.

-- 
Ben Hutchings
Computers are not intelligent.	They only think they are.



* Re: [ 00/73] 3.2.25-stable review
  2012-08-01 12:55 ` Steven Rostedt
@ 2012-08-05 22:26   ` Ben Hutchings
  0 siblings, 0 replies; 94+ messages in thread
From: Ben Hutchings @ 2012-08-05 22:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Greg Kroah-Hartman, linux-kernel, stable, torvalds, akpm, alan,
	Jens Axboe, Tejun Heo, Vivek Goyal


On Wed, 2012-08-01 at 08:55 -0400, Steven Rostedt wrote:
> On Tue, 2012-07-31 at 05:43 +0100, Ben Hutchings wrote:
> > This is the start of the stable review cycle for the 3.2.25 release.
> > There are 73 patches in this series, which will be posted as responses
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Aug  2 10:00:00 UTC 2012.
> > Anything received after that time might be too late.
> > 
> > A combined patch relative to 3.2.24 will be posted as an additional
> > response to this, and the diffstat can be found below.
> 
> I tested this with the configs I normally use to test my own code, and
> I hit this bug:
[...]
> I then checked 3.2.24 and it hit the bug too, as did vanilla 3.2.
> I then checked 3.3 and it did not. I kicked off a 'reverse'
> bisect with ktest (marking bad as good and good as bad) and found the fix:
> 
> commit 3f9a5aabd0a9fe0e0cd308506f48963d79169aa7
> Author: Vivek Goyal <vgoyal@redhat.com>
> floppy: Cleanup disk->queue before caling put_disk() if add_disk() was never called
> 
> I applied it against v3.2.24 and it solved the bug. I did not apply it
> against 25-rc1, but I'm pretty sure it should work for that too.
> 
> I haven't checked whether 3.0 has this issue with my config.
> 
> Anyway, please apply this patch to stable 3.2, either for 25 or for 26.
[...]

Added to the queue for 3.2; it will go into 3.2.27-rc1 now.

Ben.

-- 
Ben Hutchings
Computers are not intelligent.	They only think they are.



