linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3.14 000/122] 3.14.25-stable review
@ 2014-11-19 20:50 Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 001/122] Revert "drivers/net: Disable UFO through virtio" Greg Kroah-Hartman
                   ` (117 more replies)
  0 siblings, 118 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, satoru.takeuchi,
	shuah.kh, stable

This is the start of the stable review cycle for the 3.14.25 release.
There are 122 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Nov 21 20:51:52 UTC 2014.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.14.25-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 3.14.25-rc1

Vlastimil Babka <vbabka@suse.cz>
    mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

Mel Gorman <mgorman@suse.de>
    mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY

Tim Chen <tim.c.chen@linux.intel.com>
    fs/superblock: avoid locking counting inodes and dentries before reclaiming them

Dave Chinner <david@fromorbit.com>
    fs/superblock: unregister sb shrinker before ->kill_sb()

Hugh Dickins <hughd@google.com>
    mm: fix direct reclaim writeback regression

Shaohua Li <shli@kernel.org>
    x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB

Vlastimil Babka <vbabka@suse.cz>
    mm, compaction: properly signal and act upon lock and need_sched() contention

Vlastimil Babka <vbabka@suse.cz>
    mm/compaction: avoid rescanning pageblocks in isolate_freepages

Vlastimil Babka <vbabka@suse.cz>
    mm/compaction: do not count migratepages when unnecessary

David Rientjes <rientjes@google.com>
    mm, compaction: terminate async compaction when rescheduling

David Rientjes <rientjes@google.com>
    mm, compaction: embed migration mode in compact_control

David Rientjes <rientjes@google.com>
    mm, compaction: add per-zone migration pfn cache for async compaction

David Rientjes <rientjes@google.com>
    mm, compaction: return failed migration target pages back to freelist

David Rientjes <rientjes@google.com>
    mm, migration: add destination page freeing callback

Vlastimil Babka <vbabka@suse.cz>
    mm/compaction: cleanup isolate_freepages()

Heesub Shin <heesub.shin@samsung.com>
    mm/compaction: clean up unused code lines

Fabian Frederick <fabf@skynet.be>
    mm/readahead.c: inline ra_submit

Al Viro <viro@zeniv.linux.org.uk>
    callers of iov_copy_from_user_atomic() don't need pagecache_disable()

Sasha Levin <sasha.levin@oracle.com>
    mm: remove read_cache_page_async()

Johannes Weiner <hannes@cmpxchg.org>
    mm: madvise: fix MADV_WILLNEED on shmem swapouts

Johannes Weiner <hannes@cmpxchg.org>
    mm + fs: prepare for non-page entries in page cache radix trees

Johannes Weiner <hannes@cmpxchg.org>
    mm: filemap: move radix tree hole searching here

Johannes Weiner <hannes@cmpxchg.org>
    mm: shmem: save one radix tree lookup when truncating swapped pages

Johannes Weiner <hannes@cmpxchg.org>
    lib: radix-tree: add radix_tree_delete_item()

Quentin Casasnovas <quentin.casasnovas@oracle.com>
    regmap: fix kernel hang on regmap_bulk_write with zero val_count.

Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    iwlwifi: configure the LTR

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix panic on duplicate ASCONF chunks

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix remote memory pressure from excessive queueing

Nadav Amit <namit@cs.technion.ac.il>
    KVM: x86: Don't report guest userspace emulation error to userspace

Vince Weaver <vincent.weaver@maine.edu>
    perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge

Pawel Moll <pawel.moll@arm.com>
    perf: Handle compat ioctl

Pali Rohár <pali.rohar@gmail.com>
    dell-wmi: Fix access out of memory

Pranith Kumar <bobby.prani@gmail.com>
    rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads

Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    rcu: Make callers awaken grace-period kthread

Steven Whitehouse <swhiteho@redhat.com>
    GFS2: Fix address space from page function

Ben Dooks <ben.dooks@codethink.co.uk>
    ARM: probes: fix instruction fetch order with <asm/opcodes.h>

Pablo Neira <pablo@netfilter.org>
    netfilter: xt_bpf: add mising opaque struct sk_filter definition

Arturo Borrero <arturo.borrero.glez@gmail.com>
    netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()

Houcheng Lin <houcheng@gmail.com>
    netfilter: nf_log: release skbuff on nlmsg put failure

Florian Westphal <fw@strlen.de>
    netfilter: nfnetlink_log: fix maximum packet length logged to userspace

Florian Westphal <fw@strlen.de>
    netfilter: nf_log: account for size of NLMSG_DONE attribute

Dan Carpenter <dan.carpenter@oracle.com>
    netfilter: ipset: off by one in ip_set_nfnl_get_byindex()

Andrey Vagin <avagin@openvz.org>
    ipc: always handle a new value of auto_msgmni

Devesh Sharma <devesh.sharma@emulex.com>
    IB/core: Clear AH attr variable to prevent garbage data

Bjorn Helgaas <bhelgaas@google.com>
    clocksource: Remove "weak" from clocksource_default_clock() declaration

Bjorn Helgaas <bhelgaas@google.com>
    kgdb: Remove "weak" from kgdb_arch_pc() declaration

Bjorn Helgaas <bhelgaas@google.com>
    vmcore: Remove "weak" from function declarations

Bjorn Helgaas <bhelgaas@google.com>
    memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration

Dan Carpenter <dan.carpenter@oracle.com>
    media: ttusb-dec: buffer overflow in ioctl

Trond Myklebust <trond.myklebust@primarydata.com>
    NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE

Trond Myklebust <trond.myklebust@primarydata.com>
    NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return

Jan Kara <jack@suse.cz>
    nfs: Fix use of uninitialized variable in nfs_getattr()

Trond Myklebust <trond.myklebust@primarydata.com>
    NFS: Don't try to reclaim delegation open state if recovery failed

Trond Myklebust <trond.myklebust@primarydata.com>
    NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired

NeilBrown <neilb@suse.de>
    md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN

Junjie Mao <eternal.n08@gmail.com>
    x86, kaslr: Prevent .bss from overlaping initrd

Borislav Petkov <bp@suse.de>
    x86, microcode, AMD: Fix ucode patch stashing on 32-bit

Borislav Petkov <bp@suse.de>
    x86, microcode, AMD: Fix early ucode loading on 32-bit

Krzysztof Kozlowski <k.kozlowski@samsung.com>
    power: bq2415x_charger: Fix memory leak on DTS parsing error

Krzysztof Kozlowski <k.kozlowski@samsung.com>
    power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle

Krzysztof Kozlowski <k.kozlowski@samsung.com>
    power: charger-manager: Fix accessing invalidated power supply after charger unbind

Krzysztof Kozlowski <k.kozlowski@samsung.com>
    power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind

Pali Rohár <pali.rohar@gmail.com>
    Input: alps - ignore bad data on Dell Latitudes E6440 and E7440

Pali Rohár <pali.rohar@gmail.com>
    Input: alps - allow up to 2 invalid packets without resetting device

Pali Rohár <pali.rohar@gmail.com>
    Input: alps - ignore potential bare packets when device is out of sync

Takashi Iwai <tiwai@suse.de>
    Input: synaptics - add min/max quirk for Lenovo T440s

Heinz Mauelshagen <heinzm@redhat.com>
    dm raid: ensure superblock's size matches device's logical block size

Joe Thornber <ejt@redhat.com>
    dm btree: fix a recursion depth bug in btree walking code

Mikulas Patocka <mpatocka@redhat.com>
    dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks

Jan Kara <jack@suse.cz>
    block: Fix computation of merged request priority

Helge Deller <deller@gmx.de>
    parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls

Christoph Hellwig <hch@lst.de>
    scsi: only re-lock door after EH on devices that were reset

William Cohen <wcohen@redhat.com>
    Correct the race condition in aarch64_insn_patch_text_sync()

Peng Tao <tao.peng@primarydata.com>
    nfs: fix pnfs direct write memory leak

Simon Horman <horms+renesas@verge.net.au>
    ata: sata_rcar: Disable DIPM mode for r8a7790 ES1

Stefan Richter <stefanr@s5r6.in-berlin.de>
    firewire: cdev: prevent kernel stack leaking into ioctl arguments

Kyle McMartin <kyle@redhat.com>
    arm64: __clear_user: handle exceptions on strb

Joe Thornber <ejt@redhat.com>
    dm thin: grab a virtual cell before looking up the mapping

Roger Quadros <rogerq@ti.com>
    pinctrl: dra: dt-bindings: Fix output pull up/down

Will Deacon <will.deacon@arm.com>
    ARM: 8191/1: decompressor: ensure I-side picks up relocated code

Nathan Lynch <nathan_lynch@mentor.com>
    ARM: 8198/1: make kuser helpers depend on MMU

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon: add missing crtc unlock when setting up the MC

Alex Deucher <alexander.deucher@amd.com>
    drm/radeon: make sure mode init is complete in bandwidth_update

Jammy Zhou <Jammy.Zhou@amd.com>
    drm/radeon: set correct CE ram size for CIK

Johannes Berg <johannes.berg@intel.com>
    mac80211: fix use-after-free in defragmentation

Luciano Coelho <luciano.coelho@intel.com>
    mac80211: schedule the actual switch of the station before CSA count 0

Luciano Coelho <luciano.coelho@intel.com>
    mac80211: use secondary channel offset IE also beacons during CSA

Johannes Berg <johannes@sipsolutions.net>
    mac80211: properly flush delayed scan work on interface removal

Junjie Mao <eternal.n08@gmail.com>
    mac80211_hwsim: release driver when ieee80211_register_hw fails

Herbert Xu <herbert@gondor.apana.org.au>
    macvtap: Fix csum_start when VLAN tags are present

Ilya Dryomov <idryomov@redhat.com>
    libceph: do not crash on large auth tickets

Max Filippov <jcmvbkbc@gmail.com>
    xtensa: re-wire umount syscall to sys_oldumount

Takashi Iwai <tiwai@suse.de>
    ALSA: usb-audio: Fix memory leak in FTU quirk

Tejun Heo <tj@kernel.org>
    ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks

James Ralston <james.d.ralston@intel.com>
    ahci: Add Device IDs for Intel Sunrise Point PCH

Miklos Szeredi <mszeredi@suse.cz>
    audit: keep inode pinned

Richard Guy Briggs <rgb@redhat.com>
    audit: AUDIT_FEATURE_CHANGE message format missing delimiting space

Richard Guy Briggs <rgb@redhat.com>
    audit: correct AUDIT_GET_FEATURE return message type

Andy Lutomirski <luto@amacapital.net>
    x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit

Herbert Xu <herbert@gondor.apana.org.au>
    tun: Fix csum_start with VLAN acceleration

Greg Kurz <gkurz@linux.vnet.ibm.com>
    hwrng: pseries - port to new read API and fix stack corruption

Cristian Stoica <cristian.stoica@freescale.com>
    crypto: caam - remove duplicated sg copy functions

Cristian Stoica <cristian.stoica@freescale.com>
    crypto: caam - fix missing dma unmap on error path

Weijie Yang <weijie.yang@samsung.com>
    zram: avoid kunmap_atomic() of a NULL pointer

Andreas Larsson <andreas@gaisler.com>
    sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks

David S. Miller <davem@davemloft.net>
    sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().

David S. Miller <davem@davemloft.net>
    sparc64: Fix crashes in schizo_pcierr_intr_other().

Dwight Engen <dwight.engen@oracle.com>
    sunvdc: don't call VD_OP_GET_VTOC

Dwight Engen <dwight.engen@oracle.com>
    vio: fix reuse of vio_dring slot

Dwight Engen <dwight.engen@oracle.com>
    sunvdc: limit each sg segment to a page

Allen Pais <allen.pais@oracle.com>
    sunvdc: compute vdisk geometry from capacity

Allen Pais <allen.pais@oracle.com>
    sunvdc: add cdrom and v1.1 protocol support

Enric Balletbo i Serra <eballetbo@iseebcn.com>
    smsc911x: power-up phydev before doing a software reset.

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix memory leak in auth key management

Daniel Borkmann <dborkman@redhat.com>
    net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet

Marcelo Leitner <mleitner@redhat.com>
    vxlan: Do not reuse sockets for a different address family

Steffen Klassert <steffen.klassert@secunet.com>
    gre6: Move the setting of dev->iflink into the ndo_init functions.

Steffen Klassert <steffen.klassert@secunet.com>
    sit: Use ipip6_tunnel_init as the ndo_init function.

Steffen Klassert <steffen.klassert@secunet.com>
    vti6: Use vti6_dev_init as the ndo_init function.

Steffen Klassert <steffen.klassert@secunet.com>
    ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Revert "drivers/net: Disable UFO through virtio"


-------------

Diffstat:

 .../devicetree/bindings/ata/sata_rcar.txt          |   3 +-
 Makefile                                           |   4 +-
 arch/arm/boot/compressed/head.S                    |  20 +-
 arch/arm/kernel/kprobes-common.c                   |  19 +-
 arch/arm/kernel/kprobes-thumb.c                    |  21 +-
 arch/arm/kernel/kprobes.c                          |   9 +-
 arch/arm/mm/Kconfig                                |   1 +
 arch/arm64/kernel/insn.c                           |   5 +-
 arch/arm64/lib/clear_user.S                        |   2 +-
 arch/parisc/include/uapi/asm/shmbuf.h              |  25 +-
 arch/parisc/kernel/syscall_table.S                 |   8 +-
 arch/sparc/include/asm/atomic_32.h                 |   2 +-
 arch/sparc/include/asm/cmpxchg_32.h                |  12 +-
 arch/sparc/include/asm/vio.h                       |  14 +-
 arch/sparc/kernel/pci_schizo.c                     |   6 +-
 arch/sparc/kernel/smp_64.c                         |   4 +
 arch/sparc/lib/atomic32.c                          |  27 ++
 arch/x86/boot/compressed/Makefile                  |   4 +-
 arch/x86/boot/compressed/head_32.S                 |   5 +-
 arch/x86/boot/compressed/head_64.S                 |   5 +-
 arch/x86/boot/compressed/misc.c                    |  13 +-
 arch/x86/boot/compressed/mkpiggy.c                 |   9 +-
 arch/x86/kernel/cpu/microcode/amd_early.c          |  33 +-
 arch/x86/kernel/cpu/perf_event_intel.c             |   3 +
 arch/x86/kernel/ptrace.c                           |  11 +-
 arch/x86/kvm/x86.c                                 |   2 +-
 arch/x86/mm/pgtable.c                              |  21 +-
 arch/x86/tools/calc_run_size.pl                    |  30 ++
 arch/xtensa/include/uapi/asm/unistd.h              |   3 +-
 drivers/ata/ahci.c                                 |  19 +-
 drivers/ata/sata_rcar.c                            |  10 +
 drivers/base/regmap/regmap.c                       |   6 +-
 drivers/block/sunvdc.c                             | 176 ++++++++---
 drivers/block/zram/zram_drv.c                      |   3 +-
 drivers/char/hw_random/pseries-rng.c               |  11 +-
 drivers/crypto/caam/caamhash.c                     |  22 +-
 drivers/crypto/caam/key_gen.c                      |  29 +-
 drivers/crypto/caam/sg_sw_sec4.h                   |  54 ----
 drivers/firewire/core-cdev.c                       |   3 +-
 drivers/gpu/drm/radeon/cik.c                       |   7 +-
 drivers/gpu/drm/radeon/evergreen.c                 |   4 +
 drivers/gpu/drm/radeon/r100.c                      |   3 +
 drivers/gpu/drm/radeon/rs600.c                     |   3 +
 drivers/gpu/drm/radeon/rs690.c                     |   3 +
 drivers/gpu/drm/radeon/rv515.c                     |   3 +
 drivers/gpu/drm/radeon/si.c                        |   3 +
 drivers/infiniband/core/uverbs_cmd.c               |   2 +
 drivers/input/mouse/alps.c                         |  28 +-
 drivers/input/mouse/synaptics.c                    |   5 +-
 drivers/md/dm-bufio.c                              |  12 +-
 drivers/md/dm-raid.c                               |  11 +-
 drivers/md/dm-thin.c                               |  16 +-
 drivers/md/md.c                                    |   4 +
 drivers/md/persistent-data/dm-btree-internal.h     |   6 +
 drivers/md/persistent-data/dm-btree-spine.c        |   2 +-
 drivers/md/persistent-data/dm-btree.c              |  24 +-
 drivers/media/usb/ttusb-dec/ttusbdecfe.c           |   3 +
 drivers/net/ethernet/smsc/smsc911x.c               |  46 +++
 drivers/net/ethernet/sun/sunvnet.c                 |   4 +-
 drivers/net/macvtap.c                              |  15 +-
 drivers/net/tun.c                                  |  35 +--
 drivers/net/virtio_net.c                           |  24 +-
 drivers/net/vxlan.c                                |  28 +-
 drivers/net/wireless/iwlwifi/iwl-trans.h           |   2 +
 drivers/net/wireless/iwlwifi/mvm/fw-api-power.h    |  35 ++-
 drivers/net/wireless/iwlwifi/mvm/fw-api.h          |   1 +
 drivers/net/wireless/iwlwifi/mvm/fw.c              |   9 +
 drivers/net/wireless/iwlwifi/mvm/ops.c             |   1 +
 drivers/net/wireless/iwlwifi/pcie/trans.c          |  16 +-
 drivers/net/wireless/mac80211_hwsim.c              |   4 +-
 drivers/platform/x86/dell-wmi.c                    |  12 +-
 drivers/power/bq2415x_charger.c                    |  23 +-
 drivers/power/charger-manager.c                    | 163 ++++++----
 drivers/scsi/scsi_error.c                          |   4 +-
 fs/btrfs/compression.c                             |   2 +-
 fs/btrfs/file.c                                    |   5 -
 fs/cramfs/inode.c                                  |   3 +-
 fs/fuse/file.c                                     |   2 -
 fs/gfs2/meta_io.c                                  |   5 +
 fs/gfs2/meta_io.h                                  |   3 +
 fs/gfs2/ops_fstype.c                               |   2 +-
 fs/ioprio.c                                        |  14 +-
 fs/jffs2/fs.c                                      |   2 +-
 fs/nfs/blocklayout/blocklayout.c                   |   2 +-
 fs/nfs/delegation.c                                |  25 +-
 fs/nfs/delegation.h                                |   1 +
 fs/nfs/direct.c                                    |   1 +
 fs/nfs/inode.c                                     |   2 +-
 fs/nfs/nfs4proc.c                                  |  68 ++--
 fs/super.c                                         |  16 +-
 include/dt-bindings/pinctrl/dra.h                  |   4 +-
 include/linux/clocksource.h                        |   2 +-
 include/linux/compaction.h                         |   4 +-
 include/linux/crash_dump.h                         |  15 +-
 include/linux/kgdb.h                               |   2 +-
 include/linux/memory.h                             |   2 +-
 include/linux/migrate.h                            |  11 +-
 include/linux/mm.h                                 |  11 +-
 include/linux/mmzone.h                             |   5 +-
 include/linux/nfs_xdr.h                            |  11 +
 include/linux/pagemap.h                            |  30 +-
 include/linux/pagevec.h                            |   5 +
 include/linux/power/charger-manager.h              |   3 -
 include/linux/radix-tree.h                         |   5 +-
 include/linux/shmem_fs.h                           |   1 +
 include/net/sctp/sctp.h                            |   5 +
 include/net/sctp/sm.h                              |   6 +-
 include/trace/events/compaction.h                  |  25 +-
 include/uapi/linux/netfilter/xt_bpf.h              |   2 +
 ipc/ipc_sysctl.c                                   |   3 +-
 kernel/audit.c                                     |   4 +-
 kernel/audit_tree.c                                |   1 +
 kernel/events/core.c                               |  23 +-
 kernel/rcu/tree.c                                  |  22 +-
 lib/radix-tree.c                                   | 106 ++-----
 mm/compaction.c                                    | 249 ++++++++-------
 mm/filemap.c                                       | 341 +++++++++++++++++----
 mm/internal.h                                      |  22 +-
 mm/madvise.c                                       |   2 +-
 mm/memory-failure.c                                |   4 +-
 mm/memory_hotplug.c                                |   2 +-
 mm/mempolicy.c                                     |   4 +-
 mm/migrate.c                                       |  56 +++-
 mm/mincore.c                                       |  20 +-
 mm/page_alloc.c                                    |  62 ++--
 mm/readahead.c                                     |  27 +-
 mm/shmem.c                                         | 122 ++------
 mm/swap.c                                          |  51 +++
 mm/truncate.c                                      |  74 ++++-
 mm/vmscan.c                                        |  36 ++-
 net/ceph/crypto.c                                  | 169 +++++++---
 net/ipv6/ip6_gre.c                                 |   5 +-
 net/ipv6/ip6_tunnel.c                              |  10 +-
 net/ipv6/ip6_vti.c                                 |  11 +-
 net/ipv6/sit.c                                     |  15 +-
 net/mac80211/ibss.c                                |   2 +-
 net/mac80211/ieee80211_i.h                         |   3 +-
 net/mac80211/iface.c                               |   7 +-
 net/mac80211/mesh.c                                |   2 +-
 net/mac80211/mlme.c                                |   5 +-
 net/mac80211/rx.c                                  |  14 +-
 net/mac80211/spectmgmt.c                           |  18 +-
 net/netfilter/ipset/ip_set_core.c                  |   2 +-
 net/netfilter/nfnetlink_log.c                      |  31 +-
 net/netfilter/nft_compat.c                         |   2 +-
 net/sctp/associola.c                               |   2 +
 net/sctp/auth.c                                    |   2 -
 net/sctp/inqueue.c                                 |  33 +-
 net/sctp/sm_make_chunk.c                           | 102 +++---
 net/sctp/sm_statefuns.c                            |  21 +-
 sound/usb/mixer_quirks.c                           |   6 +
 151 files changed, 2038 insertions(+), 1172 deletions(-)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 001/122] Revert "drivers/net: Disable UFO through virtio"
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 002/122] ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function Greg Kroah-Hartman
                   ` (116 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ben Hutchings, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

This reverts commit 2b52d6c6beda6308ba95024a1eba1dfc9515ba32 which was
commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 upstream.

Ben writes:

	Please drop this patch for 3.14 and 3.17.  It causes problems
	for migration of VMs and we're probably going to revert part of
	this.  The following patch ("drivers/net, ipv6: Select IPv6
	fragment idents for virtio UFO packets") might no longer apply,
	in which case you can drop that as well until we have this
	sorted out upstream.

Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/macvtap.c    |   13 ++++++++-----
 drivers/net/tun.c        |   19 ++++++++-----------
 drivers/net/virtio_net.c |   24 ++++++++++--------------
 3 files changed, 26 insertions(+), 30 deletions(-)

--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
 static const struct proto_ops macvtap_socket_ops;
 
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
-		      NETIF_F_TSO6)
+		      NETIF_F_TSO6 | NETIF_F_UFO)
 #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
 #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
 
@@ -570,8 +570,6 @@ static int macvtap_skb_from_vnet_hdr(str
 			gso_type = SKB_GSO_TCPV6;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-			pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
-				     current->comm);
 			gso_type = SKB_GSO_UDP;
 			if (skb->protocol == htons(ETH_P_IPV6))
 				ipv6_proxy_select_ident(skb);
@@ -619,6 +617,8 @@ static void macvtap_skb_to_vnet_hdr(cons
 			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (sinfo->gso_type & SKB_GSO_TCPV6)
 			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+		else if (sinfo->gso_type & SKB_GSO_UDP)
+			vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
 		else
 			BUG();
 		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -953,6 +953,9 @@ static int set_offload(struct macvtap_qu
 			if (arg & TUN_F_TSO6)
 				feature_mask |= NETIF_F_TSO6;
 		}
+
+		if (arg & TUN_F_UFO)
+			feature_mask |= NETIF_F_UFO;
 	}
 
 	/* tun/tap driver inverts the usage for TSO offloads, where
@@ -963,7 +966,7 @@ static int set_offload(struct macvtap_qu
 	 * When user space turns off TSO, we turn off GSO/LRO so that
 	 * user-space will not receive TSO frames.
 	 */
-	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
+	if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
 		features |= RX_OFFLOADS;
 	else
 		features &= ~RX_OFFLOADS;
@@ -1064,7 +1067,7 @@ static long macvtap_ioctl(struct file *f
 	case TUNSETOFFLOAD:
 		/* let the user check for future flags */
 		if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
-			    TUN_F_TSO_ECN))
+			    TUN_F_TSO_ECN | TUN_F_UFO))
 			return -EINVAL;
 
 		rtnl_lock();
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -175,7 +175,7 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6)
+			  NETIF_F_TSO6|NETIF_F_UFO)
 
 	int			vnet_hdr_sz;
 	int			sndbuf;
@@ -1153,20 +1153,10 @@ static ssize_t tun_get_user(struct tun_s
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-		{
-			static bool warned;
-
-			if (!warned) {
-				warned = true;
-				netdev_warn(tun->dev,
-					    "%s: using disabled UFO feature; please fix this program\n",
-					    current->comm);
-			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 			if (skb->protocol == htons(ETH_P_IPV6))
 				ipv6_proxy_select_ident(skb);
 			break;
-		}
 		default:
 			tun->dev->stats.rx_frame_errors++;
 			kfree_skb(skb);
@@ -1266,6 +1256,8 @@ static ssize_t tun_put_user(struct tun_s
 				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 			else if (sinfo->gso_type & SKB_GSO_TCPV6)
 				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+			else if (sinfo->gso_type & SKB_GSO_UDP)
+				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
 			else {
 				pr_err("unexpected GSO type: "
 				       "0x%x, gso_size %d, hdr_len %d\n",
@@ -1795,6 +1787,11 @@ static int set_offload(struct tun_struct
 				features |= NETIF_F_TSO6;
 			arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
 		}
+
+		if (arg & TUN_F_UFO) {
+			features |= NETIF_F_UFO;
+			arg &= ~TUN_F_UFO;
+		}
 	}
 
 	/* This gives the user a way to test for new features in future by
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -496,17 +496,8 @@ static void receive_buf(struct receive_q
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
-		{
-			static bool warned;
-
-			if (!warned) {
-				warned = true;
-				netdev_warn(dev,
-					    "host using disabled UFO feature; please fix it\n");
-			}
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 			break;
-		}
 		case VIRTIO_NET_HDR_GSO_TCPV6:
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
 			break;
@@ -845,6 +836,8 @@ static int xmit_skb(struct send_queue *s
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
 			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
+			hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
 		else
 			BUG();
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
@@ -1664,7 +1657,7 @@ static int virtnet_probe(struct virtio_d
 			dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
 
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
-			dev->hw_features |= NETIF_F_TSO
+			dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
 				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
 		}
 		/* Individual feature bits: what can host handle? */
@@ -1674,9 +1667,11 @@ static int virtnet_probe(struct virtio_d
 			dev->hw_features |= NETIF_F_TSO6;
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
 			dev->hw_features |= NETIF_F_TSO_ECN;
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
+			dev->hw_features |= NETIF_F_UFO;
 
 		if (gso)
-			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
+			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
@@ -1716,7 +1711,8 @@ static int virtnet_probe(struct virtio_d
 	/* If we can receive ANY GSO packets, we must allocate large ones. */
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
-	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
 		vi->big_packets = true;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
@@ -1907,9 +1903,9 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
 	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
 	VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
-	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
+	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
 	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
-	VIRTIO_NET_F_GUEST_ECN,
+	VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
 	VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 002/122] ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 001/122] Revert "drivers/net: Disable UFO through virtio" Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 003/122] vti6: Use vti6_dev_init " Greg Kroah-Hartman
                   ` (115 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit 6c6151daaf2d8dc2046d9926539feed5f66bf74e ]

ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_tunnel.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -272,9 +272,6 @@ static int ip6_tnl_create2(struct net_de
 	int err;
 
 	t = netdev_priv(dev);
-	err = ip6_tnl_dev_init(dev);
-	if (err < 0)
-		goto out;
 
 	err = register_netdevice(dev);
 	if (err < 0)
@@ -1456,6 +1453,7 @@ ip6_tnl_change_mtu(struct net_device *de
 
 
 static const struct net_device_ops ip6_tnl_netdev_ops = {
+	.ndo_init	= ip6_tnl_dev_init,
 	.ndo_uninit	= ip6_tnl_dev_uninit,
 	.ndo_start_xmit = ip6_tnl_xmit,
 	.ndo_do_ioctl	= ip6_tnl_ioctl,
@@ -1547,16 +1545,10 @@ static int __net_init ip6_fb_tnl_dev_ini
 	struct ip6_tnl *t = netdev_priv(dev);
 	struct net *net = dev_net(dev);
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
-	int err = ip6_tnl_dev_init_gen(dev);
-
-	if (err)
-		return err;
 
 	t->parms.proto = IPPROTO_IPV6;
 	dev_hold(dev);
 
-	ip6_tnl_link_config(t);
-
 	rcu_assign_pointer(ip6n->tnls_wc[0], t);
 	return 0;
 }



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 003/122] vti6: Use vti6_dev_init as the ndo_init function.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 001/122] Revert "drivers/net: Disable UFO through virtio" Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 002/122] ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 004/122] sit: Use ipip6_tunnel_init " Greg Kroah-Hartman
                   ` (114 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit 16a0231bf7dc3fb37e9b1f1cb1a277dc220b5c5e ]

vti6_dev_init() sets the dev->iflink via a call to
vti6_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for vti6 tunnels. Fix this by using vti6_dev_init() as the
ndo_init function. Then vti6_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_vti.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -172,10 +172,6 @@ static int vti6_tnl_create2(struct net_d
 	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
 	int err;
 
-	err = vti6_dev_init(dev);
-	if (err < 0)
-		goto out;
-
 	err = register_netdevice(dev);
 	if (err < 0)
 		goto out;
@@ -693,6 +689,7 @@ static int vti6_change_mtu(struct net_de
 }
 
 static const struct net_device_ops vti6_netdev_ops = {
+	.ndo_init	= vti6_dev_init,
 	.ndo_uninit	= vti6_dev_uninit,
 	.ndo_start_xmit = vti6_tnl_xmit,
 	.ndo_do_ioctl	= vti6_ioctl,
@@ -772,16 +769,10 @@ static int __net_init vti6_fb_tnl_dev_in
 	struct ip6_tnl *t = netdev_priv(dev);
 	struct net *net = dev_net(dev);
 	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
-	int err = vti6_dev_init_gen(dev);
-
-	if (err)
-		return err;
 
 	t->parms.proto = IPPROTO_IPV6;
 	dev_hold(dev);
 
-	vti6_link_config(t);
-
 	rcu_assign_pointer(ip6n->tnls_wc[0], t);
 	return 0;
 }



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 004/122] sit: Use ipip6_tunnel_init as the ndo_init function.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 003/122] vti6: Use vti6_dev_init " Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 005/122] gre6: Move the setting of dev->iflink into the ndo_init functions Greg Kroah-Hartman
                   ` (113 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit ebe084aafb7e93adf210e80043c9f69adf56820d ]

ipip6_tunnel_init() sets the dev->iflink via a call to
ipip6_tunnel_bind_dev(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
ndo_init function. Then ipip6_tunnel_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/sit.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -195,10 +195,8 @@ static int ipip6_tunnel_create(struct ne
 	struct sit_net *sitn = net_generic(net, sit_net_id);
 	int err;
 
-	err = ipip6_tunnel_init(dev);
-	if (err < 0)
-		goto out;
-	ipip6_tunnel_clone_6rd(dev, sitn);
+	memcpy(dev->dev_addr, &t->parms.iph.saddr, 4);
+	memcpy(dev->broadcast, &t->parms.iph.daddr, 4);
 
 	if ((__force u16)t->parms.i_flags & SIT_ISATAP)
 		dev->priv_flags |= IFF_ISATAP;
@@ -207,7 +205,8 @@ static int ipip6_tunnel_create(struct ne
 	if (err < 0)
 		goto out;
 
-	strcpy(t->parms.name, dev->name);
+	ipip6_tunnel_clone_6rd(dev, sitn);
+
 	dev->rtnl_link_ops = &sit_link_ops;
 
 	dev_hold(dev);
@@ -1321,6 +1320,7 @@ static int ipip6_tunnel_change_mtu(struc
 }
 
 static const struct net_device_ops ipip6_netdev_ops = {
+	.ndo_init	= ipip6_tunnel_init,
 	.ndo_uninit	= ipip6_tunnel_uninit,
 	.ndo_start_xmit	= sit_tunnel_xmit,
 	.ndo_do_ioctl	= ipip6_tunnel_ioctl,
@@ -1367,9 +1367,7 @@ static int ipip6_tunnel_init(struct net_
 
 	tunnel->dev = dev;
 	tunnel->net = dev_net(dev);
-
-	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
-	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
+	strcpy(tunnel->parms.name, dev->name);
 
 	ipip6_tunnel_bind_dev(dev);
 	dev->tstats = alloc_percpu(struct pcpu_sw_netstats);
@@ -1401,7 +1399,6 @@ static int __net_init ipip6_fb_tunnel_in
 
 	tunnel->dev = dev;
 	tunnel->net = dev_net(dev);
-	strcpy(tunnel->parms.name, dev->name);
 
 	iph->version		= 4;
 	iph->protocol		= IPPROTO_IPV6;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 005/122] gre6: Move the setting of dev->iflink into the ndo_init functions.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 004/122] sit: Use ipip6_tunnel_init " Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 006/122] vxlan: Do not reuse sockets for a different address family Greg Kroah-Hartman
                   ` (112 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Steffen Klassert, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steffen Klassert <steffen.klassert@secunet.com>

[ Upstream commit f03eb128e3f4276f46442d14f3b8f864f3775821 ]

Otherwise it gets overwritten by register_netdev().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv6/ip6_gre.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -962,8 +962,6 @@ static void ip6gre_tnl_link_config(struc
 	else
 		dev->flags &= ~IFF_POINTOPOINT;
 
-	dev->iflink = p->link;
-
 	/* Precalculate GRE options length */
 	if (t->parms.o_flags&(GRE_CSUM|GRE_KEY|GRE_SEQ)) {
 		if (t->parms.o_flags&GRE_CSUM)
@@ -1273,6 +1271,7 @@ static int ip6gre_tunnel_init(struct net
 		u64_stats_init(&ip6gre_tunnel_stats->syncp);
 	}
 
+	dev->iflink = tunnel->parms.link;
 
 	return 0;
 }
@@ -1474,6 +1473,8 @@ static int ip6gre_tap_init(struct net_de
 		u64_stats_init(&ip6gre_tap_stats->syncp);
 	}
 
+	dev->iflink = tunnel->parms.link;
+
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 006/122] vxlan: Do not reuse sockets for a different address family
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 005/122] gre6: Move the setting of dev->iflink into the ndo_init functions Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 007/122] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Greg Kroah-Hartman
                   ` (111 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jean-Tsung Hsiao,
	Marcelo Ricardo Leitner, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Marcelo Leitner <mleitner@redhat.com>

[ Upstream commit 19ca9fc1445b76b60d34148f7ff837b055f5dcf3 ]

Currently, we only match against local port number in order to reuse
socket. But if this new vxlan wants an IPv6 socket and a IPv4 one bound
to that port, vxlan will reuse an IPv4 socket as IPv6 and a panic will
follow. The following steps reproduce it:

   # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
       srcport 5000 6000 dev eth0
   # ip link add vxlan7 type vxlan id 43 group ff0e::110 \
       srcport 5000 6000 dev eth0
   # ip link set vxlan6 up
   # ip link set vxlan7 up
   <panic>

[    4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
...
[    4.188076] Call Trace:
[    4.188085]  [<ffffffff81667c4a>] ? ipv6_sock_mc_join+0x3a/0x630
[    4.188098]  [<ffffffffa05a6ad6>] vxlan_igmp_join+0x66/0xd0 [vxlan]
[    4.188113]  [<ffffffff810a3430>] process_one_work+0x220/0x710
[    4.188125]  [<ffffffff810a33c4>] ? process_one_work+0x1b4/0x710
[    4.188138]  [<ffffffff810a3a3b>] worker_thread+0x11b/0x3a0
[    4.188149]  [<ffffffff810a3920>] ? process_one_work+0x710/0x710

So address family must also match in order to reuse a socket.

Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/vxlan.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -279,13 +279,15 @@ static inline struct vxlan_rdst *first_r
 	return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
 }
 
-/* Find VXLAN socket based on network namespace and UDP port */
-static struct vxlan_sock *vxlan_find_sock(struct net *net, __be16 port)
+/* Find VXLAN socket based on network namespace, address family and UDP port */
+static struct vxlan_sock *vxlan_find_sock(struct net *net,
+					  sa_family_t family, __be16 port)
 {
 	struct vxlan_sock *vs;
 
 	hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
-		if (inet_sk(vs->sock->sk)->inet_sport == port)
+		if (inet_sk(vs->sock->sk)->inet_sport == port &&
+		    inet_sk(vs->sock->sk)->sk.sk_family == family)
 			return vs;
 	}
 	return NULL;
@@ -304,11 +306,12 @@ static struct vxlan_dev *vxlan_vs_find_v
 }
 
 /* Look up VNI in a per net namespace table */
-static struct vxlan_dev *vxlan_find_vni(struct net *net, u32 id, __be16 port)
+static struct vxlan_dev *vxlan_find_vni(struct net *net, u32 id,
+					sa_family_t family, __be16 port)
 {
 	struct vxlan_sock *vs;
 
-	vs = vxlan_find_sock(net, port);
+	vs = vxlan_find_sock(net, family, port);
 	if (!vs)
 		return NULL;
 
@@ -1872,7 +1875,8 @@ static void vxlan_xmit_one(struct sk_buf
 			struct vxlan_dev *dst_vxlan;
 
 			ip_rt_put(rt);
-			dst_vxlan = vxlan_find_vni(dev_net(dev), vni, dst_port);
+			dst_vxlan = vxlan_find_vni(dev_net(dev), vni,
+						   dst->sa.sa_family, dst_port);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -1925,7 +1929,8 @@ static void vxlan_xmit_one(struct sk_buf
 			struct vxlan_dev *dst_vxlan;
 
 			dst_release(ndst);
-			dst_vxlan = vxlan_find_vni(dev_net(dev), vni, dst_port);
+			dst_vxlan = vxlan_find_vni(dev_net(dev), vni,
+						   dst->sa.sa_family, dst_port);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -2083,6 +2088,7 @@ static int vxlan_init(struct net_device
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+	bool ipv6 = vxlan->flags & VXLAN_F_IPV6;
 	struct vxlan_sock *vs;
 	int i;
 
@@ -2098,7 +2104,8 @@ static int vxlan_init(struct net_device
 
 
 	spin_lock(&vn->sock_lock);
-	vs = vxlan_find_sock(dev_net(dev), vxlan->dst_port);
+	vs = vxlan_find_sock(dev_net(dev), ipv6 ? AF_INET6 : AF_INET,
+			     vxlan->dst_port);
 	if (vs) {
 		/* If we have a socket with same port already, reuse it */
 		atomic_inc(&vs->refcnt);
@@ -2566,7 +2573,7 @@ struct vxlan_sock *vxlan_sock_add(struct
 		return vs;
 
 	spin_lock(&vn->sock_lock);
-	vs = vxlan_find_sock(net, port);
+	vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port);
 	if (vs) {
 		if (vs->rcv == rcv)
 			atomic_inc(&vs->refcnt);
@@ -2712,7 +2719,8 @@ static int vxlan_newlink(struct net *net
 	if (data[IFLA_VXLAN_PORT])
 		vxlan->dst_port = nla_get_be16(data[IFLA_VXLAN_PORT]);
 
-	if (vxlan_find_vni(net, vni, vxlan->dst_port)) {
+	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
+			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
 		return -EEXIST;
 	}



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 007/122] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 006/122] vxlan: Do not reuse sockets for a different address family Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 008/122] net: sctp: fix memory leak in auth key management Greg Kroah-Hartman
                   ` (110 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit e40607cbe270a9e8360907cb1e62ddf0736e4864 ]

An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/sm_make_chunk.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2609,6 +2609,9 @@ do_addr_param:
 		addr_param = param.v + sizeof(sctp_addip_param_t);
 
 		af = sctp_get_af_specific(param_type2af(param.p->type));
+		if (af == NULL)
+			break;
+
 		af->from_addr_param(&addr, addr_param,
 				    htons(asoc->peer.port), 0);
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 008/122] net: sctp: fix memory leak in auth key management
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 007/122] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:50 ` [PATCH 3.14 009/122] smsc911x: power-up phydev before doing a software reset Greg Kroah-Hartman
                   ` (109 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

[ Upstream commit 4184b2a79a7612a9272ce20d639934584a1f3786 ]

A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.

Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/auth.c |    2 --
 1 file changed, 2 deletions(-)

--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -862,8 +862,6 @@ int sctp_auth_set_key(struct sctp_endpoi
 		list_add(&cur_key->key_list, sh_keys);
 
 	cur_key->key = key;
-	sctp_auth_key_hold(key);
-
 	return 0;
 nomem:
 	if (!replace)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 009/122] smsc911x: power-up phydev before doing a software reset.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 008/122] net: sctp: fix memory leak in auth key management Greg Kroah-Hartman
@ 2014-11-19 20:50 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 010/122] sunvdc: add cdrom and v1.1 protocol support Greg Kroah-Hartman
                   ` (108 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Enric Balletbo i Serra, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Enric Balletbo i Serra <eballetbo@iseebcn.com>

[ Upstream commit ccf899a27c08038db91765ff12bb0380dcd85887 ]

With commit be9dad1f9f26604fb ("net: phy: suspend phydev when going
to HALTED"), the PHY device will be put in a low-power mode using
BMCR_PDOWN if the the interface is set down. The smsc911x driver does
a software_reset opening the device driver (ndo_open). In such case,
the PHY must be powered-up before access to any register and before
calling the software_reset function. Otherwise, as the PHY is powered
down the software reset fails and the interface can not be enabled
again.

This patch fixes this scenario that is easy to reproduce setting down
the network interface and setting up again.

    $ ifconfig eth0 down
    $ ifconfig eth0 up
    ifconfig: SIOCSIFFLAGS: Input/output error

Signed-off-by: Enric Balletbo i Serra <eballetbo@iseebcn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/smsc/smsc911x.c |   46 +++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1341,6 +1341,42 @@ static void smsc911x_rx_multicast_update
 	spin_unlock(&pdata->mac_lock);
 }
 
+static int smsc911x_phy_general_power_up(struct smsc911x_data *pdata)
+{
+	int rc = 0;
+
+	if (!pdata->phy_dev)
+		return rc;
+
+	/* If the internal PHY is in General Power-Down mode, all, except the
+	 * management interface, is powered-down and stays in that condition as
+	 * long as Phy register bit 0.11 is HIGH.
+	 *
+	 * In that case, clear the bit 0.11, so the PHY powers up and we can
+	 * access to the phy registers.
+	 */
+	rc = phy_read(pdata->phy_dev, MII_BMCR);
+	if (rc < 0) {
+		SMSC_WARN(pdata, drv, "Failed reading PHY control reg");
+		return rc;
+	}
+
+	/* If the PHY general power-down bit is not set is not necessary to
+	 * disable the general power down-mode.
+	 */
+	if (rc & BMCR_PDOWN) {
+		rc = phy_write(pdata->phy_dev, MII_BMCR, rc & ~BMCR_PDOWN);
+		if (rc < 0) {
+			SMSC_WARN(pdata, drv, "Failed writing PHY control reg");
+			return rc;
+		}
+
+		usleep_range(1000, 1500);
+	}
+
+	return 0;
+}
+
 static int smsc911x_phy_disable_energy_detect(struct smsc911x_data *pdata)
 {
 	int rc = 0;
@@ -1414,6 +1450,16 @@ static int smsc911x_soft_reset(struct sm
 	int ret;
 
 	/*
+	 * Make sure to power-up the PHY chip before doing a reset, otherwise
+	 * the reset fails.
+	 */
+	ret = smsc911x_phy_general_power_up(pdata);
+	if (ret) {
+		SMSC_WARN(pdata, drv, "Failed to power-up the PHY chip");
+		return ret;
+	}
+
+	/*
 	 * LAN9210/LAN9211/LAN9220/LAN9221 chips have an internal PHY that
 	 * are initialized in a Energy Detect Power-Down mode that prevents
 	 * the MAC chip to be software reseted. So we have to wakeup the PHY



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 010/122] sunvdc: add cdrom and v1.1 protocol support
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2014-11-19 20:50 ` [PATCH 3.14 009/122] smsc911x: power-up phydev before doing a software reset Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 011/122] sunvdc: compute vdisk geometry from capacity Greg Kroah-Hartman
                   ` (107 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Allen Pais <allen.pais@oracle.com>

[ Upstream commit 9bce21828d54a95143f1b74619705c2dd8e88b92 ]

Interpret the media type from v1.1 protocol to support CDROM/DVD.

For v1.0 protocol, a disk's size continues to be calculated from the
geometry returned by the vdisk server. The geometry returned by the server
can be less than the actual number of sectors available in the backing
image/device due to the rounding in the division used to compute the
geometry in the vdisk server.

In v1.1 protocol a disk's actual size in sectors is returned during the
handshake. Use this size when v1.1 protocol is negotiated. Since this size
will always be larger than the former geometry computed size, disks created
under v1.0 will be forwards compatible to v1.1, but not vice versa.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/vio.h |   12 +++-
 drivers/block/sunvdc.c       |  109 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 101 insertions(+), 20 deletions(-)

--- a/arch/sparc/include/asm/vio.h
+++ b/arch/sparc/include/asm/vio.h
@@ -118,12 +118,18 @@ struct vio_disk_attr_info {
 	u8			vdisk_type;
 #define VD_DISK_TYPE_SLICE	0x01 /* Slice in block device	*/
 #define VD_DISK_TYPE_DISK	0x02 /* Entire block device	*/
-	u16			resv1;
+	u8			vdisk_mtype;		/* v1.1 */
+#define VD_MEDIA_TYPE_FIXED	0x01 /* Fixed device */
+#define VD_MEDIA_TYPE_CD	0x02 /* CD Device    */
+#define VD_MEDIA_TYPE_DVD	0x03 /* DVD Device   */
+	u8			resv1;
 	u32			vdisk_block_size;
 	u64			operations;
-	u64			vdisk_size;
+	u64			vdisk_size;		/* v1.1 */
 	u64			max_xfer_size;
-	u64			resv2[2];
+	u32			phys_block_size;	/* v1.2 */
+	u32			resv2;
+	u64			resv3[1];
 };
 
 struct vio_disk_desc {
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -9,6 +9,7 @@
 #include <linux/blkdev.h>
 #include <linux/hdreg.h>
 #include <linux/genhd.h>
+#include <linux/cdrom.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/completion.h>
@@ -22,8 +23,8 @@
 
 #define DRV_MODULE_NAME		"sunvdc"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.0"
-#define DRV_MODULE_RELDATE	"June 25, 2007"
+#define DRV_MODULE_VERSION	"1.1"
+#define DRV_MODULE_RELDATE	"February 13, 2013"
 
 static char version[] =
 	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
@@ -65,6 +66,7 @@ struct vdc_port {
 	u64			operations;
 	u32			vdisk_size;
 	u8			vdisk_type;
+	u8			vdisk_mtype;
 
 	char			disk_name[32];
 
@@ -79,9 +81,16 @@ static inline struct vdc_port *to_vdc_po
 
 /* Ordered from largest major to lowest */
 static struct vio_version vdc_versions[] = {
+	{ .major = 1, .minor = 1 },
 	{ .major = 1, .minor = 0 },
 };
 
+static inline int vdc_version_supported(struct vdc_port *port,
+					u16 major, u16 minor)
+{
+	return port->vio.ver.major == major && port->vio.ver.minor >= minor;
+}
+
 #define VDCBLK_NAME	"vdisk"
 static int vdc_major;
 #define PARTITION_SHIFT	3
@@ -103,9 +112,41 @@ static int vdc_getgeo(struct block_devic
 	return 0;
 }
 
+/* Add ioctl/CDROM_GET_CAPABILITY to support cdrom_id in udev
+ * when vdisk_mtype is VD_MEDIA_TYPE_CD or VD_MEDIA_TYPE_DVD.
+ * Needed to be able to install inside an ldom from an iso image.
+ */
+static int vdc_ioctl(struct block_device *bdev, fmode_t mode,
+		     unsigned command, unsigned long argument)
+{
+	int i;
+	struct gendisk *disk;
+
+	switch (command) {
+	case CDROMMULTISESSION:
+		pr_debug(PFX "Multisession CDs not supported\n");
+		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
+			if (put_user(0, (char __user *)(argument + i)))
+				return -EFAULT;
+		return 0;
+
+	case CDROM_GET_CAPABILITY:
+		disk = bdev->bd_disk;
+
+		if (bdev->bd_disk && (disk->flags & GENHD_FL_CD))
+			return 0;
+		return -EINVAL;
+
+	default:
+		pr_debug(PFX "ioctl %08x not supported\n", command);
+		return -EINVAL;
+	}
+}
+
 static const struct block_device_operations vdc_fops = {
 	.owner		= THIS_MODULE,
 	.getgeo		= vdc_getgeo,
+	.ioctl		= vdc_ioctl,
 };
 
 static void vdc_finish(struct vio_driver_state *vio, int err, int waiting_for)
@@ -165,9 +206,9 @@ static int vdc_handle_attr(struct vio_dr
 	struct vio_disk_attr_info *pkt = arg;
 
 	viodbg(HS, "GOT ATTR stype[0x%x] ops[%llx] disk_size[%llu] disk_type[%x] "
-	       "xfer_mode[0x%x] blksz[%u] max_xfer[%llu]\n",
+	       "mtype[0x%x] xfer_mode[0x%x] blksz[%u] max_xfer[%llu]\n",
 	       pkt->tag.stype, pkt->operations,
-	       pkt->vdisk_size, pkt->vdisk_type,
+	       pkt->vdisk_size, pkt->vdisk_type, pkt->vdisk_mtype,
 	       pkt->xfer_mode, pkt->vdisk_block_size,
 	       pkt->max_xfer_size);
 
@@ -192,8 +233,11 @@ static int vdc_handle_attr(struct vio_dr
 		}
 
 		port->operations = pkt->operations;
-		port->vdisk_size = pkt->vdisk_size;
 		port->vdisk_type = pkt->vdisk_type;
+		if (vdc_version_supported(port, 1, 1)) {
+			port->vdisk_size = pkt->vdisk_size;
+			port->vdisk_mtype = pkt->vdisk_mtype;
+		}
 		if (pkt->max_xfer_size < port->max_xfer_size)
 			port->max_xfer_size = pkt->max_xfer_size;
 		port->vdisk_block_size = pkt->vdisk_block_size;
@@ -663,18 +707,25 @@ static int probe_disk(struct vdc_port *p
 		return err;
 	}
 
-	err = generic_request(port, VD_OP_GET_DISKGEOM,
-			      &port->geom, sizeof(port->geom));
-	if (err < 0) {
-		printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
-		       "error %d\n", err);
-		return err;
+	if (vdc_version_supported(port, 1, 1)) {
+		/* vdisk_size should be set during the handshake, if it wasn't
+		 * then the underlying disk is reserved by another system
+		 */
+		if (port->vdisk_size == -1)
+			return -ENODEV;
+	} else {
+		err = generic_request(port, VD_OP_GET_DISKGEOM,
+				      &port->geom, sizeof(port->geom));
+		if (err < 0) {
+			printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
+			       "error %d\n", err);
+			return err;
+		}
+		port->vdisk_size = ((u64)port->geom.num_cyl *
+				    (u64)port->geom.num_hd *
+				    (u64)port->geom.num_sec);
 	}
 
-	port->vdisk_size = ((u64)port->geom.num_cyl *
-			    (u64)port->geom.num_hd *
-			    (u64)port->geom.num_sec);
-
 	q = blk_init_queue(do_vdc_request, &port->vio.lock);
 	if (!q) {
 		printk(KERN_ERR PFX "%s: Could not allocate queue.\n",
@@ -704,9 +755,32 @@ static int probe_disk(struct vdc_port *p
 
 	set_capacity(g, port->vdisk_size);
 
-	printk(KERN_INFO PFX "%s: %u sectors (%u MB)\n",
+	if (vdc_version_supported(port, 1, 1)) {
+		switch (port->vdisk_mtype) {
+		case VD_MEDIA_TYPE_CD:
+			pr_info(PFX "Virtual CDROM %s\n", port->disk_name);
+			g->flags |= GENHD_FL_CD;
+			g->flags |= GENHD_FL_REMOVABLE;
+			set_disk_ro(g, 1);
+			break;
+
+		case VD_MEDIA_TYPE_DVD:
+			pr_info(PFX "Virtual DVD %s\n", port->disk_name);
+			g->flags |= GENHD_FL_CD;
+			g->flags |= GENHD_FL_REMOVABLE;
+			set_disk_ro(g, 1);
+			break;
+
+		case VD_MEDIA_TYPE_FIXED:
+			pr_info(PFX "Virtual Hard disk %s\n", port->disk_name);
+			break;
+		}
+	}
+
+	pr_info(PFX "%s: %u sectors (%u MB) protocol %d.%d\n",
 	       g->disk_name,
-	       port->vdisk_size, (port->vdisk_size >> (20 - 9)));
+	       port->vdisk_size, (port->vdisk_size >> (20 - 9)),
+	       port->vio.ver.major, port->vio.ver.minor);
 
 	add_disk(g);
 
@@ -765,6 +839,7 @@ static int vdc_port_probe(struct vio_dev
 	else
 		snprintf(port->disk_name, sizeof(port->disk_name),
 			 VDCBLK_NAME "%c", 'a' + ((int)vdev->dev_no % 26));
+	port->vdisk_size = -1;
 
 	err = vio_driver_init(&port->vio, vdev, VDEV_DISK,
 			      vdc_versions, ARRAY_SIZE(vdc_versions),



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 011/122] sunvdc: compute vdisk geometry from capacity
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 010/122] sunvdc: add cdrom and v1.1 protocol support Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 012/122] sunvdc: limit each sg segment to a page Greg Kroah-Hartman
                   ` (106 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Allen Pais <allen.pais@oracle.com>

[ Upstream commit de5b73f08468b4fc5e2f6d1505f650262622f78b ]

The LDom diskserver doesn't return reliable geometry data. In addition,
the types for all fields in the vio_disk_geom are u16, which were being
truncated in the cast into the u8's of the Linux struct hd_geometry.

Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
manner consistent with xen-blkfront::blkif_getgeo().

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -70,7 +70,6 @@ struct vdc_port {
 
 	char			disk_name[32];
 
-	struct vio_disk_geom	geom;
 	struct vio_disk_vtoc	label;
 };
 
@@ -103,11 +102,15 @@ static inline u32 vdc_tx_dring_avail(str
 static int vdc_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 {
 	struct gendisk *disk = bdev->bd_disk;
-	struct vdc_port *port = disk->private_data;
+	sector_t nsect = get_capacity(disk);
+	sector_t cylinders = nsect;
 
-	geo->heads = (u8) port->geom.num_hd;
-	geo->sectors = (u8) port->geom.num_sec;
-	geo->cylinders = port->geom.num_cyl;
+	geo->heads = 0xff;
+	geo->sectors = 0x3f;
+	sector_div(cylinders, geo->heads * geo->sectors);
+	geo->cylinders = cylinders;
+	if ((sector_t)(geo->cylinders + 1) * geo->heads * geo->sectors < nsect)
+		geo->cylinders = 0xffff;
 
 	return 0;
 }
@@ -714,16 +717,18 @@ static int probe_disk(struct vdc_port *p
 		if (port->vdisk_size == -1)
 			return -ENODEV;
 	} else {
+		struct vio_disk_geom geom;
+
 		err = generic_request(port, VD_OP_GET_DISKGEOM,
-				      &port->geom, sizeof(port->geom));
+				      &geom, sizeof(geom));
 		if (err < 0) {
 			printk(KERN_ERR PFX "VD_OP_GET_DISKGEOM returns "
 			       "error %d\n", err);
 			return err;
 		}
-		port->vdisk_size = ((u64)port->geom.num_cyl *
-				    (u64)port->geom.num_hd *
-				    (u64)port->geom.num_sec);
+		port->vdisk_size = ((u64)geom.num_cyl *
+				    (u64)geom.num_hd *
+				    (u64)geom.num_sec);
 	}
 
 	q = blk_init_queue(do_vdc_request, &port->vio.lock);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 012/122] sunvdc: limit each sg segment to a page
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 011/122] sunvdc: compute vdisk geometry from capacity Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 013/122] vio: fix reuse of vio_dring slot Greg Kroah-Hartman
                   ` (105 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit 5eed69ffd248c9f68f56c710caf07db134aef28b ]

ldc_map_sg() could fail its check that the number of pages referred to
by the sg scatterlist was <= the number of cookies.

This fixes the issue by doing a similar thing to the xen-blkfront driver,
ensuring that the scatterlist will only ever contain a segment count <=
port->ring_cookies, and each segment will be page aligned, and <= page
size. This ensures that the scatterlist is always mappable.

Orabug: 19347817
OraBZ: 15945

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -747,6 +747,10 @@ static int probe_disk(struct vdc_port *p
 
 	port->disk = g;
 
+	/* Each segment in a request is up to an aligned page in size. */
+	blk_queue_segment_boundary(q, PAGE_SIZE - 1);
+	blk_queue_max_segment_size(q, PAGE_SIZE);
+
 	blk_queue_max_segments(q, port->ring_cookies);
 	blk_queue_max_hw_sectors(q, port->max_xfer_size);
 	g->major = vdc_major;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 013/122] vio: fix reuse of vio_dring slot
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 012/122] sunvdc: limit each sg segment to a page Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 014/122] sunvdc: dont call VD_OP_GET_VTOC Greg Kroah-Hartman
                   ` (104 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit d0aedcd4f14a22e23b313f42b7e6e6ebfc0fbc31 ]

vio_dring_avail() will allow use of every dring entry, but when the last
entry is allocated then dr->prod == dr->cons which is indistinguishable from
the ring empty condition. This causes the next allocation to reuse an entry.
When this happens in sunvdc, the server side vds driver begins nack'ing the
messages and ends up resetting the ldc channel. This problem does not effect
sunvnet since it checks for < 2.

The fix here is to just never allocate the very last dring slot so that full
and empty are not the same condition. The request start path was changed to
check for the ring being full a bit earlier, and to stop the blk_queue if
there is no space left. The blk_queue will be restarted once the ring is
only half full again. The number of ring entries was increased to 512 which
matches the sunvnet and Solaris vdc drivers, and greatly reduces the
frequency of hitting the ring full condition and the associated blk_queue
stop/starting. The checks in sunvent were adjusted to account for
vio_dring_avail() returning 1 less.

Orabug: 19441666
OraBZ: 14983

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/vio.h       |    2 -
 drivers/block/sunvdc.c             |   39 +++++++++++++++++++++----------------
 drivers/net/ethernet/sun/sunvnet.c |    4 +--
 3 files changed, 26 insertions(+), 19 deletions(-)

--- a/arch/sparc/include/asm/vio.h
+++ b/arch/sparc/include/asm/vio.h
@@ -265,7 +265,7 @@ static inline u32 vio_dring_avail(struct
 				  unsigned int ring_size)
 {
 	return (dr->pending -
-		((dr->prod - dr->cons) & (ring_size - 1)));
+		((dr->prod - dr->cons) & (ring_size - 1)) - 1);
 }
 
 #define VIO_MAX_TYPE_LEN	32
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -33,7 +33,7 @@ MODULE_DESCRIPTION("Sun LDOM virtual dis
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
-#define VDC_TX_RING_SIZE	256
+#define VDC_TX_RING_SIZE	512
 
 #define WAITING_FOR_LINK_UP	0x01
 #define WAITING_FOR_TX_SPACE	0x02
@@ -283,7 +283,9 @@ static void vdc_end_one(struct vdc_port
 
 	__blk_end_request(req, (desc->status ? -EIO : 0), desc->size);
 
-	if (blk_queue_stopped(port->disk->queue))
+	/* restart blk queue when ring is half emptied */
+	if (blk_queue_stopped(port->disk->queue) &&
+	    vdc_tx_dring_avail(dr) * 100 / VDC_TX_RING_SIZE >= 50)
 		blk_start_queue(port->disk->queue);
 }
 
@@ -435,12 +437,6 @@ static int __send_request(struct request
 	for (i = 0; i < nsg; i++)
 		len += sg[i].length;
 
-	if (unlikely(vdc_tx_dring_avail(dr) < 1)) {
-		blk_stop_queue(port->disk->queue);
-		err = -ENOMEM;
-		goto out;
-	}
-
 	desc = vio_dring_cur(dr);
 
 	err = ldc_map_sg(port->vio.lp, sg, nsg,
@@ -480,21 +476,32 @@ static int __send_request(struct request
 		port->req_id++;
 		dr->prod = (dr->prod + 1) & (VDC_TX_RING_SIZE - 1);
 	}
-out:
 
 	return err;
 }
 
-static void do_vdc_request(struct request_queue *q)
+static void do_vdc_request(struct request_queue *rq)
 {
-	while (1) {
-		struct request *req = blk_fetch_request(q);
+	struct request *req;
 
-		if (!req)
+	while ((req = blk_peek_request(rq)) != NULL) {
+		struct vdc_port *port;
+		struct vio_dring_state *dr;
+
+		port = req->rq_disk->private_data;
+		dr = &port->vio.drings[VIO_DRIVER_TX_RING];
+		if (unlikely(vdc_tx_dring_avail(dr) < 1))
+			goto wait;
+
+		blk_start_request(req);
+
+		if (__send_request(req) < 0) {
+			blk_requeue_request(rq, req);
+wait:
+			/* Avoid pointless unplugs. */
+			blk_stop_queue(rq);
 			break;
-
-		if (__send_request(req) < 0)
-			__blk_end_request_all(req, -EIO);
+		}
 	}
 }
 
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -656,7 +656,7 @@ static int vnet_start_xmit(struct sk_buf
 	spin_lock_irqsave(&port->vio.lock, flags);
 
 	dr = &port->vio.drings[VIO_DRIVER_TX_RING];
-	if (unlikely(vnet_tx_dring_avail(dr) < 2)) {
+	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
 		if (!netif_queue_stopped(dev)) {
 			netif_stop_queue(dev);
 
@@ -704,7 +704,7 @@ static int vnet_start_xmit(struct sk_buf
 	dev->stats.tx_bytes += skb->len;
 
 	dr->prod = (dr->prod + 1) & (VNET_TX_RING_SIZE - 1);
-	if (unlikely(vnet_tx_dring_avail(dr) < 2)) {
+	if (unlikely(vnet_tx_dring_avail(dr) < 1)) {
 		netif_stop_queue(dev);
 		if (vnet_tx_dring_avail(dr) > VNET_TX_WAKEUP_THRESH(dr))
 			netif_wake_queue(dev);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 014/122] sunvdc: dont call VD_OP_GET_VTOC
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 013/122] vio: fix reuse of vio_dring slot Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 015/122] sparc64: Fix crashes in schizo_pcierr_intr_other() Greg Kroah-Hartman
                   ` (103 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dwight Engen, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dwight Engen <dwight.engen@oracle.com>

[ Upstream commit 85b0c6e62c48bb9179fd5b3e954f362fb346cbd5 ]

The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.

Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/block/sunvdc.c |    9 ---------
 1 file changed, 9 deletions(-)

--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -69,8 +69,6 @@ struct vdc_port {
 	u8			vdisk_mtype;
 
 	char			disk_name[32];
-
-	struct vio_disk_vtoc	label;
 };
 
 static inline struct vdc_port *to_vdc_port(struct vio_driver_state *vio)
@@ -710,13 +708,6 @@ static int probe_disk(struct vdc_port *p
 	if (comp.err)
 		return comp.err;
 
-	err = generic_request(port, VD_OP_GET_VTOC,
-			      &port->label, sizeof(port->label));
-	if (err < 0) {
-		printk(KERN_ERR PFX "VD_OP_GET_VTOC returns error %d\n", err);
-		return err;
-	}
-
 	if (vdc_version_supported(port, 1, 1)) {
 		/* vdisk_size should be set during the handshake, if it wasn't
 		 * then the underlying disk is reserved by another system



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 015/122] sparc64: Fix crashes in schizo_pcierr_intr_other().
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 014/122] sunvdc: dont call VD_OP_GET_VTOC Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 016/122] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*() Greg Kroah-Hartman
                   ` (102 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Meelis Roos, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>

[ Upstream commit 7da89a2a3776442a57e918ca0b8678d1b16a7072 ]

Meelis Roos reports crashes during bootup on a V480 that look like
this:

====================
[   61.300577] PCI: Scanning PBM /pci@9,600000
[   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
[   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
[   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
[   61.331173] pci_bus 0003:00: root bus resource [bus 00]
[   61.385344] Unable to handle kernel NULL pointer dereference
[   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
[   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
[   61.401716]               \|/ ____ \|/
[   61.401716]               "@'/ .. \`@"
[   61.401716]               /_| \__/ |_\
[   61.401716]                  \__U_/
[   61.416362] swapper/0(0): Oops [#1]
[   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
[   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
[   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
[   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
[   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
[   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
[   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
[   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
[   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
[   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
[   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
[   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
[   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
[   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
[   61.529099] Call Trace:
[   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
[   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
[   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
[   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
[   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
====================

The problem is that pbm->pci_bus->self is NULL.

This code is trying to go through the standard PCI config space
interfaces to read the PCI controller's PCI_STATUS register.

This doesn't work, because we more often than not do not enumerate
the PCI controller as a bonafide PCI device during the OF device
node scan.  Therefore bus->self remains NULL.

Existing common code for PSYCHO and PSYCHO-like PCI controllers
handles this properly, by doing the config space access directly.

Do the same here, pbm->pci_ops->{read,write}().

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/pci_schizo.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/sparc/kernel/pci_schizo.c
+++ b/arch/sparc/kernel/pci_schizo.c
@@ -581,7 +581,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 {
 	unsigned long csr_reg, csr, csr_error_bits;
 	irqreturn_t ret = IRQ_NONE;
-	u16 stat;
+	u32 stat;
 
 	csr_reg = pbm->pbm_regs + SCHIZO_PCI_CTRL;
 	csr = upa_readq(csr_reg);
@@ -617,7 +617,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 			       pbm->name);
 		ret = IRQ_HANDLED;
 	}
-	pci_read_config_word(pbm->pci_bus->self, PCI_STATUS, &stat);
+	pbm->pci_ops->read(pbm->pci_bus, 0, PCI_STATUS, 2, &stat);
 	if (stat & (PCI_STATUS_PARITY |
 		    PCI_STATUS_SIG_TARGET_ABORT |
 		    PCI_STATUS_REC_TARGET_ABORT |
@@ -625,7 +625,7 @@ static irqreturn_t schizo_pcierr_intr_ot
 		    PCI_STATUS_SIG_SYSTEM_ERROR)) {
 		printk("%s: PCI bus error, PCI_STATUS[%04x]\n",
 		       pbm->name, stat);
-		pci_write_config_word(pbm->pci_bus->self, PCI_STATUS, 0xffff);
+		pbm->pci_ops->write(pbm->pci_bus, 0, PCI_STATUS, 2, 0xffff);
 		ret = IRQ_HANDLED;
 	}
 	return ret;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 016/122] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 015/122] sparc64: Fix crashes in schizo_pcierr_intr_other() Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 017/122] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks Greg Kroah-Hartman
                   ` (101 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Meelis Roos, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <davem@davemloft.net>

[ Upstream commit ab5c780913bca0a5763ca05dd5c2cb5cb08ccb26 ]

Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:

====================
[  188.275021] ===============================
[  188.309351] [ INFO: suspicious RCU usage. ]
[  188.343737] 3.18.0-rc3-00068-g20f3963-dirty #54 Not tainted
[  188.394786] -------------------------------
[  188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
illegally while idle!
[  188.505235]
other info that might help us debug this:

[  188.554230]
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
[  188.637587] RCU used illegally from extended quiescent state!
[  188.690684] 3 locks held by swapper/7/0:
[  188.721932]  #0:  (&x->wait#11){......}, at: [<0000000000495de8>] complete+0x8/0x60
[  188.797994]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000048510c>] try_to_wake_up+0xc/0x400
[  188.881343]  #2:  (rcu_read_lock){......}, at: [<000000000048a910>] select_task_rq_fair+0x90/0xb40
[  188.973043]stack backtrace:
[  188.993879] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.18.0-rc3-00068-g20f3963-dirty #54
[  189.076187] Call Trace:
[  189.089719]  [0000000000499360] lockdep_rcu_suspicious+0xe0/0x100
[  189.147035]  [000000000048a99c] select_task_rq_fair+0x11c/0xb40
[  189.202253]  [00000000004852d8] try_to_wake_up+0x1d8/0x400
[  189.252258]  [000000000048554c] default_wake_function+0xc/0x20
[  189.306435]  [0000000000495554] __wake_up_common+0x34/0x80
[  189.356448]  [00000000004955b4] __wake_up_locked+0x14/0x40
[  189.406456]  [0000000000495e08] complete+0x28/0x60
[  189.448142]  [0000000000636e28] blk_end_sync_rq+0x8/0x20
[  189.496057]  [0000000000639898] __blk_mq_end_request+0x18/0x60
[  189.550249]  [00000000006ee014] scsi_end_request+0x94/0x180
[  189.601286]  [00000000006ee334] scsi_io_completion+0x1d4/0x600
[  189.655463]  [00000000006e51c4] scsi_finish_command+0xc4/0xe0
[  189.708598]  [00000000006ed958] scsi_softirq_done+0x118/0x140
[  189.761735]  [00000000006398ec] __blk_mq_complete_request_remote+0xc/0x20
[  189.827383]  [00000000004c75d0] generic_smp_call_function_single_interrupt+0x150/0x1c0
[  189.906581]  [000000000043e514] smp_call_function_single_client+0x14/0x40
====================

Based almost entirely upon a patch by Paul E. McKenney.

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/kernel/smp_64.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -823,13 +823,17 @@ void arch_send_call_function_single_ipi(
 void __irq_entry smp_call_function_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	irq_enter();
 	generic_smp_call_function_interrupt();
+	irq_exit();
 }
 
 void __irq_entry smp_call_function_single_client(int irq, struct pt_regs *regs)
 {
 	clear_softint(1 << irq);
+	irq_enter();
 	generic_smp_call_function_single_interrupt();
+	irq_exit();
 }
 
 static void tsb_sync(void *info)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 017/122] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 016/122] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*() Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 018/122] zram: avoid kunmap_atomic() of a NULL pointer Greg Kroah-Hartman
                   ` (100 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andreas Larsson, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andreas Larsson <andreas@gaisler.com>

[ Upstream commit 1a17fdc4f4ed06b63fac1937470378a5441a663a ]

Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
implemented with a swap and cmpxchg is implemented with locks.
Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/include/asm/atomic_32.h  |    2 +-
 arch/sparc/include/asm/cmpxchg_32.h |   12 ++----------
 arch/sparc/lib/atomic32.c           |   27 +++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 11 deletions(-)

--- a/arch/sparc/include/asm/atomic_32.h
+++ b/arch/sparc/include/asm/atomic_32.h
@@ -21,7 +21,7 @@
 
 extern int __atomic_add_return(int, atomic_t *);
 extern int atomic_cmpxchg(atomic_t *, int, int);
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+extern int atomic_xchg(atomic_t *, int);
 extern int __atomic_add_unless(atomic_t *, int, int);
 extern void atomic_set(atomic_t *, int);
 
--- a/arch/sparc/include/asm/cmpxchg_32.h
+++ b/arch/sparc/include/asm/cmpxchg_32.h
@@ -11,22 +11,14 @@
 #ifndef __ARCH_SPARC_CMPXCHG__
 #define __ARCH_SPARC_CMPXCHG__
 
-static inline unsigned long xchg_u32(__volatile__ unsigned long *m, unsigned long val)
-{
-	__asm__ __volatile__("swap [%2], %0"
-			     : "=&r" (val)
-			     : "0" (val), "r" (m)
-			     : "memory");
-	return val;
-}
-
+extern unsigned long __xchg_u32(volatile u32 *m, u32 new);
 extern void __xchg_called_with_bad_pointer(void);
 
 static inline unsigned long __xchg(unsigned long x, __volatile__ void * ptr, int size)
 {
 	switch (size) {
 	case 4:
-		return xchg_u32(ptr, x);
+		return __xchg_u32(ptr, x);
 	}
 	__xchg_called_with_bad_pointer();
 	return x;
--- a/arch/sparc/lib/atomic32.c
+++ b/arch/sparc/lib/atomic32.c
@@ -40,6 +40,19 @@ int __atomic_add_return(int i, atomic_t
 }
 EXPORT_SYMBOL(__atomic_add_return);
 
+int atomic_xchg(atomic_t *v, int new)
+{
+	int ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(ATOMIC_HASH(v), flags);
+	ret = v->counter;
+	v->counter = new;
+	spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
+	return ret;
+}
+EXPORT_SYMBOL(atomic_xchg);
+
 int atomic_cmpxchg(atomic_t *v, int old, int new)
 {
 	int ret;
@@ -132,3 +145,17 @@ unsigned long __cmpxchg_u32(volatile u32
 	return (unsigned long)prev;
 }
 EXPORT_SYMBOL(__cmpxchg_u32);
+
+unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
+{
+	unsigned long flags;
+	u32 prev;
+
+	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
+	prev = *ptr;
+	*ptr = new;
+	spin_unlock_irqrestore(ATOMIC_HASH(ptr), flags);
+
+	return (unsigned long)prev;
+}
+EXPORT_SYMBOL(__xchg_u32);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 018/122] zram: avoid kunmap_atomic() of a NULL pointer
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 017/122] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 019/122] crypto: caam - fix missing dma unmap on error path Greg Kroah-Hartman
                   ` (99 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Weijie Yang, Sergey Senozhatsky,
	Dan Streetman, Nitin Gupta, Weijie Yang, Jerome Marchand,
	Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Weijie Yang <weijie.yang@samsung.com>

commit c406515239376fc93a30d5d03192182160cbd3fb upstream.

zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io.  The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.

This patch fixes this issue.

Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/block/zram/zram_drv.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -447,7 +447,8 @@ static int zram_bvec_write(struct zram *
 	}
 
 	if (page_zero_filled(uncmem)) {
-		kunmap_atomic(user_mem);
+		if (user_mem)
+			kunmap_atomic(user_mem);
 		/* Free memory associated with this sector now. */
 		write_lock(&zram->meta->tb_lock);
 		zram_free_page(zram, index);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 019/122] crypto: caam - fix missing dma unmap on error path
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 018/122] zram: avoid kunmap_atomic() of a NULL pointer Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 020/122] crypto: caam - remove duplicated sg copy functions Greg Kroah-Hartman
                   ` (98 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Cristian Stoica, Herbert Xu

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cristian Stoica <cristian.stoica@freescale.com>

commit 738459e3f88538f2ece263424dafe5d91799e46b upstream.

If dma mapping for dma_addr_out fails, the descriptor memory is freed
but the previous dma mapping for dma_addr_in remains.
This patch resolves the missing dma unmap and groups resource
allocations at function start.

Signed-off-by: Cristian Stoica <cristian.stoica@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/crypto/caam/key_gen.c |   29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

--- a/drivers/crypto/caam/key_gen.c
+++ b/drivers/crypto/caam/key_gen.c
@@ -51,23 +51,29 @@ int gen_split_key(struct device *jrdev,
 	u32 *desc;
 	struct split_key_result result;
 	dma_addr_t dma_addr_in, dma_addr_out;
-	int ret = 0;
+	int ret = -ENOMEM;
 
 	desc = kmalloc(CAAM_CMD_SZ * 6 + CAAM_PTR_SZ * 2, GFP_KERNEL | GFP_DMA);
 	if (!desc) {
 		dev_err(jrdev, "unable to allocate key input memory\n");
-		return -ENOMEM;
+		return ret;
 	}
 
-	init_job_desc(desc, 0);
-
 	dma_addr_in = dma_map_single(jrdev, (void *)key_in, keylen,
 				     DMA_TO_DEVICE);
 	if (dma_mapping_error(jrdev, dma_addr_in)) {
 		dev_err(jrdev, "unable to map key input memory\n");
-		kfree(desc);
-		return -ENOMEM;
+		goto out_free;
 	}
+
+	dma_addr_out = dma_map_single(jrdev, key_out, split_key_pad_len,
+				      DMA_FROM_DEVICE);
+	if (dma_mapping_error(jrdev, dma_addr_out)) {
+		dev_err(jrdev, "unable to map key output memory\n");
+		goto out_unmap_in;
+	}
+
+	init_job_desc(desc, 0);
 	append_key(desc, dma_addr_in, keylen, CLASS_2 | KEY_DEST_CLASS_REG);
 
 	/* Sets MDHA up into an HMAC-INIT */
@@ -84,13 +90,6 @@ int gen_split_key(struct device *jrdev,
 	 * FIFO_STORE with the explicit split-key content store
 	 * (0x26 output type)
 	 */
-	dma_addr_out = dma_map_single(jrdev, key_out, split_key_pad_len,
-				      DMA_FROM_DEVICE);
-	if (dma_mapping_error(jrdev, dma_addr_out)) {
-		dev_err(jrdev, "unable to map key output memory\n");
-		kfree(desc);
-		return -ENOMEM;
-	}
 	append_fifo_store(desc, dma_addr_out, split_key_len,
 			  LDST_CLASS_2_CCB | FIFOST_TYPE_SPLIT_KEK);
 
@@ -118,10 +117,10 @@ int gen_split_key(struct device *jrdev,
 
 	dma_unmap_single(jrdev, dma_addr_out, split_key_pad_len,
 			 DMA_FROM_DEVICE);
+out_unmap_in:
 	dma_unmap_single(jrdev, dma_addr_in, keylen, DMA_TO_DEVICE);
-
+out_free:
 	kfree(desc);
-
 	return ret;
 }
 EXPORT_SYMBOL(gen_split_key);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 020/122] crypto: caam - remove duplicated sg copy functions
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 019/122] crypto: caam - fix missing dma unmap on error path Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 021/122] hwrng: pseries - port to new read API and fix stack corruption Greg Kroah-Hartman
                   ` (97 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu, Cristian Stoica

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cristian Stoica <cristian.stoica@freescale.com>

commit 307fd543f3d23f8f56850eca1b27b1be2fe71017 upstream.

Replace equivalent (and partially incorrect) scatter-gather functions
with ones from crypto-API.

The replacement is motivated by page-faults in sg_copy_part triggered
by successive calls to crypto_hash_update. The following fault appears
after calling crypto_ahash_update twice, first with 13 and then
with 285 bytes:

Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xf9bf9a8c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in: tcrypt(+) caamhash caam_jr caam tls
CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted
3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
task: e9308530 ti: e700e000 task.ti: e700e000
NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
REGS: e700fb80 TRAP: 0300   Not tainted
(3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
MSR: 00029002 <CE,EE,ME>  CR: 44f92024  XER: 20000000
DEAR: 00000008, ESR: 00000000

GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888
GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154
GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48
GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000
NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash]
LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
Call Trace:
[e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable)
[e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
[e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls]
[e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls]
[e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70
[e700fe40] [c02259f0] alg_test_aead+0x50/0xc0
[e700fe60] [c02241e4] alg_test+0x114/0x2e0
[e700fee0] [c022276c] cryptomgr_test+0x4c/0x60
[e700fef0] [c004f658] kthread+0x98/0xa0
[e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/crypto/caam/caamhash.c   |   22 ++++++++++-----
 drivers/crypto/caam/sg_sw_sec4.h |   54 ---------------------------------------
 2 files changed, 14 insertions(+), 62 deletions(-)

--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -835,8 +835,9 @@ static int ahash_update_ctx(struct ahash
 					   edesc->sec4_sg + sec4_sg_src_index,
 					   chained);
 			if (*next_buflen) {
-				sg_copy_part(next_buf, req->src, to_hash -
-					     *buflen, req->nbytes);
+				scatterwalk_map_and_copy(next_buf, req->src,
+							 to_hash - *buflen,
+							 *next_buflen, 0);
 				state->current_buf = !state->current_buf;
 			}
 		} else {
@@ -869,7 +870,8 @@ static int ahash_update_ctx(struct ahash
 			kfree(edesc);
 		}
 	} else if (*next_buflen) {
-		sg_copy(buf + *buflen, req->src, req->nbytes);
+		scatterwalk_map_and_copy(buf + *buflen, req->src, 0,
+					 req->nbytes, 0);
 		*buflen = *next_buflen;
 		*next_buflen = last_buflen;
 	}
@@ -1216,8 +1218,9 @@ static int ahash_update_no_ctx(struct ah
 		src_map_to_sec4_sg(jrdev, req->src, src_nents,
 				   edesc->sec4_sg + 1, chained);
 		if (*next_buflen) {
-			sg_copy_part(next_buf, req->src, to_hash - *buflen,
-				    req->nbytes);
+			scatterwalk_map_and_copy(next_buf, req->src,
+						 to_hash - *buflen,
+						 *next_buflen, 0);
 			state->current_buf = !state->current_buf;
 		}
 
@@ -1248,7 +1251,8 @@ static int ahash_update_no_ctx(struct ah
 			kfree(edesc);
 		}
 	} else if (*next_buflen) {
-		sg_copy(buf + *buflen, req->src, req->nbytes);
+		scatterwalk_map_and_copy(buf + *buflen, req->src, 0,
+					 req->nbytes, 0);
 		*buflen = *next_buflen;
 		*next_buflen = 0;
 	}
@@ -1405,7 +1409,8 @@ static int ahash_update_first(struct aha
 		}
 
 		if (*next_buflen)
-			sg_copy_part(next_buf, req->src, to_hash, req->nbytes);
+			scatterwalk_map_and_copy(next_buf, req->src, to_hash,
+						 *next_buflen, 0);
 
 		sh_len = desc_len(sh_desc);
 		desc = edesc->hw_desc;
@@ -1438,7 +1443,8 @@ static int ahash_update_first(struct aha
 		state->update = ahash_update_no_ctx;
 		state->finup = ahash_finup_no_ctx;
 		state->final = ahash_final_no_ctx;
-		sg_copy(next_buf, req->src, req->nbytes);
+		scatterwalk_map_and_copy(next_buf, req->src, 0,
+					 req->nbytes, 0);
 	}
 #ifdef DEBUG
 	print_hex_dump(KERN_ERR, "next buf@"__stringify(__LINE__)": ",
--- a/drivers/crypto/caam/sg_sw_sec4.h
+++ b/drivers/crypto/caam/sg_sw_sec4.h
@@ -116,57 +116,3 @@ static int dma_unmap_sg_chained(struct d
 	}
 	return nents;
 }
-
-/* Map SG page in kernel virtual address space and copy */
-static inline void sg_map_copy(u8 *dest, struct scatterlist *sg,
-			       int len, int offset)
-{
-	u8 *mapped_addr;
-
-	/*
-	 * Page here can be user-space pinned using get_user_pages
-	 * Same must be kmapped before use and kunmapped subsequently
-	 */
-	mapped_addr = kmap_atomic(sg_page(sg));
-	memcpy(dest, mapped_addr + offset, len);
-	kunmap_atomic(mapped_addr);
-}
-
-/* Copy from len bytes of sg to dest, starting from beginning */
-static inline void sg_copy(u8 *dest, struct scatterlist *sg, unsigned int len)
-{
-	struct scatterlist *current_sg = sg;
-	int cpy_index = 0, next_cpy_index = current_sg->length;
-
-	while (next_cpy_index < len) {
-		sg_map_copy(dest + cpy_index, current_sg, current_sg->length,
-			    current_sg->offset);
-		current_sg = scatterwalk_sg_next(current_sg);
-		cpy_index = next_cpy_index;
-		next_cpy_index += current_sg->length;
-	}
-	if (cpy_index < len)
-		sg_map_copy(dest + cpy_index, current_sg, len-cpy_index,
-			    current_sg->offset);
-}
-
-/* Copy sg data, from to_skip to end, to dest */
-static inline void sg_copy_part(u8 *dest, struct scatterlist *sg,
-				      int to_skip, unsigned int end)
-{
-	struct scatterlist *current_sg = sg;
-	int sg_index, cpy_index, offset;
-
-	sg_index = current_sg->length;
-	while (sg_index <= to_skip) {
-		current_sg = scatterwalk_sg_next(current_sg);
-		sg_index += current_sg->length;
-	}
-	cpy_index = sg_index - to_skip;
-	offset = current_sg->offset + current_sg->length - cpy_index;
-	sg_map_copy(dest, current_sg, cpy_index, offset);
-	if (end - sg_index) {
-		current_sg = scatterwalk_sg_next(current_sg);
-		sg_copy(dest + cpy_index, current_sg, end - sg_index);
-	}
-}



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 021/122] hwrng: pseries - port to new read API and fix stack corruption
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 020/122] crypto: caam - remove duplicated sg copy functions Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 022/122] tun: Fix csum_start with VLAN acceleration Greg Kroah-Hartman
                   ` (96 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Greg Kurz, Herbert Xu

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Greg Kurz <gkurz@linux.vnet.ibm.com>

commit 24c65bc7037e7d0f362c0df70d17dd72ee64b8b9 upstream.

The add_early_randomness() function in drivers/char/hw_random/core.c passes
a 16-byte buffer to pseries_rng_data_read(). Unfortunately, plpar_hcall()
returns four 64-bit values and trashes 16 bytes on the stack.

This bug has been lying around for a long time. It got unveiled by:

commit d3cc7996473a7bdd33256029988ea690754e4e2a
Author: Amit Shah <amit.shah@redhat.com>
Date:   Thu Jul 10 15:42:34 2014 +0530

    hwrng: fetch randomness only after device init

It may trig a oops while loading or unloading the pseries-rng module for both
PowerVM and PowerKVM guests.

This patch does two things:
- pass an intermediate well sized buffer to plpar_hcall(). This is acceptalbe
  since we're not on a hot path.
- move to the new read API so that we know the return buffer size for sure.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/char/hw_random/pseries-rng.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/drivers/char/hw_random/pseries-rng.c
+++ b/drivers/char/hw_random/pseries-rng.c
@@ -25,18 +25,21 @@
 #include <asm/vio.h>
 
 
-static int pseries_rng_data_read(struct hwrng *rng, u32 *data)
+static int pseries_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 {
+	u64 buffer[PLPAR_HCALL_BUFSIZE];
+	size_t size = max < 8 ? max : 8;
 	int rc;
 
-	rc = plpar_hcall(H_RANDOM, (unsigned long *)data);
+	rc = plpar_hcall(H_RANDOM, (unsigned long *)buffer);
 	if (rc != H_SUCCESS) {
 		pr_err_ratelimited("H_RANDOM call failed %d\n", rc);
 		return -EIO;
 	}
+	memcpy(data, buffer, size);
 
 	/* The hypervisor interface returns 64 bits */
-	return 8;
+	return size;
 }
 
 /**
@@ -55,7 +58,7 @@ static unsigned long pseries_rng_get_des
 
 static struct hwrng pseries_rng = {
 	.name		= KBUILD_MODNAME,
-	.data_read	= pseries_rng_data_read,
+	.read		= pseries_rng_read,
 };
 
 static int __init pseries_rng_probe(struct vio_dev *dev,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 022/122] tun: Fix csum_start with VLAN acceleration
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 021/122] hwrng: pseries - port to new read API and fix stack corruption Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 023/122] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit Greg Kroah-Hartman
                   ` (95 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit a8f9bfdf982e2b1fb9f094e4de9ab08c57f3d2fd upstream.

When VLAN acceleration is in use on the xmit path, we end up
setting csum_start to the wrong place.  The result is that the
whoever ends up doing the checksum setting will corrupt the packet
instead of writing the checksum to the expected location, usually
this means writing the checksum with an offset of -4.

This patch fixes this by adjusting csum_start when VLAN acceleration
is detected.

Fixes: 6680ec68eff4 ("tuntap: hardware vlan tx support")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/tun.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1226,6 +1226,10 @@ static ssize_t tun_put_user(struct tun_s
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;
 	int vlan_offset = 0, copied;
+	int vlan_hlen = 0;
+
+	if (vlan_tx_tag_present(skb))
+		vlan_hlen = VLAN_HLEN;
 
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) < 0)
@@ -1277,7 +1281,8 @@ static ssize_t tun_put_user(struct tun_s
 
 		if (skb->ip_summed == CHECKSUM_PARTIAL) {
 			gso.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
-			gso.csum_start = skb_checksum_start_offset(skb);
+			gso.csum_start = skb_checksum_start_offset(skb) +
+					 vlan_hlen;
 			gso.csum_offset = skb->csum_offset;
 		} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
 			gso.flags = VIRTIO_NET_HDR_F_DATA_VALID;
@@ -1290,10 +1295,9 @@ static ssize_t tun_put_user(struct tun_s
 	}
 
 	copied = total;
-	total += skb->len;
-	if (!vlan_tx_tag_present(skb)) {
-		len = min_t(int, skb->len, len);
-	} else {
+	len = min_t(int, skb->len + vlan_hlen, len);
+	total += skb->len + vlan_hlen;
+	if (vlan_hlen) {
 		int copy, ret;
 		struct {
 			__be16 h_vlan_proto;
@@ -1304,8 +1308,6 @@ static ssize_t tun_put_user(struct tun_s
 		veth.h_vlan_TCI = htons(vlan_tx_tag_get(skb));
 
 		vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
-		len = min_t(int, skb->len + VLAN_HLEN, len);
-		total += VLAN_HLEN;
 
 		copy = min_t(int, vlan_offset, len);
 		ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 023/122] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 022/122] tun: Fix csum_start with VLAN acceleration Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 024/122] audit: correct AUDIT_GET_FEATURE return message type Greg Kroah-Hartman
                   ` (94 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Andy Lutomirski, H. Peter Anvin

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <luto@amacapital.net>

commit 81f49a8fd7088cfcb588d182eeede862c0e3303e upstream.

is_compat_task() is the wrong check for audit arch; the check should
be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not
AUDIT_ARCH_I386.

CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has
no visible effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/ptrace.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1441,15 +1441,6 @@ void send_sigtrap(struct task_struct *ts
 	force_sig_info(SIGTRAP, &info, tsk);
 }
 
-
-#ifdef CONFIG_X86_32
-# define IS_IA32	1
-#elif defined CONFIG_IA32_EMULATION
-# define IS_IA32	is_compat_task()
-#else
-# define IS_IA32	0
-#endif
-
 /*
  * We must return the syscall number to actually look up in the table.
  * This can be -1L to skip running any syscall at all.
@@ -1487,7 +1478,7 @@ long syscall_trace_enter(struct pt_regs
 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
 		trace_sys_enter(regs, regs->orig_ax);
 
-	if (IS_IA32)
+	if (is_ia32_task())
 		audit_syscall_entry(AUDIT_ARCH_I386,
 				    regs->orig_ax,
 				    regs->bx, regs->cx,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 024/122] audit: correct AUDIT_GET_FEATURE return message type
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 023/122] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 025/122] audit: AUDIT_FEATURE_CHANGE message format missing delimiting space Greg Kroah-Hartman
                   ` (93 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Steve Grubb, Richard Guy Briggs

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Guy Briggs <rgb@redhat.com>

commit 9ef91514774a140e468f99d73d7593521e6d25dc upstream.

When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature.  The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.

This appears to have been a cut-and-paste-eo in commit b0fed40.

Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/audit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -687,7 +687,7 @@ static int audit_get_feature(struct sk_b
 
 	seq = nlmsg_hdr(skb)->nlmsg_seq;
 
-	audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &af, sizeof(af));
+	audit_send_reply(skb, seq, AUDIT_GET_FEATURE, 0, 0, &af, sizeof(af));
 
 	return 0;
 }



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 025/122] audit: AUDIT_FEATURE_CHANGE message format missing delimiting space
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 024/122] audit: correct AUDIT_GET_FEATURE return message type Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 026/122] audit: keep inode pinned Greg Kroah-Hartman
                   ` (92 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Richard Guy Briggs, Paul Moore

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Guy Briggs <rgb@redhat.com>

commit 897f1acbb6702ddaa953e8d8436eee3b12016c7e upstream.

Add a space between subj= and feature= fields to make them parsable.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/audit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -702,7 +702,7 @@ static void audit_log_feature_change(int
 
 	ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
 	audit_log_task_info(ab, current);
-	audit_log_format(ab, "feature=%s old=%u new=%u old_lock=%u new_lock=%u res=%d",
+	audit_log_format(ab, " feature=%s old=%u new=%u old_lock=%u new_lock=%u res=%d",
 			 audit_feature_names[which], !!old_feature, !!new_feature,
 			 !!old_lock, !!new_lock, res);
 	audit_log_end(ab);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 026/122] audit: keep inode pinned
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 025/122] audit: AUDIT_FEATURE_CHANGE message format missing delimiting space Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 027/122] ahci: Add Device IDs for Intel Sunrise Point PCH Greg Kroah-Hartman
                   ` (91 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Miklos Szeredi, Paul Moore

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Miklos Szeredi <mszeredi@suse.cz>

commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/audit_tree.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -154,6 +154,7 @@ static struct audit_chunk *alloc_chunk(i
 		chunk->owners[i].index = i;
 	}
 	fsnotify_init_mark(&chunk->mark, audit_tree_destroy_watch);
+	chunk->mark.mask = FS_IN_IGNORED;
 	return chunk;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 027/122] ahci: Add Device IDs for Intel Sunrise Point PCH
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 026/122] audit: keep inode pinned Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 028/122] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks Greg Kroah-Hartman
                   ` (90 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, James Ralston, Tejun Heo

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: James Ralston <james.d.ralston@intel.com>

commit 690000b930456a98663567d35dd5c54b688d1e3f upstream.

This patch adds the AHCI-mode SATA Device IDs for the Intel Sunrise Point PCH.

Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/ahci.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -314,6 +314,11 @@ static const struct pci_device_id ahci_p
 	{ PCI_VDEVICE(INTEL, 0x8c87), board_ahci }, /* 9 Series RAID */
 	{ PCI_VDEVICE(INTEL, 0x8c8e), board_ahci }, /* 9 Series RAID */
 	{ PCI_VDEVICE(INTEL, 0x8c8f), board_ahci }, /* 9 Series RAID */
+	{ PCI_VDEVICE(INTEL, 0xa103), board_ahci }, /* Sunrise Point-H AHCI */
+	{ PCI_VDEVICE(INTEL, 0xa103), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa105), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa107), board_ahci }, /* Sunrise Point-H RAID */
+	{ PCI_VDEVICE(INTEL, 0xa10f), board_ahci }, /* Sunrise Point-H RAID */
 
 	/* JMicron 360/1/3/5/6, match class to avoid IDE function */
 	{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 028/122] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 027/122] ahci: Add Device IDs for Intel Sunrise Point PCH Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 029/122] ALSA: usb-audio: Fix memory leak in FTU quirk Greg Kroah-Hartman
                   ` (89 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Tejun Heo, dorin, Imre Kaloz

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 66a7cbc303f4d28f201529b06061944d51ab530c upstream.

Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so
67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
disabled NCQ on them.  It turns out that NCQ is fine as long as MSI is
not used, so let's turn off MSI and leave NCQ on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731
Tested-by: <dorin@i51.org>
Tested-by: Imre Kaloz <kaloz@openwrt.org>
Fixes: 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/ahci.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -61,6 +61,7 @@ enum board_ids {
 	/* board IDs by feature in alphabetical order */
 	board_ahci,
 	board_ahci_ign_iferr,
+	board_ahci_nomsi,
 	board_ahci_noncq,
 	board_ahci_nosntf,
 	board_ahci_yes_fbs,
@@ -122,6 +123,13 @@ static const struct ata_port_info ahci_p
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_ops,
 	},
+	[board_ahci_nomsi] = {
+		AHCI_HFLAGS	(AHCI_HFLAG_NO_MSI),
+		.flags		= AHCI_FLAG_COMMON,
+		.pio_mask	= ATA_PIO4,
+		.udma_mask	= ATA_UDMA6,
+		.port_ops	= &ahci_ops,
+	},
 	[board_ahci_noncq] = {
 		AHCI_HFLAGS	(AHCI_HFLAG_NO_NCQ),
 		.flags		= AHCI_FLAG_COMMON,
@@ -481,10 +489,10 @@ static const struct pci_device_id ahci_p
 	{ PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },	/* ASM1062 */
 
 	/*
-	 * Samsung SSDs found on some macbooks.  NCQ times out.
-	 * https://bugzilla.kernel.org/show_bug.cgi?id=60731
+	 * Samsung SSDs found on some macbooks.  NCQ times out if MSI is
+	 * enabled.  https://bugzilla.kernel.org/show_bug.cgi?id=60731
 	 */
-	{ PCI_VDEVICE(SAMSUNG, 0x1600), board_ahci_noncq },
+	{ PCI_VDEVICE(SAMSUNG, 0x1600), board_ahci_nomsi },
 
 	/* Enmotus */
 	{ PCI_DEVICE(0x1c44, 0x8000), board_ahci },



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 029/122] ALSA: usb-audio: Fix memory leak in FTU quirk
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 028/122] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 030/122] xtensa: re-wire umount syscall to sys_oldumount Greg Kroah-Hartman
                   ` (88 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 1a290581ded60e87276741f8ca97b161d2b226fc upstream.

M-audio FastTrack Ultra quirk doesn't release the kzalloc'ed memory.
This patch adds the private_free callback to release it properly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/usb/mixer_quirks.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/sound/usb/mixer_quirks.c
+++ b/sound/usb/mixer_quirks.c
@@ -885,6 +885,11 @@ static int snd_ftu_eff_switch_put(struct
 	return changed;
 }
 
+static void kctl_private_value_free(struct snd_kcontrol *kctl)
+{
+	kfree((void *)kctl->private_value);
+}
+
 static int snd_ftu_create_effect_switch(struct usb_mixer_interface *mixer,
 	int validx, int bUnitID)
 {
@@ -919,6 +924,7 @@ static int snd_ftu_create_effect_switch(
 		return -ENOMEM;
 	}
 
+	kctl->private_free = kctl_private_value_free;
 	err = snd_ctl_add(mixer->chip->card, kctl);
 	if (err < 0)
 		return err;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 030/122] xtensa: re-wire umount syscall to sys_oldumount
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 029/122] ALSA: usb-audio: Fix memory leak in FTU quirk Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 031/122] libceph: do not crash on large auth tickets Greg Kroah-Hartman
                   ` (87 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Max Filippov

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Max Filippov <jcmvbkbc@gmail.com>

commit 2651cc6974d47fc43bef1cd8cd26966e4f5ba306 upstream.

Userspace actually passes single parameter (path name) to the umount
syscall, so new umount just fails. Fix it by requesting old umount
syscall implementation and re-wiring umount to it.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/xtensa/include/uapi/asm/unistd.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/xtensa/include/uapi/asm/unistd.h
+++ b/arch/xtensa/include/uapi/asm/unistd.h
@@ -384,7 +384,8 @@ __SYSCALL(174, sys_chroot, 1)
 #define __NR_pivot_root 			175
 __SYSCALL(175, sys_pivot_root, 2)
 #define __NR_umount 				176
-__SYSCALL(176, sys_umount, 2)
+__SYSCALL(176, sys_oldumount, 1)
+#define __ARCH_WANT_SYS_OLDUMOUNT
 #define __NR_swapoff 				177
 __SYSCALL(177, sys_swapoff, 1)
 #define __NR_sync 				178



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 031/122] libceph: do not crash on large auth tickets
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 030/122] xtensa: re-wire umount syscall to sys_oldumount Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 032/122] macvtap: Fix csum_start when VLAN tags are present Greg Kroah-Hartman
                   ` (86 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ilya Dryomov, Sage Weil

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <idryomov@redhat.com>

commit aaef31703a0cf6a733e651885bfb49edc3ac6774 upstream.

Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
tickets will have their buffers vmalloc'ed, which leads to the
following crash in crypto:

[   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
[   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.686032] PGD 0
[   28.688088] Oops: 0000 [#1] PREEMPT SMP
[   28.688088] Modules linked in:
[   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
[   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.688088] Workqueue: ceph-msgr con_work
[   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
[   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
[   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
[   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
[   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
[   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
[   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
[   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
[   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
[   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
[   28.688088] Stack:
[   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
[   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
[   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
[   28.688088] Call Trace:
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
[   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
[   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
[   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
[   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
[   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
[   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
[   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
[   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
[   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
[   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
[   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
[   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
[   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
[   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
[   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10

This is because we set up crypto scatterlists as if all buffers were
kmalloc'ed.  Fix it.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/ceph/crypto.c |  169 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 132 insertions(+), 37 deletions(-)

--- a/net/ceph/crypto.c
+++ b/net/ceph/crypto.c
@@ -89,11 +89,82 @@ static struct crypto_blkcipher *ceph_cry
 
 static const u8 *aes_iv = (u8 *)CEPH_AES_IV;
 
+/*
+ * Should be used for buffers allocated with ceph_kvmalloc().
+ * Currently these are encrypt out-buffer (ceph_buffer) and decrypt
+ * in-buffer (msg front).
+ *
+ * Dispose of @sgt with teardown_sgtable().
+ *
+ * @prealloc_sg is to avoid memory allocation inside sg_alloc_table()
+ * in cases where a single sg is sufficient.  No attempt to reduce the
+ * number of sgs by squeezing physically contiguous pages together is
+ * made though, for simplicity.
+ */
+static int setup_sgtable(struct sg_table *sgt, struct scatterlist *prealloc_sg,
+			 const void *buf, unsigned int buf_len)
+{
+	struct scatterlist *sg;
+	const bool is_vmalloc = is_vmalloc_addr(buf);
+	unsigned int off = offset_in_page(buf);
+	unsigned int chunk_cnt = 1;
+	unsigned int chunk_len = PAGE_ALIGN(off + buf_len);
+	int i;
+	int ret;
+
+	if (buf_len == 0) {
+		memset(sgt, 0, sizeof(*sgt));
+		return -EINVAL;
+	}
+
+	if (is_vmalloc) {
+		chunk_cnt = chunk_len >> PAGE_SHIFT;
+		chunk_len = PAGE_SIZE;
+	}
+
+	if (chunk_cnt > 1) {
+		ret = sg_alloc_table(sgt, chunk_cnt, GFP_NOFS);
+		if (ret)
+			return ret;
+	} else {
+		WARN_ON(chunk_cnt != 1);
+		sg_init_table(prealloc_sg, 1);
+		sgt->sgl = prealloc_sg;
+		sgt->nents = sgt->orig_nents = 1;
+	}
+
+	for_each_sg(sgt->sgl, sg, sgt->orig_nents, i) {
+		struct page *page;
+		unsigned int len = min(chunk_len - off, buf_len);
+
+		if (is_vmalloc)
+			page = vmalloc_to_page(buf);
+		else
+			page = virt_to_page(buf);
+
+		sg_set_page(sg, page, len, off);
+
+		off = 0;
+		buf += len;
+		buf_len -= len;
+	}
+	WARN_ON(buf_len != 0);
+
+	return 0;
+}
+
+static void teardown_sgtable(struct sg_table *sgt)
+{
+	if (sgt->orig_nents > 1)
+		sg_free_table(sgt);
+}
+
 static int ceph_aes_encrypt(const void *key, int key_len,
 			    void *dst, size_t *dst_len,
 			    const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[2], sg_out[1];
+	struct scatterlist sg_in[2], prealloc_sg;
+	struct sg_table sg_out;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
 	int ret;
@@ -109,16 +180,18 @@ static int ceph_aes_encrypt(const void *
 
 	*dst_len = src_len + zero_padding;
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	sg_init_table(sg_in, 2);
 	sg_set_buf(&sg_in[0], src, src_len);
 	sg_set_buf(&sg_in[1], pad, zero_padding);
-	sg_init_table(sg_out, 1);
-	sg_set_buf(sg_out, dst, *dst_len);
+	ret = setup_sgtable(&sg_out, &prealloc_sg, dst, *dst_len);
+	if (ret)
+		goto out_tfm;
+
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
+
 	/*
 	print_hex_dump(KERN_ERR, "enc key: ", DUMP_PREFIX_NONE, 16, 1,
 		       key, key_len, 1);
@@ -127,16 +200,22 @@ static int ceph_aes_encrypt(const void *
 	print_hex_dump(KERN_ERR, "enc pad: ", DUMP_PREFIX_NONE, 16, 1,
 			pad, zero_padding, 1);
 	*/
-	ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+	ret = crypto_blkcipher_encrypt(&desc, sg_out.sgl, sg_in,
 				     src_len + zero_padding);
-	crypto_free_blkcipher(tfm);
-	if (ret < 0)
+	if (ret < 0) {
 		pr_err("ceph_aes_crypt failed %d\n", ret);
+		goto out_sg;
+	}
 	/*
 	print_hex_dump(KERN_ERR, "enc out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_out);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_encrypt2(const void *key, int key_len, void *dst,
@@ -144,7 +223,8 @@ static int ceph_aes_encrypt2(const void
 			     const void *src1, size_t src1_len,
 			     const void *src2, size_t src2_len)
 {
-	struct scatterlist sg_in[3], sg_out[1];
+	struct scatterlist sg_in[3], prealloc_sg;
+	struct sg_table sg_out;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm, .flags = 0 };
 	int ret;
@@ -160,17 +240,19 @@ static int ceph_aes_encrypt2(const void
 
 	*dst_len = src1_len + src2_len + zero_padding;
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	sg_init_table(sg_in, 3);
 	sg_set_buf(&sg_in[0], src1, src1_len);
 	sg_set_buf(&sg_in[1], src2, src2_len);
 	sg_set_buf(&sg_in[2], pad, zero_padding);
-	sg_init_table(sg_out, 1);
-	sg_set_buf(sg_out, dst, *dst_len);
+	ret = setup_sgtable(&sg_out, &prealloc_sg, dst, *dst_len);
+	if (ret)
+		goto out_tfm;
+
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
+
 	/*
 	print_hex_dump(KERN_ERR, "enc  key: ", DUMP_PREFIX_NONE, 16, 1,
 		       key, key_len, 1);
@@ -181,23 +263,30 @@ static int ceph_aes_encrypt2(const void
 	print_hex_dump(KERN_ERR, "enc  pad: ", DUMP_PREFIX_NONE, 16, 1,
 			pad, zero_padding, 1);
 	*/
-	ret = crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
+	ret = crypto_blkcipher_encrypt(&desc, sg_out.sgl, sg_in,
 				     src1_len + src2_len + zero_padding);
-	crypto_free_blkcipher(tfm);
-	if (ret < 0)
+	if (ret < 0) {
 		pr_err("ceph_aes_crypt2 failed %d\n", ret);
+		goto out_sg;
+	}
 	/*
 	print_hex_dump(KERN_ERR, "enc  out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_out);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_decrypt(const void *key, int key_len,
 			    void *dst, size_t *dst_len,
 			    const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[1], sg_out[2];
+	struct sg_table sg_in;
+	struct scatterlist sg_out[2], prealloc_sg;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm };
 	char pad[16];
@@ -209,16 +298,16 @@ static int ceph_aes_decrypt(const void *
 	if (IS_ERR(tfm))
 		return PTR_ERR(tfm);
 
-	crypto_blkcipher_setkey((void *)tfm, key, key_len);
-	sg_init_table(sg_in, 1);
 	sg_init_table(sg_out, 2);
-	sg_set_buf(sg_in, src, src_len);
 	sg_set_buf(&sg_out[0], dst, *dst_len);
 	sg_set_buf(&sg_out[1], pad, sizeof(pad));
+	ret = setup_sgtable(&sg_in, &prealloc_sg, src, src_len);
+	if (ret)
+		goto out_tfm;
 
+	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
 
 	/*
@@ -227,12 +316,10 @@ static int ceph_aes_decrypt(const void *
 	print_hex_dump(KERN_ERR, "dec  in: ", DUMP_PREFIX_NONE, 16, 1,
 		       src, src_len, 1);
 	*/
-
-	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-	crypto_free_blkcipher(tfm);
+	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in.sgl, src_len);
 	if (ret < 0) {
 		pr_err("ceph_aes_decrypt failed %d\n", ret);
-		return ret;
+		goto out_sg;
 	}
 
 	if (src_len <= *dst_len)
@@ -250,7 +337,12 @@ static int ceph_aes_decrypt(const void *
 	print_hex_dump(KERN_ERR, "dec out: ", DUMP_PREFIX_NONE, 16, 1,
 		       dst, *dst_len, 1);
 	*/
-	return 0;
+
+out_sg:
+	teardown_sgtable(&sg_in);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 static int ceph_aes_decrypt2(const void *key, int key_len,
@@ -258,7 +350,8 @@ static int ceph_aes_decrypt2(const void
 			     void *dst2, size_t *dst2_len,
 			     const void *src, size_t src_len)
 {
-	struct scatterlist sg_in[1], sg_out[3];
+	struct sg_table sg_in;
+	struct scatterlist sg_out[3], prealloc_sg;
 	struct crypto_blkcipher *tfm = ceph_crypto_alloc_cipher();
 	struct blkcipher_desc desc = { .tfm = tfm };
 	char pad[16];
@@ -270,17 +363,17 @@ static int ceph_aes_decrypt2(const void
 	if (IS_ERR(tfm))
 		return PTR_ERR(tfm);
 
-	sg_init_table(sg_in, 1);
-	sg_set_buf(sg_in, src, src_len);
 	sg_init_table(sg_out, 3);
 	sg_set_buf(&sg_out[0], dst1, *dst1_len);
 	sg_set_buf(&sg_out[1], dst2, *dst2_len);
 	sg_set_buf(&sg_out[2], pad, sizeof(pad));
+	ret = setup_sgtable(&sg_in, &prealloc_sg, src, src_len);
+	if (ret)
+		goto out_tfm;
 
 	crypto_blkcipher_setkey((void *)tfm, key, key_len);
 	iv = crypto_blkcipher_crt(tfm)->iv;
 	ivsize = crypto_blkcipher_ivsize(tfm);
-
 	memcpy(iv, aes_iv, ivsize);
 
 	/*
@@ -289,12 +382,10 @@ static int ceph_aes_decrypt2(const void
 	print_hex_dump(KERN_ERR, "dec   in: ", DUMP_PREFIX_NONE, 16, 1,
 		       src, src_len, 1);
 	*/
-
-	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in, src_len);
-	crypto_free_blkcipher(tfm);
+	ret = crypto_blkcipher_decrypt(&desc, sg_out, sg_in.sgl, src_len);
 	if (ret < 0) {
 		pr_err("ceph_aes_decrypt failed %d\n", ret);
-		return ret;
+		goto out_sg;
 	}
 
 	if (src_len <= *dst1_len)
@@ -324,7 +415,11 @@ static int ceph_aes_decrypt2(const void
 		       dst2, *dst2_len, 1);
 	*/
 
-	return 0;
+out_sg:
+	teardown_sgtable(&sg_in);
+out_tfm:
+	crypto_free_blkcipher(tfm);
+	return ret;
 }
 
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 032/122] macvtap: Fix csum_start when VLAN tags are present
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 031/122] libceph: do not crash on large auth tickets Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 033/122] mac80211_hwsim: release driver when ieee80211_register_hw fails Greg Kroah-Hartman
                   ` (85 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Herbert Xu, David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 3ce9b20f1971690b8b3b620e735ec99431573b39 upstream.

When VLAN is in use in macvtap_put_user, we end up setting
csum_start to the wrong place.  The result is that the whoever
ends up doing the checksum setting will corrupt the packet instead
of writing the checksum to the expected location, usually this
means writing the checksum with an offset of -4.

This patch fixes this by adjusting csum_start when VLAN tags are
detected.

Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

---
 drivers/net/macvtap.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -629,6 +629,8 @@ static void macvtap_skb_to_vnet_hdr(cons
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		vnet_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
 		vnet_hdr->csum_start = skb_checksum_start_offset(skb);
+		if (vlan_tx_tag_present(skb))
+			vnet_hdr->csum_start += VLAN_HLEN;
 		vnet_hdr->csum_offset = skb->csum_offset;
 	} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
 		vnet_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 033/122] mac80211_hwsim: release driver when ieee80211_register_hw fails
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 032/122] macvtap: Fix csum_start when VLAN tags are present Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 034/122] mac80211: properly flush delayed scan work on interface removal Greg Kroah-Hartman
                   ` (84 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Fengguang Wu, Junjie Mao, Johannes Berg

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Junjie Mao <eternal.n08@gmail.com>

commit 805dbe17d1c832ad341f14fae8cedf41b67ca6fa upstream.

The driver is not released when ieee80211_register_hw fails in
mac80211_hwsim_create_radio, leading to the access to the unregistered (and
possibly freed) device in platform_driver_unregister:

[    0.447547] mac80211_hwsim: ieee80211_register_hw failed (-2)
[    0.448292] ------------[ cut here ]------------
[    0.448854] WARNING: CPU: 0 PID: 1 at ../include/linux/kref.h:47 kobject_get+0x33/0x50()
[    0.449839] CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-00001-gdd46990-dirty #2
[    0.450813] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.451512]  00000000 00000000 78025e38 7967c6c6 78025e68 7905e09b 7988b480 00000000
[    0.452579]  00000001 79887d62 0000002f 79170bb3 79170bb3 78397008 79ac9d74 00000001
[    0.453614]  78025e78 7905e15d 00000009 00000000 78025e84 79170bb3 78397000 78025e8c
[    0.454632] Call Trace:
[    0.454921]  [<7967c6c6>] dump_stack+0x16/0x18
[    0.455453]  [<7905e09b>] warn_slowpath_common+0x6b/0x90
[    0.456067]  [<79170bb3>] ? kobject_get+0x33/0x50
[    0.456612]  [<79170bb3>] ? kobject_get+0x33/0x50
[    0.457155]  [<7905e15d>] warn_slowpath_null+0x1d/0x20
[    0.457748]  [<79170bb3>] kobject_get+0x33/0x50
[    0.458274]  [<7925824f>] get_device+0xf/0x20
[    0.458779]  [<7925b5cd>] driver_detach+0x3d/0xa0
[    0.459331]  [<7925a3ff>] bus_remove_driver+0x8f/0xb0
[    0.459927]  [<7925bf80>] ? class_unregister+0x40/0x80
[    0.460660]  [<7925bad7>] driver_unregister+0x47/0x50
[    0.461248]  [<7925c033>] ? class_destroy+0x13/0x20
[    0.461824]  [<7925d07b>] platform_driver_unregister+0xb/0x10
[    0.462507]  [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9
[    0.463161]  [<79b30c58>] do_one_initcall+0x106/0x1a9
[    0.463758]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.464393]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.465001]  [<79071935>] ? parse_args+0x2f5/0x480
[    0.465569]  [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[    0.466345]  [<79b30dd9>] kernel_init_freeable+0xde/0x17d
[    0.466972]  [<79b304d6>] ? do_early_param+0x7a/0x7a
[    0.467546]  [<79677b1b>] kernel_init+0xb/0xe0
[    0.468072]  [<79075f42>] ? schedule_tail+0x12/0x40
[    0.468658]  [<79686580>] ret_from_kernel_thread+0x20/0x30
[    0.469303]  [<79677b10>] ? rest_init+0xc0/0xc0
[    0.469829] ---[ end trace ad8ac403ff8aef5c ]---
[    0.470509] ------------[ cut here ]------------
[    0.471047] WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3161 __lock_acquire.isra.22+0x7aa/0xb00()
[    0.472163] DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)
[    0.472774] CPU: 0 PID: 1 Comm: swapper Tainted: G        W      3.17.0-00001-gdd46990-dirty #2
[    0.473815] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.474492]  78025de0 78025de0 78025da0 7967c6c6 78025dd0 7905e09b 79888931 78025dfc
[    0.475515]  00000001 79888a93 00000c59 7907f33a 7907f33a 78028000 fffe9d09 00000000
[    0.476519]  78025de8 7905e10e 00000009 78025de0 79888931 78025dfc 78025e24 7907f33a
[    0.477523] Call Trace:
[    0.477821]  [<7967c6c6>] dump_stack+0x16/0x18
[    0.478352]  [<7905e09b>] warn_slowpath_common+0x6b/0x90
[    0.478976]  [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00
[    0.479658]  [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00
[    0.480417]  [<7905e10e>] warn_slowpath_fmt+0x2e/0x30
[    0.480479]  [<7907f33a>] __lock_acquire.isra.22+0x7aa/0xb00
[    0.480479]  [<79078aa5>] ? sched_clock_cpu+0xb5/0xf0
[    0.480479]  [<7907fd06>] lock_acquire+0x56/0x70
[    0.480479]  [<7925b5e8>] ? driver_detach+0x58/0xa0
[    0.480479]  [<79682d11>] mutex_lock_nested+0x61/0x2a0
[    0.480479]  [<7925b5e8>] ? driver_detach+0x58/0xa0
[    0.480479]  [<7925b5e8>] ? driver_detach+0x58/0xa0
[    0.480479]  [<7925b5e8>] driver_detach+0x58/0xa0
[    0.480479]  [<7925a3ff>] bus_remove_driver+0x8f/0xb0
[    0.480479]  [<7925bf80>] ? class_unregister+0x40/0x80
[    0.480479]  [<7925bad7>] driver_unregister+0x47/0x50
[    0.480479]  [<7925c033>] ? class_destroy+0x13/0x20
[    0.480479]  [<7925d07b>] platform_driver_unregister+0xb/0x10
[    0.480479]  [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9
[    0.480479]  [<79b30c58>] do_one_initcall+0x106/0x1a9
[    0.480479]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.480479]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.480479]  [<79071935>] ? parse_args+0x2f5/0x480
[    0.480479]  [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[    0.480479]  [<79b30dd9>] kernel_init_freeable+0xde/0x17d
[    0.480479]  [<79b304d6>] ? do_early_param+0x7a/0x7a
[    0.480479]  [<79677b1b>] kernel_init+0xb/0xe0
[    0.480479]  [<79075f42>] ? schedule_tail+0x12/0x40
[    0.480479]  [<79686580>] ret_from_kernel_thread+0x20/0x30
[    0.480479]  [<79677b10>] ? rest_init+0xc0/0xc0
[    0.480479] ---[ end trace ad8ac403ff8aef5d ]---
[    0.495478] BUG: unable to handle kernel paging request at 00200200
[    0.496257] IP: [<79682de5>] mutex_lock_nested+0x135/0x2a0
[    0.496923] *pde = 00000000
[    0.497290] Oops: 0002 [#1]
[    0.497653] CPU: 0 PID: 1 Comm: swapper Tainted: G        W      3.17.0-00001-gdd46990-dirty #2
[    0.498659] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.499321] task: 78028000 ti: 78024000 task.ti: 78024000
[    0.499955] EIP: 0060:[<79682de5>] EFLAGS: 00010097 CPU: 0
[    0.500620] EIP is at mutex_lock_nested+0x135/0x2a0
[    0.501145] EAX: 00200200 EBX: 78397434 ECX: 78397460 EDX: 78025e70
[    0.501816] ESI: 00000246 EDI: 78028000 EBP: 78025e8c ESP: 78025e54
[    0.502497]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[    0.503076] CR0: 8005003b CR2: 00200200 CR3: 01b9d000 CR4: 00000690
[    0.503773] Stack:
[    0.503998]  00000000 00000001 00000000 7925b5e8 78397460 7925b5e8 78397474 78397460
[    0.504944]  00200200 11111111 78025e70 78397000 79ac9d74 00000001 78025ea0 7925b5e8
[    0.505451]  79ac9d74 fffffffe 00000001 78025ebc 7925a3ff 7a251398 78025ec8 7925bf80
[    0.505451] Call Trace:
[    0.505451]  [<7925b5e8>] ? driver_detach+0x58/0xa0
[    0.505451]  [<7925b5e8>] ? driver_detach+0x58/0xa0
[    0.505451]  [<7925b5e8>] driver_detach+0x58/0xa0
[    0.505451]  [<7925a3ff>] bus_remove_driver+0x8f/0xb0
[    0.505451]  [<7925bf80>] ? class_unregister+0x40/0x80
[    0.505451]  [<7925bad7>] driver_unregister+0x47/0x50
[    0.505451]  [<7925c033>] ? class_destroy+0x13/0x20
[    0.505451]  [<7925d07b>] platform_driver_unregister+0xb/0x10
[    0.505451]  [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9
[    0.505451]  [<79b30c58>] do_one_initcall+0x106/0x1a9
[    0.505451]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.505451]  [<79b517b8>] ? if_spi_init_module+0xac/0xac
[    0.505451]  [<79071935>] ? parse_args+0x2f5/0x480
[    0.505451]  [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[    0.505451]  [<79b30dd9>] kernel_init_freeable+0xde/0x17d
[    0.505451]  [<79b304d6>] ? do_early_param+0x7a/0x7a
[    0.505451]  [<79677b1b>] kernel_init+0xb/0xe0
[    0.505451]  [<79075f42>] ? schedule_tail+0x12/0x40
[    0.505451]  [<79686580>] ret_from_kernel_thread+0x20/0x30
[    0.505451]  [<79677b10>] ? rest_init+0xc0/0xc0
[    0.505451] Code: 89 d8 e8 cf 9b 9f ff 8b 4f 04 8d 55 e4 89 d8 e8 72 9d 9f ff 8d 43 2c 89 c1 89 45 d8 8b 43 30 8d 55 e4 89 53 30 89 4d e4 89 45 e8 <89> 10 8b 55 dc 8b 45 e0 89 7d ec e8 db af 9f ff eb 11 90 31 c0
[    0.505451] EIP: [<79682de5>] mutex_lock_nested+0x135/0x2a0 SS:ESP 0068:78025e54
[    0.505451] CR2: 0000000000200200
[    0.505451] ---[ end trace ad8ac403ff8aef5e ]---
[    0.505451] Kernel panic - not syncing: Fatal exception

Fixes: 9ea927748ced ("mac80211_hwsim: Register and bind to driver")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/wireless/mac80211_hwsim.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1974,7 +1974,7 @@ static int mac80211_hwsim_create_radio(i
 	if (err != 0) {
 		printk(KERN_DEBUG "mac80211_hwsim: device_bind_driver failed (%d)\n",
 		       err);
-		goto failed_hw;
+		goto failed_bind;
 	}
 
 	skb_queue_head_init(&data->pending);
@@ -2157,6 +2157,8 @@ static int mac80211_hwsim_create_radio(i
 	return idx;
 
 failed_hw:
+	device_release_driver(data->dev);
+failed_bind:
 	device_unregister(data->dev);
 failed_drvdata:
 	ieee80211_free_hw(hw);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 034/122] mac80211: properly flush delayed scan work on interface removal
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 033/122] mac80211_hwsim: release driver when ieee80211_register_hw fails Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 035/122] mac80211: use secondary channel offset IE also beacons during CSA Greg Kroah-Hartman
                   ` (83 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Sujith Manoharan, Johannes Berg

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Berg <johannes@sipsolutions.net>

commit 46238845bd609a5c0fbe076e1b82b4c5b33360b2 upstream.

When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.

However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.

To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.

Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/mac80211/iface.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -760,10 +760,12 @@ static void ieee80211_do_stop(struct iee
 	int i, flushed;
 	struct ps_data *ps;
 	struct cfg80211_chan_def chandef;
+	bool cancel_scan;
 
 	clear_bit(SDATA_STATE_RUNNING, &sdata->state);
 
-	if (rcu_access_pointer(local->scan_sdata) == sdata)
+	cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
+	if (cancel_scan)
 		ieee80211_scan_cancel(local);
 
 	/*
@@ -973,6 +975,9 @@ static void ieee80211_do_stop(struct iee
 
 	ieee80211_recalc_ps(local, -1);
 
+	if (cancel_scan)
+		flush_delayed_work(&local->scan_work);
+
 	if (local->open_count == 0) {
 		ieee80211_stop_device(local);
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 035/122] mac80211: use secondary channel offset IE also beacons during CSA
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 034/122] mac80211: properly flush delayed scan work on interface removal Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 036/122] mac80211: schedule the actual switch of the station before CSA count 0 Greg Kroah-Hartman
                   ` (82 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jouni Malinen, Luciano Coelho, Johannes Berg

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Luciano Coelho <luciano.coelho@intel.com>

commit 84469a45a1bedec9918e94ab2f78c5dc0739e4a7 upstream.

If we are switching from an HT40+ to an HT40- channel (or vice-versa),
we need the secondary channel offset IE to specify what is the
post-CSA offset to be used.  This applies both to beacons and to probe
responses.

In ieee80211_parse_ch_switch_ie() we were ignoring this IE from
beacons and using the *current* HT information IE instead.  This was
causing us to use the same offset as before the switch.

Fix that by using the secondary channel offset IE also for beacons and
don't ever use the pre-switch offset.  Additionally, remove the
"beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not
needed anymore.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/mac80211/ibss.c        |    2 +-
 net/mac80211/ieee80211_i.h |    3 +--
 net/mac80211/mesh.c        |    2 +-
 net/mac80211/mlme.c        |    2 +-
 net/mac80211/spectmgmt.c   |   18 ++++++------------
 5 files changed, 10 insertions(+), 17 deletions(-)

--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -815,7 +815,7 @@ ieee80211_ibss_process_chanswitch(struct
 
 	memset(&params, 0, sizeof(params));
 	memset(&csa_ie, 0, sizeof(csa_ie));
-	err = ieee80211_parse_ch_switch_ie(sdata, elems, beacon,
+	err = ieee80211_parse_ch_switch_ie(sdata, elems,
 					   ifibss->chandef.chan->band,
 					   sta_flags, ifibss->bssid, &csa_ie);
 	/* can't switch to destination channel, fail */
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1569,7 +1569,6 @@ void ieee80211_process_measurement_req(s
  * ieee80211_parse_ch_switch_ie - parses channel switch IEs
  * @sdata: the sdata of the interface which has received the frame
  * @elems: parsed 802.11 elements received with the frame
- * @beacon: indicates if the frame was a beacon or probe response
  * @current_band: indicates the current band
  * @sta_flags: contains information about own capabilities and restrictions
  *	to decide which channel switch announcements can be accepted. Only the
@@ -1583,7 +1582,7 @@ void ieee80211_process_measurement_req(s
  * Return: 0 on success, <0 on error and >0 if there is nothing to parse.
  */
 int ieee80211_parse_ch_switch_ie(struct ieee80211_sub_if_data *sdata,
-				 struct ieee802_11_elems *elems, bool beacon,
+				 struct ieee802_11_elems *elems,
 				 enum ieee80211_band current_band,
 				 u32 sta_flags, u8 *bssid,
 				 struct ieee80211_csa_ie *csa_ie);
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -885,7 +885,7 @@ ieee80211_mesh_process_chnswitch(struct
 
 	memset(&params, 0, sizeof(params));
 	memset(&csa_ie, 0, sizeof(csa_ie));
-	err = ieee80211_parse_ch_switch_ie(sdata, elems, beacon, band,
+	err = ieee80211_parse_ch_switch_ie(sdata, elems, band,
 					   sta_flags, sdata->vif.addr,
 					   &csa_ie);
 	if (err < 0)
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1001,7 +1001,7 @@ ieee80211_sta_process_chanswitch(struct
 
 	current_band = cbss->channel->band;
 	memset(&csa_ie, 0, sizeof(csa_ie));
-	res = ieee80211_parse_ch_switch_ie(sdata, elems, beacon, current_band,
+	res = ieee80211_parse_ch_switch_ie(sdata, elems, current_band,
 					   ifmgd->flags,
 					   ifmgd->associated->bssid, &csa_ie);
 	if (res	< 0)
--- a/net/mac80211/spectmgmt.c
+++ b/net/mac80211/spectmgmt.c
@@ -22,7 +22,7 @@
 #include "wme.h"
 
 int ieee80211_parse_ch_switch_ie(struct ieee80211_sub_if_data *sdata,
-				 struct ieee802_11_elems *elems, bool beacon,
+				 struct ieee802_11_elems *elems,
 				 enum ieee80211_band current_band,
 				 u32 sta_flags, u8 *bssid,
 				 struct ieee80211_csa_ie *csa_ie)
@@ -91,19 +91,13 @@ int ieee80211_parse_ch_switch_ie(struct
 		return -EINVAL;
 	}
 
-	if (!beacon && sec_chan_offs) {
+	if (sec_chan_offs) {
 		secondary_channel_offset = sec_chan_offs->sec_chan_offs;
-	} else if (beacon && ht_oper) {
-		secondary_channel_offset =
-			ht_oper->ht_param & IEEE80211_HT_PARAM_CHA_SEC_OFFSET;
 	} else if (!(sta_flags & IEEE80211_STA_DISABLE_HT)) {
-		/* If it's not a beacon, HT is enabled and the IE not present,
-		 * it's 20 MHz, 802.11-2012 8.5.2.6:
-		 *	This element [the Secondary Channel Offset Element] is
-		 *	present when switching to a 40 MHz channel. It may be
-		 *	present when switching to a 20 MHz channel (in which
-		 *	case the secondary channel offset is set to SCN).
-		 */
+		/* If the secondary channel offset IE is not present,
+		 * we can't know what's the post-CSA offset, so the
+		 * best we can do is use 20MHz.
+		*/
 		secondary_channel_offset = IEEE80211_HT_PARAM_CHA_SEC_NONE;
 	}
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 036/122] mac80211: schedule the actual switch of the station before CSA count 0
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 035/122] mac80211: use secondary channel offset IE also beacons during CSA Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 037/122] mac80211: fix use-after-free in defragmentation Greg Kroah-Hartman
                   ` (81 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jouni Malinen, Luciano Coelho, Johannes Berg

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Luciano Coelho <luciano.coelho@intel.com>

commit ff1e417c7c239b7abfe70aa90460a77eaafc7f83 upstream.

Due to the time it takes to process the beacon that started the CSA
process, we may be late for the switch if we try to reach exactly
beacon 0.  To avoid that, use count - 1 when calculating the switch time.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/mac80211/mlme.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1086,7 +1086,8 @@ ieee80211_sta_process_chanswitch(struct
 		ieee80211_queue_work(&local->hw, &ifmgd->chswitch_work);
 	else
 		mod_timer(&ifmgd->chswitch_timer,
-			  TU_TO_EXP_TIME(csa_ie.count * cbss->beacon_interval));
+			  TU_TO_EXP_TIME((csa_ie.count - 1) *
+					 cbss->beacon_interval));
 }
 
 static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 037/122] mac80211: fix use-after-free in defragmentation
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 036/122] mac80211: schedule the actual switch of the station before CSA count 0 Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 038/122] drm/radeon: set correct CE ram size for CIK Greg Kroah-Hartman
                   ` (80 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Yosef Khyal, Johannes Berg

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Berg <johannes.berg@intel.com>

commit b8fff407a180286aa683d543d878d98d9fc57b13 upstream.

Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.

Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.

Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/mac80211/rx.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1679,11 +1679,14 @@ ieee80211_rx_h_defragment(struct ieee802
 	sc = le16_to_cpu(hdr->seq_ctrl);
 	frag = sc & IEEE80211_SCTL_FRAG;
 
-	if (likely((!ieee80211_has_morefrags(fc) && frag == 0) ||
-		   is_multicast_ether_addr(hdr->addr1))) {
-		/* not fragmented */
+	if (likely(!ieee80211_has_morefrags(fc) && frag == 0))
+		goto out;
+
+	if (is_multicast_ether_addr(hdr->addr1)) {
+		rx->local->dot11MulticastReceivedFrameCount++;
 		goto out;
 	}
+
 	I802_DEBUG_INC(rx->local->rx_handlers_fragments);
 
 	if (skb_linearize(rx->skb))
@@ -1776,10 +1779,7 @@ ieee80211_rx_h_defragment(struct ieee802
  out:
 	if (rx->sta)
 		rx->sta->rx_packets++;
-	if (is_multicast_ether_addr(hdr->addr1))
-		rx->local->dot11MulticastReceivedFrameCount++;
-	else
-		ieee80211_led_rx(rx->local);
+	ieee80211_led_rx(rx->local);
 	return RX_CONTINUE;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 038/122] drm/radeon: set correct CE ram size for CIK
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 037/122] mac80211: fix use-after-free in defragmentation Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 039/122] drm/radeon: make sure mode init is complete in bandwidth_update Greg Kroah-Hartman
                   ` (79 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jammy Zhou, Alex Deucher

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jammy Zhou <Jammy.Zhou@amd.com>

commit dc4edad6530a9b7b66c3d905e2bc06021a05dcad upstream.

CE ram size is 32k/0k/0k for GFX/CS0/CS1 with CIK

Ported from amdgpu driver.

Signed-off-by: Jammy Zhou <Jammy.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/radeon/cik.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -3936,8 +3936,8 @@ static int cik_cp_gfx_start(struct radeo
 	/* init the CE partitions.  CE only used for gfx on CIK */
 	radeon_ring_write(ring, PACKET3(PACKET3_SET_BASE, 2));
 	radeon_ring_write(ring, PACKET3_BASE_INDEX(CE_PARTITION_BASE));
-	radeon_ring_write(ring, 0xc000);
-	radeon_ring_write(ring, 0xc000);
+	radeon_ring_write(ring, 0x8000);
+	radeon_ring_write(ring, 0x8000);
 
 	/* setup clear context state */
 	radeon_ring_write(ring, PACKET3(PACKET3_PREAMBLE_CNTL, 0));



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 039/122] drm/radeon: make sure mode init is complete in bandwidth_update
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 038/122] drm/radeon: set correct CE ram size for CIK Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 040/122] drm/radeon: add missing crtc unlock when setting up the MC Greg Kroah-Hartman
                   ` (78 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alex Deucher

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Deucher <alexander.deucher@amd.com>

commit 8efe82ca908400785253c8f0dfcf301e6bd93488 upstream.

The power management code calls into the display code for
certain things.  If certain power management sysfs attributes
are called before the driver has finished initializing all of
the hardware we can run into problems with uninitialized
modesetting state.  Add a check to make sure modesetting
init has completed to the bandwidth update callbacks to
fix this.  Can be triggered by the tlp and laptop start
up scripts depending on the timing.

bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=83611
https://bugs.freedesktop.org/show_bug.cgi?id=85771

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/radeon/cik.c       |    3 +++
 drivers/gpu/drm/radeon/evergreen.c |    3 +++
 drivers/gpu/drm/radeon/r100.c      |    3 +++
 drivers/gpu/drm/radeon/rs600.c     |    3 +++
 drivers/gpu/drm/radeon/rs690.c     |    3 +++
 drivers/gpu/drm/radeon/rv515.c     |    3 +++
 drivers/gpu/drm/radeon/si.c        |    3 +++
 7 files changed, 21 insertions(+)

--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -8893,6 +8893,9 @@ void dce8_bandwidth_update(struct radeon
 	u32 num_heads = 0, lb_size;
 	int i;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	for (i = 0; i < rdev->num_crtc; i++) {
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2362,6 +2362,9 @@ void evergreen_bandwidth_update(struct r
 	u32 num_heads = 0, lb_size;
 	int i;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	for (i = 0; i < rdev->num_crtc; i++) {
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3219,6 +3219,9 @@ void r100_bandwidth_update(struct radeon
 	uint32_t pixel_bytes1 = 0;
 	uint32_t pixel_bytes2 = 0;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	if (rdev->mode_info.crtcs[0]->base.enabled) {
--- a/drivers/gpu/drm/radeon/rs600.c
+++ b/drivers/gpu/drm/radeon/rs600.c
@@ -890,6 +890,9 @@ void rs600_bandwidth_update(struct radeo
 	u32 d1mode_priority_a_cnt, d2mode_priority_a_cnt;
 	/* FIXME: implement full support */
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	if (rdev->mode_info.crtcs[0]->base.enabled)
--- a/drivers/gpu/drm/radeon/rs690.c
+++ b/drivers/gpu/drm/radeon/rs690.c
@@ -579,6 +579,9 @@ void rs690_bandwidth_update(struct radeo
 	u32 d1mode_priority_a_cnt, d1mode_priority_b_cnt;
 	u32 d2mode_priority_a_cnt, d2mode_priority_b_cnt;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	if (rdev->mode_info.crtcs[0]->base.enabled)
--- a/drivers/gpu/drm/radeon/rv515.c
+++ b/drivers/gpu/drm/radeon/rv515.c
@@ -1276,6 +1276,9 @@ void rv515_bandwidth_update(struct radeo
 	struct drm_display_mode *mode0 = NULL;
 	struct drm_display_mode *mode1 = NULL;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	if (rdev->mode_info.crtcs[0]->base.enabled)
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -2227,6 +2227,9 @@ void dce6_bandwidth_update(struct radeon
 	u32 num_heads = 0, lb_size;
 	int i;
 
+	if (!rdev->mode_info.mode_config_initialized)
+		return;
+
 	radeon_update_display_priority(rdev);
 
 	for (i = 0; i < rdev->num_crtc; i++) {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 040/122] drm/radeon: add missing crtc unlock when setting up the MC
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 039/122] drm/radeon: make sure mode init is complete in bandwidth_update Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 042/122] ARM: 8191/1: decompressor: ensure I-side picks up relocated code Greg Kroah-Hartman
                   ` (77 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alex Deucher

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alex Deucher <alexander.deucher@amd.com>

commit f0d7bfb9407fccb6499ec01c33afe43512a439a2 upstream.

Need to unlock the crtc after updating the blanking state.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/radeon/evergreen.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -2573,6 +2573,7 @@ void evergreen_mc_stop(struct radeon_dev
 					WREG32(EVERGREEN_CRTC_UPDATE_LOCK + crtc_offsets[i], 1);
 					tmp |= EVERGREEN_CRTC_BLANK_DATA_EN;
 					WREG32(EVERGREEN_CRTC_BLANK_CONTROL + crtc_offsets[i], tmp);
+					WREG32(EVERGREEN_CRTC_UPDATE_LOCK + crtc_offsets[i], 0);
 				}
 			} else {
 				tmp = RREG32(EVERGREEN_CRTC_CONTROL + crtc_offsets[i]);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 042/122] ARM: 8191/1: decompressor: ensure I-side picks up relocated code
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 040/122] drm/radeon: add missing crtc unlock when setting up the MC Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 043/122] pinctrl: dra: dt-bindings: Fix output pull up/down Greg Kroah-Hartman
                   ` (76 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Marc Carino, Julien Grall,
	Will Deacon, Russell King

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Will Deacon <will.deacon@arm.com>

commit 238962ac71910d6c20162ea5230685fead1836a4 upstream.

To speed up decompression, the decompressor sets up a flat, cacheable
mapping of memory. However, when there is insufficient space to hold
the page tables for this mapping, we don't bother to enable the caches
and subsequently skip all the cache maintenance hooks.

Skipping the cache maintenance before jumping to the relocated code
allows the processor to predict the branch and populate the I-cache
with stale data before the relocation loop has completed (since a
bootloader may have SCTLR.I set, which permits normal, cacheable
instruction fetches regardless of SCTLR.M).

This patch moves the cache maintenance check into the maintenance
routines themselves, allowing the v6/v7 versions to invalidate the
I-cache regardless of the MMU state.

Reported-by: Marc Carino <marc.ceeeee@gmail.com>
Tested-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm/boot/compressed/head.S |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -400,8 +400,7 @@ dtb_check_done:
 		add	sp, sp, r6
 #endif
 
-		tst	r4, #1
-		bleq	cache_clean_flush
+		bl	cache_clean_flush
 
 		adr	r0, BSYM(restart)
 		add	r0, r0, r6
@@ -1050,6 +1049,8 @@ cache_clean_flush:
 		b	call_cache_fn
 
 __armv4_mpu_cache_flush:
+		tst	r4, #1
+		movne	pc, lr
 		mov	r2, #1
 		mov	r3, #0
 		mcr	p15, 0, ip, c7, c6, 0	@ invalidate D cache
@@ -1067,6 +1068,8 @@ __armv4_mpu_cache_flush:
 		mov	pc, lr
 		
 __fa526_cache_flush:
+		tst	r4, #1
+		movne	pc, lr
 		mov	r1, #0
 		mcr	p15, 0, r1, c7, c14, 0	@ clean and invalidate D cache
 		mcr	p15, 0, r1, c7, c5, 0	@ flush I cache
@@ -1075,13 +1078,16 @@ __fa526_cache_flush:
 
 __armv6_mmu_cache_flush:
 		mov	r1, #0
-		mcr	p15, 0, r1, c7, c14, 0	@ clean+invalidate D
+		tst	r4, #1
+		mcreq	p15, 0, r1, c7, c14, 0	@ clean+invalidate D
 		mcr	p15, 0, r1, c7, c5, 0	@ invalidate I+BTB
-		mcr	p15, 0, r1, c7, c15, 0	@ clean+invalidate unified
+		mcreq	p15, 0, r1, c7, c15, 0	@ clean+invalidate unified
 		mcr	p15, 0, r1, c7, c10, 4	@ drain WB
 		mov	pc, lr
 
 __armv7_mmu_cache_flush:
+		tst	r4, #1
+		bne	iflush
 		mrc	p15, 0, r10, c0, c1, 5	@ read ID_MMFR1
 		tst	r10, #0xf << 16		@ hierarchical cache (ARMv7)
 		mov	r10, #0
@@ -1142,6 +1148,8 @@ iflush:
 		mov	pc, lr
 
 __armv5tej_mmu_cache_flush:
+		tst	r4, #1
+		movne	pc, lr
 1:		mrc	p15, 0, r15, c7, c14, 3	@ test,clean,invalidate D cache
 		bne	1b
 		mcr	p15, 0, r0, c7, c5, 0	@ flush I cache
@@ -1149,6 +1157,8 @@ __armv5tej_mmu_cache_flush:
 		mov	pc, lr
 
 __armv4_mmu_cache_flush:
+		tst	r4, #1
+		movne	pc, lr
 		mov	r2, #64*1024		@ default: 32K dcache size (*2)
 		mov	r11, #32		@ default: 32 byte line size
 		mrc	p15, 0, r3, c0, c0, 1	@ read cache type
@@ -1182,6 +1192,8 @@ no_cache_id:
 
 __armv3_mmu_cache_flush:
 __armv3_mpu_cache_flush:
+		tst	r4, #1
+		movne	pc, lr
 		mov	r1, #0
 		mcr	p15, 0, r1, c7, c0, 0	@ invalidate whole cache v3
 		mov	pc, lr



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 043/122] pinctrl: dra: dt-bindings: Fix output pull up/down
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 042/122] ARM: 8191/1: decompressor: ensure I-side picks up relocated code Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 044/122] dm thin: grab a virtual cell before looking up the mapping Greg Kroah-Hartman
                   ` (75 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Roger Quadros, Nishanth Menon, Tony Lindgren

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roger Quadros <rogerq@ti.com>

commit 73b3a6657a88ef5348a0d69c9a8107d6f01ae862 upstream.

For PIN_OUTPUT_PULLUP and PIN_OUTPUT_PULLDOWN we must not set the
PULL_DIS bit which disables the PULLs.

PULL_ENA is a 0 and using it in an OR operation is a NOP, so don't
use it in the PIN_OUTPUT_PULLUP/DOWN macros.

Fixes: 23d9cec07c58 ("pinctrl: dra: dt-bindings: Fix pull enable/disable")

Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/dt-bindings/pinctrl/dra.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/dt-bindings/pinctrl/dra.h
+++ b/include/dt-bindings/pinctrl/dra.h
@@ -40,8 +40,8 @@
 
 /* Active pin states */
 #define PIN_OUTPUT		(0 | PULL_DIS)
-#define PIN_OUTPUT_PULLUP	(PIN_OUTPUT | PULL_ENA | PULL_UP)
-#define PIN_OUTPUT_PULLDOWN	(PIN_OUTPUT | PULL_ENA)
+#define PIN_OUTPUT_PULLUP	(PULL_UP)
+#define PIN_OUTPUT_PULLDOWN	(0)
 #define PIN_INPUT		(INPUT_EN | PULL_DIS)
 #define PIN_INPUT_SLEW		(INPUT_EN | SLEWCONTROL)
 #define PIN_INPUT_PULLUP	(PULL_ENA | INPUT_EN | PULL_UP)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 044/122] dm thin: grab a virtual cell before looking up the mapping
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 043/122] pinctrl: dra: dt-bindings: Fix output pull up/down Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 046/122] firewire: cdev: prevent kernel stack leaking into ioctl arguments Greg Kroah-Hartman
                   ` (74 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Joe Thornber, Mike Snitzer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joe Thornber <ejt@redhat.com>

commit c822ed967cba38505713d59ed40a114386ef6c01 upstream.

Avoids normal IO racing with discard.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-thin.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1704,6 +1704,14 @@ static int thin_bio_map(struct dm_target
 		return DM_MAPIO_SUBMITTED;
 	}
 
+	/*
+	 * We must hold the virtual cell before doing the lookup, otherwise
+	 * there's a race with discard.
+	 */
+	build_virtual_key(tc->td, block, &key);
+	if (dm_bio_detain(tc->pool->prison, &key, bio, &cell1, &cell_result))
+		return DM_MAPIO_SUBMITTED;
+
 	r = dm_thin_find_block(td, block, 0, &result);
 
 	/*
@@ -1727,13 +1735,10 @@ static int thin_bio_map(struct dm_target
 			 * shared flag will be set in their case.
 			 */
 			thin_defer_bio(tc, bio);
+			cell_defer_no_holder_no_free(tc, &cell1);
 			return DM_MAPIO_SUBMITTED;
 		}
 
-		build_virtual_key(tc->td, block, &key);
-		if (dm_bio_detain(tc->pool->prison, &key, bio, &cell1, &cell_result))
-			return DM_MAPIO_SUBMITTED;
-
 		build_data_key(tc->td, result.block, &key);
 		if (dm_bio_detain(tc->pool->prison, &key, bio, &cell2, &cell_result)) {
 			cell_defer_no_holder_no_free(tc, &cell1);
@@ -1754,6 +1759,7 @@ static int thin_bio_map(struct dm_target
 			 * of doing so.
 			 */
 			handle_unserviceable_bio(tc->pool, bio);
+			cell_defer_no_holder_no_free(tc, &cell1);
 			return DM_MAPIO_SUBMITTED;
 		}
 		/* fall through */
@@ -1764,6 +1770,7 @@ static int thin_bio_map(struct dm_target
 		 * provide the hint to load the metadata into cache.
 		 */
 		thin_defer_bio(tc, bio);
+		cell_defer_no_holder_no_free(tc, &cell1);
 		return DM_MAPIO_SUBMITTED;
 
 	default:
@@ -1773,6 +1780,7 @@ static int thin_bio_map(struct dm_target
 		 * pool is switched to fail-io mode.
 		 */
 		bio_io_error(bio);
+		cell_defer_no_holder_no_free(tc, &cell1);
 		return DM_MAPIO_SUBMITTED;
 	}
 }



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 046/122] firewire: cdev: prevent kernel stack leaking into ioctl arguments
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 044/122] dm thin: grab a virtual cell before looking up the mapping Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 047/122] ata: sata_rcar: Disable DIPM mode for r8a7790 ES1 Greg Kroah-Hartman
                   ` (73 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, David Ramos, Stefan Richter

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Stefan Richter <stefanr@s5r6.in-berlin.de>

commit eaca2d8e75e90a70a63a6695c9f61932609db212 upstream.

Found by the UC-KLEE tool:  A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect.  The handlers used data from uninitialized kernel stack then.

This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.

The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.

The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input.  That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call.  [Comment from Clemens Ladisch:  This part of the stack is
most likely to be already in the cache.]

Remarks:
  - There was never any leak from kernel stack to the ioctl output
    buffer itself.  IOW, it was not possible to read kernel stack by a
    read-type or write/read-type ioctl alone; the leak could at most
    happen in combination with read()ing subsequent event data.
  - The actual expected minimum user input of each ioctl from
    include/uapi/linux/firewire-cdev.h is, in bytes:
    [0x00] = 32, [0x05] =  4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
    [0x01] = 36, [0x06] = 20, [0x0b] =  4, [0x10] = 20, [0x15] = 20,
    [0x02] = 20, [0x07] =  4, [0x0c] =  0, [0x11] =  0, [0x16] =  8,
    [0x03] =  4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
    [0x04] = 20, [0x09] = 24, [0x0e] =  4, [0x13] = 40, [0x18] =  4.

Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/firewire/core-cdev.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1637,8 +1637,7 @@ static int dispatch_ioctl(struct client
 	    _IOC_SIZE(cmd) > sizeof(buffer))
 		return -ENOTTY;
 
-	if (_IOC_DIR(cmd) == _IOC_READ)
-		memset(&buffer, 0, _IOC_SIZE(cmd));
+	memset(&buffer, 0, sizeof(buffer));
 
 	if (_IOC_DIR(cmd) & _IOC_WRITE)
 		if (copy_from_user(&buffer, arg, _IOC_SIZE(cmd)))



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 047/122] ata: sata_rcar: Disable DIPM mode for r8a7790 ES1
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 046/122] firewire: cdev: prevent kernel stack leaking into ioctl arguments Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 048/122] nfs: fix pnfs direct write memory leak Greg Kroah-Hartman
                   ` (72 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Simon Horman, Tejun Heo

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Simon Horman <horms+renesas@verge.net.au>

commit aa1cf25887099bba68f1f3879c0d394e08b8779f upstream.

Unlike other SATA R-Car r8a7790 controllers the r8a7790 ES1 SATA R-Car
controller needs to be run with DIPM disabled.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/devicetree/bindings/ata/sata_rcar.txt |    3 ++-
 drivers/ata/sata_rcar.c                             |   10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

--- a/Documentation/devicetree/bindings/ata/sata_rcar.txt
+++ b/Documentation/devicetree/bindings/ata/sata_rcar.txt
@@ -3,7 +3,8 @@
 Required properties:
 - compatible		: should contain one of the following:
 			  - "renesas,sata-r8a7779" for R-Car H1
-			  - "renesas,sata-r8a7790" for R-Car H2
+			  - "renesas,sata-r8a7790-es1" for R-Car H2 ES1
+			  - "renesas,sata-r8a7790" for R-Car H2 other than ES1
 			  - "renesas,sata-r8a7791" for R-Car M2
 - reg			: address and length of the SATA registers;
 - interrupts		: must consist of one interrupt specifier.
--- a/drivers/ata/sata_rcar.c
+++ b/drivers/ata/sata_rcar.c
@@ -146,6 +146,7 @@
 enum sata_rcar_type {
 	RCAR_GEN1_SATA,
 	RCAR_GEN2_SATA,
+	RCAR_R8A7790_ES1_SATA,
 };
 
 struct sata_rcar_priv {
@@ -763,6 +764,9 @@ static void sata_rcar_setup_port(struct
 	ap->udma_mask	= ATA_UDMA6;
 	ap->flags	|= ATA_FLAG_SATA;
 
+	if (priv->type == RCAR_R8A7790_ES1_SATA)
+		ap->flags	|= ATA_FLAG_NO_DIPM;
+
 	ioaddr->cmd_addr = base + SDATA_REG;
 	ioaddr->ctl_addr = base + SSDEVCON_REG;
 	ioaddr->scr_addr = base + SCRSSTS_REG;
@@ -792,6 +796,7 @@ static void sata_rcar_init_controller(st
 		sata_rcar_gen1_phy_init(priv);
 		break;
 	case RCAR_GEN2_SATA:
+	case RCAR_R8A7790_ES1_SATA:
 		sata_rcar_gen2_phy_init(priv);
 		break;
 	default:
@@ -838,6 +843,10 @@ static struct of_device_id sata_rcar_mat
 		.data = (void *)RCAR_GEN2_SATA
 	},
 	{
+		.compatible = "renesas,sata-r8a7790-es1",
+		.data = (void *)RCAR_R8A7790_ES1_SATA
+	},
+	{
 		.compatible = "renesas,sata-r8a7791",
 		.data = (void *)RCAR_GEN2_SATA
 	},
@@ -849,6 +858,7 @@ static const struct platform_device_id s
 	{ "sata_rcar", RCAR_GEN1_SATA }, /* Deprecated by "sata-r8a7779" */
 	{ "sata-r8a7779", RCAR_GEN1_SATA },
 	{ "sata-r8a7790", RCAR_GEN2_SATA },
+	{ "sata-r8a7790-es1", RCAR_R8A7790_ES1_SATA },
 	{ "sata-r8a7791", RCAR_GEN2_SATA },
 	{ },
 };



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 048/122] nfs: fix pnfs direct write memory leak
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 047/122] ata: sata_rcar: Disable DIPM mode for r8a7790 ES1 Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 049/122] Correct the race condition in aarch64_insn_patch_text_sync() Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Peng Tao, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peng Tao <tao.peng@primarydata.com>

commit 8c393f9a721c30a030049a680e1bf896669bb279 upstream.

For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
So we need to take care to free them when freeing dreq.

Ideally this needs to be done inside layout driver where ds_cinfo.buckets
are allocated. But buckets are attached to dreq and reused across LD IO iterations.
So I feel it's OK to free them in the generic layer.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/direct.c         |    1 +
 include/linux/nfs_xdr.h |   11 +++++++++++
 2 files changed, 12 insertions(+)

--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -178,6 +178,7 @@ static void nfs_direct_req_free(struct k
 {
 	struct nfs_direct_req *dreq = container_of(kref, struct nfs_direct_req, kref);
 
+	nfs_free_pnfs_ds_cinfo(&dreq->ds_cinfo);
 	if (dreq->l_ctx != NULL)
 		nfs_put_lock_context(dreq->l_ctx);
 	if (dreq->ctx != NULL)
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1247,11 +1247,22 @@ struct nfs41_free_stateid_res {
 	unsigned int			status;
 };
 
+static inline void
+nfs_free_pnfs_ds_cinfo(struct pnfs_ds_commit_info *cinfo)
+{
+	kfree(cinfo->buckets);
+}
+
 #else
 
 struct pnfs_ds_commit_info {
 };
 
+static inline void
+nfs_free_pnfs_ds_cinfo(struct pnfs_ds_commit_info *cinfo)
+{
+}
+
 #endif /* CONFIG_NFS_V4_1 */
 
 struct nfs_page;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 049/122] Correct the race condition in aarch64_insn_patch_text_sync()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 048/122] nfs: fix pnfs direct write memory leak Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 050/122] scsi: only re-lock door after EH on devices that were reset Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, William Cohen, Will Deacon, Catalin Marinas

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: William Cohen <wcohen@redhat.com>

commit 899d5933b2dd2720f2b20b01eaa07871aa6ad096 upstream.

When experimenting with patches to provide kprobes support for aarch64
smp machines would hang when inserting breakpoints into kernel code.
The hangs were caused by a race condition in the code called by
aarch64_insn_patch_text_sync().  The first processor in the
aarch64_insn_patch_text_cb() function would patch the code while other
processors were still entering the function and incrementing the
cpu_count field.  This resulted in some processors never observing the
exit condition and exiting the function.  Thus, processors in the
system hung.

The first processor to enter the patching function performs the
patching and signals that the patching is complete with an increment
of the cpu_count field. When all the processors have incremented the
cpu_count field the cpu_count will be num_cpus_online()+1 and they
will return to normal execution.

Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code
Signed-off-by: William Cohen <wcohen@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm64/kernel/insn.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -156,9 +156,10 @@ static int __kprobes aarch64_insn_patch_
 		 * which ends with "dsb; isb" pair guaranteeing global
 		 * visibility.
 		 */
-		atomic_set(&pp->cpu_count, -1);
+		/* Notify other processors with an additional increment. */
+		atomic_inc(&pp->cpu_count);
 	} else {
-		while (atomic_read(&pp->cpu_count) != -1)
+		while (atomic_read(&pp->cpu_count) <= num_online_cpus())
 			cpu_relax();
 		isb();
 	}



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 050/122] scsi: only re-lock door after EH on devices that were reset
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 049/122] Correct the race condition in aarch64_insn_patch_text_sync() Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 051/122] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Meelis Roos,
	Martin K. Petersen

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 48379270fe6808cf4612ee094adc8da2b7a83baa upstream.

Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH.  Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state.  Otherwise the EH code might be get blocked
on blk_get_request as all requests for non-reset devices might be in use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Meelis Roos <meelis.roos@ut.ee>
Tested-by: Meelis Roos <meelis.roos@ut.ee>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/scsi_error.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1984,8 +1984,10 @@ static void scsi_restart_operations(stru
 	 * is no point trying to lock the door of an off-line device.
 	 */
 	shost_for_each_device(sdev, shost) {
-		if (scsi_device_online(sdev) && sdev->locked)
+		if (scsi_device_online(sdev) && sdev->was_reset && sdev->locked) {
 			scsi_eh_lock_door(sdev);
+			sdev->was_reset = 0;
+		}
 	}
 
 	/*



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 051/122] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 050/122] scsi: only re-lock door after EH on devices that were reset Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 052/122] block: Fix computation of merged request priority Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Helge Deller, John David Anglin

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit 2fe749f50b0bec07650ef135b29b1f55bf543869 upstream.

Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the debian procenv package, which called
	shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus lead to a segfault
later on.

Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32bit userspace up to now.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/include/uapi/asm/shmbuf.h |   25 +++++++++----------------
 arch/parisc/kernel/syscall_table.S    |    8 ++++----
 2 files changed, 13 insertions(+), 20 deletions(-)

--- a/arch/parisc/include/uapi/asm/shmbuf.h
+++ b/arch/parisc/include/uapi/asm/shmbuf.h
@@ -36,23 +36,16 @@ struct shmid64_ds {
 	unsigned int		__unused2;
 };
 
-#ifdef CONFIG_64BIT
-/* The 'unsigned int' (formerly 'unsigned long') data types below will
- * ensure that a 32-bit app calling shmctl(*,IPC_INFO,*) will work on
- * a wide kernel, but if some of these values are meant to contain pointers
- * they may need to be 'long long' instead. -PB XXX FIXME
- */
-#endif
 struct shminfo64 {
-	unsigned int	shmmax;
-	unsigned int	shmmin;
-	unsigned int	shmmni;
-	unsigned int	shmseg;
-	unsigned int	shmall;
-	unsigned int	__unused1;
-	unsigned int	__unused2;
-	unsigned int	__unused3;
-	unsigned int	__unused4;
+	unsigned long	shmmax;
+	unsigned long	shmmin;
+	unsigned long	shmmni;
+	unsigned long	shmseg;
+	unsigned long	shmall;
+	unsigned long	__unused1;
+	unsigned long	__unused2;
+	unsigned long	__unused3;
+	unsigned long	__unused4;
 };
 
 #endif /* _PARISC_SHMBUF_H */
--- a/arch/parisc/kernel/syscall_table.S
+++ b/arch/parisc/kernel/syscall_table.S
@@ -286,11 +286,11 @@
 	ENTRY_COMP(msgsnd)
 	ENTRY_COMP(msgrcv)
 	ENTRY_SAME(msgget)		/* 190 */
-	ENTRY_SAME(msgctl)
-	ENTRY_SAME(shmat)
+	ENTRY_COMP(msgctl)
+	ENTRY_COMP(shmat)
 	ENTRY_SAME(shmdt)
 	ENTRY_SAME(shmget)
-	ENTRY_SAME(shmctl)		/* 195 */
+	ENTRY_COMP(shmctl)		/* 195 */
 	ENTRY_SAME(ni_syscall)		/* streams1 */
 	ENTRY_SAME(ni_syscall)		/* streams2 */
 	ENTRY_SAME(lstat64)
@@ -323,7 +323,7 @@
 	ENTRY_SAME(epoll_ctl)		/* 225 */
 	ENTRY_SAME(epoll_wait)
  	ENTRY_SAME(remap_file_pages)
-	ENTRY_SAME(semtimedop)
+	ENTRY_COMP(semtimedop)
 	ENTRY_COMP(mq_open)
 	ENTRY_SAME(mq_unlink)		/* 230 */
 	ENTRY_COMP(mq_timedsend)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 052/122] block: Fix computation of merged request priority
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 051/122] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 053/122] dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Jeff Moyer, Jens Axboe

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit ece9c72accdc45c3a9484dacb1125ce572647288 upstream.

Priority of a merged request is computed by ioprio_best(). If one of the
requests has undefined priority (IOPRIO_CLASS_NONE) and another request
has priority from IOPRIO_CLASS_BE, the function will return the
undefined priority which is wrong. Fix the function to properly return
priority of a request with the defined priority.

Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ioprio.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/fs/ioprio.c
+++ b/fs/ioprio.c
@@ -157,14 +157,16 @@ out:
 
 int ioprio_best(unsigned short aprio, unsigned short bprio)
 {
-	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
-	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);
+	unsigned short aclass;
+	unsigned short bclass;
 
-	if (aclass == IOPRIO_CLASS_NONE)
-		aclass = IOPRIO_CLASS_BE;
-	if (bclass == IOPRIO_CLASS_NONE)
-		bclass = IOPRIO_CLASS_BE;
+	if (!ioprio_valid(aprio))
+		aprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
+	if (!ioprio_valid(bprio))
+		bprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM);
 
+	aclass = IOPRIO_PRIO_CLASS(aprio);
+	bclass = IOPRIO_PRIO_CLASS(bprio);
 	if (aclass == bclass)
 		return min(aprio, bprio);
 	if (aclass > bclass)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 053/122] dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 052/122] block: Fix computation of merged request priority Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 054/122] dm btree: fix a recursion depth bug in btree walking code Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Mikulas Patocka, Mike Snitzer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <mpatocka@redhat.com>

commit 9d28eb12447ee08bb5d1e8bb3195cf20e1ecd1c0 upstream.

The shrinker uses gfp flags to indicate what kind of operation can the
driver wait for. If __GFP_IO flag is present, the driver can wait for
block I/O operations, if __GFP_FS flag is present, the driver can wait on
operations involving the filesystem.

dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
device that makes calls into the filesystem. If __GFP_IO is present and
__GFP_FS isn't, dm-bufio could still block on filesystem operations if it
runs on a loop block device.

The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
unreproducible) deadlock involving dm-bufio and loop device.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-bufio.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1448,9 +1448,9 @@ static void drop_buffers(struct dm_bufio
 
 /*
  * Test if the buffer is unused and too old, and commit it.
- * At if noio is set, we must not do any I/O because we hold
- * dm_bufio_clients_lock and we would risk deadlock if the I/O gets rerouted to
- * different bufio client.
+ * And if GFP_NOFS is used, we must not do any I/O because we hold
+ * dm_bufio_clients_lock and we would risk deadlock if the I/O gets
+ * rerouted to different bufio client.
  */
 static int __cleanup_old_buffer(struct dm_buffer *b, gfp_t gfp,
 				unsigned long max_jiffies)
@@ -1458,7 +1458,7 @@ static int __cleanup_old_buffer(struct d
 	if (jiffies - b->last_accessed < max_jiffies)
 		return 0;
 
-	if (!(gfp & __GFP_IO)) {
+	if (!(gfp & __GFP_FS)) {
 		if (test_bit(B_READING, &b->state) ||
 		    test_bit(B_WRITING, &b->state) ||
 		    test_bit(B_DIRTY, &b->state))
@@ -1500,7 +1500,7 @@ dm_bufio_shrink_scan(struct shrinker *sh
 	unsigned long freed;
 
 	c = container_of(shrink, struct dm_bufio_client, shrinker);
-	if (sc->gfp_mask & __GFP_IO)
+	if (sc->gfp_mask & __GFP_FS)
 		dm_bufio_lock(c);
 	else if (!dm_bufio_trylock(c))
 		return SHRINK_STOP;
@@ -1517,7 +1517,7 @@ dm_bufio_shrink_count(struct shrinker *s
 	unsigned long count;
 
 	c = container_of(shrink, struct dm_bufio_client, shrinker);
-	if (sc->gfp_mask & __GFP_IO)
+	if (sc->gfp_mask & __GFP_FS)
 		dm_bufio_lock(c);
 	else if (!dm_bufio_trylock(c))
 		return 0;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 054/122] dm btree: fix a recursion depth bug in btree walking code
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 053/122] dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 055/122] dm raid: ensure superblocks size matches devices logical block size Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Joe Thornber, Mike Snitzer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joe Thornber <ejt@redhat.com>

commit 9b460d3699324d570a4d4161c3741431887f102f upstream.

The walk code was using a 'ro_spine' to hold it's locked btree nodes.
But this data structure is designed for the rolling lock scheme, and
as such automatically unlocks blocks that are two steps up the call
chain.  This is not suitable for the simple recursive walk algorithm,
which retraces its steps.

This code is only used by the persistent array code, which in turn is
only used by dm-cache.  In order to trigger it you need to have a
mapping tree that is more than 2 levels deep; which equates to 8-16
million cache blocks.  For instance a 4T ssd with a very small block
size of 32k only just triggers this bug.

The fix just places the locked blocks on the stack, and stops using
the ro_spine altogether.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/persistent-data/dm-btree-internal.h |    6 ++++++
 drivers/md/persistent-data/dm-btree-spine.c    |    2 +-
 drivers/md/persistent-data/dm-btree.c          |   24 ++++++++++--------------
 3 files changed, 17 insertions(+), 15 deletions(-)

--- a/drivers/md/persistent-data/dm-btree-internal.h
+++ b/drivers/md/persistent-data/dm-btree-internal.h
@@ -42,6 +42,12 @@ struct btree_node {
 } __packed;
 
 
+/*
+ * Locks a block using the btree node validator.
+ */
+int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
+		 struct dm_block **result);
+
 void inc_children(struct dm_transaction_manager *tm, struct btree_node *n,
 		  struct dm_btree_value_type *vt);
 
--- a/drivers/md/persistent-data/dm-btree-spine.c
+++ b/drivers/md/persistent-data/dm-btree-spine.c
@@ -92,7 +92,7 @@ struct dm_block_validator btree_node_val
 
 /*----------------------------------------------------------------*/
 
-static int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
+int bn_read_lock(struct dm_btree_info *info, dm_block_t b,
 		 struct dm_block **result)
 {
 	return dm_tm_read_lock(info->tm, b, &btree_node_validator, result);
--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -847,22 +847,26 @@ EXPORT_SYMBOL_GPL(dm_btree_find_lowest_k
  * FIXME: We shouldn't use a recursive algorithm when we have limited stack
  * space.  Also this only works for single level trees.
  */
-static int walk_node(struct ro_spine *s, dm_block_t block,
+static int walk_node(struct dm_btree_info *info, dm_block_t block,
 		     int (*fn)(void *context, uint64_t *keys, void *leaf),
 		     void *context)
 {
 	int r;
 	unsigned i, nr;
+	struct dm_block *node;
 	struct btree_node *n;
 	uint64_t keys;
 
-	r = ro_step(s, block);
-	n = ro_node(s);
+	r = bn_read_lock(info, block, &node);
+	if (r)
+		return r;
+
+	n = dm_block_data(node);
 
 	nr = le32_to_cpu(n->header.nr_entries);
 	for (i = 0; i < nr; i++) {
 		if (le32_to_cpu(n->header.flags) & INTERNAL_NODE) {
-			r = walk_node(s, value64(n, i), fn, context);
+			r = walk_node(info, value64(n, i), fn, context);
 			if (r)
 				goto out;
 		} else {
@@ -874,7 +878,7 @@ static int walk_node(struct ro_spine *s,
 	}
 
 out:
-	ro_pop(s);
+	dm_tm_unlock(info->tm, node);
 	return r;
 }
 
@@ -882,15 +886,7 @@ int dm_btree_walk(struct dm_btree_info *
 		  int (*fn)(void *context, uint64_t *keys, void *leaf),
 		  void *context)
 {
-	int r;
-	struct ro_spine spine;
-
 	BUG_ON(info->levels > 1);
-
-	init_ro_spine(&spine, info);
-	r = walk_node(&spine, root, fn, context);
-	exit_ro_spine(&spine);
-
-	return r;
+	return walk_node(info, root, fn, context);
 }
 EXPORT_SYMBOL_GPL(dm_btree_walk);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 055/122] dm raid: ensure superblocks size matches devices logical block size
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 054/122] dm btree: fix a recursion depth bug in btree walking code Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 056/122] Input: synaptics - add min/max quirk for Lenovo T440s Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Liuhua Wang, Heinz Mauelshagen,
	Dan Carpenter, Mike Snitzer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Heinz Mauelshagen <heinzm@redhat.com>

commit 40d43c4b4cac4c2647bf07110d7b07d35f399a84 upstream.

The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.

Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.

Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work.  Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.

[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" <lwang@suse.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm-raid.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -785,8 +785,7 @@ struct dm_raid_superblock {
 	__le32 layout;
 	__le32 stripe_sectors;
 
-	__u8 pad[452];		/* Round struct to 512 bytes. */
-				/* Always set to 0 when writing. */
+	/* Remainder of a logical block is zero-filled when writing (see super_sync()). */
 } __packed;
 
 static int read_disk_sb(struct md_rdev *rdev, int size)
@@ -823,7 +822,7 @@ static void super_sync(struct mddev *mdd
 		    test_bit(Faulty, &(rs->dev[i].rdev.flags)))
 			failed_devices |= (1ULL << i);
 
-	memset(sb, 0, sizeof(*sb));
+	memset(sb + 1, 0, rdev->sb_size - sizeof(*sb));
 
 	sb->magic = cpu_to_le32(DM_RAID_MAGIC);
 	sb->features = cpu_to_le32(0);	/* No features yet */
@@ -858,7 +857,11 @@ static int super_load(struct md_rdev *rd
 	uint64_t events_sb, events_refsb;
 
 	rdev->sb_start = 0;
-	rdev->sb_size = sizeof(*sb);
+	rdev->sb_size = bdev_logical_block_size(rdev->meta_bdev);
+	if (rdev->sb_size < sizeof(*sb) || rdev->sb_size > PAGE_SIZE) {
+		DMERR("superblock size of a logical block is no longer valid");
+		return -EINVAL;
+	}
 
 	ret = read_disk_sb(rdev, rdev->sb_size);
 	if (ret)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 056/122] Input: synaptics - add min/max quirk for Lenovo T440s
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 055/122] dm raid: ensure superblocks size matches devices logical block size Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 060/122] power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Takashi Iwai, Dmitry Torokhov

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit e4742b1e786ca386e88e6cfb2801e14e15e365cd upstream.

The new Lenovo T440s laptop has a different PnP ID "LEN0039", and it
needs the similar min/max quirk to make its clickpad working.

BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=903748
Reported-and-tested-by: Joschi Brauchle <joschibrauchle@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/input/mouse/synaptics.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -132,8 +132,8 @@ static const struct min_max_quirk min_ma
 		1232, 5710, 1156, 4696
 	},
 	{
-		(const char * const []){"LEN0034", "LEN0036", "LEN2002",
-					"LEN2004", NULL},
+		(const char * const []){"LEN0034", "LEN0036", "LEN0039",
+					"LEN2002", "LEN2004", NULL},
 		1024, 5112, 2024, 4832
 	},
 	{
@@ -160,6 +160,7 @@ static const char * const topbuttonpad_p
 	"LEN0036", /* T440 */
 	"LEN0037",
 	"LEN0038",
+	"LEN0039", /* T440s */
 	"LEN0041",
 	"LEN0042", /* Yoga */
 	"LEN0045",



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 060/122] power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 056/122] Input: synaptics - add min/max quirk for Lenovo T440s Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 061/122] power: charger-manager: Fix accessing invalidated power supply after charger unbind Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Krzysztof Kozlowski, Sebastian Reichel

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Krzysztof Kozlowski <k.kozlowski@samsung.com>

commit bdbe81445407644492b9ac69a24d35e3202d773b upstream.

The charger manager obtained reference to fuel gauge power supply in probe
with power_supply_get_by_name() for later usage. However if fuel gauge
driver was removed and re-added then this reference would point to old
power supply (from driver which was removed).

This lead to accessing old (and probably invalid) memory which could be
observed with:
$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/unbind
$ echo "12-0036" > /sys/bus/i2c/drivers/max17042/bind
$ cat /sys/devices/virtual/power_supply/battery/capacity
[  240.480084] INFO: task cat:1393 blocked for more than 120 seconds.
[  240.484799]       Not tainted 3.17.0-next-20141007-00028-ge60b6dd79570 #203
[  240.491782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.499589] cat             D c0469530     0  1393      1 0x00000000
[  240.505947] [<c0469530>] (__schedule) from [<c0469d3c>] (schedule_preempt_disabled+0x14/0x20)
[  240.514449] [<c0469d3c>] (schedule_preempt_disabled) from [<c046af08>] (mutex_lock_nested+0x1bc/0x458)
[  240.523736] [<c046af08>] (mutex_lock_nested) from [<c0287a98>] (regmap_read+0x30/0x60)
[  240.531647] [<c0287a98>] (regmap_read) from [<c032238c>] (max17042_get_property+0x2e8/0x350)
[  240.540055] [<c032238c>] (max17042_get_property) from [<c03247d8>] (charger_get_property+0x264/0x348)
[  240.549252] [<c03247d8>] (charger_get_property) from [<c0320764>] (power_supply_show_property+0x48/0x1e0)
[  240.558808] [<c0320764>] (power_supply_show_property) from [<c027308c>] (dev_attr_show+0x1c/0x48)
[  240.567664] [<c027308c>] (dev_attr_show) from [<c0141fb0>] (sysfs_kf_seq_show+0x84/0x104)
[  240.575814] [<c0141fb0>] (sysfs_kf_seq_show) from [<c0140b18>] (kernfs_seq_show+0x24/0x28)
[  240.584061] [<c0140b18>] (kernfs_seq_show) from [<c0104574>] (seq_read+0x1b0/0x484)
[  240.591702] [<c0104574>] (seq_read) from [<c00e1e24>] (vfs_read+0x88/0x144)
[  240.598640] [<c00e1e24>] (vfs_read) from [<c00e1f20>] (SyS_read+0x40/0x8c)
[  240.605507] [<c00e1f20>] (SyS_read) from [<c000e760>] (ret_fast_syscall+0x0/0x48)
[  240.612952] 4 locks held by cat/1393:
[  240.616589]  #0:  (&p->lock){+.+.+.}, at: [<c01043f4>] seq_read+0x30/0x484
[  240.623414]  #1:  (&of->mutex){+.+.+.}, at: [<c01417dc>] kernfs_seq_start+0x1c/0x8c
[  240.631086]  #2:  (s_active#31){++++.+}, at: [<c01417e4>] kernfs_seq_start+0x24/0x8c
[  240.638777]  #3:  (&map->mutex){+.+...}, at: [<c0287a98>] regmap_read+0x30/0x60

The charger-manager should get reference to fuel gauge power supply on
each use of get_property callback. The thermal zone 'tzd' field of
power supply should not be used because of the same reason.

Additionally this change solves also the issue with nested
thermal_zone_get_temp() calls and related false lockdep positive for
deadlock for thermal zone's mutex [1]. When fuel gauge is used as source of
temperature then the charger manager forwards its get_temp calls to fuel
gauge thermal zone. So actually different mutexes are used (one for
charger manager thermal zone and second for fuel gauge thermal zone) but
for lockdep this is one class of mutex.

The recursion is removed by retrieving temperature through power
supply's get_property().

In case external thermal zone is used ('cm-thermal-zone' property is
present in DTS) the recursion does not exist. Charger manager simply
exports POWER_SUPPLY_PROP_TEMP_AMBIENT property (instead of
POWER_SUPPLY_PROP_TEMP) thus no thermal zone is created for this power
supply.

[1] https://lkml.org/lkml/2014/10/6/309

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/power/charger-manager.c       |   99 ++++++++++++++++++++++++----------
 include/linux/power/charger-manager.h |    1 
 2 files changed, 71 insertions(+), 29 deletions(-)

--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -97,6 +97,7 @@ static struct charger_global_desc *g_des
 static bool is_batt_present(struct charger_manager *cm)
 {
 	union power_supply_propval val;
+	struct power_supply *psy;
 	bool present = false;
 	int i, ret;
 
@@ -107,7 +108,11 @@ static bool is_batt_present(struct charg
 	case CM_NO_BATTERY:
 		break;
 	case CM_FUEL_GAUGE:
-		ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+		psy = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+		if (!psy)
+			break;
+
+		ret = psy->get_property(psy,
 				POWER_SUPPLY_PROP_PRESENT, &val);
 		if (ret == 0 && val.intval)
 			present = true;
@@ -167,12 +172,14 @@ static bool is_ext_pwr_online(struct cha
 static int get_batt_uV(struct charger_manager *cm, int *uV)
 {
 	union power_supply_propval val;
+	struct power_supply *fuel_gauge;
 	int ret;
 
-	if (!cm->fuel_gauge)
+	fuel_gauge = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+	if (!fuel_gauge)
 		return -ENODEV;
 
-	ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+	ret = fuel_gauge->get_property(fuel_gauge,
 				POWER_SUPPLY_PROP_VOLTAGE_NOW, &val);
 	if (ret)
 		return ret;
@@ -248,6 +255,7 @@ static bool is_full_charged(struct charg
 {
 	struct charger_desc *desc = cm->desc;
 	union power_supply_propval val;
+	struct power_supply *fuel_gauge;
 	int ret = 0;
 	int uV;
 
@@ -255,11 +263,15 @@ static bool is_full_charged(struct charg
 	if (!is_batt_present(cm))
 		return false;
 
-	if (cm->fuel_gauge && desc->fullbatt_full_capacity > 0) {
+	fuel_gauge = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+	if (!fuel_gauge)
+		return false;
+
+	if (desc->fullbatt_full_capacity > 0) {
 		val.intval = 0;
 
 		/* Not full if capacity of fuel gauge isn't full */
-		ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+		ret = fuel_gauge->get_property(fuel_gauge,
 				POWER_SUPPLY_PROP_CHARGE_FULL, &val);
 		if (!ret && val.intval > desc->fullbatt_full_capacity)
 			return true;
@@ -273,10 +285,10 @@ static bool is_full_charged(struct charg
 	}
 
 	/* Full, if the capacity is more than fullbatt_soc */
-	if (cm->fuel_gauge && desc->fullbatt_soc > 0) {
+	if (desc->fullbatt_soc > 0) {
 		val.intval = 0;
 
-		ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+		ret = fuel_gauge->get_property(fuel_gauge,
 				POWER_SUPPLY_PROP_CAPACITY, &val);
 		if (!ret && val.intval >= desc->fullbatt_soc)
 			return true;
@@ -551,6 +563,20 @@ static int check_charging_duration(struc
 	return ret;
 }
 
+static int cm_get_battery_temperature_by_psy(struct charger_manager *cm,
+					int *temp)
+{
+	struct power_supply *fuel_gauge;
+
+	fuel_gauge = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+	if (!fuel_gauge)
+		return -ENODEV;
+
+	return fuel_gauge->get_property(fuel_gauge,
+				POWER_SUPPLY_PROP_TEMP,
+				(union power_supply_propval *)temp);
+}
+
 static int cm_get_battery_temperature(struct charger_manager *cm,
 					int *temp)
 {
@@ -560,15 +586,18 @@ static int cm_get_battery_temperature(st
 		return -ENODEV;
 
 #ifdef CONFIG_THERMAL
-	ret = thermal_zone_get_temp(cm->tzd_batt, (unsigned long *)temp);
-	if (!ret)
-		/* Calibrate temperature unit */
-		*temp /= 100;
-#else
-	ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
-				POWER_SUPPLY_PROP_TEMP,
-				(union power_supply_propval *)temp);
+	if (cm->tzd_batt) {
+		ret = thermal_zone_get_temp(cm->tzd_batt, (unsigned long *)temp);
+		if (!ret)
+			/* Calibrate temperature unit */
+			*temp /= 100;
+	} else
 #endif
+	{
+		/* if-else continued from CONFIG_THERMAL */
+		ret = cm_get_battery_temperature_by_psy(cm, temp);
+	}
+
 	return ret;
 }
 
@@ -827,6 +856,7 @@ static int charger_get_property(struct p
 	struct charger_manager *cm = container_of(psy,
 			struct charger_manager, charger_psy);
 	struct charger_desc *desc = cm->desc;
+	struct power_supply *fuel_gauge;
 	int ret = 0;
 	int uV;
 
@@ -857,14 +887,20 @@ static int charger_get_property(struct p
 		ret = get_batt_uV(cm, &val->intval);
 		break;
 	case POWER_SUPPLY_PROP_CURRENT_NOW:
-		ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+		fuel_gauge = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+		if (!fuel_gauge) {
+			ret = -ENODEV;
+			break;
+		}
+		ret = fuel_gauge->get_property(fuel_gauge,
 				POWER_SUPPLY_PROP_CURRENT_NOW, val);
 		break;
 	case POWER_SUPPLY_PROP_TEMP:
 	case POWER_SUPPLY_PROP_TEMP_AMBIENT:
 		return cm_get_battery_temperature(cm, &val->intval);
 	case POWER_SUPPLY_PROP_CAPACITY:
-		if (!cm->fuel_gauge) {
+		fuel_gauge = power_supply_get_by_name(cm->desc->psy_fuel_gauge);
+		if (!fuel_gauge) {
 			ret = -ENODEV;
 			break;
 		}
@@ -875,7 +911,7 @@ static int charger_get_property(struct p
 			break;
 		}
 
-		ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+		ret = fuel_gauge->get_property(fuel_gauge,
 					POWER_SUPPLY_PROP_CAPACITY, val);
 		if (ret)
 			break;
@@ -924,7 +960,14 @@ static int charger_get_property(struct p
 		break;
 	case POWER_SUPPLY_PROP_CHARGE_NOW:
 		if (is_charging(cm)) {
-			ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+			fuel_gauge = power_supply_get_by_name(
+					cm->desc->psy_fuel_gauge);
+			if (!fuel_gauge) {
+				ret = -ENODEV;
+				break;
+			}
+
+			ret = fuel_gauge->get_property(fuel_gauge,
 						POWER_SUPPLY_PROP_CHARGE_NOW,
 						val);
 			if (ret) {
@@ -1485,14 +1528,15 @@ err:
 	return ret;
 }
 
-static int cm_init_thermal_data(struct charger_manager *cm)
+static int cm_init_thermal_data(struct charger_manager *cm,
+		struct power_supply *fuel_gauge)
 {
 	struct charger_desc *desc = cm->desc;
 	union power_supply_propval val;
 	int ret;
 
 	/* Verify whether fuel gauge provides battery temperature */
-	ret = cm->fuel_gauge->get_property(cm->fuel_gauge,
+	ret = fuel_gauge->get_property(fuel_gauge,
 					POWER_SUPPLY_PROP_TEMP, &val);
 
 	if (!ret) {
@@ -1502,8 +1546,6 @@ static int cm_init_thermal_data(struct c
 		cm->desc->measure_battery_temp = true;
 	}
 #ifdef CONFIG_THERMAL
-	cm->tzd_batt = cm->fuel_gauge->tzd;
-
 	if (ret && desc->thermal_zone) {
 		cm->tzd_batt =
 			thermal_zone_get_zone_by_name(desc->thermal_zone);
@@ -1666,6 +1708,7 @@ static int charger_manager_probe(struct
 	int ret = 0, i = 0;
 	int j = 0;
 	union power_supply_propval val;
+	struct power_supply *fuel_gauge;
 
 	if (g_desc && !rtc_dev && g_desc->rtc_name) {
 		rtc_dev = rtc_class_open(g_desc->rtc_name);
@@ -1744,8 +1787,8 @@ static int charger_manager_probe(struct
 		}
 	}
 
-	cm->fuel_gauge = power_supply_get_by_name(desc->psy_fuel_gauge);
-	if (!cm->fuel_gauge) {
+	fuel_gauge = power_supply_get_by_name(desc->psy_fuel_gauge);
+	if (!fuel_gauge) {
 		dev_err(&pdev->dev, "Cannot find power supply \"%s\"\n",
 			desc->psy_fuel_gauge);
 		return -ENODEV;
@@ -1788,13 +1831,13 @@ static int charger_manager_probe(struct
 	cm->charger_psy.num_properties = psy_default.num_properties;
 
 	/* Find which optional psy-properties are available */
-	if (!cm->fuel_gauge->get_property(cm->fuel_gauge,
+	if (!fuel_gauge->get_property(fuel_gauge,
 					  POWER_SUPPLY_PROP_CHARGE_NOW, &val)) {
 		cm->charger_psy.properties[cm->charger_psy.num_properties] =
 				POWER_SUPPLY_PROP_CHARGE_NOW;
 		cm->charger_psy.num_properties++;
 	}
-	if (!cm->fuel_gauge->get_property(cm->fuel_gauge,
+	if (!fuel_gauge->get_property(fuel_gauge,
 					  POWER_SUPPLY_PROP_CURRENT_NOW,
 					  &val)) {
 		cm->charger_psy.properties[cm->charger_psy.num_properties] =
@@ -1802,7 +1845,7 @@ static int charger_manager_probe(struct
 		cm->charger_psy.num_properties++;
 	}
 
-	ret = cm_init_thermal_data(cm);
+	ret = cm_init_thermal_data(cm, fuel_gauge);
 	if (ret) {
 		dev_err(&pdev->dev, "Failed to initialize thermal data\n");
 		cm->desc->measure_battery_temp = false;
--- a/include/linux/power/charger-manager.h
+++ b/include/linux/power/charger-manager.h
@@ -253,7 +253,6 @@ struct charger_manager {
 	struct device *dev;
 	struct charger_desc *desc;
 
-	struct power_supply *fuel_gauge;
 	struct power_supply **charger_stat;
 
 #ifdef CONFIG_THERMAL



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 061/122] power: charger-manager: Fix accessing invalidated power supply after charger unbind
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 060/122] power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 062/122] power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Krzysztof Kozlowski, Sebastian Reichel

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Krzysztof Kozlowski <k.kozlowski@samsung.com>

commit cdaf3e15385d3232b52287e50692506f8fd01a09 upstream.

The charger manager obtained in probe references to power supplies for
all chargers with power_supply_get_by_name() for later usage. However
if such charger driver was removed then this reference would point to
old power supply (from driver which was removed).

This lead to accessing invalid memory which could be observed with:
$ echo "max77693-charger" > /sys/bus/platform/drivers/max77693-charger/unbind
$ grep . /sys/devices/virtual/power_supply/battery/charger.0/*
$ grep . /sys/devices/virtual/power_supply/battery/*
[   15.339817] Unable to handle kernel paging request at virtual address 0001c12c
[   15.346187] pgd = edd08000
[   15.348814] [0001c12c] *pgd=6dce2831, *pte=00000000, *ppte=00000000
[   15.355075] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[   15.360967] Modules linked in:
[   15.364010] CPU: 2 PID: 1388 Comm: grep Not tainted 3.17.0-next-20141007-00027-ga95e761db1b0 #245
[   15.372859] task: ee03ad00 ti: edcf6000 task.ti: edcf6000
[   15.378241] PC is at 0x1c12c
[   15.381113] LR is at is_ext_pwr_online+0x30/0x6c
[   15.385706] pc : [<0001c12c>]    lr : [<c0339fc4>]    psr: a0000013
[   15.385706] sp : edcf7e88  ip : 00000000  fp : 00000000
[   15.397161] r10: eeb02c08  r9 : c04b1f84  r8 : eeb02c00
[   15.402369] r7 : edc69a10  r6 : eea6ac10  r5 : eea6ac10  r4 : 00000004
[   15.408878] r3 : 0001c12c  r2 : edcf7e8c  r1 : 00000004  r0 : ee914418
[   15.415390] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   15.422506] Control: 10c5387d  Table: 6dd0804a  DAC: 00000015
[   15.428236] Process grep (pid: 1388, stack limit = 0xedcf6240)
[   15.434050] Stack: (0xedcf7e88 to 0xedcf8000)
[   15.438395] 7e80:                   ee03ad00 00000000 edcf7f80 eea6aca8 edcf7ec4 c033b7b0
[   15.446554] 7ea0: 00000001 ee1cc3f0 00000004 c06e1e44 eebdc000 c06e1e44 eeb02c00 c0337144
[   15.454713] 7ec0: ee2dac68 c005cffc ee1cc3c0 c06e1e44 00000fff 00001000 eebdc000 c0278ca8
[   15.462872] 7ee0: c0278c8c ee1cc3c0 eeb7ce00 c014422c edcf7f20 00008000 ee1cc3c0 ee9a48c0
[   15.471030] 7f00: 00000001 00000001 edcf7f80 c0142d94 c0142d70 c01060f4 00021000 ee1cc3f0
[   15.479190] 7f20: 00000000 00000000 c06a2150 eebdc000 2e7ec000 ee9a48c0 00008000 00021000
[   15.487349] 7f40: edcf7f80 00008000 edcf6000 00021000 00021000 c00e39a4 00000000 ee9a48c0
[   15.495508] 7f60: 00004000 00000000 00000000 ee9a48c0 ee9a48c0 00008000 00021000 c00e3aa0
[   15.503668] 7f80: 00000000 00000000 0001f2e0 0001f2e0 00021000 00001000 00000003 c000f364
[   15.511826] 7fa0: 00000000 c000f1a0 0001f2e0 00021000 00000003 00021000 00008000 00000000
[   15.519986] 7fc0: 0001f2e0 00021000 00001000 00000003 00000001 000205e8 00000000 00021000
[   15.528145] 7fe0: 00008000 bebbe910 0000a7ad b6edc49c 60000010 00000003 aaaaaaaa aaaaaaaa
[   15.536320] [<c0339fc4>] (is_ext_pwr_online) from [<c033b7b0>] (charger_get_property+0x170/0x314)
[   15.545164] [<c033b7b0>] (charger_get_property) from [<c0337144>] (power_supply_show_property+0x48/0x20c)
[   15.554719] [<c0337144>] (power_supply_show_property) from [<c0278ca8>] (dev_attr_show+0x1c/0x48)
[   15.563577] [<c0278ca8>] (dev_attr_show) from [<c014422c>] (sysfs_kf_seq_show+0x84/0x104)
[   15.571725] [<c014422c>] (sysfs_kf_seq_show) from [<c0142d94>] (kernfs_seq_show+0x24/0x28)
[   15.579973] [<c0142d94>] (kernfs_seq_show) from [<c01060f4>] (seq_read+0x1b0/0x484)
[   15.587614] [<c01060f4>] (seq_read) from [<c00e39a4>] (vfs_read+0x88/0x144)
[   15.594552] [<c00e39a4>] (vfs_read) from [<c00e3aa0>] (SyS_read+0x40/0x8c)
[   15.601417] [<c00e3aa0>] (SyS_read) from [<c000f1a0>] (ret_fast_syscall+0x0/0x48)
[   15.608877] Code: bad PC value
[   15.611991] ---[ end trace a88fcc95208db283 ]---

The charger-manager should get reference to charger power supply on
each use of get_property callback.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/power/charger-manager.c       |   64 ++++++++++++++++++++--------------
 include/linux/power/charger-manager.h |    2 -
 2 files changed, 39 insertions(+), 27 deletions(-)

--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -118,10 +118,17 @@ static bool is_batt_present(struct charg
 			present = true;
 		break;
 	case CM_CHARGER_STAT:
-		for (i = 0; cm->charger_stat[i]; i++) {
-			ret = cm->charger_stat[i]->get_property(
-					cm->charger_stat[i],
-					POWER_SUPPLY_PROP_PRESENT, &val);
+		for (i = 0; cm->desc->psy_charger_stat[i]; i++) {
+			psy = power_supply_get_by_name(
+					cm->desc->psy_charger_stat[i]);
+			if (!psy) {
+				dev_err(cm->dev, "Cannot find power supply \"%s\"\n",
+					cm->desc->psy_charger_stat[i]);
+				continue;
+			}
+
+			ret = psy->get_property(psy, POWER_SUPPLY_PROP_PRESENT,
+					&val);
 			if (ret == 0 && val.intval) {
 				present = true;
 				break;
@@ -144,14 +151,20 @@ static bool is_batt_present(struct charg
 static bool is_ext_pwr_online(struct charger_manager *cm)
 {
 	union power_supply_propval val;
+	struct power_supply *psy;
 	bool online = false;
 	int i, ret;
 
 	/* If at least one of them has one, it's yes. */
-	for (i = 0; cm->charger_stat[i]; i++) {
-		ret = cm->charger_stat[i]->get_property(
-				cm->charger_stat[i],
-				POWER_SUPPLY_PROP_ONLINE, &val);
+	for (i = 0; cm->desc->psy_charger_stat[i]; i++) {
+		psy = power_supply_get_by_name(cm->desc->psy_charger_stat[i]);
+		if (!psy) {
+			dev_err(cm->dev, "Cannot find power supply \"%s\"\n",
+					cm->desc->psy_charger_stat[i]);
+			continue;
+		}
+
+		ret = psy->get_property(psy, POWER_SUPPLY_PROP_ONLINE, &val);
 		if (ret == 0 && val.intval) {
 			online = true;
 			break;
@@ -196,6 +209,7 @@ static bool is_charging(struct charger_m
 {
 	int i, ret;
 	bool charging = false;
+	struct power_supply *psy;
 	union power_supply_propval val;
 
 	/* If there is no battery, it cannot be charged */
@@ -203,17 +217,22 @@ static bool is_charging(struct charger_m
 		return false;
 
 	/* If at least one of the charger is charging, return yes */
-	for (i = 0; cm->charger_stat[i]; i++) {
+	for (i = 0; cm->desc->psy_charger_stat[i]; i++) {
 		/* 1. The charger sholuld not be DISABLED */
 		if (cm->emergency_stop)
 			continue;
 		if (!cm->charger_enabled)
 			continue;
 
+		psy = power_supply_get_by_name(cm->desc->psy_charger_stat[i]);
+		if (!psy) {
+			dev_err(cm->dev, "Cannot find power supply \"%s\"\n",
+					cm->desc->psy_charger_stat[i]);
+			continue;
+		}
+
 		/* 2. The charger should be online (ext-power) */
-		ret = cm->charger_stat[i]->get_property(
-				cm->charger_stat[i],
-				POWER_SUPPLY_PROP_ONLINE, &val);
+		ret = psy->get_property(psy, POWER_SUPPLY_PROP_ONLINE, &val);
 		if (ret) {
 			dev_warn(cm->dev, "Cannot read ONLINE value from %s\n",
 				 cm->desc->psy_charger_stat[i]);
@@ -226,9 +245,7 @@ static bool is_charging(struct charger_m
 		 * 3. The charger should not be FULL, DISCHARGING,
 		 * or NOT_CHARGING.
 		 */
-		ret = cm->charger_stat[i]->get_property(
-				cm->charger_stat[i],
-				POWER_SUPPLY_PROP_STATUS, &val);
+		ret = psy->get_property(psy, POWER_SUPPLY_PROP_STATUS, &val);
 		if (ret) {
 			dev_warn(cm->dev, "Cannot read STATUS value from %s\n",
 				 cm->desc->psy_charger_stat[i]);
@@ -1772,15 +1789,12 @@ static int charger_manager_probe(struct
 	while (desc->psy_charger_stat[i])
 		i++;
 
-	cm->charger_stat = devm_kzalloc(&pdev->dev,
-				sizeof(struct power_supply *) * i, GFP_KERNEL);
-	if (!cm->charger_stat)
-		return -ENOMEM;
-
+	/* Check if charger's supplies are present at probe */
 	for (i = 0; desc->psy_charger_stat[i]; i++) {
-		cm->charger_stat[i] = power_supply_get_by_name(
-					desc->psy_charger_stat[i]);
-		if (!cm->charger_stat[i]) {
+		struct power_supply *psy;
+
+		psy = power_supply_get_by_name(desc->psy_charger_stat[i]);
+		if (!psy) {
 			dev_err(&pdev->dev, "Cannot find power supply \"%s\"\n",
 				desc->psy_charger_stat[i]);
 			return -ENODEV;
@@ -2102,8 +2116,8 @@ static bool find_power_supply(struct cha
 	int i;
 	bool found = false;
 
-	for (i = 0; cm->charger_stat[i]; i++) {
-		if (psy == cm->charger_stat[i]) {
+	for (i = 0; cm->desc->psy_charger_stat[i]; i++) {
+		if (!strcmp(psy->name, cm->desc->psy_charger_stat[i])) {
 			found = true;
 			break;
 		}
--- a/include/linux/power/charger-manager.h
+++ b/include/linux/power/charger-manager.h
@@ -253,8 +253,6 @@ struct charger_manager {
 	struct device *dev;
 	struct charger_desc *desc;
 
-	struct power_supply **charger_stat;
-
 #ifdef CONFIG_THERMAL
 	struct thermal_zone_device *tzd_batt;
 #endif



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 062/122] power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 061/122] power: charger-manager: Fix accessing invalidated power supply after charger unbind Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 063/122] power: bq2415x_charger: Fix memory leak on DTS parsing error Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Krzysztof Kozlowski, Sebastian Reichel

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Krzysztof Kozlowski <k.kozlowski@samsung.com>

commit 0eaf437aa14949d2230aeab7364f4ab47901304a upstream.

The power_supply_get_by_phandle() on error returns ENODEV or NULL.
The driver later expects obtained pointer to power supply to be
valid or NULL. If it is not NULL then it dereferences it in
bq2415x_notifier_call() which would lead to dereferencing ENODEV-value
pointer.

Properly handle the power_supply_get_by_phandle() error case by
replacing error value with NULL. This indicates that usb charger
detection won't be used.

Fix also memory leak of 'name' if power_supply_get_by_phandle() fails
with NULL and probe should defer.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: faffd234cf85 ("bq2415x_charger: Add DT support")
[small fix regarding the missing ti,usb-charger-detection info message]
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/power/bq2415x_charger.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/power/bq2415x_charger.c
+++ b/drivers/power/bq2415x_charger.c
@@ -1579,8 +1579,15 @@ static int bq2415x_probe(struct i2c_clie
 	if (np) {
 		bq->notify_psy = power_supply_get_by_phandle(np, "ti,usb-charger-detection");
 
-		if (!bq->notify_psy)
-			return -EPROBE_DEFER;
+		if (IS_ERR(bq->notify_psy)) {
+			dev_info(&client->dev,
+				"no 'ti,usb-charger-detection' property (err=%ld)\n",
+				PTR_ERR(bq->notify_psy));
+			bq->notify_psy = NULL;
+		} else if (!bq->notify_psy) {
+			ret = -EPROBE_DEFER;
+			goto error_2;
+		}
 	}
 	else if (pdata->notify_device)
 		bq->notify_psy = power_supply_get_by_name(pdata->notify_device);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 063/122] power: bq2415x_charger: Fix memory leak on DTS parsing error
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 062/122] power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 064/122] x86, microcode, AMD: Fix early ucode loading on 32-bit Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Krzysztof Kozlowski, Sebastian Reichel

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Krzysztof Kozlowski <k.kozlowski@samsung.com>

commit 21e863b233553998737e1b506c823a00bf012e00 upstream.

Memory allocated for 'name' was leaking if required binding properties
were not present.

The memory for 'name' was allocated early at probe with kasprintf(). It
was freed in error paths executed before and after parsing DTS but not
in that error path.

Fix the error path for parsing device tree properties.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: faffd234cf85 ("bq2415x_charger: Add DT support")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/power/bq2415x_charger.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/power/bq2415x_charger.c
+++ b/drivers/power/bq2415x_charger.c
@@ -1609,27 +1609,27 @@ static int bq2415x_probe(struct i2c_clie
 		ret = of_property_read_u32(np, "ti,current-limit",
 				&bq->init_data.current_limit);
 		if (ret)
-			return ret;
+			goto error_2;
 		ret = of_property_read_u32(np, "ti,weak-battery-voltage",
 				&bq->init_data.weak_battery_voltage);
 		if (ret)
-			return ret;
+			goto error_2;
 		ret = of_property_read_u32(np, "ti,battery-regulation-voltage",
 				&bq->init_data.battery_regulation_voltage);
 		if (ret)
-			return ret;
+			goto error_2;
 		ret = of_property_read_u32(np, "ti,charge-current",
 				&bq->init_data.charge_current);
 		if (ret)
-			return ret;
+			goto error_2;
 		ret = of_property_read_u32(np, "ti,termination-current",
 				&bq->init_data.termination_current);
 		if (ret)
-			return ret;
+			goto error_2;
 		ret = of_property_read_u32(np, "ti,resistor-sense",
 				&bq->init_data.resistor_sense);
 		if (ret)
-			return ret;
+			goto error_2;
 	} else {
 		memcpy(&bq->init_data, pdata, sizeof(bq->init_data));
 	}



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 064/122] x86, microcode, AMD: Fix early ucode loading on 32-bit
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 063/122] power: bq2415x_charger: Fix memory leak on DTS parsing error Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 065/122] x86, microcode, AMD: Fix ucode patch stashing " Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Borislav Petkov, Thomas Gleixner

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit 4750a0d112cbfcc744929f1530ffe3193436766c upstream.

Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.

Because the ramdisk is exactly there:

[    0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]

and we fault at 0x35e04304.

And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.

So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!

Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
 00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
 f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
 c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c52 ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/microcode/amd_early.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/amd_early.c
+++ b/arch/x86/kernel/cpu/microcode/amd_early.c
@@ -348,6 +348,7 @@ int __init save_microcode_in_initrd_amd(
 {
 	unsigned long cont;
 	enum ucode_state ret;
+	u8 *cont_va;
 	u32 eax;
 
 	if (!container)
@@ -355,13 +356,15 @@ int __init save_microcode_in_initrd_amd(
 
 #ifdef CONFIG_X86_32
 	get_bsp_sig();
-	cont = (unsigned long)container;
+	cont	= (unsigned long)container;
+	cont_va = __va(container);
 #else
 	/*
 	 * We need the physical address of the container for both bitness since
 	 * boot_params.hdr.ramdisk_image is a physical address.
 	 */
-	cont = __pa(container);
+	cont    = __pa(container);
+	cont_va = container;
 #endif
 
 	/*
@@ -372,6 +375,8 @@ int __init save_microcode_in_initrd_amd(
 	if (relocated_ramdisk)
 		container = (u8 *)(__va(relocated_ramdisk) +
 			     (cont - boot_params.hdr.ramdisk_image));
+	else
+		container = cont_va;
 
 	if (ucode_new_rev)
 		pr_info("microcode: updated early to new patch_level=0x%08x\n",



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 065/122] x86, microcode, AMD: Fix ucode patch stashing on 32-bit
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 064/122] x86, microcode, AMD: Fix early ucode loading on 32-bit Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 066/122] x86, kaslr: Prevent .bss from overlaping initrd Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Richard Hendershot, Borislav Petkov

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Borislav Petkov <bp@suse.de>

commit c0a717f23dccdb6e3b03471bc846fdc636f2b353 upstream.

Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.

This way we actually do find it on the APs instead of having to go
through the initrd each time.

Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf475 ("x86, microcode, AMD: Fix early ucode loading")
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/microcode/amd_early.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/cpu/microcode/amd_early.c
+++ b/arch/x86/kernel/cpu/microcode/amd_early.c
@@ -108,12 +108,13 @@ static size_t compute_container_size(u8
  * load_microcode_amd() to save equivalent cpu table and microcode patches in
  * kernel heap memory.
  */
-static void apply_ucode_in_initrd(void *ucode, size_t size)
+static void apply_ucode_in_initrd(void *ucode, size_t size, bool save_patch)
 {
 	struct equiv_cpu_entry *eq;
 	size_t *cont_sz;
 	u32 *header;
 	u8  *data, **cont;
+	u8 (*patch)[PATCH_MAX_SIZE];
 	u16 eq_id = 0;
 	int offset, left;
 	u32 rev, eax, ebx, ecx, edx;
@@ -123,10 +124,12 @@ static void apply_ucode_in_initrd(void *
 	new_rev = (u32 *)__pa_nodebug(&ucode_new_rev);
 	cont_sz = (size_t *)__pa_nodebug(&container_size);
 	cont	= (u8 **)__pa_nodebug(&container);
+	patch	= (u8 (*)[PATCH_MAX_SIZE])__pa_nodebug(&amd_ucode_patch);
 #else
 	new_rev = &ucode_new_rev;
 	cont_sz = &container_size;
 	cont	= &container;
+	patch	= &amd_ucode_patch;
 #endif
 
 	data   = ucode;
@@ -213,9 +216,9 @@ static void apply_ucode_in_initrd(void *
 				rev = mc->hdr.patch_id;
 				*new_rev = rev;
 
-				/* save ucode patch */
-				memcpy(amd_ucode_patch, mc,
-				       min_t(u32, header[1], PATCH_MAX_SIZE));
+				if (save_patch)
+					memcpy(patch, mc,
+					       min_t(u32, header[1], PATCH_MAX_SIZE));
 			}
 		}
 
@@ -246,7 +249,7 @@ void __init load_ucode_amd_bsp(void)
 	*data = cp.data;
 	*size = cp.size;
 
-	apply_ucode_in_initrd(cp.data, cp.size);
+	apply_ucode_in_initrd(cp.data, cp.size, true);
 }
 
 #ifdef CONFIG_X86_32
@@ -263,7 +266,7 @@ void load_ucode_amd_ap(void)
 	size_t *usize;
 	void **ucode;
 
-	mc = (struct microcode_amd *)__pa(amd_ucode_patch);
+	mc = (struct microcode_amd *)__pa_nodebug(amd_ucode_patch);
 	if (mc->hdr.patch_id && mc->hdr.processor_rev_id) {
 		__apply_microcode_amd(mc);
 		return;
@@ -275,7 +278,7 @@ void load_ucode_amd_ap(void)
 	if (!*ucode || !*usize)
 		return;
 
-	apply_ucode_in_initrd(*ucode, *usize);
+	apply_ucode_in_initrd(*ucode, *usize, false);
 }
 
 static void __init collect_cpu_sig_on_bsp(void *arg)
@@ -339,7 +342,7 @@ void load_ucode_amd_ap(void)
 		 * AP has a different equivalence ID than BSP, looks like
 		 * mixed-steppings silicon so go through the ucode blob anew.
 		 */
-		apply_ucode_in_initrd(ucode_cpio.data, ucode_cpio.size);
+		apply_ucode_in_initrd(ucode_cpio.data, ucode_cpio.size, false);
 	}
 }
 #endif
@@ -347,6 +350,7 @@ void load_ucode_amd_ap(void)
 int __init save_microcode_in_initrd_amd(void)
 {
 	unsigned long cont;
+	int retval = 0;
 	enum ucode_state ret;
 	u8 *cont_va;
 	u32 eax;
@@ -387,7 +391,7 @@ int __init save_microcode_in_initrd_amd(
 
 	ret = load_microcode_amd(eax, container, container_size);
 	if (ret != UCODE_OK)
-		return -EINVAL;
+		retval = -EINVAL;
 
 	/*
 	 * This will be freed any msec now, stash patches for the current
@@ -396,5 +400,5 @@ int __init save_microcode_in_initrd_amd(
 	container = NULL;
 	container_size = 0;
 
-	return 0;
+	return retval;
 }



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 066/122] x86, kaslr: Prevent .bss from overlaping initrd
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 065/122] x86, microcode, AMD: Fix ucode patch stashing " Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 067/122] md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Fengguang Wu, Junjie Mao, Kees Cook,
	Josh Triplett, Matt Fleming, Ard Biesheuvel, Vivek Goyal,
	Andi Kleen, Thomas Gleixner

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Junjie Mao <eternal.n08@gmail.com>

commit e6023367d779060fddc9a52d1f474085b2b36298 upstream.

When choosing a random address, the current implementation does not take into
account the reversed space for .bss and .brk sections. Thus the relocated kernel
may overlap other components in memory. Here is an example of the overlap from a
x86_64 kernel in qemu (the ranges of physical addresses are presented):

 Physical Address

    0x0fe00000                  --+--------------------+  <-- randomized base
                               /  |  relocated kernel  |
                   vmlinux.bin    | (from vmlinux.bin) |
    0x1336d000    (an ELF file)   +--------------------+--
                               \  |                    |  \
    0x1376d870                  --+--------------------+   |
                                  |    relocs table    |   |
    0x13c1c2a8                    +--------------------+   .bss and .brk
                                  |                    |   |
    0x13ce6000                    +--------------------+   |
                                  |                    |  /
    0x13f77000                    |       initrd       |--
                                  |                    |
    0x13fef374                    +--------------------+

The initrd image will then be overwritten by the memset during early
initialization:

[    1.655204] Unpacking initramfs...
[    1.662831] Initramfs unpacking failed: junk in compressed archive

This patch prevents the above situation by requiring a larger space when looking
for a random kernel base, so that existing logic can effectively avoids the
overlap.

[kees: switched to perl to avoid hex translation pain in mawk vs gawk]
[kees: calculated overlap without relocs table]

Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/boot/compressed/Makefile  |    4 +++-
 arch/x86/boot/compressed/head_32.S |    5 +++--
 arch/x86/boot/compressed/head_64.S |    5 ++++-
 arch/x86/boot/compressed/misc.c    |   13 ++++++++++---
 arch/x86/boot/compressed/mkpiggy.c |    9 +++++++--
 arch/x86/tools/calc_run_size.pl    |   30 ++++++++++++++++++++++++++++++
 6 files changed, 57 insertions(+), 9 deletions(-)

--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -75,8 +75,10 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 
+RUN_SIZE = $(shell objdump -h vmlinux | \
+	     perl $(srctree)/arch/x86/tools/calc_run_size.pl)
 quiet_cmd_mkpiggy = MKPIGGY $@
-      cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
+      cmd_mkpiggy = $(obj)/mkpiggy $< $(RUN_SIZE) > $@ || ( rm -f $@ ; false )
 
 targets += piggy.S
 $(obj)/piggy.S: $(obj)/vmlinux.bin.$(suffix-y) $(obj)/mkpiggy FORCE
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -186,7 +186,8 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 				/* push arguments for decompress_kernel: */
-	pushl	$z_output_len	/* decompressed length */
+	pushl	$z_run_size	/* size of kernel with .bss and .brk */
+	pushl	$z_output_len	/* decompressed length, end of relocs */
 	leal	z_extract_offset_negative(%ebx), %ebp
 	pushl	%ebp		/* output address */
 	pushl	$z_input_len	/* input_len */
@@ -196,7 +197,7 @@ relocated:
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	decompress_kernel /* returns kernel location in %eax */
-	addl	$24, %esp
+	addl	$28, %esp
 
 /*
  * Jump to the decompressed kernel.
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -334,13 +334,16 @@ relocated:
  * Do the decompression, and jump to the new kernel..
  */
 	pushq	%rsi			/* Save the real mode argument */
+	movq	$z_run_size, %r9	/* size of kernel with .bss and .brk */
+	pushq	%r9
 	movq	%rsi, %rdi		/* real mode address */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
-	movq	$z_output_len, %r9	/* decompressed length */
+	movq	$z_output_len, %r9	/* decompressed length, end of relocs */
 	call	decompress_kernel	/* returns kernel location in %rax */
+	popq	%r9
 	popq	%rsi
 
 /*
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -393,7 +393,8 @@ asmlinkage void *decompress_kernel(void
 				  unsigned char *input_data,
 				  unsigned long input_len,
 				  unsigned char *output,
-				  unsigned long output_len)
+				  unsigned long output_len,
+				  unsigned long run_size)
 {
 	real_mode = rmode;
 
@@ -416,8 +417,14 @@ asmlinkage void *decompress_kernel(void
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
-	output = choose_kernel_location(input_data, input_len,
-					output, output_len);
+	/*
+	 * The memory hole needed for the kernel is the larger of either
+	 * the entire decompressed kernel plus relocation table, or the
+	 * entire decompressed kernel plus .bss and .brk sections.
+	 */
+	output = choose_kernel_location(input_data, input_len, output,
+					output_len > run_size ? output_len
+							      : run_size);
 
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -36,11 +36,13 @@ int main(int argc, char *argv[])
 	uint32_t olen;
 	long ilen;
 	unsigned long offs;
+	unsigned long run_size;
 	FILE *f = NULL;
 	int retval = 1;
 
-	if (argc < 2) {
-		fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
+	if (argc < 3) {
+		fprintf(stderr, "Usage: %s compressed_file run_size\n",
+				argv[0]);
 		goto bail;
 	}
 
@@ -74,6 +76,7 @@ int main(int argc, char *argv[])
 	offs += olen >> 12;	/* Add 8 bytes for each 32K block */
 	offs += 64*1024 + 128;	/* Add 64K + 128 bytes slack */
 	offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
+	run_size = atoi(argv[2]);
 
 	printf(".section \".rodata..compressed\",\"a\",@progbits\n");
 	printf(".globl z_input_len\n");
@@ -85,6 +88,8 @@ int main(int argc, char *argv[])
 	/* z_extract_offset_negative allows simplification of head_32.S */
 	printf(".globl z_extract_offset_negative\n");
 	printf("z_extract_offset_negative = -0x%lx\n", offs);
+	printf(".globl z_run_size\n");
+	printf("z_run_size = %lu\n", run_size);
 
 	printf(".globl input_data, input_data_end\n");
 	printf("input_data:\n");
--- /dev/null
+++ b/arch/x86/tools/calc_run_size.pl
@@ -0,0 +1,30 @@
+#!/usr/bin/perl
+#
+# Calculate the amount of space needed to run the kernel, including room for
+# the .bss and .brk sections.
+#
+# Usage:
+# objdump -h a.out | perl calc_run_size.pl
+use strict;
+
+my $mem_size = 0;
+my $file_offset = 0;
+
+my $sections=" *[0-9]+ \.(?:bss|brk) +";
+while (<>) {
+	if (/^$sections([0-9a-f]+) +(?:[0-9a-f]+ +){2}([0-9a-f]+)/) {
+		my $size = hex($1);
+		my $offset = hex($2);
+		$mem_size += $size;
+		if ($file_offset == 0) {
+			$file_offset = $offset;
+		} elsif ($file_offset != $offset) {
+			die ".bss and .brk lack common file offset\n";
+		}
+	}
+}
+
+if ($file_offset == 0) {
+	die "Never found .bss or .brk file offset\n";
+}
+printf("%d\n", $mem_size + $file_offset);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 067/122] md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 066/122] x86, kaslr: Prevent .bss from overlaping initrd Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 068/122] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, NeilBrown

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: NeilBrown <neilb@suse.de>

commit 45eaf45dfa4850df16bc2e8e7903d89021137f40 upstream.

md_check_recovery will skip any recovery and also clear
MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set.
So when we clear _FROZEN, we must set _NEEDED and ensure that
md_check_recovery gets run.
Otherwise we could miss out on something that is needed.

In particular, this can make it impossible to remove a
failed device from an array is the  'recovery-needed' processing
didn't happen.
Suitable for stable kernels since 3.13.

Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Fixes: 30b8feb730f9b9b3c5de02580897da03f59b6b16
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/md.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5333,6 +5333,7 @@ static int md_set_readonly(struct mddev
 		printk("md: %s still in use.\n",mdname(mddev));
 		if (did_freeze) {
 			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			md_wakeup_thread(mddev->thread);
 		}
 		err = -EBUSY;
@@ -5347,6 +5348,8 @@ static int md_set_readonly(struct mddev
 		mddev->ro = 1;
 		set_disk_ro(mddev->gendisk, 1);
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+		md_wakeup_thread(mddev->thread);
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
 		err = 0;
 	}
@@ -5390,6 +5393,7 @@ static int do_md_stop(struct mddev * mdd
 		mutex_unlock(&mddev->open_mutex);
 		if (did_freeze) {
 			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			md_wakeup_thread(mddev->thread);
 		}
 		return -EBUSY;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 068/122] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 067/122] md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:51 ` [PATCH 3.14 069/122] NFS: Dont try to reclaim delegation open state if recovery failed Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 4dfd4f7af0afd201706ad186352ca423b0f17d4b upstream.

NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so
unlike NFSv4.1, the recovery procedure when stateids have expired or
have been revoked requires us to just forget the delegation.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/nfs4proc.c |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2034,6 +2034,28 @@ static int nfs4_open_expired(struct nfs4
 	return ret;
 }
 
+static void nfs_finish_clear_delegation_stateid(struct nfs4_state *state)
+{
+	nfs_remove_bad_delegation(state->inode);
+	write_seqlock(&state->seqlock);
+	nfs4_stateid_copy(&state->stateid, &state->open_stateid);
+	write_sequnlock(&state->seqlock);
+	clear_bit(NFS_DELEGATED_STATE, &state->flags);
+}
+
+static void nfs40_clear_delegation_stateid(struct nfs4_state *state)
+{
+	if (rcu_access_pointer(NFS_I(state->inode)->delegation) != NULL)
+		nfs_finish_clear_delegation_stateid(state);
+}
+
+static int nfs40_open_expired(struct nfs4_state_owner *sp, struct nfs4_state *state)
+{
+	/* NFSv4.0 doesn't allow for delegation recovery on open expire */
+	nfs40_clear_delegation_stateid(state);
+	return nfs4_open_expired(sp, state);
+}
+
 #if defined(CONFIG_NFS_V4_1)
 static void nfs41_clear_delegation_stateid(struct nfs4_state *state)
 {
@@ -8255,7 +8277,7 @@ static const struct nfs4_state_recovery_
 static const struct nfs4_state_recovery_ops nfs40_nograce_recovery_ops = {
 	.owner_flag_bit = NFS_OWNER_RECLAIM_NOGRACE,
 	.state_flag_bit	= NFS_STATE_RECLAIM_NOGRACE,
-	.recover_open	= nfs4_open_expired,
+	.recover_open	= nfs40_open_expired,
 	.recover_lock	= nfs4_lock_expired,
 	.establish_clid = nfs4_init_clientid,
 };



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 069/122] NFS: Dont try to reclaim delegation open state if recovery failed
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 068/122] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Greg Kroah-Hartman
@ 2014-11-19 20:51 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 070/122] nfs: Fix use of uninitialized variable in nfs_getattr() Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit f8ebf7a8ca35dde321f0cd385fee6f1950609367 upstream.

If state recovery failed, then we should not attempt to reclaim delegated
state.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/delegation.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -109,6 +109,8 @@ again:
 			continue;
 		if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
 			continue;
+		if (!nfs4_valid_open_stateid(state))
+			continue;
 		if (!nfs4_stateid_match(&state->stateid, stateid))
 			continue;
 		get_nfs_open_context(ctx);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 070/122] nfs: Fix use of uninitialized variable in nfs_getattr()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2014-11-19 20:51 ` [PATCH 3.14 069/122] NFS: Dont try to reclaim delegation open state if recovery failed Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 071/122] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Kara, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Kara <jack@suse.cz>

commit 16caf5b6101d03335b386e77e9e14136f989be87 upstream.

Variable 'err' needn't be initialized when nfs_getattr() uses it to
check whether it should call generic_fillattr() or not. That can result
in spurious error returns. Initialize 'err' properly.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/inode.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -592,7 +592,7 @@ int nfs_getattr(struct vfsmount *mnt, st
 {
 	struct inode *inode = dentry->d_inode;
 	int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
-	int err;
+	int err = 0;
 
 	trace_nfs_getattr_enter(inode);
 	/* Flush out writes to the server in order to update c/mtime.  */



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 071/122] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 070/122] nfs: Fix use of uninitialized variable in nfs_getattr() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 072/122] NFSv4.1: nfs41_clear_delegation_stateid shouldnt trust NFS_DELEGATED_STATE Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 869f9dfa4d6d57b79e0afc3af14772c2a023eeb1 upstream.

Any attempt to call nfs_remove_bad_delegation() while a delegation is being
returned is currently a no-op. This means that we can end up looping
forever in nfs_end_delegation_return() if something causes the delegation
to be revoked.
This patch adds a mechanism whereby the state recovery code can communicate
to the delegation return code that the delegation is no longer valid and
that it should not be used when reclaiming state.
It also changes the return value for nfs4_handle_delegation_recall_error()
to ensure that nfs_end_delegation_return() does not reattempt the lock
reclaim before state recovery is done.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/delegation.c |   23 +++++++++++++++++++++--
 fs/nfs/delegation.h |    1 +
 fs/nfs/nfs4proc.c   |    2 +-
 3 files changed, 23 insertions(+), 3 deletions(-)

--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -179,7 +179,11 @@ static int nfs_do_return_delegation(stru
 {
 	int res = 0;
 
-	res = nfs4_proc_delegreturn(inode, delegation->cred, &delegation->stateid, issync);
+	if (!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+		res = nfs4_proc_delegreturn(inode,
+				delegation->cred,
+				&delegation->stateid,
+				issync);
 	nfs_free_delegation(delegation);
 	return res;
 }
@@ -366,11 +370,13 @@ static int nfs_end_delegation_return(str
 {
 	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
 	struct nfs_inode *nfsi = NFS_I(inode);
-	int err;
+	int err = 0;
 
 	if (delegation == NULL)
 		return 0;
 	do {
+		if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+			break;
 		err = nfs_delegation_claim_opens(inode, &delegation->stateid);
 		if (!issync || err != -EAGAIN)
 			break;
@@ -591,10 +597,23 @@ static void nfs_client_mark_return_unuse
 	rcu_read_unlock();
 }
 
+static void nfs_revoke_delegation(struct inode *inode)
+{
+	struct nfs_delegation *delegation;
+	rcu_read_lock();
+	delegation = rcu_dereference(NFS_I(inode)->delegation);
+	if (delegation != NULL) {
+		set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
+		nfs_mark_return_delegation(NFS_SERVER(inode), delegation);
+	}
+	rcu_read_unlock();
+}
+
 void nfs_remove_bad_delegation(struct inode *inode)
 {
 	struct nfs_delegation *delegation;
 
+	nfs_revoke_delegation(inode);
 	delegation = nfs_inode_detach_delegation(inode);
 	if (delegation) {
 		nfs_inode_find_state_and_recover(inode, &delegation->stateid);
--- a/fs/nfs/delegation.h
+++ b/fs/nfs/delegation.h
@@ -31,6 +31,7 @@ enum {
 	NFS_DELEGATION_RETURN_IF_CLOSED,
 	NFS_DELEGATION_REFERENCED,
 	NFS_DELEGATION_RETURNING,
+	NFS_DELEGATION_REVOKED,
 };
 
 int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res);
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1587,7 +1587,7 @@ static int nfs4_handle_delegation_recall
 			nfs_inode_find_state_and_recover(state->inode,
 					stateid);
 			nfs4_schedule_stateid_recovery(server, state);
-			return 0;
+			return -EAGAIN;
 		case -NFS4ERR_DELAY:
 		case -NFS4ERR_GRACE:
 			set_bit(NFS_DELEGATED_STATE, &state->flags);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 072/122] NFSv4.1: nfs41_clear_delegation_stateid shouldnt trust NFS_DELEGATED_STATE
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 071/122] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 073/122] media: ttusb-dec: buffer overflow in ioctl Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Trond Myklebust

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <trond.myklebust@primarydata.com>

commit 0c116cadd94b16b30b1dd90d38b2784d9b39b01a upstream.

This patch removes the assumption made previously, that we only need to
check the delegation stateid when it matches the stateid on a cached
open.

If we believe that we hold a delegation for this file, then we must assume
that its stateid may have been revoked or expired too. If we don't test it
then our state recovery process may end up caching open/lock state in a
situation where it should not.
We therefore rename the function nfs41_clear_delegation_stateid as
nfs41_check_delegation_stateid, and change it to always run through the
delegation stateid test and recovery process as outlined in RFC5661.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/nfs4proc.c |   42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2057,45 +2057,37 @@ static int nfs40_open_expired(struct nfs
 }
 
 #if defined(CONFIG_NFS_V4_1)
-static void nfs41_clear_delegation_stateid(struct nfs4_state *state)
+static void nfs41_check_delegation_stateid(struct nfs4_state *state)
 {
 	struct nfs_server *server = NFS_SERVER(state->inode);
-	nfs4_stateid *stateid = &state->stateid;
+	nfs4_stateid stateid;
 	struct nfs_delegation *delegation;
-	struct rpc_cred *cred = NULL;
-	int status = -NFS4ERR_BAD_STATEID;
-
-	/* If a state reset has been done, test_stateid is unneeded */
-	if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
-		return;
+	struct rpc_cred *cred;
+	int status;
 
 	/* Get the delegation credential for use by test/free_stateid */
 	rcu_read_lock();
 	delegation = rcu_dereference(NFS_I(state->inode)->delegation);
-	if (delegation != NULL &&
-	    nfs4_stateid_match(&delegation->stateid, stateid)) {
-		cred = get_rpccred(delegation->cred);
-		rcu_read_unlock();
-		status = nfs41_test_stateid(server, stateid, cred);
-		trace_nfs4_test_delegation_stateid(state, NULL, status);
-	} else
+	if (delegation == NULL) {
 		rcu_read_unlock();
+		return;
+	}
+
+	nfs4_stateid_copy(&stateid, &delegation->stateid);
+	cred = get_rpccred(delegation->cred);
+	rcu_read_unlock();
+	status = nfs41_test_stateid(server, &stateid, cred);
+	trace_nfs4_test_delegation_stateid(state, NULL, status);
 
 	if (status != NFS_OK) {
 		/* Free the stateid unless the server explicitly
 		 * informs us the stateid is unrecognized. */
 		if (status != -NFS4ERR_BAD_STATEID)
-			nfs41_free_stateid(server, stateid, cred);
-		nfs_remove_bad_delegation(state->inode);
-
-		write_seqlock(&state->seqlock);
-		nfs4_stateid_copy(&state->stateid, &state->open_stateid);
-		write_sequnlock(&state->seqlock);
-		clear_bit(NFS_DELEGATED_STATE, &state->flags);
+			nfs41_free_stateid(server, &stateid, cred);
+		nfs_finish_clear_delegation_stateid(state);
 	}
 
-	if (cred != NULL)
-		put_rpccred(cred);
+	put_rpccred(cred);
 }
 
 /**
@@ -2139,7 +2131,7 @@ static int nfs41_open_expired(struct nfs
 {
 	int status;
 
-	nfs41_clear_delegation_stateid(state);
+	nfs41_check_delegation_stateid(state);
 	status = nfs41_check_open_stateid(state);
 	if (status != NFS_OK)
 		status = nfs4_open_expired(sp, state);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 073/122] media: ttusb-dec: buffer overflow in ioctl
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 072/122] NFSv4.1: nfs41_clear_delegation_stateid shouldnt trust NFS_DELEGATED_STATE Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 074/122] memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Mauro Carvalho Chehab

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit f2e323ec96077642d397bb1c355def536d489d16 upstream.

We need to add a limit check here so we don't overflow the buffer.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/media/usb/ttusb-dec/ttusbdecfe.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/media/usb/ttusb-dec/ttusbdecfe.c
+++ b/drivers/media/usb/ttusb-dec/ttusbdecfe.c
@@ -156,6 +156,9 @@ static int ttusbdecfe_dvbs_diseqc_send_m
 		   0x00, 0x00, 0x00, 0x00,
 		   0x00, 0x00 };
 
+	if (cmd->msg_len > sizeof(b) - 4)
+		return -EINVAL;
+
 	memcpy(&b[4], cmd->msg, cmd->msg_len);
 
 	state->config->send_command(fe, 0x72,



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 074/122] memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 073/122] media: ttusb-dec: buffer overflow in ioctl Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 075/122] vmcore: Remove "weak" from function declarations Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, Andrew Morton,
	Rashika Kheria, Nathan Fontenot, Anton Blanchard, Heiko Carstens,
	Yinghai Lu

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit e0a8400c6923a163265d52798cdd4c33f3f8ab5a upstream.

drivers/base/memory.c provides a default memory_block_size_bytes()
definition explicitly marked "weak".  Several architectures provide their
own definitions intended to override the default, but the "weak" attribute
on the declaration applied to the arch definitions as well, so the linker
chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
annotation from pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the declaration so we always prefer a
non-weak definition over the weak one, independent of link order.

Fixes: 41f107266b19 ("drivers: base: Add prototype declaration to the header file")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
CC: Rashika Kheria <rashika.kheria@gmail.com>
CC: Nathan Fontenot <nfont@austin.ibm.com>
CC: Anton Blanchard <anton@au1.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memory.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -35,7 +35,7 @@ struct memory_block {
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
-unsigned long __weak memory_block_size_bytes(void);
+unsigned long memory_block_size_bytes(void);
 
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 075/122] vmcore: Remove "weak" from function declarations
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 074/122] memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 076/122] kgdb: Remove "weak" from kgdb_arch_pc() declaration Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, Andrew Morton,
	Vivek Goyal, Michael Holzheu

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 5ab03ac5aaa1f032e071f1b3dc433b7839359c03 upstream.

For the following functions:

  elfcorehdr_alloc()
  elfcorehdr_free()
  elfcorehdr_read()
  elfcorehdr_read_notes()
  remap_oldmem_pfn_range()

fs/proc/vmcore.c provides default definitions explicitly marked "weak".
arch/s390 provides its own definitions intended to override the default
ones, but the "weak" attribute on the declarations applied to the s390
definitions as well, so the linker chose one based on link order (see
10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node
decl")).

Remove the "weak" attribute from the declarations so we always prefer a
non-weak definition over the weak one, independent of link order.

Fixes: be8a8d069e50 ("vmcore: introduce ELF header in new memory feature")
Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
CC: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/crash_dump.h |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -14,14 +14,13 @@
 extern unsigned long long elfcorehdr_addr;
 extern unsigned long long elfcorehdr_size;
 
-extern int __weak elfcorehdr_alloc(unsigned long long *addr,
-				   unsigned long long *size);
-extern void __weak elfcorehdr_free(unsigned long long addr);
-extern ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos);
-extern ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
-extern int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
-					 unsigned long from, unsigned long pfn,
-					 unsigned long size, pgprot_t prot);
+extern int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size);
+extern void elfcorehdr_free(unsigned long long addr);
+extern ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos);
+extern ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos);
+extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
+				  unsigned long from, unsigned long pfn,
+				  unsigned long size, pgprot_t prot);
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 						unsigned long, int);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 076/122] kgdb: Remove "weak" from kgdb_arch_pc() declaration
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 075/122] vmcore: Remove "weak" from function declarations Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 077/122] clocksource: Remove "weak" from clocksource_default_clock() declaration Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, Harvey Harrison

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 107bcc6d566cb40184068d888637f9aefe6252dd upstream.

kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition
explicitly marked "weak".  Several architectures provide their own
definitions intended to override the default, but the "weak" attribute on
the declaration applied to the arch definitions as well, so the linker
chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak
annotation from pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the declaration so we always prefer a
non-weak definition over the weak one, independent of link order.

Fixes: 688b744d8bc8 ("kgdb: fix signedness mixmatches, add statics, add declaration to header")
Tested-by: Vineet Gupta <vgupta@synopsys.com>	# for ARC build
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/kgdb.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -283,7 +283,7 @@ struct kgdb_io {
 
 extern struct kgdb_arch		arch_kgdb_ops;
 
-extern unsigned long __weak kgdb_arch_pc(int exception, struct pt_regs *regs);
+extern unsigned long kgdb_arch_pc(int exception, struct pt_regs *regs);
 
 #ifdef CONFIG_SERIAL_KGDB_NMI
 extern int kgdb_register_nmi_console(void);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 077/122] clocksource: Remove "weak" from clocksource_default_clock() declaration
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 076/122] kgdb: Remove "weak" from kgdb_arch_pc() declaration Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 078/122] IB/core: Clear AH attr variable to prevent garbage data Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bjorn Helgaas, John Stultz,
	Ingo Molnar, Daniel Lezcano, Martin Schwidefsky

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhelgaas@google.com>

commit 96a2adbc6f501996418da9f7afe39bf0e4d006a9 upstream.

kernel/time/jiffies.c provides a default clocksource_default_clock()
definition explicitly marked "weak".  arch/s390 provides its own definition
intended to override the default, but the "weak" attribute on the
declaration applied to the s390 definition as well, so the linker chose one
based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from
pcibios_get_phb_of_node decl")).

Remove the "weak" attribute from the clocksource_default_clock()
declaration so we always prefer a non-weak definition over the weak one,
independent of link order.

Fixes: f1b82746c1e9 ("clocksource: Cleanup clocksource selection")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/clocksource.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -289,7 +289,7 @@ extern struct clocksource* clocksource_g
 extern void clocksource_change_rating(struct clocksource *cs, int rating);
 extern void clocksource_suspend(void);
 extern void clocksource_resume(void);
-extern struct clocksource * __init __weak clocksource_default_clock(void);
+extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 078/122] IB/core: Clear AH attr variable to prevent garbage data
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 077/122] clocksource: Remove "weak" from clocksource_default_clock() declaration Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 079/122] ipc: always handle a new value of auto_msgmni Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Devesh Sharma, Roland Dreier

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Devesh Sharma <devesh.sharma@emulex.com>

commit 8b0f93d9490653a7b9fc91f3570089132faed1c0 upstream.

During create-ah from userspace, uverbs is sending garbage data in
attr.dmac and attr.vlan_id.  This patch sets attr.dmac and
attr.vlan_id to zero.

Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/core/uverbs_cmd.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2425,6 +2425,8 @@ ssize_t ib_uverbs_create_ah(struct ib_uv
 	attr.grh.sgid_index    = cmd.attr.grh.sgid_index;
 	attr.grh.hop_limit     = cmd.attr.grh.hop_limit;
 	attr.grh.traffic_class = cmd.attr.grh.traffic_class;
+	attr.vlan_id           = 0;
+	memset(&attr.dmac, 0, sizeof(attr.dmac));
 	memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16);
 
 	ah = ib_create_ah(pd, &attr);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 079/122] ipc: always handle a new value of auto_msgmni
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (72 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 078/122] IB/core: Clear AH attr variable to prevent garbage data Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 080/122] netfilter: ipset: off by one in ip_set_nfnl_get_byindex() Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andrey Vagin, Mathias Krause,
	Manfred Spraul, Joe Perches, Davidlohr Bueso, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Vagin <avagin@openvz.org>

commit 1195d94e006b23c6292e78857e154872e33b6d7e upstream.

proc_dointvec_minmax() returns zero if a new value has been set.  So we
don't need to check all charecters have been handled.

Below you can find two examples.  In the new value has not been handled
properly.

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n\0", 3)                    = 2
close(3)                                = 0
exit_group(0)
$ cat /sys/kernel/debug/tracing/trace

$strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n", 2)                      = 2
close(3)                                = 0

$ cat /sys/kernel/debug/tracing/trace
a.out-697   [000] ....  3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax

Fixes: 9eefe520c814 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Joe Perches <joe@perches.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 ipc/ipc_sysctl.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -123,7 +123,6 @@ static int proc_ipcauto_dointvec_minmax(
 	void __user *buffer, size_t *lenp, loff_t *ppos)
 {
 	struct ctl_table ipc_table;
-	size_t lenp_bef = *lenp;
 	int oldval;
 	int rc;
 
@@ -133,7 +132,7 @@ static int proc_ipcauto_dointvec_minmax(
 
 	rc = proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
 
-	if (write && !rc && lenp_bef == *lenp) {
+	if (write && !rc) {
 		int newval = *((int *)(ipc_table.data));
 		/*
 		 * The file "auto_msgmni" has correctly been set.



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 080/122] netfilter: ipset: off by one in ip_set_nfnl_get_byindex()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (73 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 079/122] ipc: always handle a new value of auto_msgmni Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 081/122] netfilter: nf_log: account for size of NLMSG_DONE attribute Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Jozsef Kadlecsik,
	Pablo Neira Ayuso

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.carpenter@oracle.com>

commit 0f9f5e1b83abd2b37c67658e02a6fc9001831fa5 upstream.

The ->ip_set_list[] array is initialized in ip_set_net_init() and it
has ->ip_set_max elements so this check should be >= instead of >
otherwise we are off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/ipset/ip_set_core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -636,7 +636,7 @@ ip_set_nfnl_get_byindex(struct net *net,
 	struct ip_set *set;
 	struct ip_set_net *inst = ip_set_pernet(net);
 
-	if (index > inst->ip_set_max)
+	if (index >= inst->ip_set_max)
 		return IPSET_INVALID_ID;
 
 	nfnl_lock(NFNL_SUBSYS_IPSET);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 081/122] netfilter: nf_log: account for size of NLMSG_DONE attribute
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (74 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 080/122] netfilter: ipset: off by one in ip_set_nfnl_get_byindex() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 082/122] netfilter: nfnetlink_log: fix maximum packet length logged to userspace Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Houcheng Lin, Florian Westphal,
	Pablo Neira Ayuso

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a upstream.

We currently neither account for the nlattr size, nor do we consider
the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can result in nflog to stop working, as __nfulnl_send() re-tries
sending forever if it failed to append NLMSG_DONE (which will never
work if buffer is not large enough).

Reported-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -652,7 +652,8 @@ nfulnl_log_packet(struct net *net,
 		+ nla_total_size(sizeof(u_int32_t))	/* gid */
 		+ nla_total_size(plen)			/* prefix */
 		+ nla_total_size(sizeof(struct nfulnl_msg_packet_hw))
-		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp));
+		+ nla_total_size(sizeof(struct nfulnl_msg_packet_timestamp))
+		+ nla_total_size(sizeof(struct nfgenmsg));	/* NLMSG_DONE */
 
 	if (in && skb_mac_header_was_set(skb)) {
 		size +=   nla_total_size(skb->dev->hard_header_len)
@@ -695,8 +696,7 @@ nfulnl_log_packet(struct net *net,
 		goto unlock_and_release;
 	}
 
-	if (inst->skb &&
-	    size > skb_tailroom(inst->skb) - sizeof(struct nfgenmsg)) {
+	if (inst->skb && size > skb_tailroom(inst->skb)) {
 		/* either the queue len is too high or we don't have
 		 * enough room in the skb left. flush to userspace. */
 		__nfulnl_flush(inst);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 082/122] netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (75 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 081/122] netfilter: nf_log: account for size of NLMSG_DONE attribute Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 083/122] netfilter: nf_log: release skbuff on nlmsg put failure Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Florian Westphal, Pablo Neira Ayuso

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Florian Westphal <fw@strlen.de>

commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9 upstream.

don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
The nla length includes the size of the nla struct, so anything larger
results in u16 integer overflow.

This patch is similar to
9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -45,7 +45,8 @@
 #define NFULNL_NLBUFSIZ_DEFAULT	NLMSG_GOODSIZE
 #define NFULNL_TIMEOUT_DEFAULT 	100	/* every second */
 #define NFULNL_QTHRESH_DEFAULT 	100	/* 100 packets */
-#define NFULNL_COPY_RANGE_MAX	0xFFFF	/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+/* max packet size is limited by 16-bit struct nfattr nfa_len field */
+#define NFULNL_COPY_RANGE_MAX	(0xFFFF - NLA_HDRLEN)
 
 #define PRINTR(x, args...)	do { if (net_ratelimit()) \
 				     printk(x, ## args); } while (0);
@@ -255,6 +256,8 @@ nfulnl_set_mode(struct nfulnl_instance *
 
 	case NFULNL_COPY_PACKET:
 		inst->copy_mode = mode;
+		if (range == 0)
+			range = NFULNL_COPY_RANGE_MAX;
 		inst->copy_range = min_t(unsigned int,
 					 range, NFULNL_COPY_RANGE_MAX);
 		break;
@@ -682,8 +685,7 @@ nfulnl_log_packet(struct net *net,
 		break;
 
 	case NFULNL_COPY_PACKET:
-		if (inst->copy_range == 0
-		    || inst->copy_range > skb->len)
+		if (inst->copy_range > skb->len)
 			data_len = skb->len;
 		else
 			data_len = inst->copy_range;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 083/122] netfilter: nf_log: release skbuff on nlmsg put failure
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (76 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 082/122] netfilter: nfnetlink_log: fix maximum packet length logged to userspace Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 084/122] netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Houcheng Lin, Florian Westphal,
	Pablo Neira Ayuso

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Houcheng Lin <houcheng@gmail.com>

commit b51d3fa364885a2c1e1668f88776c67c95291820 upstream.

The kernel should reserve enough room in the skb so that the DONE
message can always be appended.  However, in case of e.g. new attribute
erronously not being size-accounted for, __nfulnl_send() will still
try to put next nlmsg into this full skbuf, causing the skb to be stuck
forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and
WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin <houcheng@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nfnetlink_log.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -349,26 +349,25 @@ nfulnl_alloc_skb(struct net *net, u32 pe
 	return skb;
 }
 
-static int
+static void
 __nfulnl_send(struct nfulnl_instance *inst)
 {
-	int status = -1;
-
 	if (inst->qlen > 1) {
 		struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
 						 NLMSG_DONE,
 						 sizeof(struct nfgenmsg),
 						 0);
-		if (!nlh)
+		if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
+			      inst->skb->len, skb_tailroom(inst->skb))) {
+			kfree_skb(inst->skb);
 			goto out;
+		}
 	}
-	status = nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
-				   MSG_DONTWAIT);
-
+	nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
+			  MSG_DONTWAIT);
+out:
 	inst->qlen = 0;
 	inst->skb = NULL;
-out:
-	return status;
 }
 
 static void



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 084/122] netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (77 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 083/122] netfilter: nf_log: release skbuff on nlmsg put failure Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 085/122] netfilter: xt_bpf: add mising opaque struct sk_filter definition Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Arturo Borrero Gonzalez, Pablo Neira Ayuso

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Arturo Borrero <arturo.borrero.glez@gmail.com>

commit 7965ee93719921ea5978f331da653dfa2d7b99f5 upstream.

The code looks for an already loaded target, and the correct list to search
is nft_target_list, not nft_match_list.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nft_compat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -678,7 +678,7 @@ nft_target_select_ops(const struct nft_c
 	family = ctx->afi->family;
 
 	/* Re-use the existing target if it's already loaded. */
-	list_for_each_entry(nft_target, &nft_match_list, head) {
+	list_for_each_entry(nft_target, &nft_target_list, head) {
 		struct xt_target *target = nft_target->ops.data;
 
 		if (strcmp(target->name, tg_name) == 0 &&



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 085/122] netfilter: xt_bpf: add mising opaque struct sk_filter definition
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (78 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 084/122] netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 086/122] ARM: probes: fix instruction fetch order with <asm/opcodes.h> Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pablo Neira Ayuso, Willem de Bruijn,
	David S. Miller

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pablo Neira <pablo@netfilter.org>

commit e10038a8ec06ac819b7552bb67aaa6d2d6f850c1 upstream.

This structure is not exposed to userspace, so fix this by defining
struct sk_filter; so we skip the casting in kernelspace. This is safe
since userspace has no way to lurk with that internal pointer.

Fixes: e6f30c7 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/uapi/linux/netfilter/xt_bpf.h |    2 ++
 1 file changed, 2 insertions(+)

--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -6,6 +6,8 @@
 
 #define XT_BPF_MAX_NUM_INSTR	64
 
+struct sk_filter;
+
 struct xt_bpf_info {
 	__u16 bpf_program_num_elem;
 	struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR];



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 086/122] ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (79 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 085/122] netfilter: xt_bpf: add mising opaque struct sk_filter definition Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 087/122] GFS2: Fix address space from page function Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jon Medhurst, Ben Dooks,
	Taras Kondratiuk, Wang Nan

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ben Dooks <ben.dooks@codethink.co.uk>

commit 888be25402021a425da3e85e2d5a954d7509286e upstream.

If we are running BE8, the data and instruction endianness do not
match, so use <asm/opcodes.h> to correctly translate memory accesses
into ARM instructions.

Acked-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
[taras.kondratiuk@linaro.org: fixed Thumb instruction fetch order]
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
[wangnan: backport to 3.10 and 3.14:
 - adjust context
 - backport all changes on arch/arm/kernel/probes.c to
   arch/arm/kernel/kprobes-common.c since we don't have
   commit c18377c303787ded44b7decd7dee694db0f205e9.
 - After the above adjustments, becomes same to Taras Kondratiuk's
   original patch:
     http://lists.linaro.org/pipermail/linaro-kernel/2014-January/010346.html
]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/arm/kernel/kprobes-common.c |   19 +++++++++++--------
 arch/arm/kernel/kprobes-thumb.c  |   21 +++++++++++++--------
 arch/arm/kernel/kprobes.c        |    9 +++++----
 3 files changed, 29 insertions(+), 20 deletions(-)

--- a/arch/arm/kernel/kprobes-common.c
+++ b/arch/arm/kernel/kprobes-common.c
@@ -14,6 +14,7 @@
 #include <linux/kernel.h>
 #include <linux/kprobes.h>
 #include <asm/system_info.h>
+#include <asm/opcodes.h>
 
 #include "kprobes.h"
 
@@ -305,7 +306,8 @@ kprobe_decode_ldmstm(kprobe_opcode_t ins
 
 	if (handler) {
 		/* We can emulate the instruction in (possibly) modified form */
-		asi->insn[0] = (insn & 0xfff00000) | (rn << 16) | reglist;
+		asi->insn[0] = __opcode_to_mem_arm((insn & 0xfff00000) |
+						   (rn << 16) | reglist);
 		asi->insn_handler = handler;
 		return INSN_GOOD;
 	}
@@ -334,13 +336,14 @@ prepare_emulated_insn(kprobe_opcode_t in
 #ifdef CONFIG_THUMB2_KERNEL
 	if (thumb) {
 		u16 *thumb_insn = (u16 *)asi->insn;
-		thumb_insn[1] = 0x4770; /* Thumb bx lr */
-		thumb_insn[2] = 0x4770; /* Thumb bx lr */
+		/* Thumb bx lr */
+		thumb_insn[1] = __opcode_to_mem_thumb16(0x4770);
+		thumb_insn[2] = __opcode_to_mem_thumb16(0x4770);
 		return insn;
 	}
-	asi->insn[1] = 0xe12fff1e; /* ARM bx lr */
+	asi->insn[1] = __opcode_to_mem_arm(0xe12fff1e); /* ARM bx lr */
 #else
-	asi->insn[1] = 0xe1a0f00e; /* mov pc, lr */
+	asi->insn[1] = __opcode_to_mem_arm(0xe1a0f00e); /* mov pc, lr */
 #endif
 	/* Make an ARM instruction unconditional */
 	if (insn < 0xe0000000)
@@ -360,12 +363,12 @@ set_emulated_insn(kprobe_opcode_t insn,
 	if (thumb) {
 		u16 *ip = (u16 *)asi->insn;
 		if (is_wide_instruction(insn))
-			*ip++ = insn >> 16;
-		*ip++ = insn;
+			*ip++ = __opcode_to_mem_thumb16(insn >> 16);
+		*ip++ = __opcode_to_mem_thumb16(insn);
 		return;
 	}
 #endif
-	asi->insn[0] = insn;
+	asi->insn[0] = __opcode_to_mem_arm(insn);
 }
 
 /*
--- a/arch/arm/kernel/kprobes-thumb.c
+++ b/arch/arm/kernel/kprobes-thumb.c
@@ -11,6 +11,7 @@
 #include <linux/kernel.h>
 #include <linux/kprobes.h>
 #include <linux/module.h>
+#include <asm/opcodes.h>
 
 #include "kprobes.h"
 
@@ -163,9 +164,9 @@ t32_decode_ldmstm(kprobe_opcode_t insn,
 	enum kprobe_insn ret = kprobe_decode_ldmstm(insn, asi);
 
 	/* Fixup modified instruction to have halfwords in correct order...*/
-	insn = asi->insn[0];
-	((u16 *)asi->insn)[0] = insn >> 16;
-	((u16 *)asi->insn)[1] = insn & 0xffff;
+	insn = __mem_to_opcode_arm(asi->insn[0]);
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(insn >> 16);
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0xffff);
 
 	return ret;
 }
@@ -1153,7 +1154,7 @@ t16_decode_hiregs(kprobe_opcode_t insn,
 {
 	insn &= ~0x00ff;
 	insn |= 0x001; /* Set Rdn = R1 and Rm = R0 */
-	((u16 *)asi->insn)[0] = insn;
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(insn);
 	asi->insn_handler = t16_emulate_hiregs;
 	return INSN_GOOD;
 }
@@ -1182,8 +1183,10 @@ t16_decode_push(kprobe_opcode_t insn, st
 	 * and call it with R9=SP and LR in the register list represented
 	 * by R8.
 	 */
-	((u16 *)asi->insn)[0] = 0xe929;		/* 1st half STMDB R9!,{} */
-	((u16 *)asi->insn)[1] = insn & 0x1ff;	/* 2nd half (register list) */
+	/* 1st half STMDB R9!,{} */
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(0xe929);
+	/* 2nd half (register list) */
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0x1ff);
 	asi->insn_handler = t16_emulate_push;
 	return INSN_GOOD;
 }
@@ -1232,8 +1235,10 @@ t16_decode_pop(kprobe_opcode_t insn, str
 	 * and call it with R9=SP and PC in the register list represented
 	 * by R8.
 	 */
-	((u16 *)asi->insn)[0] = 0xe8b9;		/* 1st half LDMIA R9!,{} */
-	((u16 *)asi->insn)[1] = insn & 0x1ff;	/* 2nd half (register list) */
+	/* 1st half LDMIA R9!,{} */
+	((u16 *)asi->insn)[0] = __opcode_to_mem_thumb16(0xe8b9);
+	/* 2nd half (register list) */
+	((u16 *)asi->insn)[1] = __opcode_to_mem_thumb16(insn & 0x1ff);
 	asi->insn_handler = insn & 0x100 ? t16_emulate_pop_pc
 					 : t16_emulate_pop_nopc;
 	return INSN_GOOD;
--- a/arch/arm/kernel/kprobes.c
+++ b/arch/arm/kernel/kprobes.c
@@ -26,6 +26,7 @@
 #include <linux/stop_machine.h>
 #include <linux/stringify.h>
 #include <asm/traps.h>
+#include <asm/opcodes.h>
 #include <asm/cacheflush.h>
 
 #include "kprobes.h"
@@ -62,10 +63,10 @@ int __kprobes arch_prepare_kprobe(struct
 #ifdef CONFIG_THUMB2_KERNEL
 	thumb = true;
 	addr &= ~1; /* Bit 0 would normally be set to indicate Thumb code */
-	insn = ((u16 *)addr)[0];
+	insn = __mem_to_opcode_thumb16(((u16 *)addr)[0]);
 	if (is_wide_instruction(insn)) {
-		insn <<= 16;
-		insn |= ((u16 *)addr)[1];
+		u16 inst2 = __mem_to_opcode_thumb16(((u16 *)addr)[1]);
+		insn = __opcode_thumb32_compose(insn, inst2);
 		decode_insn = thumb32_kprobe_decode_insn;
 	} else
 		decode_insn = thumb16_kprobe_decode_insn;
@@ -73,7 +74,7 @@ int __kprobes arch_prepare_kprobe(struct
 	thumb = false;
 	if (addr & 0x3)
 		return -EINVAL;
-	insn = *p->addr;
+	insn = __mem_to_opcode_arm(*p->addr);
 	decode_insn = arm_kprobe_decode_insn;
 #endif
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 087/122] GFS2: Fix address space from page function
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (80 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 086/122] ARM: probes: fix instruction fetch order with <asm/opcodes.h> Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 088/122] rcu: Make callers awaken grace-period kthread Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Abhi Das, Steven Whitehouse

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Steven Whitehouse <swhiteho@redhat.com>

commit 1b2ad41214c9bf6e8befa000f0522629194bf540 upstream.

Now that rgrps use the address space which is part of the super
block, we need to update gfs2_mapping2sbd() to take account of
that. The only way to do that easily is to use a different set
of address_space_operations for rgrps.

Reported-by: Abhi Das <adas@redhat.com>
Tested-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/gfs2/meta_io.c    |    5 +++++
 fs/gfs2/meta_io.h    |    3 +++
 fs/gfs2/ops_fstype.c |    2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -97,6 +97,11 @@ const struct address_space_operations gf
 	.releasepage = gfs2_releasepage,
 };
 
+const struct address_space_operations gfs2_rgrp_aops = {
+	.writepage = gfs2_aspace_writepage,
+	.releasepage = gfs2_releasepage,
+};
+
 /**
  * gfs2_getbuf - Get a buffer with a given address space
  * @gl: the glock
--- a/fs/gfs2/meta_io.h
+++ b/fs/gfs2/meta_io.h
@@ -38,12 +38,15 @@ static inline void gfs2_buffer_copy_tail
 }
 
 extern const struct address_space_operations gfs2_meta_aops;
+extern const struct address_space_operations gfs2_rgrp_aops;
 
 static inline struct gfs2_sbd *gfs2_mapping2sbd(struct address_space *mapping)
 {
 	struct inode *inode = mapping->host;
 	if (mapping->a_ops == &gfs2_meta_aops)
 		return (((struct gfs2_glock *)mapping) - 1)->gl_sbd;
+	else if (mapping->a_ops == &gfs2_rgrp_aops)
+		return container_of(mapping, struct gfs2_sbd, sd_aspace);
 	else
 		return inode->i_sb->s_fs_info;
 }
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -104,7 +104,7 @@ static struct gfs2_sbd *init_sbd(struct
 	mapping = &sdp->sd_aspace;
 
 	address_space_init_once(mapping);
-	mapping->a_ops = &gfs2_meta_aops;
+	mapping->a_ops = &gfs2_rgrp_aops;
 	mapping->host = sb->s_bdev->bd_inode;
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 088/122] rcu: Make callers awaken grace-period kthread
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (81 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 087/122] GFS2: Fix address space from page function Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 089/122] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Frederic Weisbecker, Josh Triplett,
	Pranith Kumar, Kamal Mostafa

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

commit 48a7639ce80cf279834d0d44865e49ecd714f37d upstream.

The rcu_start_gp_advanced() function currently uses irq_work_queue()
to defer wakeups of the RCU grace-period kthread.  This deferring
is necessary to avoid RCU-scheduler deadlocks involving the rcu_node
structure's lock, meaning that RCU cannot call any of the scheduler's
wake-up functions while holding one of these locks.

Unfortunately, the second and subsequent calls to irq_work_queue() are
ignored, and the first call will be ignored (aside from queuing the work
item) if the scheduler-clock tick is turned off.  This is OK for many
uses, especially those where irq_work_queue() is called from an interrupt
or softirq handler, because in those cases the scheduler-clock-tick state
will be re-evaluated, which will turn the scheduler-clock tick back on.
On the next tick, any deferred work will then be processed.

However, this strategy does not always work for RCU, which can be invoked
at process level from idle CPUs.  In this case, the tick might never
be turned back on, indefinitely defering a grace-period start request.
Note that the RCU CPU stall detector cannot see this condition, because
there is no RCU grace period in progress.  Therefore, we can (and do!)
see long tens-of-seconds stalls in grace-period handling.  In theory,
we could see a full grace-period hang, but rcutorture testing to date
has seen only the tens-of-seconds stalls.  Event tracing demonstrates
that irq_work_queue() is being called repeatedly to no effect during
these stalls: The "newreq" event appears repeatedly from a task that is
not one of the grace-period kthreads.

In theory, irq_work_queue() might be fixed to avoid this sort of issue,
but RCU's requirements are unusual and it is quite straightforward to pass
wake-up responsibility up through RCU's call chain, so that the wakeup
happens when the offending locks are released.

This commit therefore makes this change.  The rcu_start_gp_advanced(),
rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(),
__note_gp_changes(), and rcu_start_gp() functions now return a boolean
which indicates when a wake-up is needed.  A new rcu_gp_kthread_wake()
does the wakeup when it is necessary and safe to do so: No self-wakes,
no wake-ups if the ->gp_flags field indicates there is no need (as in
someone else did the wake-up before we got around to it), and no wake-ups
before the grace-period kthread has been created.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ Pranith: backport to 3.13-stable: just rcu_gp_kthread_wake(),
  prereq for 2aa792e "rcu: Use rcu_gp_kthread_wake() to wake up grace
  period kthreads" ]
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/rcu/tree.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1228,6 +1228,22 @@ static int rcu_future_gp_cleanup(struct
 }
 
 /*
+ * Awaken the grace-period kthread for the specified flavor of RCU.
+ * Don't do a self-awaken, and don't bother awakening when there is
+ * nothing for the grace-period kthread to do (as in several CPUs
+ * raced to awaken, and we lost), and finally don't try to awaken
+ * a kthread that has not yet been created.
+ */
+static void rcu_gp_kthread_wake(struct rcu_state *rsp)
+{
+	if (current == rsp->gp_kthread ||
+	    !ACCESS_ONCE(rsp->gp_flags) ||
+	    !rsp->gp_kthread)
+		return;
+	wake_up(&rsp->gp_wq);
+}
+
+/*
  * If there is room, assign a ->completed number to any callbacks on
  * this CPU that have not already been assigned.  Also accelerate any
  * callbacks that were previously assigned a ->completed number that has
@@ -1670,7 +1686,7 @@ static void rsp_wakeup(struct irq_work *
 	struct rcu_state *rsp = container_of(work, struct rcu_state, wakeup_work);
 
 	/* Wake up rcu_gp_kthread() to start the grace period. */
-	wake_up(&rsp->gp_wq);
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 089/122] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (82 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 088/122] rcu: Make callers awaken grace-period kthread Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 091/122] perf: Handle compat ioctl Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pranith Kumar, Mathieu Desnoyers,
	Paul E. McKenney, Kamal Mostafa

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pranith Kumar <bobby.prani@gmail.com>

commit 2aa792e6faf1a00f5accf1f69e87e11a390ba2cd upstream.

The rcu_gp_kthread_wake() function checks for three conditions before
waking up grace period kthreads:

*  Is the thread we are trying to wake up the current thread?
*  Are the gp_flags zero? (all threads wait on non-zero gp_flags condition)
*  Is there no thread created for this flavour, hence nothing to wake up?

If any one of these condition is true, we do not call wake_up().
It was found that there are quite a few avoidable wake ups both during
idle time and under stress induced by rcutorture.

Idle:

Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0
Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0

rcutorture:

Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0
Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0

Here case{1-3} are the cases listed above. We can avoid these wake
ups by using rcu_gp_kthread_wake() to conditionally wake up the grace
period kthreads.

There is a comment about an implied barrier supplied by the wake_up()
logic.  This barrier is necessary for the awakened thread to see the
updated ->gp_flags.  This flag is always being updated with the root node
lock held. Also, the awakened thread tries to acquire the root node lock
before reading ->gp_flags because of which there is proper ordering.

Hence this commit tries to avoid calling wake_up() whenever we can by
using rcu_gp_kthread_wake() function.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/rcu/tree.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1762,7 +1762,7 @@ static void rcu_report_qs_rsp(struct rcu
 {
 	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
 	raw_spin_unlock_irqrestore(&rcu_get_root(rsp)->lock, flags);
-	wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*
@@ -2338,7 +2338,7 @@ static void force_quiescent_state(struct
 	}
 	rsp->gp_flags |= RCU_GP_FLAG_FQS;
 	raw_spin_unlock_irqrestore(&rnp_old->lock, flags);
-	wake_up(&rsp->gp_wq);  /* Memory barrier implied by wake_up() path. */
+	rcu_gp_kthread_wake(rsp);
 }
 
 /*



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 091/122] perf: Handle compat ioctl
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (83 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 089/122] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 092/122] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Drew Richardson, Pawel Moll,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	David Ahern

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pawel Moll <pawel.moll@arm.com>

commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream.

When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.

For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.

This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/events/core.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
 #include <linux/hw_breakpoint.h>
 #include <linux/mm_types.h>
 #include <linux/cgroup.h>
+#include <linux/compat.h>
 
 #include "internal.h"
 
@@ -3693,6 +3694,26 @@ static long perf_ioctl(struct file *file
 	return 0;
 }
 
+#ifdef CONFIG_COMPAT
+static long perf_compat_ioctl(struct file *file, unsigned int cmd,
+				unsigned long arg)
+{
+	switch (_IOC_NR(cmd)) {
+	case _IOC_NR(PERF_EVENT_IOC_SET_FILTER):
+	case _IOC_NR(PERF_EVENT_IOC_ID):
+		/* Fix up pointer size (usually 4 -> 8 in 32-on-64-bit case */
+		if (_IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
+			cmd &= ~IOCSIZE_MASK;
+			cmd |= sizeof(void *) << IOCSIZE_SHIFT;
+		}
+		break;
+	}
+	return perf_ioctl(file, cmd, arg);
+}
+#else
+# define perf_compat_ioctl NULL
+#endif
+
 int perf_event_task_enable(void)
 {
 	struct perf_event *event;
@@ -4185,7 +4206,7 @@ static const struct file_operations perf
 	.read			= perf_read,
 	.poll			= perf_poll,
 	.unlocked_ioctl		= perf_ioctl,
-	.compat_ioctl		= perf_ioctl,
+	.compat_ioctl		= perf_compat_ioctl,
 	.mmap			= perf_mmap,
 	.fasync			= perf_fasync,
 };



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 092/122] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (84 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 091/122] perf: Handle compat ioctl Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 093/122] KVM: x86: Dont report guest userspace emulation error to userspace Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vince Weaver, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Linus Torvalds, Ingo Molnar,
	Hou Pengyang

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vince Weaver <vincent.weaver@maine.edu>

commit 1996388e9f4e3444db8273bc08d25164d2967c21 upstream.

This was discussed back in February:

	https://lkml.org/lkml/2014/2/18/956

But I never saw a patch come out of it.

On IvyBridge we share the SandyBridge cache event tables, but the
dTLB-load-miss event is not compatible.  Patch it up after
the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2475,6 +2475,9 @@ __init int intel_pmu_init(void)
 	case 62: /* IvyBridge EP */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
+		/* dTLB-load-misses on IVB is different than SNB */
+		hw_cache_event_ids[C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = 0x8108; /* DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK */
+
 		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
 		       sizeof(hw_cache_extra_regs));
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 093/122] KVM: x86: Dont report guest userspace emulation error to userspace
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (85 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 092/122] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 094/122] net: sctp: fix remote memory pressure from excessive queueing Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nadav Amit, Paolo Bonzini

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nadav Amit <namit@cs.technion.ac.il>

commit a2b9e6c1a35afcc0973acb72e591c714e78885ff upstream.

Commit fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to
user-space") disabled the reporting of L2 (nested guest) emulation failures to
userspace due to race-condition between a vmexit and the instruction emulator.
The same rational applies also to userspace applications that are permitted by
the guest OS to access MMIO area or perform PIO.

This patch extends the current behavior - of injecting a #UD instead of
reporting it to userspace - also for guest userspace code.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kvm/x86.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4911,7 +4911,7 @@ static int handle_emulation_failure(stru
 
 	++vcpu->stat.insn_emulation_fail;
 	trace_kvm_emulate_insn_failed(vcpu);
-	if (!is_guest_mode(vcpu)) {
+	if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
 		vcpu->run->internal.ndata = 0;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 094/122] net: sctp: fix remote memory pressure from excessive queueing
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (86 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 093/122] KVM: x86: Dont report guest userspace emulation error to userspace Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 095/122] net: sctp: fix panic on duplicate ASCONF chunks Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	David S. Miller, Josh Boyer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one
example triggering the issue. When receiving ASCONF probes
in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
ASCONFs and have increasing serial numbers, we process such
ASCONF chunk(s) marked with !end_of_packet and !singleton,
since we have not yet reached the SCTP packet end. SCTP does
only do verification on a chunk by chunk basis, as an SCTP
packet is nothing more than just a container of a stream of
chunks which it eats up one by one.

We could run into the case that we receive a packet with a
malformed tail, above marked as trailing JUNK. All previous
chunks are here goodformed, so the stack will eat up all
previous chunks up to this point. In case JUNK does not fit
into a chunk header and there are no more other chunks in
the input queue, or in case JUNK contains a garbage chunk
header, but the encoded chunk length would exceed the skb
tail, or we came here from an entirely different scenario
and the chunk has pdiscard=1 mark (without having had a flush
point), it will happen, that we will excessively queue up
the association's output queue (a correct final chunk may
then turn it into a response flood when flushing the
queue ;)): I ran a simple script with incremental ASCONF
serial numbers and could see the server side consuming
excessive amount of RAM [before/after: up to 2GB and more].

The issue at heart is that the chunk train basically ends
with !end_of_packet and !singleton markers and since commit
2e3216cd54b1 ("sctp: Follow security requirement of responding
with 1 packet") therefore preventing an output queue flush
point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
chunk (chunk = event_arg) even though local_cork is set,
but its precedence has changed since then. In the normal
case, the last chunk with end_of_packet=1 would trigger the
queue flush to accommodate possible outgoing bundling.

In the input queue, sctp_inq_pop() seems to do the right thing
in terms of discarding invalid chunks. So, above JUNK will
not enter the state machine and instead be released and exit
the sctp_assoc_bh_rcv() chunk processing loop. It's simply
the flush point being missing at loop exit. Adding a try-flush
approach on the output queue might not work as the underlying
infrastructure might be long gone at this point due to the
side-effect interpreter run.

One possibility, albeit a bit of a kludge, would be to defer
invalid chunk freeing into the state machine in order to
possibly trigger packet discards and thus indirectly a queue
flush on error. It would surely be better to discard chunks
as in the current, perhaps better controlled environment, but
going back and forth, it's simply architecturally not possible.
I tried various trailing JUNK attack cases and it seems to
look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sctp/inqueue.c      |   33 +++++++--------------------------
 net/sctp/sm_statefuns.c |    3 +++
 2 files changed, 10 insertions(+), 26 deletions(-)

--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -140,18 +140,9 @@ struct sctp_chunk *sctp_inq_pop(struct s
 		} else {
 			/* Nothing to do. Next chunk in the packet, please. */
 			ch = (sctp_chunkhdr_t *) chunk->chunk_end;
-
 			/* Force chunk->skb->data to chunk->chunk_end.  */
-			skb_pull(chunk->skb,
-				 chunk->chunk_end - chunk->skb->data);
-
-			/* Verify that we have at least chunk headers
-			 * worth of buffer left.
-			 */
-			if (skb_headlen(chunk->skb) < sizeof(sctp_chunkhdr_t)) {
-				sctp_chunk_free(chunk);
-				chunk = queue->in_progress = NULL;
-			}
+			skb_pull(chunk->skb, chunk->chunk_end - chunk->skb->data);
+			/* We are guaranteed to pull a SCTP header. */
 		}
 	}
 
@@ -187,24 +178,14 @@ struct sctp_chunk *sctp_inq_pop(struct s
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */
 
-	if (chunk->chunk_end < skb_tail_pointer(chunk->skb)) {
+	if (chunk->chunk_end + sizeof(sctp_chunkhdr_t) <
+	    skb_tail_pointer(chunk->skb)) {
 		/* This is not a singleton */
 		chunk->singleton = 0;
 	} else if (chunk->chunk_end > skb_tail_pointer(chunk->skb)) {
-		/* RFC 2960, Section 6.10  Bundling
-		 *
-		 * Partial chunks MUST NOT be placed in an SCTP packet.
-		 * If the receiver detects a partial chunk, it MUST drop
-		 * the chunk.
-		 *
-		 * Since the end of the chunk is past the end of our buffer
-		 * (which contains the whole packet, we can freely discard
-		 * the whole packet.
-		 */
-		sctp_chunk_free(chunk);
-		chunk = queue->in_progress = NULL;
-
-		return NULL;
+		/* Discard inside state machine. */
+		chunk->pdiscard = 1;
+		chunk->chunk_end = skb_tail_pointer(chunk->skb);
 	} else {
 		/* We are at the end of the packet, so mark the chunk
 		 * in case we need to send a SACK.
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -170,6 +170,9 @@ sctp_chunk_length_valid(struct sctp_chun
 {
 	__u16 chunk_length = ntohs(chunk->chunk_hdr->length);
 
+	/* Previously already marked? */
+	if (unlikely(chunk->pdiscard))
+		return 0;
 	if (unlikely(chunk_length < required_length))
 		return 0;
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 095/122] net: sctp: fix panic on duplicate ASCONF chunks
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (87 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 094/122] net: sctp: fix remote memory pressure from excessive queueing Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 096/122] net: sctp: fix skb_over_panic when receiving malformed " Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	David S. Miller, Josh Boyer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54b1
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/sctp/sctp.h |    5 +++++
 net/sctp/associola.c    |    2 ++
 2 files changed, 7 insertions(+)

--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -433,6 +433,11 @@ static inline void sctp_assoc_pending_pm
 	asoc->pmtu_pending = 0;
 }
 
+static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
+{
+	return !list_empty(&chunk->list);
+}
+
 /* Walk through a list of TLV parameters.  Don't trust the
  * individual parameter lengths and instead depend on
  * the chunk length to indicate when to stop.  Make sure
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1627,6 +1627,8 @@ struct sctp_chunk *sctp_assoc_lookup_asc
 	 * ack chunk whose serial number matches that of the request.
 	 */
 	list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
+		if (sctp_chunk_pending(ack))
+			continue;
 		if (ack->subh.addip_hdr->serial == serial) {
 			sctp_chunk_hold(ack);
 			return ack;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 096/122] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (88 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 095/122] net: sctp: fix panic on duplicate ASCONF chunks Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 097/122] iwlwifi: configure the LTR Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Daniel Borkmann, Vlad Yasevich,
	Neil Horman, David S. Miller, Josh Boyer

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dborkman@redhat.com>

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/sctp/sm.h    |    6 +-
 net/sctp/sm_make_chunk.c |   99 ++++++++++++++++++++++++++---------------------
 net/sctp/sm_statefuns.c  |   18 --------
 3 files changed, 60 insertions(+), 63 deletions(-)

--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -248,9 +248,9 @@ struct sctp_chunk *sctp_make_asconf_upda
 					      int, __be16);
 struct sctp_chunk *sctp_make_asconf_set_prim(struct sctp_association *asoc,
 					     union sctp_addr *addr);
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp);
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp);
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf);
 int sctp_process_asconf_ack(struct sctp_association *asoc,
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3113,50 +3113,63 @@ static __be16 sctp_process_asconf_param(
 	return SCTP_ERROR_NO_ERROR;
 }
 
-/* Verify the ASCONF packet before we process it.  */
-int sctp_verify_asconf(const struct sctp_association *asoc,
-		       struct sctp_paramhdr *param_hdr, void *chunk_end,
-		       struct sctp_paramhdr **errp) {
-	sctp_addip_param_t *asconf_param;
+/* Verify the ASCONF packet before we process it. */
+bool sctp_verify_asconf(const struct sctp_association *asoc,
+			struct sctp_chunk *chunk, bool addr_param_needed,
+			struct sctp_paramhdr **errp)
+{
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) chunk->chunk_hdr;
 	union sctp_params param;
-	int length, plen;
-
-	param.v = (sctp_paramhdr_t *) param_hdr;
-	while (param.v <= chunk_end - sizeof(sctp_paramhdr_t)) {
-		length = ntohs(param.p->length);
-		*errp = param.p;
+	bool addr_param_seen = false;
 
-		if (param.v > chunk_end - length ||
-		    length < sizeof(sctp_paramhdr_t))
-			return 0;
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		size_t length = ntohs(param.p->length);
 
+		*errp = param.p;
 		switch (param.p->type) {
+		case SCTP_PARAM_ERR_CAUSE:
+			break;
+		case SCTP_PARAM_IPV4_ADDRESS:
+			if (length != sizeof(sctp_ipv4addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
+		case SCTP_PARAM_IPV6_ADDRESS:
+			if (length != sizeof(sctp_ipv6addr_param_t))
+				return false;
+			addr_param_seen = true;
+			break;
 		case SCTP_PARAM_ADD_IP:
 		case SCTP_PARAM_DEL_IP:
 		case SCTP_PARAM_SET_PRIMARY:
-			asconf_param = (sctp_addip_param_t *)param.v;
-			plen = ntohs(asconf_param->param_hdr.length);
-			if (plen < sizeof(sctp_addip_param_t) +
-			    sizeof(sctp_paramhdr_t))
-				return 0;
+			/* In ASCONF chunks, these need to be first. */
+			if (addr_param_needed && !addr_param_seen)
+				return false;
+			length = ntohs(param.addip->param_hdr.length);
+			if (length < sizeof(sctp_addip_param_t) +
+				     sizeof(sctp_paramhdr_t))
+				return false;
 			break;
 		case SCTP_PARAM_SUCCESS_REPORT:
 		case SCTP_PARAM_ADAPTATION_LAYER_IND:
 			if (length != sizeof(sctp_addip_param_t))
-				return 0;
-
+				return false;
 			break;
 		default:
-			break;
+			/* This is unkown to us, reject! */
+			return false;
 		}
-
-		param.v += WORD_ROUND(length);
 	}
 
-	if (param.v != chunk_end)
-		return 0;
+	/* Remaining sanity checks. */
+	if (addr_param_needed && !addr_param_seen)
+		return false;
+	if (!addr_param_needed && addr_param_seen)
+		return false;
+	if (param.v != chunk->chunk_end)
+		return false;
 
-	return 1;
+	return true;
 }
 
 /* Process an incoming ASCONF chunk with the next expected serial no. and
@@ -3165,16 +3178,17 @@ int sctp_verify_asconf(const struct sctp
 struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf)
 {
+	sctp_addip_chunk_t *addip = (sctp_addip_chunk_t *) asconf->chunk_hdr;
+	bool all_param_pass = true;
+	union sctp_params param;
 	sctp_addiphdr_t		*hdr;
 	union sctp_addr_param	*addr_param;
 	sctp_addip_param_t	*asconf_param;
 	struct sctp_chunk	*asconf_ack;
-
 	__be16	err_code;
 	int	length = 0;
 	int	chunk_len;
 	__u32	serial;
-	int	all_param_pass = 1;
 
 	chunk_len = ntohs(asconf->chunk_hdr->length) - sizeof(sctp_chunkhdr_t);
 	hdr = (sctp_addiphdr_t *)asconf->skb->data;
@@ -3202,9 +3216,14 @@ struct sctp_chunk *sctp_process_asconf(s
 		goto done;
 
 	/* Process the TLVs contained within the ASCONF chunk. */
-	while (chunk_len > 0) {
+	sctp_walk_params(param, addip, addip_hdr.params) {
+		/* Skip preceeding address parameters. */
+		if (param.p->type == SCTP_PARAM_IPV4_ADDRESS ||
+		    param.p->type == SCTP_PARAM_IPV6_ADDRESS)
+			continue;
+
 		err_code = sctp_process_asconf_param(asoc, asconf,
-						     asconf_param);
+						     param.addip);
 		/* ADDIP 4.1 A7)
 		 * If an error response is received for a TLV parameter,
 		 * all TLVs with no response before the failed TLV are
@@ -3212,28 +3231,20 @@ struct sctp_chunk *sctp_process_asconf(s
 		 * the failed response are considered unsuccessful unless
 		 * a specific success indication is present for the parameter.
 		 */
-		if (SCTP_ERROR_NO_ERROR != err_code)
-			all_param_pass = 0;
-
+		if (err_code != SCTP_ERROR_NO_ERROR)
+			all_param_pass = false;
 		if (!all_param_pass)
-			sctp_add_asconf_response(asconf_ack,
-						 asconf_param->crr_id, err_code,
-						 asconf_param);
+			sctp_add_asconf_response(asconf_ack, param.addip->crr_id,
+						 err_code, param.addip);
 
 		/* ADDIP 4.3 D11) When an endpoint receiving an ASCONF to add
 		 * an IP address sends an 'Out of Resource' in its response, it
 		 * MUST also fail any subsequent add or delete requests bundled
 		 * in the ASCONF.
 		 */
-		if (SCTP_ERROR_RSRC_LOW == err_code)
+		if (err_code == SCTP_ERROR_RSRC_LOW)
 			goto done;
-
-		/* Move to the next ASCONF param. */
-		length = ntohs(asconf_param->param_hdr.length);
-		asconf_param = (void *)asconf_param + length;
-		chunk_len -= length;
 	}
-
 done:
 	asoc->peer.addip_serial++;
 
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3594,9 +3594,7 @@ sctp_disposition_t sctp_sf_do_asconf(str
 	struct sctp_chunk	*asconf_ack = NULL;
 	struct sctp_paramhdr	*err_param = NULL;
 	sctp_addiphdr_t		*hdr;
-	union sctp_addr_param	*addr_param;
 	__u32			serial;
-	int			length;
 
 	if (!sctp_vtag_verify(chunk, asoc)) {
 		sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_BAD_TAG,
@@ -3621,17 +3619,8 @@ sctp_disposition_t sctp_sf_do_asconf(str
 	hdr = (sctp_addiphdr_t *)chunk->skb->data;
 	serial = ntohl(hdr->serial);
 
-	addr_param = (union sctp_addr_param *)hdr->params;
-	length = ntohs(addr_param->p.length);
-	if (length < sizeof(sctp_paramhdr_t))
-		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
-			   (void *)addr_param, commands);
-
 	/* Verify the ASCONF chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-			    (sctp_paramhdr_t *)((void *)addr_param + length),
-			    (void *)chunk->chunk_end,
-			    &err_param))
+	if (!sctp_verify_asconf(asoc, chunk, true, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 						  (void *)err_param, commands);
 
@@ -3748,10 +3737,7 @@ sctp_disposition_t sctp_sf_do_asconf_ack
 	rcvd_serial = ntohl(addip_hdr->serial);
 
 	/* Verify the ASCONF-ACK chunk before processing it. */
-	if (!sctp_verify_asconf(asoc,
-	    (sctp_paramhdr_t *)addip_hdr->params,
-	    (void *)asconf_ack->chunk_end,
-	    &err_param))
+	if (!sctp_verify_asconf(asoc, asconf_ack, false, &err_param))
 		return sctp_sf_violation_paramlen(net, ep, asoc, type, arg,
 			   (void *)err_param, commands);
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 097/122] iwlwifi: configure the LTR
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (89 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 096/122] net: sctp: fix skb_over_panic when receiving malformed " Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 098/122] regmap: fix kernel hang on regmap_bulk_write with zero val_count Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Emmanuel Grumbach

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>

commit 9180ac50716a097a407c6d7e7e4589754a922260 upstream.

The LTR is the handshake between the device and the root
complex about the latency allowed when the bus exits power
save. This configuration was missing and this led to high
latency in the link power up. The end user could experience
high latency in the network because of this.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/net/wireless/iwlwifi/iwl-trans.h        |    2 +
 drivers/net/wireless/iwlwifi/mvm/fw-api-power.h |   35 +++++++++++++++++++++++-
 drivers/net/wireless/iwlwifi/mvm/fw-api.h       |    1 
 drivers/net/wireless/iwlwifi/mvm/fw.c           |    9 ++++++
 drivers/net/wireless/iwlwifi/mvm/ops.c          |    1 
 drivers/net/wireless/iwlwifi/pcie/trans.c       |   16 ++++++----
 6 files changed, 56 insertions(+), 8 deletions(-)

--- a/drivers/net/wireless/iwlwifi/iwl-trans.h
+++ b/drivers/net/wireless/iwlwifi/iwl-trans.h
@@ -514,6 +514,7 @@ enum iwl_trans_state {
  *	Set during transport allocation.
  * @hw_id_str: a string with info about HW ID. Set during transport allocation.
  * @pm_support: set to true in start_hw if link pm is supported
+ * @ltr_enabled: set to true if the LTR is enabled
  * @dev_cmd_pool: pool for Tx cmd allocation - for internal use only.
  *	The user should use iwl_trans_{alloc,free}_tx_cmd.
  * @dev_cmd_headroom: room needed for the transport's private use before the
@@ -539,6 +540,7 @@ struct iwl_trans {
 	u8 rx_mpdu_cmd, rx_mpdu_cmd_hdr_size;
 
 	bool pm_support;
+	bool ltr_enabled;
 
 	/* The following fields are internal only */
 	struct kmem_cache *dev_cmd_pool;
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api-power.h
@@ -66,13 +66,46 @@
 
 /* Power Management Commands, Responses, Notifications */
 
+/**
+ * enum iwl_ltr_config_flags - masks for LTR config command flags
+ * @LTR_CFG_FLAG_FEATURE_ENABLE: Feature operational status
+ * @LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS: allow LTR change on shadow
+ *	memory access
+ * @LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH: allow LTR msg send on ANY LTR
+ *	reg change
+ * @LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3: allow LTR msg send on transition from
+ *	D0 to D3
+ * @LTR_CFG_FLAG_SW_SET_SHORT: fixed static short LTR register
+ * @LTR_CFG_FLAG_SW_SET_LONG: fixed static short LONG register
+ * @LTR_CFG_FLAG_DENIE_C10_ON_PD: allow going into C10 on PD
+ */
+enum iwl_ltr_config_flags {
+	LTR_CFG_FLAG_FEATURE_ENABLE = BIT(0),
+	LTR_CFG_FLAG_HW_DIS_ON_SHADOW_REG_ACCESS = BIT(1),
+	LTR_CFG_FLAG_HW_EN_SHRT_WR_THROUGH = BIT(2),
+	LTR_CFG_FLAG_HW_DIS_ON_D0_2_D3 = BIT(3),
+	LTR_CFG_FLAG_SW_SET_SHORT = BIT(4),
+	LTR_CFG_FLAG_SW_SET_LONG = BIT(5),
+	LTR_CFG_FLAG_DENIE_C10_ON_PD = BIT(6),
+};
+
+/**
+ * struct iwl_ltr_config_cmd - configures the LTR
+ * @flags: See %enum iwl_ltr_config_flags
+ */
+struct iwl_ltr_config_cmd {
+	__le32 flags;
+	__le32 static_long;
+	__le32 static_short;
+} __packed;
+
 /* Radio LP RX Energy Threshold measured in dBm */
 #define POWER_LPRX_RSSI_THRESHOLD	75
 #define POWER_LPRX_RSSI_THRESHOLD_MAX	94
 #define POWER_LPRX_RSSI_THRESHOLD_MIN	30
 
 /**
- * enum iwl_scan_flags - masks for power table command flags
+ * enum iwl_power_flags - masks for power table command flags
  * @POWER_FLAGS_POWER_SAVE_ENA_MSK: '1' Allow to save power by turning off
  *		receiver and transmitter. '0' - does not allow.
  * @POWER_FLAGS_POWER_MANAGEMENT_ENA_MSK: '0' Driver disables power management,
--- a/drivers/net/wireless/iwlwifi/mvm/fw-api.h
+++ b/drivers/net/wireless/iwlwifi/mvm/fw-api.h
@@ -142,6 +142,7 @@ enum {
 	/* Power - legacy power table command */
 	POWER_TABLE_CMD = 0x77,
 	PSM_UAPSD_AP_MISBEHAVING_NOTIFICATION = 0x78,
+	LTR_CONFIG = 0xee,
 
 	/* Thermal Throttling*/
 	REPLY_THERMAL_MNG_BACKOFF = 0x7e,
--- a/drivers/net/wireless/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/iwlwifi/mvm/fw.c
@@ -439,6 +439,15 @@ int iwl_mvm_up(struct iwl_mvm *mvm)
 			goto error;
 	}
 
+	if (mvm->trans->ltr_enabled) {
+		struct iwl_ltr_config_cmd cmd = {
+			.flags = cpu_to_le32(LTR_CFG_FLAG_FEATURE_ENABLE),
+		};
+
+		WARN_ON(iwl_mvm_send_cmd_pdu(mvm, LTR_CONFIG, 0,
+					     sizeof(cmd), &cmd));
+	}
+
 	ret = iwl_mvm_power_update_device_mode(mvm);
 	if (ret)
 		goto error;
--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
+++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
@@ -313,6 +313,7 @@ static const char *iwl_mvm_cmd_strings[R
 	CMD(REPLY_BEACON_FILTERING_CMD),
 	CMD(REPLY_THERMAL_MNG_BACKOFF),
 	CMD(MAC_PM_POWER_TABLE),
+	CMD(LTR_CONFIG),
 	CMD(BT_COEX_CI),
 	CMD(PSM_UAPSD_AP_MISBEHAVING_NOTIFICATION),
 };
--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
@@ -94,6 +94,7 @@ static void iwl_pcie_apm_config(struct i
 {
 	struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
 	u16 lctl;
+	u16 cap;
 
 	/*
 	 * HW bug W/A for instability in PCIe bus L0S->L1 transition.
@@ -104,16 +105,17 @@ static void iwl_pcie_apm_config(struct i
 	 *    power savings, even without L1.
 	 */
 	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_LNKCTL, &lctl);
-	if (lctl & PCI_EXP_LNKCTL_ASPM_L1) {
-		/* L1-ASPM enabled; disable(!) L0S */
+	if (lctl & PCI_EXP_LNKCTL_ASPM_L1)
 		iwl_set_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
-		dev_info(trans->dev, "L1 Enabled; Disabling L0S\n");
-	} else {
-		/* L1-ASPM disabled; enable(!) L0S */
+	else
 		iwl_clear_bit(trans, CSR_GIO_REG, CSR_GIO_REG_VAL_L0S_ENABLED);
-		dev_info(trans->dev, "L1 Disabled; Enabling L0S\n");
-	}
 	trans->pm_support = !(lctl & PCI_EXP_LNKCTL_ASPM_L0S);
+
+	pcie_capability_read_word(trans_pcie->pci_dev, PCI_EXP_DEVCTL2, &cap);
+	trans->ltr_enabled = cap & PCI_EXP_DEVCTL2_LTR_EN;
+	dev_info(trans->dev, "L1 %sabled - LTR %sabled\n",
+		 (lctl & PCI_EXP_LNKCTL_ASPM_L1) ? "En" : "Dis",
+		 trans->ltr_enabled ? "En" : "Dis");
 }
 
 /*



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 098/122] regmap: fix kernel hang on regmap_bulk_write with zero val_count.
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (90 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 097/122] iwlwifi: configure the LTR Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 099/122] lib: radix-tree: add radix_tree_delete_item() Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Quentin Casasnovas

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Quentin Casasnovas <quentin.casasnovas@oracle.com>

Fixes commit 2f06fa04cf35da5c24481da3ac84a2900d0b99c3 which was an
incorrect backported version of commit
d6b41cb06044a7d895db82bdd54f6e4219970510 upstream.

If val_count is zero we return -EINVAL with map->lock_arg locked, which
will deadlock the kernel next time we try to acquire this lock.

This was introduced by f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer
dereferencing error.") which improperly back-ported d6b41cb0.

This issue was found during review of Ubuntu Trusty 3.13.0-40.68 kernel to
prepare Ksplice rebootless updates.

Fixes: f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/base/regmap/regmap.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -1557,8 +1557,10 @@ int regmap_bulk_write(struct regmap *map
 	} else {
 		void *wval;
 
-		if (!val_count)
-			return -EINVAL;
+		if (!val_count) {
+			ret = -EINVAL;
+			goto out;
+		}
 
 		wval = kmemdup(val, val_count * val_bytes, GFP_KERNEL);
 		if (!wval) {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 099/122] lib: radix-tree: add radix_tree_delete_item()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (91 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 098/122] regmap: fix kernel hang on regmap_bulk_write with zero val_count Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 100/122] mm: shmem: save one radix tree lookup when truncating swapped pages Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Minchan Kim,
	Rik van Riel, Mel Gorman, Andrea Arcangeli, Bob Liu,
	Christoph Hellwig, Dave Chinner, Greg Thelen, Hugh Dickins,
	Jan Kara, KOSAKI Motohiro, Luigi Semenzato, Metin Doslu,
	Michel Lespinasse, Ozgun Erdogan, Peter Zijlstra, Roman Gushchin,
	Ryan Mallon, Tejun Heo, Vlastimil Babka, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 53c59f262d747ea82e7414774c59a489501186a0 upstream.

Provide a function that does not just delete an entry at a given index,
but also allows passing in an expected item.  Delete only if that item
is still located at the specified index.

This is handy when lockless tree traversals want to delete entries as
well because they don't have to do an second, locked lookup to verify
the slot has not changed under them before deleting the entry.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/radix-tree.h |    1 +
 lib/radix-tree.c           |   31 +++++++++++++++++++++++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)

--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -219,6 +219,7 @@ static inline void radix_tree_replace_sl
 int radix_tree_insert(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
 void **radix_tree_lookup_slot(struct radix_tree_root *, unsigned long);
+void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_delete(struct radix_tree_root *, unsigned long);
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1337,15 +1337,18 @@ static inline void radix_tree_shrink(str
 }
 
 /**
- *	radix_tree_delete    -    delete an item from a radix tree
+ *	radix_tree_delete_item    -    delete an item from a radix tree
  *	@root:		radix tree root
  *	@index:		index key
+ *	@item:		expected item
  *
- *	Remove the item at @index from the radix tree rooted at @root.
+ *	Remove @item at @index from the radix tree rooted at @root.
  *
- *	Returns the address of the deleted item, or NULL if it was not present.
+ *	Returns the address of the deleted item, or NULL if it was not present
+ *	or the entry at the given @index was not @item.
  */
-void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
+void *radix_tree_delete_item(struct radix_tree_root *root,
+			     unsigned long index, void *item)
 {
 	struct radix_tree_node *node = NULL;
 	struct radix_tree_node *slot = NULL;
@@ -1380,6 +1383,11 @@ void *radix_tree_delete(struct radix_tre
 	if (slot == NULL)
 		goto out;
 
+	if (item && slot != item) {
+		slot = NULL;
+		goto out;
+	}
+
 	/*
 	 * Clear all tags associated with the item to be deleted.
 	 * This way of doing it would be inefficient, but seldom is any set.
@@ -1424,6 +1432,21 @@ void *radix_tree_delete(struct radix_tre
 out:
 	return slot;
 }
+EXPORT_SYMBOL(radix_tree_delete_item);
+
+/**
+ *	radix_tree_delete    -    delete an item from a radix tree
+ *	@root:		radix tree root
+ *	@index:		index key
+ *
+ *	Remove the item at @index from the radix tree rooted at @root.
+ *
+ *	Returns the address of the deleted item, or NULL if it was not present.
+ */
+void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
+{
+	return radix_tree_delete_item(root, index, NULL);
+}
 EXPORT_SYMBOL(radix_tree_delete);
 
 /**



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 100/122] mm: shmem: save one radix tree lookup when truncating swapped pages
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (92 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 099/122] lib: radix-tree: add radix_tree_delete_item() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 101/122] mm: filemap: move radix tree hole searching here Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Minchan Kim,
	Rik van Riel, Mel Gorman, Andrea Arcangeli, Bob Liu,
	Christoph Hellwig, Dave Chinner, Greg Thelen, Hugh Dickins,
	Jan Kara, KOSAKI Motohiro, Luigi Semenzato, Metin Doslu,
	Michel Lespinasse, Ozgun Erdogan, Peter Zijlstra, Roman Gushchin,
	Ryan Mallon, Tejun Heo, Vlastimil Babka, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87 upstream.

Page cache radix tree slots are usually stabilized by the page lock, but
shmem's swap cookies have no such thing.  Because the overall truncation
loop is lockless, the swap entry is currently confirmed by a tree lookup
and then deleted by another tree lookup under the same tree lock region.

Use radix_tree_delete_item() instead, which does the verification and
deletion with only one lookup.  This also allows removing the
delete-only special case from shmem_radix_tree_replace().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/shmem.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -243,19 +243,17 @@ static int shmem_radix_tree_replace(stru
 			pgoff_t index, void *expected, void *replacement)
 {
 	void **pslot;
-	void *item = NULL;
+	void *item;
 
 	VM_BUG_ON(!expected);
+	VM_BUG_ON(!replacement);
 	pslot = radix_tree_lookup_slot(&mapping->page_tree, index);
-	if (pslot)
-		item = radix_tree_deref_slot_protected(pslot,
-							&mapping->tree_lock);
+	if (!pslot)
+		return -ENOENT;
+	item = radix_tree_deref_slot_protected(pslot, &mapping->tree_lock);
 	if (item != expected)
 		return -ENOENT;
-	if (replacement)
-		radix_tree_replace_slot(pslot, replacement);
-	else
-		radix_tree_delete(&mapping->page_tree, index);
+	radix_tree_replace_slot(pslot, replacement);
 	return 0;
 }
 
@@ -387,14 +385,15 @@ export:
 static int shmem_free_swap(struct address_space *mapping,
 			   pgoff_t index, void *radswap)
 {
-	int error;
+	void *old;
 
 	spin_lock_irq(&mapping->tree_lock);
-	error = shmem_radix_tree_replace(mapping, index, radswap, NULL);
+	old = radix_tree_delete_item(&mapping->page_tree, index, radswap);
 	spin_unlock_irq(&mapping->tree_lock);
-	if (!error)
-		free_swap_and_cache(radix_to_swp_entry(radswap));
-	return error;
+	if (old != radswap)
+		return -ENOENT;
+	free_swap_and_cache(radix_to_swp_entry(radswap));
+	return 0;
 }
 
 /*



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 101/122] mm: filemap: move radix tree hole searching here
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (93 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 100/122] mm: shmem: save one radix tree lookup when truncating swapped pages Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Rik van Riel,
	Minchan Kim, Mel Gorman, Andrea Arcangeli, Bob Liu,
	Christoph Hellwig, Dave Chinner, Greg Thelen, Hugh Dickins,
	Jan Kara, KOSAKI Motohiro, Luigi Semenzato, Metin Doslu,
	Michel Lespinasse, Ozgun Erdogan, Peter Zijlstra, Roman Gushchin,
	Ryan Mallon, Tejun Heo, Vlastimil Babka, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit e7b563bb2a6f4d974208da46200784b9c5b5a47e upstream.

The radix tree hole searching code is only used for page cache, for
example the readahead code trying to get a a picture of the area
surrounding a fault.

It sufficed to rely on the radix tree definition of holes, which is
"empty tree slot".  But this is about to change, though, as shadow page
descriptors will be stored in the page cache after the actual pages get
evicted from memory.

Move the functions over to mm/filemap.c and make them native page cache
operations, where they can later be adapted to handle the new definition
of "page cache hole".

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/blocklayout/blocklayout.c |    2 -
 include/linux/pagemap.h          |    5 ++
 include/linux/radix-tree.h       |    4 --
 lib/radix-tree.c                 |   75 --------------------------------------
 mm/filemap.c                     |   76 +++++++++++++++++++++++++++++++++++++++
 mm/readahead.c                   |    4 +-
 6 files changed, 84 insertions(+), 82 deletions(-)

--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -1213,7 +1213,7 @@ static u64 pnfs_num_cont_bytes(struct in
 	end = DIV_ROUND_UP(i_size_read(inode), PAGE_CACHE_SIZE);
 	if (end != NFS_I(inode)->npages) {
 		rcu_read_lock();
-		end = radix_tree_next_hole(&mapping->page_tree, idx + 1, ULONG_MAX);
+		end = page_cache_next_hole(mapping, idx + 1, ULONG_MAX);
 		rcu_read_unlock();
 	}
 
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -243,6 +243,11 @@ static inline struct page *page_cache_al
 
 typedef int filler_t(void *, struct page *);
 
+pgoff_t page_cache_next_hole(struct address_space *mapping,
+			     pgoff_t index, unsigned long max_scan);
+pgoff_t page_cache_prev_hole(struct address_space *mapping,
+			     pgoff_t index, unsigned long max_scan);
+
 extern struct page * find_get_page(struct address_space *mapping,
 				pgoff_t index);
 extern struct page * find_lock_page(struct address_space *mapping,
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -227,10 +227,6 @@ radix_tree_gang_lookup(struct radix_tree
 unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root,
 			void ***results, unsigned long *indices,
 			unsigned long first_index, unsigned int max_items);
-unsigned long radix_tree_next_hole(struct radix_tree_root *root,
-				unsigned long index, unsigned long max_scan);
-unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
-				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 int radix_tree_maybe_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -946,81 +946,6 @@ next:
 }
 EXPORT_SYMBOL(radix_tree_range_tag_if_tagged);
 
-
-/**
- *	radix_tree_next_hole    -    find the next hole (not-present entry)
- *	@root:		tree root
- *	@index:		index key
- *	@max_scan:	maximum range to search
- *
- *	Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the lowest
- *	indexed hole.
- *
- *	Returns: the index of the hole if found, otherwise returns an index
- *	outside of the set specified (in which case 'return - index >= max_scan'
- *	will be true). In rare cases of index wrap-around, 0 will be returned.
- *
- *	radix_tree_next_hole may be called under rcu_read_lock. However, like
- *	radix_tree_gang_lookup, this will not atomically search a snapshot of
- *	the tree at a single point in time. For example, if a hole is created
- *	at index 5, then subsequently a hole is created at index 10,
- *	radix_tree_next_hole covering both indexes may return 10 if called
- *	under rcu_read_lock.
- */
-unsigned long radix_tree_next_hole(struct radix_tree_root *root,
-				unsigned long index, unsigned long max_scan)
-{
-	unsigned long i;
-
-	for (i = 0; i < max_scan; i++) {
-		if (!radix_tree_lookup(root, index))
-			break;
-		index++;
-		if (index == 0)
-			break;
-	}
-
-	return index;
-}
-EXPORT_SYMBOL(radix_tree_next_hole);
-
-/**
- *	radix_tree_prev_hole    -    find the prev hole (not-present entry)
- *	@root:		tree root
- *	@index:		index key
- *	@max_scan:	maximum range to search
- *
- *	Search backwards in the range [max(index-max_scan+1, 0), index]
- *	for the first hole.
- *
- *	Returns: the index of the hole if found, otherwise returns an index
- *	outside of the set specified (in which case 'index - return >= max_scan'
- *	will be true). In rare cases of wrap-around, ULONG_MAX will be returned.
- *
- *	radix_tree_next_hole may be called under rcu_read_lock. However, like
- *	radix_tree_gang_lookup, this will not atomically search a snapshot of
- *	the tree at a single point in time. For example, if a hole is created
- *	at index 10, then subsequently a hole is created at index 5,
- *	radix_tree_prev_hole covering both indexes may return 5 if called under
- *	rcu_read_lock.
- */
-unsigned long radix_tree_prev_hole(struct radix_tree_root *root,
-				   unsigned long index, unsigned long max_scan)
-{
-	unsigned long i;
-
-	for (i = 0; i < max_scan; i++) {
-		if (!radix_tree_lookup(root, index))
-			break;
-		index--;
-		if (index == ULONG_MAX)
-			break;
-	}
-
-	return index;
-}
-EXPORT_SYMBOL(radix_tree_prev_hole);
-
 /**
  *	radix_tree_gang_lookup - perform multiple lookup on a radix tree
  *	@root:		radix tree root
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -688,6 +688,82 @@ int __lock_page_or_retry(struct page *pa
 }
 
 /**
+ * page_cache_next_hole - find the next hole (not-present entry)
+ * @mapping: mapping
+ * @index: index
+ * @max_scan: maximum range to search
+ *
+ * Search the set [index, min(index+max_scan-1, MAX_INDEX)] for the
+ * lowest indexed hole.
+ *
+ * Returns: the index of the hole if found, otherwise returns an index
+ * outside of the set specified (in which case 'return - index >=
+ * max_scan' will be true). In rare cases of index wrap-around, 0 will
+ * be returned.
+ *
+ * page_cache_next_hole may be called under rcu_read_lock. However,
+ * like radix_tree_gang_lookup, this will not atomically search a
+ * snapshot of the tree at a single point in time. For example, if a
+ * hole is created at index 5, then subsequently a hole is created at
+ * index 10, page_cache_next_hole covering both indexes may return 10
+ * if called under rcu_read_lock.
+ */
+pgoff_t page_cache_next_hole(struct address_space *mapping,
+			     pgoff_t index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(&mapping->page_tree, index))
+			break;
+		index++;
+		if (index == 0)
+			break;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL(page_cache_next_hole);
+
+/**
+ * page_cache_prev_hole - find the prev hole (not-present entry)
+ * @mapping: mapping
+ * @index: index
+ * @max_scan: maximum range to search
+ *
+ * Search backwards in the range [max(index-max_scan+1, 0), index] for
+ * the first hole.
+ *
+ * Returns: the index of the hole if found, otherwise returns an index
+ * outside of the set specified (in which case 'index - return >=
+ * max_scan' will be true). In rare cases of wrap-around, ULONG_MAX
+ * will be returned.
+ *
+ * page_cache_prev_hole may be called under rcu_read_lock. However,
+ * like radix_tree_gang_lookup, this will not atomically search a
+ * snapshot of the tree at a single point in time. For example, if a
+ * hole is created at index 10, then subsequently a hole is created at
+ * index 5, page_cache_prev_hole covering both indexes may return 5 if
+ * called under rcu_read_lock.
+ */
+pgoff_t page_cache_prev_hole(struct address_space *mapping,
+			     pgoff_t index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(&mapping->page_tree, index))
+			break;
+		index--;
+		if (index == ULONG_MAX)
+			break;
+	}
+
+	return index;
+}
+EXPORT_SYMBOL(page_cache_prev_hole);
+
+/**
  * find_get_page - find and get a page reference
  * @mapping: the address_space to search
  * @offset: the page index
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -347,7 +347,7 @@ static pgoff_t count_history_pages(struc
 	pgoff_t head;
 
 	rcu_read_lock();
-	head = radix_tree_prev_hole(&mapping->page_tree, offset - 1, max);
+	head = page_cache_prev_hole(mapping, offset - 1, max);
 	rcu_read_unlock();
 
 	return offset - 1 - head;
@@ -427,7 +427,7 @@ ondemand_readahead(struct address_space
 		pgoff_t start;
 
 		rcu_read_lock();
-		start = radix_tree_next_hole(&mapping->page_tree, offset+1,max);
+		start = page_cache_next_hole(mapping, offset + 1, max);
 		rcu_read_unlock();
 
 		if (!start || start - offset > max)



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (94 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 101/122] mm: filemap: move radix tree hole searching here Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-20 15:49   ` Johannes Weiner
  2014-11-19 20:52 ` [PATCH 3.14 103/122] mm: madvise: fix MADV_WILLNEED on shmem swapouts Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  117 siblings, 1 reply; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Rik van Riel,
	Minchan Kim, Andrea Arcangeli, Bob Liu, Christoph Hellwig,
	Dave Chinner, Greg Thelen, Hugh Dickins, Jan Kara,
	KOSAKI Motohiro, Luigi Semenzato, Mel Gorman, Metin Doslu,
	Michel Lespinasse, Ozgun Erdogan, Peter Zijlstra, Roman Gushchin,
	Ryan Mallon, Tejun Heo, Vlastimil Babka, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.

shmem mappings already contain exceptional entries where swap slot
information is remembered.

To be able to store eviction information for regular page cache, prepare
every site dealing with the radix trees directly to handle entries other
than pages.

The common lookup functions will filter out non-page entries and return
NULL for page cache holes, just as before.  But provide a raw version of
the API which returns non-page entries as well, and switch shmem over to
use it.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/compression.c   |    2 
 include/linux/mm.h       |    8 +
 include/linux/pagemap.h  |   15 ++-
 include/linux/pagevec.h  |    5 +
 include/linux/shmem_fs.h |    1 
 mm/filemap.c             |  202 +++++++++++++++++++++++++++++++++++++++++------
 mm/mincore.c             |   20 +++-
 mm/readahead.c           |    2 
 mm/shmem.c               |   97 ++++------------------
 mm/swap.c                |   51 +++++++++++
 mm/truncate.c            |   74 ++++++++++++++---
 11 files changed, 348 insertions(+), 129 deletions(-)

--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -472,7 +472,7 @@ static noinline int add_ra_bio_pages(str
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->page_tree, pg_index);
 		rcu_read_unlock();
-		if (page) {
+		if (page && !radix_tree_exceptional_entry(page)) {
 			misses++;
 			if (misses > 4)
 				break;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1041,6 +1041,14 @@ extern void show_free_areas(unsigned int
 extern bool skip_free_areas_node(unsigned int flags, int nid);
 
 int shmem_zero_setup(struct vm_area_struct *);
+#ifdef CONFIG_SHMEM
+bool shmem_mapping(struct address_space *mapping);
+#else
+static inline bool shmem_mapping(struct address_space *mapping)
+{
+	return false;
+}
+#endif
 
 extern int can_do_mlock(void);
 extern int user_shm_lock(size_t, struct user_struct *);
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -248,12 +248,15 @@ pgoff_t page_cache_next_hole(struct addr
 pgoff_t page_cache_prev_hole(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan);
 
-extern struct page * find_get_page(struct address_space *mapping,
-				pgoff_t index);
-extern struct page * find_lock_page(struct address_space *mapping,
-				pgoff_t index);
-extern struct page * find_or_create_page(struct address_space *mapping,
-				pgoff_t index, gfp_t gfp_mask);
+struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
+struct page *find_get_page(struct address_space *mapping, pgoff_t offset);
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset);
+struct page *find_lock_page(struct address_space *mapping, pgoff_t offset);
+struct page *find_or_create_page(struct address_space *mapping, pgoff_t index,
+				 gfp_t gfp_mask);
+unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
+			  unsigned int nr_entries, struct page **entries,
+			  pgoff_t *indices);
 unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 			unsigned int nr_pages, struct page **pages);
 unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -22,6 +22,11 @@ struct pagevec {
 
 void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
+unsigned pagevec_lookup_entries(struct pagevec *pvec,
+				struct address_space *mapping,
+				pgoff_t start, unsigned nr_entries,
+				pgoff_t *indices);
+void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
 		pgoff_t start, unsigned nr_pages);
 unsigned pagevec_lookup_tag(struct pagevec *pvec,
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -51,6 +51,7 @@ extern struct file *shmem_kernel_file_se
 					    unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern bool shmem_mapping(struct address_space *mapping);
 extern void shmem_unlock_mapping(struct address_space *mapping);
 extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 					pgoff_t index, gfp_t gfp_mask);
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -448,6 +448,29 @@ int replace_page_cache_page(struct page
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_page);
 
+static int page_cache_tree_insert(struct address_space *mapping,
+				  struct page *page)
+{
+	void **slot;
+	int error;
+
+	slot = radix_tree_lookup_slot(&mapping->page_tree, page->index);
+	if (slot) {
+		void *p;
+
+		p = radix_tree_deref_slot_protected(slot, &mapping->tree_lock);
+		if (!radix_tree_exceptional_entry(p))
+			return -EEXIST;
+		radix_tree_replace_slot(slot, page);
+		mapping->nrpages++;
+		return 0;
+	}
+	error = radix_tree_insert(&mapping->page_tree, page->index, page);
+	if (!error)
+		mapping->nrpages++;
+	return error;
+}
+
 /**
  * add_to_page_cache_locked - add a locked page to the pagecache
  * @page:	page to add
@@ -482,11 +505,10 @@ int add_to_page_cache_locked(struct page
 	page->index = offset;
 
 	spin_lock_irq(&mapping->tree_lock);
-	error = radix_tree_insert(&mapping->page_tree, offset, page);
+	error = page_cache_tree_insert(mapping, page);
 	radix_tree_preload_end();
 	if (unlikely(error))
 		goto err_insert;
-	mapping->nrpages++;
 	__inc_zone_page_state(page, NR_FILE_PAGES);
 	spin_unlock_irq(&mapping->tree_lock);
 	trace_mm_filemap_add_to_page_cache(page);
@@ -714,7 +736,10 @@ pgoff_t page_cache_next_hole(struct addr
 	unsigned long i;
 
 	for (i = 0; i < max_scan; i++) {
-		if (!radix_tree_lookup(&mapping->page_tree, index))
+		struct page *page;
+
+		page = radix_tree_lookup(&mapping->page_tree, index);
+		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index++;
 		if (index == 0)
@@ -752,7 +777,10 @@ pgoff_t page_cache_prev_hole(struct addr
 	unsigned long i;
 
 	for (i = 0; i < max_scan; i++) {
-		if (!radix_tree_lookup(&mapping->page_tree, index))
+		struct page *page;
+
+		page = radix_tree_lookup(&mapping->page_tree, index);
+		if (!page || radix_tree_exceptional_entry(page))
 			break;
 		index--;
 		if (index == ULONG_MAX)
@@ -764,14 +792,19 @@ pgoff_t page_cache_prev_hole(struct addr
 EXPORT_SYMBOL(page_cache_prev_hole);
 
 /**
- * find_get_page - find and get a page reference
+ * find_get_entry - find and get a page cache entry
  * @mapping: the address_space to search
- * @offset: the page index
+ * @offset: the page cache index
+ *
+ * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * page cache page, it is returned with an increased refcount.
+ *
+ * If the slot holds a shadow entry of a previously evicted page, it
+ * is returned.
  *
- * Is there a pagecache struct page at the given (mapping, offset) tuple?
- * If yes, increment its refcount and return it; if no, return NULL.
+ * Otherwise, %NULL is returned.
  */
-struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_entry(struct address_space *mapping, pgoff_t offset)
 {
 	void **pagep;
 	struct page *page;
@@ -812,24 +845,50 @@ out:
 
 	return page;
 }
-EXPORT_SYMBOL(find_get_page);
+EXPORT_SYMBOL(find_get_entry);
 
 /**
- * find_lock_page - locate, pin and lock a pagecache page
+ * find_get_page - find and get a page reference
  * @mapping: the address_space to search
  * @offset: the page index
  *
- * Locates the desired pagecache page, locks it, increments its reference
- * count and returns its address.
+ * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * page cache page, it is returned with an increased refcount.
  *
- * Returns zero if the page was not present. find_lock_page() may sleep.
+ * Otherwise, %NULL is returned.
  */
-struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
+{
+	struct page *page = find_get_entry(mapping, offset);
+
+	if (radix_tree_exceptional_entry(page))
+		page = NULL;
+	return page;
+}
+EXPORT_SYMBOL(find_get_page);
+
+/**
+ * find_lock_entry - locate, pin and lock a page cache entry
+ * @mapping: the address_space to search
+ * @offset: the page cache index
+ *
+ * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * page cache page, it is returned locked and with an increased
+ * refcount.
+ *
+ * If the slot holds a shadow entry of a previously evicted page, it
+ * is returned.
+ *
+ * Otherwise, %NULL is returned.
+ *
+ * find_lock_entry() may sleep.
+ */
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
 {
 	struct page *page;
 
 repeat:
-	page = find_get_page(mapping, offset);
+	page = find_get_entry(mapping, offset);
 	if (page && !radix_tree_exception(page)) {
 		lock_page(page);
 		/* Has the page been truncated? */
@@ -842,6 +901,29 @@ repeat:
 	}
 	return page;
 }
+EXPORT_SYMBOL(find_lock_entry);
+
+/**
+ * find_lock_page - locate, pin and lock a pagecache page
+ * @mapping: the address_space to search
+ * @offset: the page index
+ *
+ * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * page cache page, it is returned locked and with an increased
+ * refcount.
+ *
+ * Otherwise, %NULL is returned.
+ *
+ * find_lock_page() may sleep.
+ */
+struct page *find_lock_page(struct address_space *mapping, pgoff_t offset)
+{
+	struct page *page = find_lock_entry(mapping, offset);
+
+	if (radix_tree_exceptional_entry(page))
+		page = NULL;
+	return page;
+}
 EXPORT_SYMBOL(find_lock_page);
 
 /**
@@ -850,16 +932,18 @@ EXPORT_SYMBOL(find_lock_page);
  * @index: the page's index into the mapping
  * @gfp_mask: page allocation mode
  *
- * Locates a page in the pagecache.  If the page is not present, a new page
- * is allocated using @gfp_mask and is added to the pagecache and to the VM's
- * LRU list.  The returned page is locked and has its reference count
- * incremented.
+ * Looks up the page cache slot at @mapping & @offset.  If there is a
+ * page cache page, it is returned locked and with an increased
+ * refcount.
+ *
+ * If the page is not present, a new page is allocated using @gfp_mask
+ * and added to the page cache and the VM's LRU list.  The page is
+ * returned locked and with an increased refcount.
  *
- * find_or_create_page() may sleep, even if @gfp_flags specifies an atomic
- * allocation!
+ * On memory exhaustion, %NULL is returned.
  *
- * find_or_create_page() returns the desired page's address, or zero on
- * memory exhaustion.
+ * find_or_create_page() may sleep, even if @gfp_flags specifies an
+ * atomic allocation!
  */
 struct page *find_or_create_page(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp_mask)
@@ -892,6 +976,76 @@ repeat:
 EXPORT_SYMBOL(find_or_create_page);
 
 /**
+ * find_get_entries - gang pagecache lookup
+ * @mapping:	The address_space to search
+ * @start:	The starting page cache index
+ * @nr_entries:	The maximum number of entries
+ * @entries:	Where the resulting entries are placed
+ * @indices:	The cache indices corresponding to the entries in @entries
+ *
+ * find_get_entries() will search for and return a group of up to
+ * @nr_entries entries in the mapping.  The entries are placed at
+ * @entries.  find_get_entries() takes a reference against any actual
+ * pages it returns.
+ *
+ * The search returns a group of mapping-contiguous page cache entries
+ * with ascending indexes.  There may be holes in the indices due to
+ * not-present pages.
+ *
+ * Any shadow entries of evicted pages are included in the returned
+ * array.
+ *
+ * find_get_entries() returns the number of pages and shadow entries
+ * which were found.
+ */
+unsigned find_get_entries(struct address_space *mapping,
+			  pgoff_t start, unsigned int nr_entries,
+			  struct page **entries, pgoff_t *indices)
+{
+	void **slot;
+	unsigned int ret = 0;
+	struct radix_tree_iter iter;
+
+	if (!nr_entries)
+		return 0;
+
+	rcu_read_lock();
+restart:
+	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+		struct page *page;
+repeat:
+		page = radix_tree_deref_slot(slot);
+		if (unlikely(!page))
+			continue;
+		if (radix_tree_exception(page)) {
+			if (radix_tree_deref_retry(page))
+				goto restart;
+			/*
+			 * Otherwise, we must be storing a swap entry
+			 * here as an exceptional entry: so return it
+			 * without attempting to raise page count.
+			 */
+			goto export;
+		}
+		if (!page_cache_get_speculative(page))
+			goto repeat;
+
+		/* Has the page moved? */
+		if (unlikely(page != *slot)) {
+			page_cache_release(page);
+			goto repeat;
+		}
+export:
+		indices[ret] = iter.index;
+		entries[ret] = page;
+		if (++ret == nr_entries)
+			break;
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
+/**
  * find_get_pages - gang pagecache lookup
  * @mapping:	The address_space to search
  * @start:	The starting page index
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -70,13 +70,21 @@ static unsigned char mincore_page(struct
 	 * any other file mapping (ie. marked !present and faulted in with
 	 * tmpfs's .fault). So swapped out tmpfs mappings are tested here.
 	 */
-	page = find_get_page(mapping, pgoff);
 #ifdef CONFIG_SWAP
-	/* shmem/tmpfs may return swap: account for swapcache page too. */
-	if (radix_tree_exceptional_entry(page)) {
-		swp_entry_t swap = radix_to_swp_entry(page);
-		page = find_get_page(swap_address_space(swap), swap.val);
-	}
+	if (shmem_mapping(mapping)) {
+		page = find_get_entry(mapping, pgoff);
+		/*
+		 * shmem/tmpfs may return swap: account for swapcache
+		 * page too.
+		 */
+		if (radix_tree_exceptional_entry(page)) {
+			swp_entry_t swp = radix_to_swp_entry(page);
+			page = find_get_page(swap_address_space(swp), swp.val);
+		}
+	} else
+		page = find_get_page(mapping, pgoff);
+#else
+	page = find_get_page(mapping, pgoff);
 #endif
 	if (page) {
 		present = PageUptodate(page);
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -179,7 +179,7 @@ __do_page_cache_readahead(struct address
 		rcu_read_lock();
 		page = radix_tree_lookup(&mapping->page_tree, page_offset);
 		rcu_read_unlock();
-		if (page)
+		if (page && !radix_tree_exceptional_entry(page))
 			continue;
 
 		page = page_cache_alloc_readahead(mapping);
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -330,56 +330,6 @@ static void shmem_delete_from_page_cache
 }
 
 /*
- * Like find_get_pages, but collecting swap entries as well as pages.
- */
-static unsigned shmem_find_get_pages_and_swap(struct address_space *mapping,
-					pgoff_t start, unsigned int nr_pages,
-					struct page **pages, pgoff_t *indices)
-{
-	void **slot;
-	unsigned int ret = 0;
-	struct radix_tree_iter iter;
-
-	if (!nr_pages)
-		return 0;
-
-	rcu_read_lock();
-restart:
-	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
-		struct page *page;
-repeat:
-		page = radix_tree_deref_slot(slot);
-		if (unlikely(!page))
-			continue;
-		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto restart;
-			/*
-			 * Otherwise, we must be storing a swap entry
-			 * here as an exceptional entry: so return it
-			 * without attempting to raise page count.
-			 */
-			goto export;
-		}
-		if (!page_cache_get_speculative(page))
-			goto repeat;
-
-		/* Has the page moved? */
-		if (unlikely(page != *slot)) {
-			page_cache_release(page);
-			goto repeat;
-		}
-export:
-		indices[ret] = iter.index;
-		pages[ret] = page;
-		if (++ret == nr_pages)
-			break;
-	}
-	rcu_read_unlock();
-	return ret;
-}
-
-/*
  * Remove swap entry from radix tree, free the swap and its page cache.
  */
 static int shmem_free_swap(struct address_space *mapping,
@@ -397,21 +347,6 @@ static int shmem_free_swap(struct addres
 }
 
 /*
- * Pagevec may contain swap entries, so shuffle up pages before releasing.
- */
-static void shmem_deswap_pagevec(struct pagevec *pvec)
-{
-	int i, j;
-
-	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
-		struct page *page = pvec->pages[i];
-		if (!radix_tree_exceptional_entry(page))
-			pvec->pages[j++] = page;
-	}
-	pvec->nr = j;
-}
-
-/*
  * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
  */
 void shmem_unlock_mapping(struct address_space *mapping)
@@ -429,12 +364,12 @@ void shmem_unlock_mapping(struct address
 		 * Avoid pagevec_lookup(): find_get_pages() returns 0 as if it
 		 * has finished, if it hits a row of PAGEVEC_SIZE swap entries.
 		 */
-		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
-					PAGEVEC_SIZE, pvec.pages, indices);
+		pvec.nr = find_get_entries(mapping, index,
+					   PAGEVEC_SIZE, pvec.pages, indices);
 		if (!pvec.nr)
 			break;
 		index = indices[pvec.nr - 1] + 1;
-		shmem_deswap_pagevec(&pvec);
+		pagevec_remove_exceptionals(&pvec);
 		check_move_unevictable_pages(pvec.pages, pvec.nr);
 		pagevec_release(&pvec);
 		cond_resched();
@@ -466,9 +401,9 @@ static void shmem_undo_range(struct inod
 	pagevec_init(&pvec, 0);
 	index = start;
 	while (index < end) {
-		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
-				min(end - index, (pgoff_t)PAGEVEC_SIZE),
-							pvec.pages, indices);
+		pvec.nr = find_get_entries(mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE),
+			pvec.pages, indices);
 		if (!pvec.nr)
 			break;
 		mem_cgroup_uncharge_start();
@@ -497,7 +432,7 @@ static void shmem_undo_range(struct inod
 			}
 			unlock_page(page);
 		}
-		shmem_deswap_pagevec(&pvec);
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
@@ -535,9 +470,10 @@ static void shmem_undo_range(struct inod
 	index = start;
 	while (index < end) {
 		cond_resched();
-		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
+
+		pvec.nr = find_get_entries(mapping, index,
 				min(end - index, (pgoff_t)PAGEVEC_SIZE),
-							pvec.pages, indices);
+				pvec.pages, indices);
 		if (!pvec.nr) {
 			/* If all gone or hole-punch or unfalloc, we're done */
 			if (index == start || end != -1)
@@ -580,7 +516,7 @@ static void shmem_undo_range(struct inod
 			}
 			unlock_page(page);
 		}
-		shmem_deswap_pagevec(&pvec);
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		index++;
@@ -1087,7 +1023,7 @@ static int shmem_getpage_gfp(struct inod
 		return -EFBIG;
 repeat:
 	swap.val = 0;
-	page = find_lock_page(mapping, index);
+	page = find_lock_entry(mapping, index);
 	if (radix_tree_exceptional_entry(page)) {
 		swap = radix_to_swp_entry(page);
 		page = NULL;
@@ -1482,6 +1418,11 @@ static struct inode *shmem_get_inode(str
 	return inode;
 }
 
+bool shmem_mapping(struct address_space *mapping)
+{
+	return mapping->backing_dev_info == &shmem_backing_dev_info;
+}
+
 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
 static const struct inode_operations shmem_short_symlink_operations;
@@ -1794,7 +1735,7 @@ static pgoff_t shmem_seek_hole_data(stru
 	pagevec_init(&pvec, 0);
 	pvec.nr = 1;		/* start small: we may be there already */
 	while (!done) {
-		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
+		pvec.nr = find_get_entries(mapping, index,
 					pvec.nr, pvec.pages, indices);
 		if (!pvec.nr) {
 			if (whence == SEEK_DATA)
@@ -1821,7 +1762,7 @@ static pgoff_t shmem_seek_hole_data(stru
 				break;
 			}
 		}
-		shmem_deswap_pagevec(&pvec);
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		pvec.nr = PAGEVEC_SIZE;
 		cond_resched();
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -948,6 +948,57 @@ void __pagevec_lru_add(struct pagevec *p
 EXPORT_SYMBOL(__pagevec_lru_add);
 
 /**
+ * pagevec_lookup_entries - gang pagecache lookup
+ * @pvec:	Where the resulting entries are placed
+ * @mapping:	The address_space to search
+ * @start:	The starting entry index
+ * @nr_entries:	The maximum number of entries
+ * @indices:	The cache indices corresponding to the entries in @pvec
+ *
+ * pagevec_lookup_entries() will search for and return a group of up
+ * to @nr_entries pages and shadow entries in the mapping.  All
+ * entries are placed in @pvec.  pagevec_lookup_entries() takes a
+ * reference against actual pages in @pvec.
+ *
+ * The search returns a group of mapping-contiguous entries with
+ * ascending indexes.  There may be holes in the indices due to
+ * not-present entries.
+ *
+ * pagevec_lookup_entries() returns the number of entries which were
+ * found.
+ */
+unsigned pagevec_lookup_entries(struct pagevec *pvec,
+				struct address_space *mapping,
+				pgoff_t start, unsigned nr_pages,
+				pgoff_t *indices)
+{
+	pvec->nr = find_get_entries(mapping, start, nr_pages,
+				    pvec->pages, indices);
+	return pagevec_count(pvec);
+}
+
+/**
+ * pagevec_remove_exceptionals - pagevec exceptionals pruning
+ * @pvec:	The pagevec to prune
+ *
+ * pagevec_lookup_entries() fills both pages and exceptional radix
+ * tree entries into the pagevec.  This function prunes all
+ * exceptionals from @pvec without leaving holes, so that it can be
+ * passed on to page-only pagevec operations.
+ */
+void pagevec_remove_exceptionals(struct pagevec *pvec)
+{
+	int i, j;
+
+	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+		if (!radix_tree_exceptional_entry(page))
+			pvec->pages[j++] = page;
+	}
+	pvec->nr = j;
+}
+
+/**
  * pagevec_lookup - gang pagecache lookup
  * @pvec:	Where the resulting pages are placed
  * @mapping:	The address_space to search
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -23,6 +23,22 @@
 #include <linux/rmap.h>
 #include "internal.h"
 
+static void clear_exceptional_entry(struct address_space *mapping,
+				    pgoff_t index, void *entry)
+{
+	/* Handled by shmem itself */
+	if (shmem_mapping(mapping))
+		return;
+
+	spin_lock_irq(&mapping->tree_lock);
+	/*
+	 * Regular page slots are stabilized by the page lock even
+	 * without the tree itself locked.  These unlocked entries
+	 * need verification under the tree lock.
+	 */
+	radix_tree_delete_item(&mapping->page_tree, index, entry);
+	spin_unlock_irq(&mapping->tree_lock);
+}
 
 /**
  * do_invalidatepage - invalidate part or all of a page
@@ -209,6 +225,7 @@ void truncate_inode_pages_range(struct a
 	unsigned int	partial_start;	/* inclusive */
 	unsigned int	partial_end;	/* exclusive */
 	struct pagevec	pvec;
+	pgoff_t		indices[PAGEVEC_SIZE];
 	pgoff_t		index;
 	int		i;
 
@@ -239,17 +256,23 @@ void truncate_inode_pages_range(struct a
 
 	pagevec_init(&pvec, 0);
 	index = start;
-	while (index < end && pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
+	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE),
+			indices)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
-			index = page->index;
+			index = indices[i];
 			if (index >= end)
 				break;
 
+			if (radix_tree_exceptional_entry(page)) {
+				clear_exceptional_entry(mapping, index, page);
+				continue;
+			}
+
 			if (!trylock_page(page))
 				continue;
 			WARN_ON(page->index != index);
@@ -260,6 +283,7 @@ void truncate_inode_pages_range(struct a
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
@@ -308,14 +332,16 @@ void truncate_inode_pages_range(struct a
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE))) {
+		if (!pagevec_lookup_entries(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE),
+			indices)) {
 			if (index == start)
 				break;
 			index = start;
 			continue;
 		}
-		if (index == start && pvec.pages[0]->index >= end) {
+		if (index == start && indices[0] >= end) {
+			pagevec_remove_exceptionals(&pvec);
 			pagevec_release(&pvec);
 			break;
 		}
@@ -324,16 +350,22 @@ void truncate_inode_pages_range(struct a
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
-			index = page->index;
+			index = indices[i];
 			if (index >= end)
 				break;
 
+			if (radix_tree_exceptional_entry(page)) {
+				clear_exceptional_entry(mapping, index, page);
+				continue;
+			}
+
 			lock_page(page);
 			WARN_ON(page->index != index);
 			wait_on_page_writeback(page);
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		index++;
@@ -376,6 +408,7 @@ EXPORT_SYMBOL(truncate_inode_pages);
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t end)
 {
+	pgoff_t indices[PAGEVEC_SIZE];
 	struct pagevec pvec;
 	pgoff_t index = start;
 	unsigned long ret;
@@ -391,17 +424,23 @@ unsigned long invalidate_mapping_pages(s
 	 */
 
 	pagevec_init(&pvec, 0);
-	while (index <= end && pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
+			indices)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
-			index = page->index;
+			index = indices[i];
 			if (index > end)
 				break;
 
+			if (radix_tree_exceptional_entry(page)) {
+				clear_exceptional_entry(mapping, index, page);
+				continue;
+			}
+
 			if (!trylock_page(page))
 				continue;
 			WARN_ON(page->index != index);
@@ -415,6 +454,7 @@ unsigned long invalidate_mapping_pages(s
 				deactivate_page(page);
 			count += ret;
 		}
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
@@ -482,6 +522,7 @@ static int do_launder_page(struct addres
 int invalidate_inode_pages2_range(struct address_space *mapping,
 				  pgoff_t start, pgoff_t end)
 {
+	pgoff_t indices[PAGEVEC_SIZE];
 	struct pagevec pvec;
 	pgoff_t index;
 	int i;
@@ -492,17 +533,23 @@ int invalidate_inode_pages2_range(struct
 	cleancache_invalidate_inode(mapping);
 	pagevec_init(&pvec, 0);
 	index = start;
-	while (index <= end && pagevec_lookup(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
+			indices)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
-			index = page->index;
+			index = indices[i];
 			if (index > end)
 				break;
 
+			if (radix_tree_exceptional_entry(page)) {
+				clear_exceptional_entry(mapping, index, page);
+				continue;
+			}
+
 			lock_page(page);
 			WARN_ON(page->index != index);
 			if (page->mapping != mapping) {
@@ -540,6 +587,7 @@ int invalidate_inode_pages2_range(struct
 				ret = ret2;
 			unlock_page(page);
 		}
+		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 103/122] mm: madvise: fix MADV_WILLNEED on shmem swapouts
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (95 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 104/122] mm: remove read_cache_page_async() Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Hugh Dickins,
	Andrew Morton, Linus Torvalds, Mel Gorman

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 55231e5c898c5c03c14194001e349f40f59bd300 upstream.

MADV_WILLNEED currently does not read swapped out shmem pages back in.

Commit 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page
cache radix trees") made find_get_page() filter exceptional radix tree
entries but failed to convert all find_get_page() callers that WANT
exceptional entries over to find_get_entry().  One of them is shmem swap
readahead in madvise, which now skips over any swap-out records.

Convert it to find_get_entry().

Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/madvise.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -195,7 +195,7 @@ static void force_shm_swapin_readahead(s
 	for (; start < end; start += PAGE_SIZE) {
 		index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
-		page = find_get_page(mapping, index);
+		page = find_get_entry(mapping, index);
 		if (!radix_tree_exceptional_entry(page)) {
 			if (page)
 				page_cache_release(page);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 104/122] mm: remove read_cache_page_async()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (96 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 103/122] mm: madvise: fix MADV_WILLNEED on shmem swapouts Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 105/122] callers of iov_copy_from_user_atomic() dont need pagecache_disable() Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sasha Levin, Andrew Morton,
	Linus Torvalds, Mel Gorman

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sasha Levin <sasha.levin@oracle.com>

commit 67f9fd91f93c582b7de2ab9325b6e179db77e4d5 upstream.

This patch removes read_cache_page_async() which wasn't really needed
anywhere and simplifies the code around it a bit.

read_cache_page_async() is useful when we want to read a page into the
cache without waiting for it to complete.  This happens when the
appropriate callback 'filler' doesn't complete its read operation and
releases the page lock immediately, and instead queues a different
completion routine to do that.  This never actually happened anywhere in
the code.

read_cache_page_async() had 3 different callers:

- read_cache_page() which is the sync version, it would just wait for
  the requested read to complete using wait_on_page_read().

- JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
  function it supplied doesn't do any async reads, and would complete
  before the filler function returns - making it actually a sync read.

- CRAMFS would call it using the read_mapping_page_async() wrapper, with
  a similar story to JFFS2 - the filler function doesn't do anything that
  reminds async reads and would always complete before the filler function
  returns.

To sum it up, the code in mm/filemap.c never took advantage of having
read_cache_page_async().  While there are filler callbacks that do async
reads (such as the block one), we always called it with the
read_cache_page().

This patch adds a mandatory wait for read to complete when adding a new
page to the cache, and removes read_cache_page_async() and its wrappers.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/cramfs/inode.c       |    3 --
 fs/jffs2/fs.c           |    2 -
 include/linux/pagemap.h |   10 -------
 mm/filemap.c            |   64 +++++++++++++++++-------------------------------
 4 files changed, 25 insertions(+), 54 deletions(-)

--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -195,8 +195,7 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;
 
 		if (blocknr + i < devsize) {
-			page = read_mapping_page_async(mapping, blocknr + i,
-									NULL);
+			page = read_mapping_page(mapping, blocknr + i, NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
--- a/fs/jffs2/fs.c
+++ b/fs/jffs2/fs.c
@@ -687,7 +687,7 @@ unsigned char *jffs2_gc_fetch_page(struc
 	struct inode *inode = OFNI_EDONI_2SFFJ(f);
 	struct page *pg;
 
-	pg = read_cache_page_async(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
+	pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
 			     (void *)jffs2_do_readpage_unlock, inode);
 	if (IS_ERR(pg))
 		return (void *)pg;
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -278,8 +278,6 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
 				pgoff_t index);
-extern struct page * read_cache_page_async(struct address_space *mapping,
-				pgoff_t index, filler_t *filler, void *data);
 extern struct page * read_cache_page(struct address_space *mapping,
 				pgoff_t index, filler_t *filler, void *data);
 extern struct page * read_cache_page_gfp(struct address_space *mapping,
@@ -287,14 +285,6 @@ extern struct page * read_cache_page_gfp
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
-static inline struct page *read_mapping_page_async(
-				struct address_space *mapping,
-				pgoff_t index, void *data)
-{
-	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
-	return read_cache_page_async(mapping, index, filler, data);
-}
-
 static inline struct page *read_mapping_page(struct address_space *mapping,
 				pgoff_t index, void *data)
 {
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2027,6 +2027,18 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
+static struct page *wait_on_page_read(struct page *page)
+{
+	if (!IS_ERR(page)) {
+		wait_on_page_locked(page);
+		if (!PageUptodate(page)) {
+			page_cache_release(page);
+			page = ERR_PTR(-EIO);
+		}
+	}
+	return page;
+}
+
 static struct page *__read_cache_page(struct address_space *mapping,
 				pgoff_t index,
 				int (*filler)(void *, struct page *),
@@ -2053,6 +2065,8 @@ repeat:
 		if (err < 0) {
 			page_cache_release(page);
 			page = ERR_PTR(err);
+		} else {
+			page = wait_on_page_read(page);
 		}
 	}
 	return page;
@@ -2089,6 +2103,10 @@ retry:
 	if (err < 0) {
 		page_cache_release(page);
 		return ERR_PTR(err);
+	} else {
+		page = wait_on_page_read(page);
+		if (IS_ERR(page))
+			return page;
 	}
 out:
 	mark_page_accessed(page);
@@ -2096,40 +2114,25 @@ out:
 }
 
 /**
- * read_cache_page_async - read into page cache, fill it if needed
+ * read_cache_page - read into page cache, fill it if needed
  * @mapping:	the page's address_space
  * @index:	the page index
  * @filler:	function to perform the read
  * @data:	first arg to filler(data, page) function, often left as NULL
  *
- * Same as read_cache_page, but don't wait for page to become unlocked
- * after submitting it to the filler.
- *
  * Read into the page cache. If a page already exists, and PageUptodate() is
- * not set, try to fill the page but don't wait for it to become unlocked.
+ * not set, try to fill the page and wait for it to become unlocked.
  *
  * If the page does not get brought uptodate, return -EIO.
  */
-struct page *read_cache_page_async(struct address_space *mapping,
+struct page *read_cache_page(struct address_space *mapping,
 				pgoff_t index,
 				int (*filler)(void *, struct page *),
 				void *data)
 {
 	return do_read_cache_page(mapping, index, filler, data, mapping_gfp_mask(mapping));
 }
-EXPORT_SYMBOL(read_cache_page_async);
-
-static struct page *wait_on_page_read(struct page *page)
-{
-	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
-		if (!PageUptodate(page)) {
-			page_cache_release(page);
-			page = ERR_PTR(-EIO);
-		}
-	}
-	return page;
-}
+EXPORT_SYMBOL(read_cache_page);
 
 /**
  * read_cache_page_gfp - read into page cache, using specified page allocation flags.
@@ -2148,31 +2151,10 @@ struct page *read_cache_page_gfp(struct
 {
 	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
 
-	return wait_on_page_read(do_read_cache_page(mapping, index, filler, NULL, gfp));
+	return do_read_cache_page(mapping, index, filler, NULL, gfp);
 }
 EXPORT_SYMBOL(read_cache_page_gfp);
 
-/**
- * read_cache_page - read into page cache, fill it if needed
- * @mapping:	the page's address_space
- * @index:	the page index
- * @filler:	function to perform the read
- * @data:	first arg to filler(data, page) function, often left as NULL
- *
- * Read into the page cache. If a page already exists, and PageUptodate() is
- * not set, try to fill the page then wait for it to become unlocked.
- *
- * If the page does not get brought uptodate, return -EIO.
- */
-struct page *read_cache_page(struct address_space *mapping,
-				pgoff_t index,
-				int (*filler)(void *, struct page *),
-				void *data)
-{
-	return wait_on_page_read(read_cache_page_async(mapping, index, filler, data));
-}
-EXPORT_SYMBOL(read_cache_page);
-
 static size_t __iovec_copy_from_user_inatomic(char *vaddr,
 			const struct iovec *iov, size_t base, size_t bytes)
 {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 105/122] callers of iov_copy_from_user_atomic() dont need pagecache_disable()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (97 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 104/122] mm: remove read_cache_page_async() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 106/122] mm/readahead.c: inline ra_submit Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Mel Gorman

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 9e8c2af96e0d2d5fe298dd796fb6bc16e888a48d upstream.

... it does that itself (via kmap_atomic())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/file.c |    5 -----
 fs/fuse/file.c  |    2 --
 mm/filemap.c    |    3 ---
 3 files changed, 10 deletions(-)

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -425,13 +425,8 @@ static noinline int btrfs_copy_from_user
 		struct page *page = prepared_pages[pg];
 		/*
 		 * Copy data from userspace to the current page
-		 *
-		 * Disable pagefault to avoid recursive lock since
-		 * the pages are already locked
 		 */
-		pagefault_disable();
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, count);
-		pagefault_enable();
 
 		/* Flush processor's dcache for this page */
 		flush_dcache_page(page);
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1003,9 +1003,7 @@ static ssize_t fuse_fill_write_pages(str
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);
 
-		pagefault_disable();
 		tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes);
-		pagefault_enable();
 		flush_dcache_page(page);
 
 		mark_page_accessed(page);
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2188,7 +2188,6 @@ size_t iov_iter_copy_from_user_atomic(st
 	char *kaddr;
 	size_t copied;
 
-	BUG_ON(!in_atomic());
 	kaddr = kmap_atomic(page);
 	if (likely(i->nr_segs == 1)) {
 		int left;
@@ -2562,9 +2561,7 @@ again:
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);
 
-		pagefault_disable();
 		copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
-		pagefault_enable();
 		flush_dcache_page(page);
 
 		mark_page_accessed(page);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 106/122] mm/readahead.c: inline ra_submit
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (98 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 105/122] callers of iov_copy_from_user_atomic() dont need pagecache_disable() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 107/122] mm/compaction: clean up unused code lines Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Fabian Frederick, Andrew Morton,
	Linus Torvalds, Mel Gorman

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Fabian Frederick <fabf@skynet.be>

commit 29f175d125f0f3a9503af8a5596f93d714cceb08 upstream.

Commit f9acc8c7b35a ("readahead: sanify file_ra_state names") left
ra_submit with a single function call.

Move ra_submit to internal.h and inline it to save some stack.  Thanks
to Andrew Morton for commenting different versions.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mm.h |    3 ---
 mm/internal.h      |   15 +++++++++++++++
 mm/readahead.c     |   21 +++------------------
 3 files changed, 18 insertions(+), 21 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1856,9 +1856,6 @@ void page_cache_async_readahead(struct a
 				unsigned long size);
 
 unsigned long max_sane_readahead(unsigned long nr);
-unsigned long ra_submit(struct file_ra_state *ra,
-			struct address_space *mapping,
-			struct file *filp);
 
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -11,6 +11,7 @@
 #ifndef __MM_INTERNAL_H
 #define __MM_INTERNAL_H
 
+#include <linux/fs.h>
 #include <linux/mm.h>
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
@@ -21,6 +22,20 @@ static inline void set_page_count(struct
 	atomic_set(&page->_count, v);
 }
 
+extern int __do_page_cache_readahead(struct address_space *mapping,
+		struct file *filp, pgoff_t offset, unsigned long nr_to_read,
+		unsigned long lookahead_size);
+
+/*
+ * Submit IO for the read-ahead request in file_ra_state.
+ */
+static inline unsigned long ra_submit(struct file_ra_state *ra,
+		struct address_space *mapping, struct file *filp)
+{
+	return __do_page_cache_readahead(mapping, filp,
+					ra->start, ra->size, ra->async_size);
+}
+
 /*
  * Turn a non-refcounted page (->_count == 0) into refcounted with
  * a count of one.
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -8,9 +8,7 @@
  */
 
 #include <linux/kernel.h>
-#include <linux/fs.h>
 #include <linux/gfp.h>
-#include <linux/mm.h>
 #include <linux/export.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
@@ -20,6 +18,8 @@
 #include <linux/syscalls.h>
 #include <linux/file.h>
 
+#include "internal.h"
+
 /*
  * Initialise a struct file's readahead state.  Assumes that the caller has
  * memset *ra to zero.
@@ -149,8 +149,7 @@ out:
  *
  * Returns the number of pages requested, or the maximum amount of I/O allowed.
  */
-static int
-__do_page_cache_readahead(struct address_space *mapping, struct file *filp,
+int __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read,
 			unsigned long lookahead_size)
 {
@@ -244,20 +243,6 @@ unsigned long max_sane_readahead(unsigne
 }
 
 /*
- * Submit IO for the read-ahead request in file_ra_state.
- */
-unsigned long ra_submit(struct file_ra_state *ra,
-		       struct address_space *mapping, struct file *filp)
-{
-	int actual;
-
-	actual = __do_page_cache_readahead(mapping, filp,
-					ra->start, ra->size, ra->async_size);
-
-	return actual;
-}
-
-/*
  * Set the initial window size, round to next power of 2 and square
  * for small size, x 4 for medium, and x 2 for large
  * for 128k (32 page) max ra



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 107/122] mm/compaction: clean up unused code lines
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (99 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 106/122] mm/readahead.c: inline ra_submit Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 108/122] mm/compaction: cleanup isolate_freepages() Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Heesub Shin, Vlastimil Babka,
	Dongjun Shin, Sunghwan Yun, Minchan Kim, Mel Gorman, Joonsoo Kim,
	Bartlomiej Zolnierkiewicz, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Heesub Shin <heesub.shin@samsung.com>

commit 13fb44e4b0414d7e718433a49e6430d5b76bd46e upstream.

Remove code lines currently not in use or never called.

Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dongjun Shin <d.j.shin@samsung.com>
Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dongjun Shin <d.j.shin@samsung.com>
Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   10 ----------
 1 file changed, 10 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -208,12 +208,6 @@ static bool compact_checklock_irqsave(sp
 	return true;
 }
 
-static inline bool compact_trylock_irqsave(spinlock_t *lock,
-			unsigned long *flags, struct compact_control *cc)
-{
-	return compact_checklock_irqsave(lock, flags, false, cc);
-}
-
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
@@ -736,7 +730,6 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Found a block suitable for isolating free pages from */
-		isolated = 0;
 
 		/*
 		 * Take care when isolating in last pageblock of a zone which
@@ -1165,9 +1158,6 @@ static void __compact_pgdat(pg_data_t *p
 			if (zone_watermark_ok(zone, cc->order,
 						low_wmark_pages(zone), 0, 0))
 				compaction_defer_reset(zone, cc->order, false);
-			/* Currently async compaction is never deferred. */
-			else if (cc->sync)
-				defer_compaction(zone, cc->order);
 		}
 
 		VM_BUG_ON(!list_empty(&cc->freepages));



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 108/122] mm/compaction: cleanup isolate_freepages()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (100 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 107/122] mm/compaction: clean up unused code lines Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 109/122] mm, migration: add destination page freeing callback Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Minchan Kim,
	Mel Gorman, Joonsoo Kim, Bartlomiej Zolnierkiewicz,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, Dongjun Shin, Sunghwan Yun, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit c96b9e508f3d06ddb601dcc9792d62c044ab359e upstream.

isolate_freepages() is currently somewhat hard to follow thanks to many
looks like it is related to the 'low_pfn' variable, but in fact it is not.

This patch renames the 'high_pfn' variable to a hopefully less confusing name,
and slightly changes its handling without a functional change. A comment made
obsolete by recent changes is also updated.

[akpm@linux-foundation.org: comment fixes, per Minchan]
[iamjoonsoo.kim@lge.com: cleanups]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dongjun Shin <d.j.shin@samsung.com>
Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   56 +++++++++++++++++++++++++++-----------------------------
 1 file changed, 27 insertions(+), 29 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -665,7 +665,10 @@ static void isolate_freepages(struct zon
 				struct compact_control *cc)
 {
 	struct page *page;
-	unsigned long high_pfn, low_pfn, pfn, z_end_pfn;
+	unsigned long block_start_pfn;	/* start of current pageblock */
+	unsigned long block_end_pfn;	/* end of current pageblock */
+	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
+	unsigned long next_free_pfn; /* start pfn for scaning at next round */
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
 
@@ -673,32 +676,33 @@ static void isolate_freepages(struct zon
 	 * Initialise the free scanner. The starting point is where we last
 	 * successfully isolated from, zone-cached value, or the end of the
 	 * zone when isolating for the first time. We need this aligned to
-	 * the pageblock boundary, because we do pfn -= pageblock_nr_pages
-	 * in the for loop.
+	 * the pageblock boundary, because we do
+	 * block_start_pfn -= pageblock_nr_pages in the for loop.
+	 * For ending point, take care when isolating in last pageblock of a
+	 * a zone which ends in the middle of a pageblock.
 	 * The low boundary is the end of the pageblock the migration scanner
 	 * is using.
 	 */
-	pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
+	block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
+	block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
+						zone_end_pfn(zone));
 	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
 
 	/*
-	 * Take care that if the migration scanner is at the end of the zone
-	 * that the free scanner does not accidentally move to the next zone
-	 * in the next isolation cycle.
+	 * If no pages are isolated, the block_start_pfn < low_pfn check
+	 * will kick in.
 	 */
-	high_pfn = min(low_pfn, pfn);
-
-	z_end_pfn = zone_end_pfn(zone);
+	next_free_pfn = 0;
 
 	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
 	 */
-	for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
-					pfn -= pageblock_nr_pages) {
+	for (; block_start_pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
+				block_end_pfn = block_start_pfn,
+				block_start_pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
-		unsigned long end_pfn;
 
 		/*
 		 * This can iterate a massively long zone without finding any
@@ -707,7 +711,7 @@ static void isolate_freepages(struct zon
 		 */
 		cond_resched();
 
-		if (!pfn_valid(pfn))
+		if (!pfn_valid(block_start_pfn))
 			continue;
 
 		/*
@@ -717,7 +721,7 @@ static void isolate_freepages(struct zon
 		 * i.e. it's possible that all pages within a zones range of
 		 * pages do not belong to a single zone.
 		 */
-		page = pfn_to_page(pfn);
+		page = pfn_to_page(block_start_pfn);
 		if (page_zone(page) != zone)
 			continue;
 
@@ -730,14 +734,8 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Found a block suitable for isolating free pages from */
-
-		/*
-		 * Take care when isolating in last pageblock of a zone which
-		 * ends in the middle of a pageblock.
-		 */
-		end_pfn = min(pfn + pageblock_nr_pages, z_end_pfn);
-		isolated = isolate_freepages_block(cc, pfn, end_pfn,
-						   freelist, false);
+		isolated = isolate_freepages_block(cc, block_start_pfn,
+					block_end_pfn, freelist, false);
 		nr_freepages += isolated;
 
 		/*
@@ -745,9 +743,9 @@ static void isolate_freepages(struct zon
 		 * looking for free pages, the search will restart here as
 		 * page migration may have returned some pages to the allocator
 		 */
-		if (isolated) {
+		if (isolated && next_free_pfn == 0) {
 			cc->finished_update_free = true;
-			high_pfn = max(high_pfn, pfn);
+			next_free_pfn = block_start_pfn;
 		}
 	}
 
@@ -758,10 +756,10 @@ static void isolate_freepages(struct zon
 	 * If we crossed the migrate scanner, we want to keep it that way
 	 * so that compact_finished() may detect this
 	 */
-	if (pfn < low_pfn)
-		cc->free_pfn = max(pfn, zone->zone_start_pfn);
-	else
-		cc->free_pfn = high_pfn;
+	if (block_start_pfn < low_pfn)
+		next_free_pfn = cc->migrate_pfn;
+
+	cc->free_pfn = next_free_pfn;
 	cc->nr_freepages = nr_freepages;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 109/122] mm, migration: add destination page freeing callback
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (101 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 108/122] mm/compaction: cleanup isolate_freepages() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 110/122] mm, compaction: return failed migration target pages back to freelist Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Rientjes, Naoya Horiguchi,
	Mel Gorman, Vlastimil Babka, Greg Thelen, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit 68711a746345c44ae00c64d8dbac6a9ce13ac54a upstream.

Memory migration uses a callback defined by the caller to determine how to
allocate destination pages.  When migration fails for a source page,
however, it frees the destination page back to the system.

This patch adds a memory migration callback defined by the caller to
determine how to free destination pages.  If a caller, such as memory
compaction, builds its own freelist for migration targets, this can reuse
already freed memory instead of scanning additional memory.

If the caller provides a function to handle freeing of destination pages,
it is called when page migration fails.  If the caller passes NULL then
freeing back to the system will be handled as usual.  This patch
introduces no functional change.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 include/linux/migrate.h |   11 ++++++---
 mm/compaction.c         |    2 -
 mm/memory-failure.c     |    4 +--
 mm/memory_hotplug.c     |    2 -
 mm/mempolicy.c          |    4 +--
 mm/migrate.c            |   55 ++++++++++++++++++++++++++++++++++--------------
 mm/page_alloc.c         |    2 -
 7 files changed, 53 insertions(+), 27 deletions(-)

--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -5,7 +5,9 @@
 #include <linux/mempolicy.h>
 #include <linux/migrate_mode.h>
 
-typedef struct page *new_page_t(struct page *, unsigned long private, int **);
+typedef struct page *new_page_t(struct page *page, unsigned long private,
+				int **reason);
+typedef void free_page_t(struct page *page, unsigned long private);
 
 /*
  * Return values from addresss_space_operations.migratepage():
@@ -38,7 +40,7 @@ enum migrate_reason {
 extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
 			struct page *, struct page *, enum migrate_mode);
-extern int migrate_pages(struct list_head *l, new_page_t x,
+extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
 		unsigned long private, enum migrate_mode mode, int reason);
 
 extern int migrate_prep(void);
@@ -56,8 +58,9 @@ extern int migrate_page_move_mapping(str
 #else
 
 static inline void putback_movable_pages(struct list_head *l) {}
-static inline int migrate_pages(struct list_head *l, new_page_t x,
-		unsigned long private, enum migrate_mode mode, int reason)
+static inline int migrate_pages(struct list_head *l, new_page_t new,
+		free_page_t free, unsigned long private, enum migrate_mode mode,
+		int reason)
 	{ return -ENOSYS; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1016,7 +1016,7 @@ static int compact_zone(struct zone *zon
 		}
 
 		nr_migrate = cc->nr_migratepages;
-		err = migrate_pages(&cc->migratepages, compaction_alloc,
+		err = migrate_pages(&cc->migratepages, compaction_alloc, NULL,
 				(unsigned long)cc,
 				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
 				MR_COMPACTION);
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1540,7 +1540,7 @@ static int soft_offline_huge_page(struct
 
 	/* Keep page count to indicate a given hugepage is isolated. */
 	list_move(&hpage->lru, &pagelist);
-	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 				MIGRATE_SYNC, MR_MEMORY_FAILURE);
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
@@ -1621,7 +1621,7 @@ static int __soft_offline_page(struct pa
 		inc_zone_page_state(page, NR_ISOLATED_ANON +
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
-		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
 		if (ret) {
 			if (!list_empty(&pagelist)) {
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1332,7 +1332,7 @@ do_migrate_range(unsigned long start_pfn
 		 * alloc_migrate_target should be improooooved!!
 		 * migrate_pages returns # of failed pages.
 		 */
-		ret = migrate_pages(&source, alloc_migrate_target, 0,
+		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
 					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
 		if (ret)
 			putback_movable_pages(&source);
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1060,7 +1060,7 @@ static int migrate_to_node(struct mm_str
 			flags | MPOL_MF_DISCONTIG_OK, &pagelist);
 
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_node_page, dest,
+		err = migrate_pages(&pagelist, new_node_page, NULL, dest,
 					MIGRATE_SYNC, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
@@ -1306,7 +1306,7 @@ static long do_mbind(unsigned long start
 
 		if (!list_empty(&pagelist)) {
 			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
-			nr_failed = migrate_pages(&pagelist, new_page,
+			nr_failed = migrate_pages(&pagelist, new_page, NULL,
 				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
 			if (nr_failed)
 				putback_movable_pages(&pagelist);
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -941,8 +941,9 @@ out:
  * Obtain the lock on page, remove all ptes and migrate the page
  * to the newly allocated page in newpage.
  */
-static int unmap_and_move(new_page_t get_new_page, unsigned long private,
-			struct page *page, int force, enum migrate_mode mode)
+static int unmap_and_move(new_page_t get_new_page, free_page_t put_new_page,
+			unsigned long private, struct page *page, int force,
+			enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -986,11 +987,17 @@ out:
 				page_is_file_cache(page));
 		putback_lru_page(page);
 	}
+
 	/*
-	 * Move the new page to the LRU. If migration was not successful
-	 * then this will free the page.
+	 * If migration was not successful and there's a freeing callback, use
+	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
+	 * during isolation.
 	 */
-	putback_lru_page(newpage);
+	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+		put_new_page(newpage, private);
+	else
+		putback_lru_page(newpage);
+
 	if (result) {
 		if (rc)
 			*result = rc;
@@ -1019,8 +1026,9 @@ out:
  * will wait in the page fault for migration to complete.
  */
 static int unmap_and_move_huge_page(new_page_t get_new_page,
-				unsigned long private, struct page *hpage,
-				int force, enum migrate_mode mode)
+				free_page_t put_new_page, unsigned long private,
+				struct page *hpage, int force,
+				enum migrate_mode mode)
 {
 	int rc = 0;
 	int *result = NULL;
@@ -1059,20 +1067,30 @@ static int unmap_and_move_huge_page(new_
 	if (!page_mapped(hpage))
 		rc = move_to_new_page(new_hpage, hpage, 1, mode);
 
-	if (rc)
+	if (rc != MIGRATEPAGE_SUCCESS)
 		remove_migration_ptes(hpage, hpage);
 
 	if (anon_vma)
 		put_anon_vma(anon_vma);
 
-	if (!rc)
+	if (rc == MIGRATEPAGE_SUCCESS)
 		hugetlb_cgroup_migrate(hpage, new_hpage);
 
 	unlock_page(hpage);
 out:
 	if (rc != -EAGAIN)
 		putback_active_hugepage(hpage);
-	put_page(new_hpage);
+
+	/*
+	 * If migration was not successful and there's a freeing callback, use
+	 * it.  Otherwise, put_page() will drop the reference grabbed during
+	 * isolation.
+	 */
+	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+		put_new_page(new_hpage, private);
+	else
+		put_page(new_hpage);
+
 	if (result) {
 		if (rc)
 			*result = rc;
@@ -1089,6 +1107,8 @@ out:
  * @from:		The list of pages to be migrated.
  * @get_new_page:	The function used to allocate free pages to be used
  *			as the target of the page migration.
+ * @put_new_page:	The function used to free target pages if migration
+ *			fails, or NULL if no special handling is necessary.
  * @private:		Private data to be passed on to get_new_page()
  * @mode:		The migration mode that specifies the constraints for
  *			page migration, if any.
@@ -1102,7 +1122,8 @@ out:
  * Returns the number of pages that were not migrated, or an error code.
  */
 int migrate_pages(struct list_head *from, new_page_t get_new_page,
-		unsigned long private, enum migrate_mode mode, int reason)
+		free_page_t put_new_page, unsigned long private,
+		enum migrate_mode mode, int reason)
 {
 	int retry = 1;
 	int nr_failed = 0;
@@ -1124,10 +1145,11 @@ int migrate_pages(struct list_head *from
 
 			if (PageHuge(page))
 				rc = unmap_and_move_huge_page(get_new_page,
-						private, page, pass > 2, mode);
+						put_new_page, private, page,
+						pass > 2, mode);
 			else
-				rc = unmap_and_move(get_new_page, private,
-						page, pass > 2, mode);
+				rc = unmap_and_move(get_new_page, put_new_page,
+						private, page, pass > 2, mode);
 
 			switch(rc) {
 			case -ENOMEM:
@@ -1276,7 +1298,7 @@ set_status:
 
 	err = 0;
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_page_node,
+		err = migrate_pages(&pagelist, new_page_node, NULL,
 				(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
@@ -1732,7 +1754,8 @@ int migrate_misplaced_page(struct page *
 
 	list_add(&page->lru, &migratepages);
 	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_page,
-				     node, MIGRATE_ASYNC, MR_NUMA_MISPLACED);
+				     NULL, node, MIGRATE_ASYNC,
+				     MR_NUMA_MISPLACED);
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6261,7 +6261,7 @@ static int __alloc_contig_migrate_range(
 		cc->nr_migratepages -= nr_reclaimed;
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-				    0, MIGRATE_SYNC, MR_CMA);
+				    NULL, 0, MIGRATE_SYNC, MR_CMA);
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 110/122] mm, compaction: return failed migration target pages back to freelist
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (102 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 109/122] mm, migration: add destination page freeing callback Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 111/122] mm, compaction: add per-zone migration pfn cache for async compaction Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Rientjes, Greg Thelen,
	Mel Gorman, Vlastimil Babka, Naoya Horiguchi, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit d53aea3d46d64e95da9952887969f7533b9ab25e upstream.

Greg reported that he found isolated free pages were returned back to the
VM rather than the compaction freelist.  This will cause holes behind the
free scanner and cause it to reallocate additional memory if necessary
later.

He detected the problem at runtime seeing that ext4 metadata pages (esp
the ones read by "sbi->s_group_desc[i] = sb_bread(sb, block)") were
constantly visited by compaction calls of migrate_pages().  These pages
had a non-zero b_count which caused fallback_migrate_page() ->
try_to_release_page() -> try_to_free_buffers() to fail.

Memory compaction works by having a "freeing scanner" scan from one end of
a zone which isolates pages as migration targets while another "migrating
scanner" scans from the other end of the same zone which isolates pages
for migration.

When page migration fails for an isolated page, the target page is
returned to the system rather than the freelist built by the freeing
scanner.  This may require the freeing scanner to continue scanning memory
after suitable migration targets have already been returned to the system
needlessly.

This patch returns destination pages to the freeing scanner freelist when
page migration fails.  This prevents unnecessary work done by the freeing
scanner but also encourages memory to be as compacted as possible at the
end of the zone.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -790,23 +790,32 @@ static struct page *compaction_alloc(str
 }
 
 /*
- * We cannot control nr_migratepages and nr_freepages fully when migration is
- * running as migrate_pages() has no knowledge of compact_control. When
- * migration is complete, we count the number of pages on the lists by hand.
+ * This is a migrate-callback that "frees" freepages back to the isolated
+ * freelist.  All pages on the freelist are from the same zone, so there is no
+ * special handling needed for NUMA.
+ */
+static void compaction_free(struct page *page, unsigned long data)
+{
+	struct compact_control *cc = (struct compact_control *)data;
+
+	list_add(&page->lru, &cc->freepages);
+	cc->nr_freepages++;
+}
+
+/*
+ * We cannot control nr_migratepages fully when migration is running as
+ * migrate_pages() has no knowledge of of compact_control.  When migration is
+ * complete, we count the number of pages on the list by hand.
  */
 static void update_nr_listpages(struct compact_control *cc)
 {
 	int nr_migratepages = 0;
-	int nr_freepages = 0;
 	struct page *page;
 
 	list_for_each_entry(page, &cc->migratepages, lru)
 		nr_migratepages++;
-	list_for_each_entry(page, &cc->freepages, lru)
-		nr_freepages++;
 
 	cc->nr_migratepages = nr_migratepages;
-	cc->nr_freepages = nr_freepages;
 }
 
 /* possible outcome of isolate_migratepages */
@@ -1016,8 +1025,8 @@ static int compact_zone(struct zone *zon
 		}
 
 		nr_migrate = cc->nr_migratepages;
-		err = migrate_pages(&cc->migratepages, compaction_alloc, NULL,
-				(unsigned long)cc,
+		err = migrate_pages(&cc->migratepages, compaction_alloc,
+				compaction_free, (unsigned long)cc,
 				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
 				MR_COMPACTION);
 		update_nr_listpages(cc);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 111/122] mm, compaction: add per-zone migration pfn cache for async compaction
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (103 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 110/122] mm, compaction: return failed migration target pages back to freelist Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 112/122] mm, compaction: embed migration mode in compact_control Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Rientjes, Vlastimil Babka,
	Naoya Horiguchi, Greg Thelen, Mel Gorman, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit 35979ef3393110ff3c12c6b94552208d3bdf1a36 upstream.

Each zone has a cached migration scanner pfn for memory compaction so that
subsequent calls to memory compaction can start where the previous call
left off.

Currently, the compaction migration scanner only updates the per-zone
cached pfn when pageblocks were not skipped for async compaction.  This
creates a dependency on calling sync compaction to avoid having subsequent
calls to async compaction from scanning an enormous amount of non-MOVABLE
pageblocks each time it is called.  On large machines, this could be
potentially very expensive.

This patch adds a per-zone cached migration scanner pfn only for async
compaction.  It is updated everytime a pageblock has been scanned in its
entirety and when no pages from it were successfully isolated.  The cached
migration scanner pfn for sync compaction is updated only when called for
sync compaction.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mmzone.h |    5 ++-
 mm/compaction.c        |   66 +++++++++++++++++++++++++++++--------------------
 2 files changed, 43 insertions(+), 28 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -361,9 +361,10 @@ struct zone {
 	/* Set to true when the PG_migrate_skip bits should be cleared */
 	bool			compact_blockskip_flush;
 
-	/* pfns where compaction scanners should start */
+	/* pfn where compaction free scanner should start */
 	unsigned long		compact_cached_free_pfn;
-	unsigned long		compact_cached_migrate_pfn;
+	/* pfn where async and sync compaction migration scanner should start */
+	unsigned long		compact_cached_migrate_pfn[2];
 #endif
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/* see spanned/present_pages for more description */
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,7 +89,8 @@ static void __reset_isolation_suitable(s
 	unsigned long end_pfn = zone_end_pfn(zone);
 	unsigned long pfn;
 
-	zone->compact_cached_migrate_pfn = start_pfn;
+	zone->compact_cached_migrate_pfn[0] = start_pfn;
+	zone->compact_cached_migrate_pfn[1] = start_pfn;
 	zone->compact_cached_free_pfn = end_pfn;
 	zone->compact_blockskip_flush = false;
 
@@ -131,9 +132,10 @@ void reset_isolation_suitable(pg_data_t
  */
 static void update_pageblock_skip(struct compact_control *cc,
 			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			bool set_unsuitable, bool migrate_scanner)
 {
 	struct zone *zone = cc->zone;
+	unsigned long pfn;
 
 	if (cc->ignore_skip_hint)
 		return;
@@ -141,20 +143,31 @@ static void update_pageblock_skip(struct
 	if (!page)
 		return;
 
-	if (!nr_isolated) {
-		unsigned long pfn = page_to_pfn(page);
+	if (nr_isolated)
+		return;
+
+	/*
+	 * Only skip pageblocks when all forms of compaction will be known to
+	 * fail in the near future.
+	 */
+	if (set_unsuitable)
 		set_pageblock_skip(page);
 
-		/* Update where compaction should restart */
-		if (migrate_scanner) {
-			if (!cc->finished_update_migrate &&
-			    pfn > zone->compact_cached_migrate_pfn)
-				zone->compact_cached_migrate_pfn = pfn;
-		} else {
-			if (!cc->finished_update_free &&
-			    pfn < zone->compact_cached_free_pfn)
-				zone->compact_cached_free_pfn = pfn;
-		}
+	pfn = page_to_pfn(page);
+
+	/* Update where async and sync compaction should restart */
+	if (migrate_scanner) {
+		if (cc->finished_update_migrate)
+			return;
+		if (pfn > zone->compact_cached_migrate_pfn[0])
+			zone->compact_cached_migrate_pfn[0] = pfn;
+		if (cc->sync && pfn > zone->compact_cached_migrate_pfn[1])
+			zone->compact_cached_migrate_pfn[1] = pfn;
+	} else {
+		if (cc->finished_update_free)
+			return;
+		if (pfn < zone->compact_cached_free_pfn)
+			zone->compact_cached_free_pfn = pfn;
 	}
 }
 #else
@@ -166,7 +179,7 @@ static inline bool isolation_suitable(st
 
 static void update_pageblock_skip(struct compact_control *cc,
 			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			bool set_unsuitable, bool migrate_scanner)
 {
 }
 #endif /* CONFIG_COMPACTION */
@@ -323,7 +336,8 @@ isolate_fail:
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated, true,
+				      false);
 
 	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
 	if (total_isolated)
@@ -458,7 +472,7 @@ isolate_migratepages_range(struct zone *
 	unsigned long flags;
 	bool locked = false;
 	struct page *page = NULL, *valid_page = NULL;
-	bool skipped_async_unsuitable = false;
+	bool set_unsuitable = true;
 	const isolate_mode_t mode = (!cc->sync ? ISOLATE_ASYNC_MIGRATE : 0) |
 				    (unevictable ? ISOLATE_UNEVICTABLE : 0);
 
@@ -535,8 +549,7 @@ isolate_migratepages_range(struct zone *
 			 */
 			mt = get_pageblock_migratetype(page);
 			if (!cc->sync && !migrate_async_suitable(mt)) {
-				cc->finished_update_migrate = true;
-				skipped_async_unsuitable = true;
+				set_unsuitable = false;
 				goto next_pageblock;
 			}
 		}
@@ -640,11 +653,10 @@ next_pageblock:
 	/*
 	 * Update the pageblock-skip information and cached scanner pfn,
 	 * if the whole pageblock was scanned without isolating any page.
-	 * This is not done when pageblock was skipped due to being unsuitable
-	 * for async compaction, so that eventual sync compaction can try.
 	 */
-	if (low_pfn == end_pfn && !skipped_async_unsuitable)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+	if (low_pfn == end_pfn)
+		update_pageblock_skip(cc, valid_page, nr_isolated,
+				      set_unsuitable, true);
 
 	trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
 
@@ -868,7 +880,8 @@ static int compact_finished(struct zone
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (cc->free_pfn <= cc->migrate_pfn) {
 		/* Let the next compaction start anew. */
-		zone->compact_cached_migrate_pfn = zone->zone_start_pfn;
+		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
+		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
 		zone->compact_cached_free_pfn = zone_end_pfn(zone);
 
 		/*
@@ -993,7 +1006,7 @@ static int compact_zone(struct zone *zon
 	 * information on where the scanners should start but check that it
 	 * is initialised by ensuring the values are within zone boundaries.
 	 */
-	cc->migrate_pfn = zone->compact_cached_migrate_pfn;
+	cc->migrate_pfn = zone->compact_cached_migrate_pfn[cc->sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
 	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
 		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
@@ -1001,7 +1014,8 @@ static int compact_zone(struct zone *zon
 	}
 	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
 		cc->migrate_pfn = start_pfn;
-		zone->compact_cached_migrate_pfn = cc->migrate_pfn;
+		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
+		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
 	}
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 112/122] mm, compaction: embed migration mode in compact_control
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (104 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 111/122] mm, compaction: add per-zone migration pfn cache for async compaction Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 113/122] mm, compaction: terminate async compaction when rescheduling Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Rientjes, Mel Gorman,
	Vlastimil Babka, Greg Thelen, Naoya Horiguchi, Joonsoo Kim,
	Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit e0b9daeb453e602a95ea43853dc12d385558ce1f upstream.

We're going to want to manipulate the migration mode for compaction in the
page allocator, and currently compact_control's sync field is only a bool.

Currently, we only do MIGRATE_ASYNC or MIGRATE_SYNC_LIGHT compaction
depending on the value of this bool.  Convert the bool to enum
migrate_mode and pass the migration mode in directly.  Later, we'll want
to avoid MIGRATE_SYNC_LIGHT for thp allocations in the pagefault patch to
avoid unnecessary latency.

This also alters compaction triggered from sysfs, either for the entire
system or for a node, to force MIGRATE_SYNC.

[akpm@linux-foundation.org: fix build]
[iamjoonsoo.kim@lge.com: use MIGRATE_SYNC in alloc_contig_range()]
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/compaction.h |    4 ++--
 mm/compaction.c            |   36 +++++++++++++++++++-----------------
 mm/internal.h              |    2 +-
 mm/page_alloc.c            |   39 +++++++++++++++++----------------------
 4 files changed, 39 insertions(+), 42 deletions(-)

--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask,
-			bool sync, bool *contended);
+			enum migrate_mode mode, bool *contended);
 extern void compact_pgdat(pg_data_t *pgdat, int order);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern unsigned long compaction_suitable(struct zone *zone, int order);
@@ -91,7 +91,7 @@ static inline bool compaction_restarting
 #else
 static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			bool sync, bool *contended)
+			enum migrate_mode mode, bool *contended)
 {
 	return COMPACT_CONTINUE;
 }
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -161,7 +161,8 @@ static void update_pageblock_skip(struct
 			return;
 		if (pfn > zone->compact_cached_migrate_pfn[0])
 			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->sync && pfn > zone->compact_cached_migrate_pfn[1])
+		if (cc->mode != MIGRATE_ASYNC &&
+		    pfn > zone->compact_cached_migrate_pfn[1])
 			zone->compact_cached_migrate_pfn[1] = pfn;
 	} else {
 		if (cc->finished_update_free)
@@ -208,7 +209,7 @@ static bool compact_checklock_irqsave(sp
 		}
 
 		/* async aborts if taking too long or contended */
-		if (!cc->sync) {
+		if (cc->mode == MIGRATE_ASYNC) {
 			cc->contended = true;
 			return false;
 		}
@@ -473,7 +474,8 @@ isolate_migratepages_range(struct zone *
 	bool locked = false;
 	struct page *page = NULL, *valid_page = NULL;
 	bool set_unsuitable = true;
-	const isolate_mode_t mode = (!cc->sync ? ISOLATE_ASYNC_MIGRATE : 0) |
+	const isolate_mode_t mode = (cc->mode == MIGRATE_ASYNC ?
+					ISOLATE_ASYNC_MIGRATE : 0) |
 				    (unevictable ? ISOLATE_UNEVICTABLE : 0);
 
 	/*
@@ -483,7 +485,7 @@ isolate_migratepages_range(struct zone *
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode == MIGRATE_ASYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -548,7 +550,8 @@ isolate_migratepages_range(struct zone *
 			 * the minimum amount of work satisfies the allocation
 			 */
 			mt = get_pageblock_migratetype(page);
-			if (!cc->sync && !migrate_async_suitable(mt)) {
+			if (cc->mode == MIGRATE_ASYNC &&
+			    !migrate_async_suitable(mt)) {
 				set_unsuitable = false;
 				goto next_pageblock;
 			}
@@ -981,6 +984,7 @@ static int compact_zone(struct zone *zon
 	int ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
+	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	ret = compaction_suitable(zone, cc->order);
 	switch (ret) {
@@ -1006,7 +1010,7 @@ static int compact_zone(struct zone *zon
 	 * information on where the scanners should start but check that it
 	 * is initialised by ensuring the values are within zone boundaries.
 	 */
-	cc->migrate_pfn = zone->compact_cached_migrate_pfn[cc->sync];
+	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
 	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
 		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
@@ -1040,8 +1044,7 @@ static int compact_zone(struct zone *zon
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				compaction_free, (unsigned long)cc,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
+				compaction_free, (unsigned long)cc, cc->mode,
 				MR_COMPACTION);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
@@ -1074,9 +1077,8 @@ out:
 	return ret;
 }
 
-static unsigned long compact_zone_order(struct zone *zone,
-				 int order, gfp_t gfp_mask,
-				 bool sync, bool *contended)
+static unsigned long compact_zone_order(struct zone *zone, int order,
+		gfp_t gfp_mask, enum migrate_mode mode, bool *contended)
 {
 	unsigned long ret;
 	struct compact_control cc = {
@@ -1085,7 +1087,7 @@ static unsigned long compact_zone_order(
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -1107,7 +1109,7 @@ int sysctl_extfrag_threshold = 500;
  * @order: The order of the current allocation
  * @gfp_mask: The GFP mask of the current allocation
  * @nodemask: The allowed nodes to allocate from
- * @sync: Whether migration is synchronous or not
+ * @mode: The migration mode for async, sync light, or sync migration
  * @contended: Return value that is true if compaction was aborted due to lock contention
  * @page: Optionally capture a free page of the requested order during compaction
  *
@@ -1115,7 +1117,7 @@ int sysctl_extfrag_threshold = 500;
  */
 unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			bool sync, bool *contended)
+			enum migrate_mode mode, bool *contended)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
@@ -1140,7 +1142,7 @@ unsigned long try_to_compact_pages(struc
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync,
+		status = compact_zone_order(zone, order, gfp_mask, mode,
 						contended);
 		rc = max(status, rc);
 
@@ -1190,7 +1192,7 @@ void compact_pgdat(pg_data_t *pgdat, int
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = MIGRATE_ASYNC,
 	};
 
 	if (!order)
@@ -1203,7 +1205,7 @@ static void compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
 	};
 
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -134,7 +134,7 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum migrate_mode mode;		/* Async or sync migration mode */
 	bool ignore_skip_hint;		/* Scan blocks even if marked skip */
 	bool finished_update_free;	/* True when the zone cached pfns are
 					 * no longer being updated
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2246,7 +2246,7 @@ static struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, bool sync_migration,
+	int migratetype, enum migrate_mode mode,
 	bool *contended_compaction, bool *deferred_compaction,
 	unsigned long *did_some_progress)
 {
@@ -2260,7 +2260,7 @@ __alloc_pages_direct_compact(gfp_t gfp_m
 
 	current->flags |= PF_MEMALLOC;
 	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
-						nodemask, sync_migration,
+						nodemask, mode,
 						contended_compaction);
 	current->flags &= ~PF_MEMALLOC;
 
@@ -2293,7 +2293,7 @@ __alloc_pages_direct_compact(gfp_t gfp_m
 		 * As async compaction considers a subset of pageblocks, only
 		 * defer if the failure was a sync compaction failure.
 		 */
-		if (sync_migration)
+		if (mode != MIGRATE_ASYNC)
 			defer_compaction(preferred_zone, order);
 
 		cond_resched();
@@ -2306,9 +2306,8 @@ static inline struct page *
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, bool sync_migration,
-	bool *contended_compaction, bool *deferred_compaction,
-	unsigned long *did_some_progress)
+	int migratetype, enum migrate_mode mode, bool *contended_compaction,
+	bool *deferred_compaction, unsigned long *did_some_progress)
 {
 	return NULL;
 }
@@ -2503,7 +2502,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, u
 	int alloc_flags;
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
-	bool sync_migration = false;
+	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	bool contended_compaction = false;
 
@@ -2597,17 +2596,15 @@ rebalance:
 	 * Try direct compaction. The first pass is asynchronous. Subsequent
 	 * attempts after direct reclaim are synchronous
 	 */
-	page = __alloc_pages_direct_compact(gfp_mask, order,
-					zonelist, high_zoneidx,
-					nodemask,
-					alloc_flags, preferred_zone,
-					migratetype, sync_migration,
-					&contended_compaction,
+	page = __alloc_pages_direct_compact(gfp_mask, order, zonelist,
+					high_zoneidx, nodemask, alloc_flags,
+					preferred_zone, migratetype,
+					migration_mode, &contended_compaction,
 					&deferred_compaction,
 					&did_some_progress);
 	if (page)
 		goto got_pg;
-	sync_migration = true;
+	migration_mode = MIGRATE_SYNC_LIGHT;
 
 	/*
 	 * If compaction is deferred for high-order allocations, it is because
@@ -2682,12 +2679,10 @@ rebalance:
 		 * direct reclaim and reclaim/compaction depends on compaction
 		 * being called after reclaim so call directly if necessary
 		 */
-		page = __alloc_pages_direct_compact(gfp_mask, order,
-					zonelist, high_zoneidx,
-					nodemask,
-					alloc_flags, preferred_zone,
-					migratetype, sync_migration,
-					&contended_compaction,
+		page = __alloc_pages_direct_compact(gfp_mask, order, zonelist,
+					high_zoneidx, nodemask, alloc_flags,
+					preferred_zone, migratetype,
+					migration_mode, &contended_compaction,
 					&deferred_compaction,
 					&did_some_progress);
 		if (page)
@@ -6261,7 +6256,7 @@ static int __alloc_contig_migrate_range(
 		cc->nr_migratepages -= nr_reclaimed;
 
 		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-				    NULL, 0, MIGRATE_SYNC, MR_CMA);
+				    NULL, 0, cc->mode, MR_CMA);
 	}
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
@@ -6300,7 +6295,7 @@ int alloc_contig_range(unsigned long sta
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 113/122] mm, compaction: terminate async compaction when rescheduling
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (105 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 112/122] mm, compaction: embed migration mode in compact_control Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 114/122] mm/compaction: do not count migratepages when unnecessary Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Rientjes, Mel Gorman,
	Vlastimil Babka, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Rientjes <rientjes@google.com>

commit aeef4b83806f49a0c454b7d4578671b71045bee2 upstream.

Async compaction terminates prematurely when need_resched(), see
compact_checklock_irqsave().  This can never trigger, however, if the
cond_resched() in isolate_migratepages_range() always takes care of the
scheduling.

If the cond_resched() actually triggers, then terminate this pageblock
scan for async compaction as well.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -494,8 +494,13 @@ isolate_migratepages_range(struct zone *
 			return 0;
 	}
 
+	if (cond_resched()) {
+		/* Async terminates prematurely on need_resched() */
+		if (cc->mode == MIGRATE_ASYNC)
+			return 0;
+	}
+
 	/* Time to isolate some pages for migration */
-	cond_resched();
 	for (; low_pfn < end_pfn; low_pfn++) {
 		/* give a chance to irqs before checking need_resched() */
 		if (locked && !(low_pfn % SWAP_CLUSTER_MAX)) {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 114/122] mm/compaction: do not count migratepages when unnecessary
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (106 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 113/122] mm, compaction: terminate async compaction when rescheduling Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 115/122] mm/compaction: avoid rescanning pageblocks in isolate_freepages Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Naoya Horiguchi,
	Minchan Kim, Mel Gorman, Joonsoo Kim, Bartlomiej Zolnierkiewicz,
	Michal Nazarewicz, Christoph Lameter, Rik van Riel,
	David Rientjes, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit f8c9301fa5a2a8b873c67f2a3d8230d5c13f61b7 upstream.

During compaction, update_nr_listpages() has been used to count remaining
non-migrated and free pages after a call to migrage_pages().  The
freepages counting has become unneccessary, and it turns out that
migratepages counting is also unnecessary in most cases.

The only situation when it's needed to count cc->migratepages is when
migrate_pages() returns with a negative error code.  Otherwise, the
non-negative return value is the number of pages that were not migrated,
which is exactly the count of remaining pages in the cc->migratepages
list.

Furthermore, any non-zero count is only interesting for the tracepoint of
mm_compaction_migratepages events, because after that all remaining
unmigrated pages are put back and their count is set to 0.

This patch therefore removes update_nr_listpages() completely, and changes
the tracepoint definition so that the manual counting is done only when
the tracepoint is enabled, and only when migrate_pages() returns a
negative error code.

Furthermore, migrate_pages() and the tracepoints won't be called when
there's nothing to migrate.  This potentially avoids some wasted cycles
and reduces the volume of uninteresting mm_compaction_migratepages events
where "nr_migrated=0 nr_failed=0".  In the stress-highalloc mmtest, this
was about 75% of the events.  The mm_compaction_isolate_migratepages event
is better for determining that nothing was isolated for migration, and
this one was just duplicating the info.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/trace/events/compaction.h |   25 +++++++++++++++++++++----
 mm/compaction.c                   |   31 +++++++------------------------
 2 files changed, 28 insertions(+), 28 deletions(-)

--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -5,6 +5,7 @@
 #define _TRACE_COMPACTION_H
 
 #include <linux/types.h>
+#include <linux/list.h>
 #include <linux/tracepoint.h>
 #include <trace/events/gfpflags.h>
 
@@ -47,10 +48,11 @@ DEFINE_EVENT(mm_compaction_isolate_templ
 
 TRACE_EVENT(mm_compaction_migratepages,
 
-	TP_PROTO(unsigned long nr_migrated,
-		unsigned long nr_failed),
+	TP_PROTO(unsigned long nr_all,
+		int migrate_rc,
+		struct list_head *migratepages),
 
-	TP_ARGS(nr_migrated, nr_failed),
+	TP_ARGS(nr_all, migrate_rc, migratepages),
 
 	TP_STRUCT__entry(
 		__field(unsigned long, nr_migrated)
@@ -58,7 +60,22 @@ TRACE_EVENT(mm_compaction_migratepages,
 	),
 
 	TP_fast_assign(
-		__entry->nr_migrated = nr_migrated;
+		unsigned long nr_failed = 0;
+		struct list_head *page_lru;
+
+		/*
+		 * migrate_pages() returns either a non-negative number
+		 * with the number of pages that failed migration, or an
+		 * error code, in which case we need to count the remaining
+		 * pages manually
+		 */
+		if (migrate_rc >= 0)
+			nr_failed = migrate_rc;
+		else
+			list_for_each(page_lru, migratepages)
+				nr_failed++;
+
+		__entry->nr_migrated = nr_all - nr_failed;
 		__entry->nr_failed = nr_failed;
 	),
 
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -822,22 +822,6 @@ static void compaction_free(struct page
 	cc->nr_freepages++;
 }
 
-/*
- * We cannot control nr_migratepages fully when migration is running as
- * migrate_pages() has no knowledge of of compact_control.  When migration is
- * complete, we count the number of pages on the list by hand.
- */
-static void update_nr_listpages(struct compact_control *cc)
-{
-	int nr_migratepages = 0;
-	struct page *page;
-
-	list_for_each_entry(page, &cc->migratepages, lru)
-		nr_migratepages++;
-
-	cc->nr_migratepages = nr_migratepages;
-}
-
 /* possible outcome of isolate_migratepages */
 typedef enum {
 	ISOLATE_ABORT,		/* Abort compaction now */
@@ -1032,7 +1016,6 @@ static int compact_zone(struct zone *zon
 	migrate_prep_local();
 
 	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
-		unsigned long nr_migrate, nr_remaining;
 		int err;
 
 		switch (isolate_migratepages(zone, cc)) {
@@ -1047,20 +1030,20 @@ static int compact_zone(struct zone *zon
 			;
 		}
 
-		nr_migrate = cc->nr_migratepages;
+		if (!cc->nr_migratepages)
+			continue;
+
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
 				compaction_free, (unsigned long)cc, cc->mode,
 				MR_COMPACTION);
-		update_nr_listpages(cc);
-		nr_remaining = cc->nr_migratepages;
 
-		trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
-						nr_remaining);
+		trace_mm_compaction_migratepages(cc->nr_migratepages, err,
+							&cc->migratepages);
 
-		/* Release isolated pages not migrated */
+		/* All pages were either migrated or will be released */
+		cc->nr_migratepages = 0;
 		if (err) {
 			putback_movable_pages(&cc->migratepages);
-			cc->nr_migratepages = 0;
 			/*
 			 * migrate_pages() may return -ENOMEM when scanners meet
 			 * and we want compact_finished() to detect it



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 115/122] mm/compaction: avoid rescanning pageblocks in isolate_freepages
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (107 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 114/122] mm/compaction: do not count migratepages when unnecessary Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 116/122] mm, compaction: properly signal and act upon lock and need_sched() contention Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Minchan Kim,
	Mel Gorman, Joonsoo Kim, Bartlomiej Zolnierkiewicz,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit e9ade569910a82614ff5f2c2cea2b65a8d785da4 upstream.

The compaction free scanner in isolate_freepages() currently remembers PFN
of the highest pageblock where it successfully isolates, to be used as the
starting pageblock for the next invocation.  The rationale behind this is
that page migration might return free pages to the allocator when
migration fails and we don't want to skip them if the compaction
continues.

Since migration now returns free pages back to compaction code where they
can be reused, this is no longer a concern.  This patch changes
isolate_freepages() so that the PFN for restarting is updated with each
pageblock where isolation is attempted.  Using stress-highalloc from
mmtests, this resulted in 10% reduction of the pages scanned by the free
scanner.

Note that the somewhat similar functionality that records highest
successful pageblock in zone->compact_cached_free_pfn, remains unchanged.
This cache is used when the whole compaction is restarted, not for
multiple invocations of the free scanner during single compaction.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -688,7 +688,6 @@ static void isolate_freepages(struct zon
 	unsigned long block_start_pfn;	/* start of current pageblock */
 	unsigned long block_end_pfn;	/* end of current pageblock */
 	unsigned long low_pfn;	     /* lowest pfn scanner is able to scan */
-	unsigned long next_free_pfn; /* start pfn for scaning at next round */
 	int nr_freepages = cc->nr_freepages;
 	struct list_head *freelist = &cc->freepages;
 
@@ -709,12 +708,6 @@ static void isolate_freepages(struct zon
 	low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
 
 	/*
-	 * If no pages are isolated, the block_start_pfn < low_pfn check
-	 * will kick in.
-	 */
-	next_free_pfn = 0;
-
-	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -754,19 +747,19 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Found a block suitable for isolating free pages from */
+		cc->free_pfn = block_start_pfn;
 		isolated = isolate_freepages_block(cc, block_start_pfn,
 					block_end_pfn, freelist, false);
 		nr_freepages += isolated;
 
 		/*
-		 * Record the highest PFN we isolated pages from. When next
-		 * looking for free pages, the search will restart here as
-		 * page migration may have returned some pages to the allocator
+		 * Set a flag that we successfully isolated in this pageblock.
+		 * In the next loop iteration, zone->compact_cached_free_pfn
+		 * will not be updated and thus it will effectively contain the
+		 * highest pageblock we isolated pages from.
 		 */
-		if (isolated && next_free_pfn == 0) {
+		if (isolated)
 			cc->finished_update_free = true;
-			next_free_pfn = block_start_pfn;
-		}
 	}
 
 	/* split_free_page does not map the pages */
@@ -777,9 +770,8 @@ static void isolate_freepages(struct zon
 	 * so that compact_finished() may detect this
 	 */
 	if (block_start_pfn < low_pfn)
-		next_free_pfn = cc->migrate_pfn;
+		cc->free_pfn = cc->migrate_pfn;
 
-	cc->free_pfn = next_free_pfn;
 	cc->nr_freepages = nr_freepages;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 116/122] mm, compaction: properly signal and act upon lock and need_sched() contention
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (108 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 115/122] mm/compaction: avoid rescanning pageblocks in isolate_freepages Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 117/122] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Joonsoo Kim, Vlastimil Babka,
	Naoya Horiguchi, Minchan Kim, Mel Gorman,
	Bartlomiej Zolnierkiewicz, Michal Nazarewicz, Christoph Lameter,
	Rik van Riel, Shawn Guo, Kevin Hilman, Stephen Warren,
	Fabio Estevam, David Rientjes, Stephen Rothwell, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit be9765722e6b7ece8263cbab857490332339bd6f upstream.

Compaction uses compact_checklock_irqsave() function to periodically check
for lock contention and need_resched() to either abort async compaction,
or to free the lock, schedule and retake the lock.  When aborting,
cc->contended is set to signal the contended state to the caller.  Two
problems have been identified in this mechanism.

First, compaction also calls directly cond_resched() in both scanners when
no lock is yet taken.  This call either does not abort async compaction,
or set cc->contended appropriately.  This patch introduces a new
compact_should_abort() function to achieve both.  In isolate_freepages(),
the check frequency is reduced to once by SWAP_CLUSTER_MAX pageblocks to
match what the migration scanner does in the preliminary page checks.  In
case a pageblock is found suitable for calling isolate_freepages_block(),
the checks within there are done on higher frequency.

Second, isolate_freepages() does not check if isolate_freepages_block()
aborted due to contention, and advances to the next pageblock.  This
violates the principle of aborting on contention, and might result in
pageblocks not being scanned completely, since the scanning cursor is
advanced.  This problem has been noticed in the code by Joonsoo Kim when
reviewing related patches.  This patch makes isolate_freepages_block()
check the cc->contended flag and abort.

In case isolate_freepages() has already isolated some pages before
aborting due to contention, page migration will proceed, which is OK since
we do not want to waste the work that has been done, and page migration
has own checks for contention.  However, we do not want another isolation
attempt by either of the scanners, so cc->contended flag check is added
also to compaction_alloc() and compact_finished() to make sure compaction
is aborted right after the migration.

The outcome of the patch should be reduced lock contention by async
compaction and lower latencies for higher-order allocations where direct
compaction is involved.

[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/compaction.c |   54 ++++++++++++++++++++++++++++++++++++++++++++----------
 mm/internal.h   |    5 ++++-
 2 files changed, 48 insertions(+), 11 deletions(-)

--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -222,6 +222,30 @@ static bool compact_checklock_irqsave(sp
 	return true;
 }
 
+/*
+ * Aside from avoiding lock contention, compaction also periodically checks
+ * need_resched() and either schedules in sync compaction or aborts async
+ * compaction. This is similar to what compact_checklock_irqsave() does, but
+ * is used where no lock is concerned.
+ *
+ * Returns false when no scheduling was needed, or sync compaction scheduled.
+ * Returns true when async compaction should abort.
+ */
+static inline bool compact_should_abort(struct compact_control *cc)
+{
+	/* async compaction aborts if contended */
+	if (need_resched()) {
+		if (cc->mode == MIGRATE_ASYNC) {
+			cc->contended = true;
+			return true;
+		}
+
+		cond_resched();
+	}
+
+	return false;
+}
+
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
@@ -494,11 +518,8 @@ isolate_migratepages_range(struct zone *
 			return 0;
 	}
 
-	if (cond_resched()) {
-		/* Async terminates prematurely on need_resched() */
-		if (cc->mode == MIGRATE_ASYNC)
-			return 0;
-	}
+	if (compact_should_abort(cc))
+		return 0;
 
 	/* Time to isolate some pages for migration */
 	for (; low_pfn < end_pfn; low_pfn++) {
@@ -720,9 +741,11 @@ static void isolate_freepages(struct zon
 		/*
 		 * This can iterate a massively long zone without finding any
 		 * suitable migration targets, so periodically check if we need
-		 * to schedule.
+		 * to schedule, or even abort async compaction.
 		 */
-		cond_resched();
+		if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
+						&& compact_should_abort(cc))
+			break;
 
 		if (!pfn_valid(block_start_pfn))
 			continue;
@@ -760,6 +783,13 @@ static void isolate_freepages(struct zon
 		 */
 		if (isolated)
 			cc->finished_update_free = true;
+
+		/*
+		 * isolate_freepages_block() might have aborted due to async
+		 * compaction being contended
+		 */
+		if (cc->contended)
+			break;
 	}
 
 	/* split_free_page does not map the pages */
@@ -786,9 +816,13 @@ static struct page *compaction_alloc(str
 	struct compact_control *cc = (struct compact_control *)data;
 	struct page *freepage;
 
-	/* Isolate free pages if necessary */
+	/*
+	 * Isolate free pages if necessary, and if we are not aborting due to
+	 * contention.
+	 */
 	if (list_empty(&cc->freepages)) {
-		isolate_freepages(cc->zone, cc);
+		if (!cc->contended)
+			isolate_freepages(cc->zone, cc);
 
 		if (list_empty(&cc->freepages))
 			return NULL;
@@ -858,7 +892,7 @@ static int compact_finished(struct zone
 	unsigned int order;
 	unsigned long watermark;
 
-	if (fatal_signal_pending(current))
+	if (cc->contended || fatal_signal_pending(current))
 		return COMPACT_PARTIAL;
 
 	/* Compaction run completes if the migrate and free scanner meet */
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -144,7 +144,10 @@ struct compact_control {
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
-	bool contended;			/* True if a lock was contended */
+	bool contended;			/* True if a lock was contended, or
+					 * need_resched() true during async
+					 * compaction
+					 */
 };
 
 unsigned long



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 117/122] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (109 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 116/122] mm, compaction: properly signal and act upon lock and need_sched() contention Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 118/122] mm: fix direct reclaim writeback regression Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Shaohua Li,
	Rik van Riel, Mel Gorman, Hugh Dickins, Johannes Weiner,
	linux-mm, Peter Zijlstra, Ingo Molnar

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Shaohua Li <shli@kernel.org>

commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 upstream.

We use the accessed bit to age a page at page reclaim time,
and currently we also flush the TLB when doing so.

But in some workloads TLB flush overhead is very heavy. In my
simple multithreaded app with a lot of swap to several pcie
SSDs, removing the tlb flush gives about 20% ~ 30% swapout
speedup.

Fortunately just removing the TLB flush is a valid optimization:
on x86 CPUs, clearing the accessed bit without a TLB flush
doesn't cause data corruption.

It could cause incorrect page aging and the (mistaken) reclaim of
hot pages, but the chance of that should be relatively low.

So as a performance optimization don't flush the TLB when
clearing the accessed bit, it will eventually be flushed by
a context switch or a VM operation anyway. [ In the rare
event of it not getting flushed for a long time the delay
shouldn't really matter because there's no real memory
pressure for swapout to react to. ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
[ Rewrote the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/mm/pgtable.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -399,13 +399,20 @@ int pmdp_test_and_clear_young(struct vm_
 int ptep_clear_flush_young(struct vm_area_struct *vma,
 			   unsigned long address, pte_t *ptep)
 {
-	int young;
-
-	young = ptep_test_and_clear_young(vma, address, ptep);
-	if (young)
-		flush_tlb_page(vma, address);
-
-	return young;
+	/*
+	 * On x86 CPUs, clearing the accessed bit without a TLB flush
+	 * doesn't cause data corruption. [ It could cause incorrect
+	 * page aging and the (mistaken) reclaim of hot pages, but the
+	 * chance of that should be relatively low. ]
+	 *
+	 * So as a performance optimization don't flush the TLB when
+	 * clearing the accessed bit, it will eventually be flushed by
+	 * a context switch or a VM operation anyway. [ In the rare
+	 * event of it not getting flushed for a long time the delay
+	 * shouldn't really matter because there's no real memory
+	 * pressure for swapout to react to. ]
+	 */
+	return ptep_test_and_clear_young(vma, address, ptep);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 118/122] mm: fix direct reclaim writeback regression
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (110 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 117/122] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 119/122] fs/superblock: unregister sb shrinker before ->kill_sb() Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dave Jones, Hugh Dickins,
	Linus Torvalds, Mel Gorman

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hughd@google.com>

commit 8bdd638091605dc66d92c57c4b80eb87fffc15f7 upstream.

Shortly before 3.16-rc1, Dave Jones reported:

  WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
           xfs_vm_writepage+0x5ce/0x630 [xfs]()
  CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
  Call Trace:
    xfs_vm_writepage+0x5ce/0x630 [xfs]
    shrink_page_list+0x8f9/0xb90
    shrink_inactive_list+0x253/0x510
    shrink_lruvec+0x563/0x6c0
    shrink_zone+0x3b/0x100
    shrink_zones+0x1f1/0x3c0
    try_to_free_pages+0x164/0x380
    __alloc_pages_nodemask+0x822/0xc90
    alloc_pages_vma+0xaf/0x1c0
    handle_mm_fault+0xa31/0xc50
  etc.

 970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
 971                   PF_MEMALLOC))

I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.

However, my own /var/log/messages now shows similar complaints

  WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
  WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()

from stressing some mmotm trees during July.

Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?

Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.

Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.

Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it.  Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/migrate.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -993,9 +993,10 @@ out:
 	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
 	 * during isolation.
 	 */
-	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+	if (rc != MIGRATEPAGE_SUCCESS && put_new_page) {
+		ClearPageSwapBacked(newpage);
 		put_new_page(newpage, private);
-	else
+	} else
 		putback_lru_page(newpage);
 
 	if (result) {



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 119/122] fs/superblock: unregister sb shrinker before ->kill_sb()
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (111 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 118/122] mm: fix direct reclaim writeback regression Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 120/122] fs/superblock: avoid locking counting inodes and dentries before reclaiming them Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tim Chen, Mel Gorman,
	Johannes Weiner, Hugh Dickins, Dave Chinner, Yuanhan Liu,
	Bob Liu, Jan Kara, Rik van Riel, Al Viro, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner <david@fromorbit.com>

commit 28f2cd4f6da24a1aa06c226618ed5ad69e13df64 upstream.

This series is aimed at regressions noticed during reclaim activity.  The
first two patches are shrinker patches that were posted ages ago but never
merged for reasons that are unclear to me.  I'm posting them again to see
if there was a reason they were dropped or if they just got lost.  Dave?
Time?  The last patch adjusts proportional reclaim.  Yuanhan Liu, can you
retest the vm scalability test cases on a larger machine?  Hugh, does this
work for you on the memcg test cases?

Based on ext4, I get the following results but unfortunately my larger
test machines are all unavailable so this is based on a relatively small
machine.

postmark
                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla       proportion-v1r4
Ops/sec Transactions         21.00 (  0.00%)       25.00 ( 19.05%)
Ops/sec FilesCreate          39.00 (  0.00%)       45.00 ( 15.38%)
Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
Ops/sec DataRead/MB          25.97 (  0.00%)       30.02 ( 15.59%)
Ops/sec DataWrite/MB         49.99 (  0.00%)       57.78 ( 15.58%)

ffsb (mail server simulator)
                                 3.15.0-rc5             3.15.0-rc5
                                    vanilla        proportion-v1r4
Ops/sec readall           9402.63 (  0.00%)      9805.74 (  4.29%)
Ops/sec create            4695.45 (  0.00%)      4781.39 (  1.83%)
Ops/sec delete             173.72 (  0.00%)       177.23 (  2.02%)
Ops/sec Transactions     14271.80 (  0.00%)     14764.37 (  3.45%)
Ops/sec Read                37.00 (  0.00%)        38.50 (  4.05%)
Ops/sec Write               18.20 (  0.00%)        18.50 (  1.65%)

dd of a large file
                                3.15.0-rc5            3.15.0-rc5
                                   vanilla       proportion-v1r4
WallTime DownloadTar       75.00 (  0.00%)       61.00 ( 18.67%)
WallTime DD               423.00 (  0.00%)      401.00 (  5.20%)
WallTime Delete             2.00 (  0.00%)        5.00 (-150.00%)

stutter (times mmap latency during large amounts of IO)

                            3.15.0-rc5            3.15.0-rc5
                               vanilla       proportion-v1r4
Unit >5ms Delays  80252.0000 (  0.00%)  81523.0000 ( -1.58%)
Unit Mmap min         8.2118 (  0.00%)      8.3206 ( -1.33%)
Unit Mmap mean       17.4614 (  0.00%)     17.2868 (  1.00%)
Unit Mmap stddev     24.9059 (  0.00%)     34.6771 (-39.23%)
Unit Mmap max      2811.6433 (  0.00%)   2645.1398 (  5.92%)
Unit Mmap 90%        20.5098 (  0.00%)     18.3105 ( 10.72%)
Unit Mmap 93%        22.9180 (  0.00%)     20.1751 ( 11.97%)
Unit Mmap 95%        25.2114 (  0.00%)     22.4988 ( 10.76%)
Unit Mmap 99%        46.1430 (  0.00%)     43.5952 (  5.52%)
Unit Ideal  Tput     85.2623 (  0.00%)     78.8906 (  7.47%)
Unit Tput min        44.0666 (  0.00%)     43.9609 (  0.24%)
Unit Tput mean       45.5646 (  0.00%)     45.2009 (  0.80%)
Unit Tput stddev      0.9318 (  0.00%)      1.1084 (-18.95%)
Unit Tput max        46.7375 (  0.00%)     46.7539 ( -0.04%)

This patch (of 3):

We will like to unregister the sb shrinker before ->kill_sb().  This will
allow cached objects to be counted without call to grab_super_passive() to
update ref count on sb.  We want to avoid locking during memory
reclamation especially when we are skipping the memory reclaim when we are
out of cached objects.

This is safe because grab_super_passive does a try-lock on the
sb->s_umount now, and so if we are in the unmount process, it won't ever
block.  That means what used to be a deadlock and races we were avoiding
by using grab_super_passive() is now:

        shrinker                        umount

        down_read(shrinker_rwsem)
                                        down_write(sb->s_umount)
                                        shrinker_unregister
                                          down_write(shrinker_rwsem)
                                            <blocks>
        grab_super_passive(sb)
          down_read_trylock(sb->s_umount)
            <fails>
        <shrinker aborts>
        ....
        <shrinkers finish running>
        up_read(shrinker_rwsem)
                                          <unblocks>
                                          <removes shrinker>
                                          up_write(shrinker_rwsem)
                                        ->kill_sb()
                                        ....

So it is safe to deregister the shrinker before ->kill_sb().

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/super.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/fs/super.c
+++ b/fs/super.c
@@ -278,10 +278,8 @@ void deactivate_locked_super(struct supe
 	struct file_system_type *fs = s->s_type;
 	if (atomic_dec_and_test(&s->s_active)) {
 		cleancache_invalidate_fs(s);
-		fs->kill_sb(s);
-
-		/* caches are now gone, we can safely kill the shrinker now */
 		unregister_shrinker(&s->s_shrink);
+		fs->kill_sb(s);
 
 		put_filesystem(fs);
 		put_super(s);



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 120/122] fs/superblock: avoid locking counting inodes and dentries before reclaiming them
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (112 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 119/122] fs/superblock: unregister sb shrinker before ->kill_sb() Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 121/122] mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tim Chen, Mel Gorman,
	Johannes Weiner, Hugh Dickins, Dave Chinner, Yuanhan Liu,
	Bob Liu, Jan Kara, Rik van Riel, Al Viro, Andrew Morton,
	Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tim Chen <tim.c.chen@linux.intel.com>

commit d23da150a37c9fe3cc83dbaf71b3e37fd434ed52 upstream.

We remove the call to grab_super_passive in call to super_cache_count.
This becomes a scalability bottleneck as multiple threads are trying to do
memory reclamation, e.g.  when we are doing large amount of file read and
page cache is under pressure.  The cached objects quickly got reclaimed
down to 0 and we are aborting the cache_scan() reclaim.  But counting
creates a log jam acquiring the sb_lock.

We are holding the shrinker_rwsem which ensures the safety of call to
list_lru_count_node() and s_op->nr_cached_objects.  The shrinker is
unregistered now before ->kill_sb() so the operation is safe when we are
doing unmount.

The impact will depend heavily on the machine and the workload but for a
small machine using postmark tuned to use 4xRAM size the results were

                                  3.15.0-rc5            3.15.0-rc5
                                     vanilla         shrinker-v1r1
Ops/sec Transactions         21.00 (  0.00%)       24.00 ( 14.29%)
Ops/sec FilesCreate          39.00 (  0.00%)       44.00 ( 12.82%)
Ops/sec CreateTransact       10.00 (  0.00%)       12.00 ( 20.00%)
Ops/sec FilesDeleted       6202.00 (  0.00%)     6202.00 (  0.00%)
Ops/sec DeleteTransact       11.00 (  0.00%)       12.00 (  9.09%)
Ops/sec DataRead/MB          25.97 (  0.00%)       29.10 ( 12.05%)
Ops/sec DataWrite/MB         49.99 (  0.00%)       56.02 ( 12.06%)

ffsb running in a configuration that is meant to simulate a mail server showed

                                 3.15.0-rc5             3.15.0-rc5
                                    vanilla          shrinker-v1r1
Ops/sec readall           9402.63 (  0.00%)      9567.97 (  1.76%)
Ops/sec create            4695.45 (  0.00%)      4735.00 (  0.84%)
Ops/sec delete             173.72 (  0.00%)       179.83 (  3.52%)
Ops/sec Transactions     14271.80 (  0.00%)     14482.81 (  1.48%)
Ops/sec Read                37.00 (  0.00%)        37.60 (  1.62%)
Ops/sec Write               18.20 (  0.00%)        18.30 (  0.55%)

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/super.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/fs/super.c
+++ b/fs/super.c
@@ -114,9 +114,14 @@ static unsigned long super_cache_count(s
 
 	sb = container_of(shrink, struct super_block, s_shrink);
 
-	if (!grab_super_passive(sb))
-		return 0;
-
+	/*
+	 * Don't call grab_super_passive as it is a potential
+	 * scalability bottleneck. The counts could get updated
+	 * between super_cache_count and super_cache_scan anyway.
+	 * Call to super_cache_count with shrinker_rwsem held
+	 * ensures the safety of call to list_lru_count_node() and
+	 * s_op->nr_cached_objects().
+	 */
 	if (sb->s_op && sb->s_op->nr_cached_objects)
 		total_objects = sb->s_op->nr_cached_objects(sb,
 						 sc->nid);
@@ -127,7 +132,6 @@ static unsigned long super_cache_count(s
 						 sc->nid);
 
 	total_objects = vfs_pressure_ratio(total_objects);
-	drop_super(sb);
 	return total_objects;
 }
 



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 121/122] mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (113 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 120/122] fs/superblock: avoid locking counting inodes and dentries before reclaiming them Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-19 20:52 ` [PATCH 3.14 122/122] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mel Gorman, Johannes Weiner,
	Hugh Dickins, Tim Chen, Dave Chinner, Yuanhan Liu, Bob Liu,
	Jan Kara, Rik van Riel, Al Viro, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgorman@suse.de>

commit 1a501907bbea8e6ebb0b16cf6db9e9cbf1d2c813 upstream.

Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
ensured that file/anon lists were scanned proportionally for reclaim from
kswapd but ignored it for direct reclaim.  The intent was to minimse
direct reclaim latency but Yuanhan Liu pointer out that it substitutes one
long stall for many small stalls and distorts aging for normal workloads
like streaming readers/writers.  Hugh Dickins pointed out that a
side-effect of the same commit was that when one LRU list dropped to zero
that the entirety of the other list was shrunk leading to excessive
reclaim in memcgs.  This patch scans the file/anon lists proportionally
for direct reclaim to similarly age page whether reclaimed by kswapd or
direct reclaim but takes care to abort reclaim if one LRU drops to zero
after reclaiming the requested number of pages.

Based on ext4 and using the Intel VM scalability test

                                              3.15.0-rc5            3.15.0-rc5
                                                shrinker            proportion
Unit  lru-file-readonce    elapsed      5.3500 (  0.00%)      5.4200 ( -1.31%)
Unit  lru-file-readonce time_range      0.2700 (  0.00%)      0.1400 ( 48.15%)
Unit  lru-file-readonce time_stddv      0.1148 (  0.00%)      0.0536 ( 53.33%)
Unit lru-file-readtwice    elapsed      8.1700 (  0.00%)      8.1700 (  0.00%)
Unit lru-file-readtwice time_range      0.4300 (  0.00%)      0.2300 ( 46.51%)
Unit lru-file-readtwice time_stddv      0.1650 (  0.00%)      0.0971 ( 41.16%)

The test cases are running multiple dd instances reading sparse files. The results are within
the noise for the small test machine. The impact of the patch is more noticable from the vmstats

                            3.15.0-rc5  3.15.0-rc5
                              shrinker  proportion
Minor Faults                     35154       36784
Major Faults                       611        1305
Swap Ins                           394        1651
Swap Outs                         4394        5891
Allocation stalls               118616       44781
Direct pages scanned           4935171     4602313
Kswapd pages scanned          15921292    16258483
Kswapd pages reclaimed        15913301    16248305
Direct pages reclaimed         4933368     4601133
Kswapd efficiency                  99%         99%
Kswapd velocity             670088.047  682555.961
Direct efficiency                  99%         99%
Direct velocity             207709.217  193212.133
Percentage direct scans            23%         22%
Page writes by reclaim        4858.000    6232.000
Page writes file                   464         341
Page writes anon                  4394        5891

Note that there are fewer allocation stalls even though the amount
of direct reclaim scanning is very approximately the same.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/vmscan.c |   36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2018,13 +2018,27 @@ static void shrink_lruvec(struct lruvec
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
-	bool scan_adjusted = false;
+	bool scan_adjusted;
 
 	get_scan_count(lruvec, sc, nr);
 
 	/* Record the original scan target for proportional adjustments later */
 	memcpy(targets, nr, sizeof(nr));
 
+	/*
+	 * Global reclaiming within direct reclaim at DEF_PRIORITY is a normal
+	 * event that can occur when there is little memory pressure e.g.
+	 * multiple streaming readers/writers. Hence, we do not abort scanning
+	 * when the requested number of pages are reclaimed when scanning at
+	 * DEF_PRIORITY on the assumption that the fact we are direct
+	 * reclaiming implies that kswapd is not keeping up and it is best to
+	 * do a batch of work at once. For memcg reclaim one check is made to
+	 * abort proportional reclaim if either the file or anon lru has already
+	 * dropped to zero at the first pass.
+	 */
+	scan_adjusted = (global_reclaim(sc) && !current_is_kswapd() &&
+			 sc->priority == DEF_PRIORITY);
+
 	blk_start_plug(&plug);
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
@@ -2045,17 +2059,8 @@ static void shrink_lruvec(struct lruvec
 			continue;
 
 		/*
-		 * For global direct reclaim, reclaim only the number of pages
-		 * requested. Less care is taken to scan proportionally as it
-		 * is more important to minimise direct reclaim stall latency
-		 * than it is to properly age the LRU lists.
-		 */
-		if (global_reclaim(sc) && !current_is_kswapd())
-			break;
-
-		/*
 		 * For kswapd and memcg, reclaim at least the number of pages
-		 * requested. Ensure that the anon and file LRUs shrink
+		 * requested. Ensure that the anon and file LRUs are scanned
 		 * proportionally what was requested by get_scan_count(). We
 		 * stop reclaiming one LRU and reduce the amount scanning
 		 * proportional to the original scan target.
@@ -2063,6 +2068,15 @@ static void shrink_lruvec(struct lruvec
 		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
 		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
 
+		/*
+		 * It's just vindictive to attack the larger once the smaller
+		 * has gone to zero.  And given the way we stop scanning the
+		 * smaller below, this makes sure that we only make one nudge
+		 * towards proportionality once we've got nr_to_reclaim.
+		 */
+		if (!nr_file || !nr_anon)
+			break;
+
 		if (nr_file > nr_anon) {
 			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
 						targets[LRU_ACTIVE_ANON] + 1;



^ permalink raw reply	[flat|nested] 123+ messages in thread

* [PATCH 3.14 122/122] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (114 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 121/122] mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY Greg Kroah-Hartman
@ 2014-11-19 20:52 ` Greg Kroah-Hartman
  2014-11-20  5:32 ` [PATCH 3.14 000/122] 3.14.25-stable review Guenter Roeck
  2014-11-21  1:37 ` Shuah Khan
  117 siblings, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-19 20:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vlastimil Babka, Yong-Taek Lee,
	Bartlomiej Zolnierkiewicz, Joonsoo Kim, Mel Gorman, Minchan Kim,
	KOSAKI Motohiro, Marek Szyprowski, Hugh Dickins, Rik van Riel,
	Michal Nazarewicz, Wang, Yalin, Andrew Morton, Linus Torvalds

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Vlastimil Babka <vbabka@suse.cz>

commit 5bcc9f86ef09a933255ee66bd899d4601785dad5 upstream.

For the MIGRATE_RESERVE pages, it is useful when they do not get
misplaced on free_list of other migratetype, otherwise they might get
allocated prematurely and e.g.  fragment the MIGRATE_RESEVE pageblocks.
While this cannot be avoided completely when allocating new
MIGRATE_RESERVE pageblocks in min_free_kbytes sysctl handler, we should
prevent the misplacement where possible.

Currently, it is possible for the misplacement to happen when a
MIGRATE_RESERVE page is allocated on pcplist through rmqueue_bulk() as a
fallback for other desired migratetype, and then later freed back
through free_pcppages_bulk() without being actually used.  This happens
because free_pcppages_bulk() uses get_freepage_migratetype() to choose
the free_list, and rmqueue_bulk() calls set_freepage_migratetype() with
the *desired* migratetype and not the page's original MIGRATE_RESERVE
migratetype.

This patch fixes the problem by moving the call to
set_freepage_migratetype() from rmqueue_bulk() down to
__rmqueue_smallest() and __rmqueue_fallback() where the actual page's
migratetype (e.g.  from which free_list the page is taken from) is used.
Note that this migratetype might be different from the pageblock's
migratetype due to freepage stealing decisions.  This is OK, as page
stealing never uses MIGRATE_RESERVE as a fallback, and also takes care
to leave all MIGRATE_CMA pages on the correct freelist.

Therefore, as an additional benefit, the call to
get_pageblock_migratetype() from rmqueue_bulk() when CMA is enabled, can
be removed completely.  This relies on the fact that MIGRATE_CMA
pageblocks are created only during system init, and the above.  The
related is_migrate_isolate() check is also unnecessary, as memory
isolation has other ways to move pages between freelists, and drain pcp
lists containing pages that should be isolated.  The buffered_rmqueue()
can also benefit from calling get_freepage_migratetype() instead of
get_pageblock_migratetype().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: "Wang, Yalin" <Yalin.Wang@sonymobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/page_alloc.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -943,6 +943,7 @@ struct page *__rmqueue_smallest(struct z
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
+		set_freepage_migratetype(page, migratetype);
 		return page;
 	}
 
@@ -1069,7 +1070,9 @@ static int try_to_steal_freepages(struct
 
 	/*
 	 * When borrowing from MIGRATE_CMA, we need to release the excess
-	 * buddy pages to CMA itself.
+	 * buddy pages to CMA itself. We also ensure the freepage_migratetype
+	 * is set to CMA so it is returned to the correct freelist in case
+	 * the page ends up being not actually allocated from the pcp lists.
 	 */
 	if (is_migrate_cma(fallback_type))
 		return fallback_type;
@@ -1137,6 +1140,12 @@ __rmqueue_fallback(struct zone *zone, in
 
 			expand(zone, page, order, current_order, area,
 			       new_type);
+			/* The freepage_migratetype may differ from pageblock's
+			 * migratetype depending on the decisions in
+			 * try_to_steal_freepages. This is OK as long as it does
+			 * not differ for MIGRATE_CMA type.
+			 */
+			set_freepage_migratetype(page, new_type);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype, new_type);
@@ -1187,7 +1196,7 @@ static int rmqueue_bulk(struct zone *zon
 			unsigned long count, struct list_head *list,
 			int migratetype, int cold)
 {
-	int mt = migratetype, i;
+	int i;
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
@@ -1208,14 +1217,8 @@ static int rmqueue_bulk(struct zone *zon
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		if (IS_ENABLED(CONFIG_CMA)) {
-			mt = get_pageblock_migratetype(page);
-			if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
-				mt = migratetype;
-		}
-		set_freepage_migratetype(page, mt);
 		list = &page->lru;
-		if (is_migrate_cma(mt))
+		if (is_migrate_cma(get_freepage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
 	}
@@ -1584,7 +1587,7 @@ again:
 		if (!page)
 			goto failed;
 		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pageblock_migratetype(page));
+					  get_freepage_migratetype(page));
 	}
 
 	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));



^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 000/122] 3.14.25-stable review
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (115 preceding siblings ...)
  2014-11-19 20:52 ` [PATCH 3.14 122/122] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced Greg Kroah-Hartman
@ 2014-11-20  5:32 ` Guenter Roeck
  2014-11-21  1:37 ` Shuah Khan
  117 siblings, 0 replies; 123+ messages in thread
From: Guenter Roeck @ 2014-11-20  5:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, satoru.takeuchi, shuah.kh, stable

On 11/19/2014 12:50 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.25 release.
> There are 122 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Nov 21 20:51:52 UTC 2014.
> Anything received after that time might be too late.
>

Build results:
	total: 137 pass: 137 fail: 0
Qemu test results:
	total: 30 pass: 30 fail: 0

Details at http://server.roeck-us.net:8010/builders.

Guenter


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees
  2014-11-19 20:52 ` [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees Greg Kroah-Hartman
@ 2014-11-20 15:49   ` Johannes Weiner
  2014-11-20 16:20     ` Greg Kroah-Hartman
  2014-11-20 16:21     ` Mel Gorman
  0 siblings, 2 replies; 123+ messages in thread
From: Johannes Weiner @ 2014-11-20 15:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, stable, Rik van Riel, Minchan Kim,
	Andrea Arcangeli, Bob Liu, Christoph Hellwig, Dave Chinner,
	Greg Thelen, Hugh Dickins, Jan Kara, KOSAKI Motohiro,
	Luigi Semenzato, Mel Gorman, Metin Doslu, Michel Lespinasse,
	Ozgun Erdogan, Peter Zijlstra, Roman Gushchin, Ryan Mallon,
	Tejun Heo, Vlastimil Babka, Andrew Morton, Linus Torvalds

On Wed, Nov 19, 2014 at 12:52:32PM -0800, Greg Kroah-Hartman wrote:
> 3.14-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.
> 
> shmem mappings already contain exceptional entries where swap slot
> information is remembered.
> 
> To be able to store eviction information for regular page cache, prepare
> every site dealing with the radix trees directly to handle entries other
> than pages.

I don't see the rest of the thrash-detection patches in this series.
Without anybody actually planting "entries other than pages" in the
radix trees, why is this needed in -stable?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees
  2014-11-20 15:49   ` Johannes Weiner
@ 2014-11-20 16:20     ` Greg Kroah-Hartman
  2014-11-20 16:21     ` Mel Gorman
  1 sibling, 0 replies; 123+ messages in thread
From: Greg Kroah-Hartman @ 2014-11-20 16:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, stable, Rik van Riel, Minchan Kim,
	Andrea Arcangeli, Bob Liu, Christoph Hellwig, Dave Chinner,
	Greg Thelen, Hugh Dickins, Jan Kara, KOSAKI Motohiro,
	Luigi Semenzato, Mel Gorman, Metin Doslu, Michel Lespinasse,
	Ozgun Erdogan, Peter Zijlstra, Roman Gushchin, Ryan Mallon,
	Tejun Heo, Vlastimil Babka, Andrew Morton, Linus Torvalds

On Thu, Nov 20, 2014 at 10:49:37AM -0500, Johannes Weiner wrote:
> On Wed, Nov 19, 2014 at 12:52:32PM -0800, Greg Kroah-Hartman wrote:
> > 3.14-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.
> > 
> > shmem mappings already contain exceptional entries where swap slot
> > information is remembered.
> > 
> > To be able to store eviction information for regular page cache, prepare
> > every site dealing with the radix trees directly to handle entries other
> > than pages.
> 
> I don't see the rest of the thrash-detection patches in this series.
> Without anybody actually planting "entries other than pages" in the
> radix trees, why is this needed in -stable?

What specific patches are you referring to?  I still have 31 pending
patches that Mel has backported that I have not applied to the
3.14-stable kernel for his big series of performance improvements.  I've
been slowly feeding them into releases to catch any possible problems
earlier.

So perhaps they have yet to be applied?  Or they are later on in this
patch series (there are 20 more patches in this series after this
patch...)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees
  2014-11-20 15:49   ` Johannes Weiner
  2014-11-20 16:20     ` Greg Kroah-Hartman
@ 2014-11-20 16:21     ` Mel Gorman
  2014-11-20 20:39       ` Johannes Weiner
  1 sibling, 1 reply; 123+ messages in thread
From: Mel Gorman @ 2014-11-20 16:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Greg Kroah-Hartman, linux-kernel, stable, Rik van Riel,
	Minchan Kim, Andrea Arcangeli, Bob Liu, Christoph Hellwig,
	Dave Chinner, Greg Thelen, Hugh Dickins, Jan Kara,
	KOSAKI Motohiro, Luigi Semenzato, Metin Doslu, Michel Lespinasse,
	Ozgun Erdogan, Peter Zijlstra, Roman Gushchin, Ryan Mallon,
	Tejun Heo, Vlastimil Babka, Andrew Morton, Linus Torvalds

On Thu, Nov 20, 2014 at 10:49:37AM -0500, Johannes Weiner wrote:
> On Wed, Nov 19, 2014 at 12:52:32PM -0800, Greg Kroah-Hartman wrote:
> > 3.14-stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.
> > 
> > shmem mappings already contain exceptional entries where swap slot
> > information is remembered.
> > 
> > To be able to store eviction information for regular page cache, prepare
> > every site dealing with the radix trees directly to handle entries other
> > than pages.
> 
> I don't see the rest of the thrash-detection patches in this series.
> Without anybody actually planting "entries other than pages" in the
> radix trees, why is this needed in -stable?

Patches later on in the stable series depended upon the code structure
introduced by this patch even though the thrash-detection stuff was not
backported. It could have been backported without it but then the
patches would look much different from their mainline equivalent.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees
  2014-11-20 16:21     ` Mel Gorman
@ 2014-11-20 20:39       ` Johannes Weiner
  0 siblings, 0 replies; 123+ messages in thread
From: Johannes Weiner @ 2014-11-20 20:39 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Greg Kroah-Hartman, linux-kernel, stable, Rik van Riel,
	Minchan Kim, Andrea Arcangeli, Bob Liu, Christoph Hellwig,
	Dave Chinner, Greg Thelen, Hugh Dickins, Jan Kara,
	KOSAKI Motohiro, Luigi Semenzato, Metin Doslu, Michel Lespinasse,
	Ozgun Erdogan, Peter Zijlstra, Roman Gushchin, Ryan Mallon,
	Tejun Heo, Vlastimil Babka, Andrew Morton, Linus Torvalds

On Thu, Nov 20, 2014 at 04:21:22PM +0000, Mel Gorman wrote:
> On Thu, Nov 20, 2014 at 10:49:37AM -0500, Johannes Weiner wrote:
> > I don't see the rest of the thrash-detection patches in this series.
> > Without anybody actually planting "entries other than pages" in the
> > radix trees, why is this needed in -stable?
> 
> Patches later on in the stable series depended upon the code structure
> introduced by this patch even though the thrash-detection stuff was not
> backported. It could have been backported without it but then the
> patches would look much different from their mainline equivalent.

That makes sense.  Thanks!

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [PATCH 3.14 000/122] 3.14.25-stable review
  2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
                   ` (116 preceding siblings ...)
  2014-11-20  5:32 ` [PATCH 3.14 000/122] 3.14.25-stable review Guenter Roeck
@ 2014-11-21  1:37 ` Shuah Khan
  117 siblings, 0 replies; 123+ messages in thread
From: Shuah Khan @ 2014-11-21  1:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, satoru.takeuchi, shuah.kh, stable

On 11/19/2014 01:50 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.25 release.
> There are 122 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Nov 21 20:51:52 UTC 2014.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.14.25-rc1.gz
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

-- Shuah

-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply	[flat|nested] 123+ messages in thread

end of thread, other threads:[~2014-11-21  1:37 UTC | newest]

Thread overview: 123+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-19 20:50 [PATCH 3.14 000/122] 3.14.25-stable review Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 001/122] Revert "drivers/net: Disable UFO through virtio" Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 002/122] ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 003/122] vti6: Use vti6_dev_init " Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 004/122] sit: Use ipip6_tunnel_init " Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 005/122] gre6: Move the setting of dev->iflink into the ndo_init functions Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 006/122] vxlan: Do not reuse sockets for a different address family Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 007/122] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 008/122] net: sctp: fix memory leak in auth key management Greg Kroah-Hartman
2014-11-19 20:50 ` [PATCH 3.14 009/122] smsc911x: power-up phydev before doing a software reset Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 010/122] sunvdc: add cdrom and v1.1 protocol support Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 011/122] sunvdc: compute vdisk geometry from capacity Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 012/122] sunvdc: limit each sg segment to a page Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 013/122] vio: fix reuse of vio_dring slot Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 014/122] sunvdc: dont call VD_OP_GET_VTOC Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 015/122] sparc64: Fix crashes in schizo_pcierr_intr_other() Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 016/122] sparc64: Do irq_{enter,exit}() around generic_smp_call_function*() Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 017/122] sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 018/122] zram: avoid kunmap_atomic() of a NULL pointer Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 019/122] crypto: caam - fix missing dma unmap on error path Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 020/122] crypto: caam - remove duplicated sg copy functions Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 021/122] hwrng: pseries - port to new read API and fix stack corruption Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 022/122] tun: Fix csum_start with VLAN acceleration Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 023/122] x86, x32, audit: Fix x32s AUDIT_ARCH wrt audit Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 024/122] audit: correct AUDIT_GET_FEATURE return message type Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 025/122] audit: AUDIT_FEATURE_CHANGE message format missing delimiting space Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 026/122] audit: keep inode pinned Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 027/122] ahci: Add Device IDs for Intel Sunrise Point PCH Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 028/122] ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 029/122] ALSA: usb-audio: Fix memory leak in FTU quirk Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 030/122] xtensa: re-wire umount syscall to sys_oldumount Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 031/122] libceph: do not crash on large auth tickets Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 032/122] macvtap: Fix csum_start when VLAN tags are present Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 033/122] mac80211_hwsim: release driver when ieee80211_register_hw fails Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 034/122] mac80211: properly flush delayed scan work on interface removal Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 035/122] mac80211: use secondary channel offset IE also beacons during CSA Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 036/122] mac80211: schedule the actual switch of the station before CSA count 0 Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 037/122] mac80211: fix use-after-free in defragmentation Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 038/122] drm/radeon: set correct CE ram size for CIK Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 039/122] drm/radeon: make sure mode init is complete in bandwidth_update Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 040/122] drm/radeon: add missing crtc unlock when setting up the MC Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 042/122] ARM: 8191/1: decompressor: ensure I-side picks up relocated code Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 043/122] pinctrl: dra: dt-bindings: Fix output pull up/down Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 044/122] dm thin: grab a virtual cell before looking up the mapping Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 046/122] firewire: cdev: prevent kernel stack leaking into ioctl arguments Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 047/122] ata: sata_rcar: Disable DIPM mode for r8a7790 ES1 Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 048/122] nfs: fix pnfs direct write memory leak Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 049/122] Correct the race condition in aarch64_insn_patch_text_sync() Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 050/122] scsi: only re-lock door after EH on devices that were reset Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 051/122] parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 052/122] block: Fix computation of merged request priority Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 053/122] dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 054/122] dm btree: fix a recursion depth bug in btree walking code Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 055/122] dm raid: ensure superblocks size matches devices logical block size Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 056/122] Input: synaptics - add min/max quirk for Lenovo T440s Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 060/122] power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 061/122] power: charger-manager: Fix accessing invalidated power supply after charger unbind Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 062/122] power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 063/122] power: bq2415x_charger: Fix memory leak on DTS parsing error Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 064/122] x86, microcode, AMD: Fix early ucode loading on 32-bit Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 065/122] x86, microcode, AMD: Fix ucode patch stashing " Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 066/122] x86, kaslr: Prevent .bss from overlaping initrd Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 067/122] md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 068/122] NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Greg Kroah-Hartman
2014-11-19 20:51 ` [PATCH 3.14 069/122] NFS: Dont try to reclaim delegation open state if recovery failed Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 070/122] nfs: Fix use of uninitialized variable in nfs_getattr() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 071/122] NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 072/122] NFSv4.1: nfs41_clear_delegation_stateid shouldnt trust NFS_DELEGATED_STATE Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 073/122] media: ttusb-dec: buffer overflow in ioctl Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 074/122] memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 075/122] vmcore: Remove "weak" from function declarations Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 076/122] kgdb: Remove "weak" from kgdb_arch_pc() declaration Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 077/122] clocksource: Remove "weak" from clocksource_default_clock() declaration Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 078/122] IB/core: Clear AH attr variable to prevent garbage data Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 079/122] ipc: always handle a new value of auto_msgmni Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 080/122] netfilter: ipset: off by one in ip_set_nfnl_get_byindex() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 081/122] netfilter: nf_log: account for size of NLMSG_DONE attribute Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 082/122] netfilter: nfnetlink_log: fix maximum packet length logged to userspace Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 083/122] netfilter: nf_log: release skbuff on nlmsg put failure Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 084/122] netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 085/122] netfilter: xt_bpf: add mising opaque struct sk_filter definition Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 086/122] ARM: probes: fix instruction fetch order with <asm/opcodes.h> Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 087/122] GFS2: Fix address space from page function Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 088/122] rcu: Make callers awaken grace-period kthread Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 089/122] rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 091/122] perf: Handle compat ioctl Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 092/122] perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 093/122] KVM: x86: Dont report guest userspace emulation error to userspace Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 094/122] net: sctp: fix remote memory pressure from excessive queueing Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 095/122] net: sctp: fix panic on duplicate ASCONF chunks Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 096/122] net: sctp: fix skb_over_panic when receiving malformed " Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 097/122] iwlwifi: configure the LTR Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 098/122] regmap: fix kernel hang on regmap_bulk_write with zero val_count Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 099/122] lib: radix-tree: add radix_tree_delete_item() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 100/122] mm: shmem: save one radix tree lookup when truncating swapped pages Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 101/122] mm: filemap: move radix tree hole searching here Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 102/122] mm + fs: prepare for non-page entries in page cache radix trees Greg Kroah-Hartman
2014-11-20 15:49   ` Johannes Weiner
2014-11-20 16:20     ` Greg Kroah-Hartman
2014-11-20 16:21     ` Mel Gorman
2014-11-20 20:39       ` Johannes Weiner
2014-11-19 20:52 ` [PATCH 3.14 103/122] mm: madvise: fix MADV_WILLNEED on shmem swapouts Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 104/122] mm: remove read_cache_page_async() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 105/122] callers of iov_copy_from_user_atomic() dont need pagecache_disable() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 106/122] mm/readahead.c: inline ra_submit Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 107/122] mm/compaction: clean up unused code lines Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 108/122] mm/compaction: cleanup isolate_freepages() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 109/122] mm, migration: add destination page freeing callback Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 110/122] mm, compaction: return failed migration target pages back to freelist Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 111/122] mm, compaction: add per-zone migration pfn cache for async compaction Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 112/122] mm, compaction: embed migration mode in compact_control Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 113/122] mm, compaction: terminate async compaction when rescheduling Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 114/122] mm/compaction: do not count migratepages when unnecessary Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 115/122] mm/compaction: avoid rescanning pageblocks in isolate_freepages Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 116/122] mm, compaction: properly signal and act upon lock and need_sched() contention Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 117/122] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 118/122] mm: fix direct reclaim writeback regression Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 119/122] fs/superblock: unregister sb shrinker before ->kill_sb() Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 120/122] fs/superblock: avoid locking counting inodes and dentries before reclaiming them Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 121/122] mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY Greg Kroah-Hartman
2014-11-19 20:52 ` [PATCH 3.14 122/122] mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced Greg Kroah-Hartman
2014-11-20  5:32 ` [PATCH 3.14 000/122] 3.14.25-stable review Guenter Roeck
2014-11-21  1:37 ` Shuah Khan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).