All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.9 00/72] 4.9.21-stable review
@ 2017-04-06  8:37 Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 01/72] libceph: force GFP_NOIO for socket allocations Greg Kroah-Hartman
                   ` (66 more replies)
  0 siblings, 67 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, stable

This is the start of the stable review cycle for the 4.9.21 release.
There are 72 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat Apr  8 08:36:01 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.21-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.9.21-rc1

Keith Busch <keith.busch@intel.com>
    nvme/pci: Disable on removal when disconnected

Keith Busch <keith.busch@intel.com>
    nvme/core: Fix race kicking freed request_queue

Jason A. Donenfeld <Jason@zx2c4.com>
    padata: avoid race in reordering

NeilBrown <neilb@suse.com>
    blk: Ensure users for current->bio_list can see the full list.

NeilBrown <neilb@suse.com>
    blk: improve order of bio handling in generic_make_request()

Johannes Weiner <hannes@cmpxchg.org>
    mm: workingset: fix premature shadow node shrinking with cgroups

Felix Fietkau <nbd@nbd.name>
    MIPS: Lantiq: Fix cascaded IRQ setup

Jon Mason <jon.mason@broadcom.com>
    ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags

Joe Carnuccio <joe.carnuccio@cavium.com>
    qla2xxx: Allow vref count to timeout on vport delete.

Rafał Miłecki <rafal@milecki.pl>
    ARM: BCM5301X: Add back handler ignoring external imprecise aborts

Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()

Johannes Weiner <hannes@cmpxchg.org>
    mm: rmap: fix huge file mmap accounting in the memcg stats

Kees Cook <keescook@chromium.org>
    lib/syscall: Clear return values when no stack

Tony Luck <tony.luck@intel.com>
    x86/mce: Fix copy/paste error in exception table entries

Baoquan He <bhe@redhat.com>
    x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization

Lucas Stach <l.stach@pengutronix.de>
    drm/etnaviv: (re-)protect fence allocation with GPU mutex

Eric Anholt <eric@anholt.net>
    drm/vc4: Allocate the right amount of space for boot-time CRTC state.

Michel Dänzer <michel.daenzer@amd.com>
    drm/radeon: Override fpfn for all VRAM placements in radeon_evict_flags

David Hildenbrand <david@redhat.com>
    KVM: kvm_io_bus_unregister_dev() should never fail

Peter Xu <peterx@redhat.com>
    KVM: x86: clear bus pointer when destroyed

Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    serial: mxs-auart: Fix baudrate calculation

Alan Stern <stern@rowland.harvard.edu>
    USB: fix linked-list corruption in rh_call_control()

Nicolas Ferre <nicolas.ferre@microchip.com>
    tty/serial: atmel: fix TX path in atmel_console_write()

Richard Genoud <richard.genoud@gmail.com>
    tty/serial: atmel: fix race condition (TX+DMA)

Joerg Roedel <jroedel@suse.de>
    ACPI: Do not create a platform_device for IOAPIC/IOxAPIC

Josh Poimboeuf <jpoimboe@redhat.com>
    ACPI: Fix incompatibility with mcount-based function graph tracing

Helge Deller <deller@gmx.de>
    parisc: Fix access fault handling in pa_memcpy()

Helge Deller <deller@gmx.de>
    parisc: Avoid stalled CPU warnings after system shutdown

Helge Deller <deller@gmx.de>
    parisc: Clean up fixup routines for get_user()/put_user()

Kinglong Mee <kinglongmee@gmail.com>
    nfsd: map the ENOKEY to nfserr_perm for avoiding warning

Olga Kornievskaia <kolga@netapp.com>
    NFSv4.1 fix infinite loop on IO BAD_STATEID error

Ludovic Desroches <ludovic.desroches@microchip.com>
    mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection

Hans de Goede <hdegoede@redhat.com>
    mmc: sdhci: Disable runtime pm when the sdio_irq is enabled

Aaron Armstrong Skomra <skomra@gmail.com>
    HID: wacom: Don't add ghost interface as shared data

Takashi Sakamoto <takashi.sakamoto@miraclelinux.com>
    ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer

Songjun Wu <songjun.wu@microchip.com>
    ASoC: atmel-classd: fix audio clock rate

Hui Wang <hui.wang@canonical.com>
    ALSA: hda - fix a problem for lineout on a Dell AIO machine

Takashi Iwai <tiwai@suse.de>
    ALSA: seq: Fix race during FIFO resize

Bjorn Helgaas <bhelgaas@google.com>
    PCI: iproc: Save host bridge window resource in struct iproc_pcie

Bart Van Assche <bart.vanassche@sandisk.com>
    scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function

Bart Van Assche <bart.vanassche@sandisk.com>
    scsi: scsi_dh_alua: Check scsi_device_get() return value

John Garry <john.garry@huawei.com>
    scsi: libsas: fix ata xfer length

peter chang <dpf@google.com>
    scsi: sg: check length passed to SG_NEXT_CMD_LEN

Christoph Hellwig <hch@lst.de>
    xfs: try any AG when allocating the first btree block when reflinking

Brian Foster <bfoster@redhat.com>
    xfs: use iomap new flag for newly allocated delalloc blocks

Chandan Rajendra <chandan@linux.vnet.ibm.com>
    xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask

Christoph Hellwig <hch@lst.de>
    xfs: fix and streamline error handling in xfs_end_io

Christoph Hellwig <hch@lst.de>
    xfs: only reclaim unwritten COW extents periodically

Christoph Hellwig <hch@lst.de>
    xfs: tune down agno asserts in the bmap code

Chandan Rajendra <chandan@linux.vnet.ibm.com>
    xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment

Brian Foster <bfoster@redhat.com>
    xfs: don't reserve blocks for right shift transactions

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix uninitialized variable in _reflink_convert_cow

Brian Foster <bfoster@redhat.com>
    xfs: split indlen reservations fairly when under reserved

Brian Foster <bfoster@redhat.com>
    xfs: handle indlen shortage on delalloc extent merge

Christoph Hellwig <hch@lst.de>
    xfs: don't fail xfs_extent_busy allocation

Christoph Hellwig <hch@lst.de>
    xfs: reject all unaligned direct writes to reflinked files

Christoph Hellwig <hch@lst.de>
    xfs: update ctime and mtime on clone destinatation inodes

Hou Tao <houtao1@huawei.com>
    xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: mark speculative prealloc CoW fork extents unwritten

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: allow unwritten extents in the CoW fork

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: verify free block header fields

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: check for obviously bad level values in the bmbt root

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: filter out obviously bad btree pointers

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fail _dir_open when readahead fails

Darrick J. Wong <darrick.wong@oracle.com>
    xfs: fix toctou race when locking an inode to access the data map

Brian Foster <bfoster@redhat.com>
    xfs: fix eofblocks race with file extending async dio writes

Brian Foster <bfoster@redhat.com>
    xfs: sync eofblocks scans under iolock are livelock prone

Brian Foster <bfoster@redhat.com>
    xfs: pull up iolock from xfs_free_eofblocks()

Christoph Hellwig <hch@lst.de>
    xfs: use per-AG reservations for the finobt

Christoph Hellwig <hch@lst.de>
    xfs: only update mount/resv fields on success in __xfs_ag_resv_init

Ross Lagerwall <ross.lagerwall@citrix.com>
    xen/setup: Don't relocate p2m over existing one

Ilya Dryomov <idryomov@gmail.com>
    libceph: force GFP_NOIO for socket allocations


-------------

Diffstat:

 Makefile                                   |   4 +-
 arch/arm/boot/dts/bcm5301x.dtsi            |   4 +-
 arch/arm/mach-bcm/bcm_5301x.c              |  28 ++
 arch/mips/lantiq/irq.c                     |  38 ++-
 arch/parisc/include/asm/uaccess.h          |  59 ++--
 arch/parisc/kernel/parisc_ksyms.c          |  10 -
 arch/parisc/kernel/process.c               |   2 +
 arch/parisc/lib/Makefile                   |   2 +-
 arch/parisc/lib/fixup.S                    |  98 ------
 arch/parisc/lib/lusercopy.S                | 318 ++++++++++++++++++++
 arch/parisc/lib/memcpy.c                   | 461 +----------------------------
 arch/parisc/mm/fault.c                     |  17 ++
 arch/x86/lib/memcpy_64.S                   |   2 +-
 arch/x86/mm/kaslr.c                        |   4 +-
 arch/x86/xen/setup.c                       |   6 +-
 block/bio.c                                |  12 +-
 block/blk-core.c                           |  39 ++-
 drivers/acpi/Makefile                      |   1 -
 drivers/acpi/acpi_platform.c               |   8 +-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c      |   4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c        |   4 +-
 drivers/gpu/drm/vc4/vc4_crtc.c             |  13 +-
 drivers/hid/wacom_sys.c                    |  16 +-
 drivers/md/dm.c                            |  29 +-
 drivers/md/raid10.c                        |   3 +-
 drivers/mmc/host/sdhci-of-at91.c           |  11 +-
 drivers/mmc/host/sdhci.c                   |   6 +
 drivers/nvme/host/core.c                   |   6 +-
 drivers/nvme/host/pci.c                    |   4 +-
 drivers/pci/host/pcie-iproc-bcma.c         |  24 +-
 drivers/pci/host/pcie-iproc-platform.c     |  19 +-
 drivers/pci/host/pcie-iproc.h              |   1 +
 drivers/scsi/device_handler/scsi_dh_alua.c |  38 ++-
 drivers/scsi/libsas/sas_ata.c              |   2 +-
 drivers/scsi/qla2xxx/qla_attr.c            |   2 -
 drivers/scsi/qla2xxx/qla_def.h             |   3 +
 drivers/scsi/qla2xxx/qla_init.c            |   1 +
 drivers/scsi/qla2xxx/qla_mid.c             |  14 +-
 drivers/scsi/qla2xxx/qla_os.c              |   1 +
 drivers/scsi/sg.c                          |   2 +
 drivers/tty/serial/atmel_serial.c          |   8 +
 drivers/tty/serial/mxs-auart.c             |   2 +-
 drivers/usb/core/hcd.c                     |   7 +-
 fs/nfs/nfs4proc.c                          |   9 +-
 fs/nfsd/nfsproc.c                          |   1 +
 fs/xfs/libxfs/xfs_ag_resv.c                |  70 ++++-
 fs/xfs/libxfs/xfs_bmap.c                   | 211 +++++++------
 fs/xfs/libxfs/xfs_bmap_btree.c             |   6 +-
 fs/xfs/libxfs/xfs_btree.c                  |   3 +-
 fs/xfs/libxfs/xfs_btree.h                  |   2 +-
 fs/xfs/libxfs/xfs_da_btree.c               |   6 +-
 fs/xfs/libxfs/xfs_da_btree.h               |   2 +-
 fs/xfs/libxfs/xfs_dir2_node.c              |  51 +++-
 fs/xfs/libxfs/xfs_ialloc.c                 |   3 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c           |  90 +++++-
 fs/xfs/libxfs/xfs_ialloc_btree.h           |   3 +
 fs/xfs/libxfs/xfs_inode_fork.c             |   9 +-
 fs/xfs/xfs_aops.c                          | 110 +++----
 fs/xfs/xfs_bmap_util.c                     |  62 ++--
 fs/xfs/xfs_bmap_util.h                     |   3 +-
 fs/xfs/xfs_buf_item.c                      |   1 +
 fs/xfs/xfs_extent_busy.c                   |  13 +-
 fs/xfs/xfs_file.c                          |  26 +-
 fs/xfs/xfs_icache.c                        |  61 ++--
 fs/xfs/xfs_icache.h                        |   2 -
 fs/xfs/xfs_inode.c                         |  76 ++---
 fs/xfs/xfs_iomap.c                         |  18 +-
 fs/xfs/xfs_mount.c                         |   3 +-
 fs/xfs/xfs_mount.h                         |   1 +
 fs/xfs/xfs_reflink.c                       | 151 ++++++++--
 fs/xfs/xfs_reflink.h                       |   6 +-
 fs/xfs/xfs_super.c                         |   2 +-
 fs/xfs/xfs_trace.h                         |  10 +-
 include/linux/kvm_host.h                   |   4 +-
 include/linux/memcontrol.h                 |   6 +
 kernel/padata.c                            |   5 +-
 lib/syscall.c                              |   1 +
 mm/hugetlb.c                               |   6 +-
 mm/rmap.c                                  |   4 +-
 mm/workingset.c                            |   2 +-
 net/ceph/messenger.c                       |   6 +
 sound/core/seq/seq_fifo.c                  |   4 +
 sound/pci/hda/patch_realtek.c              |  12 +-
 sound/soc/atmel/atmel-classd.c             |   2 +-
 sound/soc/intel/skylake/skl-topology.c     |   2 +-
 virt/kvm/eventfd.c                         |   3 +-
 virt/kvm/kvm_main.c                        |  42 ++-
 87 files changed, 1333 insertions(+), 1110 deletions(-)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 01/72] libceph: force GFP_NOIO for socket allocations
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 02/72] xen/setup: Dont relocate p2m over existing one Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sergey Jerusalimov, Ilya Dryomov,
	Jeff Layton

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ilya Dryomov <idryomov@gmail.com>

commit 633ee407b9d15a75ac9740ba9d3338815e1fcb95 upstream.

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [<ffffffff816dd629>] schedule+0x29/0x70
    [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
    [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
    [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
    [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
    [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
    [<ffffffff81086335>] flush_work+0x165/0x250
    [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
    [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
    [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
    [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
    [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
    [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
    [<ffffffff8115af70>] shrink_slab+0x100/0x140
    [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
    [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
    [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
    [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
    [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
    [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
    [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
    [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
    [<ffffffff811d8566>] alloc_inode+0x26/0xa0
    [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
    [<ffffffff815b933e>] sock_alloc+0x1e/0x80
    [<ffffffff815ba855>] __sock_create+0x95/0x220
    [<ffffffff815baa04>] sock_create_kern+0x24/0x30
    [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
    [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [<ffffffff81084c19>] process_one_work+0x159/0x4f0
    [<ffffffff8108561b>] worker_thread+0x11b/0x530
    [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
    [<ffffffff8108b6f9>] kthread+0xc9/0xe0
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
    [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/ceph/messenger.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -7,6 +7,7 @@
 #include <linux/kthread.h>
 #include <linux/net.h>
 #include <linux/nsproxy.h>
+#include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/socket.h>
 #include <linux/string.h>
@@ -469,11 +470,16 @@ static int ceph_tcp_connect(struct ceph_
 {
 	struct sockaddr_storage *paddr = &con->peer_addr.in_addr;
 	struct socket *sock;
+	unsigned int noio_flag;
 	int ret;
 
 	BUG_ON(con->sock);
+
+	/* sock_create_kern() allocates with GFP_KERNEL */
+	noio_flag = memalloc_noio_save();
 	ret = sock_create_kern(read_pnet(&con->msgr->net), paddr->ss_family,
 			       SOCK_STREAM, IPPROTO_TCP, &sock);
+	memalloc_noio_restore(noio_flag);
 	if (ret)
 		return ret;
 	sock->sk->sk_allocation = GFP_NOFS;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 02/72] xen/setup: Dont relocate p2m over existing one
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 01/72] libceph: force GFP_NOIO for socket allocations Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 03/72] xfs: only update mount/resv fields on success in __xfs_ag_resv_init Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ross Lagerwall, Juergen Gross

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ross Lagerwall <ross.lagerwall@citrix.com>

commit 7ecec8503af37de6be4f96b53828d640a968705f upstream.

When relocating the p2m, take special care not to relocate it so
that is overlaps with the current location of the p2m/initrd. This is
needed since the full extent of the current location is not marked as a
reserved region in the e820.

This was seen to happen to a dom0 with a large initial p2m and a small
reserved region in the middle of the initial p2m.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/xen/setup.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -713,10 +713,9 @@ static void __init xen_reserve_xen_mfnli
 		size = PFN_PHYS(xen_start_info->nr_p2m_frames);
 	}
 
-	if (!xen_is_e820_reserved(start, size)) {
-		memblock_reserve(start, size);
+	memblock_reserve(start, size);
+	if (!xen_is_e820_reserved(start, size))
 		return;
-	}
 
 #ifdef CONFIG_X86_32
 	/*
@@ -727,6 +726,7 @@ static void __init xen_reserve_xen_mfnli
 	BUG();
 #else
 	xen_relocate_p2m();
+	memblock_free(start, size);
 #endif
 }
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 03/72] xfs: only update mount/resv fields on success in __xfs_ag_resv_init
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 01/72] libceph: force GFP_NOIO for socket allocations Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 02/72] xen/setup: Dont relocate p2m over existing one Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 04/72] xfs: use per-AG reservations for the finobt Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 4dfa2b84118fd6c95202ae87e62adf5000ccd4d0 upstream.

Try to reserve the blocks first and only then update the fields in
or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
again after a previous failure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_ag_resv.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -200,22 +200,27 @@ __xfs_ag_resv_init(
 	struct xfs_mount		*mp = pag->pag_mount;
 	struct xfs_ag_resv		*resv;
 	int				error;
+	xfs_extlen_t			reserved;
 
-	resv = xfs_perag_resv(pag, type);
 	if (used > ask)
 		ask = used;
-	resv->ar_asked = ask;
-	resv->ar_reserved = resv->ar_orig_reserved = ask - used;
-	mp->m_ag_max_usable -= ask;
+	reserved = ask - used;
 
-	trace_xfs_ag_resv_init(pag, type, ask);
-
-	error = xfs_mod_fdblocks(mp, -(int64_t)resv->ar_reserved, true);
-	if (error)
+	error = xfs_mod_fdblocks(mp, -(int64_t)reserved, true);
+	if (error) {
 		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
 				error, _RET_IP_);
+		return error;
+	}
 
-	return error;
+	mp->m_ag_max_usable -= ask;
+
+	resv = xfs_perag_resv(pag, type);
+	resv->ar_asked = ask;
+	resv->ar_reserved = resv->ar_orig_reserved = reserved;
+
+	trace_xfs_ag_resv_init(pag, type, ask);
+	return 0;
 }
 
 /* Create a per-AG block reservation. */

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 04/72] xfs: use per-AG reservations for the finobt
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 03/72] xfs: only update mount/resv fields on success in __xfs_ag_resv_init Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 05/72] xfs: pull up iolock from xfs_free_eofblocks() Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 76d771b4cbe33c581bd6ca2710c120be51172440 upstream.

Currently we try to rely on the global reserved block pool for block
allocations for the free inode btree, but I have customer reports
(fairly complex workload, need to find an easier reproducer) where that
is not enough as the AG where we free an inode that requires a new
finobt block is entirely full.  This causes us to cancel a dirty
transaction and thus a file system shutdown.

I think the right way to guard against this is to treat the finot the same
way as the refcount btree and have a per-AG reservations for the possible
worst case size of it, and the patch below implements that.

Note that this could increase mount times with large finobt trees.  In
an ideal world we would have added a field for the number of finobt
fields to the AGI, similar to what we did for the refcount blocks.
We should do add it next time we rev the AGI or AGF format by adding
new fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_ag_resv.c      |   47 +++++++++++++++++---
 fs/xfs/libxfs/xfs_ialloc_btree.c |   90 +++++++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc_btree.h |    3 +
 fs/xfs/xfs_inode.c               |   23 +++++----
 fs/xfs/xfs_mount.h               |    1 
 5 files changed, 144 insertions(+), 20 deletions(-)

--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -39,6 +39,7 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_btree.h"
 #include "xfs_refcount_btree.h"
+#include "xfs_ialloc_btree.h"
 
 /*
  * Per-AG Block Reservations
@@ -210,6 +211,9 @@ __xfs_ag_resv_init(
 	if (error) {
 		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
 				error, _RET_IP_);
+		xfs_warn(mp,
+"Per-AG reservation for AG %u failed.  Filesystem may run out of space.",
+				pag->pag_agno);
 		return error;
 	}
 
@@ -228,6 +232,8 @@ int
 xfs_ag_resv_init(
 	struct xfs_perag		*pag)
 {
+	struct xfs_mount		*mp = pag->pag_mount;
+	xfs_agnumber_t			agno = pag->pag_agno;
 	xfs_extlen_t			ask;
 	xfs_extlen_t			used;
 	int				error = 0;
@@ -236,23 +242,45 @@ xfs_ag_resv_init(
 	if (pag->pag_meta_resv.ar_asked == 0) {
 		ask = used = 0;
 
-		error = xfs_refcountbt_calc_reserves(pag->pag_mount,
-				pag->pag_agno, &ask, &used);
+		error = xfs_refcountbt_calc_reserves(mp, agno, &ask, &used);
 		if (error)
 			goto out;
 
-		error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
-				ask, used);
+		error = xfs_finobt_calc_reserves(mp, agno, &ask, &used);
 		if (error)
 			goto out;
+
+		error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
+				ask, used);
+		if (error) {
+			/*
+			 * Because we didn't have per-AG reservations when the
+			 * finobt feature was added we might not be able to
+			 * reserve all needed blocks.  Warn and fall back to the
+			 * old and potentially buggy code in that case, but
+			 * ensure we do have the reservation for the refcountbt.
+			 */
+			ask = used = 0;
+
+			mp->m_inotbt_nores = true;
+
+			error = xfs_refcountbt_calc_reserves(mp, agno, &ask,
+					&used);
+			if (error)
+				goto out;
+
+			error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
+					ask, used);
+			if (error)
+				goto out;
+		}
 	}
 
 	/* Create the AGFL metadata reservation */
 	if (pag->pag_agfl_resv.ar_asked == 0) {
 		ask = used = 0;
 
-		error = xfs_rmapbt_calc_reserves(pag->pag_mount, pag->pag_agno,
-				&ask, &used);
+		error = xfs_rmapbt_calc_reserves(mp, agno, &ask, &used);
 		if (error)
 			goto out;
 
@@ -261,9 +289,16 @@ xfs_ag_resv_init(
 			goto out;
 	}
 
+#ifdef DEBUG
+	/* need to read in the AGF for the ASSERT below to work */
+	error = xfs_alloc_pagf_init(pag->pag_mount, NULL, pag->pag_agno, 0);
+	if (error)
+		return error;
+
 	ASSERT(xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
 	       xfs_perag_resv(pag, XFS_AG_RESV_AGFL)->ar_reserved <=
 	       pag->pagf_freeblks + pag->pagf_flcount);
+#endif
 out:
 	return error;
 }
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -82,11 +82,12 @@ xfs_finobt_set_root(
 }
 
 STATIC int
-xfs_inobt_alloc_block(
+__xfs_inobt_alloc_block(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_ptr	*start,
 	union xfs_btree_ptr	*new,
-	int			*stat)
+	int			*stat,
+	enum xfs_ag_resv_type	resv)
 {
 	xfs_alloc_arg_t		args;		/* block allocation args */
 	int			error;		/* error return value */
@@ -103,6 +104,7 @@ xfs_inobt_alloc_block(
 	args.maxlen = 1;
 	args.prod = 1;
 	args.type = XFS_ALLOCTYPE_NEAR_BNO;
+	args.resv = resv;
 
 	error = xfs_alloc_vextent(&args);
 	if (error) {
@@ -123,6 +125,27 @@ xfs_inobt_alloc_block(
 }
 
 STATIC int
+xfs_inobt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	return __xfs_inobt_alloc_block(cur, start, new, stat, XFS_AG_RESV_NONE);
+}
+
+STATIC int
+xfs_finobt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	return __xfs_inobt_alloc_block(cur, start, new, stat,
+			XFS_AG_RESV_METADATA);
+}
+
+STATIC int
 xfs_inobt_free_block(
 	struct xfs_btree_cur	*cur,
 	struct xfs_buf		*bp)
@@ -328,7 +351,7 @@ static const struct xfs_btree_ops xfs_fi
 
 	.dup_cursor		= xfs_inobt_dup_cursor,
 	.set_root		= xfs_finobt_set_root,
-	.alloc_block		= xfs_inobt_alloc_block,
+	.alloc_block		= xfs_finobt_alloc_block,
 	.free_block		= xfs_inobt_free_block,
 	.get_minrecs		= xfs_inobt_get_minrecs,
 	.get_maxrecs		= xfs_inobt_get_maxrecs,
@@ -478,3 +501,64 @@ xfs_inobt_rec_check_count(
 	return 0;
 }
 #endif	/* DEBUG */
+
+static xfs_extlen_t
+xfs_inobt_max_size(
+	struct xfs_mount	*mp)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_inobt_mxr[0] == 0)
+		return 0;
+
+	return xfs_btree_calc_size(mp, mp->m_inobt_mnr,
+		(uint64_t)mp->m_sb.sb_agblocks * mp->m_sb.sb_inopblock /
+				XFS_INODES_PER_CHUNK);
+}
+
+static int
+xfs_inobt_count_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum,
+	xfs_extlen_t		*tree_blocks)
+{
+	struct xfs_buf		*agbp;
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
+	if (error)
+		return error;
+
+	cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno, btnum);
+	error = xfs_btree_count_blocks(cur, tree_blocks);
+	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+	xfs_buf_relse(agbp);
+
+	return error;
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+int
+xfs_finobt_calc_reserves(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*ask,
+	xfs_extlen_t		*used)
+{
+	xfs_extlen_t		tree_len = 0;
+	int			error;
+
+	if (!xfs_sb_version_hasfinobt(&mp->m_sb))
+		return 0;
+
+	error = xfs_inobt_count_blocks(mp, agno, XFS_BTNUM_FINO, &tree_len);
+	if (error)
+		return error;
+
+	*ask += xfs_inobt_max_size(mp);
+	*used += tree_len;
+	return 0;
+}
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -72,4 +72,7 @@ int xfs_inobt_rec_check_count(struct xfs
 #define xfs_inobt_rec_check_count(mp, rec)	0
 #endif	/* DEBUG */
 
+int xfs_finobt_calc_reserves(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_extlen_t *ask, xfs_extlen_t *used);
+
 #endif	/* __XFS_IALLOC_BTREE_H__ */
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1801,22 +1801,23 @@ xfs_inactive_ifree(
 	int			error;
 
 	/*
-	 * The ifree transaction might need to allocate blocks for record
-	 * insertion to the finobt. We don't want to fail here at ENOSPC, so
-	 * allow ifree to dip into the reserved block pool if necessary.
-	 *
-	 * Freeing large sets of inodes generally means freeing inode chunks,
-	 * directory and file data blocks, so this should be relatively safe.
-	 * Only under severe circumstances should it be possible to free enough
-	 * inodes to exhaust the reserve block pool via finobt expansion while
-	 * at the same time not creating free space in the filesystem.
+	 * We try to use a per-AG reservation for any block needed by the finobt
+	 * tree, but as the finobt feature predates the per-AG reservation
+	 * support a degraded file system might not have enough space for the
+	 * reservation at mount time.  In that case try to dip into the reserved
+	 * pool and pray.
 	 *
 	 * Send a warning if the reservation does happen to fail, as the inode
 	 * now remains allocated and sits on the unlinked list until the fs is
 	 * repaired.
 	 */
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ifree,
-			XFS_IFREE_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
+	if (unlikely(mp->m_inotbt_nores)) {
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ifree,
+				XFS_IFREE_SPACE_RES(mp), 0, XFS_TRANS_RESERVE,
+				&tp);
+	} else {
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ifree, 0, 0, 0, &tp);
+	}
 	if (error) {
 		if (error == -ENOSPC) {
 			xfs_warn_ratelimited(mp,
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -140,6 +140,7 @@ typedef struct xfs_mount {
 	int			m_fixedfsid[2];	/* unchanged for life of FS */
 	uint			m_dmevmask;	/* DMI events for this FS */
 	__uint64_t		m_flags;	/* global mount flags */
+	bool			m_inotbt_nores; /* no per-AG finobt resv. */
 	int			m_ialloc_inos;	/* inodes in inode allocation */
 	int			m_ialloc_blks;	/* blocks in inode allocation */
 	int			m_ialloc_min_blks;/* min blocks in sparse inode

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 05/72] xfs: pull up iolock from xfs_free_eofblocks()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 04/72] xfs: use per-AG reservations for the finobt Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 06/72] xfs: sync eofblocks scans under iolock are livelock prone Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Christoph Hellwig,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit a36b926180cda375ac2ec89e1748b47137cfc51c upstream.

xfs_free_eofblocks() requires the IOLOCK_EXCL lock, but is called from
different contexts where the lock may or may not be held. The
need_iolock parameter exists for this reason, to indicate whether
xfs_free_eofblocks() must acquire the iolock itself before it can
proceed.

This is ugly and confusing. Simplify the semantics of
xfs_free_eofblocks() to require the caller to acquire the iolock
appropriately and kill the need_iolock parameter. While here, the mp
param can be removed as well as the xfs_mount is accessible from the
xfs_inode structure. This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |   41 +++++++++++++++------------------------
 fs/xfs/xfs_bmap_util.h |    3 --
 fs/xfs/xfs_icache.c    |   24 ++++++++++++++---------
 fs/xfs/xfs_inode.c     |   51 ++++++++++++++++++++++++++-----------------------
 4 files changed, 60 insertions(+), 59 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -917,17 +917,18 @@ xfs_can_free_eofblocks(struct xfs_inode
  */
 int
 xfs_free_eofblocks(
-	xfs_mount_t	*mp,
-	xfs_inode_t	*ip,
-	bool		need_iolock)
+	struct xfs_inode	*ip)
 {
-	xfs_trans_t	*tp;
-	int		error;
-	xfs_fileoff_t	end_fsb;
-	xfs_fileoff_t	last_fsb;
-	xfs_filblks_t	map_len;
-	int		nimaps;
-	xfs_bmbt_irec_t	imap;
+	struct xfs_trans	*tp;
+	int			error;
+	xfs_fileoff_t		end_fsb;
+	xfs_fileoff_t		last_fsb;
+	xfs_filblks_t		map_len;
+	int			nimaps;
+	struct xfs_bmbt_irec	imap;
+	struct xfs_mount	*mp = ip->i_mount;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 
 	/*
 	 * Figure out if there are any blocks beyond the end
@@ -944,6 +945,10 @@ xfs_free_eofblocks(
 	error = xfs_bmapi_read(ip, end_fsb, map_len, &imap, &nimaps, 0);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
+	/*
+	 * If there are blocks after the end of file, truncate the file to its
+	 * current size to free them up.
+	 */
 	if (!error && (nimaps != 0) &&
 	    (imap.br_startblock != HOLESTARTBLOCK ||
 	     ip->i_delayed_blks)) {
@@ -954,22 +959,10 @@ xfs_free_eofblocks(
 		if (error)
 			return error;
 
-		/*
-		 * There are blocks after the end of file.
-		 * Free them up now by truncating the file to
-		 * its current size.
-		 */
-		if (need_iolock) {
-			if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
-				return -EAGAIN;
-		}
-
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
 				&tp);
 		if (error) {
 			ASSERT(XFS_FORCED_SHUTDOWN(mp));
-			if (need_iolock)
-				xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 			return error;
 		}
 
@@ -997,8 +990,6 @@ xfs_free_eofblocks(
 		}
 
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		if (need_iolock)
-			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 	}
 	return error;
 }
@@ -1415,7 +1406,7 @@ xfs_shift_file_space(
 	 * into the accessible region of the file.
 	 */
 	if (xfs_can_free_eofblocks(ip, true)) {
-		error = xfs_free_eofblocks(mp, ip, false);
+		error = xfs_free_eofblocks(ip);
 		if (error)
 			return error;
 	}
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -63,8 +63,7 @@ int	xfs_insert_file_space(struct xfs_ino
 
 /* EOF block manipulation functions */
 bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
-int	xfs_free_eofblocks(struct xfs_mount *mp, struct xfs_inode *ip,
-			   bool need_iolock);
+int	xfs_free_eofblocks(struct xfs_inode *ip);
 
 int	xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
 			 struct xfs_swapext *sx);
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1324,7 +1324,7 @@ xfs_inode_free_eofblocks(
 	int			flags,
 	void			*args)
 {
-	int ret;
+	int ret = 0;
 	struct xfs_eofblocks *eofb = args;
 	bool need_iolock = true;
 	int match;
@@ -1360,19 +1360,25 @@ xfs_inode_free_eofblocks(
 			return 0;
 
 		/*
-		 * A scan owner implies we already hold the iolock. Skip it in
-		 * xfs_free_eofblocks() to avoid deadlock. This also eliminates
-		 * the possibility of EAGAIN being returned.
+		 * A scan owner implies we already hold the iolock. Skip it here
+		 * to avoid deadlock.
 		 */
 		if (eofb->eof_scan_owner == ip->i_ino)
 			need_iolock = false;
 	}
 
-	ret = xfs_free_eofblocks(ip->i_mount, ip, need_iolock);
-
-	/* don't revisit the inode if we're not waiting */
-	if (ret == -EAGAIN && !(flags & SYNC_WAIT))
-		ret = 0;
+	/*
+	 * If the caller is waiting, return -EAGAIN to keep the background
+	 * scanner moving and revisit the inode in a subsequent pass.
+	 */
+	if (need_iolock && !xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+		if (flags & SYNC_WAIT)
+			ret = -EAGAIN;
+		return ret;
+	}
+	ret = xfs_free_eofblocks(ip);
+	if (need_iolock)
+		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 	return ret;
 }
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1701,32 +1701,34 @@ xfs_release(
 	if (xfs_can_free_eofblocks(ip, false)) {
 
 		/*
+		 * Check if the inode is being opened, written and closed
+		 * frequently and we have delayed allocation blocks outstanding
+		 * (e.g. streaming writes from the NFS server), truncating the
+		 * blocks past EOF will cause fragmentation to occur.
+		 *
+		 * In this case don't do the truncation, but we have to be
+		 * careful how we detect this case. Blocks beyond EOF show up as
+		 * i_delayed_blks even when the inode is clean, so we need to
+		 * truncate them away first before checking for a dirty release.
+		 * Hence on the first dirty close we will still remove the
+		 * speculative allocation, but after that we will leave it in
+		 * place.
+		 */
+		if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
+			return 0;
+		/*
 		 * If we can't get the iolock just skip truncating the blocks
 		 * past EOF because we could deadlock with the mmap_sem
-		 * otherwise.  We'll get another chance to drop them once the
+		 * otherwise. We'll get another chance to drop them once the
 		 * last reference to the inode is dropped, so we'll never leak
 		 * blocks permanently.
-		 *
-		 * Further, check if the inode is being opened, written and
-		 * closed frequently and we have delayed allocation blocks
-		 * outstanding (e.g. streaming writes from the NFS server),
-		 * truncating the blocks past EOF will cause fragmentation to
-		 * occur.
-		 *
-		 * In this case don't do the truncation, either, but we have to
-		 * be careful how we detect this case. Blocks beyond EOF show
-		 * up as i_delayed_blks even when the inode is clean, so we
-		 * need to truncate them away first before checking for a dirty
-		 * release. Hence on the first dirty close we will still remove
-		 * the speculative allocation, but after that we will leave it
-		 * in place.
 		 */
-		if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
-			return 0;
-
-		error = xfs_free_eofblocks(mp, ip, true);
-		if (error && error != -EAGAIN)
-			return error;
+		if (xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+			error = xfs_free_eofblocks(ip);
+			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+			if (error)
+				return error;
+		}
 
 		/* delalloc blocks after truncation means it really is dirty */
 		if (ip->i_delayed_blks)
@@ -1913,8 +1915,11 @@ xfs_inactive(
 		 * cache. Post-eof blocks must be freed, lest we end up with
 		 * broken free space accounting.
 		 */
-		if (xfs_can_free_eofblocks(ip, true))
-			xfs_free_eofblocks(mp, ip, false);
+		if (xfs_can_free_eofblocks(ip, true)) {
+			xfs_ilock(ip, XFS_IOLOCK_EXCL);
+			xfs_free_eofblocks(ip);
+			xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+		}
 
 		return;
 	}

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 06/72] xfs: sync eofblocks scans under iolock are livelock prone
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 05/72] xfs: pull up iolock from xfs_free_eofblocks() Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 07/72] xfs: fix eofblocks race with file extending async dio writes Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Christoph Hellwig,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit c3155097ad89a956579bc305856a1f2878494e52 upstream.

The xfs_eofblocks.eof_scan_owner field is an internal field to
facilitate invoking eofb scans from the kernel while under the iolock.
This is necessary because the eofb scan acquires the iolock of each
inode. Synchronous scans are invoked on certain buffered write failures
while under iolock. In such cases, the scan owner indicates that the
context for the scan already owns the particular iolock and prevents a
double lock deadlock.

eofblocks scans while under iolock are still livelock prone in the event
of multiple parallel scans, however. If multiple buffered writes to
different inodes fail and invoke eofblocks scans at the same time, each
scan avoids a deadlock with its own inode by virtue of the
eof_scan_owner field, but will never be able to acquire the iolock of
the inode from the parallel scan. Because the low free space scans are
invoked with SYNC_WAIT, the scan will not return until it has processed
every tagged inode and thus both scans will spin indefinitely on the
iolock being held across the opposite scan. This problem can be
reproduced reliably by generic/224 on systems with higher cpu counts
(x16).

To avoid this problem, simplify the semantics of eofblocks scans to
never invoke a scan while under iolock. This means that the buffered
write context must drop the iolock before the scan. It must reacquire
the lock before the write retry and also repeat the initial write
checks, as the original state might no longer be valid once the iolock
was dropped.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 fs/xfs/xfs_file.c   |   13 +++++++++----
 fs/xfs/xfs_icache.c |   45 +++++++--------------------------------------
 fs/xfs/xfs_icache.h |    2 --
 3 files changed, 16 insertions(+), 44 deletions(-)

--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -675,8 +675,10 @@ xfs_file_buffered_aio_write(
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
 	int			enospc = 0;
-	int			iolock = XFS_IOLOCK_EXCL;
+	int			iolock;
 
+write_retry:
+	iolock = XFS_IOLOCK_EXCL;
 	xfs_rw_ilock(ip, iolock);
 
 	ret = xfs_file_aio_write_checks(iocb, from, &iolock);
@@ -686,7 +688,6 @@ xfs_file_buffered_aio_write(
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = inode_to_bdi(inode);
 
-write_retry:
 	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
 	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
 	if (likely(ret >= 0))
@@ -702,18 +703,21 @@ write_retry:
 	 * running at the same time.
 	 */
 	if (ret == -EDQUOT && !enospc) {
+		xfs_rw_iunlock(ip, iolock);
 		enospc = xfs_inode_free_quota_eofblocks(ip);
 		if (enospc)
 			goto write_retry;
 		enospc = xfs_inode_free_quota_cowblocks(ip);
 		if (enospc)
 			goto write_retry;
+		iolock = 0;
 	} else if (ret == -ENOSPC && !enospc) {
 		struct xfs_eofblocks eofb = {0};
 
 		enospc = 1;
 		xfs_flush_inodes(ip->i_mount);
-		eofb.eof_scan_owner = ip->i_ino; /* for locking */
+
+		xfs_rw_iunlock(ip, iolock);
 		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
 		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
 		goto write_retry;
@@ -721,7 +725,8 @@ write_retry:
 
 	current->backing_dev_info = NULL;
 out:
-	xfs_rw_iunlock(ip, iolock);
+	if (iolock)
+		xfs_rw_iunlock(ip, iolock);
 	return ret;
 }
 
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1326,11 +1326,8 @@ xfs_inode_free_eofblocks(
 {
 	int ret = 0;
 	struct xfs_eofblocks *eofb = args;
-	bool need_iolock = true;
 	int match;
 
-	ASSERT(!eofb || (eofb && eofb->eof_scan_owner != 0));
-
 	if (!xfs_can_free_eofblocks(ip, false)) {
 		/* inode could be preallocated or append-only */
 		trace_xfs_inode_free_eofblocks_invalid(ip);
@@ -1358,27 +1355,19 @@ xfs_inode_free_eofblocks(
 		if (eofb->eof_flags & XFS_EOF_FLAGS_MINFILESIZE &&
 		    XFS_ISIZE(ip) < eofb->eof_min_file_size)
 			return 0;
-
-		/*
-		 * A scan owner implies we already hold the iolock. Skip it here
-		 * to avoid deadlock.
-		 */
-		if (eofb->eof_scan_owner == ip->i_ino)
-			need_iolock = false;
 	}
 
 	/*
 	 * If the caller is waiting, return -EAGAIN to keep the background
 	 * scanner moving and revisit the inode in a subsequent pass.
 	 */
-	if (need_iolock && !xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 		if (flags & SYNC_WAIT)
 			ret = -EAGAIN;
 		return ret;
 	}
 	ret = xfs_free_eofblocks(ip);
-	if (need_iolock)
-		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 	return ret;
 }
@@ -1425,15 +1414,10 @@ __xfs_inode_free_quota_eofblocks(
 	struct xfs_eofblocks eofb = {0};
 	struct xfs_dquot *dq;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
-
 	/*
-	 * Set the scan owner to avoid a potential livelock. Otherwise, the scan
-	 * can repeatedly trylock on the inode we're currently processing. We
-	 * run a sync scan to increase effectiveness and use the union filter to
+	 * Run a sync scan to increase effectiveness and use the union filter to
 	 * cover all applicable quotas in a single scan.
 	 */
-	eofb.eof_scan_owner = ip->i_ino;
 	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
 
 	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
@@ -1585,12 +1569,9 @@ xfs_inode_free_cowblocks(
 {
 	int ret;
 	struct xfs_eofblocks *eofb = args;
-	bool need_iolock = true;
 	int match;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 
-	ASSERT(!eofb || (eofb && eofb->eof_scan_owner != 0));
-
 	/*
 	 * Just clear the tag if we have an empty cow fork or none at all. It's
 	 * possible the inode was fully unshared since it was originally tagged.
@@ -1623,28 +1604,16 @@ xfs_inode_free_cowblocks(
 		if (eofb->eof_flags & XFS_EOF_FLAGS_MINFILESIZE &&
 		    XFS_ISIZE(ip) < eofb->eof_min_file_size)
 			return 0;
-
-		/*
-		 * A scan owner implies we already hold the iolock. Skip it in
-		 * xfs_free_eofblocks() to avoid deadlock. This also eliminates
-		 * the possibility of EAGAIN being returned.
-		 */
-		if (eofb->eof_scan_owner == ip->i_ino)
-			need_iolock = false;
 	}
 
 	/* Free the CoW blocks */
-	if (need_iolock) {
-		xfs_ilock(ip, XFS_IOLOCK_EXCL);
-		xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
-	}
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 
 	ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF);
 
-	if (need_iolock) {
-		xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
-		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-	}
+	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 	return ret;
 }
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -27,7 +27,6 @@ struct xfs_eofblocks {
 	kgid_t		eof_gid;
 	prid_t		eof_prid;
 	__u64		eof_min_file_size;
-	xfs_ino_t	eof_scan_owner;
 };
 
 #define SYNC_WAIT		0x0001	/* wait for i/o to complete */
@@ -102,7 +101,6 @@ xfs_fs_eofblocks_from_user(
 	dst->eof_flags = src->eof_flags;
 	dst->eof_prid = src->eof_prid;
 	dst->eof_min_file_size = src->eof_min_file_size;
-	dst->eof_scan_owner = NULLFSINO;
 
 	dst->eof_uid = INVALID_UID;
 	if (src->eof_flags & XFS_EOF_FLAGS_UID) {

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 07/72] xfs: fix eofblocks race with file extending async dio writes
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 06/72] xfs: sync eofblocks scans under iolock are livelock prone Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 08/72] xfs: fix toctou race when locking an inode to access the data map Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Brian Foster, Christoph Hellwig,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit e4229d6b0bc9280f29624faf170cf76a9f1ca60e upstream.

It's possible for post-eof blocks to end up being used for direct I/O
writes. dio write performs an upfront unwritten extent allocation, sends
the dio and then updates the inode size (if necessary) on write
completion. If a file release occurs while a file extending dio write is
in flight, it is possible to mistake the post-eof blocks for speculative
preallocation and incorrectly truncate them from the inode. This means
that the resulting dio write completion can discover a hole and allocate
new blocks rather than perform unwritten extent conversion.

This requires a strange mix of I/O and is thus not likely to reproduce
in real world workloads. It is intermittently reproduced by generic/299.
The error manifests as an assert failure due to transaction overrun
because the aforementioned write completion transaction has only
reserved enough blocks for btree operations:

  XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \
   file: fs/xfs//xfs_trans.c, line: 309

The root cause is that xfs_free_eofblocks() uses i_size to truncate
post-eof blocks from the inode, but async, file extending direct writes
do not update i_size until write completion, long after inode locks are
dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
to the incorrect size.

Update xfs_free_eofblocks() to serialize against dio similar to how
extending writes are serialized against i_size updates before post-eof
block zeroing. Specifically, wait on dio while under the iolock. This
ensures that dio write completions have updated i_size before post-eof
blocks are processed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -959,6 +959,9 @@ xfs_free_eofblocks(
 		if (error)
 			return error;
 
+		/* wait on dio to ensure i_size has settled */
+		inode_dio_wait(VFS_I(ip));
+
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0,
 				&tp);
 		if (error) {

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 08/72] xfs: fix toctou race when locking an inode to access the data map
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 07/72] xfs: fix eofblocks race with file extending async dio writes Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 09/72] xfs: fail _dir_open when readahead fails Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 4b5bd5bf3fb182dc504b1b64e0331300f156e756 upstream.

We use di_format and if_flags to decide whether we're grabbing the ilock
in btree mode (btree extents not loaded) or shared mode (anything else),
but the state of those fields can be changed by other threads that are
also trying to load the btree extents -- IFEXTENTS gets set before the
_bmap_read_extents call and cleared if it fails.

We don't actually need to have IFEXTENTS set until after the bmbt
records are successfully loaded and validated, which will fix the race
between multiple threads trying to read the same directory.  The next
patch strengthens directory bmbt validation by refusing to open the
directory if reading the bmbt to start directory readahead fails.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_inode_fork.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -497,15 +497,14 @@ xfs_iread_extents(
 	 * We know that the size is valid (it's checked in iformat_btree)
 	 */
 	ifp->if_bytes = ifp->if_real_bytes = 0;
-	ifp->if_flags |= XFS_IFEXTENTS;
 	xfs_iext_add(ifp, 0, nextents);
 	error = xfs_bmap_read_extents(tp, ip, whichfork);
 	if (error) {
 		xfs_iext_destroy(ifp);
-		ifp->if_flags &= ~XFS_IFEXTENTS;
 		return error;
 	}
 	xfs_validate_extents(ifp, nextents, XFS_EXTFMT_INODE(ip));
+	ifp->if_flags |= XFS_IFEXTENTS;
 	return 0;
 }
 /*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 09/72] xfs: fail _dir_open when readahead fails
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 08/72] xfs: fix toctou race when locking an inode to access the data map Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 10/72] xfs: filter out obviously bad btree pointers Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Eric Sandeen,
	Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 7a652bbe366464267190c2792a32ce4fff5595ef upstream.

When we open a directory, we try to readahead block 0 of the directory
on the assumption that we're going to need it soon.  If the bmbt is
corrupt, the directory will never be usable and the readahead fails
immediately, so we might as well prevent the directory from being opened
at all.  This prevents a subsequent read or modify operation from
hitting it and taking the fs offline.

NOTE: We're only checking for early failures in the block mapping, not
the readahead directory block itself.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_da_btree.c |    6 ++----
 fs/xfs/libxfs/xfs_da_btree.h |    2 +-
 fs/xfs/xfs_file.c            |    4 ++--
 3 files changed, 5 insertions(+), 7 deletions(-)

--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2633,7 +2633,7 @@ out_free:
 /*
  * Readahead the dir/attr block.
  */
-xfs_daddr_t
+int
 xfs_da_reada_buf(
 	struct xfs_inode	*dp,
 	xfs_dablk_t		bno,
@@ -2664,7 +2664,5 @@ out_free:
 	if (mapp != &map)
 		kmem_free(mapp);
 
-	if (error)
-		return -1;
-	return mappedbno;
+	return error;
 }
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -201,7 +201,7 @@ int	xfs_da_read_buf(struct xfs_trans *tr
 			       xfs_dablk_t bno, xfs_daddr_t mappedbno,
 			       struct xfs_buf **bpp, int whichfork,
 			       const struct xfs_buf_ops *ops);
-xfs_daddr_t	xfs_da_reada_buf(struct xfs_inode *dp, xfs_dablk_t bno,
+int	xfs_da_reada_buf(struct xfs_inode *dp, xfs_dablk_t bno,
 				xfs_daddr_t mapped_bno, int whichfork,
 				const struct xfs_buf_ops *ops);
 int	xfs_da_shrink_inode(xfs_da_args_t *args, xfs_dablk_t dead_blkno,
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -992,9 +992,9 @@ xfs_dir_open(
 	 */
 	mode = xfs_ilock_data_map_shared(ip);
 	if (ip->i_d.di_nextents > 0)
-		xfs_dir3_data_readahead(ip, 0, -1);
+		error = xfs_dir3_data_readahead(ip, 0, -1);
 	xfs_iunlock(ip, mode);
-	return 0;
+	return error;
 }
 
 STATIC int

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 10/72] xfs: filter out obviously bad btree pointers
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 09/72] xfs: fail _dir_open when readahead fails Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 11/72] xfs: check for obviously bad level values in the bmbt root Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Eric Sandeen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit d5a91baeb6033c3392121e4d5c011cdc08dfa9f7 upstream.

Don't let anybody load an obviously bad btree pointer.  Since the values
come from disk, we must return an error, not just ASSERT.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c  |    5 +----
 fs/xfs/libxfs/xfs_btree.c |    3 ++-
 fs/xfs/libxfs/xfs_btree.h |    2 +-
 3 files changed, 4 insertions(+), 6 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1278,7 +1278,6 @@ xfs_bmap_read_extents(
 	/* REFERENCED */
 	xfs_extnum_t		room;	/* number of entries there's room for */
 
-	bno = NULLFSBLOCK;
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	exntf = (whichfork != XFS_DATA_FORK) ? XFS_EXTFMT_NOSTATE :
@@ -1291,9 +1290,7 @@ xfs_bmap_read_extents(
 	ASSERT(level > 0);
 	pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
 	bno = be64_to_cpu(*pp);
-	ASSERT(bno != NULLFSBLOCK);
-	ASSERT(XFS_FSB_TO_AGNO(mp, bno) < mp->m_sb.sb_agcount);
-	ASSERT(XFS_FSB_TO_AGBNO(mp, bno) < mp->m_sb.sb_agblocks);
+
 	/*
 	 * Go down the tree until leaf level is reached, following the first
 	 * pointer (leftmost) at each level.
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -810,7 +810,8 @@ xfs_btree_read_bufl(
 	xfs_daddr_t		d;		/* real disk block address */
 	int			error;
 
-	ASSERT(fsbno != NULLFSBLOCK);
+	if (!XFS_FSB_SANITY_CHECK(mp, fsbno))
+		return -EFSCORRUPTED;
 	d = XFS_FSB_TO_DADDR(mp, fsbno);
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, d,
 				   mp->m_bsize, lock, &bp, ops);
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -491,7 +491,7 @@ static inline int xfs_btree_get_level(st
 #define	XFS_FILBLKS_MAX(a,b)	max_t(xfs_filblks_t, (a), (b))
 
 #define	XFS_FSB_SANITY_CHECK(mp,fsb)	\
-	(XFS_FSB_TO_AGNO(mp, fsb) < mp->m_sb.sb_agcount && \
+	(fsb && XFS_FSB_TO_AGNO(mp, fsb) < mp->m_sb.sb_agcount && \
 		XFS_FSB_TO_AGBNO(mp, fsb) < mp->m_sb.sb_agblocks)
 
 /*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 11/72] xfs: check for obviously bad level values in the bmbt root
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 10/72] xfs: filter out obviously bad btree pointers Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 12/72] xfs: verify free block header fields Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit b3bf607d58520ea8c0666aeb4be60dbb724cd3a2 upstream.

We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
no such thing as a zero-level bmbt (for that we have extents format),
so if we see this, send back an error code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_inode_fork.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -26,6 +26,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_inode_item.h"
+#include "xfs_btree.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_bmap.h"
 #include "xfs_error.h"
@@ -429,11 +430,13 @@ xfs_iformat_btree(
 	/* REFERENCED */
 	int			nrecs;
 	int			size;
+	int			level;
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork);
 	size = XFS_BMAP_BROOT_SPACE(mp, dfp);
 	nrecs = be16_to_cpu(dfp->bb_numrecs);
+	level = be16_to_cpu(dfp->bb_level);
 
 	/*
 	 * blow out if -- fork has less extents than can fit in
@@ -446,7 +449,8 @@ xfs_iformat_btree(
 					XFS_IFORK_MAXEXT(ip, whichfork) ||
 		     XFS_BMDR_SPACE_CALC(nrecs) >
 					XFS_DFORK_SIZE(dip, mp, whichfork) ||
-		     XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks)) {
+		     XFS_IFORK_NEXTENTS(ip, whichfork) > ip->i_d.di_nblocks) ||
+		     level == 0 || level > XFS_BTREE_MAXLEVELS) {
 		xfs_warn(mp, "corrupt inode %Lu (btree).",
 					(unsigned long long) ip->i_ino);
 		XFS_CORRUPTION_ERROR("xfs_iformat_btree", XFS_ERRLEVEL_LOW,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 12/72] xfs: verify free block header fields
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 11/72] xfs: check for obviously bad level values in the bmbt root Greg Kroah-Hartman
@ 2017-04-06  8:37 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 13/72] xfs: allow unwritten extents in the CoW fork Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit de14c5f541e78c59006bee56f6c5c2ef1ca07272 upstream.

Perform basic sanity checking of the directory free block header
fields so that we avoid hanging the system on invalid data.

(Granted that just means that now we shutdown on directory write,
but that seems better than hanging...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_dir2_node.c |   51 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -155,6 +155,42 @@ const struct xfs_buf_ops xfs_dir3_free_b
 	.verify_write = xfs_dir3_free_write_verify,
 };
 
+/* Everything ok in the free block header? */
+static bool
+xfs_dir3_free_header_check(
+	struct xfs_inode	*dp,
+	xfs_dablk_t		fbno,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	unsigned int		firstdb;
+	int			maxbests;
+
+	maxbests = dp->d_ops->free_max_bests(mp->m_dir_geo);
+	firstdb = (xfs_dir2_da_to_db(mp->m_dir_geo, fbno) -
+		   xfs_dir2_byte_to_db(mp->m_dir_geo, XFS_DIR2_FREE_OFFSET)) *
+			maxbests;
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		struct xfs_dir3_free_hdr *hdr3 = bp->b_addr;
+
+		if (be32_to_cpu(hdr3->firstdb) != firstdb)
+			return false;
+		if (be32_to_cpu(hdr3->nvalid) > maxbests)
+			return false;
+		if (be32_to_cpu(hdr3->nvalid) < be32_to_cpu(hdr3->nused))
+			return false;
+	} else {
+		struct xfs_dir2_free_hdr *hdr = bp->b_addr;
+
+		if (be32_to_cpu(hdr->firstdb) != firstdb)
+			return false;
+		if (be32_to_cpu(hdr->nvalid) > maxbests)
+			return false;
+		if (be32_to_cpu(hdr->nvalid) < be32_to_cpu(hdr->nused))
+			return false;
+	}
+	return true;
+}
 
 static int
 __xfs_dir3_free_read(
@@ -168,11 +204,22 @@ __xfs_dir3_free_read(
 
 	err = xfs_da_read_buf(tp, dp, fbno, mappedbno, bpp,
 				XFS_DATA_FORK, &xfs_dir3_free_buf_ops);
+	if (err || !*bpp)
+		return err;
+
+	/* Check things that we can't do in the verifier. */
+	if (!xfs_dir3_free_header_check(dp, fbno, *bpp)) {
+		xfs_buf_ioerror(*bpp, -EFSCORRUPTED);
+		xfs_verifier_error(*bpp);
+		xfs_trans_brelse(tp, *bpp);
+		return -EFSCORRUPTED;
+	}
 
 	/* try read returns without an error or *bpp if it lands in a hole */
-	if (!err && tp && *bpp)
+	if (tp)
 		xfs_trans_buf_set_type(tp, *bpp, XFS_BLFT_DIR_FREE_BUF);
-	return err;
+
+	return 0;
 }
 
 int

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 13/72] xfs: allow unwritten extents in the CoW fork
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2017-04-06  8:37 ` [PATCH 4.9 12/72] xfs: verify free block header fields Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 14/72] xfs: mark speculative prealloc CoW fork extents unwritten Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 05a630d76bd3f39baf0eecfa305bed2820796dee upstream.

In the data fork, we only allow extents to perform the following state
transitions:

delay -> real <-> unwritten

There's no way to move directly from a delalloc reservation to an
/unwritten/ allocated extent.  However, for the CoW fork we want to be
able to do the following to each extent:

delalloc -> unwritten -> written -> remapped to data fork

This will help us to avoid a race in the speculative CoW preallocation
code between a first thread that is allocating a CoW extent and a second
thread that is remapping part of a file after a write.  In order to do
this, however, we need two things: first, we have to be able to
transition from da to unwritten, and second the function that converts
between real and unwritten has to be made aware of the cow fork.  Do
both of those things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |   80 +++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 30 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1952,6 +1952,7 @@ xfs_bmap_add_extent_delay_real(
 		 */
 		trace_xfs_bmap_pre_update(bma->ip, bma->idx, state, _THIS_IP_);
 		xfs_bmbt_set_startblock(ep, new->br_startblock);
+		xfs_bmbt_set_state(ep, new->br_state);
 		trace_xfs_bmap_post_update(bma->ip, bma->idx, state, _THIS_IP_);
 
 		(*nextents)++;
@@ -2290,6 +2291,7 @@ STATIC int				/* error */
 xfs_bmap_add_extent_unwritten_real(
 	struct xfs_trans	*tp,
 	xfs_inode_t		*ip,	/* incore inode pointer */
+	int			whichfork,
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_btree_cur_t		**curp,	/* if *curp is null, not a btree */
 	xfs_bmbt_irec_t		*new,	/* new data to add to file extents */
@@ -2309,12 +2311,14 @@ xfs_bmap_add_extent_unwritten_real(
 					/* left is 0, right is 1, prev is 2 */
 	int			rval=0;	/* return value (logging flags) */
 	int			state = 0;/* state bits, accessed thru macros */
-	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_mount	*mp = ip->i_mount;
 
 	*logflagsp = 0;
 
 	cur = *curp;
-	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 
 	ASSERT(*idx >= 0);
 	ASSERT(*idx <= xfs_iext_count(ifp));
@@ -2373,7 +2377,7 @@ xfs_bmap_add_extent_unwritten_real(
 	 * Don't set contiguous if the combined extent would be too large.
 	 * Also check for all-three-contiguous being too large.
 	 */
-	if (*idx < xfs_iext_count(&ip->i_df) - 1) {
+	if (*idx < xfs_iext_count(ifp) - 1) {
 		state |= BMAP_RIGHT_VALID;
 		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx + 1), &RIGHT);
 		if (isnullstartblock(RIGHT.br_startblock))
@@ -2413,7 +2417,8 @@ xfs_bmap_add_extent_unwritten_real(
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
 
 		xfs_iext_remove(ip, *idx + 1, 2, state);
-		ip->i_d.di_nextents -= 2;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) - 2);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2456,7 +2461,8 @@ xfs_bmap_add_extent_unwritten_real(
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
 
 		xfs_iext_remove(ip, *idx + 1, 1, state);
-		ip->i_d.di_nextents--;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2491,7 +2497,8 @@ xfs_bmap_add_extent_unwritten_real(
 		xfs_bmbt_set_state(ep, newext);
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
 		xfs_iext_remove(ip, *idx + 1, 1, state);
-		ip->i_d.di_nextents--;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2603,7 +2610,8 @@ xfs_bmap_add_extent_unwritten_real(
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
 
 		xfs_iext_insert(ip, *idx, 1, new, state);
-		ip->i_d.di_nextents++;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) + 1);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2681,7 +2689,8 @@ xfs_bmap_add_extent_unwritten_real(
 		++*idx;
 		xfs_iext_insert(ip, *idx, 1, new, state);
 
-		ip->i_d.di_nextents++;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) + 1);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2729,7 +2738,8 @@ xfs_bmap_add_extent_unwritten_real(
 		++*idx;
 		xfs_iext_insert(ip, *idx, 2, &r[0], state);
 
-		ip->i_d.di_nextents += 2;
+		XFS_IFORK_NEXT_SET(ip, whichfork,
+				XFS_IFORK_NEXTENTS(ip, whichfork) + 2);
 		if (cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2783,17 +2793,17 @@ xfs_bmap_add_extent_unwritten_real(
 	}
 
 	/* update reverse mappings */
-	error = xfs_rmap_convert_extent(mp, dfops, ip, XFS_DATA_FORK, new);
+	error = xfs_rmap_convert_extent(mp, dfops, ip, whichfork, new);
 	if (error)
 		goto done;
 
 	/* convert to a btree if necessary */
-	if (xfs_bmap_needs_btree(ip, XFS_DATA_FORK)) {
+	if (xfs_bmap_needs_btree(ip, whichfork)) {
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(cur == NULL);
 		error = xfs_bmap_extents_to_btree(tp, ip, first, dfops, &cur,
-				0, &tmp_logflags, XFS_DATA_FORK);
+				0, &tmp_logflags, whichfork);
 		*logflagsp |= tmp_logflags;
 		if (error)
 			goto done;
@@ -2805,7 +2815,7 @@ xfs_bmap_add_extent_unwritten_real(
 		*curp = cur;
 	}
 
-	xfs_bmap_check_leaf_extents(*curp, ip, XFS_DATA_FORK);
+	xfs_bmap_check_leaf_extents(*curp, ip, whichfork);
 done:
 	*logflagsp |= rval;
 	return error;
@@ -4458,10 +4468,16 @@ xfs_bmapi_allocate(
 	bma->got.br_state = XFS_EXT_NORM;
 
 	/*
-	 * A wasdelay extent has been initialized, so shouldn't be flagged
-	 * as unwritten.
+	 * In the data fork, a wasdelay extent has been initialized, so
+	 * shouldn't be flagged as unwritten.
+	 *
+	 * For the cow fork, however, we convert delalloc reservations
+	 * (extents allocated for speculative preallocation) to
+	 * allocated unwritten extents, and only convert the unwritten
+	 * extents to real extents when we're about to write the data.
 	 */
-	if (!bma->wasdel && (bma->flags & XFS_BMAPI_PREALLOC) &&
+	if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) &&
+	    (bma->flags & XFS_BMAPI_PREALLOC) &&
 	    xfs_sb_version_hasextflgbit(&mp->m_sb))
 		bma->got.br_state = XFS_EXT_UNWRITTEN;
 
@@ -4512,8 +4528,6 @@ xfs_bmapi_convert_unwritten(
 			(XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT))
 		return 0;
 
-	ASSERT(whichfork != XFS_COW_FORK);
-
 	/*
 	 * Modify (by adding) the state flag, if writing.
 	 */
@@ -4538,8 +4552,8 @@ xfs_bmapi_convert_unwritten(
 			return error;
 	}
 
-	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
-			&bma->cur, mval, bma->firstblock, bma->dfops,
+	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, whichfork,
+			&bma->idx, &bma->cur, mval, bma->firstblock, bma->dfops,
 			&tmp_logflags);
 	/*
 	 * Log the inode core unconditionally in the unwritten extent conversion
@@ -4548,8 +4562,12 @@ xfs_bmapi_convert_unwritten(
 	 * in the transaction for the sake of fsync(), even if nothing has
 	 * changed, because fsync() will not force the log for this transaction
 	 * unless it sees the inode pinned.
+	 *
+	 * Note: If we're only converting cow fork extents, there aren't
+	 * any on-disk updates to make, so we don't need to log anything.
 	 */
-	bma->logflags |= tmp_logflags | XFS_ILOG_CORE;
+	if (whichfork != XFS_COW_FORK)
+		bma->logflags |= tmp_logflags | XFS_ILOG_CORE;
 	if (error)
 		return error;
 
@@ -4623,15 +4641,15 @@ xfs_bmapi_write(
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
 	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
-	ASSERT(tp != NULL);
+	ASSERT(tp != NULL ||
+	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
+			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
 	ASSERT(len > 0);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	ASSERT(!(flags & XFS_BMAPI_REMAP) || whichfork == XFS_DATA_FORK);
 	ASSERT(!(flags & XFS_BMAPI_PREALLOC) || !(flags & XFS_BMAPI_REMAP));
 	ASSERT(!(flags & XFS_BMAPI_CONVERT) || !(flags & XFS_BMAPI_REMAP));
-	ASSERT(!(flags & XFS_BMAPI_PREALLOC) || whichfork != XFS_COW_FORK);
-	ASSERT(!(flags & XFS_BMAPI_CONVERT) || whichfork != XFS_COW_FORK);
 
 	/* zeroing is for currently only for data extents, not metadata */
 	ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) !=
@@ -5653,8 +5671,8 @@ __xfs_bunmapi(
 			}
 			del.br_state = XFS_EXT_UNWRITTEN;
 			error = xfs_bmap_add_extent_unwritten_real(tp, ip,
-					&lastx, &cur, &del, firstblock, dfops,
-					&logflags);
+					whichfork, &lastx, &cur, &del,
+					firstblock, dfops, &logflags);
 			if (error)
 				goto error0;
 			goto nodelete;
@@ -5711,8 +5729,9 @@ __xfs_bunmapi(
 				prev.br_state = XFS_EXT_UNWRITTEN;
 				lastx--;
 				error = xfs_bmap_add_extent_unwritten_real(tp,
-						ip, &lastx, &cur, &prev,
-						firstblock, dfops, &logflags);
+						ip, whichfork, &lastx, &cur,
+						&prev, firstblock, dfops,
+						&logflags);
 				if (error)
 					goto error0;
 				goto nodelete;
@@ -5720,8 +5739,9 @@ __xfs_bunmapi(
 				ASSERT(del.br_state == XFS_EXT_NORM);
 				del.br_state = XFS_EXT_UNWRITTEN;
 				error = xfs_bmap_add_extent_unwritten_real(tp,
-						ip, &lastx, &cur, &del,
-						firstblock, dfops, &logflags);
+						ip, whichfork, &lastx, &cur,
+						&del, firstblock, dfops,
+						&logflags);
 				if (error)
 					goto error0;
 				goto nodelete;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 14/72] xfs: mark speculative prealloc CoW fork extents unwritten
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 13/72] xfs: allow unwritten extents in the CoW fork Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 15/72] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 5eda43000064a69a39fb7869cc63c9571535ad29 upstream.

Christoph Hellwig pointed out that there's a potentially nasty race when
performing simultaneous nearby directio cow writes:

"Thread 1 writes a range from B to c

"                    B --------- C
                           p

"a little later thread 2 writes from A to B

"        A --------- B
               p

[editor's note: the 'p' denote cowextsize boundaries, which I added to
make this more clear]

"but the code preallocates beyond B into the range where thread
"1 has just written, but ->end_io hasn't been called yet.
"But once ->end_io is called thread 2 has already allocated
"up to the extent size hint into the write range of thread 1,
"so the end_io handler will splice the unintialized blocks from
"that preallocation back into the file right after B."

We can avoid this race by ensuring that thread 1 cannot accidentally
remap the blocks that thread 2 allocated (as part of speculative
preallocation) as part of t2's write preparation in t1's end_io handler.
The way we make this happen is by taking advantage of the unwritten
extent flag as an intermediate step.

Recall that when we begin the process of writing data to shared blocks,
we create a delayed allocation extent in the CoW fork:

D: --RRRRRRSSSRRRRRRRR---
C: ------DDDDDDD---------

When a thread prepares to CoW some dirty data out to disk, it will now
convert the delalloc reservation into an /unwritten/ allocated extent in
the cow fork.  The da conversion code tries to opportunistically
allocate as much of a (speculatively prealloc'd) extent as possible, so
we may end up allocating a larger extent than we're actually writing
out:

D: --RRRRRRSSSRRRRRRRR---
U: ------UUUUUUU---------

Next, we convert only the part of the extent that we're actively
planning to write to normal (i.e. not unwritten) status:

D: --RRRRRRSSSRRRRRRRR---
U: ------UURRUUU---------

If the write succeeds, the end_cow function will now scan the relevant
range of the CoW fork for real extents and remap only the real extents
into the data fork:

D: --RRRRRRRRSRRRRRRRR---
U: ------UU--UUU---------

This ensures that we never obliterate valid data fork extents with
unwritten blocks from the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c    |    6 ++
 fs/xfs/xfs_iomap.c   |    2 
 fs/xfs/xfs_reflink.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_reflink.h |    2 
 fs/xfs/xfs_trace.h   |    8 ++-
 5 files changed, 123 insertions(+), 11 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -486,6 +486,12 @@ xfs_submit_ioend(
 	struct xfs_ioend	*ioend,
 	int			status)
 {
+	/* Convert CoW extents to regular */
+	if (!status && ioend->io_type == XFS_IO_COW) {
+		status = xfs_reflink_convert_cow(XFS_I(ioend->io_inode),
+				ioend->io_offset, ioend->io_size);
+	}
+
 	/* Reserve log space if we might write beyond the on-disk inode size. */
 	if (!status &&
 	    ioend->io_type != XFS_IO_UNWRITTEN &&
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -685,7 +685,7 @@ xfs_iomap_write_allocate(
 	int		nres;
 
 	if (whichfork == XFS_COW_FORK)
-		flags |= XFS_BMAPI_COWFORK;
+		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
 
 	/*
 	 * Make sure that the dquots are there.
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -82,11 +82,22 @@
  * mappings are a reservation against the free space in the filesystem;
  * adjacent mappings can also be combined into fewer larger mappings.
  *
+ * As an optimization, the CoW extent size hint (cowextsz) creates
+ * outsized aligned delalloc reservations in the hope of landing out of
+ * order nearby CoW writes in a single extent on disk, thereby reducing
+ * fragmentation and improving future performance.
+ *
+ * D: --RRRRRRSSSRRRRRRRR--- (data fork)
+ * C: ------DDDDDDD--------- (CoW fork)
+ *
  * When dirty pages are being written out (typically in writepage), the
- * delalloc reservations are converted into real mappings by allocating
- * blocks and replacing the delalloc mapping with real ones.  A delalloc
- * mapping can be replaced by several real ones if the free space is
- * fragmented.
+ * delalloc reservations are converted into unwritten mappings by
+ * allocating blocks and replacing the delalloc mapping with real ones.
+ * A delalloc mapping can be replaced by several unwritten ones if the
+ * free space is fragmented.
+ *
+ * D: --RRRRRRSSSRRRRRRRR---
+ * C: ------UUUUUUU---------
  *
  * We want to adapt the delalloc mechanism for copy-on-write, since the
  * write paths are similar.  The first two steps (creating the reservation
@@ -101,13 +112,29 @@
  * Block-aligned directio writes will use the same mechanism as buffered
  * writes.
  *
+ * Just prior to submitting the actual disk write requests, we convert
+ * the extents representing the range of the file actually being written
+ * (as opposed to extra pieces created for the cowextsize hint) to real
+ * extents.  This will become important in the next step:
+ *
+ * D: --RRRRRRSSSRRRRRRRR---
+ * C: ------UUrrUUU---------
+ *
  * CoW remapping must be done after the data block write completes,
  * because we don't want to destroy the old data fork map until we're sure
  * the new block has been written.  Since the new mappings are kept in a
  * separate fork, we can simply iterate these mappings to find the ones
  * that cover the file blocks that we just CoW'd.  For each extent, simply
  * unmap the corresponding range in the data fork, map the new range into
- * the data fork, and remove the extent from the CoW fork.
+ * the data fork, and remove the extent from the CoW fork.  Because of
+ * the presence of the cowextsize hint, however, we must be careful
+ * only to remap the blocks that we've actually written out --  we must
+ * never remap delalloc reservations nor CoW staging blocks that have
+ * yet to be written.  This corresponds exactly to the real extents in
+ * the CoW fork:
+ *
+ * D: --RRRRRRrrSRRRRRRRR---
+ * C: ------UU--UUU---------
  *
  * Since the remapping operation can be applied to an arbitrary file
  * range, we record the need for the remap step as a flag in the ioend
@@ -296,6 +323,65 @@ xfs_reflink_reserve_cow(
 	return 0;
 }
 
+/* Convert part of an unwritten CoW extent to a real one. */
+STATIC int
+xfs_reflink_convert_cow_extent(
+	struct xfs_inode		*ip,
+	struct xfs_bmbt_irec		*imap,
+	xfs_fileoff_t			offset_fsb,
+	xfs_filblks_t			count_fsb,
+	struct xfs_defer_ops		*dfops)
+{
+	struct xfs_bmbt_irec		irec = *imap;
+	xfs_fsblock_t			first_block;
+	int				nimaps = 1;
+
+	if (imap->br_state == XFS_EXT_NORM)
+		return 0;
+
+	xfs_trim_extent(&irec, offset_fsb, count_fsb);
+	trace_xfs_reflink_convert_cow(ip, &irec);
+	if (irec.br_blockcount == 0)
+		return 0;
+	return xfs_bmapi_write(NULL, ip, irec.br_startoff, irec.br_blockcount,
+			XFS_BMAPI_COWFORK | XFS_BMAPI_CONVERT, &first_block,
+			0, &irec, &nimaps, dfops);
+}
+
+/* Convert all of the unwritten CoW extents in a file's range to real ones. */
+int
+xfs_reflink_convert_cow(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	xfs_off_t		count)
+{
+	struct xfs_bmbt_irec	got;
+	struct xfs_defer_ops	dfops;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
+	xfs_extnum_t		idx;
+	bool			found;
+	int			error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	/* Convert all the extents to real from unwritten. */
+	for (found = xfs_iext_lookup_extent(ip, ifp, offset_fsb, &idx, &got);
+	     found && got.br_startoff < end_fsb;
+	     found = xfs_iext_get_extent(ifp, ++idx, &got)) {
+		error = xfs_reflink_convert_cow_extent(ip, &got, offset_fsb,
+				end_fsb - offset_fsb, &dfops);
+		if (error)
+			break;
+	}
+
+	/* Finish up. */
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
 /* Allocate all CoW reservations covering a range of blocks in a file. */
 static int
 __xfs_reflink_allocate_cow(
@@ -328,6 +414,7 @@ __xfs_reflink_allocate_cow(
 		goto out_unlock;
 	ASSERT(nimaps == 1);
 
+	/* Make sure there's a CoW reservation for it. */
 	error = xfs_reflink_reserve_cow(ip, &imap, &shared);
 	if (error)
 		goto out_trans_cancel;
@@ -337,14 +424,16 @@ __xfs_reflink_allocate_cow(
 		goto out_trans_cancel;
 	}
 
+	/* Allocate the entire reservation as unwritten blocks. */
 	xfs_trans_ijoin(tp, ip, 0);
 	error = xfs_bmapi_write(tp, ip, imap.br_startoff, imap.br_blockcount,
-			XFS_BMAPI_COWFORK, &first_block,
+			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, &first_block,
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK),
 			&imap, &nimaps, &dfops);
 	if (error)
 		goto out_trans_cancel;
 
+	/* Finish up. */
 	error = xfs_defer_finish(&tp, &dfops, NULL);
 	if (error)
 		goto out_trans_cancel;
@@ -389,11 +478,12 @@ xfs_reflink_allocate_cow_range(
 		if (error) {
 			trace_xfs_reflink_allocate_cow_range_error(ip, error,
 					_RET_IP_);
-			break;
+			return error;
 		}
 	}
 
-	return error;
+	/* Convert the CoW extents to regular. */
+	return xfs_reflink_convert_cow(ip, offset, count);
 }
 
 /*
@@ -669,6 +759,16 @@ xfs_reflink_end_cow(
 
 		ASSERT(!isnullstartblock(got.br_startblock));
 
+		/*
+		 * Don't remap unwritten extents; these are
+		 * speculatively preallocated CoW extents that have been
+		 * allocated but have not yet been involved in a write.
+		 */
+		if (got.br_state == XFS_EXT_UNWRITTEN) {
+			idx--;
+			goto next_extent;
+		}
+
 		/* Unmap the old blocks in the data fork. */
 		xfs_defer_init(&dfops, &firstfsb);
 		rlen = del.br_blockcount;
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -30,6 +30,8 @@ extern int xfs_reflink_reserve_cow(struc
 		struct xfs_bmbt_irec *imap, bool *shared);
 extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip,
 		xfs_off_t offset, xfs_off_t count);
+extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
+		xfs_off_t count);
 extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
 		struct xfs_bmbt_irec *imap, bool *need_alloc);
 extern int xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3183,6 +3183,7 @@ DECLARE_EVENT_CLASS(xfs_inode_irec_class
 		__field(xfs_fileoff_t, lblk)
 		__field(xfs_extlen_t, len)
 		__field(xfs_fsblock_t, pblk)
+		__field(int, state)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
@@ -3190,13 +3191,15 @@ DECLARE_EVENT_CLASS(xfs_inode_irec_class
 		__entry->lblk = irec->br_startoff;
 		__entry->len = irec->br_blockcount;
 		__entry->pblk = irec->br_startblock;
+		__entry->state = irec->br_state;
 	),
-	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x pblk %llu",
+	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x pblk %llu st %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->lblk,
 		  __entry->len,
-		  __entry->pblk)
+		  __entry->pblk,
+		  __entry->state)
 );
 #define DEFINE_INODE_IREC_EVENT(name) \
 DEFINE_EVENT(xfs_inode_irec_class, name, \
@@ -3345,6 +3348,7 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_trim
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_alloc);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_found);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_enospc);
+DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 15/72] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 14/72] xfs: mark speculative prealloc CoW fork extents unwritten Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 16/72] xfs: update ctime and mtime on clone destinatation inodes Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hou Tao, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hou Tao <houtao1@huawei.com>

commit 4dd2eb633598cb6a5a0be2fd9a2be0819f5eeb5f upstream.

After successful IO or permanent error, b_first_retry_time also
needs to be cleared, else the invalid first retry time will be
used by the next retry check.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf_item.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1162,6 +1162,7 @@ xfs_buf_iodone_callbacks(
 	 */
 	bp->b_last_error = 0;
 	bp->b_retries = 0;
+	bp->b_first_retry_time = 0;
 
 	xfs_buf_do_callbacks(bp);
 	bp->b_fspriv = NULL;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 16/72] xfs: update ctime and mtime on clone destinatation inodes
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 15/72] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 17/72] xfs: reject all unaligned direct writes to reflinked files Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit c5ecb42342852892f978572ddc6dca703460f25a upstream.

We're changing both metadata and data, so we need to update the
timestamps for clone operations.  Dedupe on the other hand does
not change file data, and only changes invisible metadata so the
timestamps should not be updated.

This follows existing btrfs behavior.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: remove redundant is_dedupe test]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_reflink.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -985,13 +985,14 @@ STATIC int
 xfs_reflink_update_dest(
 	struct xfs_inode	*dest,
 	xfs_off_t		newlen,
-	xfs_extlen_t		cowextsize)
+	xfs_extlen_t		cowextsize,
+	bool			is_dedupe)
 {
 	struct xfs_mount	*mp = dest->i_mount;
 	struct xfs_trans	*tp;
 	int			error;
 
-	if (newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0)
+	if (is_dedupe && newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0)
 		return 0;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
@@ -1012,6 +1013,10 @@ xfs_reflink_update_dest(
 		dest->i_d.di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
 	}
 
+	if (!is_dedupe) {
+		xfs_trans_ichgtime(tp, dest,
+				   XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+	}
 	xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
 
 	error = xfs_trans_commit(tp);
@@ -1528,7 +1533,8 @@ xfs_reflink_remap_range(
 	    !(dest->i_d.di_flags2 & XFS_DIFLAG2_COWEXTSIZE))
 		cowextsize = src->i_d.di_cowextsize;
 
-	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize);
+	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize,
+			is_dedupe);
 
 out_unlock:
 	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 17/72] xfs: reject all unaligned direct writes to reflinked files
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 16/72] xfs: update ctime and mtime on clone destinatation inodes Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 18/72] xfs: dont fail xfs_extent_busy allocation Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Brian Foster,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 54a4ef8af4e0dc5c983d17fcb9cf5fd25666d94e upstream.

We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback.  But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O.  To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system.  But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[slight changes in context due to the new direct I/O code in 4.10+]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c  |   45 ---------------------------------------------
 fs/xfs/xfs_file.c  |    9 +++++++++
 fs/xfs/xfs_trace.h |    2 +-
 3 files changed, 10 insertions(+), 46 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1263,44 +1263,6 @@ xfs_map_trim_size(
 	bh_result->b_size = mapping_size;
 }
 
-/* Bounce unaligned directio writes to the page cache. */
-static int
-xfs_bounce_unaligned_dio_write(
-	struct xfs_inode	*ip,
-	xfs_fileoff_t		offset_fsb,
-	struct xfs_bmbt_irec	*imap)
-{
-	struct xfs_bmbt_irec	irec;
-	xfs_fileoff_t		delta;
-	bool			shared;
-	bool			x;
-	int			error;
-
-	irec = *imap;
-	if (offset_fsb > irec.br_startoff) {
-		delta = offset_fsb - irec.br_startoff;
-		irec.br_blockcount -= delta;
-		irec.br_startblock += delta;
-		irec.br_startoff = offset_fsb;
-	}
-	error = xfs_reflink_trim_around_shared(ip, &irec, &shared, &x);
-	if (error)
-		return error;
-
-	/*
-	 * We're here because we're trying to do a directio write to a
-	 * region that isn't aligned to a filesystem block.  If any part
-	 * of the extent is shared, fall back to buffered mode to handle
-	 * the RMW.  This is done by returning -EREMCHG ("remote addr
-	 * changed"), which is caught further up the call stack.
-	 */
-	if (shared) {
-		trace_xfs_reflink_bounce_dio_write(ip, imap);
-		return -EREMCHG;
-	}
-	return 0;
-}
-
 STATIC int
 __xfs_get_blocks(
 	struct inode		*inode,
@@ -1438,13 +1400,6 @@ __xfs_get_blocks(
 	if (imap.br_startblock != HOLESTARTBLOCK &&
 	    imap.br_startblock != DELAYSTARTBLOCK &&
 	    (create || !ISUNWRITTEN(&imap))) {
-		if (create && direct && !is_cow) {
-			error = xfs_bounce_unaligned_dio_write(ip, offset_fsb,
-					&imap);
-			if (error)
-				return error;
-		}
-
 		xfs_map_buffer(inode, bh_result, &imap, offset);
 		if (ISUNWRITTEN(&imap))
 			set_buffer_unwritten(bh_result);
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -554,6 +554,15 @@ xfs_file_dio_aio_write(
 	if ((iocb->ki_pos & mp->m_blockmask) ||
 	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
 		unaligned_io = 1;
+
+		/*
+		 * We can't properly handle unaligned direct I/O to reflink
+		 * files yet, as we can't unshare a partial block.
+		 */
+		if (xfs_is_reflink_inode(ip)) {
+			trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count);
+			return -EREMCHG;
+		}
 		iolock = XFS_IOLOCK_EXCL;
 	} else {
 		iolock = XFS_IOLOCK_SHARED;
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3353,7 +3353,7 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_conv
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);
 
-DEFINE_INODE_IREC_EVENT(xfs_reflink_bounce_dio_write);
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
 DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 18/72] xfs: dont fail xfs_extent_busy allocation
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 17/72] xfs: reject all unaligned direct writes to reflinked files Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 19/72] xfs: handle indlen shortage on delalloc extent merge Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Brian Foster,
	Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 5e30c23d13919a718b22d4921dc5c0accc59da27 upstream.

We don't just need the structure to track busy extents which can be
avoided with a synchronous transaction, but also to keep track of
pending discard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_extent_busy.c |   13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -45,18 +45,7 @@ xfs_extent_busy_insert(
 	struct rb_node		**rbp;
 	struct rb_node		*parent = NULL;
 
-	new = kmem_zalloc(sizeof(struct xfs_extent_busy), KM_MAYFAIL);
-	if (!new) {
-		/*
-		 * No Memory!  Since it is now not possible to track the free
-		 * block, make this a synchronous transaction to insure that
-		 * the block is not reused before this transaction commits.
-		 */
-		trace_xfs_extent_busy_enomem(tp->t_mountp, agno, bno, len);
-		xfs_trans_set_sync(tp);
-		return;
-	}
-
+	new = kmem_zalloc(sizeof(struct xfs_extent_busy), KM_SLEEP);
 	new->agno = agno;
 	new->bno = bno;
 	new->length = len;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 19/72] xfs: handle indlen shortage on delalloc extent merge
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 18/72] xfs: dont fail xfs_extent_busy allocation Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 20/72] xfs: split indlen reservations fairly when under reserved Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 0e339ef8556d9e567aa7925f8892c263d79430d9 upstream.

When a delalloc extent is created, it can be merged with pre-existing,
contiguous, delalloc extents. When this occurs,
xfs_bmap_add_extent_hole_delay() merges the extents along with the
associated indirect block reservations. The expectation here is that the
combined worst case indlen reservation is always less than or equal to
the indlen reservation for the individual extents.

This is not always the case, however, as existing extents can less than
the expected indlen reservation if the extent was previously split due
to a hole punch. If a new extent merges with such an extent, the total
indlen requirement may be larger than the sum of the indlen reservations
held by both extents.

xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
reservation is always available and assigns it to the merged extent
without consideration for the indlen held by the pre-existing extent. As
a result, the subsequent xfs_mod_fdblocks() call can attempt an
unintentional allocation rather than a free (indicated by an ASSERT()
failure). Further, if the allocation happens to fail in this context,
the failure goes unhandled and creates a filesystem wide block
accounting inconsistency.

Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
indlen reservation assigned to the merged extent to the sum of the
indlen reservations held by each of the individual extents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2907,7 +2907,8 @@ xfs_bmap_add_extent_hole_delay(
 		oldlen = startblockval(left.br_startblock) +
 			startblockval(new->br_startblock) +
 			startblockval(right.br_startblock);
-		newlen = xfs_bmap_worst_indlen(ip, temp);
+		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+					 oldlen);
 		xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, *idx),
 			nullstartblock((int)newlen));
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
@@ -2928,7 +2929,8 @@ xfs_bmap_add_extent_hole_delay(
 		xfs_bmbt_set_blockcount(xfs_iext_get_ext(ifp, *idx), temp);
 		oldlen = startblockval(left.br_startblock) +
 			startblockval(new->br_startblock);
-		newlen = xfs_bmap_worst_indlen(ip, temp);
+		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+					 oldlen);
 		xfs_bmbt_set_startblock(xfs_iext_get_ext(ifp, *idx),
 			nullstartblock((int)newlen));
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
@@ -2944,7 +2946,8 @@ xfs_bmap_add_extent_hole_delay(
 		temp = new->br_blockcount + right.br_blockcount;
 		oldlen = startblockval(new->br_startblock) +
 			startblockval(right.br_startblock);
-		newlen = xfs_bmap_worst_indlen(ip, temp);
+		newlen = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+					 oldlen);
 		xfs_bmbt_set_allf(xfs_iext_get_ext(ifp, *idx),
 			new->br_startoff,
 			nullstartblock((int)newlen), temp, right.br_state);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 20/72] xfs: split indlen reservations fairly when under reserved
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 19/72] xfs: handle indlen shortage on delalloc extent merge Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 21/72] xfs: fix uninitialized variable in _reflink_convert_cow Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Patrick Dung, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 75d65361cf3c0dae2af970c305e19c727b28a510 upstream.

Certain workoads that punch holes into speculative preallocation can
cause delalloc indirect reservation splits when the delalloc extent is
split in two. If further splits occur, an already short-handed extent
can be split into two in a manner that leaves zero indirect blocks for
one of the two new extents. This occurs because the shortage is large
enough that the xfs_bmap_split_indlen() algorithm completely drains the
requested indlen of one of the extents before it honors the existing
reservation.

This ultimately results in a warning from xfs_bmap_del_extent(). This
has been observed during file copies of large, sparse files using 'cp
--sparse=always.'

To avoid this problem, update xfs_bmap_split_indlen() to explicitly
apply the reservation shortage fairly between both extents. This smooths
out the overall indlen shortage and defers the situation where we end up
with a delalloc extent with zero indlen reservation to extreme
circumstances.

Reported-by: Patrick Dung <mpatdung@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |   61 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 18 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4899,34 +4899,59 @@ xfs_bmap_split_indlen(
 	xfs_filblks_t			len2 = *indlen2;
 	xfs_filblks_t			nres = len1 + len2; /* new total res. */
 	xfs_filblks_t			stolen = 0;
+	xfs_filblks_t			resfactor;
 
 	/*
 	 * Steal as many blocks as we can to try and satisfy the worst case
 	 * indlen for both new extents.
 	 */
-	while (nres > ores && avail) {
-		nres--;
-		avail--;
-		stolen++;
-	}
+	if (ores < nres && avail)
+		stolen = XFS_FILBLKS_MIN(nres - ores, avail);
+	ores += stolen;
+
+	 /* nothing else to do if we've satisfied the new reservation */
+	if (ores >= nres)
+		return stolen;
+
+	/*
+	 * We can't meet the total required reservation for the two extents.
+	 * Calculate the percent of the overall shortage between both extents
+	 * and apply this percentage to each of the requested indlen values.
+	 * This distributes the shortage fairly and reduces the chances that one
+	 * of the two extents is left with nothing when extents are repeatedly
+	 * split.
+	 */
+	resfactor = (ores * 100);
+	do_div(resfactor, nres);
+	len1 *= resfactor;
+	do_div(len1, 100);
+	len2 *= resfactor;
+	do_div(len2, 100);
+	ASSERT(len1 + len2 <= ores);
+	ASSERT(len1 < *indlen1 && len2 < *indlen2);
 
 	/*
-	 * The only blocks available are those reserved for the original
-	 * extent and what we can steal from the extent being removed.
-	 * If this still isn't enough to satisfy the combined
-	 * requirements for the two new extents, skim blocks off of each
-	 * of the new reservations until they match what is available.
+	 * Hand out the remainder to each extent. If one of the two reservations
+	 * is zero, we want to make sure that one gets a block first. The loop
+	 * below starts with len1, so hand len2 a block right off the bat if it
+	 * is zero.
 	 */
-	while (nres > ores) {
-		if (len1) {
-			len1--;
-			nres--;
+	ores -= (len1 + len2);
+	ASSERT((*indlen1 - len1) + (*indlen2 - len2) >= ores);
+	if (ores && !len2 && *indlen2) {
+		len2++;
+		ores--;
+	}
+	while (ores) {
+		if (len1 < *indlen1) {
+			len1++;
+			ores--;
 		}
-		if (nres == ores)
+		if (!ores)
 			break;
-		if (len2) {
-			len2--;
-			nres--;
+		if (len2 < *indlen2) {
+			len2++;
+			ores--;
 		}
 	}
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 21/72] xfs: fix uninitialized variable in _reflink_convert_cow
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 20/72] xfs: split indlen reservations fairly when under reserved Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 22/72] xfs: dont reserve blocks for right shift transactions Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Darrick J. Wong <darrick.wong@oracle.com>

commit 93aaead52a9eebdc20dc8fa673c350e592a06949 upstream.

Fix an uninitialize variable.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_reflink.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -363,7 +363,7 @@ xfs_reflink_convert_cow(
 	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, offset + count);
 	xfs_extnum_t		idx;
 	bool			found;
-	int			error;
+	int			error = 0;
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 22/72] xfs: dont reserve blocks for right shift transactions
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 21/72] xfs: fix uninitialized variable in _reflink_convert_cow Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 23/72] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ross Zwisler, Brian Foster, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 48af96ab92bc68fb645068b978ce36df2379e076 upstream.

The block reservation for the transaction allocated in
xfs_shift_file_space() is an artifact of the original collapse range
support. It exists to handle the case where a collapse range occurs,
the initial extent is left shifted into a location that forms a
contiguous boundary with the previous extent and thus the extents
are merged. This code was subsequently refactored and reused for
insert range (right shift) support.

If an insert range occurs under low free space conditions, the
extent at the starting offset is split before the first shift
transaction is allocated. If the block reservation fails, this
leaves separate, but contiguous extents around in the inode. While
not a fatal problem, this is unexpected and will flag a warning on
subsequent insert range operations on the inode. This problem has
been reproduce intermittently by generic/270 running against a
ramdisk device.

Since right shift does not create new extent boundaries in the
inode, a block reservation for extent merge is unnecessary. Update
xfs_shift_file_space() to conditionally reserve fs blocks for left
shift transactions only. This avoids the warning reproduced by
generic/270.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_bmap_util.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1387,10 +1387,16 @@ xfs_shift_file_space(
 	xfs_fileoff_t		stop_fsb;
 	xfs_fileoff_t		next_fsb;
 	xfs_fileoff_t		shift_fsb;
+	uint			resblks;
 
 	ASSERT(direction == SHIFT_LEFT || direction == SHIFT_RIGHT);
 
 	if (direction == SHIFT_LEFT) {
+		/*
+		 * Reserve blocks to cover potential extent merges after left
+		 * shift operations.
+		 */
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
 		next_fsb = XFS_B_TO_FSB(mp, offset + len);
 		stop_fsb = XFS_B_TO_FSB(mp, VFS_I(ip)->i_size);
 	} else {
@@ -1398,6 +1404,7 @@ xfs_shift_file_space(
 		 * If right shift, delegate the work of initialization of
 		 * next_fsb to xfs_bmap_shift_extent as it has ilock held.
 		 */
+		resblks = 0;
 		next_fsb = NULLFSBLOCK;
 		stop_fsb = XFS_B_TO_FSB(mp, offset);
 	}
@@ -1439,21 +1446,14 @@ xfs_shift_file_space(
 	}
 
 	while (!error && !done) {
-		/*
-		 * We would need to reserve permanent block for transaction.
-		 * This will come into picture when after shifting extent into
-		 * hole we found that adjacent extents can be merged which
-		 * may lead to freeing of a block during record update.
-		 */
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write,
-				XFS_DIOSTRAT_SPACE_RES(mp, 0), 0, 0, &tp);
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0,
+					&tp);
 		if (error)
 			break;
 
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
-				ip->i_gdquot, ip->i_pdquot,
-				XFS_DIOSTRAT_SPACE_RES(mp, 0), 0,
+				ip->i_gdquot, ip->i_pdquot, resblks, 0,
 				XFS_QMOPT_RES_REGBLKS);
 		if (error)
 			goto out_trans_cancel;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 23/72] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 22/72] xfs: dont reserve blocks for right shift transactions Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 24/72] xfs: tune down agno asserts in the bmap code Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Darrick J. Wong, Chandan Rajendra,
	Christoph Hellwig

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chandan Rajendra <chandan@linux.vnet.ibm.com>

commit 8ee9fdbebc84b39f1d1c201c5e32277c61d034aa upstream.

On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,

XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026

kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
DEBUG_PAGEALLOC
NUMA
pSeries
Modules linked in:
CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
task: c000000102606d80 task.stack: c0000001026b8000
NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
CR: 28004428  XER: 00000000
CFAR: c0000000004ef180 SOFTE: 1
GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
NIP [c0000000004ef798] .assfail+0x28/0x30
LR [c0000000004ef798] .assfail+0x28/0x30
Call Trace:
[c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
[c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
[c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
[c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
[c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
[c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
[c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
[c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
[c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
[c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
[c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
[c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
[c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
Instruction dump:
4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. This causes
xfs_ialloc_cluster_alignment() to return 0.  Due to this
args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
equivalent of -1 assigned to it. This later causes alloc_len in
xfs_alloc_space_available() to have a value of 0. In such a scenario
when args.total is also 0, the assert statement "ASSERT(args->maxlen >
0);" fails.

This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_ialloc.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -51,8 +51,7 @@ xfs_ialloc_cluster_alignment(
 	struct xfs_mount	*mp)
 {
 	if (xfs_sb_version_hasalign(&mp->m_sb) &&
-	    mp->m_sb.sb_inoalignmt >=
-			XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size))
+	    mp->m_sb.sb_inoalignmt >= xfs_icluster_size_fsb(mp))
 		return mp->m_sb.sb_inoalignmt;
 	return 1;
 }

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 24/72] xfs: tune down agno asserts in the bmap code
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 23/72] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 25/72] xfs: only reclaim unwritten COW extents periodically Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 410d17f67e583559be3a922f8b6cc336331893f3 upstream.

In various places we currently assert that xfs_bmap_btalloc allocates
from the same as the firstblock value passed in, unless it's either
NULLAGNO or the dop_low flag is set.  But the reflink code does not
fully follow this convention as it passes in firstblock purely as
a hint for the allocator without actually having previous allocations
in the transaction, and without having a minleft check on the current
AG, leading to the assert firing on a very full and heavily used
file system.  As even the reflink code only allocates from equal or
higher AGs for now we can simply the check to always allow for equal
or higher AGs.

Note that we need to eventually split the two meanings of the firstblock
value.  At that point we can also allow the reflink code to allocate
from any AG instead of limiting it in any way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -804,9 +804,7 @@ try_another_ag:
 	 */
 	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(*firstblock == NULLFSBLOCK ||
-	       args.agno == XFS_FSB_TO_AGNO(mp, *firstblock) ||
-	       (dfops->dop_low &&
-		args.agno > XFS_FSB_TO_AGNO(mp, *firstblock)));
+	       args.agno >= XFS_FSB_TO_AGNO(mp, *firstblock));
 	*firstblock = cur->bc_private.b.firstblock = args.fsbno;
 	cur->bc_private.b.allocated++;
 	ip->i_d.di_nblocks++;
@@ -3923,17 +3921,13 @@ xfs_bmap_btalloc(
 		 * the first block that was allocated.
 		 */
 		ASSERT(*ap->firstblock == NULLFSBLOCK ||
-		       XFS_FSB_TO_AGNO(mp, *ap->firstblock) ==
-		       XFS_FSB_TO_AGNO(mp, args.fsbno) ||
-		       (ap->dfops->dop_low &&
-			XFS_FSB_TO_AGNO(mp, *ap->firstblock) <
-			XFS_FSB_TO_AGNO(mp, args.fsbno)));
+		       XFS_FSB_TO_AGNO(mp, *ap->firstblock) <=
+		       XFS_FSB_TO_AGNO(mp, args.fsbno));
 
 		ap->blkno = args.fsbno;
 		if (*ap->firstblock == NULLFSBLOCK)
 			*ap->firstblock = args.fsbno;
-		ASSERT(nullfb || fb_agno == args.agno ||
-		       (ap->dfops->dop_low && fb_agno < args.agno));
+		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
 		if (!(ap->flags & XFS_BMAPI_COWFORK))
 			ap->ip->i_d.di_nblocks += args.len;
@@ -4858,13 +4852,9 @@ error0:
 	if (bma.cur) {
 		if (!error) {
 			ASSERT(*firstblock == NULLFSBLOCK ||
-			       XFS_FSB_TO_AGNO(mp, *firstblock) ==
+			       XFS_FSB_TO_AGNO(mp, *firstblock) <=
 			       XFS_FSB_TO_AGNO(mp,
-				       bma.cur->bc_private.b.firstblock) ||
-			       (dfops->dop_low &&
-				XFS_FSB_TO_AGNO(mp, *firstblock) <
-				XFS_FSB_TO_AGNO(mp,
-					bma.cur->bc_private.b.firstblock)));
+				       bma.cur->bc_private.b.firstblock));
 			*firstblock = bma.cur->bc_private.b.firstblock;
 		}
 		xfs_btree_del_cursor(bma.cur,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 25/72] xfs: only reclaim unwritten COW extents periodically
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 24/72] xfs: tune down agno asserts in the bmap code Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 26/72] xfs: fix and streamline error handling in xfs_end_io Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 3802a345321a08093ba2ddb1849e736f84e8d450 upstream.

We only want to reclaim preallocations from our periodic work item.
Currently this is archived by looking for a dirty inode, but that check
is rather fragile.  Instead add a flag to xfs_reflink_cancel_cow_* so
that the caller can ask for just cancelling unwritten extents in the COW
fork.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix typos in commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c    |    2 +-
 fs/xfs/xfs_icache.c  |    2 +-
 fs/xfs/xfs_inode.c   |    2 +-
 fs/xfs/xfs_reflink.c |   23 ++++++++++++++++-------
 fs/xfs/xfs_reflink.h |    4 ++--
 fs/xfs/xfs_super.c   |    2 +-
 6 files changed, 22 insertions(+), 13 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -298,7 +298,7 @@ xfs_end_io(
 			goto done;
 		if (ioend->io_bio->bi_error) {
 			error = xfs_reflink_cancel_cow_range(ip,
-					ioend->io_offset, ioend->io_size);
+					ioend->io_offset, ioend->io_size, true);
 			goto done;
 		}
 		error = xfs_reflink_end_cow(ip, ioend->io_offset,
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1610,7 +1610,7 @@ xfs_inode_free_cowblocks(
 	xfs_ilock(ip, XFS_IOLOCK_EXCL);
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 
-	ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF);
+	ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 
 	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1624,7 +1624,7 @@ xfs_itruncate_extents(
 
 	/* Remove all pending CoW reservations. */
 	error = xfs_reflink_cancel_cow_blocks(ip, &tp, first_unmap_block,
-			last_block);
+			last_block, true);
 	if (error)
 		goto out;
 
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -571,14 +571,18 @@ xfs_reflink_trim_irec_to_next_cow(
 }
 
 /*
- * Cancel all pending CoW reservations for some block range of an inode.
+ * Cancel CoW reservations for some block range of an inode.
+ *
+ * If cancel_real is true this function cancels all COW fork extents for the
+ * inode; if cancel_real is false, real extents are not cleared.
  */
 int
 xfs_reflink_cancel_cow_blocks(
 	struct xfs_inode		*ip,
 	struct xfs_trans		**tpp,
 	xfs_fileoff_t			offset_fsb,
-	xfs_fileoff_t			end_fsb)
+	xfs_fileoff_t			end_fsb,
+	bool				cancel_real)
 {
 	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
 	struct xfs_bmbt_irec		got, prev, del;
@@ -605,7 +609,7 @@ xfs_reflink_cancel_cow_blocks(
 					&idx, &got, &del);
 			if (error)
 				break;
-		} else {
+		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
 			xfs_trans_ijoin(*tpp, ip, 0);
 			xfs_defer_init(&dfops, &firstfsb);
 
@@ -648,13 +652,17 @@ xfs_reflink_cancel_cow_blocks(
 }
 
 /*
- * Cancel all pending CoW reservations for some byte range of an inode.
+ * Cancel CoW reservations for some byte range of an inode.
+ *
+ * If cancel_real is true this function cancels all COW fork extents for the
+ * inode; if cancel_real is false, real extents are not cleared.
  */
 int
 xfs_reflink_cancel_cow_range(
 	struct xfs_inode	*ip,
 	xfs_off_t		offset,
-	xfs_off_t		count)
+	xfs_off_t		count,
+	bool			cancel_real)
 {
 	struct xfs_trans	*tp;
 	xfs_fileoff_t		offset_fsb;
@@ -680,7 +688,8 @@ xfs_reflink_cancel_cow_range(
 	xfs_trans_ijoin(tp, ip, 0);
 
 	/* Scrape out the old CoW reservations */
-	error = xfs_reflink_cancel_cow_blocks(ip, &tp, offset_fsb, end_fsb);
+	error = xfs_reflink_cancel_cow_blocks(ip, &tp, offset_fsb, end_fsb,
+			cancel_real);
 	if (error)
 		goto out_cancel;
 
@@ -1686,7 +1695,7 @@ next:
 	 * We didn't find any shared blocks so turn off the reflink flag.
 	 * First, get rid of any leftover CoW mappings.
 	 */
-	error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF);
+	error = xfs_reflink_cancel_cow_blocks(ip, tpp, 0, NULLFILEOFF, true);
 	if (error)
 		return error;
 
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -39,9 +39,9 @@ extern int xfs_reflink_trim_irec_to_next
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
-		xfs_fileoff_t end_fsb);
+		xfs_fileoff_t end_fsb, bool cancel_real);
 extern int xfs_reflink_cancel_cow_range(struct xfs_inode *ip, xfs_off_t offset,
-		xfs_off_t count);
+		xfs_off_t count, bool cancel_real);
 extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 extern int xfs_reflink_recover_cow(struct xfs_mount *mp);
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -948,7 +948,7 @@ xfs_fs_destroy_inode(
 	XFS_STATS_INC(ip->i_mount, vn_remove);
 
 	if (xfs_is_reflink_inode(ip)) {
-		error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF);
+		error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
 		if (error && !XFS_FORCED_SHUTDOWN(ip->i_mount))
 			xfs_warn(ip->i_mount,
 "Error %d while evicting CoW blocks for inode %llu.",

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 26/72] xfs: fix and streamline error handling in xfs_end_io
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 25/72] xfs: only reclaim unwritten COW extents periodically Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 27/72] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 787eb485509f9d58962bd8b4dbc6a5ac6e2034fe upstream.

There are two different cases of buffered I/O errors:

 - first we can have an already shutdown fs.  In that case we should skip
   any on-disk operations and just clean up the appen transaction if
   present and destroy the ioend
 - a real I/O error.  In that case we should cleanup any lingering COW
   blocks.  This gets skipped in the current code and is fixed by this
   patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_aops.c |   59 ++++++++++++++++++++++++------------------------------
 1 file changed, 27 insertions(+), 32 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -279,54 +279,49 @@ xfs_end_io(
 	struct xfs_ioend	*ioend =
 		container_of(work, struct xfs_ioend, io_work);
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	xfs_off_t		offset = ioend->io_offset;
+	size_t			size = ioend->io_size;
 	int			error = ioend->io_bio->bi_error;
 
 	/*
-	 * Set an error if the mount has shut down and proceed with end I/O
-	 * processing so it can perform whatever cleanups are necessary.
+	 * Just clean up the in-memory strutures if the fs has been shut down.
 	 */
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		error = -EIO;
+		goto done;
+	}
 
 	/*
-	 * For a CoW extent, we need to move the mapping from the CoW fork
-	 * to the data fork.  If instead an error happened, just dump the
-	 * new blocks.
+	 * Clean up any COW blocks on an I/O error.
 	 */
-	if (ioend->io_type == XFS_IO_COW) {
-		if (error)
-			goto done;
-		if (ioend->io_bio->bi_error) {
-			error = xfs_reflink_cancel_cow_range(ip,
-					ioend->io_offset, ioend->io_size, true);
-			goto done;
+	if (unlikely(error)) {
+		switch (ioend->io_type) {
+		case XFS_IO_COW:
+			xfs_reflink_cancel_cow_range(ip, offset, size, true);
+			break;
 		}
-		error = xfs_reflink_end_cow(ip, ioend->io_offset,
-				ioend->io_size);
-		if (error)
-			goto done;
+
+		goto done;
 	}
 
 	/*
-	 * For unwritten extents we need to issue transactions to convert a
-	 * range to normal written extens after the data I/O has finished.
-	 * Detecting and handling completion IO errors is done individually
-	 * for each case as different cleanup operations need to be performed
-	 * on error.
+	 * Success:  commit the COW or unwritten blocks if needed.
 	 */
-	if (ioend->io_type == XFS_IO_UNWRITTEN) {
-		if (error)
-			goto done;
-		error = xfs_iomap_write_unwritten(ip, ioend->io_offset,
-						  ioend->io_size);
-	} else if (ioend->io_append_trans) {
-		error = xfs_setfilesize_ioend(ioend, error);
-	} else {
-		ASSERT(!xfs_ioend_is_append(ioend) ||
-		       ioend->io_type == XFS_IO_COW);
+	switch (ioend->io_type) {
+	case XFS_IO_COW:
+		error = xfs_reflink_end_cow(ip, offset, size);
+		break;
+	case XFS_IO_UNWRITTEN:
+		error = xfs_iomap_write_unwritten(ip, offset, size);
+		break;
+	default:
+		ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_append_trans);
+		break;
 	}
 
 done:
+	if (ioend->io_append_trans)
+		error = xfs_setfilesize_ioend(ioend, error);
 	xfs_destroy_ioend(ioend, error);
 }
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 27/72] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 26/72] xfs: fix and streamline error handling in xfs_end_io Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 28/72] xfs: use iomap new flag for newly allocated delalloc blocks Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Chandan Rajendra, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chandan Rajendra <chandan@linux.vnet.ibm.com>

commit d5825712ee98d68a2c17bc89dad2c30276894cba upstream.

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. Hence in
xfs_set_inoalignment(), xfs_mount->m_inoalign_mask gets initialized to
-1 instead of 0. However, xfs_mount->m_sinoalign would get correctly
intialized to 0 because for every positive value of xfs_mount->m_dalign,
the condition "!(mp->m_dalign & mp->m_inoalign_mask)" would evaluate to
false.

Also, xfs_imap() worked fine even with xfs_mount->m_inoalign_mask having
-1 as the value because blks_per_cluster variable would have the value 1
and hence we would never have a need to use xfs_mount->m_inoalign_mask
to compute the inode chunk's agbno and offset within the chunk.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_mount.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -502,8 +502,7 @@ STATIC void
 xfs_set_inoalignment(xfs_mount_t *mp)
 {
 	if (xfs_sb_version_hasalign(&mp->m_sb) &&
-	    mp->m_sb.sb_inoalignmt >=
-	    XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size))
+		mp->m_sb.sb_inoalignmt >= xfs_icluster_size_fsb(mp))
 		mp->m_inoalign_mask = mp->m_sb.sb_inoalignmt - 1;
 	else
 		mp->m_inoalign_mask = 0;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 28/72] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 27/72] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 29/72] xfs: try any AG when allocating the first btree block when reflinking Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Xiong Zhou, Brian Foster,
	Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit f65e6fad293b3a5793b7fa2044800506490e7a2e upstream.

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c |   24 ++++++++++++++----------
 fs/xfs/xfs_iomap.c       |   16 +++++++++++-----
 2 files changed, 25 insertions(+), 15 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4253,6 +4253,19 @@ xfs_bmapi_read(
 	return 0;
 }
 
+/*
+ * Add a delayed allocation extent to an inode. Blocks are reserved from the
+ * global pool and the extent inserted into the inode in-core extent tree.
+ *
+ * On entry, got refers to the first extent beyond the offset of the extent to
+ * allocate or eof is specified if no such extent exists. On return, got refers
+ * to the extent record that was inserted to the inode fork.
+ *
+ * Note that the allocated extent may have been merged with contiguous extents
+ * during insertion into the inode fork. Thus, got does not reflect the current
+ * state of the inode fork on return. If necessary, the caller can use lastx to
+ * look up the updated record in the inode fork.
+ */
 int
 xfs_bmapi_reserve_delalloc(
 	struct xfs_inode	*ip,
@@ -4339,13 +4352,8 @@ xfs_bmapi_reserve_delalloc(
 	got->br_startblock = nullstartblock(indlen);
 	got->br_blockcount = alen;
 	got->br_state = XFS_EXT_NORM;
-	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
-	/*
-	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
-	 * might have merged it into one of the neighbouring ones.
-	 */
-	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
+	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
 	 * Tag the inode if blocks were preallocated. Note that COW fork
@@ -4357,10 +4365,6 @@ xfs_bmapi_reserve_delalloc(
 	if (whichfork == XFS_COW_FORK && (prealloc || aoff < off || alen > len))
 		xfs_inode_set_cowblocks_tag(ip);
 
-	ASSERT(got->br_startoff <= aoff);
-	ASSERT(got->br_startoff + got->br_blockcount >= aoff + alen);
-	ASSERT(isnullstartblock(got->br_startblock));
-	ASSERT(got->br_state == XFS_EXT_NORM);
 	return 0;
 
 out_unreserve_blocks:
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -637,6 +637,11 @@ retry:
 		goto out_unlock;
 	}
 
+	/*
+	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+	 * them out if the write happens to fail.
+	 */
+	iomap->flags = IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1061,7 +1066,8 @@ xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
 	loff_t			offset,
 	loff_t			length,
-	ssize_t			written)
+	ssize_t			written,
+	struct iomap		*iomap)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
@@ -1080,14 +1086,14 @@ xfs_file_iomap_end_delalloc(
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
 	/*
-	 * Trim back delalloc blocks if we didn't manage to write the whole
-	 * range reserved.
+	 * Trim delalloc blocks if they were allocated by this write and we
+	 * didn't manage to write the whole range.
 	 *
 	 * We don't need to care about racing delalloc as we hold i_mutex
 	 * across the reserve/allocate/unreserve calls. If there are delalloc
 	 * blocks in the range, they are ours.
 	 */
-	if (start_fsb < end_fsb) {
+	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
@@ -1117,7 +1123,7 @@ xfs_file_iomap_end(
 {
 	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written);
+				length, written, iomap);
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 29/72] xfs: try any AG when allocating the first btree block when reflinking
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 28/72] xfs: use iomap new flag for newly allocated delalloc blocks Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 30/72] scsi: sg: check length passed to SG_NEXT_CMD_LEN Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Christoph Hellwig, Darrick J. Wong

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <hch@lst.de>

commit 2fcc319d2467a5f5b78f35f79fd6e22741a31b1e upstream.

When a reflink operation causes the bmap code to allocate a btree block
we're currently doing single-AG allocations due to having ->firstblock
set and then try any higher AG due a little reflink quirk we've put in
when adding the reflink code.  But given that we do not have a minleft
reservation of any kind in this AG we can still not have any space in
the same or higher AG even if the file system has enough free space.
To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
path instead.

[And yes, we need to redo this properly instead of piling hacks over
 hacks.  I'm working on that, but it's not going to be a small series.
 In the meantime this fixes the customer reported issue]

Also add a warning for failing allocations to make it easier to debug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/libxfs/xfs_bmap.c       |   10 +++++++---
 fs/xfs/libxfs/xfs_bmap_btree.c |    6 +++---
 2 files changed, 10 insertions(+), 6 deletions(-)

--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -769,8 +769,8 @@ xfs_bmap_extents_to_btree(
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
 	} else if (dfops->dop_low) {
-try_another_ag:
 		args.type = XFS_ALLOCTYPE_START_BNO;
+try_another_ag:
 		args.fsbno = *firstblock;
 	} else {
 		args.type = XFS_ALLOCTYPE_NEAR_BNO;
@@ -796,13 +796,17 @@ try_another_ag:
 	if (xfs_sb_version_hasreflink(&cur->bc_mp->m_sb) &&
 	    args.fsbno == NULLFSBLOCK &&
 	    args.type == XFS_ALLOCTYPE_NEAR_BNO) {
-		dfops->dop_low = true;
+		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		goto try_another_ag;
 	}
+	if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) {
+		xfs_iroot_realloc(ip, -1, whichfork);
+		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+		return -ENOSPC;
+	}
 	/*
 	 * Allocation can't fail, the space was reserved.
 	 */
-	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(*firstblock == NULLFSBLOCK ||
 	       args.agno >= XFS_FSB_TO_AGNO(mp, *firstblock));
 	*firstblock = cur->bc_private.b.firstblock = args.fsbno;
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -453,8 +453,8 @@ xfs_bmbt_alloc_block(
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
-try_another_ag:
 		args.type = XFS_ALLOCTYPE_START_BNO;
+try_another_ag:
 		/*
 		 * Make sure there is sufficient room left in the AG to
 		 * complete a full tree split for an extent insert.  If
@@ -494,8 +494,8 @@ try_another_ag:
 	if (xfs_sb_version_hasreflink(&cur->bc_mp->m_sb) &&
 	    args.fsbno == NULLFSBLOCK &&
 	    args.type == XFS_ALLOCTYPE_NEAR_BNO) {
-		cur->bc_private.b.dfops->dop_low = true;
 		args.fsbno = cur->bc_private.b.firstblock;
+		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		goto try_another_ag;
 	}
 
@@ -512,7 +512,7 @@ try_another_ag:
 			goto error0;
 		cur->bc_private.b.dfops->dop_low = true;
 	}
-	if (args.fsbno == NULLFSBLOCK) {
+	if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) {
 		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
 		*stat = 0;
 		return 0;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 30/72] scsi: sg: check length passed to SG_NEXT_CMD_LEN
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 29/72] xfs: try any AG when allocating the first btree block when reflinking Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 31/72] scsi: libsas: fix ata xfer length Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Peter Chang, Douglas Gilbert,
	Martin K. Petersen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: peter chang <dpf@google.com>

commit bf33f87dd04c371ea33feb821b60d63d754e3124 upstream.

The user can control the size of the next command passed along, but the
value passed to the ioctl isn't checked against the usable max command
size.

Signed-off-by: Peter Chang <dpf@google.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/sg.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -998,6 +998,8 @@ sg_ioctl(struct file *filp, unsigned int
 		result = get_user(val, ip);
 		if (result)
 			return result;
+		if (val > SG_MAX_CDB_SIZE)
+			return -ENOMEM;
 		sfp->next_cmd_len = (val > 0) ? val : 0;
 		return 0;
 	case SG_GET_VERSION_NUM:

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 31/72] scsi: libsas: fix ata xfer length
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 30/72] scsi: sg: check length passed to SG_NEXT_CMD_LEN Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 32/72] scsi: scsi_dh_alua: Check scsi_device_get() return value Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, John Garry, Martin K. Petersen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: John Garry <john.garry@huawei.com>

commit 9702c67c6066f583b629cf037d2056245bb7a8e6 upstream.

The total ata xfer length may not be calculated properly, in that we do
not use the proper method to get an sg element dma length.

According to the code comment, sg_dma_len() should be used after
dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of the hisi_sas
controller in hip07. Multiple sg elements were being combined into a
single element, but the original first element length was being use as
the total xfer length.

Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/libsas/sas_ata.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -221,7 +221,7 @@ static unsigned int sas_ata_qc_issue(str
 		task->num_scatter = qc->n_elem;
 	} else {
 		for_each_sg(qc->sg, sg, qc->n_elem, si)
-			xfer += sg->length;
+			xfer += sg_dma_len(sg);
 
 		task->total_xfer_len = xfer;
 		task->num_scatter = si;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 32/72] scsi: scsi_dh_alua: Check scsi_device_get() return value
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 31/72] scsi: libsas: fix ata xfer length Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 33/72] scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bart Van Assche, Hannes Reinecke,
	Tang Junhui, Martin K. Petersen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bart.vanassche@sandisk.com>

commit 625fe857e4fac6518716f3c0ff5e5deb8ec6d238 upstream.

Do not queue ALUA work nor call scsi_device_put() if the
scsi_device_get() call fails. This patch fixes the following crash:

general protection fault: 0000 [#1] SMP
RIP: 0010:scsi_device_put+0xb/0x30
Call Trace:
 scsi_disk_put+0x2d/0x40
 sd_release+0x3d/0xb0
 __blkdev_put+0x29e/0x360
 blkdev_put+0x49/0x170
 dm_put_table_device+0x58/0xc0 [dm_mod]
 dm_put_device+0x70/0xc0 [dm_mod]
 free_priority_group+0x92/0xc0 [dm_multipath]
 free_multipath+0x70/0xc0 [dm_multipath]
 multipath_dtr+0x19/0x20 [dm_multipath]
 dm_table_destroy+0x67/0x120 [dm_mod]
 dev_suspend+0xde/0x240 [dm_mod]
 ctl_ioctl+0x1f5/0x520 [dm_mod]
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Fixes: commit 03197b61c5ec ("scsi_dh_alua: Use workqueue for RTPG")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/device_handler/scsi_dh_alua.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -870,7 +870,7 @@ static void alua_rtpg_queue(struct alua_
 	unsigned long flags;
 	struct workqueue_struct *alua_wq = kaluad_wq;
 
-	if (!pg)
+	if (!pg || scsi_device_get(sdev))
 		return;
 
 	spin_lock_irqsave(&pg->lock, flags);
@@ -884,14 +884,12 @@ static void alua_rtpg_queue(struct alua_
 		pg->flags |= ALUA_PG_RUN_RTPG;
 		kref_get(&pg->kref);
 		pg->rtpg_sdev = sdev;
-		scsi_device_get(sdev);
 		start_queue = 1;
 	} else if (!(pg->flags & ALUA_PG_RUN_RTPG) && force) {
 		pg->flags |= ALUA_PG_RUN_RTPG;
 		/* Do not queue if the worker is already running */
 		if (!(pg->flags & ALUA_PG_RUNNING)) {
 			kref_get(&pg->kref);
-			sdev = NULL;
 			start_queue = 1;
 		}
 	}
@@ -900,13 +898,15 @@ static void alua_rtpg_queue(struct alua_
 		alua_wq = kaluad_sync_wq;
 	spin_unlock_irqrestore(&pg->lock, flags);
 
-	if (start_queue &&
-	    !queue_delayed_work(alua_wq, &pg->rtpg_work,
-				msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS))) {
-		if (sdev)
-			scsi_device_put(sdev);
-		kref_put(&pg->kref, release_port_group);
+	if (start_queue) {
+		if (queue_delayed_work(alua_wq, &pg->rtpg_work,
+				msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS)))
+			sdev = NULL;
+		else
+			kref_put(&pg->kref, release_port_group);
 	}
+	if (sdev)
+		scsi_device_put(sdev);
 }
 
 /*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 33/72] scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 32/72] scsi: scsi_dh_alua: Check scsi_device_get() return value Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 35/72] ALSA: seq: Fix race during FIFO resize Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Bart Van Assche, Hannes Reinecke,
	Tang Junhui, Martin K. Petersen

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bart Van Assche <bart.vanassche@sandisk.com>

commit 7cb689fe42927281b8d98606ae5450173fcc66a9 upstream.

Callers of scsi_dh_activate(), e.g. dm-mpath, assume that this function
either returns an error code or calls the completion function. Make
alua_activate() call the completion function even if scsi_device_get()
fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/device_handler/scsi_dh_alua.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -113,7 +113,7 @@ struct alua_queue_data {
 #define ALUA_POLICY_SWITCH_ALL		1
 
 static void alua_rtpg_work(struct work_struct *work);
-static void alua_rtpg_queue(struct alua_port_group *pg,
+static bool alua_rtpg_queue(struct alua_port_group *pg,
 			    struct scsi_device *sdev,
 			    struct alua_queue_data *qdata, bool force);
 static void alua_check(struct scsi_device *sdev, bool force);
@@ -862,7 +862,13 @@ static void alua_rtpg_work(struct work_s
 	kref_put(&pg->kref, release_port_group);
 }
 
-static void alua_rtpg_queue(struct alua_port_group *pg,
+/**
+ * alua_rtpg_queue() - cause RTPG to be submitted asynchronously
+ *
+ * Returns true if and only if alua_rtpg_work() will be called asynchronously.
+ * That function is responsible for calling @qdata->fn().
+ */
+static bool alua_rtpg_queue(struct alua_port_group *pg,
 			    struct scsi_device *sdev,
 			    struct alua_queue_data *qdata, bool force)
 {
@@ -871,7 +877,7 @@ static void alua_rtpg_queue(struct alua_
 	struct workqueue_struct *alua_wq = kaluad_wq;
 
 	if (!pg || scsi_device_get(sdev))
-		return;
+		return false;
 
 	spin_lock_irqsave(&pg->lock, flags);
 	if (qdata) {
@@ -907,6 +913,8 @@ static void alua_rtpg_queue(struct alua_
 	}
 	if (sdev)
 		scsi_device_put(sdev);
+
+	return true;
 }
 
 /*
@@ -1007,11 +1015,13 @@ static int alua_activate(struct scsi_dev
 		mutex_unlock(&h->init_mutex);
 		goto out;
 	}
-	fn = NULL;
 	rcu_read_unlock();
 	mutex_unlock(&h->init_mutex);
 
-	alua_rtpg_queue(pg, sdev, qdata, true);
+	if (alua_rtpg_queue(pg, sdev, qdata, true))
+		fn = NULL;
+	else
+		err = SCSI_DH_DEV_OFFLINED;
 	kref_put(&pg->kref, release_port_group);
 out:
 	if (fn)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 35/72] ALSA: seq: Fix race during FIFO resize
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 33/72] scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 36/72] ALSA: hda - fix a problem for lineout on a Dell AIO machine Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Takashi Iwai

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <tiwai@suse.de>

commit 2d7d54002e396c180db0c800c1046f0a3c471597 upstream.

When a new event is queued while processing to resize the FIFO in
snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool
that is being queued gets removed.  For avoiding this race, we need to
close the pool to be deleted and sync its usage before actually
deleting it.

The issue was spotted by syzkaller.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/core/seq/seq_fifo.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/sound/core/seq/seq_fifo.c
+++ b/sound/core/seq/seq_fifo.c
@@ -265,6 +265,10 @@ int snd_seq_fifo_resize(struct snd_seq_f
 	/* NOTE: overflow flag is not cleared */
 	spin_unlock_irqrestore(&f->lock, flags);
 
+	/* close the old pool and wait until all users are gone */
+	snd_seq_pool_mark_closing(oldpool);
+	snd_use_lock_sync(&f->use_lock);
+
 	/* release cells in old pool */
 	for (cell = oldhead; cell; cell = next) {
 		next = cell->next;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 36/72] ALSA: hda - fix a problem for lineout on a Dell AIO machine
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 35/72] ALSA: seq: Fix race during FIFO resize Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 37/72] ASoC: atmel-classd: fix audio clock rate Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hui Wang, Takashi Iwai

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hui Wang <hui.wang@canonical.com>

commit 2f726aec19a9d2c63bec9a8a53a3910ffdcd09f8 upstream.

On this Dell AIO machine, the lineout jack does not work.

We found the pin 0x1a is assigned to lineout on this machine, and in
the past, we applied ALC298_FIXUP_DELL1_MIC_NO_PRESENCE to fix the
heaset-set mic problem for this machine, this fixup will redefine
the pin 0x1a to headphone-mic, as a result the lineout doesn't
work anymore.

After consulting with Dell, they told us this machine doesn't support
microphone via headset jack, so we add a new fixup which only defines
the pin 0x18 as the headset-mic.

[rearranged the fixup insertion position by tiwai in order to make the
 merge with other branches easier -- tiwai]

Fixes: 59ec4b57bcae ("ALSA: hda - Fix headset mic detection problem for two dell machines")
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/pci/hda/patch_realtek.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -4846,6 +4846,7 @@ enum {
 	ALC292_FIXUP_DISABLE_AAMIX,
 	ALC293_FIXUP_DISABLE_AAMIX_MULTIJACK,
 	ALC298_FIXUP_DELL1_MIC_NO_PRESENCE,
+	ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE,
 	ALC275_FIXUP_DELL_XPS,
 	ALC256_FIXUP_DELL_XPS_13_HEADPHONE_NOISE,
 	ALC293_FIXUP_LENOVO_SPK_NOISE,
@@ -5446,6 +5447,15 @@ static const struct hda_fixup alc269_fix
 		.chained = true,
 		.chain_id = ALC269_FIXUP_HEADSET_MODE
 	},
+	[ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE] = {
+		.type = HDA_FIXUP_PINS,
+		.v.pins = (const struct hda_pintbl[]) {
+			{ 0x18, 0x01a1913c }, /* use as headset mic, without its own jack detect */
+			{ }
+		},
+		.chained = true,
+		.chain_id = ALC269_FIXUP_HEADSET_MODE
+	},
 	[ALC275_FIXUP_DELL_XPS] = {
 		.type = HDA_FIXUP_VERBS,
 		.v.verbs = (const struct hda_verb[]) {
@@ -5518,7 +5528,7 @@ static const struct hda_fixup alc269_fix
 		.type = HDA_FIXUP_FUNC,
 		.v.func = alc298_fixup_speaker_volume,
 		.chained = true,
-		.chain_id = ALC298_FIXUP_DELL1_MIC_NO_PRESENCE,
+		.chain_id = ALC298_FIXUP_DELL_AIO_MIC_NO_PRESENCE,
 	},
 	[ALC256_FIXUP_DELL_INSPIRON_7559_SUBWOOFER] = {
 		.type = HDA_FIXUP_PINS,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 37/72] ASoC: atmel-classd: fix audio clock rate
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 36/72] ALSA: hda - fix a problem for lineout on a Dell AIO machine Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 38/72] ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dushara Jayasinghe, Songjun Wu,
	Nicolas Ferre, Mark Brown

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Songjun Wu <songjun.wu@microchip.com>

commit cd3ac9affc43b44f49d7af70d275f0bd426ba643 upstream.

Fix the audio clock rate according to the datasheet.

Reported-by: Dushara Jayasinghe <dushara@successful.com.au>
Signed-off-by: Songjun Wu <songjun.wu@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/soc/atmel/atmel-classd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/soc/atmel/atmel-classd.c
+++ b/sound/soc/atmel/atmel-classd.c
@@ -349,7 +349,7 @@ static int atmel_classd_codec_dai_digita
 }
 
 #define CLASSD_ACLK_RATE_11M2896_MPY_8 (112896 * 100 * 8)
-#define CLASSD_ACLK_RATE_12M288_MPY_8  (12228 * 1000 * 8)
+#define CLASSD_ACLK_RATE_12M288_MPY_8  (12288 * 1000 * 8)
 
 static struct {
 	int rate;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 38/72] ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 37/72] ASoC: atmel-classd: fix audio clock rate Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 39/72] HID: wacom: Dont add ghost interface as shared data Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Takashi Sakamoto, Vinod Koul, Mark Brown

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Takashi Sakamoto <takashi.sakamoto@miraclelinux.com>

commit d1a6fe41d3c4ff0d26f0b186d774493555ca5282 upstream.

In 'skl_tplg_set_module_init_data()', a pointer to 'params' member of
'struct skl_algo_data' is calculated, then casted to (u32 *) and assigned
to a member of configuration data. The configuration data is passed to the
other functions and used to process intel IPC. In this processing, the
value of member is used to get message data, however this can bring invalid
memory access in 'skl_set_module_params()' as a result of calculation of
a pointer for actual message data.

(sound/soc/intel/skylake/skl-topology.c)
skl_tplg_init_pipe_modules()
->skl_tplg_set_module_init_data() (has this bug)
->skl_tplg_set_module_params()
  (sound/soc/intel/skylake/skl-messages.c)
  ->skl_set_module_params()
    ((char *)param) + data_offset

This commit fixes the bug.

Fixes: abb740033b56 ("ASoC: Intel: Skylake: Add support to configure module params")
Signed-off-by: Takashi Sakamoto <takashi.sakamoto@miraclelinux.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 sound/soc/intel/skylake/skl-topology.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/soc/intel/skylake/skl-topology.c
+++ b/sound/soc/intel/skylake/skl-topology.c
@@ -448,7 +448,7 @@ static int skl_tplg_set_module_init_data
 			if (bc->set_params != SKL_PARAM_INIT)
 				continue;
 
-			mconfig->formats_config.caps = (u32 *)&bc->params;
+			mconfig->formats_config.caps = (u32 *)bc->params;
 			mconfig->formats_config.caps_size = bc->size;
 
 			break;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 39/72] HID: wacom: Dont add ghost interface as shared data
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 38/72] ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 40/72] mmc: sdhci: Disable runtime pm when the sdio_irq is enabled Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aaron Armstrong Skomra, Jiri Kosina

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Aaron Armstrong Skomra <skomra@gmail.com>

commit 8b4073596997f2ccbf68d8e72e07b827388a4536 upstream.

A previous commit (below) adds a check for already probed interfaces to
Wacom's matching heuristic. Unfortunately this causes the Bamboo Pen
(CTL-460) to match itself to its 'ghost' touch interface. After
subsequent changes to the driver this match to the ghost causes the
kernel to crash. This patch avoids calling wacom_add_shared_data()
for the BAMBOO_PEN's ghost touch interface.

Fixes: 41372d5d40e7 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/hid/wacom_sys.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -2017,6 +2017,14 @@ static int wacom_parse_and_register(stru
 
 	wacom_update_name(wacom, wireless ? " (WL)" : "");
 
+	/* pen only Bamboo neither support touch nor pad */
+	if ((features->type == BAMBOO_PEN) &&
+	    ((features->device_type & WACOM_DEVICETYPE_TOUCH) ||
+	    (features->device_type & WACOM_DEVICETYPE_PAD))) {
+		error = -ENODEV;
+		goto fail;
+	}
+
 	error = wacom_add_shared_data(hdev);
 	if (error)
 		goto fail;
@@ -2063,14 +2071,6 @@ static int wacom_parse_and_register(stru
 		error = -ENODEV;
 		goto fail_quirks;
 	}
-
-	/* pen only Bamboo neither support touch nor pad */
-	if ((features->type == BAMBOO_PEN) &&
-	    ((features->device_type & WACOM_DEVICETYPE_TOUCH) ||
-	    (features->device_type & WACOM_DEVICETYPE_PAD))) {
-		error = -ENODEV;
-		goto fail_quirks;
-	}
 
 	if (features->device_type & WACOM_DEVICETYPE_WL_MONITOR)
 		error = hid_hw_open(hdev);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 40/72] mmc: sdhci: Disable runtime pm when the sdio_irq is enabled
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 39/72] HID: wacom: Dont add ghost interface as shared data Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 41/72] mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dong Aisheng, Ian W MORRISON,
	Hans de Goede, Adrian Hunter, Dong Aisheng, Ulf Hansson

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hans de Goede <hdegoede@redhat.com>

commit 923713b357455cfb9aca2cd3429cb0806a724ed2 upstream.

SDIO cards may need clock to send the card interrupt to the host.

On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
pinging the tablet results in:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms

Where as with this patch I get:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms

Cc: Dong Aisheng <b29396@freescale.com>
Cc: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/mmc/host/sdhci.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1823,6 +1823,9 @@ static void sdhci_enable_sdio_irq(struct
 	struct sdhci_host *host = mmc_priv(mmc);
 	unsigned long flags;
 
+	if (enable)
+		pm_runtime_get_noresume(host->mmc->parent);
+
 	spin_lock_irqsave(&host->lock, flags);
 	if (enable)
 		host->flags |= SDHCI_SDIO_IRQ_ENABLED;
@@ -1831,6 +1834,9 @@ static void sdhci_enable_sdio_irq(struct
 
 	sdhci_enable_sdio_irq_nolock(host, enable);
 	spin_unlock_irqrestore(&host->lock, flags);
+
+	if (!enable)
+		pm_runtime_put_noidle(host->mmc->parent);
 }
 
 static int sdhci_start_signal_voltage_switch(struct mmc_host *mmc,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 41/72] mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 40/72] mmc: sdhci: Disable runtime pm when the sdio_irq is enabled Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 42/72] NFSv4.1 fix infinite loop on IO BAD_STATEID error Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ludovic Desroches, Ulf Hansson

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ludovic Desroches <ludovic.desroches@microchip.com>

commit d0918764c17b94c30bbb2619929b1719ff52707a upstream.

The controller has different timings for MMC_TIMING_UHS_DDR50 and
MMC_TIMING_MMC_DDR52. Configuring the controller with SDHCI_CTRL_UHS_DDR50,
when MMC_TIMING_MMC_DDR52 timings are requested, is not correct and can
lead to unexpected behavior.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Fixes: bb5f8ea4d514 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/mmc/host/sdhci-of-at91.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/mmc/host/sdhci-of-at91.c
+++ b/drivers/mmc/host/sdhci-of-at91.c
@@ -29,6 +29,8 @@
 
 #include "sdhci-pltfm.h"
 
+#define SDMMC_MC1R	0x204
+#define		SDMMC_MC1R_DDR		BIT(3)
 #define SDMMC_CACR	0x230
 #define		SDMMC_CACR_CAPWREN	BIT(0)
 #define		SDMMC_CACR_KEY		(0x46 << 8)
@@ -103,11 +105,18 @@ static void sdhci_at91_set_power(struct
 	sdhci_set_power_noreg(host, mode, vdd);
 }
 
+void sdhci_at91_set_uhs_signaling(struct sdhci_host *host, unsigned int timing)
+{
+	if (timing == MMC_TIMING_MMC_DDR52)
+		sdhci_writeb(host, SDMMC_MC1R_DDR, SDMMC_MC1R);
+	sdhci_set_uhs_signaling(host, timing);
+}
+
 static const struct sdhci_ops sdhci_at91_sama5d2_ops = {
 	.set_clock		= sdhci_at91_set_clock,
 	.set_bus_width		= sdhci_set_bus_width,
 	.reset			= sdhci_reset,
-	.set_uhs_signaling	= sdhci_set_uhs_signaling,
+	.set_uhs_signaling	= sdhci_at91_set_uhs_signaling,
 	.set_power		= sdhci_at91_set_power,
 };
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 42/72] NFSv4.1 fix infinite loop on IO BAD_STATEID error
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 41/72] mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 43/72] nfsd: map the ENOKEY to nfserr_perm for avoiding warning Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Olga Kornievskaia, Anna Schumaker

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Olga Kornievskaia <kolga@netapp.com>

commit 0e3d3e5df07dcf8a50d96e0ecd6ab9a888f55dfc upstream.

Commit 63d63cbf5e03 "NFSv4.1: Don't recheck delegations that
have already been checked" introduced a regression where when a
client received BAD_STATEID error it would not send any TEST_STATEID
and instead go into an infinite loop of resending the IO that caused
the BAD_STATEID.

Fixes: 63d63cbf5e03 ("NFSv4.1: Don't recheck delegations that have already been checked")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfs/nfs4proc.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2532,17 +2532,14 @@ static void nfs41_check_delegation_state
 	}
 
 	nfs4_stateid_copy(&stateid, &delegation->stateid);
-	if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) {
+	if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags) ||
+		!test_and_clear_bit(NFS_DELEGATION_TEST_EXPIRED,
+			&delegation->flags)) {
 		rcu_read_unlock();
 		nfs_finish_clear_delegation_stateid(state, &stateid);
 		return;
 	}
 
-	if (!test_and_clear_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags)) {
-		rcu_read_unlock();
-		return;
-	}
-
 	cred = get_rpccred(delegation->cred);
 	rcu_read_unlock();
 	status = nfs41_test_and_free_expired_stateid(server, &stateid, cred);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 43/72] nfsd: map the ENOKEY to nfserr_perm for avoiding warning
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 42/72] NFSv4.1 fix infinite loop on IO BAD_STATEID error Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 44/72] parisc: Clean up fixup routines for get_user()/put_user() Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Kinglong Mee, J. Bruce Fields

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Kinglong Mee <kinglongmee@gmail.com>

commit c952cd4e949ab3d07287efc2e80246e03727d15d upstream.

Now that Ext4 and f2fs filesystems support encrypted directories and
files, attempts to access those files may return ENOKEY, resulting in
the following WARNING.

Map ENOKEY to nfserr_perm instead of nfserr_io.

[ 1295.411759] ------------[ cut here ]------------
[ 1295.411787] WARNING: CPU: 0 PID: 12786 at fs/nfsd/nfsproc.c:796 nfserrno+0x74/0x80 [nfsd]
[ 1295.411806] nfsd: non-standard errno: -126
[ 1295.411816] Modules linked in: nfsd nfs_acl auth_rpcgss nfsv4 nfs lockd fscache tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event coretemp crct10dif_pclmul crc32_generic crc32_pclmul snd_ens1371 gameport ghash_clmulni_intel snd_ac97_codec f2fs intel_rapl_perf ac97_bus snd_seq ppdev snd_pcm snd_rawmidi snd_timer vmw_balloon snd_seq_device snd joydev soundcore parport_pc parport nfit acpi_cpufreq tpm_tis vmw_vmci tpm_tis_core tpm shpchp i2c_piix4 grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel e1000 mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: nfs_acl]
[ 1295.412522] CPU: 0 PID: 12786 Comm: nfsd Tainted: G        W       4.11.0-rc1+ #521
[ 1295.412959] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1295.413814] Call Trace:
[ 1295.414252]  dump_stack+0x63/0x86
[ 1295.414666]  __warn+0xcb/0xf0
[ 1295.415087]  warn_slowpath_fmt+0x5f/0x80
[ 1295.415502]  ? put_filp+0x42/0x50
[ 1295.415927]  nfserrno+0x74/0x80 [nfsd]
[ 1295.416339]  nfsd_open+0xd7/0x180 [nfsd]
[ 1295.416746]  nfs4_get_vfs_file+0x367/0x3c0 [nfsd]
[ 1295.417182]  ? security_inode_permission+0x41/0x60
[ 1295.417591]  nfsd4_process_open2+0x9b2/0x1200 [nfsd]
[ 1295.418007]  nfsd4_open+0x481/0x790 [nfsd]
[ 1295.418409]  nfsd4_proc_compound+0x395/0x680 [nfsd]
[ 1295.418812]  nfsd_dispatch+0xb8/0x1f0 [nfsd]
[ 1295.419233]  svc_process_common+0x4d9/0x830 [sunrpc]
[ 1295.419631]  svc_process+0xfe/0x1b0 [sunrpc]
[ 1295.420033]  nfsd+0xe9/0x150 [nfsd]
[ 1295.420420]  kthread+0x101/0x140
[ 1295.420802]  ? nfsd_destroy+0x60/0x60 [nfsd]
[ 1295.421199]  ? kthread_park+0x90/0x90
[ 1295.421598]  ret_from_fork+0x2c/0x40
[ 1295.421996] ---[ end trace 0d5a969cd7852e1f ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/nfsd/nfsproc.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -790,6 +790,7 @@ nfserrno (int errno)
 		{ nfserr_serverfault, -ESERVERFAULT },
 		{ nfserr_serverfault, -ENFILE },
 		{ nfserr_io, -EUCLEAN },
+		{ nfserr_perm, -ENOKEY },
 	};
 	int	i;
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 44/72] parisc: Clean up fixup routines for get_user()/put_user()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 43/72] nfsd: map the ENOKEY to nfserr_perm for avoiding warning Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 45/72] parisc: Avoid stalled CPU warnings after system shutdown Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Helge Deller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit d19f5e41b344a057bb2450024a807476f30978d2 upstream.

Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.

This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace.  Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.

This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
   helps the compiler register allocator and thus creates less assembler
   statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/include/asm/uaccess.h |   59 +++++++++++++---------
 arch/parisc/kernel/parisc_ksyms.c |   10 ---
 arch/parisc/lib/Makefile          |    2 
 arch/parisc/lib/fixup.S           |   98 --------------------------------------
 arch/parisc/mm/fault.c            |   17 ++++++
 5 files changed, 52 insertions(+), 134 deletions(-)

--- a/arch/parisc/include/asm/uaccess.h
+++ b/arch/parisc/include/asm/uaccess.h
@@ -68,6 +68,15 @@ struct exception_table_entry {
 	".previous\n"
 
 /*
+ * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() creates a special exception table entry
+ * (with lowest bit set) for which the fault handler in fixup_exception() will
+ * load -EFAULT into %r8 for a read or write fault, and zeroes the target
+ * register in case of a read fault in get_user().
+ */
+#define ASM_EXCEPTIONTABLE_ENTRY_EFAULT( fault_addr, except_addr )\
+	ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr + 1)
+
+/*
  * The page fault handler stores, in a per-cpu area, the following information
  * if a fixup routine is available.
  */
@@ -94,7 +103,7 @@ struct exception_data {
 #define __get_user(x, ptr)                               \
 ({                                                       \
 	register long __gu_err __asm__ ("r8") = 0;       \
-	register long __gu_val __asm__ ("r9") = 0;       \
+	register long __gu_val;				 \
 							 \
 	load_sr2();					 \
 	switch (sizeof(*(ptr))) {			 \
@@ -110,22 +119,23 @@ struct exception_data {
 })
 
 #define __get_user_asm(ldx, ptr)                        \
-	__asm__("\n1:\t" ldx "\t0(%%sr2,%2),%0\n\t"	\
-		ASM_EXCEPTIONTABLE_ENTRY(1b, fixup_get_user_skip_1)\
+	__asm__("1: " ldx " 0(%%sr2,%2),%0\n"		\
+		"9:\n"					\
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)	\
 		: "=r"(__gu_val), "=r"(__gu_err)        \
-		: "r"(ptr), "1"(__gu_err)		\
-		: "r1");
+		: "r"(ptr), "1"(__gu_err));
 
 #if !defined(CONFIG_64BIT)
 
 #define __get_user_asm64(ptr) 				\
-	__asm__("\n1:\tldw 0(%%sr2,%2),%0"		\
-		"\n2:\tldw 4(%%sr2,%2),%R0\n\t"		\
-		ASM_EXCEPTIONTABLE_ENTRY(1b, fixup_get_user_skip_2)\
-		ASM_EXCEPTIONTABLE_ENTRY(2b, fixup_get_user_skip_1)\
+	__asm__("   copy %%r0,%R0\n"			\
+		"1: ldw 0(%%sr2,%2),%0\n"		\
+		"2: ldw 4(%%sr2,%2),%R0\n"		\
+		"9:\n"					\
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)	\
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b)	\
 		: "=r"(__gu_val), "=r"(__gu_err)	\
-		: "r"(ptr), "1"(__gu_err)		\
-		: "r1");
+		: "r"(ptr), "1"(__gu_err));
 
 #endif /* !defined(CONFIG_64BIT) */
 
@@ -151,32 +161,31 @@ struct exception_data {
  * The "__put_user/kernel_asm()" macros tell gcc they read from memory
  * instead of writing. This is because they do not write to any memory
  * gcc knows about, so there are no aliasing issues. These macros must
- * also be aware that "fixup_put_user_skip_[12]" are executed in the
- * context of the fault, and any registers used there must be listed
- * as clobbers. In this case only "r1" is used by the current routines.
- * r8/r9 are already listed as err/val.
+ * also be aware that fixups are executed in the context of the fault,
+ * and any registers used there must be listed as clobbers.
+ * r8 is already listed as err.
  */
 
 #define __put_user_asm(stx, x, ptr)                         \
 	__asm__ __volatile__ (                              \
-		"\n1:\t" stx "\t%2,0(%%sr2,%1)\n\t"	    \
-		ASM_EXCEPTIONTABLE_ENTRY(1b, fixup_put_user_skip_1)\
+		"1: " stx " %2,0(%%sr2,%1)\n"		    \
+		"9:\n"					    \
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)	    \
 		: "=r"(__pu_err)                            \
-		: "r"(ptr), "r"(x), "0"(__pu_err)	    \
-		: "r1")
+		: "r"(ptr), "r"(x), "0"(__pu_err))
 
 
 #if !defined(CONFIG_64BIT)
 
 #define __put_user_asm64(__val, ptr) do {	    	    \
 	__asm__ __volatile__ (				    \
-		"\n1:\tstw %2,0(%%sr2,%1)"		    \
-		"\n2:\tstw %R2,4(%%sr2,%1)\n\t"		    \
-		ASM_EXCEPTIONTABLE_ENTRY(1b, fixup_put_user_skip_2)\
-		ASM_EXCEPTIONTABLE_ENTRY(2b, fixup_put_user_skip_1)\
+		"1: stw %2,0(%%sr2,%1)\n"		    \
+		"2: stw %R2,4(%%sr2,%1)\n"		    \
+		"9:\n"					    \
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)	    \
+		ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b)	    \
 		: "=r"(__pu_err)                            \
-		: "r"(ptr), "r"(__val), "0"(__pu_err) \
-		: "r1");				    \
+		: "r"(ptr), "r"(__val), "0"(__pu_err));	    \
 } while (0)
 
 #endif /* !defined(CONFIG_64BIT) */
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -47,16 +47,6 @@ EXPORT_SYMBOL(__cmpxchg_u64);
 EXPORT_SYMBOL(lclear_user);
 EXPORT_SYMBOL(lstrnlen_user);
 
-/* Global fixups - defined as int to avoid creation of function pointers */
-extern int fixup_get_user_skip_1;
-extern int fixup_get_user_skip_2;
-extern int fixup_put_user_skip_1;
-extern int fixup_put_user_skip_2;
-EXPORT_SYMBOL(fixup_get_user_skip_1);
-EXPORT_SYMBOL(fixup_get_user_skip_2);
-EXPORT_SYMBOL(fixup_put_user_skip_1);
-EXPORT_SYMBOL(fixup_put_user_skip_2);
-
 #ifndef CONFIG_64BIT
 /* Needed so insmod can set dp value */
 extern int $global$;
--- a/arch/parisc/lib/Makefile
+++ b/arch/parisc/lib/Makefile
@@ -2,7 +2,7 @@
 # Makefile for parisc-specific library files
 #
 
-lib-y	:= lusercopy.o bitops.o checksum.o io.o memset.o fixup.o memcpy.o \
+lib-y	:= lusercopy.o bitops.o checksum.o io.o memset.o memcpy.o \
 	   ucmpdi2.o delay.o
 
 obj-y	:= iomap.o
--- a/arch/parisc/lib/fixup.S
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * Linux/PA-RISC Project (http://www.parisc-linux.org/)
- *
- *  Copyright (C) 2004  Randolph Chung <tausq@debian.org>
- *
- *    This program is free software; you can redistribute it and/or modify
- *    it under the terms of the GNU General Public License as published by
- *    the Free Software Foundation; either version 2, or (at your option)
- *    any later version.
- *
- *    This program is distributed in the hope that it will be useful,
- *    but WITHOUT ANY WARRANTY; without even the implied warranty of
- *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *    GNU General Public License for more details.
- *
- *    You should have received a copy of the GNU General Public License
- *    along with this program; if not, write to the Free Software
- *    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- * 
- * Fixup routines for kernel exception handling.
- */
-#include <asm/asm-offsets.h>
-#include <asm/assembly.h>
-#include <asm/errno.h>
-#include <linux/linkage.h>
-
-#ifdef CONFIG_SMP
-	.macro  get_fault_ip t1 t2
-	loadgp
-	addil LT%__per_cpu_offset,%r27
-	LDREG RT%__per_cpu_offset(%r1),\t1
-	/* t2 = smp_processor_id() */
-	mfctl 30,\t2
-	ldw TI_CPU(\t2),\t2
-#ifdef CONFIG_64BIT
-	extrd,u \t2,63,32,\t2
-#endif
-	/* t2 = &__per_cpu_offset[smp_processor_id()]; */
-	LDREGX \t2(\t1),\t2 
-	addil LT%exception_data,%r27
-	LDREG RT%exception_data(%r1),\t1
-	/* t1 = this_cpu_ptr(&exception_data) */
-	add,l \t1,\t2,\t1
-	/* %r27 = t1->fault_gp - restore gp */
-	LDREG EXCDATA_GP(\t1), %r27
-	/* t1 = t1->fault_ip */
-	LDREG EXCDATA_IP(\t1), \t1
-	.endm
-#else
-	.macro  get_fault_ip t1 t2
-	loadgp
-	/* t1 = this_cpu_ptr(&exception_data) */
-	addil LT%exception_data,%r27
-	LDREG RT%exception_data(%r1),\t2
-	/* %r27 = t2->fault_gp - restore gp */
-	LDREG EXCDATA_GP(\t2), %r27
-	/* t1 = t2->fault_ip */
-	LDREG EXCDATA_IP(\t2), \t1
-	.endm
-#endif
-
-	.level LEVEL
-
-	.text
-	.section .fixup, "ax"
-
-	/* get_user() fixups, store -EFAULT in r8, and 0 in r9 */
-ENTRY_CFI(fixup_get_user_skip_1)
-	get_fault_ip %r1,%r8
-	ldo 4(%r1), %r1
-	ldi -EFAULT, %r8
-	bv %r0(%r1)
-	copy %r0, %r9
-ENDPROC_CFI(fixup_get_user_skip_1)
-
-ENTRY_CFI(fixup_get_user_skip_2)
-	get_fault_ip %r1,%r8
-	ldo 8(%r1), %r1
-	ldi -EFAULT, %r8
-	bv %r0(%r1)
-	copy %r0, %r9
-ENDPROC_CFI(fixup_get_user_skip_2)
-
-	/* put_user() fixups, store -EFAULT in r8 */
-ENTRY_CFI(fixup_put_user_skip_1)
-	get_fault_ip %r1,%r8
-	ldo 4(%r1), %r1
-	bv %r0(%r1)
-	ldi -EFAULT, %r8
-ENDPROC_CFI(fixup_put_user_skip_1)
-
-ENTRY_CFI(fixup_put_user_skip_2)
-	get_fault_ip %r1,%r8
-	ldo 8(%r1), %r1
-	bv %r0(%r1)
-	ldi -EFAULT, %r8
-ENDPROC_CFI(fixup_put_user_skip_2)
-
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -149,6 +149,23 @@ int fixup_exception(struct pt_regs *regs
 		d->fault_space = regs->isr;
 		d->fault_addr = regs->ior;
 
+		/*
+		 * Fix up get_user() and put_user().
+		 * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() sets the least-significant
+		 * bit in the relative address of the fixup routine to indicate
+		 * that %r8 should be loaded with -EFAULT to report a userspace
+		 * access error.
+		 */
+		if (fix->fixup & 1) {
+			regs->gr[8] = -EFAULT;
+
+			/* zero target register for get_user() */
+			if (parisc_acctyp(0, regs->iir) == VM_READ) {
+				int treg = regs->iir & 0x1f;
+				regs->gr[treg] = 0;
+			}
+		}
+
 		regs->iaoq[0] = (unsigned long)&fix->fixup + fix->fixup;
 		regs->iaoq[0] &= ~3;
 		/*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 45/72] parisc: Avoid stalled CPU warnings after system shutdown
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 44/72] parisc: Clean up fixup routines for get_user()/put_user() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 46/72] parisc: Fix access fault handling in pa_memcpy() Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Helge Deller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit 476e75a44b56038bee9207242d4bc718f6b4de06 upstream.

Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function.  But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.

Fixes: 73580dac7618 ("parisc: Fix system shutdown halt")
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/kernel/process.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -140,6 +140,8 @@ void machine_power_off(void)
 	printk(KERN_EMERG "System shut down completed.\n"
 	       "Please power this system off now.");
 
+	/* prevent soft lockup/stalled CPU messages for endless loop. */
+	rcu_sysrq_start();
 	for (;;);
 }
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 46/72] parisc: Fix access fault handling in pa_memcpy()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 45/72] parisc: Avoid stalled CPU warnings after system shutdown Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 47/72] ACPI: Fix incompatibility with mcount-based function graph tracing Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Al Viro, John David Anglin, Helge Deller

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Helge Deller <deller@gmx.de>

commit 554bfeceb8a22d448cd986fc9efce25e833278a1 upstream.

pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.

Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.

Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.

This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.

Runtime tested with 32- and 64bit kernels.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/parisc/lib/lusercopy.S |  318 ++++++++++++++++++++++++++++++
 arch/parisc/lib/memcpy.c    |  461 --------------------------------------------
 2 files changed, 321 insertions(+), 458 deletions(-)

--- a/arch/parisc/lib/lusercopy.S
+++ b/arch/parisc/lib/lusercopy.S
@@ -5,6 +5,8 @@
  *    Copyright (C) 2000 Richard Hirst <rhirst with parisc-linux.org>
  *    Copyright (C) 2001 Matthieu Delahaye <delahaym at esiee.fr>
  *    Copyright (C) 2003 Randolph Chung <tausq with parisc-linux.org>
+ *    Copyright (C) 2017 Helge Deller <deller@gmx.de>
+ *    Copyright (C) 2017 John David Anglin <dave.anglin@bell.net>
  *
  *
  *    This program is free software; you can redistribute it and/or modify
@@ -132,4 +134,320 @@ ENDPROC_CFI(lstrnlen_user)
 
 	.procend
 
+
+
+/*
+ * unsigned long pa_memcpy(void *dstp, const void *srcp, unsigned long len)
+ *
+ * Inputs:
+ * - sr1 already contains space of source region
+ * - sr2 already contains space of destination region
+ *
+ * Returns:
+ * - number of bytes that could not be copied.
+ *   On success, this will be zero.
+ *
+ * This code is based on a C-implementation of a copy routine written by
+ * Randolph Chung, which in turn was derived from the glibc.
+ *
+ * Several strategies are tried to try to get the best performance for various
+ * conditions. In the optimal case, we copy by loops that copy 32- or 16-bytes
+ * at a time using general registers.  Unaligned copies are handled either by
+ * aligning the destination and then using shift-and-write method, or in a few
+ * cases by falling back to a byte-at-a-time copy.
+ *
+ * Testing with various alignments and buffer sizes shows that this code is
+ * often >10x faster than a simple byte-at-a-time copy, even for strangely
+ * aligned operands. It is interesting to note that the glibc version of memcpy
+ * (written in C) is actually quite fast already. This routine is able to beat
+ * it by 30-40% for aligned copies because of the loop unrolling, but in some
+ * cases the glibc version is still slightly faster. This lends more
+ * credibility that gcc can generate very good code as long as we are careful.
+ *
+ * Possible optimizations:
+ * - add cache prefetching
+ * - try not to use the post-increment address modifiers; they may create
+ *   additional interlocks. Assumption is that those were only efficient on old
+ *   machines (pre PA8000 processors)
+ */
+
+	dst = arg0
+	src = arg1
+	len = arg2
+	end = arg3
+	t1  = r19
+	t2  = r20
+	t3  = r21
+	t4  = r22
+	srcspc = sr1
+	dstspc = sr2
+
+	t0 = r1
+	a1 = t1
+	a2 = t2
+	a3 = t3
+	a0 = t4
+
+	save_src = ret0
+	save_dst = ret1
+	save_len = r31
+
+ENTRY_CFI(pa_memcpy)
+	.proc
+	.callinfo NO_CALLS
+	.entry
+
+	/* Last destination address */
+	add	dst,len,end
+
+	/* short copy with less than 16 bytes? */
+	cmpib,>>=,n 15,len,.Lbyte_loop
+
+	/* same alignment? */
+	xor	src,dst,t0
+	extru	t0,31,2,t1
+	cmpib,<>,n  0,t1,.Lunaligned_copy
+
+#ifdef CONFIG_64BIT
+	/* only do 64-bit copies if we can get aligned. */
+	extru	t0,31,3,t1
+	cmpib,<>,n  0,t1,.Lalign_loop32
+
+	/* loop until we are 64-bit aligned */
+.Lalign_loop64:
+	extru	dst,31,3,t1
+	cmpib,=,n	0,t1,.Lcopy_loop_16
+20:	ldb,ma	1(srcspc,src),t1
+21:	stb,ma	t1,1(dstspc,dst)
+	b	.Lalign_loop64
+	ldo	-1(len),len
+
+	ASM_EXCEPTIONTABLE_ENTRY(20b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(21b,.Lcopy_done)
+
+	ldi	31,t0
+.Lcopy_loop_16:
+	cmpb,COND(>>=),n t0,len,.Lword_loop
+
+10:	ldd	0(srcspc,src),t1
+11:	ldd	8(srcspc,src),t2
+	ldo	16(src),src
+12:	std,ma	t1,8(dstspc,dst)
+13:	std,ma	t2,8(dstspc,dst)
+14:	ldd	0(srcspc,src),t1
+15:	ldd	8(srcspc,src),t2
+	ldo	16(src),src
+16:	std,ma	t1,8(dstspc,dst)
+17:	std,ma	t2,8(dstspc,dst)
+
+	ASM_EXCEPTIONTABLE_ENTRY(10b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(11b,.Lcopy16_fault)
+	ASM_EXCEPTIONTABLE_ENTRY(12b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(13b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(14b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(15b,.Lcopy16_fault)
+	ASM_EXCEPTIONTABLE_ENTRY(16b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(17b,.Lcopy_done)
+
+	b	.Lcopy_loop_16
+	ldo	-32(len),len
+
+.Lword_loop:
+	cmpib,COND(>>=),n 3,len,.Lbyte_loop
+20:	ldw,ma	4(srcspc,src),t1
+21:	stw,ma	t1,4(dstspc,dst)
+	b	.Lword_loop
+	ldo	-4(len),len
+
+	ASM_EXCEPTIONTABLE_ENTRY(20b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(21b,.Lcopy_done)
+
+#endif /* CONFIG_64BIT */
+
+	/* loop until we are 32-bit aligned */
+.Lalign_loop32:
+	extru	dst,31,2,t1
+	cmpib,=,n	0,t1,.Lcopy_loop_4
+20:	ldb,ma	1(srcspc,src),t1
+21:	stb,ma	t1,1(dstspc,dst)
+	b	.Lalign_loop32
+	ldo	-1(len),len
+
+	ASM_EXCEPTIONTABLE_ENTRY(20b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(21b,.Lcopy_done)
+
+
+.Lcopy_loop_4:
+	cmpib,COND(>>=),n 15,len,.Lbyte_loop
+
+10:	ldw	0(srcspc,src),t1
+11:	ldw	4(srcspc,src),t2
+12:	stw,ma	t1,4(dstspc,dst)
+13:	stw,ma	t2,4(dstspc,dst)
+14:	ldw	8(srcspc,src),t1
+15:	ldw	12(srcspc,src),t2
+	ldo	16(src),src
+16:	stw,ma	t1,4(dstspc,dst)
+17:	stw,ma	t2,4(dstspc,dst)
+
+	ASM_EXCEPTIONTABLE_ENTRY(10b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(11b,.Lcopy8_fault)
+	ASM_EXCEPTIONTABLE_ENTRY(12b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(13b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(14b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(15b,.Lcopy8_fault)
+	ASM_EXCEPTIONTABLE_ENTRY(16b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(17b,.Lcopy_done)
+
+	b	.Lcopy_loop_4
+	ldo	-16(len),len
+
+.Lbyte_loop:
+	cmpclr,COND(<>) len,%r0,%r0
+	b,n	.Lcopy_done
+20:	ldb	0(srcspc,src),t1
+	ldo	1(src),src
+21:	stb,ma	t1,1(dstspc,dst)
+	b	.Lbyte_loop
+	ldo	-1(len),len
+
+	ASM_EXCEPTIONTABLE_ENTRY(20b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(21b,.Lcopy_done)
+
+.Lcopy_done:
+	bv	%r0(%r2)
+	sub	end,dst,ret0
+
+
+	/* src and dst are not aligned the same way. */
+	/* need to go the hard way */
+.Lunaligned_copy:
+	/* align until dst is 32bit-word-aligned */
+	extru	dst,31,2,t1
+	cmpib,COND(=),n	0,t1,.Lcopy_dstaligned
+20:	ldb	0(srcspc,src),t1
+	ldo	1(src),src
+21:	stb,ma	t1,1(dstspc,dst)
+	b	.Lunaligned_copy
+	ldo	-1(len),len
+
+	ASM_EXCEPTIONTABLE_ENTRY(20b,.Lcopy_done)
+	ASM_EXCEPTIONTABLE_ENTRY(21b,.Lcopy_done)
+
+.Lcopy_dstaligned:
+
+	/* store src, dst and len in safe place */
+	copy	src,save_src
+	copy	dst,save_dst
+	copy	len,save_len
+
+	/* len now needs give number of words to copy */
+	SHRREG	len,2,len
+
+	/*
+	 * Copy from a not-aligned src to an aligned dst using shifts.
+	 * Handles 4 words per loop.
+	 */
+
+	depw,z src,28,2,t0
+	subi 32,t0,t0
+	mtsar t0
+	extru len,31,2,t0
+	cmpib,= 2,t0,.Lcase2
+	/* Make src aligned by rounding it down.  */
+	depi 0,31,2,src
+
+	cmpiclr,<> 3,t0,%r0
+	b,n .Lcase3
+	cmpiclr,<> 1,t0,%r0
+	b,n .Lcase1
+.Lcase0:
+	cmpb,= %r0,len,.Lcda_finish
+	nop
+
+1:	ldw,ma 4(srcspc,src), a3
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+1:	ldw,ma 4(srcspc,src), a0
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	b,n .Ldo3
+.Lcase1:
+1:	ldw,ma 4(srcspc,src), a2
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+1:	ldw,ma 4(srcspc,src), a3
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	ldo -1(len),len
+	cmpb,=,n %r0,len,.Ldo0
+.Ldo4:
+1:	ldw,ma 4(srcspc,src), a0
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	shrpw a2, a3, %sar, t0
+1:	stw,ma t0, 4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcopy_done)
+.Ldo3:
+1:	ldw,ma 4(srcspc,src), a1
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	shrpw a3, a0, %sar, t0
+1:	stw,ma t0, 4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcopy_done)
+.Ldo2:
+1:	ldw,ma 4(srcspc,src), a2
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	shrpw a0, a1, %sar, t0
+1:	stw,ma t0, 4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcopy_done)
+.Ldo1:
+1:	ldw,ma 4(srcspc,src), a3
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	shrpw a1, a2, %sar, t0
+1:	stw,ma t0, 4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcopy_done)
+	ldo -4(len),len
+	cmpb,<> %r0,len,.Ldo4
+	nop
+.Ldo0:
+	shrpw a2, a3, %sar, t0
+1:	stw,ma t0, 4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcopy_done)
+
+.Lcda_rdfault:
+.Lcda_finish:
+	/* calculate new src, dst and len and jump to byte-copy loop */
+	sub	dst,save_dst,t0
+	add	save_src,t0,src
+	b	.Lbyte_loop
+	sub	save_len,t0,len
+
+.Lcase3:
+1:	ldw,ma 4(srcspc,src), a0
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+1:	ldw,ma 4(srcspc,src), a1
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	b .Ldo2
+	ldo 1(len),len
+.Lcase2:
+1:	ldw,ma 4(srcspc,src), a1
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+1:	ldw,ma 4(srcspc,src), a2
+	ASM_EXCEPTIONTABLE_ENTRY(1b,.Lcda_rdfault)
+	b .Ldo1
+	ldo 2(len),len
+
+
+	/* fault exception fixup handlers: */
+#ifdef CONFIG_64BIT
+.Lcopy16_fault:
+10:	b	.Lcopy_done
+	std,ma	t1,8(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(10b,.Lcopy_done)
+#endif
+
+.Lcopy8_fault:
+10:	b	.Lcopy_done
+	stw,ma	t1,4(dstspc,dst)
+	ASM_EXCEPTIONTABLE_ENTRY(10b,.Lcopy_done)
+
+	.exit
+ENDPROC_CFI(pa_memcpy)
+	.procend
+
 	.end
--- a/arch/parisc/lib/memcpy.c
+++ b/arch/parisc/lib/memcpy.c
@@ -2,7 +2,7 @@
  *    Optimized memory copy routines.
  *
  *    Copyright (C) 2004 Randolph Chung <tausq@debian.org>
- *    Copyright (C) 2013 Helge Deller <deller@gmx.de>
+ *    Copyright (C) 2013-2017 Helge Deller <deller@gmx.de>
  *
  *    This program is free software; you can redistribute it and/or modify
  *    it under the terms of the GNU General Public License as published by
@@ -21,474 +21,21 @@
  *    Portions derived from the GNU C Library
  *    Copyright (C) 1991, 1997, 2003 Free Software Foundation, Inc.
  *
- * Several strategies are tried to try to get the best performance for various
- * conditions. In the optimal case, we copy 64-bytes in an unrolled loop using 
- * fp regs. This is followed by loops that copy 32- or 16-bytes at a time using
- * general registers.  Unaligned copies are handled either by aligning the 
- * destination and then using shift-and-write method, or in a few cases by 
- * falling back to a byte-at-a-time copy.
- *
- * I chose to implement this in C because it is easier to maintain and debug,
- * and in my experiments it appears that the C code generated by gcc (3.3/3.4
- * at the time of writing) is fairly optimal. Unfortunately some of the 
- * semantics of the copy routine (exception handling) is difficult to express
- * in C, so we have to play some tricks to get it to work.
- *
- * All the loads and stores are done via explicit asm() code in order to use
- * the right space registers. 
- * 
- * Testing with various alignments and buffer sizes shows that this code is 
- * often >10x faster than a simple byte-at-a-time copy, even for strangely
- * aligned operands. It is interesting to note that the glibc version
- * of memcpy (written in C) is actually quite fast already. This routine is 
- * able to beat it by 30-40% for aligned copies because of the loop unrolling, 
- * but in some cases the glibc version is still slightly faster. This lends 
- * more credibility that gcc can generate very good code as long as we are 
- * careful.
- *
- * TODO:
- * - cache prefetching needs more experimentation to get optimal settings
- * - try not to use the post-increment address modifiers; they create additional
- *   interlocks
- * - replace byte-copy loops with stybs sequences
  */
 
-#ifdef __KERNEL__
 #include <linux/module.h>
 #include <linux/compiler.h>
 #include <linux/uaccess.h>
-#define s_space "%%sr1"
-#define d_space "%%sr2"
-#else
-#include "memcpy.h"
-#define s_space "%%sr0"
-#define d_space "%%sr0"
-#define pa_memcpy new2_copy
-#endif
 
 DECLARE_PER_CPU(struct exception_data, exception_data);
 
-#define preserve_branch(label)	do {					\
-	volatile int dummy = 0;						\
-	/* The following branch is never taken, it's just here to  */	\
-	/* prevent gcc from optimizing away our exception code. */ 	\
-	if (unlikely(dummy != dummy))					\
-		goto label;						\
-} while (0)
-
 #define get_user_space() (segment_eq(get_fs(), KERNEL_DS) ? 0 : mfsp(3))
 #define get_kernel_space() (0)
 
-#define MERGE(w0, sh_1, w1, sh_2)  ({					\
-	unsigned int _r;						\
-	asm volatile (							\
-	"mtsar %3\n"							\
-	"shrpw %1, %2, %%sar, %0\n"					\
-	: "=r"(_r)							\
-	: "r"(w0), "r"(w1), "r"(sh_2)					\
-	);								\
-	_r;								\
-})
-#define THRESHOLD	16
-
-#ifdef DEBUG_MEMCPY
-#define DPRINTF(fmt, args...) do { printk(KERN_DEBUG "%s:%d:%s ", __FILE__, __LINE__, __func__ ); printk(KERN_DEBUG fmt, ##args ); } while (0)
-#else
-#define DPRINTF(fmt, args...)
-#endif
-
-#define def_load_ai_insn(_insn,_sz,_tt,_s,_a,_t,_e)	\
-	__asm__ __volatile__ (				\
-	"1:\t" #_insn ",ma " #_sz "(" _s ",%1), %0\n\t"	\
-	ASM_EXCEPTIONTABLE_ENTRY(1b,_e)			\
-	: _tt(_t), "+r"(_a)				\
-	: 						\
-	: "r8")
-
-#define def_store_ai_insn(_insn,_sz,_tt,_s,_a,_t,_e) 	\
-	__asm__ __volatile__ (				\
-	"1:\t" #_insn ",ma %1, " #_sz "(" _s ",%0)\n\t"	\
-	ASM_EXCEPTIONTABLE_ENTRY(1b,_e)			\
-	: "+r"(_a) 					\
-	: _tt(_t)					\
-	: "r8")
-
-#define ldbma(_s, _a, _t, _e) def_load_ai_insn(ldbs,1,"=r",_s,_a,_t,_e)
-#define stbma(_s, _t, _a, _e) def_store_ai_insn(stbs,1,"r",_s,_a,_t,_e)
-#define ldwma(_s, _a, _t, _e) def_load_ai_insn(ldw,4,"=r",_s,_a,_t,_e)
-#define stwma(_s, _t, _a, _e) def_store_ai_insn(stw,4,"r",_s,_a,_t,_e)
-#define flddma(_s, _a, _t, _e) def_load_ai_insn(fldd,8,"=f",_s,_a,_t,_e)
-#define fstdma(_s, _t, _a, _e) def_store_ai_insn(fstd,8,"f",_s,_a,_t,_e)
-
-#define def_load_insn(_insn,_tt,_s,_o,_a,_t,_e) 	\
-	__asm__ __volatile__ (				\
-	"1:\t" #_insn " " #_o "(" _s ",%1), %0\n\t"	\
-	ASM_EXCEPTIONTABLE_ENTRY(1b,_e)			\
-	: _tt(_t) 					\
-	: "r"(_a)					\
-	: "r8")
-
-#define def_store_insn(_insn,_tt,_s,_t,_o,_a,_e) 	\
-	__asm__ __volatile__ (				\
-	"1:\t" #_insn " %0, " #_o "(" _s ",%1)\n\t" 	\
-	ASM_EXCEPTIONTABLE_ENTRY(1b,_e)			\
-	: 						\
-	: _tt(_t), "r"(_a)				\
-	: "r8")
-
-#define ldw(_s,_o,_a,_t,_e)	def_load_insn(ldw,"=r",_s,_o,_a,_t,_e)
-#define stw(_s,_t,_o,_a,_e) 	def_store_insn(stw,"r",_s,_t,_o,_a,_e)
-
-#ifdef  CONFIG_PREFETCH
-static inline void prefetch_src(const void *addr)
-{
-	__asm__("ldw 0(" s_space ",%0), %%r0" : : "r" (addr));
-}
-
-static inline void prefetch_dst(const void *addr)
-{
-	__asm__("ldd 0(" d_space ",%0), %%r0" : : "r" (addr));
-}
-#else
-#define prefetch_src(addr) do { } while(0)
-#define prefetch_dst(addr) do { } while(0)
-#endif
-
-#define PA_MEMCPY_OK		0
-#define PA_MEMCPY_LOAD_ERROR	1
-#define PA_MEMCPY_STORE_ERROR	2
-
-/* Copy from a not-aligned src to an aligned dst, using shifts. Handles 4 words
- * per loop.  This code is derived from glibc. 
- */
-static noinline unsigned long copy_dstaligned(unsigned long dst,
-					unsigned long src, unsigned long len)
-{
-	/* gcc complains that a2 and a3 may be uninitialized, but actually
-	 * they cannot be.  Initialize a2/a3 to shut gcc up.
-	 */
-	register unsigned int a0, a1, a2 = 0, a3 = 0;
-	int sh_1, sh_2;
-
-	/* prefetch_src((const void *)src); */
-
-	/* Calculate how to shift a word read at the memory operation
-	   aligned srcp to make it aligned for copy.  */
-	sh_1 = 8 * (src % sizeof(unsigned int));
-	sh_2 = 8 * sizeof(unsigned int) - sh_1;
-
-	/* Make src aligned by rounding it down.  */
-	src &= -sizeof(unsigned int);
-
-	switch (len % 4)
-	{
-		case 2:
-			/* a1 = ((unsigned int *) src)[0];
-			   a2 = ((unsigned int *) src)[1]; */
-			ldw(s_space, 0, src, a1, cda_ldw_exc);
-			ldw(s_space, 4, src, a2, cda_ldw_exc);
-			src -= 1 * sizeof(unsigned int);
-			dst -= 3 * sizeof(unsigned int);
-			len += 2;
-			goto do1;
-		case 3:
-			/* a0 = ((unsigned int *) src)[0];
-			   a1 = ((unsigned int *) src)[1]; */
-			ldw(s_space, 0, src, a0, cda_ldw_exc);
-			ldw(s_space, 4, src, a1, cda_ldw_exc);
-			src -= 0 * sizeof(unsigned int);
-			dst -= 2 * sizeof(unsigned int);
-			len += 1;
-			goto do2;
-		case 0:
-			if (len == 0)
-				return PA_MEMCPY_OK;
-			/* a3 = ((unsigned int *) src)[0];
-			   a0 = ((unsigned int *) src)[1]; */
-			ldw(s_space, 0, src, a3, cda_ldw_exc);
-			ldw(s_space, 4, src, a0, cda_ldw_exc);
-			src -=-1 * sizeof(unsigned int);
-			dst -= 1 * sizeof(unsigned int);
-			len += 0;
-			goto do3;
-		case 1:
-			/* a2 = ((unsigned int *) src)[0];
-			   a3 = ((unsigned int *) src)[1]; */
-			ldw(s_space, 0, src, a2, cda_ldw_exc);
-			ldw(s_space, 4, src, a3, cda_ldw_exc);
-			src -=-2 * sizeof(unsigned int);
-			dst -= 0 * sizeof(unsigned int);
-			len -= 1;
-			if (len == 0)
-				goto do0;
-			goto do4;			/* No-op.  */
-	}
-
-	do
-	{
-		/* prefetch_src((const void *)(src + 4 * sizeof(unsigned int))); */
-do4:
-		/* a0 = ((unsigned int *) src)[0]; */
-		ldw(s_space, 0, src, a0, cda_ldw_exc);
-		/* ((unsigned int *) dst)[0] = MERGE (a2, sh_1, a3, sh_2); */
-		stw(d_space, MERGE (a2, sh_1, a3, sh_2), 0, dst, cda_stw_exc);
-do3:
-		/* a1 = ((unsigned int *) src)[1]; */
-		ldw(s_space, 4, src, a1, cda_ldw_exc);
-		/* ((unsigned int *) dst)[1] = MERGE (a3, sh_1, a0, sh_2); */
-		stw(d_space, MERGE (a3, sh_1, a0, sh_2), 4, dst, cda_stw_exc);
-do2:
-		/* a2 = ((unsigned int *) src)[2]; */
-		ldw(s_space, 8, src, a2, cda_ldw_exc);
-		/* ((unsigned int *) dst)[2] = MERGE (a0, sh_1, a1, sh_2); */
-		stw(d_space, MERGE (a0, sh_1, a1, sh_2), 8, dst, cda_stw_exc);
-do1:
-		/* a3 = ((unsigned int *) src)[3]; */
-		ldw(s_space, 12, src, a3, cda_ldw_exc);
-		/* ((unsigned int *) dst)[3] = MERGE (a1, sh_1, a2, sh_2); */
-		stw(d_space, MERGE (a1, sh_1, a2, sh_2), 12, dst, cda_stw_exc);
-
-		src += 4 * sizeof(unsigned int);
-		dst += 4 * sizeof(unsigned int);
-		len -= 4;
-	}
-	while (len != 0);
-
-do0:
-	/* ((unsigned int *) dst)[0] = MERGE (a2, sh_1, a3, sh_2); */
-	stw(d_space, MERGE (a2, sh_1, a3, sh_2), 0, dst, cda_stw_exc);
-
-	preserve_branch(handle_load_error);
-	preserve_branch(handle_store_error);
-
-	return PA_MEMCPY_OK;
-
-handle_load_error:
-	__asm__ __volatile__ ("cda_ldw_exc:\n");
-	return PA_MEMCPY_LOAD_ERROR;
-
-handle_store_error:
-	__asm__ __volatile__ ("cda_stw_exc:\n");
-	return PA_MEMCPY_STORE_ERROR;
-}
-
-
-/* Returns PA_MEMCPY_OK, PA_MEMCPY_LOAD_ERROR or PA_MEMCPY_STORE_ERROR.
- * In case of an access fault the faulty address can be read from the per_cpu
- * exception data struct. */
-static noinline unsigned long pa_memcpy_internal(void *dstp, const void *srcp,
-					unsigned long len)
-{
-	register unsigned long src, dst, t1, t2, t3;
-	register unsigned char *pcs, *pcd;
-	register unsigned int *pws, *pwd;
-	register double *pds, *pdd;
-	unsigned long ret;
-
-	src = (unsigned long)srcp;
-	dst = (unsigned long)dstp;
-	pcs = (unsigned char *)srcp;
-	pcd = (unsigned char *)dstp;
-
-	/* prefetch_src((const void *)srcp); */
-
-	if (len < THRESHOLD)
-		goto byte_copy;
-
-	/* Check alignment */
-	t1 = (src ^ dst);
-	if (unlikely(t1 & (sizeof(double)-1)))
-		goto unaligned_copy;
-
-	/* src and dst have same alignment. */
-
-	/* Copy bytes till we are double-aligned. */
-	t2 = src & (sizeof(double) - 1);
-	if (unlikely(t2 != 0)) {
-		t2 = sizeof(double) - t2;
-		while (t2 && len) {
-			/* *pcd++ = *pcs++; */
-			ldbma(s_space, pcs, t3, pmc_load_exc);
-			len--;
-			stbma(d_space, t3, pcd, pmc_store_exc);
-			t2--;
-		}
-	}
-
-	pds = (double *)pcs;
-	pdd = (double *)pcd;
-
-#if 0
-	/* Copy 8 doubles at a time */
-	while (len >= 8*sizeof(double)) {
-		register double r1, r2, r3, r4, r5, r6, r7, r8;
-		/* prefetch_src((char *)pds + L1_CACHE_BYTES); */
-		flddma(s_space, pds, r1, pmc_load_exc);
-		flddma(s_space, pds, r2, pmc_load_exc);
-		flddma(s_space, pds, r3, pmc_load_exc);
-		flddma(s_space, pds, r4, pmc_load_exc);
-		fstdma(d_space, r1, pdd, pmc_store_exc);
-		fstdma(d_space, r2, pdd, pmc_store_exc);
-		fstdma(d_space, r3, pdd, pmc_store_exc);
-		fstdma(d_space, r4, pdd, pmc_store_exc);
-
-#if 0
-		if (L1_CACHE_BYTES <= 32)
-			prefetch_src((char *)pds + L1_CACHE_BYTES);
-#endif
-		flddma(s_space, pds, r5, pmc_load_exc);
-		flddma(s_space, pds, r6, pmc_load_exc);
-		flddma(s_space, pds, r7, pmc_load_exc);
-		flddma(s_space, pds, r8, pmc_load_exc);
-		fstdma(d_space, r5, pdd, pmc_store_exc);
-		fstdma(d_space, r6, pdd, pmc_store_exc);
-		fstdma(d_space, r7, pdd, pmc_store_exc);
-		fstdma(d_space, r8, pdd, pmc_store_exc);
-		len -= 8*sizeof(double);
-	}
-#endif
-
-	pws = (unsigned int *)pds;
-	pwd = (unsigned int *)pdd;
-
-word_copy:
-	while (len >= 8*sizeof(unsigned int)) {
-		register unsigned int r1,r2,r3,r4,r5,r6,r7,r8;
-		/* prefetch_src((char *)pws + L1_CACHE_BYTES); */
-		ldwma(s_space, pws, r1, pmc_load_exc);
-		ldwma(s_space, pws, r2, pmc_load_exc);
-		ldwma(s_space, pws, r3, pmc_load_exc);
-		ldwma(s_space, pws, r4, pmc_load_exc);
-		stwma(d_space, r1, pwd, pmc_store_exc);
-		stwma(d_space, r2, pwd, pmc_store_exc);
-		stwma(d_space, r3, pwd, pmc_store_exc);
-		stwma(d_space, r4, pwd, pmc_store_exc);
-
-		ldwma(s_space, pws, r5, pmc_load_exc);
-		ldwma(s_space, pws, r6, pmc_load_exc);
-		ldwma(s_space, pws, r7, pmc_load_exc);
-		ldwma(s_space, pws, r8, pmc_load_exc);
-		stwma(d_space, r5, pwd, pmc_store_exc);
-		stwma(d_space, r6, pwd, pmc_store_exc);
-		stwma(d_space, r7, pwd, pmc_store_exc);
-		stwma(d_space, r8, pwd, pmc_store_exc);
-		len -= 8*sizeof(unsigned int);
-	}
-
-	while (len >= 4*sizeof(unsigned int)) {
-		register unsigned int r1,r2,r3,r4;
-		ldwma(s_space, pws, r1, pmc_load_exc);
-		ldwma(s_space, pws, r2, pmc_load_exc);
-		ldwma(s_space, pws, r3, pmc_load_exc);
-		ldwma(s_space, pws, r4, pmc_load_exc);
-		stwma(d_space, r1, pwd, pmc_store_exc);
-		stwma(d_space, r2, pwd, pmc_store_exc);
-		stwma(d_space, r3, pwd, pmc_store_exc);
-		stwma(d_space, r4, pwd, pmc_store_exc);
-		len -= 4*sizeof(unsigned int);
-	}
-
-	pcs = (unsigned char *)pws;
-	pcd = (unsigned char *)pwd;
-
-byte_copy:
-	while (len) {
-		/* *pcd++ = *pcs++; */
-		ldbma(s_space, pcs, t3, pmc_load_exc);
-		stbma(d_space, t3, pcd, pmc_store_exc);
-		len--;
-	}
-
-	return PA_MEMCPY_OK;
-
-unaligned_copy:
-	/* possibly we are aligned on a word, but not on a double... */
-	if (likely((t1 & (sizeof(unsigned int)-1)) == 0)) {
-		t2 = src & (sizeof(unsigned int) - 1);
-
-		if (unlikely(t2 != 0)) {
-			t2 = sizeof(unsigned int) - t2;
-			while (t2) {
-				/* *pcd++ = *pcs++; */
-				ldbma(s_space, pcs, t3, pmc_load_exc);
-				stbma(d_space, t3, pcd, pmc_store_exc);
-				len--;
-				t2--;
-			}
-		}
-
-		pws = (unsigned int *)pcs;
-		pwd = (unsigned int *)pcd;
-		goto word_copy;
-	}
-
-	/* Align the destination.  */
-	if (unlikely((dst & (sizeof(unsigned int) - 1)) != 0)) {
-		t2 = sizeof(unsigned int) - (dst & (sizeof(unsigned int) - 1));
-		while (t2) {
-			/* *pcd++ = *pcs++; */
-			ldbma(s_space, pcs, t3, pmc_load_exc);
-			stbma(d_space, t3, pcd, pmc_store_exc);
-			len--;
-			t2--;
-		}
-		dst = (unsigned long)pcd;
-		src = (unsigned long)pcs;
-	}
-
-	ret = copy_dstaligned(dst, src, len / sizeof(unsigned int));
-	if (ret)
-		return ret;
-
-	pcs += (len & -sizeof(unsigned int));
-	pcd += (len & -sizeof(unsigned int));
-	len %= sizeof(unsigned int);
-
-	preserve_branch(handle_load_error);
-	preserve_branch(handle_store_error);
-
-	goto byte_copy;
-
-handle_load_error:
-	__asm__ __volatile__ ("pmc_load_exc:\n");
-	return PA_MEMCPY_LOAD_ERROR;
-
-handle_store_error:
-	__asm__ __volatile__ ("pmc_store_exc:\n");
-	return PA_MEMCPY_STORE_ERROR;
-}
-
-
 /* Returns 0 for success, otherwise, returns number of bytes not transferred. */
-static unsigned long pa_memcpy(void *dstp, const void *srcp, unsigned long len)
-{
-	unsigned long ret, fault_addr, reference;
-	struct exception_data *d;
+extern unsigned long pa_memcpy(void *dst, const void *src,
+				unsigned long len);
 
-	ret = pa_memcpy_internal(dstp, srcp, len);
-	if (likely(ret == PA_MEMCPY_OK))
-		return 0;
-
-	/* if a load or store fault occured we can get the faulty addr */
-	d = this_cpu_ptr(&exception_data);
-	fault_addr = d->fault_addr;
-
-	/* error in load or store? */
-	if (ret == PA_MEMCPY_LOAD_ERROR)
-		reference = (unsigned long) srcp;
-	else
-		reference = (unsigned long) dstp;
-
-	DPRINTF("pa_memcpy: fault type = %lu, len=%lu fault_addr=%lu ref=%lu\n",
-		ret, len, fault_addr, reference);
-
-	if (fault_addr >= reference)
-		return len - (fault_addr - reference);
-	else
-		return len;
-}
-
-#ifdef __KERNEL__
 unsigned long __copy_to_user(void __user *dst, const void *src,
 			     unsigned long len)
 {
@@ -537,5 +84,3 @@ long probe_kernel_read(void *dst, const
 
 	return __probe_kernel_read(dst, src, size);
 }
-
-#endif

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 47/72] ACPI: Fix incompatibility with mcount-based function graph tracing
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 46/72] parisc: Fix access fault handling in pa_memcpy() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 48/72] ACPI: Do not create a platform_device for IOAPIC/IOxAPIC Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paul Menzel, Josh Poimboeuf,
	Steven Rostedt (VMware),
	Rafael J. Wysocki

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Josh Poimboeuf <jpoimboe@redhat.com>

commit 61b79e16c68d703dde58c25d3935d67210b7d71b upstream.

Paul Menzel reported a warning:

  WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
  Bad frame pointer: expected f6919d98, received f6919db0
    from func acpi_pm_device_sleep_wake return to c43b6f9d

The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109

I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.

But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason.  It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there.  As far as I can tell, there's
no reason for it to be there.  The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/Makefile |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -2,7 +2,6 @@
 # Makefile for the Linux ACPI interpreter
 #
 
-ccflags-y			:= -Os
 ccflags-$(CONFIG_ACPI_DEBUG)	+= -DACPI_DEBUG_OUTPUT
 
 #

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 48/72] ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 47/72] ACPI: Fix incompatibility with mcount-based function graph tracing Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 49/72] tty/serial: atmel: fix race condition (TX+DMA) Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Joerg Roedel, Rafael J. Wysocki

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <jroedel@suse.de>

commit 08f63d97749185fab942a3a47ed80f5bd89b8b7d upstream.

No platform-device is required for IO(x)APICs, so don't even
create them.

[ rjw: This fixes a problem with leaking platform device objects
  after IOAPIC/IOxAPIC hot-removal events.]

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/acpi/acpi_platform.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/acpi/acpi_platform.c
+++ b/drivers/acpi/acpi_platform.c
@@ -25,9 +25,11 @@
 ACPI_MODULE_NAME("platform");
 
 static const struct acpi_device_id forbidden_id_list[] = {
-	{"PNP0000", 0},	/* PIC */
-	{"PNP0100", 0},	/* Timer */
-	{"PNP0200", 0},	/* AT DMA Controller */
+	{"PNP0000",  0},	/* PIC */
+	{"PNP0100",  0},	/* Timer */
+	{"PNP0200",  0},	/* AT DMA Controller */
+	{"ACPI0009", 0},	/* IOxAPIC */
+	{"ACPI000A", 0},	/* IOAPIC */
 	{"", 0},
 };
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 49/72] tty/serial: atmel: fix race condition (TX+DMA)
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 48/72] ACPI: Do not create a platform_device for IOAPIC/IOxAPIC Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 50/72] tty/serial: atmel: fix TX path in atmel_console_write() Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolas Ferre, Richard Genoud

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Richard Genoud <richard.genoud@gmail.com>

commit 31ca2c63fdc0aee725cbd4f207c1256f5deaabde upstream.

If uart_flush_buffer() is called between atmel_tx_dma() and
atmel_complete_tx_dma(), the circular buffer has been cleared, but not
atmel_port->tx_len.
That leads to a circular buffer overflow (dumping (UART_XMIT_SIZE -
atmel_port->tx_len) bytes).

Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/atmel_serial.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/drivers/tty/serial/atmel_serial.c
+++ b/drivers/tty/serial/atmel_serial.c
@@ -1938,6 +1938,11 @@ static void atmel_flush_buffer(struct ua
 		atmel_uart_writel(port, ATMEL_PDC_TCR, 0);
 		atmel_port->pdc_tx.ofs = 0;
 	}
+	/*
+	 * in uart_flush_buffer(), the xmit circular buffer has just
+	 * been cleared, so we have to reset tx_len accordingly.
+	 */
+	atmel_port->tx_len = 0;
 }
 
 /*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 50/72] tty/serial: atmel: fix TX path in atmel_console_write()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 49/72] tty/serial: atmel: fix race condition (TX+DMA) Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 51/72] USB: fix linked-list corruption in rh_call_control() Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nicolas Ferre

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicolas Ferre <nicolas.ferre@microchip.com>

commit 497e1e16f45c70574dc9922c7f75c642c2162119 upstream.

A side effect of 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA
from transmitting in stop_tx") is that the console can be called with
TX path disabled. Then the system would hang trying to push charecters
out in atmel_console_putchar().

Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Fixes: 89d8232411a8 ("tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/atmel_serial.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/tty/serial/atmel_serial.c
+++ b/drivers/tty/serial/atmel_serial.c
@@ -2476,6 +2476,9 @@ static void atmel_console_write(struct c
 	pdc_tx = atmel_uart_readl(port, ATMEL_PDC_PTSR) & ATMEL_PDC_TXTEN;
 	atmel_uart_writel(port, ATMEL_PDC_PTCR, ATMEL_PDC_TXTDIS);
 
+	/* Make sure that tx path is actually able to send characters */
+	atmel_uart_writel(port, ATMEL_US_CR, ATMEL_US_TXEN);
+
 	uart_console_write(port, s, count, atmel_console_putchar);
 
 	/*

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 51/72] USB: fix linked-list corruption in rh_call_control()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 50/72] tty/serial: atmel: fix TX path in atmel_console_write() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 54/72] KVM: kvm_io_bus_unregister_dev() should never fail Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Alan Stern

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alan Stern <stern@rowland.harvard.edu>

commit 1633682053a7ee8058e10c76722b9b28e97fb73f upstream.

Using KASAN, Dmitry found a bug in the rh_call_control() routine: If
buffer allocation fails, the routine returns immediately without
unlinking its URB from the control endpoint, eventually leading to
linked-list corruption.

This patch fixes the problem by jumping to the end of the routine
(where the URB is unlinked) when an allocation failure occurs.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/usb/core/hcd.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -520,8 +520,10 @@ static int rh_call_control (struct usb_h
 	 */
 	tbuf_size =  max_t(u16, sizeof(struct usb_hub_descriptor), wLength);
 	tbuf = kzalloc(tbuf_size, GFP_KERNEL);
-	if (!tbuf)
-		return -ENOMEM;
+	if (!tbuf) {
+		status = -ENOMEM;
+		goto err_alloc;
+	}
 
 	bufp = tbuf;
 
@@ -734,6 +736,7 @@ error:
 	}
 
 	kfree(tbuf);
+ err_alloc:
 
 	/* any errors get returned through the urb completion */
 	spin_lock_irq(&hcd_root_hub_lock);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 54/72] KVM: kvm_io_bus_unregister_dev() should never fail
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 51/72] USB: fix linked-list corruption in rh_call_control() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 56/72] drm/vc4: Allocate the right amount of space for boot-time CRTC state Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Cornelia Huck,
	David Hildenbrand, Paolo Bonzini

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Hildenbrand <david@redhat.com>

commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.

No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.

There is nothing the callers could do, except retrying over and over
again.

So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).

Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/kvm_host.h |    4 ++--
 virt/kvm/eventfd.c       |    3 ++-
 virt/kvm/kvm_main.c      |   42 +++++++++++++++++++++++++-----------------
 3 files changed, 29 insertions(+), 20 deletions(-)

--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -162,8 +162,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcp
 		    int len, void *val);
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 			    int len, struct kvm_io_device *dev);
-int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
-			      struct kvm_io_device *dev);
+void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+			       struct kvm_io_device *dev);
 struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 					 gpa_t addr);
 
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -870,7 +870,8 @@ kvm_deassign_ioeventfd_idx(struct kvm *k
 			continue;
 
 		kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
-		kvm->buses[bus_idx]->ioeventfd_count--;
+		if (kvm->buses[bus_idx])
+			kvm->buses[bus_idx]->ioeventfd_count--;
 		ioeventfd_release(p);
 		ret = 0;
 		break;
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -721,7 +721,8 @@ static void kvm_destroy_vm(struct kvm *k
 	spin_unlock(&kvm_lock);
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++) {
-		kvm_io_bus_destroy(kvm->buses[i]);
+		if (kvm->buses[i])
+			kvm_io_bus_destroy(kvm->buses[i]);
 		kvm->buses[i] = NULL;
 	}
 	kvm_coalesced_mmio_free(kvm);
@@ -3465,6 +3466,8 @@ int kvm_io_bus_write(struct kvm_vcpu *vc
 	};
 
 	bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+	if (!bus)
+		return -ENOMEM;
 	r = __kvm_io_bus_write(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
@@ -3482,6 +3485,8 @@ int kvm_io_bus_write_cookie(struct kvm_v
 	};
 
 	bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+	if (!bus)
+		return -ENOMEM;
 
 	/* First try the device referenced by cookie. */
 	if ((cookie >= 0) && (cookie < bus->dev_count) &&
@@ -3532,6 +3537,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcp
 	};
 
 	bus = srcu_dereference(vcpu->kvm->buses[bus_idx], &vcpu->kvm->srcu);
+	if (!bus)
+		return -ENOMEM;
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
@@ -3544,6 +3551,9 @@ int kvm_io_bus_register_dev(struct kvm *
 	struct kvm_io_bus *new_bus, *bus;
 
 	bus = kvm->buses[bus_idx];
+	if (!bus)
+		return -ENOMEM;
+
 	/* exclude ioeventfd which is limited by maximum fd */
 	if (bus->dev_count - bus->ioeventfd_count > NR_IOBUS_DEVS - 1)
 		return -ENOSPC;
@@ -3563,45 +3573,41 @@ int kvm_io_bus_register_dev(struct kvm *
 }
 
 /* Caller must hold slots_lock. */
-int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
-			      struct kvm_io_device *dev)
+void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+			       struct kvm_io_device *dev)
 {
-	int i, r;
+	int i;
 	struct kvm_io_bus *new_bus, *bus;
 
 	bus = kvm->buses[bus_idx];
-
-	/*
-	 * It's possible the bus being released before hand. If so,
-	 * we're done here.
-	 */
 	if (!bus)
-		return 0;
+		return;
 
-	r = -ENOENT;
 	for (i = 0; i < bus->dev_count; i++)
 		if (bus->range[i].dev == dev) {
-			r = 0;
 			break;
 		}
 
-	if (r)
-		return r;
+	if (i == bus->dev_count)
+		return;
 
 	new_bus = kmalloc(sizeof(*bus) + ((bus->dev_count - 1) *
 			  sizeof(struct kvm_io_range)), GFP_KERNEL);
-	if (!new_bus)
-		return -ENOMEM;
+	if (!new_bus)  {
+		pr_err("kvm: failed to shrink bus, removing it completely\n");
+		goto broken;
+	}
 
 	memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
 	new_bus->dev_count--;
 	memcpy(new_bus->range + i, bus->range + i + 1,
 	       (new_bus->dev_count - i) * sizeof(struct kvm_io_range));
 
+broken:
 	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
 	synchronize_srcu_expedited(&kvm->srcu);
 	kfree(bus);
-	return r;
+	return;
 }
 
 struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
@@ -3614,6 +3620,8 @@ struct kvm_io_device *kvm_io_bus_get_dev
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 
 	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
+	if (!bus)
+		goto out_unlock;
 
 	dev_idx = kvm_io_bus_get_first_dev(bus, addr, 1);
 	if (dev_idx < 0)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 56/72] drm/vc4: Allocate the right amount of space for boot-time CRTC state.
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 54/72] KVM: kvm_io_bus_unregister_dev() should never fail Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 57/72] drm/etnaviv: (re-)protect fence allocation with GPU mutex Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Anholt, Stefan Wahren, Daniel Vetter

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Anholt <eric@anholt.net>

commit 6d6e500391875cc372336c88e9a8af377be19c36 upstream.

Without this, the first modeset would dereference past the allocation
when trying to free the mm node.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170328201343.4884-1-eric@anholt.net
Fixes: d8dbf44f13b9 ("drm/vc4: Make the CRTCs cooperate on allocating display lists.")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/vc4/vc4_crtc.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -842,6 +842,17 @@ static void vc4_crtc_destroy_state(struc
 	drm_atomic_helper_crtc_destroy_state(crtc, state);
 }
 
+static void
+vc4_crtc_reset(struct drm_crtc *crtc)
+{
+	if (crtc->state)
+		__drm_atomic_helper_crtc_destroy_state(crtc->state);
+
+	crtc->state = kzalloc(sizeof(struct vc4_crtc_state), GFP_KERNEL);
+	if (crtc->state)
+		crtc->state->crtc = crtc;
+}
+
 static const struct drm_crtc_funcs vc4_crtc_funcs = {
 	.set_config = drm_atomic_helper_set_config,
 	.destroy = vc4_crtc_destroy,
@@ -849,7 +860,7 @@ static const struct drm_crtc_funcs vc4_c
 	.set_property = NULL,
 	.cursor_set = NULL, /* handled by drm_mode_cursor_universal */
 	.cursor_move = NULL, /* handled by drm_mode_cursor_universal */
-	.reset = drm_atomic_helper_crtc_reset,
+	.reset = vc4_crtc_reset,
 	.atomic_duplicate_state = vc4_crtc_duplicate_state,
 	.atomic_destroy_state = vc4_crtc_destroy_state,
 	.gamma_set = vc4_crtc_gamma_set,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 57/72] drm/etnaviv: (re-)protect fence allocation with GPU mutex
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 56/72] drm/vc4: Allocate the right amount of space for boot-time CRTC state Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38   ` Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Lucas Stach

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lucas Stach <l.stach@pengutronix.de>

commit f3cd1b064f1179d9e6188c6d67297a2360880e10 upstream.

The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.

Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1299,6 +1299,8 @@ int etnaviv_gpu_submit(struct etnaviv_gp
 		goto out_pm_put;
 	}
 
+	mutex_lock(&gpu->lock);
+
 	fence = etnaviv_gpu_fence_alloc(gpu);
 	if (!fence) {
 		event_free(gpu, event);
@@ -1306,8 +1308,6 @@ int etnaviv_gpu_submit(struct etnaviv_gp
 		goto out_pm_put;
 	}
 
-	mutex_lock(&gpu->lock);
-
 	gpu->event[event].fence = fence;
 	submit->fence = fence->seqno;
 	gpu->active_fence = submit->fence;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 58/72] x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
@ 2017-04-06  8:38   ` Greg Kroah-Hartman
  2017-04-06  8:37 ` [PATCH 4.9 02/72] xen/setup: Dont relocate p2m over existing one Greg Kroah-Hartman
                     ` (65 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Baoquan He, Bhupesh Sharma,
	Dave Young, Thomas Garnier, Andrew Morton, Andy Lutomirski,
	Ard Biesheuvel, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	H. Peter Anvin, Josh Poimboeuf, Kees Cook, Linus Torvalds,
	Masahiro Yamada, Matt Fleming, Peter Zijlstra, Thomas Gleixner,
	linux-efi, Ingo Molnar

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Baoquan He <bhe@redhat.com>

commit a46f60d76004965e5669dbf3fc21ef3bc3632eb4 upstream.

Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vamlloc and vmemmap. However the EFI region is also mistakenly
included for VA space randomization because of misusing EFI_VA_START macro
and assuming EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
Here EFI_VA_START = -4G, and EFI_VA_END = -64G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Thomas Garnier <thgarnie@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/mm/kaslr.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -48,7 +48,7 @@ static const unsigned long vaddr_start =
 #if defined(CONFIG_X86_ESPFIX64)
 static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
 #elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_START;
+static const unsigned long vaddr_end = EFI_VA_END;
 #else
 static const unsigned long vaddr_end = __START_KERNEL_map;
 #endif
@@ -105,7 +105,7 @@ void __init kernel_randomize_memory(void
 	 */
 	BUILD_BUG_ON(vaddr_start >= vaddr_end);
 	BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
-		     vaddr_end >= EFI_VA_START);
+		     vaddr_end >= EFI_VA_END);
 	BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
 		      IS_ENABLED(CONFIG_EFI)) &&
 		     vaddr_end >= __START_KERNEL_map);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 58/72] x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
@ 2017-04-06  8:38   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Baoquan He, Bhupesh Sharma,
	Dave Young, Thomas Garnier, Andrew Morton, Andy Lutomirski,
	Ard Biesheuvel, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	H. Peter Anvin, Josh Poimboeuf, Kees Cook, Linus Torvalds,
	Masahiro Yamada, Matt Fleming, Peter Zijlstra, Thomas Gleixner

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Baoquan He <bhe@redhat.com>

commit a46f60d76004965e5669dbf3fc21ef3bc3632eb4 upstream.

Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vamlloc and vmemmap. However the EFI region is also mistakenly
included for VA space randomization because of misusing EFI_VA_START macro
and assuming EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G to -64G thus EFI_VA_START > EFI_VA_END,
Here EFI_VA_START = -4G, and EFI_VA_END = -64G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.

Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Thomas Garnier <thgarnie@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/mm/kaslr.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -48,7 +48,7 @@ static const unsigned long vaddr_start =
 #if defined(CONFIG_X86_ESPFIX64)
 static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
 #elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_START;
+static const unsigned long vaddr_end = EFI_VA_END;
 #else
 static const unsigned long vaddr_end = __START_KERNEL_map;
 #endif
@@ -105,7 +105,7 @@ void __init kernel_randomize_memory(void
 	 */
 	BUILD_BUG_ON(vaddr_start >= vaddr_end);
 	BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
-		     vaddr_end >= EFI_VA_START);
+		     vaddr_end >= EFI_VA_END);
 	BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
 		      IS_ENABLED(CONFIG_EFI)) &&
 		     vaddr_end >= __START_KERNEL_map);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 59/72] x86/mce: Fix copy/paste error in exception table entries
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2017-04-06  8:38   ` Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 61/72] mm: rmap: fix huge file mmap accounting in the memcg stats Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tony Luck, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Josh Poimboeuf, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony.luck@intel.com>

commit 26a37ab319a26d330bab298770d692bb9c852aff upstream.

Back in commit:

  92b0729c34cab ("x86/mm, x86/mce: Add memcpy_mcsafe()")

... I made a copy/paste error setting up the exception table entries
and ended up with two for label .L_cache_w3 and none for .L_cache_w2.

This means that if we take a machine check on:

  .L_cache_w2: movq 2*8(%rsi), %r10

then we don't have an exception table entry for this instruction
and we can't recover.

Fix: s/3/2/

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 92b0729c34cab ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Link: http://lkml.kernel.org/r/1490046030-25862-1-git-send-email-tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/lib/memcpy_64.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -290,7 +290,7 @@ EXPORT_SYMBOL_GPL(memcpy_mcsafe_unrolled
 	_ASM_EXTABLE_FAULT(.L_copy_leading_bytes, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_cache_w0, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_cache_w1, .L_memcpy_mcsafe_fail)
-	_ASM_EXTABLE_FAULT(.L_cache_w3, .L_memcpy_mcsafe_fail)
+	_ASM_EXTABLE_FAULT(.L_cache_w2, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_cache_w3, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_cache_w4, .L_memcpy_mcsafe_fail)
 	_ASM_EXTABLE_FAULT(.L_cache_w5, .L_memcpy_mcsafe_fail)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 61/72] mm: rmap: fix huge file mmap accounting in the memcg stats
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 59/72] x86/mce: Fix copy/paste error in exception table entries Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 62/72] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Kirill A. Shutemov,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 553af430e7c981e6e8fa5007c5b7b5773acc63dd upstream.

Huge pages are accounted as single units in the memcg's "file_mapped"
counter.  Account the correct number of base pages, like we do in the
corresponding node counter.

Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/memcontrol.h |    6 ++++++
 mm/rmap.c                  |    4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -739,6 +739,12 @@ static inline bool mem_cgroup_oom_synchr
 	return false;
 }
 
+static inline void mem_cgroup_update_page_stat(struct page *page,
+					       enum mem_cgroup_stat_index idx,
+					       int nr)
+{
+}
+
 static inline void mem_cgroup_inc_page_stat(struct page *page,
 					    enum mem_cgroup_stat_index idx)
 {
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1295,7 +1295,7 @@ void page_add_file_rmap(struct page *pag
 			goto out;
 	}
 	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, nr);
-	mem_cgroup_inc_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
+	mem_cgroup_update_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED, nr);
 out:
 	unlock_page_memcg(page);
 }
@@ -1335,7 +1335,7 @@ static void page_remove_file_rmap(struct
 	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
 	__mod_node_page_state(page_pgdat(page), NR_FILE_MAPPED, -nr);
-	mem_cgroup_dec_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED);
+	mem_cgroup_update_page_stat(page, MEM_CGROUP_STAT_FILE_MAPPED, -nr);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 62/72] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 61/72] mm: rmap: fix huge file mmap accounting in the memcg stats Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 64/72] qla2xxx: Allow vref count to timeout on vport delete Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Naoya Horiguchi, Hillf Danton,
	Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Mike Kravetz,
	Christian Borntraeger, Gerald Schaefer, Andrew Morton,
	Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

commit c9d398fa237882ea07167e23bcfc5e6847066518 upstream.

I found the race condition which triggers the following bug when
move_pages() and soft offline are called on a single hugetlb page
concurrently.

    Soft offlining page 0x119400 at 0x700000000000
    BUG: unable to handle kernel paging request at ffffea0011943820
    IP: follow_huge_pmd+0x143/0x190
    PGD 7ffd2067
    PUD 7ffd1067
    PMD 0
        [61163.582052] Oops: 0000 [#1] SMP
    Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
    CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:follow_huge_pmd+0x143/0x190
    RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
    RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
    RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
    RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
    R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
    R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
    FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
    Call Trace:
     follow_page_mask+0x270/0x550
     SYSC_move_pages+0x4ea/0x8f0
     SyS_move_pages+0xe/0x10
     do_syscall_64+0x67/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fc976e03949
    RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
    RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
    RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
    R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
    R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
    Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
    RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
    CR2: ffffea0011943820
    ---[ end trace e4f81353a2d23232 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

This bug is triggered when pmd_present() returns true for non-present
hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to determine present/non-present for hugetlb is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.

Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()")
Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/hugetlb.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4471,6 +4471,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 {
 	struct page *page = NULL;
 	spinlock_t *ptl;
+	pte_t pte;
 retry:
 	ptl = pmd_lockptr(mm, pmd);
 	spin_lock(ptl);
@@ -4480,12 +4481,13 @@ retry:
 	 */
 	if (!pmd_huge(*pmd))
 		goto out;
-	if (pmd_present(*pmd)) {
+	pte = huge_ptep_get((pte_t *)pmd);
+	if (pte_present(pte)) {
 		page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
 		if (flags & FOLL_GET)
 			get_page(page);
 	} else {
-		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
 			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
 			goto retry;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 64/72] qla2xxx: Allow vref count to timeout on vport delete.
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 62/72] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 66/72] MIPS: Lantiq: Fix cascaded IRQ setup Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Joe Carnuccio, Himanshu Madhani,
	Nicholas Bellinger

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Joe Carnuccio <joe.carnuccio@cavium.com>

commit c4a9b538ab2a109c5f9798bea1f8f4bf93aadfb9 upstream.

Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/qla2xxx/qla_attr.c |    2 --
 drivers/scsi/qla2xxx/qla_def.h  |    3 +++
 drivers/scsi/qla2xxx/qla_init.c |    1 +
 drivers/scsi/qla2xxx/qla_mid.c  |   14 ++++++++------
 drivers/scsi/qla2xxx/qla_os.c   |    1 +
 5 files changed, 13 insertions(+), 8 deletions(-)

--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -2153,8 +2153,6 @@ qla24xx_vport_delete(struct fc_vport *fc
 		    "Timer for the VP[%d] has stopped\n", vha->vp_idx);
 	}
 
-	BUG_ON(atomic_read(&vha->vref_count));
-
 	qla2x00_free_fcports(vha);
 
 	mutex_lock(&ha->vport_lock);
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -3742,6 +3742,7 @@ typedef struct scsi_qla_host {
 	struct qla8044_reset_template reset_tmplt;
 	struct qla_tgt_counters tgt_counters;
 	uint16_t	bbcr;
+	wait_queue_head_t vref_waitq;
 } scsi_qla_host_t;
 
 struct qla27xx_image_status {
@@ -3780,6 +3781,7 @@ struct qla_tgt_vp_map {
 	mb();						     \
 	if (__vha->flags.delete_progress) {		     \
 		atomic_dec(&__vha->vref_count);		     \
+		wake_up(&__vha->vref_waitq);		\
 		__bail = 1;				     \
 	} else {					     \
 		__bail = 0;				     \
@@ -3788,6 +3790,7 @@ struct qla_tgt_vp_map {
 
 #define QLA_VHA_MARK_NOT_BUSY(__vha) do {		     \
 	atomic_dec(&__vha->vref_count);			     \
+	wake_up(&__vha->vref_waitq);			\
 } while (0)
 
 /*
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -4356,6 +4356,7 @@ qla2x00_update_fcports(scsi_qla_host_t *
 			}
 		}
 		atomic_dec(&vha->vref_count);
+		wake_up(&vha->vref_waitq);
 	}
 	spin_unlock_irqrestore(&ha->vport_slock, flags);
 }
--- a/drivers/scsi/qla2xxx/qla_mid.c
+++ b/drivers/scsi/qla2xxx/qla_mid.c
@@ -74,13 +74,14 @@ qla24xx_deallocate_vp_id(scsi_qla_host_t
 	 * ensures no active vp_list traversal while the vport is removed
 	 * from the queue)
 	 */
-	spin_lock_irqsave(&ha->vport_slock, flags);
-	while (atomic_read(&vha->vref_count)) {
-		spin_unlock_irqrestore(&ha->vport_slock, flags);
-
-		msleep(500);
+	wait_event_timeout(vha->vref_waitq, atomic_read(&vha->vref_count),
+	    10*HZ);
 
-		spin_lock_irqsave(&ha->vport_slock, flags);
+	spin_lock_irqsave(&ha->vport_slock, flags);
+	if (atomic_read(&vha->vref_count)) {
+		ql_dbg(ql_dbg_vport, vha, 0xfffa,
+		    "vha->vref_count=%u timeout\n", vha->vref_count.counter);
+		vha->vref_count = (atomic_t)ATOMIC_INIT(0);
 	}
 	list_del(&vha->list);
 	qlt_update_vp_map(vha, RESET_VP_IDX);
@@ -269,6 +270,7 @@ qla2x00_alert_all_vps(struct rsp_que *rs
 
 			spin_lock_irqsave(&ha->vport_slock, flags);
 			atomic_dec(&vha->vref_count);
+			wake_up(&vha->vref_waitq);
 		}
 		i++;
 	}
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -4045,6 +4045,7 @@ struct scsi_qla_host *qla2x00_create_hos
 
 	spin_lock_init(&vha->work_lock);
 	spin_lock_init(&vha->cmd_list_lock);
+	init_waitqueue_head(&vha->vref_waitq);
 
 	sprintf(vha->host_str, "%s_%ld", QLA2XXX_DRIVER_NAME, vha->host_no);
 	ql_dbg(ql_dbg_init, vha, 0x0041,

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 66/72] MIPS: Lantiq: Fix cascaded IRQ setup
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 64/72] qla2xxx: Allow vref count to timeout on vport delete Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 67/72] mm: workingset: fix premature shadow node shrinking with cgroups Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Felix Fietkau, John Crispin,
	linux-mips, James Hogan, Amit Pundir

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Felix Fietkau <nbd@nbd.name>

commit 6c356eda225e3ee134ed4176b9ae3a76f793f4dd upstream.

With the IRQ stack changes integrated, the XRX200 devices started
emitting a constant stream of kernel messages like this:

[  565.415310] Spurious IRQ: CAUSE=0x1100c300

This is caused by IP0 getting handled by plat_irq_dispatch() rather than
its vectored interrupt handler, which is fixed by commit de856416e714
("MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch").

Fix plat_irq_dispatch() to handle non-vectored IPI interrupts correctly
by setting up IP2-6 as proper chained IRQ handlers and calling do_IRQ
for all MIPS CPU interrupts.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: John Crispin <john@phrozen.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15077/
[james.hogan@imgtec.com: tweaked commit message]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/mips/lantiq/irq.c |   36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

--- a/arch/mips/lantiq/irq.c
+++ b/arch/mips/lantiq/irq.c
@@ -269,6 +269,11 @@ static void ltq_hw5_irqdispatch(void)
 DEFINE_HWx_IRQDISPATCH(5)
 #endif
 
+static void ltq_hw_irq_handler(struct irq_desc *desc)
+{
+	ltq_hw_irqdispatch(irq_desc_get_irq(desc) - 2);
+}
+
 #ifdef CONFIG_MIPS_MT_SMP
 void __init arch_init_ipiirq(int irq, struct irqaction *action)
 {
@@ -313,23 +318,19 @@ static struct irqaction irq_call = {
 asmlinkage void plat_irq_dispatch(void)
 {
 	unsigned int pending = read_c0_status() & read_c0_cause() & ST0_IM;
-	unsigned int i;
+	int irq;
 
-	if ((MIPS_CPU_TIMER_IRQ == 7) && (pending & CAUSEF_IP7)) {
-		do_IRQ(MIPS_CPU_TIMER_IRQ);
-		goto out;
-	} else {
-		for (i = 0; i < MAX_IM; i++) {
-			if (pending & (CAUSEF_IP2 << i)) {
-				ltq_hw_irqdispatch(i);
-				goto out;
-			}
-		}
+	if (!pending) {
+		spurious_interrupt();
+		return;
 	}
-	pr_alert("Spurious IRQ: CAUSE=0x%08x\n", read_c0_status());
 
-out:
-	return;
+	pending >>= CAUSEB_IP;
+	while (pending) {
+		irq = fls(pending) - 1;
+		do_IRQ(MIPS_CPU_IRQ_BASE + irq);
+		pending &= ~BIT(irq);
+	}
 }
 
 static int icu_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hw)
@@ -354,11 +355,6 @@ static const struct irq_domain_ops irq_d
 	.map = icu_map,
 };
 
-static struct irqaction cascade = {
-	.handler = no_action,
-	.name = "cascade",
-};
-
 int __init icu_of_init(struct device_node *node, struct device_node *parent)
 {
 	struct device_node *eiu_node;
@@ -390,7 +386,7 @@ int __init icu_of_init(struct device_nod
 	mips_cpu_irq_init();
 
 	for (i = 0; i < MAX_IM; i++)
-		setup_irq(i + 2, &cascade);
+		irq_set_chained_handler(i + 2, ltq_hw_irq_handler);
 
 	if (cpu_has_vint) {
 		pr_info("Setting up vectored interrupts\n");

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 67/72] mm: workingset: fix premature shadow node shrinking with cgroups
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 66/72] MIPS: Lantiq: Fix cascaded IRQ setup Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 68/72] blk: improve order of bio handling in generic_make_request() Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Johannes Weiner, Vladimir Davydov,
	Michal Hocko, Andrew Morton, Linus Torvalds

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Johannes Weiner <hannes@cmpxchg.org>

commit 0cefabdaf757a6455d75f00cb76874e62703ed18 upstream.

Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg
aware") enabled cgroup-awareness in the shadow node shrinker, but forgot
to also enable cgroup-awareness in the list_lru the shadow nodes sit on.

Consequently, all shadow nodes are sitting on a global (per-NUMA node)
list, while the shrinker applies the limits according to the amount of
cache in the cgroup its shrinking.  The result is excessive pressure on
the shadow nodes from cgroups that have very little cache.

Enable memcg-mode on the shadow node LRUs, such that per-cgroup limits
are applied to per-cgroup lists.

Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20170322005320.8165-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@tarantool.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 mm/workingset.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -492,7 +492,7 @@ static int __init workingset_init(void)
 	pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
 	       timestamp_bits, max_order, bucket_order);
 
-	ret = list_lru_init_key(&workingset_shadow_nodes, &shadow_nodes_key);
+	ret = __list_lru_init(&workingset_shadow_nodes, true, &shadow_nodes_key);
 	if (ret)
 		goto err;
 	ret = register_shrinker(&workingset_shadow_shrinker);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 68/72] blk: improve order of bio handling in generic_make_request()
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 67/72] mm: workingset: fix premature shadow node shrinking with cgroups Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 69/72] blk: Ensure users for current->bio_list can see the full list Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jinpu Wang, NeilBrown, Jens Axboe

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: NeilBrown <neilb@suse.com>

commit 79bd99596b7305ab08109a8bf44a6a4511dbf1cd upstream.

To avoid recursion on the kernel stack when stacked block devices
are in use, generic_make_request() will, when called recursively,
queue new requests for later handling.  They will be handled when the
make_request_fn for the current bio completes.

If any bios are submitted by a make_request_fn, these will ultimately
be handled seqeuntially.  If the handling of one of those generates
further requests, they will be added to the end of the queue.

This strict first-in-first-out behaviour can lead to deadlocks in
various ways, normally because a request might need to wait for a
previous request to the same device to complete.  This can happen when
they share a mempool, and can happen due to interdependencies
particular to the device.  Both md and dm have examples where this happens.

These deadlocks can be erradicated by more selective ordering of bios.
Specifically by handling them in depth-first order.  That is: when the
handling of one bio generates one or more further bios, they are
handled immediately after the parent, before any siblings of the
parent.  That way, when generic_make_request() calls make_request_fn
for some particular device, we can be certain that all previously
submited requests for that device have been completely handled and are
not waiting for anything in the queue of requests maintained in
generic_make_request().

An easy way to achieve this would be to use a last-in-first-out stack
instead of a queue.  However this will change the order of consecutive
bios submitted by a make_request_fn, which could have unexpected consequences.
Instead we take a slightly more complex approach.
A fresh queue is created for each call to a make_request_fn.  After it completes,
any bios for a different device are placed on the front of the main queue, followed
by any bios for the same device, followed by all bios that were already on
the queue before the make_request_fn was called.
This provides the depth-first approach without reordering bios on the same level.

This, by itself, it not enough to remove all deadlocks.  It just makes
it possible for drivers to take the extra step required themselves.

To avoid deadlocks, drivers must never risk waiting for a request
after submitting one to generic_make_request.  This includes never
allocing from a mempool twice in the one call to a make_request_fn.

A common pattern in drivers is to call bio_split() in a loop, handling
the first part and then looping around to possibly split the next part.
Instead, a driver that finds it needs to split a bio should queue
(with generic_make_request) the second part, handle the first part,
and then return.  The new code in generic_make_request will ensure the
requests to underlying bios are processed first, then the second bio
that was split off.  If it splits again, the same process happens.  In
each case one bio will be completely handled before the next one is attempted.

With this is place, it should be possible to disable the
punt_bios_to_recover() recovery thread for many block devices, and
eventually it may be possible to remove it completely.

Ref: http://www.spinics.net/lists/raid/msg54680.html
Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com>
Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-core.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2036,17 +2036,34 @@ blk_qc_t generic_make_request(struct bio
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, false) == 0)) {
+			struct bio_list hold;
+			struct bio_list lower, same;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			hold = bio_list_on_stack;
+			bio_list_init(&bio_list_on_stack);
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
 
-			bio = bio_list_pop(current->bio_list);
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init(&lower);
+			bio_list_init(&same);
+			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+				if (q == bdev_get_queue(bio->bi_bdev))
+					bio_list_add(&same, bio);
+				else
+					bio_list_add(&lower, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(&bio_list_on_stack, &lower);
+			bio_list_merge(&bio_list_on_stack, &same);
+			bio_list_merge(&bio_list_on_stack, &hold);
 		} else {
-			struct bio *bio_next = bio_list_pop(current->bio_list);
-
 			bio_io_error(bio);
-			bio = bio_next;
 		}
+		bio = bio_list_pop(current->bio_list);
 	} while (bio);
 	current->bio_list = NULL; /* deactivate */
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 69/72] blk: Ensure users for current->bio_list can see the full list.
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 68/72] blk: improve order of bio handling in generic_make_request() Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 70/72] padata: avoid race in reordering Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, NeilBrown, Jens Axboe, Jack Wang

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: NeilBrown <neilb@suse.com>

commit f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f upstream.

Commit 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
changed current->bio_list so that it did not contain *all* of the
queued bios, but only those submitted by the currently running
make_request_fn.

There are two places which walk the list and requeue selected bios,
and others that check if the list is empty.  These are no longer
correct.

So redefine current->bio_list to point to an array of two lists, which
contain all queued bios, and adjust various code to test or walk both
lists.

Signed-off-by: NeilBrown <neilb@suse.com>
Fixes: 79bd99596b73 ("blk: improve order of bio handling in generic_make_request()")
Signed-off-by: Jens Axboe <axboe@fb.com>
Cc: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/bio.c         |   12 +++++++++---
 block/blk-core.c    |   30 ++++++++++++++++++------------
 drivers/md/dm.c     |   29 ++++++++++++++++-------------
 drivers/md/raid10.c |    3 ++-
 4 files changed, 45 insertions(+), 29 deletions(-)

--- a/block/bio.c
+++ b/block/bio.c
@@ -372,10 +372,14 @@ static void punt_bios_to_rescuer(struct
 	bio_list_init(&punt);
 	bio_list_init(&nopunt);
 
-	while ((bio = bio_list_pop(current->bio_list)))
+	while ((bio = bio_list_pop(&current->bio_list[0])))
 		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
+	current->bio_list[0] = nopunt;
 
-	*current->bio_list = nopunt;
+	bio_list_init(&nopunt);
+	while ((bio = bio_list_pop(&current->bio_list[1])))
+		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
+	current->bio_list[1] = nopunt;
 
 	spin_lock(&bs->rescue_lock);
 	bio_list_merge(&bs->rescue_list, &punt);
@@ -462,7 +466,9 @@ struct bio *bio_alloc_bioset(gfp_t gfp_m
 		 * we retry with the original gfp_flags.
 		 */
 
-		if (current->bio_list && !bio_list_empty(current->bio_list))
+		if (current->bio_list &&
+		    (!bio_list_empty(&current->bio_list[0]) ||
+		     !bio_list_empty(&current->bio_list[1])))
 			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
 
 		p = mempool_alloc(bs->bio_pool, gfp_mask);
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1994,7 +1994,14 @@ end_io:
  */
 blk_qc_t generic_make_request(struct bio *bio)
 {
-	struct bio_list bio_list_on_stack;
+	/*
+	 * bio_list_on_stack[0] contains bios submitted by the current
+	 * make_request_fn.
+	 * bio_list_on_stack[1] contains bios that were submitted before
+	 * the current make_request_fn, but that haven't been processed
+	 * yet.
+	 */
+	struct bio_list bio_list_on_stack[2];
 	blk_qc_t ret = BLK_QC_T_NONE;
 
 	if (!generic_make_request_checks(bio))
@@ -2011,7 +2018,7 @@ blk_qc_t generic_make_request(struct bio
 	 * should be added at the tail
 	 */
 	if (current->bio_list) {
-		bio_list_add(current->bio_list, bio);
+		bio_list_add(&current->bio_list[0], bio);
 		goto out;
 	}
 
@@ -2030,18 +2037,17 @@ blk_qc_t generic_make_request(struct bio
 	 * bio_list, and call into ->make_request() again.
 	 */
 	BUG_ON(bio->bi_next);
-	bio_list_init(&bio_list_on_stack);
-	current->bio_list = &bio_list_on_stack;
+	bio_list_init(&bio_list_on_stack[0]);
+	current->bio_list = bio_list_on_stack;
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, false) == 0)) {
-			struct bio_list hold;
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
-			hold = bio_list_on_stack;
-			bio_list_init(&bio_list_on_stack);
+			bio_list_on_stack[1] = bio_list_on_stack[0];
+			bio_list_init(&bio_list_on_stack[0]);
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
@@ -2051,19 +2057,19 @@ blk_qc_t generic_make_request(struct bio
 			 */
 			bio_list_init(&lower);
 			bio_list_init(&same);
-			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+			while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
 				if (q == bdev_get_queue(bio->bi_bdev))
 					bio_list_add(&same, bio);
 				else
 					bio_list_add(&lower, bio);
 			/* now assemble so we handle the lowest level first */
-			bio_list_merge(&bio_list_on_stack, &lower);
-			bio_list_merge(&bio_list_on_stack, &same);
-			bio_list_merge(&bio_list_on_stack, &hold);
+			bio_list_merge(&bio_list_on_stack[0], &lower);
+			bio_list_merge(&bio_list_on_stack[0], &same);
+			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
 		} else {
 			bio_io_error(bio);
 		}
-		bio = bio_list_pop(current->bio_list);
+		bio = bio_list_pop(&bio_list_on_stack[0]);
 	} while (bio);
 	current->bio_list = NULL; /* deactivate */
 
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -986,26 +986,29 @@ static void flush_current_bio_list(struc
 	struct dm_offload *o = container_of(cb, struct dm_offload, cb);
 	struct bio_list list;
 	struct bio *bio;
+	int i;
 
 	INIT_LIST_HEAD(&o->cb.list);
 
 	if (unlikely(!current->bio_list))
 		return;
 
-	list = *current->bio_list;
-	bio_list_init(current->bio_list);
-
-	while ((bio = bio_list_pop(&list))) {
-		struct bio_set *bs = bio->bi_pool;
-		if (unlikely(!bs) || bs == fs_bio_set) {
-			bio_list_add(current->bio_list, bio);
-			continue;
+	for (i = 0; i < 2; i++) {
+		list = current->bio_list[i];
+		bio_list_init(&current->bio_list[i]);
+
+		while ((bio = bio_list_pop(&list))) {
+			struct bio_set *bs = bio->bi_pool;
+			if (unlikely(!bs) || bs == fs_bio_set) {
+				bio_list_add(&current->bio_list[i], bio);
+				continue;
+			}
+
+			spin_lock(&bs->rescue_lock);
+			bio_list_add(&bs->rescue_list, bio);
+			queue_work(bs->rescue_workqueue, &bs->rescue_work);
+			spin_unlock(&bs->rescue_lock);
 		}
-
-		spin_lock(&bs->rescue_lock);
-		bio_list_add(&bs->rescue_list, bio);
-		queue_work(bs->rescue_workqueue, &bs->rescue_work);
-		spin_unlock(&bs->rescue_lock);
 	}
 }
 
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -941,7 +941,8 @@ static void wait_barrier(struct r10conf
 				    !conf->barrier ||
 				    (atomic_read(&conf->nr_pending) &&
 				     current->bio_list &&
-				     !bio_list_empty(current->bio_list)),
+				     (!bio_list_empty(&current->bio_list[0]) ||
+				      !bio_list_empty(&current->bio_list[1]))),
 				    conf->resync_lock);
 		conf->nr_waiting--;
 		if (!conf->nr_waiting)

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 70/72] padata: avoid race in reordering
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 69/72] blk: Ensure users for current->bio_list can see the full list Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 71/72] nvme/core: Fix race kicking freed request_queue Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason A. Donenfeld, Steffen Klassert,
	Herbert Xu

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jason A. Donenfeld <Jason@zx2c4.com>

commit de5540d088fe97ad583cc7d396586437b32149a5 upstream.

Under extremely heavy uses of padata, crashes occur, and with list
debugging turned on, this happens instead:

[87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33
__list_add+0xae/0x130
[87487.301868] list_add corruption. prev->next should be next
(ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
[87487.339011]  [<ffffffff9a53d075>] dump_stack+0x68/0xa3
[87487.342198]  [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0
[87487.345364]  [<ffffffff99d6b91f>] __warn+0xff/0x140
[87487.348513]  [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50
[87487.351659]  [<ffffffff9a58b5de>] __list_add+0xae/0x130
[87487.354772]  [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70
[87487.357915]  [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420
[87487.361084]  [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120

padata_reorder calls list_add_tail with the list to which its adding
locked, which seems correct:

spin_lock(&squeue->serial.lock);
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);

This therefore leaves only place where such inconsistency could occur:
if padata->list is added at the same time on two different threads.
This pdata pointer comes from the function call to
padata_get_next(pd), which has in it the following block:

next_queue = per_cpu_ptr(pd->pqueue, cpu);
padata = NULL;
reorder = &next_queue->reorder;
if (!list_empty(&reorder->list)) {
       padata = list_entry(reorder->list.next,
                           struct padata_priv, list);
       spin_lock(&reorder->lock);
       list_del_init(&padata->list);
       atomic_dec(&pd->reorder_objects);
       spin_unlock(&reorder->lock);

       pd->processed++;

       goto out;
}
out:
return padata;

I strongly suspect that the problem here is that two threads can race
on reorder list. Even though the deletion is locked, call to
list_entry is not locked, which means it's feasible that two threads
pick up the same padata object and subsequently call list_add_tail on
them at the same time. The fix is thus be hoist that lock outside of
that block.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/padata.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -190,19 +190,20 @@ static struct padata_priv *padata_get_ne
 
 	reorder = &next_queue->reorder;
 
+	spin_lock(&reorder->lock);
 	if (!list_empty(&reorder->list)) {
 		padata = list_entry(reorder->list.next,
 				    struct padata_priv, list);
 
-		spin_lock(&reorder->lock);
 		list_del_init(&padata->list);
 		atomic_dec(&pd->reorder_objects);
-		spin_unlock(&reorder->lock);
 
 		pd->processed++;
 
+		spin_unlock(&reorder->lock);
 		goto out;
 	}
+	spin_unlock(&reorder->lock);
 
 	if (__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index) {
 		padata = ERR_PTR(-ENODATA);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 71/72] nvme/core: Fix race kicking freed request_queue
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 70/72] padata: avoid race in reordering Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06  8:38 ` [PATCH 4.9 72/72] nvme/pci: Disable on removal when disconnected Greg Kroah-Hartman
                   ` (2 subsequent siblings)
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Keith Busch, Johannes Thumshirn,
	Christoph Hellwig, Sagi Grimberg, Jens Axboe

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Keith Busch <keith.busch@intel.com>

commit f33447b90e96076483525b21cc4e0a8977cdd07c upstream.

If a namespace has already been marked dead, we don't want to kick the
request_queue again since we may have just freed it from another thread.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2057,9 +2057,9 @@ void nvme_kill_queues(struct nvme_ctrl *
 		 * Revalidating a dead namespace sets capacity to 0. This will
 		 * end buffered writers dirtying pages that can't be synced.
 		 */
-		if (ns->disk && !test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-			revalidate_disk(ns->disk);
-
+		if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+			continue;
+		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
 		blk_mq_abort_requeue_list(ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH 4.9 72/72] nvme/pci: Disable on removal when disconnected
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 71/72] nvme/core: Fix race kicking freed request_queue Greg Kroah-Hartman
@ 2017-04-06  8:38 ` Greg Kroah-Hartman
  2017-04-06 17:46 ` [PATCH 4.9 00/72] 4.9.21-stable review Shuah Khan
  2017-04-06 21:52 ` Guenter Roeck
  66 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2017-04-06  8:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Keith Busch, Johannes Thumshirn,
	Christoph Hellwig, Sagi Grimberg, Jens Axboe

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Keith Busch <keith.busch@intel.com>

commit 6db28eda266052f86a6b402422de61eeb7d2e351 upstream.

If the device is not present, the driver should disable the queues
immediately. Prior to this, the driver was relying on the watchdog timer
to kill the queues if requests were outstanding to the device, and that
just delays removal up to one second.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/nvme/host/pci.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1983,8 +1983,10 @@ static void nvme_remove(struct pci_dev *
 
 	pci_set_drvdata(pdev, NULL);
 
-	if (!pci_device_is_present(pdev))
+	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
+		nvme_dev_disable(dev, false);
+	}
 
 	flush_work(&dev->reset_work);
 	nvme_uninit_ctrl(&dev->ctrl);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 4.9 00/72] 4.9.21-stable review
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2017-04-06  8:38 ` [PATCH 4.9 72/72] nvme/pci: Disable on removal when disconnected Greg Kroah-Hartman
@ 2017-04-06 17:46 ` Shuah Khan
  2017-04-06 21:52 ` Guenter Roeck
  66 siblings, 0 replies; 69+ messages in thread
From: Shuah Khan @ 2017-04-06 17:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: torvalds, akpm, linux, patches, ben.hutchings, stable, Shuah Khan

On 04/06/2017 02:37 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.21 release.
> There are 72 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Apr  8 08:36:01 UTC 2017.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.21-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH 4.9 00/72] 4.9.21-stable review
  2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2017-04-06 17:46 ` [PATCH 4.9 00/72] 4.9.21-stable review Shuah Khan
@ 2017-04-06 21:52 ` Guenter Roeck
  66 siblings, 0 replies; 69+ messages in thread
From: Guenter Roeck @ 2017-04-06 21:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings, stable

On Thu, Apr 06, 2017 at 10:37:47AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.21 release.
> There are 72 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Apr  8 08:36:01 UTC 2017.
> Anything received after that time might be too late.
> 
Build results:
	total: 149 pass: 149 fail: 0
Qemu test results:
	total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2017-04-06 21:52 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06  8:37 [PATCH 4.9 00/72] 4.9.21-stable review Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 01/72] libceph: force GFP_NOIO for socket allocations Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 02/72] xen/setup: Dont relocate p2m over existing one Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 03/72] xfs: only update mount/resv fields on success in __xfs_ag_resv_init Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 04/72] xfs: use per-AG reservations for the finobt Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 05/72] xfs: pull up iolock from xfs_free_eofblocks() Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 06/72] xfs: sync eofblocks scans under iolock are livelock prone Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 07/72] xfs: fix eofblocks race with file extending async dio writes Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 08/72] xfs: fix toctou race when locking an inode to access the data map Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 09/72] xfs: fail _dir_open when readahead fails Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 10/72] xfs: filter out obviously bad btree pointers Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 11/72] xfs: check for obviously bad level values in the bmbt root Greg Kroah-Hartman
2017-04-06  8:37 ` [PATCH 4.9 12/72] xfs: verify free block header fields Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 13/72] xfs: allow unwritten extents in the CoW fork Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 14/72] xfs: mark speculative prealloc CoW fork extents unwritten Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 15/72] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 16/72] xfs: update ctime and mtime on clone destinatation inodes Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 17/72] xfs: reject all unaligned direct writes to reflinked files Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 18/72] xfs: dont fail xfs_extent_busy allocation Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 19/72] xfs: handle indlen shortage on delalloc extent merge Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 20/72] xfs: split indlen reservations fairly when under reserved Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 21/72] xfs: fix uninitialized variable in _reflink_convert_cow Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 22/72] xfs: dont reserve blocks for right shift transactions Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 23/72] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 24/72] xfs: tune down agno asserts in the bmap code Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 25/72] xfs: only reclaim unwritten COW extents periodically Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 26/72] xfs: fix and streamline error handling in xfs_end_io Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 27/72] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 28/72] xfs: use iomap new flag for newly allocated delalloc blocks Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 29/72] xfs: try any AG when allocating the first btree block when reflinking Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 30/72] scsi: sg: check length passed to SG_NEXT_CMD_LEN Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 31/72] scsi: libsas: fix ata xfer length Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 32/72] scsi: scsi_dh_alua: Check scsi_device_get() return value Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 33/72] scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 35/72] ALSA: seq: Fix race during FIFO resize Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 36/72] ALSA: hda - fix a problem for lineout on a Dell AIO machine Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 37/72] ASoC: atmel-classd: fix audio clock rate Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 38/72] ASoC: Intel: Skylake: fix invalid memory access due to wrong reference of pointer Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 39/72] HID: wacom: Dont add ghost interface as shared data Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 40/72] mmc: sdhci: Disable runtime pm when the sdio_irq is enabled Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 41/72] mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 42/72] NFSv4.1 fix infinite loop on IO BAD_STATEID error Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 43/72] nfsd: map the ENOKEY to nfserr_perm for avoiding warning Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 44/72] parisc: Clean up fixup routines for get_user()/put_user() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 45/72] parisc: Avoid stalled CPU warnings after system shutdown Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 46/72] parisc: Fix access fault handling in pa_memcpy() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 47/72] ACPI: Fix incompatibility with mcount-based function graph tracing Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 48/72] ACPI: Do not create a platform_device for IOAPIC/IOxAPIC Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 49/72] tty/serial: atmel: fix race condition (TX+DMA) Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 50/72] tty/serial: atmel: fix TX path in atmel_console_write() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 51/72] USB: fix linked-list corruption in rh_call_control() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 54/72] KVM: kvm_io_bus_unregister_dev() should never fail Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 56/72] drm/vc4: Allocate the right amount of space for boot-time CRTC state Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 57/72] drm/etnaviv: (re-)protect fence allocation with GPU mutex Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 58/72] x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization Greg Kroah-Hartman
2017-04-06  8:38   ` Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 59/72] x86/mce: Fix copy/paste error in exception table entries Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 61/72] mm: rmap: fix huge file mmap accounting in the memcg stats Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 62/72] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 64/72] qla2xxx: Allow vref count to timeout on vport delete Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 66/72] MIPS: Lantiq: Fix cascaded IRQ setup Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 67/72] mm: workingset: fix premature shadow node shrinking with cgroups Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 68/72] blk: improve order of bio handling in generic_make_request() Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 69/72] blk: Ensure users for current->bio_list can see the full list Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 70/72] padata: avoid race in reordering Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 71/72] nvme/core: Fix race kicking freed request_queue Greg Kroah-Hartman
2017-04-06  8:38 ` [PATCH 4.9 72/72] nvme/pci: Disable on removal when disconnected Greg Kroah-Hartman
2017-04-06 17:46 ` [PATCH 4.9 00/72] 4.9.21-stable review Shuah Khan
2017-04-06 21:52 ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.