* [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable

This is the start of the longterm review cycle for the 2.6.32.69 release.
All patches will be posted as a response to this one. If anyone has any
issue with these being applied, please let me know. If anyone is a
maintainer of the proper subsystem, and wants to add a Signed-off-by: line
to the patch, please respond with it. If anyone thinks some important
patches are missing and should be added prior to the release, please
report them quickly with their respective mainline commit IDs.

Responses should be made by Sat Dec  5 22:47:02 CET 2015.
Anything received after that time might be too late. If someone
wants a bit more time for a deeper review, please let me know.

NOTE: 2.6.32 is approaching end of support. There will probably be one
or maybe two other versions issued in the next 3 months, and that will
be all, at least for me. Adding to this the time it can take to validate
and deploy in some environments, it probably makes sense to start to
think about switching to another longterm branch. 3.2 and 3.4 are good
candidates for those seeking rock-solid versions. Longterm branches and
their projected EOLs are listed here:

     https://www.kernel.org/category/releases.html

The whole patch series can be found in one patch at:
     https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc1.gz

The shortlog and diffstat are appended below.

Thanks,
Willy

===============

Andy Lutomirski (1):
      x86/paravirt: Replace the paravirt nop with a bona fide empty function

Ani Sinha (1):
      ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.

Benjamin Randazzo (1):
      md: use kzalloc() when bitmap is disabled

Christophe Leroy (1):
      splice: sendfile() at once fails for big files

Dan Carpenter (2):
      rds: fix an integer overflow test in rds_info_getsockopt()
      devres: fix a for loop bounds check

Dāvis Mosāns (1):
      mvsas: Fix NULL pointer dereference in mvs_slot_task_free

Eric Dumazet (1):
      net: avoid NULL deref in inet_ctl_sock_destroy()

Eric W. Biederman (1):
      dcache: Handle escaped paths in prepend_path

Filipe Manana (1):
      Btrfs: fix read corruption of compressed and shared extents

Herbert Xu (4):
      net: Clone skb before setting peeked flag
      net: Fix skb_set_peeked use-after-free bug
      net: Fix skb csum races when peeking
      crypto: api - Only abort operations on fatal signal

Herton R. Krzesinski (1):
      ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

Hin-Tak Leung (2):
      hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
      hfs: fix B-tree corruption after insertion at position 0

Jan Kara (1):
      xfs: Fix xfs_attr_leafblock definition

Jason Wang (1):
      virtio-net: drop NETIF_F_FRAGLIST

Jeff Mahoney (1):
      btrfs: skip waiting on ordered range for special files

Joe Perches (1):
      ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings

Johan Hovold (1):
      USB: whiteheat: fix potential null-deref at probe

Konstantin Khlebnikov (1):
      pagemap: hide physical addresses from non-privileged users

Kosuke Tatsukawa (1):
      tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c

Linus Torvalds (1):
      Initialize msg/shm IPC objects before doing ipc_addid()

Maciej W. Rozycki (1):
      binfmt_elf: Don't clobber passed executable's file header

Manfred Spraul (1):
      ipc/sem.c: fully initialize sem_array before making it visible

Marcelo Leitner (1):
      ipv6: addrconf: validate new MTU before applying it

Masahiro Yamada (1):
      devres: fix devres_get()

Mathias Nyman (1):
      xhci: fix off by one error in TRB DMA address boundary check

Mel Gorman (1):
      mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault

Michal Kubeček (1):
      ipv6: fix tunnel error handling

Olga Kornievskaia (1):
      Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount

Paul Bolle (1):
      windfarm: decrement client count when unregistering

Peter Zijlstra (1):
      module: Fix locking in symbol_put_addr()

Pravin B Shelar (2):
      skbuff: Fix skb checksum flag on skb pull
      skbuff: Fix skb checksum partial check.

Richard Purdie (1):
      HID: core: Avoid uninitialized buffer access

Sabrina Dubroca (1):
      net: add length argument to skb_copy_and_csum_datagram_iovec

Sasha Levin (1):
      RDS: verify the underlying transport exists before creating a connection

Sowmini Varadhan (1):
      RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv

Takashi Iwai (1):
      Input: evdev - do not report errors form flush()

Thomas Gleixner (1):
      x86/process: Add proper bound checks in 64bit get_wchan()

Trond Myklebust (1):
      SUNRPC: xs_reset_transport must mark the connection as disconnected

 arch/x86/kernel/entry_64.S        | 11 +++++++++
 arch/x86/kernel/paravirt.c        | 16 +++++++++---
 arch/x86/kernel/process_64.c      | 52 +++++++++++++++++++++++++++++++--------
 drivers/base/devres.c             |  4 +--
 drivers/char/n_tty.c              |  6 ++---
 drivers/hid/hid-core.c            |  2 +-
 drivers/input/evdev.c             | 13 +++-------
 drivers/macintosh/windfarm_core.c |  2 +-
 drivers/md/md.c                   |  4 +--
 drivers/net/virtio_net.c          |  2 +-
 drivers/scsi/mvsas/mv_sas.c       |  2 ++
 drivers/usb/host/xhci-ring.c      |  2 +-
 drivers/usb/serial/whiteheat.c    | 31 +++++++++++++++++++++++
 fs/binfmt_elf.c                   | 10 ++++----
 fs/dcache.c                       | 11 ++++++++-
 fs/hfs/bnode.c                    |  9 +++----
 fs/hfs/brec.c                     | 20 ++++++++-------
 fs/hfsplus/bnode.c                |  9 +++----
 fs/nfs/nfs4state.c                |  2 +-
 fs/proc/task_mmu.c                | 21 ++++++----------
 fs/splice.c                       | 12 ++++++++-
 fs/xfs/xfs_attr_leaf.h            | 11 +++++++--
 include/linux/skbuff.h            |  3 ++-
 include/net/inet_common.h         |  3 ++-
 ipc/msg.c                         | 18 +++++++-------
 ipc/sem.c                         | 34 ++++++++++++++++---------
 ipc/shm.c                         | 13 +++++-----
 ipc/util.c                        |  8 +++---
 kernel/module.c                   |  8 ++++--
 lib/devres.c                      |  2 +-
 mm/hugetlb.c                      |  8 ++++++
 net/core/datagram.c               | 50 +++++++++++++++++++++++++++++++++----
 net/core/ethtool.c                |  2 +-
 net/ipv4/ipmr.c                   |  4 +--
 net/ipv4/tcp_input.c              |  2 +-
 net/ipv4/udp.c                    |  2 +-
 net/ipv6/addrconf.c               | 37 +++++++++++++++++++++++++++-
 net/ipv6/raw.c                    |  2 +-
 net/ipv6/udp.c                    |  3 ++-
 net/rds/connection.c              |  6 +++++
 net/rds/info.c                    |  2 +-
 net/rds/tcp_recv.c                | 11 +++++++--
 net/rxrpc/ar-recvmsg.c            |  3 ++-
 net/sunrpc/xprtsock.c             |  2 ++
 44 files changed, 346 insertions(+), 129 deletions(-)

* [PATCH 2.6.32 01/38] [PATCH 01/38] dcache: Handle escaped paths in prepend_path
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Eric W. Biederman, Al Viro, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

A rename can result in a dentry that by walking up d_parent
will never reach its mnt_root.  For lack of a better term
I call this an escaped path.

prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.

__d_path only wants to see paths that are connected to the root it passes
in.  So __d_path needs prepend_path to return an error.

d_absolute_path similarly wants to see paths that are connected to
some root.  Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1.  So escaped paths will be treated like paths on lazily
unmounted mounts.

getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.

d_path is the interesting holdout.  d_path just wants to print
something, and does not care about the weird cases.  Which raises
the question of what should be printed.

Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths, as there are not really any meaningful path components when
considered from the perspective of a mount tree.

So tweak prepend_path to return an empty path with a new error
code of 3 when it encounters an escaped path.
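
For illustration only, the escape condition described above can be modelled
in a few lines of userspace C.  This is a sketch, not the kernel code:
struct demo_dentry and demo_walk() are hypothetical names, the mount tree is
reduced to a single root, and only the return value 3 is taken from the text
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry: only the parent link matters here. */
struct demo_dentry {
	struct demo_dentry *parent;	/* a root dentry is its own parent */
};

/*
 * Walk up the parent chain from d.  Returns 0 when mnt_root is reached and
 * 3 when the walk ends at some other root, i.e. the path has escaped.
 */
static int demo_walk(const struct demo_dentry *d,
		     const struct demo_dentry *mnt_root)
{
	assert(d != NULL && mnt_root != NULL);

	while (d != mnt_root) {
		if (d->parent == d)	/* a root, but not the one we wanted */
			return 3;
		d = d->parent;
	}
	return 0;
}
```

A dentry whose ancestors were moved out from under the mount ends its walk at
a foreign root, so demo_walk() reports 3 instead of looping or pretending the
path is connected.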

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: For 2.6.32, implement the "(unreachable)" string in __d_path()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/dcache.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 44c0aea..07c4472 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1910,7 +1910,7 @@ char *__d_path(const struct path *path, struct path *root,
 	struct dentry *dentry = path->dentry;
 	struct vfsmount *vfsmnt = path->mnt;
 	char *end = buffer + buflen;
-	char *retval;
+	char *retval, *tail;
 
 	spin_lock(&vfsmount_lock);
 	prepend(&end, &buflen, "\0", 1);
@@ -1923,6 +1923,7 @@ char *__d_path(const struct path *path, struct path *root,
 	/* Get '/' right */
 	retval = end-1;
 	*retval = '/';
+	tail = end;
 
 	for (;;) {
 		struct dentry * parent;
@@ -1930,6 +1931,14 @@ char *__d_path(const struct path *path, struct path *root,
 		if (dentry == root->dentry && vfsmnt == root->mnt)
 			break;
 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+			/* Escaped? */
+			if (dentry != vfsmnt->mnt_root) {
+				buflen += (tail - end);
+				end = tail;
+				prepend(&end, &buflen, "(unreachable)/", 14);
+				retval = end;
+				goto out;
+			}
 			/* Global root? */
 			if (vfsmnt->mnt_parent == vfsmnt) {
 				goto global_root;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 03/38] [PATCH 03/38] md: use kzalloc() when bitmap is disabled
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Benjamin Randazzo, NeilBrown, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit b6878d9e03043695dbf3fa1caa6dfc09db225b16 upstream.

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

        file = kmalloc(sizeof(*file), GFP_NOIO);
        if (!file)
                return -ENOMEM;

This structure is copied to user space at the end of the function.

        if (err == 0 &&
            copy_to_user(arg, file, sizeof(*file)))
                err = -EFAULT;

But if the bitmap is disabled, only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

        /* bitmap disabled, zero the first byte and copy out */
        if (!mddev->bitmap_info.file)
                file->pathname[0] = '\0';
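
The leak can be reproduced in miniature in userspace.  The sketch below is
only an analogy under stated assumptions: struct demo_bitmap_file is a
hypothetical stand-in for mdu_bitmap_file_t with an assumed 4096-byte
pathname, malloc()/calloc() stand in for kmalloc()/kzalloc(), and the 0xAA
fill simulates whatever stale data the allocator happens to hand back.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PATHNAME_LEN 4096	/* assumed size of the pathname buffer */

/* Hypothetical userspace stand-in for mdu_bitmap_file_t. */
struct demo_bitmap_file {
	char pathname[DEMO_PATHNAME_LEN];
};

/* Count the bytes after the first one that would leak non-zero data. */
static size_t demo_stale_bytes(const struct demo_bitmap_file *f)
{
	size_t i, n = 0;

	assert(f != NULL);
	for (i = 1; i < DEMO_PATHNAME_LEN; i++)
		if (f->pathname[i] != 0)
			n++;
	return n;
}

/* Pre-fix pattern: uninitialized allocation, only byte 0 is cleared. */
static struct demo_bitmap_file *demo_alloc_unfixed(void)
{
	struct demo_bitmap_file *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	memset(f, 0xAA, sizeof(*f));	/* simulate stale heap contents */
	f->pathname[0] = '\0';
	return f;
}

/* Post-fix pattern: zeroed allocation, nothing stale survives. */
static struct demo_bitmap_file *demo_alloc_fixed(void)
{
	struct demo_bitmap_file *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->pathname[0] = '\0';
	return f;
}
```

With the unfixed pattern every byte except the first still carries old
contents when the whole structure is copied out; the zeroing allocator closes
that gap without touching the rest of the function.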

Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 2.6.32: patch both possible allocation calls]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/md/md.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4ce6e2f..ab0f708 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4680,9 +4680,9 @@ static int get_bitmap_file(mddev_t * mddev, void __user * arg)
 	int err = -ENOMEM;
 
 	if (md_allow_write(mddev))
-		file = kmalloc(sizeof(*file), GFP_NOIO);
+		file = kzalloc(sizeof(*file), GFP_NOIO);
 	else
-		file = kmalloc(sizeof(*file), GFP_KERNEL);
+		file = kzalloc(sizeof(*file), GFP_KERNEL);
 
 	if (!file)
 		goto out;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 04/38] [PATCH 04/38] ipv6: addrconf: validate new MTU before applying it
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Marcelo Ricardo Leitner, Sabrina Dubroca, David S. Miller,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 77751427a1ff25b27d47a4c36b12c3c8667855ac upstream.

Currently we don't check whether the new MTU is valid, which allows
one to configure a value smaller than the minimum allowed by RFCs or even
bigger than the interface's own MTU. This is a problem as it may lead to
packet drops.

If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)

The fix is just to make sure the new value is valid, that is, between
IPV6_MIN_MTU and the interface's MTU.

Note that a similar check is already performed in
ndisc_router_discovery(), for when the kernel itself parses the RA.
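
As a rough userspace illustration of the range check described above (not
the sysctl handler itself), the following sketch rejects out-of-range values
and leaves the current setting untouched.  demo_mtu6_set() and
DEMO_IPV6_MIN_MTU are hypothetical names; 1280 mirrors the kernel's
IPV6_MIN_MTU, and returning -EINVAL is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IPV6_MIN_MTU 1280	/* mirrors IPV6_MIN_MTU */

/* A new MTU is valid only inside [DEMO_IPV6_MIN_MTU, dev_mtu]. */
static bool demo_mtu6_is_valid(int new_mtu, int dev_mtu)
{
	return new_mtu >= DEMO_IPV6_MIN_MTU && new_mtu <= dev_mtu;
}

/*
 * Apply new_mtu to *cur only when it is valid.  Returns 0 on success and
 * -EINVAL otherwise, in which case *cur is left unchanged.
 */
static int demo_mtu6_set(int *cur, int new_mtu, int dev_mtu)
{
	assert(cur != NULL);

	if (!demo_mtu6_is_valid(new_mtu, dev_mtu))
		return -EINVAL;
	*cur = new_mtu;
	return 0;
}
```

On a 1500-byte interface, 1279 and 9000 are both refused while 1280 and 1500
are accepted, which is the behaviour the bounds in the patch are meant to
give.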

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32:
 - Add a strategy for the sysctl as we don't get a default strategy
 - Adjust context, spacing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/addrconf.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e8c4fd9..34eed01 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4036,6 +4036,40 @@ static int addrconf_sysctl_forward_strategy(ctl_table *table,
 	return addrconf_fixup_forwarding(table, valp, val);
 }
 
+static
+struct ctl_table *addrconf_sysctl_mtu_init(struct ctl_table *newctl,
+					   const struct ctl_table *ctl)
+{
+	struct inet6_dev *idev = ctl->extra1;
+	static int min_mtu = IPV6_MIN_MTU;
+
+	*newctl = *ctl;
+	newctl->extra1 = &min_mtu;
+	newctl->extra2 = idev ? &idev->dev->mtu : NULL;
+	return newctl;
+}
+
+static
+int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
+			void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table lctl;
+
+	return proc_dointvec_minmax(addrconf_sysctl_mtu_init(&lctl, ctl),
+				    write, buffer, lenp, ppos);
+}
+
+static int addrconf_sysctl_mtu_strategy(struct ctl_table *ctl,
+					void __user *oldval,
+					size_t __user *oldlenp,
+					void __user *newval, size_t newlen)
+{
+	struct ctl_table lctl;
+
+	return sysctl_intvec(addrconf_sysctl_mtu_init(&lctl, ctl),
+			     oldval, oldlenp, newval, newlen);
+}
+
 static void dev_disable_change(struct inet6_dev *idev)
 {
 	if (!idev || !idev->dev)
@@ -4142,7 +4176,8 @@ static struct addrconf_sysctl_table
 			.data		=	&ipv6_devconf.mtu6,
 			.maxlen		=	sizeof(int),
 			.mode		=	0644,
-			.proc_handler	=	proc_dointvec,
+			.proc_handler	=	addrconf_sysctl_mtu,
+			.strategy	=	addrconf_sysctl_mtu_strategy,
 		},
 		{
 			.ctl_name	=	NET_IPV6_ACCEPT_RA,
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 05/38] [PATCH 05/38] virtio-net: drop NETIF_F_FRAGLIST
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 48900cb6af4282fa0fb6ff4d72a81aa3dadb5c39 upstream.

virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.

A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.

Drop NETIF_F_FRAGLIST so we only get what we can handle.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 2.6.32: there's only a single features field]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 97a56f0..2bc6661 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -868,7 +868,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	/* Do we support "hardware" checksums? */
 	if (csum && virtio_has_feature(vdev, VIRTIO_NET_F_CSUM)) {
 		/* This opens up the world of extra features. */
-		dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
+		dev->features |= NETIF_F_HW_CSUM | NETIF_F_SG;
 		if (gso && virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
 			dev->features |= NETIF_F_TSO | NETIF_F_UFO
 				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 06/38] [PATCH 06/38] USB: whiteheat: fix potential null-deref at probe
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Moein Ghasemzadeh, Johan Hovold, Greg Kroah-Hartman,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit cbb4be652d374f64661137756b8f357a1827d6a4 upstream.

Fix potential null-pointer dereference at probe by making sure that the
required endpoints are present.

The whiteheat driver assumes there are at least five pairs of bulk
endpoints, of which the final pair is used for the "command port". An
attempt to bind to an interface with fewer bulk endpoints would
currently lead to an oops.

Fixes CVE-2015-5257.

Reported-by: Moein Ghasemzadeh <moein@istuary.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/serial/whiteheat.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
index 748c627..1e5f35d 100644
--- a/drivers/usb/serial/whiteheat.c
+++ b/drivers/usb/serial/whiteheat.c
@@ -143,6 +143,8 @@ static int  whiteheat_firmware_download(struct usb_serial *serial,
 static int  whiteheat_firmware_attach(struct usb_serial *serial);
 
 /* function prototypes for the Connect Tech WhiteHEAT serial converter */
+static int whiteheat_probe(struct usb_serial *serial,
+				const struct usb_device_id *id);
 static int  whiteheat_attach(struct usb_serial *serial);
 static void whiteheat_release(struct usb_serial *serial);
 static int  whiteheat_open(struct tty_struct *tty,
@@ -188,6 +190,7 @@ static struct usb_serial_driver whiteheat_device = {
 	.usb_driver =		&whiteheat_driver,
 	.id_table =		id_table_std,
 	.num_ports =		4,
+	.probe =		whiteheat_probe,
 	.attach =		whiteheat_attach,
 	.release =		whiteheat_release,
 	.open =			whiteheat_open,
@@ -387,6 +390,34 @@ static int whiteheat_firmware_attach(struct usb_serial *serial)
 /*****************************************************************************
  * Connect Tech's White Heat serial driver functions
  *****************************************************************************/
+
+static int whiteheat_probe(struct usb_serial *serial,
+				const struct usb_device_id *id)
+{
+	struct usb_host_interface *iface_desc;
+	struct usb_endpoint_descriptor *endpoint;
+	size_t num_bulk_in = 0;
+	size_t num_bulk_out = 0;
+	size_t min_num_bulk;
+	unsigned int i;
+
+	iface_desc = serial->interface->cur_altsetting;
+
+	for (i = 0; i < iface_desc->desc.bNumEndpoints; i++) {
+		endpoint = &iface_desc->endpoint[i].desc;
+		if (usb_endpoint_is_bulk_in(endpoint))
+			++num_bulk_in;
+		if (usb_endpoint_is_bulk_out(endpoint))
+			++num_bulk_out;
+	}
+
+	min_num_bulk = COMMAND_PORT + 1;
+	if (num_bulk_in < min_num_bulk || num_bulk_out < min_num_bulk)
+		return -ENODEV;
+
+	return 0;
+}
+
 static int whiteheat_attach(struct usb_serial *serial)
 {
 	struct usb_serial_port *command_port;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 07/38] [PATCH 07/38] ipc/sem.c: fully initialize sem_array before making it visible
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Manfred Spraul, Rik van Riel, Davidlohr Bueso, Rafael Aquini,
	Andrew Morton, Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit e8577d1f0329d4842e8302e289fb2c22156abef4 upstream.

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.
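
The ordering rule can be sketched in portable C11.  This is an analogy under
stated assumptions, not how the kernel does it: the kernel publishes through
ipc_addid(), whereas the hypothetical demo_publish() below uses a single
atomic pointer as the "registry", and struct demo_sem_array carries only two
illustrative fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct sem_array. */
struct demo_sem_array {
	int nsems;
	long ctime;
};

/* The "registry": once a pointer is stored here, any reader may see it. */
static _Atomic(struct demo_sem_array *) demo_registry;

/* Fill in every field first, then make the object visible. */
static void demo_publish(struct demo_sem_array *sma, int nsems, long now)
{
	assert(sma != NULL);

	sma->nsems = nsems;
	sma->ctime = now;
	/* Release store: the writes above are visible before the pointer is. */
	atomic_store_explicit(&demo_registry, sma, memory_order_release);
}

/* Lockless reader: returns -1 when nothing has been published yet. */
static int demo_lookup_nsems(void)
{
	struct demo_sem_array *sma =
		atomic_load_explicit(&demo_registry, memory_order_acquire);

	return sma ? sma->nsems : -1;
}
```

Because the reader takes no lock, the only safe order is "initialize, then
publish"; swapping the two statements in demo_publish() would recreate the
window the patch closes.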

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Rik van Riel <riel@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 2.6.32:
 - Adjust context
 - The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 ipc/sem.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index b781007..26dc5b1 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -264,6 +264,12 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 		return retval;
 	}
 
+	sma->sem_base = (struct sem *) &sma[1];
+	INIT_LIST_HEAD(&sma->sem_pending);
+	INIT_LIST_HEAD(&sma->list_id);
+	sma->sem_nsems = nsems;
+	sma->sem_ctime = get_seconds();
+
 	id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
 	if (id < 0) {
 		security_sem_free(sma);
@@ -272,11 +278,6 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	}
 	ns->used_sems += nsems;
 
-	sma->sem_base = (struct sem *) &sma[1];
-	INIT_LIST_HEAD(&sma->sem_pending);
-	INIT_LIST_HEAD(&sma->list_id);
-	sma->sem_nsems = nsems;
-	sma->sem_ctime = get_seconds();
 	sem_unlock(sma);
 
 	return sma->sem_perm.id;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 08/38] [PATCH 08/38] Initialize msg/shm IPC objects before doing ipc_addid()
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Dmitry Vyukov, Manfred Spraul, Davidlohr Bueso, Linus Torvalds,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit b9a532277938798b53178d5a66af6e2915cb27cf upstream.

As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state.  Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f0329:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 2.6.32:
 - Adjust context
 - The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 ipc/msg.c  | 18 +++++++++---------
 ipc/shm.c  | 13 +++++++------
 ipc/util.c |  8 ++++----
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 779f762..94c6411 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -199,6 +199,15 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 		return retval;
 	}
 
+	msq->q_stime = msq->q_rtime = 0;
+	msq->q_ctime = get_seconds();
+	msq->q_cbytes = msq->q_qnum = 0;
+	msq->q_qbytes = ns->msg_ctlmnb;
+	msq->q_lspid = msq->q_lrpid = 0;
+	INIT_LIST_HEAD(&msq->q_messages);
+	INIT_LIST_HEAD(&msq->q_receivers);
+	INIT_LIST_HEAD(&msq->q_senders);
+
 	/*
 	 * ipc_addid() locks msq
 	 */
@@ -209,15 +218,6 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 		return id;
 	}
 
-	msq->q_stime = msq->q_rtime = 0;
-	msq->q_ctime = get_seconds();
-	msq->q_cbytes = msq->q_qnum = 0;
-	msq->q_qbytes = ns->msg_ctlmnb;
-	msq->q_lspid = msq->q_lrpid = 0;
-	INIT_LIST_HEAD(&msq->q_messages);
-	INIT_LIST_HEAD(&msq->q_receivers);
-	INIT_LIST_HEAD(&msq->q_senders);
-
 	msg_unlock(msq);
 
 	return msq->q_perm.id;
diff --git a/ipc/shm.c b/ipc/shm.c
index d30732c..75cb87c 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -386,12 +386,6 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	if (IS_ERR(file))
 		goto no_file;
 
-	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
-	if (id < 0) {
-		error = id;
-		goto no_id;
-	}
-
 	shp->shm_cprid = task_tgid_vnr(current);
 	shp->shm_lprid = 0;
 	shp->shm_atim = shp->shm_dtim = 0;
@@ -399,6 +393,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	shp->shm_segsz = size;
 	shp->shm_nattch = 0;
 	shp->shm_file = file;
+
+	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
+	if (id < 0) {
+		error = id;
+		goto no_id;
+	}
+
 	/*
 	 * shmid gets reported as "inode#" in /proc/pid/maps.
 	 * proc-ps tools use this. Changing this will break them.
diff --git a/ipc/util.c b/ipc/util.c
index 79ce84e..b6fe615 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -264,6 +264,10 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	rcu_read_lock();
 	spin_lock(&new->lock);
 
+	current_euid_egid(&euid, &egid);
+	new->cuid = new->uid = euid;
+	new->gid = new->cgid = egid;
+
 	err = idr_get_new(&ids->ipcs_idr, new, &id);
 	if (err) {
 		spin_unlock(&new->lock);
@@ -273,10 +277,6 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 
 	ids->in_use++;
 
-	current_euid_egid(&euid, &egid);
-	new->cuid = new->uid = euid;
-	new->gid = new->cgid = egid;
-
 	new->seq = ids->seq++;
 	if(ids->seq > ids->seq_max)
 		ids->seq = 0;
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 10/38] [PATCH 10/38] rds: fix an integer overflow test in rds_info_getsockopt()
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Dan Carpenter, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 468b732b6f76b138c0926eadf38ac88467dcd271 upstream.

"len" is a signed integer.  We check that len is not negative, so it
goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long.  ULONG_MAX - 4095 is higher than
INT_MAX so the condition can never be true.

I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
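
The arithmetic can be checked in userspace.  The sketch below assumes a
4096-byte page and writes the implicit conversions out as explicit casts;
demo_old_check(), demo_new_check() and DEMO_PAGE_SIZE are hypothetical names
used only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, unsigned long */

/*
 * Old test: len is converted to unsigned long before the addition, so for
 * any len in [0, INT_MAX] the sum cannot wrap and the result is false.
 */
static bool demo_old_check(int len)
{
	assert(len >= 0);
	return (unsigned long)len + DEMO_PAGE_SIZE - 1 < (unsigned long)len;
}

/* New test: reject any len closer than one page to INT_MAX. */
static bool demo_new_check(int len)
{
	assert(len >= 0);
	return (unsigned long)len > INT_MAX - DEMO_PAGE_SIZE + 1;
}
```

With these definitions the old test stays false even for INT_MAX, while the
new one starts rejecting at INT_MAX - 4094, matching the limit of
INT_MAX - 4095 named above.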

Fixes: a8c879a7ee98 ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f3a66bdc88c3261f55b942453476e623056b92d9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/info.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/info.c b/net/rds/info.c
index 814a91a..0a857a0 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -174,7 +174,7 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 
 	/* check for all kinds of wrapping and the like */
 	start = (unsigned long)optval;
-	if (len < 0 || len + PAGE_SIZE - 1 < len || start + len < start) {
+	if (len < 0 || len > INT_MAX - PAGE_SIZE + 1 || start + len < start) {
 		ret = -EINVAL;
 		goto out;
 	}
-- 
1.7.12.2.21.g234cd45.dirty

* [PATCH 2.6.32 11/38] [PATCH 11/38] net: Clone skb before setting peeked flag
@ 2015-11-29 21:47 ` Willy Tarreau
From: Willy Tarreau @ 2015-11-29 21:47 UTC
  To: linux-kernel, stable
  Cc: Konstantin Khlebnikov, Herbert Xu, David S. Miller,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec upstream.

Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.

The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first.  This causes funky races which lead
to a double-free.

This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.
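
The "unshare before modifying" rule can be sketched with a plain reference
count.  This is a userspace analogy, not the skb code: struct demo_buf and
demo_set_peeked() are hypothetical, a single pointer slot stands in for the
receive queue, and malloc() stands in for skb_clone().  The sketch hands the
replacement back through the slot so the caller never keeps a pointer to the
buffer it gave up.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: a reference count and the flag. */
struct demo_buf {
	int users;	/* more than 1 means someone else holds it too */
	int peeked;
};

/*
 * Set the peeked flag on the buffer held in *slot.  A shared buffer is
 * copied first and the private copy replaces it in the slot.  Returns 0 on
 * success and -1 if the copy cannot be allocated.
 */
static int demo_set_peeked(struct demo_buf **slot)
{
	struct demo_buf *buf, *copy;

	assert(slot != NULL && *slot != NULL);
	buf = *slot;

	if (buf->peeked)
		return 0;

	if (buf->users > 1) {
		copy = malloc(sizeof(*copy));
		if (!copy)
			return -1;
		*copy = *buf;
		copy->users = 1;
		buf->users--;	/* the slot drops its reference */
		*slot = copy;
		buf = copy;
	}

	buf->peeked = 1;
	return 0;
}
```

The other holder of the shared buffer never observes the flag change, which
is the property the broadcast and multicast paths rely on.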

Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 72e6f0680249f5e0a87f2b282d033baefd90d84e)
[wt: adjusted context for 2.6.32. Introduces a bug, see next commit]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/datagram.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 4ade301..cbb3100 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -127,6 +127,35 @@ out_noerr:
 	goto out;
 }
 
+static int skb_set_peeked(struct sk_buff *skb)
+{
+	struct sk_buff *nskb;
+
+	if (skb->peeked)
+		return 0;
+
+	/* We have to unshare an skb before modifying it. */
+	if (!skb_shared(skb))
+		goto done;
+
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (!nskb)
+		return -ENOMEM;
+
+	skb->prev->next = nskb;
+	skb->next->prev = nskb;
+	nskb->prev = skb->prev;
+	nskb->next = skb->next;
+
+	consume_skb(skb);
+	skb = nskb;
+
+done:
+	skb->peeked = 1;
+
+	return 0;
+}
+
 /**
  *	__skb_recv_datagram - Receive a datagram skbuff
  *	@sk: socket
@@ -160,6 +189,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 				    int *peeked, int *err)
 {
 	struct sk_buff *skb;
+	unsigned long cpu_flags;
 	long timeo;
 	/*
 	 * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
@@ -178,14 +208,16 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 		 * Look at current nfs client by the way...
 		 * However, this function was corrent in any case. 8)
 		 */
-		unsigned long cpu_flags;
-
 		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
 		skb = skb_peek(&sk->sk_receive_queue);
 		if (skb) {
 			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
-				skb->peeked = 1;
+
+				error = skb_set_peeked(skb);
+				if (error)
+					goto unlock_err;
+
 				atomic_inc(&skb->users);
 			} else
 				__skb_unlink(skb, &sk->sk_receive_queue);
@@ -204,6 +236,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 
 	return NULL;
 
+unlock_err:
+	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 no_packet:
 	*err = error;
 	return NULL;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 12/38] [PATCH 12/38] net: Fix skb_set_peeked use-after-free bug
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brenden Blanco, Herbert Xu, Konstantin Khlebnikov,
	David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit a0a2a6602496a45ae838a96db8b8173794b5d398 upstream.

The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).
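
The return convention described above (a valid pointer on success, an
errno value encoded in the pointer on failure) can be sketched in
userspace as follows. The lower-case helpers err_ptr(), ptr_err() and
is_err() are simplified stand-ins written for this example, modelled on
the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); "struct buf" and the
simulate_oom parameter are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top MAX_ERRNO addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct buf {
	int peeked;
};

/*
 * Returns the buffer the caller must use from now on, or an encoded
 * -ENOMEM when the (simulated) allocation fails.
 */
static struct buf *buf_set_peeked(struct buf *b, int simulate_oom)
{
	if (simulate_oom)
		return err_ptr(-ENOMEM);
	b->peeked = 1;
	return b;
}
```

A caller would assign the result back to its own pointer and test it with
is_err() before dereferencing it.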

Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit e553622ccb6b6e06079f980f55cf04128db3420c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/datagram.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index cbb3100..c855336 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -127,12 +127,12 @@ out_noerr:
 	goto out;
 }
 
-static int skb_set_peeked(struct sk_buff *skb)
+static struct sk_buff *skb_set_peeked(struct sk_buff *skb)
 {
 	struct sk_buff *nskb;
 
 	if (skb->peeked)
-		return 0;
+		return skb;
 
 	/* We have to unshare an skb before modifying it. */
 	if (!skb_shared(skb))
@@ -140,7 +140,7 @@ static int skb_set_peeked(struct sk_buff *skb)
 
 	nskb = skb_clone(skb, GFP_ATOMIC);
 	if (!nskb)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	skb->prev->next = nskb;
 	skb->next->prev = nskb;
@@ -153,7 +153,7 @@ static int skb_set_peeked(struct sk_buff *skb)
 done:
 	skb->peeked = 1;
 
-	return 0;
+	return skb;
 }
 
 /**
@@ -214,8 +214,9 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
 
-				error = skb_set_peeked(skb);
-				if (error)
+				skb = skb_set_peeked(skb);
+				error = PTR_ERR(skb);
+				if (IS_ERR(skb))
 					goto unlock_err;
 
 				atomic_inc(&skb->users);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 13/38] [PATCH 13/38] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Herton R. Krzesinski, Manfred Spraul, Davidlohr Bueso,
	Rafael Aquini, Aristeu Rozanski, David Jeffery, Andrew Morton,
	Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list in the freeary function (that task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper debug config options enabled. However, I have softlockup reports
from some kernel versions, in the semaphore code, that are similar to the
above (the scenario is seen on some servers running IBM DB2, which uses
semaphore syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a1c4fb80c5d94ef61a77c1e891cae616a50d8d3c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 ipc/sem.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 26dc5b1..fe3579f 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1296,16 +1296,27 @@ void exit_sem(struct task_struct *tsk)
 		rcu_read_lock();
 		un = list_entry_rcu(ulp->list_proc.next,
 				    struct sem_undo, list_proc);
-		if (&un->list_proc == &ulp->list_proc)
-			semid = -1;
-		 else
-			semid = un->semid;
+		if (&un->list_proc == &ulp->list_proc) {
+			/*
+			 * We must wait for freeary() before freeing this ulp,
+			 * in case we raced with last sem_undo. There is a small
+			 * possibility where we exit while freeary() didn't
+			 * finish unlocking sem_undo_list.
+			 */
+			spin_unlock_wait(&ulp->lock);
+			rcu_read_unlock();
+			break;
+		}
+		spin_lock(&ulp->lock);
+		semid = un->semid;
+		spin_unlock(&ulp->lock);
 		rcu_read_unlock();
 
+		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (semid == -1)
-			break;
+			continue;
 
-		sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
+		sma = sem_lock_check(tsk->nsproxy->ipc_ns, semid);
 
 		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (IS_ERR(sma))
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 14/38] [PATCH 14/38] devres: fix devres_get()
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Masahiro Yamada, Tejun Heo, Greg Kroah-Hartman, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 64526370d11ce8868ca495723d595b61e8697fbf upstream.

Currently, devres_get() passes devres_free() the pointer to devres,
but devres_free() should be given the pointer to the resource data.
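
The distinction between the two pointers can be illustrated with a small
userspace sketch, assuming an allocator that places a bookkeeping header
in front of the data it hands out. "struct res", res_alloc(),
res_from_data() and res_free() are hypothetical names for this example
and are not the devres API.

```c
#include <stddef.h>
#include <stdlib.h>

struct res {
	size_t size;			/* bookkeeping header */
	unsigned long long data[];	/* what callers see */
};

/* Allocate a resource and return a pointer to its data area. */
static void *res_alloc(size_t size)
{
	struct res *r = calloc(1, sizeof(*r) + size);

	if (!r)
		return NULL;
	r->size = size;
	return r->data;
}

/* Step back from the data pointer to the enclosing header. */
static struct res *res_from_data(void *data)
{
	return (struct res *)((char *)data - offsetof(struct res, data));
}

/* Takes the data pointer, never the header pointer. */
static void res_free(void *data)
{
	if (data)
		free(res_from_data(data));
}
```

Passing the header pointer to res_free() would make it step back a second
time and free the wrong address, which is the class of mistake the patch
addresses.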

Fixes: 9ac7849e35f7 ("devres: device resource management")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit ebc0ae5a7159086a992f7904dc9ad849a13eecfc)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/base/devres.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 05dd307..6d022dc 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -253,10 +253,10 @@ void * devres_get(struct device *dev, void *new_res,
 	if (!dr) {
 		add_dr(dev, &new_dr->node);
 		dr = new_dr;
-		new_dr = NULL;
+		new_res = NULL;
 	}
 	spin_unlock_irqrestore(&dev->devres_lock, flags);
-	devres_free(new_dr);
+	devres_free(new_res);
 
 	return dr->data;
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 15/38] [PATCH 15/38] windfarm: decrement client count when unregistering
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paul Bolle, Michael Ellerman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit fe2b592173ff0274e70dc44d1d28c19bb995aa7c upstream.

wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.

Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 48c46d4aed1c27f363e4673e988212340e82e1bb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/macintosh/windfarm_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/windfarm_core.c b/drivers/macintosh/windfarm_core.c
index 075b4d9..553414e 100644
--- a/drivers/macintosh/windfarm_core.c
+++ b/drivers/macintosh/windfarm_core.c
@@ -418,7 +418,7 @@ int wf_unregister_client(struct notifier_block *nb)
 {
 	mutex_lock(&wf_lock);
 	blocking_notifier_chain_unregister(&wf_client_list, nb);
-	wf_client_count++;
+	wf_client_count--;
 	if (wf_client_count == 0)
 		wf_stop_thread();
 	mutex_unlock(&wf_lock);
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 16/38] [PATCH 16/38] xfs: Fix xfs_attr_leafblock definition
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jan Kara, Dave Chinner, Dave Chinner, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit ffeecc5213024ae663377b442eedcfbacf6d0c5d upstream.

struct xfs_attr_leafblock contains an 'entries' array which is declared
with size 1 although it can in fact contain many more entries. Since this
array is followed by further struct members, gcc (at least in version
4.8.3) thinks that the array has a fixed size of 1 element and thus
may optimize away all accesses beyond the end of the array, resulting in
non-working code. This problem was only observed with userspace code in
xfsprogs; however, it's better to be safe in the kernel as well and have
matching kernel and xfsprogs definitions.
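
For comparison only, here is a sketch of a variable-length trailing array
written with a C99 flexible array member. This is a different technique
from the one in this patch, which keeps entries[1] and merely removes the
members that followed it. The "leaf_*" types and functions are
hypothetical and much simpler than the XFS structures.

```c
#include <stdint.h>
#include <stdlib.h>

struct leaf_hdr {
	uint16_t count;		/* number of valid entries */
};

struct leaf_entry {
	uint32_t hashval;
	uint16_t nameidx;
	uint8_t flags;
	uint8_t pad;
};

struct leaf_block {
	struct leaf_hdr hdr;
	struct leaf_entry entries[];	/* must be the last member */
};

/* Allocate a block with room for 'count' entries. */
static struct leaf_block *leaf_alloc(uint16_t count)
{
	struct leaf_block *b;

	b = calloc(1, sizeof(*b) + (size_t)count * sizeof(b->entries[0]));
	if (b)
		b->hdr.count = count;
	return b;
}

/* Walks past index 0; no member follows the array to limit it. */
static uint32_t leaf_max_hash(const struct leaf_block *b)
{
	uint32_t max = 0;
	uint16_t i;

	for (i = 0; i < b->hdr.count; i++)
		if (b->entries[i].hashval > max)
			max = b->entries[i].hashval;
	return max;
}
```

Because nothing is declared after the array, the compiler has no later
member from which to infer a fixed length of one element.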

Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 86cbc0072fa4fc7906dd8abfa6489638014300bb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/xfs/xfs_attr_leaf.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_attr_leaf.h b/fs/xfs/xfs_attr_leaf.h
index 9c7d22f..c782906 100644
--- a/fs/xfs/xfs_attr_leaf.h
+++ b/fs/xfs/xfs_attr_leaf.h
@@ -111,8 +111,15 @@ typedef struct xfs_attr_leaf_name_remote {
 typedef struct xfs_attr_leafblock {
 	xfs_attr_leaf_hdr_t	hdr;	/* constant-structure header block */
 	xfs_attr_leaf_entry_t	entries[1];	/* sorted on key, not name */
-	xfs_attr_leaf_name_local_t namelist;	/* grows from bottom of buf */
-	xfs_attr_leaf_name_remote_t valuelist;	/* grows from bottom of buf */
+	/*
+	 * The rest of the block contains the following structures after the
+	 * leaf entries, growing from the bottom up. The variables are never
+	 * referenced and definining them can actually make gcc optimize away
+	 * accesses to the 'entries' array above index 0 so don't do that.
+	 *
+	 * xfs_attr_leaf_name_local_t namelist;
+	 * xfs_attr_leaf_name_remote_t valuelist;
+	 */
 } xfs_attr_leafblock_t;
 
 /*
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 17/38] [PATCH 17/38] SUNRPC: xs_reset_transport must mark the connection as disconnected
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Trond Myklebust, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream.

In case the reconnection attempt fails.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: add local variable xprt]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 9434e4855e90d2af3751cd93b47b4a3e40bc2dc1)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/sunrpc/xprtsock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index d37f07c..ec21612 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -734,6 +734,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
 {
 	struct socket *sock = transport->sock;
 	struct sock *sk = transport->inet;
+	struct rpc_xprt *xprt = &transport->xprt;
 
 	if (sk == NULL)
 		return;
@@ -747,6 +748,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
 	sk->sk_user_data = NULL;
 
 	xs_restore_old_callbacks(transport, sk);
+	xprt_clear_connected(xprt);
 	write_unlock_bh(&sk->sk_callback_lock);
 
 	sk->sk_no_check = 0;
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 18/38] [PATCH 18/38] Input: evdev - do not report errors form flush()
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Takashi Iwai, Dmitry Torokhov, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit eb38f3a4f6e86f8bb10a3217ebd85ecc5d763aae upstream.

We've got bug reports showing the old systemd-logind (at least
systemd-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from the close() call on evdev devices.  close()
is supposed to return only EINTR or EBADF, while the device
returned ENODEV.  logind was overreacting to it and decided to kill
itself when an unexpected error code was received.  What a tragedy.

The bad error code comes from the flush fops: evdev_flush() returns
ENODEV when the device is disconnected or the client's access to it is
revoked. But in these cases the fact that the flush did not actually happen
is not an error, but rather normal behavior. For non-disconnected devices
the result of the flush is also not that interesting, as there is no
potential for data loss, and even if it fails the application has no way of
handling the error. Because of that we are better off always returning
success from evdev_flush().

Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.

Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: there's no revoked flag to test]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit a6706174cfe9fa100651b5012aec9796006a884b)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/input/evdev.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index dee6706..f05566f 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -102,19 +102,14 @@ static int evdev_flush(struct file *file, fl_owner_t id)
 {
 	struct evdev_client *client = file->private_data;
 	struct evdev *evdev = client->evdev;
-	int retval;
 
-	retval = mutex_lock_interruptible(&evdev->mutex);
-	if (retval)
-		return retval;
+	mutex_lock(&evdev->mutex);
 
-	if (!evdev->exist)
-		retval = -ENODEV;
-	else
-		retval = input_flush_device(&evdev->handle, file);
+	if (evdev->exist)
+		input_flush_device(&evdev->handle, file);
 
 	mutex_unlock(&evdev->mutex);
-	return retval;
+	return 0;
 }
 
 static void evdev_free(struct device *dev)
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
@ 2015-11-29 21:47 ` Willy Tarreau
  2015-11-30  1:54   ` Ben Hutchings
  0 siblings, 1 reply; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Konstantin Khlebnikov, Naoya Horiguchi, Mark Williamson,
	Andrew Morton, Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.

This patch makes pagemap readable for normal users and hides physical
addresses from them.  For some use-cases PFN isn't required at all.
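
The effect of the show_pfn flag added by this patch can be sketched in
userspace as below. The bit layout is an assumption for illustration
(PFN in bits 0-54, "present" in bit 63) and is not the exact entry format
of this kernel version; make_entry() and the EX_* constants are
hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this example only. */
#define EX_PFN_MASK	((UINT64_C(1) << 55) - 1)
#define EX_PRESENT	(UINT64_C(1) << 63)

/*
 * Build a pagemap-style entry. The "present" flag is always reported,
 * but the frame number is included only when show_pfn is set.
 */
static uint64_t make_entry(uint64_t pfn, bool present, bool show_pfn)
{
	uint64_t entry = 0;

	if (!present)
		return 0;
	if (show_pfn)
		entry |= pfn & EX_PFN_MASK;
	return entry | EX_PRESENT;
}
```

An unprivileged reader can still tell whether a page is present, but the
frame number field reads as zero.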

See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name

Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Add the same check in the places where we look up a PFN
 - Add struct pagemapread * parameters where necessary
 - Open-code file_ns_capable()
 - Delete pagemap_open() entirely, as it would always return 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
[wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, and
 security_capable() only takes a capability. Tested OK. ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/proc/task_mmu.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 73db5a6..78658aa 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -8,6 +8,7 @@
 #include <linux/mempolicy.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/security.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -539,6 +540,7 @@ const struct file_operations proc_clear_refs_operations = {
 
 struct pagemapread {
 	u64 __user *out, *end;
+	bool show_pfn;
 };
 
 #define PM_ENTRY_BYTES      sizeof(u64)
@@ -589,14 +591,14 @@ static u64 swap_pte_to_pagemap_entry(pte_t pte)
 	return swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT);
 }
 
-static u64 pte_to_pagemap_entry(pte_t pte)
+static u64 pte_to_pagemap_entry(struct pagemapread *pm, pte_t pte)
 {
 	u64 pme = 0;
 	if (is_swap_pte(pte))
 		pme = PM_PFRAME(swap_pte_to_pagemap_entry(pte))
 			| PM_PSHIFT(PAGE_SHIFT) | PM_SWAP;
 	else if (pte_present(pte))
-		pme = PM_PFRAME(pte_pfn(pte))
+		pme = (pm->show_pfn ? PM_PFRAME(pte_pfn(pte)) : 0)
 			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
 	return pme;
 }
@@ -624,7 +626,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		if (vma && (vma->vm_start <= addr) &&
 		    !is_vm_hugetlb_page(vma)) {
 			pte = pte_offset_map(pmd, addr);
-			pfn = pte_to_pagemap_entry(*pte);
+			pfn = pte_to_pagemap_entry(pm, *pte);
 			/* unmap before userspace copy */
 			pte_unmap(pte);
 		}
@@ -695,6 +697,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_task;
 
+	/* do not disclose physical addresses: attack vector */
+	pm.show_pfn = !security_capable(CAP_SYS_ADMIN);
+
 	mm = get_task_mm(task);
 	if (!mm)
 		goto out_task;
@@ -773,19 +778,9 @@ out:
 	return ret;
 }
 
-static int pagemap_open(struct inode *inode, struct file *file)
-{
-	/* do not disclose physical addresses to unprivileged
-	   userspace (closes a rowhammer attack vector) */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	return 0;
-}
-
 const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
-	.open		= pagemap_open,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 20/38] [PATCH 20/38] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hin-Tak Leung, Sergei Antonov, Anton Altaparmakov, Sasha Levin,
	Al Viro, Christoph Hellwig, Vyacheslav Dubeyko, Sougata Santra,
	Andrew Morton, Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generate a "BUG: Bad page state".  Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem                  Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/fedora-root    3276800  551665    2725135   17% /
/dev/mapper/fedora-home   52879360  716221   52163139    2% /home
/dev/nbd0p2             4294967295 1387818 4293579477    1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years.  [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree.  The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], whereby version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents.  When the current RW code first entered the kernel [2]
in 2004, there was a REF_PAGES conditional (and "//" commented-out code)
to switch between B-node-centric paging and inode-centric paging.  There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create().  In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGES" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887
https://bugzilla.kernel.org/show_bug.cgi?id=42342
https://bugzilla.kernel.org/show_bug.cgi?id=63841
https://bugzilla.kernel.org/show_bug.cgi?id=78761

[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:36 2004 -0800

    [PATCH] HFS rewrite

http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd

commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date:   Wed Feb 25 16:17:48 2004 -0800

    [PATCH] HFS+ support

[3]:
http://sourceforge.net/projects/linux-hfsplus/

http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/
http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/

http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5

Date:   Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.

[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d

commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@linux-m68k.org>
Date:   Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit dd04e674cde34f570509b9e2a6b549af89897640)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfs/bnode.c     | 9 ++++-----
 fs/hfsplus/bnode.c | 9 ++++-----
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
index 0d20006..3136308 100644
--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -286,7 +286,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -396,11 +395,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;
 
-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }
 
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 29da657..7d75904 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -446,7 +446,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
 			page_cache_release(page);
 			goto fail;
 		}
-		page_cache_release(page);
 		node->page[i] = page;
 	}
 
@@ -556,11 +555,11 @@ node_error:
 
 void hfs_bnode_free(struct hfs_bnode *node)
 {
-	//int i;
+	int i;
 
-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
-	//	if (node->page[i])
-	//		page_cache_release(node->page[i]);
+	for (i = 0; i < node->tree->pages_per_bnode; i++)
+		if (node->page[i])
+			page_cache_release(node->page[i]);
 	kfree(node);
 }
 
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 21/38] [PATCH 21/38] hfs: fix B-tree corruption after insertion at position 0
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hin-Tak Leung, Sergei Antonov, Joe Perches, Vyacheslav Dubeyko,
	Anton Altaparmakov, Al Viro, Christoph Hellwig, Andrew Morton,
	Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream.

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().

This is an identical change to the corresponding hfs b-tree code to Sergei
Antonov's "hfsplus: fix B-tree corruption after insertion at position 0",
to keep similar code paths in the hfs and hfsplus drivers in sync, where
appropriate.
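
As far as the diff shows, the invariant at stake is that the parent node
carries a copy of the first key of each child, so an insertion at
position 0 makes that copy stale.  The sketch below models this with
plain integer keys.  It is an illustration only: struct leaf_model,
struct parent_entry and leaf_insert() are hypothetical names, and the
real code works on on-disk records through hfs_bnode_read_key() and
hfs_brec_update_parent().

```c
#include <string.h>

#define MAX_KEYS 8

/* Hypothetical leaf node: a sorted array of integer keys. */
struct leaf_model {
	int nkeys;
	int keys[MAX_KEYS];
};

/* Hypothetical parent entry: caches the first key of its child. */
struct parent_entry {
	int first_key;
	struct leaf_model *child;
};

/*
 * Insert key at position rec (the caller guarantees room and ordering).
 * Returns 1 if the parent key had to be refreshed, 0 otherwise.
 */
static int leaf_insert(struct parent_entry *pe, int rec, int key)
{
	struct leaf_model *leaf = pe->child;

	/* shift the tail of the array up by one slot */
	memmove(&leaf->keys[rec + 1], &leaf->keys[rec],
		(leaf->nkeys - rec) * sizeof(leaf->keys[0]));
	leaf->keys[rec] = key;
	leaf->nkeys++;

	if (rec == 0) {
		/* the smallest key changed, so the parent copy is stale */
		pe->first_key = leaf->keys[0];
		return 1;
	}
	return 0;
}
```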

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit d46a34909db526644edf8ae62058d8371dd5f7e9)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/hfs/brec.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c
index 92fb358..db240c5 100644
--- a/fs/hfs/brec.c
+++ b/fs/hfs/brec.c
@@ -132,13 +132,16 @@ skip:
 	hfs_bnode_write(node, entry, data_off + key_len, entry_len);
 	hfs_bnode_dump(node);
 
-	if (new_node) {
-		/* update parent key if we inserted a key
-		 * at the start of the first node
-		 */
-		if (!rec && new_node != node)
-			hfs_brec_update_parent(fd);
+	/*
+	 * update parent key if we inserted a key
+	 * at the start of the node and it is not the new node
+	 */
+	if (!rec && new_node != node) {
+		hfs_bnode_read_key(node, fd->search_key, data_off + size);
+		hfs_brec_update_parent(fd);
+	}
 
+	if (new_node) {
 		hfs_bnode_put(fd->bnode);
 		if (!new_node->parent) {
 			hfs_btree_inc_height(tree);
@@ -167,9 +170,6 @@ skip:
 		goto again;
 	}
 
-	if (!rec)
-		hfs_brec_update_parent(fd);
-
 	return 0;
 }
 
@@ -366,6 +366,8 @@ again:
 	if (IS_ERR(parent))
 		return PTR_ERR(parent);
 	__hfs_brec_find(parent, fd);
+	if (fd->record < 0)
+		return -ENOENT;
 	hfs_bnode_dump(parent);
 	rec = fd->record;
 
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 22/38] [PATCH 22/38] x86/paravirt: Replace the paravirt nop with a bona fide empty function
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andy Lutomirski, Thomas Gleixner, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit fc57a7c68020dcf954428869eafd934c0ab1536f upstream.

PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):

    ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
              2792: R_X86_64_PC32     pv_irq_ops+0x2c

That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse.  This is bad news for a CLBR_NONE operation.

Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in?  This can potentially cause breakage
that is very difficult to debug.

A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.

The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.

Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.

The Xen case may have other problems, so document them.

This is part of a fix for some random crashes that Sasha saw.

Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 81fbc9a5dd000126ef727dcdaea3ef5714d1e898)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/entry_64.S | 11 +++++++++++
 arch/x86/kernel/paravirt.c | 16 ++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 303eaeb8..b289d66 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1552,7 +1552,18 @@ END(error_exit)
 	/* runs on exception stack */
 ENTRY(nmi)
 	INTR_FRAME
+	/*
+	 * Fix up the exception frame if we're on Xen.
+	 * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
+	 * one value to the stack on native, so it may clobber the rdx
+	 * scratch slot, but it won't clobber any of the important
+	 * slots past it.
+	 *
+	 * Xen is a different story, because the Xen frame itself overlaps
+	 * the "NMI executing" variable.
+	 */
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
+
 	pushq_cfi $-1
 	subq $15*8, %rsp
 	CFI_ADJUST_CFA_OFFSET 15*8
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1b1739d..889e54f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -38,10 +38,18 @@
 #include <asm/tlbflush.h>
 #include <asm/timer.h>
 
-/* nop stub */
-void _paravirt_nop(void)
-{
-}
+/*
+ * nop stub, which must not clobber anything *including the stack* to
+ * avoid confusing the entry prologues.
+ */
+extern void _paravirt_nop(void);
+asm (".pushsection .entry.text, \"ax\"\n"
+     ".global _paravirt_nop\n"
+     "_paravirt_nop:\n\t"
+     "ret\n\t"
+     ".size _paravirt_nop, . - _paravirt_nop\n\t"
+     ".type _paravirt_nop, @function\n\t"
+     ".popsection");
 
 /* identity function, which can be inlined */
 u32 _paravirt_ident_32(u32 x)
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 23/38] [PATCH 23/38] RDS: verify the underlying transport exists before creating a connection
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Santosh Shilimkar, Sasha Levin, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 74e98eb085889b0d2d4908f59f6e00026063014f upstream.

There was no verification that an underlying transport exists when creating
a connection; a missing transport would cause a NULL pointer dereference.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
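
The shape of the guard can be sketched in user space as below.  This is
a minimal model under stated assumptions, not the RDS code: struct
transport_model, struct conn_model, conn_create() and alloc_ok() are
hypothetical names, and the error is returned as a negative errno value
rather than through ERR_PTR().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rds_transport. */
struct transport_model {
	int (*conn_alloc)(void *conn);
};

/* Hypothetical stand-in for struct rds_connection. */
struct conn_model {
	struct transport_model *trans;
};

/* Trivial conn_alloc callback used for demonstration. */
static int alloc_ok(void *conn)
{
	(void)conn;
	return 0;
}

/*
 * Create a connection.  Returns 0 and stores the connection in *out,
 * or a negative errno value.  A missing transport is reported as
 * -ENODEV instead of being dereferenced.
 */
static int conn_create(struct transport_model *trans, struct conn_model **out)
{
	struct conn_model *conn = calloc(1, sizeof(*conn));
	int ret;

	if (!conn)
		return -ENOMEM;

	if (trans == NULL) {	/* models the check the patch adds */
		free(conn);
		return -ENODEV;
	}

	conn->trans = trans;
	ret = trans->conn_alloc(conn);
	if (ret) {
		free(conn);
		return ret;
	}
	*out = conn;
	return 0;
}
```

The check sits before the first use of the transport, so the partially
built connection is released and no callback is ever invoked through a
NULL pointer.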

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 987ad6eef35223b149baf453171b74917c372cbc)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/connection.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index cc8b568..8c3ddcd 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -187,6 +187,12 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
 		}
 	}
 
+	if (trans == NULL) {
+		kmem_cache_free(rds_conn_slab, conn);
+		conn = ERR_PTR(-ENODEV);
+		goto out;
+	}
+
 	conn->c_trans = trans;
 
 	ret = trans->conn_alloc(conn, gfp);
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 24/38] [PATCH 24/38] net: Fix skb csum races when peeking
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Herbert Xu, Eric Dumazet, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 89c22d8c3b278212eef6a8cc66b570bc840a6f5a ]

When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.
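
The rule "cache the result only if the buffer is not shared" can be
sketched as follows.  This is a user-space model, not the networking
code: struct buf_model, buf_shared() and csum_complete() are hypothetical
names, and the plain int reference count stands in for the atomic
skb->users.

```c
#include <stdbool.h>

enum { CSUM_NONE_MODEL, CSUM_UNNECESSARY_MODEL };

/* Hypothetical stand-in for struct sk_buff. */
struct buf_model {
	int users;	/* reference count, models skb->users */
	int ip_summed;	/* cached verification state */
};

/* A buffer is shared when anyone else also holds a reference. */
static bool buf_shared(const struct buf_model *b)
{
	return b->users != 1;
}

/*
 * Called after the checksum has been computed (sum == 0 means valid).
 * The result is cached only when this caller is the sole owner, so
 * concurrent peekers never write to the same buffer.
 */
static int csum_complete(struct buf_model *b, int sum)
{
	if (sum == 0 && !buf_shared(b))
		b->ip_summed = CSUM_UNNECESSARY_MODEL;
	return sum;
}
```

A shared buffer simply has its checksum recomputed by the next reader,
which costs time but leaves its state untouched.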

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 58a5897a53d535bf95523e6f381f88116217f5ca)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/datagram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index c855336..253d068 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -675,7 +675,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
 	if (likely(!sum)) {
 		if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
 			netdev_rx_csum_fault(skb->dev);
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		if (!skb_shared(skb))
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 	return sum;
 }
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 25/38] [PATCH 25/38] net: add length argument to skb_copy_and_csum_datagram_iovec
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sabrina Dubroca, Hannes Frederic Sowa, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.
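
The effect of the new argument can be sketched in user space as below.
This is an illustration under stated assumptions, not the kernel helper:
copy_to_iovec_bounded() is a hypothetical function, and unlike the
kernel's copy routine it also takes an iovcnt bound, which the sketch
keeps for its own safety.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy at most len bytes out of data (data_len bytes available) into
 * the iovec array.  Returns the number of bytes copied.
 */
static size_t copy_to_iovec_bounded(const unsigned char *data, size_t data_len,
				    struct iovec *iov, int iovcnt, size_t len)
{
	size_t chunk = data_len;
	size_t done = 0;
	int i;

	if (chunk > len)
		chunk = len;	/* models the clamp the patch adds */

	for (i = 0; i < iovcnt && done < chunk; i++) {
		size_t n = iov[i].iov_len;

		if (n > chunk - done)
			n = chunk - done;
		memcpy(iov[i].iov_base, data + done, n);
		done += n;
	}
	return done;
}
```

With the clamp in place the amount copied is driven by what the caller
asked for, not by how much data the packet happens to carry.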

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but that don't have the
iov_iter conversion, which is almost all the stable trees <= 3.18.

This also fixes a kernel crash for NFS servers when the client uses
 -onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 127500d724f8c43f452610c9080444eedb5eaa6c)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/skbuff.h | 3 ++-
 net/core/datagram.c    | 6 +++++-
 net/ipv4/tcp_input.c   | 2 +-
 net/ipv4/udp.c         | 2 +-
 net/ipv6/raw.c         | 2 +-
 net/ipv6/udp.c         | 3 ++-
 net/rxrpc/ar-recvmsg.c | 3 ++-
 7 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ae77862..c282a2c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1765,7 +1765,8 @@ extern int	       skb_copy_datagram_iovec(const struct sk_buff *from,
 					       int size);
 extern int	       skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
 							int hlen,
-							struct iovec *iov);
+							struct iovec *iov,
+							int len);
 extern int	       skb_copy_datagram_from_iovec(struct sk_buff *skb,
 						    int offset,
 						    const struct iovec *from,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 253d068..767c17a 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -693,6 +693,7 @@ EXPORT_SYMBOL(__skb_checksum_complete);
  *	@skb: skbuff
  *	@hlen: hardware length
  *	@iov: io vector
+ *	@len: amount of data to copy from skb to iov
  *
  *	Caller _must_ check that skb will fit to this iovec.
  *
@@ -702,11 +703,14 @@ EXPORT_SYMBOL(__skb_checksum_complete);
  *			   can be modified!
  */
 int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
-				     int hlen, struct iovec *iov)
+				     int hlen, struct iovec *iov, int len)
 {
 	__wsum csum;
 	int chunk = skb->len - hlen;
 
+	if (chunk > len)
+		chunk = len;
+
 	if (!chunk)
 		return 0;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c821218..d3dcfb9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4985,7 +4985,7 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 		err = skb_copy_datagram_iovec(skb, hlen, tp->ucopy.iov, chunk);
 	else
 		err = skb_copy_and_csum_datagram_iovec(skb, hlen,
-						       tp->ucopy.iov);
+						       tp->ucopy.iov, chunk);
 
 	if (!err) {
 		tp->ucopy.len -= chunk;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3ae286b..83b507d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -975,7 +975,7 @@ try_again:
 	else {
 		err = skb_copy_and_csum_datagram_iovec(skb,
 						       sizeof(struct udphdr),
-						       msg->msg_iov);
+						       msg->msg_iov, copied);
 
 		if (err == -EINVAL)
 			goto csum_copy_err;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index d5b09c7..f016542 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -476,7 +476,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			goto csum_copy_err;
 		err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 	} else {
-		err = skb_copy_and_csum_datagram_iovec(skb, 0, msg->msg_iov);
+		err = skb_copy_and_csum_datagram_iovec(skb, 0, msg->msg_iov, copied);
 		if (err == -EINVAL)
 			goto csum_copy_err;
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0b023f3..5c8bd19 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -233,7 +233,8 @@ try_again:
 		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
 					      msg->msg_iov, copied       );
 	else {
-		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
+		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
+						       msg->msg_iov, copied);
 		if (err == -EINVAL)
 			goto csum_copy_err;
 	}
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index b6076b2..813e1c4 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -184,7 +184,8 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 						      msg->msg_iov, copy);
 		} else {
 			ret = skb_copy_and_csum_datagram_iovec(skb, offset,
-							       msg->msg_iov);
+							       msg->msg_iov,
+							       copy);
 			if (ret == -EINVAL)
 				goto csum_copy_error;
 		}
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 26/38] [PATCH 26/38] module: Fix locking in symbol_put_addr()
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: poma, Peter Zijlstra (Intel),
	Rusty Russell, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 275d7d44d802ef271a42dc87ac091a495ba72fc5 upstream.

Poma (on the way to another bug) reported an assertion triggering:

  [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
  [<ffffffff81150822>] __module_address+0x32/0x150
  [<ffffffff81150956>] __module_text_address+0x16/0x70
  [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
  [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]

Laura Abbott <labbott@redhat.com> produced a patch which led us to
inspect symbol_put_addr(). This function has a comment claiming it
doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.

This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).

While it's true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().
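
The distinction between protecting the object and protecting the
container can be shown with a user-space analogy.  This is only a sketch:
struct mod_model, mod_list, lookup_locked() and put_by_addr() are
hypothetical names, and a pthread mutex stands in for the protection
that preempt_disable() gives the module list walk in the kernel, which
is a different mechanism.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded module. */
struct mod_model {
	unsigned long start, end;	/* text range [start, end) */
	int refcount;
	struct mod_model *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mod_model *mod_list;

/* Walk the list; the caller must hold list_lock. */
static struct mod_model *lookup_locked(unsigned long addr)
{
	struct mod_model *m;

	for (m = mod_list; m; m = m->next)
		if (addr >= m->start && addr < m->end)
			return m;
	return NULL;
}

/*
 * Drop a reference on the module that contains addr.  Holding a
 * reference keeps that one module alive, but other entries may be
 * added or removed meanwhile, so the walk itself is done under the
 * lock.  Returns 1 if a module was found, 0 otherwise.
 */
static int put_by_addr(unsigned long addr)
{
	struct mod_model *m;
	int found = 0;

	pthread_mutex_lock(&list_lock);
	m = lookup_locked(addr);
	if (m) {
		m->refcount--;
		found = 1;
	}
	pthread_mutex_unlock(&list_lock);
	return found;
}
```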

Reported-by: poma <pomidorabelisima@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: a6e6abd575fc ("module: remove module_text_address()")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 3895ff2d13043ecd091813b67a485ec487870b63)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 kernel/module.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4b270e6..04aa4f1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -923,11 +923,15 @@ void symbol_put_addr(void *addr)
 	if (core_kernel_text(a))
 		return;
 
-	/* module_text_address is safe here: we're supposed to have reference
-	 * to module from symbol_get, so it can't go away. */
+	/*
+	 * Even though we hold a reference on the module; we still need to
+	 * disable preemption in order to safely traverse the data structure.
+	 */
+	preempt_disable();
 	modaddr = __module_text_address(a);
 	BUG_ON(!modaddr);
 	module_put(modaddr);
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(symbol_put_addr);
 
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 27/38] [PATCH 27/38] x86/process: Add proper bound checks in 64bit get_wchan()
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dmitry Vyukov, Sasha Levin, Thomas Gleixner, Borislav Petkov,
	Andrey Ryabinin, Andy Lutomirski, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko, kasan-dev,
	Denys Vlasenko, Andi Kleen, Wolfram Gloger, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit eddd3826a1a0190e5235703d1e666affa4d13b96 upstream.

Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address ./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error ./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region ./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.
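
The two bounds listed above can be written down as a small predicate.
This is a sketch with assumed constants: THREAD_SIZE_MODEL and
THREAD_INFO_SIZE_MODEL are made-up values for illustration (the real
ones are per-architecture), frame_in_bounds() is a hypothetical helper,
and the padding is taken as 0.

```c
#include <stdbool.h>

/* Assumed values for illustration only. */
#define THREAD_SIZE_MODEL	(2UL * 4096UL)
#define THREAD_INFO_SIZE_MODEL	64UL

/*
 * Return true if fp may be dereferenced as the pair { FP, IP } without
 * leaving the usable part of the stack area that begins at start.
 */
static bool frame_in_bounds(unsigned long start, unsigned long fp)
{
	/* the stack proper begins above thread_info */
	unsigned long bottom = start + THREAD_INFO_SIZE_MODEL;
	/* both FP and IP must fit below the end of the area */
	unsigned long top = start + THREAD_SIZE_MODEL
			    - 2 * sizeof(unsigned long);

	return fp >= bottom && fp <= top;
}
```

A value that points into thread_info, or at the last word of the area
where only one of the two slots would fit, is rejected.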

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
 - s/READ_ONCE/ACCESS_ONCE/
 - Remove use of TOP_OF_KERNEL_STACK_PADDING, not defined here and would
   be defined as 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 5311d93d0d33ae878d5fbb35ea5693b9c813ba04)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 arch/x86/kernel/process_64.c | 52 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 39493bc..936b0ba 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -550,27 +550,59 @@ void set_personality_ia32(void)
 	current_thread_info()->status |= TS_COMPAT;
 }
 
+/*
+ * Called from fs/proc with a reference on @p to find the function
+ * which called into schedule(). This needs to be done carefully
+ * because the task might wake up and we might look at a stack
+ * changing under us.
+ */
 unsigned long get_wchan(struct task_struct *p)
 {
-	unsigned long stack;
-	u64 fp, ip;
+	unsigned long start, bottom, top, sp, fp, ip;
 	int count = 0;
 
 	if (!p || p == current || p->state == TASK_RUNNING)
 		return 0;
-	stack = (unsigned long)task_stack_page(p);
-	if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
+
+	start = (unsigned long)task_stack_page(p);
+	if (!start)
+		return 0;
+
+	/*
+	 * Layout of the stack page:
+	 *
+	 * ----------- topmax = start + THREAD_SIZE - sizeof(unsigned long)
+	 * PADDING
+	 * ----------- top = topmax - TOP_OF_KERNEL_STACK_PADDING
+	 * stack
+	 * ----------- bottom = start + sizeof(thread_info)
+	 * thread_info
+	 * ----------- start
+	 *
+	 * The tasks stack pointer points at the location where the
+	 * framepointer is stored. The data on the stack is:
+	 * ... IP FP ... IP FP
+	 *
+	 * We need to read FP and IP, so we need to adjust the upper
+	 * bound by another unsigned long.
+	 */
+	top = start + THREAD_SIZE;
+	top -= 2 * sizeof(unsigned long);
+	bottom = start + sizeof(struct thread_info);
+
+	sp = ACCESS_ONCE(p->thread.sp);
+	if (sp < bottom || sp > top)
 		return 0;
-	fp = *(u64 *)(p->thread.sp);
+
+	fp = ACCESS_ONCE(*(unsigned long *)sp);
 	do {
-		if (fp < (unsigned long)stack ||
-		    fp >= (unsigned long)stack+THREAD_SIZE)
+		if (fp < bottom || fp > top)
 			return 0;
-		ip = *(u64 *)(fp+8);
+		ip = ACCESS_ONCE(*(unsigned long *)(fp + sizeof(unsigned long)));
 		if (!in_sched_functions(ip))
 			return ip;
-		fp = *(u64 *)fp;
-	} while (count++ < 16);
+		fp = ACCESS_ONCE(*(unsigned long *)fp);
+	} while (count++ < 16 && p->state != TASK_RUNNING);
 	return 0;
 }
 
-- 
1.7.12.2.21.g234cd45.dirty
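
Illustration only, not part of the patch: a minimal user-space sketch of
the bounded frame-pointer walk that the reworked get_wchan() performs.
The stack is modelled as an array of words, and sp/fp are indices rather
than addresses. THREAD_WORDS, THREAD_INFO_WORDS, walk_frames() and
in_sched_range() are hypothetical names chosen for this sketch; the
ACCESS_ONCE() reads and the task-state recheck are left out.

```c
/* Hypothetical stand-ins for the kernel's values; sizes are illustrative. */
#define THREAD_WORDS      64	/* simulated stack size, in unsigned longs */
#define THREAD_INFO_WORDS 4	/* simulated thread_info at the low end */
#define MAX_FRAMES        16

/*
 * Walk saved frame pointers inside a simulated stack. 'sp' and every
 * saved frame pointer are indices into stack[]. Returns the first return
 * address for which is_sched() is false, or 0 if the walk leaves the
 * valid range or runs out of frames.
 */
static unsigned long walk_frames(const unsigned long *stack, unsigned long sp,
				 int (*is_sched)(unsigned long ip))
{
	unsigned long bottom = THREAD_INFO_WORDS;
	unsigned long top = THREAD_WORDS - 2;	/* room for FP and IP */
	unsigned long fp, ip;
	int count = 0;

	if (sp < bottom || sp > top)
		return 0;

	fp = stack[sp];
	do {
		if (fp < bottom || fp > top)
			return 0;
		ip = stack[fp + 1];	/* return address sits above the saved FP */
		if (!is_sched(ip))
			return ip;
		fp = stack[fp];
	} while (count++ < MAX_FRAMES);
	return 0;
}

/* Hypothetical replacement for in_sched_functions(): one address range. */
static int in_sched_range(unsigned long ip)
{
	return ip >= 0x1000 && ip < 0x2000;
}
```

With top set two words below the end of the array, a saved frame pointer
of THREAD_WORDS - 1 is rejected before stack[fp + 1] could read past the
end.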





* [PATCH 2.6.32 28/38] [PATCH 28/38] mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mel Gorman, SunDong, Michal Hocko, Andrea Arcangeli,
	Hugh Dickins, Naoya Horiguchi, David Rientjes, Andrew Morton,
	Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 846bc2d8bef1e0d253cdfabfe707f37fc8cd836d)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 mm/hugetlb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b435d1f..d81312f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2006,6 +2006,14 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 
 		/*
+		 * Shared VMAs have their own reserves and do not affect
+		 * MAP_PRIVATE accounting but it is possible that a shared
+		 * VMA is using the same page so check and skip such VMAs.
+		 */
+		if (iter_vma->vm_flags & VM_MAYSHARE)
+			continue;
+
+		/*
 		 * Unmap the page from other VMAs without their own reserves.
 		 * They get marked to be SIGKILLed if they fault in these
 		 * areas. This is because a future no-page fault on this VMA
-- 
1.7.12.2.21.g234cd45.dirty
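
Illustration only, not part of the patch: a small user-space sketch of
the skip logic described above. struct vma is a heavily simplified
stand-in for vm_area_struct, the VM_MAYSHARE value is illustrative, and
unmap_private_mappings() is a hypothetical name.

```c
#include <stddef.h>

#define VM_MAYSHARE 0x80UL	/* illustrative flag value for this sketch */

struct vma {
	unsigned long vm_flags;
	int unmapped;		/* set when the sketch "unmaps" the page */
};

/*
 * Unmap the page from every mapping except the faulting one, skipping
 * shared mappings because they carry their own reserves. Returns the
 * number of mappings that were unmapped.
 */
static int unmap_private_mappings(struct vma *vmas, size_t n,
				  const struct vma *faulting)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		struct vma *iter = &vmas[i];

		if (iter == faulting)
			continue;
		if (iter->vm_flags & VM_MAYSHARE)
			continue;	/* the check added by the patch */
		iter->unmapped = 1;
		count++;
	}
	return count;
}
```

Given one faulting private mapping, one shared mapping and one other
private mapping of the same file, only the last one is unmapped.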





* [PATCH 2.6.32 29/38] [PATCH 29/38] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kosuke Tatsukawa, Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit e81107d4c6bd098878af9796b24edc8d4a9524fd upstream.

My colleague ran into a program stall on an x86_64 server, where
n_tty_read() was waiting for data even though there was data in the
pty buffer.  The kernel stack for the stuck process looks like this:
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 #1 [ffff88303d107bd0] schedule at ffffffff815c513e
 #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 #5 [ffff88303d107dd0] tty_read at ffffffff81368013
 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 #8 [ffff88303d107f00] sys_read at ffffffff811a4306
 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seem to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interruptible_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interruptible_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
 - Use wake_up_interruptible(), not wake_up_interruptible_poll()
 - There are only two spurious uses of waitqueue_active() to remove]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 80910ccdd3ee35e4131df38bc73b86ee60abdf0b)
[wt: file is drivers/char/n_tty.c in 2.6.32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/char/n_tty.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 5269fa0..34be7fd 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -1287,8 +1287,7 @@ handle_newline:
 			tty->canon_data++;
 			spin_unlock_irqrestore(&tty->read_lock, flags);
 			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
-			if (waitqueue_active(&tty->read_wait))
-				wake_up_interruptible(&tty->read_wait);
+			wake_up_interruptible(&tty->read_wait);
 			return;
 		}
 	}
@@ -1410,8 +1409,7 @@ static void n_tty_receive_buf(struct tty_struct *tty, const unsigned char *cp,
 
 	if (!tty->icanon && (tty->read_cnt >= tty->minimum_to_wake)) {
 		kill_fasync(&tty->fasync, SIGIO, POLL_IN);
-		if (waitqueue_active(&tty->read_wait))
-			wake_up_interruptible(&tty->read_wait);
+		wake_up_interruptible(&tty->read_wait);
 	}
 
 	/*
-- 
1.7.12.2.21.g234cd45.dirty
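
Illustration only, not part of the patch: a loose user-space analogy,
assuming POSIX threads, of the rule the patch applies. The producer
always signals while holding the lock instead of first peeking, without
the lock, at whether anyone is waiting. struct channel, channel_put(),
channel_get() and run_handoff() are hypothetical names; a mutex plus
condition variable only approximates the kernel wait queue.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct channel {
	pthread_mutex_t lock;
	pthread_cond_t readable;
	bool has_data;
	int value;
};

/* Producer side: publish the data and wake unconditionally, under the lock. */
static void channel_put(struct channel *ch, int v)
{
	pthread_mutex_lock(&ch->lock);
	ch->value = v;
	ch->has_data = true;
	pthread_cond_signal(&ch->readable);	/* no "is anyone waiting?" shortcut */
	pthread_mutex_unlock(&ch->lock);
}

/* Consumer side: test the condition and sleep under the same lock. */
static int channel_get(struct channel *ch)
{
	int v;

	pthread_mutex_lock(&ch->lock);
	while (!ch->has_data)
		pthread_cond_wait(&ch->readable, &ch->lock);
	v = ch->value;
	ch->has_data = false;
	pthread_mutex_unlock(&ch->lock);
	return v;
}

static void *producer(void *arg)
{
	channel_put(arg, 42);
	return NULL;
}

/* Start one producer, read one value, return it (or -1 on thread failure). */
static int run_handoff(void)
{
	struct channel ch = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.readable = PTHREAD_COND_INITIALIZER,
		.has_data = false,
		.value = 0,
	};
	pthread_t t;
	int v;

	if (pthread_create(&t, NULL, producer, &ch) != 0)
		return -1;
	v = channel_get(&ch);
	pthread_join(t, NULL);
	return v;
}
```

Because the condition check and the wake-up are both serialized by the
same lock, the consumer cannot miss the wake-up whichever side runs
first.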





* [PATCH 2.6.32 31/38] [PATCH 31/38] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Joe Perches, Ben Hutchings, David S. Miller, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 077cb37fcf6f00a45f375161200b5ee0cd4e937b ]

It seems that kernel memory can leak into userspace by a
kmalloc, ethtool_get_strings, then copy_to_user sequence.

Avoid this by using kcalloc to zero fill the copied buffer.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 68c3e59aa9cdf2d8870d8fbe4f37b1a509d0abeb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/core/ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f9e7179..ed17505 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -794,7 +794,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
 		}
 	}
 
-	data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
+	data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
 	if (!data)
 		return -ENOMEM;
 
-- 
1.7.12.2.21.g234cd45.dirty
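
Illustration only, not part of the patch: a user-space sketch of why a
zero-filling allocator matters when the whole buffer is later copied
out. calloc() stands in for kcalloc() here; GSTRING_LEN,
alloc_string_table() and is_all_zero() are names made up for this
sketch.

```c
#include <stddef.h>
#include <stdlib.h>

#define GSTRING_LEN 32	/* stands in for ETH_GSTRING_LEN in this sketch */

/* Allocate a table of 'count' fixed-width strings, zero-filled. */
static char *alloc_string_table(size_t count)
{
	return calloc(count, GSTRING_LEN);
}

/* Return 1 if every byte in buf[0..len) is zero, else 0. */
static int is_all_zero(const char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

If the code that fills in the strings writes fewer than GSTRING_LEN
bytes per entry, the remaining bytes are zero rather than stale heap
contents.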





* [PATCH 2.6.32 32/38] [PATCH 32/38] HID: core: Avoid uninitialized buffer access
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Richard Purdie, Jiri Kosina, linux-input, Darren Hart,
	Jiri Kosina, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 79b568b9d0c7c5d81932f4486d50b38efdd6da6d upstream.

hid_connect adds various strings to the buffer but they're all
conditional. You can find circumstances where nothing would be written
to it but the kernel will still print the supposedly empty buffer with
printk. This leads to corruption on the console/in the logs.

Ensure buf is initialized to an empty string.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
[dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
Cc: Jiri Kosina <jikos@kernel.org>
Cc: linux-input@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 604bfd00358e3d7fce8dc789fe52d2f2be0fa4c7)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/hid/hid-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index e7e28b5..644ab4d 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1235,7 +1235,7 @@ int hid_connect(struct hid_device *hdev, unsigned int connect_mask)
 		"Multi-Axis Controller"
 	};
 	const char *type, *bus;
-	char buf[64];
+	char buf[64] = "";
 	unsigned int i;
 	int len;
 
-- 
1.7.12.2.21.g234cd45.dirty
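
Illustration only, not part of the patch: a user-space sketch of a
buffer that is filled only by conditional appends. describe() and its
two flags are hypothetical; the point is that `char buf[64] = ""`
leaves a valid empty string when no branch is taken.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a comma-separated description into 'out' and return its length.
 * Every append is conditional, so buf must start out as an empty string.
 */
static size_t describe(char *out, size_t outlen, int has_input, int has_hidraw)
{
	char buf[64] = "";	/* zero-filled: a valid empty string */
	size_t len = 0;

	if (has_input)
		len += snprintf(buf + len, sizeof(buf) - len, "input");
	if (has_hidraw)
		len += snprintf(buf + len, sizeof(buf) - len, "%shidraw",
				len ? "," : "");

	snprintf(out, outlen, "%s", buf);
	return strlen(out);
}
```

Without the initializer, the final "%s" would print whatever bytes
happened to be on the stack when neither flag is set.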





* [PATCH 2.6.32 33/38] [PATCH 33/38] devres: fix a for loop bounds check
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, Tejun Heo, Greg Kroah-Hartman, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 1f35d04a02a652f14566f875aef3a6f2af4cb77b upstream.

The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
DEVICE_COUNT_RESOURCE (16).  This bug was found using a static checker.
It may be that the "if (!(mask & (1 << i)))" check means we never
actually go past the end of the array in real life.

Fixes: ec04b075843d ('iomap: implement pcim_iounmap_regions()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit e7102453150c7081a27744989374c474d2ebea8e)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 lib/devres.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/devres.c b/lib/devres.c
index 72c8909..e4891d5 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -338,7 +338,7 @@ void pcim_iounmap_regions(struct pci_dev *pdev, u16 mask)
 	if (!iomap)
 		return;
 
-	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+	for (i = 0; i < PCIM_IOMAP_MAX; i++) {
 		if (!(mask & (1 << i)))
 			continue;
 
-- 
1.7.12.2.21.g234cd45.dirty
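
Illustration only, not part of the patch: a user-space sketch of the
corrected loop bound. IOMAP_MAX mirrors the 6-element array size given
in the commit message; release_regions() is a hypothetical name.

```c
#include <stddef.h>

#define IOMAP_MAX 6	/* size of the iomap[] table, per the commit message */

/*
 * Clear every table slot selected by 'mask'. The loop is bounded by the
 * table size, so mask bits 6..15 can never index past the array.
 * Returns the number of slots visited.
 */
static int release_regions(void *iomap[IOMAP_MAX], unsigned short mask)
{
	int released = 0;

	for (int i = 0; i < IOMAP_MAX; i++) {
		if (!(mask & (1 << i)))
			continue;
		iomap[i] = NULL;
		released++;
	}
	return released;
}
```

With a 16-bit mask of 0xffff, a bound of 16 would touch iomap[6] through
iomap[15]; bounding the loop by the table size makes that impossible
regardless of the mask.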





* [PATCH 2.6.32 34/38] [PATCH 34/38] binfmt_elf: Dont clobber passed executables file header
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Maciej W. Rozycki, Al Viro, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream.

Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header.  Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.

This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.

With loadable module support available and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'.  With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.

Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit beebd9fa9d8aeb8f1a3028acc1987c808b601e7d)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/binfmt_elf.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 400786e..b8a0388 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -676,16 +676,16 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 			if (file_permission(interpreter, MAY_READ) < 0)
 				bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
 
-			retval = kernel_read(interpreter, 0, bprm->buf,
-					     BINPRM_BUF_SIZE);
-			if (retval != BINPRM_BUF_SIZE) {
+			/* Get the exec headers */
+			retval = kernel_read(interpreter, 0,
+					     (void *)&loc->interp_elf_ex,
+					     sizeof(loc->interp_elf_ex));
+			if (retval != sizeof(loc->interp_elf_ex)) {
 				if (retval >= 0)
 					retval = -EIO;
 				goto out_free_dentry;
 			}
 
-			/* Get the exec headers */
-			loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
 			break;
 		}
 		elf_ppnt++;
-- 
1.7.12.2.21.g234cd45.dirty
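
Illustration only, not part of the patch: a user-space sketch of reading
the interpreter's header into its own structure instead of into the
buffer that still holds the executable's header. struct hdr, fake_read()
and load_interp_header() are hypothetical; fake_read() merely imitates a
read that may return fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128	/* stands in for BINPRM_BUF_SIZE; value assumed */

struct hdr {
	unsigned char ident[16];
	unsigned short machine;
};

/* Copy up to 'len' bytes from an in-memory "file"; return bytes copied. */
static int fake_read(const void *file, size_t file_len, void *dst, size_t len)
{
	if (len > file_len)
		len = file_len;
	memcpy(dst, file, len);
	return (int)len;
}

/*
 * Read the interpreter header straight into *out. The caller's shared
 * buffer is never touched, so a later retry still sees the original
 * executable header. Returns 0 on success, -EIO on a short read.
 */
static int load_interp_header(const void *interp, size_t interp_len,
			      struct hdr *out)
{
	int ret = fake_read(interp, interp_len, out, sizeof(*out));

	if (ret != (int)sizeof(*out))
		return -EIO;
	return 0;
}
```

Note that the amount read is sizeof(*out) rather than the size of the
shared buffer, matching the "adjust the amount of data read" remark in
the commit message.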





* [PATCH 2.6.32 35/38] [PATCH 35/38] RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sowmini Varadhan, Santosh Shilimkar, David S. Miller,
	Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 8ce675ff39b9958d1c10f86cf58e357efaafc856 ]

Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.

Avoid this by handling pskb_pull()/pskb_trim() failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f114d9374ba3e42c86b112c8b4dbcba50a7330e7)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/rds/tcp_recv.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index c00daff..43d9d66 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -233,8 +233,15 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
 			}
 
 			to_copy = min(tc->t_tinc_data_rem, left);
-			pskb_pull(clone, offset);
-			pskb_trim(clone, to_copy);
+			if (!pskb_pull(clone, offset) ||
+			    pskb_trim(clone, to_copy)) {
+				printk(KERN_WARNING "rds_tcp_data_recv: pull/trim failed "
+					"left %zu data_rem %zu skb_len %d\n",
+					left, tc->t_tinc_data_rem, skb->len);
+				kfree_skb(clone);
+				desc->error = -ENOMEM;
+				goto out;
+			}
 			skb_queue_tail(&tinc->ti_skb_list, clone);
 
 			rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
-- 
1.7.12.2.21.g234cd45.dirty
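
Illustration only, not part of the patch: a user-space sketch of the
two return conventions that the new condition has to combine. The pull
helper reports failure with a NULL pointer, the trim helper with a
non-zero error code, which is why the test reads "!pull || trim".
struct buf, buf_pull(), buf_trim() and carve() are hypothetical
stand-ins, and the 'fail' field simulates an allocation failure.

```c
#include <errno.h>
#include <stddef.h>

struct buf {
	unsigned char *data;
	size_t len;
	int fail;	/* non-zero simulates a low-memory failure */
};

/* Pointer-style result: new data pointer on success, NULL on failure. */
static unsigned char *buf_pull(struct buf *b, size_t n)
{
	if (b->fail || n > b->len)
		return NULL;
	b->data += n;
	b->len -= n;
	return b->data;
}

/* Errno-style result: 0 on success, negative error code on failure. */
static int buf_trim(struct buf *b, size_t len)
{
	if (b->fail)
		return -ENOMEM;
	if (len < b->len)
		b->len = len;
	return 0;
}

/* Carve [offset, offset + to_copy) out of b, or report -ENOMEM. */
static int carve(struct buf *b, size_t offset, size_t to_copy)
{
	if (!buf_pull(b, offset) || buf_trim(b, to_copy))
		return -ENOMEM;
	return 0;
}
```

A caller that ignores both results, as the old code did, would go on to
queue a buffer that was never cut down to the intended size.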





* [PATCH 2.6.32 36/38] [PATCH 36/38] ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ani Sinha, Eric Dumazet, David S. Miller, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ]

Fixes the following kernel BUG :

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[<ffffffff81482b2a>] dump_stack+0x52/0x80
[<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
[<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
[<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
[<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
[<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810e6974>] ? pollwake+0x4d/0x51
[<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
[<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
[<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
[<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
[<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
[<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
[<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
[<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
[<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
[<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
[<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
[<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
[<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
[<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
[<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha <ani@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 33cf84ba8c25b40c7de52029efc8d4372725c95f)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/ipmr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 99508d6..36dbc6e 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1203,7 +1203,7 @@ static inline int ipmr_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options * opt	= &(IPCB(skb)->opt);
 
-	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+	IP_INC_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
@@ -1266,7 +1266,7 @@ static void ipmr_queue_xmit(struct sk_buff *skb, struct mfc_cache *c, int vifi)
 		   to blackhole.
 		 */
 
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 		ip_rt_put(rt);
 		goto out_free;
 	}
-- 
1.7.12.2.21.g234cd45.dirty





* [PATCH 2.6.32 37/38] [PATCH 37/38] net: avoid NULL deref in inet_ctl_sock_destroy()
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Dumazet, Dmitry Vyukov, David S. Miller, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ]

Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate over all possible CPUs and call inet_ctl_sock_destroy(),
possibly with a NULL pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit f79c83d6c41930362bc66fc71489e92975a2facf)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/net/inet_common.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 18c7732..1fb67ab 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -47,7 +47,8 @@ extern int			inet_ctl_sock_create(struct sock **sk,
 
 static inline void inet_ctl_sock_destroy(struct sock *sk)
 {
-	sk_release_kernel(sk);
+	if (sk)
+		sk_release_kernel(sk);
 }
 
 #endif
-- 
1.7.12.2.21.g234cd45.dirty
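
Illustration only, not part of the patch: a user-space sketch of an
error path that tears down every slot after a partial initialization.
NR_SLOTS, ctl_destroy(), init_all() and the 'destroyed' counter are
hypothetical; the slot array is assumed to be zeroed by the caller.

```c
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 4	/* stands in for the number of possible CPUs */

static int destroyed;	/* counts real releases, for illustration only */

/* Release one slot; a slot that was never set up is simply skipped. */
static void ctl_destroy(int *sk)
{
	if (sk) {
		free(sk);
		destroyed++;
	}
}

/*
 * Set up every slot, simulating an allocation failure at index
 * 'fail_at'. On failure, walk ALL slots, including the ones that were
 * never allocated. Returns 0 on success, -1 on failure.
 */
static int init_all(int *slots[NR_SLOTS], int fail_at)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		slots[i] = (i == fail_at) ? NULL : malloc(sizeof(int));
		if (!slots[i])
			goto fail;
	}
	return 0;
fail:
	for (int i = 0; i < NR_SLOTS; i++)
		ctl_destroy(slots[i]);
	return -1;
}
```

Because the cleanup loop does not know where initialization stopped, the
destroy helper has to tolerate NULL itself.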





* [PATCH 2.6.32 38/38] [PATCH 38/38] splice: sendfile() at once fails for big files
@ 2015-11-29 21:47 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-29 21:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christophe Leroy, Jens Axboe, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 0ff28d9f4674d781e492bcff6f32f0fe48cf0fed upstream.

Using sendfile() with the small program below to get MD5 sums of some
files, it appears that big files (over 64 kbytes on a system with 4k
pages) get a wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing.

/* md5sum2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/if_alg.h>

int main(int argc, char **argv)
{
	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
	struct stat st;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "md5",
	};
	int n;

	bind(sk, (struct sockaddr*)&sa, sizeof(sa));

	for (n = 1; n < argc; n++) {
		int size;
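		off_t offset = 0;	/* sendfile() takes an off_t pointer */
		unsigned char buf[4096];	/* unsigned, so "%2.2x" prints two digits per byte */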
		int fd;
		int sko;
		int i;

		fd = open(argv[n], O_RDONLY);
		sko = accept(sk, NULL, 0);
		fstat(fd, &st);
		size = st.st_size;
		sendfile(sko, fd, &offset, size);
		size = read(sko, buf, sizeof(buf));
		for (i = 0; i < size; i++)
			printf("%2.2x", buf[i]);
		printf("  %s\n", argv[n]);
		close(fd);
		close(sko);
	}
	exit(0);
}

The test below uses official Linux patch files. The first result is
from a software-based md5sum; the second is from the program above.

root@vgoip:~# ls -l patch-3.6.*
-rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz

After investigation, it appears that sendfile() sends the file in blocks
of 64 kbytes (16 times PAGE_SIZE). The problem is that the SPLICE_F_MORE
flag is missing at the end of each block, so the hashing operation is
reset as if the end of the file had been reached.

This patch adds SPLICE_F_MORE to the flags when more data is pending.

With the patch applied, we get the correct sums:

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit fcb2781782b61631db4ed31e98757795eacd31cb)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/splice.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 1ef1c00..5c006c8b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1123,7 +1123,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 	long ret, bytes;
 	umode_t i_mode;
 	size_t len;
-	int i, flags;
+	int i, flags, more;
 
 	/*
 	 * We require the input being a regular file, as we don't want to
@@ -1166,6 +1166,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 	 * Don't block on output, we have to drain the direct pipe.
 	 */
 	sd->flags &= ~SPLICE_F_NONBLOCK;
+	more = sd->flags & SPLICE_F_MORE;
 
 	while (len) {
 		size_t read_len;
@@ -1179,6 +1180,15 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		sd->total_len = read_len;
 
 		/*
+		 * If more data is pending, set SPLICE_F_MORE
+		 * If this is the last data and SPLICE_F_MORE was not set
+		 * initially, clears it.
+		 */
+		if (read_len < len)
+			sd->flags |= SPLICE_F_MORE;
+		else if (!more)
+			sd->flags &= ~SPLICE_F_MORE;
+		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we
 		 * could get stuck data in the internal pipe:
-- 
1.7.12.2.21.g234cd45.dirty
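
Illustration only, not part of the patch: a user-space sketch of the
per-chunk flag decision added to splice_direct_to_actor(). F_MORE and
CHUNK are stand-ins for SPLICE_F_MORE and the 64-kbyte block size
mentioned in the commit message; chunk_flags() and count_more_chunks()
are hypothetical helpers, and each read is assumed to return a full
chunk whenever that much data remains.

```c
#include <stddef.h>

#define F_MORE 0x04u	/* stands in for SPLICE_F_MORE */
#define CHUNK  65536u	/* 16 pages of 4096 bytes, per the commit message */

/* Decide the flags for one chunk, following the logic in the patch. */
static unsigned int chunk_flags(unsigned int flags, int more,
				size_t read_len, size_t len)
{
	if (read_len < len)
		flags |= F_MORE;	/* more data is still pending */
	else if (!more)
		flags &= ~F_MORE;	/* last chunk, caller did not ask for MORE */
	return flags;
}

/* Count how many chunks carry the MORE flag when sending 'len' bytes. */
static int count_more_chunks(size_t len, unsigned int initial_flags)
{
	unsigned int flags = initial_flags;
	int more = initial_flags & F_MORE;
	int with_more = 0;

	while (len) {
		size_t read_len = len < CHUNK ? len : CHUNK;

		flags = chunk_flags(flags, more, read_len, len);
		if (flags & F_MORE)
			with_more++;
		len -= read_len;
	}
	return with_more;
}
```

For the two files in the test above, the 64011-byte one fits in a single
chunk and never gets the flag, while the 94131-byte one is sent as two
chunks and only the first carries it.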





* [PATCH 2.6.32 09/38] [PATCH 09/38] xhci: fix off by one error in TRB DMA address boundary check
@ 2015-11-30  1:25 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30  1:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Arkadiusz Miśkiewicz, Mathias Nyman,
	Greg Kroah-Hartman, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 7895086afde2a05fa24a0e410d8e6b75ca7c8fdd upstream.

We need to check that a TRB is part of the current segment
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each other in increasing order where the ring buffer wraps around and
causes errors like:

[  106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

The trb-end address is one outside the end-seg address.
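
The boundary check can be modelled outside the kernel. The following is
only a minimal userspace sketch, not the driver code: struct trb, struct
segment and trb_virt_to_dma() are simplified stand-ins, and the 16-byte
TRB size is derived from the "256 TRBs (4096 bytes)" figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRBS_PER_SEGMENT 256	/* 256 TRBs of 16 bytes = one 4096-byte page */

/* Simplified 16-byte TRB; the real layout is not modelled here. */
struct trb {
	uint32_t field[4];
};

/* Simplified ring segment (assumption: only the two fields we need). */
struct segment {
	struct trb *trbs;	/* first TRB of the segment */
	uint64_t dma;		/* bus address of trbs[0] */
};

/* Return the DMA address of trb, or 0 if trb is not inside seg. */
static uint64_t trb_virt_to_dma(const struct segment *seg,
				const struct trb *trb)
{
	ptrdiff_t segment_offset;

	if (!seg || !trb || trb < seg->trbs)
		return 0;
	segment_offset = trb - seg->trbs;	/* offset in TRBs */
	if (segment_offset >= TRBS_PER_SEGMENT)	/* valid indices: 0..255 */
		return 0;
	return seg->dma + (uint64_t)segment_offset * sizeof(*trb);
}
```

With the old ">" comparison, index 256 (the first TRB of an adjacent
segment) would pass the check and yield seg->dma + 4096, which is the
kind of out-of-segment trb-end address shown in the log above.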

Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6e3ae6256145b5597bee0296eb5fc384cd86aa3d)
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 38fb682..fa22638 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -80,7 +80,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg->trbs;
-	if (segment_offset > TRBS_PER_SEGMENT)
+	if (segment_offset >= TRBS_PER_SEGMENT)
 		return 0;
 	return seg->dma + (segment_offset * sizeof(*trb));
 }
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-29 21:47 ` [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
@ 2015-11-30  1:54   ` Ben Hutchings
  2015-11-30  7:01       ` Willy Tarreau
  0 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-11-30  1:54 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable
  Cc: Konstantin Khlebnikov, Naoya Horiguchi, Mark Williamson,
	Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2304 bytes --]

On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.
> 
> This patch makes pagemap readable for normal users and hides physical
> addresses from them.  For some use-cases PFN isn't required at all.
> 
> See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
> 
> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
> Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [bwh: Backported to 3.2:
>  - Add the same check in the places where we look up a PFN
>  - Add struct pagemapread * parameters where necessary
>  - Open-code file_ns_capable()
>  - Delete pagemap_open() entirely, as it would always return 0]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> (cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
> [wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, and
>  security_capable() only takes a capability. Tested OK. ]
[...]
> +	/* do not disclose physical addresses: attack vector */
> +	pm.show_pfn = !security_capable(CAP_SYS_ADMIN);
[...]

This is wrong; see
<https://marc.info/?l=linux-api&m=143144321020852&w=2>.

For 2.6.32 perhaps you could retain the capability check at open time
but store the result in private state for use at read time.
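
A rough userspace model of that idea, under stated assumptions: the
cred_model and file_model types and both functions are hypothetical
stand-ins, not kernel interfaces. The point it illustrates is that the
decision is taken from the opener's credentials once and only consulted
at read time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel objects, for illustration only. */
struct cred_model {
	bool cap_sys_admin;
};

struct file_model {
	bool show_pfn;		/* private state saved at open time */
};

/* Open time: record whether the opener is allowed to see PFNs. */
static void pagemap_open_model(struct file_model *f,
			       const struct cred_model *opener)
{
	f->show_pfn = opener->cap_sys_admin;
}

/* Read time: the reader's own credentials are deliberately not consulted. */
static uint64_t pagemap_entry_model(const struct file_model *f,
				    uint64_t pfn, uint64_t present_flag)
{
	return (f->show_pfn ? pfn : 0) | present_flag;
}
```

In this model a descriptor opened by an unprivileged task keeps hiding
PFNs even if it is later read by a privileged one.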

The ptrace check presumably should also be done at open time, as was
implemented upstream in:

commit a06db751c321546e5563041956a57613259c6720
Author: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date:   Tue Sep 8 14:59:59 2015 -0700

    pagemap: check permissions and capabilities at open time

But that wasn't cc'd to stable and hasn't been applied to any stable
branch (yet).

Ben.

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, reading IRC for the first time

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 30/38] [PATCH 30/38] mvsas: Fix NULL pointer dereference in mvs_slot_task_free
@ 2015-11-30  2:04 ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30  2:04 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dāvis Mosāns, Tomas Henzl,
	Johannes Thumshirn, James Bottomley, Ben Hutchings,
	Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

commit 2280521719e81919283b82902ac24058f87dfc1b upstream.

When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays
NULL but it's later used in mvs_abort_task as slot which is passed
to mvs_slot_task_free causing NULL pointer dereference.

Just return from mvs_slot_task_free when passed with NULL slot.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit cc1875ecbc3c9fb2774097e03870280c91c1e0e1)

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 drivers/scsi/mvsas/mv_sas.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 0d21386..e4c01b5 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1035,6 +1035,8 @@ static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
 static void mvs_slot_task_free(struct mvs_info *mvi, struct sas_task *task,
 			  struct mvs_slot_info *slot, u32 slot_idx)
 {
+	if (!slot)
+		return;
 	if (!slot->task)
 		return;
 	if (!sas_protocol_ata(task->task_proto))
-- 
1.7.12.2.21.g234cd45.dirty




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
@ 2015-11-30  2:42 ` Ben Hutchings
  2015-11-30  6:51     ` Willy Tarreau
  0 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-11-30  2:42 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 2587 bytes --]

On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> This is the start of the longterm review cycle for the 2.6.32.69 release.
> All patches will be posted as a response to this one. If anyone has any
> issue with these being applied, please let me know. If anyone is a
> maintainer of the proper subsystem, and wants to add a Signed-off-by: line
> to the patch, please respond with it. If anyone thinks some important
> patches are missing and should be added prior to the release, please
> report them quickly with their respective mainline commit IDs.
> 
> Responses should be made by Sat Dec  5 22:47:02 CET 2015.
> Anything received after that time might be too late. If someone
> wants a bit more time for a deeper review, please let me know.
> 
> NOTE: 2.6.32 is approaching end of support. There will probably be one
> or maybe two other versions issued in the next 3 months, and that will
> be all, at least for me. Adding to this the time it can take to validate
> and deploy in some environments, it probably makes sense to start to
> think about switching to another longterm branch. 3.2 and 3.4 are good
> candidates for those seeking rock-solid versions. Longterm branches and
> their projected EOLs are listed here :
> 
>      https://www.kernel.org/category/releases.html
> 
> The whole patch series can be found in one patch at :
>      https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc1.gz
> 
> The shortlog and diffstat are appended below.

Patches 9 and 30 didn't hit the lists, but I've bounced the versions I
received.

Patch 2 didn't arrive here or on the list, but appears to be commit
a41cbe86df3a ("Failing to send a CLOSE if file is opened WRONLY and
server reboots on a 4.x mount").

These subjects in the shortlog don't appear in the patch series:

> Filipe Manana (1):
>       Btrfs: fix read corruption of compressed and shared extents
[...]
> Herbert Xu (4):
[...]
>       crypto: api - Only abort operations on fatal signal
[...]
> Jeff Mahoney (1):
>       btrfs: skip waiting on ordered range for special files
[...]
> Michal Kubeček (1):
>       ipv6: fix tunnel error handling
[...]
> Pravin B Shelar (2):
>       skbuff: Fix skb checksum flag on skb pull
>       skbuff: Fix skb checksum partial check.

Commit 397d425dc26d ("vfs: Test for and handle paths that are
unreachable from their mnt_root") is missing.

Ben.

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, reading IRC for the first time

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 02/38] [PATCH 02/38] Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
       [not found] ` <20151129214702.957590241@1wt.eu>
@ 2015-11-30  6:44   ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30  6:44 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Olga Kornievskaia, Trond Myklebust

resending.

On Sun, Nov 29, 2015 at 10:47:04PM +0100, Willy Tarreau wrote:
> 2.6.32-longterm review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> commit a41cbe86df3afbc82311a1640e20858c0cd7e065 upstream.
> 
> A test case is as the description says:
> open(foobar, O_WRONLY);
> sleep()  --> reboot the server
> close(foobar)
> 
> The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
> line before going to restart, there is
> clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
> 
> NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
> owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
> value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
> out state and when we go to close it, “call_close” doesn’t get set as
> state flag is not set and CLOSE doesn’t go on the wire.
> 
> Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  fs/nfs/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 71ee6f6..614446b 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -929,7 +929,7 @@ restart:
>  							__func__);
>  				}
>  				nfs4_put_open_state(state);
> -				clear_bit(NFS4CLNT_RECLAIM_NOGRACE,
> +				clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>  					&state->flags);
>  				goto restart;
>  			}
> -- 
> 1.7.12.2.21.g234cd45.dirty
> 
> 
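
The flag collision described in the quoted commit message can be
sketched in plain C. This is an illustration only: the MODEL_ names are
made up, the two bit numbers equal to 4 follow the commit message, and
the value 6 for the per-state reclaim flag is an arbitrary assumption,
not the kernel's definition.

```c
#include <assert.h>

/* Bit numbers for the client-wide state word (value per commit message). */
enum { MODEL_CLNT_RECLAIM_NOGRACE = 4 };

/* Bit numbers for the per-open-state word. The WRONLY value follows the
 * commit message; the reclaim value is an assumed placeholder. */
enum {
	MODEL_O_WRONLY_STATE = 4,
	MODEL_STATE_RECLAIM_NOGRACE = 6,
};

/* Non-atomic stand-in for clear_bit(): clear bit nr in flags. */
static unsigned long clear_bit_model(int nr, unsigned long flags)
{
	return flags & ~(1UL << nr);
}
```

Clearing the client-level bit number on the per-state word removes the
write-only open marker, while clearing the per-state bit number leaves
it intact.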

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
  2015-11-30  2:42 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
@ 2015-11-30  6:51     ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30  6:51 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

Hi Ben,

On Mon, Nov 30, 2015 at 02:42:13AM +0000, Ben Hutchings wrote:
> Patches 9 and 30 didn't hit the lists, but I've bounced the versions I
> received.
 
Thanks. Strangely, 9 arrived late, I don't know why.

> Patch 2 didn't arrive here or on the list, but appears to be commit
> a41cbe86df3a ("Failing to send a CLOSE if file is opened WRONLY and
> server reboots on a 4.x mount").

Yes that's it, I've resent it now.

> These subjects in the shortlog don't appear in the patch series:
> 
> > Filipe Manana (1):
> >       Btrfs: fix read corruption of compressed and shared extents
> [...]
> > Herbert Xu (4):
> [...]
> >       crypto: api - Only abort operations on fatal signal
> [...]
> > Jeff Mahoney (1):
> >       btrfs: skip waiting on ordered range for special files
> [...]
> > Michal Kubeček (1):
> >       ipv6: fix tunnel error handling
> [...]
> > Pravin B Shelar (2):
> >       skbuff: Fix skb checksum flag on skb pull
> >       skbuff: Fix skb checksum partial check.
> 

Indeed, I removed them during the build attempt, long before building
the changelog. I'm worried that there's a bug in my script, which seems
to take a specific branch to emit the log instead of the current one :-/
Thanks for letting me know and sorry for the confusion.

> Commit 397d425dc26d ("vfs: Test for and handle paths that are
> unreachable from their mnt_root") is missing.

OK I'm seeing it in your 3.2 branch, I'll try to backport it.

Thanks Ben!
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30  1:54   ` Ben Hutchings
@ 2015-11-30  7:01       ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30  7:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Konstantin Khlebnikov, Naoya Horiguchi,
	Mark Williamson, Andrew Morton, Linus Torvalds

On Mon, Nov 30, 2015 at 01:54:22AM +0000, Ben Hutchings wrote:
> On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> This is wrong; see
> <https://marc.info/?l=linux-api&m=143144321020852&w=2>.

Damned, and I now remember this discussion. The worst thing is that
I purposely booted a machine to test the fix and was happy with it,
I forgot this point :-(

> For 2.6.32 perhaps you could retain the capability check at open time
> but store the result in private state for use at read time.

I'll see if it is possible to opencode security_capable() with 2.6.32's
infrastructure, and how far this brings us. Or maybe we should even drop
this one completely and leave pagemap readable only for superuser on
2.6.32, it doesn't seem to be that big of a deal either.

> The ptrace check presumably should also be done at open time, as was
> implemented upstream in:
> 
> commit a06db751c321546e5563041956a57613259c6720
> Author: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date:   Tue Sep 8 14:59:59 2015 -0700
> 
>     pagemap: check permissions and capabilities at open time
> 
> But that wasn't cc'd to stable and hasn't been applied to any stable
> branch (yet).

I really prefer to avoid fixing things that are not in more recent
branches (especially upgrade candidates for 2.6.32 like yours).

Thanks!
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
  2015-11-30  6:51     ` Willy Tarreau
  (?)
@ 2015-11-30 11:23     ` Willy Tarreau
  -1 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 11:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable, ebiederm

[-- Attachment #1: Type: text/plain, Size: 1042 bytes --]

On Mon, Nov 30, 2015 at 07:51:48AM +0100, Willy Tarreau wrote:
> > Commit 397d425dc26d ("vfs: Test for and handle paths that are
> > unreachable from their mnt_root") is missing.
> 
> OK I'm seeing it in your 3.2 branch, I'll try to backport it.

The code in 2.6.32 looks like a plate of spaghetti, which was heavily
reworked in 2.6.38-rc1 by commit 31e6b01 ("fs: rcu-walk for path lookup").
It even does something suspiciously useless :

		if (this.name[0] == '.') switch (this.len) {
			default:
				break;
			case 2:	
				if (this.name[1] != '.')
					break;
				follow_dotdot(nd);
				inode = nd->path.dentry->d_inode;
				/* fallthrough */
			case 1:
				goto return_reval;
		}

Look how inode is assigned after follow_dotdot() and is never used
between the moment it's assigned and the moment it's re-assigned. I
think I got it right though. I checked that I don't need to pass
via path_put() on exit since follow_dotdot() does it on the error
path. I'd appreciate that you, Eric, or anyone else would review it
though.

Thanks,
Willy


[-- Attachment #2: 0001-vfs-Test-for-and-handle-paths-that-are-unreachable-f.patch --]
[-- Type: text/plain, Size: 3729 bytes --]

>From a91bc7f9c52b053e7c0a59eed4eaec533934a7f4 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Sat, 15 Aug 2015 20:27:13 -0500
Subject: vfs: Test for and handle paths that are unreachable from their
 mnt_root

commit 397d425dc26da728396e66d392d5dcb8dac30c37 upstream.

In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.

Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected.  We certainly can not match
the current behavior as the current behavior is a security hole.

Therefore when encountering .. when following an unconnected path
return -ENOENT.

- Add a function path_connected to verify path->dentry is reachable
  from path->mnt.mnt_root.  AKA to validate that rename did not do
  something nasty to the bind mount.

  To avoid races path_connected must be called after following a path
  component to its next path component.
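
The connectivity test described above can be sketched on a toy tree.
This is a userspace model only: dentry_model, path_model and the two
functions are hypothetical simplifications, and no locking or reference
counting is represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy directory entry; the filesystem root is its own parent. */
struct dentry_model {
	struct dentry_model *parent;
};

/* Toy path: a position plus the roots it is checked against. */
struct path_model {
	struct dentry_model *dentry;	/* current position */
	struct dentry_model *mnt_root;	/* root of this mount */
	struct dentry_model *sb_root;	/* root of the whole filesystem */
};

/* Walk upwards from d; true if root is d itself or one of its ancestors. */
static bool is_subdir_model(const struct dentry_model *d,
			    const struct dentry_model *root)
{
	for (;;) {
		if (d == root)
			return true;
		if (d->parent == d)	/* reached the filesystem root */
			return false;
		d = d->parent;
	}
}

/* Only bind mounts (mnt_root != sb_root) can have disconnected paths. */
static bool path_connected_model(const struct path_model *p)
{
	if (p->mnt_root == p->sb_root)
		return true;
	return is_subdir_model(p->dentry, p->mnt_root);
}
```

Re-parenting a directory so that it no longer sits below the bind
mount's root makes path_connected_model() return false, which is the
case where the patch makes ".." fail with -ENOENT.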

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 6d84ade2c8242ab83fbc5bacb66eb81a8d1ca6db)
[wt: backported to 2.6.32 which significantly differs]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/namei.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0d766d2..1554add 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -414,6 +414,24 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
 	return dentry;
 }
 
+/**
+ * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
+ * @path: nameidate to verify
+ *
+ * Rename can sometimes move a file or directory outside of a bind
+ * mount, path_connected allows those cases to be detected.
+ */
+static bool path_connected(const struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+
+	/* Only bind mounts can have disconnected paths */
+	if (mnt->mnt_root == mnt->mnt_sb->s_root)
+		return true;
+
+	return is_subdir(path->dentry, mnt->mnt_root);
+}
+
 /*
  * Internal lookup() using the new generic dcache.
  * SMP-safe
@@ -754,7 +772,7 @@ int follow_down(struct path *path)
 	return 0;
 }
 
-static __always_inline void follow_dotdot(struct nameidata *nd)
+static __always_inline int follow_dotdot(struct nameidata *nd)
 {
 	set_root(nd);
 
@@ -771,6 +789,10 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 			nd->path.dentry = dget(nd->path.dentry->d_parent);
 			spin_unlock(&dcache_lock);
 			dput(old);
+			if (unlikely(!path_connected(&nd->path))) {
+				path_put(&nd->path);
+				return -ENOENT;
+			}
 			break;
 		}
 		spin_unlock(&dcache_lock);
@@ -788,6 +810,7 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 		nd->path.mnt = parent;
 	}
 	follow_mount(&nd->path);
+	return 0;
 }
 
 /*
@@ -905,7 +928,9 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 			case 2:	
 				if (this.name[1] != '.')
 					break;
-				follow_dotdot(nd);
+				err = follow_dotdot(nd);
+				if (err)
+					goto return_err;
 				inode = nd->path.dentry->d_inode;
 				/* fallthrough */
 			case 1:
@@ -960,7 +985,9 @@ last_component:
 			case 2:	
 				if (this.name[1] != '.')
 					break;
-				follow_dotdot(nd);
+				err = follow_dotdot(nd);
+				if (err)
+					goto return_err;
 				inode = nd->path.dentry->d_inode;
 				/* fallthrough */
 			case 1:
-- 
1.7.12.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30  7:01       ` Willy Tarreau
  (?)
@ 2015-11-30 11:30       ` Willy Tarreau
  2015-11-30 11:49         ` Konstantin Khlebnikov
  2015-11-30 14:55         ` Ben Hutchings
  -1 siblings, 2 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 11:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Konstantin Khlebnikov, Naoya Horiguchi,
	Mark Williamson, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On Mon, Nov 30, 2015 at 08:01:36AM +0100, Willy Tarreau wrote:
> On Mon, Nov 30, 2015 at 01:54:22AM +0000, Ben Hutchings wrote:
> > On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> > This is wrong; see
> > <https://marc.info/?l=linux-api&m=143144321020852&w=2>.
> 
> Damned, and I now remember this discussion. The worst thing is that
> I purposely booted a machine to test the fix and was happy with it,
> I forgot this point :-(
> 
> > For 2.6.32 perhaps you could retain the capability check at open time
> > but store the result in private state for use at read time.
> 
> I'll see if it is possible to opencode security_capable() with 2.6.32's
> infrastructure, and how far this brings us. Or maybe we should even drop
> this one completely and leave pagemap readable only for superuser on
> 2.6.32, it doesn't seem to be that big of a deal either.

It was easy enough to open-code security_capable() in the end. I've
tested this version which works fine for me here. If that's OK for you
I'll emit an -rc2 with the last two patches.

Thanks,
Willy


[-- Attachment #2: 0001-pagemap-hide-physical-addresses-from-non-privileged-.patch --]
[-- Type: text/plain, Size: 3800 bytes --]

>From fde24678af1b04712144457512afbc16fd71b252 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 8 Sep 2015 15:00:07 -0700
Subject: pagemap: hide physical addresses from non-privileged users

commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.

This patch makes pagemap readable for normal users and hides physical
addresses from them.  For some use-cases PFN isn't required at all.

See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name

Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Add the same check in the places where we look up a PFN
 - Add struct pagemapread * parameters where necessary
 - Open-code file_ns_capable()
 - Delete pagemap_open() entirely, as it would always return 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
[wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, open-coded
 security_capable(). Tested OK. ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/proc/task_mmu.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 73db5a6..24d3602 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -8,6 +8,7 @@
 #include <linux/mempolicy.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/security.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -539,6 +540,7 @@ const struct file_operations proc_clear_refs_operations = {
 
 struct pagemapread {
 	u64 __user *out, *end;
+	bool show_pfn;
 };
 
 #define PM_ENTRY_BYTES      sizeof(u64)
@@ -589,14 +591,14 @@ static u64 swap_pte_to_pagemap_entry(pte_t pte)
 	return swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT);
 }
 
-static u64 pte_to_pagemap_entry(pte_t pte)
+static u64 pte_to_pagemap_entry(struct pagemapread *pm, pte_t pte)
 {
 	u64 pme = 0;
 	if (is_swap_pte(pte))
 		pme = PM_PFRAME(swap_pte_to_pagemap_entry(pte))
 			| PM_PSHIFT(PAGE_SHIFT) | PM_SWAP;
 	else if (pte_present(pte))
-		pme = PM_PFRAME(pte_pfn(pte))
+		pme = (pm->show_pfn ? PM_PFRAME(pte_pfn(pte)) : 0)
 			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
 	return pme;
 }
@@ -624,7 +626,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		if (vma && (vma->vm_start <= addr) &&
 		    !is_vm_hugetlb_page(vma)) {
 			pte = pte_offset_map(pmd, addr);
-			pfn = pte_to_pagemap_entry(*pte);
+			pfn = pte_to_pagemap_entry(pm, *pte);
 			/* unmap before userspace copy */
 			pte_unmap(pte);
 		}
@@ -695,6 +697,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_task;
 
+	/* do not disclose physical addresses: attack vector */
+	pm.show_pfn = !cap_capable(current, file->f_cred, CAP_SYS_ADMIN, SECURITY_CAP_AUDIT);
+
 	mm = get_task_mm(task);
 	if (!mm)
 		goto out_task;
@@ -773,19 +778,9 @@ out:
 	return ret;
 }
 
-static int pagemap_open(struct inode *inode, struct file *file)
-{
-	/* do not disclose physical addresses to unprivileged
-	   userspace (closes a rowhammer attack vector) */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	return 0;
-}
-
 const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
-	.open		= pagemap_open,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
-- 
1.7.12.1


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30 11:30       ` Willy Tarreau
@ 2015-11-30 11:49         ` Konstantin Khlebnikov
  2015-11-30 12:13           ` Willy Tarreau
  2015-11-30 14:55         ` Ben Hutchings
  1 sibling, 1 reply; 61+ messages in thread
From: Konstantin Khlebnikov @ 2015-11-30 11:49 UTC (permalink / raw)
  To: Willy Tarreau, Ben Hutchings
  Cc: linux-kernel, stable, Naoya Horiguchi, Mark Williamson,
	Andrew Morton, Linus Torvalds

On 30.11.2015 14:30, Willy Tarreau wrote:
> On Mon, Nov 30, 2015 at 08:01:36AM +0100, Willy Tarreau wrote:
>> On Mon, Nov 30, 2015 at 01:54:22AM +0000, Ben Hutchings wrote:
>>> On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
>>> This is wrong; see
>>> <https://marc.info/?l=linux-api&m=143144321020852&w=2>.
>>
>> Damned, and I now remember this discussion. The worst thing is that
>> I purposely booted a machine to test the fix and was happy with it,
>> I forgot this point :-(
>>
>>> For 2.6.32 perhaps you could retain the capability check at open time
>>> but store the result in private state for use at read time.
>>
>> I'll see if it is possible to opencode security_capable() with 2.6.32's
>> infrastructure, and how far this brings us. Or maybe we should even drop
>> this one completely and leave pagemap readable only for superuser on
>> 2.6.32, it doesn't seem to be that big of a deal either.
> It was easy enough to open-code security_capable() in the end. I've
> tested this version which works fine for me here. If that's OK for you
> I'll emit an -rc2 with the last two patches.
>
> Thanks,
> Willy
>
>
> 0001-pagemap-hide-physical-addresses-from-non-privileged-.patch
>
>
>  From fde24678af1b04712144457512afbc16fd71b252 Mon Sep 17 00:00:00 2001
> From: Konstantin Khlebnikov<khlebnikov@yandex-team.ru>
> Date: Tue, 8 Sep 2015 15:00:07 -0700
> Subject: pagemap: hide physical addresses from non-privileged users
>
> commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.
>
> This patch makes pagemap readable for normal users and hides physical
> addresses from them.  For some use-cases PFN isn't required at all.
>
> Seehttp://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
>
> Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
> Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> [bwh: Backported to 3.2:
>   - Add the same check in the places where we look up a PFN
>   - Add struct pagemapread * parameters where necessary
>   - Open-code file_ns_capable()
>   - Delete pagemap_open() entirely, as it would always return 0]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> (cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
> [wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, open-coded
>   security_capable(). Tested OK. ]
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>   fs/proc/task_mmu.c | 21 ++++++++-------------
>   1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 73db5a6..24d3602 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -8,6 +8,7 @@
>   #include <linux/mempolicy.h>
>   #include <linux/swap.h>
>   #include <linux/swapops.h>
> +#include <linux/security.h>
>
>   #include <asm/elf.h>
>   #include <asm/uaccess.h>
> @@ -539,6 +540,7 @@ const struct file_operations proc_clear_refs_operations = {
>
>   struct pagemapread {
>   	u64 __user *out, *end;
> +	bool show_pfn;
>   };
>
>   #define PM_ENTRY_BYTES      sizeof(u64)
> @@ -589,14 +591,14 @@ static u64 swap_pte_to_pagemap_entry(pte_t pte)
>   	return swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT);
>   }
>
> -static u64 pte_to_pagemap_entry(pte_t pte)
> +static u64 pte_to_pagemap_entry(struct pagemapread *pm, pte_t pte)
>   {
>   	u64 pme = 0;
>   	if (is_swap_pte(pte))
>   		pme = PM_PFRAME(swap_pte_to_pagemap_entry(pte))
>   			| PM_PSHIFT(PAGE_SHIFT) | PM_SWAP;
>   	else if (pte_present(pte))
> -		pme = PM_PFRAME(pte_pfn(pte))
> +		pme = (pm->show_pfn ? PM_PFRAME(pte_pfn(pte)) : 0)
>   			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
>   	return pme;
>   }
> @@ -624,7 +626,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>   		if (vma && (vma->vm_start <= addr) &&
>   		    !is_vm_hugetlb_page(vma)) {
>   			pte = pte_offset_map(pmd, addr);
> -			pfn = pte_to_pagemap_entry(*pte);
> +			pfn = pte_to_pagemap_entry(pm, *pte);
>   			/* unmap before userspace copy */
>   			pte_unmap(pte);
>   		}
> @@ -695,6 +697,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>   	if (!count)
>   		goto out_task;
>
> +	/* do not disclose physical addresses: attack vector */
> +	pm.show_pfn = !cap_capable(current, file->f_cred, CAP_SYS_ADMIN, SECURITY_CAP_AUDIT);
> +

At first sight this is confusing... but correct. cap_capable() really
returns zero for success, unlike the newer file_ns_capable(), which
returns a bool (true when the capability is held).

The rest looks good too.

>   	mm = get_task_mm(task);
>   	if (!mm)
>   		goto out_task;
> @@ -773,19 +778,9 @@ out:
>   	return ret;
>   }
>
> -static int pagemap_open(struct inode *inode, struct file *file)
> -{
> -	/* do not disclose physical addresses to unprivileged
> -	   userspace (closes a rowhammer attack vector) */
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -	return 0;
> -}
> -
>   const struct file_operations proc_pagemap_operations = {
>   	.llseek		= mem_lseek, /* borrow this */
>   	.read		= pagemap_read,
> -	.open		= pagemap_open,
>   };
>   #endif /* CONFIG_PROC_PAGE_MONITOR */
>
> -- 1.7.12.1
>


-- 
Konstantin

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30 11:49         ` Konstantin Khlebnikov
@ 2015-11-30 12:13           ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 12:13 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Ben Hutchings, linux-kernel, stable, Naoya Horiguchi,
	Mark Williamson, Andrew Morton, Linus Torvalds

On Mon, Nov 30, 2015 at 02:49:59PM +0300, Konstantin Khlebnikov wrote:
> On 30.11.2015 14:30, Willy Tarreau wrote:
> >+	/* do not disclose physical addresses: attack vector */
> >+	pm.show_pfn = !cap_capable(current, file->f_cred, CAP_SYS_ADMIN, SECURITY_CAP_AUDIT);
> >+
> 
> At first sight this is confusing... but correct. cap_capable() really
> returns zero for success, unlike the newer file_ns_capable(), which
> returns a bool (true when the capability is held).

Yes, it trapped me as well; my first attempt only allowed non-root
to read the pagemap!
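
[Editorial note: the return-value trap discussed above can be sketched in
userspace C. This is only an illustration under stated assumptions: struct
toy_cred and every toy_* / show_pfn_* function are hypothetical stand-ins,
not kernel code. The errno-style check models the convention described for
cap_capable() (0 on success), and the boolean one models the convention
described for file_ns_capable().]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a credential; not the kernel's struct cred. */
struct toy_cred {
	bool has_sys_admin;
};

/* Errno-style check: 0 means "capability granted", negative errno means
 * "denied". This mirrors the convention described in the thread. */
static int toy_capable_errno(const struct toy_cred *cred)
{
	assert(cred != NULL);
	return cred->has_sys_admin ? 0 : -EPERM;
}

/* Boolean-style check: true means "capability granted". */
static bool toy_capable_bool(const struct toy_cred *cred)
{
	return toy_capable_errno(cred) == 0;
}

/* Correct use of the errno-style check: negate it to get a boolean. */
static bool show_pfn_correct(const struct toy_cred *cred)
{
	return !toy_capable_errno(cred);
}

/* The trap: treating the errno-style result as a boolean inverts the
 * decision, so only callers WITHOUT the capability would see PFNs. */
static bool show_pfn_inverted(const struct toy_cred *cred)
{
	return toy_capable_errno(cred);
}
```

With a privileged credential, show_pfn_correct() yields true while
show_pfn_inverted() yields false, which matches the symptom described
above for the first attempt.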

> The rest looks good too.

OK thank you.

Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
  2015-11-30  6:51     ` Willy Tarreau
  (?)
  (?)
@ 2015-11-30 14:43     ` Ben Hutchings
  2015-11-30 15:10       ` Willy Tarreau
  -1 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-11-30 14:43 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 449 bytes --]

On Mon, 2015-11-30 at 07:51 +0100, Willy Tarreau wrote:
[...]
> > Commit 397d425dc26d ("vfs: Test for and handle paths that are
> > unreachable from their mnt_root") is missing.
> 
> OK I'm seeing it in your 3.2 branch, I'll try to backport it.

Eric sent you a backport here:
http://article.gmane.org/gmane.linux.kernel.stable/151074

Ben.

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, reading IRC for the first time

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30 11:30       ` Willy Tarreau
  2015-11-30 11:49         ` Konstantin Khlebnikov
@ 2015-11-30 14:55         ` Ben Hutchings
  2015-11-30 15:14             ` Willy Tarreau
  1 sibling, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-11-30 14:55 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Konstantin Khlebnikov, Naoya Horiguchi,
	Mark Williamson, Andrew Morton, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1830 bytes --]

On Mon, 2015-11-30 at 12:30 +0100, Willy Tarreau wrote:
> On Mon, Nov 30, 2015 at 08:01:36AM +0100, Willy Tarreau wrote:
> > On Mon, Nov 30, 2015 at 01:54:22AM +0000, Ben Hutchings wrote:
> > > On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> > > This is wrong; see
> > > <https://marc.info/?l=linux-api&m=143144321020852&w=2>.
> > 
> > Damned, and I now remember this discussion. The worst thing is that
> > I purposely booted a machine to test the fix and was happy with it,
> > I forgot this point :-(
> > 
> > > For 2.6.32 perhaps you could retain the capability check at open time
> > > but store the result in private state for use at read time.
> > 
> > I'll see if it is possible to opencode security_capable() with 2.6.32's
> > infrastructure, and how far this brings us. Or maybe we should even drop
> > this one completely and leave pagemap readable only for superuser on
> > 2.6.32, it doesn't seem to be that big of a deal either.
> 
> It was easy enough to open-code security_capable() in the end. I've
> tested this version which works fine for me here. If that's OK for you
> I'll emit an -rc2 with the last two patches.
[...]
> +	/* do not disclose physical addresses: attack vector */
> +	pm.show_pfn = !cap_capable(current, file->f_cred, CAP_SYS_ADMIN, SECURITY_CAP_AUDIT);
[...]

But this bypasses SELinux's additional restrictions on capabilities.
I think it would be better to cherry-pick this first:

commit 6037b715d6fab139742c3df8851db4c823081561
Author: Chris Wright <chrisw@sous-sol.org>
Date:   Wed Feb 9 22:11:51 2011 -0800

    security: add cred argument to security_capable()

and then you can pass file->f_cred to security_capable().
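
[Editorial note: a minimal userspace sketch of why calling the base
capability check directly can bypass a stricter security module, whereas
going through the hook table does not. All names here (toy_cred,
toy_cap_capable, toy_policy_capable, toy_security_ops,
toy_security_capable) are hypothetical; the sketch assumes, as the remark
above implies, that the module's hook applies extra restrictions on top of
the base check.]

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical credential: a capability bit plus a policy verdict. */
struct toy_cred {
	bool cap_sys_admin;   /* capability present in the credential */
	bool policy_allows;   /* what a stricter security module would say */
};

/* Base check only: looks at the capability bit (0 = granted). */
static int toy_cap_capable(const struct toy_cred *cred)
{
	assert(cred != NULL);
	return cred->cap_sys_admin ? 0 : -EPERM;
}

/* Stricter module hook: base check first, then the extra policy check. */
static int toy_policy_capable(const struct toy_cred *cred)
{
	int rc = toy_cap_capable(cred);

	if (rc)
		return rc;
	return cred->policy_allows ? 0 : -EPERM;
}

/* Hook table, standing in for a security_ops-like indirection. */
struct toy_security_ops {
	int (*capable)(const struct toy_cred *cred);
};

static const struct toy_security_ops toy_ops = {
	.capable = toy_policy_capable,
};

/* Going through the hook table honours whichever module is registered. */
static int toy_security_capable(const struct toy_cred *cred)
{
	return toy_ops.capable(cred);
}
```

For a credential that holds the capability but is confined by policy,
toy_cap_capable() returns 0 while toy_security_capable() returns -EPERM;
that gap is the bypass being pointed out.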

Ben.    

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, reading IRC for the first time

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
  2015-11-30 14:43     ` Ben Hutchings
@ 2015-11-30 15:10       ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 15:10 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

On Mon, Nov 30, 2015 at 02:43:02PM +0000, Ben Hutchings wrote:
> On Mon, 2015-11-30 at 07:51 +0100, Willy Tarreau wrote:
> [...]
> > > Commit 397d425dc26d ("vfs: Test for and handle paths that are
> > > unreachable from their mnt_root") is missing.
> > 
> > OK I'm seeing it in your 3.2 branch, I'll try to backport it.
> 
> Eric sent you a backport here:
> http://article.gmane.org/gmane.linux.kernel.stable/151074

Yes, and I dropped it since you spotted a bug there; instead I used
your version, which was fixed. I didn't realize at the time that only
one of the two patches was replaced by yours; that was a misunderstanding
on my side. Both versions appear to be equivalent (which reassures me),
but I'll use Eric's.

Thanks!
Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users
  2015-11-30 14:55         ` Ben Hutchings
@ 2015-11-30 15:14             ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 15:14 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, stable, Konstantin Khlebnikov, Naoya Horiguchi,
	Mark Williamson, Andrew Morton, Linus Torvalds

On Mon, Nov 30, 2015 at 02:55:20PM +0000, Ben Hutchings wrote:
> On Mon, 2015-11-30 at 12:30 +0100, Willy Tarreau wrote:
> > On Mon, Nov 30, 2015 at 08:01:36AM +0100, Willy Tarreau wrote:
> > > On Mon, Nov 30, 2015 at 01:54:22AM +0000, Ben Hutchings wrote:
> > > > On Sun, 2015-11-29 at 22:47 +0100, Willy Tarreau wrote:
> > > > This is wrong; see
> > > > <https://marc.info/?l=linux-api&m=143144321020852&w=2>.
> > > 
> > > Damned, and I now remember this discussion. The worst thing is that
> > > I purposely booted a machine to test the fix and was happy with it,
> > > I forgot this point :-(
> > > 
> > > > For 2.6.32 perhaps you could retain the capability check at open time
> > > > but store the result in private state for use at read time.
> > > 
> > > I'll see if it is possible to opencode security_capable() with 2.6.32's
> > > infrastructure, and how far this brings us. Or maybe we should even drop
> > > this one completely and leave pagemap readable only for superuser on
> > > 2.6.32, it doesn't seem to be that big of a deal either.
> > 
> > It was easy enough to open-code security_capable() in the end. I've
> > tested this version which works fine for me here. If that's OK for you
> > I'll emit an -rc2 with the last two patches.
> [...]
> > +	/* do not disclose physical addresses: attack vector */
> > +	pm.show_pfn = !cap_capable(current, file->f_cred, CAP_SYS_ADMIN, SECURITY_CAP_AUDIT);
> [...]
> 
> But this bypasses SELinux's additional restrictions on capabilities.

Got it, I didn't think about it.

> I think it would be better to cherry-pick this first:
> 
> commit 6037b715d6fab139742c3df8851db4c823081561
> Author: Chris Wright <chrisw@sous-sol.org>
> Date:   Wed Feb 9 22:11:51 2011 -0800
> 
>     security: add cred argument to security_capable()
> 
> and then you can pass file->f_cred to security_capable().

That makes sense indeed, the patch should fit nicely. Thanks for the
pointer.

Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
@ 2015-11-30 16:04   ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 16:04 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Ben Hutchings

Here comes 2.6.32.69-rc2.

On top of -rc1, it adds or replaces the following patches, which were
either missing or incorrectly backported, as spotted by Ben:

  Chris Wright (1):
      security: add cred argument to security_capable()

  Eric W. Biederman (1):
      vfs: Test for and handle paths that are unreachable from their mnt_root

  Konstantin Khlebnikov (1):
      pagemap: hide physical addresses from non-privileged users

These patches will be posted as a response to this one (with 39 and 40 being
the extra ones). If anyone has any issue with these being applied, please let
me know. If anyone is a maintainer of the proper subsystem, and wants to add
a Signed-off-by: line to the patch, please respond with it. If anyone thinks
some important patches are missing and should be added prior to the release,
please report them quickly with their respective mainline commit IDs.

The response deadline remains unchanged: Sat Dec  5 22:47:02 CET 2015.

The updated patch series can be found in one patch at :
     https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc2.gz

Thanks,
Willy



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 39/38] vfs: Test for and handle paths that are unreachable from their mnt_root
@ 2015-11-30 16:04   ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 16:04 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eric W. Biederman, Al Viro, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <ebiederm@xmission.com>

commit 397d425dc26da728396e66d392d5dcb8dac30c37 upstream.

In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.

Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected.  We certainly can not match
the current behavior as the current behavior is a security hole.

Therefore when encountering .. while following an unconnected path,
return -ENOENT.

- Add a function path_connected to verify path->dentry is reachable
  from path->mnt.mnt_root.  AKA to validate that rename did not do
  something nasty to the bind mount.

  To avoid races path_connected must be called after following a path
  component to its next path component.
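
[Editorial note: the connectivity test described above can be sketched in
userspace C. This is only a model under stated assumptions: toy_dentry,
toy_path, toy_is_subdir and toy_path_connected are hypothetical names, a
filesystem root is represented as a node whose parent is itself, and no
locking or reference counting is modelled.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory node; the filesystem root points to itself. */
struct toy_dentry {
	struct toy_dentry *parent;
};

/* Hypothetical path: current directory, mount root, filesystem root. */
struct toy_path {
	struct toy_dentry *dentry;
	struct toy_dentry *mnt_root;
	struct toy_dentry *sb_root;
};

/* Walk parent pointers from d; true if ancestor is d or above d. */
static bool toy_is_subdir(const struct toy_dentry *d,
			  const struct toy_dentry *ancestor)
{
	for (;;) {
		if (d == ancestor)
			return true;
		if (d->parent == d)	/* reached the filesystem root */
			return false;
		d = d->parent;
	}
}

/* True if path->dentry is still reachable from the mount's root. */
static bool toy_path_connected(const struct toy_path *path)
{
	assert(path != NULL && path->dentry != NULL && path->mnt_root != NULL);

	/* A mount of the whole filesystem cannot be escaped this way. */
	if (path->mnt_root == path->sb_root)
		return true;

	return toy_is_subdir(path->dentry, path->mnt_root);
}
```

In this model, renaming a directory from below the bind-mounted subtree
to somewhere else in the filesystem makes toy_path_connected() return
false, which is the case where the patch returns -ENOENT for "..".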

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/namei.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 0d766d2..6551acb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -434,6 +434,24 @@ static struct dentry * cached_lookup(struct dentry * parent, struct qstr * name,
 	return dentry;
 }
 
+/**
+ * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
+ * @path: nameidate to verify
+ *
+ * Rename can sometimes move a file or directory outside of a bind
+ * mount, path_connected allows those cases to be detected.
+ */
+static bool path_connected(const struct path *path)
+{
+	struct vfsmount *mnt = path->mnt;
+
+	/* Only bind mounts can have disconnected paths */
+	if (mnt->mnt_root == mnt->mnt_sb->s_root)
+		return true;
+
+	return is_subdir(path->dentry, mnt->mnt_root);
+}
+
 /*
  * Short-cut version of permission(), for calling by
  * path_walk(), when dcache lock is held.  Combines parts
@@ -754,7 +772,7 @@ int follow_down(struct path *path)
 	return 0;
 }
 
-static __always_inline void follow_dotdot(struct nameidata *nd)
+static __always_inline int follow_dotdot(struct nameidata *nd)
 {
 	set_root(nd);
 
@@ -771,6 +789,8 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 			nd->path.dentry = dget(nd->path.dentry->d_parent);
 			spin_unlock(&dcache_lock);
 			dput(old);
+			if (unlikely(!path_connected(&nd->path)))
+				return -ENOENT;
 			break;
 		}
 		spin_unlock(&dcache_lock);
@@ -788,6 +808,7 @@ static __always_inline void follow_dotdot(struct nameidata *nd)
 		nd->path.mnt = parent;
 	}
 	follow_mount(&nd->path);
+	return 0;
 }
 
 /*
@@ -905,7 +926,9 @@ static int __link_path_walk(const char *name, struct nameidata *nd)
 			case 2:	
 				if (this.name[1] != '.')
 					break;
-				follow_dotdot(nd);
+				err = follow_dotdot(nd);
+				if (err < 0)
+					goto out_nd_path_put;
 				inode = nd->path.dentry->d_inode;
 				/* fallthrough */
 			case 1:
@@ -960,7 +983,9 @@ last_component:
 			case 2:	
 				if (this.name[1] != '.')
 					break;
-				follow_dotdot(nd);
+				err = follow_dotdot(nd);
+				if (err < 0)
+					goto out_nd_path_put;
 				inode = nd->path.dentry->d_inode;
 				/* fallthrough */
 			case 1:
@@ -1022,6 +1047,7 @@ out_dput:
 		path_put_conditional(&next, nd);
 		break;
 	}
+out_nd_path_put:
 	path_put(&nd->path);
 return_err:
 	return err;
-- 
1.7.12.2.21.g234cd45.dirty






^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 40/38] security: add cred argument to security_capable()
@ 2015-11-30 16:05   ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 16:05 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chris Wright, Serge Hallyn, James Morris, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Chris Wright <chrisw@sous-sol.org>

commit 6037b715d6fab139742c3df8851db4c823081561 upstream.

Expand security_capable() to include cred, so that it can be usable in a
wider range of call sites.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
[wt: needed by next patch only]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 include/linux/security.h | 6 +++---
 kernel/capability.c      | 2 +-
 security/security.c      | 5 ++---
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index d40d23f..73ebc3f 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1733,7 +1733,7 @@ int security_capset(struct cred *new, const struct cred *old,
 		    const kernel_cap_t *effective,
 		    const kernel_cap_t *inheritable,
 		    const kernel_cap_t *permitted);
-int security_capable(int cap);
+int security_capable(const struct cred *cred, int cap);
 int security_real_capable(struct task_struct *tsk, int cap);
 int security_real_capable_noaudit(struct task_struct *tsk, int cap);
 int security_acct(struct file *file);
@@ -1938,9 +1938,9 @@ static inline int security_capset(struct cred *new,
 	return cap_capset(new, old, effective, inheritable, permitted);
 }
 
-static inline int security_capable(int cap)
+static inline int security_capable(const struct cred *cred, int cap)
 {
-	return cap_capable(current, current_cred(), cap, SECURITY_CAP_AUDIT);
+	return cap_capable(current, cred, cap, SECURITY_CAP_AUDIT);
 }
 
 static inline int security_real_capable(struct task_struct *tsk, int cap)
diff --git a/kernel/capability.c b/kernel/capability.c
index 8a944f5..771618c 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -305,7 +305,7 @@ int capable(int cap)
 		BUG();
 	}
 
-	if (security_capable(cap) == 0) {
+	if (security_capable(current_cred(), cap) == 0) {
 		current->flags |= PF_SUPERPRIV;
 		return 1;
 	}
diff --git a/security/security.c b/security/security.c
index c4c6732..227b173 100644
--- a/security/security.c
+++ b/security/security.c
@@ -151,10 +151,9 @@ int security_capset(struct cred *new, const struct cred *old,
 				    effective, inheritable, permitted);
 }
 
-int security_capable(int cap)
+int security_capable(const struct cred *cred, int cap)
 {
-	return security_ops->capable(current, current_cred(), cap,
-				     SECURITY_CAP_AUDIT);
+	return security_ops->capable(current, cred, cap, SECURITY_CAP_AUDIT);
 }
 
 int security_real_capable(struct task_struct *tsk, int cap)
-- 
1.7.12.2.21.g234cd45.dirty






^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2.6.32 19/38] pagemap: hide physical addresses from non-privileged users
@ 2015-11-30 16:05   ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-11-30 16:05 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Konstantin Khlebnikov, Naoya Horiguchi, Mark Williamson,
	Andrew Morton, Linus Torvalds, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

commit 1c90308e7a77af6742a97d1021cca923b23b7f0d upstream.

This patch makes pagemap readable for normal users and hides physical
addresses from them.  For some use-cases PFN isn't required at all.

See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name
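
[Editorial note: a minimal userspace sketch of what hiding the PFN means
for a present page's pagemap entry. The bit layout used here (PFN in bits
0-54, page shift in bits 55-60, swap flag in bit 62, present flag in bit
63) is an assumption modelled on the pagemap format of this kernel era,
and every TOY_* macro and toy_present_entry() are hypothetical names, not
the kernel's definitions.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: PFN in the low 55 bits, page shift in the next 6. */
#define TOY_PM_PFRAME_BITS 55
#define TOY_PM_PFRAME_MASK ((UINT64_C(1) << TOY_PM_PFRAME_BITS) - 1)
#define TOY_PM_PSHIFT(x)   (((uint64_t)(x) & 0x3f) << TOY_PM_PFRAME_BITS)
#define TOY_PM_SWAP        (UINT64_C(1) << 62)
#define TOY_PM_PRESENT     (UINT64_C(1) << 63)

/* Build the entry for a present page. When show_pfn is false the frame
 * number field is zeroed, but the page shift and present bit are kept. */
static uint64_t toy_present_entry(uint64_t pfn, unsigned int page_shift,
				  bool show_pfn)
{
	uint64_t frame;

	assert(page_shift < 64);
	frame = show_pfn ? (pfn & TOY_PM_PFRAME_MASK) : 0;
	return frame | TOY_PM_PSHIFT(page_shift) | TOY_PM_PRESENT;
}
```

An unprivileged reader can therefore still tell that a page is present
and how large it is, but learns nothing about its physical location.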

Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Add the same check in the places where we look up a PFN
 - Add struct pagemapread * parameters where necessary
 - Open-code file_ns_capable()
 - Delete pagemap_open() entirely, as it would always return 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit b1fb185f26e85f76e3ac6ce557398d78797c9684)
[wt: adjusted context, no pagemap_hugetlb_range() in 2.6.32, needs
     cred argument to security_capable(), tested OK ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 fs/proc/task_mmu.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 73db5a6..36c1edf 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -8,6 +8,7 @@
 #include <linux/mempolicy.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/security.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -539,6 +540,7 @@ const struct file_operations proc_clear_refs_operations = {
 
 struct pagemapread {
 	u64 __user *out, *end;
+	bool show_pfn;
 };
 
 #define PM_ENTRY_BYTES      sizeof(u64)
@@ -589,14 +591,14 @@ static u64 swap_pte_to_pagemap_entry(pte_t pte)
 	return swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT);
 }
 
-static u64 pte_to_pagemap_entry(pte_t pte)
+static u64 pte_to_pagemap_entry(struct pagemapread *pm, pte_t pte)
 {
 	u64 pme = 0;
 	if (is_swap_pte(pte))
 		pme = PM_PFRAME(swap_pte_to_pagemap_entry(pte))
 			| PM_PSHIFT(PAGE_SHIFT) | PM_SWAP;
 	else if (pte_present(pte))
-		pme = PM_PFRAME(pte_pfn(pte))
+		pme = (pm->show_pfn ? PM_PFRAME(pte_pfn(pte)) : 0)
 			| PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
 	return pme;
 }
@@ -624,7 +626,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		if (vma && (vma->vm_start <= addr) &&
 		    !is_vm_hugetlb_page(vma)) {
 			pte = pte_offset_map(pmd, addr);
-			pfn = pte_to_pagemap_entry(*pte);
+			pfn = pte_to_pagemap_entry(pm, *pte);
 			/* unmap before userspace copy */
 			pte_unmap(pte);
 		}
@@ -695,6 +697,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!count)
 		goto out_task;
 
+	/* do not disclose physical addresses: attack vector */
+	pm.show_pfn = !security_capable(file->f_cred, CAP_SYS_ADMIN);
+
 	mm = get_task_mm(task);
 	if (!mm)
 		goto out_task;
@@ -773,19 +778,9 @@ out:
 	return ret;
 }
 
-static int pagemap_open(struct inode *inode, struct file *file)
-{
-	/* do not disclose physical addresses to unprivileged
-	   userspace (closes a rowhammer attack vector) */
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	return 0;
-}
-
 const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
-	.open		= pagemap_open,
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
-- 
1.7.12.2.21.g234cd45.dirty








^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
@ 2015-12-01  0:43   ` Ben Hutchings
  2015-12-01  6:57       ` Willy Tarreau
  0 siblings, 1 reply; 61+ messages in thread
From: Ben Hutchings @ 2015-12-01  0:43 UTC (permalink / raw)
  To: Willy Tarreau, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Mon, 2015-11-30 at 17:04 +0100, Willy Tarreau wrote:
> Here comes 2.6.32.69-rc2.
> 
> It adds or replaces the following patches on top of -rc1 that were needed
> or incorrectly backported as spotted by Ben :
> 
>   Chris Wright (1):
>       security: add cred argument to security_capable()
> 
>   Eric W. Biederman (1):
>       vfs: Test for and handle paths that are unreachable from their mnt_root
> 
>   Konstantin Khlebnikov (1):
>       pagemap: hide physical addresses from non-privileged users
> 
> These patches will be posted as a response to this one (with 39 and 40 being
> the extra ones). If anyone has any issue with these being applied, please let
> me know. If anyone is a maintainer of the proper subsystem, and wants to add
> a Signed-off-by: line to the patch, please respond with it. If anyone thinks
> some important patches are missing and should be added prior to the release,
> please report them quickly with their respective mainline commit IDs.
> 
> Last response delay remains unchanged : Sat Dec  5 22:47:02 CET 2015.
> 
> The updated patch series can be found in one patch at :
>      https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc2.gz

I've applied this to the Debian 6.0 kernel source, and all looks good.

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
                                - John Levine, moderator of comp.compilers

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
  2015-12-01  0:43   ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
@ 2015-12-01  6:57       ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-12-01  6:57 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

On Tue, Dec 01, 2015 at 12:43:14AM +0000, Ben Hutchings wrote:
> On Mon, 2015-11-30 at 17:04 +0100, Willy Tarreau wrote:
> > Here comes 2.6.32.69-rc2.
> > 
> > It adds or replaces the following patches on top of -rc1 that were needed
> > or incorrectly backported as spotted by Ben :
> > 
> >   Chris Wright (1):
> >       security: add cred argument to security_capable()
> > 
> >   Eric W. Biederman (1):
> >       vfs: Test for and handle paths that are unreachable from their mnt_root
> > 
> >   Konstantin Khlebnikov (1):
> >       pagemap: hide physical addresses from non-privileged users
> > 
> > These patches will be posted as a response to this one (with 39 and 40 being
> > the extra ones). If anyone has any issue with these being applied, please let
> > me know. If anyone is a maintainer of the proper subsystem, and wants to add
> > a Signed-off-by: line to the patch, please respond with it. If anyone thinks
> > some important patches are missing and should be added prior to the release,
> > please report them quickly with their respective mainline commit IDs.
> > 
> > Last response delay remains unchanged : Sat Dec  5 22:47:02 CET 2015.
> > 
> > The updated patch series can be found in one patch at :
> >      https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc2.gz
> 
> I've applied this to the Debian 6.0 kernel source, and all looks good.

Thanks for the feedback Ben!

Willy


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2.6.32 00/38] 2.6.32.69-longterm review
@ 2015-12-01  6:57       ` Willy Tarreau
  0 siblings, 0 replies; 61+ messages in thread
From: Willy Tarreau @ 2015-12-01  6:57 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-kernel, stable

On Tue, Dec 01, 2015 at 12:43:14AM +0000, Ben Hutchings wrote:
> On Mon, 2015-11-30 at 17:04 +0100, Willy Tarreau wrote:
> > Here comes 2.6.32.69-rc2.
> > 
> > It adds or replaces the following patches on top of -rc1 that were needed
> > or incorrectly backported as spotted by Ben :
> > 
> > � Chris Wright (1):
> > ������security: add cred argument to security_capable()
> > 
> > � Eric W. Biederman (1):
> > ������vfs: Test for and handle paths that are unreachable from their mnt_root
> > 
> > � Konstantin Khlebnikov (1):
> > ������pagemap: hide physical addresses from non-privileged users
> > 
> > These patches will be posted as a response to this one (with 39 and 40 being
> > the extra ones). If anyone has any issue with these being applied, please let
> > me know. If anyone is a maintainer of the proper subsystem, and wants to add
> > a Signed-off-by: line to the patch, please respond with it. If anyone thinks
> > some important patches are missing and should be added prior to the release,
> > please report them quickly with their respective mainline commit IDs.
> > 
> > Last response delay remains unchanged : Sat Dec  5 22:47:02 CET 2015.
> > 
> > The updated patch series can be found in one patch at :
> >      https://kernel.org/pub/linux/kernel/v2.6/longterm-review/patch-2.6.32.69-rc2.gz
> 
> I've applied this to the Debian 6.0 kernel source, and all looks good.

Thanks for the feedback Ben!

Willy



end of thread, other threads:[~2015-12-01  6:57 UTC | newest]

Thread overview: 61+ messages
2015-11-29 21:47 [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-29 21:47 ` Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 01/38] [PATCH 01/38] dcache: Handle escaped paths in prepend_path Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 03/38] [PATCH 03/38] md: use kzalloc() when bitmap is disabled Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 04/38] [PATCH 04/38] ipv6: addrconf: validate new MTU before applying it Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 05/38] [PATCH 05/38] virtio-net: drop NETIF_F_FRAGLIST Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 06/38] [PATCH 06/38] USB: whiteheat: fix potential null-deref at probe Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 07/38] [PATCH 07/38] ipc/sem.c: fully initialize sem_array before making it visible Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 08/38] [PATCH 08/38] Initialize msg/shm IPC objects before doing ipc_addid() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 10/38] [PATCH 10/38] rds: fix an integer overflow test in rds_info_getsockopt() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 11/38] [PATCH 11/38] net: Clone skb before setting peeked flag Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 12/38] [PATCH 12/38] net: Fix skb_set_peeked use-after-free bug Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 13/38] [PATCH 13/38] ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 14/38] [PATCH 14/38] devres: fix devres_get() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 15/38] [PATCH 15/38] windfarm: decrement client count when unregistering Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 16/38] [PATCH 16/38] xfs: Fix xfs_attr_leafblock definition Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 17/38] [PATCH 17/38] SUNRPC: xs_reset_transport must mark the connection as disconnected Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 18/38] [PATCH 18/38] Input: evdev - do not report errors form flush() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 19/38] [PATCH 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-11-30  1:54   ` Ben Hutchings
2015-11-30  7:01     ` Willy Tarreau
2015-11-30  7:01       ` Willy Tarreau
2015-11-30 11:30       ` Willy Tarreau
2015-11-30 11:49         ` Konstantin Khlebnikov
2015-11-30 12:13           ` Willy Tarreau
2015-11-30 14:55         ` Ben Hutchings
2015-11-30 15:14           ` Willy Tarreau
2015-11-30 15:14             ` Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 20/38] [PATCH 20/38] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 21/38] [PATCH 21/38] hfs: fix B-tree corruption after insertion at position 0 Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 22/38] [PATCH 22/38] x86/paravirt: Replace the paravirt nop with a bona fide empty function Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 23/38] [PATCH 23/38] RDS: verify the underlying transport exists before creating a connection Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 24/38] [PATCH 24/38] net: Fix skb csum races when peeking Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 25/38] [PATCH 25/38] net: add length argument to skb_copy_and_csum_datagram_iovec Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 26/38] [PATCH 26/38] module: Fix locking in symbol_put_addr() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 27/38] [PATCH 27/38] x86/process: Add proper bound checks in 64bit get_wchan() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 28/38] [PATCH 28/38] mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 29/38] [PATCH 29/38] tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 31/38] [PATCH 31/38] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 32/38] [PATCH 32/38] HID: core: Avoid uninitialized buffer access Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 33/38] [PATCH 33/38] devres: fix a for loop bounds check Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 34/38] [PATCH 34/38] binfmt_elf: Dont clobber passed executables file header Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 35/38] [PATCH 35/38] RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 36/38] [PATCH 36/38] ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 37/38] [PATCH 37/38] net: avoid NULL deref in inet_ctl_sock_destroy() Willy Tarreau
2015-11-29 21:47 ` [PATCH 2.6.32 38/38] [PATCH 38/38] splice: sendfile() at once fails for big files Willy Tarreau
2015-11-30  1:25 ` [PATCH 2.6.32 09/38] [PATCH 09/38] xhci: fix off by one error in TRB DMA address boundary check Willy Tarreau
2015-11-30  2:04 ` [PATCH 2.6.32 30/38] [PATCH 30/38] mvsas: Fix NULL pointer dereference in mvs_slot_task_free Willy Tarreau
2015-11-30  2:42 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-11-30  6:51   ` Willy Tarreau
2015-11-30  6:51     ` Willy Tarreau
2015-11-30 11:23     ` Willy Tarreau
2015-11-30 14:43     ` Ben Hutchings
2015-11-30 15:10       ` Willy Tarreau
     [not found] ` <20151129214702.957590241@1wt.eu>
2015-11-30  6:44   ` [PATCH 2.6.32 02/38] [PATCH 02/38] Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount Willy Tarreau
2015-11-30 16:04 ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Willy Tarreau
2015-11-30 16:04   ` Willy Tarreau
2015-11-30 16:04   ` [PATCH 2.6.32 39/38] vfs: Test for and handle paths that are unreachable from their mnt_root Willy Tarreau
2015-11-30 16:05   ` [PATCH 2.6.32 40/38] security: add cred argument to security_capable() Willy Tarreau
2015-11-30 16:05   ` [PATCH 2.6.32 19/38] pagemap: hide physical addresses from non-privileged users Willy Tarreau
2015-12-01  0:43   ` [PATCH 2.6.32 00/38] 2.6.32.69-longterm review Ben Hutchings
2015-12-01  6:57     ` Willy Tarreau
2015-12-01  6:57       ` Willy Tarreau
