* [PATCH 4.4 00/16] 4.4.101-stable review
@ 2017-11-22 10:11 Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 01/16] tcp: do not mangle skb->cb[] in tcp_make_synack() Greg Kroah-Hartman
                   ` (18 more replies)
  0 siblings, 19 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

This is the start of the stable review cycle for the 4.4.101 release.
There are 16 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Nov 24 10:11:01 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.101-rc1.gz
or in the git tree and branch at:
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.4.101-rc1

Jan Harkes <jaharkes@cs.cmu.edu>
    coda: fix 'kernel memory exposure attempt' in fsync

Pavel Tatashin <pasha.tatashin@oracle.com>
    mm/page_alloc.c: broken deferred calculation

Corey Minyard <cminyard@mvista.com>
    ipmi: fix unsigned long underflow

alex chen <alex.chen@huawei.com>
    ocfs2: should wait dio before inode lock in ocfs2_setattr()

Keith Busch <keith.busch@intel.com>
    nvme: Fix memory order on async queue deletion

Mark Rutland <mark.rutland@arm.com>
    arm64: fix dump_instr when PAN and UAO are in use

Lukas Wunner <lukas@wunner.de>
    serial: omap: Fix EFR write on RTS deassertion

Roberto Sassu <roberto.sassu@huawei.com>
    ima: do not update security.ima if appraisal status is not INTEGRITY_PASS

Eric W. Biederman <ebiederm@xmission.com>
    net/sctp: Always set scope_id in sctp_inet6_skb_msgname

Huacai Chen <chenhc@lemote.com>
    fealnx: Fix building error on MIPS

Xin Long <lucien.xin@gmail.com>
    sctp: do not peel off an assoc from one netns to another one

Jason A. Donenfeld <Jason@zx2c4.com>
    af_netlink: ensure that NLMSG_DONE never fails in dumps

Cong Wang <xiyou.wangcong@gmail.com>
    vlan: fix a use-after-free in vlan_device_event()

Hangbin Liu <liuhangbin@gmail.com>
    bonding: discard lowest hash bit for 802.3ad layer3+4

Ye Yin <hustcat@gmail.com>
    netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed

Eric Dumazet <edumazet@google.com>
    tcp: do not mangle skb->cb[] in tcp_make_synack()


-------------

Diffstat:

 Makefile                              |  4 ++--
 arch/arm64/kernel/traps.c             | 26 +++++++++++++-------------
 drivers/char/ipmi/ipmi_msghandler.c   | 10 ++++++----
 drivers/net/bonding/bond_main.c       |  2 +-
 drivers/net/ethernet/fealnx.c         |  6 +++---
 drivers/nvme/host/pci.c               |  2 +-
 drivers/tty/serial/omap-serial.c      |  2 +-
 fs/coda/upcall.c                      |  3 +--
 fs/ocfs2/file.c                       |  9 +++++++--
 include/linux/mmzone.h                |  3 ++-
 include/linux/skbuff.h                |  7 +++++++
 mm/page_alloc.c                       | 27 ++++++++++++++++++---------
 net/8021q/vlan.c                      |  6 +++---
 net/core/skbuff.c                     |  1 +
 net/ipv4/tcp_output.c                 |  9 ++-------
 net/netlink/af_netlink.c              | 17 +++++++++++------
 net/netlink/af_netlink.h              |  1 +
 net/sctp/ipv6.c                       |  2 ++
 net/sctp/socket.c                     |  4 ++++
 security/integrity/ima/ima_appraise.c |  3 +++
 20 files changed, 89 insertions(+), 55 deletions(-)


* [PATCH 4.4 01/16] tcp: do not mangle skb->cb[] in tcp_make_synack()
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 02/16] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, Christoph Paasch,
	David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>


[ Upstream commit 3b11775033dc87c3d161996c54507b15ba26414a ]

Christoph Paasch sent a patch to address the following issue :

tcp_make_synack() is leaving some TCP private info in skb->cb[],
then sends the packet by means other than tcp_transmit_skb().

tcp_transmit_skb() makes sure to clear skb->cb[] to not confuse
IPv4/IPv6 stacks, but we have no such cleanup for SYNACK.

tcp_make_synack() should not use tcp_init_nondata_skb() :

tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
queues (the ones that are only sent via tcp_transmit_skb())

This patch fixes the issue and should even save a few CPU cycles ;)

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_output.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3018,13 +3018,8 @@ struct sk_buff *tcp_make_synack(const st
 	tcp_ecn_make_synack(req, th);
 	th->source = htons(ireq->ir_num);
 	th->dest = ireq->ir_rmt_port;
-	/* Setting of flags are superfluous here for callers (and ECE is
-	 * not even correctly set)
-	 */
-	tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn,
-			     TCPHDR_SYN | TCPHDR_ACK);
-
-	th->seq = htonl(TCP_SKB_CB(skb)->seq);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	th->seq = htonl(tcp_rsk(req)->snt_isn);
 	/* XXX data is queued and acked as is. No buffer/window check */
 	th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt);
 


* [PATCH 4.4 02/16] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 01/16] tcp: do not mangle skb->cb[] in tcp_make_synack() Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 03/16] bonding: discard lowest hash bit for 802.3ad layer3+4 Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ye Yin, Wei Zhou, Julian Anastasov,
	Simon Horman, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ye Yin <hustcat@gmail.com>


[ Upstream commit 2b5ec1a5f9738ee7bf8f5ec0526e75e00362c48f ]

When ipvs runs in two different network namespaces on the same host and
one ipvs instance forwards traffic to the ipvs instance in the other
namespace, the 'ipvs_property' flag makes the second ipvs take no effect.
So we should clear 'ipvs_property' when the SKB's network namespace
changes.

Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()")
Signed-off-by: Ye Yin <hustcat@gmail.com>
Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |    7 +++++++
 net/core/skbuff.c      |    1 +
 2 files changed, 8 insertions(+)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3400,6 +3400,13 @@ static inline void nf_reset_trace(struct
 #endif
 }
 
+static inline void ipvs_reset(struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_IP_VS)
+	skb->ipvs_property = 0;
+#endif
+}
+
 /* Note: This doesn't put any conntrack and bridge info in dst. */
 static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
 			     bool copy)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4229,6 +4229,7 @@ void skb_scrub_packet(struct sk_buff *sk
 	if (!xnet)
 		return;
 
+	ipvs_reset(skb);
 	skb_orphan(skb);
 	skb->mark = 0;
 }


* [PATCH 4.4 03/16] bonding: discard lowest hash bit for 802.3ad layer3+4
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 01/16] tcp: do not mangle skb->cb[] in tcp_make_synack() Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 02/16] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 04/16] vlan: fix a use-after-free in vlan_device_event() Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Paolo Abeni, Hangbin Liu, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Hangbin Liu <liuhangbin@gmail.com>


[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ]

After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), we will try to use even ports for connect(). Then if an
application (seen clearly with iperf) opens multiple streams to the same
destination IP and port, each stream will be given an even source port.

So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
will always hash all these streams to the same interface, and the total
throughput will be limited to a single slave.

Changing the TCP code would affect TCP behavior as a whole just for the
sake of bonding. Paolo Abeni suggested fixing this in the bonding code
only, which is more reasonable and has less impact.

Fix this by discarding the lowest hash bit because it contains little entropy.
After the fix we can re-balance between slaves.
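
The effect described above can be reproduced in a small userspace sketch.
This is a simplified model, not the driver code: the model_* helpers and the
host-order packing of the two ports are assumptions made for illustration,
and only the two folding steps are taken from the hunk. With even source
ports the lowest bit of the folded hash stays constant, so reducing it
modulo two slaves always selects the same one; dropping that bit lets the
next bit, which does vary, take part in the selection.

```c
#include <assert.h>
#include <stdint.h>

/* The two folding steps shown as context lines in bond_xmit_hash(). */
static uint32_t fold_hash(uint32_t hash)
{
	hash ^= (hash >> 16);
	hash ^= (hash >> 8);
	return hash;
}

/*
 * Simplified layer3+4 hash.
 * ASSUMPTION: the ports are packed in host order as (sport << 16) | dport;
 * the real driver works on the on-wire port pair instead.
 */
static uint32_t model_l34_hash(uint32_t saddr, uint32_t daddr,
			       uint16_t sport, uint16_t dport,
			       int discard_low_bit)
{
	uint32_t hash = ((uint32_t)sport << 16) | dport;

	hash ^= saddr ^ daddr;
	hash = fold_hash(hash);
	return discard_low_bit ? hash >> 1 : hash;
}

/* Hypothetical stand-in for slave selection: reduce the hash modulo nslaves. */
static unsigned int model_pick_slave(uint32_t hash, unsigned int nslaves)
{
	assert(nslaves > 0);
	return hash % nslaves;
}
```

Feeding this model the source ports 40000, 40002, 40004 and 40006 towards
one destination shows the difference between the two variants with two
slaves.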

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/bonding/bond_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3166,7 +3166,7 @@ u32 bond_xmit_hash(struct bonding *bond,
 	hash ^= (hash >> 16);
 	hash ^= (hash >> 8);
 
-	return hash;
+	return hash >> 1;
 }
 
 /*-------------------------- Device entry points ----------------------------*/


* [PATCH 4.4 04/16] vlan: fix a use-after-free in vlan_device_event()
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2017-11-22 10:11 ` [PATCH 4.4 03/16] bonding: discard lowest hash bit for 802.3ad layer3+4 Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 05/16] af_netlink: ensure that NLMSG_DONE never fails in dumps Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Fengguang Wu, Alexander Duyck,
	Linus Torvalds, Girish Moodalbail, Cong Wang, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Cong Wang <xiyou.wangcong@gmail.com>


[ Upstream commit 052d41c01b3a2e3371d66de569717353af489d63 ]

After refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:

	RCU_INIT_POINTER(dev->vlan_info, NULL);
	call_rcu(&vlan_info->rcu, vlan_info_rcu_free);

However, the pointer 'grp' still points to that memory
since it is set before vlan_vid_del():

        vlan_info = rtnl_dereference(dev->vlan_info);
        if (!vlan_info)
                goto out;
        grp = &vlan_info->grp;

Depending on when that RCU callback is scheduled, we could
trigger a use-after-free in vlan_group_for_each_dev()
right following this vlan_vid_del().

Fix it by moving vlan_vid_del() before setting grp. This
is also symmetric to the vlan_vid_add() we call in
vlan_device_event().
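
The ordering argument can be sketched in userspace as follows. This is only
a model under stated assumptions: the model_* types and functions are
hypothetical, and free() stands in for the deferred call_rcu() release. The
point it shows is that the pointer is loaded only after the reference has
been dropped, so a NULL check catches the case where the structure is gone.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for vlan_info and net_device. */
struct model_info {
	int refcnt;
	int nr_vlans;
};

struct model_dev {
	struct model_info *info;
};

/* Drops one reference; clears the pointer and releases on the last one. */
static void model_vid_del(struct model_dev *dev)
{
	struct model_info *info = dev->info;

	if (info && --info->refcnt == 0) {
		dev->info = NULL;
		free(info);	/* the kernel defers this via call_rcu() */
	}
}

/*
 * Fixed ordering: drop the reference first, then load the pointer.
 * Returns the number of VLANs that would be walked afterwards.
 */
static int model_netdev_down(struct model_dev *dev)
{
	struct model_info *info;

	model_vid_del(dev);

	info = dev->info;
	if (!info)
		return 0;	/* nothing left to iterate over */
	return info->nr_vlans;
}
```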

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/8021q/vlan.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -376,6 +376,9 @@ static int vlan_device_event(struct noti
 			dev->name);
 		vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
 	}
+	if (event == NETDEV_DOWN &&
+	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
 
 	vlan_info = rtnl_dereference(dev->vlan_info);
 	if (!vlan_info)
@@ -423,9 +426,6 @@ static int vlan_device_event(struct noti
 		struct net_device *tmp;
 		LIST_HEAD(close_list);
 
-		if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)
-			vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
-
 		/* Put all VLANs for this dev in the down state too.  */
 		vlan_group_for_each_dev(grp, i, vlandev) {
 			flgs = vlandev->flags;


* [PATCH 4.4 05/16] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2017-11-22 10:11 ` [PATCH 4.4 04/16] vlan: fix a use-after-free in vlan_device_event() Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:11 ` [PATCH 4.4 06/16] sctp: do not peel off an assoc from one netns to another one Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason A. Donenfeld, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Jason A. Donenfeld" <Jason@zx2c4.com>


[ Upstream commit 0642840b8bb008528dbdf929cec9f65ac4231ad0 ]

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.
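
The resulting control flow can be modelled in userspace roughly as below.
This is a sketch under assumptions, not the kernel code: the dump_model
names are hypothetical, and the 20-byte constant is the 16-byte netlink
header plus a 4-byte int payload, which is what
nlmsg_total_size(sizeof(int)) works out to. The stored value starts at
INT_MAX, is overwritten by the dump callback's return value while the dump
is still running, and is kept across calls once the dump has finished.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* 16-byte nlmsghdr plus a 4-byte int payload. */
#define MODEL_DONE_MSG_SIZE 20u

/* Hypothetical stand-in for the per-socket state added by the patch. */
struct dump_model {
	int done_errno;	/* INT_MAX until the dump callback has finished */
};

enum dump_step_result {
	DUMP_MORE_DATA,		/* callback has more to send */
	DUMP_DONE_DEFERRED,	/* finished, but no room for NLMSG_DONE */
	DUMP_DONE_SENT,		/* NLMSG_DONE fits and is appended */
};

static void dump_model_start(struct dump_model *m)
{
	assert(m != NULL);
	m->done_errno = INT_MAX;
}

/*
 * dump_ret is what the dump callback would return; it is only consulted
 * while the dump is still running. tailroom is the space left in the
 * message buffer after the callback ran.
 */
static enum dump_step_result dump_model_step(struct dump_model *m,
					     int dump_ret, size_t tailroom)
{
	if (m->done_errno > 0)
		m->done_errno = dump_ret;

	if (m->done_errno > 0)
		return DUMP_MORE_DATA;
	if (tailroom < MODEL_DONE_MSG_SIZE)
		return DUMP_DONE_DEFERRED;
	return DUMP_DONE_SENT;
}
```

A dump that ends with fewer than 20 bytes of tailroom therefore reports
DUMP_DONE_DEFERRED first and DUMP_DONE_SENT on the following call, without
invoking the callback again.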

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/netlink/af_netlink.c |   17 +++++++++++------
 net/netlink/af_netlink.h |    1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2077,7 +2077,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2125,9 +2125,11 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2137,13 +2139,15 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+	       sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2208,6 +2212,7 @@ int __netlink_dump_start(struct sock *ss
 	cb->skb = skb;
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -38,6 +38,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;


* [PATCH 4.4 06/16] sctp: do not peel off an assoc from one netns to another one
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2017-11-22 10:11 ` [PATCH 4.4 05/16] af_netlink: ensure that NLMSG_DONE never fails in dumps Greg Kroah-Hartman
@ 2017-11-22 10:11 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 07/16] fealnx: Fix building error on MIPS Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:11 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, ChunYu Wang, Xin Long,
	Marcelo Ricardo Leitner, Neil Horman, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xin Long <lucien.xin@gmail.com>


[ Upstream commit df80cd9b28b9ebaa284a41df611dbf3a2d05ca74 ]

Currently, when an association is peeled off to a sock in another netns,
the transports in this assoc are not rehashed and keep using the old
key in the hashtable.

As a transport uses sk->net as the hash key to insert into the hashtable,
closing the sock would fail to remove these transports from the hashtable
because of the new netns, even though all transports are being freed.
A later lookup of an asoc that dereferences those transports could then
cause a use-after-free.

This is a very old issue, present since the very beginning. ChunYu found
it with syzkaller fuzz testing using this series:

  socket$inet6_sctp()
  bind$inet6()
  sendto$inet6()
  unshare(0x40000000)
  getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
  getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()

This patch blocks the call when it would peel an assoc off from one
netns to another, so that the netns of the transports does not go out
of sync with the key in the hashtable.

Note that this patch didn't fix it by rehashing transports, as it's
difficult to handle the situation when the tuple is already in use
in the new netns. Besides, no one would like to peel off one assoc
to another netns, considering ipaddrs, ifaces, etc. are usually
different.

Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/socket.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4457,6 +4457,10 @@ int sctp_do_peeloff(struct sock *sk, sct
 	if (!net_eq(current->nsproxy->net_ns, sock_net(sk)))
 		return -EINVAL;
 
+	/* Do not peel off from one netns to another one. */
+	if (!net_eq(current->nsproxy->net_ns, sock_net(sk)))
+		return -EINVAL;
+
 	if (!asoc)
 		return -EINVAL;
 


* [PATCH 4.4 07/16] fealnx: Fix building error on MIPS
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2017-11-22 10:11 ` [PATCH 4.4 06/16] sctp: do not peel off an assoc from one netns to another one Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 08/16] net/sctp: Always set scope_id in sctp_inet6_skb_msgname Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Huacai Chen, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Huacai Chen <chenhc@lemote.com>


[ Upstream commit cc54c1d32e6a4bb3f116721abf900513173e4d02 ]

This patch tries to fix the build error on MIPS. The reason is that MIPS
has already defined the LONG macro, which conflicts with the LONG enum
in drivers/net/ethernet/fealnx.c.
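
A minimal sketch of the clash and of the rename that avoids it. The
"#define LONG .word" line is an assumed stand-in for the architecture
header's definition, and model_is_length_error() is a hypothetical helper;
only the enumerator names and values come from the patch. With the macro in
place, an enumerator spelled LONG would be rewritten by the preprocessor
before the compiler sees it, while LONGPKT and RUNTPKT are unaffected.

```c
/* ASSUMPTION: stand-in for an arch header that already defines LONG. */
#define LONG .word

/* Renamed enumerators, as in the patch; they no longer collide with LONG. */
enum rx_desc_status_bits {
	RUNTPKT = 0x40,		/* runt packet received */
	LONGPKT = 0x20,		/* long packet received */
	FAE = 0x10,		/* frame align error */
};

/* Hypothetical helper mirroring the length-error test in netdev_rx(). */
static int model_is_length_error(unsigned int rx_status)
{
	return (rx_status & (LONGPKT | RUNTPKT)) != 0;
}
```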

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/fealnx.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/fealnx.c
+++ b/drivers/net/ethernet/fealnx.c
@@ -257,8 +257,8 @@ enum rx_desc_status_bits {
 	RXFSD = 0x00000800,	/* first descriptor */
 	RXLSD = 0x00000400,	/* last descriptor */
 	ErrorSummary = 0x80,	/* error summary */
-	RUNT = 0x40,		/* runt packet received */
-	LONG = 0x20,		/* long packet received */
+	RUNTPKT = 0x40,		/* runt packet received */
+	LONGPKT = 0x20,		/* long packet received */
 	FAE = 0x10,		/* frame align error */
 	CRC = 0x08,		/* crc error */
 	RXER = 0x04,		/* receive error */
@@ -1633,7 +1633,7 @@ static int netdev_rx(struct net_device *
 					       dev->name, rx_status);
 
 				dev->stats.rx_errors++;	/* end of a packet. */
-				if (rx_status & (LONG | RUNT))
+				if (rx_status & (LONGPKT | RUNTPKT))
 					dev->stats.rx_length_errors++;
 				if (rx_status & RXER)
 					dev->stats.rx_frame_errors++;


* [PATCH 4.4 08/16] net/sctp: Always set scope_id in sctp_inet6_skb_msgname
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 07/16] fealnx: Fix building error on MIPS Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 09/16] ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alexander Potapenko,
	Eric W. Biederman, David S. Miller

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <ebiederm@xmission.com>


[ Upstream commit 7c8a61d9ee1df0fb4747879fa67a99614eb62fec ]

Alexander Potapenko while testing the kernel with KMSAN and syzkaller
discovered that in some configurations sctp would leak 4 bytes of
kernel stack.

Working with his reproducer I discovered that those 4 bytes that
are leaked are the scope id of an ipv6 address returned by recvmsg.

With a little code inspection and a shrewd guess I discovered that
sctp_inet6_skb_msgname only initializes the scope_id field for link
local ipv6 addresses to the interface index the link local address
pertains to instead of initializing the scope_id field for all ipv6
addresses.

That is almost reasonable as scope_id's are meaningful only for link
local addresses.  Set the scope_id in all other cases to 0 which is
not a valid interface index to make it clear there is nothing useful
in the scope_id field.

There should be no danger of breaking userspace as the stack leak
guaranteed that previously meaningless random data was being returned.
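
The pattern of the fix, sketched in userspace with hypothetical model_*
names (this is not the SCTP code): the output structure may contain stale
bytes, so every branch has to store something into scope_id. Link-local
detection here assumes the fe80::/10 prefix.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the IPv6 socket address. */
struct model_sockaddr6 {
	uint8_t addr[16];
	uint32_t scope_id;
};

/* fe80::/10 is the IPv6 link-local unicast prefix. */
static int model_is_link_local(const uint8_t a[16])
{
	return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
}

/* Always writes scope_id, so no stale value can reach the caller. */
static void model_fill_msgname(struct model_sockaddr6 *out,
			       const uint8_t src[16], uint32_t iif)
{
	memcpy(out->addr, src, 16);
	if (model_is_link_local(src))
		out->scope_id = iif;	/* interface the packet arrived on */
	else
		out->scope_id = 0;	/* not a valid interface index */
}
```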

Fixes: 372f525b495c ("SCTP:  Resync with LKSCTP tree.")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/ipv6.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -806,6 +806,8 @@ static void sctp_inet6_skb_msgname(struc
 		if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
 			struct sctp_ulpevent *ev = sctp_skb2event(skb);
 			addr->v6.sin6_scope_id = ev->iif;
+		} else {
+			addr->v6.sin6_scope_id = 0;
 		}
 	}
 


* [PATCH 4.4 09/16] ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 08/16] net/sctp: Always set scope_id in sctp_inet6_skb_msgname Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 10/16] serial: omap: Fix EFR write on RTS deassertion Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Roberto Sassu, Mimi Zohar, James Morris

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Roberto Sassu <roberto.sassu@huawei.com>

commit 020aae3ee58c1af0e7ffc4e2cc9fe4dc630338cb upstream.

Commit b65a9cfc2c38 ("Untangling ima mess, part 2: deal with counters")
moved the call of ima_file_check() from may_open() to do_filp_open() at a
point where the file descriptor is already opened.

This breaks the assumption made by IMA that file descriptors being closed
belong to files whose access was granted by ima_file_check(). The
consequence is that security.ima and security.evm are updated with good
values, regardless of the current appraisal status.

For example, if a file does not have security.ima, IMA will create it after
opening the file for writing, even if access is denied. Access to the file
will be allowed afterwards.

Avoid this issue by checking the appraisal status before updating
security.ima.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 security/integrity/ima/ima_appraise.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/security/integrity/ima/ima_appraise.c
+++ b/security/integrity/ima/ima_appraise.c
@@ -297,6 +297,9 @@ void ima_update_xattr(struct integrity_i
 	if (iint->flags & IMA_DIGSIG)
 		return;
 
+	if (iint->ima_file_status != INTEGRITY_PASS)
+		return;
+
 	rc = ima_collect_measurement(iint, file, NULL, NULL);
 	if (rc < 0)
 		return;


* [PATCH 4.4 10/16] serial: omap: Fix EFR write on RTS deassertion
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 09/16] ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 11/16] arm64: fix dump_instr when PAN and UAO are in use Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Peter Hurley, Lukas Wunner

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Lukas Wunner <lukas@wunner.de>

commit 2a71de2f7366fb1aec632116d0549ec56d6a3940 upstream.

Commit 348f9bb31c56 ("serial: omap: Fix RTS handling") sought to enable
auto RTS upon manual RTS assertion and disable it on deassertion.
However, it seems the latter was done incorrectly: it clears all bits in
the Extended Features Register *except* auto RTS.
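
The difference between the two mask operations, as a small standalone
sketch. The MODEL_EFR_RTS value and the helper names are illustrative
assumptions, not the driver's definitions.

```c
#include <stdint.h>

/* Illustrative bit position for the auto RTS enable flag. */
#define MODEL_EFR_RTS 0x40u

/* Old behavior: keeps only the RTS bit and clears everything else. */
static uint8_t model_rts_off_buggy(uint8_t efr)
{
	return efr & MODEL_EFR_RTS;
}

/* Fixed behavior: clears only the RTS bit and keeps everything else. */
static uint8_t model_rts_off_fixed(uint8_t efr)
{
	return efr & (uint8_t)~MODEL_EFR_RTS;
}
```

For a register value of 0xD0 (RTS plus two other bits set), the old
expression yields 0x40 and the fixed one yields 0x90.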

Fixes: 348f9bb31c56 ("serial: omap: Fix RTS handling")
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/omap-serial.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/tty/serial/omap-serial.c
+++ b/drivers/tty/serial/omap-serial.c
@@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct
 	if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS))
 		up->efr |= UART_EFR_RTS;
 	else
-		up->efr &= UART_EFR_RTS;
+		up->efr &= ~UART_EFR_RTS;
 	serial_out(up, UART_EFR, up->efr);
 	serial_out(up, UART_LCR, lcr);
 


* [PATCH 4.4 11/16] arm64: fix dump_instr when PAN and UAO are in use
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 10/16] serial: omap: Fix EFR write on RTS deassertion Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 12/16] [PATCH-stable] nvme: Fix memory order on async queue deletion Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Catalin Marinas, James Morse,
	Robin Murphy, Mark Rutland, Vladimir Murzin, Will Deacon

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mark Rutland <mark.rutland@arm.com>

commit c5cea06be060f38e5400d796e61cfc8c36e52924 upstream.

If the kernel is set to show unhandled signals, and a user task does not
handle a SIGILL as a result of an instruction abort, we will attempt to
log the offending instruction with dump_instr before killing the task.

We use dump_instr to log the encoding of the offending userspace
instruction. However, dump_instr is also used to dump instructions from
kernel space, and internally always switches to KERNEL_DS before dumping
the instruction with get_user. When both PAN and UAO are in use, reading
a user instruction via get_user while in KERNEL_DS will result in a
permission fault, which leads to an Oops.

As we have regs corresponding to the context of the original instruction
abort, we can inspect this and only flip to KERNEL_DS if the original
abort was taken from the kernel, avoiding this issue. At the same time,
remove the redundant (and incorrect) comments regarding the order
dump_mem and dump_instr are called in.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Fixes: 57f4959bad0a154a ("arm64: kernel: Add support for User Access Override")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm64/kernel/traps.c |   26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -64,8 +64,7 @@ static void dump_mem(const char *lvl, co
 
 	/*
 	 * We need to switch to kernel mode so that we can use __get_user
-	 * to safely read from kernel space.  Note that we now dump the
-	 * code first, just in case the backtrace kills us.
+	 * to safely read from kernel space.
 	 */
 	fs = get_fs();
 	set_fs(KERNEL_DS);
@@ -111,21 +110,12 @@ static void dump_backtrace_entry(unsigne
 	print_ip_sym(where);
 }
 
-static void dump_instr(const char *lvl, struct pt_regs *regs)
+static void __dump_instr(const char *lvl, struct pt_regs *regs)
 {
 	unsigned long addr = instruction_pointer(regs);
-	mm_segment_t fs;
 	char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str;
 	int i;
 
-	/*
-	 * We need to switch to kernel mode so that we can use __get_user
-	 * to safely read from kernel space.  Note that we now dump the
-	 * code first, just in case the backtrace kills us.
-	 */
-	fs = get_fs();
-	set_fs(KERNEL_DS);
-
 	for (i = -4; i < 1; i++) {
 		unsigned int val, bad;
 
@@ -139,8 +129,18 @@ static void dump_instr(const char *lvl,
 		}
 	}
 	printk("%sCode: %s\n", lvl, str);
+}
 
-	set_fs(fs);
+static void dump_instr(const char *lvl, struct pt_regs *regs)
+{
+	if (!user_mode(regs)) {
+		mm_segment_t fs = get_fs();
+		set_fs(KERNEL_DS);
+		__dump_instr(lvl, regs);
+		set_fs(fs);
+	} else {
+		__dump_instr(lvl, regs);
+	}
 }
 
 static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)

* [PATCH 4.4 12/16] [PATCH-stable] nvme: Fix memory order on async queue deletion
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 11/16] arm64: fix dump_instr when PAN and UAO are in use Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr() Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Keith Busch

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Keith Busch <keith.busch@intel.com>

This patch is a fix specific to the 3.19 - 4.4 kernels. The 4.5 kernel
inadvertently fixed this bug differently (db3cbfff5bcc0), but is not
a stable candidate due to it being a complicated re-write of the entire
feature.

This patch fixes a potential timing bug with nvme's asynchronous queue
deletion, which causes an allocated request to be accidentally released
due to the ordering of the shared completion context among the sq/cq
pair. The completion context saves the request that issued the queue
deletion. If the submission side deletion happens to reset the active
request, the completion side will release the wrong request tag back into
the pool of available tags. This means the driver will create multiple
commands with the same tag, corrupting the queue context.

The error is observable in the kernel logs like:

  "nvme XX:YY:ZZ completed id XX twice on qid:0"

In this particular case, this message occurs because the queue is
corrupted.

The following timing sequence demonstrates the error:

  CPU A                                 CPU B
  -----------------------               -----------------------------
  nvme_irq
   nvme_process_cq
    async_completion
     queue_kthread_work  ----------->   nvme_del_sq_work_handler
                                         nvme_delete_cq
                                          adapter_async_del_queue
                                           nvme_submit_admin_async_cmd
                                            cmdinfo->req = req;

     blk_mq_free_request(cmdinfo->req); <-- wrong request!!!

This patch fixes the bug by releasing the request in the completion side
prior to waking the submission thread, such that that thread can't muck
with the shared completion context.
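
The effect of the reordering can be modelled without threads. The sketch
below is a simplified, single-threaded stand-in, not the driver code:
struct ex_cmdinfo and the ex_ functions are hypothetical, and the race is
modelled by having the "worker" overwrite the shared context as soon as
it is woken.

```c
/* Hypothetical stand-in for the shared sq/cq completion context. */
struct ex_cmdinfo {
	int req;	/* tag of the request that issued the deletion */
};

/* Models the woken worker reusing the context for its next command. */
void ex_worker_resubmit(struct ex_cmdinfo *info, int new_tag)
{
	info->req = new_tag;
}

/* Old order: wake the worker first, then free whatever req now holds. */
int ex_complete_wake_first(struct ex_cmdinfo *info, int new_tag)
{
	ex_worker_resubmit(info, new_tag);	/* worker wins the race */
	return info->req;			/* tag handed back to the pool */
}

/* New order: free the request first, then wake the worker. */
int ex_complete_free_first(struct ex_cmdinfo *info, int new_tag)
{
	int freed = info->req;			/* released before any wakeup */

	ex_worker_resubmit(info, new_tag);
	return freed;
}
```

With the context holding tag 1 and the worker submitting tag 2, the old
order returns tag 2 to the pool while it is still in flight; the new
order returns tag 1 as intended.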

Fixes: a4aea5623d4a5 ("NVMe: Convert to blk-mq")

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/nvme/host/pci.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -350,8 +350,8 @@ static void async_completion(struct nvme
 	struct async_cmd_info *cmdinfo = ctx;
 	cmdinfo->result = le32_to_cpup(&cqe->result);
 	cmdinfo->status = le16_to_cpup(&cqe->status) >> 1;
-	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
 	blk_mq_free_request(cmdinfo->req);
+	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
 }
 
 static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,

* [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 12/16] [PATCH-stable] nvme: Fix memory order on async queue deletion Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-12-05 15:49   ` Ben Hutchings
  2017-11-22 10:12 ` [PATCH 4.4 14/16] ipmi: fix unsigned long underflow Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Alex Chen, Jun Piao, Joseph Qi,
	Changwei Ge, Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: alex chen <alex.chen@huawei.com>

commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.

We should wait for dio requests to finish before taking the inode lock in
ocfs2_setattr(); otherwise the following deadlock will happen:

process 1                  process 2                    process 3
truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
ocfs2_setattr
 ocfs2_inode_lock_tracker
  ocfs2_inode_lock_full
 inode_dio_wait
  __inode_dio_wait
  -->waiting for all dio
  requests finish
                                                        dlm_proxy_ast_handler
                                                         dlm_do_local_bast
                                                          ocfs2_blocking_ast
                                                           ocfs2_generic_handle_bast
                                                            set OCFS2_LOCK_BLOCKED flag
                        dio_end_io
                         dio_bio_end_aio
                          dio_complete
                           ocfs2_dio_end_io
                            ocfs2_dio_end_io_write
                             ocfs2_inode_lock
                              __ocfs2_cluster_lock
                               ocfs2_wait_for_mask
                               -->waiting for OCFS2_LOCK_BLOCKED
                               flag to be cleared, that is waiting
                               for 'process 1' unlocking the inode lock
                           inode_dio_end
                           -->here dec the i_dio_count, but will never
                           be called, so a deadlock happened.

Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/ocfs2/file.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1166,6 +1166,13 @@ int ocfs2_setattr(struct dentry *dentry,
 	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
+		/*
+		 * Here we should wait dio to finish before inode lock
+		 * to avoid a deadlock between ocfs2_setattr() and
+		 * ocfs2_dio_end_io_write()
+		 */
+		inode_dio_wait(inode);
+
 		status = ocfs2_rw_lock(inode, 1);
 		if (status < 0) {
 			mlog_errno(status);
@@ -1186,8 +1193,6 @@ int ocfs2_setattr(struct dentry *dentry,
 		if (status)
 			goto bail_unlock;
 
-		inode_dio_wait(inode);
-
 		if (i_size_read(inode) >= attr->ia_size) {
 			if (ocfs2_should_order_data(inode)) {
 				status = ocfs2_begin_ordered_truncate(inode,

* [PATCH 4.4 14/16] ipmi: fix unsigned long underflow
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr() Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 15/16] mm/page_alloc.c: broken deferred calculation Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Weilong Chen, Corey Minyard

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Corey Minyard <cminyard@mvista.com>

commit 392a17b10ec4320d3c0e96e2a23ebaad1123b989 upstream.

When I set the timeout to a specific value such as 500ms, the timeout
event will not happen in time due to an unsigned underflow in the
function check_msg_timeout:
...
	ent->timeout -= timeout_period;
	if (ent->timeout > 0)
		return;
...

The type of timeout_period is long, but ent->timeout is unsigned long.
This patch makes the type consistent.
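
The wraparound can be reproduced in a few lines of userspace C. This is a
sketch with hypothetical ex_ helper names that mirrors the shape of the
old and new checks, not the driver code itself.

```c
/* Pre-patch check: subtract first, then test the unsigned result. */
int ex_still_waiting_buggy(unsigned long timeout, long timeout_period)
{
	timeout -= timeout_period;	/* wraps around if period > timeout */
	return timeout > 0;
}

/* Post-patch check: compare first, subtract only when it cannot wrap. */
int ex_still_waiting_fixed(unsigned long *timeout, unsigned long timeout_period)
{
	if (timeout_period < *timeout) {
		*timeout -= timeout_period;
		return 1;
	}
	return 0;
}
```

With a timeout of 500 and a period of 1000, the first helper wraps to a
huge positive value and reports the entry as still waiting; the second
reports it as expired.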

Reported-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/char/ipmi/ipmi_msghandler.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -4029,7 +4029,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struc
 }
 
 static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent,
-			      struct list_head *timeouts, long timeout_period,
+			      struct list_head *timeouts,
+			      unsigned long timeout_period,
 			      int slot, unsigned long *flags,
 			      unsigned int *waiting_msgs)
 {
@@ -4042,8 +4043,8 @@ static void check_msg_timeout(ipmi_smi_t
 	if (!ent->inuse)
 		return;
 
-	ent->timeout -= timeout_period;
-	if (ent->timeout > 0) {
+	if (timeout_period < ent->timeout) {
+		ent->timeout -= timeout_period;
 		(*waiting_msgs)++;
 		return;
 	}
@@ -4109,7 +4110,8 @@ static void check_msg_timeout(ipmi_smi_t
 	}
 }
 
-static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period)
+static unsigned int ipmi_timeout_handler(ipmi_smi_t intf,
+					 unsigned long timeout_period)
 {
 	struct list_head     timeouts;
 	struct ipmi_recv_msg *msg, *msg2;

* [PATCH 4.4 15/16] mm/page_alloc.c: broken deferred calculation
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 14/16] ipmi: fix unsigned long underflow Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 10:12 ` [PATCH 4.4 16/16] coda: fix kernel memory exposure attempt in fsync Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pavel Tatashin, Michal Hocko,
	Mel Gorman, Andrew Morton, Linus Torvalds

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <pasha.tatashin@oracle.com>

commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.

In reset_deferred_meminit() we determine number of pages that must not
be deferred.  We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes.  However, reset_deferred_meminit()
assumes that this function operates with pfns, and returns page
count.

The result is that in the best case the machine boots slower than expected
due to initializing more pages than needed in a single thread, and in the
worst case panics because fewer than needed pages are initialized early.
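
The unit mismatch can be illustrated with a small userspace sketch. It
assumes 4 KiB pages; the ex_ helpers are hypothetical stand-ins written in
the spirit of PFN_PHYS() and PHYS_PFN(), not the kernel macros.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the real PAGE_SHIFT depends on the architecture. */
#define EX_PAGE_SHIFT 12

/* Page frame number -> physical address, in the spirit of PFN_PHYS(). */
uint64_t ex_pfn_phys(uint64_t pfn)
{
	return pfn << EX_PAGE_SHIFT;
}

/* Byte count or physical address -> page count, in the spirit of PHYS_PFN(). */
uint64_t ex_phys_pfn(uint64_t bytes)
{
	return bytes >> EX_PAGE_SHIFT;
}

/* Pre-patch arithmetic: a byte count is added to a page count. */
uint64_t ex_pgcnt_buggy(uint64_t max_pgcnt, uint64_t reserved_bytes)
{
	return max_pgcnt + reserved_bytes;
}

/* Post-patch arithmetic: bytes are converted to pages before adding. */
uint64_t ex_pgcnt_fixed(uint64_t max_pgcnt, uint64_t reserved_bytes)
{
	return max_pgcnt + ex_phys_pfn(reserved_bytes);
}
```

For 100 pages plus 8192 reserved bytes, the mixed-unit sum is 8292
"pages", whereas converting first gives 102.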

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mmzone.h |    3 ++-
 mm/page_alloc.c        |   27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 10 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -688,7 +688,8 @@ typedef struct pglist_data {
 	 * is the first PFN that needs to be initialised.
 	 */
 	unsigned long first_deferred_pfn;
-	unsigned long static_init_size;
+	/* Number of non-deferred pages */
+	unsigned long static_init_pgcnt;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 } pg_data_t;
 
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -267,28 +267,37 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+
+/*
+ * Determine how many pages need to be initialized during early boot
+ * (non-deferred initialization).
+ * The value of first_deferred_pfn will be set later, once non-deferred pages
+ * are initialized, but for now set it ULONG_MAX.
+ */
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
-	unsigned long max_initialise;
-	unsigned long reserved_lowmem;
+	phys_addr_t start_addr, end_addr;
+	unsigned long max_pgcnt;
+	unsigned long reserved;
 
 	/*
 	 * Initialise at least 2G of a node but also take into account that
 	 * two large system hashes that can take up 1GB for 0.25TB/node.
 	 */
-	max_initialise = max(2UL << (30 - PAGE_SHIFT),
-		(pgdat->node_spanned_pages >> 8));
+	max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
+			(pgdat->node_spanned_pages >> 8));
 
 	/*
 	 * Compensate the all the memblock reservations (e.g. crash kernel)
 	 * from the initial estimation to make sure we will initialize enough
 	 * memory to boot.
 	 */
-	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
-			pgdat->node_start_pfn + max_initialise);
-	max_initialise += reserved_lowmem;
+	start_addr = PFN_PHYS(pgdat->node_start_pfn);
+	end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
+	reserved = memblock_reserved_memory_within(start_addr, end_addr);
+	max_pgcnt += PHYS_PFN(reserved);
 
-	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
+	pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
 	pgdat->first_deferred_pfn = ULONG_MAX;
 }
 
@@ -324,7 +333,7 @@ static inline bool update_defer_init(pg_
 		return true;
 	/* Initialise at least 2G of the highest zone */
 	(*nr_initialised)++;
-	if ((*nr_initialised > pgdat->static_init_size) &&
+	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;

* [PATCH 4.4 16/16] coda: fix kernel memory exposure attempt in fsync
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 15/16] mm/page_alloc.c: broken deferred calculation Greg Kroah-Hartman
@ 2017-11-22 10:12 ` Greg Kroah-Hartman
  2017-11-22 15:29 ` [PATCH 4.4 00/16] 4.4.101-stable review Nathan Chancellor
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 10:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jan Harkes, Al Viro

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jan Harkes <jaharkes@cs.cmu.edu>

commit d337b66a4c52c7b04eec661d86c2ef6e168965a2 upstream.

When an application called fsync on a file in Coda, a small request with
just the file identifier was allocated, but the declared length was set
to the size of the union of all possible upcall requests.

This bug has been around for a very long time and is now caught by the
extra checking in usercopy that was introduced in Linux-4.8.

The exposure happens when the Coda cache manager process reads the fsync
upcall request at which point it is killed. As a result there is nobody
servicing any further upcalls, trapping any processes that try to access
the mounted Coda filesystem.
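
The size mismatch can be sketched as follows. The struct and union here
are hypothetical stand-ins with made-up sizes, not the real Coda upcall
definitions; only the relationship between the two sizes matters.

```c
#include <stddef.h>

/* Hypothetical small request: an opcode plus a file identifier. */
struct ex_fsync_in {
	unsigned int opcode;
	unsigned int fid[4];
};

/* Hypothetical union of all request types; some are much larger. */
union ex_input_args {
	struct ex_fsync_in fsync;
	char largest_request[256];
};

/* Bytes a reader would copy past the end of the allocation. */
size_t ex_overread(size_t allocated, size_t declared)
{
	return declared > allocated ? declared - allocated : 0;
}
```

Declaring sizeof(union ex_input_args) for a buffer that only holds a
struct ex_fsync_in yields a non-zero overread; declaring the allocated
size yields zero, which is what passing insize achieves in the patch.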

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/coda/upcall.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/coda/upcall.c
+++ b/fs/coda/upcall.c
@@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb,
 	UPARG(CODA_FSYNC);
 
 	inp->coda_fsync.VFid = *fid;
-	error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs),
-			    &outsize, inp);
+	error = coda_upcall(coda_vcp(sb), insize, &outsize, inp);
 
 	CODA_FREE(inp, insize);
 	return error;

* Re: [PATCH 4.4 00/16] 4.4.101-stable review
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2017-11-22 10:12 ` [PATCH 4.4 16/16] coda: fix kernel memory exposure attempt in fsync Greg Kroah-Hartman
@ 2017-11-22 15:29 ` Nathan Chancellor
  2017-11-22 17:05   ` Greg Kroah-Hartman
  2017-11-22 21:32 ` Guenter Roeck
  2017-11-23 14:28 ` Naresh Kamboju
  18 siblings, 1 reply; 33+ messages in thread
From: Nathan Chancellor @ 2017-11-22 15:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Wed, Nov 22, 2017 at 11:11:53AM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.101 release.
> There are 16 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Nov 24 10:11:01 UTC 2017.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.101-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 
> -------------
> Pseudo-Shortlog of commits:
> 
> Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>     Linux 4.4.101-rc1
> 
> Jan Harkes <jaharkes@cs.cmu.edu>
>     coda: fix 'kernel memory exposure attempt' in fsync
> 
> Pavel Tatashin <pasha.tatashin@oracle.com>
>     mm/page_alloc.c: broken deferred calculation
> 
> Corey Minyard <cminyard@mvista.com>
>     ipmi: fix unsigned long underflow
> 
> alex chen <alex.chen@huawei.com>
>     ocfs2: should wait dio before inode lock in ocfs2_setattr()
> 
> Keith Busch <keith.busch@intel.com>
>     nvme: Fix memory order on async queue deletion
> 
> Mark Rutland <mark.rutland@arm.com>
>     arm64: fix dump_instr when PAN and UAO are in use
> 
> Lukas Wunner <lukas@wunner.de>
>     serial: omap: Fix EFR write on RTS deassertion
> 
> Roberto Sassu <roberto.sassu@huawei.com>
>     ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
> 
> Eric W. Biederman <ebiederm@xmission.com>
>     net/sctp: Always set scope_id in sctp_inet6_skb_msgname
> 
> Huacai Chen <chenhc@lemote.com>
>     fealnx: Fix building error on MIPS
> 
> Xin Long <lucien.xin@gmail.com>
>     sctp: do not peel off an assoc from one netns to another one
> 
> Jason A. Donenfeld <Jason@zx2c4.com>
>     af_netlink: ensure that NLMSG_DONE never fails in dumps
> 
> Cong Wang <xiyou.wangcong@gmail.com>
>     vlan: fix a use-after-free in vlan_device_event()
> 
> Hangbin Liu <liuhangbin@gmail.com>
>     bonding: discard lowest hash bit for 802.3ad layer3+4
> 
> Ye Yin <hustcat@gmail.com>
>     netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
> 
> Eric Dumazet <edumazet@google.com>
>     tcp: do not mangle skb->cb[] in tcp_make_synack()
> 
> 
> -------------
> 
> Diffstat:
> 
>  Makefile                              |  4 ++--
>  arch/arm64/kernel/traps.c             | 26 +++++++++++++-------------
>  drivers/char/ipmi/ipmi_msghandler.c   | 10 ++++++----
>  drivers/net/bonding/bond_main.c       |  2 +-
>  drivers/net/ethernet/fealnx.c         |  6 +++---
>  drivers/nvme/host/pci.c               |  2 +-
>  drivers/tty/serial/omap-serial.c      |  2 +-
>  fs/coda/upcall.c                      |  3 +--
>  fs/ocfs2/file.c                       |  9 +++++++--
>  include/linux/mmzone.h                |  3 ++-
>  include/linux/skbuff.h                |  7 +++++++
>  mm/page_alloc.c                       | 27 ++++++++++++++++++---------
>  net/8021q/vlan.c                      |  6 +++---
>  net/core/skbuff.c                     |  1 +
>  net/ipv4/tcp_output.c                 |  9 ++-------
>  net/netlink/af_netlink.c              | 17 +++++++++++------
>  net/netlink/af_netlink.h              |  1 +
>  net/sctp/ipv6.c                       |  2 ++
>  net/sctp/socket.c                     |  4 ++++
>  security/integrity/ima/ima_appraise.c |  3 +++
>  20 files changed, 89 insertions(+), 55 deletions(-)
> 
>

Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5. No
initial issues noticed in general usage or dmesg.

* Re: [PATCH 4.4 00/16] 4.4.101-stable review
  2017-11-22 15:29 ` [PATCH 4.4 00/16] 4.4.101-stable review Nathan Chancellor
@ 2017-11-22 17:05   ` Greg Kroah-Hartman
  2017-11-22 17:38     ` Nathan Chancellor
  0 siblings, 1 reply; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-22 17:05 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Wed, Nov 22, 2017 at 08:29:10AM -0700, Nathan Chancellor wrote:
> Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5. No
> initial issues noticed in general usage or dmesg.

Ah, also works for the OnePlus 5?  Nice, so that kind of means that the
5T is also this same kernel version?  Time to go order one of those as
well... :)

thanks for testing and letting me know.

greg k-h

* Re: [PATCH 4.4 00/16] 4.4.101-stable review
  2017-11-22 17:05   ` Greg Kroah-Hartman
@ 2017-11-22 17:38     ` Nathan Chancellor
  0 siblings, 0 replies; 33+ messages in thread
From: Nathan Chancellor @ 2017-11-22 17:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, linux, shuahkh, patches,
	ben.hutchings, lkft-triage, stable

On Wed, Nov 22, 2017 at 06:05:27PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Nov 22, 2017 at 08:29:10AM -0700, Nathan Chancellor wrote:
> > Merged, compiled, and flashed onto my Pixel 2 XL and OnePlus 5. No
> > initial issues noticed in general usage or dmesg.
> 
> Ah, also works for the OnePlus 5?  Nice, so that kind of means that the
> 5T is also this same kernel version?  Time to go order one of those as
> well... :)
> 
> thanks for testing and letting me know.
> 
> greg k-h

Yes, all Snapdragon 835 devices should be based on a 4.4 kernel. I have
a Pixel 2 XL, OnePlus 5, and an Essential Phone. Haven't been able to
test on the Essential yet as Nougat is on 4.4.21 and I have neither the
time nor the patience to do that bringup again (was bad enough on the
OnePlus 5) and I can't flash a kernel on Oreo yet.

The 5T should be unified with the 5 at some point (one other developer I
know built and flashed the 5T on the 5 and everything minus the
fingerprint scanner worked fine).

Source is here: https://github.com/OnePlusOSS/android_kernel_oneplus_msm8998

* Re: [PATCH 4.4 00/16] 4.4.101-stable review
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2017-11-22 15:29 ` [PATCH 4.4 00/16] 4.4.101-stable review Nathan Chancellor
@ 2017-11-22 21:32 ` Guenter Roeck
  2017-11-23 14:28 ` Naresh Kamboju
  18 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2017-11-22 21:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuahkh, patches, ben.hutchings,
	lkft-triage, stable

On Wed, Nov 22, 2017 at 11:11:53AM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.101 release.
> There are 16 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Fri Nov 24 10:11:01 UTC 2017.
> Anything received after that time might be too late.
> 

Build results:
	total: 145 pass: 145 fail: 0
Qemu test results:
	total: 116 pass: 116 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

* Re: [PATCH 4.4 00/16] 4.4.101-stable review
  2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2017-11-22 21:32 ` Guenter Roeck
@ 2017-11-23 14:28 ` Naresh Kamboju
  18 siblings, 0 replies; 33+ messages in thread
From: Naresh Kamboju @ 2017-11-23 14:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, Guenter Roeck, Shuah Khan, patches,
	Ben Hutchings, lkft-triage, linux- stable, Tom Gall

On 22 November 2017 at 15:41, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 4.4.101 release.
> There are 16 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Nov 24 10:11:01 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.101-rc1.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.
Also, as per usual, the HiKey results are reported separately because the
platform support isn’t in tree.

Summary
------------------------------------------------------------------------

kernel: 4.4.101-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: 904d324caabd69df4d4ab2d43788e9ba6ba01e5c
git describe: v4.4.100-20-g904d324caabd
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.100-20-g904d324caabd

No regressions (compared to build v4.4.100-19-g57e6ad4bfed1)

Boards, architectures and test suites:
-------------------------------------

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 28, fail: 2, skip: 23
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 28, skip: 36
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 936, skip: 159
* ltp-timers-tests - pass: 12,

x15 - arm
* boot - pass: 20,
* kselftest - pass: 28, skip: 26,
* libhugetlbfs - pass: 87, skip: 1,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64,
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 60,
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 20, skip: 2
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 13, skip: 1,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1035, skip: 67,
* ltp-timers-tests - pass: 12,

SuperServer 5019S-ML - x86_64
* boot - pass: 20
* kselftest - pass: 40, skip: 30
* libhugetlbfs - pass: 76, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 61, skip: 1
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 18
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 9, skip: 1
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 956, skip: 164
* ltp-timers-tests - pass: 12

And the hikey results.

Summary
------------------------------------------------------------------------

kernel: 4.4.101-rc1
git repo: https://git.linaro.org/lkft/arm64-stable-rc.git
git tag: 4.4.101-rc1-hikey-20171123-59
git commit: 6ac5099d48384b7c967aa09727e0c98be846957a
git describe: 4.4.101-rc1-hikey-20171123-59
Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.101-rc1-hikey-20171123-59

No regressions (compared to build 4.4.101-rc1-hikey-20171123)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - pass: 26, skip: 28
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 28, skip: 36
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - pass: 979, skip: 124
* ltp-timers-tests - pass: 12

Documentation - https://collaborate.linaro.org/display/LKFT/Email+Reports

Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-11-22 10:12 ` [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr() Greg Kroah-Hartman
@ 2017-12-05 15:49   ` Ben Hutchings
  2017-12-06  1:02     ` alex chen
  0 siblings, 1 reply; 33+ messages in thread
From: Ben Hutchings @ 2017-12-05 15:49 UTC (permalink / raw)
  To: Alex Chen, Jun Piao, Joseph Qi, Greg Kroah-Hartman
  Cc: stable, Changwei Ge, Mark Fasheh, Joel Becker, Junxiao Bi,
	Andrew Morton, Linus Torvalds, LKML

On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: alex chen <alex.chen@huawei.com>
> 
> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> 
> we should wait dio requests to finish before inode lock in
> ocfs2_setattr(), otherwise the following deadlock will happen:
[...]

I looked at the kernel-doc for inode_dio_wait():

/**
 * inode_dio_wait - wait for outstanding DIO requests to finish
 * @inode: inode to wait for
 *
 * Waits for all pending direct I/O requests to finish so that we can
 * proceed with a truncate or equivalent operation.
 *
 * Must be called under a lock that serializes taking new references
 * to i_dio_count, usually by inode->i_mutex.
 */

Now that ocfs2_setattr() calls this outside of the inode locked region,
what prevents another task adding a new dio request immediately
afterward?

Also, ocfs2_dio_end_io_write() was introduced in 4.6, and it looks like
the dio completion path didn't previously take the inode lock.  So it
doesn't look like this fix is needed in 3.18 or 4.4.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-05 15:49   ` Ben Hutchings
@ 2017-12-06  1:02     ` alex chen
  2017-12-06 16:36       ` Greg Kroah-Hartman
  2017-12-07 18:25       ` Ben Hutchings
  0 siblings, 2 replies; 33+ messages in thread
From: alex chen @ 2017-12-06  1:02 UTC (permalink / raw)
  To: Ben Hutchings, Greg Kroah-Hartman
  Cc: piaojun, Joseph Qi, stable, Changwei Ge, Mark Fasheh,
	Joel Becker, Junxiao Bi, Andrew Morton, Linus Torvalds, LKML

Hi Ben,

Thanks for your reply.

On 2017/12/5 23:49, Ben Hutchings wrote:
> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: alex chen <alex.chen@huawei.com>
>>
>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>
>> we should wait dio requests to finish before inode lock in
>> ocfs2_setattr(), otherwise the following deadlock will happen:
> [...]
> 
> I looked at the kernel-doc for inode_dio_wait():
> 
> /**
>  * inode_dio_wait - wait for outstanding DIO requests to finish
>  * @inode: inode to wait for
>  *
>  * Waits for all pending direct I/O requests to finish so that we can
>  * proceed with a truncate or equivalent operation.
>  *
>  * Must be called under a lock that serializes taking new references
>  * to i_dio_count, usually by inode->i_mutex.
>  */
> 
> Now that ocfs2_setattr() calls this outside of the inode locked region,
> what prevents another task adding a new dio request immediately
> afterward?
>

In kernel 4.6, we first take inode_lock() in do_truncate() to prevent
another bio from being issued from this node.
Furthermore, we take ocfs2_rw_lock() and ocfs2_inode_lock() in ocfs2_setattr()
to guarantee that no more bios will be issued from the other nodes in the cluster.

> Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
> the dio completion path didn't previously take the inode lock.  So it
> doesn't look this fix is needed in 3.18 or 4.4.

Yes, ocfs2_dio_end_io_write() was introduced in 4.6, and the problem this patch
fixes exists only in kernel 4.6 and later.

I'm sorry that I didn't clearly point out which stable kernel versions this patch
applies to.

Thanks,
Alex

> 
> Ben.
> 

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-06  1:02     ` alex chen
@ 2017-12-06 16:36       ` Greg Kroah-Hartman
  2017-12-07 18:25       ` Ben Hutchings
  1 sibling, 0 replies; 33+ messages in thread
From: Greg Kroah-Hartman @ 2017-12-06 16:36 UTC (permalink / raw)
  To: alex chen
  Cc: Ben Hutchings, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML

On Wed, Dec 06, 2017 at 09:02:42AM +0800, alex chen wrote:
> Hi Ben,
> 
> Thanks for your reply.
> 
> On 2017/12/5 23:49, Ben Hutchings wrote:
> > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> >> 4.4-stable review patch.  If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: alex chen <alex.chen@huawei.com>
> >>
> >> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> >>
> >> we should wait dio requests to finish before inode lock in
> >> ocfs2_setattr(), otherwise the following deadlock will happen:
> > [...]
> > 
> > I looked at the kernel-doc for inode_dio_wait():
> > 
> > /**
> >  * inode_dio_wait - wait for outstanding DIO requests to finish
> >  * @inode: inode to wait for
> >  *
> >  * Waits for all pending direct I/O requests to finish so that we can
> >  * proceed with a truncate or equivalent operation.
> >  *
> >  * Must be called under a lock that serializes taking new references
> >  * to i_dio_count, usually by inode->i_mutex.
> >  */
> > 
> > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > what prevents another task adding a new dio request immediately
> > afterward?
> >
> 
> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> prevent another bio to be issued from this node.
> Furthermore, we use the ocfs2_rw_lock() and ocfs2_inode_lock() in ocfs2_setattr()
> to guarantee no more bio will be issued from the other nodes in this cluster.
> 
> > Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
> > the dio completion path didn't previously take the inode lock.  So it
> > doesn't look this fix is needed in 3.18 or 4.4.
> 
> Yes, ocfs2_dio_end_io_write() was introduced in 4.6 and the problem this patch
> fixes is only exist in the kernel 4.6 and above 4.6.
> 
> I'm sorry that I don't clearly point out which the stable version of kernel this patch
> will fixes.

Not a problem, now dropped.

greg k-h

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-06  1:02     ` alex chen
  2017-12-06 16:36       ` Greg Kroah-Hartman
@ 2017-12-07 18:25       ` Ben Hutchings
  2017-12-08  0:39         ` alex chen
  1 sibling, 1 reply; 33+ messages in thread
From: Ben Hutchings @ 2017-12-07 18:25 UTC (permalink / raw)
  To: alex chen, Greg Kroah-Hartman
  Cc: piaojun, Joseph Qi, stable, Changwei Ge, Mark Fasheh,
	Joel Becker, Junxiao Bi, Andrew Morton, Linus Torvalds, LKML

On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> Hi Ben,
> 
> Thanks for your reply.
> 
> On 2017/12/5 23:49, Ben Hutchings wrote:
> > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > ------------------
> > > 
> > > From: alex chen <alex.chen@huawei.com>
> > > 
> > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > > 
> > > we should wait dio requests to finish before inode lock in
> > > ocfs2_setattr(), otherwise the following deadlock will happen:
> > 
> > [...]
> > 
> > I looked at the kernel-doc for inode_dio_wait():
> > 
> > /**
> >  * inode_dio_wait - wait for outstanding DIO requests to finish
> >  * @inode: inode to wait for
> >  *
> >  * Waits for all pending direct I/O requests to finish so that we can
> >  * proceed with a truncate or equivalent operation.
> >  *
> >  * Must be called under a lock that serializes taking new references
> >  * to i_dio_count, usually by inode->i_mutex.
> >  */
> > 
> > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > what prevents another task adding a new dio request immediately
> > afterward?
> > 
> 
> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> prevent another bio to be issued from this node.
[...]

Yes but there seems to be a race condition - after the call to
inode_dio_wait() and before the call to inode_lock(), another dio
request can be added.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-07 18:25       ` Ben Hutchings
@ 2017-12-08  0:39         ` alex chen
  2017-12-08  2:26           ` Ben Hutchings
  0 siblings, 1 reply; 33+ messages in thread
From: alex chen @ 2017-12-08  0:39 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML



On 2017/12/8 2:25, Ben Hutchings wrote:
> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>> Hi Ben,
>>
>> Thanks for your reply.
>>
>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>>
>>>> From: alex chen <alex.chen@huawei.com>
>>>>
>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>
>>>> we should wait dio requests to finish before inode lock in
>>>> ocfs2_setattr(), otherwise the following deadlock will happen:
>>>
>>> [...]
>>>
>>> I looked at the kernel-doc for inode_dio_wait():
>>>
>>> /**
>>>  * inode_dio_wait - wait for outstanding DIO requests to finish
>>>  * @inode: inode to wait for
>>>  *
>>>  * Waits for all pending direct I/O requests to finish so that we can
>>>  * proceed with a truncate or equivalent operation.
>>>  *
>>>  * Must be called under a lock that serializes taking new references
>>>  * to i_dio_count, usually by inode->i_mutex.
>>>  */
>>>
>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>> what prevents another task adding a new dio request immediately
>>> afterward?
>>>
>>
>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>> prevent another bio to be issued from this node.
> [...]
> 
> Yes but there seems to be a race condition - after the call to
> inode_dio_wait() and before the call to inode_lock(), another dio
> request can be added.

When truncating a file, the lock order is as follows:
do_truncate()
 inode_lock()
 notify_change()
  ocfs2_setattr()
   inode_dio_wait()
    -- here we are under the protection of inode_lock(), so new dio requests
       from another process cannot be added.
   ocfs2_rw_lock()
   ocfs2_inode_lock_tracker()
    -- this function prevents the inode from being modified by other
       nodes in the cluster.
 inode_unlock()
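
As a rough illustration of the ordering above, here is a toy,
single-threaded userspace model in C.  It is only a sketch, not kernel
code: struct model_inode and every model_* function are hypothetical names
made up for this example, and the "lock" is just a flag so that the
sequence can be stepped through by hand.  It assumes, as described above,
that the write path takes the inode lock before it takes a new reference
to i_dio_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct inode: only the two fields discussed here. */
struct model_inode {
	bool i_mutex_held;	/* models inode_lock()/inode_unlock() */
	int  i_dio_count;	/* models outstanding direct I/O requests */
};

/* Try to take the per-inode lock; returns false if it is already held. */
static bool model_inode_trylock(struct model_inode *inode)
{
	if (inode->i_mutex_held)
		return false;
	inode->i_mutex_held = true;
	return true;
}

static void model_inode_unlock(struct model_inode *inode)
{
	inode->i_mutex_held = false;
}

/*
 * Write path (assumption): a new DIO reference is only taken while the
 * inode lock is held.  A false return models "would block behind the
 * lock holder", e.g. behind a truncate in progress.
 */
static bool model_dio_write_begin(struct model_inode *inode)
{
	if (!model_inode_trylock(inode))
		return false;
	inode->i_dio_count++;
	model_inode_unlock(inode);
	return true;
}

/* Completion path: drops the reference without taking the inode lock. */
static void model_dio_end(struct model_inode *inode)
{
	assert(inode->i_dio_count > 0);
	inode->i_dio_count--;
}

/* True when a wait for outstanding DIO would return immediately. */
static bool model_dio_drained(const struct model_inode *inode)
{
	return inode->i_dio_count == 0;
}
```

Stepping through it: once the truncate side holds the lock and the last
request has completed, model_dio_write_begin() fails until
model_inode_unlock() is called, so the count stays at zero across the
window in which inode_dio_wait() would run.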

> 
> Ben.
> 

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08  0:39         ` alex chen
@ 2017-12-08  2:26           ` Ben Hutchings
  2017-12-08  4:03             ` alex chen
  0 siblings, 1 reply; 33+ messages in thread
From: Ben Hutchings @ 2017-12-08  2:26 UTC (permalink / raw)
  To: alex chen
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML

On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
> 
> On 2017/12/8 2:25, Ben Hutchings wrote:
> > On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> > > Hi Ben,
> > > 
> > > Thanks for your reply.
> > > 
> > > On 2017/12/5 23:49, Ben Hutchings wrote:
> > > > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > > > 4.4-stable review patch.  If anyone has any objections,
> > > > > please let me know.
> > > > > 
> > > > > ------------------
> > > > > 
> > > > > From: alex chen <alex.chen@huawei.com>
> > > > > 
> > > > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > > > > 
> > > > > we should wait dio requests to finish before inode lock in
> > > > > ocfs2_setattr(), otherwise the following deadlock will
> > > > > happen:
> > > > 
> > > > [...]
> > > > 
> > > > I looked at the kernel-doc for inode_dio_wait():
> > > > 
> > > > /**
> > > >  * inode_dio_wait - wait for outstanding DIO requests to finish
> > > >  * @inode: inode to wait for
> > > >  *
> > > >  * Waits for all pending direct I/O requests to finish so that we can
> > > >  * proceed with a truncate or equivalent operation.
> > > >  *
> > > >  * Must be called under a lock that serializes taking new references
> > > >  * to i_dio_count, usually by inode->i_mutex.
> > > >  */
> > > > 
> > > > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > > > what prevents another task adding a new dio request immediately
> > > > afterward?
> > > > 
> > > 
> > > In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> > > prevent another bio to be issued from this node.
> > 
> > [...]
> > 
> > Yes but there seems to be a race condition - after the call to
> > inode_dio_wait() and before the call to inode_lock(), another dio
> > request can be added.

Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock(). 
However:

> In the truncating file situation, the lock order is as follow:
> do_truncate()
>  inode_lock()
>  notify_change()
>   ocfs2_setattr()
>    inode_dio_wait()
>     --here it is under the protect of inode_lock(), so another dio requests
>       from another process will not be added.

only DIO reads seem to take the inode lock.

Ben.

>    ocfs2_rw_lock()
>    ocfs2_inode_lock_tracker()
>     this function is used to prevent the inode from being modified by another
>     nodes in the cluster
>  inode_unlock()
> 
> > 
> > Ben.
> > 
> 
> 
-- 
Ben Hutchings
Software Developer, Codethink Ltd.

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08  2:26           ` Ben Hutchings
@ 2017-12-08  4:03             ` alex chen
  2017-12-08  5:36               ` Ben Hutchings
  0 siblings, 1 reply; 33+ messages in thread
From: alex chen @ 2017-12-08  4:03 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML



On 2017/12/8 10:26, Ben Hutchings wrote:
> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>
>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>> Hi Ben,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>> 4.4-stable review patch.  If anyone has any objections,
>>>>>> please let me know.
>>>>>>
>>>>>> ------------------
>>>>>>
>>>>>> From: alex chen <alex.chen@huawei.com>
>>>>>>
>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>
>>>>>> we should wait dio requests to finish before inode lock in
>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>> happen:
>>>>>
>>>>> [...]
>>>>>
>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>
>>>>> /**
>>>>>  * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>  * @inode: inode to wait for
>>>>>  *
>>>>>  * Waits for all pending direct I/O requests to finish so that we can
>>>>>  * proceed with a truncate or equivalent operation.
>>>>>  *
>>>>>  * Must be called under a lock that serializes taking new references
>>>>>  * to i_dio_count, usually by inode->i_mutex.
>>>>>  */
>>>>>
>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>> what prevents another task adding a new dio request immediately
>>>>> afterward?
>>>>>
>>>>
>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>> prevent another bio to be issued from this node.
>>>
>>> [...]
>>>
>>> Yes but there seems to be a race condition - after the call to
>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>> request can be added.
> 
> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock(). 
> However:
> 
>> In the truncating file situation, the lock order is as follow:
>> do_truncate()
>>  inode_lock()
>>  notify_change()
>>   ocfs2_setattr()
>>    inode_dio_wait()
>>     --here it is under the protect of inode_lock(), so another dio requests
>>       from another process will not be added.
> 
> only DIO reads seem to take the inode lock.
> 

I don't quite understand what you mean.
inode_lock() is called in ocfs2_file_write_iter().
Do you mean that only DIO writes seem to take the inode_lock()?

BTW, in this patch, I just moved inode_dio_wait() in front of ocfs2_rw_lock()
and didn't change the order of inode_lock() and inode_dio_wait().

Thanks,
Alex

> Ben.
> 
>>    ocfs2_rw_lock()
>>    ocfs2_inode_lock_tracker()
>>     this function is used to prevent the inode from being modified by another
>>     nodes in the cluster
>>  inode_unlock()
>>
>>>
>>> Ben.
>>>
>>
>>

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08  4:03             ` alex chen
@ 2017-12-08  5:36               ` Ben Hutchings
  2017-12-08  6:16                 ` alex chen
  0 siblings, 1 reply; 33+ messages in thread
From: Ben Hutchings @ 2017-12-08  5:36 UTC (permalink / raw)
  To: alex chen
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML

On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
> 
> On 2017/12/8 10:26, Ben Hutchings wrote:
> > On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
> > > 
> > > On 2017/12/8 2:25, Ben Hutchings wrote:
> > > > On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> > > > > Hi Ben,
> > > > > 
> > > > > Thanks for your reply.
> > > > > 
> > > > > On 2017/12/5 23:49, Ben Hutchings wrote:
> > > > > > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > > > > > 4.4-stable review patch.  If anyone has any objections,
> > > > > > > please let me know.
> > > > > > > 
> > > > > > > ------------------
> > > > > > > 
> > > > > > > From: alex chen <alex.chen@huawei.com>
> > > > > > > 
> > > > > > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > > > > > > 
> > > > > > > we should wait dio requests to finish before inode lock in
> > > > > > > ocfs2_setattr(), otherwise the following deadlock will
> > > > > > > happen:
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > I looked at the kernel-doc for inode_dio_wait():
> > > > > > 
> > > > > > /**
> > > > > >  * inode_dio_wait - wait for outstanding DIO requests to finish
> > > > > >  * @inode: inode to wait for
> > > > > >  *
> > > > > >  * Waits for all pending direct I/O requests to finish so that we can
> > > > > >  * proceed with a truncate or equivalent operation.
> > > > > >  *
> > > > > >  * Must be called under a lock that serializes taking new references
> > > > > >  * to i_dio_count, usually by inode->i_mutex.
> > > > > >  */
> > > > > > 
> > > > > > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > > > > > what prevents another task adding a new dio request immediately
> > > > > > afterward?
> > > > > > 
> > > > > 
> > > > > In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> > > > > prevent another bio to be issued from this node.
> > > > 
> > > > [...]
> > > > 
> > > > Yes but there seems to be a race condition - after the call to
> > > > inode_dio_wait() and before the call to inode_lock(), another dio
> > > > request can be added.
> > 
> > Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock(). 
> > However:
> > 
> > > In the truncating file situation, the lock order is as follow:
> > > do_truncate()
> > >  inode_lock()
> > >  notify_change()
> > >   ocfs2_setattr()
> > >    inode_dio_wait()
> > >     --here it is under the protect of inode_lock(), so another dio requests
> > >       from another process will not be added.
> > 
> > only DIO reads seem to take the inode lock.
> > 
> 
> I do not clearly understand what you mean.
> The inode_lock() will be called in ocfs2_file_write_iter().

Oh I see.  I didn't realise that was part of the call chain.

> You mean only DIO writes seem to take the inode_lock()?

I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
reads - but ocfs2 doesn't set the flag for that.  Maybe that's OK?
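
The case in question can be sketched in userspace C.  This is a model of
the decision only, not the kernel source: MODEL_DIO_LOCKING, enum
model_dir and model_generic_dio_takes_inode_lock() are made-up names, and
the flag value is illustrative.  It assumes that the generic direct I/O
code takes the inode lock itself only for reads, and only when the
filesystem passes the locking flag.

```c
#include <stdbool.h>

/* Illustrative flag value; stands in for the kernel's locking flag. */
#define MODEL_DIO_LOCKING 0x01u

enum model_dir { MODEL_READ, MODEL_WRITE };

/*
 * Models the decision described above: the generic direct I/O code
 * takes the inode lock on its own only for reads, and only if the
 * filesystem asked for it via the flag.  Writers are assumed to take
 * the lock earlier, in their own write path.
 */
static bool model_generic_dio_takes_inode_lock(unsigned int flags,
					       enum model_dir dir)
{
	return (flags & MODEL_DIO_LOCKING) != 0 && dir == MODEL_READ;
}
```

Under these assumptions, a filesystem that does not pass the flag gets no
inode lock from the generic code on the read side, which is the situation
being asked about.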

> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
> and didn't adjust the order of inode_lock() and inode_dio_wait().

Right.  I think you've convinced me to stop worrying about this.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08  5:36               ` Ben Hutchings
@ 2017-12-08  6:16                 ` alex chen
  2017-12-08 10:04                   ` Changwei Ge
  0 siblings, 1 reply; 33+ messages in thread
From: alex chen @ 2017-12-08  6:16 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Changwei Ge,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML



On 2017/12/8 13:36, Ben Hutchings wrote:
> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>
>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>
>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>> Hi Ben,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>> 4.4-stable review patch.  If anyone has any objections,
>>>>>>>> please let me know.
>>>>>>>>
>>>>>>>> ------------------
>>>>>>>>
>>>>>>>> From: alex chen <alex.chen@huawei.com>
>>>>>>>>
>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>
>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>> happen:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>
>>>>>>> /**
>>>>>>>  * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>>  * @inode: inode to wait for
>>>>>>>  *
>>>>>>>  * Waits for all pending direct I/O requests to finish so that we can
>>>>>>>  * proceed with a truncate or equivalent operation.
>>>>>>>  *
>>>>>>>  * Must be called under a lock that serializes taking new references
>>>>>>>  * to i_dio_count, usually by inode->i_mutex.
>>>>>>>  */
>>>>>>>
>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>> afterward?
>>>>>>>
>>>>>>
>>>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>>>> prevent another bio to be issued from this node.
>>>>>
>>>>> [...]
>>>>>
>>>>> Yes but there seems to be a race condition - after the call to
>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>> request can be added.
>>>
>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock(). 
>>> However:
>>>
>>>> In the truncating file situation, the lock order is as follow:
>>>> do_truncate()
>>>>  inode_lock()
>>>>  notify_change()
>>>>   ocfs2_setattr()
>>>>    inode_dio_wait()
>>>>     --here it is under the protect of inode_lock(), so another dio requests
>>>>       from another process will not be added.
>>>
>>> only DIO reads seem to take the inode lock.
>>>
>>
>> I do not clearly understand what you mean.
>> The inode_lock() will be called in ocfs2_file_write_iter().
> 
> Oh I see.  I didn't realise that was part of the call chain.
> 
>> You mean only DIO writes seem to take the inode_lock()?
> 
> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
> reads - but ocfs2 doesn't set the flag for that.  Maybe that's OK?

I think you are right; we should set the DIO_LOCKING flag in ocfs2_direct_IO().

Thanks,
Alex
> 
>> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
>> and didn't adjust the order of inode_lock() and inode_dio_wait().
> 
> Right.  I think you've convinced me to stop worrying about this.
> 
> Ben.
> 

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08  6:16                 ` alex chen
@ 2017-12-08 10:04                   ` Changwei Ge
  2017-12-12  1:34                     ` alex chen
  0 siblings, 1 reply; 33+ messages in thread
From: Changwei Ge @ 2017-12-08 10:04 UTC (permalink / raw)
  To: alex chen, Ben Hutchings
  Cc: Greg Kroah-Hartman, piaojun, Joseph Qi, stable, Mark Fasheh,
	Joel Becker, Junxiao Bi, Andrew Morton, Linus Torvalds, LKML

On 2017/12/8 14:21, alex chen wrote:
> 
> 
> On 2017/12/8 13:36, Ben Hutchings wrote:
>> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>>
>>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>>
>>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>>> Hi Ben,
>>>>>>>
>>>>>>> Thanks for your reply.
>>>>>>>
>>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>>> 4.4-stable review patch.  If anyone has any objections,
>>>>>>>>> please let me know.
>>>>>>>>>
>>>>>>>>> ------------------
>>>>>>>>>
>>>>>>>>> From: alex chen <alex.chen@huawei.com>
>>>>>>>>>
>>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>>
>>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>>> happen:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>>
>>>>>>>> /**
>>>>>>>>   * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>>>   * @inode: inode to wait for
>>>>>>>>   *
>>>>>>>>   * Waits for all pending direct I/O requests to finish so that we can
>>>>>>>>   * proceed with a truncate or equivalent operation.
>>>>>>>>   *
>>>>>>>>   * Must be called under a lock that serializes taking new references
>>>>>>>>   * to i_dio_count, usually by inode->i_mutex.
>>>>>>>>   */
>>>>>>>>
>>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>>> afterward?
>>>>>>>>
>>>>>>>
>>>>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>>>>> prevent another bio to be issued from this node.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Yes but there seems to be a race condition - after the call to
>>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>>> request can be added.
>>>>
>>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
>>>> However:
>>>>
>>>>> In the truncating file situation, the lock order is as follow:
>>>>> do_truncate()
>>>>>   inode_lock()
>>>>>   notify_change()
>>>>>    ocfs2_setattr()
>>>>>     inode_dio_wait()
>>>>>      --here it is under the protect of inode_lock(), so another dio requests
>>>>>        from another process will not be added.
>>>>
>>>> only DIO reads seem to take the inode lock.
>>>>
>>>
>>> I do not clearly understand what you mean.
>>> The inode_lock() will be called in ocfs2_file_write_iter().
>>
>> Oh I see.  I didn't realise that was part of the call chain.
>>
>>> You mean only DIO writes seem to take the inode_lock()?
>>
>> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
>> reads - but ocfs2 doesn't set the flag for that.  Maybe that's OK?
> 
> I think you are right, we should set the DIO_LOCKING flag in ocfs2_direct_IO().

So this is actually another problem, one that was NOT introduced by Alex's
patch, right?
Perhaps ocfs2 should depend on the VFS to flush the page cache to get rid of
stale data on disk.

Thanks,
Changwei

> 
> Thanks,
> Alex
>>
>>> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
>>> and didn't adjust the order of inode_lock() and inode_dio_wait().
>>
>> Right.  I think you've convinced me to stop worrying about this.
>>
>> Ben.
>>
> 
> 

* Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()
  2017-12-08 10:04                   ` Changwei Ge
@ 2017-12-12  1:34                     ` alex chen
  0 siblings, 0 replies; 33+ messages in thread
From: alex chen @ 2017-12-12  1:34 UTC (permalink / raw)
  To: Changwei Ge
  Cc: Ben Hutchings, Greg Kroah-Hartman, piaojun, Joseph Qi, stable,
	Mark Fasheh, Joel Becker, Junxiao Bi, Andrew Morton,
	Linus Torvalds, LKML



On 2017/12/8 18:04, Changwei Ge wrote:
> On 2017/12/8 14:21, alex chen wrote:
>>
>>
>> On 2017/12/8 13:36, Ben Hutchings wrote:
>>> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>>>
>>>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>>>
>>>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>>>> Hi Ben,
>>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>>>> 4.4-stable review patch.  If anyone has any objections,
>>>>>>>>>> please let me know.
>>>>>>>>>>
>>>>>>>>>> ------------------
>>>>>>>>>>
>>>>>>>>>> From: alex chen <alex.chen@huawei.com>
>>>>>>>>>>
>>>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>>>
>>>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>>>> happen:
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>   * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>>>>   * @inode: inode to wait for
>>>>>>>>>   *
>>>>>>>>>   * Waits for all pending direct I/O requests to finish so that we can
>>>>>>>>>   * proceed with a truncate or equivalent operation.
>>>>>>>>>   *
>>>>>>>>>   * Must be called under a lock that serializes taking new references
>>>>>>>>>   * to i_dio_count, usually by inode->i_mutex.
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>>>> afterward?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>>>>>> prevent another bio to be issued from this node.
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> Yes but there seems to be a race condition - after the call to
>>>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>>>> request can be added.
>>>>>
>>>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
>>>>> However:
>>>>>
>>>>>> In the truncating file situation, the lock order is as follow:
>>>>>> do_truncate()
>>>>>>   inode_lock()
>>>>>>   notify_change()
>>>>>>    ocfs2_setattr()
>>>>>>     inode_dio_wait()
>>>>>>      --here it is under the protect of inode_lock(), so another dio requests
>>>>>>        from another process will not be added.
>>>>>
>>>>> only DIO reads seem to take the inode lock.
>>>>>
>>>>
>>>> I do not clearly understand what you mean.
>>>> The inode_lock() will be called in ocfs2_file_write_iter().
>>>
>>> Oh I see.  I didn't realise that was part of the call chain.
>>>
>>>> You mean only DIO writes seem to take the inode_lock()?
>>>
>>> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
>>> reads - but ocfs2 doesn't set the flag for that.  Maybe that's OK?
>>
>> I think you are right, we should set the DIO_LOCKING flag in ocfs2_direct_IO().
> 
> So this is actually another problem which was NOT introduced by Alex's 
> patch, right?
> Ocfs2 perhaps should depend on vfs to flush page cache to get rid of 
> stale data on disk.

Yes, I think we should set the DIO_LOCKING flag to synchronize direct I/O reads/writes
and truncate.
The following patch is being tested in my local environment.

Signed-off-by: Alex Chen <alex.chen@huawei.com>
---
 fs/ocfs2/aops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7e1659d..d10632f 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2491,7 +2491,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)

 	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
 				    iter, get_block,
-				    ocfs2_dio_end_io, NULL, 0);
+				    ocfs2_dio_end_io, NULL, DIO_LOCKING);
 }

 const struct address_space_operations ocfs2_aops = {
-- 
1.9.5.msysgit.1
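
For illustration only, here is a minimal userspace sketch of the ordering
discussed above. It is not kernel code: every name prefixed with model_ is
hypothetical, a plain bool stands in for inode_lock(), and an int stands in
for i_dio_count. It assumes that, with DIO_LOCKING set, a direct I/O read
takes the inode lock before it is counted, which is what the discussion of
do_blockdev_direct_IO() above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an inode; these are NOT kernel structures. */
struct model_inode {
	bool locked;    /* stands in for inode_lock() being held */
	int dio_count;  /* stands in for i_dio_count */
};

/* Truncate path: take the inode lock before waiting for direct I/O. */
void model_truncate_begin(struct model_inode *inode)
{
	inode->locked = true;
}

/* Truncate path: drop the inode lock when the size change is done. */
void model_truncate_end(struct model_inode *inode)
{
	inode->locked = false;
}

/* True when no direct I/O is outstanding, i.e. the wait would return. */
bool model_dio_wait_done(const struct model_inode *inode)
{
	return inode->dio_count == 0;
}

/*
 * Try to start a direct I/O read. With dio_locking set, the reader needs
 * the inode lock first, so it is not admitted while truncate holds it.
 * Returns true if the request was counted.
 */
bool model_dio_read_begin(struct model_inode *inode, bool dio_locking)
{
	if (dio_locking && inode->locked)
		return false;  /* a real reader would block here */
	inode->dio_count++;
	return true;
}

/* Complete one outstanding direct I/O request. */
void model_dio_end(struct model_inode *inode)
{
	assert(inode->dio_count > 0);
	inode->dio_count--;
}
```

Without the flag, model_dio_read_begin() succeeds while the truncate side
holds the lock, so the outstanding count can become non-zero after the wait
has already finished. With the flag, the read is held off until
model_truncate_end() has run.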

> 
> Thanks,
> Changwei
> 
>>
>> Thanks,
>> Alex
>>>
>>>> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
>>>> and didn't adjust the order of inode_lock() and inode_dio_wait().
>>>
>>> Right.  I think you've convinced me to stop worrying about this.
>>>
>>> Ben.
>>>
>>
>>
> 
> 
> .
> 


end of thread, other threads:[~2017-12-12  1:34 UTC | newest]

Thread overview: 33+ messages
2017-11-22 10:11 [PATCH 4.4 00/16] 4.4.101-stable review Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 01/16] tcp: do not mangle skb->cb[] in tcp_make_synack() Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 02/16] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 03/16] bonding: discard lowest hash bit for 802.3ad layer3+4 Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 04/16] vlan: fix a use-after-free in vlan_device_event() Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 05/16] af_netlink: ensure that NLMSG_DONE never fails in dumps Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.4 06/16] sctp: do not peel off an assoc from one netns to another one Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 07/16] fealnx: Fix building error on MIPS Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 08/16] net/sctp: Always set scope_id in sctp_inet6_skb_msgname Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 09/16] ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 10/16] serial: omap: Fix EFR write on RTS deassertion Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 11/16] arm64: fix dump_instr when PAN and UAO are in use Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 12/16] [PATCH-stable] nvme: Fix memory order on async queue deletion Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr() Greg Kroah-Hartman
2017-12-05 15:49   ` Ben Hutchings
2017-12-06  1:02     ` alex chen
2017-12-06 16:36       ` Greg Kroah-Hartman
2017-12-07 18:25       ` Ben Hutchings
2017-12-08  0:39         ` alex chen
2017-12-08  2:26           ` Ben Hutchings
2017-12-08  4:03             ` alex chen
2017-12-08  5:36               ` Ben Hutchings
2017-12-08  6:16                 ` alex chen
2017-12-08 10:04                   ` Changwei Ge
2017-12-12  1:34                     ` alex chen
2017-11-22 10:12 ` [PATCH 4.4 14/16] ipmi: fix unsigned long underflow Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 15/16] mm/page_alloc.c: broken deferred calculation Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.4 16/16] coda: fix kernel memory exposure attempt in fsync Greg Kroah-Hartman
2017-11-22 15:29 ` [PATCH 4.4 00/16] 4.4.101-stable review Nathan Chancellor
2017-11-22 17:05   ` Greg Kroah-Hartman
2017-11-22 17:38     ` Nathan Chancellor
2017-11-22 21:32 ` Guenter Roeck
2017-11-23 14:28 ` Naresh Kamboju
