* [PATCH 00/17] net subsystem refcount conversions
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova

This series, for core network subsystem components, replaces atomic_t reference
counters with the new refcount_t type and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental underflows or overflows
that can lead to use-after-free vulnerabilities.
These patches contain only the generic net pieces; other changes will be sent separately.
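
As a rough illustration of the conversion pattern used throughout the series
(a hypothetical struct foo, not taken from any individual patch), the typical
mapping is atomic_set() -> refcount_set(), atomic_inc() -> refcount_inc() and
atomic_dec_and_test() -> refcount_dec_and_test(); unlike atomic_t, refcount_t
saturates and WARNs on over/underflow instead of silently wrapping:

	/* Illustrative sketch only; hypothetical struct foo, not from a patch. */
	#include <linux/refcount.h>
	#include <linux/slab.h>

	struct foo {
		refcount_t refcnt;	/* was: atomic_t refcnt; */
		/* ... payload ... */
	};

	static struct foo *foo_alloc(gfp_t gfp)
	{
		struct foo *f = kzalloc(sizeof(*f), gfp);

		if (f)
			refcount_set(&f->refcnt, 1);	/* was: atomic_set() */
		return f;
	}

	static inline void foo_hold(struct foo *f)
	{
		refcount_inc(&f->refcnt);		/* was: atomic_inc() */
	}

	static inline void foo_put(struct foo *f)
	{
		if (refcount_dec_and_test(&f->refcnt))	/* was: atomic_dec_and_test() */
			kfree(f);
	}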

The patches are fully independent and can be cherry-picked separately.
Since we are converting all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them somehow to keep the sending
manageable. Please excuse the long cc list.

If there are no objections to the patches, please merge them via respective trees.

Elena Reshetova (17):
  net: convert neighbour.refcnt from atomic_t to refcount_t
  net: convert neigh_params.refcnt from atomic_t to refcount_t
  net: convert nf_bridge_info.use from atomic_t to refcount_t
  net: convert sk_buff.users from atomic_t to refcount_t
  net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t
  net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
  net: convert sock.sk_refcnt from atomic_t to refcount_t
  net: convert sk_filter.refcnt from atomic_t to refcount_t
  net: convert ip_mc_list.refcnt from atomic_t to refcount_t
  net: convert in_device.refcnt from atomic_t to refcount_t
  net: convert netpoll_info.refcnt from atomic_t to refcount_t
  net: convert unix_address.refcnt from atomic_t to refcount_t
  net: convert fib_rule.refcnt from atomic_t to refcount_t
  net: convert inet_frag_queue.refcnt from atomic_t to refcount_t
  net: convert net.passive from atomic_t to refcount_t
  net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t
  net: convert packet_fanout.sk_ref from atomic_t to refcount_t

 crypto/algif_aead.c                  |  2 +-
 drivers/atm/fore200e.c               | 12 +-----------
 drivers/atm/he.c                     |  2 +-
 drivers/atm/idt77252.c               |  4 ++--
 drivers/infiniband/hw/nes/nes_cm.c   |  4 ++--
 drivers/isdn/mISDN/socket.c          |  2 +-
 drivers/net/rionet.c                 |  2 +-
 drivers/s390/net/ctcm_main.c         | 26 ++++++++++++------------
 drivers/s390/net/netiucv.c           | 10 +++++-----
 drivers/s390/net/qeth_core_main.c    |  4 ++--
 drivers/scsi/cxgbi/libcxgbi.h        |  2 +-
 include/linux/atmdev.h               |  2 +-
 include/linux/filter.h               |  3 ++-
 include/linux/igmp.h                 |  3 ++-
 include/linux/inetdevice.h           | 11 ++++++-----
 include/linux/netpoll.h              |  3 ++-
 include/linux/skbuff.h               | 16 +++++++--------
 include/net/af_unix.h                |  3 ++-
 include/net/arp.h                    |  2 +-
 include/net/fib_rules.h              |  7 ++++---
 include/net/inet_frag.h              |  4 ++--
 include/net/inet_hashtables.h        |  4 ++--
 include/net/ndisc.h                  |  2 +-
 include/net/neighbour.h              | 15 +++++++-------
 include/net/net_namespace.h          |  3 ++-
 include/net/netfilter/br_netfilter.h |  2 +-
 include/net/netlabel.h               |  8 ++++----
 include/net/request_sock.h           |  9 +++++----
 include/net/sock.h                   | 25 ++++++++++++------------
 net/atm/br2684.c                     |  2 +-
 net/atm/clip.c                       |  8 ++++----
 net/atm/common.c                     | 10 +++++-----
 net/atm/lec.c                        |  4 ++--
 net/atm/mpc.c                        |  4 ++--
 net/atm/pppoatm.c                    |  2 +-
 net/atm/proc.c                       |  2 +-
 net/atm/raw.c                        |  2 +-
 net/atm/signaling.c                  |  2 +-
 net/bluetooth/af_bluetooth.c         |  2 +-
 net/bluetooth/rfcomm/sock.c          |  2 +-
 net/bridge/br_netfilter_hooks.c      |  4 ++--
 net/caif/caif_socket.c               |  2 +-
 net/core/datagram.c                  | 10 +++++-----
 net/core/dev.c                       | 10 +++++-----
 net/core/fib_rules.c                 |  4 ++--
 net/core/filter.c                    |  7 ++++---
 net/core/neighbour.c                 | 22 ++++++++++-----------
 net/core/net-sysfs.c                 |  2 +-
 net/core/net_namespace.c             |  4 ++--
 net/core/netpoll.c                   | 10 +++++-----
 net/core/pktgen.c                    | 16 +++++++--------
 net/core/rtnetlink.c                 |  2 +-
 net/core/skbuff.c                    | 38 ++++++++++++++++++------------------
 net/core/sock.c                      | 30 ++++++++++++++--------------
 net/dccp/ipv6.c                      |  2 +-
 net/decnet/dn_neigh.c                |  2 +-
 net/ipv4/af_inet.c                   |  2 +-
 net/ipv4/cipso_ipv4.c                |  4 ++--
 net/ipv4/devinet.c                   |  2 +-
 net/ipv4/esp4.c                      |  2 +-
 net/ipv4/igmp.c                      | 10 +++++-----
 net/ipv4/inet_connection_sock.c      |  2 +-
 net/ipv4/inet_fragment.c             | 14 ++++++-------
 net/ipv4/inet_hashtables.c           |  4 ++--
 net/ipv4/inet_timewait_sock.c        |  8 ++++----
 net/ipv4/ip_fragment.c               |  2 +-
 net/ipv4/ip_output.c                 |  6 +++---
 net/ipv4/ping.c                      |  4 ++--
 net/ipv4/raw.c                       |  2 +-
 net/ipv4/syncookies.c                |  2 +-
 net/ipv4/tcp.c                       |  4 ++--
 net/ipv4/tcp_fastopen.c              |  2 +-
 net/ipv4/tcp_ipv4.c                  |  4 ++--
 net/ipv4/tcp_offload.c               |  2 +-
 net/ipv4/tcp_output.c                | 13 ++++++------
 net/ipv4/udp.c                       |  6 +++---
 net/ipv4/udp_diag.c                  |  4 ++--
 net/ipv6/calipso.c                   |  4 ++--
 net/ipv6/datagram.c                  |  2 +-
 net/ipv6/esp6.c                      |  2 +-
 net/ipv6/inet6_hashtables.c          |  4 ++--
 net/ipv6/ip6_output.c                |  4 ++--
 net/ipv6/syncookies.c                |  2 +-
 net/ipv6/tcp_ipv6.c                  |  6 +++---
 net/ipv6/udp.c                       |  2 +-
 net/kcm/kcmproc.c                    |  2 +-
 net/key/af_key.c                     |  8 ++++----
 net/l2tp/l2tp_debugfs.c              |  3 +--
 net/llc/llc_conn.c                   |  8 ++++----
 net/llc/llc_sap.c                    |  2 +-
 net/netfilter/xt_TPROXY.c            |  4 ++--
 net/netlink/af_netlink.c             | 14 ++++++-------
 net/packet/af_packet.c               | 14 ++++++-------
 net/packet/internal.h                |  4 +++-
 net/phonet/socket.c                  |  4 ++--
 net/rds/tcp_send.c                   |  2 +-
 net/rxrpc/af_rxrpc.c                 |  6 +++---
 net/rxrpc/skbuff.c                   | 12 ++++++------
 net/sched/em_meta.c                  |  2 +-
 net/sched/sch_atm.c                  |  2 +-
 net/sctp/output.c                    |  2 +-
 net/sctp/outqueue.c                  |  2 +-
 net/sctp/proc.c                      |  2 +-
 net/sctp/socket.c                    |  6 +++---
 net/tipc/socket.c                    |  2 +-
 net/unix/af_unix.c                   | 16 +++++++--------
 106 files changed, 320 insertions(+), 319 deletions(-)

-- 
2.7.4


* [PATCH 01/17] net: convert neighbour.refcnt from atomic_t to refcount_t
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be used
instead of atomic_t when the variable is used as a
reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
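
The less obvious part of this conversion is the lockless lookup path: under
RCU an entry may already be on its way to destruction, so a reference is
taken only if the count is still non-zero. A condensed sketch of that
pattern (hypothetical my_neigh_lookup(), simplified from the
__ipv4_neigh_lookup() hunk below, not verbatim):

	/* Simplified sketch of the RCU lookup pattern converted below. */
	static struct neighbour *my_neigh_lookup(struct net_device *dev, u32 key)
	{
		struct neighbour *n;

		rcu_read_lock_bh();
		n = __ipv4_neigh_lookup_noref(dev, key);
		/* refcount_inc_not_zero() fails once the count has hit zero,
		 * i.e. we lost the race against the final neigh_release().
		 */
		if (n && !refcount_inc_not_zero(&n->refcnt))
			n = NULL;
		rcu_read_unlock_bh();

		return n;
	}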

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/arp.h       |  2 +-
 include/net/ndisc.h     |  2 +-
 include/net/neighbour.h |  9 +++++----
 net/atm/clip.c          |  6 +++---
 net/core/neighbour.c    | 14 +++++++-------
 net/decnet/dn_neigh.c   |  2 +-
 6 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 65619a2..17d90e4 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -28,7 +28,7 @@ static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32
 
 	rcu_read_lock_bh();
 	n = __ipv4_neigh_lookup_noref(dev, key);
-	if (n && !atomic_inc_not_zero(&n->refcnt))
+	if (n && !refcount_inc_not_zero(&n->refcnt))
 		n = NULL;
 	rcu_read_unlock_bh();
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 8a02146..54062c1 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -384,7 +384,7 @@ static inline struct neighbour *__ipv6_neigh_lookup(struct net_device *dev, cons
 
 	rcu_read_lock_bh();
 	n = __ipv6_neigh_lookup_noref(dev, pkey);
-	if (n && !atomic_inc_not_zero(&n->refcnt))
+	if (n && !refcount_inc_not_zero(&n->refcnt))
 		n = NULL;
 	rcu_read_unlock_bh();
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 5ebf694..9a66cfc9 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -17,6 +17,7 @@
  */
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/rcupdate.h>
@@ -137,7 +138,7 @@ struct neighbour {
 	unsigned long		confirmed;
 	unsigned long		updated;
 	rwlock_t		lock;
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	struct sk_buff_head	arp_queue;
 	unsigned int		arp_queue_len_bytes;
 	struct timer_list	timer;
@@ -408,18 +409,18 @@ static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
 
 static inline void neigh_release(struct neighbour *neigh)
 {
-	if (atomic_dec_and_test(&neigh->refcnt))
+	if (refcount_dec_and_test(&neigh->refcnt))
 		neigh_destroy(neigh);
 }
 
 static inline struct neighbour * neigh_clone(struct neighbour *neigh)
 {
 	if (neigh)
-		atomic_inc(&neigh->refcnt);
+		refcount_inc(&neigh->refcnt);
 	return neigh;
 }
 
-#define neigh_hold(n)	atomic_inc(&(n)->refcnt)
+#define neigh_hold(n)	refcount_inc(&(n)->refcnt)
 
 static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 {
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 53b4ac0..33e0940 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -137,11 +137,11 @@ static int neigh_check_cb(struct neighbour *n)
 	if (entry->vccs || time_before(jiffies, entry->expires))
 		return 0;
 
-	if (atomic_read(&n->refcnt) > 1) {
+	if (refcount_read(&n->refcnt) > 1) {
 		struct sk_buff *skb;
 
 		pr_debug("destruction postponed with ref %d\n",
-			 atomic_read(&n->refcnt));
+			 refcount_read(&n->refcnt));
 
 		while ((skb = skb_dequeue(&n->arp_queue)) != NULL)
 			dev_kfree_skb(skb);
@@ -767,7 +767,7 @@ static void atmarp_info(struct seq_file *seq, struct neighbour *n,
 			seq_printf(seq, "(resolving)\n");
 		else
 			seq_printf(seq, "(expired, ref %d)\n",
-				   atomic_read(&entry->neigh->refcnt));
+				   refcount_read(&entry->neigh->refcnt));
 	} else if (!svc) {
 		seq_printf(seq, "%d.%d.%d\n",
 			   clip_vcc->vcc->dev->number,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e7c12ca..36f8008 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -140,7 +140,7 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 			 * - it is not permanent
 			 */
 			write_lock(&n->lock);
-			if (atomic_read(&n->refcnt) == 1 &&
+			if (refcount_read(&n->refcnt) == 1 &&
 			    !(n->nud_state & NUD_PERMANENT)) {
 				rcu_assign_pointer(*np,
 					rcu_dereference_protected(n->next,
@@ -218,7 +218,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 			neigh_del_timer(n);
 			n->dead = 1;
 
-			if (atomic_read(&n->refcnt) != 1) {
+			if (refcount_read(&n->refcnt) != 1) {
 				/* The most unpleasant situation.
 				   We must destroy neighbour entry,
 				   but someone still uses it.
@@ -299,7 +299,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
 
 	NEIGH_CACHE_STAT_INC(tbl, allocs);
 	n->tbl		  = tbl;
-	atomic_set(&n->refcnt, 1);
+	refcount_set(&n->refcnt, 1);
 	n->dead		  = 1;
 out:
 	return n;
@@ -408,7 +408,7 @@ struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey,
 	rcu_read_lock_bh();
 	n = __neigh_lookup_noref(tbl, pkey, dev);
 	if (n) {
-		if (!atomic_inc_not_zero(&n->refcnt))
+		if (!refcount_inc_not_zero(&n->refcnt))
 			n = NULL;
 		NEIGH_CACHE_STAT_INC(tbl, hits);
 	}
@@ -437,7 +437,7 @@ struct neighbour *neigh_lookup_nodev(struct neigh_table *tbl, struct net *net,
 	     n = rcu_dereference_bh(n->next)) {
 		if (!memcmp(n->primary_key, pkey, key_len) &&
 		    net_eq(dev_net(n->dev), net)) {
-			if (!atomic_inc_not_zero(&n->refcnt))
+			if (!refcount_inc_not_zero(&n->refcnt))
 				n = NULL;
 			NEIGH_CACHE_STAT_INC(tbl, hits);
 			break;
@@ -785,7 +785,7 @@ static void neigh_periodic_work(struct work_struct *work)
 			if (time_before(n->used, n->confirmed))
 				n->used = n->confirmed;
 
-			if (atomic_read(&n->refcnt) == 1 &&
+			if (refcount_read(&n->refcnt) == 1 &&
 			    (state == NUD_FAILED ||
 			     time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
 				*np = n->next;
@@ -2183,7 +2183,7 @@ static int neigh_fill_info(struct sk_buff *skb, struct neighbour *neigh,
 	ci.ndm_used	 = jiffies_to_clock_t(now - neigh->used);
 	ci.ndm_confirmed = jiffies_to_clock_t(now - neigh->confirmed);
 	ci.ndm_updated	 = jiffies_to_clock_t(now - neigh->updated);
-	ci.ndm_refcnt	 = atomic_read(&neigh->refcnt) - 1;
+	ci.ndm_refcnt	 = refcount_read(&neigh->refcnt) - 1;
 	read_unlock_bh(&neigh->lock);
 
 	if (nla_put_u32(skb, NDA_PROBES, atomic_read(&neigh->probes)) ||
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 482730c..d8f7b6d 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -559,7 +559,7 @@ static inline void dn_neigh_format_entry(struct seq_file *seq,
 		   (dn->flags&DN_NDFLAG_R2) ? "2" : "-",
 		   (dn->flags&DN_NDFLAG_P3) ? "3" : "-",
 		   dn->n.nud_state,
-		   atomic_read(&dn->n.refcnt),
+		   refcount_read(&dn->n.refcnt),
 		   dn->blksize,
 		   (dn->n.dev) ? dn->n.dev->name : "?");
 	read_unlock(&n->lock);
-- 
2.7.4


* [PATCH 02/17] net: convert neigh_params.refcnt from atomic_t to refcount_t
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be used
instead of atomic_t when the variable is used as a
reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
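
Note that the two put flavours in this patch map onto different refcount_t
calls: __neigh_parms_put() only drops a reference and becomes refcount_dec(),
while neigh_parms_put() frees the object on the last drop and becomes
refcount_dec_and_test(). Unlike atomic_dec(), refcount_dec() is expected
never to release the final reference and warns if the count does hit zero.
A condensed sketch (simplified from the hunks below, not verbatim):

	static inline void __neigh_parms_put(struct neigh_parms *parms)
	{
		refcount_dec(&parms->refcnt);	/* never the final reference here */
	}

	static inline void neigh_parms_put(struct neigh_parms *parms)
	{
		if (refcount_dec_and_test(&parms->refcnt))	/* last ref frees */
			neigh_parms_destroy(parms);
	}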

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/neighbour.h | 6 +++---
 net/core/neighbour.c    | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 9a66cfc9..d2eab49 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -77,7 +77,7 @@ struct neigh_parms {
 	void	*sysctl_table;
 
 	int dead;
-	atomic_t refcnt;
+	refcount_t refcnt;
 	struct rcu_head rcu_head;
 
 	int	reachable_time;
@@ -394,12 +394,12 @@ void neigh_sysctl_unregister(struct neigh_parms *p);
 
 static inline void __neigh_parms_put(struct neigh_parms *parms)
 {
-	atomic_dec(&parms->refcnt);
+	refcount_dec(&parms->refcnt);
 }
 
 static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
 {
-	atomic_inc(&parms->refcnt);
+	refcount_inc(&parms->refcnt);
 	return parms;
 }
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 36f8008..9b38acd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -673,7 +673,7 @@ static void neigh_parms_destroy(struct neigh_parms *parms);
 
 static inline void neigh_parms_put(struct neigh_parms *parms)
 {
-	if (atomic_dec_and_test(&parms->refcnt))
+	if (refcount_dec_and_test(&parms->refcnt))
 		neigh_parms_destroy(parms);
 }
 
@@ -1436,7 +1436,7 @@ struct neigh_parms *neigh_parms_alloc(struct net_device *dev,
 	p = kmemdup(&tbl->parms, sizeof(*p), GFP_KERNEL);
 	if (p) {
 		p->tbl		  = tbl;
-		atomic_set(&p->refcnt, 1);
+		refcount_set(&p->refcnt, 1);
 		p->reachable_time =
 				neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
 		dev_hold(dev);
@@ -1499,7 +1499,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
 	INIT_LIST_HEAD(&tbl->parms_list);
 	list_add(&tbl->parms.list, &tbl->parms_list);
 	write_pnet(&tbl->parms.net, &init_net);
-	atomic_set(&tbl->parms.refcnt, 1);
+	refcount_set(&tbl->parms.refcnt, 1);
 	tbl->parms.reachable_time =
 			  neigh_rand_reach_time(NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME));
 
@@ -1746,7 +1746,7 @@ static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms)
 
 	if ((parms->dev &&
 	     nla_put_u32(skb, NDTPA_IFINDEX, parms->dev->ifindex)) ||
-	    nla_put_u32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt)) ||
+	    nla_put_u32(skb, NDTPA_REFCNT, refcount_read(&parms->refcnt)) ||
 	    nla_put_u32(skb, NDTPA_QUEUE_LENBYTES,
 			NEIGH_VAR(parms, QUEUE_LEN_BYTES)) ||
 	    /* approximative value for deprecated QUEUE_LEN (in packets) */
-- 
2.7.4


* [PATCH 03/17] net: convert nf_bridge_info.use from atomic_t to refcount_t
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be used
instead of atomic_t when the variable is used as a
reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/skbuff.h               | 6 +++---
 include/net/netfilter/br_netfilter.h | 2 +-
 net/bridge/br_netfilter_hooks.c      | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4bcb75f..957a2b0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -250,7 +250,7 @@ struct nf_conntrack {
 
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 struct nf_bridge_info {
-	atomic_t		use;
+	refcount_t		use;
 	enum {
 		BRNF_PROTO_UNCHANGED,
 		BRNF_PROTO_8021Q,
@@ -3589,13 +3589,13 @@ static inline void nf_conntrack_get(struct nf_conntrack *nfct)
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
 {
-	if (nf_bridge && atomic_dec_and_test(&nf_bridge->use))
+	if (nf_bridge && refcount_dec_and_test(&nf_bridge->use))
 		kfree(nf_bridge);
 }
 static inline void nf_bridge_get(struct nf_bridge_info *nf_bridge)
 {
 	if (nf_bridge)
-		atomic_inc(&nf_bridge->use);
+		refcount_inc(&nf_bridge->use);
 }
 #endif /* CONFIG_BRIDGE_NETFILTER */
 static inline void nf_reset(struct sk_buff *skb)
diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
index 0b0c35c..925524e 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -8,7 +8,7 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
 	skb->nf_bridge = kzalloc(sizeof(struct nf_bridge_info), GFP_ATOMIC);
 
 	if (likely(skb->nf_bridge))
-		atomic_set(&(skb->nf_bridge->use), 1);
+		refcount_set(&(skb->nf_bridge->use), 1);
 
 	return skb->nf_bridge;
 }
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 52739e6..ac04135 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -149,12 +149,12 @@ static inline struct nf_bridge_info *nf_bridge_unshare(struct sk_buff *skb)
 {
 	struct nf_bridge_info *nf_bridge = skb->nf_bridge;
 
-	if (atomic_read(&nf_bridge->use) > 1) {
+	if (refcount_read(&nf_bridge->use) > 1) {
 		struct nf_bridge_info *tmp = nf_bridge_alloc(skb);
 
 		if (tmp) {
 			memcpy(tmp, nf_bridge, sizeof(struct nf_bridge_info));
-			atomic_set(&tmp->use, 1);
+			refcount_set(&tmp->use, 1);
 		}
 		nf_bridge_put(nf_bridge);
 		nf_bridge = tmp;
-- 
2.7.4


* [PATCH 04/17] net: convert sk_buff.users from atomic_t to refcount_t
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be used
instead of atomic_t when the variable is used as a
reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
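
Most of this patch is a mechanical 1:1 replacement, with one exception worth
calling out: refcount_t deliberately provides no plain subtraction, so the
pktgen paths that used atomic_sub() now go through refcount_sub_and_test()
wrapped in WARN_ON(), asserting that dropping the unused batch references
never releases the last one (the caller still holds its own). A condensed
sketch of that pattern (hypothetical helpers, simplified from the
net/core/pktgen.c hunks below, not verbatim):

	/* Simplified sketch; the real code operates on pkt_dev->skb in pktgen. */
	static void my_take_burst_refs(struct sk_buff *skb, int burst)
	{
		refcount_add(burst, &skb->users);	/* one ref per queued transmit */
	}

	static void my_drop_unused_refs(struct sk_buff *skb, int unused)
	{
		/* Must not drop the final reference here; WARN if it would. */
		WARN_ON(refcount_sub_and_test(unused, &skb->users));
	}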

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 drivers/infiniband/hw/nes/nes_cm.c |  4 ++--
 drivers/isdn/mISDN/socket.c        |  2 +-
 drivers/net/rionet.c               |  2 +-
 drivers/s390/net/ctcm_main.c       | 26 +++++++++++++-------------
 drivers/s390/net/netiucv.c         | 10 +++++-----
 drivers/s390/net/qeth_core_main.c  |  4 ++--
 drivers/scsi/cxgbi/libcxgbi.h      |  2 +-
 include/linux/skbuff.h             |  6 +++---
 net/core/datagram.c                |  8 ++++----
 net/core/dev.c                     | 10 +++++-----
 net/core/netpoll.c                 |  4 ++--
 net/core/pktgen.c                  | 16 ++++++++--------
 net/core/rtnetlink.c               |  2 +-
 net/core/skbuff.c                  | 20 ++++++++++----------
 net/dccp/ipv6.c                    |  2 +-
 net/ipv6/syncookies.c              |  2 +-
 net/ipv6/tcp_ipv6.c                |  2 +-
 net/key/af_key.c                   |  4 ++--
 net/netlink/af_netlink.c           |  6 +++---
 net/rxrpc/skbuff.c                 | 12 ++++++------
 net/sctp/outqueue.c                |  2 +-
 net/sctp/socket.c                  |  2 +-
 22 files changed, 74 insertions(+), 74 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index fb983df..08dac2e 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -743,7 +743,7 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 
 	if (type == NES_TIMER_TYPE_SEND) {
 		new_send->seq_num = ntohl(tcp_hdr(skb)->seq);
-		atomic_inc(&new_send->skb->users);
+		refcount_inc(&new_send->skb->users);
 		spin_lock_irqsave(&cm_node->retrans_list_lock, flags);
 		cm_node->send_entry = new_send;
 		add_ref_cm_node(cm_node);
@@ -925,7 +925,7 @@ static void nes_cm_timer_tick(unsigned long pass)
 						  flags);
 				break;
 			}
-			atomic_inc(&send_entry->skb->users);
+			refcount_inc(&send_entry->skb->users);
 			cm_packets_retrans++;
 			nes_debug(NES_DBG_CM, "Retransmitting send_entry %p "
 				  "for node %p, jiffies = %lu, time to send = "
diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 99e5f97..c5603d1 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -155,7 +155,7 @@ mISDN_sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	copied = skb->len + MISDN_HEADER_LEN;
 	if (len < copied) {
 		if (flags & MSG_PEEK)
-			atomic_dec(&skb->users);
+			refcount_dec(&skb->users);
 		else
 			skb_queue_head(&sk->sk_receive_queue, skb);
 		return -ENOSPC;
diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c
index 300bb14..e9f101c 100644
--- a/drivers/net/rionet.c
+++ b/drivers/net/rionet.c
@@ -201,7 +201,7 @@ static int rionet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 				rionet_queue_tx_msg(skb, ndev,
 					nets[rnet->mport->id].active[i]);
 				if (count)
-					atomic_inc(&skb->users);
+					refcount_inc(&skb->users);
 				count++;
 			}
 	} else if (RIONET_MAC_MATCH(eth->h_dest)) {
diff --git a/drivers/s390/net/ctcm_main.c b/drivers/s390/net/ctcm_main.c
index ac65f12..d079d02 100644
--- a/drivers/s390/net/ctcm_main.c
+++ b/drivers/s390/net/ctcm_main.c
@@ -483,7 +483,7 @@ static int ctcm_transmit_skb(struct channel *ch, struct sk_buff *skb)
 			spin_unlock_irqrestore(&ch->collect_lock, saveflags);
 			return -EBUSY;
 		} else {
-			atomic_inc(&skb->users);
+			refcount_inc(&skb->users);
 			header.length = l;
 			header.type = skb->protocol;
 			header.unused = 0;
@@ -500,7 +500,7 @@ static int ctcm_transmit_skb(struct channel *ch, struct sk_buff *skb)
 	 * Protect skb against beeing free'd by upper
 	 * layers.
 	 */
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 	ch->prof.txlen += skb->len;
 	header.length = skb->len + LL_HEADER_LENGTH;
 	header.type = skb->protocol;
@@ -517,14 +517,14 @@ static int ctcm_transmit_skb(struct channel *ch, struct sk_buff *skb)
 	if (hi) {
 		nskb = alloc_skb(skb->len, GFP_ATOMIC | GFP_DMA);
 		if (!nskb) {
-			atomic_dec(&skb->users);
+			refcount_dec(&skb->users);
 			skb_pull(skb, LL_HEADER_LENGTH + 2);
 			ctcm_clear_busy(ch->netdev);
 			return -ENOMEM;
 		} else {
 			memcpy(skb_put(nskb, skb->len), skb->data, skb->len);
-			atomic_inc(&nskb->users);
-			atomic_dec(&skb->users);
+			refcount_inc(&nskb->users);
+			refcount_dec(&skb->users);
 			dev_kfree_skb_irq(skb);
 			skb = nskb;
 		}
@@ -542,7 +542,7 @@ static int ctcm_transmit_skb(struct channel *ch, struct sk_buff *skb)
 			 * Remove our header. It gets added
 			 * again on retransmit.
 			 */
-			atomic_dec(&skb->users);
+			refcount_dec(&skb->users);
 			skb_pull(skb, LL_HEADER_LENGTH + 2);
 			ctcm_clear_busy(ch->netdev);
 			return -ENOMEM;
@@ -553,7 +553,7 @@ static int ctcm_transmit_skb(struct channel *ch, struct sk_buff *skb)
 		ch->ccw[1].count = skb->len;
 		skb_copy_from_linear_data(skb,
 				skb_put(ch->trans_skb, skb->len), skb->len);
-		atomic_dec(&skb->users);
+		refcount_dec(&skb->users);
 		dev_kfree_skb_irq(skb);
 		ccw_idx = 0;
 	} else {
@@ -679,7 +679,7 @@ static int ctcmpc_transmit_skb(struct channel *ch, struct sk_buff *skb)
 
 	if ((fsm_getstate(ch->fsm) != CTC_STATE_TXIDLE) || grp->in_sweep) {
 		spin_lock_irqsave(&ch->collect_lock, saveflags);
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 		p_header = kmalloc(PDU_HEADER_LENGTH, gfp_type());
 
 		if (!p_header) {
@@ -716,7 +716,7 @@ static int ctcmpc_transmit_skb(struct channel *ch, struct sk_buff *skb)
 	 * Protect skb against beeing free'd by upper
 	 * layers.
 	 */
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 
 	/*
 	 * IDAL support in CTCM is broken, so we have to
@@ -729,8 +729,8 @@ static int ctcmpc_transmit_skb(struct channel *ch, struct sk_buff *skb)
 			goto nomem_exit;
 		} else {
 			memcpy(skb_put(nskb, skb->len), skb->data, skb->len);
-			atomic_inc(&nskb->users);
-			atomic_dec(&skb->users);
+			refcount_inc(&nskb->users);
+			refcount_dec(&skb->users);
 			dev_kfree_skb_irq(skb);
 			skb = nskb;
 		}
@@ -810,7 +810,7 @@ static int ctcmpc_transmit_skb(struct channel *ch, struct sk_buff *skb)
 		ch->trans_skb->len = 0;
 		ch->ccw[1].count = skb->len;
 		memcpy(skb_put(ch->trans_skb, skb->len), skb->data, skb->len);
-		atomic_dec(&skb->users);
+		refcount_dec(&skb->users);
 		dev_kfree_skb_irq(skb);
 		ccw_idx = 0;
 		CTCM_PR_DBGDATA("%s(%s): trans_skb len: %04x\n"
@@ -855,7 +855,7 @@ static int ctcmpc_transmit_skb(struct channel *ch, struct sk_buff *skb)
 			"%s(%s): MEMORY allocation ERROR\n",
 			CTCM_FUNTAIL, ch->id);
 	rc = -ENOMEM;
-	atomic_dec(&skb->users);
+	refcount_dec(&skb->users);
 	dev_kfree_skb_any(skb);
 	fsm_event(priv->mpcg->fsm, MPCG_EVENT_INOP, dev);
 done:
diff --git a/drivers/s390/net/netiucv.c b/drivers/s390/net/netiucv.c
index 3f85b97..44fd71c 100644
--- a/drivers/s390/net/netiucv.c
+++ b/drivers/s390/net/netiucv.c
@@ -743,7 +743,7 @@ static void conn_action_txdone(fsm_instance *fi, int event, void *arg)
 	conn->prof.tx_pending--;
 	if (single_flag) {
 		if ((skb = skb_dequeue(&conn->commit_queue))) {
-			atomic_dec(&skb->users);
+			refcount_dec(&skb->users);
 			if (privptr) {
 				privptr->stats.tx_packets++;
 				privptr->stats.tx_bytes +=
@@ -767,7 +767,7 @@ static void conn_action_txdone(fsm_instance *fi, int event, void *arg)
 		txbytes += skb->len;
 		txpackets++;
 		stat_maxcq++;
-		atomic_dec(&skb->users);
+		refcount_dec(&skb->users);
 		dev_kfree_skb_any(skb);
 	}
 	if (conn->collect_len > conn->prof.maxmulti)
@@ -959,7 +959,7 @@ static void netiucv_purge_skb_queue(struct sk_buff_head *q)
 	struct sk_buff *skb;
 
 	while ((skb = skb_dequeue(q))) {
-		atomic_dec(&skb->users);
+		refcount_dec(&skb->users);
 		dev_kfree_skb_any(skb);
 	}
 }
@@ -1177,7 +1177,7 @@ static int netiucv_transmit_skb(struct iucv_connection *conn,
 			IUCV_DBF_TEXT(data, 2,
 				      "EBUSY from netiucv_transmit_skb\n");
 		} else {
-			atomic_inc(&skb->users);
+			refcount_inc(&skb->users);
 			skb_queue_tail(&conn->collect_queue, skb);
 			conn->collect_len += l;
 			rc = 0;
@@ -1247,7 +1247,7 @@ static int netiucv_transmit_skb(struct iucv_connection *conn,
 		} else {
 			if (copied)
 				dev_kfree_skb(skb);
-			atomic_inc(&nskb->users);
+			refcount_inc(&nskb->users);
 			skb_queue_tail(&conn->commit_queue, nskb);
 		}
 	}
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 315d8a2..5103a02 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -1239,7 +1239,7 @@ static void qeth_release_skbs(struct qeth_qdio_out_buffer *buf)
 				iucv->sk_txnotify(skb, TX_NOTIFY_GENERALERROR);
 			}
 		}
-		atomic_dec(&skb->users);
+		refcount_dec(&skb->users);
 		dev_kfree_skb_any(skb);
 		skb = skb_dequeue(&buf->skb_list);
 	}
@@ -3969,7 +3969,7 @@ static inline int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
 	int flush_cnt = 0, hdr_len, large_send = 0;
 
 	buffer = buf->buffer;
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 	skb_queue_tail(&buf->skb_list, skb);
 
 	/*check first on TSO ....*/
diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index 18e0ea8..9584b062 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -378,7 +378,7 @@ static inline void cxgbi_sock_enqueue_wr(struct cxgbi_sock *csk,
 	 * just one user currently so we use atomic_set rather than skb_get
 	 * to avoid the atomic op.
 	 */
-	atomic_set(&skb->users, 2);
+	refcount_set(&skb->users, 2);
 
 	if (!csk->wr_pending_head)
 		csk->wr_pending_head = skb;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 957a2b0..0bcaec4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -815,7 +815,7 @@ struct sk_buff {
 	unsigned char		*head,
 				*data;
 	unsigned int		truesize;
-	atomic_t		users;
+	refcount_t		users;
 };
 
 #ifdef __KERNEL__
@@ -1313,7 +1313,7 @@ static inline struct sk_buff *skb_queue_prev(const struct sk_buff_head *list,
  */
 static inline struct sk_buff *skb_get(struct sk_buff *skb)
 {
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 	return skb;
 }
 
@@ -1414,7 +1414,7 @@ static inline void __skb_header_release(struct sk_buff *skb)
  */
 static inline int skb_shared(const struct sk_buff *skb)
 {
-	return atomic_read(&skb->users) != 1;
+	return refcount_read(&skb->users) != 1;
 }
 
 /**
diff --git a/net/core/datagram.c b/net/core/datagram.c
index ea63334..281e5d6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -244,7 +244,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
 					}
 				}
 				*peeked = 1;
-				atomic_inc(&skb->users);
+				refcount_inc(&skb->users);
 			} else {
 				__skb_unlink(skb, queue);
 				if (destructor)
@@ -313,9 +313,9 @@ void __skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb, int len)
 {
 	bool slow;
 
-	if (likely(atomic_read(&skb->users) == 1))
+	if (likely(refcount_read(&skb->users) == 1))
 		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users))) {
+	else if (likely(!refcount_dec_and_test(&skb->users))) {
 		sk_peek_offset_bwd(sk, len);
 		return;
 	}
@@ -343,7 +343,7 @@ int __sk_queue_drop_skb(struct sock *sk, struct sk_buff *skb,
 		spin_lock_bh(&sk->sk_receive_queue.lock);
 		if (skb == skb_peek(&sk->sk_receive_queue)) {
 			__skb_unlink(skb, &sk->sk_receive_queue);
-			atomic_dec(&skb->users);
+			refcount_dec(&skb->users);
 			if (destructor)
 				destructor(sk, skb);
 			err = 0;
diff --git a/net/core/dev.c b/net/core/dev.c
index d947308..eeb6338 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1830,7 +1830,7 @@ static inline int deliver_skb(struct sk_buff *skb,
 {
 	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
 		return -ENOMEM;
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
@@ -2449,10 +2449,10 @@ void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
 {
 	unsigned long flags;
 
-	if (likely(atomic_read(&skb->users) == 1)) {
+	if (likely(refcount_read(&skb->users) == 1)) {
 		smp_rmb();
-		atomic_set(&skb->users, 0);
-	} else if (likely(!atomic_dec_and_test(&skb->users))) {
+		refcount_set(&skb->users, 0);
+	} else if (likely(!refcount_dec_and_test(&skb->users))) {
 		return;
 	}
 	get_kfree_skb_cb(skb)->reason = reason;
@@ -3864,7 +3864,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 
 			clist = clist->next;
 
-			WARN_ON(atomic_read(&skb->users));
+			WARN_ON(refcount_read(&skb->users));
 			if (likely(get_kfree_skb_cb(skb)->reason == SKB_REASON_CONSUMED))
 				trace_consume_skb(skb);
 			else
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673..891d88e 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -271,7 +271,7 @@ static void zap_completion_queue(void)
 			struct sk_buff *skb = clist;
 			clist = clist->next;
 			if (!skb_irq_freeable(skb)) {
-				atomic_inc(&skb->users);
+				refcount_inc(&skb->users);
 				dev_kfree_skb_any(skb); /* put this one back */
 			} else {
 				__kfree_skb(skb);
@@ -303,7 +303,7 @@ static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
 		return NULL;
 	}
 
-	atomic_set(&skb->users, 1);
+	refcount_set(&skb->users, 1);
 	skb_reserve(skb, reserve);
 	return skb;
 }
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 96947f5..598355f 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3361,7 +3361,7 @@ static void pktgen_wait_for_skb(struct pktgen_dev *pkt_dev)
 {
 	ktime_t idle_start = ktime_get();
 
-	while (atomic_read(&(pkt_dev->skb->users)) != 1) {
+	while (refcount_read(&(pkt_dev->skb->users)) != 1) {
 		if (signal_pending(current))
 			break;
 
@@ -3418,7 +3418,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 	if (pkt_dev->xmit_mode == M_NETIF_RECEIVE) {
 		skb = pkt_dev->skb;
 		skb->protocol = eth_type_trans(skb, skb->dev);
-		atomic_add(burst, &skb->users);
+		refcount_add(burst, &skb->users);
 		local_bh_disable();
 		do {
 			ret = netif_receive_skb(skb);
@@ -3426,11 +3426,11 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 				pkt_dev->errors++;
 			pkt_dev->sofar++;
 			pkt_dev->seq_num++;
-			if (atomic_read(&skb->users) != burst) {
+			if (refcount_read(&skb->users) != burst) {
 				/* skb was queued by rps/rfs or taps,
 				 * so cannot reuse this skb
 				 */
-				atomic_sub(burst - 1, &skb->users);
+				WARN_ON(refcount_sub_and_test(burst - 1, &skb->users));
 				/* get out of the loop and wait
 				 * until skb is consumed
 				 */
@@ -3444,7 +3444,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		goto out; /* Skips xmit_mode M_START_XMIT */
 	} else if (pkt_dev->xmit_mode == M_QUEUE_XMIT) {
 		local_bh_disable();
-		atomic_inc(&pkt_dev->skb->users);
+		refcount_inc(&pkt_dev->skb->users);
 
 		ret = dev_queue_xmit(pkt_dev->skb);
 		switch (ret) {
@@ -3485,7 +3485,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		pkt_dev->last_ok = 0;
 		goto unlock;
 	}
-	atomic_add(burst, &pkt_dev->skb->users);
+	refcount_add(burst, &pkt_dev->skb->users);
 
 xmit_more:
 	ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);
@@ -3511,11 +3511,11 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 		/* fallthru */
 	case NETDEV_TX_BUSY:
 		/* Retry it next time */
-		atomic_dec(&(pkt_dev->skb->users));
+		refcount_dec(&(pkt_dev->skb->users));
 		pkt_dev->last_ok = 0;
 	}
 	if (unlikely(burst))
-		atomic_sub(burst, &pkt_dev->skb->users);
+		WARN_ON(refcount_sub_and_test(burst, &pkt_dev->skb->users));
 unlock:
 	HARD_TX_UNLOCK(odev, txq);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c4e84c5..8044cf3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -647,7 +647,7 @@ int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, unsigned int g
 
 	NETLINK_CB(skb).dst_group = group;
 	if (echo)
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 	netlink_broadcast(rtnl, skb, pid, group, GFP_KERNEL);
 	if (echo)
 		err = netlink_unicast(rtnl, skb, pid, MSG_DONTWAIT);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 89f0d4e..94e961f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -176,7 +176,7 @@ struct sk_buff *__alloc_skb_head(gfp_t gfp_mask, int node)
 	memset(skb, 0, offsetof(struct sk_buff, tail));
 	skb->head = NULL;
 	skb->truesize = sizeof(struct sk_buff);
-	atomic_set(&skb->users, 1);
+	refcount_set(&skb->users, 1);
 
 	skb->mac_header = (typeof(skb->mac_header))~0U;
 out:
@@ -247,7 +247,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	/* Account for allocated memory : skb + skb->head */
 	skb->truesize = SKB_TRUESIZE(size);
 	skb->pfmemalloc = pfmemalloc;
-	atomic_set(&skb->users, 1);
+	refcount_set(&skb->users, 1);
 	skb->head = data;
 	skb->data = data;
 	skb_reset_tail_pointer(skb);
@@ -314,7 +314,7 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
 
 	memset(skb, 0, offsetof(struct sk_buff, tail));
 	skb->truesize = SKB_TRUESIZE(size);
-	atomic_set(&skb->users, 1);
+	refcount_set(&skb->users, 1);
 	skb->head = data;
 	skb->data = data;
 	skb_reset_tail_pointer(skb);
@@ -696,9 +696,9 @@ void kfree_skb(struct sk_buff *skb)
 {
 	if (unlikely(!skb))
 		return;
-	if (likely(atomic_read(&skb->users) == 1))
+	if (likely(refcount_read(&skb->users) == 1))
 		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	else if (likely(!refcount_dec_and_test(&skb->users)))
 		return;
 	trace_kfree_skb(skb, __builtin_return_address(0));
 	__kfree_skb(skb);
@@ -748,9 +748,9 @@ void consume_skb(struct sk_buff *skb)
 {
 	if (unlikely(!skb))
 		return;
-	if (likely(atomic_read(&skb->users) == 1))
+	if (likely(refcount_read(&skb->users) == 1))
 		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	else if (likely(!refcount_dec_and_test(&skb->users)))
 		return;
 	trace_consume_skb(skb);
 	__kfree_skb(skb);
@@ -807,9 +807,9 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	if (likely(atomic_read(&skb->users) == 1))
+	if (likely(refcount_read(&skb->users) == 1))
 		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
+	else if (likely(!refcount_dec_and_test(&skb->users)))
 		return;
 	/* if reaching here SKB is ready to free */
 	trace_consume_skb(skb);
@@ -906,7 +906,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	C(head_frag);
 	C(data);
 	C(truesize);
-	atomic_set(&n->users, 1);
+	refcount_set(&n->users, 1);
 
 	atomic_inc(&(skb_shinfo(skb)->dataref));
 	skb->cloned = 1;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index b4019a5..c734eea 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -351,7 +351,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (ipv6_opt_accepted(sk, skb, IP6CB(skb)) ||
 	    np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
 	    np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 		ireq->pktopts = skb;
 	}
 	ireq->ir_iif = sk->sk_bound_dev_if;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 895ff65..2626a1d 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -185,7 +185,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	if (ipv6_opt_accepted(sk, skb, &TCP_SKB_CB(skb)->header.h6) ||
 	    np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo ||
 	    np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) {
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 		ireq->pktopts = skb;
 	}
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index bdbc432..21bb2fc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -718,7 +718,7 @@ static void tcp_v6_init_req(struct request_sock *req,
 	     np->rxopt.bits.rxinfo ||
 	     np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim ||
 	     np->rxopt.bits.rxohlim || np->repflow)) {
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 		ireq->pktopts = skb;
 	}
 }
diff --git a/net/key/af_key.c b/net/key/af_key.c
index c6252ed..9d96b82 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -194,11 +194,11 @@ static int pfkey_broadcast_one(struct sk_buff *skb, struct sk_buff **skb2,
 
 	sock_hold(sk);
 	if (*skb2 == NULL) {
-		if (atomic_read(&skb->users) != 1) {
+		if (refcount_read(&skb->users) != 1) {
 			*skb2 = skb_clone(skb, allocation);
 		} else {
 			*skb2 = skb;
-			atomic_inc(&skb->users);
+			refcount_inc(&skb->users);
 		}
 	}
 	if (*skb2 != NULL) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7b73c7c..7dac4a9 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1797,7 +1797,7 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 	}
 
 	if (dst_group) {
-		atomic_inc(&skb->users);
+		refcount_inc(&skb->users);
 		netlink_broadcast(sk, skb, dst_portid, dst_group, GFP_KERNEL);
 	}
 	err = netlink_unicast(sk, skb, dst_portid, msg->msg_flags&MSG_DONTWAIT);
@@ -2175,7 +2175,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	struct netlink_sock *nlk;
 	int ret;
 
-	atomic_inc(&skb->users);
+	refcount_inc(&skb->users);
 
 	sk = netlink_lookup(sock_net(ssk), ssk->sk_protocol, NETLINK_CB(skb).portid);
 	if (sk == NULL) {
@@ -2332,7 +2332,7 @@ int nlmsg_notify(struct sock *sk, struct sk_buff *skb, u32 portid,
 		int exclude_portid = 0;
 
 		if (report) {
-			atomic_inc(&skb->users);
+			refcount_inc(&skb->users);
 			exclude_portid = portid;
 		}
 
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 67b02c4..b8985d0 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -27,7 +27,7 @@ void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
 {
 	const void *here = __builtin_return_address(0);
 	int n = atomic_inc_return(select_skb_count(op));
-	trace_rxrpc_skb(skb, op, atomic_read(&skb->users), n, here);
+	trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
 }
 
 /*
@@ -38,7 +38,7 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
 	const void *here = __builtin_return_address(0);
 	if (skb) {
 		int n = atomic_read(select_skb_count(op));
-		trace_rxrpc_skb(skb, op, atomic_read(&skb->users), n, here);
+		trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
 	}
 }
 
@@ -49,7 +49,7 @@ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
 {
 	const void *here = __builtin_return_address(0);
 	int n = atomic_inc_return(select_skb_count(op));
-	trace_rxrpc_skb(skb, op, atomic_read(&skb->users), n, here);
+	trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
 	skb_get(skb);
 }
 
@@ -63,7 +63,7 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
 		int n;
 		CHECK_SLAB_OKAY(&skb->users);
 		n = atomic_dec_return(select_skb_count(op));
-		trace_rxrpc_skb(skb, op, atomic_read(&skb->users), n, here);
+		trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
 		kfree_skb(skb);
 	}
 }
@@ -78,7 +78,7 @@ void rxrpc_lose_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
 		int n;
 		CHECK_SLAB_OKAY(&skb->users);
 		n = atomic_dec_return(select_skb_count(op));
-		trace_rxrpc_skb(skb, op, atomic_read(&skb->users), n, here);
+		trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
 		kfree_skb(skb);
 	}
 }
@@ -93,7 +93,7 @@ void rxrpc_purge_queue(struct sk_buff_head *list)
 	while ((skb = skb_dequeue((list))) != NULL) {
 		int n = atomic_dec_return(select_skb_count(rxrpc_skb_rx_purged));
 		trace_rxrpc_skb(skb, rxrpc_skb_rx_purged,
-				atomic_read(&skb->users), n, here);
+				refcount_read(&skb->users), n, here);
 		kfree_skb(skb);
 	}
 }
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index db352e5..1d6dae9 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1094,7 +1094,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 				 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
 				 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
 				 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
-				 atomic_read(&chunk->skb->users) : -1);
+				 refcount_read(&chunk->skb->users) : -1);
 
 			/* Add the chunk to the packet.  */
 			status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 6f0a9be..c1120d5 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7421,7 +7421,7 @@ struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 		if (flags & MSG_PEEK) {
 			skb = skb_peek(&sk->sk_receive_queue);
 			if (skb)
-				atomic_inc(&skb->users);
+				refcount_inc(&skb->users);
 		} else {
 			skb = __skb_dequeue(&sk->sk_receive_queue);
 		}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 05/17] net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:28   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows us to avoid accidental
refcount overflows that might lead to use-after-free
situations.
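
For readers new to the API, a condensed view of the converted
fclone_ref call sites (illustrative only, not part of the diff
below). The refcount_t calls map one-to-one onto the atomic_t
ones they replace, but saturate on overflow and WARN on an
increment from zero instead of silently wrapping:

	/* __alloc_skb(): one reference held by the original skb */
	refcount_set(&fclones->fclone_ref, 1);

	/* skb_clone(): hand out the companion clone, two references now */
	refcount_set(&fclones->fclone_ref, 2);

	/* kfree_skbmem(): free the slab pair only on the last put */
	if (!refcount_dec_and_test(&fclones->fclone_ref))
		return;
	kmem_cache_free(skbuff_fclone_cache, fclones);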

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/skbuff.h |  4 ++--
 net/core/skbuff.c      | 10 +++++-----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0bcaec4..63ce21f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -945,7 +945,7 @@ struct sk_buff_fclones {
 
 	struct sk_buff	skb2;
 
-	atomic_t	fclone_ref;
+	refcount_t	fclone_ref;
 };
 
 /**
@@ -965,7 +965,7 @@ static inline bool skb_fclone_busy(const struct sock *sk,
 	fclones = container_of(skb, struct sk_buff_fclones, skb1);
 
 	return skb->fclone == SKB_FCLONE_ORIG &&
-	       atomic_read(&fclones->fclone_ref) > 1 &&
+	       refcount_read(&fclones->fclone_ref) > 1 &&
 	       fclones->skb2.sk == sk;
 }
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 94e961f..6911269 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -268,7 +268,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 		kmemcheck_annotate_bitfield(&fclones->skb2, flags1);
 		skb->fclone = SKB_FCLONE_ORIG;
-		atomic_set(&fclones->fclone_ref, 1);
+		refcount_set(&fclones->fclone_ref, 1);
 
 		fclones->skb2.fclone = SKB_FCLONE_CLONE;
 	}
@@ -629,7 +629,7 @@ static void kfree_skbmem(struct sk_buff *skb)
 		 * This test would have no chance to be true for the clone,
 		 * while here, branch prediction will be good.
 		 */
-		if (atomic_read(&fclones->fclone_ref) == 1)
+		if (refcount_read(&fclones->fclone_ref) == 1)
 			goto fastpath;
 		break;
 
@@ -637,7 +637,7 @@ static void kfree_skbmem(struct sk_buff *skb)
 		fclones = container_of(skb, struct sk_buff_fclones, skb2);
 		break;
 	}
-	if (!atomic_dec_and_test(&fclones->fclone_ref))
+	if (!refcount_dec_and_test(&fclones->fclone_ref))
 		return;
 fastpath:
 	kmem_cache_free(skbuff_fclone_cache, fclones);
@@ -1018,9 +1018,9 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 		return NULL;
 
 	if (skb->fclone == SKB_FCLONE_ORIG &&
-	    atomic_read(&fclones->fclone_ref) == 1) {
+	    refcount_read(&fclones->fclone_ref) == 1) {
 		n = &fclones->skb2;
-		atomic_set(&fclones->fclone_ref, 2);
+		refcount_set(&fclones->fclone_ref, 2);
 	} else {
 		if (skb_pfmemalloc(skb))
 			gfp_mask |= __GFP_MEMALLOC;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 06/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:28   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows us to avoid accidental
refcount overflows that might lead to use-after-free
situations.
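
Unlike a plain object refcount, sk_wmem_alloc tracks the truesize of
skbs queued for transmit plus one unit held by the socket itself, so
the conversion leans on refcount_add()/refcount_sub_and_test() rather
than bare inc/dec. A condensed view of the converted call sites
(illustrative only, not part of the diff below):

	/* sk_alloc(): the socket itself holds one unit */
	refcount_set(&sk->sk_wmem_alloc, 1);

	/* skb_set_owner_w(): charge the skb's truesize to the socket */
	refcount_add(skb->truesize, &sk->sk_wmem_alloc);

	/* __sock_wfree(): uncharge; dropping the last unit frees the sock */
	if (refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc))
		__sk_free(sk);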

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 drivers/atm/fore200e.c   | 12 +-----------
 drivers/atm/he.c         |  2 +-
 drivers/atm/idt77252.c   |  4 ++--
 include/linux/atmdev.h   |  2 +-
 include/net/sock.h       |  8 ++++----
 net/atm/br2684.c         |  2 +-
 net/atm/clip.c           |  2 +-
 net/atm/common.c         | 10 +++++-----
 net/atm/lec.c            |  4 ++--
 net/atm/mpc.c            |  4 ++--
 net/atm/pppoatm.c        |  2 +-
 net/atm/raw.c            |  2 +-
 net/atm/signaling.c      |  2 +-
 net/caif/caif_socket.c   |  2 +-
 net/core/datagram.c      |  2 +-
 net/core/skbuff.c        |  2 +-
 net/core/sock.c          | 26 +++++++++++++-------------
 net/ipv4/af_inet.c       |  2 +-
 net/ipv4/esp4.c          |  2 +-
 net/ipv4/ip_output.c     |  6 +++---
 net/ipv4/tcp.c           |  4 ++--
 net/ipv4/tcp_offload.c   |  2 +-
 net/ipv4/tcp_output.c    | 13 ++++++-------
 net/ipv6/esp6.c          |  2 +-
 net/ipv6/ip6_output.c    |  4 ++--
 net/kcm/kcmproc.c        |  2 +-
 net/key/af_key.c         |  2 +-
 net/netlink/af_netlink.c |  2 +-
 net/packet/af_packet.c   |  4 ++--
 net/phonet/socket.c      |  2 +-
 net/rds/tcp_send.c       |  2 +-
 net/rxrpc/af_rxrpc.c     |  4 ++--
 net/sched/sch_atm.c      |  2 +-
 net/sctp/output.c        |  2 +-
 net/sctp/proc.c          |  2 +-
 net/sctp/socket.c        |  4 ++--
 net/unix/af_unix.c       |  6 +++---
 37 files changed, 73 insertions(+), 84 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 637c3e6..b770d18 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -924,12 +924,7 @@ fore200e_tx_irq(struct fore200e* fore200e)
 		else {
 		    dev_kfree_skb_any(entry->skb);
 		}
-#if 1
-		/* race fixed by the above incarnation mechanism, but... */
-		if (atomic_read(&sk_atm(vcc)->sk_wmem_alloc) < 0) {
-		    atomic_set(&sk_atm(vcc)->sk_wmem_alloc, 0);
-		}
-#endif
+
 		/* check error condition */
 		if (*entry->status & STATUS_ERROR)
 		    atomic_inc(&vcc->stats->tx_err);
@@ -1130,13 +1125,9 @@ fore200e_push_rpd(struct fore200e* fore200e, struct atm_vcc* vcc, struct rpd* rp
 	return -ENOMEM;
     }
 
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
-
     vcc->push(vcc, skb);
     atomic_inc(&vcc->stats->rx);
 
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
-
     return 0;
 }
 
@@ -1572,7 +1563,6 @@ fore200e_send(struct atm_vcc *vcc, struct sk_buff *skb)
     unsigned long           flags;
 
     ASSERT(vcc);
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
     ASSERT(fore200e);
     ASSERT(fore200e_vcc);
 
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 3617659..fc1bbdb 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2395,7 +2395,7 @@ he_close(struct atm_vcc *vcc)
 		 * TBRQ, the host issues the close command to the adapter.
 		 */
 
-		while (((tx_inuse = atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) > 1) &&
+		while (((tx_inuse = refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) > 1) &&
 		       (retry < MAX_RETRY)) {
 			msleep(sleep);
 			if (sleep < 250)
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index 5ec1095..20eda87 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -724,7 +724,7 @@ push_on_scq(struct idt77252_dev *card, struct vc_map *vc, struct sk_buff *skb)
 		struct sock *sk = sk_atm(vcc);
 
 		vc->estimator->cells += (skb->len + 47) / 48;
-		if (atomic_read(&sk->sk_wmem_alloc) >
+		if (refcount_read(&sk->sk_wmem_alloc) >
 		    (sk->sk_sndbuf >> 1)) {
 			u32 cps = vc->estimator->maxcps;
 
@@ -2012,7 +2012,7 @@ idt77252_send_oam(struct atm_vcc *vcc, void *cell, int flags)
 		atomic_inc(&vcc->stats->tx_err);
 		return -ENOMEM;
 	}
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 
 	memcpy(skb_put(skb, 52), cell, 52);
 
diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
index c1da539..4d97a89 100644
--- a/include/linux/atmdev.h
+++ b/include/linux/atmdev.h
@@ -254,7 +254,7 @@ static inline void atm_return(struct atm_vcc *vcc,int truesize)
 
 static inline int atm_may_send(struct atm_vcc *vcc,unsigned int size)
 {
-	return (size + atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) <
+	return (size + refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) <
 	       sk_atm(vcc)->sk_sndbuf;
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index c519de7..24dcdba 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -388,7 +388,7 @@ struct sock {
 
 	/* ===== cache line for TX ===== */
 	int			sk_wmem_queued;
-	atomic_t		sk_wmem_alloc;
+	refcount_t		sk_wmem_alloc;
 	unsigned long		sk_tsq_flags;
 	struct sk_buff		*sk_send_head;
 	struct sk_buff_head	sk_write_queue;
@@ -1916,7 +1916,7 @@ static inline int skb_copy_to_page_nocache(struct sock *sk, struct iov_iter *fro
  */
 static inline int sk_wmem_alloc_get(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) - 1;
+	return refcount_read(&sk->sk_wmem_alloc) - 1;
 }
 
 /**
@@ -2060,7 +2060,7 @@ static inline unsigned long sock_wspace(struct sock *sk)
 	int amt = 0;
 
 	if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
-		amt = sk->sk_sndbuf - atomic_read(&sk->sk_wmem_alloc);
+		amt = sk->sk_sndbuf - refcount_read(&sk->sk_wmem_alloc);
 		if (amt < 0)
 			amt = 0;
 	}
@@ -2141,7 +2141,7 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
  */
 static inline bool sock_writeable(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) < (sk->sk_sndbuf >> 1);
+	return refcount_read(&sk->sk_wmem_alloc) < (sk->sk_sndbuf >> 1);
 }
 
 static inline gfp_t gfp_any(void)
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index fca84e1..4e11119 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -252,7 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 
 	ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
-	atomic_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 33e0940..e2e1318 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -381,7 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
 		memcpy(here, llc_oui, sizeof(llc_oui));
 		((__be16 *) here)[3] = skb->protocol;
 	}
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = vcc->atm_options;
 	entry->vccs->last_use = jiffies;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
diff --git a/net/atm/common.c b/net/atm/common.c
index 9613381..e58d938 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -75,7 +75,7 @@ static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
 	while (!(skb = alloc_skb(size, GFP_KERNEL)))
 		schedule();
 	pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 	return skb;
 }
 
@@ -85,9 +85,9 @@ static void vcc_sock_destruct(struct sock *sk)
 		printk(KERN_DEBUG "%s: rmem leakage (%d bytes) detected.\n",
 		       __func__, atomic_read(&sk->sk_rmem_alloc));
 
-	if (atomic_read(&sk->sk_wmem_alloc))
+	if (refcount_read(&sk->sk_wmem_alloc))
 		printk(KERN_DEBUG "%s: wmem leakage (%d bytes) detected.\n",
-		       __func__, atomic_read(&sk->sk_wmem_alloc));
+		       __func__, refcount_read(&sk->sk_wmem_alloc));
 }
 
 static void vcc_def_wakeup(struct sock *sk)
@@ -106,7 +106,7 @@ static inline int vcc_writable(struct sock *sk)
 	struct atm_vcc *vcc = atm_sk(sk);
 
 	return (vcc->qos.txtp.max_sdu +
-		atomic_read(&sk->sk_wmem_alloc)) <= sk->sk_sndbuf;
+		refcount_read(&sk->sk_wmem_alloc)) <= sk->sk_sndbuf;
 }
 
 static void vcc_write_space(struct sock *sk)
@@ -161,7 +161,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family, i
 	memset(&vcc->local, 0, sizeof(struct sockaddr_atmsvc));
 	memset(&vcc->remote, 0, sizeof(struct sockaddr_atmsvc));
 	vcc->qos.txtp.max_sdu = 1 << 16; /* for meta VCs */
-	atomic_set(&sk->sk_wmem_alloc, 1);
+	refcount_set(&sk->sk_wmem_alloc, 1);
 	atomic_set(&sk->sk_rmem_alloc, 0);
 	vcc->push = NULL;
 	vcc->pop = NULL;
diff --git a/net/atm/lec.c b/net/atm/lec.c
index 09cfe87..7554571 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -181,7 +181,7 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	ATM_SKB(skb)->vcc = vcc;
 	ATM_SKB(skb)->atm_options = vcc->atm_options;
 
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 	if (vcc->send(vcc, skb) < 0) {
 		dev->stats.tx_dropped++;
 		return;
@@ -345,7 +345,7 @@ static int lec_atm_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	int i;
 	char *tmp;		/* FIXME */
 
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 	mesg = (struct atmlec_msg *)skb->data;
 	tmp = skb->data;
 	tmp += sizeof(struct atmlec_msg);
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index a190800..680a4b9 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -555,7 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
 					sizeof(struct llc_snap_hdr));
 	}
 
-	atomic_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
 	entry->shortcut->send(entry->shortcut, skb);
 	entry->packets_fwded++;
@@ -911,7 +911,7 @@ static int msg_from_mpoad(struct atm_vcc *vcc, struct sk_buff *skb)
 
 	struct mpoa_client *mpc = find_mpc_by_vcc(vcc);
 	struct k_message *mesg = (struct k_message *)skb->data;
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 
 	if (mpc == NULL) {
 		pr_info("no mpc found\n");
diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index c4e0984..21d9d34 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,7 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 		return 1;
 	}
 
-	atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
 		 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
diff --git a/net/atm/raw.c b/net/atm/raw.c
index 2e17e97..821c079 100644
--- a/net/atm/raw.c
+++ b/net/atm/raw.c
@@ -35,7 +35,7 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
 
 	pr_debug("(%d) %d -= %d\n",
 		 vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
-	atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
 	dev_kfree_skb_any(skb);
 	sk->sk_write_space(sk);
 }
diff --git a/net/atm/signaling.c b/net/atm/signaling.c
index adb6e3d..ca59496 100644
--- a/net/atm/signaling.c
+++ b/net/atm/signaling.c
@@ -67,7 +67,7 @@ static int sigd_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	struct sock *sk;
 
 	msg = (struct atmsvc_msg *) skb->data;
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 	vcc = *(struct atm_vcc **) &msg->vcc;
 	pr_debug("%d (0x%lx)\n", (int)msg->type, (unsigned long)vcc);
 	sk = sk_atm(vcc);
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index adcad34..0ea2616 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1009,7 +1009,7 @@ static const struct proto_ops caif_stream_ops = {
 static void caif_sock_destructor(struct sock *sk)
 {
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-	caif_assert(!atomic_read(&sk->sk_wmem_alloc));
+	caif_assert(!refcount_read(&sk->sk_wmem_alloc));
 	caif_assert(sk_unhashed(sk));
 	caif_assert(!sk->sk_socket);
 	if (!sock_flag(sk, SOCK_DEAD)) {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 281e5d6..d0702d0 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -595,7 +595,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
 		skb->data_len += copied;
 		skb->len += copied;
 		skb->truesize += truesize;
-		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
 		while (copied) {
 			int size = min_t(int, copied, PAGE_SIZE - start);
 			skb_fill_page_desc(skb, frag++, pages[n], start, size);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6911269..e81a27f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2985,7 +2985,7 @@ int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 		get_page(pfrag->page);
 
 		skb->truesize += copy;
-		atomic_add(copy, &sk->sk_wmem_alloc);
+		refcount_add(copy, &sk->sk_wmem_alloc);
 		skb->len += copy;
 		skb->data_len += copy;
 		offset += copy;
diff --git a/net/core/sock.c b/net/core/sock.c
index f6fd79f..e830ddc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1404,7 +1404,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		if (likely(sk->sk_net_refcnt))
 			get_net(net);
 		sock_net_set(sk, net);
-		atomic_set(&sk->sk_wmem_alloc, 1);
+		refcount_set(&sk->sk_wmem_alloc, 1);
 
 		mem_cgroup_sk_alloc(sk);
 		cgroup_sk_alloc(&sk->sk_cgrp_data);
@@ -1428,7 +1428,7 @@ static void __sk_destruct(struct rcu_head *head)
 		sk->sk_destruct(sk);
 
 	filter = rcu_dereference_check(sk->sk_filter,
-				       atomic_read(&sk->sk_wmem_alloc) == 0);
+				       refcount_read(&sk->sk_wmem_alloc) == 0);
 	if (filter) {
 		sk_filter_uncharge(sk, filter);
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
@@ -1473,7 +1473,7 @@ void sk_free(struct sock *sk)
 	 * some packets are still in some tx queue.
 	 * If not null, sock_wfree() will call __sk_free(sk) later
 	 */
-	if (atomic_dec_and_test(&sk->sk_wmem_alloc))
+	if (refcount_dec_and_test(&sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sk_free);
@@ -1509,7 +1509,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		/*
 		 * sk_wmem_alloc set to one (see sk_free() and sock_wfree())
 		 */
-		atomic_set(&newsk->sk_wmem_alloc, 1);
+		refcount_set(&newsk->sk_wmem_alloc, 1);
 		atomic_set(&newsk->sk_omem_alloc, 0);
 		skb_queue_head_init(&newsk->sk_receive_queue);
 		skb_queue_head_init(&newsk->sk_write_queue);
@@ -1638,7 +1638,7 @@ void sock_wfree(struct sk_buff *skb)
 		 * Keep a reference on sk_wmem_alloc, this will be released
 		 * after sk_write_space() call
 		 */
-		atomic_sub(len - 1, &sk->sk_wmem_alloc);
+		WARN_ON(refcount_sub_and_test(len - 1, &sk->sk_wmem_alloc));
 		sk->sk_write_space(sk);
 		len = 1;
 	}
@@ -1646,7 +1646,7 @@ void sock_wfree(struct sk_buff *skb)
 	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
 	 * could not do because of in-flight packets
 	 */
-	if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
+	if (refcount_sub_and_test(len, &sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);
@@ -1658,7 +1658,7 @@ void __sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
 
-	if (atomic_sub_and_test(skb->truesize, &sk->sk_wmem_alloc))
+	if (refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 
@@ -1680,7 +1680,7 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	 * is enough to guarantee sk_free() wont free this sock until
 	 * all in-flight packets are completed
 	 */
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
@@ -1708,7 +1708,7 @@ void skb_orphan_partial(struct sk_buff *skb)
 	    || skb->destructor == tcp_wfree
 #endif
 		) {
-		atomic_sub(skb->truesize - 1, &skb->sk->sk_wmem_alloc);
+		WARN_ON(refcount_sub_and_test(skb->truesize - 1, &skb->sk->sk_wmem_alloc));
 		skb->truesize = 1;
 	} else {
 		skb_orphan(skb);
@@ -1767,7 +1767,7 @@ EXPORT_SYMBOL(sock_i_ino);
 struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 			     gfp_t priority)
 {
-	if (force || atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+	if (force || refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
 		struct sk_buff *skb = alloc_skb(size, priority);
 		if (skb) {
 			skb_set_owner_w(skb, sk);
@@ -1842,7 +1842,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 			break;
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
-		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
+		if (refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
 			break;
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			break;
@@ -2145,7 +2145,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 		if (sk->sk_type == SOCK_STREAM) {
 			if (sk->sk_wmem_queued < prot->sysctl_wmem[0])
 				return 1;
-		} else if (atomic_read(&sk->sk_wmem_alloc) <
+		} else if (refcount_read(&sk->sk_wmem_alloc) <
 			   prot->sysctl_wmem[0])
 				return 1;
 	}
@@ -2411,7 +2411,7 @@ static void sock_def_write_space(struct sock *sk)
 	/* Do not wake up a writer until he can make "significant"
 	 * progress.  --DaveM
 	 */
-	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
+	if ((refcount_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (skwq_has_sleeper(wq))
 			wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5091f46..4d89969 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -150,7 +150,7 @@ void inet_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(sk->sk_wmem_queued);
 	WARN_ON(sk->sk_forward_alloc);
 
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b1e2444..d02afc2 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -337,7 +337,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 			skb->data_len += tailen;
 			skb->truesize += tailen;
 			if (sk)
-				atomic_add(tailen, &sk->sk_wmem_alloc);
+				refcount_add(tailen, &sk->sk_wmem_alloc);
 
 			skb_push(skb, -skb_network_offset(skb));
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 737ce82..a2ea706 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1036,7 +1036,7 @@ static int __ip_append_data(struct sock *sk,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
-				if (atomic_read(&sk->sk_wmem_alloc) <=
+				if (refcount_read(&sk->sk_wmem_alloc) <=
 				    2 * sk->sk_sndbuf)
 					skb = sock_wmalloc(sk,
 							   alloclen + hh_len + 15, 1,
@@ -1144,7 +1144,7 @@ static int __ip_append_data(struct sock *sk,
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
-			atomic_add(copy, &sk->sk_wmem_alloc);
+			refcount_add(copy, &sk->sk_wmem_alloc);
 		}
 		offset += copy;
 		length -= copy;
@@ -1368,7 +1368,7 @@ ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 		skb->len += len;
 		skb->data_len += len;
 		skb->truesize += len;
-		atomic_add(len, &sk->sk_wmem_alloc);
+		refcount_add(len, &sk->sk_wmem_alloc);
 		offset += len;
 		size -= len;
 	}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3354a61..1f82de5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -644,7 +644,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
 	return skb->len < size_goal &&
 	       sysctl_tcp_autocorking &&
 	       skb != tcp_write_queue_head(sk) &&
-	       atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
+	       refcount_read(&sk->sk_wmem_alloc) > skb->truesize;
 }
 
 static void tcp_push(struct sock *sk, int flags, int mss_now,
@@ -672,7 +672,7 @@ static void tcp_push(struct sock *sk, int flags, int mss_now,
 		/* It is possible TX completion already happened
 		 * before we set TSQ_THROTTLED.
 		 */
-		if (atomic_read(&sk->sk_wmem_alloc) > skb->truesize)
+		if (refcount_read(&sk->sk_wmem_alloc) > skb->truesize)
 			return;
 	}
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index bc68da3..11f69bb 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -152,7 +152,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
 		sum_truesize += skb->truesize;
-		atomic_add(sum_truesize - gso_skb->truesize,
+		refcount_add(sum_truesize - gso_skb->truesize,
 			   &skb->sk->sk_wmem_alloc);
 	}
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 22548b5..7cd1283 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -863,12 +863,11 @@ void tcp_wfree(struct sk_buff *skb)
 	struct sock *sk = skb->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned long flags, nval, oval;
-	int wmem;
 
 	/* Keep one reference on sk_wmem_alloc.
 	 * Will be released by sk_free() from here or tcp_tasklet_func()
 	 */
-	wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize - 1, &sk->sk_wmem_alloc));
 
 	/* If this softirq is serviced by ksoftirqd, we are likely under stress.
 	 * Wait until our queues (qdisc + devices) are drained.
@@ -877,7 +876,7 @@ void tcp_wfree(struct sk_buff *skb)
 	 * - chance for incoming ACK (processed by another cpu maybe)
 	 *   to migrate this flow (skb->ooo_okay will be eventually set)
 	 */
-	if (wmem >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
 		goto out;
 
 	for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
@@ -981,7 +980,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb->sk = sk;
 	skb->destructor = skb_is_tcp_pure_ack(skb) ? __sock_wfree : tcp_wfree;
 	skb_set_hash_from_sk(skb, sk);
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 
 	skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm);
 
@@ -2101,7 +2100,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 	limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
 	limit <<= factor;
 
-	if (atomic_read(&sk->sk_wmem_alloc) > limit) {
+	if (refcount_read(&sk->sk_wmem_alloc) > limit) {
 		/* Always send the 1st or 2nd skb in write queue.
 		 * No need to wait for TX completion to call us back,
 		 * after softirq/tasklet schedule.
@@ -2117,7 +2116,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (atomic_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit)
 			return true;
 	}
 	return false;
@@ -2735,7 +2734,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	/* Do not sent more than we queued. 1/4 is reserved for possible
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
-	if (atomic_read(&sk->sk_wmem_alloc) >
+	if (refcount_read(&sk->sk_wmem_alloc) >
 	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
 		  sk->sk_sndbuf))
 		return -EAGAIN;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index ff54faa..b8f127e 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -317,7 +317,7 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 			skb->data_len += tailen;
 			skb->truesize += tailen;
 			if (sk)
-				atomic_add(tailen, &sk->sk_wmem_alloc);
+				refcount_add(tailen, &sk->sk_wmem_alloc);
 
 			skb_push(skb, -skb_network_offset(skb));
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 528b3c1..42a2f73 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,7 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
-				if (atomic_read(&sk->sk_wmem_alloc) <=
+				if (refcount_read(&sk->sk_wmem_alloc) <=
 				    2 * sk->sk_sndbuf)
 					skb = sock_wmalloc(sk,
 							   alloclen + hh_len, 1,
@@ -1577,7 +1577,7 @@ static int __ip6_append_data(struct sock *sk,
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
-			atomic_add(copy, &sk->sk_wmem_alloc);
+			refcount_add(copy, &sk->sk_wmem_alloc);
 		}
 		offset += copy;
 		length -= copy;
diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index bf75c92..c343ac6 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -162,7 +162,7 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
 		   psock->sk->sk_receive_queue.qlen,
 		   atomic_read(&psock->sk->sk_rmem_alloc),
 		   psock->sk->sk_write_queue.qlen,
-		   atomic_read(&psock->sk->sk_wmem_alloc));
+		   refcount_read(&psock->sk->sk_wmem_alloc));
 
 	if (psock->done)
 		seq_puts(seq, "Done ");
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9d96b82..ba00bd3 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -104,7 +104,7 @@ static void pfkey_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 
 	atomic_dec(&net_pfkey->socks_nr);
 }
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7dac4a9..9332b24 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -341,7 +341,7 @@ static void netlink_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(nlk_sk(sk)->groups);
 }
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a0dbe7c..82eb052 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1320,7 +1320,7 @@ static void packet_sock_destruct(struct sock *sk)
 	skb_queue_purge(&sk->sk_error_queue);
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 
 	if (!sock_flag(sk, SOCK_DEAD)) {
 		pr_err("Attempt to release alive packet socket: %p\n", sk);
@@ -2482,7 +2482,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->data_len = to_write;
 	skb->len += to_write;
 	skb->truesize += to_write;
-	atomic_add(to_write, &po->sk.sk_wmem_alloc);
+	refcount_add(to_write, &po->sk.sk_wmem_alloc);
 
 	while (likely(to_write)) {
 		nr_frags = skb_shinfo(skb)->nr_frags;
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index a6c8da3..27b0b13 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -360,7 +360,7 @@ static unsigned int pn_socket_poll(struct file *file, struct socket *sock,
 		return POLLHUP;
 
 	if (sk->sk_state == TCP_ESTABLISHED &&
-		atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf &&
+		refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf &&
 		atomic_read(&pn->tx_credits))
 		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
 
diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index dcf4742..592e68b 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -208,7 +208,7 @@ void rds_tcp_write_space(struct sock *sk)
 	tc->t_last_seen_una = rds_tcp_snd_una(tc);
 	rds_send_path_drop_acked(cp, rds_tcp_snd_una(tc), rds_tcp_is_acked);
 
-	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
+	if ((refcount_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
 		queue_delayed_work(rds_wq, &cp->cp_send_w, 0);
 
 out:
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 7fb59c3..b473ac2 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -56,7 +56,7 @@ static void rxrpc_sock_destructor(struct sock *);
  */
 static inline int rxrpc_writable(struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) < (size_t) sk->sk_sndbuf;
+	return refcount_read(&sk->sk_wmem_alloc) < (size_t) sk->sk_sndbuf;
 }
 
 /*
@@ -665,7 +665,7 @@ static void rxrpc_sock_destructor(struct sock *sk)
 
 	rxrpc_purge_queue(&sk->sk_receive_queue);
 
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(!sk_unhashed(sk));
 	WARN_ON(sk->sk_socket);
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 2209c2d..2b0778b 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -491,7 +491,7 @@ static void sch_atm_dequeue(unsigned long data)
 			ATM_SKB(skb)->vcc = flow->vcc;
 			memcpy(skb_push(skb, flow->hdr_len), flow->hdr,
 			       flow->hdr_len);
-			atomic_add(skb->truesize,
+			refcount_add(skb->truesize,
 				   &sk_atm(flow->vcc)->sk_wmem_alloc);
 			/* atm.atm_options are already set by atm_tc_enqueue */
 			flow->vcc->send(flow->vcc, skb);
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 71ce6b9..6574281 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -392,7 +392,7 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	 * therefore only reserve a single byte to keep socket around until
 	 * the packet has been transmitted.
 	 */
-	atomic_inc(&sk->sk_wmem_alloc);
+	refcount_inc(&sk->sk_wmem_alloc);
 }
 
 static int sctp_packet_pack(struct sctp_packet *packet,
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 206377f..25cd840 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -365,7 +365,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 		assoc->c.sinit_num_ostreams, assoc->max_retrans,
 		assoc->init_retries, assoc->shutdown_retries,
 		assoc->rtx_data_chunks,
-		atomic_read(&sk->sk_wmem_alloc),
+		refcount_read(&sk->sk_wmem_alloc),
 		sk->sk_wmem_queued,
 		sk->sk_sndbuf,
 		sk->sk_rcvbuf);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c1120d5..67dfec1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -164,7 +164,7 @@ static inline void sctp_set_owner_w(struct sctp_chunk *chunk)
 				sizeof(struct sk_buff) +
 				sizeof(struct sctp_chunk);
 
-	atomic_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
+	refcount_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
 	sk->sk_wmem_queued += chunk->skb->truesize;
 	sk_mem_charge(sk, chunk->skb->truesize);
 }
@@ -7539,7 +7539,7 @@ static void sctp_wfree(struct sk_buff *skb)
 				sizeof(struct sk_buff) +
 				sizeof(struct sctp_chunk);
 
-	atomic_sub(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc));
 
 	/*
 	 * This undoes what is done via sctp_set_owner_w and sk_mem_charge
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a48d403..a74339e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -442,7 +442,7 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
 static int unix_writable(const struct sock *sk)
 {
 	return sk->sk_state != TCP_LISTEN &&
-	       (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+	       (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
 }
 
 static void unix_write_space(struct sock *sk)
@@ -487,7 +487,7 @@ static void unix_sock_destructor(struct sock *sk)
 
 	skb_queue_purge(&sk->sk_receive_queue);
 
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(!sk_unhashed(sk));
 	WARN_ON(sk->sk_socket);
 	if (!sock_flag(sk, SOCK_DEAD)) {
@@ -2024,7 +2024,7 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	skb->len += size;
 	skb->data_len += size;
 	skb->truesize += size;
-	atomic_add(size, &sk->sk_wmem_alloc);
+	refcount_add(size, &sk->sk_wmem_alloc);
 
 	if (newskb) {
 		err = unix_scm_to_skb(&scm, skb, false);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [Bridge] [PATCH 06/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
@ 2017-03-16 15:28   ` Elena Reshetova
  0 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: keescook, peterz, bridge, linux-kernel, jmorris,
	Hans Liljestrand, kuznet, kaber, Elena Reshetova, David Windsor

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference-counter overflows that might lead to
use-after-free situations.
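
For reference, a minimal sketch of the pattern this conversion
follows (illustrative only: the demo_* structure and helpers are
invented for this example; only the refcount_*() calls are the real
API from include/linux/refcount.h):

#include <linux/refcount.h>
#include <linux/slab.h>

struct demo_sock {
	refcount_t	wmem;	/* was atomic_t before the conversion */
};

static struct demo_sock *demo_alloc(void)
{
	struct demo_sock *ds = kzalloc(sizeof(*ds), GFP_KERNEL);

	if (ds)
		refcount_set(&ds->wmem, 1);	/* was atomic_set(); bias of 1 */
	return ds;
}

static void demo_charge(struct demo_sock *ds, unsigned int bytes)
{
	refcount_add(bytes, &ds->wmem);		/* was atomic_add() */
}

static void demo_uncharge(struct demo_sock *ds, unsigned int bytes)
{
	/* was atomic_sub(); the counter must not hit zero here because
	 * the bias reference is still held, hence the WARN_ON()
	 */
	WARN_ON(refcount_sub_and_test(bytes, &ds->wmem));
}

static void demo_release(struct demo_sock *ds)
{
	/* drop the bias reference; free once no charges remain */
	if (refcount_dec_and_test(&ds->wmem))
		kfree(ds);
}

Unlike the atomic_t helpers, the refcount_*() helpers saturate
instead of wrapping on overflow and emit a one-time warning, which
is what closes the overflow-to-use-after-free window described above.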

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 drivers/atm/fore200e.c   | 12 +-----------
 drivers/atm/he.c         |  2 +-
 drivers/atm/idt77252.c   |  4 ++--
 include/linux/atmdev.h   |  2 +-
 include/net/sock.h       |  8 ++++----
 net/atm/br2684.c         |  2 +-
 net/atm/clip.c           |  2 +-
 net/atm/common.c         | 10 +++++-----
 net/atm/lec.c            |  4 ++--
 net/atm/mpc.c            |  4 ++--
 net/atm/pppoatm.c        |  2 +-
 net/atm/raw.c            |  2 +-
 net/atm/signaling.c      |  2 +-
 net/caif/caif_socket.c   |  2 +-
 net/core/datagram.c      |  2 +-
 net/core/skbuff.c        |  2 +-
 net/core/sock.c          | 26 +++++++++++++-------------
 net/ipv4/af_inet.c       |  2 +-
 net/ipv4/esp4.c          |  2 +-
 net/ipv4/ip_output.c     |  6 +++---
 net/ipv4/tcp.c           |  4 ++--
 net/ipv4/tcp_offload.c   |  2 +-
 net/ipv4/tcp_output.c    | 13 ++++++-------
 net/ipv6/esp6.c          |  2 +-
 net/ipv6/ip6_output.c    |  4 ++--
 net/kcm/kcmproc.c        |  2 +-
 net/key/af_key.c         |  2 +-
 net/netlink/af_netlink.c |  2 +-
 net/packet/af_packet.c   |  4 ++--
 net/phonet/socket.c      |  2 +-
 net/rds/tcp_send.c       |  2 +-
 net/rxrpc/af_rxrpc.c     |  4 ++--
 net/sched/sch_atm.c      |  2 +-
 net/sctp/output.c        |  2 +-
 net/sctp/proc.c          |  2 +-
 net/sctp/socket.c        |  4 ++--
 net/unix/af_unix.c       |  6 +++---
 37 files changed, 73 insertions(+), 84 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 637c3e6..b770d18 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -924,12 +924,7 @@ fore200e_tx_irq(struct fore200e* fore200e)
 		else {
 		    dev_kfree_skb_any(entry->skb);
 		}
-#if 1
-		/* race fixed by the above incarnation mechanism, but... */
-		if (atomic_read(&sk_atm(vcc)->sk_wmem_alloc) < 0) {
-		    atomic_set(&sk_atm(vcc)->sk_wmem_alloc, 0);
-		}
-#endif
+
 		/* check error condition */
 		if (*entry->status & STATUS_ERROR)
 		    atomic_inc(&vcc->stats->tx_err);
@@ -1130,13 +1125,9 @@ fore200e_push_rpd(struct fore200e* fore200e, struct atm_vcc* vcc, struct rpd* rp
 	return -ENOMEM;
     }
 
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
-
     vcc->push(vcc, skb);
     atomic_inc(&vcc->stats->rx);
 
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
-
     return 0;
 }
 
@@ -1572,7 +1563,6 @@ fore200e_send(struct atm_vcc *vcc, struct sk_buff *skb)
     unsigned long           flags;
 
     ASSERT(vcc);
-    ASSERT(atomic_read(&sk_atm(vcc)->sk_wmem_alloc) >= 0);
     ASSERT(fore200e);
     ASSERT(fore200e_vcc);
 
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 3617659..fc1bbdb 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2395,7 +2395,7 @@ he_close(struct atm_vcc *vcc)
 		 * TBRQ, the host issues the close command to the adapter.
 		 */
 
-		while (((tx_inuse = atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) > 1) &&
+		while (((tx_inuse = refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) > 1) &&
 		       (retry < MAX_RETRY)) {
 			msleep(sleep);
 			if (sleep < 250)
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index 5ec1095..20eda87 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -724,7 +724,7 @@ push_on_scq(struct idt77252_dev *card, struct vc_map *vc, struct sk_buff *skb)
 		struct sock *sk = sk_atm(vcc);
 
 		vc->estimator->cells += (skb->len + 47) / 48;
-		if (atomic_read(&sk->sk_wmem_alloc) >
+		if (refcount_read(&sk->sk_wmem_alloc) >
 		    (sk->sk_sndbuf >> 1)) {
 			u32 cps = vc->estimator->maxcps;
 
@@ -2012,7 +2012,7 @@ idt77252_send_oam(struct atm_vcc *vcc, void *cell, int flags)
 		atomic_inc(&vcc->stats->tx_err);
 		return -ENOMEM;
 	}
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 
 	memcpy(skb_put(skb, 52), cell, 52);
 
diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
index c1da539..4d97a89 100644
--- a/include/linux/atmdev.h
+++ b/include/linux/atmdev.h
@@ -254,7 +254,7 @@ static inline void atm_return(struct atm_vcc *vcc,int truesize)
 
 static inline int atm_may_send(struct atm_vcc *vcc,unsigned int size)
 {
-	return (size + atomic_read(&sk_atm(vcc)->sk_wmem_alloc)) <
+	return (size + refcount_read(&sk_atm(vcc)->sk_wmem_alloc)) <
 	       sk_atm(vcc)->sk_sndbuf;
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index c519de7..24dcdba 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -388,7 +388,7 @@ struct sock {
 
 	/* ===== cache line for TX ===== */
 	int			sk_wmem_queued;
-	atomic_t		sk_wmem_alloc;
+	refcount_t		sk_wmem_alloc;
 	unsigned long		sk_tsq_flags;
 	struct sk_buff		*sk_send_head;
 	struct sk_buff_head	sk_write_queue;
@@ -1916,7 +1916,7 @@ static inline int skb_copy_to_page_nocache(struct sock *sk, struct iov_iter *fro
  */
 static inline int sk_wmem_alloc_get(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) - 1;
+	return refcount_read(&sk->sk_wmem_alloc) - 1;
 }
 
 /**
@@ -2060,7 +2060,7 @@ static inline unsigned long sock_wspace(struct sock *sk)
 	int amt = 0;
 
 	if (!(sk->sk_shutdown & SEND_SHUTDOWN)) {
-		amt = sk->sk_sndbuf - atomic_read(&sk->sk_wmem_alloc);
+		amt = sk->sk_sndbuf - refcount_read(&sk->sk_wmem_alloc);
 		if (amt < 0)
 			amt = 0;
 	}
@@ -2141,7 +2141,7 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
  */
 static inline bool sock_writeable(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) < (sk->sk_sndbuf >> 1);
+	return refcount_read(&sk->sk_wmem_alloc) < (sk->sk_sndbuf >> 1);
 }
 
 static inline gfp_t gfp_any(void)
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index fca84e1..4e11119 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -252,7 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 
 	ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
-	atomic_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 33e0940..e2e1318 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -381,7 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
 		memcpy(here, llc_oui, sizeof(llc_oui));
 		((__be16 *) here)[3] = skb->protocol;
 	}
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = vcc->atm_options;
 	entry->vccs->last_use = jiffies;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
diff --git a/net/atm/common.c b/net/atm/common.c
index 9613381..e58d938 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -75,7 +75,7 @@ static struct sk_buff *alloc_tx(struct atm_vcc *vcc, unsigned int size)
 	while (!(skb = alloc_skb(size, GFP_KERNEL)))
 		schedule();
 	pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 	return skb;
 }
 
@@ -85,9 +85,9 @@ static void vcc_sock_destruct(struct sock *sk)
 		printk(KERN_DEBUG "%s: rmem leakage (%d bytes) detected.\n",
 		       __func__, atomic_read(&sk->sk_rmem_alloc));
 
-	if (atomic_read(&sk->sk_wmem_alloc))
+	if (refcount_read(&sk->sk_wmem_alloc))
 		printk(KERN_DEBUG "%s: wmem leakage (%d bytes) detected.\n",
-		       __func__, atomic_read(&sk->sk_wmem_alloc));
+		       __func__, refcount_read(&sk->sk_wmem_alloc));
 }
 
 static void vcc_def_wakeup(struct sock *sk)
@@ -106,7 +106,7 @@ static inline int vcc_writable(struct sock *sk)
 	struct atm_vcc *vcc = atm_sk(sk);
 
 	return (vcc->qos.txtp.max_sdu +
-		atomic_read(&sk->sk_wmem_alloc)) <= sk->sk_sndbuf;
+		refcount_read(&sk->sk_wmem_alloc)) <= sk->sk_sndbuf;
 }
 
 static void vcc_write_space(struct sock *sk)
@@ -161,7 +161,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family, i
 	memset(&vcc->local, 0, sizeof(struct sockaddr_atmsvc));
 	memset(&vcc->remote, 0, sizeof(struct sockaddr_atmsvc));
 	vcc->qos.txtp.max_sdu = 1 << 16; /* for meta VCs */
-	atomic_set(&sk->sk_wmem_alloc, 1);
+	refcount_set(&sk->sk_wmem_alloc, 1);
 	atomic_set(&sk->sk_rmem_alloc, 0);
 	vcc->push = NULL;
 	vcc->pop = NULL;
diff --git a/net/atm/lec.c b/net/atm/lec.c
index 09cfe87..7554571 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -181,7 +181,7 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	ATM_SKB(skb)->vcc = vcc;
 	ATM_SKB(skb)->atm_options = vcc->atm_options;
 
-	atomic_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 	if (vcc->send(vcc, skb) < 0) {
 		dev->stats.tx_dropped++;
 		return;
@@ -345,7 +345,7 @@ static int lec_atm_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	int i;
 	char *tmp;		/* FIXME */
 
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 	mesg = (struct atmlec_msg *)skb->data;
 	tmp = skb->data;
 	tmp += sizeof(struct atmlec_msg);
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index a190800..680a4b9 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -555,7 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
 					sizeof(struct llc_snap_hdr));
 	}
 
-	atomic_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
 	entry->shortcut->send(entry->shortcut, skb);
 	entry->packets_fwded++;
@@ -911,7 +911,7 @@ static int msg_from_mpoad(struct atm_vcc *vcc, struct sk_buff *skb)
 
 	struct mpoa_client *mpc = find_mpc_by_vcc(vcc);
 	struct k_message *mesg = (struct k_message *)skb->data;
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 
 	if (mpc == NULL) {
 		pr_info("no mpc found\n");
diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index c4e0984..21d9d34 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,7 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 		return 1;
 	}
 
-	atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
 		 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
diff --git a/net/atm/raw.c b/net/atm/raw.c
index 2e17e97..821c079 100644
--- a/net/atm/raw.c
+++ b/net/atm/raw.c
@@ -35,7 +35,7 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
 
 	pr_debug("(%d) %d -= %d\n",
 		 vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
-	atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
 	dev_kfree_skb_any(skb);
 	sk->sk_write_space(sk);
 }
diff --git a/net/atm/signaling.c b/net/atm/signaling.c
index adb6e3d..ca59496 100644
--- a/net/atm/signaling.c
+++ b/net/atm/signaling.c
@@ -67,7 +67,7 @@ static int sigd_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	struct sock *sk;
 
 	msg = (struct atmsvc_msg *) skb->data;
-	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc));
 	vcc = *(struct atm_vcc **) &msg->vcc;
 	pr_debug("%d (0x%lx)\n", (int)msg->type, (unsigned long)vcc);
 	sk = sk_atm(vcc);
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index adcad34..0ea2616 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1009,7 +1009,7 @@ static const struct proto_ops caif_stream_ops = {
 static void caif_sock_destructor(struct sock *sk)
 {
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
-	caif_assert(!atomic_read(&sk->sk_wmem_alloc));
+	caif_assert(!refcount_read(&sk->sk_wmem_alloc));
 	caif_assert(sk_unhashed(sk));
 	caif_assert(!sk->sk_socket);
 	if (!sock_flag(sk, SOCK_DEAD)) {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 281e5d6..d0702d0 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -595,7 +595,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
 		skb->data_len += copied;
 		skb->len += copied;
 		skb->truesize += truesize;
-		atomic_add(truesize, &skb->sk->sk_wmem_alloc);
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
 		while (copied) {
 			int size = min_t(int, copied, PAGE_SIZE - start);
 			skb_fill_page_desc(skb, frag++, pages[n], start, size);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6911269..e81a27f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2985,7 +2985,7 @@ int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 		get_page(pfrag->page);
 
 		skb->truesize += copy;
-		atomic_add(copy, &sk->sk_wmem_alloc);
+		refcount_add(copy, &sk->sk_wmem_alloc);
 		skb->len += copy;
 		skb->data_len += copy;
 		offset += copy;
diff --git a/net/core/sock.c b/net/core/sock.c
index f6fd79f..e830ddc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1404,7 +1404,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		if (likely(sk->sk_net_refcnt))
 			get_net(net);
 		sock_net_set(sk, net);
-		atomic_set(&sk->sk_wmem_alloc, 1);
+		refcount_set(&sk->sk_wmem_alloc, 1);
 
 		mem_cgroup_sk_alloc(sk);
 		cgroup_sk_alloc(&sk->sk_cgrp_data);
@@ -1428,7 +1428,7 @@ static void __sk_destruct(struct rcu_head *head)
 		sk->sk_destruct(sk);
 
 	filter = rcu_dereference_check(sk->sk_filter,
-				       atomic_read(&sk->sk_wmem_alloc) == 0);
+				       refcount_read(&sk->sk_wmem_alloc) == 0);
 	if (filter) {
 		sk_filter_uncharge(sk, filter);
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
@@ -1473,7 +1473,7 @@ void sk_free(struct sock *sk)
 	 * some packets are still in some tx queue.
 	 * If not null, sock_wfree() will call __sk_free(sk) later
 	 */
-	if (atomic_dec_and_test(&sk->sk_wmem_alloc))
+	if (refcount_dec_and_test(&sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sk_free);
@@ -1509,7 +1509,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		/*
 		 * sk_wmem_alloc set to one (see sk_free() and sock_wfree())
 		 */
-		atomic_set(&newsk->sk_wmem_alloc, 1);
+		refcount_set(&newsk->sk_wmem_alloc, 1);
 		atomic_set(&newsk->sk_omem_alloc, 0);
 		skb_queue_head_init(&newsk->sk_receive_queue);
 		skb_queue_head_init(&newsk->sk_write_queue);
@@ -1638,7 +1638,7 @@ void sock_wfree(struct sk_buff *skb)
 		 * Keep a reference on sk_wmem_alloc, this will be released
 		 * after sk_write_space() call
 		 */
-		atomic_sub(len - 1, &sk->sk_wmem_alloc);
+		WARN_ON(refcount_sub_and_test(len - 1, &sk->sk_wmem_alloc));
 		sk->sk_write_space(sk);
 		len = 1;
 	}
@@ -1646,7 +1646,7 @@ void sock_wfree(struct sk_buff *skb)
 	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
 	 * could not do because of in-flight packets
 	 */
-	if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
+	if (refcount_sub_and_test(len, &sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);
@@ -1658,7 +1658,7 @@ void __sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
 
-	if (atomic_sub_and_test(skb->truesize, &sk->sk_wmem_alloc))
+	if (refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc))
 		__sk_free(sk);
 }
 
@@ -1680,7 +1680,7 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	 * is enough to guarantee sk_free() wont free this sock until
 	 * all in-flight packets are completed
 	 */
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
@@ -1708,7 +1708,7 @@ void skb_orphan_partial(struct sk_buff *skb)
 	    || skb->destructor == tcp_wfree
 #endif
 		) {
-		atomic_sub(skb->truesize - 1, &skb->sk->sk_wmem_alloc);
+		WARN_ON(refcount_sub_and_test(skb->truesize - 1, &skb->sk->sk_wmem_alloc));
 		skb->truesize = 1;
 	} else {
 		skb_orphan(skb);
@@ -1767,7 +1767,7 @@ EXPORT_SYMBOL(sock_i_ino);
 struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 			     gfp_t priority)
 {
-	if (force || atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+	if (force || refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
 		struct sk_buff *skb = alloc_skb(size, priority);
 		if (skb) {
 			skb_set_owner_w(skb, sk);
@@ -1842,7 +1842,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 			break;
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
-		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
+		if (refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
 			break;
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			break;
@@ -2145,7 +2145,7 @@ int __sk_mem_raise_allocated(struct sock *sk, int size, int amt, int kind)
 		if (sk->sk_type == SOCK_STREAM) {
 			if (sk->sk_wmem_queued < prot->sysctl_wmem[0])
 				return 1;
-		} else if (atomic_read(&sk->sk_wmem_alloc) <
+		} else if (refcount_read(&sk->sk_wmem_alloc) <
 			   prot->sysctl_wmem[0])
 				return 1;
 	}
@@ -2411,7 +2411,7 @@ static void sock_def_write_space(struct sock *sk)
 	/* Do not wake up a writer until he can make "significant"
 	 * progress.  --DaveM
 	 */
-	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
+	if ((refcount_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (skwq_has_sleeper(wq))
 			wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5091f46..4d89969 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -150,7 +150,7 @@ void inet_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(sk->sk_wmem_queued);
 	WARN_ON(sk->sk_forward_alloc);
 
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b1e2444..d02afc2 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -337,7 +337,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 			skb->data_len += tailen;
 			skb->truesize += tailen;
 			if (sk)
-				atomic_add(tailen, &sk->sk_wmem_alloc);
+				refcount_add(tailen, &sk->sk_wmem_alloc);
 
 			skb_push(skb, -skb_network_offset(skb));
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 737ce82..a2ea706 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1036,7 +1036,7 @@ static int __ip_append_data(struct sock *sk,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
-				if (atomic_read(&sk->sk_wmem_alloc) <=
+				if (refcount_read(&sk->sk_wmem_alloc) <=
 				    2 * sk->sk_sndbuf)
 					skb = sock_wmalloc(sk,
 							   alloclen + hh_len + 15, 1,
@@ -1144,7 +1144,7 @@ static int __ip_append_data(struct sock *sk,
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
-			atomic_add(copy, &sk->sk_wmem_alloc);
+			refcount_add(copy, &sk->sk_wmem_alloc);
 		}
 		offset += copy;
 		length -= copy;
@@ -1368,7 +1368,7 @@ ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 		skb->len += len;
 		skb->data_len += len;
 		skb->truesize += len;
-		atomic_add(len, &sk->sk_wmem_alloc);
+		refcount_add(len, &sk->sk_wmem_alloc);
 		offset += len;
 		size -= len;
 	}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3354a61..1f82de5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -644,7 +644,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
 	return skb->len < size_goal &&
 	       sysctl_tcp_autocorking &&
 	       skb != tcp_write_queue_head(sk) &&
-	       atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
+	       refcount_read(&sk->sk_wmem_alloc) > skb->truesize;
 }
 
 static void tcp_push(struct sock *sk, int flags, int mss_now,
@@ -672,7 +672,7 @@ static void tcp_push(struct sock *sk, int flags, int mss_now,
 		/* It is possible TX completion already happened
 		 * before we set TSQ_THROTTLED.
 		 */
-		if (atomic_read(&sk->sk_wmem_alloc) > skb->truesize)
+		if (refcount_read(&sk->sk_wmem_alloc) > skb->truesize)
 			return;
 	}
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index bc68da3..11f69bb 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -152,7 +152,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
 		sum_truesize += skb->truesize;
-		atomic_add(sum_truesize - gso_skb->truesize,
+		refcount_add(sum_truesize - gso_skb->truesize,
 			   &skb->sk->sk_wmem_alloc);
 	}
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 22548b5..7cd1283 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -863,12 +863,11 @@ void tcp_wfree(struct sk_buff *skb)
 	struct sock *sk = skb->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned long flags, nval, oval;
-	int wmem;
 
 	/* Keep one reference on sk_wmem_alloc.
 	 * Will be released by sk_free() from here or tcp_tasklet_func()
 	 */
-	wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(skb->truesize - 1, &sk->sk_wmem_alloc));
 
 	/* If this softirq is serviced by ksoftirqd, we are likely under stress.
 	 * Wait until our queues (qdisc + devices) are drained.
@@ -877,7 +876,7 @@ void tcp_wfree(struct sk_buff *skb)
 	 * - chance for incoming ACK (processed by another cpu maybe)
 	 *   to migrate this flow (skb->ooo_okay will be eventually set)
 	 */
-	if (wmem >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
+	if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
 		goto out;
 
 	for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
@@ -981,7 +980,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	skb->sk = sk;
 	skb->destructor = skb_is_tcp_pure_ack(skb) ? __sock_wfree : tcp_wfree;
 	skb_set_hash_from_sk(skb, sk);
-	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
+	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
 
 	skb_set_dst_pending_confirm(skb, sk->sk_dst_pending_confirm);
 
@@ -2101,7 +2100,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 	limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
 	limit <<= factor;
 
-	if (atomic_read(&sk->sk_wmem_alloc) > limit) {
+	if (refcount_read(&sk->sk_wmem_alloc) > limit) {
 		/* Always send the 1st or 2nd skb in write queue.
 		 * No need to wait for TX completion to call us back,
 		 * after softirq/tasklet schedule.
@@ -2117,7 +2116,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (atomic_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit)
 			return true;
 	}
 	return false;
@@ -2735,7 +2734,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	/* Do not sent more than we queued. 1/4 is reserved for possible
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
-	if (atomic_read(&sk->sk_wmem_alloc) >
+	if (refcount_read(&sk->sk_wmem_alloc) >
 	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
 		  sk->sk_sndbuf))
 		return -EAGAIN;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index ff54faa..b8f127e 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -317,7 +317,7 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 			skb->data_len += tailen;
 			skb->truesize += tailen;
 			if (sk)
-				atomic_add(tailen, &sk->sk_wmem_alloc);
+				refcount_add(tailen, &sk->sk_wmem_alloc);
 
 			skb_push(skb, -skb_network_offset(skb));
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 528b3c1..42a2f73 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,7 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
 						(flags & MSG_DONTWAIT), &err);
 			} else {
 				skb = NULL;
-				if (atomic_read(&sk->sk_wmem_alloc) <=
+				if (refcount_read(&sk->sk_wmem_alloc) <=
 				    2 * sk->sk_sndbuf)
 					skb = sock_wmalloc(sk,
 							   alloclen + hh_len, 1,
@@ -1577,7 +1577,7 @@ static int __ip6_append_data(struct sock *sk,
 			skb->len += copy;
 			skb->data_len += copy;
 			skb->truesize += copy;
-			atomic_add(copy, &sk->sk_wmem_alloc);
+			refcount_add(copy, &sk->sk_wmem_alloc);
 		}
 		offset += copy;
 		length -= copy;
diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index bf75c92..c343ac6 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -162,7 +162,7 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
 		   psock->sk->sk_receive_queue.qlen,
 		   atomic_read(&psock->sk->sk_rmem_alloc),
 		   psock->sk->sk_write_queue.qlen,
-		   atomic_read(&psock->sk->sk_wmem_alloc));
+		   refcount_read(&psock->sk->sk_wmem_alloc));
 
 	if (psock->done)
 		seq_puts(seq, "Done ");
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9d96b82..ba00bd3 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -104,7 +104,7 @@ static void pfkey_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 
 	atomic_dec(&net_pfkey->socks_nr);
 }
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7dac4a9..9332b24 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -341,7 +341,7 @@ static void netlink_sock_destruct(struct sock *sk)
 	}
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(nlk_sk(sk)->groups);
 }
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index a0dbe7c..82eb052 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1320,7 +1320,7 @@ static void packet_sock_destruct(struct sock *sk)
 	skb_queue_purge(&sk->sk_error_queue);
 
 	WARN_ON(atomic_read(&sk->sk_rmem_alloc));
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 
 	if (!sock_flag(sk, SOCK_DEAD)) {
 		pr_err("Attempt to release alive packet socket: %p\n", sk);
@@ -2482,7 +2482,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->data_len = to_write;
 	skb->len += to_write;
 	skb->truesize += to_write;
-	atomic_add(to_write, &po->sk.sk_wmem_alloc);
+	refcount_add(to_write, &po->sk.sk_wmem_alloc);
 
 	while (likely(to_write)) {
 		nr_frags = skb_shinfo(skb)->nr_frags;
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index a6c8da3..27b0b13 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -360,7 +360,7 @@ static unsigned int pn_socket_poll(struct file *file, struct socket *sock,
 		return POLLHUP;
 
 	if (sk->sk_state == TCP_ESTABLISHED &&
-		atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf &&
+		refcount_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf &&
 		atomic_read(&pn->tx_credits))
 		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
 
diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index dcf4742..592e68b 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -208,7 +208,7 @@ void rds_tcp_write_space(struct sock *sk)
 	tc->t_last_seen_una = rds_tcp_snd_una(tc);
 	rds_send_path_drop_acked(cp, rds_tcp_snd_una(tc), rds_tcp_is_acked);
 
-	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
+	if ((refcount_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
 		queue_delayed_work(rds_wq, &cp->cp_send_w, 0);
 
 out:
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 7fb59c3..b473ac2 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -56,7 +56,7 @@ static void rxrpc_sock_destructor(struct sock *);
  */
 static inline int rxrpc_writable(struct sock *sk)
 {
-	return atomic_read(&sk->sk_wmem_alloc) < (size_t) sk->sk_sndbuf;
+	return refcount_read(&sk->sk_wmem_alloc) < (size_t) sk->sk_sndbuf;
 }
 
 /*
@@ -665,7 +665,7 @@ static void rxrpc_sock_destructor(struct sock *sk)
 
 	rxrpc_purge_queue(&sk->sk_receive_queue);
 
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(!sk_unhashed(sk));
 	WARN_ON(sk->sk_socket);
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 2209c2d..2b0778b 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -491,7 +491,7 @@ static void sch_atm_dequeue(unsigned long data)
 			ATM_SKB(skb)->vcc = flow->vcc;
 			memcpy(skb_push(skb, flow->hdr_len), flow->hdr,
 			       flow->hdr_len);
-			atomic_add(skb->truesize,
+			refcount_add(skb->truesize,
 				   &sk_atm(flow->vcc)->sk_wmem_alloc);
 			/* atm.atm_options are already set by atm_tc_enqueue */
 			flow->vcc->send(flow->vcc, skb);
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 71ce6b9..6574281 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -392,7 +392,7 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	 * therefore only reserve a single byte to keep socket around until
 	 * the packet has been transmitted.
 	 */
-	atomic_inc(&sk->sk_wmem_alloc);
+	refcount_inc(&sk->sk_wmem_alloc);
 }
 
 static int sctp_packet_pack(struct sctp_packet *packet,
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 206377f..25cd840 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -365,7 +365,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 		assoc->c.sinit_num_ostreams, assoc->max_retrans,
 		assoc->init_retries, assoc->shutdown_retries,
 		assoc->rtx_data_chunks,
-		atomic_read(&sk->sk_wmem_alloc),
+		refcount_read(&sk->sk_wmem_alloc),
 		sk->sk_wmem_queued,
 		sk->sk_sndbuf,
 		sk->sk_rcvbuf);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c1120d5..67dfec1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -164,7 +164,7 @@ static inline void sctp_set_owner_w(struct sctp_chunk *chunk)
 				sizeof(struct sk_buff) +
 				sizeof(struct sctp_chunk);
 
-	atomic_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
+	refcount_add(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
 	sk->sk_wmem_queued += chunk->skb->truesize;
 	sk_mem_charge(sk, chunk->skb->truesize);
 }
@@ -7539,7 +7539,7 @@ static void sctp_wfree(struct sk_buff *skb)
 				sizeof(struct sk_buff) +
 				sizeof(struct sctp_chunk);
 
-	atomic_sub(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc);
+	WARN_ON(refcount_sub_and_test(sizeof(struct sctp_chunk), &sk->sk_wmem_alloc));
 
 	/*
 	 * This undoes what is done via sctp_set_owner_w and sk_mem_charge
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a48d403..a74339e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -442,7 +442,7 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
 static int unix_writable(const struct sock *sk)
 {
 	return sk->sk_state != TCP_LISTEN &&
-	       (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+	       (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
 }
 
 static void unix_write_space(struct sock *sk)
@@ -487,7 +487,7 @@ static void unix_sock_destructor(struct sock *sk)
 
 	skb_queue_purge(&sk->sk_receive_queue);
 
-	WARN_ON(atomic_read(&sk->sk_wmem_alloc));
+	WARN_ON(refcount_read(&sk->sk_wmem_alloc));
 	WARN_ON(!sk_unhashed(sk));
 	WARN_ON(sk->sk_socket);
 	if (!sock_flag(sk, SOCK_DEAD)) {
@@ -2024,7 +2024,7 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	skb->len += size;
 	skb->data_len += size;
 	skb->truesize += size;
-	atomic_add(size, &sk->sk_wmem_alloc);
+	refcount_add(size, &sk->sk_wmem_alloc);
 
 	if (newskb) {
 		err = unix_scm_to_skb(&scm, skb, false);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:28   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference-counter overflows that might lead to
use-after-free situations.
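
For reference, a minimal sketch of the hold/put pattern this
conversion maps onto (illustrative only: the demo_* structure and
helpers are invented for this example; the refcount_*() calls are
the real API from include/linux/refcount.h):

#include <linux/refcount.h>
#include <linux/slab.h>

struct demo_obj {
	refcount_t	refcnt;		/* was atomic_t */
};

static struct demo_obj *demo_new(void)
{
	struct demo_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (obj)
		refcount_set(&obj->refcnt, 1);	/* initial owner reference */
	return obj;
}

static void demo_hold(struct demo_obj *obj)
{
	refcount_inc(&obj->refcnt);		/* was atomic_inc() */
}

/* lookup path: take a reference only if the object is still live */
static bool demo_hold_unless_zero(struct demo_obj *obj)
{
	return refcount_inc_not_zero(&obj->refcnt); /* was atomic_inc_not_zero() */
}

static void demo_put(struct demo_obj *obj)
{
	if (refcount_dec_and_test(&obj->refcnt))	/* was atomic_dec_and_test() */
		kfree(obj);
}

refcount_inc_not_zero() is the call that lookup paths such as
inet_lookup() rely on after this change: a reference is taken only
if the count has not already dropped to zero, so an object racing
with its final put cannot be resurrected.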

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 crypto/algif_aead.c             |  2 +-
 include/net/inet_hashtables.h   |  4 ++--
 include/net/request_sock.h      |  9 +++++----
 include/net/sock.h              | 17 +++++++++--------
 net/atm/proc.c                  |  2 +-
 net/bluetooth/af_bluetooth.c    |  2 +-
 net/bluetooth/rfcomm/sock.c     |  2 +-
 net/core/skbuff.c               |  6 +++---
 net/core/sock.c                 |  4 ++--
 net/ipv4/inet_connection_sock.c |  2 +-
 net/ipv4/inet_hashtables.c      |  4 ++--
 net/ipv4/inet_timewait_sock.c   |  8 ++++----
 net/ipv4/ping.c                 |  4 ++--
 net/ipv4/raw.c                  |  2 +-
 net/ipv4/syncookies.c           |  2 +-
 net/ipv4/tcp_fastopen.c         |  2 +-
 net/ipv4/tcp_ipv4.c             |  4 ++--
 net/ipv4/udp.c                  |  6 +++---
 net/ipv4/udp_diag.c             |  4 ++--
 net/ipv6/datagram.c             |  2 +-
 net/ipv6/inet6_hashtables.c     |  4 ++--
 net/ipv6/tcp_ipv6.c             |  4 ++--
 net/ipv6/udp.c                  |  2 +-
 net/key/af_key.c                |  2 +-
 net/l2tp/l2tp_debugfs.c         |  3 +--
 net/llc/llc_conn.c              |  8 ++++----
 net/llc/llc_sap.c               |  2 +-
 net/netfilter/xt_TPROXY.c       |  4 ++--
 net/netlink/af_netlink.c        |  6 +++---
 net/packet/af_packet.c          |  2 +-
 net/phonet/socket.c             |  2 +-
 net/rxrpc/af_rxrpc.c            |  2 +-
 net/sched/em_meta.c             |  2 +-
 net/tipc/socket.c               |  2 +-
 net/unix/af_unix.c              |  2 +-
 35 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 5a80537..607380d 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -750,7 +750,7 @@ static void aead_sock_destruct(struct sock *sk)
 	unsigned int ivlen = crypto_aead_ivsize(
 				crypto_aead_reqtfm(&ctx->aead_req));
 
-	WARN_ON(atomic_read(&sk->sk_refcnt) != 0);
+	WARN_ON(refcount_read(&sk->sk_refcnt) != 0);
 	aead_put_sgl(sk);
 	sock_kzfree_s(sk, ctx->iv, ivlen);
 	sock_kfree_s(sk, ctx, ctx->len);
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 1178931..b9e6e0e 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -32,7 +32,7 @@
 #include <net/tcp_states.h>
 #include <net/netns/hash.h>
 
-#include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <asm/byteorder.h>
 
 /* This is for all connections with a full identity, no wildcards.
@@ -334,7 +334,7 @@ static inline struct sock *inet_lookup(struct net *net,
 	sk = __inet_lookup(net, hashinfo, skb, doff, saddr, sport, daddr,
 			   dport, dif, &refcounted);
 
-	if (sk && !refcounted && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcounted && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 	return sk;
 }
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index a12a5d2..e76e8c2 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -19,6 +19,7 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/bug.h>
+#include <linux/refcount.h>
 
 #include <net/sock.h>
 
@@ -89,7 +90,7 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 		return NULL;
 	req->rsk_listener = NULL;
 	if (attach_listener) {
-		if (unlikely(!atomic_inc_not_zero(&sk_listener->sk_refcnt))) {
+		if (unlikely(!refcount_inc_not_zero(&sk_listener->sk_refcnt))) {
 			kmem_cache_free(ops->slab, req);
 			return NULL;
 		}
@@ -100,7 +101,7 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	sk_node_init(&req_to_sk(req)->sk_node);
 	sk_tx_queue_clear(req_to_sk(req));
 	req->saved_syn = NULL;
-	atomic_set(&req->rsk_refcnt, 0);
+	refcount_set(&req->rsk_refcnt, 0);
 
 	return req;
 }
@@ -108,7 +109,7 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 static inline void reqsk_free(struct request_sock *req)
 {
 	/* temporary debugging */
-	WARN_ON_ONCE(atomic_read(&req->rsk_refcnt) != 0);
+	WARN_ON_ONCE(refcount_read(&req->rsk_refcnt) != 0);
 
 	req->rsk_ops->destructor(req);
 	if (req->rsk_listener)
@@ -119,7 +120,7 @@ static inline void reqsk_free(struct request_sock *req)
 
 static inline void reqsk_put(struct request_sock *req)
 {
-	if (atomic_dec_and_test(&req->rsk_refcnt))
+	if (refcount_dec_and_test(&req->rsk_refcnt))
 		reqsk_free(req);
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 24dcdba..c4f2b6c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -66,6 +66,7 @@
 #include <linux/poll.h>
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <net/dst.h>
 #include <net/checksum.h>
 #include <net/tcp_states.h>
@@ -219,7 +220,7 @@ struct sock_common {
 		u32		skc_tw_rcv_nxt; /* struct tcp_timewait_sock  */
 	};
 
-	atomic_t		skc_refcnt;
+	refcount_t		skc_refcnt;
 	/* private: */
 	int                     skc_dontcopy_end[0];
 	union {
@@ -602,7 +603,7 @@ static inline bool __sk_del_node_init(struct sock *sk)
 
 static __always_inline void sock_hold(struct sock *sk)
 {
-	atomic_inc(&sk->sk_refcnt);
+	refcount_inc(&sk->sk_refcnt);
 }
 
 /* Ungrab socket in the context, which assumes that socket refcnt
@@ -610,7 +611,7 @@ static __always_inline void sock_hold(struct sock *sk)
  */
 static __always_inline void __sock_put(struct sock *sk)
 {
-	atomic_dec(&sk->sk_refcnt);
+	refcount_dec(&sk->sk_refcnt);
 }
 
 static inline bool sk_del_node_init(struct sock *sk)
@@ -619,7 +620,7 @@ static inline bool sk_del_node_init(struct sock *sk)
 
 	if (rc) {
 		/* paranoid for a while -acme */
-		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
+		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
 	return rc;
@@ -641,7 +642,7 @@ static inline bool sk_nulls_del_node_init_rcu(struct sock *sk)
 
 	if (rc) {
 		/* paranoid for a while -acme */
-		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
+		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
 	return rc;
@@ -1130,9 +1131,9 @@ static inline void sk_refcnt_debug_dec(struct sock *sk)
 
 static inline void sk_refcnt_debug_release(const struct sock *sk)
 {
-	if (atomic_read(&sk->sk_refcnt) != 1)
+	if (refcount_read(&sk->sk_refcnt) != 1)
 		printk(KERN_DEBUG "Destruction of the %s socket %p delayed, refcnt=%d\n",
-		       sk->sk_prot->name, sk, atomic_read(&sk->sk_refcnt));
+		       sk->sk_prot->name, sk, refcount_read(&sk->sk_refcnt));
 }
 #else /* SOCK_REFCNT_DEBUG */
 #define sk_refcnt_debug_inc(sk) do { } while (0)
@@ -1641,7 +1642,7 @@ void sock_init_data(struct socket *sock, struct sock *sk);
 /* Ungrab socket and destroy it, if it was the last reference. */
 static inline void sock_put(struct sock *sk)
 {
-	if (atomic_dec_and_test(&sk->sk_refcnt))
+	if (refcount_dec_and_test(&sk->sk_refcnt))
 		sk_free(sk);
 }
 /* Generic version of sock_put(), dealing with all sockets
diff --git a/net/atm/proc.c b/net/atm/proc.c
index bbb6461..27c9c01 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -211,7 +211,7 @@ static void vcc_info(struct seq_file *seq, struct atm_vcc *vcc)
 		   vcc->flags, sk->sk_err,
 		   sk_wmem_alloc_get(sk), sk->sk_sndbuf,
 		   sk_rmem_alloc_get(sk), sk->sk_rcvbuf,
-		   atomic_read(&sk->sk_refcnt));
+		   refcount_read(&sk->sk_refcnt));
 }
 
 static void svc_info(struct seq_file *seq, struct atm_vcc *vcc)
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 69e1f7d..dbc00c2 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -631,7 +631,7 @@ static int bt_seq_show(struct seq_file *seq, void *v)
 		seq_printf(seq,
 			   "%pK %-6d %-6u %-6u %-6u %-6lu %-6lu",
 			   sk,
-			   atomic_read(&sk->sk_refcnt),
+			   refcount_read(&sk->sk_refcnt),
 			   sk_rmem_alloc_get(sk),
 			   sk_wmem_alloc_get(sk),
 			   from_kuid(seq_user_ns(seq), sock_i_uid(sk)),
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index aa1a814..951be13 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -197,7 +197,7 @@ static void rfcomm_sock_kill(struct sock *sk)
 	if (!sock_flag(sk, SOCK_ZAPPED) || sk->sk_socket)
 		return;
 
-	BT_DBG("sk %p state %d refcnt %d", sk, sk->sk_state, atomic_read(&sk->sk_refcnt));
+	BT_DBG("sk %p state %d refcnt %d", sk, sk->sk_state, refcount_read(&sk->sk_refcnt));
 
 	/* Kill poor orphan */
 	bt_sock_unlink(&rfcomm_sk_list, sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e81a27f..7927fa0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3763,7 +3763,7 @@ struct sk_buff *skb_clone_sk(struct sk_buff *skb)
 	struct sock *sk = skb->sk;
 	struct sk_buff *clone;
 
-	if (!sk || !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
 		return NULL;
 
 	clone = skb_clone(skb, GFP_ATOMIC);
@@ -3829,7 +3829,7 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 	/* Take a reference to prevent skb_orphan() from freeing the socket,
 	 * but only if the socket refcount is not zero.
 	 */
-	if (likely(atomic_inc_not_zero(&sk->sk_refcnt))) {
+	if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
 		*skb_hwtstamps(skb) = *hwtstamps;
 		__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND);
 		sock_put(sk);
@@ -3905,7 +3905,7 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)
 	/* Take a reference to prevent skb_orphan() from freeing the socket,
 	 * but only if the socket refcount is not zero.
 	 */
-	if (likely(atomic_inc_not_zero(&sk->sk_refcnt))) {
+	if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
 		err = sock_queue_err_skb(sk, skb);
 		sock_put(sk);
 	}
diff --git a/net/core/sock.c b/net/core/sock.c
index e830ddc..e4c52af 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1559,7 +1559,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		 * (Documentation/RCU/rculist_nulls.txt for details)
 		 */
 		smp_wmb();
-		atomic_set(&newsk->sk_refcnt, 2);
+		refcount_set(&newsk->sk_refcnt, 2);
 
 		/*
 		 * Increment the counter in the same struct proto as the master
@@ -2517,7 +2517,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	 * (Documentation/RCU/rculist_nulls.txt for details)
 	 */
 	smp_wmb();
-	atomic_set(&sk->sk_refcnt, 1);
+	refcount_set(&sk->sk_refcnt, 1);
 	atomic_set(&sk->sk_drops, 0);
 }
 EXPORT_SYMBOL(sock_init_data);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b4d5980..f01adaf 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -755,7 +755,7 @@ static void reqsk_queue_hash_req(struct request_sock *req,
 	 * are committed to memory and refcnt initialized.
 	 */
 	smp_wmb();
-	atomic_set(&req->rsk_refcnt, 2 + 1);
+	refcount_set(&req->rsk_refcnt, 2 + 1);
 }
 
 void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e9a59d2..a4be2c1 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL_GPL(__inet_lookup_listener);
 /* All sockets share common refcount, but have different destructors */
 void sock_gen_put(struct sock *sk)
 {
-	if (!atomic_dec_and_test(&sk->sk_refcnt))
+	if (!refcount_dec_and_test(&sk->sk_refcnt))
 		return;
 
 	if (sk->sk_state == TCP_TIME_WAIT)
@@ -287,7 +287,7 @@ struct sock *__inet_lookup_established(struct net *net,
 			continue;
 		if (likely(INET_MATCH(sk, net, acookie,
 				      saddr, daddr, ports, dif))) {
-			if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
+			if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
 				goto out;
 			if (unlikely(!INET_MATCH(sk, net, acookie,
 						 saddr, daddr, ports, dif))) {
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index f8aff2c..5b03915 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -76,7 +76,7 @@ void inet_twsk_free(struct inet_timewait_sock *tw)
 
 void inet_twsk_put(struct inet_timewait_sock *tw)
 {
-	if (atomic_dec_and_test(&tw->tw_refcnt))
+	if (refcount_dec_and_test(&tw->tw_refcnt))
 		inet_twsk_free(tw);
 }
 EXPORT_SYMBOL_GPL(inet_twsk_put);
@@ -131,7 +131,7 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 	 * We can use atomic_set() because prior spin_lock()/spin_unlock()
 	 * committed into memory all tw fields.
 	 */
-	atomic_set(&tw->tw_refcnt, 4);
+	refcount_set(&tw->tw_refcnt, 4);
 	inet_twsk_add_node_rcu(tw, &ehead->chain);
 
 	/* Step 3: Remove SK from hash chain */
@@ -195,7 +195,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
 		 * to a non null value before everything is setup for this
 		 * timewait socket.
 		 */
-		atomic_set(&tw->tw_refcnt, 0);
+		refcount_set(&tw->tw_refcnt, 0);
 
 		__module_get(tw->tw_prot->owner);
 	}
@@ -278,7 +278,7 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
 				atomic_read(&twsk_net(tw)->count))
 				continue;
 
-			if (unlikely(!atomic_inc_not_zero(&tw->tw_refcnt)))
+			if (unlikely(!refcount_inc_not_zero(&tw->tw_refcnt)))
 				continue;
 
 			if (unlikely((tw->tw_family != family) ||
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 2af6244..c5434b9 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -289,7 +289,7 @@ void ping_close(struct sock *sk, long timeout)
 {
 	pr_debug("ping_close(sk=%p,sk->num=%u)\n",
 		 inet_sk(sk), inet_sk(sk)->inet_num);
-	pr_debug("isk->refcnt = %d\n", sk->sk_refcnt.counter);
+	pr_debug("isk->refcnt = %d\n", refcount_read(&sk->sk_refcnt));
 
 	sk_common_release(sk);
 }
@@ -1126,7 +1126,7 @@ static void ping_v4_format_sock(struct sock *sp, struct seq_file *f,
 		0, 0L, 0,
 		from_kuid_munged(seq_user_ns(f), sock_i_uid(sp)),
 		0, sock_i_ino(sp),
-		atomic_read(&sp->sk_refcnt), sp,
+		refcount_read(&sp->sk_refcnt), sp,
 		atomic_read(&sp->sk_drops));
 }
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 8119e1f..7ab99f0 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -1058,7 +1058,7 @@ static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
 		0, 0L, 0,
 		from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)),
 		0, sock_i_ino(sp),
-		atomic_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
+		refcount_read(&sp->sk_refcnt), sp, atomic_read(&sp->sk_drops));
 }
 
 static int raw_seq_show(struct seq_file *seq, void *v)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 496b97e..42d2426 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -212,7 +212,7 @@ struct sock *tcp_get_cookie_sock(struct sock *sk, struct sk_buff *skb,
 	child = icsk->icsk_af_ops->syn_recv_sock(sk, skb, req, dst,
 						 NULL, &own_req);
 	if (child) {
-		atomic_set(&req->rsk_refcnt, 1);
+		refcount_set(&req->rsk_refcnt, 1);
 		sock_rps_save_rxhash(child, skb);
 		inet_csk_reqsk_queue_add(sk, req, child);
 	} else {
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 8ea4e97..d527795 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -214,7 +214,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 	inet_csk_reset_xmit_timer(child, ICSK_TIME_RETRANS,
 				  TCP_TIMEOUT_INIT, TCP_RTO_MAX);
 
-	atomic_set(&req->rsk_refcnt, 2);
+	refcount_set(&req->rsk_refcnt, 2);
 
 	/* Now finish processing the fastopen child socket. */
 	inet_csk(child)->icsk_af_ops->rebuild_header(child);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 69f2d20..bb5e4c4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2269,7 +2269,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
 		from_kuid_munged(seq_user_ns(f), sock_i_uid(sk)),
 		icsk->icsk_probes_out,
 		sock_i_ino(sk),
-		atomic_read(&sk->sk_refcnt), sk,
+		refcount_read(&sk->sk_refcnt), sk,
 		jiffies_to_clock_t(icsk->icsk_rto),
 		jiffies_to_clock_t(icsk->icsk_ack.ato),
 		(icsk->icsk_ack.quick << 1) | icsk->icsk_ack.pingpong,
@@ -2295,7 +2295,7 @@ static void get_timewait4_sock(const struct inet_timewait_sock *tw,
 		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK",
 		i, src, srcp, dest, destp, tw->tw_substate, 0, 0,
 		3, jiffies_delta_to_clock_t(delta), 0, 0, 0, 0,
-		atomic_read(&tw->tw_refcnt), tw);
+		refcount_read(&tw->tw_refcnt), tw);
 }
 
 #define TMPSZ 150
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ea6e4cf..c241d6df 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -577,7 +577,7 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 
 	sk = __udp4_lib_lookup(net, saddr, sport, daddr, dport,
 			       dif, &udp_table, NULL);
-	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 	return sk;
 }
@@ -2082,7 +2082,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
 					     uh->source, iph->saddr, dif);
 	}
 
-	if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))
+	if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
 		return;
 
 	skb->sk = sk;
@@ -2530,7 +2530,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 		0, 0L, 0,
 		from_kuid_munged(seq_user_ns(f), sock_i_uid(sp)),
 		0, sock_i_ino(sp),
-		atomic_read(&sp->sk_refcnt), sp,
+		refcount_read(&sp->sk_refcnt), sp,
 		atomic_read(&sp->sk_drops));
 }
 
diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index 9a89c10..4515836 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -55,7 +55,7 @@ static int udp_dump_one(struct udp_table *tbl, struct sk_buff *in_skb,
 				req->id.idiag_dport,
 				req->id.idiag_if, tbl, NULL);
 #endif
-	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 	rcu_read_unlock();
 	err = -ENOENT;
@@ -206,7 +206,7 @@ static int __udp_diag_destroy(struct sk_buff *in_skb,
 		return -EINVAL;
 	}
 
-	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 
 	rcu_read_unlock();
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index eec27f8..a15c9a6 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -1043,6 +1043,6 @@ void ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp,
 		   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)),
 		   0,
 		   sock_i_ino(sp),
-		   atomic_read(&sp->sk_refcnt), sp,
+		   refcount_read(&sp->sk_refcnt), sp,
 		   atomic_read(&sp->sk_drops));
 }
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index d090091..b13b8f9 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -75,7 +75,7 @@ struct sock *__inet6_lookup_established(struct net *net,
 			continue;
 		if (!INET6_MATCH(sk, net, saddr, daddr, ports, dif))
 			continue;
-		if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
+		if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
 			goto out;
 
 		if (unlikely(!INET6_MATCH(sk, net, saddr, daddr, ports, dif))) {
@@ -172,7 +172,7 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
 
 	sk = __inet6_lookup(net, hashinfo, skb, doff, saddr, sport, daddr,
 			    ntohs(dport), dif, &refcounted);
-	if (sk && !refcounted && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcounted && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 	return sk;
 }
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 21bb2fc..1a55d49 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1796,7 +1796,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 		   from_kuid_munged(seq_user_ns(seq), sock_i_uid(sp)),
 		   icsk->icsk_probes_out,
 		   sock_i_ino(sp),
-		   atomic_read(&sp->sk_refcnt), sp,
+		   refcount_read(&sp->sk_refcnt), sp,
 		   jiffies_to_clock_t(icsk->icsk_rto),
 		   jiffies_to_clock_t(icsk->icsk_ack.ato),
 		   (icsk->icsk_ack.quick << 1) | icsk->icsk_ack.pingpong,
@@ -1829,7 +1829,7 @@ static void get_timewait6_sock(struct seq_file *seq,
 		   dest->s6_addr32[2], dest->s6_addr32[3], destp,
 		   tw->tw_substate, 0, 0,
 		   3, jiffies_delta_to_clock_t(delta), 0, 0, 0, 0,
-		   atomic_read(&tw->tw_refcnt), tw);
+		   refcount_read(&tw->tw_refcnt), tw);
 }
 
 static int tcp6_seq_show(struct seq_file *seq, void *v)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 4e4c401..deff3a6 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -324,7 +324,7 @@ struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be
 
 	sk =  __udp6_lib_lookup(net, saddr, sport, daddr, dport,
 				dif, &udp_table, NULL);
-	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+	if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 		sk = NULL;
 	return sk;
 }
diff --git a/net/key/af_key.c b/net/key/af_key.c
index ba00bd3..30ec662 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3711,7 +3711,7 @@ static int pfkey_seq_show(struct seq_file *f, void *v)
 	else
 		seq_printf(f, "%pK %-6d %-6u %-6u %-6u %-6lu\n",
 			       s,
-			       atomic_read(&s->sk_refcnt),
+			       refcount_read(&s->sk_refcnt),
 			       sk_rmem_alloc_get(s),
 			       sk_wmem_alloc_get(s),
 			       from_kuid_munged(seq_user_ns(f), sock_i_uid(s)),
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index 2d6760a..c87689d 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -144,9 +144,8 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
 		   tunnel->encap == L2TP_ENCAPTYPE_IP ? "IP" :
 		   "");
 	seq_printf(m, " %d sessions, refcnt %d/%d\n", session_count,
-		   tunnel->sock ? atomic_read(&tunnel->sock->sk_refcnt) : 0,
+		   tunnel->sock ? refcount_read(&tunnel->sock->sk_refcnt) : 0,
 		   atomic_read(&tunnel->ref_count));
-
 	seq_printf(m, " %08x rx %ld/%ld/%ld rx %ld/%ld/%ld\n",
 		   tunnel->debug,
 		   atomic_long_read(&tunnel->stats.tx_packets),
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index 9b02c13..5e91b47 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -507,7 +507,7 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
 	sk_nulls_for_each_rcu(rc, node, laddr_hb) {
 		if (llc_estab_match(sap, daddr, laddr, rc)) {
 			/* Extra checks required by SLAB_TYPESAFE_BY_RCU */
-			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+			if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt)))
 				goto again;
 			if (unlikely(llc_sk(rc)->sap != sap ||
 				     !llc_estab_match(sap, daddr, laddr, rc))) {
@@ -566,7 +566,7 @@ static struct sock *__llc_lookup_listener(struct llc_sap *sap,
 	sk_nulls_for_each_rcu(rc, node, laddr_hb) {
 		if (llc_listener_match(sap, laddr, rc)) {
 			/* Extra checks required by SLAB_TYPESAFE_BY_RCU */
-			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+			if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt)))
 				goto again;
 			if (unlikely(llc_sk(rc)->sap != sap ||
 				     !llc_listener_match(sap, laddr, rc))) {
@@ -973,9 +973,9 @@ void llc_sk_free(struct sock *sk)
 	skb_queue_purge(&sk->sk_write_queue);
 	skb_queue_purge(&llc->pdu_unack_q);
 #ifdef LLC_REFCNT_DEBUG
-	if (atomic_read(&sk->sk_refcnt) != 1) {
+	if (refcount_read(&sk->sk_refcnt) != 1) {
 		printk(KERN_DEBUG "Destruction of LLC sock %p delayed in %s, cnt=%d\n",
-			sk, __func__, atomic_read(&sk->sk_refcnt));
+			sk, __func__, refcount_read(&sk->sk_refcnt));
 		printk(KERN_DEBUG "%d LLC sockets are still alive\n",
 			atomic_read(&llc_sock_nr));
 	} else {
diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c
index 63b6ab0..d90928f 100644
--- a/net/llc/llc_sap.c
+++ b/net/llc/llc_sap.c
@@ -329,7 +329,7 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap,
 	sk_nulls_for_each_rcu(rc, node, laddr_hb) {
 		if (llc_dgram_match(sap, laddr, rc)) {
 			/* Extra checks required by SLAB_TYPESAFE_BY_RCU */
-			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+			if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt)))
 				goto again;
 			if (unlikely(llc_sk(rc)->sap != sap ||
 				     !llc_dgram_match(sap, laddr, rc))) {
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 80cb7ba..d51f1e8 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -127,7 +127,7 @@ nf_tproxy_get_sock_v4(struct net *net, struct sk_buff *skb, void *hp,
 						    daddr, dport,
 						    in->ifindex);
 
-			if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+			if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 				sk = NULL;
 			/* NOTE: we return listeners even if bound to
 			 * 0.0.0.0, those are filtered out in
@@ -197,7 +197,7 @@ nf_tproxy_get_sock_v6(struct net *net, struct sk_buff *skb, int thoff, void *hp,
 						   daddr, ntohs(dport),
 						   in->ifindex);
 
-			if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+			if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
 				sk = NULL;
 			/* NOTE: we return listeners even if bound to
 			 * 0.0.0.0, those are filtered out in
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9332b24..e3c9a6a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -544,7 +544,7 @@ static void netlink_remove(struct sock *sk)
 	table = &nl_table[sk->sk_protocol];
 	if (!rhashtable_remove_fast(&table->hash, &nlk_sk(sk)->node,
 				    netlink_rhashtable_params)) {
-		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
+		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
 
@@ -657,7 +657,7 @@ static void deferred_put_nlk_sk(struct rcu_head *head)
 	struct netlink_sock *nlk = container_of(head, struct netlink_sock, rcu);
 	struct sock *sk = &nlk->sk;
 
-	if (!atomic_dec_and_test(&sk->sk_refcnt))
+	if (!refcount_dec_and_test(&sk->sk_refcnt))
 		return;
 
 	if (nlk->cb_running && nlk->cb.done) {
@@ -2469,7 +2469,7 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 			   sk_rmem_alloc_get(s),
 			   sk_wmem_alloc_get(s),
 			   nlk->cb_running,
-			   atomic_read(&s->sk_refcnt),
+			   refcount_read(&s->sk_refcnt),
 			   atomic_read(&s->sk_drops),
 			   sock_i_ino(s)
 			);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 82eb052..ad5e5dc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4448,7 +4448,7 @@ static int packet_seq_show(struct seq_file *seq, void *v)
 		seq_printf(seq,
 			   "%pK %-6d %-4d %04x   %-5d %1d %-6u %-6u %-6lu\n",
 			   s,
-			   atomic_read(&s->sk_refcnt),
+			   refcount_read(&s->sk_refcnt),
 			   s->sk_type,
 			   ntohs(po->num),
 			   po->ifindex,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 27b0b13..6b7aa30 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -614,7 +614,7 @@ static int pn_sock_seq_show(struct seq_file *seq, void *v)
 			sk_wmem_alloc_get(sk), sk_rmem_alloc_get(sk),
 			from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)),
 			sock_i_ino(sk),
-			atomic_read(&sk->sk_refcnt), sk,
+			refcount_read(&sk->sk_refcnt), sk,
 			atomic_read(&sk->sk_drops));
 	}
 	seq_pad(seq, '\n');
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index b473ac2..e33e007 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -682,7 +682,7 @@ static int rxrpc_release_sock(struct sock *sk)
 {
 	struct rxrpc_sock *rx = rxrpc_sk(sk);
 
-	_enter("%p{%d,%d}", sk, sk->sk_state, atomic_read(&sk->sk_refcnt));
+	_enter("%p{%d,%d}", sk, sk->sk_state, refcount_read(&sk->sk_refcnt));
 
 	/* declare the socket closed for business */
 	sock_orphan(sk);
diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c
index ae7e4f5..b1e7838 100644
--- a/net/sched/em_meta.c
+++ b/net/sched/em_meta.c
@@ -340,7 +340,7 @@ META_COLLECTOR(int_sk_refcnt)
 		*err = -1;
 		return;
 	}
-	dst->value = atomic_read(&skb->sk->sk_refcnt);
+	dst->value = refcount_read(&skb->sk->sk_refcnt);
 }
 
 META_COLLECTOR(int_sk_rcvbuf)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 43e4045..6991dbe 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2307,7 +2307,7 @@ static void tipc_sk_remove(struct tipc_sock *tsk)
 	struct tipc_net *tn = net_generic(sock_net(sk), tipc_net_id);
 
 	if (!rhashtable_remove_fast(&tn->sk_rht, &tsk->node, tsk_rht_params)) {
-		WARN_ON(atomic_read(&sk->sk_refcnt) == 1);
+		WARN_ON(refcount_read(&sk->sk_refcnt) == 1);
 		__sock_put(sk);
 	}
 }
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a74339e..e9f8102 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2750,7 +2750,7 @@ static int unix_seq_show(struct seq_file *seq, void *v)
 
 		seq_printf(seq, "%pK: %08X %08X %08X %04X %02X %5lu",
 			s,
-			atomic_read(&s->sk_refcnt),
+			refcount_read(&s->sk_refcnt),
 			0,
 			s->sk_state == TCP_LISTEN ? __SO_ACCEPTCON : 0,
 			s->sk_type,
-- 
2.7.4


* [PATCH 08/17] net: convert sk_filter.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:28   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
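
As a minimal sketch of the conversion pattern applied throughout
this series (using a hypothetical struct foo rather than code from
this patch), the open-coded atomic counter and its helpers map onto
the refcount_t API like this:

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
	refcount_t refcnt;			/* was: atomic_t refcnt; */
};

static struct foo *foo_alloc(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (f)
		refcount_set(&f->refcnt, 1);	/* was: atomic_set() */
	return f;
}

static void foo_get(struct foo *f)
{
	/* saturates and warns instead of silently wrapping */
	refcount_inc(&f->refcnt);		/* was: atomic_inc() */
}

static void foo_put(struct foo *f)
{
	if (refcount_dec_and_test(&f->refcnt))	/* was: atomic_dec_and_test() */
		kfree(f);
}

The object layout and the free on the put side are unchanged; only
the counter type and the inc/dec helpers differ.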

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/filter.h | 3 ++-
 net/core/filter.c      | 7 ++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 8053c38..20247e7 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -7,6 +7,7 @@
 #include <stdarg.h>
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/compat.h>
 #include <linux/skbuff.h>
 #include <linux/linkage.h>
@@ -431,7 +432,7 @@ struct bpf_prog {
 };
 
 struct sk_filter {
-	atomic_t	refcnt;
+	refcount_t	refcnt;
 	struct rcu_head	rcu;
 	struct bpf_prog	*prog;
 };
diff --git a/net/core/filter.c b/net/core/filter.c
index ebaeaf2..62267e2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -928,7 +928,7 @@ static void sk_filter_release_rcu(struct rcu_head *rcu)
  */
 static void sk_filter_release(struct sk_filter *fp)
 {
-	if (atomic_dec_and_test(&fp->refcnt))
+	if (refcount_dec_and_test(&fp->refcnt))
 		call_rcu(&fp->rcu, sk_filter_release_rcu);
 }
 
@@ -950,7 +950,7 @@ bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
 	/* same check as in sock_kmalloc() */
 	if (filter_size <= sysctl_optmem_max &&
 	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
-		atomic_inc(&fp->refcnt);
+		refcount_inc(&fp->refcnt);
 		atomic_add(filter_size, &sk->sk_omem_alloc);
 		return true;
 	}
@@ -1179,12 +1179,13 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
 		return -ENOMEM;
 
 	fp->prog = prog;
-	atomic_set(&fp->refcnt, 0);
+	refcount_set(&fp->refcnt, 1);
 
 	if (!sk_filter_charge(sk, fp)) {
 		kfree(fp);
 		return -ENOMEM;
 	}
+	refcount_set(&fp->refcnt, 1);
 
 	old_fp = rcu_dereference_protected(sk->sk_filter,
 					   lockdep_sock_is_held(sk));
-- 
2.7.4


* [PATCH 09/17] net: convert ip_mc_list.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:28   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:28 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/igmp.h |  3 ++-
 net/ipv4/igmp.c      | 10 +++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 12f6fba..97caf18 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -18,6 +18,7 @@
 #include <linux/skbuff.h>
 #include <linux/timer.h>
 #include <linux/in.h>
+#include <linux/refcount.h>
 #include <uapi/linux/igmp.h>
 
 static inline struct igmphdr *igmp_hdr(const struct sk_buff *skb)
@@ -84,7 +85,7 @@ struct ip_mc_list {
 	struct ip_mc_list __rcu *next_hash;
 	struct timer_list	timer;
 	int			users;
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	spinlock_t		lock;
 	char			tm_running;
 	char			reporter;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 44fd86d..0ed4f7b 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -173,7 +173,7 @@ static int ip_mc_add_src(struct in_device *in_dev, __be32 *pmca, int sfmode,
 
 static void ip_ma_put(struct ip_mc_list *im)
 {
-	if (atomic_dec_and_test(&im->refcnt)) {
+	if (refcount_dec_and_test(&im->refcnt)) {
 		in_dev_put(im->interface);
 		kfree_rcu(im, rcu);
 	}
@@ -199,7 +199,7 @@ static void igmp_stop_timer(struct ip_mc_list *im)
 {
 	spin_lock_bh(&im->lock);
 	if (del_timer(&im->timer))
-		atomic_dec(&im->refcnt);
+		refcount_dec(&im->refcnt);
 	im->tm_running = 0;
 	im->reporter = 0;
 	im->unsolicit_count = 0;
@@ -213,7 +213,7 @@ static void igmp_start_timer(struct ip_mc_list *im, int max_delay)
 
 	im->tm_running = 1;
 	if (!mod_timer(&im->timer, jiffies+tv+2))
-		atomic_inc(&im->refcnt);
+		refcount_inc(&im->refcnt);
 }
 
 static void igmp_gq_start_timer(struct in_device *in_dev)
@@ -249,7 +249,7 @@ static void igmp_mod_timer(struct ip_mc_list *im, int max_delay)
 			spin_unlock_bh(&im->lock);
 			return;
 		}
-		atomic_dec(&im->refcnt);
+		refcount_dec(&im->refcnt);
 	}
 	igmp_start_timer(im, max_delay);
 	spin_unlock_bh(&im->lock);
@@ -1373,7 +1373,7 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
 	/* initial mode is (EX, empty) */
 	im->sfmode = MCAST_EXCLUDE;
 	im->sfcount[MCAST_EXCLUDE] = 1;
-	atomic_set(&im->refcnt, 1);
+	refcount_set(&im->refcnt, 1);
 	spin_lock_init(&im->lock);
 #ifdef CONFIG_IP_MULTICAST
 	setup_timer(&im->timer, igmp_timer_expire, (unsigned long)im);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 10/17] net: convert in_device.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/inetdevice.h | 11 ++++++-----
 net/ipv4/devinet.c         |  2 +-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ee971f3..5cd9671 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -11,6 +11,7 @@
 #include <linux/timer.h>
 #include <linux/sysctl.h>
 #include <linux/rtnetlink.h>
+#include <linux/refcount.h>
 
 struct ipv4_devconf {
 	void	*sysctl;
@@ -22,7 +23,7 @@ struct ipv4_devconf {
 
 struct in_device {
 	struct net_device	*dev;
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	int			dead;
 	struct in_ifaddr	*ifa_list;	/* IP ifaddr chain		*/
 
@@ -212,7 +213,7 @@ static inline struct in_device *in_dev_get(const struct net_device *dev)
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev)
-		atomic_inc(&in_dev->refcnt);
+		refcount_inc(&in_dev->refcnt);
 	rcu_read_unlock();
 	return in_dev;
 }
@@ -233,12 +234,12 @@ void in_dev_finish_destroy(struct in_device *idev);
 
 static inline void in_dev_put(struct in_device *idev)
 {
-	if (atomic_dec_and_test(&idev->refcnt))
+	if (refcount_dec_and_test(&idev->refcnt))
 		in_dev_finish_destroy(idev);
 }
 
-#define __in_dev_put(idev)  atomic_dec(&(idev)->refcnt)
-#define in_dev_hold(idev)   atomic_inc(&(idev)->refcnt)
+#define __in_dev_put(idev)  refcount_dec(&(idev)->refcnt)
+#define in_dev_hold(idev)   refcount_inc(&(idev)->refcnt)
 
 #endif /* __KERNEL__ */
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index cebedd5..1527146 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -251,7 +251,7 @@ static struct in_device *inetdev_init(struct net_device *dev)
 	/* Reference in_dev->dev */
 	dev_hold(dev);
 	/* Account for reference dev->ip_ptr (below) */
-	in_dev_hold(in_dev);
+	refcount_set(&in_dev->refcnt, 1);
 
 	err = devinet_sysctl_register(in_dev);
 	if (err) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread
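
A note on the net/ipv4/devinet.c hunk: the first reference on a freshly
allocated in_device is now taken with refcount_set(&in_dev->refcnt, 1)
rather than in_dev_hold(). Presumably this is because the structure is
zero-initialised at allocation and in_dev_hold() now expands to
refcount_inc(), which warns when asked to increment a counter that is
still 0, so the very first reference has to be an explicit set. A minimal
sketch of the assumed pattern (the allocation line and comments are
illustrative, not quoted from the patch):

	in_dev = kzalloc(sizeof(*in_dev), GFP_KERNEL);  /* refcnt starts at 0 */
	if (!in_dev)
		goto out;
	...
	refcount_set(&in_dev->refcnt, 1);  /* first reference; refcount_inc() would WARN on 0 */
	...
	in_dev_hold(in_dev);               /* subsequent holds increment normally */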

* [PATCH 11/17] net: convert netpoll_info.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/linux/netpoll.h | 3 ++-
 net/core/netpoll.c      | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 1828900..27c0aaa 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/rcupdate.h>
 #include <linux/list.h>
+#include <linux/refcount.h>
 
 union inet_addr {
 	__u32		all[4];
@@ -34,7 +35,7 @@ struct netpoll {
 };
 
 struct netpoll_info {
-	atomic_t refcnt;
+	refcount_t refcnt;
 
 	struct semaphore dev_lock;
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 891d88e..8b9083d 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -626,7 +626,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 		skb_queue_head_init(&npinfo->txq);
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
 
-		atomic_set(&npinfo->refcnt, 1);
+		refcount_set(&npinfo->refcnt, 1);
 
 		ops = np->dev->netdev_ops;
 		if (ops->ndo_netpoll_setup) {
@@ -636,7 +636,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 		}
 	} else {
 		npinfo = rtnl_dereference(ndev->npinfo);
-		atomic_inc(&npinfo->refcnt);
+		refcount_inc(&npinfo->refcnt);
 	}
 
 	npinfo->netpoll = np;
@@ -815,7 +815,7 @@ void __netpoll_cleanup(struct netpoll *np)
 
 	synchronize_srcu(&netpoll_srcu);
 
-	if (atomic_dec_and_test(&npinfo->refcnt)) {
+	if (refcount_dec_and_test(&npinfo->refcnt)) {
 		const struct net_device_ops *ops;
 
 		ops = np->dev->netdev_ops;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 12/17] net: convert unix_address.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/af_unix.h | 3 ++-
 net/unix/af_unix.c    | 8 ++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index fd60ecc..3a385e4 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -4,6 +4,7 @@
 #include <linux/socket.h>
 #include <linux/un.h>
 #include <linux/mutex.h>
+#include <linux/refcount.h>
 #include <net/sock.h>
 
 void unix_inflight(struct user_struct *user, struct file *fp);
@@ -21,7 +22,7 @@ extern spinlock_t unix_table_lock;
 extern struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
 
 struct unix_address {
-	atomic_t	refcnt;
+	refcount_t	refcnt;
 	int		len;
 	unsigned int	hash;
 	struct sockaddr_un name[0];
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e9f8102..5e2a402 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -212,7 +212,7 @@ EXPORT_SYMBOL_GPL(unix_peer_get);
 
 static inline void unix_release_addr(struct unix_address *addr)
 {
-	if (atomic_dec_and_test(&addr->refcnt))
+	if (refcount_dec_and_test(&addr->refcnt))
 		kfree(addr);
 }
 
@@ -861,7 +861,7 @@ static int unix_autobind(struct socket *sock)
 		goto out;
 
 	addr->name->sun_family = AF_UNIX;
-	atomic_set(&addr->refcnt, 1);
+	refcount_set(&addr->refcnt, 1);
 
 retry:
 	addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
@@ -1036,7 +1036,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	memcpy(addr->name, sunaddr, addr_len);
 	addr->len = addr_len;
 	addr->hash = hash ^ sk->sk_type;
-	atomic_set(&addr->refcnt, 1);
+	refcount_set(&addr->refcnt, 1);
 
 	if (sun_path[0]) {
 		addr->hash = UNIX_HASH_SIZE;
@@ -1327,7 +1327,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 	/* copy address information from listening to new sock*/
 	if (otheru->addr) {
-		atomic_inc(&otheru->addr->refcnt);
+		refcount_inc(&otheru->addr->refcnt);
 		newu->addr = otheru->addr;
 	}
 	if (otheru->path.dentry) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 13/17] net: convert fib_rule.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/fib_rules.h | 7 ++++---
 net/core/fib_rules.c    | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 8dbfdf7..26ae0ad 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -5,6 +5,7 @@
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 #include <linux/fib_rules.h>
+#include <linux/refcount.h>
 #include <net/flow.h>
 #include <net/rtnetlink.h>
 
@@ -29,7 +30,7 @@ struct fib_rule {
 	struct fib_rule __rcu	*ctarget;
 	struct net		*fr_net;
 
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	u32			pref;
 	int			suppress_ifgroup;
 	int			suppress_prefixlen;
@@ -103,12 +104,12 @@ struct fib_rules_ops {
 
 static inline void fib_rule_get(struct fib_rule *rule)
 {
-	atomic_inc(&rule->refcnt);
+	refcount_inc(&rule->refcnt);
 }
 
 static inline void fib_rule_put(struct fib_rule *rule)
 {
-	if (atomic_dec_and_test(&rule->refcnt))
+	if (refcount_dec_and_test(&rule->refcnt))
 		kfree_rcu(rule, rcu);
 }
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index b6791d9..53d55d5 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -32,7 +32,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
 	if (r == NULL)
 		return -ENOMEM;
 
-	atomic_set(&r->refcnt, 1);
+	refcount_set(&r->refcnt, 1);
 	r->action = FR_ACT_TO_TBL;
 	r->pref = pref;
 	r->table = table;
@@ -269,7 +269,7 @@ int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
 
 		if (err != -EAGAIN) {
 			if ((arg->flags & FIB_LOOKUP_NOREF) ||
-			    likely(atomic_inc_not_zero(&rule->refcnt))) {
+			    likely(refcount_inc_not_zero(&rule->refcnt))) {
 				arg->rule = rule;
 				goto out;
 			}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread
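
A note on the fib_rules_lookup() hunk: atomic_inc_not_zero() maps directly
onto refcount_inc_not_zero(), which takes a reference only if the object
still holds at least one, the usual pattern for lookups that can race with
the final fib_rule_put(). A minimal sketch of that pattern, with assumed
helper names:

	rcu_read_lock();
	rule = find_matching_rule(ops, fl);              /* assumed lookup helper */
	if (rule && !refcount_inc_not_zero(&rule->refcnt))
		rule = NULL;                             /* lost the race with the last put */
	rcu_read_unlock();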

* [PATCH 14/17] net: convert inet_frag_queue.refcnt from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/inet_frag.h  |  4 ++--
 net/ipv4/inet_fragment.c | 14 +++++++-------
 net/ipv4/ip_fragment.c   |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5894730..5a334bf 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -50,7 +50,7 @@ struct inet_frag_queue {
 	spinlock_t		lock;
 	struct timer_list	timer;
 	struct hlist_node	list;
-	atomic_t		refcnt;
+	refcount_t		refcnt;
 	struct sk_buff		*fragments;
 	struct sk_buff		*fragments_tail;
 	ktime_t			stamp;
@@ -129,7 +129,7 @@ void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
 
 static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
 {
-	if (atomic_dec_and_test(&q->refcnt))
+	if (refcount_dec_and_test(&q->refcnt))
 		inet_frag_destroy(q, f);
 }
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index b5e9317..96e95e8 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -276,11 +276,11 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 {
 	if (del_timer(&fq->timer))
-		atomic_dec(&fq->refcnt);
+		refcount_dec(&fq->refcnt);
 
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
-		atomic_dec(&fq->refcnt);
+		refcount_dec(&fq->refcnt);
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
@@ -329,7 +329,7 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 	 */
 	hlist_for_each_entry(qp, &hb->chain, list) {
 		if (qp->net == nf && f->match(qp, arg)) {
-			atomic_inc(&qp->refcnt);
+			refcount_inc(&qp->refcnt);
 			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
 			inet_frag_put(qp_in, f);
@@ -339,9 +339,9 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,
 #endif
 	qp = qp_in;
 	if (!mod_timer(&qp->timer, jiffies + nf->timeout))
-		atomic_inc(&qp->refcnt);
+		refcount_inc(&qp->refcnt);
 
-	atomic_inc(&qp->refcnt);
+	refcount_inc(&qp->refcnt);
 	hlist_add_head(&qp->list, &hb->chain);
 
 	spin_unlock(&hb->chain_lock);
@@ -370,7 +370,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 
 	setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
 	spin_lock_init(&q->lock);
-	atomic_set(&q->refcnt, 1);
+	refcount_set(&q->refcnt, 1);
 
 	return q;
 }
@@ -405,7 +405,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 	spin_lock(&hb->chain_lock);
 	hlist_for_each_entry(q, &hb->chain, list) {
 		if (q->net == nf && f->match(q, key)) {
-			atomic_inc(&q->refcnt);
+			refcount_inc(&q->refcnt);
 			spin_unlock(&hb->chain_lock);
 			return q;
 		}
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index bbe7f72..4fed6f6 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -303,7 +303,7 @@ static int ip_frag_reinit(struct ipq *qp)
 	unsigned int sum_truesize = 0;
 
 	if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
-		atomic_inc(&qp->q.refcnt);
+		refcount_inc(&qp->q.refcnt);
 		return -ETIMEDOUT;
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 15/17] net: convert net.passive from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/net_namespace.h | 3 ++-
 net/core/net-sysfs.c        | 2 +-
 net/core/net_namespace.c    | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index af8fe8a..ec6dcaf 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -5,6 +5,7 @@
 #define __NET_NET_NAMESPACE_H
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/workqueue.h>
 #include <linux/list.h>
 #include <linux/sysctl.h>
@@ -45,7 +46,7 @@ struct netns_ipvs;
 #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
 
 struct net {
-	atomic_t		passive;	/* To decided when the network
+	refcount_t		passive;	/* To decided when the network
 						 * namespace should be freed.
 						 */
 	atomic_t		count;		/* To decided when the network
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 3945821..d62caad 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1444,7 +1444,7 @@ static void *net_grab_current_ns(void)
 	struct net *ns = current->nsproxy->net_ns;
 #ifdef CONFIG_NET_NS
 	if (ns)
-		atomic_inc(&ns->passive);
+		refcount_inc(&ns->passive);
 #endif
 	return ns;
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 652468f..cb981aa 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -283,7 +283,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	LIST_HEAD(net_exit_list);
 
 	atomic_set(&net->count, 1);
-	atomic_set(&net->passive, 1);
+	refcount_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
@@ -360,7 +360,7 @@ static void net_free(struct net *net)
 void net_drop_ns(void *p)
 {
 	struct net *ns = p;
-	if (ns && atomic_dec_and_test(&ns->passive))
+	if (ns && refcount_dec_and_test(&ns->passive))
 		net_free(ns);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 16/17] net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/netlabel.h | 8 ++++----
 net/ipv4/cipso_ipv4.c  | 4 ++--
 net/ipv6/calipso.c     | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/netlabel.h b/include/net/netlabel.h
index efe9806..72d6435 100644
--- a/include/net/netlabel.h
+++ b/include/net/netlabel.h
@@ -37,7 +37,7 @@
 #include <linux/in6.h>
 #include <net/netlink.h>
 #include <net/request_sock.h>
-#include <linux/atomic.h>
+#include <linux/refcount.h>
 
 struct cipso_v4_doi;
 struct calipso_doi;
@@ -136,7 +136,7 @@ struct netlbl_audit {
  *
  */
 struct netlbl_lsm_cache {
-	atomic_t refcount;
+	refcount_t refcount;
 	void (*free) (const void *data);
 	void *data;
 };
@@ -295,7 +295,7 @@ static inline struct netlbl_lsm_cache *netlbl_secattr_cache_alloc(gfp_t flags)
 
 	cache = kzalloc(sizeof(*cache), flags);
 	if (cache)
-		atomic_set(&cache->refcount, 1);
+		refcount_set(&cache->refcount, 1);
 	return cache;
 }
 
@@ -309,7 +309,7 @@ static inline struct netlbl_lsm_cache *netlbl_secattr_cache_alloc(gfp_t flags)
  */
 static inline void netlbl_secattr_cache_free(struct netlbl_lsm_cache *cache)
 {
-	if (!atomic_dec_and_test(&cache->refcount))
+	if (!refcount_dec_and_test(&cache->refcount))
 		return;
 
 	if (cache->free)
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index ae20616..c204477 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -265,7 +265,7 @@ static int cipso_v4_cache_check(const unsigned char *key,
 		    entry->key_len == key_len &&
 		    memcmp(entry->key, key, key_len) == 0) {
 			entry->activity += 1;
-			atomic_inc(&entry->lsm_data->refcount);
+			refcount_inc(&entry->lsm_data->refcount);
 			secattr->cache = entry->lsm_data;
 			secattr->flags |= NETLBL_SECATTR_CACHE;
 			secattr->type = NETLBL_NLTYPE_CIPSOV4;
@@ -332,7 +332,7 @@ int cipso_v4_cache_add(const unsigned char *cipso_ptr,
 	}
 	entry->key_len = cipso_ptr_len;
 	entry->hash = cipso_v4_map_cache_hash(cipso_ptr, cipso_ptr_len);
-	atomic_inc(&secattr->cache->refcount);
+	refcount_inc(&secattr->cache->refcount);
 	entry->lsm_data = secattr->cache;
 
 	bkt = entry->hash & (CIPSO_V4_CACHE_BUCKETS - 1);
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index 37ac9de..6c6ecf1 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -227,7 +227,7 @@ static int calipso_cache_check(const unsigned char *key,
 		    entry->key_len == key_len &&
 		    memcmp(entry->key, key, key_len) == 0) {
 			entry->activity += 1;
-			atomic_inc(&entry->lsm_data->refcount);
+			refcount_inc(&entry->lsm_data->refcount);
 			secattr->cache = entry->lsm_data;
 			secattr->flags |= NETLBL_SECATTR_CACHE;
 			secattr->type = NETLBL_NLTYPE_CALIPSO;
@@ -296,7 +296,7 @@ static int calipso_cache_add(const unsigned char *calipso_ptr,
 	}
 	entry->key_len = calipso_ptr_len;
 	entry->hash = calipso_map_cache_hash(calipso_ptr, calipso_ptr_len);
-	atomic_inc(&secattr->cache->refcount);
+	refcount_inc(&secattr->cache->refcount);
 	entry->lsm_data = secattr->cache;
 
 	bkt = entry->hash & (CALIPSO_CACHE_BUCKETS - 1);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 17/17] net: convert packet_fanout.sk_ref from atomic_t to refcount_t
  2017-03-16 15:28 ` [Bridge] " Elena Reshetova
@ 2017-03-16 15:29   ` Elena Reshetova
  -1 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-03-16 15:29 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
reference counter overflows that might lead to
use-after-free situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 net/packet/af_packet.c | 8 ++++----
 net/packet/internal.h  | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ad5e5dc..ef868a7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1698,7 +1698,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		match->flags = flags;
 		INIT_LIST_HEAD(&match->list);
 		spin_lock_init(&match->lock);
-		atomic_set(&match->sk_ref, 0);
+		refcount_set(&match->sk_ref, 0);
 		fanout_init_data(match);
 		match->prot_hook.type = po->prot_hook.type;
 		match->prot_hook.dev = po->prot_hook.dev;
@@ -1712,10 +1712,10 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	    match->prot_hook.type == po->prot_hook.type &&
 	    match->prot_hook.dev == po->prot_hook.dev) {
 		err = -ENOSPC;
-		if (atomic_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
+		if (refcount_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
 			__dev_remove_pack(&po->prot_hook);
 			po->fanout = match;
-			atomic_inc(&match->sk_ref);
+			refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1);
 			__fanout_link(sk, po);
 			err = 0;
 		}
@@ -1744,7 +1744,7 @@ static struct packet_fanout *fanout_release(struct sock *sk)
 	if (f) {
 		po->fanout = NULL;
 
-		if (atomic_dec_and_test(&f->sk_ref))
+		if (refcount_dec_and_test(&f->sk_ref))
 			list_del(&f->list);
 		else
 			f = NULL;
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 9ee4631..94d1d40 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -1,6 +1,8 @@
 #ifndef __PACKET_INTERNAL_H__
 #define __PACKET_INTERNAL_H__
 
+#include <linux/refcount.h>
+
 struct packet_mclist {
 	struct packet_mclist	*next;
 	int			ifindex;
@@ -86,7 +88,7 @@ struct packet_fanout {
 	struct list_head	list;
 	struct sock		*arr[PACKET_FANOUT_MAX];
 	spinlock_t		lock;
-	atomic_t		sk_ref;
+	refcount_t		sk_ref;
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread
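
A note on the fanout_add() hunk: sk_ref counts the packet sockets bound to
a fanout group, so it legitimately sits at 0 right after the group is
allocated, and refcount_inc() warns on a zero counter. That is presumably
why the increment is open-coded as a read plus set; the read/modify/write
is not atomic on its own and appears to rely on the caller's serialization
(fanout_add() runs under a mutex in this code). A minimal sketch, with the
comments being interpretation rather than part of the patch:

	/* assumed: callers are serialized by a lock, so read + set cannot race */
	if (refcount_read(&match->sk_ref) < PACKET_FANOUT_MAX)
		refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1);

	/* dropping the last bound socket still uses the checked helper */
	if (refcount_dec_and_test(&f->sk_ref))
		list_del(&f->list);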

* Re: [PATCH 08/17] net: convert sk_filter.refcnt from atomic_t to refcount_t
  2017-03-16 15:28   ` [Bridge] " Elena Reshetova
@ 2017-03-16 16:04     ` Daniel Borkmann
  -1 siblings, 0 replies; 137+ messages in thread
From: Daniel Borkmann @ 2017-03-16 16:04 UTC (permalink / raw)
  To: Elena Reshetova, netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Hans Liljestrand, David Windsor, alexei.starovoitov

On 03/16/2017 04:28 PM, Elena Reshetova wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.
>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: David Windsor <dwindsor@gmail.com>
> ---
>   include/linux/filter.h | 3 ++-
>   net/core/filter.c      | 7 ++++---
>   2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 8053c38..20247e7 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -7,6 +7,7 @@
>   #include <stdarg.h>
>
>   #include <linux/atomic.h>
> +#include <linux/refcount.h>
>   #include <linux/compat.h>
>   #include <linux/skbuff.h>
>   #include <linux/linkage.h>
> @@ -431,7 +432,7 @@ struct bpf_prog {
>   };
>
>   struct sk_filter {
> -	atomic_t	refcnt;
> +	refcount_t	refcnt;
>   	struct rcu_head	rcu;
>   	struct bpf_prog	*prog;
>   };
> diff --git a/net/core/filter.c b/net/core/filter.c
> index ebaeaf2..62267e2 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -928,7 +928,7 @@ static void sk_filter_release_rcu(struct rcu_head *rcu)
>    */
>   static void sk_filter_release(struct sk_filter *fp)
>   {
> -	if (atomic_dec_and_test(&fp->refcnt))
> +	if (refcount_dec_and_test(&fp->refcnt))
>   		call_rcu(&fp->rcu, sk_filter_release_rcu);
>   }
>
> @@ -950,7 +950,7 @@ bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
>   	/* same check as in sock_kmalloc() */
>   	if (filter_size <= sysctl_optmem_max &&
>   	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
> -		atomic_inc(&fp->refcnt);
> +		refcount_inc(&fp->refcnt);
>   		atomic_add(filter_size, &sk->sk_omem_alloc);
>   		return true;
>   	}
> @@ -1179,12 +1179,13 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
>   		return -ENOMEM;
>
>   	fp->prog = prog;
> -	atomic_set(&fp->refcnt, 0);
> +	refcount_set(&fp->refcnt, 1);
>
>   	if (!sk_filter_charge(sk, fp)) {
>   		kfree(fp);
>   		return -ENOMEM;
>   	}
> +	refcount_set(&fp->refcnt, 1);

Regarding the two subsequent refcount_set(, 1) that look a bit strange
due to sk_filter_charge() having the refcount_inc(), I presume ... can't
the refcount API handle such a corner case? Or, alternatively, let
sk_filter_charge() handle it, for example:

bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp)
{
	u32 filter_size = bpf_prog_size(fp->prog->len);

	/* same check as in sock_kmalloc() */
	if (filter_size <= sysctl_optmem_max &&
	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
		atomic_add(filter_size, &sk->sk_omem_alloc);
		return true;
	}
	return false;
}

And this goes to filter.h:

bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp);

bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
{
	bool ret = __sk_filter_charge(sk, fp);
	if (ret)
		refcount_inc(&fp->refcnt);
	return ret;
}

... and let __sk_attach_prog() call __sk_filter_charge() and only do
the second refcount_set()?

>   	old_fp = rcu_dereference_protected(sk->sk_filter,
>   					   lockdep_sock_is_held(sk));
>

^ permalink raw reply	[flat|nested] 137+ messages in thread
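
For reference, a sketch of how __sk_attach_prog() might end up looking with the __sk_filter_charge()/sk_filter_charge() split suggested above; this is only an illustration of the idea, not a tested patch, and it assumes the rest of the function stays as in the current tree:

        static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
        {
                struct sk_filter *fp, *old_fp;

                fp = kmalloc(sizeof(*fp), GFP_KERNEL);
                if (!fp)
                        return -ENOMEM;

                fp->prog = prog;

                /* Accounting only; no refcount manipulation in here. */
                if (!__sk_filter_charge(sk, fp)) {
                        kfree(fp);
                        return -ENOMEM;
                }
                /* Single initialization, never an increment from zero. */
                refcount_set(&fp->refcnt, 1);

                old_fp = rcu_dereference_protected(sk->sk_filter,
                                                   lockdep_sock_is_held(sk));
                rcu_assign_pointer(sk->sk_filter, fp);

                if (old_fp)
                        sk_filter_uncharge(sk, old_fp);

                return 0;
        }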

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-16 15:28   ` [Bridge] " Elena Reshetova
  (?)
@ 2017-03-16 16:58     ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-16 16:58 UTC (permalink / raw)
  To: Elena Reshetova
  Cc: netdev, bridge, linux-kernel, kuznet, jmorris, kaber, stephen,
	peterz, keescook, Hans Liljestrand, David Windsor

On Thu, 2017-03-16 at 17:28 +0200, Elena Reshetova wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.


...

>  static __always_inline void sock_hold(struct sock *sk)
>  {
> -	atomic_inc(&sk->sk_refcnt);
> +	refcount_inc(&sk->sk_refcnt);
>  }
>  

While I certainly see the value of these refcount_t, we have a very
different behavior on these atomic_inc(), which were doing a single
inlined LOCK RMW on x86.

We now call an external function performing an
atomic_read(), various ops/tests, then atomic_cmpxchg_relaxed(), in a
loop, losing the nice ability of x86 to prevent live locks.

This looks like a lot of bloat, just to be able to chase hypothetical bugs in
the kernel.

I would love to have a way to enable extra debugging when I want a debug
kernel, like LOCKDEP or KASAN.

By adding all this bloat, we assert that the Linux kernel is terminally buggy
and that every atomic_inc() we did was suspicious and needs to always be
instrumented/validated.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-16 16:58     ` Eric Dumazet
@ 2017-03-16 17:38       ` Kees Cook
  -1 siblings, 0 replies; 137+ messages in thread
From: Kees Cook @ 2017-03-16 17:38 UTC (permalink / raw)
  To: Eric Dumazet, Peter Zijlstra
  Cc: Elena Reshetova, Network Development, bridge, LKML,
	Alexey Kuznetsov, James Morris, Patrick McHardy,
	Stephen Hemminger, Hans Liljestrand, David Windsor

On Thu, Mar 16, 2017 at 10:58 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-16 at 17:28 +0200, Elena Reshetova wrote:
>> refcount_t type and corresponding API should be
>> used instead of atomic_t when the variable is used as
>> a reference counter. This allows to avoid accidental
>> refcounter overflows that might lead to use-after-free
>> situations.
>
>
> ...
>
>>  static __always_inline void sock_hold(struct sock *sk)
>>  {
>> -     atomic_inc(&sk->sk_refcnt);
>> +     refcount_inc(&sk->sk_refcnt);
>>  }
>>
>
> While I certainly see the value of these refcount_t, we have a very
> different behavior on these atomic_inc() which were doing a single
> inlined LOCK RMW on x86.

I think we can certainly investigate arch-specific ways to improve the
performance, but the consensus seemed to be that getting the
infrastructure in and doing the migration was the first set of steps.

> We now call an external function performing a
> atomic_read(), various ops/tests, then atomic_cmpxchg_relaxed(), in a
> loop, loosing the nice ability for x86 of preventing live locks.
>
> Looks a lot of bloat, just to be able to chase hypothetical bugs in the
> kernel.
>
> I would love to have a way to enable extra debugging when I want a debug
> kernel, like LOCKDEP or KASAN.
>
> By adding all this bloat, we assert linux kernel is terminally buggy and
> every atomic_inc() we did was suspicious, and need to be always
> instrumented/validated.

This IS the assertion, unfortunately. With an average five-year lifetime for
security flaws[1], and many of the last couple of years' public exploits
being refcount flaws[2], this is something we have to get done. We
need the default kernel to be much more self-protective, and this is
one of many places to make it happen.

I am, of course, biased, but I think the evidence of actual
refcounting attacks outweighs the theoretical performance cost of
these changes. If there is a realistic workflow that shows actual
problems, let's examine it and find a solution for that as a separate
part of this work without blocking this migration.

-Kees

[1] https://outflux.net/blog/archives/2016/10/18/security-bug-lifetime/
[2] http://kernsec.org/wiki/index.php/Bug_Classes/Integer_overflow

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-16 17:38       ` [Bridge] " Kees Cook
  (?)
@ 2017-03-16 19:10         ` David Miller
  -1 siblings, 0 replies; 137+ messages in thread
From: David Miller @ 2017-03-16 19:10 UTC (permalink / raw)
  To: keescook
  Cc: eric.dumazet, peterz, elena.reshetova, netdev, bridge,
	linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor

From: Kees Cook <keescook@chromium.org>
Date: Thu, 16 Mar 2017 11:38:25 -0600

> I am, of course, biased, but I think the evidence of actual
> refcounting attacks outweighs the theoretical performance cost of
> these changes.

This is not theoretical at all.

We count the nanoseconds that every packet takes to get processed and
you are adding quite a bit.

I understand your point of view, but this is knowingly going to add
performance regressions to the networking code.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-16 19:10         ` David Miller
  (?)
@ 2017-03-17  7:42           ` Reshetova, Elena
  -1 siblings, 0 replies; 137+ messages in thread
From: Reshetova, Elena @ 2017-03-17  7:42 UTC (permalink / raw)
  To: David Miller, keescook
  Cc: eric.dumazet, peterz, netdev, bridge, linux-kernel, kuznet,
	jmorris, kaber, stephen, ishkamiel, dwindsor

> From: Kees Cook <keescook@chromium.org>
> Date: Thu, 16 Mar 2017 11:38:25 -0600
> 
> > I am, of course, biased, but I think the evidence of actual
> > refcounting attacks outweighs the theoretical performance cost of
> > these changes.
> 
> This is not theoretical at all.
> 
> We count the nanoseconds that every packet takes to get processed and
> you are adding quite a bit.
> 
> I understand your point of view, but this is knowingly going to add
> performance regressions to the networking code.

Should we then first measure the actual numbers, to understand what we are talking about here?
I would be glad to do it if you suggest the correct way to take measurements that actually reflect the real-life use cases.

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH 08/17] net: convert sk_filter.refcnt from atomic_t to refcount_t
  2017-03-16 16:04     ` [Bridge] " Daniel Borkmann
  (?)
@ 2017-03-17  8:02       ` Reshetova, Elena
  -1 siblings, 0 replies; 137+ messages in thread
From: Reshetova, Elena @ 2017-03-17  8:02 UTC (permalink / raw)
  To: Daniel Borkmann, netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Hans Liljestrand, David Windsor, alexei.starovoitov


> On 03/16/2017 04:28 PM, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > Signed-off-by: David Windsor <dwindsor@gmail.com>
> > ---
> >   include/linux/filter.h | 3 ++-
> >   net/core/filter.c      | 7 ++++---
> >   2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index 8053c38..20247e7 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -7,6 +7,7 @@
> >   #include <stdarg.h>
> >
> >   #include <linux/atomic.h>
> > +#include <linux/refcount.h>
> >   #include <linux/compat.h>
> >   #include <linux/skbuff.h>
> >   #include <linux/linkage.h>
> > @@ -431,7 +432,7 @@ struct bpf_prog {
> >   };
> >
> >   struct sk_filter {
> > -	atomic_t	refcnt;
> > +	refcount_t	refcnt;
> >   	struct rcu_head	rcu;
> >   	struct bpf_prog	*prog;
> >   };
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index ebaeaf2..62267e2 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -928,7 +928,7 @@ static void sk_filter_release_rcu(struct rcu_head *rcu)
> >    */
> >   static void sk_filter_release(struct sk_filter *fp)
> >   {
> > -	if (atomic_dec_and_test(&fp->refcnt))
> > +	if (refcount_dec_and_test(&fp->refcnt))
> >   		call_rcu(&fp->rcu, sk_filter_release_rcu);
> >   }
> >
> > @@ -950,7 +950,7 @@ bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> >   	/* same check as in sock_kmalloc() */
> >   	if (filter_size <= sysctl_optmem_max &&
> >   	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
> > -		atomic_inc(&fp->refcnt);
> > +		refcount_inc(&fp->refcnt);
> >   		atomic_add(filter_size, &sk->sk_omem_alloc);
> >   		return true;
> >   	}
> > @@ -1179,12 +1179,13 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
> >   		return -ENOMEM;
> >
> >   	fp->prog = prog;
> > -	atomic_set(&fp->refcnt, 0);
> > +	refcount_set(&fp->refcnt, 1);
> >
> >   	if (!sk_filter_charge(sk, fp)) {
> >   		kfree(fp);
> >   		return -ENOMEM;
> >   	}
> > +	refcount_set(&fp->refcnt, 1);
> 
> Regarding the two subsequent refcount_set(, 1) that look a bit strange
> due to the sk_filter_charge() having refcount_inc() I presume ... can't
> the refcount API handle such corner case? 

Yes, it was exactly because of the refcount_inc() from zero in sk_filter_charge().
refcount_inc() refuses to do an inc from zero for security reasons. At some
point in the past we discussed refcount_inc_not_one(), but it was decided to be too
special a case to support (we really have very few such cases).


> Or alternatively the let the
> sk_filter_charge() handle it, for example:
> 
> bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> {
> 	u32 filter_size = bpf_prog_size(fp->prog->len);
> 
> 	/* same check as in sock_kmalloc() */
> 	if (filter_size <= sysctl_optmem_max &&
> 	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
> 		atomic_add(filter_size, &sk->sk_omem_alloc);
> 		return true;
> 	}
> 	return false;
> }
> 
> And this goes to filter.h:
> 
> bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp);
> 
> bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> {
> 	bool ret = __sk_filter_charge(sk, fp);
> 	if (ret)
> 		refcount_inc(&fp->refcnt);
> 	return ret;
> }
> 
> ... and let __sk_attach_prog() call __sk_filter_charge() and only fo
> the second refcount_set()?
> 
> >   	old_fp = rcu_dereference_protected(sk->sk_filter,
> >   					   lockdep_sock_is_held(sk));
> >

Oh, yes, this would make it look less awkward. Thank you for the suggestion, Daniel!
I guess we try to be less invasive with code changes overall, maybe even too careful...

I will update the patch and send a new version. 

Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 137+ messages in thread
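
The idiom the rest of the series relies on to stay clear of the inc-from-zero case is to hand out the first reference at allocation time, so refcount_inc() only ever runs on a live object. A generic sketch (struct foo and its helpers are purely illustrative, not taken from the patches):

        struct foo {
                refcount_t      ref;
        };

        static struct foo *foo_alloc(gfp_t gfp)
        {
                struct foo *f = kzalloc(sizeof(*f), gfp);

                if (f)
                        refcount_set(&f->ref, 1);   /* caller owns the first reference */
                return f;
        }

        static void foo_get(struct foo *f)
        {
                refcount_inc(&f->ref);              /* never called on a zero count */
        }

        static void foo_put(struct foo *f)
        {
                if (refcount_dec_and_test(&f->ref))
                        kfree(f);
        }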

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-17  7:42           ` Reshetova, Elena
  (?)
@ 2017-03-17 16:13             ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-17 16:13 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: David Miller, keescook, peterz, netdev, bridge, linux-kernel,
	kuznet, jmorris, kaber, stephen, ishkamiel, dwindsor

On Fri, 2017-03-17 at 07:42 +0000, Reshetova, Elena wrote:

> Should we then first measure the actual numbers to understand what we
> are talking here about? 
> I would be glad to do it if you suggest what is the correct way to do
> measurements here to actually reflect the real life use cases. 

How have these patches been tested in real life exactly?

Can you quantify the number of added cycles per TCP packet, where I expect
we have maybe 20 atomic operations in all layers ...

(sk refcnt, skb->users, page refcounts, sk->sk_wmem_alloc,
sk->sk_rmem_alloc, qdisc ...)

Once we 'protect' all of them, the cost will be quite high.

This translates to more fossil fuel being burnt.

One atomic_inc() used to be a single x86 instruction.

A rough estimate of refcount_inc():

0000000000000140 <refcount_inc>:
 140:	55                   	push   %rbp
 141:	48 89 e5             	mov    %rsp,%rbp
 144:	e8 00 00 00 00       	callq  refcount_inc_not_zero
 149:	84 c0                	test   %al,%al
 14b:	74 02                	je     14f <refcount_inc+0xf>
 14d:	5d                   	pop    %rbp
 14e:	c3                   	retq   

00000000000000e0 <refcount_inc_not_zero>:
  e0:	8b 17                	mov    (%rdi),%edx
  e2:	eb 10                	jmp    f4 <refcount_inc_not_zero+0x14>
  e4:	85 c9                	test   %ecx,%ecx
  e6:	74 1b                	je     103 <refcount_inc_not_zero+0x23>
  e8:	89 d0                	mov    %edx,%eax
  ea:	f0 0f b1 0f          	lock cmpxchg %ecx,(%rdi)
  ee:	39 d0                	cmp    %edx,%eax
  f0:	74 0c                	je     fe <refcount_inc_not_zero+0x1e>
  f2:	89 c2                	mov    %eax,%edx
  f4:	85 d2                	test   %edx,%edx
  f6:	8d 4a 01             	lea    0x1(%rdx),%ecx
  f9:	75 e9                	jne    e4 <refcount_inc_not_zero+0x4>
  fb:	31 c0                	xor    %eax,%eax
  fd:	c3                   	retq   
  fe:	83 f9 ff             	cmp    $0xffffffff,%ecx
 101:	74 06                	je     109 <refcount_inc_not_zero+0x29>
 103:	b8 01 00 00 00       	mov    $0x1,%eax
 108:	c3                   	retq   

This is simply bloat for most cases.

Again, I believe this infrastructure makes sense for debugging kernels.

If some vendors are willing to run fully enabled debug kernels,
that is their choice. Probably many devices won't show any difference.

Have we forced KASAN to be enabled in the Linux kernel just because
it has found ~400 bugs so far?

I believe the refcount_t infrastructure is not mature enough to be widely
used right now.

Maybe in a few months, when we have more flexibility, like the existing
debugging facilities (CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_PAGE_REF,
LOCKDEP, KMEMLEAK, KASAN, ...)

^ permalink raw reply	[flat|nested] 137+ messages in thread
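
For readers who prefer C to the disassembly, the loop being measured above is roughly the following compare-and-swap retry loop; this is a simplified sketch of the generic refcount_inc_not_zero(), with the saturation handling and WARN_ONCE of the real include/linux/refcount.h left out:

        static inline bool refcount_inc_not_zero_sketch(refcount_t *r)
        {
                unsigned int old, new, val = atomic_read(&r->refs);

                for (;;) {
                        if (!val)               /* refuse to resurrect a freed object */
                                return false;

                        new = val + 1;
                        old = atomic_cmpxchg_relaxed(&r->refs, val, new);
                        if (old == val)         /* our cmpxchg won */
                                return true;

                        val = old;              /* counter changed under us, retry */
                }
        }

Compared with atomic_inc(), which compiles to a single locked instruction, the extra load, tests and possible retries are what the per-packet cycle-count concern is about.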

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-17 16:13             ` Eric Dumazet
@ 2017-03-18 16:47               ` Herbert Xu
  -1 siblings, 0 replies; 137+ messages in thread
From: Herbert Xu @ 2017-03-18 16:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: elena.reshetova, davem, keescook, peterz, netdev, bridge,
	linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, Andrew Morton

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-03-17 at 07:42 +0000, Reshetova, Elena wrote:
> 
>> Should we then first measure the actual numbers to understand what we
>> are talking here about? 
>> I would be glad to do it if you suggest what is the correct way to do
>> measurements here to actually reflect the real life use cases. 
> 
> How have these patches been tested in real life exactly ?
> 
> Can you quantify number of added cycles per TCP packet, where I expect
> we have maybe 20 atomic operations in all layers ...

I completely agree.  I think this thing needs to default to the
existing atomic_t behaviour.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 137+ messages in thread
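
One possible shape for "default to the existing atomic_t behaviour" is a compile-time switch, sketched here with a made-up CONFIG_REFCOUNT_CHECKED symbol: debug builds get the checked implementation, while production kernels keep the single locked-RMW increment.

        /* Hypothetical opt-in switch, for illustration only. */
        #ifdef CONFIG_REFCOUNT_CHECKED
        static inline void refcount_inc(refcount_t *r)
        {
                /* Checked variant: warn instead of incrementing from zero. */
                WARN_ONCE(!refcount_inc_not_zero(r),
                          "refcount_t: increment on 0; use-after-free.\n");
        }
        #else
        static inline void refcount_inc(refcount_t *r)
        {
                /* Plain atomic_t behaviour: one LOCK RMW instruction on x86. */
                atomic_inc(&r->refs);
        }
        #endif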

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-18 16:47               ` [Bridge] " Herbert Xu
  (?)
@ 2017-03-19  1:21                 ` David Miller
  -1 siblings, 0 replies; 137+ messages in thread
From: David Miller @ 2017-03-19  1:21 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, elena.reshetova, keescook, peterz, netdev, bridge,
	linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 19 Mar 2017 00:47:59 +0800

> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2017-03-17 at 07:42 +0000, Reshetova, Elena wrote:
>> 
>>> Should we then first measure the actual numbers to understand what we
>>> are talking here about? 
>>> I would be glad to do it if you suggest what is the correct way to do
>>> measurements here to actually reflect the real life use cases. 
>> 
>> How have these patches been tested in real life exactly ?
>> 
>> Can you quantify number of added cycles per TCP packet, where I expect
>> we have maybe 20 atomic operations in all layers ...
> 
> I completely agree.  I think this thing needs to default to the
> existing atomic_t behaviour.

I totally agree as well; the refcount_t facility as-is is unacceptable
for networking.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-19  1:21                 ` David Miller
@ 2017-03-20 10:39                   ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-20 10:39 UTC (permalink / raw)
  To: David Miller
  Cc: herbert, eric.dumazet, elena.reshetova, keescook, netdev, bridge,
	linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Sat, Mar 18, 2017 at 06:21:21PM -0700, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sun, 19 Mar 2017 00:47:59 +0800
> 
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> On Fri, 2017-03-17 at 07:42 +0000, Reshetova, Elena wrote:
> >> 
> >>> Should we then first measure the actual numbers to understand what we
> >>> are talking here about? 
> >>> I would be glad to do it if you suggest what is the correct way to do
> >>> measurements here to actually reflect the real life use cases. 
> >> 
> >> How have these patches been tested in real life exactly ?
> >> 
> >> Can you quantify number of added cycles per TCP packet, where I expect
> >> we have maybe 20 atomic operations in all layers ...
> > 
> > I completely agree.  I think this thing needs to default to the
> > existing atomic_t behaviour.
> 
> I totally agree as well, the refcount_t facility as-is is unacceptable
> for networking.

Can we at least give a benchmark and have someone run numbers? We should
be able to quantify these things.

^ permalink raw reply	[flat|nested] 137+ messages in thread
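
As a rough starting point for such numbers, a userspace sketch that pits a plain atomic increment against a cmpxchg-loop increment in a tight loop is below. It is emphatically not the per-packet TCP measurement being asked for (no cache-line contention, no real workload mix), and the file name is made up, but it bounds the raw per-operation cost on a given machine:

        /* Rough microbenchmark sketch; build with: gcc -O2 -o refbench refbench.c */
        #include <stdio.h>
        #include <time.h>

        #define ITERS 100000000UL

        static unsigned int counter = 1;

        static void plain_inc(unsigned int *p)
        {
                __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);     /* one LOCK RMW on x86 */
        }

        static void checked_inc(unsigned int *p)
        {
                unsigned int val = __atomic_load_n(p, __ATOMIC_RELAXED);

                for (;;) {
                        if (!val)       /* mimic the refuse-to-inc-from-zero check */
                                return;
                        /* On failure the current value is written back into val. */
                        if (__atomic_compare_exchange_n(p, &val, val + 1, 0,
                                                        __ATOMIC_RELAXED,
                                                        __ATOMIC_RELAXED))
                                return;
                }
        }

        static double bench(void (*inc)(unsigned int *))
        {
                struct timespec a, b;
                unsigned long i;

                clock_gettime(CLOCK_MONOTONIC, &a);
                for (i = 0; i < ITERS; i++)
                        inc(&counter);
                clock_gettime(CLOCK_MONOTONIC, &b);

                return ((b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec)) / ITERS;
        }

        int main(void)
        {
                printf("plain atomic inc   : %.2f ns/op\n", bench(plain_inc));
                printf("checked cmpxchg inc: %.2f ns/op\n", bench(checked_inc));
                return 0;
        }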

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 10:39                   ` [Bridge] " Peter Zijlstra
  (?)
@ 2017-03-20 13:16                     ` Herbert Xu
  -1 siblings, 0 replies; 137+ messages in thread
From: Herbert Xu @ 2017-03-20 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 11:39:37AM +0100, Peter Zijlstra wrote:
>
> Can we at least give a benchmark and have someone run numbers? We should
> be able to quantify these things.

Do you realise how many times this thing gets hit at 10Gb/s or
higher? Anyway, since you're proposing this change you should
demonstrate that it does not cause a performance regression.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:16                     ` Herbert Xu
@ 2017-03-20 13:23                       ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-20 13:23 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 09:16:29PM +0800, Herbert Xu wrote:
> On Mon, Mar 20, 2017 at 11:39:37AM +0100, Peter Zijlstra wrote:
> >
> > Can we at least give a benchmark and have someone run numbers? We should
> > be able to quantify these things.
> 
> Do you realise how many times this thing gets hit at 10Gb/s or
> higher? Anyway, since you're proposing this change you should
> demonstrate that it does not cause a performance regression.

So what bench/setup do you want ran?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:23                       ` [Bridge] " Peter Zijlstra
  (?)
@ 2017-03-20 13:27                         ` Herbert Xu
  -1 siblings, 0 replies; 137+ messages in thread
From: Herbert Xu @ 2017-03-20 13:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>
> So what bench/setup do you want ran?

You can start by counting how many cycles an atomic op takes
vs. how many cycles this new code takes.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 137+ messages in thread
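
For reference, the kind of cycle comparison asked for above can be started in userspace; the following is a minimal, hypothetical single-threaded sketch (not from the thread), approximating atomic_inc() with __atomic_fetch_add() and the refcount_t slow path with a cmpxchg retry loop. It ignores contention entirely, which is where the interesting differences show up, so treat the numbers only as a lower bound on the per-op cost difference.

/*
 * Hypothetical microbenchmark (illustrative only): plain atomic increment,
 * roughly what atomic_inc() compiles to, vs. a cmpxchg retry loop of the
 * kind a software-checked refcount uses.  Single-threaded.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define ITERS 100000000UL

static unsigned int counter;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	uint64_t t0, t1, t2;
	unsigned long i;

	t0 = now_ns();
	for (i = 0; i < ITERS; i++)
		__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);

	t1 = now_ns();
	for (i = 0; i < ITERS; i++) {
		unsigned int old = __atomic_load_n(&counter, __ATOMIC_RELAXED);

		/* retry loop, as in a software-checked refcount increment */
		while (!__atomic_compare_exchange_n(&counter, &old, old + 1,
						    0, __ATOMIC_RELAXED,
						    __ATOMIC_RELAXED))
			;
	}
	t2 = now_ns();

	printf("atomic add  : %.2f ns/op\n", (double)(t1 - t0) / ITERS);
	printf("cmpxchg loop: %.2f ns/op\n", (double)(t2 - t1) / ITERS);
	return 0;
}

Built with something like gcc -O2, this gives a rough per-op number on the build machine; as the rest of the thread makes clear, the figures only become meaningful under real contention and on the uarch that actually matters.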

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:27                         ` Herbert Xu
@ 2017-03-20 13:40                           ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-20 13:40 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> >
> > So what bench/setup do you want ran?
> 
> You can start by counting how many cycles an atomic op takes
> vs. how many cycles this new code takes.

On what uarch?

I think I tested a hand-coded asm version and it ended up about double the
cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
the memory bus saturated, at which point they took the same). Newer
parts will of course have different numbers.

Can't we run some iperf on a 40gbe fiber loop or something? It would be
very useful to have an actual workload we can run.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* RE: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:16                     ` Herbert Xu
  (?)
@ 2017-03-20 14:10                       ` David Laight
  -1 siblings, 0 replies; 137+ messages in thread
From: David Laight @ 2017-03-20 14:10 UTC (permalink / raw)
  To: 'Herbert Xu', Peter Zijlstra
  Cc: David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

From: Herbert Xu
> Sent: 20 March 2017 13:16
> On Mon, Mar 20, 2017 at 11:39:37AM +0100, Peter Zijlstra wrote:
> >
> > Can we at least give a benchmark and have someone run numbers? We should
> > be able to quantify these things.
> 
> Do you realise how many times this thing gets hit at 10Gb/s or
> higher? Anyway, since you're proposing this change you should
> demonstrate that it does not cause a performance regression.

What checks does refcount_t actually do?

An extra decrement is hard to detect since the item gets freed early.
I guess making the main 'allocate/free' code hold (say) 64k references
would give some leeway for extra decrements.

An extra increment will be detected when the count eventually wraps.
Unless the error is in a very common path, that won't happen for a long time.

On x86 the cpu flags from the 'lock inc/dec' could be used to reasonably
cheaply detect errors - provided you actually generate a forwards branch.

Otherwise having a common, but not every packet, code path verify that the
reference count is 'sane' would give reasonable coverage.

	David

^ permalink raw reply	[flat|nested] 137+ messages in thread
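
For context on the question above, a simplified sketch of the checks the generic refcount_t code performs on increment follows. This is an approximation written for this discussion; the real logic lives in include/linux/refcount.h and lib/refcount.c and differs in detail (exact saturation value, WARN conditions, helper names).

/*
 * Simplified approximation of a checked increment: refuse to resurrect a
 * freed object (count 0), and saturate instead of wrapping on overflow.
 * Not the actual kernel implementation.
 */
#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/types.h>

static inline bool sketch_refcount_inc_not_zero(atomic_t *r)
{
	unsigned int val = atomic_read(r);

	for (;;) {
		unsigned int old;

		if (!val)			/* use-after-free: do not take a reference */
			return false;
		if (val == UINT_MAX)		/* saturated: stay there, never wrap to 0 */
			return true;

		old = atomic_cmpxchg_relaxed(r, val, val + 1);
		if (old == val)
			return true;		/* we installed val + 1 */
		val = old;			/* raced with someone, retry */
	}
}

In this scheme an extra decrement is caught only when the counter underflows past zero, which matches the observation above that a missing reference is much harder to detect than an extra one.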

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 14:10                       ` David Laight
  (?)
@ 2017-03-20 14:28                         ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-20 14:28 UTC (permalink / raw)
  To: David Laight
  Cc: 'Herbert Xu',
	David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 02:10:24PM +0000, David Laight wrote:
> On x86 the cpu flags from the 'lock inc/dec' could be used to reasonably
> cheaply detect errors - provided you actually generate a forwards branch.

Note that currently there is no arch specific implementation. We could
of course cure this.

But note that the thing you propose; using the overflow flag, can only
reasonably be done on PREEMPT=n kernels, otherwise we have an incredible
number of contexts that can nest.

Sure; getting all the stars aligned to double overflow is incredibly rare,
but I don't want to be the one to have to debug that.

^ permalink raw reply	[flat|nested] 137+ messages in thread
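
To make the flags-based proposal concrete, the x86 scheme under discussion looks roughly like the following. This is a hypothetical sketch, not an existing kernel API; the function name is made up, and a real version would branch to a proper exception handler instead of ud2.

/*
 * Rough sketch of an x86 increment that checks the CPU flags set by
 * "lock incl": if the signed overflow flag is set, trap instead of
 * letting the counter wrap.  Illustrative only.
 */
static inline void sketch_refcount_inc_checked(atomic_t *r)
{
	asm volatile("lock incl %0\n\t"
		     "jno 1f\n\t"	/* no overflow: fast path, one forward branch */
		     "ud2\n"		/* overflow: force a trap (stand-in for a handler) */
		     "1:"
		     : "+m" (r->counter)
		     :
		     : "cc", "memory");
}

As noted above, nesting contexts on PREEMPT kernels make relying on a single flag check trickier than it looks; the sketch ignores that entirely.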

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:40                           ` [Bridge] " Peter Zijlstra
  (?)
@ 2017-03-20 14:51                             ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-20 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Herbert Xu, David Miller, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, 2017-03-20 at 14:40 +0100, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> > On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> > >
> > > So what bench/setup do you want ran?
> > 
> > You can start by counting how many cycles an atomic op takes
> > vs. how many cycles this new code takes.
> 
> On what uarch?
> 
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
> 
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

If atomic ops are converted one by one, it is likely that results will
be noise.

We cannot start a global conversion without having a way to do
selective debugging, can we?

Then, adopting this fine infra would really not be a problem.

Some arches have an efficient atomic_inc() (no full barriers), while
"load + test + atomic_cmpxchg() + test + loop" is more expensive.

PowerPC has no efficient atomic_inc() and this definitely shows on
network intensive workloads involving concurrent cores/threads.

atomic_cmpxchg() on PowerPC is horribly more expensive because of the
added two SYNC instructions.

Networking performance is quite poor on PowerPC as of today.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 14:51                             ` Eric Dumazet
  (?)
@ 2017-03-20 14:59                               ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-20 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Herbert Xu, David Miller, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, 2017-03-20 at 07:51 -0700, Eric Dumazet wrote:

> atomic_cmpxchg() on PowerPC is horribly more expensive because of the
> added two SYNC instructions.

Although I just saw that refcount was using atomic_cmpxchg_relaxed()

Time to find some documentation (probably missing) or get some specs for
this thing.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 14:51                             ` Eric Dumazet
@ 2017-03-20 14:59                               ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-20 14:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, David Miller, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, Mar 20, 2017 at 07:51:01AM -0700, Eric Dumazet wrote:
> PowerPC has no efficient atomic_inc() and this definitely shows on
> network intensive workloads involving concurrent cores/threads.

Correct, PPC LL/SC are dreadfully expensive.

> atomic_cmpxchg() on PowerPC is horribly more expensive because of the
> added two SYNC instructions.

Note that refcount_t uses atomic_cmpxchg_release() and
atomic_cmpxchg_relaxed() which avoid most of the painful barriers.

^ permalink raw reply	[flat|nested] 137+ messages in thread
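
A simplified sketch of why the relaxed/release split matters on the decrement side follows; again this is an approximation written for the discussion, not the exact code in lib/refcount.c.

/*
 * Approximate shape of a checked decrement: relaxed ordering is enough
 * while retrying, release ordering on the successful decrement so that
 * prior stores to the object are visible before it can be freed.
 */
#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/kernel.h>
#include <linux/types.h>

static inline bool sketch_refcount_dec_and_test(atomic_t *r)
{
	unsigned int val = atomic_read(r);

	for (;;) {
		unsigned int old;

		if (val == UINT_MAX)		/* saturated: deliberately leak */
			return false;
		if (WARN_ON(!val))		/* underflow: object already freed */
			return false;

		old = atomic_cmpxchg_release(r, val, val - 1);
		if (old == val)
			return val == 1;	/* last reference dropped */
		val = old;			/* raced, retry */
	}
}

On PowerPC this is exactly the difference being pointed at: a plain cmpxchg() implies two heavyweight barriers, while the _relaxed/_release variants avoid most of that cost.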

* RE: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 14:28                         ` Peter Zijlstra
  (?)
@ 2017-03-20 15:00                           ` David Laight
  -1 siblings, 0 replies; 137+ messages in thread
From: David Laight @ 2017-03-20 15:00 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Herbert Xu',
	David Miller, eric.dumazet, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

From: Peter Zijlstra
> Sent: 20 March 2017 14:28
> On Mon, Mar 20, 2017 at 02:10:24PM +0000, David Laight wrote:
> > On x86 the cpu flags from the 'lock inc/dec' could be used to reasonably
> > cheaply detect errors - provided you actually generate a forwards branch.
> 
> Note that currently there is no arch specific implementation. We could
> of course cure this.
> 
> But note that the thing you propose; using the overflow flag, can only
> reasonably be done on PREEMPT=n kernels, otherwise we have an incredible
> number of contexts that can nest.
> 
> Sure; getting all the stars aligned to double overflow is incredibly rare,
> but I don't want to be the one to have to debug that.

One overflow would set the overflow flag; you don't need both to fail.

In any case you can use the sign flag.
Say valid count values are -64k to -256 and 0 to MAXINT.
The count will normally be +ve unless the 'main free path'
has released the 64k references it holds.
If the sign bit is set after inc/dec the value is checked; it might be
valid, an error, or require the item to be freed.

OK, assuming the items have reasonable lifetimes and a nominal
'delete' function.

	David

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 14:59                               ` Eric Dumazet
  (?)
@ 2017-03-20 16:18                                 ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-20 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Herbert Xu, David Miller, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, 2017-03-20 at 07:59 -0700, Eric Dumazet wrote:
> On Mon, 2017-03-20 at 07:51 -0700, Eric Dumazet wrote:
> 
> > atomic_cmpxchg() on PowerPC is horribly more expensive because of the
> > added two SYNC instructions.
> 
> Although I just saw that refcount was using atomic_cmpxchg_relaxed()
> 
> Time to find some documentation (probably missing) or get some specs for
> this thing.

Interesting.

UDP ipv4 xmit path gets a ~25 % improvement on PPC with this patch.

( 20 concurrent netperf -t UDP_STREAM  : 2.45 Mpps -> 3.07 Mpps )

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8471dd116771462d149e1da2807e446b69b74bcc..9f14aebf0ae1f5f366cfff0fbf58c48603916bc7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -497,14 +497,14 @@ u32 ip_idents_reserve(u32 hash, int segs)
 	u32 now = (u32)jiffies;
 	u32 new, delta = 0;
 
-	if (old != now && cmpxchg(p_tstamp, old, now) == old)
+	if (old != now && cmpxchg_relaxed(p_tstamp, old, now) == old)
 		delta = prandom_u32_max(now - old);
 
 	/* Do not use atomic_add_return() as it makes UBSAN unhappy */
 	do {
 		old = (u32)atomic_read(p_id);
 		new = old + delta + segs;
-	} while (atomic_cmpxchg(p_id, old, new) != old);
+	} while (atomic_cmpxchg_relaxed(p_id, old, new) != old);
 
 	return new - segs;
 }

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 16:18                                 ` Eric Dumazet
@ 2017-03-20 16:34                                   ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-20 16:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Herbert Xu, David Miller, elena.reshetova, keescook, netdev,
	bridge, linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

On Mon, 2017-03-20 at 09:18 -0700, Eric Dumazet wrote:

> Interesting.
> 
> UDP ipv4 xmit path gets a ~25 % improvement on PPC with this patch.
> 
> ( 20 concurrent netperf -t UDP_STREAM  : 2.45 Mpps -> 3.07 Mpps )

Well, there _is_ a difference, but not 25 % (this was probably caused by
different queues on TX or RX between my reboots).

I added a sysctl hack to be able to toggle the change dynamically on a
given workload, and we hit other bottlenecks (mainly qdisc locks and
driver tx locks) anyway.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-20 13:40                           ` [Bridge] " Peter Zijlstra
  (?)
@ 2017-03-21 20:49                             ` Kees Cook
  -1 siblings, 0 replies; 137+ messages in thread
From: Kees Cook @ 2017-03-21 20:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Herbert Xu, David Miller, Eric Dumazet, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Mon, Mar 20, 2017 at 6:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
>> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>> >
>> > So what bench/setup do you want ran?
>>
>> You can start by counting how many cycles an atomic op takes
>> vs. how many cycles this new code takes.
>
> On what uarch?
>
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
>
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

Yeah, this is exactly what I'd like to find as well. Just comparing
cycles between refcount implementations, while interesting, doesn't
show us real-world performance changes, which is what we need to
measure.

Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
elsewhere in this email thread) real-world meaningful enough?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-21 20:49                             ` Kees Cook
  (?)
@ 2017-03-21 21:23                               ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-21 21:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:

> Yeah, this is exactly what I'd like to find as well. Just comparing
> cycles between refcount implementations, while interesting, doesn't
> show us real-world performance changes, which is what we need to
> measure.
> 
> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
> elsewhere in this email thread) real-world meaningful enough?

Not at all ;)

This was targeting the specific change I had in mind for
ip_idents_reserve(), which is not used by TCP flows.

Unfortunately there is no good test simulating real-world workloads,
which are mostly using TCP flows.

Most synthetic tools you can find are not using epoll(), and very often
hit bottlenecks in other layers.


It looks like our suggestion to get kernel builds with atomic_inc()
being exactly an atomic_inc() is not even discussed or implemented.

Coding this would require less time than running a typical Google kernel
qualification (roughly one month, thousands of hosts..., days of SWE).

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-21 21:23                               ` Eric Dumazet
  (?)
@ 2017-03-21 22:36                                 ` David Miller
  -1 siblings, 0 replies; 137+ messages in thread
From: David Miller @ 2017-03-21 22:36 UTC (permalink / raw)
  To: eric.dumazet
  Cc: keescook, peterz, herbert, elena.reshetova, netdev, bridge,
	linux-kernel, kuznet, jmorris, kaber, stephen, ishkamiel,
	dwindsor, akpm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 21 Mar 2017 14:23:09 -0700

> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.
> 
> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

+1

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-21 21:23                               ` Eric Dumazet
  (?)
@ 2017-03-21 23:51                                 ` Kees Cook
  -1 siblings, 0 replies; 137+ messages in thread
From: Kees Cook @ 2017-03-21 23:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, Mar 21, 2017 at 2:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:
>
>> Yeah, this is exactly what I'd like to find as well. Just comparing
>> cycles between refcount implementations, while interesting, doesn't
>> show us real-world performance changes, which is what we need to
>> measure.
>>
>> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
>> elsewhere in this email thread) real-world meaningful enough?
>
> Not at all ;)
>
> This was targeting the specific change I had in mind for
> ip_idents_reserve(), which is not used by TCP flows.

Okay, I just wanted to check. I didn't think so, but it was the only
example in the thread.

> Unfortunately there is no good test simulating real-world workloads,
> which are mostly using TCP flows.

Sure, but there has to be _something_ that can be used to measure the
effects. Without a meaningful test, it's weird to reject a
change for performance reasons.

> Most synthetic tools you can find are not using epoll(), and very often
> hit bottlenecks in other layers.
>
>
> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.

So, FWIW, I originally tried to make this a CONFIG in the first couple
passes at getting a refcount defense. I would be fine with this, but I
was not able to convince Peter. :) However, things have evolved a lot
since then, so perhaps there are things to be done here.

> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

It wasn't the issue of coding time; just that it had been specifically
not wanted. :)

Am I understanding you correctly that you'd want something like:

refcount.h:
#ifdef UNPROTECTED_REFCOUNT
#define refcount_inc(x)   atomic_inc(x)
...
#else
void refcount_inc(...
...
#endif

some/net.c:
#define UNPROTECTED_REFCOUNT
#include <refcount.h>

or similar?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 137+ messages in thread
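
Fleshing out the shape sketched in the message above a little: a per-file opt-out could look roughly like this. Everything here is hypothetical for the sake of illustration; the macro name, the field name and the split between inline and out-of-line versions are made up, not an existing kernel interface.

/* refcount.h -- illustrative only */
#ifdef UNPROTECTED_REFCOUNT
/* opt-out: plain atomics, zero extra cost on the hot path */
#define refcount_inc(r)			atomic_inc(&(r)->refs)
#define refcount_dec_and_test(r)	atomic_dec_and_test(&(r)->refs)
#else
/* default: checked, saturating out-of-line versions */
extern void refcount_inc(refcount_t *r);
extern bool refcount_dec_and_test(refcount_t *r);
#endif

/* some/net.c -- a hot-path file that opts out */
#define UNPROTECTED_REFCOUNT
#include <linux/refcount.h>

A Kconfig-wide switch would be the other obvious variant of the same idea, trading per-file granularity for a single build-time decision.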

* Re: [Bridge] [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
@ 2017-03-21 23:51                                 ` Kees Cook
  0 siblings, 0 replies; 137+ messages in thread
From: Kees Cook @ 2017-03-21 23:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, Patrick McHardy, Peter Zijlstra, Network Development,
	bridge, LKML, James Morris, Hans Liljestrand, Alexey Kuznetsov,
	Andrew Morton, David Miller, Reshetova, Elena, David Windsor

On Tue, Mar 21, 2017 at 2:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:
>
>> Yeah, this is exactly what I'd like to find as well. Just comparing
>> cycles between refcount implementations, while interesting, doesn't
>> show us real-world performance changes, which is what we need to
>> measure.
>>
>> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
>> elsewhere in this email thread) real-world meaningful enough?
>
> Not at all ;)
>
> This was targeting the specific change I had in mind for
> ip_idents_reserve(), which is not used by TCP flows.

Okay, I just wanted to check. I didn't think so, but it was the only
example in the thread.

> Unfortunately there is no good test simulating real-world workloads,
> which are mostly using TCP flows.

Sure, but there has to be _something_ that can be used to test to
measure the effects. Without a meaningful test, it's weird to reject a
change for performance reasons.

> Most synthetic tools you can find are not using epoll(), and very often
> hit bottlenecks in other layers.
>
>
> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.

So, FWIW, I originally tried to make this a CONFIG in the first couple
passes at getting a refcount defense. I would be fine with this, but I
was not able to convince Peter. :) However, things have evolved a lot
since then, so perhaps there are things do be done here.

> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

It wasn't the issue of coding time; just that it had been specifically
not wanted. :)

Am I understanding you correctly that you'd want something like:

refcount.h:
#ifdef UNPROTECTED_REFCOUNT
#define refcount_inc(x)   atomic_inc(x)
...
#else
void refcount_inc(...
...
#endif

some/net.c:
#define UNPROTECTED_REFCOUNT
#include <refcount.h>

or similar?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-21 23:51                                 ` Kees Cook
@ 2017-03-22  2:03                                   ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-22  2:03 UTC (permalink / raw)
  To: Kees Cook
  Cc: Peter Zijlstra, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, 2017-03-21 at 16:51 -0700, Kees Cook wrote:

> Am I understanding you correctly that you'd want something like:
> 
> refcount.h:
> #ifdef UNPROTECTED_REFCOUNT
> #define refcount_inc(x)   atomic_inc(x)
> ...
> #else
> void refcount_inc(...
> ...
> #endif
> 
> some/net.c:
> #define UNPROTECTED_REFCOUNT
> #include <refcount.h>
> 
> or similar?

At first, it could be something simple like that yes.

Note that we might define two refcount_inc()  : One that does whole
tests, and refcount_inc_relaxed() that might translate to atomic_inc()
on non debug kernels.

Then later, maybe provide a dynamic infrastructure so that we can
dynamically force the full checks even for refcount_inc_relaxed() on say
1% of the hosts, to get better debug coverage ?

^ permalink raw reply	[flat|nested] 137+ messages in thread
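
A minimal sketch of the two-variant idea Eric floats here, assuming a
hypothetical CONFIG_REFCOUNT_FULL_CHECKS knob; refcount_inc_relaxed() is only
a name proposed in this thread, not an existing kernel API:

#include <linux/atomic.h>
#include <linux/refcount.h>

#ifdef CONFIG_REFCOUNT_FULL_CHECKS
/* Debug/canary build: the relaxed variant keeps the full checks. */
static inline void refcount_inc_relaxed(refcount_t *r)
{
	refcount_inc(r);
}
#else
/* Production build: the relaxed variant is a plain atomic increment. */
static inline void refcount_inc_relaxed(refcount_t *r)
{
	atomic_inc(&r->refs);
}
#endif

Hot paths that can justify it with numbers would call the relaxed variant;
everything else keeps the always-checked refcount_inc().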

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-21 23:51                                 ` Kees Cook
  (?)
@ 2017-03-22 12:11                                   ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-22 12:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Dumazet, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, Mar 21, 2017 at 04:51:13PM -0700, Kees Cook wrote:
> On Tue, Mar 21, 2017 at 2:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Unfortunately there is no good test simulating real-world workloads,
> > which are mostly using TCP flows.
> 
> Sure, but there has to be _something_ that can be used to measure the
> effects. Without a meaningful test, it's weird to reject a
> change for performance reasons.

This. How can you optimize if there's no way to actually measure
something?

> > Most synthetic tools you can find are not using epoll(), and very often
> > hit bottlenecks in other layers.
> >
> >
> > It looks like our suggestion to get kernel builds with atomic_inc()
> > being exactly an atomic_inc() is not even discussed or implemented.
> 
> So, FWIW, I originally tried to make this a CONFIG in the first couple
> passes at getting a refcount defense. I would be fine with this, but I
> was not able to convince Peter. :) However, things have evolved a lot
> since then, so perhaps there are things to be done here.

Well, the argument was that unless there's a benchmark that shows it
cares, it's all premature optimization.

Similarly, you wanted this enabled at all times because hardening.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22  2:03                                   ` [Bridge] " Eric Dumazet
  (?)
@ 2017-03-22 12:25                                     ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-22 12:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, Mar 21, 2017 at 07:03:19PM -0700, Eric Dumazet wrote:

> Note that we might define two refcount_inc()  : One that does whole
> tests, and refcount_inc_relaxed() that might translate to atomic_inc()
> on non debug kernels.

So you'd want a duplicate interface, such that most code, which doesn't
care about refcount performance much, can still have all the tests
enabled.

But the code that cares about it (and preferably can prove it with
numbers) can use the other.

I'm also somewhat hesitant to use _relaxed for this distinction, as it
has a clear meaning in atomics, maybe _nocheck?

Also; what operations do you want _nocheck variants of, only
refcount_inc() ?

That said; I'm really loath to provide these without actual measurements
that prove they make a difference.

> Then later, maybe provide a dynamic infrastructure so that we can
> dynamically force the full checks even for refcount_inc_relaxed() on say
> 1% of the hosts, to get better debug coverage ?

Shouldn't be too hard to do in arch specific code using alternative
stuff. Generic code could use jump labels I suppose, but that would
result in bigger code.

^ permalink raw reply	[flat|nested] 137+ messages in thread
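
In generic code, the jump-label option Peter mentions could look roughly like
the sketch below; the static key and the _nocheck helper are illustrative
names, not existing interfaces:

#include <linux/jump_label.h>
#include <linux/refcount.h>

/* Flipped at runtime (boot parameter, sysctl, ...) on the fraction of hosts
 * that should run with the full checks enabled. */
DEFINE_STATIC_KEY_FALSE(refcount_force_checks);

static inline void refcount_inc_nocheck(refcount_t *r)
{
	if (static_branch_unlikely(&refcount_force_checks))
		refcount_inc(r);	/* full saturation/use-after-free checks */
	else
		atomic_inc(&r->refs);	/* plain increment on the fast path */
}

The arch-specific alternative-patching route would rewrite the call sites
instead of branching on a key, at the cost of per-architecture code.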

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 12:25                                     ` Peter Zijlstra
  (?)
@ 2017-03-22 13:22                                       ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-22 13:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Wed, 2017-03-22 at 13:25 +0100, Peter Zijlstra wrote:
> On Tue, Mar 21, 2017 at 07:03:19PM -0700, Eric Dumazet wrote:
> 
> > Note that we might define two refcount_inc()  : One that does whole
> > tests, and refcount_inc_relaxed() that might translate to atomic_inc()
> > on non debug kernels.
> 
> So you'd want a duplicate interface, such that most code, which doesn't
> care about refcount performance much, can still have all the tests
> enabled.
> 
> But the code that cares about it (and preferably can prove it with
> numbers) can use the other.
> 
> I'm also somewhat hesitant to use _relaxed for this distinction, as it
> has a clear meaning in atomics, maybe _nocheck?
> 
> Also; what operations do you want _nocheck variants of, only
> refcount_inc() ?

I was mostly thinking of points where we were already checking the value
either before or after the atomic_inc(), using some lazy check (a la
WARN_ON(atomic_read(p) == 0)) or something like that.

But admittedly we can replace all these by standard refcount_inc() and
simply provide a CONFIG option to turn off the checks, and let brave
people enable this option.


 

^ permalink raw reply	[flat|nested] 137+ messages in thread
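
As an illustration of the "lazy check" pattern Eric is referring to (an
open-coded sanity test next to a plain atomic_inc(), in contrast to the
always-checked refcount_inc()), with a made-up object type:

#include <linux/atomic.h>
#include <linux/bug.h>

struct obj {
	atomic_t refcnt;
};

static inline void obj_hold(struct obj *o)
{
	/* Cheap, racy sanity check: warns if we resurrect a freed object,
	 * but does nothing against overflow and doesn't stop the increment. */
	WARN_ON(atomic_read(&o->refcnt) == 0);
	atomic_inc(&o->refcnt);
}

Converting such a site to refcount_inc() subsumes the WARN_ON(); the CONFIG
option discussed above would then decide whether the checks are compiled in.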

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 13:22                                       ` Eric Dumazet
  (?)
@ 2017-03-22 14:33                                         ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-22 14:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Wed, Mar 22, 2017 at 06:22:16AM -0700, Eric Dumazet wrote:

> But admittedly we can replace all these by standard refcount_inc() and
> simply provide a CONFIG option to turn off the checks, and let brave
> people enable this option.

Still brings us back to lacking a real reason to provide that CONFIG
option. Not to mention that this CONFIG knob will kill the warnings for
everything, even the code that might not be as heavily audited as
network and which doesn't really care much about the performance of
refcount operations.


So I'm actually in favour of _nocheck variants, if we can show the need
for them. And I like your idea of being able to dynamically switch them
back to full debug as well.


But I would feel a whole lot better about the entire thing if we could
measure their impact. It would also give us good precedent to whack
other potential users of _nocheck over the head with -- show numbers.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 14:33                                         ` Peter Zijlstra
  (?)
@ 2017-03-22 14:54                                           ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-22 14:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Wed, 2017-03-22 at 15:33 +0100, Peter Zijlstra wrote:

> 
> But I would feel a whole lot better about the entire thing if we could
> measure their impact. It would also give us good precedent to whack
> other potential users of _nocheck over the head with -- show numbers.

I won't be able to measure the impact on real workloads; our production
kernels are based on 4.3 at this moment.

I guess someone could code a lib/test_refcount.c launching X threads
using either atomic_inc or refcount_inc() in a loop.

That would give a rough estimate of the refcount_t overhead among
various platforms.

^ permalink raw reply	[flat|nested] 137+ messages in thread
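
A rough sketch of what such a lib/test_refcount.c might look like; the module
parameters, loop count and thread bookkeeping are illustrative, and error
handling for kthread_run() is elided:

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/completion.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
#include <linux/ktime.h>

static int nr_threads = 4;
module_param(nr_threads, int, 0444);
static bool use_refcount;
module_param(use_refcount, bool, 0444);

#define NR_LOOPS	(10 * 1000 * 1000)

static atomic_t a_ref = ATOMIC_INIT(1);
static refcount_t r_ref = REFCOUNT_INIT(1);
static atomic_t threads_left;
static DECLARE_COMPLETION(all_done);

static int refbench_thread(void *unused)
{
	ktime_t start = ktime_get();
	s64 ns;
	int i;

	/* Hammer one shared counter, either checked or unchecked. */
	for (i = 0; i < NR_LOOPS; i++) {
		if (use_refcount)
			refcount_inc(&r_ref);
		else
			atomic_inc(&a_ref);
	}
	ns = ktime_to_ns(ktime_sub(ktime_get(), start));
	pr_info("test_refcount: %s: %lld ns for %d incs\n",
		use_refcount ? "refcount_inc" : "atomic_inc", ns, NR_LOOPS);
	if (atomic_dec_and_test(&threads_left))
		complete(&all_done);
	return 0;
}

static int __init test_refcount_init(void)
{
	int i;

	atomic_set(&threads_left, nr_threads);
	for (i = 0; i < nr_threads; i++)
		kthread_run(refbench_thread, NULL, "refbench/%d", i);
	wait_for_completion(&all_done);
	return 0;
}
module_init(test_refcount_init);

MODULE_LICENSE("GPL");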

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 14:54                                           ` Eric Dumazet
  (?)
@ 2017-03-22 15:08                                             ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-22 15:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
> On Wed, 2017-03-22 at 15:33 +0100, Peter Zijlstra wrote:
> 
> > 
> > But I would feel a whole lot better about the entire thing if we could
> > measure their impact. It would also give us good precedent to whack
> > other potential users of _nocheck over the head with -- show numbers.
> 
> I won't be able to measure the impact on real workloads; our production
> kernels are based on 4.3 at this moment.

Is there really no micro bench that exercises the relevant network
paths? Do you really fully rely on Google production workloads?

> I guess someone could code a lib/test_refcount.c launching X threads
> using either atomic_inc or refcount_inc() in a loop.
> 
> That would give a rough estimate of the refcount_t overhead among
> various platforms.

It's also a fairly meaningless number. It doesn't include any of the
other work the network path does.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 15:08                                             ` Peter Zijlstra
  (?)
@ 2017-03-22 15:22                                               ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2017-03-22 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Wed, 2017-03-22 at 16:08 +0100, Peter Zijlstra wrote:
> On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
> > On Wed, 2017-03-22 at 15:33 +0100, Peter Zijlstra wrote:
> > 
> > > 
> > > But I would feel a whole lot better about the entire thing if we could
> > > measure their impact. It would also give us good precedent to whack
> > > other potential users of _nocheck over the head with -- show numbers.
> > 
> > I won't be able to measure the impact on real workloads; our production
> > kernels are based on 4.3 at this moment.
> 
> Is there really no micro bench that exercises the relevant network
> paths? Do you really fully rely on Google production workloads?

You could run a synflood test, with ~10 Mpps.

sock_hold() is definitely used in SYN handling.

The latest upstream kernels do not work on my lab hosts, for whatever reason.

^ permalink raw reply	[flat|nested] 137+ messages in thread
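
For context, sock_hold() is the struct sock refcount acquisition that patch
07/17 converts; simplified from include/net/sock.h, it is roughly:

static inline void sock_hold(struct sock *sk)
{
	/* Before the series this was atomic_inc(&sk->sk_refcnt); after the
	 * conversion it becomes the checked increment below, which is the
	 * per-packet cost a ~10 Mpps synflood would expose. */
	refcount_inc(&sk->sk_refcnt);
}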

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22 14:54                                           ` Eric Dumazet
  (?)
@ 2017-03-22 16:51                                             ` Peter Zijlstra
  -1 siblings, 0 replies; 137+ messages in thread
From: Peter Zijlstra @ 2017-03-22 16:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kees Cook, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
> 
> I guess someone could code a lib/test_refcount.c launching X threads
> using either atomic_inc or refcount_inc() in a loop.
> 
> That would give a rough estimate of the refcount_t overhead among
> various platforms.

Cycles spent on uncontended ops:

					SKL	SNB	IVB-EP

atomic:		lock incl		~15	~13	~10
atomic-ref:	call refcount_inc	~31	~37	~31
atomic-ref2:	$inlined		~23	~22	~21


Contended numbers (E3-1245 v5):


root@skl:~/spinlocks# LOCK=./atomic ./test1.sh
1: 14.797240
2: 87.451230
4: 100.747790
8: 118.234010

root@skl:~/spinlocks# LOCK=./atomic-ref ./test1.sh
1: 30.627320
2: 91.866730
4: 111.029560
8: 141.922420

root@skl:~/spinlocks# LOCK=./atomic-ref2 ./test1.sh
1: 23.243930
2: 98.620250
4: 119.604240
8: 124.864380



The code includes the patches found here:

  https://lkml.kernel.org/r/20170317211918.393791494@infradead.org

and effectively does:

#define REFCOUNT_WARN(cond, str) WARN_ON_ONCE(cond)

 s/WARN_ONCE/REFCOUNT_WARN/

on lib/refcount.c

Find the tarball of the userspace code used attached (it's a bit of a
mess; it's grown over time and needs a cleanup).

I used: gcc (Debian 6.3.0-6) 6.3.0 20170205


So while it's about ~20 cycles worse, reducing contention is far more
effective than reducing straight-line instruction count (which too is
entirely possible, because GCC generates absolute shite in places).

[-- Attachment #2: spinlocks.tar.bz2 --]
[-- Type: application/octet-stream, Size: 20469 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread
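
For anyone wanting to reproduce the setup, the lib/refcount.c tweak described
above amounts to roughly the following (abridged sketch; REFCOUNT_WARN is the
local macro Peter adds, and only refcount_inc() is shown):

#include <linux/bug.h>
#include <linux/refcount.h>

/* Route every WARN_ONCE(cond, str) in the file through a macro that drops
 * the format string, so the checks remain but the printk/string overhead
 * goes away. */
#define REFCOUNT_WARN(cond, str)	WARN_ON_ONCE(cond)

void refcount_inc(refcount_t *r)
{
	REFCOUNT_WARN(!refcount_inc_not_zero(r),
		      "refcount_t: increment on 0; use-after-free.\n");
}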

* Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
  2017-03-22  2:03                                   ` [Bridge] " Eric Dumazet
  (?)
@ 2017-03-22 19:08                                     ` Kees Cook
  -1 siblings, 0 replies; 137+ messages in thread
From: Kees Cook @ 2017-03-22 19:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Peter Zijlstra, Herbert Xu, David Miller, Reshetova, Elena,
	Network Development, bridge, LKML, Alexey Kuznetsov,
	James Morris, Patrick McHardy, Stephen Hemminger,
	Hans Liljestrand, David Windsor, Andrew Morton

On Tue, Mar 21, 2017 at 7:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2017-03-21 at 16:51 -0700, Kees Cook wrote:
>
>> Am I understanding you correctly that you'd want something like:
>>
>> refcount.h:
>> #ifdef UNPROTECTED_REFCOUNT
>> #define refcount_inc(x)   atomic_inc(x)
>> ...
>> #else
>> void refcount_inc(...
>> ...
>> #endif
>>
>> some/net.c:
>> #define UNPROTECTED_REFCOUNT
>> #include <refcount.h>
>>
>> or similar?
>
> At first, it could be something simple like that yes.
>
> Note that we might define two refcount_inc()  : One that does whole
> tests, and refcount_inc_relaxed() that might translate to atomic_inc()
> on non debug kernels.
>
> Then later, maybe provide a dynamic infrastructure so that we can
> dynamically force the full checks even for refcount_inc_relaxed() on say
> 1% of the hosts, to get better debug coverage ?

Well, this isn't about finding bugs in normal workflows. This is about
catching bugs that attackers have found and started exploiting to gain a
use-after-free primitive. The intention is for it to be always
enabled.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 137+ messages in thread

* [PATCH 15/17] net: convert net.passive from atomic_t to refcount_t
  2017-06-30 10:07 [PATCH 00/17] v3 net generic subsystem refcount conversions Elena Reshetova
@ 2017-06-30 10:08 ` Elena Reshetova
  0 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-06-30 10:08 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/net_namespace.h | 3 ++-
 net/core/net-sysfs.c        | 2 +-
 net/core/net_namespace.c    | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index fe80bb4..bffe0a3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -5,6 +5,7 @@
 #define __NET_NET_NAMESPACE_H
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/workqueue.h>
 #include <linux/list.h>
 #include <linux/sysctl.h>
@@ -46,7 +47,7 @@ struct netns_ipvs;
 #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
 
 struct net {
-	atomic_t		passive;	/* To decided when the network
+	refcount_t		passive;	/* To decided when the network
 						 * namespace should be freed.
 						 */
 	atomic_t		count;		/* To decided when the network
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index fe7e145..b4f9922 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1448,7 +1448,7 @@ static void *net_grab_current_ns(void)
 	struct net *ns = current->nsproxy->net_ns;
 #ifdef CONFIG_NET_NS
 	if (ns)
-		atomic_inc(&ns->passive);
+		refcount_inc(&ns->passive);
 #endif
 	return ns;
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2178db8..57feb1e 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -284,7 +284,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	LIST_HEAD(net_exit_list);
 
 	atomic_set(&net->count, 1);
-	atomic_set(&net->passive, 1);
+	refcount_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
@@ -380,7 +380,7 @@ static void net_free(struct net *net)
 void net_drop_ns(void *p)
 {
 	struct net *ns = p;
-	if (ns && atomic_dec_and_test(&ns->passive))
+	if (ns && refcount_dec_and_test(&ns->passive))
 		net_free(ns);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 15/17] net: convert net.passive from atomic_t to refcount_t
  2017-06-28 11:54 [PATCH 00/17] v2 net generic subsystem refcount conversions Elena Reshetova
@ 2017-06-28 11:55 ` Elena Reshetova
  0 siblings, 0 replies; 137+ messages in thread
From: Elena Reshetova @ 2017-06-28 11:55 UTC (permalink / raw)
  To: netdev
  Cc: bridge, linux-kernel, kuznet, jmorris, kaber, stephen, peterz,
	keescook, Elena Reshetova, Hans Liljestrand, David Windsor

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows us to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
---
 include/net/net_namespace.h | 3 ++-
 net/core/net-sysfs.c        | 2 +-
 net/core/net_namespace.c    | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index fe80bb4..bffe0a3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -5,6 +5,7 @@
 #define __NET_NET_NAMESPACE_H
 
 #include <linux/atomic.h>
+#include <linux/refcount.h>
 #include <linux/workqueue.h>
 #include <linux/list.h>
 #include <linux/sysctl.h>
@@ -46,7 +47,7 @@ struct netns_ipvs;
 #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
 
 struct net {
-	atomic_t		passive;	/* To decided when the network
+	refcount_t		passive;	/* To decided when the network
 						 * namespace should be freed.
 						 */
 	atomic_t		count;		/* To decided when the network
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 65ea0ff..bdcf5dd 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1444,7 +1444,7 @@ static void *net_grab_current_ns(void)
 	struct net *ns = current->nsproxy->net_ns;
 #ifdef CONFIG_NET_NS
 	if (ns)
-		atomic_inc(&ns->passive);
+		refcount_inc(&ns->passive);
 #endif
 	return ns;
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 26bbfab..50935eb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -284,7 +284,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	LIST_HEAD(net_exit_list);
 
 	atomic_set(&net->count, 1);
-	atomic_set(&net->passive, 1);
+	refcount_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
 	idr_init(&net->netns_ids);
@@ -380,7 +380,7 @@ static void net_free(struct net *net)
 void net_drop_ns(void *p)
 {
 	struct net *ns = p;
-	if (ns && atomic_dec_and_test(&ns->passive))
+	if (ns && refcount_dec_and_test(&ns->passive))
 		net_free(ns);
 }
 
-- 
2.7.4

Thread overview: 137+ messages
2017-03-16 15:28 [PATCH 00/17] net subsystem refcount conversions Elena Reshetova
2017-03-16 15:28 ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 01/17] net: convert neighbour.refcnt from atomic_t to refcount_t Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 02/17] net: convert neigh_params.refcnt " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 03/17] net: convert nf_bridge_info.use " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 04/17] net: convert sk_buff.users " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 05/17] net: convert sk_buff_fclones.fclone_ref " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 06/17] net: convert sock.sk_wmem_alloc " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:28 ` [PATCH 07/17] net: convert sock.sk_refcnt " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 16:58   ` Eric Dumazet
2017-03-16 16:58     ` [Bridge] " Eric Dumazet
2017-03-16 16:58     ` Eric Dumazet
2017-03-16 17:38     ` Kees Cook
2017-03-16 17:38       ` [Bridge] " Kees Cook
2017-03-16 19:10       ` David Miller
2017-03-16 19:10         ` [Bridge] " David Miller
2017-03-16 19:10         ` David Miller
2017-03-17  7:42         ` Reshetova, Elena
2017-03-17  7:42           ` [Bridge] " Reshetova, Elena
2017-03-17  7:42           ` Reshetova, Elena
2017-03-17 16:13           ` Eric Dumazet
2017-03-17 16:13             ` [Bridge] " Eric Dumazet
2017-03-17 16:13             ` Eric Dumazet
2017-03-18 16:47             ` Herbert Xu
2017-03-18 16:47               ` [Bridge] " Herbert Xu
2017-03-19  1:21               ` David Miller
2017-03-19  1:21                 ` [Bridge] " David Miller
2017-03-19  1:21                 ` David Miller
2017-03-20 10:39                 ` Peter Zijlstra
2017-03-20 10:39                   ` [Bridge] " Peter Zijlstra
2017-03-20 13:16                   ` Herbert Xu
2017-03-20 13:16                     ` [Bridge] " Herbert Xu
2017-03-20 13:16                     ` Herbert Xu
2017-03-20 13:23                     ` Peter Zijlstra
2017-03-20 13:23                       ` [Bridge] " Peter Zijlstra
2017-03-20 13:27                       ` Herbert Xu
2017-03-20 13:27                         ` [Bridge] " Herbert Xu
2017-03-20 13:27                         ` Herbert Xu
2017-03-20 13:40                         ` Peter Zijlstra
2017-03-20 13:40                           ` [Bridge] " Peter Zijlstra
2017-03-20 14:51                           ` Eric Dumazet
2017-03-20 14:51                             ` [Bridge] " Eric Dumazet
2017-03-20 14:51                             ` Eric Dumazet
2017-03-20 14:59                             ` Eric Dumazet
2017-03-20 14:59                               ` [Bridge] " Eric Dumazet
2017-03-20 14:59                               ` Eric Dumazet
2017-03-20 16:18                               ` Eric Dumazet
2017-03-20 16:18                                 ` [Bridge] " Eric Dumazet
2017-03-20 16:18                                 ` Eric Dumazet
2017-03-20 16:34                                 ` Eric Dumazet
2017-03-20 16:34                                   ` [Bridge] " Eric Dumazet
2017-03-20 14:59                             ` Peter Zijlstra
2017-03-20 14:59                               ` [Bridge] " Peter Zijlstra
2017-03-21 20:49                           ` Kees Cook
2017-03-21 20:49                             ` [Bridge] " Kees Cook
2017-03-21 20:49                             ` Kees Cook
2017-03-21 21:23                             ` Eric Dumazet
2017-03-21 21:23                               ` [Bridge] " Eric Dumazet
2017-03-21 21:23                               ` Eric Dumazet
2017-03-21 22:36                               ` David Miller
2017-03-21 22:36                                 ` [Bridge] " David Miller
2017-03-21 22:36                                 ` David Miller
2017-03-21 23:51                               ` Kees Cook
2017-03-21 23:51                                 ` [Bridge] " Kees Cook
2017-03-21 23:51                                 ` Kees Cook
2017-03-22  2:03                                 ` Eric Dumazet
2017-03-22  2:03                                   ` [Bridge] " Eric Dumazet
2017-03-22 12:25                                   ` Peter Zijlstra
2017-03-22 12:25                                     ` [Bridge] " Peter Zijlstra
2017-03-22 12:25                                     ` Peter Zijlstra
2017-03-22 13:22                                     ` Eric Dumazet
2017-03-22 13:22                                       ` [Bridge] " Eric Dumazet
2017-03-22 13:22                                       ` Eric Dumazet
2017-03-22 14:33                                       ` Peter Zijlstra
2017-03-22 14:33                                         ` [Bridge] " Peter Zijlstra
2017-03-22 14:33                                         ` Peter Zijlstra
2017-03-22 14:54                                         ` Eric Dumazet
2017-03-22 14:54                                           ` [Bridge] " Eric Dumazet
2017-03-22 14:54                                           ` Eric Dumazet
2017-03-22 15:08                                           ` Peter Zijlstra
2017-03-22 15:08                                             ` [Bridge] " Peter Zijlstra
2017-03-22 15:08                                             ` Peter Zijlstra
2017-03-22 15:22                                             ` Eric Dumazet
2017-03-22 15:22                                               ` [Bridge] " Eric Dumazet
2017-03-22 15:22                                               ` Eric Dumazet
2017-03-22 16:51                                           ` Peter Zijlstra
2017-03-22 16:51                                             ` [Bridge] " Peter Zijlstra
2017-03-22 16:51                                             ` Peter Zijlstra
2017-03-22 19:08                                   ` Kees Cook
2017-03-22 19:08                                     ` [Bridge] " Kees Cook
2017-03-22 19:08                                     ` Kees Cook
2017-03-22 12:11                                 ` Peter Zijlstra
2017-03-22 12:11                                   ` [Bridge] " Peter Zijlstra
2017-03-22 12:11                                   ` Peter Zijlstra
2017-03-20 14:10                     ` David Laight
2017-03-20 14:10                       ` [Bridge] " David Laight
2017-03-20 14:10                       ` David Laight
2017-03-20 14:28                       ` Peter Zijlstra
2017-03-20 14:28                         ` [Bridge] " Peter Zijlstra
2017-03-20 14:28                         ` Peter Zijlstra
2017-03-20 15:00                         ` David Laight
2017-03-20 15:00                           ` [Bridge] " David Laight
2017-03-20 15:00                           ` David Laight
2017-03-16 15:28 ` [PATCH 08/17] net: convert sk_filter.refcnt " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 16:04   ` Daniel Borkmann
2017-03-16 16:04     ` [Bridge] " Daniel Borkmann
2017-03-17  8:02     ` Reshetova, Elena
2017-03-17  8:02       ` [Bridge] " Reshetova, Elena
2017-03-17  8:02       ` Reshetova, Elena
2017-03-16 15:28 ` [PATCH 09/17] net: convert ip_mc_list.refcnt " Elena Reshetova
2017-03-16 15:28   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 10/17] net: convert in_device.refcnt " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 11/17] net: convert netpoll_info.refcnt " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 12/17] net: convert unix_address.refcnt " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 13/17] net: convert fib_rule.refcnt " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 14/17] net: convert inet_frag_queue.refcnt " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 15/17] net: convert net.passive " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 16/17] net: convert netlbl_lsm_cache.refcount " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-03-16 15:29 ` [PATCH 17/17] net: convert packet_fanout.sk_ref " Elena Reshetova
2017-03-16 15:29   ` [Bridge] " Elena Reshetova
2017-06-28 11:54 [PATCH 00/17] v2 net generic subsystem refcount conversions Elena Reshetova
2017-06-28 11:55 ` [PATCH 15/17] net: convert net.passive from atomic_t to refcount_t Elena Reshetova
2017-06-30 10:07 [PATCH 00/17] v3 net generic subsystem refcount conversions Elena Reshetova
2017-06-30 10:08 ` [PATCH 15/17] net: convert net.passive from atomic_t to refcount_t Elena Reshetova
