linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/4] net: percpufy frequently used vars on struct proto
@ 2006-03-08  1:58 Ravikiran G Thirumalai
  2006-03-08  1:59 ` [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh Ravikiran G Thirumalai
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  1:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

Following patchset converts struct proto.memory_allocated to use batching
per-cpu counters, struct proto.sockets_allocated to use per-cpu counters and
changes the proto.inuse per-cpu variable to use alloc_percpu instead of the
NR_CPUS x cacheline size padding. 

We observed 5% improvement in apache bench requests per second with this 
patchset on a multi NIC 8 way IBM x460 box.

(This was posted earlier
http://marc.theaimsgroup.com/?l=linux-kernel&m=113830220408812&w=2 )

Can this go into -mm please?

Thanks,
Kiran


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08  1:58 [patch 0/4] net: percpufy frequently used vars on struct proto Ravikiran G Thirumalai
@ 2006-03-08  1:59 ` Ravikiran G Thirumalai
  2006-03-08  2:13   ` Andrew Morton
  2006-03-08  2:01 ` [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated Ravikiran G Thirumalai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  1:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

Add percpu_counter_mod_bh for using these counters safely from
both softirq and process context.

Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off by: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/linux/percpu_counter.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/linux/percpu_counter.h	2006-03-07 15:08:00.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/linux/percpu_counter.h	2006-03-07 15:09:21.000000000 -0800
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
+#include <linux/interrupt.h>
 
 #ifdef CONFIG_SMP
 
@@ -110,4 +111,11 @@ static inline void percpu_counter_dec(st
 	percpu_counter_mod(fbc, -1);
 }
 
+static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+{
+	local_bh_disable();
+	percpu_counter_mod(fbc, amount);
+	local_bh_enable();
+}
+
 #endif /* _LINUX_PERCPU_COUNTER_H */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated
  2006-03-08  1:58 [patch 0/4] net: percpufy frequently used vars on struct proto Ravikiran G Thirumalai
  2006-03-08  1:59 ` [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh Ravikiran G Thirumalai
@ 2006-03-08  2:01 ` Ravikiran G Thirumalai
  2006-03-08  2:14   ` Andrew Morton
  2006-03-08  2:02 ` [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated Ravikiran G Thirumalai
  2006-03-08  2:03 ` [patch 4/4] net: percpufy frequently used vars -- proto.inuse Ravikiran G Thirumalai
  3 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  2:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

Change struct proto->memory_allocated to a batching per-CPU counter 
(percpu_counter) from an atomic_t.  A batching counter is better than a 
plain per-CPU counter as this field is read often.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-07 15:07:43.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-07 15:11:48.000000000 -0800
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/percpu_counter.h>
 
 #include <linux/filter.h>
 
@@ -541,8 +542,9 @@ struct proto {
 
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(void);
-	atomic_t		*memory_allocated;	/* Current allocated memory. */
+	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
 	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+
 	/*
 	 * Pressure flag: try to collapse.
 	 * Technical note: it is used by multiple contexts non atomically.
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-07 15:08:00.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-07 15:11:48.000000000 -0800
@@ -225,7 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
-extern atomic_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-07 15:07:43.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c	2006-03-07 15:11:48.000000000 -0800
@@ -1616,12 +1616,13 @@ static char proto_method_implemented(con
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
-	seq_printf(seq, "%-9s %4u %6d  %6d   %-3s %6u   %-3s  %-10s "
+	seq_printf(seq, "%-9s %4u %6d  %6lu   %-3s %6u   %-3s  %-10s "
 			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
 		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
-		   proto->memory_allocated != NULL ? atomic_read(proto->memory_allocated) : -1,
+		   proto->memory_allocated != NULL ?
+				percpu_counter_read_positive(proto->memory_allocated) : -1,
 		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
 		   proto->max_header,
 		   proto->slab == NULL ? "no" : "yes",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-07 15:07:43.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-07 15:11:48.000000000 -0800
@@ -196,11 +196,11 @@ EXPORT_SYMBOL(sk_stream_error);
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
 	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
+		percpu_counter_mod_bh(sk->sk_prot->memory_allocated,
+				-(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM));
 		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
 		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
+		    (percpu_counter_read(sk->sk_prot->memory_allocated) <
 		     sk->sk_prot->sysctl_mem[0]))
 			*sk->sk_prot->memory_pressure = 0;
 	}
@@ -213,23 +213,26 @@ int sk_stream_mem_schedule(struct sock *
 	int amt = sk_stream_pages(size);
 
 	sk->sk_forward_alloc += amt * SK_STREAM_MEM_QUANTUM;
-	atomic_add(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, amt);
 
 	/* Under limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
+			sk->sk_prot->sysctl_mem[0]) {
 		if (*sk->sk_prot->memory_pressure)
 			*sk->sk_prot->memory_pressure = 0;
 		return 1;
 	}
 
 	/* Over hard limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) >
+			sk->sk_prot->sysctl_mem[2]) {
 		sk->sk_prot->enter_memory_pressure();
 		goto suppress_allocation;
 	}
 
 	/* Under pressure. */
-	if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[1])
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) >
+			sk->sk_prot->sysctl_mem[1])
 		sk->sk_prot->enter_memory_pressure();
 
 	if (kind) {
@@ -259,7 +262,7 @@ suppress_allocation:
 
 	/* Alas. Undo changes. */
 	sk->sk_forward_alloc -= amt * SK_STREAM_MEM_QUANTUM;
-	atomic_sub(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, -amt);
 	return 0;
 }
 
Index: linux-2.6.16-rc5mm3/net/decnet/af_decnet.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/decnet/af_decnet.c	2006-03-07 15:07:43.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/decnet/af_decnet.c	2006-03-07 15:09:22.000000000 -0800
@@ -154,7 +154,7 @@ static const struct proto_ops dn_proto_o
 static DEFINE_RWLOCK(dn_hash_lock);
 static struct hlist_head dn_sk_hash[DN_SK_HASH_SIZE];
 static struct hlist_head dn_wild_sk;
-static atomic_t decnet_memory_allocated;
+static struct percpu_counter decnet_memory_allocated;
 
 static int __dn_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen, int flags);
 static int __dn_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen, int flags);
@@ -2383,7 +2383,8 @@ static int __init decnet_init(void)
 	rc = proto_register(&dn_proto, 1);
 	if (rc != 0)
 		goto out;
-
+	
+	percpu_counter_init(&decnet_memory_allocated);
 	dn_neigh_init();
 	dn_dev_init();
 	dn_route_init();
@@ -2424,6 +2425,7 @@ static void __exit decnet_exit(void)
 	proc_net_remove("decnet");
 
 	proto_unregister(&dn_proto);
+	percpu_counter_destroy(&decnet_memory_allocated);
 }
 module_exit(decnet_exit);
 #endif
Index: linux-2.6.16-rc5mm3/net/ipv4/proc.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/proc.c	2006-03-07 15:07:44.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/proc.c	2006-03-07 15:11:48.000000000 -0800
@@ -61,10 +61,10 @@ static int fold_prot_inuse(struct proto 
 static int sockstat_seq_show(struct seq_file *seq, void *v)
 {
 	socket_seq_show(seq);
-	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %d\n",
+	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
 		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
 		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
-		   atomic_read(&tcp_memory_allocated));
+		   percpu_counter_read_positive(&tcp_memory_allocated));
 	seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
 	seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
 	seq_printf(seq,  "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp.c	2006-03-07 15:07:44.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp.c	2006-03-07 15:11:48.000000000 -0800
@@ -283,7 +283,7 @@ EXPORT_SYMBOL(sysctl_tcp_mem);
 EXPORT_SYMBOL(sysctl_tcp_rmem);
 EXPORT_SYMBOL(sysctl_tcp_wmem);
 
-atomic_t tcp_memory_allocated;	/* Current allocated memory. */
+struct percpu_counter tcp_memory_allocated;	/* Current allocated memory. */
 atomic_t tcp_sockets_allocated;	/* Current number of TCP sockets. */
 
 EXPORT_SYMBOL(tcp_memory_allocated);
@@ -1593,7 +1593,7 @@ adjudge_to_death:
 		sk_stream_mem_reclaim(sk);
 		if (atomic_read(sk->sk_prot->orphan_count) > sysctl_tcp_max_orphans ||
 		    (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-		     atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
+		     percpu_counter_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
 			if (net_ratelimit())
 				printk(KERN_INFO "TCP: too many of orphaned "
 				       "sockets\n");
@@ -2127,6 +2127,8 @@ void __init tcp_init(void)
 	       "(established %d bind %d)\n",
 	       tcp_hashinfo.ehash_size << 1, tcp_hashinfo.bhash_size);
 
+	percpu_counter_init(&tcp_memory_allocated);
+
 	tcp_register_congestion_control(&tcp_reno);
 }
 
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_input.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_input.c	2006-03-07 15:08:01.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_input.c	2006-03-07 15:09:22.000000000 -0800
@@ -333,7 +333,7 @@ static void tcp_clamp_window(struct sock
 	if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
 	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
 	    !tcp_memory_pressure &&
-	    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
+	    percpu_counter_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
 		sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
 				    sysctl_tcp_rmem[2]);
 	}
@@ -3555,7 +3555,7 @@ static int tcp_should_expand_sndbuf(stru
 		return 0;
 
 	/* If we are under soft global TCP memory pressure, do not expand.  */
-	if (atomic_read(&tcp_memory_allocated) >= sysctl_tcp_mem[0])
+	if (percpu_counter_read(&tcp_memory_allocated) >= sysctl_tcp_mem[0])
 		return 0;
 
 	/* If we filled the congestion window, do not expand.  */
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_timer.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_timer.c	2006-03-07 15:08:01.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_timer.c	2006-03-07 15:09:22.000000000 -0800
@@ -80,7 +80,7 @@ static int tcp_out_of_resources(struct s
 
 	if (orphans >= sysctl_tcp_max_orphans ||
 	    (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-	     atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
+	     percpu_counter_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
 		if (net_ratelimit())
 			printk(KERN_INFO "Out of socket memory\n");
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated
  2006-03-08  1:58 [patch 0/4] net: percpufy frequently used vars on struct proto Ravikiran G Thirumalai
  2006-03-08  1:59 ` [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh Ravikiran G Thirumalai
  2006-03-08  2:01 ` [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated Ravikiran G Thirumalai
@ 2006-03-08  2:02 ` Ravikiran G Thirumalai
  2006-03-08  2:16   ` Andrew Morton
  2006-03-08  2:03 ` [patch 4/4] net: percpufy frequently used vars -- proto.inuse Ravikiran G Thirumalai
  3 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  2:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

Change the atomic_t sockets_allocated member of struct proto to a 
per-cpu counter.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-07 15:09:52.000000000 -0800
@@ -543,7 +543,7 @@ struct proto {
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(void);
 	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
-	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+	int                     *sockets_allocated;     /* Current number of sockets (percpu counter). */
 
 	/*
 	 * Pressure flag: try to collapse.
@@ -579,6 +579,24 @@ struct proto {
 	} stats[NR_CPUS];
 };
 
+static inline int read_sockets_allocated(struct proto *prot)
+{
+	int total = 0;
+	int cpu;
+	for_each_cpu(cpu)
+		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
+	return total;
+}
+
+static inline void mod_sockets_allocated(int *sockets_allocated, int count)
+{
+	(*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
+	put_cpu();
+}
+
+#define inc_sockets_allocated(c) mod_sockets_allocated(c, 1)
+#define dec_sockets_allocated(c) mod_sockets_allocated(c, -1)
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-07 15:09:52.000000000 -0800
@@ -226,7 +226,7 @@ extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
 extern struct percpu_counter tcp_memory_allocated;
-extern atomic_t tcp_sockets_allocated;
+extern int *tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
 /*
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c	2006-03-07 15:09:52.000000000 -0800
@@ -771,7 +771,7 @@ struct sock *sk_clone(const struct sock 
 		newsk->sk_sleep	 = NULL;
 
 		if (newsk->sk_prot->sockets_allocated)
-			atomic_inc(newsk->sk_prot->sockets_allocated);
+			inc_sockets_allocated(newsk->sk_prot->sockets_allocated); 
 	}
 out:
 	return newsk;
@@ -1620,7 +1620,7 @@ static void proto_seq_printf(struct seq_
 			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
-		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
+		   proto->sockets_allocated != NULL ? read_sockets_allocated(proto) : -1,
 		   proto->memory_allocated != NULL ?
 				percpu_counter_read_positive(proto->memory_allocated) : -1,
 		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-07 15:09:52.000000000 -0800
@@ -242,7 +242,7 @@ int sk_stream_mem_schedule(struct sock *
 		return 1;
 
 	if (!*sk->sk_prot->memory_pressure ||
-	    sk->sk_prot->sysctl_mem[2] > atomic_read(sk->sk_prot->sockets_allocated) *
+	    sk->sk_prot->sysctl_mem[2] > read_sockets_allocated(sk->sk_prot) *
 				sk_stream_pages(sk->sk_wmem_queued +
 						atomic_read(&sk->sk_rmem_alloc) +
 						sk->sk_forward_alloc))
Index: linux-2.6.16-rc5mm3/net/ipv4/proc.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/proc.c	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/proc.c	2006-03-07 15:09:52.000000000 -0800
@@ -63,7 +63,7 @@ static int sockstat_seq_show(struct seq_
 	socket_seq_show(seq);
 	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
 		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
-		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
+		   tcp_death_row.tw_count, read_sockets_allocated(&tcp_prot),
 		   percpu_counter_read_positive(&tcp_memory_allocated));
 	seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
 	seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp.c	2006-03-07 15:09:22.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp.c	2006-03-07 15:09:52.000000000 -0800
@@ -284,7 +284,7 @@ EXPORT_SYMBOL(sysctl_tcp_rmem);
 EXPORT_SYMBOL(sysctl_tcp_wmem);
 
 struct percpu_counter tcp_memory_allocated;	/* Current allocated memory. */
-atomic_t tcp_sockets_allocated;	/* Current number of TCP sockets. */
+int *tcp_sockets_allocated;   /* Current number of TCP sockets. */
 
 EXPORT_SYMBOL(tcp_memory_allocated);
 EXPORT_SYMBOL(tcp_sockets_allocated);
@@ -2055,6 +2055,12 @@ void __init tcp_init(void)
 	if (!tcp_hashinfo.bind_bucket_cachep)
 		panic("tcp_init: Cannot alloc tcp_bind_bucket cache.");
 
+	tcp_sockets_allocated = alloc_percpu(int);
+	if (!tcp_sockets_allocated)
+		panic("tcp_init: Cannot alloc tcp_sockets_allocated");
+
+	tcp_prot.sockets_allocated = tcp_sockets_allocated;
+
 	/* Size and allocate the main established and bind bucket
 	 * hash tables.
 	 *
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_ipv4.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_ipv4.c	2006-03-07 15:08:01.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_ipv4.c	2006-03-07 15:09:52.000000000 -0800
@@ -1273,7 +1273,7 @@ static int tcp_v4_init_sock(struct sock 
 	sk->sk_sndbuf = sysctl_tcp_wmem[1];
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
-	atomic_inc(&tcp_sockets_allocated);
+	inc_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1307,7 +1307,7 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	atomic_dec(&tcp_sockets_allocated);
+	dec_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1815,7 +1815,6 @@ struct proto tcp_prot = {
 	.unhash			= tcp_unhash,
 	.get_port		= tcp_v4_get_port,
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
-	.sockets_allocated	= &tcp_sockets_allocated,
 	.orphan_count		= &tcp_orphan_count,
 	.memory_allocated	= &tcp_memory_allocated,
 	.memory_pressure	= &tcp_memory_pressure,
Index: linux-2.6.16-rc5mm3/net/ipv6/tcp_ipv6.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv6/tcp_ipv6.c	2006-03-07 15:08:01.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv6/tcp_ipv6.c	2006-03-07 15:11:43.000000000 -0800
@@ -1375,7 +1375,7 @@ static int tcp_v6_init_sock(struct sock 
 	sk->sk_sndbuf = sysctl_tcp_wmem[1];
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
-	atomic_inc(&tcp_sockets_allocated);
+	inc_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1573,7 +1573,6 @@ struct proto tcpv6_prot = {
 	.unhash			= tcp_unhash,
 	.get_port		= tcp_v6_get_port,
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
-	.sockets_allocated	= &tcp_sockets_allocated,
 	.memory_allocated	= &tcp_memory_allocated,
 	.memory_pressure	= &tcp_memory_pressure,
 	.orphan_count		= &tcp_orphan_count,
@@ -1605,6 +1604,7 @@ static struct inet_protosw tcpv6_protosw
 
 void __init tcpv6_init(void)
 {
+	tcpv6_prot.sockets_allocated = tcp_sockets_allocated;
 	/* register inet6 protocol */
 	if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0)
 		printk(KERN_ERR "tcpv6_init: Could not register protocol\n");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [patch 4/4] net: percpufy frequently used vars -- proto.inuse
  2006-03-08  1:58 [patch 0/4] net: percpufy frequently used vars on struct proto Ravikiran G Thirumalai
                   ` (2 preceding siblings ...)
  2006-03-08  2:02 ` [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated Ravikiran G Thirumalai
@ 2006-03-08  2:03 ` Ravikiran G Thirumalai
  3 siblings, 0 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  2:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

This patch uses alloc_percpu to allocate per-CPU memory for the proto->inuse
field.  The inuse field is currently per-CPU as in NR_CPUS * cacheline size -- 
a big bloat on arches with large cachelines.  Also marks some frequently
used protos read mostly.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc2/include/net/sock.h
===================================================================
--- linux-2.6.16-rc2.orig/include/net/sock.h	2006-02-09 15:01:47.000000000 -0800
+++ linux-2.6.16-rc2/include/net/sock.h	2006-02-09 15:01:52.000000000 -0800
@@ -573,10 +573,7 @@ struct proto {
 #ifdef SOCK_REFCNT_DEBUG
 	atomic_t		socks;
 #endif
-	struct {
-		int inuse;
-		u8  __pad[SMP_CACHE_BYTES - sizeof(int)];
-	} stats[NR_CPUS];
+	int *inuse;
 };
 
 static inline int read_sockets_allocated(struct proto *prot)
@@ -628,12 +625,12 @@ static inline void sk_refcnt_debug_relea
 /* Called with local bh disabled */
 static __inline__ void sock_prot_inc_use(struct proto *prot)
 {
-	prot->stats[smp_processor_id()].inuse++;
+	(*per_cpu_ptr(prot->inuse, smp_processor_id())) += 1;
 }
 
 static __inline__ void sock_prot_dec_use(struct proto *prot)
 {
-	prot->stats[smp_processor_id()].inuse--;
+	(*per_cpu_ptr(prot->inuse, smp_processor_id())) -= 1;
 }
 
 /* With per-bucket locks this operation is not-atomic, so that
Index: linux-2.6.16-rc2/net/core/sock.c
===================================================================
--- linux-2.6.16-rc2.orig/net/core/sock.c	2006-02-09 15:01:47.000000000 -0800
+++ linux-2.6.16-rc2/net/core/sock.c	2006-02-09 15:01:52.000000000 -0800
@@ -1508,12 +1508,21 @@ int proto_register(struct proto *prot, i
 		}
 	}
 
+	prot->inuse = alloc_percpu(int);
+	if (prot->inuse == NULL) {
+		if (alloc_slab)
+			goto out_free_timewait_sock_slab_name_cache;
+		else
+			goto out;
+	}
 	write_lock(&proto_list_lock);
 	list_add(&prot->node, &proto_list);
 	write_unlock(&proto_list_lock);
 	rc = 0;
 out:
 	return rc;
+out_free_timewait_sock_slab_name_cache:
+	kmem_cache_destroy(prot->twsk_prot->twsk_slab);
 out_free_timewait_sock_slab_name:
 	kfree(timewait_sock_slab_name);
 out_free_request_sock_slab:
@@ -1537,6 +1546,11 @@ void proto_unregister(struct proto *prot
 	list_del(&prot->node);
 	write_unlock(&proto_list_lock);
 
+	if (prot->inuse != NULL) {
+		free_percpu(prot->inuse);
+		prot->inuse = NULL;
+	}
+
 	if (prot->slab != NULL) {
 		kmem_cache_destroy(prot->slab);
 		prot->slab = NULL;
Index: linux-2.6.16-rc2/net/ipv4/proc.c
===================================================================
--- linux-2.6.16-rc2.orig/net/ipv4/proc.c	2006-02-09 15:01:47.000000000 -0800
+++ linux-2.6.16-rc2/net/ipv4/proc.c	2006-02-09 15:01:52.000000000 -0800
@@ -50,7 +50,7 @@ static int fold_prot_inuse(struct proto 
 	int cpu;
 
 	for_each_cpu(cpu)
-		res += proto->stats[cpu].inuse;
+		res += (*per_cpu_ptr(proto->inuse, cpu));
 
 	return res;
 }
Index: linux-2.6.16-rc2/net/ipv4/raw.c
===================================================================
--- linux-2.6.16-rc2.orig/net/ipv4/raw.c	2006-02-07 23:14:04.000000000 -0800
+++ linux-2.6.16-rc2/net/ipv4/raw.c	2006-02-09 15:01:52.000000000 -0800
@@ -718,7 +718,7 @@ static int raw_ioctl(struct sock *sk, in
 	}
 }
 
-struct proto raw_prot = {
+struct proto raw_prot __read_mostly = {
 	.name =		"RAW",
 	.owner =	THIS_MODULE,
 	.close =	raw_close,
Index: linux-2.6.16-rc2/net/ipv4/tcp_ipv4.c
===================================================================
--- linux-2.6.16-rc2.orig/net/ipv4/tcp_ipv4.c	2006-02-09 15:01:47.000000000 -0800
+++ linux-2.6.16-rc2/net/ipv4/tcp_ipv4.c	2006-02-09 15:01:52.000000000 -0800
@@ -1794,7 +1794,7 @@ void tcp4_proc_exit(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-struct proto tcp_prot = {
+struct proto tcp_prot __read_mostly = {
 	.name			= "TCP",
 	.owner			= THIS_MODULE,
 	.close			= tcp_close,
Index: linux-2.6.16-rc2/net/ipv4/udp.c
===================================================================
--- linux-2.6.16-rc2.orig/net/ipv4/udp.c	2006-02-07 23:14:04.000000000 -0800
+++ linux-2.6.16-rc2/net/ipv4/udp.c	2006-02-09 15:01:52.000000000 -0800
@@ -1340,7 +1340,7 @@ unsigned int udp_poll(struct file *file,
 	
 }
 
-struct proto udp_prot = {
+struct proto udp_prot __read_mostly = {
  	.name =		"UDP",
 	.owner =	THIS_MODULE,
 	.close =	udp_close,
Index: linux-2.6.16-rc2/net/ipv6/proc.c
===================================================================
--- linux-2.6.16-rc2.orig/net/ipv6/proc.c	2006-02-07 23:19:10.000000000 -0800
+++ linux-2.6.16-rc2/net/ipv6/proc.c	2006-02-09 15:01:52.000000000 -0800
@@ -39,7 +39,7 @@ static int fold_prot_inuse(struct proto 
 	int cpu;
 
 	for_each_cpu(cpu)
-		res += proto->stats[cpu].inuse;
+		res += (*per_cpu_ptr(proto->inuse, cpu));
 
 	return res;
 }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08  1:59 ` [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh Ravikiran G Thirumalai
@ 2006-03-08  2:13   ` Andrew Morton
  2006-03-08 20:26     ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-08  2:13 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> +static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
>  +{
>  +	local_bh_disable();
>  +	percpu_counter_mod(fbc, amount);
>  +	local_bh_enable();
>  +}
>  +

percpu_counter_mod() does preempt_disable(), which is redundant in this
context.  So just do fbc->count += amount; here.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated
  2006-03-08  2:01 ` [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated Ravikiran G Thirumalai
@ 2006-03-08  2:14   ` Andrew Morton
  2006-03-08  3:08     ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-08  2:14 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> -	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
>  +	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
>  +			sk->sk_prot->sysctl_mem[0]) {

Bear in mind that percpu_counter_read[_positive] can be inaccurate on large
CPU counts.

It might be worth running percpu_counter_sum() to get the exact count if we
think we're about to cause something to fail.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated
  2006-03-08  2:02 ` [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated Ravikiran G Thirumalai
@ 2006-03-08  2:16   ` Andrew Morton
  2006-03-08 20:56     ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-08  2:16 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> --- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-07 15:09:22.000000000 -0800
>  +++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-07 15:09:52.000000000 -0800
>  @@ -543,7 +543,7 @@ struct proto {
>   	/* Memory pressure */
>   	void			(*enter_memory_pressure)(void);
>   	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
>  -	atomic_t		*sockets_allocated;	/* Current number of sockets. */
>  +	int                     *sockets_allocated;     /* Current number of sockets (percpu counter). */
>   
>   	/*
>   	 * Pressure flag: try to collapse.
>  @@ -579,6 +579,24 @@ struct proto {
>   	} stats[NR_CPUS];
>   };
>   
>  +static inline int read_sockets_allocated(struct proto *prot)
>  +{
>  +	int total = 0;
>  +	int cpu;
>  +	for_each_cpu(cpu)
>  +		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
>  +	return total;
>  +}

This is likely too big to be inlined, plus sock.h doesn't include enough
headers to reliably compile this code.

>  +static inline void mod_sockets_allocated(int *sockets_allocated, int count)
>  +{
>  +	(*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
>  +	put_cpu();
>  +}
>  +

Ditto.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated
  2006-03-08  2:14   ` Andrew Morton
@ 2006-03-08  3:08     ` Ravikiran G Thirumalai
  2006-03-08  3:22       ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08  3:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

On Tue, Mar 07, 2006 at 06:14:22PM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> >
> > -	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> >  +	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
> >  +			sk->sk_prot->sysctl_mem[0]) {
> 
> Bear in mind that percpu_counter_read[_positive] can be inaccurate on large
> CPU counts.
> 
> It might be worth running percpu_counter_sum() to get the exact count if we
> think we're about to cause something to fail.

The problem is percpu_counter_sum has to read all the cpus cachelines.  If
we have to use percpu_counter_sum everywhere, then might as well use plain
per-cpu counters instead of  batching ones no?
 
sysctl_mem[0] is about 196K  and on a 16 cpu box variance is 512 bytes, which 
is OK with just percpu_counter_read I hope.  Maybe, on very large cpu counts, 
we should just change the FBC_BATCH so that variance does not go quadratic.
Something like 32.  So that variance is 32 * NR_CPUS in that case, instead
of (NR_CPUS * NR_CPUS * 2) currently.  Comments?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated
  2006-03-08  3:08     ` Ravikiran G Thirumalai
@ 2006-03-08  3:22       ` Andrew Morton
  2006-03-08 20:54         ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-08  3:22 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> On Tue, Mar 07, 2006 at 06:14:22PM -0800, Andrew Morton wrote:
> > Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> > >
> > > -	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> > >  +	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
> > >  +			sk->sk_prot->sysctl_mem[0]) {
> > 
> > Bear in mind that percpu_counter_read[_positive] can be inaccurate on large
> > CPU counts.
> > 
> > It might be worth running percpu_counter_sum() to get the exact count if we
> > think we're about to cause something to fail.
> 
> The problem is percpu_counter_sum has to read all the cpus cachelines.  If
> we have to use percpu_counter_sum everywhere, then might as well use plain
> per-cpu counters instead of  batching ones no?

I didn't say "use it everywhere" ;)

Just in places like this:

	if (percpu_counter_read(something) > something_else)
		make_an_application_fail();

in that case it's worth running percpu_counter_sum().  And bear in mind
that once we've done that, the following percpu_counter_read()s become
accurate, so we won't run the expensive percpu_counter_sum() again
for a while.  Unless we're really close to or over the limit, in which case
blowing a few cycles is relatively unimportant.

All that should be captured in library code (per_cpu_counter_exceeds(ptr,
threshold), for example) rather than open-coded everywhere.

> sysctl_mem[0] is about 196K  and on a 16 cpu box variance is 512 bytes, which 
> is OK with just percpu_counter_read I hope.

You mean a 16 CPU box with NR_CPUS=16 as well...

>  Maybe, on very large cpu counts, 
> we should just change the FBC_BATCH so that variance does not go quadratic.
> Something like 32.  So that variance is 32 * NR_CPUS in that case, instead
> of (NR_CPUS * NR_CPUS * 2) currently.  Comments?

Sure, we need to make that happen.  But it got all mixed up with the
spinlock removal and it does need quite some thought and testing and
documentation to help developers to choose the right settings and
appropriate selection of defaults, etc.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08  2:13   ` Andrew Morton
@ 2006-03-08 20:26     ` Ravikiran G Thirumalai
  2006-03-08 20:36       ` Benjamin LaHaise
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08 20:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

On Tue, Mar 07, 2006 at 06:13:01PM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> >
> > +static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
> >  +{
> >  +	local_bh_disable();
> >  +	percpu_counter_mod(fbc, amount);
> >  +	local_bh_enable();
> >  +}
> >  +
> 
> percpu_counter_mod() does preempt_disable(), which is redundant in this
> context.  So just do fbc->count += amount; here.

OK.  Here is the revised patch.


Add percpu_counter_mod_bh for using these counters safely from
both softirq and process context.

Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off by: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/linux/percpu_counter.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/linux/percpu_counter.h	2006-03-07 15:08:00.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/linux/percpu_counter.h	2006-03-08 10:53:13.000000000 -0800
@@ -11,6 +11,7 @@
 #include <linux/smp.h>
 #include <linux/threads.h>
 #include <linux/percpu.h>
+#include <linux/interrupt.h>
 
 #ifdef CONFIG_SMP
 
@@ -40,6 +41,7 @@ static inline void percpu_counter_destro
 
 void percpu_counter_mod(struct percpu_counter *fbc, long amount);
 long percpu_counter_sum(struct percpu_counter *fbc);
+void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount);
 
 static inline long percpu_counter_read(struct percpu_counter *fbc)
 {
@@ -98,6 +100,12 @@ static inline long percpu_counter_sum(st
 	return percpu_counter_read_positive(fbc);
 }
 
+static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+{
+	local_bh_disable();
+	fbc->count += amount;
+	local_bh_enable();
+}
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
@@ -110,4 +118,5 @@ static inline void percpu_counter_dec(st
 	percpu_counter_mod(fbc, -1);
 }
 
+
 #endif /* _LINUX_PERCPU_COUNTER_H */
Index: linux-2.6.16-rc5mm3/mm/swap.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/mm/swap.c	2006-03-07 15:08:01.000000000 -0800
+++ linux-2.6.16-rc5mm3/mm/swap.c	2006-03-08 12:11:19.000000000 -0800
@@ -521,11 +521,11 @@ static int cpu_swap_callback(struct noti
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_SMP
-void percpu_counter_mod(struct percpu_counter *fbc, long amount)
+static void __percpu_counter_mod(struct percpu_counter *fbc, long amount)
 {
 	long count;
 	long *pcount;
-	int cpu = get_cpu();
+	int cpu = smp_processor_id();
 
 	pcount = per_cpu_ptr(fbc->counters, cpu);
 	count = *pcount + amount;
@@ -537,9 +537,21 @@ void percpu_counter_mod(struct percpu_co
 	} else {
 		*pcount = count;
 	}
-	put_cpu();
 }
-EXPORT_SYMBOL(percpu_counter_mod);
+
+void percpu_counter_mod(struct percpu_counter *fbc, long amount)
+{
+	preempt_disable();
+	__percpu_counter_mod(fbc, amount);
+	preempt_enable();
+}
+
+void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+{
+	local_bh_disable();
+	__percpu_counter_mod(fbc, amount);
+	local_bh_enable();
+}
 
 /*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
@@ -559,6 +571,9 @@ long percpu_counter_sum(struct percpu_co
 	spin_unlock(&fbc->lock);
 	return ret < 0 ? 0 : ret;
 }
+
+EXPORT_SYMBOL(percpu_counter_mod);
+EXPORT_SYMBOL(percpu_counter_mod_bh);
 EXPORT_SYMBOL(percpu_counter_sum);
 #endif
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 20:26     ` Ravikiran G Thirumalai
@ 2006-03-08 20:36       ` Benjamin LaHaise
  2006-03-08 21:07         ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Benjamin LaHaise @ 2006-03-08 20:36 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 12:26:56PM -0800, Ravikiran G Thirumalai wrote:
> +static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
> +{
> +	local_bh_disable();
> +	fbc->count += amount;
> +	local_bh_enable();
> +}

Please use local_t instead, then you don't have to do the local_bh_disable() 
and enable() and the whole thing collapses down into 1 instruction on x86.

		-ben

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated
  2006-03-08  3:22       ` Andrew Morton
@ 2006-03-08 20:54         ` Ravikiran G Thirumalai
  0 siblings, 0 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08 20:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

On Tue, Mar 07, 2006 at 07:22:34PM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> > 
> > The problem is percpu_counter_sum has to read all the cpus cachelines.  If
> > we have to use percpu_counter_sum everywhere, then might as well use plain
> > per-cpu counters instead of  batching ones no?
> 
> I didn't say "use it everywhere" ;)

:) No you did not. Sorry.  When the counter shows large varience it will affect
all places where it is read and it wasn't clear to me where to use
percpu_counter_sum vs percpu_counter_read in the context of the patch.  
But yes, using percpu_counter_sum at the critical points makes things better.
Thanks for pointing it out.  Here is an attempt on those lines. 

> 
> >  Maybe, on very large cpu counts, 
> > we should just change the FBC_BATCH so that variance does not go quadratic.
> > Something like 32.  So that variance is 32 * NR_CPUS in that case, instead
> > of (NR_CPUS * NR_CPUS * 2) currently.  Comments?
> 
> Sure, we need to make that happen.  But it got all mixed up with the
> spinlock removal and it does need quite some thought and testing and
> documentation to help developers to choose the right settings and
> appropriate selection of defaults, etc.

I was thinking on the lines of something like the following as a simple fix
to the quadratic variance. But yes, it needs some experimentation before hand..

#if NR_CPUS >= 16
#define FBC_BATCH       (32)
#else
#define FBC_BATCH       (NR_CPUS*4)
#endif

Thanks,
Kiran


Change struct proto->memory_allocated to a batching per-CPU counter 
(percpu_counter) from an atomic_t.  A batching counter is better than a 
plain per-CPU counter as this field is read often. 

Changes since last post: Use the more accurate percpu_counter_sum() at 
critical places.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-08 11:03:19.000000000 -0800
@@ -48,6 +48,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>	/* struct sk_buff */
 #include <linux/security.h>
+#include <linux/percpu_counter.h>
 
 #include <linux/filter.h>
 
@@ -541,8 +542,9 @@ struct proto {
 
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(void);
-	atomic_t		*memory_allocated;	/* Current allocated memory. */
+	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
 	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+
 	/*
 	 * Pressure flag: try to collapse.
 	 * Technical note: it is used by multiple contexts non atomically.
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-08 11:03:19.000000000 -0800
@@ -225,7 +225,7 @@ extern int sysctl_tcp_abc;
 extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
-extern atomic_t tcp_memory_allocated;
+extern struct percpu_counter tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c	2006-03-08 11:03:19.000000000 -0800
@@ -1616,12 +1616,13 @@ static char proto_method_implemented(con
 
 static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
 {
-	seq_printf(seq, "%-9s %4u %6d  %6d   %-3s %6u   %-3s  %-10s "
+	seq_printf(seq, "%-9s %4u %6d  %6lu   %-3s %6u   %-3s  %-10s "
 			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
 		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
-		   proto->memory_allocated != NULL ? atomic_read(proto->memory_allocated) : -1,
+		   proto->memory_allocated != NULL ?
+				percpu_counter_read_positive(proto->memory_allocated) : -1,
 		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
 		   proto->max_header,
 		   proto->slab == NULL ? "no" : "yes",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-08 11:08:28.000000000 -0800
@@ -196,11 +196,11 @@ EXPORT_SYMBOL(sk_stream_error);
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
 	if (sk->sk_forward_alloc >= SK_STREAM_MEM_QUANTUM) {
-		atomic_sub(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
-			   sk->sk_prot->memory_allocated);
+		percpu_counter_mod_bh(sk->sk_prot->memory_allocated,
+				-(sk->sk_forward_alloc / SK_STREAM_MEM_QUANTUM));
 		sk->sk_forward_alloc &= SK_STREAM_MEM_QUANTUM - 1;
 		if (*sk->sk_prot->memory_pressure &&
-		    (atomic_read(sk->sk_prot->memory_allocated) <
+		    (percpu_counter_read(sk->sk_prot->memory_allocated) <
 		     sk->sk_prot->sysctl_mem[0]))
 			*sk->sk_prot->memory_pressure = 0;
 	}
@@ -213,23 +213,26 @@ int sk_stream_mem_schedule(struct sock *
 	int amt = sk_stream_pages(size);
 
 	sk->sk_forward_alloc += amt * SK_STREAM_MEM_QUANTUM;
-	atomic_add(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, amt);
 
 	/* Under limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) <
+			sk->sk_prot->sysctl_mem[0]) {
 		if (*sk->sk_prot->memory_pressure)
 			*sk->sk_prot->memory_pressure = 0;
 		return 1;
 	}
 
 	/* Over hard limit. */
-	if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
+	if (percpu_counter_sum(sk->sk_prot->memory_allocated) >
+			sk->sk_prot->sysctl_mem[2]) {
 		sk->sk_prot->enter_memory_pressure();
 		goto suppress_allocation;
 	}
 
 	/* Under pressure. */
-	if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[1])
+	if (percpu_counter_read(sk->sk_prot->memory_allocated) >
+			sk->sk_prot->sysctl_mem[1])
 		sk->sk_prot->enter_memory_pressure();
 
 	if (kind) {
@@ -259,7 +262,7 @@ suppress_allocation:
 
 	/* Alas. Undo changes. */
 	sk->sk_forward_alloc -= amt * SK_STREAM_MEM_QUANTUM;
-	atomic_sub(amt, sk->sk_prot->memory_allocated);
+	percpu_counter_mod_bh(sk->sk_prot->memory_allocated, -amt);
 	return 0;
 }
 
Index: linux-2.6.16-rc5mm3/net/decnet/af_decnet.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/decnet/af_decnet.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/decnet/af_decnet.c	2006-03-08 11:03:19.000000000 -0800
@@ -154,7 +154,7 @@ static const struct proto_ops dn_proto_o
 static DEFINE_RWLOCK(dn_hash_lock);
 static struct hlist_head dn_sk_hash[DN_SK_HASH_SIZE];
 static struct hlist_head dn_wild_sk;
-static atomic_t decnet_memory_allocated;
+static struct percpu_counter decnet_memory_allocated;
 
 static int __dn_setsockopt(struct socket *sock, int level, int optname, char __user *optval, int optlen, int flags);
 static int __dn_getsockopt(struct socket *sock, int level, int optname, char __user *optval, int __user *optlen, int flags);
@@ -2383,7 +2383,8 @@ static int __init decnet_init(void)
 	rc = proto_register(&dn_proto, 1);
 	if (rc != 0)
 		goto out;
-
+	
+	percpu_counter_init(&decnet_memory_allocated);
 	dn_neigh_init();
 	dn_dev_init();
 	dn_route_init();
@@ -2424,6 +2425,7 @@ static void __exit decnet_exit(void)
 	proc_net_remove("decnet");
 
 	proto_unregister(&dn_proto);
+	percpu_counter_destroy(&decnet_memory_allocated);
 }
 module_exit(decnet_exit);
 #endif
Index: linux-2.6.16-rc5mm3/net/ipv4/proc.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/proc.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/proc.c	2006-03-08 11:03:19.000000000 -0800
@@ -61,10 +61,10 @@ static int fold_prot_inuse(struct proto 
 static int sockstat_seq_show(struct seq_file *seq, void *v)
 {
 	socket_seq_show(seq);
-	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %d\n",
+	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
 		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
 		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
-		   atomic_read(&tcp_memory_allocated));
+		   percpu_counter_read_positive(&tcp_memory_allocated));
 	seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
 	seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
 	seq_printf(seq,  "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp.c	2006-03-08 11:16:33.000000000 -0800
@@ -283,7 +283,7 @@ EXPORT_SYMBOL(sysctl_tcp_mem);
 EXPORT_SYMBOL(sysctl_tcp_rmem);
 EXPORT_SYMBOL(sysctl_tcp_wmem);
 
-atomic_t tcp_memory_allocated;	/* Current allocated memory. */
+struct percpu_counter tcp_memory_allocated;	/* Current allocated memory. */
 atomic_t tcp_sockets_allocated;	/* Current number of TCP sockets. */
 
 EXPORT_SYMBOL(tcp_memory_allocated);
@@ -1593,7 +1593,7 @@ adjudge_to_death:
 		sk_stream_mem_reclaim(sk);
 		if (atomic_read(sk->sk_prot->orphan_count) > sysctl_tcp_max_orphans ||
 		    (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-		     atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
+		     percpu_counter_sum(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
 			if (net_ratelimit())
 				printk(KERN_INFO "TCP: too many of orphaned "
 				       "sockets\n");
@@ -2127,6 +2127,8 @@ void __init tcp_init(void)
 	       "(established %d bind %d)\n",
 	       tcp_hashinfo.ehash_size << 1, tcp_hashinfo.bhash_size);
 
+	percpu_counter_init(&tcp_memory_allocated);
+
 	tcp_register_congestion_control(&tcp_reno);
 }
 
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_input.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_input.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_input.c	2006-03-08 11:03:19.000000000 -0800
@@ -333,7 +333,7 @@ static void tcp_clamp_window(struct sock
 	if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
 	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
 	    !tcp_memory_pressure &&
-	    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
+	    percpu_counter_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
 		sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
 				    sysctl_tcp_rmem[2]);
 	}
@@ -3555,7 +3555,7 @@ static int tcp_should_expand_sndbuf(stru
 		return 0;
 
 	/* If we are under soft global TCP memory pressure, do not expand.  */
-	if (atomic_read(&tcp_memory_allocated) >= sysctl_tcp_mem[0])
+	if (percpu_counter_read(&tcp_memory_allocated) >= sysctl_tcp_mem[0])
 		return 0;
 
 	/* If we filled the congestion window, do not expand.  */
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_timer.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_timer.c	2006-03-08 10:36:07.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_timer.c	2006-03-08 11:39:05.000000000 -0800
@@ -80,7 +80,7 @@ static int tcp_out_of_resources(struct s
 
 	if (orphans >= sysctl_tcp_max_orphans ||
 	    (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
-	     atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
+	     percpu_counter_sum(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
 		if (net_ratelimit())
 			printk(KERN_INFO "Out of socket memory\n");
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated
  2006-03-08  2:16   ` Andrew Morton
@ 2006-03-08 20:56     ` Ravikiran G Thirumalai
  0 siblings, 0 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08 20:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, davem, netdev, shai

On Tue, Mar 07, 2006 at 06:16:02PM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> >   
> >  +static inline int read_sockets_allocated(struct proto *prot)
> >  +{
> >  +	int total = 0;
> >  +	int cpu;
> >  +	for_each_cpu(cpu)
> >  +		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
> >  +	return total;
> >  +}
> 
> This is likely too big to be inlined, plus sock.h doesn't include enough
> headers to reliably compile this code.
> 
> >  +static inline void mod_sockets_allocated(int *sockets_allocated, int count)
> >  +{
> >  +	(*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
> >  +	put_cpu();
> >  +}
> >  +
> 
> Ditto.

OK, here is a revised patch.


Change the atomic_t sockets_allocated member of struct proto to a 
per-cpu counter.

Signed-off-by: Pravin B. Shelar <pravins@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/net/sock.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/sock.h	2006-03-08 11:03:19.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/sock.h	2006-03-08 12:06:52.000000000 -0800
@@ -543,7 +543,7 @@ struct proto {
 	/* Memory pressure */
 	void			(*enter_memory_pressure)(void);
 	struct percpu_counter	*memory_allocated;	/* Current allocated memory. */
-	atomic_t		*sockets_allocated;	/* Current number of sockets. */
+	int                     *sockets_allocated;     /* Current number of sockets (percpu counter). */
 
 	/*
 	 * Pressure flag: try to collapse.
@@ -579,6 +579,12 @@ struct proto {
 	} stats[NR_CPUS];
 };
 
+extern int read_sockets_allocated(struct proto *prot);
+extern void mod_sockets_allocated(int *sockets_allocated, int count);
+
+#define inc_sockets_allocated(c) mod_sockets_allocated(c, 1)
+#define dec_sockets_allocated(c) mod_sockets_allocated(c, -1)
+
 extern int proto_register(struct proto *prot, int alloc_slab);
 extern void proto_unregister(struct proto *prot);
 
Index: linux-2.6.16-rc5mm3/include/net/tcp.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/net/tcp.h	2006-03-08 11:03:19.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/net/tcp.h	2006-03-08 11:39:40.000000000 -0800
@@ -226,7 +226,7 @@ extern int sysctl_tcp_mtu_probing;
 extern int sysctl_tcp_base_mss;
 
 extern struct percpu_counter tcp_memory_allocated;
-extern atomic_t tcp_sockets_allocated;
+extern int *tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
 /*
Index: linux-2.6.16-rc5mm3/net/core/sock.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/sock.c	2006-03-08 11:03:19.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/sock.c	2006-03-08 12:09:19.000000000 -0800
@@ -771,7 +771,7 @@ struct sock *sk_clone(const struct sock 
 		newsk->sk_sleep	 = NULL;
 
 		if (newsk->sk_prot->sockets_allocated)
-			atomic_inc(newsk->sk_prot->sockets_allocated);
+			inc_sockets_allocated(newsk->sk_prot->sockets_allocated); 
 	}
 out:
 	return newsk;
@@ -1451,6 +1451,25 @@ void sk_common_release(struct sock *sk)
 
 EXPORT_SYMBOL(sk_common_release);
 
+int read_sockets_allocated(struct proto *prot)
+{
+	int total = 0;
+	int cpu;
+	for_each_cpu(cpu)
+		total += *per_cpu_ptr(prot->sockets_allocated, cpu);
+	return total;
+}
+
+EXPORT_SYMBOL(read_sockets_allocated);
+
+void mod_sockets_allocated(int *sockets_allocated, int count)
+{
+	(*per_cpu_ptr(sockets_allocated, get_cpu())) += count;
+	put_cpu();
+}
+
+EXPORT_SYMBOL(mod_sockets_allocated);
+
 static DEFINE_RWLOCK(proto_list_lock);
 static LIST_HEAD(proto_list);
 
@@ -1620,7 +1639,7 @@ static void proto_seq_printf(struct seq_
 			"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
 		   proto->name,
 		   proto->obj_size,
-		   proto->sockets_allocated != NULL ? atomic_read(proto->sockets_allocated) : -1,
+		   proto->sockets_allocated != NULL ? read_sockets_allocated(proto) : -1,
 		   proto->memory_allocated != NULL ?
 				percpu_counter_read_positive(proto->memory_allocated) : -1,
 		   proto->memory_pressure != NULL ? *proto->memory_pressure ? "yes" : "no" : "NI",
Index: linux-2.6.16-rc5mm3/net/core/stream.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/core/stream.c	2006-03-08 11:08:28.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/core/stream.c	2006-03-08 11:39:40.000000000 -0800
@@ -242,7 +242,7 @@ int sk_stream_mem_schedule(struct sock *
 		return 1;
 
 	if (!*sk->sk_prot->memory_pressure ||
-	    sk->sk_prot->sysctl_mem[2] > atomic_read(sk->sk_prot->sockets_allocated) *
+	    sk->sk_prot->sysctl_mem[2] > read_sockets_allocated(sk->sk_prot) *
 				sk_stream_pages(sk->sk_wmem_queued +
 						atomic_read(&sk->sk_rmem_alloc) +
 						sk->sk_forward_alloc))
Index: linux-2.6.16-rc5mm3/net/ipv4/proc.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/proc.c	2006-03-08 11:03:19.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/proc.c	2006-03-08 11:39:40.000000000 -0800
@@ -63,7 +63,7 @@ static int sockstat_seq_show(struct seq_
 	socket_seq_show(seq);
 	seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %lu\n",
 		   fold_prot_inuse(&tcp_prot), atomic_read(&tcp_orphan_count),
-		   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
+		   tcp_death_row.tw_count, read_sockets_allocated(&tcp_prot),
 		   percpu_counter_read_positive(&tcp_memory_allocated));
 	seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
 	seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp.c	2006-03-08 11:16:33.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp.c	2006-03-08 11:39:40.000000000 -0800
@@ -284,7 +284,7 @@ EXPORT_SYMBOL(sysctl_tcp_rmem);
 EXPORT_SYMBOL(sysctl_tcp_wmem);
 
 struct percpu_counter tcp_memory_allocated;	/* Current allocated memory. */
-atomic_t tcp_sockets_allocated;	/* Current number of TCP sockets. */
+int *tcp_sockets_allocated;   /* Current number of TCP sockets. */
 
 EXPORT_SYMBOL(tcp_memory_allocated);
 EXPORT_SYMBOL(tcp_sockets_allocated);
@@ -2055,6 +2055,12 @@ void __init tcp_init(void)
 	if (!tcp_hashinfo.bind_bucket_cachep)
 		panic("tcp_init: Cannot alloc tcp_bind_bucket cache.");
 
+	tcp_sockets_allocated = alloc_percpu(int);
+	if (!tcp_sockets_allocated)
+		panic("tcp_init: Cannot alloc tcp_sockets_allocated");
+
+	tcp_prot.sockets_allocated = tcp_sockets_allocated;
+
 	/* Size and allocate the main established and bind bucket
 	 * hash tables.
 	 *
Index: linux-2.6.16-rc5mm3/net/ipv4/tcp_ipv4.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv4/tcp_ipv4.c	2006-03-08 10:36:03.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv4/tcp_ipv4.c	2006-03-08 11:39:40.000000000 -0800
@@ -1273,7 +1273,7 @@ static int tcp_v4_init_sock(struct sock 
 	sk->sk_sndbuf = sysctl_tcp_wmem[1];
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
-	atomic_inc(&tcp_sockets_allocated);
+	inc_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1307,7 +1307,7 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	atomic_dec(&tcp_sockets_allocated);
+	dec_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1815,7 +1815,6 @@ struct proto tcp_prot = {
 	.unhash			= tcp_unhash,
 	.get_port		= tcp_v4_get_port,
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
-	.sockets_allocated	= &tcp_sockets_allocated,
 	.orphan_count		= &tcp_orphan_count,
 	.memory_allocated	= &tcp_memory_allocated,
 	.memory_pressure	= &tcp_memory_pressure,
Index: linux-2.6.16-rc5mm3/net/ipv6/tcp_ipv6.c
===================================================================
--- linux-2.6.16-rc5mm3.orig/net/ipv6/tcp_ipv6.c	2006-03-08 10:36:03.000000000 -0800
+++ linux-2.6.16-rc5mm3/net/ipv6/tcp_ipv6.c	2006-03-08 11:39:40.000000000 -0800
@@ -1375,7 +1375,7 @@ static int tcp_v6_init_sock(struct sock 
 	sk->sk_sndbuf = sysctl_tcp_wmem[1];
 	sk->sk_rcvbuf = sysctl_tcp_rmem[1];
 
-	atomic_inc(&tcp_sockets_allocated);
+	inc_sockets_allocated(tcp_sockets_allocated);
 
 	return 0;
 }
@@ -1573,7 +1573,6 @@ struct proto tcpv6_prot = {
 	.unhash			= tcp_unhash,
 	.get_port		= tcp_v6_get_port,
 	.enter_memory_pressure	= tcp_enter_memory_pressure,
-	.sockets_allocated	= &tcp_sockets_allocated,
 	.memory_allocated	= &tcp_memory_allocated,
 	.memory_pressure	= &tcp_memory_pressure,
 	.orphan_count		= &tcp_orphan_count,
@@ -1605,6 +1604,7 @@ static struct inet_protosw tcpv6_protosw
 
 void __init tcpv6_init(void)
 {
+	tcpv6_prot.sockets_allocated = tcp_sockets_allocated;
 	/* register inet6 protocol */
 	if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0)
 		printk(KERN_ERR "tcpv6_init: Could not register protocol\n");

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 20:36       ` Benjamin LaHaise
@ 2006-03-08 21:07         ` Ravikiran G Thirumalai
  2006-03-08 21:17           ` Benjamin LaHaise
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08 21:07 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 03:36:42PM -0500, Benjamin LaHaise wrote:
> On Wed, Mar 08, 2006 at 12:26:56PM -0800, Ravikiran G Thirumalai wrote:
> > +static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
> > +{
> > +	local_bh_disable();
> > +	fbc->count += amount;
> > +	local_bh_enable();
> > +}
> 
> Please use local_t instead, then you don't have to do the local_bh_disable() 
> and enable() and the whole thing collapses down into 1 instruction on x86.

But on non x86, local_bh_disable() is gonna be cheaper than a cli/atomic op no?
(Even if they were switched over to do local_irq_save() and
local_irq_restore() from atomic_t's that is).

And if we use local_t, we will add the overhead for the non bh 
percpu_counter_mod for non x86 arches.

Thanks,
Kiran

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 21:07         ` Ravikiran G Thirumalai
@ 2006-03-08 21:17           ` Benjamin LaHaise
  2006-03-08 22:25             ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Benjamin LaHaise @ 2006-03-08 21:17 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 01:07:26PM -0800, Ravikiran G Thirumalai wrote:
> But on non x86, local_bh_disable() is gonna be cheaper than a cli/atomic op no?
> (Even if they were switched over to do local_irq_save() and
> local_irq_restore() from atomic_t's that is).

It's still more expensive than local_t.

> And if we use local_t, we will add the overhead for the non bh 
> percpu_counter_mod for non x86 arches.

Last time I checked, all the major architectures had efficient local_t 
implementations.  Most of the RISC CPUs are able to do a load / store 
conditional implementation that is the same cost (since memory barriers 
tend to be explicite on powerpc).  So why not use it?

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 21:17           ` Benjamin LaHaise
@ 2006-03-08 22:25             ` Ravikiran G Thirumalai
  2006-03-08 22:41               ` Benjamin LaHaise
  2006-03-08 23:06               ` Andrew Morton
  0 siblings, 2 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-08 22:25 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 04:17:33PM -0500, Benjamin LaHaise wrote:
> On Wed, Mar 08, 2006 at 01:07:26PM -0800, Ravikiran G Thirumalai wrote:
> 
> Last time I checked, all the major architectures had efficient local_t 
> implementations.  Most of the RISC CPUs are able to do a load / store 
> conditional implementation that is the same cost (since memory barriers 
> tend to be explicite on powerpc).  So why not use it?

Then, for the batched percpu_counters, we could gain by using local_t only for 
the UP case. But we will have to have a new local_long_t implementation 
for that.  Do you think just one use case of local_long_t warrants for a new
set of apis?

Kiran

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 22:25             ` Ravikiran G Thirumalai
@ 2006-03-08 22:41               ` Benjamin LaHaise
  2006-03-08 23:43                 ` Andrew Morton
  2006-03-08 23:06               ` Andrew Morton
  1 sibling, 1 reply; 31+ messages in thread
From: Benjamin LaHaise @ 2006-03-08 22:41 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 02:25:28PM -0800, Ravikiran G Thirumalai wrote:
> Then, for the batched percpu_counters, we could gain by using local_t only for 
> the UP case. But we will have to have a new local_long_t implementation 
> for that.  Do you think just one use case of local_long_t warrants for a new
> set of apis?

I think it may make more sense to simply convert local_t into a long, given 
that most of the users will be things like stats counters.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 22:25             ` Ravikiran G Thirumalai
  2006-03-08 22:41               ` Benjamin LaHaise
@ 2006-03-08 23:06               ` Andrew Morton
  2006-03-08 23:12                 ` Andrew Morton
  2006-03-09  2:21                 ` Andi Kleen
  1 sibling, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2006-03-08 23:06 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: bcrl, linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> On Wed, Mar 08, 2006 at 04:17:33PM -0500, Benjamin LaHaise wrote:
> > On Wed, Mar 08, 2006 at 01:07:26PM -0800, Ravikiran G Thirumalai wrote:
> > 
> > Last time I checked, all the major architectures had efficient local_t 
> > implementations.  Most of the RISC CPUs are able to do a load / store 
> > conditional implementation that is the same cost (since memory barriers 
> > tend to be explicite on powerpc).  So why not use it?
> 
> Then, for the batched percpu_counters, we could gain by using local_t only for 
> the UP case. But we will have to have a new local_long_t implementation 
> for that.  Do you think just one use case of local_long_t warrants for a new
> set of apis?
> 

local_t maps onto 32-bit values on 32-bit machines and onto 64-bit values
on 64-bit machines.  unsigned longs.  I don't quite trust the signedness
handling across all archs.

<looks>

Yes, alpha (for example) went and made its local_t's signed, which is wrong
and dangerous.

ia64 is signed.

mips is signed.

parisc is signed.

s390 is signed.

sparc64 is signed.

x86_64 is signed 32-bit!

All other architectures use unsigned long.  A fiasco.

Once decrapify-asm-generic-localh.patch is merged I think all architectures
can and should use asm-generic/local.h.

Until decrapify-asm-generic-localh.patch has been merged and the downstream
arch consolidation has happened and the signedness problems have been
carefully reviewed and fixed I wouldn't go within a mile of local_t.

Once all that is sorted out then yes, it makes sense to convert per-cpu
counters to local_t.  Note that local_t is unsigned, and percpu_counter
needs to treat it as signed.

We should also move the out-of-line percpu_counter implementation over to
lib/something.c (in obj-y).

But none of that has anything to do with these patches.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 23:06               ` Andrew Morton
@ 2006-03-08 23:12                 ` Andrew Morton
  2006-03-09  2:21                 ` Andi Kleen
  1 sibling, 0 replies; 31+ messages in thread
From: Andrew Morton @ 2006-03-08 23:12 UTC (permalink / raw)
  To: kiran, bcrl, linux-kernel, davem, netdev, shai

Andrew Morton <akpm@osdl.org> wrote:
>
> Once decrapify-asm-generic-localh.patch is merged I think all architectures
>  can and should use asm-generic/local.h.

err, no.  Because that's just atomic_long_t, and that's a locked instruction.

We need to review and fix up those architectures which have implemented the
optimised versions.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 22:41               ` Benjamin LaHaise
@ 2006-03-08 23:43                 ` Andrew Morton
  2006-03-09  0:18                   ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-08 23:43 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: kiran, linux-kernel, davem, netdev, shai

Benjamin LaHaise <bcrl@kvack.org> wrote:
>
> On Wed, Mar 08, 2006 at 02:25:28PM -0800, Ravikiran G Thirumalai wrote:
> > Then, for the batched percpu_counters, we could gain by using local_t only for 
> > the UP case. But we will have to have a new local_long_t implementation 
> > for that.  Do you think just one use case of local_long_t warrants for a new
> > set of apis?
> 
> I think it may make more sense to simply convert local_t into a long, given 
> that most of the users will be things like stats counters.
> 

Yes, I agree that making local_t signed would be better.  It's consistent
with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.

Perhaps.  A lot of applications would just be upcounters for statistics,
where unsigned is desired.  But I think the consistency argument wins out.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 23:43                 ` Andrew Morton
@ 2006-03-09  0:18                   ` Ravikiran G Thirumalai
  2006-03-09  0:32                     ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-09  0:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Benjamin LaHaise, linux-kernel, davem, netdev, shai

On Wed, Mar 08, 2006 at 03:43:21PM -0800, Andrew Morton wrote:
> Benjamin LaHaise <bcrl@kvack.org> wrote:
> >
> > I think it may make more sense to simply convert local_t into a long, given 
> > that most of the users will be things like stats counters.
> > 
> 
> Yes, I agree that making local_t signed would be better.  It's consistent
> with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.
> 
> Perhaps.  A lot of applications would just be upcounters for statistics,
> where unsigned is desired.  But I think the consistency argument wins out.

It already is... for most of the arches except x86_64.
And on -mm, the asm-generic version uses atomic_long_t for local_t (signed
long) which seems right.

Although, I wonder why we use:

#define local_read(l) ((unsigned long)atomic_long_read(&(l)->a))

It would return a huge value if the local counter was even -1 no?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  0:18                   ` Ravikiran G Thirumalai
@ 2006-03-09  0:32                     ` Andrew Morton
  2006-03-09  8:06                       ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2006-03-09  0:32 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: bcrl, linux-kernel, davem, netdev, shai

Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
>
> On Wed, Mar 08, 2006 at 03:43:21PM -0800, Andrew Morton wrote:
> > Benjamin LaHaise <bcrl@kvack.org> wrote:
> > >
> > > I think it may make more sense to simply convert local_t into a long, given 
> > > that most of the users will be things like stats counters.
> > > 
> > 
> > Yes, I agree that making local_t signed would be better.  It's consistent
> > with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.
> > 
> > Perhaps.  A lot of applications would just be upcounters for statistics,
> > where unsigned is desired.  But I think the consistency argument wins out.
> 
> It already is... for most of the arches except x86_64.

x86 uses unsigned long.

> And on -mm, the asm-generic version uses atomic_long_t for local_t (signed
> long) which seems right.

No, it uses unsigned long.  The only place where signedness matters is
local_read(), and there it is typecast to ulong.

> Although, I wonder why we use:
> 
> #define local_read(l) ((unsigned long)atomic_long_read(&(l)->a))
> 
> It would return a huge value if the local counter was even -1 no?

It's casting a signed long to an unsigned long.  That does the right thing.
Yes, it'll convert -1 to 0xffffffff[ffffffff].


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-08 23:06               ` Andrew Morton
  2006-03-08 23:12                 ` Andrew Morton
@ 2006-03-09  2:21                 ` Andi Kleen
  2006-03-09  2:32                   ` Andrew Morton
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2006-03-09  2:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bcrl, linux-kernel, davem, netdev, shai

Andrew Morton <akpm@osdl.org> writes:
> 
> x86_64 is signed 32-bit!

I'll change it. You want signed 64bit?

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  2:21                 ` Andi Kleen
@ 2006-03-09  2:32                   ` Andrew Morton
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Morton @ 2006-03-09  2:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: bcrl, linux-kernel, davem, netdev, shai

Andi Kleen <ak@suse.de> wrote:
>
> Andrew Morton <akpm@osdl.org> writes:
> > 
> > x86_64 is signed 32-bit!
> 
> I'll change it. You want signed 64bit?
> 

Well it's all random at present.  Since the API is defined as unsigned I
guess it's be best to make it unsigned for now.  Later, when someone gets
down to making it signed and reviewing all the users they can flip x86_64
to signed along with the rest of the architectures.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  8:06                       ` Ravikiran G Thirumalai
@ 2006-03-09  4:14                         ` Andi Kleen
  2006-03-09  8:14                         ` Nick Piggin
  1 sibling, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2006-03-09  4:14 UTC (permalink / raw)
  To: Ravikiran G Thirumalai
  Cc: Andrew Morton, bcrl, linux-kernel, davem, netdev, shai

On Thursday 09 March 2006 09:06, Ravikiran G Thirumalai wrote:
> On Wed, Mar 08, 2006 at 04:32:58PM -0800, Andrew Morton wrote:
> > Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> > >
> > > On Wed, Mar 08, 2006 at 03:43:21PM -0800, Andrew Morton wrote:
> > > > Benjamin LaHaise <bcrl@kvack.org> wrote:
> > > > >
> > > > > I think it may make more sense to simply convert local_t into a long, given 
> > > > > that most of the users will be things like stats counters.
> > > > > 
> > > > 
> > > > Yes, I agree that making local_t signed would be better.  It's consistent
> > > > with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.
> > > > 
> > > > Perhaps.  A lot of applications would just be upcounters for statistics,
> > > > where unsigned is desired.  But I think the consistency argument wins out.
> > > 
> > > It already is... for most of the arches except x86_64.
> > 
> > x86 uses unsigned long.
> 
> Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
> This keeps local_t unsigned long.  (We can change it to signed value 
> along with other arches later in one go I guess)

I already did that change in my tree
(except it's currently unsigned long as Andrew Indicated) 

-Andi


>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  0:32                     ` Andrew Morton
@ 2006-03-09  8:06                       ` Ravikiran G Thirumalai
  2006-03-09  4:14                         ` Andi Kleen
  2006-03-09  8:14                         ` Nick Piggin
  0 siblings, 2 replies; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-09  8:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: bcrl, linux-kernel, davem, netdev, shai, Andi Kleen

On Wed, Mar 08, 2006 at 04:32:58PM -0800, Andrew Morton wrote:
> Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> >
> > On Wed, Mar 08, 2006 at 03:43:21PM -0800, Andrew Morton wrote:
> > > Benjamin LaHaise <bcrl@kvack.org> wrote:
> > > >
> > > > I think it may make more sense to simply convert local_t into a long, given 
> > > > that most of the users will be things like stats counters.
> > > > 
> > > 
> > > Yes, I agree that making local_t signed would be better.  It's consistent
> > > with atomic_t, atomic64_t and atomic_long_t and it's a bit more flexible.
> > > 
> > > Perhaps.  A lot of applications would just be upcounters for statistics,
> > > where unsigned is desired.  But I think the consistency argument wins out.
> > 
> > It already is... for most of the arches except x86_64.
> 
> x86 uses unsigned long.

Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
This keeps local_t unsigned long.  (We can change it to signed value 
along with other arches later in one go I guess) 

Thanks,
Kiran


Change x86_64 local_t to 64 bits like all other arches.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>

Index: linux-2.6.16-rc5mm3/include/asm-x86_64/local.h
===================================================================
--- linux-2.6.16-rc5mm3.orig/include/asm-x86_64/local.h	2006-03-08 16:51:31.000000000 -0800
+++ linux-2.6.16-rc5mm3/include/asm-x86_64/local.h	2006-03-08 21:56:01.000000000 -0800
@@ -5,18 +5,18 @@
 
 typedef struct
 {
-	volatile unsigned int counter;
+	volatile long counter;
 } local_t;
 
 #define LOCAL_INIT(i)	{ (i) }
 
-#define local_read(v)	((v)->counter)
+#define local_read(v)	((unsigned long)(v)->counter)
 #define local_set(v,i)	(((v)->counter) = (i))
 
 static __inline__ void local_inc(local_t *v)
 {
 	__asm__ __volatile__(
-		"incl %0"
+		"incq %0"
 		:"=m" (v->counter)
 		:"m" (v->counter));
 }
@@ -24,7 +24,7 @@ static __inline__ void local_inc(local_t
 static __inline__ void local_dec(local_t *v)
 {
 	__asm__ __volatile__(
-		"decl %0"
+		"decq %0"
 		:"=m" (v->counter)
 		:"m" (v->counter));
 }
@@ -32,7 +32,7 @@ static __inline__ void local_dec(local_t
 static __inline__ void local_add(unsigned int i, local_t *v)
 {
 	__asm__ __volatile__(
-		"addl %1,%0"
+		"addq %1,%0"
 		:"=m" (v->counter)
 		:"ir" (i), "m" (v->counter));
 }
@@ -40,7 +40,7 @@ static __inline__ void local_add(unsigne
 static __inline__ void local_sub(unsigned int i, local_t *v)
 {
 	__asm__ __volatile__(
-		"subl %1,%0"
+		"subq %1,%0"
 		:"=m" (v->counter)
 		:"ir" (i), "m" (v->counter));
 }
@@ -71,4 +71,4 @@ static __inline__ void local_sub(unsigne
 #define __cpu_local_add(i, v)	cpu_local_add((i), (v))
 #define __cpu_local_sub(i, v)	cpu_local_sub((i), (v))
 
-#endif /* _ARCH_I386_LOCAL_H */
+#endif /* _ARCH_X8664_LOCAL_H */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  8:06                       ` Ravikiran G Thirumalai
  2006-03-09  4:14                         ` Andi Kleen
@ 2006-03-09  8:14                         ` Nick Piggin
  2006-03-09  8:22                           ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 31+ messages in thread
From: Nick Piggin @ 2006-03-09  8:14 UTC (permalink / raw)
  To: Ravikiran G Thirumalai
  Cc: Andrew Morton, bcrl, linux-kernel, davem, netdev, shai, Andi Kleen

Ravikiran G Thirumalai wrote:

> Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
> This keeps local_t unsigned long.  (We can change it to signed value 
> along with other arches later in one go I guess) 
> 

Why not just keep naming and structure of interfaces consistent with
atomic_t?

That would be signed and 32-bit. You then also have a local64_t.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  8:14                         ` Nick Piggin
@ 2006-03-09  8:22                           ` Ravikiran G Thirumalai
  2006-03-09  8:41                             ` Nick Piggin
  0 siblings, 1 reply; 31+ messages in thread
From: Ravikiran G Thirumalai @ 2006-03-09  8:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, bcrl, linux-kernel, davem, netdev, shai, Andi Kleen

On Thu, Mar 09, 2006 at 07:14:26PM +1100, Nick Piggin wrote:
> Ravikiran G Thirumalai wrote:
> 
> >Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
> >This keeps local_t unsigned long.  (We can change it to signed value 
> >along with other arches later in one go I guess) 
> >
> 
> Why not just keep naming and structure of interfaces consistent with
> atomic_t?
> 
> That would be signed and 32-bit. You then also have a local64_t.

No, local_t is supposed to be 64-bits on 64bits arches and 32 bit on 32 bit
arches.  x86_64 was the only exception, so this patch fixes that.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  8:22                           ` Ravikiran G Thirumalai
@ 2006-03-09  8:41                             ` Nick Piggin
  2006-03-09 18:39                               ` Benjamin LaHaise
  0 siblings, 1 reply; 31+ messages in thread
From: Nick Piggin @ 2006-03-09  8:41 UTC (permalink / raw)
  To: Ravikiran G Thirumalai
  Cc: Andrew Morton, bcrl, linux-kernel, davem, netdev, shai, Andi Kleen

Ravikiran G Thirumalai wrote:
> On Thu, Mar 09, 2006 at 07:14:26PM +1100, Nick Piggin wrote:
> 
>>Ravikiran G Thirumalai wrote:
>>
>>
>>>Here's a patch making x86_64 local_t to 64 bits like other 64 bit arches.
>>>This keeps local_t unsigned long.  (We can change it to signed value 
>>>along with other arches later in one go I guess) 
>>>
>>
>>Why not just keep naming and structure of interfaces consistent with
>>atomic_t?
>>
>>That would be signed and 32-bit. You then also have a local64_t.
> 
> 
> No, local_t is supposed to be 64-bits on 64bits arches and 32 bit on 32 bit
> arches.  x86_64 was the only exception, so this patch fixes that.
> 
> 

Right. If it wasn't I wouldn't have proposed the change.

Considering that local_t has been broken so that basically nobody
is using it, now is a great time to rethink the types before it
gets fixed and people start using it.

And modelling the type on the atomic types would make the most
sense because everyone already knows them.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh
  2006-03-09  8:41                             ` Nick Piggin
@ 2006-03-09 18:39                               ` Benjamin LaHaise
  0 siblings, 0 replies; 31+ messages in thread
From: Benjamin LaHaise @ 2006-03-09 18:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ravikiran G Thirumalai, Andrew Morton, linux-kernel, davem,
	netdev, shai, Andi Kleen

On Thu, Mar 09, 2006 at 07:41:08PM +1100, Nick Piggin wrote:
> Considering that local_t has been broken so that basically nobody
> is using it, now is a great time to rethink the types before it
> gets fixed and people start using it.

I'm starting to get more concerned as the per-cpu changes that keep adding 
more overhead is getting out of hand.

> And modelling the type on the atomic types would make the most
> sense because everyone already knows them.

Except that the usage models are different; local_t is most likely to be 
used for cpu local statistics or for sequences, where making them signed 
is a bit backwards.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2006-03-09 19:00 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-08  1:58 [patch 0/4] net: percpufy frequently used vars on struct proto Ravikiran G Thirumalai
2006-03-08  1:59 ` [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh Ravikiran G Thirumalai
2006-03-08  2:13   ` Andrew Morton
2006-03-08 20:26     ` Ravikiran G Thirumalai
2006-03-08 20:36       ` Benjamin LaHaise
2006-03-08 21:07         ` Ravikiran G Thirumalai
2006-03-08 21:17           ` Benjamin LaHaise
2006-03-08 22:25             ` Ravikiran G Thirumalai
2006-03-08 22:41               ` Benjamin LaHaise
2006-03-08 23:43                 ` Andrew Morton
2006-03-09  0:18                   ` Ravikiran G Thirumalai
2006-03-09  0:32                     ` Andrew Morton
2006-03-09  8:06                       ` Ravikiran G Thirumalai
2006-03-09  4:14                         ` Andi Kleen
2006-03-09  8:14                         ` Nick Piggin
2006-03-09  8:22                           ` Ravikiran G Thirumalai
2006-03-09  8:41                             ` Nick Piggin
2006-03-09 18:39                               ` Benjamin LaHaise
2006-03-08 23:06               ` Andrew Morton
2006-03-08 23:12                 ` Andrew Morton
2006-03-09  2:21                 ` Andi Kleen
2006-03-09  2:32                   ` Andrew Morton
2006-03-08  2:01 ` [patch 2/4] net: percpufy frequently used vars -- struct proto.memory_allocated Ravikiran G Thirumalai
2006-03-08  2:14   ` Andrew Morton
2006-03-08  3:08     ` Ravikiran G Thirumalai
2006-03-08  3:22       ` Andrew Morton
2006-03-08 20:54         ` Ravikiran G Thirumalai
2006-03-08  2:02 ` [patch 3/4] net: percpufy frequently used vars -- proto.sockets_allocated Ravikiran G Thirumalai
2006-03-08  2:16   ` Andrew Morton
2006-03-08 20:56     ` Ravikiran G Thirumalai
2006-03-08  2:03 ` [patch 4/4] net: percpufy frequently used vars -- proto.inuse Ravikiran G Thirumalai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).