From: Neil Horman
Subject: Re: [PATCH net-next 2/3] tipc: byte-based overload control on socket receive queue
Date: Tue, 19 Feb 2013 09:26:29 -0500
Message-ID: <20130219142629.GA31871@hmsreliant.think-freely.org>
References: <1360969067-29956-1-git-send-email-paul.gortmaker@windriver.com>
 <1360969067-29956-3-git-send-email-paul.gortmaker@windriver.com>
 <20130218144757.GA26199@hmsreliant.think-freely.org>
 <512332DA.5040508@ericsson.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <512332DA.5040508@ericsson.com>
To: Jon Maloy
Cc: Paul Gortmaker, David Miller, netdev@vger.kernel.org, Ying Xue

On Tue, Feb 19, 2013 at 09:07:54AM +0100, Jon Maloy wrote:
> On 02/18/2013 09:47 AM, Neil Horman wrote:
> > On Fri, Feb 15, 2013 at 05:57:46PM -0500, Paul Gortmaker wrote:
> >> From: Ying Xue
> >>
> >> Change overload control to be purely byte-based, using
> >> sk->sk_rmem_alloc as byte counter, and compare it to a calculated
> >> upper limit for the socket receive queue.
>
> [...]
>
> >> + *
> >> + * For all connectionless messages, by default new queue limits are
> >> + * as belows:
> >> + *
> >> + * TIPC_LOW_IMPORTANCE       (5MB)
> >> + * TIPC_MEDIUM_IMPORTANCE    (10MB)
> >> + * TIPC_HIGH_IMPORTANCE      (20MB)
> >> + * TIPC_CRITICAL_IMPORTANCE  (40MB)
> >> + *
> >> + * Returns overload limit according to corresponding message importance
> >> + */
> >> +static unsigned int rcvbuf_limit(struct sock *sk, struct sk_buff *buf)
> >> +{
> >> +	struct tipc_msg *msg = buf_msg(buf);
> >> +	unsigned int limit;
> >> +
> >> +	if (msg_connected(msg))
> >> +		limit = CONN_OVERLOAD_LIMIT;
>
> > This still strikes me as a bit wierd.  If you really can't tolerate the
> > default rmem settings in proc, have you considered separating the rmem
> > and wmem values out into their own sysctls?
>
> Initially we tried to set this value as default for sk_rcvbuf, and then use
> fractions of it as limits, as you suggest below. The problem we found was
> that if we want to change this via setsockopt(SOL_SOCKET, SO_RCVBUF) the
> value range we can use is very limited, and doesn't fit our purposes.
>
Can you elaborate on this please?  The above doesn't really explain why you
can't do what I suggested.  I'm not asserting that what you say is untrue,
mind you; I'm just trying to understand what it is about TIPC that requires
such a specific reception buffer envelope, and why enforcing queue limits
here is so important when packets could just as easily be dropped at the IP
layer (with ostensibly no fatal failure).

> We did consider to introduce a separate setsockopt at TIPC level for this,
> but thought it had a value in itself to use the mechanism that is already
> there. Hence the "re-interpretation" of sk_rcvbuf as we do below.
> Considering the weird doubling of this parameter that is done elsewhere in
> the code we thought that having our own interpretation might be acceptable.

That's quite different IMHO.  The comments in sock_setsockopt make it pretty
clear that the doubling of the rcvbuf value is done to account for the
sk_buff overhead of packet reception, and that's documented in the socket(7)
man page.
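For illustration, that doubling is easy to observe from userspace (a minimal
sketch, not TIPC-specific; the exact value read back depends on the running
kernel's net.core.rmem_max clamp):

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int req = 128 * 1024;		/* what the application asks for */
	int got = 0;
	socklen_t len = sizeof(got);

	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);

	/* got is normally 2 * req (capped by net.core.rmem_max); the factor
	 * of two covers sk_buff bookkeeping overhead, as socket(7) says. */
	printf("requested %d, kernel set %d\n", req, got);
	return 0;
}

So the doubling is a single, documented rule that an application can reason
about.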
What you have here is the ability to set sk_rcvbuf and then have that
setting ignored, replaced by several different limits that depend on various
conditions, none of which are visible to user space.

> We did of course see the potential issue with this, that is why we cc-ed
> you for comments.

I appreciate that.

> Now I see that David already pulled the series, so I am a little uncertain
> about how to proceed.

I saw that too, and asked him about this.  A follow-on patch (if we wind up
deciding one is warranted) is the way to go here.

Regards
Neil