From: Shailabh Nagar
Date: Thu, 29 Jun 2006 15:10:31 -0400
To: Andrew Morton
Cc: Paul Jackson, Valdis.Kletnieks@vt.edu, jlan@engr.sgi.com, balbir@in.ibm.com, csturtiv@sgi.com, linux-kernel@vger.kernel.org, Jamal, netdev
Subject: Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats
Message-ID: <44A425A7.2060900@watson.ibm.com>
In-Reply-To: <20060629110107.2e56310b.akpm@osdl.org>

Andrew Morton wrote:

> On Thu, 29 Jun 2006 09:44:08 -0700
> Paul Jackson wrote:
>
>>> You're probably correct on that model. However, it all depends on the
>>> actual workload. Are people who actually have large-CPU (>256) systems
>>> actually running fork()-heavy things like webservers on them, or are
>>> they running things like database servers and computations, which tend
>>> to have persistent processes?
>>
>> It may well be mostly as you say - the large-CPU systems not running
>> the fork()-heavy jobs.
>>
>> Sooner or later, someone will want to run a fork()-heavy job on a
>> large-CPU system. On a 1024 CPU system, it would apparently take
>> just 14 exits/sec/CPU to hit this bottleneck, if Jay's number of
>> 14000 applied.
>>
>> Chris Sturdivant's reply is reasonable -- we'll hit it sooner or
>> later, and deal with it then.
>
> I agree, and I'm viewing this as blocking the taskstats merge. Because
> if this _is_ a problem then it's a big one, because fixing it will be
> intrusive and might well involve userspace-visible changes.

First off, just a reminder that this is inherently a netlink flow-control
issue, one that was being exacerbated earlier by taskstats' decision to
send per-tgid data (no longer the case).

But I'd like to know what our target is here: how many messages per
second do we want to be able to send and receive without risking any
loss of data?
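To make the failure mode concrete, here is a rough userspace sketch
(hypothetical code, not part of the patch): when the kernel cannot
queue a message because a listener's receive buffer is full, the
message is dropped and the next recv() on that socket fails with
ENOBUFS. The receive buffer size is therefore what bounds the burst
that can be absorbed without loss, and the reader is at least told
that a gap occurred:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	char buf[8192];
	unsigned long gaps = 0;
	int rcvbufsz = 1024 * 1024;	/* size this to the expected burst */
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbufsz, sizeof(rcvbufsz));

	/* ... resolve the genetlink family, register for the stats ... */

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);

		if (len < 0 && errno == ENOBUFS) {
			gaps++;	/* messages were lost: account, carry on */
			continue;
		}
		if (len <= 0)
			break;
		/* ... walk the nlmsghdrs in buf, parse the payload ... */
	}
	fprintf(stderr, "%lu overruns seen\n", gaps);
	return 0;
}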
Netlink will lose messages at a high enough rate, so the design point
needs to be known at least roughly. For statistics-type usage of
genetlink/netlink, I would have thought that userspace, provided it is
reliably informed of the loss through ENOBUFS, could take measures to
simply account for the missing data and carry on.

> The only ways I can see of fixing the problem generally are to either
>
> a) throw more CPU(s) at stats collection: allow userspace to register
>    for "stats generated by CPU N", then run a stats collection daemon
>    on each CPU or
>
> b) make the kernel recognise when it's getting overloaded and switch
>    to some degraded mode where it stops trying to send all the data to
>    userspace - just send a summary, or a "we goofed" message or
>    something.

One of the as-yet-unused features of genetlink that is meant for
high-volume data output from the kernel is the "dump" callback of a
genetlink connection. Essentially, kernel space keeps getting handed
sk_buffs to fill, which the netlink layer then supplies to userspace
(paced over time, I presume); a rough sketch is appended at the end of
this mail.

But whatever we do, there's going to be some limit, so it's useful to
decide what the design point should be.

Adding Jamal for his thoughts on netlink's flow control in the context
of genetlink.
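To illustrate, here is the shape such a dump callback takes
(hypothetical code: fill_one_entry() and NR_ENTRIES are invented, and
the real genlmsg_put()-based message construction is elided):

#include <net/genetlink.h>
#include <linux/taskstats.h>

/* hypothetical helper: appends one NLM_F_MULTI message for entry i,
 * returning a negative value once the skb has no room left */
extern int fill_one_entry(struct sk_buff *skb, struct netlink_callback *cb,
			  int i);

#define NR_ENTRIES 100000	/* invented size of the result set */

static int stats_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
	int i = cb->args[0];	/* cursor saved by the previous call */

	while (i < NR_ENTRIES) {
		if (fill_one_entry(skb, cb, i) < 0)
			break;	/* skb full: stop and resume later */
		i++;
	}
	cb->args[0] = i;	/* remember where to pick up */

	/* Nonzero tells the netlink layer to call us again with a fresh
	 * skb once userspace has drained this one; 0 ends the dump. */
	return i < NR_ENTRIES ? skb->len : 0;
}

static struct genl_ops stats_dump_ops = {
	.cmd	= TASKSTATS_CMD_GET,
	.dumpit	= stats_dumpit,
};

The appeal is the built-in backpressure: the kernel fills a fresh
sk_buff only as userspace drains the previous one, instead of pushing
messages at whatever rate tasks happen to exit.

--Shailabh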