From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261876AbVFUA6h (ORCPT ); Mon, 20 Jun 2005 20:58:37 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261859AbVFUAzv (ORCPT ); Mon, 20 Jun 2005 20:55:51 -0400 Received: from tone.orchestra.cse.unsw.EDU.AU ([129.94.242.59]:51416 "EHLO tone.orchestra.cse.unsw.EDU.AU") by vger.kernel.org with ESMTP id S261758AbVFUAv4 (ORCPT ); Mon, 20 Jun 2005 20:51:56 -0400 From: Neil Brown To: Robert Love Date: Tue, 21 Jun 2005 10:51:25 +1000 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17079.25741.91251.232880@cse.unsw.edu.au> Cc: Andrew Morton , John McCutchan , Christoph Hellwig , arnd@arndb.de, zab@zabbo.net, linux-kernel@vger.kernel.org, viro@parcelfarce.linux.theplanet.co.uk Subject: Re: [patch] inotify. In-Reply-To: message from Robert Love on Friday June 17 References: <1118855899.3949.21.camel@betsy> <42B1BC4B.3010804@zabbo.net> <1118946334.3949.63.camel@betsy> <200506171907.39940.arnd@arndb.de> <20050617175605.GB1981@tentacle.dhs.org> <20050617143334.41a31707.akpm@osdl.org> <1119044430.7280.22.camel@phantasy> <1119052357.7280.24.camel@phantasy> X-Mailer: VM 7.19 under Emacs 21.4.1 X-face: [Gw_3E*Gng}4rRrKRYotwlE?.2|**#s9D On Fri, 2005-06-17 at 17:40 -0400, Robert Love wrote: > > > I wrote this up the first time you asked: > > > > Documentation/filesystems/inotify.txt > > If there is something I should add, just holler. I will write an > encyclopedia if need be. I haven't yet decided if I like it or not, but maybe I have some useful comments to make... > + > +Q: What is the design decision behind using an-fd-per-device as > +opposed to an fd-per-watch? > + > +A: An fd-per-watch quickly consumes more file descriptors than > +are allowed, more fd's than are feasible to manage, and more > +fd's than are ideally select()-able. Yes, root can bump the > +per-process fd limit and yes, users can use epoll, but requiring > +both is silly and an extraneous requirement. A watch consumes > +less memory than an open file, separating the number spaces is > +thus sensible. The current design is what user-space developers > +want: Users open the device, once, and add n watches, requiring > +but one fd and no twiddling with fd limits. > +Opening /dev/inotify two thousand times is silly. If we can > +implement user-space's preferences cleanly--and we can, the idr > +layer makes stuff like this trivial--then we should. > + To address the particular points: - "An fd-per-watch quickly consumes more file descriptors than are allowed" I note that the current limit if 'wd's is 8192 per user, compared with the default 'fd' limit of 1024 per process. So if a user has more than 8 processes making very heavy use of inotify, they would be better off with the file-descriptor limit... So I don't find this argument convincing. I would suggest that it be removed, and you put in a separate section discussion resource usage and limits. You could explain why you don't use rlimit (probably not a hard case to win) and why the sysadmin cannot give different limits to different users (as a sys-admin, I would fight that), and why no one user would ever want more than about 8 processes using inotify :-) - "more fd's than are feasible to manage," It's not clear what this means. The program will still need to manage lots of 'wd's. How is this different from handling 'fd's? The only 'manage'ment issue I could come up with is the hassle of shepherding all these 'wd's through a 'fork', only to close them before an 'exec'. Is that what you mean? If so it would help to make it more explicit. Using the fd from /dev/inotify like a 'bag' of 'wd's which can be closed as a whole or passed around (through fork or exec or even over a unix domain socket) is, I think, I strong point. You could possibly leverage that in selling the current inotify interface. - "and more fd's than are ideally select()-able" and "users can use epoll, but requiring [it] is silly and an extraneous requirement" Having to use epoll *should* not be a problem. "Surely" thought I "you an just select on the fd returned by epoll_create, and then adding inotify to your select() loop should be just as easy with epoll as with /dev/inotify". But then I discovered that you cannot select() on the epoll fd :-( Maybe this should be fixed rather than be used as an excuse. - "A watch consumes less memory than an open filep" This is largely valid. Two other possible responses: 1/ Make the filep smaller. Not every 'struct file' needs read-ahead data (which is BIG). Not every file needs the fown_struct. I suspect that a small and generic struct file could be created which was embedded inside a larger less generic struct file when needed.. If the size if filep is really an issue. 2/ An fd doesn't *have* to correspond to a 'struct file'. Suppose we allowed the 'watch_fd' systemcall to return an fd which actually mapped to a struct inotify_watch. Possibly using a few high bits to differentiate different types of 'fds'. Most system calls would reject these fds immediately. Those we wanted to handle them could easily be fixed. So sys_close would know how to close them. sys_fcntl would need to be able to set close-on-exec. Maybe a few others like poll. But from user-space, it would be 'just another fd'. You wouldn't be able to use these 'fds' in select(), but I think we can all agree that you probably wouldn't want to. It is also worth noting that there is a more telling reason to not use "struct file" than the size. A 'struct file' points to a dentry. A 'struct inotify_watch' points to an inode. This difference is fairly crucial to how inotify works. I hope some of these reflections are useful. NeilBrown