From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932400AbWDGJrz (ORCPT ); Fri, 7 Apr 2006 05:47:55 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932402AbWDGJrz (ORCPT ); Fri, 7 Apr 2006 05:47:55 -0400 Received: from relay.2ka.mipt.ru ([194.85.82.65]:39825 "EHLO 2ka.mipt.ru") by vger.kernel.org with ESMTP id S932400AbWDGJrz (ORCPT ); Fri, 7 Apr 2006 05:47:55 -0400 Date: Fri, 7 Apr 2006 13:47:32 +0400 From: Evgeniy Polyakov To: Yi Yang Cc: LKML , Andrew Morton , Matt Helsley Subject: Re: [2.6.16 PATCH] Filessytem Events Reporter V2 Message-ID: <20060407094732.GA13235@2ka.mipt.ru> References: <4433C456.7010708@gmail.com> <20060407062428.GA31351@2ka.mipt.ru> <44361F39.4020501@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <44361F39.4020501@gmail.com> User-Agent: Mutt/1.5.9i X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.7.5 (2ka.mipt.ru [0.0.0.0]); Fri, 07 Apr 2006 13:47:35 +0400 (MSD) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 07, 2006 at 04:13:45PM +0800, Yi Yang (yang.y.yi@gmail.com) wrote: > >>+ > >>+ return (netlink_unicast(fsevent_sock, skb, pid, MSG_DONTWAIT)); > >> > > > >netlink_unicast() uses boolean value but ont MSG_* flags for nonblocking, > >so this should be netlink_unicast(fsevent_sock, skb, pid, 0); > > > a example invocation in file net/netlink/af_netlink.c: > netlink_unicast(in_skb->sk, skb, NETLINK_CB(in_skb).pid, MSG_DONTWAIT); > so, it hasn't any problem. Well... static inline long sock_sndtimeo(const struct sock *sk, int noblock) { return noblock ? 0 : sk->sk_sndtimeo; } int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock) { struct sock *sk; int err; long timeo; skb = netlink_trim(skb, gfp_any()); timeo = sock_sndtimeo(ssk, nonblock); I mean that it is boolean value, MSG_PEEK will produce the same result. But it is a matter of coding style probably. > >>+nlmsg_failure: > >>+ kfree_skb(skb); > >>+ return -1; > >>+} > >> > > > >... > > > > > >>+static void fsevent_recv(struct sock *sk, int len) > >>+{ > >>+ struct sk_buff *skb = NULL; > >>+ struct nlmsghdr *nlhdr = NULL; > >>+ struct fsevent_filter * filter = NULL; > >>+ pid_t pid; > >>+ > >>+ while ((skb = skb_dequeue(&sk->sk_receive_queue)) != NULL) { > >>+ skb_get(skb); > >>+ if (skb->len >= FSEVENT_FILTER_MSGSIZE) { > >>+ nlhdr = (struct nlmsghdr *)skb->data; > >>+ filter = NLMSG_DATA(nlhdr); > >>+ pid = NETLINK_CREDS(skb)->pid; > >>+ if (find_fsevent_listener(pid) == NULL) > >>+ atomic_inc(&fsevent_listener_num); > >>+ set_fsevent_filter(filter, pid); > >> > > > >What is the logic behind this steps? > >If there are no listeners you increment it's number no matter if it will > >or not be added in set_fsevent_filter(). > > > fsevent_recv is used to receive listener's commands, a listener must > send commands in order to get fsevents it > interests, so this is the best point to increment number of listeners. > set_fsevent_filter will add listener to listener > list, so it is OK. And what if set_fsevent_filter() fails? > >>+ } > >>+ kfree_skb(skb); > >>+ } > >>+} > >>+ > >>+#define DEFINE_FILTER_MATCH_FUNC(filtertype, key) \ > >>+ static int match_##filtertype(listener * p, \ > >>+ struct fsevent * event, \ > >>+ struct sk_buff * skb) \ > >>+ { \ > >>+ int ret = 0; \ > >>+ filtertype * xfilter = NULL; \ > >>+ struct sk_buff * skb2 = NULL; \ > >>+ struct list_head * head = &(p->key##_filter_list_head); \ > >>+ list_for_each_entry(xfilter, head, list) { \ > >>+ if (xfilter->key != event->key) \ > >>+ continue; \ > >>+ ret = filter_fsevent(xfilter->mask, event->type); \ > >>+ if ( ret != 0) \ > >>+ return -1; \ > >>+ skb2 = skb_clone(skb, GFP_KERNEL); \ > >>+ if (skb2 == NULL) \ > >>+ return -ENOMEM; \ > >>+ NETLINK_CB(skb2).dst_group = 0; \ > >>+ NETLINK_CB(skb2).dst_pid = p->pid; \ > >>+ NETLINK_CB(skb2).pid = 0; \ > >>+ return (netlink_unicast(fsevent_sock, skb2, \ > >>+ p->pid, MSG_DONTWAIT)); \ > >> > > > >The same issue about nonblocking sending. > > > > > >>+ } \ > >>+ return -ENODEV; \ > >>+ } \ > >>+ > >>+DEFINE_FILTER_MATCH_FUNC(pid_filter, pid) > >>+ > >>+DEFINE_FILTER_MATCH_FUNC(uid_filter, uid) > >>+ > >>+DEFINE_FILTER_MATCH_FUNC(gid_filter, gid) > >>+ > >>+#define MATCH_XID(key, listenerp, event, skb) \ > >>+ ret = match_##key##_filter(listenerp, event, skb); \ > >>+ if (ret == 0) { \ > >>+ kfree_skb(skb); \ > >>+ continue; \ > >>+ } \ > >>+ do {} while (0) \ > >>+ > >>+static int fsevent_send_to_process(struct sk_buff * skb) > >>+{ > >>+ listener * p = NULL, * q = NULL; > >>+ struct fsevent * event = NULL; > >>+ struct sk_buff * skb2 = NULL; > >>+ int ret = 0; > >>+ > >>+ event = (struct fsevent *)(skb->data + sizeof(struct nlmsghdr)); > >>+ spin_lock(&listener_list_lock); > >>+ list_for_each_entry_safe(p, q, &listener_list_head, list) { > >>+ MATCH_XID(pid, p, event, skb); > >>+ MATCH_XID(uid, p, event, skb); > >>+ MATCH_XID(gid, p, event, skb); > >>+ > >>+ if (filter_fsevent(p->mask, event->type) == 0) { > >>+ skb2 = skb_clone(skb, GFP_KERNEL); > >>+ if (skb2 == NULL) > >>+ return -ENOMEM; > >>+ NETLINK_CB(skb2).dst_group = 0; > >>+ NETLINK_CB(skb2).dst_pid = p->pid; > >>+ NETLINK_CB(skb2).pid = 0; > >>+ ret = netlink_unicast(fsevent_sock, skb2, > >>+ p->pid, MSG_DONTWAIT); > >>+ if (ret == -ECONNREFUSED) { > >>+ atomic_dec(&fsevent_listener_num); > >>+ cleanup_dead_listener(p); > >>+ } > >>+ } > >>+ } > >>+ spin_unlock(&listener_list_lock); > >>+ return ret; > >>+} > >>+ > >>+static void fsevent_commit(void * unused) > >>+{ > >>+ struct sk_buff * skb = NULL; > >>+ > >>+ while((skb = skb_dequeue(&get_cpu_var(fsevent_send_queue))) > >>+ != NULL) { > >>+ fsevent_send_to_process(skb); > >>+ put_cpu_var(fsevent_send_queue); > >>+ } > >>+} > >> > > > >Really strange mix of per-cpu variables for optimized performance and > >global spin locking. > >Consider using RCU for list of listeners. > > > per cpu queue is used to avoid raise_fsevent to contend spinlock, but > listener_list_lock just is used > to synchronize the operations of userspace applications(listener) on > listener list, it just protect listener > list. > > Of course, your advice is good, RCU will be better, I'm considering > substitute spinlock with RCU, > maybe list*_rcu functions can help me. You get global lock in each processor when traverse the list &listener_list_lock. And you call GFP_KERNEL allocation under that lock, which is wrong. If all your code is called from process context (it looks so), you could mutexes. > >You use unicast delivery for netlink messages. > >According to my investigation [1], it's performance is better only when > >there is only one listener (or maybe two in some cases), but then it is > >noticebly slower than broadcasting. > > > >1. http://marc.theaimsgroup.com/?l=linux-netdev&m=114424884216006&w=2 > > > Because fsevent has to deliver different events to different listeners, > so I must use netlink_unicast, > in fact, netlink_broadcast also must send skb to every member of the > group, so in my opinion, > they haven't big difference. And what if there are several listeners for the same type of events? > Can you explain why there is such a big difference between > netlink_unicast and netlink_broadcast? Netlink broadcast clones skbs, while unicasting requires the whole new one. > >Btw, you need some rebalancing of the per-cpu queues, probably in > >keventd, since CPUs can go offline and your messages will stuck foreve > >there. > > > Does keventd not do it? if so, keventd should be modified. How does keventd know about your own structures? You have an per-cpu object, but your keventd function gets object from running cpu, not from any other cpus. -- Evgeniy Polyakov