* [RFC] Migrating net/sched to new module interface @ 2003-01-02 22:50 Kronos 2003-01-03 5:10 ` Rusty Russell 0 siblings, 1 reply; 73+ messages in thread From: Kronos @ 2003-01-02 22:50 UTC (permalink / raw) To: linux-kernel; +Cc: kuznet, rusty Hi, I'm working on net/sched to convert it to new module interface. I'm going to add a new field 'owner' to struct Qdisc_ops. The init function of each module registers the callback functions and a reference to the module itself. When a qdisc is created (qdisc_create) I check try_module_get() to see if I can use the registered callback functions and if I can't I return -ENOSYS. When a qdisc is delete (qdisc_destroy) I use module_put() to decrement the refcount. The attacched patch shows this preliminar work on sch_api.c sch_generic.c. The next step is to update the other schedulers removing MOD_{INC,DEC}_USE_COUNT and adding the new 'owner' field. This work is based on my (maybe poor) understanding on the new module system, so I'm waiting some feedback before going on. diff --exclude=diff.exclude -ru linux-2.5.orig/include/net/pkt_sched.h linux-2.5/include/net/pkt_sched.h --- linux-2.5.orig/include/net/pkt_sched.h Sun Aug 11 22:07:18 2002 +++ linux-2.5/include/net/pkt_sched.h Thu Jan 2 22:50:57 2003 @@ -10,6 +10,7 @@ #include <linux/config.h> #include <linux/types.h> #include <linux/pkt_sched.h> +#include <linux/module.h> #include <net/pkt_cls.h> #ifdef CONFIG_X86_TSC @@ -67,6 +68,8 @@ int (*change)(struct Qdisc *, struct rtattr *arg); int (*dump)(struct Qdisc *, struct sk_buff *); + /* Protects callbacks */ + struct module *owner; }; extern rwlock_t qdisc_tree_lock; diff --exclude=diff.exclude -ru linux-2.5.orig/net/sched/sch_api.c linux-2.5/net/sched/sch_api.c --- linux-2.5.orig/net/sched/sch_api.c Thu Jan 2 21:55:40 2003 +++ linux-2.5/net/sched/sch_api.c Thu Jan 2 22:59:26 2003 @@ -409,6 +409,10 @@ if (ops == NULL) goto err_out; + err = -ENOSYS; + if (try_module_get(ops->owner) == 0) + goto err_module; + size = sizeof(*sch) + ops->priv_size; sch = kmalloc(size, GFP_KERNEL); @@ -416,12 +420,6 @@ if (!sch) goto err_out; - /* Grrr... Resolve race condition with module unload */ - - err = -EINVAL; - if (ops != qdisc_lookup_ops(kind)) - goto err_out; - memset(sch, 0, size); skb_queue_head_init(&sch->q); @@ -460,6 +458,8 @@ } err_out: + module_put(ops->owner); +err_module: *errp = err; if (sch) kfree(sch); diff --exclude=diff.exclude -ru linux-2.5.orig/net/sched/sch_generic.c linux-2.5/net/sched/sch_generic.c --- linux-2.5.orig/net/sched/sch_generic.c Thu Jan 2 22:41:24 2003 +++ linux-2.5/net/sched/sch_generic.c Thu Jan 2 23:07:55 2003 @@ -29,6 +29,7 @@ #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/init.h> +#include <linux/module.h> #include <net/sock.h> #include <net/pkt_sched.h> @@ -356,6 +357,8 @@ .init = pfifo_fast_init, .reset = pfifo_fast_reset, + + .owner = THIS_MODULE, }; struct Qdisc * qdisc_create_dflt(struct net_device *dev, struct Qdisc_ops *ops) @@ -422,6 +425,7 @@ ops->reset(qdisc); if (ops->destroy) ops->destroy(qdisc); + module_put(ops->owner); if (!(qdisc->flags&TCQ_F_BUILTIN)) kfree(qdisc); } ciao, Luca -- Reply-To: kronos@kronoz.cjb.net Home: http://kronoz.cjb.net "Chi parla in tono cortese, ma continua a prepararsi, potra` andare avanti; chi parla in tono bellicoso e avanza rapidamente dovra` ritirarsi" Sun Tzu -- L'arte della guerra ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-02 22:50 [RFC] Migrating net/sched to new module interface Kronos @ 2003-01-03 5:10 ` Rusty Russell 2003-01-03 8:37 ` David S. Miller 2003-01-13 22:32 ` kuznet 0 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-01-03 5:10 UTC (permalink / raw) To: kronos; +Cc: linux-kernel, kuznet In message <20030102225010.GA1197@dreamland.darkstar.lan> you write: > Hi, > > I'm working on net/sched to convert it to new module interface. I'm > going to add a new field 'owner' to struct Qdisc_ops. Hmm, I thought the sched stuff all runs under the network brlock? If so, it doesn't need to be held in, since it's not preemptible. I haven't checked, though... Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-03 5:10 ` Rusty Russell @ 2003-01-03 8:37 ` David S. Miller 2003-01-04 6:09 ` Rusty Russell 2003-01-13 22:32 ` kuznet 1 sibling, 1 reply; 73+ messages in thread From: David S. Miller @ 2003-01-03 8:37 UTC (permalink / raw) To: Rusty Russell; +Cc: kronos, linux-kernel, Alexey N. Kuznetsov On Thu, 2003-01-02 at 21:10, Rusty Russell wrote: > Hmm, I thought the sched stuff all runs under the network brlock? If > so, it doesn't need to be held in, since it's not preemptible. The packet schedulers transmit, not receive. They have their own queue locking, along with the device xmit lock. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-03 8:37 ` David S. Miller @ 2003-01-04 6:09 ` Rusty Russell 2003-01-04 16:21 ` Kronos 0 siblings, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-01-04 6:09 UTC (permalink / raw) To: David S. Miller; +Cc: kronos, linux-kernel, Alexey N. Kuznetsov In message <1041583024.8648.11.camel@rth.ninka.net> you write: > On Thu, 2003-01-02 at 21:10, Rusty Russell wrote: > > Hmm, I thought the sched stuff all runs under the network brlock? If > > so, it doesn't need to be held in, since it's not preemptible. > > The packet schedulers transmit, not receive. > They have their own queue locking, along with the device > xmit lock. Then the patch to mark the "owner" should be straightforward. I haven't looked at kronos's patch in detail, though. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-04 6:09 ` Rusty Russell @ 2003-01-04 16:21 ` Kronos 0 siblings, 0 replies; 73+ messages in thread From: Kronos @ 2003-01-04 16:21 UTC (permalink / raw) To: Rusty Russell; +Cc: linux-kernel, Alexey N. Kuznetsov Il Sat, Jan 04, 2003 at 05:09:41PM +1100, Rusty Russell ha scritto: > Then the patch to mark the "owner" should be straightforward. Yes, it seems so :) This is the patch: diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_api.c linux-2.5/net/sched/cls_api.c --- linux-2.5.orig/net/sched/cls_api.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_api.c Fri Jan 3 17:19:07 2003 @@ -216,6 +216,13 @@ kfree(tp); goto errout; } + + if (try_module_get(tp_ops->owner) == 0) { + err = -ENOSYS; + kfree(tp); + goto errout; + } + memset(tp, 0, sizeof(*tp)); tp->ops = tp_ops; tp->protocol = protocol; @@ -225,6 +232,7 @@ tp->classid = parent; err = tp_ops->init(tp); if (err) { + module_put(tp_ops->owner); kfree(tp); goto errout; } @@ -248,6 +256,7 @@ write_unlock(&qdisc_tree_lock); tp->ops->destroy(tp); + module_put(tp->ops->owner); kfree(tp); err = 0; goto errout; diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_fw.c linux-2.5/net/sched/cls_fw.c --- linux-2.5.orig/net/sched/cls_fw.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_fw.c Sat Jan 4 16:49:11 2003 @@ -117,7 +117,6 @@ static int fw_init(struct tcf_proto *tp) { - MOD_INC_USE_COUNT; return 0; } @@ -128,7 +127,6 @@ int h; if (head == NULL) { - MOD_DEC_USE_COUNT; return; } @@ -146,7 +144,6 @@ } } kfree(head); - MOD_DEC_USE_COUNT; } static int fw_delete(struct tcf_proto *tp, unsigned long arg) @@ -351,18 +348,21 @@ } struct tcf_proto_ops cls_fw_ops = { - NULL, - "fw", - fw_classify, - fw_init, - fw_destroy, - - fw_get, - fw_put, - fw_change, - fw_delete, - fw_walk, - fw_dump + .next = NULL, + .kind = "fw", + .classify = fw_classify, + .init = fw_init, + .destroy = fw_destroy, + + .get = fw_get, + .put = fw_put, + .change = fw_change, + .delete = fw_delete, + .walk = fw_walk, + + .dump = fw_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_route.c linux-2.5/net/sched/cls_route.c --- linux-2.5.orig/net/sched/cls_route.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_route.c Sat Jan 4 16:47:29 2003 @@ -272,7 +272,6 @@ static int route4_init(struct tcf_proto *tp) { - MOD_INC_USE_COUNT; return 0; } @@ -282,7 +281,6 @@ int h1, h2; if (head == NULL) { - MOD_DEC_USE_COUNT; return; } @@ -309,7 +307,6 @@ } } kfree(head); - MOD_DEC_USE_COUNT; } static int route4_delete(struct tcf_proto *tp, unsigned long arg) @@ -607,18 +604,21 @@ } struct tcf_proto_ops cls_route4_ops = { - NULL, - "route", - route4_classify, - route4_init, - route4_destroy, - - route4_get, - route4_put, - route4_change, - route4_delete, - route4_walk, - route4_dump + .next = NULL, + .kind = "route", + .classify = route4_classify, + .init = route4_init, + .destroy = route4_destroy, + + .get = route4_get, + .put = route4_put, + .change = route4_change, + .delete = route4_delete, + .walk = route4_walk, + + .dump = route4_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_rsvp.h linux-2.5/net/sched/cls_rsvp.h --- linux-2.5.orig/net/sched/cls_rsvp.h Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_rsvp.h Sat Jan 4 16:49:56 2003 @@ -247,14 +247,12 @@ { struct rsvp_head *data; - MOD_INC_USE_COUNT; data = kmalloc(sizeof(struct rsvp_head), GFP_KERNEL); if (data) { memset(data, 0, sizeof(struct rsvp_head)); tp->root = data; return 0; } - MOD_DEC_USE_COUNT; return -ENOBUFS; } @@ -294,7 +292,6 @@ } } kfree(data); - MOD_DEC_USE_COUNT; } static int rsvp_delete(struct tcf_proto *tp, unsigned long arg) @@ -673,18 +670,21 @@ } struct tcf_proto_ops RSVP_OPS = { - NULL, - RSVP_ID, - rsvp_classify, - rsvp_init, - rsvp_destroy, - - rsvp_get, - rsvp_put, - rsvp_change, - rsvp_delete, - rsvp_walk, - rsvp_dump + .next = NULL, + .kind = RSVP_ID, + .classify = rsvp_classify, + .init = rsvp_init, + .destroy = rsvp_destroy, + + .get = rsvp_get, + .put = rsvp_put, + .change = rsvp_change, + .delete = rsvp_delete, + .walk = rsvp_walk, + + .dump = rsvp_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_tcindex.c linux-2.5/net/sched/cls_tcindex.c --- linux-2.5.orig/net/sched/cls_tcindex.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_tcindex.c Sat Jan 4 16:50:21 2003 @@ -144,10 +144,8 @@ struct tcindex_data *p; DPRINTK("tcindex_init(tp %p)\n",tp); - MOD_INC_USE_COUNT; p = kmalloc(sizeof(struct tcindex_data),GFP_KERNEL); if (!p) { - MOD_DEC_USE_COUNT; return -ENOMEM; } tp->root = p; @@ -417,7 +415,6 @@ kfree(p->h); kfree(p); tp->root = NULL; - MOD_DEC_USE_COUNT; } @@ -480,20 +477,22 @@ } struct tcf_proto_ops cls_tcindex_ops = { - NULL, - "tcindex", - tcindex_classify, - tcindex_init, - tcindex_destroy, - - tcindex_get, - tcindex_put, - tcindex_change, - tcindex_delete, - tcindex_walk, - tcindex_dump -}; + .next = NULL, + .kind = "tcindex", + .classify = tcindex_classify, + .init = tcindex_init, + .destroy = tcindex_destroy, + + .get = tcindex_get, + .put = tcindex_put, + .change = tcindex_change, + .delete = tcindex_delete, + .walk = tcindex_walk, + + .dump = tcindex_dump, + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/cls_u32.c linux-2.5/net/sched/cls_u32.c --- linux-2.5.orig/net/sched/cls_u32.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/cls_u32.c Sat Jan 4 16:50:39 2003 @@ -262,15 +262,12 @@ struct tc_u_hnode *root_ht; struct tc_u_common *tp_c; - MOD_INC_USE_COUNT; - for (tp_c = u32_list; tp_c; tp_c = tp_c->next) if (tp_c->q == tp->q) break; root_ht = kmalloc(sizeof(*root_ht), GFP_KERNEL); if (root_ht == NULL) { - MOD_DEC_USE_COUNT; return -ENOBUFS; } memset(root_ht, 0, sizeof(*root_ht)); @@ -282,7 +279,6 @@ tp_c = kmalloc(sizeof(*tp_c), GFP_KERNEL); if (tp_c == NULL) { kfree(root_ht); - MOD_DEC_USE_COUNT; return -ENOBUFS; } memset(tp_c, 0, sizeof(*tp_c)); @@ -407,7 +403,6 @@ kfree(tp_c); } - MOD_DEC_USE_COUNT; tp->data = NULL; } @@ -695,18 +690,21 @@ } struct tcf_proto_ops cls_u32_ops = { - NULL, - "u32", - u32_classify, - u32_init, - u32_destroy, - - u32_get, - u32_put, - u32_change, - u32_delete, - u32_walk, - u32_dump + .next = NULL, + .kind = "u32", + .classify = u32_classify, + .init = u32_init, + .destroy = u32_destroy, + + .get = u32_get, + .put = u32_put, + .change = u32_change, + .delete = u32_delete, + .walk = u32_walk, + + .dump = u32_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_api.c linux-2.5/net/sched/sch_api.c --- linux-2.5.orig/net/sched/sch_api.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/sch_api.c Sat Jan 4 16:53:23 2003 @@ -409,6 +409,10 @@ if (ops == NULL) goto err_out; + err = -ENOSYS; + if (try_module_get(ops->owner) == 0) + goto err_module; + size = sizeof(*sch) + ops->priv_size; sch = kmalloc(size, GFP_KERNEL); @@ -416,12 +420,6 @@ if (!sch) goto err_out; - /* Grrr... Resolve race condition with module unload */ - - err = -EINVAL; - if (ops != qdisc_lookup_ops(kind)) - goto err_out; - memset(sch, 0, size); skb_queue_head_init(&sch->q); @@ -460,6 +458,8 @@ } err_out: + module_put(ops->owner); +err_module: *errp = err; if (sch) kfree(sch); diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_atm.c linux-2.5/net/sched/sch_atm.c --- linux-2.5.orig/net/sched/sch_atm.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/sch_atm.c Sat Jan 4 16:51:24 2003 @@ -575,7 +575,6 @@ p->link.ref = 1; p->link.next = NULL; tasklet_init(&p->task,sch_atm_dequeue,(unsigned long) sch); - MOD_INC_USE_COUNT; return 0; } @@ -612,7 +611,6 @@ } } tasklet_kill(&p->task); - MOD_DEC_USE_COUNT; } @@ -697,9 +695,10 @@ .destroy = atm_tc_destroy, .change = NULL, - .dump = atm_tc_dump -}; + .dump = atm_tc_dump, + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) @@ -707,9 +706,10 @@ return register_qdisc(&atm_qdisc_ops); } - void cleanup_module(void) { unregister_qdisc(&atm_qdisc_ops); } #endif + +MODULE_LICENSE("GPL"); diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_cbq.c linux-2.5/net/sched/sch_cbq.c --- linux-2.5.orig/net/sched/sch_cbq.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/sch_cbq.c Sat Jan 4 16:51:30 2003 @@ -1411,9 +1411,7 @@ r = RTA_DATA(tb[TCA_CBQ_RATE-1]); - MOD_INC_USE_COUNT; if ((q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB-1])) == NULL) { - MOD_DEC_USE_COUNT; return -EINVAL; } @@ -1749,7 +1747,6 @@ } qdisc_put_rtab(q->link.R_tab); - MOD_DEC_USE_COUNT; } static void cbq_put(struct Qdisc *sch, unsigned long arg) @@ -2099,6 +2096,8 @@ .change = NULL, .dump = cbq_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_csz.c linux-2.5/net/sched/sch_csz.c --- linux-2.5.orig/net/sched/sch_csz.c Sat Jan 4 16:38:50 2003 +++ linux-2.5/net/sched/sch_csz.c Sat Jan 4 16:53:35 2003 @@ -749,7 +749,7 @@ static void csz_destroy(struct Qdisc* sch) { - MOD_DEC_USE_COUNT; + /* nop */ } static int csz_init(struct Qdisc *sch, struct rtattr *opt) @@ -791,7 +791,6 @@ q->wd_timer.data = (unsigned long)sch; q->wd_timer.function = csz_watchdog; #endif - MOD_INC_USE_COUNT; return 0; } @@ -1046,8 +1045,9 @@ .change = NULL, .dump = csz_dump, -}; + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_dsmark.c linux-2.5/net/sched/sch_dsmark.c --- linux-2.5.orig/net/sched/sch_dsmark.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_dsmark.c Sat Jan 4 16:51:43 2003 @@ -352,7 +352,6 @@ if (!(p->q = qdisc_create_dflt(sch->dev, &pfifo_qdisc_ops))) p->q = &noop_qdisc; DPRINTK("dsmark_init: qdisc %p\n",&p->q); - MOD_INC_USE_COUNT; return 0; } @@ -381,7 +380,6 @@ qdisc_destroy(p->q); p->q = &noop_qdisc; kfree(p->mask); - MOD_DEC_USE_COUNT; } @@ -466,7 +464,9 @@ .destroy = dsmark_destroy, .change = NULL, - .dump = dsmark_dump + .dump = dsmark_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE @@ -474,7 +474,6 @@ { return register_qdisc(&dsmark_qdisc_ops); } - void cleanup_module(void) { diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_fifo.c linux-2.5/net/sched/sch_fifo.c --- linux-2.5.orig/net/sched/sch_fifo.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_fifo.c Sat Jan 4 16:14:50 2003 @@ -206,4 +206,6 @@ .change = fifo_init, .dump = fifo_dump, + + .owner = THIS_MODULE, }; diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_generic.c linux-2.5/net/sched/sch_generic.c --- linux-2.5.orig/net/sched/sch_generic.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_generic.c Sat Jan 4 16:54:10 2003 @@ -29,6 +29,7 @@ #include <linux/skbuff.h> #include <linux/rtnetlink.h> #include <linux/init.h> +#include <linux/module.h> #include <net/sock.h> #include <net/pkt_sched.h> @@ -253,6 +254,8 @@ .enqueue = noop_enqueue, .dequeue = noop_dequeue, .requeue = noop_requeue, + + .owner = THIS_MODULE, }; struct Qdisc noqueue_qdisc = @@ -356,6 +359,8 @@ .init = pfifo_fast_init, .reset = pfifo_fast_reset, + + .owner = THIS_MODULE, }; struct Qdisc * qdisc_create_dflt(struct net_device *dev, struct Qdisc_ops *ops) @@ -422,6 +427,7 @@ ops->reset(qdisc); if (ops->destroy) ops->destroy(qdisc); + module_put(ops->owner); if (!(qdisc->flags&TCQ_F_BUILTIN)) kfree(qdisc); } diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_gred.c linux-2.5/net/sched/sch_gred.c --- linux-2.5.orig/net/sched/sch_gred.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_gred.c Sat Jan 4 16:51:58 2003 @@ -348,7 +348,6 @@ table->grio=sopt->grio; table->initd=0; /* probably need to clear all the table DP entries as well */ - MOD_INC_USE_COUNT; return 0; } @@ -490,7 +489,6 @@ table->def=sopt->def_DP; table->grio=sopt->grio; table->initd=0; - MOD_INC_USE_COUNT; return 0; } @@ -602,7 +600,6 @@ if (table->tab[i]) kfree(table->tab[i]); } - MOD_DEC_USE_COUNT; } struct Qdisc_ops gred_qdisc_ops = @@ -623,8 +620,9 @@ .change = gred_change, .dump = gred_dump, + + .owner = THIS_MODULE, }; - #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_htb.c linux-2.5/net/sched/sch_htb.c --- linux-2.5.orig/net/sched/sch_htb.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_htb.c Sat Jan 4 16:55:04 2003 @@ -1181,7 +1181,6 @@ q->rate2quantum = 1; q->defcls = gopt->defcls; - MOD_INC_USE_COUNT; return 0; } @@ -1366,7 +1365,6 @@ htb_destroy_filters(&q->filter_list); __skb_queue_purge(&q->direct_queue); - MOD_DEC_USE_COUNT; } static int htb_delete(struct Qdisc *sch, unsigned long arg) @@ -1600,39 +1598,41 @@ static struct Qdisc_class_ops htb_class_ops = { - htb_graft, - htb_leaf, - htb_get, - htb_put, - htb_change_class, - htb_delete, - htb_walk, - - htb_find_tcf, - htb_bind_filter, - htb_unbind_filter, + .graft = htb_graft, + .leaf = htb_leaf, + .get = htb_get, + .put = htb_put, + .change = htb_change_class, + .delete = htb_delete, + .walk = htb_walk, + + .tcf_chain = htb_find_tcf, + .bind_tcf = htb_bind_filter, + .unbind_tcf = htb_unbind_filter, - htb_dump_class, + .dump = htb_dump_class, }; struct Qdisc_ops htb_qdisc_ops = { - NULL, - &htb_class_ops, - "htb", - sizeof(struct htb_sched), - - htb_enqueue, - htb_dequeue, - htb_requeue, - htb_drop, - - htb_init, - htb_reset, - htb_destroy, - NULL /* htb_change */, + .next = NULL, + .cl_ops = &htb_class_ops, + .id = "htb", + .priv_size = sizeof(struct htb_sched), + + .enqueue = htb_enqueue, + .dequeue = htb_dequeue, + .requeue = htb_requeue, + .drop = htb_drop, + + .init = htb_init, + .reset = htb_reset, + .destroy = htb_destroy, + .change = NULL, - htb_dump, + .dump = htb_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_ingress.c linux-2.5/net/sched/sch_ingress.c --- linux-2.5.orig/net/sched/sch_ingress.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_ingress.c Sat Jan 4 16:52:49 2003 @@ -257,7 +257,6 @@ memset(p, 0, sizeof(*p)); p->filter_list = NULL; p->q = &noop_qdisc; - MOD_INC_USE_COUNT; return 0; error: return -EINVAL; @@ -304,8 +303,6 @@ qdisc_destroy(p->q); #endif - MOD_DEC_USE_COUNT; - } @@ -359,22 +356,20 @@ .change = NULL, .dump = ingress_dump, -}; + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) { int ret = 0; - if ((ret = register_qdisc(&ingress_qdisc_ops)) < 0) { - printk("Unable to register Ingress qdisc\n"); - return ret; - } + if ((ret = register_qdisc(&ingress_qdisc_ops)) < 0) + printk(KERN_ERR "Unable to register Ingress qdisc\n"); return ret; } - void cleanup_module(void) { diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_prio.c linux-2.5/net/sched/sch_prio.c --- linux-2.5.orig/net/sched/sch_prio.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_prio.c Sat Jan 4 16:52:53 2003 @@ -163,7 +163,6 @@ qdisc_destroy(q->queues[prio]); q->queues[prio] = &noop_qdisc; } - MOD_DEC_USE_COUNT; } static int prio_tune(struct Qdisc *sch, struct rtattr *opt) @@ -227,7 +226,6 @@ if ((err= prio_tune(sch, opt)) != 0) return err; } - MOD_INC_USE_COUNT; return 0; } @@ -399,10 +397,11 @@ .change = prio_tune, .dump = prio_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE - int init_module(void) { return register_qdisc(&prio_qdisc_ops); @@ -412,6 +411,5 @@ { unregister_qdisc(&prio_qdisc_ops); } - #endif MODULE_LICENSE("GPL"); diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_red.c linux-2.5/net/sched/sch_red.c --- linux-2.5.orig/net/sched/sch_red.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_red.c Sat Jan 4 16:52:57 2003 @@ -407,14 +407,7 @@ static int red_init(struct Qdisc* sch, struct rtattr *opt) { - int err; - - MOD_INC_USE_COUNT; - - if ((err = red_change(sch, opt)) != 0) { - MOD_DEC_USE_COUNT; - } - return err; + return red_change(sch, opt); } @@ -458,7 +451,7 @@ static void red_destroy(struct Qdisc *sch) { - MOD_DEC_USE_COUNT; + /* nop */ } struct Qdisc_ops red_qdisc_ops = @@ -479,8 +472,9 @@ .change = red_change, .dump = red_dump, -}; + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_sfq.c linux-2.5/net/sched/sch_sfq.c --- linux-2.5.orig/net/sched/sch_sfq.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_sfq.c Sat Jan 4 16:53:02 2003 @@ -435,7 +435,6 @@ } for (i=0; i<SFQ_DEPTH; i++) sfq_link(q, i); - MOD_INC_USE_COUNT; return 0; } @@ -443,7 +442,6 @@ { struct sfq_sched_data *q = (struct sfq_sched_data *)sch->data; del_timer(&q->perturb_timer); - MOD_DEC_USE_COUNT; } static int sfq_dump(struct Qdisc *sch, struct sk_buff *skb) @@ -486,6 +484,8 @@ .change = NULL, .dump = sfq_dump, + + .owner = THIS_MODULE, }; #ifdef MODULE diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_tbf.c linux-2.5/net/sched/sch_tbf.c --- linux-2.5.orig/net/sched/sch_tbf.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_tbf.c Sat Jan 4 16:53:06 2003 @@ -330,23 +330,17 @@ static int tbf_init(struct Qdisc* sch, struct rtattr *opt) { - int err; struct tbf_sched_data *q = (struct tbf_sched_data *)sch->data; if (opt == NULL) return -EINVAL; - MOD_INC_USE_COUNT; - PSCHED_GET_TIME(q->t_c); init_timer(&q->wd_timer); q->wd_timer.function = tbf_watchdog; q->wd_timer.data = (unsigned long)sch; - if ((err = tbf_change(sch, opt)) != 0) { - MOD_DEC_USE_COUNT; - } - return err; + return tbf_change(sch, opt); } static void tbf_destroy(struct Qdisc *sch) @@ -359,8 +353,6 @@ qdisc_put_rtab(q->P_tab); if (q->R_tab) qdisc_put_rtab(q->R_tab); - - MOD_DEC_USE_COUNT; } static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb) @@ -409,8 +401,9 @@ .change = tbf_change, .dump = tbf_dump, -}; + .owner = THIS_MODULE, +}; #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/net/sched/sch_teql.c linux-2.5/net/sched/sch_teql.c --- linux-2.5.orig/net/sched/sch_teql.c Sat Jan 4 16:38:51 2003 +++ linux-2.5/net/sched/sch_teql.c Sat Jan 4 16:34:52 2003 @@ -178,7 +178,6 @@ } while ((prev = q) != master->slaves); } - MOD_DEC_USE_COUNT; } static int teql_qdisc_init(struct Qdisc *sch, struct rtattr *opt) @@ -223,7 +222,6 @@ m->dev.flags = (m->dev.flags&~FMASK)|(dev->flags&FMASK); } - MOD_INC_USE_COUNT; return 0; } @@ -454,8 +452,9 @@ .reset = teql_reset, .destroy = teql_destroy, .dump = NULL, -},}; + .owner = THIS_MODULE, +},}; #ifdef MODULE int init_module(void) diff --exclude-from=diff.exclude -ru linux-2.5.orig/include/net/pkt_cls.h linux-2.5/include/net/pkt_cls.h --- linux-2.5.orig/include/net/pkt_cls.h Sat Jan 4 16:38:51 2003 +++ linux-2.5/include/net/pkt_cls.h Fri Jan 3 17:20:03 2003 @@ -3,6 +3,7 @@ #include <linux/pkt_cls.h> +#include <linux/module.h> struct rtattr; struct tcmsg; @@ -56,6 +57,8 @@ /* rtnetlink specific */ int (*dump)(struct tcf_proto*, unsigned long, struct sk_buff *skb, struct tcmsg*); + + struct module *owner; }; /* Main classifier routine: scans classifier chain attached diff --exclude-from=diff.exclude -ru linux-2.5.orig/include/net/pkt_sched.h linux-2.5/include/net/pkt_sched.h --- linux-2.5.orig/include/net/pkt_sched.h Sat Jan 4 16:38:51 2003 +++ linux-2.5/include/net/pkt_sched.h Thu Jan 2 22:50:57 2003 @@ -10,6 +10,7 @@ #include <linux/config.h> #include <linux/types.h> #include <linux/pkt_sched.h> +#include <linux/module.h> #include <net/pkt_cls.h> #ifdef CONFIG_X86_TSC @@ -67,6 +68,8 @@ int (*change)(struct Qdisc *, struct rtattr *arg); int (*dump)(struct Qdisc *, struct sk_buff *); + /* Protects callbacks */ + struct module *owner; }; extern rwlock_t qdisc_tree_lock; I've tested it a bit and seems to work ok. TEQL queue still uses MOD_{INC,DEC}_USE_COUNT because it register a device and register_netdevice doesn't use the new interface yet. ciao, Luca -- Reply-To: kronos@kronoz.cjb.net Home: http://kronoz.cjb.net The trouble with computers is that they do what you tell them, not what you want. D. Cohen ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-03 5:10 ` Rusty Russell 2003-01-03 8:37 ` David S. Miller @ 2003-01-13 22:32 ` kuznet 2003-01-13 23:23 ` Max Krasnyansky 2003-01-14 17:49 ` Kronos 1 sibling, 2 replies; 73+ messages in thread From: kuznet @ 2003-01-13 22:32 UTC (permalink / raw) To: Rusty Russell; +Cc: kronos, linux-kernel Hello! > Hmm, I thought the sched stuff all runs under the network brlock? If > so, it doesn't need to be held in, since it's not preemptible. > > I haven't checked, though... It runs under semaphore. Which does not matter at all, because the hole void cleanup_module(void) { <an instance is cloned here> unregister_qdisc(&cbq_qdisc_ops); } remained in any case, be it under some preemptive, nonprepemtive lock or not under a lock at all. BTW, Rusty, a question... I do not understand, what is purpose of this "new" module stuff at all? If we still need to query something in module.c to create each instanse of something it smells exactly "old" broken approach. I just do not see differences. It is not essential in this particular case, but it would be funny to ask it each time when creating a tcp socket. Alexey ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-13 22:32 ` kuznet @ 2003-01-13 23:23 ` Max Krasnyansky 2003-01-14 17:49 ` Kronos 1 sibling, 0 replies; 73+ messages in thread From: Max Krasnyansky @ 2003-01-13 23:23 UTC (permalink / raw) To: kuznet, Rusty Russell; +Cc: kronos, linux-kernel At 01:32 AM 1/14/2003 +0300, kuznet@ms2.inr.ac.ru wrote: >It is not essential in this particular case, >but it would be funny to ask it each time when creating a tcp socket. That's what we're going to do btw. I'm cooking up new patch. See "New module refcounting for net_proto_family" thread. Max ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-13 22:32 ` kuznet 2003-01-13 23:23 ` Max Krasnyansky @ 2003-01-14 17:49 ` Kronos 2003-01-15 0:21 ` Roman Zippel 1 sibling, 1 reply; 73+ messages in thread From: Kronos @ 2003-01-14 17:49 UTC (permalink / raw) To: kuznet; +Cc: rusty, linux-kernel Il Tue, Jan 14, 2003 at 01:32:56AM +0300, Alexey N. Kuznetsov ha scritto: > Which does not matter at all, because the hole > > void cleanup_module(void) > { > <an instance is cloned here> > unregister_qdisc(&cbq_qdisc_ops); > } > > remained in any case, be it under some preemptive, nonprepemtive lock or > not under a lock at all. Hmm, this can't happen now. The try_module_get() will fail if the module is going away and so qdisc_create won't create a new qdisc. > BTW, Rusty, a question... I do not understand, what is purpose of this > "new" module stuff at all? The idea is to use try_module_get() before using any interface registered by the module. Every module has a state associated to it: if the module has been loaded but it is still in its init section its state will be MODULE_STATE_COMING (and try_module_get() will fail); if the module is in process of being unloaded its state will be MODULE_STATE_LIVE (and try_module_get() will fail); otherwise the state will be MODULE_STATE_LIVE (and try_module_get() will success). If we use try_module_get() before using the module and module_put() when we are done everything will work: int create_foo(foo **new) { if (try_module_get(foo->owner) == 0) /* Module is being unloaded, it's not safe to use it */ return -ENOSYS; *new = foo->new_foo(); } void destroy_foo(foo *foo) { foo->finalize(); module_put(foo->owner); } This is what I've done for the packet scheduler: try_module_get() is called *before* cloning the qdisc and if it fails the qdisc won't be cloned. module_put() is called in qdisc_destroy(), after ->destroy(), when we are sure that nobody will ever use a reference to the qdisc. Luca -- Reply-To: kronos@kronoz.cjb.net Home: http://kronoz.cjb.net Non capisco tutta questa eccitazione per il Multitasking: io sono anni che leggo in bagno. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-14 17:49 ` Kronos @ 2003-01-15 0:21 ` Roman Zippel 2003-01-15 1:19 ` kuznet 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-01-15 0:21 UTC (permalink / raw) To: kronos; +Cc: kuznet, rusty, linux-kernel Hi, Kronos wrote: > The idea is to use try_module_get() before using any interface > registered by the module. Every module has a state associated to it: if > the module has been loaded but it is still in its init section its > state will be MODULE_STATE_COMING (and try_module_get() will fail); > if the module is in process of being unloaded its state will be > MODULE_STATE_LIVE (and try_module_get() will fail); otherwise the state > will be MODULE_STATE_LIVE (and try_module_get() will success). Rusty had to revert this to the old scheme, as e.g. scsi relied on it. This means the kernel will oops, if someone gets a reference to a module and its init function fails. Above scheme is not without problems either, a failing try_module_get() suddenly gets a double meaning. Now it also means that the module might be ready soon, please try again later, what increases the complexity of the whole module business. BTW what makes it currently even more complex is that there is also no synchronization by the module-init-tools done anymore, a try_module_get()/request_module()/try_module_get() sequence does give no definitive answer anymore, whether a module is available or not. The results will be spurious kmod failures. > If we use try_module_get() before using the module and module_put() when > we are done everything will work: > > int create_foo(foo **new) { > if (try_module_get(foo->owner) == 0) > /* Module is being unloaded, it's not safe to use it */ > return -ENOSYS; > > *new = foo->new_foo(); > } It's not that simple, try_module_get() does not everything. It's the responsibility of the caller to protect the owner pointer, so try_module_get() itself is only usable within some locking. This means your patch is still not safe, as the test is done too late. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 0:21 ` Roman Zippel @ 2003-01-15 1:19 ` kuznet 2003-01-15 7:31 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: kuznet @ 2003-01-15 1:19 UTC (permalink / raw) To: Roman Zippel; +Cc: kronos, rusty, linux-kernel Hello! > Above scheme is not without problems either, a failing try_module_get() > suddenly gets a double meaning. Now it also means that the module might > be ready soon, please try again later, ... and btw completely useless thing, because each module user already has its own means to do serialization of this kind even when used _NOT_ as module. Somewhat overdone. Alexey ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 1:19 ` kuznet @ 2003-01-15 7:31 ` Werner Almesberger 2003-01-15 8:16 ` Rusty Russell 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-01-15 7:31 UTC (permalink / raw) To: kuznet; +Cc: Roman Zippel, kronos, rusty, linux-kernel kuznet@ms2.inr.ac.ru wrote: > Somewhat overdone. I think it would be nice to introduce in 2.7 a shutdowncall (*) function class for modules that works like exitcall, but with the following differences: - does not return before the module has really de-registered itself everywhere, including synchronization with any callbacks, etc. - has a return code, and can fail if it would have to sleep for a possibly long time Before calling the shutdown function, all symbols exported by the module are hidden, and after the shutdown functions returns, the module can be unloaded. That way, the module reference count becomes merely advisory information, and the real locking, synchronization, etc. is inside the module. The module unload mechanism could then check whether the module is old (uses exitcall) or new (uses shutdowncall), and act accordingly. Modules and the services they use could then be gradually fixed. (*) Or maybe use a nicer name :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 7:31 ` Werner Almesberger @ 2003-01-15 8:16 ` Rusty Russell 2003-01-15 9:33 ` Werner Almesberger 2003-01-15 17:04 ` Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-01-15 8:16 UTC (permalink / raw) To: Werner Almesberger; +Cc: kuznet, Roman Zippel, kronos, linux-kernel In message <20030115043147.A1840@almesberger.net> you write: > kuznet@ms2.inr.ac.ru wrote: > > Somewhat overdone. > > I think it would be nice to introduce in 2.7 a shutdowncall > (*) function class for modules that works like exitcall, but > with the following differences: > > - does not return before the module has really de-registered > itself everywhere, including synchronization with any > callbacks, etc. > - has a return code, and can fail if it would have to sleep > for a possibly long time > > Before calling the shutdown function, all symbols exported by > the module are hidden, and after the shutdown functions returns, > the module can be unloaded. This already happens. This is why all accesses to the module are protected by try_module_get(). I've analyzed dozens of "here's my implementation idea" mails over the last two years. Here's the executive summary: 1) It's simply not a good idea to force 1600 modules to change, no matter what timescale. And changing it in a way that is *more*, not *less* complex is even worse. 2) It's bad enough to force the interfaces to change: at least the primitive they are to use is one many of them are already using, and is very simple to understand. Rusty. PS. The *implementation* flaw in your scheme: someone starts using a module as you try to deregister it. Either you re-register the module (ie. you can never unload security modules), or you leave it half unloaded (even worse). -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 8:16 ` Rusty Russell @ 2003-01-15 9:33 ` Werner Almesberger 2003-01-16 1:12 ` Rusty Russell 2003-01-15 17:04 ` Roman Zippel 1 sibling, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-01-15 9:33 UTC (permalink / raw) To: Rusty Russell; +Cc: kuznet, Roman Zippel, kronos, linux-kernel Rusty Russell wrote: > 1) It's simply not a good idea to force 1600 modules to change, no > matter what timescale. That's the part that I don't believe. The kernel interfaces aren't static. Locking rules have changed several times, the wait/sleep interface has changed, the concept of a module owner has been added, various other interfaces have changed. So a module will eventually have to be maintained. And the more important a module, the more timely this will be done. (Well, or so everybody wishes :-) Unmaintained modules will eventually just fail to work or even compile for other reasons, at which point they cease to be a problem as far as the loading/unloading mechanism is concerned. I agree that it's a bad idea to force changes. That's why there should be a reasonably long period where the current hack and a clean mechanism are available in parallel. > And changing it in a way that is *more*, > not *less* complex is even worse. The implementation may be more complex, but the change I'm suggesting would greatly simplify the rules: no endless set of voodoo rites, but one simple rule: "the shutdowncall function either does nothing, and returns failure, or it returns success, and completely de-registers everything it has previously registered". > PS. The *implementation* flaw in your scheme: someone starts using a > module as you try to deregister it. How could they ? The symbols are hidden, so that way is blocked. If another module has registered callbacks using symbols obtained from that module, that other module needs to be unloaded first anyway, so these references are gone too. If the static kernel accesses a module by resolving symbols (except for the well-controlled operations of the module loader itself), yes, then that module would become un-unloadable. I'm not sure this is much of an issue. If a callback comes in at the wrong moment, the module can choose to de-register and wait until the subsystem has finished any callbacks, or detect that it's about to shut itself down, and fail the callback. The point is that all the locking is now under control of the module, and not scattered all over kernel+module(s). > (ie. you can never unload security modules), Hmm, what makes security modules (what exactly do you mean by that ?) special ? In any case, a module that's truly unloadable would simply return failure from its shutdowncall. > or you leave it half unloaded (even worse). There's always the choice of just sleeping through any inconvenient callbacks. In some cases, this is perfectly acceptable, because the callback won't keep the module busy for a long time anyway (interrupts, timers, tasklets, etc.). In other cases (e.g. "open" functions), a callback could keep it busy forever. In that case, the module needs to maintain its own usage count and have some flag that indicates that it's shutting down. So the callback would look like this: static int foo_open(...) { atomic_inc(&busy); if (shutting_down) { atomic_dec(&busy); return -EGONEFISHING; } ... } static int foo_shutdown(...) { shutting_down = 1; if (atomic_read(&busy)) { shutting_down = 0; return -EBUSY; } deregister_whatever(&myself); /* deregister and synchronize */ ... } "busy" could of course just be the module use count. There is the race condition that the module could briefly disappear (i.e. "foo_open" fails), and then come back, because it turned out to be or appear to be busy. But I think, considering that we're actually trying to make it go away, this is acceptable behaviour. You have the race conditions "is busy" vs. "can unload" and "can use" vs. "has been unloaded" anyway. This can be greatly simplified if we have a try_deregister_whatever that just returns an error if it would have to sleep, i.e. if the synchronization is pushed into the service. Of course, if the "whatever" subsystem will make callbacks after returning from deregister_whatever, this doesn't work, and the module has to use the old mechanisms. But that's really a bug in the "whatever" subsystem. The problem I see with the current module interface is that it just tries to hack the current mess into a less buggy state, but doesn't provide much of an incentive for actually cleaning up the interfaces. Avoiding the bugs is good, but we should also work towards a clean long-term solution. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 9:33 ` Werner Almesberger @ 2003-01-16 1:12 ` Rusty Russell 2003-01-16 2:42 ` Werner Almesberger 2003-01-16 13:44 ` [RFC] Migrating net/sched to new module interface Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-01-16 1:12 UTC (permalink / raw) To: Werner Almesberger; +Cc: kuznet, Roman Zippel, kronos, linux-kernel In message <20030115063349.A1521@almesberger.net> you write: > Rusty Russell wrote: > > 1) It's simply not a good idea to force 1600 modules to change, no > > matter what timescale. > > That's the part that I don't believe. The kernel interfaces > aren't static. Locking rules have changed several times, the > wait/sleep interface has changed, the concept of a module > owner has been added, various other interfaces have changed. Deprecating every module, and rewriting their initialization routines is ambitious beyond the scale of anything you have mentioned. Not that 90% of the kernel code couldn't use a damn good spring cleaning, but I'm not prepared to make such a change personally. And remember why we're doing it: for a fairly obscure race condition. > > And changing it in a way that is *more*, > > not *less* complex is even worse. > > The implementation may be more complex, but the change I'm > suggesting would greatly simplify the rules: no endless set > of voodoo rites, but one simple rule: "the shutdowncall > function either does nothing, and returns failure, or it > returns success, and completely de-registers everything it > has previously registered". So we go from: int init(void) { if (!register_foo(&foo)) return -err; if (!register_bar(&bar)) { unregister_foo(&foo); return -err; } return 0; } void fini(void) { unregister_foo(&foo); unregister_bar(&bar); } to: int fini(void) { if (!unregister_foo(&foo)) return -err; if (!unregister_bar(&bar)) { if (!register_foo(&foo)) ???? return -err; } return 0; } > > PS. The *implementation* flaw in your scheme: someone starts using a > > module as you try to deregister it. > > If a callback comes in at the wrong moment, the module can > choose to de-register and wait until the subsystem has > finished any callbacks, or detect that it's about to > shut itself down, and fail the callback. The point is that > all the locking is now under control of the module, and > not scattered all over kernel+module(s). Something like this? static int i_am_live; static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; /* This is our registered function. */ static int foo_function(void *somedata) { int live; spin_lock(&my_lock); live = i_am_live; spin_unlock(&my_lock); if (!live) return -EIGNOREME???; ... } int fini(void) { spin_lock(&my_lock); i_am_live = 0; spin_unlock(&my_lock); if (!unregister_foo(&foo)) { spin_lock(&my_lock); i_am_live = 1; spin_unlock(&my_lock); return -err; } if (!unregister_bar(&bar)) { if (!register_foo(&foo)) ???? spin_lock(&my_lock); i_am_live = 1; spin_unlock(&my_lock); return -err; } return 0; } Now, if someone tries to remove a module, but it's busy, you get a window of spurious failure, even though the module isn't actually removed. Secondly, there is often no way of returning a value which says "I'm going away, act as if I'm not here": only the level above can sanely know what it would do if this were not found. > > (ie. you can never unload security modules), > > Hmm, what makes security modules (what exactly do you mean > by that ?) special ? In any case, a module that's truly > unloadable would simply return failure from its > shutdowncall. On a busy system, they're never not being used. Your unload routine would always fail. Same with netfilter modules. > > or you leave it half unloaded (even worse). > > There's always the choice of just sleeping through any > inconvenient callbacks. In some cases, this is perfectly > acceptable, because the callback won't keep the module > busy for a long time anyway (interrupts, timers, tasklets, > etc.). In other cases (e.g. "open" functions), a callback > could keep it busy forever. Exactly. It's kept there forever, while the module is useless. > The problem I see with the current module interface is that it > just tries to hack the current mess into a less buggy state, > but doesn't provide much of an incentive for actually cleaning > up the interfaces. > > Avoiding the bugs is good, but we should also work towards a > clean long-term solution. The current scheme is clean: it's two-stage delete with a nice helper function "try_module_get()" which tells you when it's going away, and no requirement that modules actually implement two-stage delete themselves. The patch to mirror this in two-stage init was posted yesterday, as well. It also puts the (minimal) burden in the right place: in the interface coder's lap, not the interface user's lap. Unfortunately, I don't have the patience to explain this once for every kernel developer. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 1:12 ` Rusty Russell @ 2003-01-16 2:42 ` Werner Almesberger 2003-01-16 3:31 ` Rusty Russell ` (2 more replies) 2003-01-16 13:44 ` [RFC] Migrating net/sched to new module interface Roman Zippel 1 sibling, 3 replies; 73+ messages in thread From: Werner Almesberger @ 2003-01-16 2:42 UTC (permalink / raw) To: Rusty Russell; +Cc: kuznet, Roman Zippel, kronos, linux-kernel Rusty Russell wrote: > Deprecating every module, and rewriting their initialization routines > is ambitious beyond the scale of anything you have mentioned. Well, it has happened before, e.g. sleep_on is now deprecated, cli() doesn't give the comprehensive protection it used to, holding spinlocks while sleeping used to be frowned upon, but it's only recently that it moved to being forbidden, etc. And the people making the new rules didn't go and fix every module in existence. You may get this kind of responsibility only if you actually break the old interface, but that's not what I'm suggesting to do. > And remember why we're doing it: for a fairly obscure race condition. No, I want to do this to fix the reason for the fix for the obscure race condition :-) Also, all this smells of a fundamental design problem: modules aren't the only things that can become unavailable. So why construct a special mechanism that only applies to modules ? > So we go from: Yes. Note that the problem case only occurs if you have more than one registration that could block you for a long time. Otherwise, you just take that one first, and sleep on all the others. Otherwise, things get messy, I admit. But that's not a new problem either, e.g. > int init(void) > { > if (!register_foo(&foo)) > return -err; > if (!register_bar(&bar)) { > unregister_foo(&foo); > return -err; > } > return 0; > } What if unregister_foo fails, because somebody already started to use the foo interface ? If "block until it works" is a valid answer, you could probably use this also for the unload case. This would actually work even with the current interface: just make the module exit function sleep until all references are gone. This means than rmmod might take a looong time, but is this really a problem ? If a module wishes to accelerate its demise, it can always add errors exits. If the interface doesn't provide error exits, this is again a general problem, and actually one of the interface. If there's a really nasty case, where you absolutely can't afford to sleep, you need to change the service to split "deregister" into: - prepare_deregister (like "deregister", but reversible) - commit_deregister - undo_deregister But there probably aren't many cases where you'd really need this. > Something like this? Yes, for instance. Or use atomic operations. > Now, if someone tries to remove a module, but it's busy, you get a > window of spurious failure, even though the module isn't actually > removed. Correct. But does this matter ? After all, we were prepared to get failures if the removal succeeds anyway. So this is not a new race condition. > Secondly, there is often no way of returning a value which > says "I'm going away, act as if I'm not here": only the level above > can sanely know what it would do if this were not found. Okay, my approach usually puts the responsibility of finally giving up in user space. If user space just ignores the error, and keeps the thing open anyway, you're in for a long wait. But that's not different from opening some device and never releasing it. In either case, your module becomes un-unloadable, unless you kill the user-space hog. Again, not a new problem. > On a busy system, they're never not being used. Your unload routine > would always fail. Same with netfilter modules. But they're only active for a short time, so the deregistration could just sleep. > It also puts the (minimal) burden in the right place: in the interface > coder's lap, not the interface user's lap. Well, both need to cooperate. And I don't see why the interface coder should have to know whether its users are modules or not, while the interface user, who is implicitly very aware if he's a module or not, shouldn't. Also note that this isn't just a module problem. Instead of tripping over code that all of a sudden isn't there, one may well trip over a data structure that has just been removed. In either case, you need to get the synchronization right. What I'm proposing is simply to make those two cases more similar, such that handling one case correctly also handles the other one. Example: struct foo *my_data; int foo_callback() { do_something(my_data->blah); ... } void foo_exit() { deregister(...); kfree(my_data); ... } So if "deregister" allows callbacks after it returns, you'll still end up getting your oops every once in a while. Now make this a non-module, and if foo_exit or some equivalent can be called for some other reason (e.g. power-down, hotplugging, etc.), you a) still have the same problem, and b) all the miracle protection of try_module_get has vanished. So you have to do the synchronization twice, i.e. on both sides of the interface. > Unfortunately, I don't have the patience to explain this once for > every kernel developer. Sorry for being so persistent. But I really think the module situation is rapidly approaching the point where just fixing the next bug isn't good enough, but where we need to get back to the drawing board, look at the larger picture, and then work towards a cleaner solution. Also, although the task may seem daunting, don't forget that even the BLK wasn't removed in a day :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 2:42 ` Werner Almesberger @ 2003-01-16 3:31 ` Rusty Russell 2003-01-16 4:20 ` Werner Almesberger 2003-01-16 4:25 ` David S. Miller 2003-01-16 18:15 ` Roman Zippel 2003-01-17 2:27 ` Rusty Russell 2 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-01-16 3:31 UTC (permalink / raw) To: Werner Almesberger; +Cc: kuznet, Roman Zippel, kronos, linux-kernel In message <20030115234258.E1521@almesberger.net> you write: > > And remember why we're doing it: for a fairly obscure race condition. > > No, I want to do this to fix the reason for the fix for the > obscure race condition :-) Semantics. > Also, all this smells of a fundamental design problem: modules > aren't the only things that can become unavailable. So why > construct a special mechanism that only applies to modules ? NO NO NO. Listen *carefully*. The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or fail to LOAD). So you're suggesting we should lock ALL functions the way we lock all other datastructures. I look forward to your compiler patch. I've explained this multiple times. If you're not convinced, fine. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 3:31 ` Rusty Russell @ 2003-01-16 4:20 ` Werner Almesberger 2003-01-16 4:25 ` David S. Miller 1 sibling, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-01-16 4:20 UTC (permalink / raw) To: Rusty Russell; +Cc: kuznet, Roman Zippel, kronos, linux-kernel Rusty Russell wrote: > The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or > fail to LOAD). Yes, of course. That's not what I mean. (See below.) > So you're suggesting we should lock ALL functions the way we lock all > other datastructures. I look forward to your compiler patch. No, it's not locking for concurrent access, but locking for access vs. removal. If a function pointer gets disseminated without any way for the caller to revoke it, then try_module_get is indeed the only possible solution. But I think this is the rare exception. The usual way for disseminating function pointers seems to be through registration at some service interface, or through a callback. Do you agree with this ? Service interfaces normally also have a deregistration function, and callbacks are either synchronous, so you have to wait for them to complete, but you're busy anyway, or asynchronous, usually with a way to cancel them. Now, if the service insists on calling back even after deregistration or cancellation, with no means for making sure that at some point no further callback will happen, I'd consider this a bug of the service - a bug that should be fixed. (If there's a way to deregister with and without synchronization, that's fine, of course.) Do you agree with this ? If you agree so far, then I hope you'll also agree that a module can always be safely unloaded, without try_module_get, if its cleanup function just deregisters/cancels and synchronizes all the service registrations/callbacks the module made. Plain and simple. Of course, this means that module can spend an unbounded amount of time in its cleanup function. Do you agree so far ? Now, if we accept that there may be the requirement that we can't always wait for deregistration/cancellation to complete, this leads to the more complex scenario we've discussed earlier in this thread. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 3:31 ` Rusty Russell 2003-01-16 4:20 ` Werner Almesberger @ 2003-01-16 4:25 ` David S. Miller 2003-01-16 4:49 ` Werner Almesberger 2003-01-16 16:05 ` Roman Zippel 1 sibling, 2 replies; 73+ messages in thread From: David S. Miller @ 2003-01-16 4:25 UTC (permalink / raw) To: Rusty Russell Cc: Werner Almesberger, Alexey N. Kuznetsov, Roman Zippel, kronos, linux-kernel On Wed, 2003-01-15 at 19:31, Rusty Russell wrote: > The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or > fail to LOAD). I totally agree with Rusty. If you don't understand this fundamental difference between module unloading vs. arbitrary kernel objects going away, then you really need to apply some gray matter to it before you continue in this conversation :) ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 4:25 ` David S. Miller @ 2003-01-16 4:49 ` Werner Almesberger 2003-01-16 16:05 ` Roman Zippel 1 sibling, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-01-16 4:49 UTC (permalink / raw) To: David S. Miller Cc: Rusty Russell, Alexey N. Kuznetsov, Roman Zippel, kronos, linux-kernel David S. Miller wrote: > I totally agree with Rusty. If you don't understand this fundamental > difference between module unloading vs. arbitrary kernel objects > going away, I think I understand that part. What I'm saying is that any interface that will still call you after deregistration will also cause problems with normal data accesses, even if no modules are involved. So, if interfaces with this kind of bug are fixed, all of a sudden the second (1) synchronization mechanism for module unloads - returning from the cleanup function (2) - becomes a viable alternative to try_module_get. (1) The first mechanism being the use count. (2) Or, in the case of initialization failure, returning from the init function. And that's what I think we should build upon. Rusty doesn't see things this way, and I'd like to find out where exactly we disagree. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 4:25 ` David S. Miller 2003-01-16 4:49 ` Werner Almesberger @ 2003-01-16 16:05 ` Roman Zippel 1 sibling, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-01-16 16:05 UTC (permalink / raw) To: David S. Miller Cc: Rusty Russell, Werner Almesberger, Alexey N. Kuznetsov, kronos, linux-kernel Hi, "David S. Miller" wrote: > On Wed, 2003-01-15 at 19:31, Rusty Russell wrote: > > The ONLY time that FUNCTIONS vanish is when MODULES get UNLOADED (or > > fail to LOAD). > > I totally agree with Rusty. If you don't understand this fundamental > difference between module unloading vs. arbitrary kernel objects > going away, then you really need to apply some gray matter to it > before you continue in this conversation :) Above is not really not wrong, but there is still no fundamental difference to other kernel objects. We will get this wrong as long as we see functions as some kind of very special case, this is simply not true. Module functions are nothing else than readonly data structures, which are allocated and initialized in module.c, by calling init_module() the ownership is transferred to the module. How the ownership is transferred back is now (hopefully) the point of the discussion. The only thing special here is that the data possibly needs to be flushed out of the data cache, before it can be used as a function. Besides of this there is not a single technical difference to other reference counted data structures, just the implementations differ. Functions are not the only thing that vanish when modules get unloaded. It's a lot more common to register functions via data structures and it would be very foolish to think that only the functions need protecting. In the first place one has to get access to the data structure, only then the functions can be safely used. Again, how these data structures are allocated doesn't matter, but at any time we have to know, who owns these data structures, so that we can safely remove and deallocate them. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 2:42 ` Werner Almesberger 2003-01-16 3:31 ` Rusty Russell @ 2003-01-16 18:15 ` Roman Zippel 2003-01-16 18:58 ` Werner Almesberger 2003-01-17 2:27 ` Rusty Russell 2 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-01-16 18:15 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, Werner Almesberger wrote: > If there's a really nasty case, where you absolutely can't > afford to sleep, you need to change the service to split > "deregister" into: > > - prepare_deregister (like "deregister", but reversible) > - commit_deregister > - undo_deregister You can simplify this. All you need are the following simple functions: - void register(); - void unregister(); - int is_registered(); - void inc_usecount(); - void dec_usecount(); - int get_usecount(); It's important to understand that the registered state and the usecount are completely independent. As soon as the object is unregistered and the usecount is zero, the object can be freed, but it doesn't matter in which order it happens. The problem is now that we are very limited how we can use these functions. We can only unregister an object after the usecount became zero, although it's also possible to first unregister the object and then wait for the usecount. Only when we can do the latter is it possible to safely force the removal of the object. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 18:15 ` Roman Zippel @ 2003-01-16 18:58 ` Werner Almesberger 2003-01-16 23:53 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-01-16 18:58 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Roman Zippel wrote: > > - prepare_deregister (like "deregister", but reversible) > > - commit_deregister > > - undo_deregister > > You can simplify this. All you need are the following simple functions: > > - void register(); > - void unregister(); > - int is_registered(); > - void inc_usecount(); > - void dec_usecount(); > - int get_usecount(); I'm not sure if you, you're not changing the semantics. What I was describing was a non-blocking interface, e.g. if (!prepare_deregister(foo)) return -E...; if (!prepare_deregister(bar)) { undo_deregister(foo); return -E...; } commit_deregister(foo); commit_deregister(bar); return 0; With your interface, you're not guaranteed that you can re-register. Well, you could externalize this, e.g. with int error = 0; inc_usecount(foo); inc_usecount(bar); unregister(foo); /* does nothing irrevokable since use count > 0 */ unregister(bar); if (get_usecount(foo) > 1 || get_usecount(bar) > 1) { register(foo); /* re-registers foo */ register(bar); /* re-registers bar */ error = E...; } dec_usecount(foo); dec_usecount(bar); return error; Is that what you had in mind ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 18:58 ` Werner Almesberger @ 2003-01-16 23:53 ` Roman Zippel 2003-01-17 1:04 ` Greg KH 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-01-16 23:53 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, Werner Almesberger wrote: > > You can simplify this. All you need are the following simple functions: > > > > - void register(); > > - void unregister(); > > - int is_registered(); > > - void inc_usecount(); > > - void dec_usecount(); > > - int get_usecount(); > > I'm not sure if you, you're not changing the semantics. See above functions as primitives, which one can use to build any other resource management you want. The module code does the get_usecount() test for every driver and if it was zero it disables inc_usecount() and only then the driver is allowed to unregister(). The module code has to add another is_active state for this and is so actually adding more complexity to it. > What I was > describing was a non-blocking interface, e.g. > > if (!prepare_deregister(foo)) > return -E...; > if (!prepare_deregister(bar)) { > undo_deregister(foo); > return -E...; > } > commit_deregister(foo); > commit_deregister(bar); > return 0; You are making it too complex, as you probably need an is_active state as well, it would be just per interface instead of global. If you really want to reduce complexity, all you can do, is to get rid of the is_active state. For the majority of drivers such state isn't needed at all, but it's currently forced on all drivers. So to keep things simple, we should just finish shutting down the driver, as soon as we started with it and don't restart or undo anything. To avoid the sleeping we can also simply return -EBUSY and continue later, so the cleanup could look like this: if (!ptr) return 0; if (is_registered()) unregister(); if (get_usecount()) return -EBUSY; release(); return 0; This way the caller just needs to do the following, which can be called as often as necessary: if (shutdown(ptr)) return -EBUSY; ptr = NULL; There is really no need to introduce extra states to make things more complex. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 23:53 ` Roman Zippel @ 2003-01-17 1:04 ` Greg KH 0 siblings, 0 replies; 73+ messages in thread From: Greg KH @ 2003-01-17 1:04 UTC (permalink / raw) To: Roman Zippel; +Cc: linux-kernel On Fri, Jan 17, 2003 at 12:53:12AM +0100, Roman Zippel wrote: > Hi, > > Werner Almesberger wrote: > > > > You can simplify this. All you need are the following simple functions: > > > > > > - void register(); > > > - void unregister(); > > > - int is_registered(); > > > - void inc_usecount(); > > > - void dec_usecount(); > > > - int get_usecount(); > > > > I'm not sure if you, you're not changing the semantics. > > See above functions as primitives, which one can use to build any other > resource management you want. That same functionality is already present in a kobj for you to use, instead of rolling your own resource management code. :) (sorry, I couldn't help myself...) greg k-h ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 2:42 ` Werner Almesberger 2003-01-16 3:31 ` Rusty Russell 2003-01-16 18:15 ` Roman Zippel @ 2003-01-17 2:27 ` Rusty Russell 2003-01-17 21:40 ` Roman Zippel 2003-02-13 23:16 ` Werner Almesberger 2 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-01-17 2:27 UTC (permalink / raw) To: Werner Almesberger; +Cc: kuznet, Roman Zippel, kronos, linux-kernel In message <20030115234258.E1521@almesberger.net> you write: > Rusty Russell wrote: > > Deprecating every module, and rewriting their initialization routines > > is ambitious beyond the scale of anything you have mentioned. > > Well, it has happened before, e.g. sleep_on is now deprecated, > cli() doesn't give the comprehensive protection it used to, They gave us SMP. What do we gain for your change? > holding spinlocks while sleeping used to be frowned upon, but > it's only recently that it moved to being forbidden, etc. No, it's always been forbidden, it's only recently we detect it. But apologies for the tone of my previous mail: it seems I'm oversensitive to criticism of the module stuff now 8( To go someway towards an explanation, at least, I humbly submit a fairly complete description of the approach used (assuming the module init race fix patch gets merged). Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ================ Rusty's Unreliable Guide To The 2.5 Module Locking Implementation (draft) aka. All Locking and No Play Make Rusty Go Crazy Any object in the kernel which might disappear, needs to have some form of locking; we're very familiar with locking data structures in the kernel. Modules are no exception, except that in this case it's the functions themselves need locking. The main difference is that we are not prepared to pay the price for locking every function we call: the vast majority of speed-critical functions are not in modules at all, and for many others, the call is internal to the same module. So any locking solution must be lightweight, and (or course) vanish if CONFIG_MODULES is not set. Now, we don't need to lock unless we are going across a module boundary (presumbably if you are already in the module, the locking is sorted out already), and it's easy to ensure that you don't need to lock for a normal function call, since the module loader can make the caller statically depend on the callee: if module B calls "foo", which is in module A, the loader knows that B refers to "foo" when it loads B, so simply enforces that B uses A, and A cannot be unloaded before B. The cases which remain, however, are function calls through pointers: such strategy functions are used widely, and naturally need some form of protection against going away. Since the advent of SMP, and preemption, it is extremely difficult for such functions to protect themselves. Two Stage Delete ================ So let's apply standard locking techniques. A standard approach to this for data (eg networking packets) involves two-stage delete: we keep a reference count in the object, which is usually 1 + number of users, and to delete it we mark it deleted so noone new can use it, and drop the reference count. The last user, who drops the reference count to zero, actually does the free. This is called two-stage delete: Alexey Kuznetsov's reason why IPv4 was not a module was the lack of two-stage delete support for modules. The usual and logical place for the deleted flag and reference count is in the structure being protected: in this case, that's the module itself. You can, instead, have a flag and/or reference count in every object used by the module, but it's not quite the same thing, and deactivating them all atomically is impossible, which introduces its own set of problems. The main problem advantage of a single flag which says whether the module is active or not, is that every module would need to add a function which deactivated it. There are 1600 modules in the kernel, and they work fine when built into the kernel: such a change merely for the module case seems overly disruptive. Also, there is the other case to consider: modules can disappear because they failed to initialize. The same "deleted" flag can be used to isolate modules during initialization: otherwise each module would need to implement two-stage initialization as well. Finally, the try_module_get() primitive (which combines the deleted flag test with the reference count increment in one convenient form) was already fairly widely used by the filesystem code and others, where it was called "try_mod_inc_use_count()". Most people seemed to have little problem using the primitive correctly. Corner Cases And Optimizations ============================== Some interfaces need to do more than simply flip a flag on activation: they might want to fire off a hotplug event, for example, or scan partition tables. So a notifier chain is supplied for them to do exactly this (in the "Proposed Module Init Race Fix" patch). Modules are free to "roll their own" two-stage init using module_make_live() if they want, too. One of the critical things when designing this, was that getting references to modules had to be fast, and it didn't matter if removing modules was relatively slow. The logical implementation of try_module_get() looks like: static int try_module_get(struct module *mod) { int ret; unsigned long flags; read_lock_irqsave(&module_lock, flags); if (mod->deleted) ret = 0; else { ret = 1; atomic_inc(&mod->use); } read_unlock_irqrestore(&module_lock, flags); return ret; } static int module_put(struct module *mod) { if (atomic_dec_and_test(&mod->use)) if (mod->deleted) wake_up(mod->whoever_is_waiting); } And module unloading would look like: ... write_lock_irq(&module_lock); if (atomic_read(&mod->use) != 0) ret = -EBUSY; else { mod->deleted = 1; ret = 0; } write_unlock_irq(&module_lock); But we can do better than this. The try_module_get contains interrupt saving code (even though it's often unneccessary), a spin lock and an atomic operation. Worse, on SMP the spinlock and atomic variable will have to bounce from one CPU to another. The answer is to use an atomic counter per CPU, and a "bogolock", which looks (conceptually) like this: void bogo_read_lock(void) { preempt_disable(); } void bogo_read_unlock(void) { preempt_enable(); } void bogo_write_lock(void) { run thread on every other CPU tell them all to stop interrupts stop interrupts locally } void bogo_write_unlock(void) { tell threads to unblock interrupts and exit restore interrupts locally } So the real try_module_get and module_put look like: static inline int try_module_get(struct module *module) { int ret = 1; if (module) { unsigned int cpu = get_cpu(); /* Disables preemption */ if (likely(module_is_live(module))) local_inc(&module->ref[cpu].count); else ret = 0; put_cpu(); } return ret; } static inline void module_put(struct module *module) { if (module) { unsigned int cpu = get_cpu(); local_dec(&module->ref[cpu].count); /* Maybe they're waiting for us to drop reference? */ if (unlikely(!module_is_live(module))) wake_up_process(module->waiter); put_cpu(); } } And the remove code looks like: ret = stop_refcounts(); /* bogo_write_lock */ if (ret != 0) goto out; /* If it's not unused, quit unless we are told to block. */ if ((flags & O_NONBLOCK) && module_refcount(mod) != 0) { forced = try_force(flags); if (!forced) ret = -EWOULDBLOCK; } else { mod->waiter = current; mod->state = MODULE_STATE_GOING; } restart_refcounts(); /* bogo)write_unlock */ Cheers, Rusty. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-17 2:27 ` Rusty Russell @ 2003-01-17 21:40 ` Roman Zippel 2003-02-13 23:16 ` Werner Almesberger 1 sibling, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-01-17 21:40 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, kronos, linux-kernel Hi, Rusty Russell wrote: > Two Stage Delete > ================ > > So let's apply standard locking techniques. A standard approach to > this for data (eg networking packets) involves two-stage delete: we > keep a reference count in the object, which is usually 1 + number of > users, and to delete it we mark it deleted so noone new can use it, > and drop the reference count. The last user, who drops the reference > count to zero, actually does the free. Rusty, could you please show me where you found the deleted flag in network packets? Where did you find that it's required to test the deleted flag before you can get a reference to the object? Actually it's far more common to just delete the object when all references are gone without any need for a deleted flag. Why does everyone else get away with a "single stage delete"? Where is the deleted flag (or active flag) for modules coming from? (BTW I would be really happy to get any response to this, I already explained this all before, so I'd really like to know whether someone understands what I'm talking about or what is so difficult to understand.) As already mentioned every object can only be deleted, when it's nowhere registered anymore and all users are gone, the same is true for modules: if (!is_registered(obj) && !get_usecount(obj)) delete(obj); The is_registered() check can be omitted, if we unregister the object ourselves, but this requires some locking to prevent new users until the object is unregistered: lock(); if (get_usecount(obj)) { unlock(); return -EBUSY; } unregister(obj); unlock(); delete(obj); New users have to do this: lock(); obj = lookup(); if (obj) inc_usecount(obj); unlock(); This is the standard mechanism we can find all over the kernel. The problem with modules now is that the usecount check and the unregister happen at two different places and the lock is not held until the object is unregistered, so we have to add a new flag: mod_lock(); if (get_usecount(obj)) { mod_unlock(); return -EBUSY; } deactivate(obj); mod_unlock(); ... obj_lock(); unregister(obj); obj_unlock(); delete(obj); New users have to do this now: obj_lock(); obj = lookup(); if (obj) { mod_lock(); if (is_active(obj)) inc_usecount(obj); mod_unlock(); } obj_unlock(); Rusty now worked very hard to make the read path as cheap as possible, but the general complexity is still the same as always. The only way to make this even cheaper is to reduce the complexity and this is only possible by getting rid of the is_active() check. The only consequence would be that we had no global activate switch anymore, but is it really needed? (Insert compelling reason here, why such a switch is absolutely required, as I don't know any.) To understand the possible alternatives I have to rename Rustys "Two Stage Delete" into "Three stage delete": 1. deactivate 2. unregister 3. delete (In case anyone was wondering where the single reference in Rustys example is coming from - it's the registered state from the second stage). As said above the delete stage can only be entered if the object is not registered and not used anymore, this is the only absolute requirement, but currently we can't even get past the first stage as long as there are users (otherwise we risk a deadlock). When Rusty is now talking about exposing a two stage api to the modules, he actually means the first stage in order to leave it to the module how to deactivate the module. It makes indeed little sense to expose this stage, because this is the stage, which is causing the extra complexity and which we should rather get rid of. On the other hand it would make quite a lot more sense to expose the second stage to the modules. This stage is independent of the usecount, this means we can easily and safely prevent new users from using the module. The standard mechanism above to remove an object can easily be changed into: unregister(force): lock(); if (get_usecount(obj) && !force) { unlock(); return -EBUSY; } unregister(obj); unlock(); delete: if (get_usecount(obj)) return -EBUSY; delete(obj); Doing these stages with one or two functions doesn't really matter, the work has to be done anyway and causes no real extra complexity. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-17 2:27 ` Rusty Russell 2003-01-17 21:40 ` Roman Zippel @ 2003-02-13 23:16 ` Werner Almesberger 2003-02-14 1:57 ` Rusty Russell 1 sibling, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-13 23:16 UTC (permalink / raw) To: Rusty Russell; +Cc: kuznet, Roman Zippel, kronos, linux-kernel m.au on Fri, Jan 17, 2003 at 01:27:56PM +1100 Sorry for the long pause ... Rusty Russell wrote: > They gave us SMP. What do we gain for your change? Mainly a simpler interface - one that doesn't treat module unloading as such a special case, but just as yet another fairly regular synchronization problem. This should also have a performance impact: the current approach puts "code locking" outside of the module, and "data locking" inside of it. Unifying this eliminates one layer of locking mechanisms. Independent of this, we should fix the interfaces that give us unstoppable callbacks. These are just disasters waiting to happen, modules or no. Of course, since this may imply interface changes (not necessarily in terms of changing an existing interface, but perhaps in terms of adding a properly synchronized version, and discovering bugs in modules using the not-synchronized one), it would be good if the module cleanup simplification could be done in parallel. I think, once we know exactly what semantics to aim for, the change could be relatively straightforward. (And, I wholeheartedly agree, there must be no "flag day".) > But apologies for the tone of my previous mail: it seems I'm > oversensitive to criticism of the module stuff now 8( No problem. I actually admire your thick skin, given all the unjustified and nasty stuff you get thrown at you :-) > To go someway towards an explanation, at least, I humbly submit a > fairly complete description of the approach used (assuming the module > init race fix patch gets merged). Thanks ! That part looks fine. But, of course, it's not how you do it that I don't like, but what you're doing :-) Anyway, my plan is to first get my simulation infrastructure working, and then make a few test cases that show callbacks after deregistration causing trouble. After that, hopefully other people can pick up the cleanup work. Do you see any obvious technical problems with the approach of using return from module initialization/cleanup as "ready to unload" indicator ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-13 23:16 ` Werner Almesberger @ 2003-02-14 1:57 ` Rusty Russell 2003-02-14 3:44 ` Werner Almesberger 2003-02-14 11:16 ` Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-02-14 1:57 UTC (permalink / raw) To: Werner Almesberger; +Cc: kuznet, Roman Zippel, davem, kronos, linux-kernel In message <20030213201619.A2092@almesberger.net> you write: > m.au on Fri, Jan 17, 2003 at 01:27:56PM +1100 > > Sorry for the long pause ... > > Rusty Russell wrote: > > They gave us SMP. What do we gain for your change? > > Mainly a simpler interface - one that doesn't treat module unloading > as such a special case, but just as yet another fairly regular > synchronization problem. > > This should also have a performance impact: the current approach > puts "code locking" outside of the module, and "data locking" inside > of it. Unifying this eliminates one layer of locking mechanisms. Um, no. You're special case "optimizing" it. When you have an object which may vanish, the linux kernel idiom runs something like this: write_lock() find available object in list inc object refcount write_unlock() The writelock protects the infrastructure, the refcount protects the object. Deleting is done in two stages (mark deleted and drop refcount, then whoever drops the refcount to 0 actually does the free). Usually "mark deleted" means simply remove from the list, but not always. The current module code uses exactly the same idiom for the code itself: we use a heavily read-optimized lock, but that's an implementation detail. Now, could you instead lock all the (arbitrary, widespread, unrelated) accesses to the object, instead of the object itself? Sure. It'd be unique in the kernel, and involve changing the way every interface works, and probably expose some of the complexity to every module author (every workable implementation I've seen has this problem), but you could do it. See below for why I don't think it's worth it. > Independent of this, we should fix the interfaces that give us > unstoppable callbacks. These are just disasters waiting to happen, > modules or no. I assume you're referring to the many places where we assume that the structure being added was not dynamically allocated, so don't bother to protect against its deletion? That is, of course, orthogonal. And in general, I agree: not including a refcount is asking for trouble. But these reference counts are *not* free. The module reference counts, by comparison, can be made extremely cheap (see implementation), because we can allocate a cacheline per cpu, and we can bias "read" speed in preference of awful "write" speed. In summary: the most obvious implementation is to lock to module as an object separately from any objects it might create. The current implementation is extremely fast, requires neither module changes nor (many) interface changes, and in effect canonicalizes a single existing method of locking, which coders seem quite comfortable with. Given these reasons, you can see why I no longer discuss new implementation ideas with people 8( Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 1:57 ` Rusty Russell @ 2003-02-14 3:44 ` Werner Almesberger 2003-02-14 11:16 ` Roman Zippel 1 sibling, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-14 3:44 UTC (permalink / raw) To: Rusty Russell; +Cc: kuznet, Roman Zippel, davem, kronos, linux-kernel Rusty Russell wrote: > Um, no. You're special case "optimizing" it. > > When you have an object which may vanish, the linux kernel idiom runs > something like this: Yes, that's when you view it as a locking problem, as for data objects. What I'm saying is that, if you manage the data structures properly, plus fix a few interfaces that currently don't have to manage data structures properly, you're already perfectly synchronized, so no further locking is needed. > I assume you're referring to the many places where we assume that the > structure being added was not dynamically allocated, so don't bother > to protect against its deletion? Yes. > And in general, I agree: not including a refcount is asking for > trouble. But these reference counts are *not* free. Alas, no. But we how long can we afford not to fix them, at least the "public" interfaces ? Even if they're unsafe, people will use them, particularly if they're given no other choice. > object separately from any objects it might create. The current > implementation is extremely fast, requires neither module changes nor > (many) interface changes, and in effect canonicalizes a single > existing method of locking, which coders seem quite comfortable with. It's more the changes behind the interfaces I'm worried about. It may not be bad today, but every function pointer is a potential problem. Anyway, without fixing a good number of the "ghost from the past" interfaces first, my point is moot. So I won't trouble you again with module locking before there is some progress in this area :-) > Given these reasons, you can see why I no longer discuss new > implementation ideas with people 8( Nah, don't give up ! :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 1:57 ` Rusty Russell 2003-02-14 3:44 ` Werner Almesberger @ 2003-02-14 11:16 ` Roman Zippel 2003-02-14 12:04 ` Rusty Russell 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-14 11:16 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Rusty Russell wrote: > When you have an object which may vanish, the linux kernel idiom runs > something like this: > > write_lock() > find available object in list > inc object refcount > write_unlock() > > The writelock protects the infrastructure, the refcount protects the > object. Deleting is done in two stages (mark deleted and drop > refcount, then whoever drops the refcount to 0 actually does the > free). Usually "mark deleted" means simply remove from the list, but > not always. > > The current module code uses exactly the same idiom for the code > itself: we use a heavily read-optimized lock, but that's an > implementation detail. It's not the same, please see: http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 I explained why the current module locking is more complex and why it's actually a three stage delete. Let's take an example: procfs entries. That's a more interesting example, because these can be dynamically created, but they also include pointers to the module. To keep it simple we create/remove an entry from a nonmodular kernel (so module functions are dummies): foo_entry = create_proc_entry(); foo_entry->data = kmalloc(); foo_entry->read_proc = foo_read; foo_entry->write_proc = foo_write; Problem 1: If the user opens that new proc entry, he must be prepared that the new entry is not fully initialized yet. Problem 2: There are no memory barriers, that means it's undefined in which order the data, read_proc, write_proc members arrive at another cpu, so it's possible that foo_read/foo_write can be called with a NULL data pointer. Now let's remove it again: data = foo_entry->data; remove_proc_entry(); kfree(data); Problem: Should the entry still be busy, procfs just prints a warning, delays cleanup and returns immediately, so the data can be accessed after it was freed. Now we do the same from a module but not from module_init/module_exit, that means we dynamically want to create/remove entries during the module life time. When we create the entry we also initialize the owner member, but this doesn't help with any of the above problems. The only thing module locking changes is that we know that it's safe to free the data at the time module_exit is called, but this might happen in the distant future or even never. Rusty, above are real problems, the module locking fixes these problems during module_init/module_exit, but how can these problems fixed in the other cases and how does the module locking help? bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 11:16 ` Roman Zippel @ 2003-02-14 12:04 ` Rusty Russell 2003-02-14 12:49 ` Roman Zippel 2003-02-14 13:21 ` Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Rusty Russell @ 2003-02-14 12:04 UTC (permalink / raw) To: Roman Zippel; +Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel In message <Pine.LNX.4.44.0302141035270.1336-100000@serv> you write: > It's not the same, please see: > http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 > I explained why the current module locking is more complex and why it's > actually a three stage delete. No, here is where you show *your* ignorance of kernel locking idioms, and that your axiom is that "the new system is more complex". I suggest you read the kernel locking guide: it's in the kernel sources in Documentation/DocBook/kernel-locking.*, try "make psdocs". > Rusty, above are real problems, the module locking fixes these problems > during module_init/module_exit, but how can these problems fixed in the > other cases and how does the module locking help? This isn't even a sensible question: "This is not a module problem. How does module locking help?" You're wasting your own valuable time, too. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 12:04 ` Rusty Russell @ 2003-02-14 12:49 ` Roman Zippel 2003-02-17 1:59 ` Rusty Russell 2003-02-14 13:21 ` Roman Zippel 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-14 12:49 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Rusty Russell wrote: > > It's not the same, please see: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 > > I explained why the current module locking is more complex and why it's > > actually a three stage delete. > > No, here is where you show *your* ignorance of kernel locking idioms, > and that your axiom is that "the new system is more complex". Well, it should be simple then to point out the problem, would you _please_ do it? > > Rusty, above are real problems, the module locking fixes these problems > > during module_init/module_exit, but how can these problems fixed in the > > other cases and how does the module locking help? > > This isn't even a sensible question: "This is not a module problem. > How does module locking help?" > > You're wasting your own valuable time, too. I hoped we could discuss locking problems, as I said these problems are real, so it should be worth to describe possible solutions. Then we can compare the different locking mechanism and maybe we find one, which not only solves a part of the problem. Rusty, I'm asking all these questions on purpose, I want to help you to understand the complete problem and how limited your solutions are. So either please answer my questions or point out the mistakes in my argumentation. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 12:49 ` Roman Zippel @ 2003-02-17 1:59 ` Rusty Russell 2003-02-17 10:53 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-02-17 1:59 UTC (permalink / raw) To: Roman Zippel Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel, torvalds In message <Pine.LNX.4.44.0302141334560.1336-100000@serv> you write: > Rusty, I'm asking all these questions on purpose, I want to help you to > understand the complete problem and how limited your solutions are. So > either please answer my questions or point out the mistakes in my > argumentation. There were two major changes in the module code. The first was to simplify the userland interface, from: sys_create_module(name, size) sys_query_module(...) (a 5-way multiplexed syscall) sys_init_module(name, ptr) to: sys_init_module(ptr, len, userargs) To argue against this change is a demonstration of lack of understanding, or a complete lack of taste. The second change was the speed up one system of module locking in the kernel which wasn't racy, and deprecate the other system which was racy in 99% of its uses. That is all. Did it solve all the races in the kernel? Of course not. But it's simple to use, already well understood in the kernel, and avoids massive changes. It also allows connection tracking to be properly modularized, which was my long-lost original purpose. I've repeated this enough now, I think. Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-17 1:59 ` Rusty Russell @ 2003-02-17 10:53 ` Roman Zippel 2003-02-17 23:31 ` Rusty Russell 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-17 10:53 UTC (permalink / raw) To: Rusty Russell Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel, torvalds Hi, On Mon, 17 Feb 2003, Rusty Russell wrote: > There were two major changes in the module code. The first was to > simplify the userland interface, from: > > sys_create_module(name, size) > sys_query_module(...) (a 5-way multiplexed syscall) > sys_init_module(name, ptr) > > to: > sys_init_module(ptr, len, userargs) > > To argue against this change is a demonstration of lack of > understanding, or a complete lack of taste. Maybe you could share a bit of your wisdom? 1. Doing the linking in userspace requires two steps, but I still don't know what's so bad about it. 2. This still doesn't explain, why everything has to be moved into kernel, why can't we move more into userspace? 3. You simply moved part of the query syscall functionality to /proc/modules (which btw is still not enough to fix ksymoops). > The second change was the speed up one system of module locking in the > kernel which wasn't racy, and deprecate the other system which was > racy in 99% of its uses. That is all. Well, I'm not against optimizing the module locking (*), as we won't get rid of it in the near feature, but it still has problems. 1. It's adding complexity (however you implement it), I explained it in detail and you still haven't told me, where I'm wrong. 2. The module interface is incompatible with other kernel interfaces, I tried to explain that in the mail from saturday, if you think I'm wrong, your input is very welcome, but _please_ answer to that mail. > Did it solve all the races in the kernel? Of course not. But it's > simple to use, already well understood in the kernel, and avoids > massive changes. It also allows connection tracking to be properly > modularized, which was my long-lost original purpose. It's too much fun to quote Al here: "And no, I don't buy arguments about poor interface-writers. You do some infrastructure intended to be used by driver-writers - you are supposed to have a clue. 'Cause having rabid monkeys on crack on *both* sides of an interface is a recipe for disaster and on the driver side you are guaranteed to have a bunch of them. We need to have interfaces cleaned up. No silver bullets there. There's maybe a dozen of interfaces that cover 99% of all drivers out there. Remaining 1% can and should fend for itself - you do something really tricky, you are responsible for getting it right." > I've repeated this enough now, I think. Yes, you repeated your "executive summaries" now often enough, maybe you can impress some manager with it, but it's highly offensive to kernel hackers. Could you _please_ come up with some arguments now? bye, Roman (*) BTW I have patch that would make the unload path usable again, it would only be required to add a single smp_mb() to the fast path. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-17 10:53 ` Roman Zippel @ 2003-02-17 23:31 ` Rusty Russell 2003-02-18 12:26 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-02-17 23:31 UTC (permalink / raw) To: Roman Zippel Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel, torvalds In message <Pine.LNX.4.44.0302171112390.1336-100000@serv> you write: > Maybe you could share a bit of your wisdom? > 1. Doing the linking in userspace requires two steps, but I still don't > know what's so bad about it. > 2. This still doesn't explain, why everything has to be moved into kernel, > why can't we move more into userspace? > 3. You simply moved part of the query syscall functionality to > /proc/modules (which btw is still not enough to fix ksymoops). I think you'd do far better to implement it yourself for half a dozen architectures. It's not my job to teach you things which can be gained by reading the code and thinking a little. We're going in a circle again. > > The second change was the speed up one system of module locking in the > > kernel which wasn't racy, and deprecate the other system which was > > racy in 99% of its uses. That is all. > > Well, I'm not against optimizing the module locking (*), as we won't get > rid of it in the near feature, but it still has problems. > > 1. It's adding complexity (however you implement it), I explained it in > detail and you still haven't told me, where I'm wrong. No, it's exactly the same as before. You can't see that, and I've given up explaining it. > 2. The module interface is incompatible with other kernel interfaces, I > tried to explain that in the mail from saturday, if you think I'm wrong, > your input is very welcome, but _please_ answer to that mail. This problem is in your mind Roman. Deal with it. > > Did it solve all the races in the kernel? Of course not. But it's > > simple to use, already well understood in the kernel, and avoids > > massive changes. It also allows connection tracking to be properly > > modularized, which was my long-lost original purpose. > > It's too much fun to quote Al here: Quoting Al's rant isn't an argument. It wasn't very coherent when he wrote it, and it doesn't gain with repetition. The code exists. It's simple to use. I give up. You're killfiled again 8( Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-17 23:31 ` Rusty Russell @ 2003-02-18 12:26 ` Roman Zippel 0 siblings, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-02-18 12:26 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel Hi, On Tue, 18 Feb 2003, Rusty Russell wrote: > > Maybe you could share a bit of your wisdom? > > 1. Doing the linking in userspace requires two steps, but I still don't > > know what's so bad about it. > > 2. This still doesn't explain, why everything has to be moved into kernel, > > why can't we move more into userspace? > > 3. You simply moved part of the query syscall functionality to > > /proc/modules (which btw is still not enough to fix ksymoops). > > I think you'd do far better to implement it yourself for half a dozen > architectures. It's not my job to teach you things which can be > gained by reading the code and thinking a little. As usual you explain nothing, so I still don't know why a complete rewrite was necessary. The old implementation did work fine within limits and already has support for all architectures, so why should I just throw it away? Why was it not possible to first fix the problems of the old system? > > Well, I'm not against optimizing the module locking (*), as we won't get > > rid of it in the near feature, but it still has problems. > > > > 1. It's adding complexity (however you implement it), I explained it in > > detail and you still haven't told me, where I'm wrong. > > No, it's exactly the same as before. You can't see that, and I've > given up explaining it. So far you explained nothing and if you would just read and try to understand that damned mail(*), you would know, that I already said that the complexity is "exactly the same as before". I'm comparing it to other solutions, which you obviously haven't understood. (*) http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 > > 2. The module interface is incompatible with other kernel interfaces, I > > tried to explain that in the mail from saturday, if you think I'm wrong, > > your input is very welcome, but _please_ answer to that mail. > > This problem is in your mind Roman. Thanks for another detailed explaination. :( > > It's too much fun to quote Al here: > > Quoting Al's rant isn't an argument. It wasn't very coherent when he > wrote it, and it doesn't gain with repetition. Well, if you don't even try to understand, what Al is trying to tell you, I'm afraid I can't help you either. > The code exists. It's simple to use. > > I give up. You're killfiled again 8( I seriously consider to take over modules maintainership, but I have neither the energy nor the time to do this alone, so I can only wish everyone much fun with modules during 2.6. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 12:04 ` Rusty Russell 2003-02-14 12:49 ` Roman Zippel @ 2003-02-14 13:21 ` Roman Zippel 2003-02-14 13:53 ` Werner Almesberger 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-14 13:21 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Rusty Russell wrote: > > It's not the same, please see: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 > > I explained why the current module locking is more complex and why it's > > actually a three stage delete. > > No, here is where you show *your* ignorance of kernel locking idioms, > and that your axiom is that "the new system is more complex". Another point you probably misunderstood: I said that the complexity of the new and the old system is exactly the same, please read more carefully before flaming, it might backfire. > > Rusty, above are real problems, the module locking fixes these problems > > during module_init/module_exit, but how can these problems fixed in the > > other cases and how does the module locking help? > > This isn't even a sensible question: "This is not a module problem. > How does module locking help?" I hate to drag people into a discussion, but maybe you're more inclined to believe Al: http://marc.theaimsgroup.com/?l=linux-kernel&m=103761331525509&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=103769023500378&w=2 Please read this very careful and think about it. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 13:21 ` Roman Zippel @ 2003-02-14 13:53 ` Werner Almesberger 2003-02-14 14:24 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-14 13:53 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > On Fri, 14 Feb 2003, Rusty Russell wrote: >> This isn't even a sensible question: "This is not a module problem. >> How does module locking help?" I have to side with Rusty here - it's really not a module problem. The module changes only make this a little worse, because they follow the philosophy that this can't be fixed, so try_module_get and friends try to implement the right kind of locking for unload races (but for little else) without tripping over those "unfixable" bugs. The good news is that you can start fixing all those bugs right now, and even without Rusty's consent :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 13:53 ` Werner Almesberger @ 2003-02-14 14:24 ` Roman Zippel 2003-02-14 18:30 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-14 14:24 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Werner Almesberger wrote: > The module changes only make this a little worse, because they > follow the philosophy that this can't be fixed, so try_module_get > and friends try to implement the right kind of locking for unload > races (but for little else) without tripping over those "unfixable" > bugs. If you see these bugs as "unfixable", you already gave up and you end up putting one band-aid over another, each only solving part of the problem. Please try work with me here and we might find a more general solution. I could just post possible solutions, but as long nobody understands the problem, they will be rejected anyway. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 14:24 ` Roman Zippel @ 2003-02-14 18:30 ` Werner Almesberger 2003-02-14 20:09 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-14 18:30 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > If you see these bugs as "unfixable", you already gave up and you end up > putting one band-aid over another, each only solving part of the problem. Yup, that's what I don't like either. > Please try work with me here and we might find a more general solution. > I could just post possible solutions, but as long nobody understands the > problem, they will be rejected anyway. Step one is to fix those "unfixable" problems. That's independent of modules, and I'm convinced that it needs to be done. I don't want to have to go through lots of subsystems and try to figure our how to do their locking right, so my plan is to use a simulator to exercise such obscure race conditions, and once the Oops has been shown live and in color, leave the actual fixes to the maintainers of the respective code. (And eventually, also leave the bug hunt to others. I'm lousy at maintenance-type of work.) I expect to have that simulator ready for useful work on UP in about one week. SMP will come later. (For a sneak preview, you can look at umlsim.sourceforge.net, which has the (old) kernel side, and at netbb/dumpbb/README.UMLSIM in netbb from www.almesberger.net/netbb, which describes some of the user-space side (basically a script-driven debugger).) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 18:30 ` Werner Almesberger @ 2003-02-14 20:09 ` Roman Zippel 2003-02-15 0:12 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-14 20:09 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Werner Almesberger wrote: > > Please try work with me here and we might find a more general solution. > > I could just post possible solutions, but as long nobody understands the > > problem, they will be rejected anyway. > > Step one is to fix those "unfixable" problems. That's independent > of modules, and I'm convinced that it needs to be done. Step one is still to understand the problem. I described a very real problem, once this problem is solved (which can be done in different ways), I can explain how this can be applied to modules. It's not really independent of modules. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-14 20:09 ` Roman Zippel @ 2003-02-15 0:12 ` Werner Almesberger 2003-02-15 0:51 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-15 0:12 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > Step one is still to understand the problem. I described a very real > problem, once this problem is solved (which can be done in different > ways), I though we were talking about static data_used_by_callbacks; ... register_foo(&stuff_with_callbacks); ... unregister_foo(&stuff_with_callbacks); make_unusable(&data_used_by_callbacks) ... /* oops, we just got a callback */ ("data_used_by_callbacks" could be a pointer to kmalloc'ed memory, etc.) This kind of problem seems to be understood well enough. But maybe you mean something else ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-15 0:12 ` Werner Almesberger @ 2003-02-15 0:51 ` Roman Zippel 2003-02-15 2:28 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-15 0:51 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Werner Almesberger wrote: > I though we were talking about > > static data_used_by_callbacks; > ... > register_foo(&stuff_with_callbacks); > ... > unregister_foo(&stuff_with_callbacks); > make_unusable(&data_used_by_callbacks) > ... > /* oops, we just got a callback */ > > ("data_used_by_callbacks" could be a pointer to kmalloc'ed > memory, etc.) > > This kind of problem seems to be understood well enough. Yes, and now compare how the solutions differ when the data is static and when it's allocated. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-15 0:51 ` Roman Zippel @ 2003-02-15 2:28 ` Werner Almesberger 2003-02-15 23:20 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-15 2:28 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > Yes, and now compare how the solutions differ when the data is static and > when it's allocated. Do they ? Even if the data is static, it can become invalid (in the sense that accessing it from a callback would lead to some kind of undesirable behaviour, even though the access itself would work), so I don't quite see why the difference would matter. Example: static ... common_callback(...) { switch (my_state) { ... } } ... my_state = A; register_fancy_timer_A(&me_A,common_callback); ... unregister_fancy_timer_A(&me_A); my_state = B; /* stray fancy_timer_A call to common_callback would trigger action for state B */ ... register_fancy_timer_B(&me_B,common_callback); ... Depending on "my_state", the callback would perform different actions. (The "fancy timers" would be some timer-like service that doesn't del_timer_sync.) This is getting to abstract. Why don't you just say where you see the difference ? :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-15 2:28 ` Werner Almesberger @ 2003-02-15 23:20 ` Roman Zippel 2003-02-17 17:04 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-15 23:20 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, On Fri, 14 Feb 2003, Werner Almesberger wrote: > > Yes, and now compare how the solutions differ when the data is static and > > when it's allocated. > > Do they ? Even if the data is static, it can become invalid > (in the sense that accessing it from a callback would lead > to some kind of undesirable behaviour, even though the access > itself would work), so I don't quite see why the difference > would matter. Let's stay at the main problem, we have find out when it's safe to delete an object. For dynamic objects you have the following options: - callbacks: when the refcount becomes zero, we call a function to remove the object. - failure: we just return -EBUSY and try again later. - wait: we simply wait until the refcount becomes zero Static objects and functions are freed by the module code and usually we want to unregister them at module unload time, so there are basically two ways: - we use the module count via try_module_get/module_put - we use your own refcount and must wait in module_exit until all users are gone If we exclude the possibly-wait-forever-option, do you see the problem for dynamic objects which also contain references to static data/ functions? Procfs entries are such objects, there is a count field for the dynamic part and an owner field for the static part and proc_get_inode always has to get two references. The interesting question is now, can we get rid of one of them? If the answer is no, it would mean we need two procfs APIs, one which can be used from module_exit and another which can be used to remove dynamic entries. OTOH if we want to avoid this it would mean we have to make one or more of the above options generically usable. I stop here, because before we can discuss possible solutions, we have to agree, that this problem is real. Rusty, did I make any mistake so far? I'd really like to have your opinion too. You are free to flame me, if I made any mistake, but please do it with a bit more substance than just accusing me of ignorance. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-02-15 23:20 ` Roman Zippel @ 2003-02-17 17:04 ` Werner Almesberger 2003-02-17 23:09 ` [RFC] Is an alternative module interface needed/possible? Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-17 17:04 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > Let's stay at the main problem, we have find out when it's safe to delete > an object. For dynamic objects you have the following options: [...] > Static objects and functions are freed by the module code and usually we [...] Okay so far. > If we exclude the possibly-wait-forever-option, do you see the problem > for dynamic objects which also contain references to static data/ > functions? You mean that two locking mechanisms are used, where one of them shouldn't be doing all that much ? Well, yes. Now, is this a problem, or just a symptom ? I'd say it's a symptom: we already have a perfectly good locking/synchronization method, and that's through the register/unregister interface, so the module-specific part is unnecessary. That much about the theory. Of course, in real life, we have to face a few more problems: - if callbacks can happen after apparently successful "unregister", we die - if accesses to other static data owned by a module can happen after apparently successful "unregister", we may die - if a module doesn't "unregister" at all, we die too But all these problems equally affect code that does other things that can break a callback/access, e.g. if we destroy *de->data immediately after remove_proc_entry returns. So this is not a module-specific problem. Agreed ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* [RFC] Is an alternative module interface needed/possible? 2003-02-17 17:04 ` Werner Almesberger @ 2003-02-17 23:09 ` Roman Zippel 2003-02-18 1:18 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-17 23:09 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, (Subject changed to hopefully get a bit more attention.) On Mon, 17 Feb 2003, Werner Almesberger wrote: > > If we exclude the possibly-wait-forever-option, do you see the problem > > for dynamic objects which also contain references to static data/ > > functions? > > You mean that two locking mechanisms are used, where one of them > shouldn't be doing all that much ? Well, yes. > > Now, is this a problem, or just a symptom ? I'd say it's a symptom: > we already have a perfectly good locking/synchronization method, > and that's through the register/unregister interface, so the > module-specific part is unnecessary. If it was perfectly good, we hadn't a problem. :) > That much about the theory. Of course, in real life, we have to > face a few more problems: > > - if callbacks can happen after apparently successful "unregister", > we die > - if accesses to other static data owned by a module can happen > after apparently successful "unregister", we may die > - if a module doesn't "unregister" at all, we die too > > But all these problems equally affect code that does other things > that can break a callback/access, e.g. if we destroy *de->data > immediately after remove_proc_entry returns. > > So this is not a module-specific problem. You're skipping ahead. You haven't solved the problem yet, but you're already jumping to conclusions. :-) Remember, that we want to savely remove a proc entry and as added bonus, we only want a single reference count. Let's look first at the possible solutions: module count: by design this only works for entries, which are removed during module exit, but not for dynamic entries. failure: if the object is still busy, we just return -EBUSY. This is simple, but this doesn't work for modules, since during module exit you can't fail anymore. callbacks: the callback function itself had to be protected somehow, so just to unregister a proc entry, you have to register a callback. To unregister that callback, it would be silly to use another callback and failure doesn't work with modules, so that only leaves the module count. The last solution sounds complicated, but exactly this is done for filesystems and we didn't really get rid of the second reference count, we just moved it somewhere else, where it hurts least. Without interface changes this is also the only generic option to export dynamic data - the drivers have to get a filesystem like interface (or just become filesystem themselves). The very basic reason which prevents another solution is that static data (which includes functions) is controlled by the generic module code and dynamic data is controlled by the driver itself. It's obvious that we can't give the module code control over dynamic data, on the other hand would it be possible to give the driver control over the static data? This way it suddenly it becomes a module-specific problem - how can we give drivers more control over its data? bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-17 23:09 ` [RFC] Is an alternative module interface needed/possible? Roman Zippel @ 2003-02-18 1:18 ` Werner Almesberger 2003-02-18 4:54 ` Rusty Russell 2003-02-19 1:48 ` Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-18 1:18 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Roman Zippel wrote: > If it was perfectly good, we hadn't a problem. :) I said we he have the method. Now we need to use it properly :-) > You're skipping ahead. You haven't solved the problem yet, but you're > already jumping to conclusions. :-) The solution is another issue. I simply stated that the problem happens with or without modules. > module count: by design this only works for entries, which are removed > during module exit, but not for dynamic entries. Works only for modules, not good. > failure: if the object is still busy, we just return -EBUSY. This is > simple, but this doesn't work for modules, since during module exit you > can't fail anymore. That's a modules API problem. And yes, I think modules should eventually be able to say that they're busy. > callbacks: the callback function itself had to be protected somehow, so > just to unregister a proc entry, you have to register a callback. To > unregister that callback, it would be silly to use another callback and If all you want to do is to decrement the module count, you could have a global handler for this that is guaranteed not to reside in a module. By the way, a loong time ago, in the modules thread, I suggested a "decrement_module_count_and_return" function [1]. Such a construct would be useful in this specific case. [1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0207.0/0147.html > failure doesn't work with modules, so that only leaves the module count. And how would you ensure correct access to static data in the absence of modules ? Any solution that _requires_ a module count looks highly suspicious to me. Likewise, possibly dynamically allocated data that is synchronized by the caller, e.g. "user" in "struct proc_dir_entry". > The last solution sounds complicated, but exactly this is done for > filesystems and we didn't really get rid of the second reference count, we > just moved it somewhere else, where it hurts least. Hmm, I'm confused. With "filesystem", do you mean the file system driver per se (e.g. "ext3"), or a specific instance of such a file system (e.g. /dev/hda1 mounted on /) ? - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 1:18 ` Werner Almesberger @ 2003-02-18 4:54 ` Rusty Russell 2003-02-18 7:20 ` Werner Almesberger 2003-02-19 1:48 ` Roman Zippel 1 sibling, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-02-18 4:54 UTC (permalink / raw) To: Werner Almesberger; +Cc: Roman Zippel, kuznet, davem, kronos, linux-kernel In message <20030217221837.Q2092@almesberger.net> you write: > > failure: if the object is still busy, we just return -EBUSY. This is > > simple, but this doesn't work for modules, since during module exit you > > can't fail anymore. Of course, they never could fail module exit. You could use can_unload() which was only protected by the BKL, hence so rarely used: grep 2.4.20 to see all four users. Having a separate can_unload function means a window between asking can_unload() and doing the actual unload. Doing it in the function itself (ie. cleanup returns an int, or a separate "prepare_to_unload" kind of function) means you have to deal with the "I started deregistering my stuff, but then had to re-register them" which leaves a race where the module is partially unavailable (ie. the spurious failure problem). Both these problems are soluble, but they're not trivial. > That's a modules API problem. And yes, I think modules should > eventually be able to say that they're busy. Take a look at the current ip_conntrack code. It keeps its own reference count and spins until it hits zero in unload, because otherwise it would never be unloadable (it attaches a callback to each packet, so logically it would bump its refcount there). In 2.5 it can use try_module_get() and be unloaded with rmmod --wait in this worst case (I'm slack, patching is on my todo list). Security modules have the same issues, and Greg Kroah-Hartmann was telling me USB has it as well. You can see why I considered deprecating module unloading: it's a hard problem, and fixing it in general probably means massive changes. Which I'm happy to leave to someone else. > By the way, a loong time ago, in the modules thread, I suggested > a "decrement_module_count_and_return" function [1]. Such a > construct would be useful in this specific case. Stephen and I even implemented one, for x86 (see below). But implementing the matching "enter and inc" case remains problematic, so I dropped that idea. Anyway, good luck! Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. Name: module_put_return primitive for x86 Author: Rusty Russell Status: Experimental Depends: Module/forceunload.patch.gz D: This patch implements module_put_return() for x86, which allows a D: module to control its own reference counts in some cases. A module D: must never use module_put() on itself, since this may result in the D: module being removable immediately: this is the alternative, which D: atomically decrements the count and returns. D: D: Each architecture will need to implement this for preempt. diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/kernel/module.c .13134-linux-2.5.38.updated/arch/i386/kernel/module.c --- .13134-linux-2.5.38/arch/i386/kernel/module.c 2002-09-25 02:02:28.000000000 +1000 +++ .13134-linux-2.5.38.updated/arch/i386/kernel/module.c 2002-09-25 02:02:56.000000000 +1000 @@ -28,6 +28,9 @@ #define DEBUGP(fmt , ...) #endif +/* For the magic module return. */ +struct module_percpu module_percpu[NR_CPUS]; + static void *alloc_and_zero(unsigned long size) { void *ret; diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/arch/i386/lib/module.S .13134-linux-2.5.38.updated/arch/i386/lib/module.S --- .13134-linux-2.5.38/arch/i386/lib/module.S 1970-01-01 10:00:00.000000000 +1000 +++ .13134-linux-2.5.38.updated/arch/i386/lib/module.S 2002-09-25 02:02:56.000000000 +1000 @@ -0,0 +1,33 @@ +/* Icky, icky, return trampoline for dying modules. Entered with + interrupts off. (C) 2002 Stephen Rothwell, IBM Corporation. */ + +#include <asm/thread_info.h> + +.text +.align 4 +.globl __magic_module_return + +#define MODULE_PERCPU_SIZE_ORDER 3 + +__magic_module_return: + /* Save one working variable */ + pushl %eax + + /* Get CPU number from current. */ + GET_THREAD_INFO(%eax) + movl TI_CPU(%eax), %eax + + /* Push module_percpu[cpu].flags on the stack */ + shll $MODULE_PERCPU_SIZE_ORDER, %eax + pushl module_percpu(%eax) + + /* Put module_percpu[cpu].returnaddr into %eax */ + movl module_percpu+4(%eax), %eax + + /* Push returnaddr and restore eax */ + xchgl %eax, 4(%esp) + + /* Restore interrupts */ + popf + /* Go home */ + ret diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .13134-linux-2.5.38/include/asm-i386/module.h .13134-linux-2.5.38.updated/include/asm-i386/module.h --- .13134-linux-2.5.38/include/asm-i386/module.h 2002-09-25 02:02:28.000000000 +1000 +++ .13134-linux-2.5.38.updated/include/asm-i386/module.h 2002-09-25 02:02:56.000000000 +1000 @@ -8,4 +8,33 @@ struct mod_arch_specific #define Elf_Shdr Elf32_Shdr #define Elf_Sym Elf32_Sym #define Elf_Ehdr Elf32_Ehdr + +/* Non-preemtible decrement the refcount and return. */ +#define module_put_return(firstarg , ...) \ +do { \ + unsigned int cpu; \ + unsigned long flags; \ + \ + local_irq_save(flags); \ + if (unlikely(module_put(THIS_MODULE)) { \ + module_percpu[cpu].flags = flags; \ + module_percpu[cpu].returnaddr = ((void **)&(firstarg))[-1]; \ + ((void **)&(firstarg))[-1] = __magic_module_return; \ + } else \ + local_irq_restore(flags); \ + return __VA_ARGS__; \ +} while(0) + +/* FIXME: Use per-cpu vars --RR */ +struct module_percpu +{ + unsigned long flags; + void *returnaddr; +}; + +extern struct module_percpu module_percpu[NR_CPUS]; + +/* Restore flags and return to caller. */ +extern void __magic_module_return(void); + #endif /* _ASM_I386_MODULE_H */ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 4:54 ` Rusty Russell @ 2003-02-18 7:20 ` Werner Almesberger 2003-02-18 12:06 ` Roman Zippel 2003-02-18 12:35 ` Roman Zippel 0 siblings, 2 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-18 7:20 UTC (permalink / raw) To: Rusty Russell; +Cc: Roman Zippel, kuznet, davem, kronos, linux-kernel Grr, I hate this. I just typed a long reply to your posting, and wanted to finish it with a reminder that the issue is more general than just modules, so that we shouldn't look at modules yet. Then I realized of course that everything above was about modules :-( I don't think we'll make much progress if we keep on mixing issues of interface correctness, current module constraints, and possible module interface changes, all that with performance considerations and minimum invasive migration plans thrown in. So I'd suggest the following sequence: 1) do we agree that the current registration/deregistration interfaces are potential hazards for their users, be they modules or not ? 2) one we agree with this, we can look for mechanisms that solve this, again for general users, which may or may not be modules 3) last but not least, we can look at what this means for modules (and that's where beautiful tools like "module_put_return" (thanks !), or also ideas about module_exit redesign have their place) 4) "the root of all evil ...". Okay, and now to which level of hell would all this shoot our performance ? (And back we go to step 2.) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 7:20 ` Werner Almesberger @ 2003-02-18 12:06 ` Roman Zippel 2003-02-18 14:12 ` Werner Almesberger 2003-02-18 12:35 ` Roman Zippel 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-18 12:06 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel Hi, On Tue, 18 Feb 2003, Werner Almesberger wrote: > I don't think we'll make much progress if we keep on mixing issues > of interface correctness, current module constraints, and possible > module interface changes, all that with performance considerations > and minimum invasive migration plans thrown in. So I'd suggest the > following sequence: > > 1) do we agree that the current registration/deregistration > interfaces are potential hazards for their users, be they > modules or not ? > 2) one we agree with this, we can look for mechanisms that > solve this, again for general users, which may or may not > be modules > 3) last but not least, we can look at what this means for > modules (and that's where beautiful tools like > "module_put_return" (thanks !), or also ideas about > module_exit redesign have their place) > 4) "the root of all evil ...". Okay, and now to which level > of hell would all this shoot our performance ? (And back > we go to step 2.) Basically I can agree with this, although I'd like to avoid that we iterate too much over these steps, as it would too easily divert the discussion to other things, so I'd rather take smaller steps and keep the scope a bit broader. Another point is the perfomance, which is not that important right now. I'm more interested in the general complexity, it's simply easier to optimize a simple design than a very complex design. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 12:06 ` Roman Zippel @ 2003-02-18 14:12 ` Werner Almesberger 2003-02-18 12:45 ` Thomas Molina 2003-02-18 17:22 ` Werner Almesberger 0 siblings, 2 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-18 14:12 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, davem, kronos, linux-kernel (I think we should limit the Cc:s to Roman, Rusty, the list, and me, and leave the others in peace. Please yell if you really want that extra copy :-) Roman Zippel wrote: > Basically I can agree with this, although I'd like to avoid that we > iterate too much over these steps, as it would too easily divert the > discussion to other things, so I'd rather take smaller steps and keep the > scope a bit broader. Good :-) I want to avoid modules as much as possible, because they've extensively been tackled in the past (which didn't help much making the interfaces better), and also because they're just a bit too political an issue. Okay, this brings us to the issue of broken interfaces. Do we have agreement that there are cases where interfaces like remove_proc_entry, in their current state, cannot be used correctly ? Example: static ...callback...(... struct file *file ...) { void *user = PDE(file->f_dentry->d_inode)->data; ... do something with "*user" ... } ... struct proc_dir_entry *de = create_proc_entry(...); void *my_data; de->data = my_data = kmalloc(...); ... remove_proc_entry(...); /* what happens with "my_data", formerly known as "de->data" ? */ a) kfree it. May cause an oops or other problems if we're in the middle of the callback. b) don't kfree it. So we now have a (hopefully small) memory leak. Problem: my_data may point to a lot more than just some tiny allocation. Okay so far ? (Possible solutions on the next, slid^H^H^H^Hposting :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 14:12 ` Werner Almesberger @ 2003-02-18 12:45 ` Thomas Molina 2003-02-18 17:22 ` Werner Almesberger 1 sibling, 0 replies; 73+ messages in thread From: Thomas Molina @ 2003-02-18 12:45 UTC (permalink / raw) To: linux-kernel On Tue, 18 Feb 2003, Werner Almesberger wrote: > Good :-) I want to avoid modules as much as possible, because > they've extensively been tackled in the past (which didn't help > much making the interfaces better), and also because they're > just a bit too political an issue. > > Okay, this brings us to the issue of broken interfaces. Do we > have agreement that there are cases where interfaces like > remove_proc_entry, in their current state, cannot be used > correctly ? I hope this discussion is taking place in the context of looking forward towards something to implement in 2.7. IMHO we are much too late in the 2.5 cycle to implement this now. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 14:12 ` Werner Almesberger 2003-02-18 12:45 ` Thomas Molina @ 2003-02-18 17:22 ` Werner Almesberger 2003-02-19 3:30 ` Rusty Russell ` (2 more replies) 1 sibling, 3 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-18 17:22 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, linux-kernel Next round: possible remedies and their side-effects. As usual, if you disagree with something, please holler. If yes, let's look at possible (and not overly insane) solutions, using remove_proc_entry as a case study: 1) still don't kfree, and leave it to the user to somehow minimize the damage. (Good luck :-) 2) add a callback that is invoked when the proc entry gets deleted. (This callback may be called before remove_proc_entry completes.) Problem: unload/return race for modules. 3) change remove_proc_entry or add remove_proc_entry_wait that works like remove_proc_entry, but blocks until the entry is deleted. Problem: may sleep "forever". 4) make remove_proc_entry return an indication of whether the entry was successfully removed or not. Problem: if it wasn't, what can we do then ? 5) like above, but don't remove the entry if we can't do it immediately. Problem: there's no notification for when we should try again, so we'd have to poll. 6) export the lookup mechanism, and let the caller poll for removal. Problem: races with creation of a new entry with the same name. 7) transfer ownership of de->data to procfs, and set some (possibly configurable) destruction policy (e.g. always kfree, or such). Similar to 2), but less flexible. Any more ? I think that most programmers would intuitively expect an interface of type 3). In cases where we can't sleep, either type 2) or type 5) would be common choices. Does that sound reasonable so far ? I'll wait a little until I continue with ways for dealing with the side-effects. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 17:22 ` Werner Almesberger @ 2003-02-19 3:30 ` Rusty Russell 2003-02-19 4:11 ` Werner Almesberger 2003-02-20 0:40 ` Roman Zippel 2003-02-23 23:34 ` Kevin O'Connor 2 siblings, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-02-19 3:30 UTC (permalink / raw) To: Werner Almesberger; +Cc: Roman Zippel, kuznet, linux-kernel In message <20030218142257.A10210@almesberger.net> you write: > Next round: possible remedies and their side-effects. As > usual, if you disagree with something, please holler. > > If yes, let's look at possible (and not overly insane) solutions, > using remove_proc_entry as a case study: > > 1) still don't kfree, and leave it to the user to somehow > minimize the damage. (Good luck :-) > > 2) add a callback that is invoked when the proc entry gets > deleted. (This callback may be called before remove_proc_entry > completes.) Problem: unload/return race for modules. OK. For reference, the "state of 2.4" solution (which is also the "state of 2.5" solution) looks like: > struct proc_dir_entry *de = create_proc_entry(...); > void *my_data; > > de->data = my_data = kmalloc(...); =====> de->owner = THIS_MODULE; > ... > remove_proc_entry(...); > /* what happens with "my_data", formerly known as "de->data" ? */ And have proc_file_operations do the standard owner get and release: open: proc_open, release: proc_release, static int proc_open(struct inode *inode, struct file *filp) { struct proc_dir_entry *dp = PDE(inode); if (!try_module_get(dp->owner)) return -ENOENT; return 0; } static int proc_release(struct inode *inode, struct file *filp) { struct proc_dir_entry *dp = PDE(inode); module_put(dp->owner); return 0; } Now, if remove_proc_entry() is called from module_exit(), the kfree() works fine, since (1) we wouldn't be in module_exit() if the proc entry was in used, and (2) the try_module_get() prevents any new users. Of course, if you wanted to remove the entry at any other time (eg. hotplug), this doesn't help you one damn bit (which is kind of your point). > 3) change remove_proc_entry or add remove_proc_entry_wait that > works like remove_proc_entry, but blocks until the entry is > deleted. Problem: may sleep "forever". This is what network devices do, and what the sockopt registration code does, too, so this is already in the kernel, too. It's not great, but it becomes a noop for the module deregistration stuff. Thanks! Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-19 3:30 ` Rusty Russell @ 2003-02-19 4:11 ` Werner Almesberger 2003-02-19 23:38 ` Rusty Russell 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-19 4:11 UTC (permalink / raw) To: Rusty Russell; +Cc: Roman Zippel, kuznet, kronos, linux-kernel Rusty Russell wrote: > Of course, if you wanted to remove the entry at any other time > (eg. hotplug), this doesn't help you one damn bit (which is kind of > your point). Yep, try_module_get solves the general synchronization problem for the special but interesting case of modules, but not for the general case. > This is what network devices do, and what the sockopt registration > code does, too, so this is already in the kernel, too. It's not > great, but it becomes a noop for the module deregistration stuff. Yes, I think just sleeping isn't so bad at all. First of all, we already have the module use count as a kind of "don't unload now" advice (not sure if you plan to phase out MOD_INC_USE_COUNT ?), and second, it's not exactly without precedent anyway. E.g. umount will have little qualms of letting you sleep "forever". (And, naturally, every once in a while, people hate it for this :-) Anyway, I'll write more about this tomorrow. For tonight, I have my advanced insanity 101 to finish, topic "ptracing more than one UML/TT at the same time". - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-19 4:11 ` Werner Almesberger @ 2003-02-19 23:38 ` Rusty Russell 2003-02-20 9:46 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Rusty Russell @ 2003-02-19 23:38 UTC (permalink / raw) To: Werner Almesberger; +Cc: Roman Zippel, kuznet, kronos, linux-kernel In message <20030219011154.X2092@almesberger.net> you write: > Rusty Russell wrote: > > Of course, if you wanted to remove the entry at any other time > > (eg. hotplug), this doesn't help you one damn bit (which is kind of > > your point). > > Yep, try_module_get solves the general synchronization problem for > the special but interesting case of modules, but not for the general > case. Absolutely. And I admire your (and Roman's) bravery for trying the general case. > will have little qualms of letting you sleep "forever". (And, > naturally, every once in a while, people hate it for this :-) (Well, MOD_INC_USE_COUNT is already deprecated, mainly because it's used to try to control module counts from within, which preempt makes really hard). Yes, the addition of "umount -l" is congruent to "rmmod --wait". The assumption is "I don't want any new users, and I'll handle any current ones". You can get yourself in trouble with both of them. Keep up the good work, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-19 23:38 ` Rusty Russell @ 2003-02-20 9:46 ` Roman Zippel 0 siblings, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-02-20 9:46 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, kronos, linux-kernel Hi, On Thu, 20 Feb 2003, Rusty Russell wrote: > Yes, the addition of "umount -l" is congruent to "rmmod --wait". The > assumption is "I don't want any new users, and I'll handle any current > ones". You can get yourself in trouble with both of them. With the small difference that "umount -l" won't deadlock. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 17:22 ` Werner Almesberger 2003-02-19 3:30 ` Rusty Russell @ 2003-02-20 0:40 ` Roman Zippel 2003-02-20 2:17 ` Werner Almesberger 2003-02-23 23:34 ` Kevin O'Connor 2 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-20 0:40 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, On Tue, 18 Feb 2003, Werner Almesberger wrote: > Next round: possible remedies and their side-effects. As > usual, if you disagree with something, please holler. > > If yes, let's look at possible (and not overly insane) solutions, > using remove_proc_entry as a case study: > > 1) still don't kfree, and leave it to the user to somehow > minimize the damage. (Good luck :-) Obviously not a good idea. > 3) change remove_proc_entry or add remove_proc_entry_wait that > works like remove_proc_entry, but blocks until the entry is > deleted. Problem: may sleep "forever". Sleeping is not a good general solution, as you have to be very careful not to hold any important locks, otherwise it's easy to abuse. > 6) export the lookup mechanism, and let the caller poll for > removal. Problem: races with creation of a new entry with > the same name. I don't really understand this one. > 7) transfer ownership of de->data to procfs, and set some > (possibly configurable) destruction policy (e.g. always > kfree, or such). Similar to 2), but less flexible. de->data might contain references to other data and de->data might not the only dynamic data, just the most visible one, the read/write functions can access other dynamic data as well. > 2) add a callback that is invoked when the proc entry gets > deleted. (This callback may be called before remove_proc_entry > completes.) Problem: unload/return race for modules. > > 4) make remove_proc_entry return an indication of whether the > entry was successfully removed or not. Problem: if it > wasn't, what can we do then ? > > 5) like above, but don't remove the entry if we can't do it > immediately. Problem: there's no notification for when we > should try again, so we'd have to poll. > > Any more ? If you want to make 4) and 5) separate options, you could the same with 2), you can unlink the entry during remove_proc_entry or after the last user is gone (before the callback is called). It would not be difficult to separate an unlink_proc_entry from remove_proc_entry, so we can force an unlink if needed and we can reduce this again to two basic options in two variations. :) The question is now whether we return an error value or use a callback. When a device is removed, we usually also want to remove all its data structures, on the other hand we can only remove a module when there are no users, so here we return an error value. Now I need a bigger example to put this into a context, a nice example is scsi_unregister. It removes among other things procfs entries and these entries have a reference to struct Scsi_Host. As long as scsi_unregister is called from module_exit everything works fine, but a bit searching reveals drivers/usb/storage/usb.c, which create/removes a scsi host when you plug/unplug a storage device (let's ignore other problems here, like it's still mounted). In this case nothing protects the removal of the proc entries anymore, this means if someone accesses /proc/scsi/usb-storage/... while the device is unplugged he will access freed data. Here would be now the patch from yesterday useful to implement a callback, but now we also can't simply free the Scsi_Host structure, so we have to add a reference count to it and only if all references to it are gone, it can be removed as well. At this point I can also respond to Rusty's main argument against giving more control to the modules, that they suddenly had to deal with "I started deregistering my stuff, but then...". Unless interfaces are only used during module_init/module_exit, they have to deal with this anyway. I can understand that the module interface is tempting - just wait until all user are gone and then disable the module. The only problem is that this breaks horribly, when the driver interface is used in a different context (as demonstrated with the usb/scsi example). So what speaks against forcing driver/interface writers to get it right from the beginning? (I at least could think of a few more reason that speak for this, but I'll continue this later.) bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-20 0:40 ` Roman Zippel @ 2003-02-20 2:17 ` Werner Almesberger 2003-02-23 16:02 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-20 2:17 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Roman Zippel wrote: > Sleeping is not a good general solution, as you have to be very careful > not to hold any important locks, otherwise it's easy to abuse. Sleeping has many problems, but it also has one great advantage: it's relatively easy to understand and doesn't need much extra code (if any) in the user. Also, I wouldn't expect lock interdependencies to be a major problem at the level of something removing all traces of itself (e.g. a driver module unregistering itself). Or would you happen to know some example where such problems are happening ? > > 6) export the lookup mechanism, and let the caller poll for > > removal. Problem: races with creation of a new entry with > > the same name. > > I don't really understand this one. Something like: foo_schedule_destruction(&whatever); while (foo_lookup(whatever.name)) yield(); Usually too ugly to consider for anything but desperate cases. > It would not be difficult to separate an unlink_proc_entry from > remove_proc_entry, so we can force an unlink if needed and we can reduce > this again to two basic options in two variations. :) Yes, separating unlink (quick, non-blocking, always succeeds) and destruction (slow, may run nethack, fragile) clearly has some appeal. > The question is now whether we return an error value or use a callback. There's also the difference that the version using a callback would schedule the removal, while the version returning an error would do nothing (unless the caller tries again). You could also do both, but that gets pretty difficult to use. > Now I need a bigger example to put this into a context, a nice example is > scsi_unregister. Nicely illustrates the problem of the "can look around one corner, but not two" property of things like try_module_get, yes. I'd also expect many cases of multiple devices which are serviced by the same driver to have the same sort of problems. E.g. I'm pretty sure I committed a few such sins in the ATM code, too. > So what speaks against forcing driver/interface writers to get it > right from the beginning? Raising the general awareness of such problems is exactly what I'm trying to accomplish here :-) By the way, also the argument "it may be broken but it has never failed so far" is getting weaker with the emergence of technologies that change relative code path lengths, like hyperthreading. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-20 2:17 ` Werner Almesberger @ 2003-02-23 16:02 ` Roman Zippel 2003-02-26 23:26 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-23 16:02 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, On Wed, 19 Feb 2003, Werner Almesberger wrote: > Roman Zippel wrote: > > Sleeping is not a good general solution, as you have to be very careful > > not to hold any important locks, otherwise it's easy to abuse. > > Sleeping has many problems, but it also has one great advantage: > it's relatively easy to understand and doesn't need much extra > code (if any) in the user. Uninterruptable sleep should generally be avoided at kernel level, either it should be possible to interrupt it or it should at least include a timeout, but then you have to deal with failure again. OTOH if you have callbacks, you can easily implement sleeping yourself. > Also, I wouldn't expect lock interdependencies to be a major > problem at the level of something removing all traces of itself > (e.g. a driver module unregistering itself). Or would you happen > to know some example where such problems are happening ? Imagine you want to remove a device structure, if you manage to block this somehow, you might completely block the creation and removeal of other devices. > > Now I need a bigger example to put this into a context, a nice example is > > scsi_unregister. > > Nicely illustrates the problem of the "can look around one corner, > but not two" property of things like try_module_get, yes. Thanks, it seems I didn't cause too much confusion yet, so I can go on with the next part. I think we can agree that the module locking only protects a special case and as hotpluggable devices become more and more important, we should rather get it right in the general case. Anyway, this alone would be not reason enough to change the module interface, but another module interface would give us more flexibility and reduce the locking complexity. To make it easy for Rusty I take an example he is familiar with - netfilter. Let's assume we had an interface which allows exit to fail, how would that help? First, struct nf_hook_ops gets a reference count (struct module does the job), now with a small change to nf_unregister_hook: void nf_unregister_hook(struct nf_hook_ops *reg) { br_write_lock_bh(BR_NETPROTO_LOCK); - list_del(®->list); + list_del_init(®->list); br_write_unlock_bh(BR_NETPROTO_LOCK); } we can do the following from module_exit: int foo_exit() { nf_unregister_hook(&foo_ops); return module_refcount(THIS_MODULE) ? -EBUSY : 0; } to get a reference to the module this would be enough: static inline void __module_get(struct module *module) { local_inc(&module->ref[smp_processor_id()].count); } releasing the reference is as simple: static inline void __module_put(struct module *module) { local_dec(&module->ref[smp_processor_id()].count); } This probably needs a bit of explanation: 1. more flexibility: The driver has better the knowledge about how the module can be stopped and especially only the driver knows which events should prevent the shutdown of the module. There is an important difference between stopping the module and actually removing the module. Any reference to the module must prevent the module removal, but not everything needs to stop the module from shutting down. For example an incoming network packet doesn't need to prevent the shutdown, so we can just remove the hook above independent of the reference count and just wait for the reference to become zero. On the other hand this means the module could be in a disabled state, but this can rather be compared with lazy unmount (via "umount -l") - after the last user is gone the module can be removed without problems, but the module could also be reenabled. 2. reduced locking complexity: Above demonstrates how it could look like, if we can get rid of the live state (at least outside of the generic module code). I explained the theory behind this before and posted the link already a few times: http://marc.theaimsgroup.com/?l=linux-kernel&m=104284223130775&w=2 This live state isn't needed most of the time and if some module should really need this, it can easily implement that itself. To help understanding this it maybe helps to clear up a common misconception: modules are no special objects and they are not different to other objects which can be removed while the kernel is running. Even Rusty confuses this issue (http://lwn.net/Articles/15404/). The question should rather be: how can I safely access an object which I don't own? There are exactly two rules: - The user has to lock out the owner. - When the user wants to access the object outside the lock, he has to tell that the owner, usually done via a reference count. One of these conditions must be true (no exception), otherwise we have a problem. Applied to netfilter this means as long as we hold the netproto brlock we can safely access any netfilter hook and only if we need a reference to the hook outside of this lock, we had to call try_module_get(). This also means that we don't have to try to get a reference for every packet, instead e.g. it would be enough to do this once per connection. (Why now an atomic counter wasn't sufficient for this and why Rusty had to mess up the module code for this, will probably stay one of the big mysteries...) bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-23 16:02 ` Roman Zippel @ 2003-02-26 23:26 ` Werner Almesberger 2003-02-27 12:34 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-26 23:26 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Roman Zippel wrote: > Anyway, this alone would be not reason enough to change the module > interface, but another module interface would give us more flexibility and > reduce the locking complexity. Wait, wait ! :-) There's one step you've left out: what we actually expect the module interface to do. We have: - what it currently does, or what it did in the past - what users think it does - what users want it to do - what we think the users should want - what we think is a comfortable compromise With "users", I mean primarily the guy who invokes "rmmod", or such. Anyway, I'm afraid I can't offer much wisdom from experience for this part, for I'm not much of a module user myself. I'll have more to say on service interfaces, though. Sorry for slowing down, but I'm currently quite busy absorbing all the cool stuff that's recently been happening with UML. (So, blame Jeff ;-)) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-26 23:26 ` Werner Almesberger @ 2003-02-27 12:34 ` Roman Zippel 2003-02-27 13:20 ` Werner Almesberger 0 siblings, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-27 12:34 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, On Wed, 26 Feb 2003, Werner Almesberger wrote: > Roman Zippel wrote: > > Anyway, this alone would be not reason enough to change the module > > interface, but another module interface would give us more flexibility and > > reduce the locking complexity. > > Wait, wait ! :-) There's one step you've left out: what we actually > expect the module interface to do. We have: There are several module interfaces: - module user interface - module load interface - module driver interface These are quite independent and so far we were talking about the last one, so I'm a bit confused about your request to talk about the first. <rant> BTW Why Rusty had to completely break the first two interfaces to "improve" the last one, is probably another mystery, I'll never understand. </rant> bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-27 12:34 ` Roman Zippel @ 2003-02-27 13:20 ` Werner Almesberger 2003-02-27 14:33 ` Roman Zippel 0 siblings, 1 reply; 73+ messages in thread From: Werner Almesberger @ 2003-02-27 13:20 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Roman Zippel wrote: > There are several module interfaces: > - module user interface > - module load interface > - module driver interface Hmm, the "load interface" would be the system calls, while the "driver interface" would be initialization and cleanup functions in the module ? We've already established that most things called from the latter isn't actually module specific. > These are quite independent and so far we were talking about the last one, > so I'm a bit confused about your request to talk about the first. I'm not so sure about them being independent. E.g. if we just had a blocking single-phase cleanup, users would always see success, but resources may be tied up indefinitely, which would break uses like rmmod foo insmod foo.o On the other hand, with a non-blocking two-phase cleanup, users would still always see success, but only "anonymous" resources (memory, etc.) would be tied up. Last but not least, a non-blocking single-phase cleanup would expose all failures to the user, and require some retry strategy. Furthermore, use counts can have several meanings: - indicate when it's convenient for the module to be removed - indicate when it's possible for the module to be removed - indicate what consequences the user may experience if trying to remove now (e.g. blocking) The "old" module count was a bit of a mixture of the first two. The "new" count is clearer. Oh, lest I be misunderstood: I'm not saying that we should redesign everything, and screw compatibility. I just want to bring the hidden assumptions that have piled up over the life of the module system out in the open. Then, of course, there are still plenty of opportunities to plot any massive breakage ;-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-27 13:20 ` Werner Almesberger @ 2003-02-27 14:33 ` Roman Zippel 0 siblings, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-02-27 14:33 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, On Thu, 27 Feb 2003, Werner Almesberger wrote: > > There are several module interfaces: > > - module user interface > > - module load interface > > - module driver interface > > Hmm, the "load interface" would be the system calls, while > the "driver interface" would be initialization and cleanup > functions in the module ? > > We've already established that most things called from the > latter isn't actually module specific. But the module driver interface has an impact on the other driver interfaces and that's what I tried to describe in the previous mail. > > These are quite independent and so far we were talking about the last one, > > so I'm a bit confused about your request to talk about the first. > > I'm not so sure about them being independent. E.g. if we > just had a blocking single-phase cleanup, users would always > see success, but resources may be tied up indefinitely, which > would break uses like > > rmmod foo > insmod foo.o rmmod could already fail before, because the module is busy, so I'm not sure, what should break here. > On the other hand, with a non-blocking two-phase cleanup, users > would still always see success, but only "anonymous" resources > (memory, etc.) would be tied up. > > Last but not least, a non-blocking single-phase cleanup would > expose all failures to the user, and require some retry strategy. Now you're skipping ahead. You are already looking at the possible consequences for the module user interface, before we actually made clear in which ways the module driver interface could be changed. There are of course dependencies between the interfaces, but you can change a lot in one before you have to modify another. > Furthermore, use counts can have several meanings: > - indicate when it's convenient for the module to be removed > - indicate when it's possible for the module to be removed > - indicate what consequences the user may experience if > trying to remove now (e.g. blocking) > > The "old" module count was a bit of a mixture of the first two. > The "new" count is clearer. There is/was no count for the first?! Which "new" count? bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 17:22 ` Werner Almesberger 2003-02-19 3:30 ` Rusty Russell 2003-02-20 0:40 ` Roman Zippel @ 2003-02-23 23:34 ` Kevin O'Connor 2003-02-24 12:14 ` Roman Zippel 2 siblings, 1 reply; 73+ messages in thread From: Kevin O'Connor @ 2003-02-23 23:34 UTC (permalink / raw) To: Werner Almesberger; +Cc: Roman Zippel, Rusty Russell, linux-kernel On Tue, Feb 18, 2003 at 02:22:57PM -0300, Werner Almesberger wrote: > Next round: possible remedies and their side-effects. As > usual, if you disagree with something, please holler. > > If yes, let's look at possible (and not overly insane) solutions, > using remove_proc_entry as a case study: Hi Werner, Thanks for putting together this list; I've been following this thread for a while and it seems that the discussion always deteriorates into semantics. Anyway, I think there is an eighth possible solution to the list you proposed. Just to be specific, the requirements for the proc entry stuff are: a) a mechanism needs to be defined to indicate that the entry is no longer needed (something that starts the delete process -- ie, remove_proc_entry), b) the code must conform to a system that ensures de->data will not be used after the "release" code is executed, and c) the "release" code must be eventually executed. Assuming these requirements are really requirements (your options 1 and 4 don't seem to meet these) then an "eighth" solution is: 8) Have the unregister code (remove_proc_entry) set an external flag (eg, de->data_is_there), and update all users of de->data to check the flag before following the pointer. Option 8 may not qualify as "sane", but I think it is important to add it because it is what the module code is currently using. Thus, one need not look at the module stuff as a "special case", but as a general (if complicated) resource management solution. Finally, one could probably apply any of the "options" for any dynamically allocated memory. It is interesting that Linus seems to prefer option 2/7 (from the recent kobject work and CodingStyle). If I've missed something, please let me know. -Kevin > 1) still don't kfree, and leave it to the user to somehow > minimize the damage. (Good luck :-) > > 2) add a callback that is invoked when the proc entry gets > deleted. (This callback may be called before remove_proc_entry > completes.) Problem: unload/return race for modules. > > 3) change remove_proc_entry or add remove_proc_entry_wait that > works like remove_proc_entry, but blocks until the entry is > deleted. Problem: may sleep "forever". > > 4) make remove_proc_entry return an indication of whether the > entry was successfully removed or not. Problem: if it > wasn't, what can we do then ? > > 5) like above, but don't remove the entry if we can't do it > immediately. Problem: there's no notification for when we > should try again, so we'd have to poll. > > 6) export the lookup mechanism, and let the caller poll for > removal. Problem: races with creation of a new entry with > the same name. > > 7) transfer ownership of de->data to procfs, and set some > (possibly configurable) destruction policy (e.g. always > kfree, or such). Similar to 2), but less flexible. -- ------------------------------------------------------------------------ | Kevin O'Connor "BTW, IMHO we need a FAQ for | | kevin@koconnor.net 'IMHO', 'FAQ', 'BTW', etc. !" | ------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-23 23:34 ` Kevin O'Connor @ 2003-02-24 12:14 ` Roman Zippel 0 siblings, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-02-24 12:14 UTC (permalink / raw) To: Kevin O'Connor; +Cc: Werner Almesberger, Rusty Russell, linux-kernel Hi, On Sun, 23 Feb 2003, Kevin O'Connor wrote: > 8) Have the unregister code (remove_proc_entry) set an external flag (eg, > de->data_is_there), and update all users of de->data to check the flag > before following the pointer. > > Option 8 may not qualify as "sane", but I think it is important to add it > because it is what the module code is currently using. Thus, one need not > look at the module stuff as a "special case", but as a general (if > complicated) resource management solution. Yes, it's another possible solution, but it has the same problem as the current module locking - increased locking complexity. Such flag actually exists already ("deleted"), but no user can use it currently, because the read/write functions don't have the proc entry argument. Even if they could use it, switching this flag isn't enough remove_proc_entry also had to synchronize with active users, so users had to take some lock just to read the data, where a simple reference was sufficient before. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 7:20 ` Werner Almesberger 2003-02-18 12:06 ` Roman Zippel @ 2003-02-18 12:35 ` Roman Zippel 2003-02-18 14:14 ` Werner Almesberger 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-18 12:35 UTC (permalink / raw) To: Werner Almesberger; +Cc: linux-kernel Hi, On Tue, 18 Feb 2003, Werner Almesberger wrote: > I don't think we'll make much progress if we keep on mixing issues > of interface correctness, current module constraints, and possible > module interface changes, all that with performance considerations > and minimum invasive migration plans thrown in. So I'd suggest the > following sequence: Um, another point, let's ignore "minimum invasive migration plans", if we found a good solution, we can still figure out how to get there smoothly, so this shouldn't be a primary concern. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 12:35 ` Roman Zippel @ 2003-02-18 14:14 ` Werner Almesberger 0 siblings, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-18 14:14 UTC (permalink / raw) To: Roman Zippel; +Cc: linux-kernel Roman Zippel wrote: > Um, another point, let's ignore "minimum invasive migration plans", if we > found a good solution, we can still figure out how to get there smoothly, > so this shouldn't be a primary concern. In my next posting, I just wrote (this very minute) "For brainstorming's sake, let's not worry about backward-compatibility for now:" So I guess I don't need this disclaimer then ;-) - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-18 1:18 ` Werner Almesberger 2003-02-18 4:54 ` Rusty Russell @ 2003-02-19 1:48 ` Roman Zippel 2003-02-19 2:27 ` Werner Almesberger 1 sibling, 1 reply; 73+ messages in thread From: Roman Zippel @ 2003-02-19 1:48 UTC (permalink / raw) To: Werner Almesberger; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Hi, On Mon, 17 Feb 2003, Werner Almesberger wrote: > > failure: if the object is still busy, we just return -EBUSY. This is > > simple, but this doesn't work for modules, since during module exit you > > can't fail anymore. > > That's a modules API problem. And yes, I think modules should > eventually be able to say that they're busy. As Rusty already correctly said, this is currently not possible with the current modules API. On the other hand he also jumps very quickly to conclusions, so let's look at all the options first and then decide how trivial they are. Anyway, for now just keep this option in mind and let's look at the other options, which don't require a module API change. > By the way, a loong time ago, in the modules thread, I suggested > a "decrement_module_count_and_return" function [1]. Such a > construct would be useful in this specific case. This would just be an optimization(?) for the module count, it doesn't change the general problem and is not useful as generic solution, so I'd rather put it back for now. > > failure doesn't work with modules, so that only leaves the module count. > > And how would you ensure correct access to static data in the > absence of modules ? Any solution that _requires_ a module count > looks highly suspicious to me. In that case it would be kernel memory, which cannot be freed, so it will not go away and requires no module count. > Likewise, possibly dynamically allocated data that is synchronized > by the caller, e.g. "user" in "struct proc_dir_entry". user? > > The last solution sounds complicated, but exactly this is done for > > filesystems and we didn't really get rid of the second reference count, we > > just moved it somewhere else, where it hurts least. > > Hmm, I'm confused. With "filesystem", do you mean the file system > driver per se (e.g. "ext3"), or a specific instance of such a file > system (e.g. /dev/hda1 mounted on /) ? A generic file system as it's registered via register_filesystem. A file system exports lots of dynamic and static data and the only (used) module pointer you will find is in struct file_system_type (the owner pointer in struct file_operations is not used by any standard fs). It's interesting to see how file systems keep the module business at a minimum (no try_module_get() is still cheaper than the cheapest try_module_get()). Um, it's getting late and I just played too much with procfs to find a sensible solution. Below is an experimental patch to add callback which would allow it to safely remove user data. Very lightly tested, so no guarantees. bye, Roman Index: include/linux/proc_fs.h =================================================================== RCS file: /usr/src/cvsroot/linux-2.5/include/linux/proc_fs.h,v retrieving revision 1.1.1.9 diff -u -p -r1.1.1.9 proc_fs.h --- include/linux/proc_fs.h 27 Aug 2002 23:38:02 -0000 1.1.1.9 +++ include/linux/proc_fs.h 19 Feb 2003 00:46:28 -0000 @@ -69,6 +69,7 @@ struct proc_dir_entry { void *data; read_proc_t *read_proc; write_proc_t *write_proc; + void (*release_proc)(struct proc_dir_entry *); atomic_t count; /* use count */ int deleted; /* delete flag */ kdev_t rdev; Index: fs/proc/generic.c =================================================================== RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/generic.c,v retrieving revision 1.1.1.12 diff -u -p -r1.1.1.12 generic.c --- fs/proc/generic.c 15 Feb 2003 01:25:07 -0000 1.1.1.12 +++ fs/proc/generic.c 19 Feb 2003 01:40:36 -0000 @@ -639,6 +639,8 @@ void free_proc_entry(struct proc_dir_ent if (ino < PROC_DYNAMIC_FIRST || ino >= PROC_DYNAMIC_FIRST+PROC_NDYNAMIC) return; + if (de->release_proc) + de->release_proc(de); if (S_ISLNK(de->mode) && de->data) kfree(de->data); kfree(de); @@ -655,6 +657,7 @@ void remove_proc_entry(const char *name, const char *fn = name; int len; + lock_kernel(); if (!parent && xlate_proc_name(name, &parent, &fn) != 0) goto out; len = strlen(fn); @@ -680,5 +683,6 @@ void remove_proc_entry(const char *name, break; } out: + unlock_kernel(); return; } Index: fs/proc/inode.c =================================================================== RCS file: /usr/src/cvsroot/linux-2.5/fs/proc/inode.c,v retrieving revision 1.1.1.13 diff -u -p -r1.1.1.13 inode.c --- fs/proc/inode.c 1 Feb 2003 19:59:21 -0000 1.1.1.13 +++ fs/proc/inode.c 19 Feb 2003 01:40:36 -0000 @@ -23,13 +23,6 @@ extern void free_proc_entry(struct proc_dir_entry *); -static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de) -{ - if (de) - atomic_inc(&de->count); - return de; -} - /* * Decrements the use count and checks for deferred deletion. */ @@ -44,11 +37,13 @@ static void de_put(struct proc_dir_entry } if (atomic_dec_and_test(&de->count)) { + struct module *owner = de->owner; if (de->deleted) { printk("de_put: deferred delete of %s\n", de->name); free_proc_entry(de); } + module_put(owner); } unlock_kernel(); } @@ -71,11 +66,8 @@ static void proc_delete_inode(struct ino /* Let go of any associated proc directory entry */ de = PROC_I(inode)->pde; - if (de) { - if (de->owner) - module_put(de->owner); + if (de) de_put(de); - } } struct vfsmount *proc_mnt; @@ -178,44 +170,36 @@ struct inode * proc_get_inode(struct sup /* * Increment the use count so the dir entry can't disappear. */ - de_get(de); -#if 1 -/* shouldn't ever happen */ -if (de && de->deleted) -printk("proc_iget: using deleted entry %s, count=%d\n", de->name, atomic_read(&de->count)); -#endif + if (!atomic_read(&de->count) && !try_module_get(de->owner)) + return NULL; + atomic_inc(&de->count); inode = iget(sb, ino); if (!inode) goto out_fail; - + PROC_I(inode)->pde = de; - if (de) { - if (de->mode) { - inode->i_mode = de->mode; - inode->i_uid = de->uid; - inode->i_gid = de->gid; - } - if (de->size) - inode->i_size = de->size; - if (de->nlink) - inode->i_nlink = de->nlink; - if (!try_module_get(de->owner)) - goto out_fail; - if (de->proc_iops) - inode->i_op = de->proc_iops; - if (de->proc_fops) - inode->i_fop = de->proc_fops; - else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode)) - init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev)); + if (de->mode) { + inode->i_mode = de->mode; + inode->i_uid = de->uid; + inode->i_gid = de->gid; } + if (de->size) + inode->i_size = de->size; + if (de->nlink) + inode->i_nlink = de->nlink; + if (de->proc_iops) + inode->i_op = de->proc_iops; + if (de->proc_fops) + inode->i_fop = de->proc_fops; + else if (S_ISBLK(de->mode)||S_ISCHR(de->mode)||S_ISFIFO(de->mode)) + init_special_inode(inode,de->mode,kdev_t_to_nr(de->rdev)); -out: return inode; out_fail: de_put(de); - goto out; + return NULL; } int proc_fill_super(struct super_block *s, void *data, int silent) ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Is an alternative module interface needed/possible? 2003-02-19 1:48 ` Roman Zippel @ 2003-02-19 2:27 ` Werner Almesberger 0 siblings, 0 replies; 73+ messages in thread From: Werner Almesberger @ 2003-02-19 2:27 UTC (permalink / raw) To: Roman Zippel; +Cc: Rusty Russell, kuznet, kronos, linux-kernel Roman Zippel wrote: > Anyway, for now just keep this option in mind and let's look at the other > options, which don't require a module API change. Yes, we can look at the modules case at the end. > In that case it would be kernel memory, which cannot be freed, so it will > not go away and requires no module count. kfree isn't the only way to make data unusable. Remember the "my_state" example. > > Likewise, possibly dynamically allocated data that is synchronized > > by the caller, e.g. "user" in "struct proc_dir_entry". > > user? "data", sorry. I always call this kind of argument something like "user" in my code ... > A generic file system as it's registered via register_filesystem. Okay, I'll have a look at that. > Um, it's getting late and I just played too much with procfs to find a > sensible solution. Below is an experimental patch to add callback which > would allow it to safely remove user data. Very lightly tested, so no > guarantees. Yep, that's the kind of callback I had in mind. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-16 1:12 ` Rusty Russell 2003-01-16 2:42 ` Werner Almesberger @ 2003-01-16 13:44 ` Roman Zippel 1 sibling, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-01-16 13:44 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, kronos, linux-kernel Hi, Rusty Russell wrote: > Deprecating every module, and rewriting their initialization routines > is ambitious beyond the scale of anything you have mentioned. Not > that 90% of the kernel code couldn't use a damn good spring cleaning, > but I'm not prepared to make such a change personally. The transition would be rather painless: if (mod->new_exit) { res = -EBUSY; if (force || !mod_use_count(mod)) res = mod->new_exit(); } else { res = mod_try_exit(); if (!res && mod->old_exit) { mod->old_exit(); } Yes, there is a small race, but at least it's nonfatal, as the module has to repeat the module count test or is protected otherwise. Now we can start fixing interfaces and exit functions can still call mod_try_exit() before calling old style interfaces. > So we go from: > > int init(void) > { > if (!register_foo(&foo)) > return -err; > if (!register_bar(&bar)) { > unregister_foo(&foo); > return -err; > } > return 0; > } > > void fini(void) > { > unregister_foo(&foo); > unregister_bar(&bar); > } > > to: int init(void) { if (register_foo(&foo)) return -err; if (register_bar(&bar)) return -err; return 0; } void fini(void) { if (unregister_bar(&bar)) return -err; if (unregister_foo(&foo)) return -err; return 0; } Hmm, looks as simple if not simpler to me. > Something like this? > > static int i_am_live; > static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; > > /* This is our registered function. */ > static int foo_function(void *somedata) > { > int live; > > spin_lock(&my_lock); > live = i_am_live; > spin_unlock(&my_lock); > if (!live) > return -EIGNOREME???; > ... > } If the exit function starts the shutdown, there is no going back anymore. Let's take loop.c as example. It can either deconfigure the devices itself (via loop_clr_fd()) or it let's the user do it and only removes unused devices and prevents any new loop_set_fd() to succeed. In any case there will be no deadlock. > > Hmm, what makes security modules (what exactly do you mean > > by that ?) special ? > > On a busy system, they're never not being used. Your unload routine > would always fail. Same with netfilter modules. int unregister(foo) { lock(); list_del_init(&foo->list); unlock(); return foo->usecount ? -EBUSY : 0; } This makes sure no new user will start using it and at some point it has to return zero. > The current scheme is clean: it's two-stage delete with a nice helper > function "try_module_get()" which tells you when it's going away, and > no requirement that modules actually implement two-stage delete > themselves. The patch to mirror this in two-stage init was posted > yesterday, as well. The current scheme is the reserve/activate scheme in disguise and it sucks as much. > Unfortunately, I don't have the patience to explain this once for > every kernel developer. Well, I'd be happy to see an implementation that not only works for the most simplest cases. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC] Migrating net/sched to new module interface 2003-01-15 8:16 ` Rusty Russell 2003-01-15 9:33 ` Werner Almesberger @ 2003-01-15 17:04 ` Roman Zippel 1 sibling, 0 replies; 73+ messages in thread From: Roman Zippel @ 2003-01-15 17:04 UTC (permalink / raw) To: Rusty Russell; +Cc: Werner Almesberger, kuznet, kronos, linux-kernel Hi, Rusty Russell wrote: > 2) It's bad enough to force the interfaces to change: at least the > primitive they are to use is one many of them are already using, > and is very simple to understand. They are indeed simple, but only because it's impossible to implement anything more complex. An example: A "rmmod -w loop" will currently deadlock on a busy loop module. Could you please explain, how it will be possible to force a safe removal of the loop module? > PS. The *implementation* flaw in your scheme: someone starts using a > module as you try to deregister it. Either you re-register the > module (ie. you can never unload security modules), or you leave > it half unloaded (even worse). What is the problem with a half unloaded module? Only the module knows which interfaces it can safely remove to prevent new users, afterwards it only has to wait for remaining user to leave to complete the cleanup. BTW this also solves nicely the module init race. bye, Roman ^ permalink raw reply [flat|nested] 73+ messages in thread
end of thread, other threads:[~2003-02-27 14:23 UTC | newest] Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-01-02 22:50 [RFC] Migrating net/sched to new module interface Kronos 2003-01-03 5:10 ` Rusty Russell 2003-01-03 8:37 ` David S. Miller 2003-01-04 6:09 ` Rusty Russell 2003-01-04 16:21 ` Kronos 2003-01-13 22:32 ` kuznet 2003-01-13 23:23 ` Max Krasnyansky 2003-01-14 17:49 ` Kronos 2003-01-15 0:21 ` Roman Zippel 2003-01-15 1:19 ` kuznet 2003-01-15 7:31 ` Werner Almesberger 2003-01-15 8:16 ` Rusty Russell 2003-01-15 9:33 ` Werner Almesberger 2003-01-16 1:12 ` Rusty Russell 2003-01-16 2:42 ` Werner Almesberger 2003-01-16 3:31 ` Rusty Russell 2003-01-16 4:20 ` Werner Almesberger 2003-01-16 4:25 ` David S. Miller 2003-01-16 4:49 ` Werner Almesberger 2003-01-16 16:05 ` Roman Zippel 2003-01-16 18:15 ` Roman Zippel 2003-01-16 18:58 ` Werner Almesberger 2003-01-16 23:53 ` Roman Zippel 2003-01-17 1:04 ` Greg KH 2003-01-17 2:27 ` Rusty Russell 2003-01-17 21:40 ` Roman Zippel 2003-02-13 23:16 ` Werner Almesberger 2003-02-14 1:57 ` Rusty Russell 2003-02-14 3:44 ` Werner Almesberger 2003-02-14 11:16 ` Roman Zippel 2003-02-14 12:04 ` Rusty Russell 2003-02-14 12:49 ` Roman Zippel 2003-02-17 1:59 ` Rusty Russell 2003-02-17 10:53 ` Roman Zippel 2003-02-17 23:31 ` Rusty Russell 2003-02-18 12:26 ` Roman Zippel 2003-02-14 13:21 ` Roman Zippel 2003-02-14 13:53 ` Werner Almesberger 2003-02-14 14:24 ` Roman Zippel 2003-02-14 18:30 ` Werner Almesberger 2003-02-14 20:09 ` Roman Zippel 2003-02-15 0:12 ` Werner Almesberger 2003-02-15 0:51 ` Roman Zippel 2003-02-15 2:28 ` Werner Almesberger 2003-02-15 23:20 ` Roman Zippel 2003-02-17 17:04 ` Werner Almesberger 2003-02-17 23:09 ` [RFC] Is an alternative module interface needed/possible? Roman Zippel 2003-02-18 1:18 ` Werner Almesberger 2003-02-18 4:54 ` Rusty Russell 2003-02-18 7:20 ` Werner Almesberger 2003-02-18 12:06 ` Roman Zippel 2003-02-18 14:12 ` Werner Almesberger 2003-02-18 12:45 ` Thomas Molina 2003-02-18 17:22 ` Werner Almesberger 2003-02-19 3:30 ` Rusty Russell 2003-02-19 4:11 ` Werner Almesberger 2003-02-19 23:38 ` Rusty Russell 2003-02-20 9:46 ` Roman Zippel 2003-02-20 0:40 ` Roman Zippel 2003-02-20 2:17 ` Werner Almesberger 2003-02-23 16:02 ` Roman Zippel 2003-02-26 23:26 ` Werner Almesberger 2003-02-27 12:34 ` Roman Zippel 2003-02-27 13:20 ` Werner Almesberger 2003-02-27 14:33 ` Roman Zippel 2003-02-23 23:34 ` Kevin O'Connor 2003-02-24 12:14 ` Roman Zippel 2003-02-18 12:35 ` Roman Zippel 2003-02-18 14:14 ` Werner Almesberger 2003-02-19 1:48 ` Roman Zippel 2003-02-19 2:27 ` Werner Almesberger 2003-01-16 13:44 ` [RFC] Migrating net/sched to new module interface Roman Zippel 2003-01-15 17:04 ` Roman Zippel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).