On 04/12/2010 08:49 PM, Chris Mason wrote:
> I have a microbenchmark to test how quickly we can post and wait in
> bulk. With this change, semtimedop is able to do more than twice
> as much work in the same run. On a large numa machine, it brings
> the IPC lock system time (reported by perf) down from 85% to 15%.
>

Looking at the current code:
- update_queue() can be O(N^2) if only some of the waiting tasks are
  woken up. More precisely: every task that was not woken up is
  rescanned each time a task that can be woken up is found.
- Your test app tests the best case for the current code: you wake up
  the tasks in the same order in which they called semop().
  If you invert the order (i.e. worklist_add() adds to the head instead
  of the tail), I would expect the current code to perform even worse.

The O(N^2) is simple to fix, I've attached a patch.
For your micro-benchmark, the patch does not change much: you wake up
in order, thus the current code does not misbehave.

Do you know how Oracle wakes up the tasks? FIFO, LIFO, unordered?

>     while(unlikely(error == IN_WAKEUP)) {
>         cpu_relax();
>         error = queue.status;
>     }
>
> -   if (error != -EINTR) {
> +   /*
> +    * we are lock free right here, and we could have timed out or
> +    * gotten a signal, so we need to be really careful with how we
> +    * play with queue.status. It has three possible states:
> +    *
> +    * -EINTR, which means nobody has changed it since we slept. This
> +    * means we woke up on our own.
> +    *
> +    * IN_WAKEUP, someone is currently waking us up. We need to loop
> +    * here until they change it to the operation error value. If
> +    * we don't loop, our process could exit before they are done waking us
> +    *
> +    * operation error value: we've been properly woken up and can exit
> +    * at any time.
> +    *
> +    * If queue.status is currently -EINTR, we are still being processed
> +    * by the semtimedop core. Someone either has us on a list head
> +    * or is currently poking our queue struct. We need to find that
> +    * reference and remove it, which is what remove_queue_from_lists
> +    * does.
> +    *
> +    * We always check for both -EINTR and IN_WAKEUP because we have no
> +    * locks held. Someone could change us from -EINTR to IN_WAKEUP at
> +    * any time.
> +    */
> +   if (error != -EINTR && error != IN_WAKEUP) {
>         /* fast path: update_queue already obtained all requested
>          * resources */

No: the code accesses a local variable. The loop above the comment
guarantees that error can't be IN_WAKEUP at this point.

> +
> +out_putref:
> +   sem_putref(sma);
> +   goto out_free;
>

Is it possible to move the sem_putref into wakeup_sem_queue()?
Right now, the exit path of semtimedop doesn't touch the spinlock.
You remove that optimization.

--
    Manfred
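
The O(N^2) behaviour described above for update_queue() can be
illustrated with a toy model. This is only a sketch under simplifying
assumptions, not the kernel code and not the attached patch: struct
waiter, fill_demo(), scan_restart(), scan_single_pass(), count_woken()
and compare_policies() are hypothetical names, and every pending
operation is assumed to be a plain decrement of a single semaphore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy waiter: wants to decrement the semaphore by `need`. */
struct waiter {
    int need;   /* amount this waiter wants to take */
    int woken;  /* set to 1 once the request has been satisfied */
};

/* Put n/2 waiters that can never proceed in front of the rest,
 * which need 1 each. */
void fill_demo(struct waiter *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        w[i].need = (i < n / 2) ? 1000000 : 1;
        w[i].woken = 0;
    }
}

/* Restart policy: go back to the head after every wake-up.
 * Returns how many times a still-pending waiter was examined. */
long scan_restart(struct waiter *w, size_t n, int value)
{
    long attempts = 0;
    size_t i = 0;

    while (i < n) {
        if (w[i].woken) {       /* already off the queue */
            i++;
            continue;
        }
        attempts++;
        if (w[i].need <= value) {
            value -= w[i].need;
            w[i].woken = 1;
            i = 0;              /* rescan everything that is still blocked */
        } else {
            i++;
        }
    }
    return attempts;
}

/* Single-pass policy: keep going after a wake-up. This is only valid in
 * the toy model, where every operation is a decrement and a wake-up can
 * therefore never unblock an earlier waiter. */
long scan_single_pass(struct waiter *w, size_t n, int value)
{
    long attempts = 0;

    for (size_t i = 0; i < n; i++) {
        attempts++;
        if (w[i].need <= value) {
            value -= w[i].need;
            w[i].woken = 1;
        }
    }
    return attempts;
}

/* Number of waiters that were woken. */
size_t count_woken(const struct waiter *w, size_t n)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++)
        woken += (size_t)w[i].woken;
    return woken;
}

/* Sanity check on 8 waiters: same wake-ups, different amount of work. */
void compare_policies(void)
{
    struct waiter a[8], b[8];

    fill_demo(a, 8);
    fill_demo(b, 8);
    long restart = scan_restart(a, 8, 4);
    long single = scan_single_pass(b, 8, 4);

    assert(count_woken(a, 8) == count_woken(b, 8));
    assert(single <= restart);
}
```

With 8 waiters and an initial value of 4, the restart policy examines
pending waiters 24 times and the single pass 8 times, while both wake
the same 4 tasks. In this model the gap grows roughly with the square
of the queue length when a fixed fraction of the waiters stays blocked.
Real semaphore operations can also increment values or wait for zero,
so a real fix has to decide when a restart is actually required.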