From: Kent Overstreet <koverstreet@google.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org, akpm@linux-foundation.org,
	Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH 17/21] Percpu tag allocator
Date: Wed, 15 May 2013 02:25:43 -0700
Message-ID: <20130515092543.GE16164@moria.home.lan>
In-Reply-To: <20130514134859.GA17587@redhat.com>

On Tue, May 14, 2013 at 03:48:59PM +0200, Oleg Nesterov wrote:
> On 05/13, Kent Overstreet wrote:
> >
> > +unsigned tag_alloc(struct tag_pool *pool, bool wait)
> > +{
> > +	struct tag_cpu_freelist *tags;
> > +	unsigned long flags;
> > +	unsigned ret;
> > +retry:
> > +	preempt_disable();
> > +	local_irq_save(flags);
> > +	tags = this_cpu_ptr(pool->tag_cpu);
> > +
> > +	while (!tags->nr_free) {
> > +		spin_lock(&pool->lock);
> > +
> > +		if (pool->nr_free)
> > +			move_tags(tags->free, &tags->nr_free,
> > +				  pool->free, &pool->nr_free,
> > +				  min(pool->nr_free, pool->watermark));
> > +		else if (wait) {
> > +			struct tag_waiter wait = { .task = current };
> > +
> > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > +			list_add(&wait.list, &pool->wait);
> > +
> > +			spin_unlock(&pool->lock);
> > +			local_irq_restore(flags);
> > +			preempt_enable();
> > +
> > +			schedule();
> > +			__set_current_state(TASK_RUNNING);
>
> schedule() always returns in TASK_RUNNING state
>
> > +
> > +			if (!list_empty_careful(&wait.list)) {
> > +				spin_lock_irqsave(&pool->lock, flags);
> > +				list_del_init(&wait.list);
> > +				spin_unlock_irqrestore(&pool->lock, flags);
>
> This is only theoretical, but racy.
>
> tag_free() does
>
> 	list_del_init(wait->list);
> 	/* WINDOW */
> 	wake_up_process(wait->task);
>
> in theory the caller of tag_alloc() can notice list_empty_careful(),
> return without taking pool->lock, exit, and free this task_struct.
>
> But the main problem is that it is not clear why this code reimplements
> add_wait_queue/wake_up_all, for what?

To save on locking... there's really no point in another lock for the
wait queue. Could just use the wait queue lock instead I suppose, like
wait_event_interruptible_locked()

(the extra spin_lock()/unlock() might not really cost anything but
nested irqsave()/restore() is ridiculously expensive, IME).

> I must admit, I do not understand what this code actually does ;)
> I didn't try to read it carefully though, but perhaps at least the
> changelog could explain more?

The changelog is admittedly terse, but that's basically all there is to
it -

Say you've got a device where you can have multiple outstanding
commands - you'll identify commands/responses by some integer (the
"tag").

Typically you won't get a full 64 bits for the tag, it might be 10 or 16
or 32 bits or whatever - and even if you could use raw pointers you
wouldn't really want to because then if the device gives you garbage
response you're derefing an untrusted pointer - you want to allocate tag
structures out of a fixed array so you can validate responses.

So you preallocate all your tag structures up front - now you can refer
to them by small fixed integers.

But if you want to be able to efficiently allocate from the same pool of
tags across multiple CPUs - well, that's what this code is for.
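To make the description above concrete, here is a minimal userspace sketch of the idea: a preallocated array of command slots indexed by small integer tags, a global freelist, and small per-CPU caches that refill from the global list in batches. It is not the code from the patch. All names (pool_init, pool_alloc, pool_free, pool_lookup, struct command) and the constants are hypothetical, the CPU is passed in as a plain index, and locking, IRQ disabling and the wait path are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS   8u      /* size of the preallocated array (assumed) */
#define NR_CPUS   2u      /* number of per-CPU caches (assumed) */
#define WATERMARK 2u      /* max tags moved per refill (assumed) */
#define TAG_FAIL  NR_TAGS /* out-of-range value meaning "no tag" */

struct command {          /* hypothetical per-tag structure */
	bool in_flight;
	int payload;
};

struct freelist {
	unsigned nr_free;
	unsigned free[NR_TAGS];
};

struct tag_pool {
	struct freelist global;         /* shared pool; lock omitted here */
	struct freelist cpu[NR_CPUS];   /* per-CPU caches */
	struct command cmds[NR_TAGS];   /* preallocated tag structures */
};

static void pool_init(struct tag_pool *pool)
{
	pool->global.nr_free = 0;
	for (unsigned i = 0; i < NR_CPUS; i++)
		pool->cpu[i].nr_free = 0;
	/* Every tag starts out on the global freelist. */
	for (unsigned tag = 0; tag < NR_TAGS; tag++) {
		pool->global.free[pool->global.nr_free++] = tag;
		pool->cmds[tag].in_flight = false;
		pool->cmds[tag].payload = 0;
	}
}

/* Move up to nr tags from src to dst. */
static void move_tags(struct freelist *dst, struct freelist *src, unsigned nr)
{
	while (nr-- && src->nr_free)
		dst->free[dst->nr_free++] = src->free[--src->nr_free];
}

/* Returns a tag, or TAG_FAIL if neither this cache nor the pool has one. */
static unsigned pool_alloc(struct tag_pool *pool, unsigned cpu)
{
	struct freelist *tags = &pool->cpu[cpu];

	if (!tags->nr_free)     /* slow path: refill from the global pool */
		move_tags(tags, &pool->global, WATERMARK);
	if (!tags->nr_free)
		return TAG_FAIL;

	unsigned tag = tags->free[--tags->nr_free];
	pool->cmds[tag].in_flight = true;
	return tag;
}

static void pool_free(struct tag_pool *pool, unsigned cpu, unsigned tag)
{
	struct freelist *tags = &pool->cpu[cpu];

	assert(tag < NR_TAGS && pool->cmds[tag].in_flight);
	pool->cmds[tag].in_flight = false;
	tags->free[tags->nr_free++] = tag;
	/* Spill extras back so one CPU does not hoard tags. */
	if (tags->nr_free > WATERMARK)
		move_tags(&pool->global, tags, tags->nr_free - WATERMARK);
}

/* Validate a tag reported by the device; NULL if it is garbage. */
static struct command *pool_lookup(struct tag_pool *pool, unsigned tag)
{
	if (tag >= NR_TAGS || !pool->cmds[tag].in_flight)
		return NULL;
	return &pool->cmds[tag];
}
```

In this sketch a caller would do something like `tag = pool_alloc(&pool, cpu)`, hand `tag` to the device, and on completion pass whatever the device reports through `pool_lookup()` before touching the command. A tag that is out of range or not in flight yields NULL rather than a dereference of an untrusted value. Note that tags cached on another CPU are not visible to an allocating CPU here; handling that case (stealing or sleeping until a tag is freed) is where the waiting logic discussed in this thread comes in.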