On Mon, 12 Aug 2013 10:18:03 +0800 Shaohua Li wrote:

> Neil,
>
> This is another attempt to make raid5 stripe handling multi-threaded.
> The recent workqueue improvements for unbound workqueues look very
> promising for the raid5 usage; the details are in the first patch.
>
> The patches are against your tree with 'raid5: make release_stripe
> lockless' and 'raid5: fix stripe release order' applied, but without
> 'raid5: create multiple threads to handle stripes'.
>
> My test setup has 7 PCIe SSDs, chunksize 8k, stripe_cache_size 2048.
> When multi-threading is enabled, group_thread_cnt is set to 8.
>
> randwrite   throughput (ratio)    request size (sectors)
>             unpatched/patched     unpatched/patched
> 4k          1/5.9                 8/8
> 8k          1/5.5                 16/13
> 16k         1/4.8                 16/13
> 32k         1/4.6                 18/14
> 64k         1/4.7                 17/13
> 128k        1/4.2                 23/16
> 256k        1/3.5                 41/21
> 512k        1/3                   75/28
> 1M          1/2.6                 134/34
>
> For example, in the 1M randwrite test the patched kernel is 2.6x
> faster, but the average request size sent to each disk drops from 134
> sectors to 34 sectors.
>
> Currently the biggest problem is that the request size drops, because
> multiple threads are dispatching requests. This suggests that
> multi-threading might not be right for every setup, so I disable it by
> default in this version. But since throughput is largely improved, I
> don't think this is a blocking issue. I'm still working on improving
> it, perhaps by scheduling the stripes from one block plug as a whole.
>
> Thanks,
> Shaohua

Thanks. I like this a lot more than the previous version.

It doesn't seem to apply exactly to my current 'for-next' - probably
because I have moved things around and have a different set of patches
applied :-(
If you could rebase it on my current for-next, I'll apply it and
probably submit it for the next merge window.

A couple of little changes I'd like made:

1/ alloc_thread_groups needs to use GFP_NOIO, at least when called from
   raid5_store_group_thread_cnt.
   At this point IO to the RAID5 is stalled, so if the malloc needs to
   free memory it might wait for writeout to the RAID5, and so deadlock.
   GFP_NOIO prevents that.

2/ Could we move the

   +	if (!cpu_online(cpu)) {
   +		cpu = cpumask_any(cpu_online_mask);
   +		sh->cpu = cpu;
   +	}

   inside raid5_wakeup_stripe_thread()? It isn't a perfect fit, but I
   think it is a better place for it. It could check
   list_empty(&sh->lru) and, if that is empty, add the stripe to the
   appropriate group->handle_list. The code in do_release_stripe would
   become

   	else {
   		clear_bit(STRIPE_DELAYED, &sh->state);
   		clear_bit(STRIPE_BIT_DELAY, &sh->state);
   +		if (conf->worker_cnt_per_group == 0) {
   +			list_add_tail(&sh->lru, &conf->handle_list);
   +		} else {
   +			raid5_wakeup_stripe_thread(sh);
   +			return;
   +		}
   	}
   	md_wakeup_thread(conf->mddev->thread);

   ??

NeilBrown