On Mon, 12 Aug 2013 10:18:03 +0800 Shaohua Li wrote:

> Neil,
>
> This is another attempt to make raid5 stripe handling multi-threaded.
> The recent workqueue improvements for unbound workqueues look very
> promising for the raid5 usage; the details are in the first patch.
>
> The patches are against your tree with 'raid5: make release_stripe
> lockless' and 'raid5: fix stripe release order' applied, but without
> 'raid5: create multiple threads to handle stripes'.
>
> My test setup has 7 PCIe SSDs, chunksize 8k, stripe_cache_size 2048.
> When multi-threading is enabled, group_thread_cnt is set to 8.
>
> randwrite   throughput (ratio)    request size (sectors)
>             unpatched/patched     unpatched/patched
> 4k          1/5.9                 8/8
> 8k          1/5.5                 16/13
> 16k         1/4.8                 16/13
> 32k         1/4.6                 18/14
> 64k         1/4.7                 17/13
> 128k        1/4.2                 23/16
> 256k        1/3.5                 41/21
> 512k        1/3                   75/28
> 1M          1/2.6                 134/34
>
> For example, in the 1M randwrite test the patched kernel is 2.6x
> faster, but the average request size sent to each disk drops from 134
> sectors to 34 sectors.
>
> Currently the biggest problem is that the request size drops, because
> multiple threads are dispatching requests. This suggests that
> multi-threading might not be right for every setup, so I disable it by
> default in this version. But since throughput is largely improved, I
> don't think this is a blocking issue. I'm still working on improving
> it, perhaps by scheduling the stripes from one block plug as a whole.
>
> Thanks,
> Shaohua

Thanks. I like this a lot more than the previous version.

It doesn't seem to apply exactly to my current 'for-next' - probably
because I have moved things around and have a different set of patches
applied :-(
If you could rebase it on my current for-next, I'll apply it and
probably submit it for the next merge window.

A couple of little changes I'd like made:

1/ alloc_thread_groups needs to use GFP_NOIO, at least when called from
   raid5_store_group_thread_cnt.
   At this point IO to the RAID5 is stalled, so if the malloc needs to
   free memory it might wait for writeout to the RAID5, and so deadlock.
   GFP_NOIO prevents that.

2/ Could we move the

   +	if (!cpu_online(cpu)) {
   +		cpu = cpumask_any(cpu_online_mask);
   +		sh->cpu = cpu;
   +	}

   inside raid5_wakeup_stripe_thread()? It isn't a perfect fit, but I
   think it is a better place for it. It could check
   list_empty(&sh->lru) and, if that is empty, add the stripe to the
   appropriate group->handle_list. The code in do_release_stripe would
   become

   	else {
   		clear_bit(STRIPE_DELAYED, &sh->state);
   		clear_bit(STRIPE_BIT_DELAY, &sh->state);
   +		if (conf->worker_cnt_per_group == 0) {
   +			list_add_tail(&sh->lru, &conf->handle_list);
   +		} else {
   +			raid5_wakeup_stripe_thread(sh);
   +			return;
   +		}
   	}
   	md_wakeup_thread(conf->mddev->thread);

   ??

NeilBrown