On Sat, Sep 30 2017, Xiao Ni wrote: > On 09/12/2017 09:49 AM, NeilBrown wrote: >> Hi, >> I looked again at the previous patch I posted which tried to mak >> md_update_sb() safe without taking reconfig_mutex, and realized that >> it had serious problems, particularly around devices being added or >> removed while the update was happening. > Could you explain this in detail? What's the serious problems? My patch allowed md_update_sb() to run without holding ->reconfig_mutex. md_update_sb() uses rdev_for_each() in several places, so either that list needs to be kept stable, or md_update_sb() needs to ensure it is very careful while walking the list. I couldn't just use a spinlock to protect the lists because one of the rdev_for_Each calls md_super_write() on some of the devices, which is not allowed while holding a spinlock. I decided it was too hard to make md_update_sb() walk the list in a fully say manner. NeilBrown > > Regards > Xiao >> >> So I decided to try a different approach, which is embodied in these >> patches. The md thread is now explicitly allowed to call >> md_update_sb() while some other thread holds the lock and >> waits for mddev_suspend() to complete. >> >> Please test these and confirm that they still address the problem you >> found. >> >> Thanks, >> NeilBrown >> >> --- >> >> NeilBrown (4): >> md: always hold reconfig_mutex when calling mddev_suspend() >> md: don't call bitmap_create() while array is quiesced. >> md: use mddev_suspend/resume instead of ->quiesce() >> md: allow metadata update while suspending. >> >> >> drivers/md/dm-raid.c | 5 ++++- >> drivers/md/md.c | 45 ++++++++++++++++++++++++++++++++------------- >> drivers/md/md.h | 6 ++++++ >> drivers/md/raid5-cache.c | 2 ++ >> 4 files changed, 44 insertions(+), 14 deletions(-) >> >> -- >> Signature >>