On Fri, Mar 11 2016, NeilBrown wrote: > On Fri, Mar 11 2016, Jan Kara wrote: > >> Currently DAX page fault locking is racy. >> >> CPU0 (write fault) CPU1 (read fault) >> >> __dax_fault() __dax_fault() >> get_block(inode, block, &bh, 0) -> not mapped >> get_block(inode, block, &bh, 0) >> -> not mapped >> if (!buffer_mapped(&bh)) >> if (vmf->flags & FAULT_FLAG_WRITE) >> get_block(inode, block, &bh, 1) -> allocates blocks >> if (page) -> no >> if (!buffer_mapped(&bh)) >> if (vmf->flags & FAULT_FLAG_WRITE) { >> } else { >> dax_load_hole(); >> } >> dax_insert_mapping() >> >> And we are in a situation where we fail in dax_radix_entry() with -EIO. >> >> Another problem with the current DAX page fault locking is that there is >> no race-free way to clear dirty tag in the radix tree. We can always >> end up with clean radix tree and dirty data in CPU cache. >> >> We fix the first problem by introducing locking of exceptional radix >> tree entries in DAX mappings acting very similarly to page lock and thus >> synchronizing properly faults against the same mapping index. The same >> lock can later be used to avoid races when clearing radix tree dirty >> tag. > > Hi, > I think the exception locking bits look good - I cannot comment on the > rest. > I looks like it was a good idea to bring the locking into dax.c instead > of trying to make it generic. > Actually ... I'm still bothered by the exclusive waiting. If an entry is locked and there are two threads in dax_pfn_mkwrite() then one would be woken up when the entry is unlocked and it will just set the TAG_DIRTY flag and then continue without ever waking the next waiter on the wait queue. I *think* that any thread which gets an exclusive wakeup is responsible for performing another wakeup. In this case it must either lock the slot, or call __wakeup. That means: grab_mapping_entry needs to call wakeup: if radix_tree_preload() fails if radix_tree_insert fails other than with -EEXIST if a valid page was found dax_delete_mapping_entry needs to call wakeup if the fail case, though as that isn't expect (WARN_ON_ONCE) it should be a problem not to wakeup here dax_pfn_mkwrite needs to call wakeup unconditionally Am I missing something? Thanks, NeilBrown