From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zeniv.linux.org.uk ([195.92.253.2]:59200 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753311AbcAXEFc (ORCPT ); Sat, 23 Jan 2016 23:05:32 -0500 Date: Sun, 24 Jan 2016 04:05:29 +0000 From: Al Viro To: Mike Marshall Cc: Linus Torvalds , linux-fsdevel Subject: Re: Orangefs ABI documentation Message-ID: <20160124040529.GX17997@ZenIV.linux.org.uk> References: <20160122200442.GF17997@ZenIV.linux.org.uk> <20160123001202.GJ17997@ZenIV.linux.org.uk> <20160123012808.GK17997@ZenIV.linux.org.uk> <20160123191055.GN17997@ZenIV.linux.org.uk> <20160123214006.GO17997@ZenIV.linux.org.uk> <20160124001615.GT17997@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160124001615.GT17997@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sun, Jan 24, 2016 at 12:16:15AM +0000, Al Viro wrote: > On Sat, Jan 23, 2016 at 05:36:41PM -0500, Mike Marshall wrote: > > AV> IOW, would the following do the right thing? > > AV> That would've left us with only one caller of > > AV> handle_io_error()... > > > > It works. With your simplified code all > > the needed things still happen: complete and > > bufmap_put... > > > > I've never had an error there unless I forgot > > to turn on the client-core... > > > > You must be looking for a way to get rid of > > another macro ... > > That as well, but mostly I want to sort the situation with cancels out and > get a better grasp on when can that code be reached. BTW, an error at that > spot is trivial to arrange - just pass read() a destination with munmapped > page in the middle and it'll trigger just fine. IOW, > p = mmap(NULL, 65536, PROT_READ|PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > munmap(p + 16384, 16384); > read(fd, p, 65536); > with fd being a file on orangefs should step into that. Hmm... I've just realized that I don't understand why you are waiting in the orangefs_devreq_write_iter() at all. After all, you've reserved the slot in wait_for_direct_io() and do not release it until right after you've done complete(&op->done). Why should server block while your read(2) copies the data from bufmap to its real destination? After all, any further request hitting the same slot won't come until the slot is released anyway, right? What (and why) would the server be doing to that slot in the meanwhile? It's already copied whatever data it was going to copy as part of that file_read... Speaking of your slot allocator - surely it would be better to maintain the count of unused slots there? For example, replace that waitq with semaphore, initialize it to number of slots, and have wait_for_a_slot() do if (down_interruptible(slargs->slots_sem)) { whine "interrupted" return -EINTR; } /* now we know that there's an empty slot for us */ spin_lock(slargs->slots_lock); n = find_first_zero_bit(slargs->slots_bitmap, slargs->slot_count); set_bit(slargs->slots_bitmap, n); spin_unlock(slargs->slots_lock); // or a lockless variant thereof, for that matter - just need to be // carefull with barriers return n; with put_back_slot() being spin_lock clear_bit spin_unlock up Wait a minute... What happens if you have a daemon disconnect while somebody is holding a bufmap slot? Each of those (as well as whoever's waiting for a slot to come free) is holding a reference to bufmap. devreq_mutex won't do a damn thing, contrary to the comment in ->release() - it's _not_ held across the duration of wait_for_direct_io(). We'll just decrement the refcount of bufmap, do nothing since it hasn't reached zero, proceed to mark all ops as purged, wake each service_operation() up and sod off. Now, the holders of those slots will call orangefs_get_bufmap_init(), get 1 (since we hadn't dropped the last reference yet - can it *ever* see 0 there, actually?) and return -EAGAIN. With wait_for_direct_io() noticing that, freeing the slot and going into restart. And if there was the only one, we are fine, but what if there were several? Suppose there had been two read() going on at the time of disconnect. The first one drops the slot. Good, refcount of bufmap is down to 1 now. And we go back to orangefs_bufmap_get(). Which calls orangefs_bufmap_ref(). Which sees that __orangefs_bufmap is still non-NULL. And promptly regains the reference and allocates the slot in that sucker. Then it's back to service_operations()... Which will check is_daemon_in_service(), possibly getting "yes, it is" (if the new one got already started). Ouch. The fun part in all of that is that new daemon won't be able to get new bufmap in place until all users of the old one run down. What a mess... Ho-hum... So we need to a) have ->release() wait for all slots to get freed b) have orangefs_bufmap_get() recognize that situation and _not_ get new reference to bufmap that is already going down. c) move the "wait for new daemon to come and install a new bufmap" to some point past the release of the slot - service_operation() is too early for that to work. Or am I missing something simple that makes the scenario above not go that way? Confused...