From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zeniv.linux.org.uk ([195.92.253.2]:58478 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932145AbcA3S1e (ORCPT ); Sat, 30 Jan 2016 13:27:34 -0500 Date: Sat, 30 Jan 2016 18:27:31 +0000 From: Al Viro To: Martin Brandenburg Cc: Mike Marshall , Linus Torvalds , linux-fsdevel Subject: Re: Orangefs ABI documentation Message-ID: <20160130182731.GF17997@ZenIV.linux.org.uk> References: <20160123012808.GK17997@ZenIV.linux.org.uk> <20160123191055.GN17997@ZenIV.linux.org.uk> <20160123214006.GO17997@ZenIV.linux.org.uk> <20160124001615.GT17997@ZenIV.linux.org.uk> <20160124040529.GX17997@ZenIV.linux.org.uk> <20160130173413.GE17997@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160130173413.GE17997@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Sat, Jan 30, 2016 at 05:34:13PM +0000, Al Viro wrote: > And again, how could > /* op uses shared memory */ > if (orangefs_get_bufmap_init() == 0) { > in service_operation() possibly be true, when we have > * op->uses_shared_memory just checked to be set > * all callers that set it (orangefs_readdir() and wait_for_direct_io() > having allocated a slot before calling service_operation() and not releasing > it until service_operation() returns > * __orangefs_bufmap not becoming NULL until all slots are freed and > * orangefs_get_bufmap_init() returning 1 unless __orangefs_bufmap is > NULL? > > AFAICS, that code (waiting for daemon to be restarted) is provably never > executed. What am I missing? While we are at it, what happens to original code (without refcount changes, etc. from my pile - the problem remains with those, but let's look at the code at merge from v4.4) if something does read() and gets a signal just as the daemon gets to n = copy_from_iter(&op->downcall, downcall_size, iter); in ->write_iter(), reporting that is has finished that read? We were in wait_for_matching_downcall(), interruptibly sleeping. We got woken up by signal delivery. Checked op->op_state; it's still not marked as serviced. We check signal_pending() and find it true. We hit orangefs_clean_up_interrupted_operation(), which doesn't do anything to op->op_state, and we return -EINTR to service_operation(). Which returns without waiting for anything to wait_for_direct_io() and we go into sending a cancel. Now the daemon regains the timeslice. Marks op as serviced. And proceeds to wait on op->io_completion_waitq. Who's going to wake it up? orangefs_cancel_op_in_progress() sure as hell won't - it has no way to find op, nevermind doing wakeups. The rest of wait_for_direct_io() also doesn't wake the daemon up. How is that supposed to work? Moreover, we issue a cancel; when is it supposed to be processed and how do we tell if it's already been processed? Who should be waiting for what in case of cancel being issued just as the daemon gets around to reporting success of original operation? On the protocol level, that is.