Date: Thu, 18 Feb 2016 19:20:28 +0000
From: Al Viro
To: Mike Marshall
Cc: Martin Brandenburg, Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160218192027.GT17997@ZenIV.linux.org.uk>

On Thu, Feb 18, 2016 at 01:58:52PM -0500, Mike Marshall wrote:
> wait_for_matching_downcall: operation purged (tag 10889, ffff880012898000, att 0
> service_operation: wait_for_matching_downcall returned -11 for ffff880012898000
> Interrupted: Removed op ffff880012898000 from htable_ops_in_progress state is "in progress"
> tag 10889 (orangefs_create) -- operation to be retried (1 attempt)
> service_operation: orangefs_create op:ffff880012898000: moved to "waiting"
> service_operation:client core is NOT in service, ffff880012898000
>
>
>
> service_operation: wait_for_matching_downcall returned 0 for ffff880012898000
> service_operation orangefs_create returning: 0 for ffff880012898000

... and we've got to "serviced" somehow.  IDGI...  Are you sure that it's
not a daemon replying with zero fsid?
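The zero-fsid question boils down to a predicate on the reply: the upcall was a create, yet the downcall carries no fs_id. A userspace sketch of that predicate follows, using the same condition as the WARN_ON() proposed in this message. Everything named toy_* and the constant value are hypothetical mock-ups, not the orangefs definitions; treating fs_id == 0 as "no real answer" is the assumption the WARN_ON() itself makes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant standing in for ORANGEFS_VFS_OP_CREATE;
 * the real value lives in the orangefs upcall header. */
#define TOY_VFS_OP_CREATE 0x05

/* Hypothetical, trimmed-down mock of an object reference. */
struct toy_object_ref {
	uint64_t handle;
	int32_t fs_id;      /* assumed: 0 means no real object came back */
};

struct toy_upcall {
	int type;           /* which operation was requested */
};

struct toy_downcall {
	struct toy_object_ref create_refn;  /* filled in for creates */
};

struct toy_op {
	struct toy_upcall upcall;      /* request sent to the daemon */
	struct toy_downcall downcall;  /* reply copied back from the daemon */
};

/* Nonzero if op is a create whose reply has a zero fs_id, i.e. the
 * condition the proposed WARN_ON() is meant to catch. */
static int toy_bogus_create_reply(const struct toy_op *op)
{
	return op->upcall.type == TOY_VFS_OP_CREATE &&
	       op->downcall.create_refn.fs_id == 0;
}
```

In the kernel the same check would sit right before the op is marked serviced, so a bogus reply from the daemon shows up with a backtrace instead of silently turning into a "successful" create.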
Could you slap

	gossip_debug(GOSSIP_WAIT_DEBUG,
		     "%s: %s op:%p: process:%s state -> %d\n",
		     __func__, op_name, op, current->comm, op->op_state);

after the assignments to ->op_state in set_op_state_purged() and
set_op_state_serviced(), as well as after the calls of set_op_state_waiting()
(in service_operation() and orangefs_devreq_read()) and
set_op_state_inprogress() (in orangefs_devreq_read())?

Another thing: in orangefs_devreq_write_iter(), just before the call of
set_op_state_serviced(), add

	WARN_ON(op->upcall.type == ORANGEFS_VFS_OP_CREATE &&
		!op->downcall.resp.create.refn.fs_id);

to make sure that this crap isn't coming from the daemon.

While we are at it,

	#define op_is_cancel(op) ((op)->downcall.type == ORANGEFS_VFS_OP_CANCEL)

is checking the wrong thing; it should be

	#define op_is_cancel(op) ((op)->upcall.type == ORANGEFS_VFS_OP_CANCEL)

Shouldn't be worse than a leak, though, so I doubt that it could be causing
this problem...
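Why the op_is_cancel() change matters can be seen in a toy model: a cancel is identified by what the kernel sent up, not by what came back. In this sketch (the toy_* names and the constant are hypothetical; the assumption that downcall.type stays zero until a reply is copied in is a modelling choice, not something taken from the orangefs code), the old macro fails to recognise a cancel op, which fits the "no worse than a leak" remark: an unrecognised cancel would presumably just never be released.

```c
#include <assert.h>

/* Hypothetical constant standing in for ORANGEFS_VFS_OP_CANCEL. */
#define TOY_VFS_OP_CANCEL 0x10

/* Hypothetical mocks: only the type fields matter for this check. */
struct toy_upcall   { int type; };  /* set when the request is built */
struct toy_downcall { int type; };  /* assumed: stays 0 until a reply arrives */

struct toy_op {
	struct toy_upcall upcall;
	struct toy_downcall downcall;
};

/* The macro as currently written: looks at the reply side. */
#define toy_op_is_cancel_old(op) ((op)->downcall.type == TOY_VFS_OP_CANCEL)

/* The suggested fix: looks at the request side. */
#define toy_op_is_cancel(op) ((op)->upcall.type == TOY_VFS_OP_CANCEL)
```

With a cancel op built as { .upcall = { .type = TOY_VFS_OP_CANCEL } }, toy_op_is_cancel() is true while toy_op_is_cancel_old() is false, because nothing ever wrote the cancel type into the downcall.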