Date: Mon, 16 Dec 2013 17:50:36 +0000
From: Wei Liu
To: Zoltan Kiss
CC: Wei Liu
Subject: Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
Message-ID: <20131216175036.GB25969@zion.uk.xensource.com>
References: <1386892097-15502-1-git-send-email-zoltan.kiss@citrix.com>
 <1386892097-15502-2-git-send-email-zoltan.kiss@citrix.com>
 <20131213153138.GL21900@zion.uk.xensource.com>
 <52AB506E.3040509@citrix.com>
 <20131213191423.GA12582@zion.uk.xensource.com>
 <52AF1A84.3090304@citrix.com>
In-Reply-To: <52AF1A84.3090304@citrix.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 16, 2013 at 03:21:40PM +0000, Zoltan Kiss wrote:
[...]
> >>>> >
> >>>> >Should this be BUG_ON? AIUI this kthread should be the only one doing
> >>>> >unmap, right?
> >>>The NAPI instance can do it as well if it is a small packet fits
> >>>into PKT_PROT_LEN. But still this scenario shouldn't really happen,
> >>>I was just not sure we have to crash immediately. Maybe handle it as
> >>>a fatal error and destroy the vif?
> >>>
> >It depends. If this is within the trust boundary, i.e. everything at the
> >stage should have been sanitized then we should BUG_ON because there's
> >clearly a bug somewhere in the sanitization process, or in the
> >interaction of various backend routines.
>
> My understanding is that crashing should be avoided if we can bail
> out somehow. At this point there is clearly a bug in netback
> somewhere, something unmapped that page before it should have
> happened, or at least that array get corrupted somehow. However
> there is a chance that xenvif_fatal_tx_err() can contain the issue,
> and the rest of the system can go unaffected.
>

IMHO that would make debugging much harder if a crash is caused by a
previously corrupted array and we pretend we can carry on serving.

Netback now has three routines (NAPI and two kthreads) serving a single
vif, and the interaction among them makes bugs hard to reproduce.

Wei.
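
For illustration only, the two options discussed in this thread (crash
immediately, as BUG_ON would, versus declaring a fatal error and disabling
the vif) can be modelled in user-space C. This is a rough sketch under
stated assumptions, not netback code: struct vif_model, unmap_slot(),
INVALID_HANDLE, MAX_PENDING and the policy enum are all hypothetical names,
and abort() merely stands in for BUG_ON().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_PENDING    4
#define INVALID_HANDLE UINT16_MAX  /* hypothetical "slot not mapped" marker */

/* Hypothetical stand-in for per-vif state; NOT the real struct xenvif. */
struct vif_model {
    uint16_t grant_handle[MAX_PENDING];
    bool disabled;                 /* set once a fatal error is declared */
};

enum policy { POLICY_CRASH, POLICY_CONTAIN };
enum unmap_result { UNMAP_OK, UNMAP_FATAL };

/*
 * Unmap one slot. Finding the slot already unmapped means some earlier
 * step corrupted the array, so the invariant is broken either way:
 *  - POLICY_CRASH stops at the point of detection (abort() as a
 *    user-space stand-in for BUG_ON), preserving the state for debugging.
 *  - POLICY_CONTAIN logs, disables the vif and returns, so the rest of
 *    the system keeps running but the original fault may be harder to
 *    trace afterwards.
 */
static enum unmap_result unmap_slot(struct vif_model *vif, unsigned int idx,
                                    enum policy policy)
{
    if (idx >= MAX_PENDING || vif->grant_handle[idx] == INVALID_HANDLE) {
        fprintf(stderr, "slot %u is not mapped, invariant violated\n", idx);
        if (policy == POLICY_CRASH)
            abort();
        vif->disabled = true;
        return UNMAP_FATAL;
    }

    vif->grant_handle[idx] = INVALID_HANDLE;
    return UNMAP_OK;
}
```

With POLICY_CONTAIN, unmapping the same slot twice returns UNMAP_FATAL on
the second call and leaves the model vif disabled; with POLICY_CRASH the
second call never returns. Which of the two is preferable is exactly the
question being argued above.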