Subject: Re: [PATCH v2] xen-netfront: Fix Rx stall during network stress and OOM
To: Boris Ostrovsky
CC: David Miller, Wei Liu, Paul Durrant, xen-devel
From: Vineeth Remanan Pillai
Date: Mon, 30 Jan 2017 09:13:57 -0800
In-Reply-To: <30069778-9509-8112-5089-2eea7b679236@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/30/2017 09:06 AM, Boris Ostrovsky wrote:
> On 01/30/2017 11:47 AM, Vineeth Remanan Pillai wrote:
>
>>> 2. It tickles a latent bug during resume where the timer triggers
>>> before we re-connect. The trouble is that we now try to dereference
>>> queue->rx.sring which is NULL since we disconnect in
>>> netfront_resume(). (Curiously, I only observe it with 32-bit guests)
>> I think we may hit this bug after removing the timer as well. We call
>> RING_PUSH_REQUESTS_AND_CHECK_NOTIFY soon after, which also dereferences
>> queue->rx.sring.
> If the timer is deleted in xennet_disconnect_backend() then why would
> anyone be pushing anything to the backend after that?

Sorry, I got the ordering wrong. Thanks for the clarification.

Thanks,
Vineeth