From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964952Ab3E1SDY (ORCPT ); Tue, 28 May 2013 14:03:24 -0400
Received: from mail-pb0-f52.google.com ([209.85.160.52]:50437 "EHLO mail-pb0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964903Ab3E1SDW (ORCPT ); Tue, 28 May 2013 14:03:22 -0400
Message-ID: <1369764200.3301.551.camel@edumazet-glaptop>
Subject: Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets
From: Eric Dumazet
To: Rafael Aquini , Roland Dreier
Cc: Ben Greear , Francois Romieu , atomlin@redhat.com, netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pshelar@nicira.com, mst@redhat.com, alexander.h.duyck@intel.com, riel@redhat.com, sergei.shtylyov@cogentembedded.com, linux-kernel@vger.kernel.org
Date: Tue, 28 May 2013 11:03:20 -0700
In-Reply-To: <20130528174304.GD11614@optiplex.redhat.com>
References: <1369601101-23057-1-git-send-email-atomlin@redhat.com> <20130527224149.GA4384@electric-eye.fr.zoreil.com> <51A4D4AD.2010507@candelatech.com> <20130528161518.GC11614@optiplex.redhat.com> <1369758577.3301.543.camel@edumazet-glaptop> <20130528174304.GD11614@optiplex.redhat.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2013-05-28 at 14:43 -0300, Rafael Aquini wrote:

> Perhaps the explanation is because we're looking into old stuff bad effects,
> then. But just to list a few for your appreciation:
> --------------------------------------------------------
> Apr 23 11:25:31 217-IDC kernel: httpd: page allocation failure. order:1,
> mode:0x20 Apr 23 11:25:31 217-IDC kernel: Pid: 19747, comm: httpd Not tainted
> 2.6.32-358.2.1.el6.x86_64 #1 Apr 23 11:25:31 217-IDC kernel: Call Trace: Apr 23
> 11:25:31 217-IDC kernel: [] ?
> __alloc_pages_nodemask+0x757/0x8d0 Apr 23 11:25:31 217-IDC kernel:
> [] ? bond_start_xmit+0x2f1/0x5d0 [bonding]
> ....
> --------------------------------------------------------
> Apr 4 18:51:32 exton kernel: swapper: page allocation failure. order:1,
> mode:0x20
> Apr 4 18:51:32 exton kernel: Pid: 0, comm: swapper Not tainted
> 2.6.32-279.19.1.el6.x86_64 #1
> Apr 4 18:51:32 exton kernel: Call Trace:
> Apr 4 18:51:32 exton kernel: [] ?
> __alloc_pages_nodemask+0x77f/0x940
> Apr 4 18:51:32 exton kernel: [] ? kmem_getpages+0x62/0x170
> Apr 4 18:51:32 exton kernel: [] ? fallback_alloc+0x1ba/0x270
> Apr 4 18:51:32 exton kernel: [] ? cache_grow+0x2cf/0x320
> Apr 4 18:51:32 exton kernel: [] ?
> ____cache_alloc_node+0x99/0x160
> Apr 4 18:51:32 exton kernel: [] ?
> kmem_cache_alloc_node_trace+0x90/0x200
> Apr 4 18:51:32 exton kernel: [] ? __kmalloc_node+0x4d/0x60
> Apr 4 18:51:32 exton kernel: [] ? __alloc_skb+0x6d/0x190
> Apr 4 18:51:32 exton kernel: [] ? dev_alloc_skb+0x1d/0x40
> Apr 4 18:51:32 exton kernel: [] ?
> ipoib_cm_alloc_rx_skb+0x30/0x430 [ib_ipoib]
> Apr 4 18:51:32 exton kernel: [] ?
> ipoib_cm_handle_rx_wc+0x29f/0x770 [ib_ipoib]
> Apr 4 18:51:32 exton kernel: [] ? mlx4_ib_poll_cq+0x2c6/0x7f0
> [mlx4_ib]
> ....
> ----

This one seems to be a real bug/problem in
drivers/infiniband/ulp/ipoib/ipoib_cm.c

It uses:

	IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
	IPOIB_CM_RX_SG = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE,

but then, ipoib_cm_alloc_rx_skb() does:

	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);

so it is really asking for more than one page for the first frag
(skb->head), while the intent of the code was to use order-0 allocations:

	for (i = 0; i < frags; i++) {
		struct page *page = alloc_page(GFP_ATOMIC);
		....

Ideally, IPOIB_CM_HEAD_SIZE should be redefined to use
SKB_MAX_HEAD(NET_SKB_PAD + 12) so that skb->head would use exactly one
order-0 page, not an order-1 one.

Do you now understand why we should not hide allocation errors?
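
To make the arithmetic above concrete, here is a minimal userspace sketch of
the sizing. It is only a model, not kernel code: the helper names are
hypothetical, and the values chosen for PAGE_SIZE, NET_SKB_PAD, the cache line
size, sizeof(struct skb_shared_info) and IPOIB_CM_BUF_SIZE are assumptions
picked for illustration (roughly x86_64), so check them against the tree you
are running. The real __alloc_skb() sizing differs in detail between kernel
versions; the point is only that head size + headroom + shared info has to
fit in PAGE_SIZE for skb->head to stay order-0.

```c
#include <stdio.h>

/* All values below are illustrative assumptions, not taken from the thread. */
#define PAGE_SIZE          4096u
#define CACHE_LINE         64u    /* stand-in for SMP_CACHE_BYTES */
#define NET_SKB_PAD        64u    /* headroom reserved by dev_alloc_skb() */
#define SHINFO_SIZE        320u   /* stand-in for sizeof(struct skb_shared_info) */
#define IPOIB_CM_BUF_SIZE  65524u /* assumed connected-mode buffer size */

#define ALIGN_UP(x, a)     (((x) + (a) - 1) / (a) * (a))

/* Rough model of the bytes requested for skb->head when the caller asks
 * dev_alloc_skb() for `len` bytes: data + headroom, then the shared info. */
static unsigned int head_alloc_bytes(unsigned int len)
{
	return ALIGN_UP(len + NET_SKB_PAD, CACHE_LINE) +
	       ALIGN_UP(SHINFO_SIZE, CACHE_LINE);
}

/* Smallest page order whose size covers `bytes`. */
static unsigned int page_order(unsigned int bytes)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

/* Current definition: whatever is left of the buffer after whole pages. */
static unsigned int current_head_size(void)
{
	return IPOIB_CM_BUF_SIZE % PAGE_SIZE;
}

/* Model of SKB_MAX_HEAD(NET_SKB_PAD + 12): the largest head that still
 * leaves room for the headroom, the extra 12 bytes and the shared info
 * inside a single page. */
static unsigned int proposed_head_size(void)
{
	return PAGE_SIZE - (NET_SKB_PAD + 12u) -
	       ALIGN_UP(SHINFO_SIZE, CACHE_LINE);
}

/* Print the request size and resulting page order for both definitions. */
static void report(void)
{
	unsigned int cur = head_alloc_bytes(current_head_size() + 12u);
	unsigned int fix = head_alloc_bytes(proposed_head_size() + 12u);

	printf("current:  %u bytes, order %u\n", cur, page_order(cur));
	printf("proposed: %u bytes, order %u\n", fix, page_order(fix));
}
```

Under these assumptions, calling report() from a small main() shows the
current definition needing more than one page for skb->head (order 1), while
the SKB_MAX_HEAD() based definition lands on exactly one page (order 0).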