From: Eric Dumazet
Date: Tue, 13 Apr 2021 11:24:20 +0200
Subject: Re: Linux 5.12-rc7
To: Guenter Roeck
Cc: Linus Torvalds, Xuan Zhuo, "Michael S. Tsirkin", Linux Kernel Mailing List, Netdev
References: <20210412051445.GA47322@roeck-us.net> <78c858ba-a847-884f-80c3-cb1eb84d4113@roeck-us.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 12, 2021 at 10:05 PM Guenter Roeck wrote:
>
> On 4/12/21 10:38 AM, Eric Dumazet wrote:
> [ ... ]
>
> > Yes, I think this is the real issue here. This smells like some memory
> > corruption.
> >
> > In my traces, packet is correctly received in AF_PACKET queue.
> >
> > I have checked the skb is well formed.
> >
> > But the user space seems to never call poll() and recvmsg() on this
> > af_packet socket.
> >
>
> After sprinkling the kernel with debug messages:
>
> 424 00:01:33.674181 sendto(6, "E\0\1H\0\0\0\0@\21y\246\0\0\0\0\377\377\377\377\0D\0C\00148\346\1\1\6\0\246\336\333\v\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RT\0\
> 424 00:01:33.693873 close(6) = 0
> 424 00:01:33.694652 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
> 424 00:01:33.695213 clock_gettime64(CLOCK_MONOTONIC, 0x7be18a18) = -1 EFAULT (Bad address)
> 424 00:01:33.695889 write(2, "udhcpc: clock_gettime(MONOTONIC) failed\n", 40) = -1 EFAULT (Bad address)
> 424 00:01:33.697311 exit_group(1) = ?
> 424 00:01:33.698346 +++ exited with 1 +++
>
> I only see that after adding debug messages in the kernel, so I guess there must be
> a heisenbug somewhere.
>
> Anyway, indeed, I see (another kernel debug message):
>
> __do_sys_clock_gettime: Returning -EFAULT on address 0x7bacc9a8
>
> So udhcpc doesn't even try to read the reply because it crashes after sendto()
> when trying to read the current time. Unless I am missing something, that means
> that the problem happens somewhere on the send side.
>
> To make things even more interesting, it looks like the failing system call
> isn't always clock_gettime().
>
> Guenter

I think the GRO fast path has never worked on SUPERH. Probably SUPERH has never
used a fast NIC (10Gbit+).

The following hack fixes the issue.

diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9364b076e2d72f04dc3de2d7e2f11..91ba89a645ff91d4cd4f3d8dc8a009bcb67da344 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5916,13 +5916,16 @@ static struct list_head *gro_list_prepare(struct napi_struct *napi,
 
 static void skb_gro_reset_offset(struct sk_buff *skb)
 {
+#if !defined(CONFIG_SUPERH)
 	const struct skb_shared_info *pinfo = skb_shinfo(skb);
 	const skb_frag_t *frag0 = &pinfo->frags[0];
+#endif
 
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	NAPI_GRO_CB(skb)->frag0 = NULL;
 	NAPI_GRO_CB(skb)->frag0_len = 0;
 
+#if !defined(CONFIG_SUPERH)
 	if (!skb_headlen(skb) && pinfo->nr_frags &&
 	    !PageHighMem(skb_frag_page(frag0))) {
 		NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
@@ -5930,6 +5933,7 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
 						    skb_frag_size(frag0),
 						    skb->end - skb->tail);
 	}
+#endif
 }
 
 static void gro_pull_from_frag0(struct sk_buff *skb, int grow)
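
For readers following the thread, here is a rough user-space model of what the hack changes. It is only a sketch, not kernel code: struct model_skb, struct model_gro_cb, model_gro_reset_offset() and the disable_fast_path flag are hypothetical stand-ins for sk_buff, NAPI_GRO_CB(), skb_gro_reset_offset() and the CONFIG_SUPERH guard. With the fast path disabled, frag0 stays NULL and frag0_len stays 0, so nothing is ever read directly out of the first page fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields the function consults. */
struct model_skb {
	unsigned int headlen;    /* models skb_headlen(skb) */
	unsigned int nr_frags;   /* models skb_shinfo(skb)->nr_frags */
	bool frag0_highmem;      /* models PageHighMem(skb_frag_page(frag0)) */
	const void *frag0_addr;  /* models skb_frag_address(frag0) */
	unsigned int frag0_size; /* models skb_frag_size(frag0) */
	unsigned int tailroom;   /* models skb->end - skb->tail */
};

/* Hypothetical stand-in for the three NAPI_GRO_CB() fields being reset. */
struct model_gro_cb {
	unsigned int data_offset;
	const void *frag0;
	unsigned int frag0_len;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * disable_fast_path models building with CONFIG_SUPERH under the hack:
 * the frag0 setup is skipped and the control block stays zeroed.
 */
static void model_gro_reset_offset(const struct model_skb *skb,
				   struct model_gro_cb *cb,
				   bool disable_fast_path)
{
	assert(skb != NULL && cb != NULL);

	cb->data_offset = 0;
	cb->frag0 = NULL;
	cb->frag0_len = 0;

	if (disable_fast_path)
		return;

	/* Fast path: empty linear area, at least one lowmem fragment. */
	if (!skb->headlen && skb->nr_frags && !skb->frag0_highmem) {
		cb->frag0 = skb->frag0_addr;
		cb->frag0_len = min_uint(skb->frag0_size, skb->tailroom);
	}
}
```

Calling model_gro_reset_offset() twice on the same fragment-only packet, once with disable_fast_path false and once with it true, shows the difference: the first call publishes the fragment address and a clamped length, the second leaves both cleared.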