From: Andrii Nakryiko
Date: Sun, 13 Feb 2022 21:52:45 -0800
Subject: Re: [PATCH bpf-next v2] libbpf: Use dynamically allocated buffer when receiving netlink messages
To: Toke Høiland-Jørgensen
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Kumar Kartikeya Dwivedi, Zhiqian Guan, Networking, bpf
References: <20220211234819.612288-1-toke@redhat.com> <87h7927q3o.fsf@toke.dk>
In-Reply-To: <87h7927q3o.fsf@toke.dk>
X-Mailing-List: bpf@vger.kernel.org

On Sun, Feb 13, 2022 at 7:17 AM Toke Høiland-Jørgensen wrote:
>
> Andrii Nakryiko writes:
>
> > On Fri, Feb 11, 2022 at 3:49 PM Toke Høiland-Jørgensen wrote:
> >>
> >> When receiving netlink messages, libbpf was using a statically allocated
> >> stack buffer of 4k bytes. This happened to work fine on systems with a 4k
> >> page size, but on systems with larger page sizes it can lead to truncated
> >> messages. The user-visible impact of this was that libbpf would insist no
> >> XDP program was attached to some interfaces because that bit of the netlink
> >> message got chopped off.
> >>
> >> Fix this by switching to a dynamically allocated buffer; we borrow the
> >> approach from iproute2 of using recvmsg() with MSG_PEEK|MSG_TRUNC to get
> >> the actual size of the pending message before receiving it, adjusting the
> >> buffer as necessary. While we're at it, also add retries on interrupted
> >> system calls around the recvmsg() call.
> >>
> >> v2:
> >> - Move peek logic to libbpf_netlink_recv(), don't double free on ENOMEM.
> >>
> >> Reported-by: Zhiqian Guan
> >> Fixes: 8bbb77b7c7a2 ("libbpf: Add various netlink helpers")
> >> Acked-by: Kumar Kartikeya Dwivedi
> >> Signed-off-by: Toke Høiland-Jørgensen
> >> ---
> >
> > Applied to bpf-next.
>
> Awesome, thanks!
>
> > One improvement would be to avoid initial malloc of 4096, especially
> > if that size is enough for most cases. You could detect this through
> > iov.iov_base == buf and not free(iov.iov_base) at the end. Seems
> > reliable and simple enough. I'll leave it up to you to follow up, if
> > you think it's a good idea.
>
> Hmm, seems distributions tend to default the stack size limit to 8k; so
> not sure if blowing half of that on a buffer just to avoid a call to
> malloc() in a non-performance-sensitive is ideal to begin with? I think
> I'd prefer to just keep the dynamic allocation...

8KB for user-space thread stack, really? Not 2MB by default? Are you
sure you are not confusing this with kernel threads?

>
> -Toke
>
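
For readers following the thread, here is a minimal sketch of the
peek-then-receive pattern the quoted commit message describes. It is not
the libbpf patch itself: the helper names recvmsg_retry() and
recv_one_dyn() are hypothetical, and the sketch assumes a Linux
datagram-style socket (netlink, or an AF_UNIX SOCK_DGRAM socket for
experimenting) on which MSG_PEEK|MSG_TRUNC reports the full length of the
pending message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: recvmsg() that retries when interrupted by a signal. */
static ssize_t recvmsg_retry(int sock, struct msghdr *mhdr, int flags)
{
	ssize_t len;

	do {
		len = recvmsg(sock, mhdr, flags);
	} while (len < 0 && errno == EINTR);

	return len;
}

/*
 * Hypothetical helper: receive one message into *buf, growing it if needed.
 * *buf must be NULL or malloc()ed memory of *cap bytes; the caller frees it,
 * including after a failure. Returns the message length, or -1 with errno set.
 */
static ssize_t recv_one_dyn(int sock, char **buf, size_t *cap)
{
	struct iovec iov;
	struct msghdr mhdr;
	ssize_t len;

	assert(buf && cap);

	/*
	 * Peek with a zero-length iovec: nothing is copied and the message
	 * stays queued, while MSG_TRUNC makes the return value its full length.
	 */
	memset(&iov, 0, sizeof(iov));
	memset(&mhdr, 0, sizeof(mhdr));
	mhdr.msg_iov = &iov;
	mhdr.msg_iovlen = 1;
	len = recvmsg_retry(sock, &mhdr, MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	/* Grow the buffer only when the pending message does not fit. */
	if ((size_t)len > *cap) {
		char *nbuf = realloc(*buf, (size_t)len);

		if (!nbuf)
			return -1; /* *buf is untouched; the caller still owns it */
		*buf = nbuf;
		*cap = (size_t)len;
	}

	/* Second call actually dequeues the message into the sized buffer. */
	iov.iov_base = *buf;
	iov.iov_len = *cap;
	return recvmsg_retry(sock, &mhdr, 0);
}
```

A caller would start with buf = NULL and cap = 0, call recv_one_dyn() in
its receive loop, and free(buf) once at the end. The variant suggested
above (start from a stack buffer and only free when iov.iov_base != buf)
is not shown here.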