Subject: Re: [PATCH] PCI/IOV: update num_VFs earlier
From: CREGUT Pierre IMT/OLN
To: Bjorn Helgaas
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 3 Oct 2019 11:04:45 +0200
Message-ID: <49b0ad6d-7b6f-adbd-c4a3-5f9328a7ad9d@orange.com>
In-Reply-To: <20191001234520.GA96866@google.com>
References: <20191001234520.GA96866@google.com>
On 02/10/2019 01:45, Bjorn Helgaas wrote:
> On Fri, Apr 26, 2019 at 10:11:54AM +0200, CREGUT Pierre IMT/OLN wrote:
>> I also initially thought that kobject_uevent generated the netlink event
>> but this is not the case. This is generated by the specific driver in use.
>> For the Intel i40e driver, this is the call to i40e_do_reset_safe in
>> i40e_pci_sriov_configure that sends the event.
>> It is followed by i40e_pci_sriov_enable that calls i40e_alloc_vfs that
>> finally calls the generic pci_enable_sriov function.
>
> I don't know anything about netlink.  The script from the bugzilla
> (https://bugzilla.kernel.org/show_bug.cgi?id=202991) looks like it
> runs
>
>   ip monitor dev enp9s0f2
>
> What are the actual netlink events you see?  Are they related to a
> device being removed?

We get netlink events both when num_vfs goes from 0 to N and when it
goes from N to 0 (indeed you have to go back to 0 before going to M
with M != N).

On an Intel card, when num_vfs goes from 0 to N, the netlink event is
sent "early": at that point sriov_numvfs still reads 0, so it looks as
if the number of VFs had not changed.  Because the meaning of those
events is overloaded, the only workaround is to wait an arbitrary
amount of time until the value settles (there will be no further
event).  There is no such problem when going from N to 0, but that is
only due to implementation details of this driver and may differ for
another vendor.

> When we change num_VFs, I think we have to disable any existing VFs
> before enabling the new num_VFs, so if you trigger on a netlink
> "remove" event, I wouldn't be surprised that reading sriov_numvfs
> would give a zero until the new VFs are enabled.

Yes, but here we are talking about the event sent when num_vfs changes
from 0 to N.

> [...]
> I thought this was a good idea, but
>
>   - It does break the device_lock() encapsulation a little bit:
>     sriov_numvfs_store() uses device_lock(), which happens to be
>     implemented as "mutex_lock(&dev->mutex)", but we really shouldn't
>     rely on that implementation, and

Using device_lock() was the cheapest solution.  It is true that
device.h exposes lock and trylock, but not "is_locked".  To respect the
abstraction we would have to actually take the device lock (at least
with trylock, but that means holding the lock while we read the value;
in that case we might as well make the read of num_vfs blocking).  The
other solution is to record whether num_vfs is up to date, but that
means a new boolean in the pci_sriov data structure.  (Rough sketches
of both options follow at the end of this mail.)

>   - The netlink events are being generated via the NIC driver, and I'm
>     a little hesitant about changing the PCI core to deal with timing
>     issues "over there".

NIC drivers send netlink events when their state changes, but it is the
core that changes the value of num_vfs.  So I would argue it is the
core's responsibility to make sure the exposed value makes sense,
independently of the details of the driver implementation.  That is why
the initial patch, which only moved the point where the value is
updated, was in the end not such a good idea.

[...]
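
For concreteness, a rough sketch of the "blocking read" option, i.e.
serializing the sysfs read against sriov_numvfs_store() (which already
runs with the device lock held).  This is only an illustration of the
idea discussed above, not a submitted patch:

/* drivers/pci/iov.c (sketch): a read that arrives while a configure
 * operation is in progress waits until the new value is committed, so
 * a script triggered by the driver's netlink event never sees a stale 0.
 */
static ssize_t sriov_numvfs_show(struct device *dev,
                                 struct device_attribute *attr,
                                 char *buf)
{
        struct pci_dev *pdev = to_pci_dev(dev);
        u16 num_vfs;

        device_lock(dev);
        num_vfs = pdev->sriov->num_VFs;
        device_unlock(dev);

        return sprintf(buf, "%u\n", num_vfs);
}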
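
And a sketch of the other option, recording the freshness of the value.
"num_VFs_valid" is a hypothetical field name, it does not exist in
struct pci_sriov today; the store path would clear it before calling
the driver and set it again once num_VFs is final, so the show path
could wait or report that a change is in flight instead of returning a
stale value:

/* drivers/pci/pci.h, hypothetical new member in struct pci_sriov: */
        u16     num_VFs;        /* number of VFs available */
        bool    num_VFs_valid;  /* hypothetical: false while a change is in flight */

/* drivers/pci/pci-sysfs.c, in sriov_numvfs_store(), around the driver
 * callback (sketch):
 */
        pdev->sriov->num_VFs_valid = false;
        ret = pdev->driver->sriov_configure(pdev, num_vfs);
        /* ... error handling ... */
        pdev->sriov->num_VFs_valid = true;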