From: Alex Vesker
Subject: Re: [PATCH net-next 0/9] devlink: Add support for region access
Date: Sat, 31 Mar 2018 09:11:27 +0300
To: David Ahern, Andrew Lunn
Cc: "David S. Miller", "Tariq Toukan", Jiri Pirko
In-Reply-To: <86ebf2c1-dcdf-bfad-f1b8-cf73acf08ddc@gmail.com>

On 3/31/2018 1:26 AM, David Ahern wrote:
> On 3/30/18 1:39 PM, Alex Vesker wrote:
>>
>> On 3/30/2018 7:57 PM, David Ahern wrote:
>>> On 3/30/18 8:34 AM, Andrew Lunn wrote:
>>>>>> And it seems to want contiguous pages. How well does that work after
>>>>>> the system has been running for a while and memory is fragmented?
>>>>> The allocation can be changed; there is no real need for contiguous
>>>>> pages.
>>>>> It is important to note that the number of snapshots is limited by
>>>>> the driver; this can be based on the dump size or the expected
>>>>> frequency of collection.
>>>>> I also prefer not to pre-allocate this memory.
>>>> The driver code also asks for a 1MB contiguous chunk of memory! You
>>>> really should think about this API: how can you avoid double memory
>>>> allocations? And can kvmalloc be used?
>>>> But then you get into the
>>>> problem of DMA'ing the memory from the device...
>>>>
>>>> This API also does not scale. 1MB is actually quite small. I'm sure
>>>> there is firmware running on CPUs with a lot more than 1MB of RAM.
>>>> How well does this API work with 64MB? Say I wanted to snapshot my
>>>> GPU? Or the MC/BMC?
>>>>
>>> That, and the drivers control the number of snapshots. The user should
>>> be able to control the number of snapshots, and there should be an
>>> option to remove all snapshots to free up that memory.
>> There is an option to free up this memory, using a delete command.
>> The reason I added the option to control the number of snapshots from
>> the driver side only is that the driver knows the size of the snapshots
>> and when/why they will be taken.
>> For example, in our mlx4 driver the snapshots are taken on rare failures,
>> the snapshot is quite large, and from past analyses the first dump is
>> usually the important one. This means that 8 is more than enough in my
>> case.
>> If a user wants more than that, he can always monitor notifications, read
>> the snapshot and delete it once it is backed up; there is no reason to
>> keep all of this data in the kernel.
>>
>>
> I was thinking less, i.e., a user says keep only 1 or 2 snapshots or
> disable snapshots altogether.

Devlink configuration is not persistent if the driver is reloaded, and
currently there is no way to sync it. One or two snapshots might not leave
enough time to read, delete and make room for the next one. As I said, each
driver should do its own calculation here based on frequency, size and even
the time it takes to capture a snapshot. The user can't know whether one
snapshot is enough for debugging. I have seen cases in which debugging
requires more than one snapshot to make sure a health clock is incremented
and the FW is alive.
I want to be able to log in to a customer's system and access these
snapshots without any previous configuration by the user, and without
asking them to enable the feature and then wait for a repro. This will help
debug issues that are hard to reproduce, so I don't see any reason to
disable it.