From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8671BC48BD1 for ; Thu, 10 Jun 2021 21:17:52 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3D6CC6100A for ; Thu, 10 Jun 2021 21:17:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3D6CC6100A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=amd-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C66686E2C8; Thu, 10 Jun 2021 21:17:51 +0000 (UTC) Received: from mail-ot1-x32d.google.com (mail-ot1-x32d.google.com [IPv6:2607:f8b0:4864:20::32d]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8334D6E2C8 for ; Thu, 10 Jun 2021 21:17:51 +0000 (UTC) Received: by mail-ot1-x32d.google.com with SMTP id i12-20020a05683033ecb02903346fa0f74dso1085627otu.10 for ; Thu, 10 Jun 2021 14:17:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DjD0xGa41vKKvZskmfJP2lLSG2GM1RH60CL4Pe/fCSI=; b=kFwZ1RgDni182kOhKxqaKxscyxbkxt1cPA2UYvUMwJpacU0rTFeByU7/5dA5ugVRMR jVBCpPJ1Yk1/OpuqbTt9DieemeyIpXmcaoqbjCH9n8bgLfyqkuzTNMufiwJ72Hk/X3Z7 MSsR3qGwz0JV7S91VnypQh78wPJ2lQKi6TmZLUMoo9RBQ8ooaXsUxOjICzbbrHZejq7K dzQNvQjVUP6nU5AaMd4u/yO9kRJNwrngnhRXBCBEVw6YzBOfkMpXMaS9A+SBFxIrMQ7U oPrxZmQcr8muIYPPb4F+q4WQD4208dGQhYpoMxVA7zACeV/cehsusGpupBYUZI6mloYN ysJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DjD0xGa41vKKvZskmfJP2lLSG2GM1RH60CL4Pe/fCSI=; b=GHQpUdRzMOmrsRzQcAVnYsQZBbv+Lhanern1TAEZwL7cSTGMcFEjZDqpv2a8prUq2t 2xyyxRNV43tmYBlmQ7jv8/gi0HYuJoWkv5kg2wkwJyV1xHo9SPLb1F+KFGbBZ4A2mwBd io5w/4YrmB+06O61l+NBrioOm278o/6W2kUBOAteitsvyaHZaVUY8zKVVWlsqsOk5VnD 78Wyzw0B8NaJ4Resj6osBr/DF8nN3OajInSDIcn0vuVpVdsd0pLiQUC2kYxibqeo/G/h HH/DwBxjl/KbBWNn6ry9cSTzigPUiHgiyY8Z0WhTrHFCTWuhNRKqaGjPhDYVspUmePN6 jVfg== X-Gm-Message-State: AOAM533M/Xgqh6n6wOgvxw96Pm5BZwpwgAMvFKXdQlaai2qVH+zGO1Jy erxQA1Jp0FYrKp/nNtDSX6rPnRbjH2iFqG40v7I4truqn9E= X-Google-Smtp-Source: ABdhPJw0eFQr+jGEhPz+V8y/2OdeY0vtmJoN6OzfOMkrSEH/dn1O6R8kdU/SnkNhbArnW3lkV9AqM4qf/JaMW6C6a2s= X-Received: by 2002:a05:6830:33ef:: with SMTP id i15mr285482otu.311.1623359870906; Thu, 10 Jun 2021 14:17:50 -0700 (PDT) MIME-Version: 1.0 References: <20210608213954.5517-1-luben.tuikov@amd.com> <20210608213954.5517-37-luben.tuikov@amd.com> In-Reply-To: <20210608213954.5517-37-luben.tuikov@amd.com> From: Alex Deucher Date: Thu, 10 Jun 2021 17:17:40 -0400 Message-ID: Subject: Re: [PATCH 36/40] drm/amdgpu: Use explicit cardinality for clarity To: Luben Tuikov X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alexander Deucher , John Clements , Guchun Chen , amd-gfx list , Hawking Zhang Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" On Tue, Jun 8, 2021 at 5:41 PM Luben Tuikov wrote: > > RAS_MAX_RECORD_NUM may mean the maximum record > number, as in the maximum house number on your > street, or it may mean the maximum number of > records, as in the count of records, which is also > a number. To make this distinction whether the > number is ordinal (index) or cardinal (count), > rename this macro to RAS_MAX_RECORD_COUNT. > > This makes it easy to understand what it refers > to, especially when we compute quantities such as, > how many records do we have left in the table, > especially when there are so many other numbers, > quantities and numerical macros around. > > Also rename the long, > amdgpu_ras_eeprom_get_record_max_length() to the > more succinct and clear, > amdgpu_ras_eeprom_max_record_count(). > > When computing the threshold, which also deals > with counts, i.e. "how many", use cardinal > "max_eeprom_records_count", than the quantitative > "max_eeprom_records_len". > > Simplify the logic here and there, as well. > > Cc: Guchun Chen > Cc: John Clements > Cc: Hawking Zhang > Cc: Alexander Deucher > Signed-off-by: Luben Tuikov Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 50 ++++++++----------- > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 8 +-- > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h | 2 +- > 4 files changed, 30 insertions(+), 39 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index 3de1accb060e37..0203f654576bcc 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -853,11 +853,10 @@ MODULE_PARM_DESC(reset_method, "GPU reset method (-1 = auto (default), 0 = legac > module_param_named(reset_method, amdgpu_reset_method, int, 0444); > > /** > - * DOC: bad_page_threshold (int) > - * Bad page threshold is to specify the threshold value of faulty pages > - * detected by RAS ECC, that may result in GPU entering bad status if total > - * faulty pages by ECC exceed threshold value and leave it for user's further > - * check. > + * DOC: bad_page_threshold (int) Bad page threshold is specifies the > + * threshold value of faulty pages detected by RAS ECC, which may > + * result in the GPU entering bad status when the number of total > + * faulty pages by ECC exceeds the threshold value. > */ > MODULE_PARM_DESC(bad_page_threshold, "Bad page threshold(-1 = auto(default value), 0 = disable bad page retirement)"); > module_param_named(bad_page_threshold, amdgpu_bad_page_threshold, int, 0444); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index 66c96c65e7eeb9..95ab400b641af0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -71,8 +71,8 @@ const char *ras_block_string[] = { > /* inject address is 52 bits */ > #define RAS_UMC_INJECT_ADDR_LIMIT (0x1ULL << 52) > > -/* typical ECC bad page rate(1 bad page per 100MB VRAM) */ > -#define RAS_BAD_PAGE_RATE (100 * 1024 * 1024ULL) > +/* typical ECC bad page rate is 1 bad page per 100MB VRAM */ > +#define RAS_BAD_PAGE_COVER (100 * 1024 * 1024ULL) > > enum amdgpu_ras_retire_page_reservation { > AMDGPU_RAS_RETIRE_PAGE_RESERVED, > @@ -1841,27 +1841,24 @@ int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev) > static int amdgpu_ras_load_bad_pages(struct amdgpu_device *adev) > { > struct amdgpu_ras_eeprom_control *control = > - &adev->psp.ras.ras->eeprom_control; > - struct eeprom_table_record *bps = NULL; > - int ret = 0; > + &adev->psp.ras.ras->eeprom_control; > + struct eeprom_table_record *bps; > + int ret; > > /* no bad page record, skip eeprom access */ > - if (!control->num_recs || (amdgpu_bad_page_threshold == 0)) > - return ret; > + if (control->num_recs == 0 || amdgpu_bad_page_threshold == 0) > + return 0; > > bps = kcalloc(control->num_recs, sizeof(*bps), GFP_KERNEL); > if (!bps) > return -ENOMEM; > > - if (amdgpu_ras_eeprom_read(control, bps, control->num_recs)) { > + ret = amdgpu_ras_eeprom_read(control, bps, control->num_recs); > + if (ret) > dev_err(adev->dev, "Failed to load EEPROM table records!"); > - ret = -EIO; > - goto out; > - } > - > - ret = amdgpu_ras_add_bad_pages(adev, bps, control->num_recs); > + else > + ret = amdgpu_ras_add_bad_pages(adev, bps, control->num_recs); > > -out: > kfree(bps); > return ret; > } > @@ -1901,11 +1898,9 @@ static bool amdgpu_ras_check_bad_page(struct amdgpu_device *adev, > } > > static void amdgpu_ras_validate_threshold(struct amdgpu_device *adev, > - uint32_t max_length) > + uint32_t max_count) > { > struct amdgpu_ras *con = amdgpu_ras_get_context(adev); > - int tmp_threshold = amdgpu_bad_page_threshold; > - u64 val; > > /* > * Justification of value bad_page_cnt_threshold in ras structure > @@ -1926,18 +1921,15 @@ static void amdgpu_ras_validate_threshold(struct amdgpu_device *adev, > * take no effect. > */ > > - if (tmp_threshold < -1) > - tmp_threshold = -1; > - else if (tmp_threshold > max_length) > - tmp_threshold = max_length; > + if (amdgpu_bad_page_threshold < 0) { > + u64 val = adev->gmc.mc_vram_size; > > - if (tmp_threshold == -1) { > - val = adev->gmc.mc_vram_size; > - do_div(val, RAS_BAD_PAGE_RATE); > + do_div(val, RAS_BAD_PAGE_COVER); > con->bad_page_cnt_threshold = min(lower_32_bits(val), > - max_length); > + max_count); > } else { > - con->bad_page_cnt_threshold = tmp_threshold; > + con->bad_page_cnt_threshold = min_t(int, max_count, > + amdgpu_bad_page_threshold); > } > } > > @@ -1945,7 +1937,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev) > { > struct amdgpu_ras *con = amdgpu_ras_get_context(adev); > struct ras_err_handler_data **data; > - uint32_t max_eeprom_records_len = 0; > + u32 max_eeprom_records_count = 0; > bool exc_err_limit = false; > int ret; > > @@ -1965,8 +1957,8 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev) > atomic_set(&con->in_recovery, 0); > con->adev = adev; > > - max_eeprom_records_len = amdgpu_ras_eeprom_get_record_max_length(); > - amdgpu_ras_validate_threshold(adev, max_eeprom_records_len); > + max_eeprom_records_count = amdgpu_ras_eeprom_max_record_count(); > + amdgpu_ras_validate_threshold(adev, max_eeprom_records_count); > > /* Todo: During test the SMU might fail to read the eeprom through I2C > * when the GPU is pending on XGMI reset during probe time > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c > index 54ef31594accd9..21e1e59e4857ff 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c > @@ -54,7 +54,7 @@ > #define RAS_TBL_SIZE_BYTES (256 * 1024) > #define RAS_HDR_START 0 > #define RAS_RECORD_START (RAS_HDR_START + RAS_TABLE_HEADER_SIZE) > -#define RAS_MAX_RECORD_NUM ((RAS_TBL_SIZE_BYTES - RAS_TABLE_HEADER_SIZE) \ > +#define RAS_MAX_RECORD_COUNT ((RAS_TBL_SIZE_BYTES - RAS_TABLE_HEADER_SIZE) \ > / RAS_TABLE_RECORD_SIZE) > > #define to_amdgpu_device(x) (container_of(x, struct amdgpu_ras, eeprom_control))->adev > @@ -532,7 +532,7 @@ static int amdgpu_ras_eeprom_xfer(struct amdgpu_ras_eeprom_control *control, > * TODO - Check the assumption is correct > */ > control->num_recs += num; > - control->num_recs %= RAS_MAX_RECORD_NUM; > + control->num_recs %= RAS_MAX_RECORD_COUNT; > control->tbl_hdr.tbl_size += RAS_TABLE_RECORD_SIZE * num; > if (control->tbl_hdr.tbl_size > RAS_TBL_SIZE_BYTES) > control->tbl_hdr.tbl_size = RAS_TABLE_HEADER_SIZE + > @@ -568,9 +568,9 @@ int amdgpu_ras_eeprom_write(struct amdgpu_ras_eeprom_control *control, > return amdgpu_ras_eeprom_xfer(control, records, num, true); > } > > -inline uint32_t amdgpu_ras_eeprom_get_record_max_length(void) > +inline uint32_t amdgpu_ras_eeprom_max_record_count(void) > { > - return RAS_MAX_RECORD_NUM; > + return RAS_MAX_RECORD_COUNT; > } > > /* Used for testing if bugs encountered */ > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h > index 4906ed9fb8cdd3..504729b8053759 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h > @@ -88,7 +88,7 @@ int amdgpu_ras_eeprom_read(struct amdgpu_ras_eeprom_control *control, > int amdgpu_ras_eeprom_write(struct amdgpu_ras_eeprom_control *control, > struct eeprom_table_record *records, const u32 num); > > -inline uint32_t amdgpu_ras_eeprom_get_record_max_length(void); > +inline uint32_t amdgpu_ras_eeprom_max_record_count(void); > > void amdgpu_ras_eeprom_test(struct amdgpu_ras_eeprom_control *control); > > -- > 2.32.0 > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx