linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vaibhav Jain <vaibhav@linux.ibm.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Santosh Sivaraj <santosh@fossix.org>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Date: Fri, 05 Jun 2020 00:19:03 +0530	[thread overview]
Message-ID: <87bllyhdk0.fsf@linux.ibm.com> (raw)
In-Reply-To: <20200603231814.GK1505637@iweiny-DESK2.sc.intel.com>

Hi Ira,

Thanks again for looking into patch. My responses below:

Ira Weiny <ira.weiny@intel.com> writes:

> On Thu, Jun 04, 2020 at 12:34:04AM +0530, Vaibhav Jain wrote:
>> Hi Ira,
>> 
>> Thanks for reviewing this patch. My responses below:
>> 
>> Ira Weiny <ira.weiny@intel.com> writes:
>> 
>> > On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
>> >> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
>> >> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
>> >> containing dimm health information back to user space in response to
>> >> ND_CMD_CALL. This functionality is implemented in newly introduced
>> >> papr_pdsm_health() that queries the nvdimm health information and
>> >> then copies this information to the package payload whose layout is
>> >> defined by 'struct nd_papr_pdsm_health'.
>> >> 
>> >> The patch also introduces a new member 'struct papr_scm_priv.health'
>> >> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
>> >> information of a nvdimm. As a result functions drc_pmem_query_health()
>> >> and flags_show() are updated to populate and use this new struct
>> >> instead of a u64 integer that was earlier used.
>> >> 
>> >> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
>> >> Cc: Dan Williams <dan.j.williams@intel.com>
>> >> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> >> Cc: Ira Weiny <ira.weiny@intel.com>
>> >> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> >> ---
>> >> Changelog:
>> >> 
>> >> Resend:
>> >> * Added ack from Aneesh.
>> >> 
>> >> v8..v9:
>> >> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g  [ Dan , Aneesh ]
>> >> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
>> >> * Renamed papr_scm_get_health() to papr_psdm_health()
>> >> * Updated patch description to replace papr-scm dimm with nvdimm.
>> >> 
>> >> v7..v8:
>> >> * None
>> >> 
>> >> Resend:
>> >> * None
>> >> 
>> >> v6..v7:
>> >> * Updated flags_show() to use seq_buf_printf(). [Mpe]
>> >> * Updated papr_scm_get_health() to use newly introduced
>> >>   __drc_pmem_query_health() bypassing the cache [Mpe].
>> >> 
>> >> v5..v6:
>> >> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>> >>   gaurd against possibility of different compilers adding different
>> >>   paddings to the struct [ Dan Williams ]
>> >> 
>> >> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>> >>   'bool' and also updated drc_pmem_query_health() to take this into
>> >>   account. [ Dan Williams ]
>> >> 
>> >> v4..v5:
>> >> * None
>> >> 
>> >> v3..v4:
>> >> * Call the DSM_PAPR_SCM_HEALTH service function from
>> >>   papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>> >> 
>> >> v2..v3:
>> >> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>> >>   as its exported to the userspace [Aneesh]
>> >> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>> >>   from enum to #defines [Aneesh]
>> >> 
>> >> v1..v2:
>> >> * New patch in the series
>> >> ---
>> >>  arch/powerpc/include/uapi/asm/papr_pdsm.h |  39 +++++++
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
>> >>  2 files changed, 147 insertions(+), 17 deletions(-)
>> >> 
>> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> index 6407fefcc007..411725a91591 100644
>> >> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>> >>   */
>> >>  enum papr_pdsm {
>> >>  	PAPR_PDSM_MIN = 0x0,
>> >> +	PAPR_PDSM_HEALTH,
>> >>  	PAPR_PDSM_MAX,
>> >>  };
>> >>  
>> >> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >>  		return (void *)(pcmd->payload);
>> >>  }
>> >>  
>> >> +/* Various nvdimm health indicators */
>> >> +#define PAPR_PDSM_DIMM_HEALTHY       0
>> >> +#define PAPR_PDSM_DIMM_UNHEALTHY     1
>> >> +#define PAPR_PDSM_DIMM_CRITICAL      2
>> >> +#define PAPR_PDSM_DIMM_FATAL         3
>> >> +
>> >> +/*
>> >> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>> >> + * Various flags indicate the health status of the dimm.
>> >> + *
>> >> + * dimm_unarmed		: Dimm not armed. So contents wont persist.
>> >> + * dimm_bad_shutdown	: Previous shutdown did not persist contents.
>> >> + * dimm_bad_restore	: Contents from previous shutdown werent restored.
>> >> + * dimm_scrubbed	: Contents of the dimm have been scrubbed.
>> >> + * dimm_locked		: Contents of the dimm cant be modified until CEC reboot
>> >> + * dimm_encrypted	: Contents of dimm are encrypted.
>> >> + * dimm_health		: Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
>> >> + */
>> >> +struct nd_papr_pdsm_health_v1 {
>> >> +	__u8 dimm_unarmed;
>> >> +	__u8 dimm_bad_shutdown;
>> >> +	__u8 dimm_bad_restore;
>> >> +	__u8 dimm_scrubbed;
>> >> +	__u8 dimm_locked;
>> >> +	__u8 dimm_encrypted;
>> >> +	__u16 dimm_health;
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Typedef the current struct for dimm_health so that any application
>> >> + * or kernel recompiled after introducing a new version automatically
>> >> + * supports the new version.
>> >> + */
>> >> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
>> >> +
>> >> +/* Current version number for the dimm health struct */
>> >
>> > This can't be the 'current' version.  You will need a list of versions you
>> > support.  Because if the user passes in an old version you need to be able to
>> > respond with that old version.  Also if you plan to support 'return X for a Y
>> > query' then the user will need both X and Y defined to interpret X.
>> Yes, and that change will be introduced with addition of version-2 of
>> nd_papr_pdsm_health. Earlier version of the patchset[1] had such a table
>> implemented. But to simplify the patchset, as we are only dealing with
>> version-1 of the structs right now, it was dropped.
>> 
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/20200220095805.197229-9-vaibhav@linux.ibm.com/
>
> I'm not sure I follow that comment.
>
> I feel like there is some confusion about what firmware can return vs the UAPI
> structure.  You have already marshaled the data between the 2.  We can define
> whatever we want for the UAPI structures throwing away data the kernel does not
> understand from the firmware.
>
>> 
>> >
>> >> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
>> >> +
>> >>  #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 5e2237e7ec08..c0606c0c659c 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -88,7 +88,7 @@ struct papr_scm_priv {
>> >>  	unsigned long lasthealth_jiffies;
>> >>  
>> >>  	/* Health information for the dimm */
>> >> -	u64 health_bitmap;
>> >> +	struct nd_papr_pdsm_health health;
>> >
>> > ok so we are throwing away all the #defs from patch 1?  Are they still valid?
>> >
>> > I'm confused that patch 3 added this and we are throwing it away
>> > here...
>> The #defines are still valid, only the usage moved to a __drc_pmem_query_health().
>> 
>> >
>> >>  };
>> >>  
>> >>  static int drc_pmem_bind(struct papr_scm_priv *p)
>> >> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>> >>  static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >>  {
>> >>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
>> >> +	u64 health;
>> >>  	long rc;
>> >>  
>> >>  	/* issue the hcall */
>> >> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >>  	if (rc != H_SUCCESS) {
>> >>  		dev_err(&p->pdev->dev,
>> >>  			 "Failed to query health information, Err:%ld\n", rc);
>> >> -		rc = -ENXIO;
>> >> -		goto out;
>> >> +		return -ENXIO;
>> >
>> > I missed this...  probably did not need the goto in the first patch?
>> Yes, will get rid of the goto from patch-1.
>
> Cool.
>
>> 
>> >
>> >>  	}
>> >>  
>> >>  	p->lasthealth_jiffies = jiffies;
>> >> -	p->health_bitmap = ret[0] & ret[1];
>> >> +	health = ret[0] & ret[1];
>> >>  
>> >>  	dev_dbg(&p->pdev->dev,
>> >>  		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> >>  		ret[0], ret[1]);
>> >> -out:
>> >> -	return rc;
>> >> +
>> >> +	memset(&p->health, 0, sizeof(p->health));
>> >> +
>> >> +	/* Check for various masks in bitmap and set the buffer */
>> >> +	if (health & PAPR_PMEM_UNARMED_MASK)
>> >
>> > Oh ok...  odd.  (don't add code then just take it away in a series)
>> > You could have lead with the user structure and put this code in patch
>> > 3.
>> The struct nd_papr_pdsm_health in only introduced this patch in header
>> 'papr_pdsm.h' as means of exchanging nvdimm health information with
>> userspace. Introducing this struct without introducing the necessary
>> scafolding in 'papr_pdsm.h' would have been very counter-intutive.
>
> I respectfully disagree.  You intended to use a copy of this structure in
> kernel to store the data.  Just do that.
Have addressed this in v10 that doesnt resort to removing the
functionality that was introduced in an earlier patch.

>
>> 
>> >
>> > Why does the user need u8 to represent a single bit?  Does this help protect
>> > against endian issues?
>> This was 'bool' earlier but since type 'bool' isnt suitable for ioctl abi
>> and I wanted to avoid bit fields here as not sure if their packing may
>> differ across compilers hence replaced with u8.
>> 
>
> ok works for me...
>
>> >
>> >> +		p->health.dimm_unarmed = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> +		p->health.dimm_bad_shutdown = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> +		p->health.dimm_bad_restore = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_ENCRYPTED)
>> >> +		p->health.dimm_encrypted = 1;
>> >> +
>> >> +	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
>> >> +		p->health.dimm_locked = 1;
>> >> +		p->health.dimm_scrubbed = 1;
>> >> +	}
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_CRITICAL)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
>> >> +
>> >> +	if (health & PAPR_PMEM_HEALTH_FATAL)
>> >> +		p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
>> >> +
>> >> +	return 0;
>> >>  }
>> >>  
>> >>  /* Min interval in seconds for assuming stable dimm health */
>> >> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >>  	return 0;
>> >>  }
>> >>  
>> >> +/* Fetch the DIMM health info and populate it in provided package. */
>> >> +static int papr_pdsm_health(struct papr_scm_priv *p,
>> >> +			       struct nd_pdsm_cmd_pkg *pkg)
>> >> +{
>> >> +	int rc;
>> >> +	size_t copysize = sizeof(p->health);
>> >> +
>> >> +	/* Ensure dimm health mutex is taken preventing concurrent access */
>> >> +	rc = mutex_lock_interruptible(&p->health_mutex);
>> >> +	if (rc)
>> >> +		goto out;
>> >> +
>> >> +	/* Always fetch upto date dimm health data ignoring cached values */
>> >> +	rc = __drc_pmem_query_health(p);
>> >> +	if (rc)
>> >> +		goto out_unlock;
>> >> +	/*
>> >> +	 * If the requested payload version is greater than one we know
>> >> +	 * about, return the payload version we know about and let
>> >> +	 * caller/userspace handle.
>> >> +	 */
>> >> +	if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
>> >> +		pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
>> >
>> > I know this seems easy now but I do think you will run into trouble later.
>> 
>> I did addressed this in an earlier iteration of this patchset[1] and
>> dropped it in favour of simplicity.
>> 
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/20200220095805.197229-9-vaibhav@linux.ibm.com/
>  
> I don't see how that addresses this?  See my other email.
>
> Ira
>
>> 
>> > Ira
>> >
>> >> +
>> >> +	if (pkg->hdr.nd_size_out < copysize) {
>> >> +		dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
>> >> +			pkg->hdr.nd_size_out, copysize);
>> >> +		rc = -ENOSPC;
>> >> +		goto out_unlock;
>> >> +	}
>> >> +
>> >> +	dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
>> >> +		copysize, pkg->payload_version);
>> >> +
>> >> +	/* Copy the health struct to the payload */
>> >> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
>> >> +	pkg->hdr.nd_fw_size = copysize;
>> >> +
>> >> +out_unlock:
>> >> +	mutex_unlock(&p->health_mutex);
>> >> +
>> >> +out:
>> >> +	/*
>> >> +	 * Put the error in out package and return success from function
>> >> +	 * so that errors if any are propogated back to userspace.
>> >> +	 */
>> >> +	pkg->cmd_status = rc;
>> >> +	dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
>> >> +
>> >> +	return 0;
>> >> +}
>> >> +
>> >>  static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >>  				struct nd_pdsm_cmd_pkg *call_pkg)
>> >>  {
>> >> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >>  
>> >>  	/* Depending on the DSM command call appropriate service routine */
>> >>  	switch (call_pkg->hdr.nd_command) {
>> >> +	case PAPR_PDSM_HEALTH:
>> >> +		return papr_pdsm_health(p, call_pkg);
>> >> +
>> >>  	default:
>> >>  		dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> >>  			call_pkg->hdr.nd_command);
>> >> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
>> >>  	struct nvdimm *dimm = to_nvdimm(dev);
>> >>  	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> >>  	struct seq_buf s;
>> >> -	u64 health;
>> >>  	int rc;
>> >>  
>> >>  	rc = drc_pmem_query_health(p);
>> >>  	if (rc)
>> >>  		return rc;
>> >>  
>> >> -	/* Copy health_bitmap locally, check masks & update out buffer */
>> >> -	health = READ_ONCE(p->health_bitmap);
>> >> -
>> >>  	seq_buf_init(&s, buf, PAGE_SIZE);
>> >> -	if (health & PAPR_PMEM_UNARMED_MASK)
>> >> +
>> >> +	/* Protect concurrent modifications to papr_scm_priv */
>> >> +	rc = mutex_lock_interruptible(&p->health_mutex);
>> >> +	if (rc)
>> >> +		return rc;
>> >> +
>> >> +	if (p->health.dimm_unarmed)
>> >>  		seq_buf_printf(&s, "not_armed ");
>> >>  
>> >> -	if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> +	if (p->health.dimm_bad_shutdown)
>> >>  		seq_buf_printf(&s, "flush_fail ");
>> >>  
>> >> -	if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> +	if (p->health.dimm_bad_restore)
>> >>  		seq_buf_printf(&s, "restore_fail ");
>> >>  
>> >> -	if (health & PAPR_PMEM_ENCRYPTED)
>> >> +	if (p->health.dimm_encrypted)
>> >>  		seq_buf_printf(&s, "encrypted ");
>> >>  
>> >> -	if (health & PAPR_PMEM_SMART_EVENT_MASK)
>> >> +	if (p->health.dimm_health)
>> >>  		seq_buf_printf(&s, "smart_notify ");
>> >>  
>> >> -	if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
>> >> -		seq_buf_printf(&s, "scrubbed locked ");
>> >> +	if (p->health.dimm_scrubbed)
>> >> +		seq_buf_printf(&s, "scrubbed ");
>> >> +
>> >> +	if (p->health.dimm_locked)
>> >> +		seq_buf_printf(&s, "locked ");
>> >> +
>> >> +	mutex_unlock(&p->health_mutex);
>> >>  
>> >>  	if (seq_buf_used(&s))
>> >>  		seq_buf_printf(&s, "\n");
>> >> -- 
>> >> 2.26.2
>> >> 
>> 
>> -- 
>> Cheers
>> ~ Vaibhav

-- 
Cheers
~ Vaibhav

      reply	other threads:[~2020-06-04 18:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02 10:14 [RESEND PATCH v9 0/5] powerpc/papr_scm: Add support for reporting nvdimm health Vaibhav Jain
2020-06-02 10:14 ` [RESEND PATCH v9 1/5] powerpc: Document details on H_SCM_HEALTH hcall Vaibhav Jain
2020-06-02 21:21   ` Ira Weiny
2020-06-02 10:14 ` [RESEND PATCH v9 2/5] seq_buf: Export seq_buf_printf Vaibhav Jain
2020-06-02 10:14 ` [RESEND PATCH v9 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP Vaibhav Jain
2020-06-02 18:45   ` Ira Weiny
2020-06-02 10:14 ` [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods Vaibhav Jain
2020-06-02 20:51   ` Ira Weiny
2020-06-02 21:05     ` Ira Weiny
2020-06-03 20:28       ` Vaibhav Jain
2020-06-03 23:22         ` Ira Weiny
2020-06-03 18:11     ` Vaibhav Jain
2020-06-03 22:46       ` Ira Weiny
2020-06-04 18:42         ` Vaibhav Jain
     [not found]   ` <BN6PR11MB413223B333153721405DFD91C6890@BN6PR11MB4132.namprd11.prod.outlook.com>
2020-06-04  9:05     ` Vaibhav Jain
2020-06-05  0:54       ` Williams, Dan J
2020-06-05 15:21         ` Vaibhav Jain
2020-06-05 18:19           ` Dan Williams
2020-06-02 10:14 ` [RESEND PATCH v9 5/5] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH Vaibhav Jain
2020-06-02 21:19   ` Ira Weiny
2020-06-03 19:04     ` Vaibhav Jain
2020-06-03 23:18       ` Ira Weiny
2020-06-04 18:49         ` Vaibhav Jain [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bllyhdk0.fsf@linux.ibm.com \
    --to=vaibhav@linux.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=oohall@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=santosh@fossix.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).