From: Ira Weiny <ira.weiny@intel.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Ben Widawsky" <bwidawsk@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	<linux-kernel@vger.kernel.org>, <linux-cxl@vger.kernel.org>
Subject: Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record
Date: Mon, 12 Sep 2022 16:04:07 -0700
Message-ID: <Yx+65zjlpTsmg6M5@iweiny-mobl>
In-Reply-To: <20220825114632.00003fc6@huawei.com>

On Thu, Aug 25, 2022 at 11:46:32AM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:39 -0700
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> > 
> > Determine if the event read is a DRAM event record and if so trace the
> > record.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > This record has a very odd byte layout with 2 - 16 bit fields
> > (validity_flags and column) aligned on an odd byte boundary.  In
> > addition nibble_mask and row are oddly aligned.
> > 
> > I've made my best guess as to how the endianness of these fields should
> > be resolved.  But I'm happy to hear from other folks if what I have is
> > wrong.
> My assumption is same as you.  We should sanity check of course by
> poking relevant people.  
> 
> Similar comments in here to previous.  Use the get_unaligned_le24()
> accessors + consider not printing invalid fields.

Yeah, I've already converted the 3-byte fields to get_unaligned_le24().

> > 
> > struct cxl_evt_dram_rec {
> > 	struct cxl_event_record_hdr hdr;
> > 	__le64 phys_addr;
> > 	u8 descriptor;
> > 	u8 type;
> > 	u8 transaction_type;
> > 	u16 validity_flags;
> > 	u8 channel;
> > 	u8 rank;
> > 	u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> > 	u8 bank_group;
> > 	u8 bank;
> > 	u8 row[CXL_EVT_DER_ROW_SIZE];
> > 	u16 column;
> > 	u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> > } __packed;
> > ---
> >  drivers/cxl/core/mbox.c           |  16 +++++
> >  drivers/cxl/cxlmem.h              |  24 +++++++
> >  include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> >  3 files changed, 154 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 0e433f072163..6414588a3c7b 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> >  	UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> >  		  0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> >  
> > +/*
> > + * DRAM Event Record
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> rev3.0, r3.0 or just 3.0  

Already done.

> 
> > + */
> > +static const uuid_t dram_event_uuid =
> > +	UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> > +		  0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> > +
> >  static void cxl_trace_event_record(const char *dev_name,
> >  				   enum cxl_event_log_type type,
> >  				   struct cxl_get_event_payload *payload)
> > @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> >  		return;
> >  	}
> >  
> > +	if (uuid_equal(id, &dram_event_uuid)) {
> Why not else if?  Should be obvious to compiler that multiple uuid_equal
> conditions can't match, but even better to not make it try hard perhaps?

Sure, else if can work.

> 
> > +		struct cxl_evt_dram_rec *rec =
> > +				(struct cxl_evt_dram_rec *)&payload->record;
> > +
> > +		trace_cxl_dram_event(dev_name, type, rec);
> > +		return;
> > +	}
> > +
> >  	/* For unknown record types print just the header */
> >  	trace_cxl_event(dev_name, type, &payload->record);
> >  }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 33669459ae4b..50536c0a7850 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> >  	u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> >  } __packed;
> >  
> > +/*
> > + * DRAM Event Record - DER
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 3-44
> > + */
> > +#define CXL_EVT_DER_NIBBLE_MASK_SIZE		3
> > +#define CXL_EVT_DER_ROW_SIZE			3
> > +#define CXL_EVT_DER_CORRECTION_MASK_SIZE	0x20
> > +struct cxl_evt_dram_rec {
> > +	struct cxl_event_record_hdr hdr;
> > +	__le64 phys_addr;
> > +	u8 descriptor;
> > +	u8 type;
> > +	u8 transaction_type;
> > +	u16 validity_flags;
> I've not tried it, but can we just mark these as __le16 and use
> the unaligned accessors?  get_unaligned_le16 etc

get_unaligned_le16() requires a byte array...

So I think this needs to be:

	u8 validity_flags[2];

Now that I know about those calls I think this does make a lot more sense.  The
test code works but I knew that it would be sketchy with real devices.

I'll adjust this.
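
A minimal userspace sketch of that direction, assuming the odd-aligned fields
are declared as byte arrays and read through unaligned little-endian helpers.
The demo_* names and the cut-down struct are hypothetical; the two helpers only
stand in for the kernel's get_unaligned_le16()/get_unaligned_le24() so the
example builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's get_unaligned_le16(). */
static uint16_t demo_get_unaligned_le16(const void *p)
{
	const uint8_t *b = p;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Userspace stand-in for the kernel's get_unaligned_le24(). */
static uint32_t demo_get_unaligned_le24(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/*
 * Hypothetical cut-down record: only the fields after phys_addr.
 * Every member is byte sized, so the compiler inserts no padding and
 * no member needs natural alignment.
 */
struct demo_dram_rec {
	uint8_t descriptor;
	uint8_t type;
	uint8_t transaction_type;
	uint8_t validity_flags[2];	/* 16 bit LE at an odd offset */
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];		/* 24 bit LE */
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];			/* 24 bit LE */
	uint8_t column[2];		/* 16 bit LE at an odd offset */
};

/* CPU-endian view of the multi-byte fields. */
struct demo_dram_parsed {
	uint16_t validity_flags;
	uint32_t nibble_mask;
	uint32_t row;
	uint16_t column;
};

static struct demo_dram_parsed demo_parse(const struct demo_dram_rec *rec)
{
	struct demo_dram_parsed out;

	assert(rec != NULL);
	out.validity_flags = demo_get_unaligned_le16(rec->validity_flags);
	out.nibble_mask = demo_get_unaligned_le24(rec->nibble_mask);
	out.row = demo_get_unaligned_le24(rec->row);
	out.column = demo_get_unaligned_le16(rec->column);
	return out;
}
```

The helpers read byte by byte, so the result does not depend on host
endianness or on the field's alignment.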

> Also there is get_unaligned_le24() for the 3 byte ones.

Yea done.

[snip]

> > +
> > +	TP_fast_assign(
> > +		/* Common */
> > +		__assign_str(dev_name, dev_name);
> > +		memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > +		__entry->log = log;
> > +		__entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > +		__entry->handle = le16_to_cpu(rec->hdr.handle);
> > +		__entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > +		__entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +
> > +		/* DRAM */
> > +		__entry->phys_addr = le64_to_cpu(rec->phys_addr);
> > +		__entry->descriptor = rec->descriptor;
> > +		__entry->type = rec->type;
> > +		__entry->transaction_type = rec->transaction_type;
> > +		__entry->validity_flags = le16_to_cpu(rec->validity_flags);
> > +		__entry->channel = rec->channel;
> > +		__entry->rank = rec->rank;
> > +		__entry->nibble_mask = rec->nibble_mask[0] << 24 |
> > +				       rec->nibble_mask[1] << 16 |
> > +				       rec->nibble_mask[2] << 8; /* 3 byte LE ? */
> 
> Use get_unaligned_le24() ? I'd definitely expect these to be le24.
> 
> 
> > +		__entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
> 
> That doesn't look right.  You will have unwound the endianness using
> the shifts above. Don't convert it again (noop on le systems, so you
> probably won't see a problem when testing).

I thought I did it right with 2 shifts.  But regardless, using
get_unaligned_le24() is better and I've already changed it.
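
For reference, a small userspace sketch of the arithmetic, assuming the 3 bytes
are little-endian as discussed above.  The demo_* names are made up for
illustration.  demo_bswap32() stands in for what le32_to_cpu() does on a
big-endian host; on a little-endian host le32_to_cpu() is a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain 24 bit little-endian read, equivalent in effect to get_unaligned_le24(). */
static uint32_t demo_le24(const uint8_t *b)
{
	assert(b != NULL);
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/* The shift expression from the RFC patch, before le32_to_cpu(). */
static uint32_t demo_rfc_shifts(const uint8_t *b)
{
	assert(b != NULL);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8);
}

/* Byte swap, i.e. the effect of le32_to_cpu() on a big-endian host. */
static uint32_t demo_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}
```

With the bytes 56 34 12 in memory, demo_le24() gives 0x123456 and
demo_rfc_shifts() gives 0x56341200.  Only after the byte swap does the shifted
value become 0x00123456, so the shift plus le32_to_cpu() combination yields
different results on little-endian and big-endian hosts, while the le24 read
does not.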

> 
> > +		__entry->bank_group = rec->bank_group;
> > +		__entry->bank = rec->bank;
> > +		__entry->row = rec->row[0] << 24 |
> > +			       rec->row[1] << 16 |
> > +			       rec->row[2] << 8; /* 3 byte LE ? */
> 
> get_unaligned_le24()

... and this one.

> 
> > +		__entry->row = le32_to_cpu(__entry->row);
> 
> > +		__entry->column = le16_to_cpu(rec->column);
> > +		memcpy(__entry->cor_mask, &rec->correction_mask,
> > +			CXL_EVT_DER_CORRECTION_MASK_SIZE);
> > +	),
> > +
> > +	TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > +		  "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > +		  "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> > +		  "cor_mask=%s valid_flags='%s'",
> > +		__get_str(dev_name), show_log_type(__entry->log),
> > +		__entry->timestamp, __entry->id, __entry->handle,
> > +		__entry->related_handle, show_hdr_flags(__entry->flags),
> > +		__entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > +		(__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > +		show_event_desc_flags(__entry->descriptor),
> As before can we not print the invalid ones based on the validity flags?
> 
> Few years ago now, but I did something along those lines for the CCIX equivalent of
> this stuff.  (honestly can't remember much about it now though!)
> Was a bit fiddly but led to nicer prints in my opinion.
> 
> https://lore.kernel.org/all/20191114133919.32290-2-Jonathan.Cameron@huawei.com/

I'm still not seeing anything which alters the actual print in this patch or
ras_event.h

Perhaps I'm missing what you mean by selecting the valid fields.

Something will have to change the TP_printk() format itself from what I can see
and I don't see a way to do that within the trace infrastructure.

We _could_ do that within the C code where trace_dram() is called.  But I'd
like to keep all the info together and let user space decode more than what the
kernel may know.
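
As a rough illustration of the user space side, here is a sketch of a decoder
that prints only the fields whose validity bit is set.  Everything in it is an
assumption made for the example: the demo_* names, the reduced field set, and
the bit positions, which should be checked against the spec table before being
relied on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions, for illustration only; verify against the spec. */
#define DEMO_VALID_CHANNEL	(1u << 0)
#define DEMO_VALID_RANK		(1u << 1)
#define DEMO_VALID_ROW		(1u << 5)
#define DEMO_VALID_COLUMN	(1u << 6)

/* Hypothetical, already CPU-endian subset of the traced DRAM fields. */
struct demo_dram_fields {
	uint16_t validity_flags;
	uint8_t channel;
	uint8_t rank;
	uint32_t row;
	uint16_t column;
};

/* Append " name=val" (no leading space for the first field), clamping on overflow. */
static size_t demo_append(char *buf, size_t len, size_t off,
			  const char *name, unsigned int val)
{
	int n = snprintf(buf + off, len - off, "%s%s=%u",
			 off ? " " : "", name, val);

	if (n < 0)
		return off;
	if ((size_t)n >= len - off)
		return len - 1;	/* truncated; buf stays NUL terminated */
	return off + (size_t)n;
}

/* Format only the fields flagged valid; returns the string length. */
static size_t demo_format_valid(const struct demo_dram_fields *f,
				char *buf, size_t len)
{
	size_t off = 0;

	assert(f != NULL && buf != NULL && len > 0);
	buf[0] = '\0';
	if (f->validity_flags & DEMO_VALID_CHANNEL)
		off = demo_append(buf, len, off, "channel", f->channel);
	if (f->validity_flags & DEMO_VALID_RANK)
		off = demo_append(buf, len, off, "rank", f->rank);
	if (f->validity_flags & DEMO_VALID_ROW)
		off = demo_append(buf, len, off, "row", f->row);
	if (f->validity_flags & DEMO_VALID_COLUMN)
		off = demo_append(buf, len, off, "column", f->column);
	return off;
}
```

The trace record itself still carries every field plus the validity flags, so
a decoder like this can drop or keep fields without the kernel format changing.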

Ira

