On Wed, Mar 13, 2024 at 09:05:20PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Add helpers to extract the value of an event record field given the
> field name. This is useful when the user knows the name and format
> of the field and simply needs to get it. The helpers also return
> the 'type'_MAX of the type when the field is
>
> Since this is in preparation for adding a cxl_poison private parser
> for 'cxl list --media-errors' support those specific required
> types: u8, u32, u64.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> cxl/event_trace.h | 8 +++++++-
> 2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> index 640abdab67bf..324edb982888 100644
> --- a/cxl/event_trace.c
> +++ b/cxl/event_trace.c
> @@ -15,6 +15,43 @@
> #define _GNU_SOURCE
> #include <string.h>
>
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> + const char *name)
> +{
> + unsigned long long val;
> +
> + if (tep_get_field_val(NULL, event, name, record, &val, 0))
> + return ULLONG_MAX;
> +
> + return val;
> +}
> +
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> + const char *name)
> +{
> + char *val;
> + int len;
> +
> + val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> + if (!val)
> + return UINT_MAX;
> +
> + return *(u32 *)val;
> +}
> +
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> + const char *name)
> +{
> + char *val;
> + int len;
> +
> + val = tep_get_field_raw(NULL, event, name, record, &len, 0);
> + if (!val)
> + return UCHAR_MAX;
> +
> + return *(u8 *)val;
> +}
> +
> static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags)
> {
> bool sign = flags & TEP_FIELD_IS_SIGNED;
> diff --git a/cxl/event_trace.h b/cxl/event_trace.h
> index b77cafb410c4..7b30c3922aef 100644
> --- a/cxl/event_trace.h
> +++ b/cxl/event_trace.h
> @@ -5,6 +5,7 @@
>
> #include <json-c/json.h>
> #include <ccan/list/list.h>
> +#include <ccan/short_types/short_types.h>
>
> struct jlist_node {
> struct json_object *jobj;
> @@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx);
> int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system,
> const char *event);
> int cxl_event_tracing_disable(struct tracefs_instance *inst);
> -
> +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record,
> + const char *name);
> +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record,
> + const char *name);
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> + const char *name);
> #endif
> --
> 2.37.3
>
Alison Schofield wrote:
> On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> > On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > CXL devices maintain a list of locations that are poisoned or result
> > > in poison if the addresses are accessed by the host.
> > >
> > > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > > List as a set of Media Error Records that include the source of the
> > > error, the starting device physical address and length.
> > >
> > > Trigger the retrieval of the poison list by writing to the memory
> > > device sysfs attribute: trigger_poison_list. The CXL driver only
> > > offers triggering per memdev, so the trigger by region interface
> > > offered here is a convenience API that triggers a poison list
> > > retrieval for each memdev contributing to a region.
> > >
> > > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > > int cxl_region_trigger_poison_list(struct cxl_region *region);
> > >
> > > The resulting poison records are logged as kernel trace events
> > > named 'cxl_poison'.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > ---
> > > cxl/lib/libcxl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > > cxl/lib/libcxl.sym | 2 ++
> > > cxl/libcxl.h | 2 ++
> > > 3 files changed, 51 insertions(+)
> > >
> > > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > > index ff27cdf7c44a..73db8f15c704 100644
> > > --- a/cxl/lib/libcxl.c
> > > +++ b/cxl/lib/libcxl.c
> > > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> > > return 0;
> > > }
> > >
> > > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > > +{
> > > + struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > > + char *path = memdev->dev_buf;
> > > + int len = memdev->buf_len, rc;
> > > +
> > > + if (snprintf(path, len, "%s/trigger_poison_list",
> > > + memdev->dev_path) >= len) {
> > > + err(ctx, "%s: buffer too small\n",
> > > + cxl_memdev_get_devname(memdev));
> > > + return -ENXIO;
> > > + }
> > > + rc = sysfs_write_attr(ctx, path, "1\n");
> > > + if (rc < 0) {
> > > + fprintf(stderr,
> > > + "%s: Failed write sysfs attr trigger_poison_list\n",
> > > + cxl_memdev_get_devname(memdev));
> >
> > Should we use err() instead of fprintf here?
>
> Thanks Fan,
>
> How about this?
>
> - use fprintf if access() fails, ie device doesn't support poison list,
> - use err() for failure to actually read the poison list on a device with
> support
Why? There is no raw usage of fprintf in any of the libraries (ndctl,
daxctl, cxl) to date. If someone builds the library without logging then
it should not chatter on stderr at all, and if someone redirects logging
to syslog then it should emit messages only there and not on stderr.
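An illustrative aside on the point above: a minimal sketch, assuming a context-owned log callback, of how a library can keep diagnostics off stderr unless the caller opts in. All demo_ names are hypothetical and this is not the actual ndctl/libcxl logging code; the capture sink merely stands in for a syslog-style redirect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum { DEMO_LOG_ERR = 3 };

struct demo_ctx;
typedef void (*demo_log_fn)(struct demo_ctx *ctx, int priority,
			    const char *fmt, va_list args);

/* Hypothetical library context: the caller decides where messages go. */
struct demo_ctx {
	demo_log_fn log_fn;	/* NULL means logging is disabled */
	int log_priority;	/* messages above this level are dropped */
	char last[128];		/* used only by the capture sink below */
};

/* Single choke point for all library messages. */
void demo_log(struct demo_ctx *ctx, int priority, const char *fmt, ...)
{
	va_list args;

	assert(ctx != NULL);
	if (!ctx->log_fn || priority > ctx->log_priority)
		return;
	va_start(args, fmt);
	ctx->log_fn(ctx, priority, fmt, args);
	va_end(args);
}

#define demo_err(ctx, ...) demo_log(ctx, DEMO_LOG_ERR, __VA_ARGS__)

/* Example sink: records the message instead of printing it. */
void demo_log_capture(struct demo_ctx *ctx, int priority,
		      const char *fmt, va_list args)
{
	(void)priority;
	vsnprintf(ctx->last, sizeof(ctx->last), fmt, args);
}

/* Library-style function: reports failure only through the context. */
int demo_trigger(struct demo_ctx *ctx, const char *devname, int simulated_rc)
{
	if (simulated_rc < 0) {
		demo_err(ctx, "%s: failed to write trigger_poison_list\n",
			 devname);
		return simulated_rc;
	}
	return 0;
}
```

With log_fn left NULL the failure is still reported through the return code, but nothing is printed; a raw fprintf(stderr, ...) in the same spot would bypass both the "no logging" and the "redirect elsewhere" choices.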
I'm working on V3. Thanks for Ying's feedback.
cc: sthanneeru@micron.com
On Thu, Mar 14, 2024 at 12:54 AM Huang, Ying <ying.huang@intel.com> wrote:
>
> "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com> writes:
>
> > On Tue, Mar 12, 2024 at 2:21 AM Huang, Ying <ying.huang@intel.com> wrote:
> >>
> >> "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com> writes:
> >>
> >> > The current implementation treats emulated memory devices, such as
> >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory
> >> > (E820_TYPE_RAM). However, these emulated devices have different
> >> > characteristics than traditional DRAM, making it important to
> >> > distinguish them. Thus, we modify the tiered memory initialization process
> >> > to introduce a delay specifically for CPUless NUMA nodes. This delay
> >> > ensures that the memory tier initialization for these nodes is deferred
> >> > until HMAT information is obtained during the boot process. Finally,
> >> > demotion tables are recalculated at the end.
> >> >
> >> > * Abstract common functions into `find_alloc_memory_type()`
> >>
> >> We should move kmem_put_memory_types() (renamed to
> >> mt_put_memory_types()?) too. This can be put in a separate patch.
> >>
> >
> > Will do! Thanks,
> >
> >
> >>
> >> > Since different memory devices require finding or allocating a memory type,
> >> > these common steps are abstracted into a single function,
> >> > `find_alloc_memory_type()`, enhancing code scalability and conciseness.
> >> >
> >> > * Handle cases where there is no HMAT when creating memory tiers
> >> > There is a scenario where a CPUless node does not provide HMAT information.
> >> > If no HMAT is specified, it falls back to using the default DRAM tier.
> >> >
> >> > * Change adist calculation code to use another new lock, mt_perf_lock.
> >> > In the current implementation, iterating through CPUlist nodes requires
> >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up
> >> > trying to acquire the same lock, leading to a potential deadlock.
> >> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect
> >> > `default_dram_perf`. This approach not only avoids deadlock but also
> >> > prevents holding a large lock simultaneously.
> >> >
> >> > Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
> >> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> >> > ---
> >> > drivers/acpi/numa/hmat.c | 11 ++++++
> >> > drivers/dax/kmem.c | 13 +------
> >> > include/linux/acpi.h | 6 ++++
> >> > include/linux/memory-tiers.h | 8 +++++
> >> > mm/memory-tiers.c | 70 +++++++++++++++++++++++++++++++++---
> >> > 5 files changed, 92 insertions(+), 16 deletions(-)
> >> >
> >> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> >> > index d6b85f0f6082..28812ec2c793 100644
> >> > --- a/drivers/acpi/numa/hmat.c
> >> > +++ b/drivers/acpi/numa/hmat.c
> >> > @@ -38,6 +38,8 @@ static LIST_HEAD(targets);
> >> > static LIST_HEAD(initiators);
> >> > static LIST_HEAD(localities);
> >> >
> >> > +static LIST_HEAD(hmat_memory_types);
> >> > +
> >>
> >> HMAT isn't a device driver for some memory devices. So I don't think we
> >> should manage memory types in HMAT.
> >
> > I can put it back in memory-tier.c. How about the list? Do we still
> > need to keep a separate list for storing late_inited memory nodes?
> > And how about the list name if we need to remove the prefix "hmat_"?
>
> I don't think we need a separate list for memory-less nodes. Just
> iterate all memory-less nodes.
>
Ok. There is no need to keep a list of memory-less nodes. I will
only keep a memory_type list to manage created memory types.
> >
> >> Instead, if the memory_type of a
> >> node isn't set by the driver, we should manage it in memory-tier.c as
> >> fallback.
> >>
> >
> > Do you mean some device drivers may init memory tiers between
> > memory_tier_init() and late_initcall(memory_tier_late_init);?
> > And this is the reason why you mention to exclude
> > "node_memory_types[nid].memtype != NULL" in memory_tier_late_init().
> > Is my understanding correct?
>
> Yes.
>
Thanks.
> >> > static DEFINE_MUTEX(target_lock);
> >> >
> >> > /*
> >> > @@ -149,6 +151,12 @@ int acpi_get_genport_coordinates(u32 uid,
> >> > }
> >> > EXPORT_SYMBOL_NS_GPL(acpi_get_genport_coordinates, CXL);
> >> >
> >> > +struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
> >> > +{
> >> > + return find_alloc_memory_type(adist, &hmat_memory_types);
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(hmat_find_alloc_memory_type);
> >> > +
> >> > static __init void alloc_memory_initiator(unsigned int cpu_pxm)
> >> > {
> >> > struct memory_initiator *initiator;
> >> > @@ -1038,6 +1046,9 @@ static __init int hmat_init(void)
> >> > if (!hmat_set_default_dram_perf())
> >> > register_mt_adistance_algorithm(&hmat_adist_nb);
> >> >
> >> > + /* Post-create CPUless memory tiers after getting HMAT info */
> >> > + memory_tier_late_init();
> >> > +
> >>
> >> This should be called in memory-tier.c via
> >>
> >> late_initcall(memory_tier_late_init);
> >>
> >> Then, we don't need hmat to call it.
> >>
> >
> > Thanks. Learned!
> >
> >
> >> > return 0;
> >> > out_put:
> >> > hmat_free_structures();
> >> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> >> > index 42ee360cf4e3..aee17ab59f4f 100644
> >> > --- a/drivers/dax/kmem.c
> >> > +++ b/drivers/dax/kmem.c
> >> > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types);
> >> >
> >> > static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
> >> > {
> >> > - bool found = false;
> >> > struct memory_dev_type *mtype;
> >> >
> >> > mutex_lock(&kmem_memory_type_lock);
> >> > - list_for_each_entry(mtype, &kmem_memory_types, list) {
> >> > - if (mtype->adistance == adist) {
> >> > - found = true;
> >> > - break;
> >> > - }
> >> > - }
> >> > - if (!found) {
> >> > - mtype = alloc_memory_type(adist);
> >> > - if (!IS_ERR(mtype))
> >> > - list_add(&mtype->list, &kmem_memory_types);
> >> > - }
> >> > + mtype = find_alloc_memory_type(adist, &kmem_memory_types);
> >> > mutex_unlock(&kmem_memory_type_lock);
> >> >
> >> > return mtype;
> >> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> >> > index b7165e52b3c6..3f927ff01f02 100644
> >> > --- a/include/linux/acpi.h
> >> > +++ b/include/linux/acpi.h
> >> > @@ -434,12 +434,18 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp);
> >> >
> >> > #ifdef CONFIG_ACPI_HMAT
> >> > int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord);
> >> > +struct memory_dev_type *hmat_find_alloc_memory_type(int adist);
> >> > #else
> >> > static inline int acpi_get_genport_coordinates(u32 uid,
> >> > struct access_coordinate *coord)
> >> > {
> >> > return -EOPNOTSUPP;
> >> > }
> >> > +
> >> > +static inline struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
> >> > +{
> >> > + return NULL;
> >> > +}
> >> > #endif
> >> >
> >> > #ifdef CONFIG_ACPI_NUMA
> >> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> >> > index 69e781900082..4bc2596c5774 100644
> >> > --- a/include/linux/memory-tiers.h
> >> > +++ b/include/linux/memory-tiers.h
> >> > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist);
> >> > int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >> > const char *source);
> >> > int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
> >> > +struct memory_dev_type *find_alloc_memory_type(int adist,
> >> > + struct list_head *memory_types);
> >> > +void memory_tier_late_init(void);
> >> > #ifdef CONFIG_MIGRATION
> >> > int next_demotion_node(int node);
> >> > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> >> > @@ -136,5 +139,10 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis
> >> > {
> >> > return -EIO;
> >> > }
> >> > +
> >> > +static inline void memory_tier_late_init(void)
> >> > +{
> >> > +
> >> > +}
> >> > #endif /* CONFIG_NUMA */
> >> > #endif /* _LINUX_MEMORY_TIERS_H */
> >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> >> > index 0537664620e5..79f748d60e6f 100644
> >> > --- a/mm/memory-tiers.c
> >> > +++ b/mm/memory-tiers.c
> >> > @@ -6,6 +6,7 @@
> >> > #include <linux/memory.h>
> >> > #include <linux/memory-tiers.h>
> >> > #include <linux/notifier.h>
> >> > +#include <linux/acpi.h>
> >> >
> >> > #include "internal.h"
> >> >
> >> > @@ -35,6 +36,7 @@ struct node_memory_type_map {
> >> > };
> >> >
> >> > static DEFINE_MUTEX(memory_tier_lock);
> >> > +static DEFINE_MUTEX(mt_perf_lock);
> >>
> >> Please add comments about what it protects. And put it near the data
> >> structure it protects.
> >>
> >
> > Where is better for me to add comments for this? Code? Patch description?
> > Will put it closer to the protected data. Thanks.
>
> Just put the comments at the above of the lock in the source code.
>
Got it. Thanks!
>
> >
> >
> >> > static LIST_HEAD(memory_tiers);
> >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >> > struct memory_dev_type *default_dram_type;
> >> > @@ -623,6 +625,58 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype)
> >> > }
> >> > EXPORT_SYMBOL_GPL(clear_node_memory_type);
> >> >
> >> > +struct memory_dev_type *find_alloc_memory_type(int adist, struct list_head *memory_types)
> >> > +{
> >> > + bool found = false;
> >> > + struct memory_dev_type *mtype;
> >> > +
> >> > + list_for_each_entry(mtype, memory_types, list) {
> >> > + if (mtype->adistance == adist) {
> >> > + found = true;
> >> > + break;
> >> > + }
> >> > + }
> >> > + if (!found) {
> >> > + mtype = alloc_memory_type(adist);
> >> > + if (!IS_ERR(mtype))
> >> > + list_add(&mtype->list, memory_types);
> >> > + }
> >> > +
> >> > + return mtype;
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(find_alloc_memory_type);
> >> > +
> >> > +static void memory_tier_late_create(int node)
> >> > +{
> >> > + struct memory_dev_type *mtype = NULL;
> >> > + int adist = MEMTIER_ADISTANCE_DRAM;
> >> > +
> >> > + mt_calc_adistance(node, &adist);
> >> > + if (adist != MEMTIER_ADISTANCE_DRAM) {
> >>
> >> We can manage default_dram_type() via find_alloc_memory_type()
> >> too.
> >>
> >> And, if "node_memory_types[node].memtype == NULL", we can call
> >> mt_calc_adistance(node, &adist) and find_alloc_memory_type() in
> >> set_node_memory_tier(). Then, we can cover hotpluged memory node too.
> >>
> >> > + mtype = hmat_find_alloc_memory_type(adist);
> >> > + if (!IS_ERR(mtype))
> >> > + __init_node_memory_type(node, mtype);
> >> > + else
> >> > + pr_err("Failed to allocate a memory type at %s()\n", __func__);
> >> > + }
> >> > +
> >> > + set_node_memory_tier(node);
> >> > +}
> >> > +
> >> > +void memory_tier_late_init(void)
> >> > +{
> >> > + int nid;
> >> > +
> >> > + mutex_lock(&memory_tier_lock);
> >> > + for_each_node_state(nid, N_MEMORY)
> >> > + if (!node_state(nid, N_CPU))
> >>
> >> We should exclude "node_memory_types[nid].memtype != NULL". Some memory
> >> nodes may be onlined by some device drivers and setup memory tiers
> >> already.
> >>
> >> > + memory_tier_late_create(nid);
> >> > +
> >> > + establish_demotion_targets();
> >> > + mutex_unlock(&memory_tier_lock);
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(memory_tier_late_init);
> >> > +
> >> > static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix)
> >> > {
> >> > pr_info(
> >> > @@ -636,7 +690,7 @@ int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >> > {
> >> > int rc = 0;
> >> >
> >> > - mutex_lock(&memory_tier_lock);
> >> > + mutex_lock(&mt_perf_lock);
> >> > if (default_dram_perf_error) {
> >> > rc = -EIO;
> >> > goto out;
> >> > @@ -684,7 +738,7 @@ int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
> >> > }
> >> >
> >> > out:
> >> > - mutex_unlock(&memory_tier_lock);
> >> > + mutex_unlock(&mt_perf_lock);
> >> > return rc;
> >> > }
> >> >
> >> > @@ -700,7 +754,7 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist)
> >> > perf->read_bandwidth + perf->write_bandwidth == 0)
> >> > return -EINVAL;
> >> >
> >> > - mutex_lock(&memory_tier_lock);
> >> > + mutex_lock(&mt_perf_lock);
> >> > /*
> >> > * The abstract distance of a memory node is in direct proportion to
> >> > * its memory latency (read + write) and inversely proportional to its
> >> > @@ -713,7 +767,7 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist)
> >> > (default_dram_perf.read_latency + default_dram_perf.write_latency) *
> >> > (default_dram_perf.read_bandwidth + default_dram_perf.write_bandwidth) /
> >> > (perf->read_bandwidth + perf->write_bandwidth);
> >> > - mutex_unlock(&memory_tier_lock);
> >> > + mutex_unlock(&mt_perf_lock);
> >> >
> >> > return 0;
> >> > }
> >> > @@ -836,6 +890,14 @@ static int __init memory_tier_init(void)
> >> > * types assigned.
> >> > */
> >> > for_each_node_state(node, N_MEMORY) {
> >> > + if (!node_state(node, N_CPU))
> >> > + /*
> >> > + * Defer memory tier initialization on CPUless numa nodes.
> >> > + * These will be initialized when HMAT information is
> >>
> >> HMAT is platform specific, we should avoid to mention it in general code
> >> if possible.
> >>
> >
> > Will fix! Thanks,
> >
> >
> >> > + * available.
> >> > + */
> >> > + continue;
> >> > +
> >> > memtier = set_node_memory_tier(node);
> >> > if (IS_ERR(memtier))
> >> > /*
> >> >
> --
> Best Regards,
> Huang, Ying
--
Best regards,
Ho-Ren (Jack) Chuang
莊賀任
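For readers skimming the exchange above, a small userspace sketch of the find-or-allocate pattern that the quoted find_alloc_memory_type() implements, plus a matching release helper along the lines of the mt_put_memory_types() suggestion. This is not kernel code: the demo_ names, the singly linked list, and the use of calloc()/free() are assumptions made only so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memory_dev_type. */
struct demo_memory_type {
	int adistance;
	struct demo_memory_type *next;
};

/*
 * Return the type already registered for @adist on the caller's list,
 * or allocate one and link it in. Returns NULL on allocation failure.
 * The caller owns the list and provides any locking it needs.
 */
struct demo_memory_type *demo_find_alloc_memory_type(int adist,
		struct demo_memory_type **types)
{
	struct demo_memory_type *mtype;

	assert(types != NULL);
	for (mtype = *types; mtype; mtype = mtype->next)
		if (mtype->adistance == adist)
			return mtype;

	mtype = calloc(1, sizeof(*mtype));
	if (!mtype)
		return NULL;
	mtype->adistance = adist;
	mtype->next = *types;
	*types = mtype;
	return mtype;
}

/* Release every type on the list and leave the list empty. */
void demo_put_memory_types(struct demo_memory_type **types)
{
	assert(types != NULL);
	while (*types) {
		struct demo_memory_type *next = (*types)->next;

		free(*types);
		*types = next;
	}
}
```

Passing the list in as a parameter is what lets each user keep its own set of types while sharing the lookup logic, which is the refactoring the patch description calls out.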
On Mon, Mar 18, 2024 at 10:51:13AM -0700, fan wrote:
> On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > CXL devices maintain a list of locations that are poisoned or result
> > in poison if the addresses are accessed by the host.
> >
> > Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> > List as a set of Media Error Records that include the source of the
> > error, the starting device physical address and length.
> >
> > Trigger the retrieval of the poison list by writing to the memory
> > device sysfs attribute: trigger_poison_list. The CXL driver only
> > offers triggering per memdev, so the trigger by region interface
> > offered here is a convenience API that triggers a poison list
> > retrieval for each memdev contributing to a region.
> >
> > int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > int cxl_region_trigger_poison_list(struct cxl_region *region);
> >
> > The resulting poison records are logged as kernel trace events
> > named 'cxl_poison'.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > ---
> > cxl/lib/libcxl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> > cxl/lib/libcxl.sym | 2 ++
> > cxl/libcxl.h | 2 ++
> > 3 files changed, 51 insertions(+)
> >
> > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> > index ff27cdf7c44a..73db8f15c704 100644
> > --- a/cxl/lib/libcxl.c
> > +++ b/cxl/lib/libcxl.c
> > @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> > return 0;
> > }
> >
> > +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> > +{
> > + struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > + char *path = memdev->dev_buf;
> > + int len = memdev->buf_len, rc;
> > +
> > + if (snprintf(path, len, "%s/trigger_poison_list",
> > + memdev->dev_path) >= len) {
> > + err(ctx, "%s: buffer too small\n",
> > + cxl_memdev_get_devname(memdev));
> > + return -ENXIO;
> > + }
> > + rc = sysfs_write_attr(ctx, path, "1\n");
> > + if (rc < 0) {
> > + fprintf(stderr,
> > + "%s: Failed write sysfs attr trigger_poison_list\n",
> > + cxl_memdev_get_devname(memdev));
>
> Should we use err() instead of fprintf here?
Thanks Fan,
How about this?
- use fprintf if access() fails, ie device doesn't support poison list,
- use err() for failure to actually read the poison list on a device with
support
Alison
>
> Fan
>
> > + return rc;
> > + }
> > + return 0;
> > +}
> > +
> > +CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
> > +{
> > + struct cxl_memdev_mapping *mapping;
> > + int rc;
> > +
> > + cxl_mapping_foreach(region, mapping) {
> > + struct cxl_decoder *decoder;
> > + struct cxl_memdev *memdev;
> > +
> > + decoder = cxl_mapping_get_decoder(mapping);
> > + if (!decoder)
> > + continue;
> > +
> > + memdev = cxl_decoder_get_memdev(decoder);
> > + if (!memdev)
> > + continue;
> > +
> > + rc = cxl_memdev_trigger_poison_list(memdev);
> > + if (rc)
> > + return rc;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
> > {
> > struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> > index de2cd84b2960..3f709c60db3d 100644
> > --- a/cxl/lib/libcxl.sym
> > +++ b/cxl/lib/libcxl.sym
> > @@ -280,4 +280,6 @@ global:
> > cxl_memdev_get_pmem_qos_class;
> > cxl_memdev_get_ram_qos_class;
> > cxl_region_qos_class_mismatch;
> > + cxl_memdev_trigger_poison_list;
> > + cxl_region_trigger_poison_list;
> > } LIBCXL_6;
> > diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> > index a6af3fb04693..29165043ca3f 100644
> > --- a/cxl/libcxl.h
> > +++ b/cxl/libcxl.h
> > @@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
> >
> > int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
> > enum cxl_setpartition_mode mode);
> > +int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> > +int cxl_region_trigger_poison_list(struct cxl_region *region);
> >
> > int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
> > int threshold);
> > --
> > 2.37.3
> >
On Wed, Mar 13, 2024 at 09:05:17PM -0700, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> CXL devices maintain a list of locations that are poisoned or result
> in poison if the addresses are accessed by the host.
>
> Per the spec (CXL 3.1 8.2.9.9.4.1), the device returns the Poison
> List as a set of Media Error Records that include the source of the
> error, the starting device physical address and length.
>
> Trigger the retrieval of the poison list by writing to the memory
> device sysfs attribute: trigger_poison_list. The CXL driver only
> offers triggering per memdev, so the trigger by region interface
> offered here is a convenience API that triggers a poison list
> retrieval for each memdev contributing to a region.
>
> int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> int cxl_region_trigger_poison_list(struct cxl_region *region);
>
> The resulting poison records are logged as kernel trace events
> named 'cxl_poison'.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> cxl/lib/libcxl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
> cxl/lib/libcxl.sym | 2 ++
> cxl/libcxl.h | 2 ++
> 3 files changed, 51 insertions(+)
>
> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c
> index ff27cdf7c44a..73db8f15c704 100644
> --- a/cxl/lib/libcxl.c
> +++ b/cxl/lib/libcxl.c
> @@ -1761,6 +1761,53 @@ CXL_EXPORT int cxl_memdev_disable_invalidate(struct cxl_memdev *memdev)
> return 0;
> }
>
> +CXL_EXPORT int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev)
> +{
> + struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> + char *path = memdev->dev_buf;
> + int len = memdev->buf_len, rc;
> +
> + if (snprintf(path, len, "%s/trigger_poison_list",
> + memdev->dev_path) >= len) {
> + err(ctx, "%s: buffer too small\n",
> + cxl_memdev_get_devname(memdev));
> + return -ENXIO;
> + }
> + rc = sysfs_write_attr(ctx, path, "1\n");
> + if (rc < 0) {
> + fprintf(stderr,
> + "%s: Failed write sysfs attr trigger_poison_list\n",
> + cxl_memdev_get_devname(memdev));
Should we use err() instead of fprintf here?
Fan
> + return rc;
> + }
> + return 0;
> +}
> +
> +CXL_EXPORT int cxl_region_trigger_poison_list(struct cxl_region *region)
> +{
> + struct cxl_memdev_mapping *mapping;
> + int rc;
> +
> + cxl_mapping_foreach(region, mapping) {
> + struct cxl_decoder *decoder;
> + struct cxl_memdev *memdev;
> +
> + decoder = cxl_mapping_get_decoder(mapping);
> + if (!decoder)
> + continue;
> +
> + memdev = cxl_decoder_get_memdev(decoder);
> + if (!memdev)
> + continue;
> +
> + rc = cxl_memdev_trigger_poison_list(memdev);
> + if (rc)
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> CXL_EXPORT int cxl_memdev_enable(struct cxl_memdev *memdev)
> {
> struct cxl_ctx *ctx = cxl_memdev_get_ctx(memdev);
> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym
> index de2cd84b2960..3f709c60db3d 100644
> --- a/cxl/lib/libcxl.sym
> +++ b/cxl/lib/libcxl.sym
> @@ -280,4 +280,6 @@ global:
> cxl_memdev_get_pmem_qos_class;
> cxl_memdev_get_ram_qos_class;
> cxl_region_qos_class_mismatch;
> + cxl_memdev_trigger_poison_list;
> + cxl_region_trigger_poison_list;
> } LIBCXL_6;
> diff --git a/cxl/libcxl.h b/cxl/libcxl.h
> index a6af3fb04693..29165043ca3f 100644
> --- a/cxl/libcxl.h
> +++ b/cxl/libcxl.h
> @@ -467,6 +467,8 @@ enum cxl_setpartition_mode {
>
> int cxl_cmd_partition_set_mode(struct cxl_cmd *cmd,
> enum cxl_setpartition_mode mode);
> +int cxl_memdev_trigger_poison_list(struct cxl_memdev *memdev);
> +int cxl_region_trigger_poison_list(struct cxl_region *region);
>
> int cxl_cmd_alert_config_set_life_used_prog_warn_threshold(struct cxl_cmd *cmd,
> int threshold);
> --
> 2.37.3
>
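As a rough illustration of the mechanism the quoted patch describes (build the attribute path, refuse a truncated path, write "1\n"), here is a standalone sketch using plain POSIX calls. The function name demo_trigger_poison_list, the fixed 256-byte buffer, and the direct open()/write() are assumptions for the example; the real patch goes through libcxl's own sysfs_write_attr() and the memdev's buffer instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "1\n" to <dev_path>/trigger_poison_list.
 * Returns 0 on success or a negative errno value on failure.
 */
int demo_trigger_poison_list(const char *dev_path)
{
	char path[256];
	const char *val = "1\n";
	ssize_t n;
	int fd, rc = 0;

	assert(dev_path != NULL);

	/* snprintf() returns the untruncated length: detect overflow. */
	if (snprintf(path, sizeof(path), "%s/trigger_poison_list",
		     dev_path) >= (int)sizeof(path))
		return -ENXIO;

	/* The attribute must already exist, so no O_CREAT here. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, strlen(val));
	if (n < 0)
		rc = -errno;
	else if ((size_t)n != strlen(val))
		rc = -EIO;

	close(fd);
	return rc;
}
```

A caller would pass the memdev's sysfs directory; the error is reported only through the return value, leaving any logging decision to the caller.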
On Fri, Mar 15, 2024 at 10:39:53AM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Add helpers to extract the value of an event record field given the
> > field name. This is useful when the user knows the name and format
> > of the field and simply needs to get it. The helpers also return
> > the 'type'_MAX of the type when the field is
> >
> > Since this is in preparation for adding a cxl_poison private parser
> > for 'cxl list --media-errors' support those specific required
> > types: u8, u32, u64.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> > cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> > cxl/event_trace.h | 8 +++++++-
> > 2 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> > index 640abdab67bf..324edb982888 100644
> > --- a/cxl/event_trace.c
> > +++ b/cxl/event_trace.c
> > @@ -15,6 +15,43 @@
> > #define _GNU_SOURCE
> > #include <string.h>
> >
> > +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> > + const char *name)
> > +{
> > + unsigned long long val;
> > +
> > + if (tep_get_field_val(NULL, event, name, record, &val, 0))
> > + return ULLONG_MAX;
> > +
> > + return val;
> > +}
>
> Hm, why are these prefixed "cxl_" there is nothing cxl specific in the
> internals. Maybe these event trace helpers grow non-CXL users in the
> future. Could be "trace_" or "util_" like other generic helpers in the
> codebase.
All the helpers in cxl/event_trace.c are prefixed "cxl_". The cxl
special-ness is only that ndctl/cxl is the only user of trace events
in ndctl/: cxl/monitor.c and now cxl/json.c (this usage).
I can move ndctl/cxl/event_trace.{h,c} to ndctl/utils/event_trace.{h,c}
and update cxl/monitor.c to find them.
Yay?
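A side note for readers on the helpers under discussion: the sketch below mimics the "return the type's MAX value when the lookup fails" convention from the quoted patch, using a mock record and generically named functions. Everything here is hypothetical (demo_record, demo_field, demo_get_field_u32); it does not call libtraceevent, and the length check plus memcpy() are additions of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mock of a named field inside a trace record (not a libtraceevent type). */
struct demo_field {
	const char *name;
	const void *data;
	int len;
};

struct demo_record {
	const struct demo_field *fields;
	int nr_fields;
};

/* Look up a field by name; returns its raw bytes or NULL if absent. */
const void *demo_get_field_raw(const struct demo_record *rec,
			       const char *name, int *len)
{
	for (int i = 0; i < rec->nr_fields; i++) {
		if (strcmp(rec->fields[i].name, name) == 0) {
			*len = rec->fields[i].len;
			return rec->fields[i].data;
		}
	}
	return NULL;
}

/* Returns the field value, or UINT32_MAX if it cannot be read as a u32. */
uint32_t demo_get_field_u32(const struct demo_record *rec, const char *name)
{
	uint32_t val;
	int len = 0;
	const void *raw;

	assert(rec != NULL && name != NULL);
	raw = demo_get_field_raw(rec, name, &len);
	if (!raw || len != (int)sizeof(val))
		return UINT32_MAX;

	/* memcpy avoids assuming the raw bytes are suitably aligned. */
	memcpy(&val, raw, sizeof(val));
	return val;
}
```

One consequence of the sentinel convention is that a field whose real value is the maximum cannot be told apart from a failed lookup, so callers need to know that value is never valid for the fields they read. Nothing in the helper itself depends on CXL, which is the naming point raised in the thread.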
On Sat, Mar 16, 2024 at 08:03:34AM +0900, Wonjae Lee wrote:
> alison.schofield@intel.com wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > Exercise cxl list, libcxl, and driver pieces of the get poison list
> > pathway. Inject and clear poison using debugfs and use cxl-cli to
> > read the poison list by memdev and by region.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> > test/cxl-poison.sh 137 +++++++++++++++++++++++++++++++++++++++++++++
> > test/meson.build 2 +
> > 2 files changed, 139 insertions(+)
> > create mode 100644 test/cxl-poison.sh
> >
> > diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> > new file mode 100644
> > index 000000000000..af2e9dcd1a11
> > --- /dev/null
> > +++ b/test/cxl-poison.sh
>
> [snip]
>
> > +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
>
> Hi,
>
> I know it's trivial and not sure if I'm understanding the history of
> the patch series correctly, but --poison seems to be an option that
> was suggested before --media-errors. I'm wondering if it's okay to
> leave this comment.
Thanks Wonjae - I appreciate your find. I'll fix it up.
Alison
>
> Thanks,
> Wonjae
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Exercise cxl list, libcxl, and driver pieces of the get poison list
> pathway. Inject and clear poison using debugfs and use cxl-cli to
> read the poison list by memdev and by region.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> test/cxl-poison.sh 137 +++++++++++++++++++++++++++++++++++++++++++++
> test/meson.build 2 +
> 2 files changed, 139 insertions(+)
> create mode 100644 test/cxl-poison.sh
>
> diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh
> new file mode 100644
> index 000000000000..af2e9dcd1a11
> --- /dev/null
> +++ b/test/cxl-poison.sh
[snip]
> +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing.
Hi,
I know it's trivial and not sure if I'm understanding the history of
the patch series correctly, but --poison seems to be an option that
was suggested before --media-errors. I'm wondering if it's okay to
leave this comment.
Thanks,
Wonjae
The pull request you sent on Thu, 14 Mar 2024 13:44:39 -0700:
> git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git tags/libnvdimm-for-6.9
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4757c3c64a71820a37da7a14c5b63a1f26fed0f5
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Add helpers to extract the value of an event record field given the
> field name. This is useful when the user knows the name and format
> of the field and simply needs to get it. The helpers also return
> the 'type'_MAX of the type when the field is
>
> Since this is in preparation for adding a cxl_poison private parser
> for 'cxl list --media-errors' support those specific required
> types: u8, u32, u64.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> cxl/event_trace.h | 8 +++++++-
> 2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/cxl/event_trace.c b/cxl/event_trace.c
> index 640abdab67bf..324edb982888 100644
> --- a/cxl/event_trace.c
> +++ b/cxl/event_trace.c
> @@ -15,6 +15,43 @@
> #define _GNU_SOURCE
> #include <string.h>
>
> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record,
> + const char *name)
> +{
> + unsigned long long val;
> +
> + if (tep_get_field_val(NULL, event, name, record, &val, 0))
> + return ULLONG_MAX;
> +
> + return val;
> +}
Hm, why are these prefixed "cxl_"? There is nothing CXL-specific in the
internals. Maybe these event trace helpers grow non-CXL users in the
future. They could be "trace_" or "util_" like other generic helpers in the
codebase.
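
For illustration only, the naming suggestion and the 'type'_MAX return
convention described in the commit message can be sketched without
libtraceevent. Everything below is an assumption made for the example:
`struct demo_field` is a hypothetical stand-in for a parsed event record, and
`trace_get_field_u64` is a hypothetical name that follows the "trace_" prefix
idea. It is not the helper from the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one named field of a trace record. */
struct demo_field {
	const char *name;
	uint64_t value;
};

/*
 * Hypothetical generic helper: return the value of the field called
 * 'name', or ULLONG_MAX when no such field exists, mirroring the
 * "return 'type'_MAX on failure" convention from the commit message.
 */
unsigned long long trace_get_field_u64(const struct demo_field *fields,
				       size_t nr_fields, const char *name)
{
	for (size_t i = 0; i < nr_fields; i++) {
		if (strcmp(fields[i].name, name) == 0)
			return fields[i].value;
	}

	return ULLONG_MAX;
}
```

A caller compares the result against ULLONG_MAX before using it, which also
means a field that legitimately holds the maximum value cannot be told apart
from a missing one.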
On 3/13/24 9:05 PM, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > Exercise cxl list, libcxl, and driver pieces of the get poison list > pathway. Inject and clear poison using debugfs and use cxl-cli to > read the poison list by memdev and by region. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> > --- > test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++ > test/meson.build | 2 + > 2 files changed, 139 insertions(+) > create mode 100644 test/cxl-poison.sh > > diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh > new file mode 100644 > index 000000000000..af2e9dcd1a11 > --- /dev/null > +++ b/test/cxl-poison.sh > @@ -0,0 +1,137 @@ > +#!/bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2023 Intel Corporation. All rights reserved. > + > +. "$(dirname "$0")"/common > + > +rc=77 > + > +set -ex > + > +trap 'err $LINENO' ERR > + > +check_prereq "jq" > + > +modprobe -r cxl_test > +modprobe cxl_test > + > +rc=1 > + > +# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to > +# inject, clear, and get the poison list. Do it by memdev and by region. 
> + > +find_memdev() > +{ > + readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M | > + jq -r ".[] | select(.pmem_size != null) | > + select(.ram_size != null) | .memdev") > + > + if [ ${#capable_mems[@]} == 0 ]; then > + echo "no memdevs found for test" > + err "$LINENO" > + fi > + > + memdev=${capable_mems[0]} > +} > + > +create_x2_region() > +{ > + # Find an x2 decoder > + decoder="$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] | > + select(.pmem_capable == true) | > + select(.nr_targets == 2) | > + .decoder")" > + > + # Find a memdev for each host-bridge interleave position > + port_dev0="$($CXL list -T -d "$decoder" | jq -r ".[] | > + .targets | .[] | select(.position == 0) | .target")" > + port_dev1="$($CXL list -T -d "$decoder" | jq -r ".[] | > + .targets | .[] | select(.position == 1) | .target")" > + mem0="$($CXL list -M -p "$port_dev0" | jq -r ".[0].memdev")" > + mem1="$($CXL list -M -p "$port_dev1" | jq -r ".[0].memdev")" > + > + region="$($CXL create-region -d "$decoder" -m "$mem0" "$mem1" | > + jq -r ".region")" > + if [[ ! $region ]]; then > + echo "create-region failed for $decoder" > + err "$LINENO" > + fi > + echo "$region" > +} > + > +# When cxl-cli support for inject and clear arrives, replace > +# the writes to /sys/kernel/debug with the new cxl commands. > + > +inject_poison_sysfs() > +{ > + memdev="$1" > + addr="$2" > + > + echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison > +} > + > +clear_poison_sysfs() > +{ > + memdev="$1" > + addr="$2" > + > + echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison > +} > + > +validate_poison_found() > +{ > + list_by="$1" > + nr_expect="$2" > + > + poison_list="$($CXL list "$list_by" --media-errors | > + jq -r '.[].media_errors')" > + if [[ ! 
$poison_list ]]; then > + nr_found=0 > + else > + nr_found=$(jq "length" <<< "$poison_list") > + fi > + if [ "$nr_found" -ne "$nr_expect" ]; then > + echo "$nr_expect poison records expected, $nr_found found" > + err "$LINENO" > + fi > +} > + > +test_poison_by_memdev() > +{ > + find_memdev > + inject_poison_sysfs "$memdev" "0x40000000" > + inject_poison_sysfs "$memdev" "0x40001000" > + inject_poison_sysfs "$memdev" "0x600" > + inject_poison_sysfs "$memdev" "0x0" > + validate_poison_found "-m $memdev" 4 > + > + clear_poison_sysfs "$memdev" "0x40000000" > + clear_poison_sysfs "$memdev" "0x40001000" > + clear_poison_sysfs "$memdev" "0x600" > + clear_poison_sysfs "$memdev" "0x0" > + validate_poison_found "-m $memdev" 0 > +} > + > +test_poison_by_region() > +{ > + create_x2_region > + inject_poison_sysfs "$mem0" "0x40000000" > + inject_poison_sysfs "$mem1" "0x40000000" > + validate_poison_found "-r $region" 2 > + > + clear_poison_sysfs "$mem0" "0x40000000" > + clear_poison_sysfs "$mem1" "0x40000000" > + validate_poison_found "-r $region" 0 > +} > + > +# Turn tracing on. Note that 'cxl list --poison' does toggle the tracing. > +# Turning it on here allows the test user to also view inject and clear > +# trace events. 
> +echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable > + > +test_poison_by_memdev > +test_poison_by_region > + > +check_dmesg "$LINENO" > + > +modprobe -r cxl-test > diff --git a/test/meson.build b/test/meson.build > index a965a79fd6cb..d871e28e17ce 100644 > --- a/test/meson.build > +++ b/test/meson.build > @@ -160,6 +160,7 @@ cxl_events = find_program('cxl-events.sh') > cxl_sanitize = find_program('cxl-sanitize.sh') > cxl_destroy_region = find_program('cxl-destroy-region.sh') > cxl_qos_class = find_program('cxl-qos-class.sh') > +cxl_poison = find_program('cxl-poison.sh') > > tests = [ > [ 'libndctl', libndctl, 'ndctl' ], > @@ -192,6 +193,7 @@ tests = [ > [ 'cxl-sanitize.sh', cxl_sanitize, 'cxl' ], > [ 'cxl-destroy-region.sh', cxl_destroy_region, 'cxl' ], > [ 'cxl-qos-class.sh', cxl_qos_class, 'cxl' ], > + [ 'cxl-poison.sh', cxl_poison, 'cxl' ], > ] > > if get_option('destructive').enabled()
On 3/13/24 9:05 PM, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > The --media-errors option to 'cxl list' retrieves poison lists from > memory devices supporting the capability and displays the returned > media_error records in the cxl list json. This option can apply to > memdevs or regions. > > Include media-errors in the -vvv verbose option. > > Example usage in the Documentation/cxl/cxl-list.txt update. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> > --- > Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++- > cxl/filter.h | 3 ++ > cxl/list.c | 3 ++ > 3 files changed, 67 insertions(+), 1 deletion(-) > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt > index 838de4086678..6d3ef92c29e8 100644 > --- a/Documentation/cxl/cxl-list.txt > +++ b/Documentation/cxl/cxl-list.txt > @@ -415,6 +415,66 @@ OPTIONS > --region:: > Specify CXL region device name(s), or device id(s), to filter the listing. > > +-L:: > +--media-errors:: > + Include media-error information. The poison list is retrieved from the > + device(s) and media_error records are added to the listing. Apply this > + option to memdevs and regions where devices support the poison list > + capability. "offset:" is relative to the region resource when listing > + by region and is the absolute device DPA when listing by memdev. > + "source:" is one of: External, Internal, Injected, Vendor Specific, > + or Unknown, as defined in CXL Specification v3.1 Table 8-140. 
> + > +---- > +# cxl list -m mem9 --media-errors -u > +{ > + "memdev":"mem9", > + "pmem_size":"1024.00 MiB (1073.74 MB)", > + "pmem_qos_class":42, > + "ram_size":"1024.00 MiB (1073.74 MB)", > + "ram_qos_class":42, > + "serial":"0x5", > + "numa_node":1, > + "host":"cxl_mem.5", > + "media_errors":[ > + { > + "offset":"0x40000000", > + "length":64, > + "source":"Injected" > + } > + ] > +} > +---- > +In the above example, region mappings can be found using: > +"cxl list -p mem9 --decoders" > +---- > +# cxl list -r region5 --media-errors -u > +{ > + "region":"region5", > + "resource":"0xf110000000", > + "size":"2.00 GiB (2.15 GB)", > + "type":"pmem", > + "interleave_ways":2, > + "interleave_granularity":4096, > + "decode_state":"commit", > + "media_errors":[ > + { > + "offset":"0x1000", > + "length":64, > + "source":"Injected" > + }, > + { > + "offset":"0x2000", > + "length":64, > + "source":"Injected" > + } > + ] > +} > +---- > +In the above example, memdev mappings can be found using: > +"cxl list -r region5 --targets" and "cxl list -d <decoder_name>" > + > + > -v:: > --verbose:: > Increase verbosity of the output. This can be specified > @@ -431,7 +491,7 @@ OPTIONS > devices with --idle. > - *-vvv* > Everything *-vv* provides, plus enable > - --health and --partition. > + --health, --partition, --media-errors. 
> > --debug:: > If the cxl tool was built with debug enabled, turn on debug > diff --git a/cxl/filter.h b/cxl/filter.h > index 3f65990f835a..956a46e0c7a9 100644 > --- a/cxl/filter.h > +++ b/cxl/filter.h > @@ -30,6 +30,7 @@ struct cxl_filter_params { > bool fw; > bool alert_config; > bool dax; > + bool media_errors; > int verbose; > struct log_ctx ctx; > }; > @@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param) > flags |= UTIL_JSON_ALERT_CONFIG; > if (param->dax) > flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS; > + if (param->media_errors) > + flags |= UTIL_JSON_MEDIA_ERRORS; > return flags; > } > > diff --git a/cxl/list.c b/cxl/list.c > index 93ba51ef895c..0b25d78248d5 100644 > --- a/cxl/list.c > +++ b/cxl/list.c > @@ -57,6 +57,8 @@ static const struct option options[] = { > "include memory device firmware information"), > OPT_BOOLEAN('A', "alert-config", &param.alert_config, > "include alert configuration information"), > + OPT_BOOLEAN('L', "media-errors", &param.media_errors, > + "include media-error information "), > OPT_INCR('v', "verbose", &param.verbose, "increase output detail"), > #ifdef ENABLE_DEBUG > OPT_BOOLEAN(0, "debug", &debug, "debug list walk"), > @@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx) > param.fw = true; > param.alert_config = true; > param.dax = true; > + param.media_errors = true; > /* fallthrough */ > case 2: > param.idle = true;
On 3/13/24 9:05 PM, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > Media_error records are logged as events in the kernel tracing > subsystem. To prepare the media_error records for cxl list, enable > tracing, trigger the poison list read, and parse the generated > cxl_poison events into a json representation. > > Use the event_trace private parsing option to customize the json > representation based on cxl-list calling options and event field > settings. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Minor nit below. > --- > cxl/json.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 194 insertions(+) > > diff --git a/cxl/json.c b/cxl/json.c > index fbe41c78e82a..974e98f13cec 100644 > --- a/cxl/json.c > +++ b/cxl/json.c > @@ -1,16 +1,20 @@ > // SPDX-License-Identifier: GPL-2.0 > // Copyright (C) 2015-2021 Intel Corporation. All rights reserved. > #include <limits.h> > +#include <errno.h> > #include <util/json.h> > +#include <util/bitmap.h> > #include <uuid/uuid.h> > #include <cxl/libcxl.h> > #include <json-c/json.h> > #include <json-c/printbuf.h> > #include <ccan/short_types/short_types.h> > +#include <tracefs/tracefs.h> > > #include "filter.h" > #include "json.h" > #include "../daxctl/json.h" > +#include "event_trace.h" > > #define CXL_FW_VERSION_STR_LEN 16 > #define CXL_FW_MAX_SLOTS 4 > @@ -571,6 +575,184 @@ err_jobj: > return NULL; > } > > +/* CXL Spec 3.1 Table 8-140 Media Error Record */ > +#define CXL_POISON_SOURCE_MAX 7 > +static const char *poison_source[] = { "Unknown", "External", "Internal", > + "Injected", "Reserved", "Reserved", > + "Reserved", "Vendor" }; > + > +/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */ > +#define CXL_POISON_FLAG_MORE BIT(0) > +#define CXL_POISON_FLAG_OVERFLOW BIT(1) > +#define CXL_POISON_FLAG_SCANNING BIT(2) > + > +static int poison_event_to_json(struct tep_event *event, > + 
struct tep_record *record, > + struct event_ctx *e_ctx) > +{ > + struct poison_ctx *p_ctx = e_ctx->poison_ctx; > + struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison; > + struct cxl_memdev *memdev = p_ctx->memdev; > + struct cxl_region *region = p_ctx->region; > + unsigned long flags = p_ctx->flags; > + const char *region_name = NULL; > + char flag_str[32] = { '\0' }; > + bool overflow = false; > + u8 source, pflags; > + u64 offset, ts; > + u32 length; > + char *str; > + int len; > + > + jp = json_object_new_object(); > + if (!jp) > + return -ENOMEM; > + > + /* Skip records not in this region when listing by region */ > + if (region) > + region_name = cxl_region_get_devname(region); > + if (region_name) > + str = tep_get_field_raw(NULL, event, "region", record, &len, 0); > + if ((region_name) && (strcmp(region_name, str) != 0)) { > + json_object_put(jp); > + return 0; > + } > + /* Include offset,length by region (hpa) or by memdev (dpa) */ > + if (region) { > + offset = cxl_get_field_u64(event, record, "hpa"); > + if (offset != ULLONG_MAX) { > + offset = offset - cxl_region_get_resource(region); > + jobj = util_json_object_hex(offset, flags); > + if (jobj) > + json_object_object_add(jp, "offset", jobj); > + } > + } else if (memdev) { > + offset = cxl_get_field_u64(event, record, "dpa"); > + if (offset != ULLONG_MAX) { > + jobj = util_json_object_hex(offset, flags); > + if (jobj) > + json_object_object_add(jp, "offset", jobj); > + } > + } > + length = cxl_get_field_u32(event, record, "dpa_length"); > + jobj = util_json_object_size(length, flags); > + if (jobj) > + json_object_object_add(jp, "length", jobj); > + > + /* Always include the poison source */ > + source = cxl_get_field_u8(event, record, "source"); > + if (source <= CXL_POISON_SOURCE_MAX) > + jobj = json_object_new_string(poison_source[source]); > + else > + jobj = json_object_new_string("Reserved"); > + if (jobj) > + json_object_object_add(jp, "source", jobj); > + > + /* Include flags and overflow 
time if present */ > + pflags = cxl_get_field_u8(event, record, "flags"); > + if (pflags && pflags < UCHAR_MAX) { > + if (pflags & CXL_POISON_FLAG_MORE) > + strcat(flag_str, "More,"); > + if (pflags & CXL_POISON_FLAG_SCANNING) > + strcat(flag_str, "Scanning,"); > + if (pflags & CXL_POISON_FLAG_OVERFLOW) { > + strcat(flag_str, "Overflow,"); > + overflow = true; > + } > + jobj = json_object_new_string(flag_str); > + if (jobj) > + json_object_object_add(jp, "flags", jobj); > + } > + if (overflow) { > + ts = cxl_get_field_u64(event, record, "overflow_ts"); > + jobj = util_json_object_hex(ts, flags); > + if (jobj) > + json_object_object_add(jp, "overflow_t", jobj); > + } > + json_object_array_add(jpoison, jp); > + > + return 0; > +} > + > +static struct json_object * > +util_cxl_poison_events_to_json(struct tracefs_instance *inst, > + struct poison_ctx *p_ctx) > +{ > + struct event_ctx ectx = { > + .event_name = "cxl_poison", > + .event_pid = getpid(), > + .system = "cxl", > + .poison_ctx = p_ctx, > + .parse_event = poison_event_to_json, > + }; > + int rc = 0; No need to init rc here. 
DJ > + > + p_ctx->jpoison = json_object_new_array(); > + if (!p_ctx->jpoison) > + return NULL; > + > + rc = cxl_parse_events(inst, &ectx); > + if (rc < 0) { > + fprintf(stderr, "Failed to parse events: %d\n", rc); > + goto put_jobj; > + } > + if (json_object_array_length(p_ctx->jpoison) == 0) > + goto put_jobj; > + > + return p_ctx->jpoison; > + > +put_jobj: > + json_object_put(p_ctx->jpoison); > + return NULL; > +} > + > +static struct json_object * > +util_cxl_poison_list_to_json(struct cxl_region *region, > + struct cxl_memdev *memdev, > + unsigned long flags) > +{ > + struct json_object *jpoison = NULL; > + struct poison_ctx p_ctx; > + struct tracefs_instance *inst; > + int rc; > + > + inst = tracefs_instance_create("cxl list"); > + if (!inst) { > + fprintf(stderr, "tracefs_instance_create() failed\n"); > + return NULL; > + } > + > + rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison"); > + if (rc < 0) { > + fprintf(stderr, "Failed to enable trace: %d\n", rc); > + goto err_free; > + } > + > + if (region) > + rc = cxl_region_trigger_poison_list(region); > + else > + rc = cxl_memdev_trigger_poison_list(memdev); > + if (rc) > + goto err_free; > + > + rc = cxl_event_tracing_disable(inst); > + if (rc < 0) { > + fprintf(stderr, "Failed to disable trace: %d\n", rc); > + goto err_free; > + } > + > + p_ctx = (struct poison_ctx) { > + .region = region, > + .memdev = memdev, > + .flags = flags, > + }; > + jpoison = util_cxl_poison_events_to_json(inst, &p_ctx); > + > +err_free: > + tracefs_instance_free(inst); > + return jpoison; > +} > + > struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev, > unsigned long flags) > { > @@ -664,6 +846,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev, > json_object_object_add(jdev, "firmware", jobj); > } > > + if (flags & UTIL_JSON_MEDIA_ERRORS) { > + jobj = util_cxl_poison_list_to_json(NULL, memdev, flags); > + if (jobj) > + json_object_object_add(jdev, "media_errors", jobj); > + } > + > 
json_object_set_userdata(jdev, memdev, NULL); > return jdev; > } > @@ -1012,6 +1200,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region, > json_object_object_add(jregion, "state", jobj); > } > > + if (flags & UTIL_JSON_MEDIA_ERRORS) { > + jobj = util_cxl_poison_list_to_json(region, NULL, flags); > + if (jobj) > + json_object_object_add(jregion, "media_errors", jobj); > + } > + > util_cxl_mappings_append_json(jregion, region, flags); > > if (flags & UTIL_JSON_DAX) {
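
As a reading aid for the flag handling quoted above, here is a minimal
standalone sketch of turning the poison flag bits into text. The bit
positions are copied from the quoted defines; the `POISON_FLAG_*` names and
the `poison_flags_to_str` helper are hypothetical and not part of the patch.
One deliberate difference: the quoted code appends "More," style tokens with
strcat and so leaves a trailing comma, while this sketch joins the names
without one and bounds every write.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit positions as quoted from the patch; the macro names are hypothetical. */
#define POISON_FLAG_MORE     (1u << 0)
#define POISON_FLAG_OVERFLOW (1u << 1)
#define POISON_FLAG_SCANNING (1u << 2)

/*
 * Hypothetical helper: write a comma-separated list of the set flags
 * into buf (at most len bytes, always NUL-terminated) and return buf.
 * Names are emitted in the same order as the quoted code: More,
 * Scanning, Overflow.
 */
const char *poison_flags_to_str(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} names[] = {
		{ POISON_FLAG_MORE, "More" },
		{ POISON_FLAG_SCANNING, "Scanning" },
		{ POISON_FLAG_OVERFLOW, "Overflow" },
	};
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n;

		if (!(flags & names[i].bit))
			continue;

		/* Prefix a comma only when something was already written. */
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop appending */
		used += (size_t)n;
	}

	return buf;
}
```

With a 32-byte buffer, the same size as flag_str in the quoted code, all
three names fit at once.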
On 3/13/24 9:05 PM, alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > Add helpers to extract the value of an event record field given the > field name. This is useful when the user knows the name and format > of the field and simply needs to get it. The helpers also return > the 'type'_MAX of the type when the field is > > Since this is in preparation for adding a cxl_poison private parser > for 'cxl list --media-errors' support those specific required > types: u8, u32, u64. > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> > --- > cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++ > cxl/event_trace.h | 8 +++++++- > 2 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/cxl/event_trace.c b/cxl/event_trace.c > index 640abdab67bf..324edb982888 100644 > --- a/cxl/event_trace.c > +++ b/cxl/event_trace.c > @@ -15,6 +15,43 @@ > #define _GNU_SOURCE > #include <string.h> > > +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record, > + const char *name) > +{ > + unsigned long long val; > + > + if (tep_get_field_val(NULL, event, name, record, &val, 0)) > + return ULLONG_MAX; > + > + return val; > +} > + > +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record, > + const char *name) > +{ > + char *val; > + int len; > + > + val = tep_get_field_raw(NULL, event, name, record, &len, 0); > + if (!val) > + return UINT_MAX; > + > + return *(u32 *)val; > +} > + > +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record, > + const char *name) > +{ > + char *val; > + int len; > + > + val = tep_get_field_raw(NULL, event, name, record, &len, 0); > + if (!val) > + return UCHAR_MAX; > + > + return *(u8 *)val; > +} > + > static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags) > { > bool sign = flags & TEP_FIELD_IS_SIGNED; > diff --git a/cxl/event_trace.h b/cxl/event_trace.h > index 
b77cafb410c4..7b30c3922aef 100644 > --- a/cxl/event_trace.h > +++ b/cxl/event_trace.h > @@ -5,6 +5,7 @@ > > #include <json-c/json.h> > #include <ccan/list/list.h> > +#include <ccan/short_types/short_types.h> > > struct jlist_node { > struct json_object *jobj; > @@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx); > int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system, > const char *event); > int cxl_event_tracing_disable(struct tracefs_instance *inst); > - > +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record, > + const char *name); > +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record, > + const char *name); > +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record, > + const char *name); > #endif
Alison Schofield wrote:
> On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote:
> > alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > The --media-errors option to 'cxl list' retrieves poison lists from
> > > memory devices supporting the capability and displays the returned
> > > media_error records in the cxl list json. This option can apply to
> > > memdevs or regions.
> > >
> > > Include media-errors in the -vvv verbose option.
> > >
> > > Example usage in the Documentation/cxl/cxl-list.txt update.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > > Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
> > > cxl/filter.h | 3 ++
> > > cxl/list.c | 3 ++
> > > 3 files changed, 67 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> > > index 838de4086678..6d3ef92c29e8 100644
> > > --- a/Documentation/cxl/cxl-list.txt
> > > +++ b/Documentation/cxl/cxl-list.txt
> >
> > [snip]
> >
> > +----
> > +In the above example, region mappings can be found using:
> > +"cxl list -p mem9 --decoders"
> > +----
> >
> > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
> > cover letter, too.
>
> Thanks for the review! I went with -p because it gives only
> the endpoint decoder while -m gives all the decoders up to
> the root - more than needed to discover the region.
The first thing that comes to mind to list memory devices with their
decoders is:

    cxl list -MD -d endpoint

...however the problem is that endpoint ports connect memdevs to their
parent port, so the above results in:

    Warning: no matching devices found

I think I want to special case "-d endpoint" when both -M and -D are
specified to also imply -E, "endpoint ports". However that also seems to
have a bug at present:

    # cxl list -EDM -d endpoint -iu
    {
      "endpoint":"endpoint2",
      "host":"mem0",
      "parent_dport":"0000:34:00.0",
      "depth":2
    }

That needs to be fixed up to merge:

    # cxl list -ED -d endpoint -iu
    {
      "endpoint":"endpoint2",
      "host":"mem0",
      "parent_dport":"0000:34:00.0",
      "depth":2,
      "decoders:endpoint2":[
        {
          "decoder":"decoder2.0",
          "interleave_ways":1,
          "state":"disabled"
        }
      ]
    }

...and:

    # cxl list -EMu
    {
      "endpoint":"endpoint2",
      "host":"mem0",
      "parent_dport":"0000:34:00.0",
      "depth":2,
      "memdev":{
        "memdev":"mem0",
        "pmem_size":"512.00 MiB (536.87 MB)",
        "serial":"0",
        "host":"0000:35:00.0"
      }
    }

...so that one can get a nice listing of just endpoint ports, their
decoders (with media errors) and their memdevs.

The reason that "cxl list -p mem9 -D" works is subtle because it filters
the endpoint decoders by an endpoint port filter, but I think most users
would expect to not need to enable endpoint-port listings to see their
decoders; the natural key to filter endpoint decoders is by memdev.
On Fri, Mar 15, 2024 at 10:09:44AM +0900, Wonjae Lee wrote: > alison.schofield@intel.com wrote: > > From: Alison Schofield <alison.schofield@intel.com> > > > > The --media-errors option to 'cxl list' retrieves poison lists from > > memory devices supporting the capability and displays the returned > > media_error records in the cxl list json. This option can apply to > > memdevs or regions. > > > > Include media-errors in the -vvv verbose option. > > > > Example usage in the Documentation/cxl/cxl-list.txt update. > > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > --- > > Documentation/cxl/cxl-list.txt 62 +++++++++++++++++++++++++++++++++- > > cxl/filter.h 3 ++ > > cxl/list.c 3 ++ > > 3 files changed, 67 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt > > index 838de4086678..6d3ef92c29e8 100644 > > --- a/Documentation/cxl/cxl-list.txt > > +++ b/Documentation/cxl/cxl-list.txt > > [snip] > > +---- > +In the above example, region mappings can be found using: > +"cxl list -p mem9 --decoders" > +---- > > Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's > cover letter, too. Thanks for the review! I went with -p because it gives only the endpoint decoder while -m gives all the decoders up to the root - more than needed to discover the region. Here are the 2 outputs - what do you think? 
# cxl list -p mem9 --decoders -u { "decoder":"decoder20.0", "resource":"0xf110000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":2, "interleave_granularity":4096, "region":"region5", "dpa_resource":"0x40000000", "dpa_size":"1024.00 MiB (1073.74 MB)", "mode":"pmem" } # cxl list -m mem9 --decoders -u [ { "root decoders":[ { "decoder":"decoder7.1", "resource":"0xf050000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":2, "interleave_granularity":4096, "max_available_extent":"2.00 GiB (2.15 GB)", "volatile_capable":true, "qos_class":42, "nr_targets":2 }, { "decoder":"decoder7.3", "resource":"0xf110000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":2, "interleave_granularity":4096, "max_available_extent":0, "pmem_capable":true, "qos_class":42, "nr_targets":2 } ] }, { "port decoders":[ { "decoder":"decoder9.0", "resource":"0xf110000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":1, "region":"region5", "nr_targets":1 }, { "decoder":"decoder13.0", "resource":"0xf110000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":1, "region":"region5", "nr_targets":1 } ] }, { "endpoint decoders":[ { "decoder":"decoder20.0", "resource":"0xf110000000", "size":"2.00 GiB (2.15 GB)", "interleave_ways":2, "interleave_granularity":4096, "region":"region5", "dpa_resource":"0x40000000", "dpa_size":"1024.00 MiB (1073.74 MB)", "mode":"pmem" } ] } ] > > Thanks, > Wonjae
alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> The --media-errors option to 'cxl list' retrieves poison lists from
> memory devices supporting the capability and displays the returned
> media_error records in the cxl list json. This option can apply to
> memdevs or regions.
>
> Include media-errors in the -vvv verbose option.
>
> Example usage in the Documentation/cxl/cxl-list.txt update.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++-
> cxl/filter.h | 3 ++
> cxl/list.c | 3 ++
> 3 files changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt
> index 838de4086678..6d3ef92c29e8 100644
> --- a/Documentation/cxl/cxl-list.txt
> +++ b/Documentation/cxl/cxl-list.txt
[snip]
+----
+In the above example, region mappings can be found using:
+"cxl list -p mem9 --decoders"
+----
Hi, isn't it '-m mem9' instead of -p? FYI, it's also on patch's
cover letter, too.
Thanks,
Wonjae
Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git tags/libnvdimm-for-6.9

... to get updates to the nvdimm tree. There are a number of updates to
interfaces used by nvdimm/dax and a documentation fix.

Doc fixes:
	ACPI_NFIT Kconfig documentation fix

Updates:
	Make nvdimm_bus_type and dax_bus_type const
	Remove SLAB_MEM_SPREAD flag usage in DAX

They have all been in the -next for more than a week with no reported issues.

---

The following changes since commit 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478:

  Linux 6.8-rc3 (2024-02-04 12:20:36 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git tags/libnvdimm-for-6.9

for you to fetch changes up to d9212b35da52109361247b66010802d43c6b1f0d:

  dax: remove SLAB_MEM_SPREAD flag usage (2024-03-04 09:10:37 -0700)

----------------------------------------------------------------
libnvdimm updates for v6.9

- ACPI_NFIT Kconfig documentation fix
- Make nvdimm_bus_type const
- Make dax_bus_type const
- remove SLAB_MEM_SPREAD flag usage in DAX

----------------------------------------------------------------
Chengming Zhou (1):
      dax: remove SLAB_MEM_SPREAD flag usage

Peter Robinson (1):
      libnvdimm: Fix ACPI_NFIT in BLK_DEV_PMEM help

Ricardo B. Marliere (2):
      nvdimm: make nvdimm_bus_type const
      device-dax: make dax_bus_type const

 drivers/dax/bus.c      | 2 +-
 drivers/dax/super.c    | 2 +-
 drivers/nvdimm/Kconfig | 2 +-
 drivers/nvdimm/bus.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
"Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com> writes: > On Tue, Mar 12, 2024 at 2:21 AM Huang, Ying <ying.huang@intel.com> wrote: >> >> "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com> writes: >> >> > The current implementation treats emulated memory devices, such as >> > CXL1.1 type3 memory, as normal DRAM when they are emulated as normal memory >> > (E820_TYPE_RAM). However, these emulated devices have different >> > characteristics than traditional DRAM, making it important to >> > distinguish them. Thus, we modify the tiered memory initialization process >> > to introduce a delay specifically for CPUless NUMA nodes. This delay >> > ensures that the memory tier initialization for these nodes is deferred >> > until HMAT information is obtained during the boot process. Finally, >> > demotion tables are recalculated at the end. >> > >> > * Abstract common functions into `find_alloc_memory_type()` >> >> We should move kmem_put_memory_types() (renamed to >> mt_put_memory_types()?) too. This can be put in a separate patch. >> > > Will do! Thanks, > > >> >> > Since different memory devices require finding or allocating a memory type, >> > these common steps are abstracted into a single function, >> > `find_alloc_memory_type()`, enhancing code scalability and conciseness. >> > >> > * Handle cases where there is no HMAT when creating memory tiers >> > There is a scenario where a CPUless node does not provide HMAT information. >> > If no HMAT is specified, it falls back to using the default DRAM tier. >> > >> > * Change adist calculation code to use another new lock, mt_perf_lock. >> > In the current implementation, iterating through CPUlist nodes requires >> > holding the `memory_tier_lock`. However, `mt_calc_adistance()` will end up >> > trying to acquire the same lock, leading to a potential deadlock. >> > Therefore, we propose introducing a standalone `mt_perf_lock` to protect >> > `default_dram_perf`. 
This approach not only avoids deadlock but also >> > prevents holding a large lock simultaneously. >> > >> > Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> >> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com> >> > --- >> > drivers/acpi/numa/hmat.c | 11 ++++++ >> > drivers/dax/kmem.c | 13 +------ >> > include/linux/acpi.h | 6 ++++ >> > include/linux/memory-tiers.h | 8 +++++ >> > mm/memory-tiers.c | 70 +++++++++++++++++++++++++++++++++--- >> > 5 files changed, 92 insertions(+), 16 deletions(-) >> > >> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c >> > index d6b85f0f6082..28812ec2c793 100644 >> > --- a/drivers/acpi/numa/hmat.c >> > +++ b/drivers/acpi/numa/hmat.c >> > @@ -38,6 +38,8 @@ static LIST_HEAD(targets); >> > static LIST_HEAD(initiators); >> > static LIST_HEAD(localities); >> > >> > +static LIST_HEAD(hmat_memory_types); >> > + >> >> HMAT isn't a device driver for some memory devices. So I don't think we >> should manage memory types in HMAT. > > I can put it back in memory-tier.c. How about the list? Do we still > need to keep a separate list for storing late_inited memory nodes? > And how about the list name if we need to remove the prefix "hmat_"? I don't think we need a separate list for memory-less nodes. Just iterate all memory-less nodes. > >> Instead, if the memory_type of a >> node isn't set by the driver, we should manage it in memory-tier.c as >> fallback. >> > > Do you mean some device drivers may init memory tiers between > memory_tier_init() and late_initcall(memory_tier_late_init);? > And this is the reason why you mention to exclude > "node_memory_types[nid].memtype != NULL" in memory_tier_late_init(). > Is my understanding correct? Yes. 
>> > static DEFINE_MUTEX(target_lock); >> > >> > /* >> > @@ -149,6 +151,12 @@ int acpi_get_genport_coordinates(u32 uid, >> > } >> > EXPORT_SYMBOL_NS_GPL(acpi_get_genport_coordinates, CXL); >> > >> > +struct memory_dev_type *hmat_find_alloc_memory_type(int adist) >> > +{ >> > + return find_alloc_memory_type(adist, &hmat_memory_types); >> > +} >> > +EXPORT_SYMBOL_GPL(hmat_find_alloc_memory_type); >> > + >> > static __init void alloc_memory_initiator(unsigned int cpu_pxm) >> > { >> > struct memory_initiator *initiator; >> > @@ -1038,6 +1046,9 @@ static __init int hmat_init(void) >> > if (!hmat_set_default_dram_perf()) >> > register_mt_adistance_algorithm(&hmat_adist_nb); >> > >> > + /* Post-create CPUless memory tiers after getting HMAT info */ >> > + memory_tier_late_init(); >> > + >> >> This should be called in memory-tier.c via >> >> late_initcall(memory_tier_late_init); >> >> Then, we don't need hmat to call it. >> > > Thanks. Learned! > > >> > return 0; >> > out_put: >> > hmat_free_structures(); >> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c >> > index 42ee360cf4e3..aee17ab59f4f 100644 >> > --- a/drivers/dax/kmem.c >> > +++ b/drivers/dax/kmem.c >> > @@ -55,21 +55,10 @@ static LIST_HEAD(kmem_memory_types); >> > >> > static struct memory_dev_type *kmem_find_alloc_memory_type(int adist) >> > { >> > - bool found = false; >> > struct memory_dev_type *mtype; >> > >> > mutex_lock(&kmem_memory_type_lock); >> > - list_for_each_entry(mtype, &kmem_memory_types, list) { >> > - if (mtype->adistance == adist) { >> > - found = true; >> > - break; >> > - } >> > - } >> > - if (!found) { >> > - mtype = alloc_memory_type(adist); >> > - if (!IS_ERR(mtype)) >> > - list_add(&mtype->list, &kmem_memory_types); >> > - } >> > + mtype = find_alloc_memory_type(adist, &kmem_memory_types); >> > mutex_unlock(&kmem_memory_type_lock); >> > >> > return mtype; >> > diff --git a/include/linux/acpi.h b/include/linux/acpi.h >> > index b7165e52b3c6..3f927ff01f02 100644 >> > --- 
a/include/linux/acpi.h >> > +++ b/include/linux/acpi.h >> > @@ -434,12 +434,18 @@ int thermal_acpi_critical_trip_temp(struct acpi_device *adev, int *ret_temp); >> > >> > #ifdef CONFIG_ACPI_HMAT >> > int acpi_get_genport_coordinates(u32 uid, struct access_coordinate *coord); >> > +struct memory_dev_type *hmat_find_alloc_memory_type(int adist); >> > #else >> > static inline int acpi_get_genport_coordinates(u32 uid, >> > struct access_coordinate *coord) >> > { >> > return -EOPNOTSUPP; >> > } >> > + >> > +static inline struct memory_dev_type *hmat_find_alloc_memory_type(int adist) >> > +{ >> > + return NULL; >> > +} >> > #endif >> > >> > #ifdef CONFIG_ACPI_NUMA >> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h >> > index 69e781900082..4bc2596c5774 100644 >> > --- a/include/linux/memory-tiers.h >> > +++ b/include/linux/memory-tiers.h >> > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist); >> > int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, >> > const char *source); >> > int mt_perf_to_adistance(struct access_coordinate *perf, int *adist); >> > +struct memory_dev_type *find_alloc_memory_type(int adist, >> > + struct list_head *memory_types); >> > +void memory_tier_late_init(void); >> > #ifdef CONFIG_MIGRATION >> > int next_demotion_node(int node); >> > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); >> > @@ -136,5 +139,10 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis >> > { >> > return -EIO; >> > } >> > + >> > +static inline void memory_tier_late_init(void) >> > +{ >> > + >> > +} >> > #endif /* CONFIG_NUMA */ >> > #endif /* _LINUX_MEMORY_TIERS_H */ >> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c >> > index 0537664620e5..79f748d60e6f 100644 >> > --- a/mm/memory-tiers.c >> > +++ b/mm/memory-tiers.c >> > @@ -6,6 +6,7 @@ >> > #include <linux/memory.h> >> > #include <linux/memory-tiers.h> >> > #include <linux/notifier.h> >> > +#include 
<linux/acpi.h> >> > >> > #include "internal.h" >> > >> > @@ -35,6 +36,7 @@ struct node_memory_type_map { >> > }; >> > >> > static DEFINE_MUTEX(memory_tier_lock); >> > +static DEFINE_MUTEX(mt_perf_lock); >> >> Please add comments about what it protects. And put it near the data >> structure it protects. >> > > Where is better for me to add comments for this? Code? Patch description? > Will put it closer to the protected data. Thanks. Just put the comments at the above of the lock in the source code. > > >> > static LIST_HEAD(memory_tiers); >> > static struct node_memory_type_map node_memory_types[MAX_NUMNODES]; >> > struct memory_dev_type *default_dram_type; >> > @@ -623,6 +625,58 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype) >> > } >> > EXPORT_SYMBOL_GPL(clear_node_memory_type); >> > >> > +struct memory_dev_type *find_alloc_memory_type(int adist, struct list_head *memory_types) >> > +{ >> > + bool found = false; >> > + struct memory_dev_type *mtype; >> > + >> > + list_for_each_entry(mtype, memory_types, list) { >> > + if (mtype->adistance == adist) { >> > + found = true; >> > + break; >> > + } >> > + } >> > + if (!found) { >> > + mtype = alloc_memory_type(adist); >> > + if (!IS_ERR(mtype)) >> > + list_add(&mtype->list, memory_types); >> > + } >> > + >> > + return mtype; >> > +} >> > +EXPORT_SYMBOL_GPL(find_alloc_memory_type); >> > + >> > +static void memory_tier_late_create(int node) >> > +{ >> > + struct memory_dev_type *mtype = NULL; >> > + int adist = MEMTIER_ADISTANCE_DRAM; >> > + >> > + mt_calc_adistance(node, &adist); >> > + if (adist != MEMTIER_ADISTANCE_DRAM) { >> >> We can manage default_dram_type() via find_alloc_memory_type() >> too. >> >> And, if "node_memory_types[node].memtype == NULL", we can call >> mt_calc_adistance(node, &adist) and find_alloc_memory_type() in >> set_node_memory_tier(). Then, we can cover hotpluged memory node too. 
>> >> > + mtype = hmat_find_alloc_memory_type(adist); >> > + if (!IS_ERR(mtype)) >> > + __init_node_memory_type(node, mtype); >> > + else >> > + pr_err("Failed to allocate a memory type at %s()\n", __func__); >> > + } >> > + >> > + set_node_memory_tier(node); >> > +} >> > + >> > +void memory_tier_late_init(void) >> > +{ >> > + int nid; >> > + >> > + mutex_lock(&memory_tier_lock); >> > + for_each_node_state(nid, N_MEMORY) >> > + if (!node_state(nid, N_CPU)) >> >> We should exclude "node_memory_types[nid].memtype != NULL". Some memory >> nodes may be onlined by some device drivers and setup memory tiers >> already. >> >> > + memory_tier_late_create(nid); >> > + >> > + establish_demotion_targets(); >> > + mutex_unlock(&memory_tier_lock); >> > +} >> > +EXPORT_SYMBOL_GPL(memory_tier_late_init); >> > + >> > static void dump_hmem_attrs(struct access_coordinate *coord, const char *prefix) >> > { >> > pr_info( >> > @@ -636,7 +690,7 @@ int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, >> > { >> > int rc = 0; >> > >> > - mutex_lock(&memory_tier_lock); >> > + mutex_lock(&mt_perf_lock); >> > if (default_dram_perf_error) { >> > rc = -EIO; >> > goto out; >> > @@ -684,7 +738,7 @@ int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, >> > } >> > >> > out: >> > - mutex_unlock(&memory_tier_lock); >> > + mutex_unlock(&mt_perf_lock); >> > return rc; >> > } >> > >> > @@ -700,7 +754,7 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist) >> > perf->read_bandwidth + perf->write_bandwidth == 0) >> > return -EINVAL; >> > >> > - mutex_lock(&memory_tier_lock); >> > + mutex_lock(&mt_perf_lock); >> > /* >> > * The abstract distance of a memory node is in direct proportion to >> > * its memory latency (read + write) and inversely proportional to its >> > @@ -713,7 +767,7 @@ int mt_perf_to_adistance(struct access_coordinate *perf, int *adist) >> > (default_dram_perf.read_latency + default_dram_perf.write_latency) * >> > 
(default_dram_perf.read_bandwidth + default_dram_perf.write_bandwidth) / >> > (perf->read_bandwidth + perf->write_bandwidth); >> > - mutex_unlock(&memory_tier_lock); >> > + mutex_unlock(&mt_perf_lock); >> > >> > return 0; >> > } >> > @@ -836,6 +890,14 @@ static int __init memory_tier_init(void) >> > * types assigned. >> > */ >> > for_each_node_state(node, N_MEMORY) { >> > + if (!node_state(node, N_CPU)) >> > + /* >> > + * Defer memory tier initialization on CPUless numa nodes. >> > + * These will be initialized when HMAT information is >> >> HMAT is platform specific, we should avoid to mention it in general code >> if possible. >> > > Will fix! Thanks, > > >> > + * available. >> > + */ >> > + continue; >> > + >> > memtier = set_node_memory_tier(node); >> > if (IS_ERR(memtier)) >> > /* >> -- Best Regards, Huang, Ying
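Note for readers of this thread: the find-or-allocate step that the patch factors out into find_alloc_memory_type() can be sketched in plain userspace C. This is only an illustration under stated assumptions. struct mem_type and find_alloc_type() are hypothetical stand-ins, the list is a simple singly linked list rather than the kernel's struct list_head, and allocation failure is reported with NULL where the kernel code returns an ERR_PTR from alloc_memory_type(). As in the patch, the caller is expected to hold whatever lock protects the list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct memory_dev_type. */
struct mem_type {
    int adistance;
    struct mem_type *next;
};

/*
 * Return the entry whose adistance matches adist. If none exists,
 * allocate one, link it at the head of the list, and return it.
 * Returns NULL on allocation failure. The caller serializes access.
 */
struct mem_type *find_alloc_type(int adist, struct mem_type **head)
{
    struct mem_type *t;

    assert(head);
    for (t = *head; t; t = t->next)
        if (t->adistance == adist)
            return t;

    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->adistance = adist;
    t->next = *head;
    *head = t;
    return t;
}
```

Passing the list head as a parameter is what lets one helper serve several owners, which is the point raised in the review about keeping per-driver lists out of HMAT.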
From: Alison Schofield <alison.schofield@intel.com> Exercise cxl list, libcxl, and driver pieces of the get poison list pathway. Inject and clear poison using debugfs and use cxl-cli to read the poison list by memdev and by region. Signed-off-by: Alison Schofield <alison.schofield@intel.com> --- test/cxl-poison.sh | 137 +++++++++++++++++++++++++++++++++++++++++++++ test/meson.build | 2 + 2 files changed, 139 insertions(+) create mode 100644 test/cxl-poison.sh diff --git a/test/cxl-poison.sh b/test/cxl-poison.sh new file mode 100644 index 000000000000..af2e9dcd1a11 --- /dev/null +++ b/test/cxl-poison.sh @@ -0,0 +1,137 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2023 Intel Corporation. All rights reserved. + +. "$(dirname "$0")"/common + +rc=77 + +set -ex + +trap 'err $LINENO' ERR + +check_prereq "jq" + +modprobe -r cxl_test +modprobe cxl_test + +rc=1 + +# THEORY OF OPERATION: Exercise cxl-cli and cxl driver ability to +# inject, clear, and get the poison list. Do it by memdev and by region. 
+ +find_memdev() +{ + readarray -t capable_mems < <("$CXL" list -b "$CXL_TEST_BUS" -M | + jq -r ".[] | select(.pmem_size != null) | + select(.ram_size != null) | .memdev") + + if [ ${#capable_mems[@]} == 0 ]; then + echo "no memdevs found for test" + err "$LINENO" + fi + + memdev=${capable_mems[0]} +} + +create_x2_region() +{ + # Find an x2 decoder + decoder="$($CXL list -b "$CXL_TEST_BUS" -D -d root | jq -r ".[] | + select(.pmem_capable == true) | + select(.nr_targets == 2) | + .decoder")" + + # Find a memdev for each host-bridge interleave position + port_dev0="$($CXL list -T -d "$decoder" | jq -r ".[] | + .targets | .[] | select(.position == 0) | .target")" + port_dev1="$($CXL list -T -d "$decoder" | jq -r ".[] | + .targets | .[] | select(.position == 1) | .target")" + mem0="$($CXL list -M -p "$port_dev0" | jq -r ".[0].memdev")" + mem1="$($CXL list -M -p "$port_dev1" | jq -r ".[0].memdev")" + + region="$($CXL create-region -d "$decoder" -m "$mem0" "$mem1" | + jq -r ".region")" + if [[ ! $region ]]; then + echo "create-region failed for $decoder" + err "$LINENO" + fi + echo "$region" +} + +# When cxl-cli support for inject and clear arrives, replace +# the writes to /sys/kernel/debug with the new cxl commands. + +inject_poison_sysfs() +{ + memdev="$1" + addr="$2" + + echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/inject_poison +} + +clear_poison_sysfs() +{ + memdev="$1" + addr="$2" + + echo "$addr" > /sys/kernel/debug/cxl/"$memdev"/clear_poison +} + +validate_poison_found() +{ + list_by="$1" + nr_expect="$2" + + poison_list="$($CXL list "$list_by" --media-errors | + jq -r '.[].media_errors')" + if [[ ! 
$poison_list ]]; then + nr_found=0 + else + nr_found=$(jq "length" <<< "$poison_list") + fi + if [ "$nr_found" -ne "$nr_expect" ]; then + echo "$nr_expect poison records expected, $nr_found found" + err "$LINENO" + fi +} + +test_poison_by_memdev() +{ + find_memdev + inject_poison_sysfs "$memdev" "0x40000000" + inject_poison_sysfs "$memdev" "0x40001000" + inject_poison_sysfs "$memdev" "0x600" + inject_poison_sysfs "$memdev" "0x0" + validate_poison_found "-m $memdev" 4 + + clear_poison_sysfs "$memdev" "0x40000000" + clear_poison_sysfs "$memdev" "0x40001000" + clear_poison_sysfs "$memdev" "0x600" + clear_poison_sysfs "$memdev" "0x0" + validate_poison_found "-m $memdev" 0 +} + +test_poison_by_region() +{ + create_x2_region + inject_poison_sysfs "$mem0" "0x40000000" + inject_poison_sysfs "$mem1" "0x40000000" + validate_poison_found "-r $region" 2 + + clear_poison_sysfs "$mem0" "0x40000000" + clear_poison_sysfs "$mem1" "0x40000000" + validate_poison_found "-r $region" 0 +} + +# Turn tracing on. Note that 'cxl list --media-errors' does toggle the tracing. +# Turning it on here allows the test user to also view inject and clear +# trace events.
+echo 1 > /sys/kernel/tracing/events/cxl/cxl_poison/enable + +test_poison_by_memdev +test_poison_by_region + +check_dmesg "$LINENO" + +modprobe -r cxl-test diff --git a/test/meson.build b/test/meson.build index a965a79fd6cb..d871e28e17ce 100644 --- a/test/meson.build +++ b/test/meson.build @@ -160,6 +160,7 @@ cxl_events = find_program('cxl-events.sh') cxl_sanitize = find_program('cxl-sanitize.sh') cxl_destroy_region = find_program('cxl-destroy-region.sh') cxl_qos_class = find_program('cxl-qos-class.sh') +cxl_poison = find_program('cxl-poison.sh') tests = [ [ 'libndctl', libndctl, 'ndctl' ], @@ -192,6 +193,7 @@ tests = [ [ 'cxl-sanitize.sh', cxl_sanitize, 'cxl' ], [ 'cxl-destroy-region.sh', cxl_destroy_region, 'cxl' ], [ 'cxl-qos-class.sh', cxl_qos_class, 'cxl' ], + [ 'cxl-poison.sh', cxl_poison, 'cxl' ], ] if get_option('destructive').enabled() -- 2.37.3
From: Alison Schofield <alison.schofield@intel.com> The --media-errors option to 'cxl list' retrieves poison lists from memory devices supporting the capability and displays the returned media_error records in the cxl list json. This option can apply to memdevs or regions. Include media-errors in the -vvv verbose option. Example usage in the Documentation/cxl/cxl-list.txt update. Signed-off-by: Alison Schofield <alison.schofield@intel.com> --- Documentation/cxl/cxl-list.txt | 62 +++++++++++++++++++++++++++++++++- cxl/filter.h | 3 ++ cxl/list.c | 3 ++ 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/Documentation/cxl/cxl-list.txt b/Documentation/cxl/cxl-list.txt index 838de4086678..6d3ef92c29e8 100644 --- a/Documentation/cxl/cxl-list.txt +++ b/Documentation/cxl/cxl-list.txt @@ -415,6 +415,66 @@ OPTIONS --region:: Specify CXL region device name(s), or device id(s), to filter the listing. +-L:: +--media-errors:: + Include media-error information. The poison list is retrieved from the + device(s) and media_error records are added to the listing. Apply this + option to memdevs and regions where devices support the poison list + capability. "offset:" is relative to the region resource when listing + by region and is the absolute device DPA when listing by memdev. + "source:" is one of: External, Internal, Injected, Vendor Specific, + or Unknown, as defined in CXL Specification v3.1 Table 8-140. 
+ +---- +# cxl list -m mem9 --media-errors -u +{ + "memdev":"mem9", + "pmem_size":"1024.00 MiB (1073.74 MB)", + "pmem_qos_class":42, + "ram_size":"1024.00 MiB (1073.74 MB)", + "ram_qos_class":42, + "serial":"0x5", + "numa_node":1, + "host":"cxl_mem.5", + "media_errors":[ + { + "offset":"0x40000000", + "length":64, + "source":"Injected" + } + ] +} +---- +In the above example, region mappings can be found using: +"cxl list -p mem9 --decoders" +---- +# cxl list -r region5 --media-errors -u +{ + "region":"region5", + "resource":"0xf110000000", + "size":"2.00 GiB (2.15 GB)", + "type":"pmem", + "interleave_ways":2, + "interleave_granularity":4096, + "decode_state":"commit", + "media_errors":[ + { + "offset":"0x1000", + "length":64, + "source":"Injected" + }, + { + "offset":"0x2000", + "length":64, + "source":"Injected" + } + ] +} +---- +In the above example, memdev mappings can be found using: +"cxl list -r region5 --targets" and "cxl list -d <decoder_name>" + + -v:: --verbose:: Increase verbosity of the output. This can be specified @@ -431,7 +491,7 @@ OPTIONS devices with --idle. - *-vvv* Everything *-vv* provides, plus enable - --health and --partition. + --health, --partition, --media-errors. 
--debug:: If the cxl tool was built with debug enabled, turn on debug diff --git a/cxl/filter.h b/cxl/filter.h index 3f65990f835a..956a46e0c7a9 100644 --- a/cxl/filter.h +++ b/cxl/filter.h @@ -30,6 +30,7 @@ struct cxl_filter_params { bool fw; bool alert_config; bool dax; + bool media_errors; int verbose; struct log_ctx ctx; }; @@ -88,6 +89,8 @@ static inline unsigned long cxl_filter_to_flags(struct cxl_filter_params *param) flags |= UTIL_JSON_ALERT_CONFIG; if (param->dax) flags |= UTIL_JSON_DAX | UTIL_JSON_DAX_DEVS; + if (param->media_errors) + flags |= UTIL_JSON_MEDIA_ERRORS; return flags; } diff --git a/cxl/list.c b/cxl/list.c index 93ba51ef895c..0b25d78248d5 100644 --- a/cxl/list.c +++ b/cxl/list.c @@ -57,6 +57,8 @@ static const struct option options[] = { "include memory device firmware information"), OPT_BOOLEAN('A', "alert-config", &param.alert_config, "include alert configuration information"), + OPT_BOOLEAN('L', "media-errors", &param.media_errors, + "include media-error information "), OPT_INCR('v', "verbose", &param.verbose, "increase output detail"), #ifdef ENABLE_DEBUG OPT_BOOLEAN(0, "debug", &debug, "debug list walk"), @@ -121,6 +123,7 @@ int cmd_list(int argc, const char **argv, struct cxl_ctx *ctx) param.fw = true; param.alert_config = true; param.dax = true; + param.media_errors = true; /* fallthrough */ case 2: param.idle = true; -- 2.37.3
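Note for readers: the patch above touches two cooperating pieces, the verbosity switch that turns options on by falling through from higher levels to lower ones, and the translation of boolean options into a flags bitmask. A minimal standalone sketch of that pattern follows. The F_* bit values, struct list_params, apply_verbosity() and params_to_flags() are hypothetical names chosen for illustration; the real UTIL_JSON_* flags and struct cxl_filter_params carry many more members.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real UTIL_JSON_* flags are defined by ndctl. */
#define F_HEALTH       (1UL << 0)
#define F_PARTITION    (1UL << 1)
#define F_MEDIA_ERRORS (1UL << 2)

struct list_params {
    bool health;
    bool partition;
    bool media_errors;
    bool idle;
    int verbose;
};

/*
 * Each verbosity level enables its own options and then falls through
 * to the lower levels. Levels outside 0..3 behave like level 3.
 */
void apply_verbosity(struct list_params *p)
{
    assert(p);
    switch (p->verbose) {
    default:
    case 3:
        p->health = true;
        p->partition = true;
        p->media_errors = true;
        /* fallthrough */
    case 2:
        p->idle = true;
        /* fallthrough */
    case 1:
    case 0:
        break;
    }
}

/* Translate the boolean options into a bitmask for the JSON emitters. */
unsigned long params_to_flags(const struct list_params *p)
{
    unsigned long flags = 0;

    assert(p);
    if (p->health)
        flags |= F_HEALTH;
    if (p->partition)
        flags |= F_PARTITION;
    if (p->media_errors)
        flags |= F_MEDIA_ERRORS;
    return flags;
}
```

With this shape, an explicit --media-errors and a -vvv both end up setting the same bit, so the JSON code only ever has to test the flag.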
From: Alison Schofield <alison.schofield@intel.com> Media_error records are logged as events in the kernel tracing subsystem. To prepare the media_error records for cxl list, enable tracing, trigger the poison list read, and parse the generated cxl_poison events into a json representation. Use the event_trace private parsing option to customize the json representation based on cxl-list calling options and event field settings. Signed-off-by: Alison Schofield <alison.schofield@intel.com> --- cxl/json.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 194 insertions(+) diff --git a/cxl/json.c b/cxl/json.c index fbe41c78e82a..974e98f13cec 100644 --- a/cxl/json.c +++ b/cxl/json.c @@ -1,16 +1,20 @@ // SPDX-License-Identifier: GPL-2.0 // Copyright (C) 2015-2021 Intel Corporation. All rights reserved. #include <limits.h> +#include <errno.h> #include <util/json.h> +#include <util/bitmap.h> #include <uuid/uuid.h> #include <cxl/libcxl.h> #include <json-c/json.h> #include <json-c/printbuf.h> #include <ccan/short_types/short_types.h> +#include <tracefs/tracefs.h> #include "filter.h" #include "json.h" #include "../daxctl/json.h" +#include "event_trace.h" #define CXL_FW_VERSION_STR_LEN 16 #define CXL_FW_MAX_SLOTS 4 @@ -571,6 +575,184 @@ err_jobj: return NULL; } +/* CXL Spec 3.1 Table 8-140 Media Error Record */ +#define CXL_POISON_SOURCE_MAX 7 +static const char *poison_source[] = { "Unknown", "External", "Internal", + "Injected", "Reserved", "Reserved", + "Reserved", "Vendor" }; + +/* CXL Spec 3.1 Table 8-139 Get Poison List Output Payload */ +#define CXL_POISON_FLAG_MORE BIT(0) +#define CXL_POISON_FLAG_OVERFLOW BIT(1) +#define CXL_POISON_FLAG_SCANNING BIT(2) + +static int poison_event_to_json(struct tep_event *event, + struct tep_record *record, + struct event_ctx *e_ctx) +{ + struct poison_ctx *p_ctx = e_ctx->poison_ctx; + struct json_object *jp, *jobj, *jpoison = p_ctx->jpoison; + struct cxl_memdev *memdev = p_ctx->memdev; + struct cxl_region 
*region = p_ctx->region; + unsigned long flags = p_ctx->flags; + const char *region_name = NULL; + char flag_str[32] = { '\0' }; + bool overflow = false; + u8 source, pflags; + u64 offset, ts; + u32 length; + char *str; + int len; + + jp = json_object_new_object(); + if (!jp) + return -ENOMEM; + + /* Skip records not in this region when listing by region */ + if (region) + region_name = cxl_region_get_devname(region); + if (region_name) + str = tep_get_field_raw(NULL, event, "region", record, &len, 0); + if ((region_name) && (strcmp(region_name, str) != 0)) { + json_object_put(jp); + return 0; + } + /* Include offset,length by region (hpa) or by memdev (dpa) */ + if (region) { + offset = cxl_get_field_u64(event, record, "hpa"); + if (offset != ULLONG_MAX) { + offset = offset - cxl_region_get_resource(region); + jobj = util_json_object_hex(offset, flags); + if (jobj) + json_object_object_add(jp, "offset", jobj); + } + } else if (memdev) { + offset = cxl_get_field_u64(event, record, "dpa"); + if (offset != ULLONG_MAX) { + jobj = util_json_object_hex(offset, flags); + if (jobj) + json_object_object_add(jp, "offset", jobj); + } + } + length = cxl_get_field_u32(event, record, "dpa_length"); + jobj = util_json_object_size(length, flags); + if (jobj) + json_object_object_add(jp, "length", jobj); + + /* Always include the poison source */ + source = cxl_get_field_u8(event, record, "source"); + if (source <= CXL_POISON_SOURCE_MAX) + jobj = json_object_new_string(poison_source[source]); + else + jobj = json_object_new_string("Reserved"); + if (jobj) + json_object_object_add(jp, "source", jobj); + + /* Include flags and overflow time if present */ + pflags = cxl_get_field_u8(event, record, "flags"); + if (pflags && pflags < UCHAR_MAX) { + if (pflags & CXL_POISON_FLAG_MORE) + strcat(flag_str, "More,"); + if (pflags & CXL_POISON_FLAG_SCANNING) + strcat(flag_str, "Scanning,"); + if (pflags & CXL_POISON_FLAG_OVERFLOW) { + strcat(flag_str, "Overflow,"); + overflow = true; + } + 
jobj = json_object_new_string(flag_str); + if (jobj) + json_object_object_add(jp, "flags", jobj); + } + if (overflow) { + ts = cxl_get_field_u64(event, record, "overflow_ts"); + jobj = util_json_object_hex(ts, flags); + if (jobj) + json_object_object_add(jp, "overflow_t", jobj); + } + json_object_array_add(jpoison, jp); + + return 0; +} + +static struct json_object * +util_cxl_poison_events_to_json(struct tracefs_instance *inst, + struct poison_ctx *p_ctx) +{ + struct event_ctx ectx = { + .event_name = "cxl_poison", + .event_pid = getpid(), + .system = "cxl", + .poison_ctx = p_ctx, + .parse_event = poison_event_to_json, + }; + int rc = 0; + + p_ctx->jpoison = json_object_new_array(); + if (!p_ctx->jpoison) + return NULL; + + rc = cxl_parse_events(inst, &ectx); + if (rc < 0) { + fprintf(stderr, "Failed to parse events: %d\n", rc); + goto put_jobj; + } + if (json_object_array_length(p_ctx->jpoison) == 0) + goto put_jobj; + + return p_ctx->jpoison; + +put_jobj: + json_object_put(p_ctx->jpoison); + return NULL; +} + +static struct json_object * +util_cxl_poison_list_to_json(struct cxl_region *region, + struct cxl_memdev *memdev, + unsigned long flags) +{ + struct json_object *jpoison = NULL; + struct poison_ctx p_ctx; + struct tracefs_instance *inst; + int rc; + + inst = tracefs_instance_create("cxl list"); + if (!inst) { + fprintf(stderr, "tracefs_instance_create() failed\n"); + return NULL; + } + + rc = cxl_event_tracing_enable(inst, "cxl", "cxl_poison"); + if (rc < 0) { + fprintf(stderr, "Failed to enable trace: %d\n", rc); + goto err_free; + } + + if (region) + rc = cxl_region_trigger_poison_list(region); + else + rc = cxl_memdev_trigger_poison_list(memdev); + if (rc) + goto err_free; + + rc = cxl_event_tracing_disable(inst); + if (rc < 0) { + fprintf(stderr, "Failed to disable trace: %d\n", rc); + goto err_free; + } + + p_ctx = (struct poison_ctx) { + .region = region, + .memdev = memdev, + .flags = flags, + }; + jpoison = util_cxl_poison_events_to_json(inst, 
&p_ctx); + +err_free: + tracefs_instance_free(inst); + return jpoison; +} + struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev, unsigned long flags) { @@ -664,6 +846,12 @@ struct json_object *util_cxl_memdev_to_json(struct cxl_memdev *memdev, json_object_object_add(jdev, "firmware", jobj); } + if (flags & UTIL_JSON_MEDIA_ERRORS) { + jobj = util_cxl_poison_list_to_json(NULL, memdev, flags); + if (jobj) + json_object_object_add(jdev, "media_errors", jobj); + } + json_object_set_userdata(jdev, memdev, NULL); return jdev; } @@ -1012,6 +1200,12 @@ struct json_object *util_cxl_region_to_json(struct cxl_region *region, json_object_object_add(jregion, "state", jobj); } + if (flags & UTIL_JSON_MEDIA_ERRORS) { + jobj = util_cxl_poison_list_to_json(region, NULL, flags); + if (jobj) + json_object_object_add(jregion, "media_errors", jobj); + } + util_cxl_mappings_append_json(jregion, region, flags); if (flags & UTIL_JSON_DAX) { -- 2.37.3
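Note for readers: two small decoding steps in poison_event_to_json() can be tried in isolation, mapping the source field to a name and rendering the flags byte as text. The sketch below is a standalone approximation, not the patch's code. The table contents and flag bits are copied from the patch; poison_source_name(), poison_flags_str() and append_flag() are hypothetical names. It also differs deliberately in one respect: the patch appends a comma after every flag name, while this sketch joins names with commas so the string has no trailing separator, and it bounds every write to the buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Flag bits as defined in the patch (Get Poison List output payload). */
#define POISON_FLAG_MORE     (1u << 0)
#define POISON_FLAG_OVERFLOW (1u << 1)
#define POISON_FLAG_SCANNING (1u << 2)

/* Source names in the order used by the patch's poison_source[] table. */
static const char *const source_names[] = {
    "Unknown", "External", "Internal", "Injected",
    "Reserved", "Reserved", "Reserved", "Vendor",
};

/* Map a source value to its name; out-of-range values read as Reserved. */
const char *poison_source_name(unsigned int source)
{
    if (source < sizeof(source_names) / sizeof(source_names[0]))
        return source_names[source];
    return "Reserved";
}

/* Append name to buf, comma-separated, only if it fits completely. */
static void append_flag(char *buf, size_t size, const char *name)
{
    size_t used = strlen(buf);
    size_t need = strlen(name) + (used ? 1 : 0);

    if (used + need >= size)
        return;
    snprintf(buf + used, size - used, "%s%s", used ? "," : "", name);
}

/* Render the set flag bits into buf as a comma-separated list. */
char *poison_flags_str(unsigned int flags, char *buf, size_t size)
{
    assert(buf && size > 0);
    buf[0] = '\0';
    if (flags & POISON_FLAG_MORE)
        append_flag(buf, size, "More");
    if (flags & POISON_FLAG_SCANNING)
        append_flag(buf, size, "Scanning");
    if (flags & POISON_FLAG_OVERFLOW)
        append_flag(buf, size, "Overflow");
    return buf;
}
```

Whether the JSON should carry the trailing comma is a question for the patch author; the sketch only shows one way to avoid it.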
From: Alison Schofield <alison.schofield@intel.com> Add helpers to extract the value of an event record field given the field name. This is useful when the user knows the name and format of the field and simply needs to get it. The helpers also return the 'type'_MAX of the type when the field is not found. Since this is in preparation for adding a cxl_poison private parser for 'cxl list --media-errors', support only those specific required types: u8, u32, u64. Signed-off-by: Alison Schofield <alison.schofield@intel.com> --- cxl/event_trace.c | 37 +++++++++++++++++++++++++++++++++++++ cxl/event_trace.h | 8 +++++++- 2 files changed, 44 insertions(+), 1 deletion(-) diff --git a/cxl/event_trace.c b/cxl/event_trace.c index 640abdab67bf..324edb982888 100644 --- a/cxl/event_trace.c +++ b/cxl/event_trace.c @@ -15,6 +15,43 @@ #define _GNU_SOURCE #include <string.h> +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record, + const char *name) +{ + unsigned long long val; + + if (tep_get_field_val(NULL, event, name, record, &val, 0)) + return ULLONG_MAX; + + return val; +} + +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record, + const char *name) +{ + char *val; + int len; + + val = tep_get_field_raw(NULL, event, name, record, &len, 0); + if (!val) + return UINT_MAX; + + return *(u32 *)val; +} + +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record, + const char *name) +{ + char *val; + int len; + + val = tep_get_field_raw(NULL, event, name, record, &len, 0); + if (!val) + return UCHAR_MAX; + + return *(u8 *)val; +} + static struct json_object *num_to_json(void *num, int elem_size, unsigned long flags) { bool sign = flags & TEP_FIELD_IS_SIGNED; diff --git a/cxl/event_trace.h b/cxl/event_trace.h index b77cafb410c4..7b30c3922aef 100644 --- a/cxl/event_trace.h +++ b/cxl/event_trace.h @@ -5,6 +5,7 @@ #include <json-c/json.h> #include <ccan/list/list.h> +#include <ccan/short_types/short_types.h> struct jlist_node { struct json_object *jobj;
@@ -32,5 +33,10 @@ int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx); int cxl_event_tracing_enable(struct tracefs_instance *inst, const char *system, const char *event); int cxl_event_tracing_disable(struct tracefs_instance *inst); - +u8 cxl_get_field_u8(struct tep_event *event, struct tep_record *record, + const char *name); +u32 cxl_get_field_u32(struct tep_event *event, struct tep_record *record, + const char *name); +u64 cxl_get_field_u64(struct tep_event *event, struct tep_record *record, + const char *name); #endif -- 2.37.3
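Note for readers: the helpers above signal failure by returning the maximum value of the return type, so a caller cannot tell a failed lookup from a field that legitimately holds that value. The u32 and u8 helpers also cast the raw pointer directly and do not consult the returned length. The sketch below is an alternative written for illustration only, not the patch's method. It works on an already fetched raw buffer so that it needs no libtraceevent; field_to_u32() and field_read_u32() are hypothetical names. It checks the length, copies with memcpy() so no alignment is assumed, and offers a status-returning variant next to the sentinel one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sentinel style, as in the patch: return UINT32_MAX when the raw
 * buffer is missing or shorter than a u32.
 */
uint32_t field_to_u32(const void *raw, int len)
{
    uint32_t val;

    if (!raw || len < (int)sizeof(val))
        return UINT32_MAX;
    memcpy(&val, raw, sizeof(val)); /* safe for unaligned buffers */
    return val;
}

/*
 * Status style: return 0 and store the value in *out on success,
 * -1 on failure. A stored UINT32_MAX is then unambiguous.
 */
int field_read_u32(const void *raw, int len, uint32_t *out)
{
    assert(out);
    if (!raw || len < (int)sizeof(*out))
        return -1;
    memcpy(out, raw, sizeof(*out));
    return 0;
}
```

For the poison list use case the sentinel style is probably sufficient, since lengths and offsets of all ones are not expected in practice; the status variant is shown only to make the trade-off visible.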
From: Alison Schofield <alison.schofield@intel.com> CXL event tracing provides helpers to iterate through a trace buffer and extract events of interest. It offers two parsing options: a default parser that adds every field of an event to a json object, and a private parsing option where the caller can parse each event as it wishes. Although the private parser can do some conditional parsing based on field values, it has no method to receive additional information needed to make parsing decisions in the callback. Provide additional information required by cxl_poison events by adding a pointer to the poison_ctx directly to struct event_ctx. Tidy up the calling convention by passing the entire event_ctx to its own parse_event method rather than growing the param list. This is in preparation for adding a private parser requiring the additional context for cxl_poison events. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> --- cxl/event_trace.c | 9 ++++----- cxl/event_trace.h | 10 +++++++++- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/cxl/event_trace.c b/cxl/event_trace.c index 93a95f9729fd..640abdab67bf 100644 --- a/cxl/event_trace.c +++ b/cxl/event_trace.c @@ -60,7 +60,7 @@ static struct json_object *num_to_json(void *num, int elem_size, unsigned long f } static int cxl_event_to_json(struct tep_event *event, struct tep_record *record, - struct list_head *jlist_head) + struct event_ctx *ctx) { struct json_object *jevent, *jobj, *jarray; struct tep_format_field **fields; @@ -190,7 +190,7 @@ static int cxl_event_to_json(struct tep_event *event, struct tep_record *record, } } - list_add_tail(jlist_head, &jnode->list); + list_add_tail(&ctx->jlist_head, &jnode->list); return 0; err_jevent: @@ -220,10 +220,9 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record, } if (event_ctx->parse_event) - return event_ctx->parse_event(event, record, - &event_ctx->jlist_head); +
return event_ctx->parse_event(event, record, event_ctx); - return cxl_event_to_json(event, record, &event_ctx->jlist_head); + return cxl_event_to_json(event, record, event_ctx); } int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx) diff --git a/cxl/event_trace.h b/cxl/event_trace.h index 7f7773b2201f..b77cafb410c4 100644 --- a/cxl/event_trace.h +++ b/cxl/event_trace.h @@ -11,13 +11,21 @@ struct jlist_node { struct list_node list; }; +struct poison_ctx { + struct json_object *jpoison; + struct cxl_region *region; + struct cxl_memdev *memdev; + unsigned long flags; +}; + struct event_ctx { const char *system; struct list_head jlist_head; const char *event_name; /* optional */ int event_pid; /* optional */ + struct poison_ctx *poison_ctx; /* optional */ int (*parse_event)(struct tep_event *event, struct tep_record *record, - struct list_head *jlist_head); /* optional */ + struct event_ctx *ctx); }; int cxl_parse_events(struct tracefs_instance *inst, struct event_ctx *ectx); -- 2.37.3
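Note for readers: the calling-convention change above, where the parser callback receives the whole context instead of a single member, can be shown with a reduced standalone model. Everything here is a hypothetical stand-in: struct event replaces the tep_event and tep_record pair, counters replace the JSON list and array, and dispatch_event() plays the role of cxl_event_parse().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the tep_event / tep_record pair. */
struct event {
    int id;
};

/* Extra state that only the private parser needs. */
struct poison_ctx {
    int nr_records;
};

struct event_ctx {
    int nr_default;                /* stand-in for jlist_head */
    struct poison_ctx *poison_ctx; /* optional */
    int (*parse_event)(const struct event *ev,
                       struct event_ctx *ctx); /* optional */
};

/* Default parser: records the event in the generic part of the context. */
static int default_parse(const struct event *ev, struct event_ctx *ctx)
{
    (void)ev;
    ctx->nr_default++;
    return 0;
}

/* Private parser: reaches its own state through the shared context. */
int poison_parse(const struct event *ev, struct event_ctx *ctx)
{
    (void)ev;
    if (!ctx->poison_ctx)
        return -1;
    ctx->poison_ctx->nr_records++;
    return 0;
}

/* Use the private parser when one is set, else the default parser. */
int dispatch_event(const struct event *ev, struct event_ctx *ctx)
{
    assert(ev && ctx);
    if (ctx->parse_event)
        return ctx->parse_event(ev, ctx);
    return default_parse(ev, ctx);
}
```

Because both parsers share one signature, adding further per-parser state later only means adding another optional pointer to the context, not another callback parameter.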
From: Alison Schofield <alison.schofield@intel.com> When parsing CXL events, callers may only be interested in events that originate from the current process. Introduce an optional argument to the event trace context: event_pid. When event_pid is present, simply skip the parsing of events without a matching pid. It is not a failure to see other, non matching events. The initial use case for this is device poison listings where only the media-error records requested by this process are wanted. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> --- cxl/event_trace.c | 5 +++++ cxl/event_trace.h | 1 + 2 files changed, 6 insertions(+) diff --git a/cxl/event_trace.c b/cxl/event_trace.c index 1b5aa09de8b2..93a95f9729fd 100644 --- a/cxl/event_trace.c +++ b/cxl/event_trace.c @@ -214,6 +214,11 @@ static int cxl_event_parse(struct tep_event *event, struct tep_record *record, return 0; } + if (event_ctx->event_pid) { + if (event_ctx->event_pid != tep_data_pid(event->tep, record)) + return 0; + } + if (event_ctx->parse_event) return event_ctx->parse_event(event, record, &event_ctx->jlist_head); diff --git a/cxl/event_trace.h b/cxl/event_trace.h index ec6267202c8b..7f7773b2201f 100644 --- a/cxl/event_trace.h +++ b/cxl/event_trace.h @@ -15,6 +15,7 @@ struct event_ctx { const char *system; struct list_head jlist_head; const char *event_name; /* optional */ + int event_pid; /* optional */ int (*parse_event)(struct tep_event *event, struct tep_record *record, struct list_head *jlist_head); /* optional */ }; -- 2.37.3
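Note for readers: the pid filter added above has one subtle property, a zero event_pid means "no filter", so existing callers that never set the field keep parsing every event. A small standalone sketch of that rule follows. event_pid_matches() and count_matching() are hypothetical names, and plain integers stand in for the pid that the patch obtains from tep_data_pid().

```c
#include <assert.h>
#include <stdbool.h>

/* A zero filter disables filtering; otherwise the pids must be equal. */
bool event_pid_matches(int filter_pid, int record_pid)
{
    if (!filter_pid)
        return true;
    return filter_pid == record_pid;
}

/* Count how many recorded pids would be parsed under the given filter. */
int count_matching(const int *record_pids, int n, int filter_pid)
{
    int i, count = 0;

    assert(record_pids || n == 0);
    for (i = 0; i < n; i++)
        if (event_pid_matches(filter_pid, record_pids[i]))
            count++;
    return count;
}
```

In the series itself the poison list code sets the filter to getpid(), so only the media-error records triggered by the running cxl list invocation are parsed; skipped events are not treated as errors.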