From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751444AbWAWOZr (ORCPT ); Mon, 23 Jan 2006 09:25:47 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751454AbWAWOZr (ORCPT ); Mon, 23 Jan 2006 09:25:47 -0500 Received: from tayrelbas01.tay.hp.com ([161.114.80.244]:12983 "EHLO tayrelbas01.tay.hp.com") by vger.kernel.org with ESMTP id S1751444AbWAWOZq (ORCPT ); Mon, 23 Jan 2006 09:25:46 -0500 Date: Mon, 23 Jan 2006 06:23:16 -0800 From: Stephane Eranian To: "Bryan O'Sullivan" Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/6] 2.6.16-rc1 perfmon2 patch for review Message-ID: <20060123142316.GC5317@frankl.hpl.hp.com> Reply-To: eranian@hpl.hp.com References: <200601201520.k0KFKCdY023112@frankl.hpl.hp.com> <1137773148.28944.29.camel@serpentine.pathscale.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1137773148.28944.29.camel@serpentine.pathscale.com> User-Agent: Mutt/1.4.1i Organisation: HP Labs Palo Alto Address: HP Labs, 1U-17, 1501 Page Mill road, Palo Alto, CA 94304, USA. E-mail: eranian@hpl.hp.com Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Bryan, On Fri, Jan 20, 2006 at 08:05:48AM -0800, Bryan O'Sullivan wrote: > On Fri, 2006-01-20 at 07:20 -0800, Stephane Eranian wrote: > > > + * Fast, simple, yet decent quality random number generator based on > > + * a paper by David G. Carta ("Two Fast Implementations of the > > + * `Minimal Standard' Random Number Generator," Communications of the > > + * ACM, January, 1990). > > What on earth do you need a random number generator for? > Good question! Perfmon2 has support for kernel-level sampling buffer. You can setup sampling with a kernel buffer. At the end of a sampling period, a sample a added to the buffer. When the buffer becomes full, the user tool is notified. Most likely it will copy the data out of the buffer or do some processing, then it invokes the pfm_restart() system call to "reset" the buffer and resume monitoring. As you'd expect, the user tool is not involved between samples. The next sampling period is selected by the kernel. It can either be a fixed period determined when the counter was program or be randomized. Randomization is VERY important to avoid biased samples with certain measurements. In particular, when sampling of events that do occur very frequently. We need a simple/cheap randomizer to move the sampling period a bit. The Carta algorithm is very simple and provides good enough randomization. Furthermore it is totally determined by its sedd value so it is easy to reproduce the same pseudo-number series if necessary. > > + * XXX: Hack until we figure out which header file to use. > > + * Could use linux/random.h but then we would need > > + * asm-XX/random.h which does not exists yet > > + */ > > +#ifdef __ia64__ > > +#define __HAVE_ARCH_CARTA_RANDOM32 > > +#endif > > Please use the same mechanism as I did last week for __iowrite_copy32, > instead of this ifdeffery. If you search the archives, you'll find it. > I did some search and could not find the message thread you are referring to. Could you provide some URL? > > > + int ctx_fd; /* ret arg: fd for context */ > > Why not an s32? This is a file descriptor. I think most user programs think of a file descriptor as int. Looking at the kernel code for sys_read(),it looks like this field should instead be the using u32 date type. > > > + u32 reg_flags; /* input: flags, return: reg error */ > > This overloading is a bit gross, if I understand it correctly. You're > passing in a bitmask and getting back an integer, is that right? Or is > it a different kind of bitmask in either direction? You are passing a bitmask of flags. You are getting back the same bitmask of flags + a bitmask of error in the upper bits. The idea is that given that you are passing a vectors of structs, any one element in the vector can cause a failure. You get the syscall error code, but you'd like to figure out quickly which element caused trouble. For that you simply need to scan the upper bits of the reg_flags field. > > > +/* > > + * argument structure for pfm_write_pmds() and pfm_read_pmds() > > + */ > > +typedef struct { > > + u16 reg_num; /* which register */ > > + u16 reg_set; /* event set for this register */ > > + u32 reg_flags; /* input: flags, return: reg error */ > > + u64 reg_value; /* initial pmc/pmd value */ > > + u64 reg_long_reset; /* value to reload after notification */ > > + u64 reg_short_reset; /* reset after counter overflow */ > > + u64 reg_last_reset_val; /* return: PMD last reset value */ > > + u64 reg_ovfl_switch_cnt;/* #overflows before switch */ > > + unsigned long reg_reset_pmds[PFM_PMD_BV]; /* reset on overflow */ > > + unsigned long reg_smpl_pmds[PFM_PMD_BV]; /* record in sample */ > > + u64 reg_smpl_eventid; /* opaque event identifier */ > > + u64 reg_random_mask; /* bitmask used to limit random value */ > > + u32 reg_random_seed; /* seed for randomization */ > > + u32 reg_reserved2[7]; /* for future use */ > > +} pfarg_pmd_t; > > If this header file is shared with userspace (it looks like it is) and > these are syscall arguments, these should all be double-underscored > fixed-size types, as should all other types and structs that may be used > by userspace. Using unsigned long, in particular, makes extra work for > multiarch machines that will have to repack those structs. > The data structure is indeed shared with user space. I do not reuse the same perfmon.h for user applications. Instead, I have a "user" perfmon.h (that would eventually migrate to libc). It defines the field using the uint32_t, uint64_t notation instead. We had a long discussion with David Gibson about the bitmask reg_*pmds[] in the context of ABI issues on platforms running a 64-bit OS but with both a 32-bit and 64-bit ABIs. The number of bits in the bitmask is fixed. When using unsigned long the number of elements in the reg_*_pmds[] array differ between 32-bit and 64-bit but it does work. I am fine with hardcoding those fields as u64, if people think this is more appropriate or/and more efficient with 32-bit ABIs/architectures. > > > + * default value for the user and group security parameters in > > + * /proc/sys/kernel/perfmon > > + */ > > +#define PFM_GROUP_PERM_ANY -1 /* any user/group */ > > /proc???? > I must admit I don't quite follow the arguments about/sys vs. /proc if that is what you are asking for. I'd like to understand better why put entries in one vs. the other. > > Please don't declare functions as extern. It's redundant, and the head > penguins don't like the style. > > Also, some comment in the header that makes it obvious where's the > boundary between functions that are purely internal to perfmon, > functions that can be used by other kernel subsystems (if any), and > syscalls would be useful. > Will do that. > > +/* use for IA-64 only */ > > +#ifdef __ia64__ > > +#define pfm_release_dbregs(_t) do { } while (0) > > +#define pfm_use_dbregs(_t) (0) > > +#endif > > Please move this to asm-ia64/perfmon.h, then. > Well the issue is that this part is used whenever CONFIG_PERFMON is not selected. perfmon.h includes but only when CONFIG_PERFMON is selected. I would have to move asm/perfmon.h at the top of perfmon.h but it depends on some definitions in perfmon.h. > > +/* > > + * This header is at the beginning of the sampling buffer returned to the user. > > How big are these sampling buffers, and would they be a better match for > relayfs? Good question again. David Gibson suggested this a while back, same thing with Andrew Morton. The issue is that relayfs is designed to provide per-CPU buffers. Perfmon2 sampling buffers are used in system-wide mode where we do monitor on a per-CPU basis. So there relayfs could work. But perfmon2 also provides kernel level sampling buffer for per-thread sampling. There the buffer is attached to the thread not the processor it is running on. That's where this becomes very complicated with relayfs. Thanks for your feedback. -- -Stephane