linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Hack to use mkdir/rmdir in debugfs
@ 2013-01-23  3:01 Steven Rostedt
  2013-01-23  3:47 ` Steven Rostedt
  2013-01-23  4:08 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  3:01 UTC (permalink / raw)
  To: LKML, linux-fsdevel; +Cc: Greg Kroah-Hartman, Al Viro, Andrew Morton

Let me explain the situation.

I'm adding the ability to create multiple ftrace instances (buffers). I
have a directory: /debug/tracing/instances that will hold sub
directories that have the control files for these buffers. Basically,
each directory in the instances directory will be a duplicate of
the /debug/tracing directory, and will act agnostic from each other.

I original had:

# ls /debug/tracing/instances
new	free

Where to create an instance you had to echo the name of the instance
into the "new" file and it would be created:

# cd /debug/tracing/instances
# echo foo > new

# ls
new	foo	free

# ls foo
buffer_size_kb        free_buffer  trace_clock    trace_pipe
buffer_total_size_kb  set_event    trace_marker   tracing_on
events                trace        trace_options


And to remove it you would echo the name into the "free" file.

I find this interface rather awkward. A more elegant interface would be
to use mkdir and rmdir to create and remove these instances:

# cd /debug/traces/instances

# ls

# mkdir foo

# ls
foo


Unfortunately, the debugfs dir_inode_operations does not include a
mkdir. I decided to hijack the dir operations and install my own. But as
the mkdir needs to created debugfs directories with the instances
directory, they too will grab the i_node->i_mutex that is held by the
mkdir system call.

I finally did the following hack that works in my tests. But I'm a bit
uncomfortable and decided to call upon the VFS overlords to say this is
fine, or that I've must have stumbled upon a new kind of crack. Or maybe
this new crack is just fine.

I release the inode->i_mutex as the new_instance_create() has its own
locking to synchronize things and does the debugfs_create_dir() and
friends . Also, the instances directory is created at boot up and is
never destroyed, so I'm hoping that it's OK to release its i_mutex. I
even added a paranoid check to make sure it is the "instances"
directory. Note, the instances d_entry is saved as a static variable and
isn't needed to be recovered.

Is doing something as silly as the following fine, or is there a better
way?

----
static int instance_mkdir (struct inode *inode, struct dentry *dentry, umode_t mode)
{
	struct dentry *parent;
	int ret;

	/* Paranoid: Make sure the parent is the "instances" directory */
	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
	if (WARN_ON_ONCE(parent != trace_instance_dir))
		return -ENOENT;

	/*
	 * The inode mutex is locked, but debugfs_create_dir() will also
	 * take the mutex. As the instances directory can not be destroyed
	 * or changed in any other way, it is safe to unlock it, and
	 * let the dentry try. If two users try to make the same dir at
	 * the same time, then the new_instance_create() determine the
	 * winner.
	 */
	mutex_unlock(&inode->i_mutex);

	ret = new_instance_create(dentry->d_iname);

	mutex_lock(&inode->i_mutex);

	return ret;
}

static const struct inode_operations instance_dir_inode_operations = {
	.lookup		= simple_lookup,
	.mkdir		= instance_mkdir,
};

static __init void create_trace_instances(struct dentry *d_tracer)
{
	trace_instance_dir = debugfs_create_dir("instances", d_tracer);
	if (WARN_ON(!trace_instance_dir))
		return;

	/* Hijack the dir inode operations, to allow mkdir */
	trace_instance_dir->d_inode->i_op = &instance_dir_inode_operations;
}
----

As you can see, I haven't implemented the rmdir yet.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  3:01 [RFC] Hack to use mkdir/rmdir in debugfs Steven Rostedt
@ 2013-01-23  3:47 ` Steven Rostedt
  2013-01-23  4:08 ` Greg Kroah-Hartman
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  3:47 UTC (permalink / raw)
  To: LKML; +Cc: linux-fsdevel, Greg Kroah-Hartman, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 22:01 -0500, Steven Rostedt wrote:

> As you can see, I haven't implemented the rmdir yet.

I decided to implement this too :-)

static int instance_rmdir(struct inode *inode, struct dentry *dentry)
{
	struct dentry *parent;
	int ret;

	/* Paranoid: Make sure the parent is the "instances" directory */
	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
	if (WARN_ON_ONCE(parent != trace_instance_dir))
		return -ENOENT;

	/* The caller did a dget() on dentry */
	mutex_unlock(&dentry->d_inode->i_mutex);

	/*
	 * The inode mutex is locked, but debugfs_create_dir() will also
	 * take the mutex. As the instances directory can not be destroyed
	 * or changed in any other way, it is safe to unlock it, and
	 * let the dentry try. If two users try to make the same dir at
	 * the same time, then the instance_delete() will determine the
	 * winner.
	 */
	mutex_unlock(&inode->i_mutex);

	ret = instance_delete(dentry->d_iname);

	mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
	mutex_lock(&dentry->d_inode->i_mutex);

	return ret;
}

static const struct inode_operations instance_dir_inode_operations = {
	.lookup		= simple_lookup,
	.mkdir		= instance_mkdir,
	.rmdir		= instance_rmdir,
};


I ran it under lockdep and it didn't complain.

-- Steve




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  3:01 [RFC] Hack to use mkdir/rmdir in debugfs Steven Rostedt
  2013-01-23  3:47 ` Steven Rostedt
@ 2013-01-23  4:08 ` Greg Kroah-Hartman
  2013-01-23  4:31   ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2013-01-23  4:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, Jan 22, 2013 at 10:01:54PM -0500, Steven Rostedt wrote:
> Let me explain the situation.
> 
> I'm adding the ability to create multiple ftrace instances (buffers). I
> have a directory: /debug/tracing/instances that will hold sub
> directories that have the control files for these buffers. Basically,
> each directory in the instances directory will be a duplicate of
> the /debug/tracing directory, and will act agnostic from each other.
> 
> I original had:
> 
> # ls /debug/tracing/instances
> new	free
> 
> Where to create an instance you had to echo the name of the instance
> into the "new" file and it would be created:
> 
> # cd /debug/tracing/instances
> # echo foo > new
> 
> # ls
> new	foo	free
> 
> # ls foo
> buffer_size_kb        free_buffer  trace_clock    trace_pipe
> buffer_total_size_kb  set_event    trace_marker   tracing_on
> events                trace        trace_options
> 
> 
> And to remove it you would echo the name into the "free" file.
> 
> I find this interface rather awkward. A more elegant interface would be
> to use mkdir and rmdir to create and remove these instances:
> 
> # cd /debug/traces/instances
> 
> # ls
> 
> # mkdir foo
> 
> # ls
> foo
> 
> 
> Unfortunately, the debugfs dir_inode_operations does not include a
> mkdir. I decided to hijack the dir operations and install my own. But as
> the mkdir needs to created debugfs directories with the instances
> directory, they too will grab the i_node->i_mutex that is held by the
> mkdir system call.
> 
> I finally did the following hack that works in my tests. But I'm a bit
> uncomfortable and decided to call upon the VFS overlords to say this is
> fine, or that I've must have stumbled upon a new kind of crack. Or maybe
> this new crack is just fine.
> 
> I release the inode->i_mutex as the new_instance_create() has its own
> locking to synchronize things and does the debugfs_create_dir() and
> friends . Also, the instances directory is created at boot up and is
> never destroyed, so I'm hoping that it's OK to release its i_mutex. I
> even added a paranoid check to make sure it is the "instances"
> directory. Note, the instances d_entry is saved as a static variable and
> isn't needed to be recovered.
> 
> Is doing something as silly as the following fine, or is there a better
> way?

Yes, why not create your own fs for ftrace?  :)

> ----
> static int instance_mkdir (struct inode *inode, struct dentry *dentry, umode_t mode)
> {
> 	struct dentry *parent;
> 	int ret;
> 
> 	/* Paranoid: Make sure the parent is the "instances" directory */
> 	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
> 	if (WARN_ON_ONCE(parent != trace_instance_dir))
> 		return -ENOENT;
> 
> 	/*
> 	 * The inode mutex is locked, but debugfs_create_dir() will also
> 	 * take the mutex. As the instances directory can not be destroyed
> 	 * or changed in any other way, it is safe to unlock it, and
> 	 * let the dentry try. If two users try to make the same dir at
> 	 * the same time, then the new_instance_create() determine the
> 	 * winner.
> 	 */
> 	mutex_unlock(&inode->i_mutex);
> 
> 	ret = new_instance_create(dentry->d_iname);
> 
> 	mutex_lock(&inode->i_mutex);
> 
> 	return ret;
> }
> 
> static const struct inode_operations instance_dir_inode_operations = {
> 	.lookup		= simple_lookup,
> 	.mkdir		= instance_mkdir,
> };
> 
> static __init void create_trace_instances(struct dentry *d_tracer)
> {
> 	trace_instance_dir = debugfs_create_dir("instances", d_tracer);
> 	if (WARN_ON(!trace_instance_dir))
> 		return;
> 
> 	/* Hijack the dir inode operations, to allow mkdir */
> 	trace_instance_dir->d_inode->i_op = &instance_dir_inode_operations;
> }
> ----

But how can you callback to your code to let it know that something was
created in it?

Don't you need that for both mkdir and rmdir?

But again, I'd really not want to do this in debugfs, how about your own
filesystem?

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:08 ` Greg Kroah-Hartman
@ 2013-01-23  4:31   ` Steven Rostedt
  2013-01-23  4:41     ` Greg Kroah-Hartman
  2013-01-23  4:44     ` Steven Rostedt
  0 siblings, 2 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  4:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 20:08 -0800, Greg Kroah-Hartman wrote:

> > Is doing something as silly as the following fine, or is there a better
> > way?
> 
> Yes, why not create your own fs for ftrace?  :)

But but but...

debugfs is soooo convenient!

Do you think it would be worth doing that though? I only need the mkdir
and rmdir for this one instance. Nothing more.

> >	mutex_unlock(&inode->i_mutex);
> > 
> > 	ret = new_instance_create(dentry->d_iname);
> > 
> > 	mutex_lock(&inode->i_mutex);
> > 

> But how can you callback to your code to let it know that something was
> created in it?

I pass the dentry->d_iname to create a directory.

> 
> Don't you need that for both mkdir and rmdir?

It's a global list. It's very specific and doesn't need to be robust. It
all deals with modifying one global parameter. Not that hard. And I've
already implemented this. It works quite well :-)

> 
> But again, I'd really not want to do this in debugfs, how about your own
> filesystem?

I will note that this never modifies the debugfs code. But it does
circumvent it. That is, all this code lives in kernel/trace/trace.c. I
don't modify any of the debugfs code. I just replace the debugfs
dentry->d_inode->i_op with my own ops.

I can create my own fs, but that just seems to be overkill. The only
difference is that I need mkdir and rmdir for this one instance.

That said, perhaps it wouldn't be too hard to create a ftracefs. Where
should it go? fs/ftrace or perhaps kernel/trace/fs ?

I notice that I only use:

debugfs_create_file()
debugfs_remove();
debugfs_create_dir();
debugfs_remove_recursive();
debugfs_initialized()

so maybe it wouldn't be such a far fetch thing to implement.

But then would I be able to still mount it in /debug/tracing ? As this
is where everything currently uses it? But then we need to teach admins
to add it there, or someplace else. /sys/kernel/ftrace?

Tools like trace-cmd and perf already expect it to be in the debugfs
tracing directory, and will automate mounting it to /sys/kernel/debug/
if not found. This may break userspace if I make another fs.

-- Steve




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:31   ` Steven Rostedt
@ 2013-01-23  4:41     ` Greg Kroah-Hartman
  2013-01-23  4:52       ` Steven Rostedt
  2013-01-23  4:44     ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2013-01-23  4:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, Jan 22, 2013 at 11:31:35PM -0500, Steven Rostedt wrote:
> On Tue, 2013-01-22 at 20:08 -0800, Greg Kroah-Hartman wrote:
> 
> > > Is doing something as silly as the following fine, or is there a better
> > > way?
> > 
> > Yes, why not create your own fs for ftrace?  :)
> 
> But but but...
> 
> debugfs is soooo convenient!
> 
> Do you think it would be worth doing that though? I only need the mkdir
> and rmdir for this one instance. Nothing more.
> 
> > >	mutex_unlock(&inode->i_mutex);
> > > 
> > > 	ret = new_instance_create(dentry->d_iname);
> > > 
> > > 	mutex_lock(&inode->i_mutex);
> > > 
> 
> > But how can you callback to your code to let it know that something was
> > created in it?
> 
> I pass the dentry->d_iname to create a directory.
> 
> > 
> > Don't you need that for both mkdir and rmdir?
> 
> It's a global list. It's very specific and doesn't need to be robust. It
> all deals with modifying one global parameter. Not that hard. And I've
> already implemented this. It works quite well :-)
> 
> > 
> > But again, I'd really not want to do this in debugfs, how about your own
> > filesystem?
> 
> I will note that this never modifies the debugfs code. But it does
> circumvent it.

Ah, I like circumventing debugfs, it kind of fits right into its mission :)

> That is, all this code lives in kernel/trace/trace.c. I don't modify
> any of the debugfs code. I just replace the debugfs
> dentry->d_inode->i_op with my own ops.

Oh, ok, I thought you were talking about modifying the debugfs core.

> I can create my own fs, but that just seems to be overkill. The only
> difference is that I need mkdir and rmdir for this one instance.
> 
> That said, perhaps it wouldn't be too hard to create a ftracefs. Where
> should it go? fs/ftrace or perhaps kernel/trace/fs ?

Sure, it only takes 300 lines to write a fs so it's not hard to do your
own.

> I notice that I only use:
> 
> debugfs_create_file()
> debugfs_remove();
> debugfs_create_dir();
> debugfs_remove_recursive();
> debugfs_initialized()
> 
> so maybe it wouldn't be such a far fetch thing to implement.
> 
> But then would I be able to still mount it in /debug/tracing ? As this
> is where everything currently uses it? But then we need to teach admins
> to add it there, or someplace else. /sys/kernel/ftrace?
> 
> Tools like trace-cmd and perf already expect it to be in the debugfs
> tracing directory, and will automate mounting it to /sys/kernel/debug/
> if not found. This may break userspace if I make another fs.

Yes, you would need to require distros to mount it in the proper place,
which is a pain, but not impossible.  It's up to you, as I thought you
needed to modify debugfs, I didn't like it, but if you can do it in your
own code, I really don't care anymore :)

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:31   ` Steven Rostedt
  2013-01-23  4:41     ` Greg Kroah-Hartman
@ 2013-01-23  4:44     ` Steven Rostedt
  2013-01-23  4:48       ` Steven Rostedt
                         ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  4:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 23:31 -0500, Steven Rostedt wrote:

> > 
> > But again, I'd really not want to do this in debugfs, how about your own
> > filesystem?
> 
> I will note that this never modifies the debugfs code. But it does
> circumvent it. That is, all this code lives in kernel/trace/trace.c. I
> don't modify any of the debugfs code. I just replace the debugfs
> dentry->d_inode->i_op with my own ops.

Again, I want to stress that this doesn't touch the debugfs code. Here's
the real change that I've been testing. It includes the code for the
"new" and "free" files but those are not created because of an early
'return' I added. Notice that it's all contained in
kernel/trace/trace.c.

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 253cb51..851eb4a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4912,6 +4912,331 @@ static const struct file_operations rb_simple_fops = {
 	.llseek		= default_llseek,
 };
 
+struct dentry *trace_instance_dir;
+
+static void
+init_tracer_debugfs(struct trace_array *tr, struct dentry *d_tracer);
+
+static ssize_t
+trace_new_instance_read(struct file *filp, char __user *ubuf,
+			size_t cnt, loff_t *ppos)
+{
+	static char text[] =
+		"\nWrite into this file to create a new instance\n\n";
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, text, sizeof(text));
+}
+
+static int new_instance_create(const char *name)
+{
+	enum ring_buffer_flags rb_flags;
+	struct trace_array *tr;
+	int ret;
+	int i;
+
+	mutex_lock(&trace_types_lock);
+
+	ret = -EEXIST;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0)
+			goto out_unlock;
+	}
+
+	ret = -ENOMEM;
+	tr = kzalloc(sizeof(*tr), GFP_KERNEL);
+	if (!tr)
+		goto out_unlock;
+
+	tr->name = kstrdup(name, GFP_KERNEL);
+	if (!tr->name)
+		goto out_free_tr;
+
+	raw_spin_lock_init(&tr->start_lock);
+
+	tr->current_trace = &nop_trace;
+
+	INIT_LIST_HEAD(&tr->systems);
+	INIT_LIST_HEAD(&tr->events);
+
+	rb_flags = trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;
+
+	tr->buffer = ring_buffer_alloc(trace_buf_size, rb_flags);
+	if (!tr->buffer)
+		goto out_free_tr;
+
+	tr->data = alloc_percpu(struct trace_array_cpu);
+	if (!tr->data)
+		goto out_free_tr;
+
+	for_each_tracing_cpu(i) {
+		memset(per_cpu_ptr(tr->data, i), 0, sizeof(struct trace_array_cpu));
+		per_cpu_ptr(tr->data, i)->trace_cpu.cpu = i;
+		per_cpu_ptr(tr->data, i)->trace_cpu.tr = tr;
+	}
+
+	/* Holder for file callbacks */
+	tr->trace_cpu.cpu = RING_BUFFER_ALL_CPUS;
+	tr->trace_cpu.tr = tr;
+
+	tr->dir = debugfs_create_dir(name, trace_instance_dir);
+	if (!tr->dir)
+		goto out_free_tr;
+
+	ret = event_trace_add_tracer(tr->dir, tr);
+	if (ret)
+		goto out_free_tr;
+
+	init_tracer_debugfs(tr, tr->dir);
+
+	mutex_unlock(&trace_types_lock);
+
+	list_add(&tr->list, &ftrace_trace_arrays);
+
+	return 0;
+
+ out_free_tr:
+	if (tr->buffer)
+		ring_buffer_free(tr->buffer);
+	kfree(tr->name);
+	kfree(tr);
+
+ out_unlock:
+	mutex_unlock(&trace_types_lock);
+
+	return ret;
+
+}
+
+static ssize_t
+trace_new_instance_write(struct file *filp, const char __user *ubuf,
+			 size_t cnt, loff_t *ppos)
+{
+	char *buf;
+	char *name;
+	int ret;
+
+	/* Don't let names be bigger than 1024 */
+	if (cnt > 1024)
+		return -EINVAL;
+
+	ret = -ENOMEM;
+	name = kmalloc(cnt+1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	ret = -EFAULT;
+	if (strncpy_from_user(name, ubuf, cnt) < 0)
+		goto out_free_name;
+
+	name[cnt] = '\0';
+
+	/* remove leading and trailing whitespace */
+	ret = -ENOMEM;
+	buf = strstrip(name);
+	buf = kstrdup(buf, GFP_KERNEL);
+	if (!buf)
+		goto out_free_name;
+	kfree(name);
+	name = buf;
+
+	ret = new_instance_create(name);
+
+	kfree(name);
+
+	if (ret)
+		return ret;
+
+	(*ppos) += cnt;
+
+	return cnt;
+
+ out_free_name:
+	kfree(name);
+	return ret;
+}
+
+static ssize_t
+trace_del_instance_read(struct file *filp, char __user *ubuf,
+			size_t cnt, loff_t *ppos)
+{
+	static char text[] =
+		"\nWrite into this file to remove an instance\n\n";
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, text, sizeof(text));
+}
+
+static int instance_delete(const char *name)
+{
+	struct trace_array *tr;
+	int found = 0;
+	int ret;
+
+	mutex_lock(&trace_types_lock);
+
+	ret = -ENODEV;
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		if (tr->name && strcmp(tr->name, name) == 0) {
+			found = 1;
+			break;
+		}
+	}
+	if (!found)
+		goto out_unlock;
+
+	list_del(&tr->list);
+
+	event_trace_del_tracer(tr);
+	debugfs_remove_recursive(tr->dir);
+	free_percpu(tr->data);
+	ring_buffer_free(tr->buffer);
+
+	kfree(tr->name);
+	kfree(tr);
+
+	ret = 0;
+
+ out_unlock:
+	mutex_unlock(&trace_types_lock);
+
+	return ret;
+}
+
+static ssize_t
+trace_del_instance_write(struct file *filp, const char __user *ubuf,
+			 size_t cnt, loff_t *ppos)
+{
+	char *buf;
+	char *name;
+	int ret;
+
+	/* Don't let names be bigger than 1024 */
+	if (cnt > 1024)
+		return -EINVAL;
+
+	name = kmalloc(cnt+1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	ret = -EFAULT;
+	if (strncpy_from_user(name, ubuf, cnt) < 0)
+		goto out_free_name;
+
+	name[cnt] = '\0';
+
+	/* remove leading and trailing whitespace */
+	buf = strstrip(name);
+	buf = kstrdup(buf, GFP_KERNEL);
+	if (!buf)
+		goto out_free_name;
+	kfree(name);
+	name = buf;
+
+	ret = instance_delete(name);
+
+	(*ppos) += cnt;
+
+	return cnt;
+
+ out_free_name:
+	kfree(name);
+	return ret;
+}
+
+static const struct file_operations trace_new_instance_fops = {
+	.open		= tracing_open_generic,
+	.read		= trace_new_instance_read,
+	.write		= trace_new_instance_write,
+	.llseek		= default_llseek,
+};
+
+static const struct file_operations trace_del_instance_fops = {
+	.open		= tracing_open_generic,
+	.read		= trace_del_instance_read,
+	.write		= trace_del_instance_write,
+	.llseek		= default_llseek,
+};
+
+static int instance_mkdir (struct inode *inode, struct dentry *dentry, umode_t mode)
+{
+	struct dentry *parent;
+	int ret;
+
+	/* Paranoid: Make sure the parent is the "instances" directory */
+	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
+	if (WARN_ON_ONCE(parent != trace_instance_dir))
+		return -ENOENT;
+
+	/*
+	 * The inode mutex is locked, but debugfs_create_dir() will also
+	 * take the mutex. As the instances directory can not be destroyed
+	 * or changed in any other way, it is safe to unlock it, and
+	 * let the dentry try. If two users try to make the same dir at
+	 * the same time, then the new_instance_create() will determine the
+	 * winner.
+	 */
+	mutex_unlock(&inode->i_mutex);
+
+	ret = new_instance_create(dentry->d_iname);
+
+	mutex_lock(&inode->i_mutex);
+
+	return ret;
+}
+
+static int instance_rmdir(struct inode *inode, struct dentry *dentry)
+{
+	struct dentry *parent;
+	int ret;
+
+	/* Paranoid: Make sure the parent is the "instances" directory */
+	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
+	if (WARN_ON_ONCE(parent != trace_instance_dir))
+		return -ENOENT;
+
+	/* The caller did a dget() on dentry */
+	mutex_unlock(&dentry->d_inode->i_mutex);
+
+	/*
+	 * The inode mutex is locked, but debugfs_create_dir() will also
+	 * take the mutex. As the instances directory can not be destroyed
+	 * or changed in any other way, it is safe to unlock it, and
+	 * let the dentry try. If two users try to make the same dir at
+	 * the same time, then the instance_delete() will determine the
+	 * winner.
+	 */
+	mutex_unlock(&inode->i_mutex);
+
+	ret = instance_delete(dentry->d_iname);
+
+	mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
+	mutex_lock(&dentry->d_inode->i_mutex);
+
+	return ret;
+}
+
+static const struct inode_operations instance_dir_inode_operations = {
+	.lookup		= simple_lookup,
+	.mkdir		= instance_mkdir,
+	.rmdir		= instance_rmdir,
+};
+
+static __init void create_trace_instances(struct dentry *d_tracer)
+{
+	trace_instance_dir = debugfs_create_dir("instances", d_tracer);
+	if (WARN_ON(!trace_instance_dir))
+		return;
+
+	/* Hijack the dir inode operations, to allow mkdir */
+	trace_instance_dir->d_inode->i_op = &instance_dir_inode_operations;
+
+	return;
+	trace_create_file("new", 0644, trace_instance_dir, NULL,
+			  &trace_new_instance_fops);
+
+	trace_create_file("free", 0644, trace_instance_dir, NULL,
+			  &trace_del_instance_fops);
+}
+
 static void
 init_tracer_debugfs(struct trace_array *tr, struct dentry *d_tracer)
 {
@@ -4983,6 +5308,8 @@ static __init int tracer_init_debugfs(void)
 			&ftrace_update_tot_cnt, &tracing_dyn_info_fops);
 #endif
 
+	create_trace_instances(d_tracer);
+
 	create_trace_options_dir(&global_trace);
 
 	for_each_tracing_cpu(cpu)



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:44     ` Steven Rostedt
@ 2013-01-23  4:48       ` Steven Rostedt
  2013-01-23  4:55       ` Greg Kroah-Hartman
  2013-01-24 22:39       ` Valdis.Kletnieks
  2 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  4:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 23:44 -0500, Steven Rostedt wrote:

> +	ret = event_trace_add_tracer(tr->dir, tr);
> +	if (ret)
> +		goto out_free_tr;
> +
> +	init_tracer_debugfs(tr, tr->dir);
> +
> +	mutex_unlock(&trace_types_lock);
> +
> +	list_add(&tr->list, &ftrace_trace_arrays);

I just noticed this bug :-)

The list_add() needs to be inside the mutex protection. This code use to
be within the trace_new_instance_write(), and I just moved it into its
own function. Must have been where I screwed up.

-- Steve

> +
> +	return 0;
> +
> + out_free_tr:
> +	if (tr->buffer)
> +		ring_buffer_free(tr->buffer);
> +	kfree(tr->name);
> +	kfree(tr);
> +
> + out_unlock:
> +	mutex_unlock(&trace_types_lock);
> +
> +	return ret;
> +
> +}
> +


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:41     ` Greg Kroah-Hartman
@ 2013-01-23  4:52       ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  4:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 20:41 -0800, Greg Kroah-Hartman wrote:

> Yes, you would need to require distros to mount it in the proper place,
> which is a pain, but not impossible.  It's up to you, as I thought you
> needed to modify debugfs, I didn't like it, but if you can do it in your
> own code, I really don't care anymore :)
> 

OK, I'll keep it as is unless Al freaks out over it. Then I may need to
put more effort into it :-)

I just want to make sure I'm not missing something by releasing the
inode mutexes.

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:44     ` Steven Rostedt
  2013-01-23  4:48       ` Steven Rostedt
@ 2013-01-23  4:55       ` Greg Kroah-Hartman
  2013-01-23  5:05         ` Steven Rostedt
  2013-01-24 22:39       ` Valdis.Kletnieks
  2 siblings, 1 reply; 12+ messages in thread
From: Greg Kroah-Hartman @ 2013-01-23  4:55 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, Jan 22, 2013 at 11:44:09PM -0500, Steven Rostedt wrote:
> On Tue, 2013-01-22 at 23:31 -0500, Steven Rostedt wrote:
> 
> > > 
> > > But again, I'd really not want to do this in debugfs, how about your own
> > > filesystem?
> > 
> > I will note that this never modifies the debugfs code. But it does
> > circumvent it. That is, all this code lives in kernel/trace/trace.c. I
> > don't modify any of the debugfs code. I just replace the debugfs
> > dentry->d_inode->i_op with my own ops.
> 
> Again, I want to stress that this doesn't touch the debugfs code. Here's
> the real change that I've been testing. It includes the code for the
> "new" and "free" files but those are not created because of an early
> 'return' I added. Notice that it's all contained in
> kernel/trace/trace.c.

Ok, then I'll just forget you ever asked anything about this and wish
you well :)

Have fun,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:55       ` Greg Kroah-Hartman
@ 2013-01-23  5:05         ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-23  5:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: LKML, linux-fsdevel, Al Viro, Andrew Morton

On Tue, 2013-01-22 at 20:55 -0800, Greg Kroah-Hartman wrote:
> On Tue, Jan 22, 2013 at 11:44:09PM -0500, Steven Rostedt wrote:
> > On Tue, 2013-01-22 at 23:31 -0500, Steven Rostedt wrote:
> > 
> > > > 
> > > > But again, I'd really not want to do this in debugfs, how about your own
> > > > filesystem?
> > > 
> > > I will note that this never modifies the debugfs code. But it does
> > > circumvent it. That is, all this code lives in kernel/trace/trace.c. I
> > > don't modify any of the debugfs code. I just replace the debugfs
> > > dentry->d_inode->i_op with my own ops.
> > 
> > Again, I want to stress that this doesn't touch the debugfs code. Here's
> > the real change that I've been testing. It includes the code for the
> > "new" and "free" files but those are not created because of an early
> > 'return' I added. Notice that it's all contained in
> > kernel/trace/trace.c.
> 
> Ok, then I'll just forget you ever asked anything about this and wish
> you well :)
> 
> Have fun,

Don't leave me.... I'm afraid....


I only Cc'd you because it was interacting with debugfs, and was hoping
that you could shed any light that what I did may cause me issues in the
future.

But it's my problem not yours. It may bite me in the ass later, but for
now, it looks like it may work.

-- Steve



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-23  4:44     ` Steven Rostedt
  2013-01-23  4:48       ` Steven Rostedt
  2013-01-23  4:55       ` Greg Kroah-Hartman
@ 2013-01-24 22:39       ` Valdis.Kletnieks
  2013-01-24 23:53         ` Steven Rostedt
  2 siblings, 1 reply; 12+ messages in thread
From: Valdis.Kletnieks @ 2013-01-24 22:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Greg Kroah-Hartman, LKML, linux-fsdevel, Al Viro, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

On Tue, 22 Jan 2013 23:44:09 -0500, Steven Rostedt said:

> Again, I want to stress that this doesn't touch the debugfs code. Here's
> the real change that I've been testing. It includes the code for the
> "new" and "free" files but those are not created because of an early
> 'return' I added. Notice that it's all contained in
> kernel/trace/trace.c.

So I was looking through here, and something caught my eye....

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 253cb51..851eb4a 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c


> +static int instance_mkdir (struct inode *inode, struct dentry *dentry, umode
_t mode)
> +{
> +	struct dentry *parent;
> +	int ret;
> +
> +	/* Paranoid: Make sure the parent is the "instances" directory */
> +	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
> +	if (WARN_ON_ONCE(parent != trace_instance_dir))
> +		return -ENOENT;
> +
> +	/*
> +	 * The inode mutex is locked, but debugfs_create_dir() will also
> +	 * take the mutex. As the instances directory can not be destroyed
> +	 * or changed in any other way, it is safe to unlock it, and
> +	 * let the dentry try. If two users try to make the same dir at
> +	 * the same time, then the new_instance_create() will determine the
> +	 * winner.
> +	 */
> +	mutex_unlock(&inode->i_mutex);
> +
> +	ret = new_instance_create(dentry->d_iname);
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	return ret;
> +}
> +
> +static int instance_rmdir(struct inode *inode, struct dentry *dentry)
> +{
> +	struct dentry *parent;
> +	int ret;
> +
> +	/* Paranoid: Make sure the parent is the "instances" directory */
> +	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
> +	if (WARN_ON_ONCE(parent != trace_instance_dir))
> +		return -ENOENT;
> +
> +	/* The caller did a dget() on dentry */
> +	mutex_unlock(&dentry->d_inode->i_mutex);
> +
> +	/*
> +	 * The inode mutex is locked, but debugfs_create_dir() will also
> +	 * take the mutex. As the instances directory can not be destroyed
> +	 * or changed in any other way, it is safe to unlock it, and
> +	 * let the dentry try. If two users try to make the same dir at
> +	 * the same time, then the instance_delete() will determine the
> +	 * winner.
> +	 */
> +	mutex_unlock(&inode->i_mutex);

Looks like the rmdir comment was slice-n-miced from the mkdir one?

What exactly *are* the race condition rules here, especially for mkdir
racing against an rmdir?

[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] Hack to use mkdir/rmdir in debugfs
  2013-01-24 22:39       ` Valdis.Kletnieks
@ 2013-01-24 23:53         ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-01-24 23:53 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Greg Kroah-Hartman, LKML, linux-fsdevel, Al Viro, Andrew Morton

On Thu, 2013-01-24 at 17:39 -0500, Valdis.Kletnieks@vt.edu wrote:

> > +static int instance_rmdir(struct inode *inode, struct dentry *dentry)
> > +{
> > +	struct dentry *parent;
> > +	int ret;
> > +
> > +	/* Paranoid: Make sure the parent is the "instances" directory */
> > +	parent = hlist_entry(inode->i_dentry.first, struct dentry, d_alias);
> > +	if (WARN_ON_ONCE(parent != trace_instance_dir))
> > +		return -ENOENT;
> > +
> > +	/* The caller did a dget() on dentry */
> > +	mutex_unlock(&dentry->d_inode->i_mutex);
> > +
> > +	/*
> > +	 * The inode mutex is locked, but debugfs_create_dir() will also
> > +	 * take the mutex. As the instances directory can not be destroyed
> > +	 * or changed in any other way, it is safe to unlock it, and
> > +	 * let the dentry try. If two users try to make the same dir at
> > +	 * the same time, then the instance_delete() will determine the
> > +	 * winner.
> > +	 */
> > +	mutex_unlock(&inode->i_mutex);
> 
> Looks like the rmdir comment was slice-n-miced from the mkdir one?
> 
> What exactly *are* the race condition rules here, especially for mkdir
> racing against an rmdir?

Note, after the i_mutex is released, new mutexes are grabbed to test if
we can create or delete a file:

static int new_instance_create(const char *name)
{
        enum ring_buffer_flags rb_flags;
        struct trace_array *tr;
        int ret;
        int i;

        mutex_lock(&trace_types_lock);

        ret = -EEXIST;
        list_for_each_entry(tr, &ftrace_trace_arrays, list) {
                if (tr->name && strcmp(tr->name, name) == 0)
                        goto out_unlock;
        }

        ret = -ENOMEM;
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
                goto out_unlock;

        tr->name = kstrdup(name, GFP_KERNEL);
        if (!tr->name)
                goto out_free_tr;

        raw_spin_lock_init(&tr->start_lock);

        tr->current_trace = &nop_trace;

        INIT_LIST_HEAD(&tr->systems);
        INIT_LIST_HEAD(&tr->events);

        rb_flags = trace_flags & TRACE_ITER_OVERWRITE ? RB_FL_OVERWRITE : 0;

        tr->buffer = ring_buffer_alloc(trace_buf_size, rb_flags);
        if (!tr->buffer)
                goto out_free_tr;

        tr->data = alloc_percpu(struct trace_array_cpu);
        if (!tr->data)
                goto out_free_tr;

        for_each_tracing_cpu(i) {
                memset(per_cpu_ptr(tr->data, i), 0, sizeof(struct trace_array_cpu));
                per_cpu_ptr(tr->data, i)->trace_cpu.cpu = i;
                per_cpu_ptr(tr->data, i)->trace_cpu.tr = tr;
        }

        /* Holder for file callbacks */
        tr->trace_cpu.cpu = RING_BUFFER_ALL_CPUS;
        tr->trace_cpu.tr = tr;

        tr->dir = debugfs_create_dir(name, trace_instance_dir);
        if (!tr->dir)
                goto out_free_tr;

        ret = event_trace_add_tracer(tr->dir, tr);
        if (ret)
                goto out_free_tr;

        init_tracer_debugfs(tr, tr->dir);

        list_add(&tr->list, &ftrace_trace_arrays);

        mutex_unlock(&trace_types_lock);

        return 0;

 out_free_tr:
        if (tr->buffer)
                ring_buffer_free(tr->buffer);
        kfree(tr->name);
        kfree(tr);

 out_unlock:
        mutex_unlock(&trace_types_lock);

        return ret;

}

static int instance_delete(const char *name)
{
        struct trace_array *tr;
        int found = 0;
        int ret;

        mutex_lock(&trace_types_lock);

        ret = -ENODEV;
        list_for_each_entry(tr, &ftrace_trace_arrays, list) {
                if (tr->name && strcmp(tr->name, name) == 0) {
                        found = 1;
                        break;
                }
        }
        if (!found)
                goto out_unlock;

        list_del(&tr->list);

        event_trace_del_tracer(tr);
        debugfs_remove_recursive(tr->dir);
        free_percpu(tr->data);
        ring_buffer_free(tr->buffer);

        kfree(tr->name);
        kfree(tr);

        ret = 0;

 out_unlock:
        mutex_unlock(&trace_types_lock);

        return ret;
}


The trace_types_lock protects the ftrace_trace_arrays, which holds the
descriptors for the entities that can be created or removed. The
debugfs_remove_recursive(), and debugfs_create_dir() etc, are children
of the inode i_mutex that we released. Basically, the trace_types_lock
takes over for creating and destroying directories. It's the same type
of protection I had when I had the "new" and "free" files that would
create the directory when echoing in "new" and delete one when echoing
in "free". There's really no difference. I'm only using the mkdir/rmdir
interface to trigger the exact same interface I had before. They both
have the same locking.

There is no rename or any other manipulation here. Even the parent
directory can't change (no way to delete it).

As for races, I've been running this little script to test it out, and
it works fine:

-- Steve

----
#!/bin/bash

if [ ! -d /debug/tracing/instances ]; then
	echo "No instances directory"
	exit 0
fi

cd /debug/tracing/instances

instance_slam() {
	while :; do
		mkdir x
		mkdir y
		mkdir z
		rmdir x
		rmdir y
		rmdir z
	done 2>/dev/null
}

instance_slam &
x=`jobs -l`
p1=`echo $x | cut -d' ' -f2`
echo $p1

instance_slam &
x=`jobs -l | tail -1`
p2=`echo $x | cut -d' ' -f2`
echo $p2

instance_slam &
x=`jobs -l | tail -1`
p3=`echo $x | cut -d' ' -f2`
echo $p3

instance_slam &
x=`jobs -l | tail -1`
p4=`echo $x | cut -d' ' -f2`
echo $p4

instance_slam &
x=`jobs -l | tail -1`
p5=`echo $x | cut -d' ' -f2`
echo $p5

for i in `seq 120`; do
	ls
	sleep 1
done
kill -9 $p1 
kill -9 $p2
kill -9 $p3
kill -9 $p4
kill -9 $p5

mkdir x y z
ls x y z
rmdir x y z
exit



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-01-24 23:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-23  3:01 [RFC] Hack to use mkdir/rmdir in debugfs Steven Rostedt
2013-01-23  3:47 ` Steven Rostedt
2013-01-23  4:08 ` Greg Kroah-Hartman
2013-01-23  4:31   ` Steven Rostedt
2013-01-23  4:41     ` Greg Kroah-Hartman
2013-01-23  4:52       ` Steven Rostedt
2013-01-23  4:44     ` Steven Rostedt
2013-01-23  4:48       ` Steven Rostedt
2013-01-23  4:55       ` Greg Kroah-Hartman
2013-01-23  5:05         ` Steven Rostedt
2013-01-24 22:39       ` Valdis.Kletnieks
2013-01-24 23:53         ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).