linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 1/2] oProfile: oops when profile_pc() return ~0LU
@ 2007-10-21 12:08 Philippe Elie
  2007-10-23  2:38 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Philippe Elie @ 2007-10-21 12:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, Sami Farin

From: Philippe Elie <phil.el@wanadoo.fr>

Instruction pointer returned by profile_pc() can be a random value. This
break the assumption than we can safely set struct op_sample.eip field to
a magic value to signal to the per-cpu buffer reader side special event
like task switch ending up in a segfault in get_task_mm() when profile_pc()
return ~0UL. Fixed by exchanging the meaning of eip/event field for these
specials events.

Special thanks to Sami Farin who reported the oops and helped patiently
to narrow down the problem.

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Sami Farin <safari-kernel@safari.iki.fi>

---
 drivers/oprofile/buffer_sync.c |   12 ++++++------
 drivers/oprofile/cpu_buffer.c  |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

Beside Sami, Jesse Barnes already reported this bug two years ago
but I've been unable to understand it at this time.

diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
index 8134c7e..7a4ccc7 100644
--- a/drivers/oprofile/buffer_sync.c
+++ b/drivers/oprofile/buffer_sync.c
@@ -514,21 +514,21 @@ void sync_buffer(int cpu)
 	for (i = 0; i < available; ++i) {
 		struct op_sample * s = &cpu_buf->buffer[cpu_buf->tail_pos];
  
-		if (is_code(s->eip)) {
-			if (s->event <= CPU_IS_KERNEL) {
+		if (is_code(s->event)) {
+			if (s->eip <= CPU_IS_KERNEL) {
 				/* kernel/userspace switch */
-				in_kernel = s->event;
+				in_kernel = s->eip;
 				if (state == sb_buffer_start)
 					state = sb_sample_start;
-				add_kernel_ctx_switch(s->event);
-			} else if (s->event == CPU_TRACE_BEGIN) {
+				add_kernel_ctx_switch(s->eip);
+			} else if (s->eip == CPU_TRACE_BEGIN) {
 				state = sb_bt_start;
 				add_trace_begin();
 			} else {
 				struct mm_struct * oldmm = mm;
 
 				/* userspace context switch */
-				new = (struct task_struct *)s->event;
+				new = (struct task_struct *)s->eip;
 
 				release_mm(oldmm);
 				mm = take_tasks_mm(new);
diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index a83c3db..c856530 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -156,7 +156,7 @@ add_sample(struct oprofile_cpu_buffer * cpu_buf,
 static inline void
 add_code(struct oprofile_cpu_buffer * buffer, unsigned long value)
 {
-	add_sample(buffer, ESCAPE_CODE, value);
+	add_sample(buffer, value, ESCAPE_CODE);
 }
 
 /* This must be safe from any context. It's safe writing here


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch 1/2] oProfile: oops when profile_pc() return ~0LU
  2007-10-21 12:08 [patch 1/2] oProfile: oops when profile_pc() return ~0LU Philippe Elie
@ 2007-10-23  2:38 ` Linus Torvalds
  2007-10-23 10:10   ` Sami Farin
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2007-10-23  2:38 UTC (permalink / raw)
  To: Philippe Elie; +Cc: Linux Kernel Mailing List, Andrew Morton, Sami Farin


This set of two patches look ok by me, but I'd like sign-offs.. Also, were 
they tested and found to fix the problem by Sami?

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 1/2] oProfile: oops when profile_pc() return ~0LU
  2007-10-23  2:38 ` Linus Torvalds
@ 2007-10-23 10:10   ` Sami Farin
  2007-10-23 16:13     ` Philippe Elie
  0 siblings, 1 reply; 6+ messages in thread
From: Sami Farin @ 2007-10-23 10:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Philippe Elie, Linux Kernel Mailing List, Andrew Morton

On Mon, Oct 22, 2007 at 19:38:10 -0700, Linus Torvalds wrote:
> 
> This set of two patches look ok by me, but I'd like sign-offs.. Also, were 
> they tested and found to fix the problem by Sami?
> 
> 		Linus

The previous patch I tested by Philippe, oprof-fix-profile_pc-use.patch,
worked ok, but with this latest patch oprofiled aborts.
But kernel does not oops or print msgs.

(gdb) bt
#0  0x00007f68d66fee65 in raise () from /lib64/libc.so.6
#1  0x00007f68d6700910 in abort () from /lib64/libc.so.6
#2  0x00007f68d7053974 in code_unknown (trans=0x7fff5bcd1c60) at opd_trans.c:140
#3  0x00007f68d7053eff in opd_process_samples (buffer=0x7f68d6ef3010 "��������\002", count=99349) at opd_trans.c:362
#4  0x00007f68d7050408 in opd_do_samples (opd_buf=0x7f68d6ef3010 "��������\002", count=794792) at init.c:131
#5  0x00007f68d70504e9 in opd_do_read (buf=0x7f68d6ef3010 "��������\002", size=1048576) at init.c:181
#6  0x00007f68d70506ca in opd_26_start () at init.c:265
#7  0x00007f68d7051468 in main (argc=9, argv=0x7fff5bcd1e98) at oprofiled.c:501

(gdb) frame 2
#2  0x00007f68d7053974 in code_unknown (trans=0x7fff5bcd1c60) at opd_trans.c:140
140             abort();

(gdb) p *trans
$1 = {buffer = 0x7f68d6ef3c38 "�)\200����", remaining = 98960, tracing = TRACING_OFF, current = 0x7f68d89dd010, last = 0x7f68d89dd010, anon = 0x0, 
  last_anon = 0x0, cookie = 18446604436320244048, app_cookie = 18446604436320244048, pc = 18446744071568019675, last_pc = 18446744071568019675, 
  event = 5, in_kernel = 1, cpu = 1, tid = 10074, tgid = 10074, embedded_offset = 18446744073709551615}
(gdb) 

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 1/2] oProfile: oops when profile_pc() return ~0LU
  2007-10-23 10:10   ` Sami Farin
@ 2007-10-23 16:13     ` Philippe Elie
  2007-10-23 16:49       ` Sami Farin
  2007-10-23 16:55       ` Linus Torvalds
  0 siblings, 2 replies; 6+ messages in thread
From: Philippe Elie @ 2007-10-23 16:13 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton, Sami Farin

On Tue, 23 Oct 2007 at 13:10 +0000, Sami Farin wrote:

> On Mon, Oct 22, 2007 at 19:38:10 -0700, Linus Torvalds wrote:
> > 
> > This set of two patches look ok by me, but I'd like sign-offs.. Also, were 
> > they tested and found to fix the problem by Sami?
> > 
> > 		Linus

For the signed-offs I thought the From: was an implicit Signed-offs.

Test was done privately, Sami helped to narrow down the trouble, but
he didn't test the last patch, nothing bad on Sami side, I was too
confident the fix was obvious after narrowing it.

> 
> The previous patch I tested by Philippe, oprof-fix-profile_pc-use.patch,
> worked ok, but with this latest patch oprofiled aborts.
> But kernel does not oops or print msgs.

argh, I just moved the wrong eip from kernel to user space where the same
problem occur too, *sighs*, since I can't reproduce Sami problem, my own
test obviously worked...

Sami, can you test this new patch. After testing can you report
the contents of /dev/oprofile/stats/cpu*/sample_invalid_eip ?

Linus, there is two way to fix this problem, the attached patch fix it
by sanitizing the sampled eip, the other is to replace the use of
profile_pc(); by instruction_pointer(); in cpu_buffer.c, that one was
tested by Sami  but 1) it'll break the 'use oprofile as a sort of lockometer'
2) I think sanitizing the eip will be necessary anyway as I'm not really
confident than instruction_pointer() can never return weird eip on some
weird arch and/or some weird circumstances.

diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index a83c3db..c93d3d2 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -64,6 +64,8 @@ int alloc_cpu_buffers(void)
 		b->head_pos = 0;
 		b->sample_received = 0;
 		b->sample_lost_overflow = 0;
+		b->backtrace_aborted = 0;
+		b->sample_invalid_eip = 0;
 		b->cpu = i;
 		INIT_DELAYED_WORK(&b->work, wq_sync_buffer);
 	}
@@ -175,6 +177,11 @@ static int log_sample(struct oprofile_cpu_buffer * cpu_buf, unsigned long pc,
 
 	cpu_buf->sample_received++;
 
+	if (pc == ESCAPE_CODE) {
+		cpu_buf->sample_invalid_eip++;
+		return 0;
+	}
+
 	if (nr_available_slots(cpu_buf) < 3) {
 		cpu_buf->sample_lost_overflow++;
 		return 0;
diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h
index 49900d9..c66c025 100644
--- a/drivers/oprofile/cpu_buffer.h
+++ b/drivers/oprofile/cpu_buffer.h
@@ -42,6 +42,7 @@ struct oprofile_cpu_buffer {
 	unsigned long sample_received;
 	unsigned long sample_lost_overflow;
 	unsigned long backtrace_aborted;
+	unsigned long sample_invalid_eip;
 	int cpu;
 	struct delayed_work work;
 } ____cacheline_aligned;
diff --git a/drivers/oprofile/oprofile_stats.c b/drivers/oprofile/oprofile_stats.c
index f0acb66..d1f6d77 100644
--- a/drivers/oprofile/oprofile_stats.c
+++ b/drivers/oprofile/oprofile_stats.c
@@ -26,6 +26,8 @@ void oprofile_reset_stats(void)
 		cpu_buf = &cpu_buffer[i]; 
 		cpu_buf->sample_received = 0;
 		cpu_buf->sample_lost_overflow = 0;
+		cpu_buf->backtrace_aborted = 0;
+		cpu_buf->sample_invalid_eip = 0;
 	}
  
 	atomic_set(&oprofile_stats.sample_lost_no_mm, 0);
@@ -61,6 +63,8 @@ void oprofile_create_stats_files(struct super_block * sb, struct dentry * root)
 			&cpu_buf->sample_lost_overflow);
 		oprofilefs_create_ro_ulong(sb, cpudir, "backtrace_aborted",
 			&cpu_buf->backtrace_aborted);
+		oprofilefs_create_ro_ulong(sb, cpudir, "sample_invalid_eip",
+			&cpu_buf->sample_invalid_eip);
 	}
  
 	oprofilefs_create_ro_atomic(sb, dir, "sample_lost_no_mm",


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch 1/2] oProfile: oops when profile_pc() return ~0LU
  2007-10-23 16:13     ` Philippe Elie
@ 2007-10-23 16:49       ` Sami Farin
  2007-10-23 16:55       ` Linus Torvalds
  1 sibling, 0 replies; 6+ messages in thread
From: Sami Farin @ 2007-10-23 16:49 UTC (permalink / raw)
  To: Philippe Elie; +Cc: Linus Torvalds, Linux Kernel Mailing List, Andrew Morton

On Tue, Oct 23, 2007 at 18:13:21 +0200, Philippe Elie wrote:
> On Tue, 23 Oct 2007 at 13:10 +0000, Sami Farin wrote:
> 
> > On Mon, Oct 22, 2007 at 19:38:10 -0700, Linus Torvalds wrote:
> > > 
> > > This set of two patches look ok by me, but I'd like sign-offs.. Also, were 
> > > they tested and found to fix the problem by Sami?
> > > 
> > > 		Linus
> 
> For the signed-offs I thought the From: was an implicit Signed-offs.
> 
> Test was done privately, Sami helped to narrow down the trouble, but
> he didn't test the last patch, nothing bad on Sami side, I was too
> confident the fix was obvious after narrowing it.
> 
> > 
> > The previous patch I tested by Philippe, oprof-fix-profile_pc-use.patch,
> > worked ok, but with this latest patch oprofiled aborts.
> > But kernel does not oops or print msgs.
> 
> argh, I just moved the wrong eip from kernel to user space where the same
> problem occur too, *sighs*, since I can't reproduce Sami problem, my own
> test obviously worked...
> 
> Sami, can you test this new patch. After testing can you report
> the contents of /dev/oprofile/stats/cpu*/sample_invalid_eip ?

cat /dev/oprofile/stats/cpu?/sample_invalid_eip; sleep 10; cat /dev/oprofile/stats/cpu?/sample_invalid_eip
834
835
0
0
906
911
0
0

For some reason there are four directories, but I have
only two CPUs in reality.

And oprofiled survives the test OK.

> Linus, there is two way to fix this problem, the attached patch fix it
> by sanitizing the sampled eip, the other is to replace the use of
> profile_pc(); by instruction_pointer(); in cpu_buffer.c, that one was
> tested by Sami  but 1) it'll break the 'use oprofile as a sort of lockometer'
> 2) I think sanitizing the eip will be necessary anyway as I'm not really
> confident than instruction_pointer() can never return weird eip on some
> weird arch and/or some weird circumstances.

-- 
Do what you love because life is too short for anything else.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 1/2] oProfile: oops when profile_pc() return ~0LU
  2007-10-23 16:13     ` Philippe Elie
  2007-10-23 16:49       ` Sami Farin
@ 2007-10-23 16:55       ` Linus Torvalds
  1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2007-10-23 16:55 UTC (permalink / raw)
  To: Philippe Elie; +Cc: Linux Kernel Mailing List, Andrew Morton, Sami Farin



On Tue, 23 Oct 2007, Philippe Elie wrote:
> 
> For the signed-offs I thought the From: was an implicit Signed-offs.

No, there are no implicit sign-offs. The point of sign-offs is that it 
makes the copyright and flow of patches explicit, so an "implicit 
sign-off" would defeat the whole point.

> Test was done privately, Sami helped to narrow down the trouble, but
> he didn't test the last patch, nothing bad on Sami side, I was too
> confident the fix was obvious after narrowing it.

Well, I just wanted an ack that it was tested, since it seemed a bit 
subtle and (like the sign-offs) that explicit "yes, this was tested" was 
missing. Looks like I was right in asking for it:

> > The previous patch I tested by Philippe, oprof-fix-profile_pc-use.patch,
> > worked ok, but with this latest patch oprofiled aborts.
> > But kernel does not oops or print msgs.
> 
> argh, I just moved the wrong eip from kernel to user space where the same
> problem occur too, *sighs*, since I can't reproduce Sami problem, my own
> test obviously worked...
> 
> Sami, can you test this new patch. After testing can you report
> the contents of /dev/oprofile/stats/cpu*/sample_invalid_eip ?
> 
> Linus, there is two way to fix this problem, the attached patch fix it
> by sanitizing the sampled eip,

I'm perfectly happy with the attached patch, I just want it to be 
properly tested and have all the sign-offs (and explanations etc) in 
place, and I can apply it.

Of course, I cannot think of a single architecture where an EIP of ~0UL is 
valid anyway, so I'm also not opposed to just keeping ~0UL as the magic 
value.

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-10-23 16:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-10-21 12:08 [patch 1/2] oProfile: oops when profile_pc() return ~0LU Philippe Elie
2007-10-23  2:38 ` Linus Torvalds
2007-10-23 10:10   ` Sami Farin
2007-10-23 16:13     ` Philippe Elie
2007-10-23 16:49       ` Sami Farin
2007-10-23 16:55       ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).