linux-kernel.vger.kernel.org archive mirror
* 2.2.18: Thread problem with smbfs
From: Hans-Joachim Baader @ 2000-12-19  9:33 UTC (permalink / raw)
  To: linux-kernel

Hi,

I have a strange problem with smbfs. My application creates threads
that copy files from a mounted SMB share to the local disk. When
I run the application normally, there's no problem. However, when
I run it in gdb 4.18 or 5.0, one of the threads sometimes goes into
the D state, and the whole program including gdb hangs.
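
For reference, the state letter can also be read programmatically: on
Linux it is the third field of /proc/<pid>/stat, and 'D' stands for
uninterruptible sleep. The following is only a minimal sketch of that
check; the function name process_state is hypothetical and not part of
my application.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the state letter of a process
 * ('R', 'S', 'D', ...) as found in /proc/<pid>/stat, or '\0' if the
 * file cannot be read or parsed. */
char process_state(pid_t pid)
{
    char path[64];
    char line[512];
    char *p;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    if (fgets(line, sizeof(line), fp) == NULL)
    {
        fclose(fp);
        return '\0';
    }
    fclose(fp);

    /* The format is "pid (comm) state ...". The command name may itself
     * contain ')' characters, so search for the last one. */
    p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}
```

Calling process_state(getpid()) inspects the calling process itself;
passing the pid of a hung thread shows whether it really sits in 'D'.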

With strace, these are the last lines of output I get:

1854  sched_get_priority_max(0)         = 0
1854  sched_get_priority_min(0)         = 0
1854  brk(0x80ca000)                    = 0x80ca000
1854  pipe([9, 10])                     = 0
1854  clone()                           = 1856
1854  write(10, "\300\357\215@\5\0\0\0\24\364\377\277\256^\204@\370\377\215@\240\353\215@\276\271w@Q\270w@\274Dx@\240\353\215@Q\270w@\274Dx@\240\353\215@\260\357\215@\304\357\215@H\364\377\277\300\357\215@\370\377\215@\240\353\215@d\364\377\277\276\271w@\274Dx@\260\357\215@\256^\204@\370\377\215@\276\271w@\274Dx@\260\357\215@\2\0\0\0T\365\377\277G\200\0@>[w@\324Vf@D:\1@`R\216@\3\0\0\0p\365\377\277", 148) = 148
1854  rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
1854  write(10, "\0!x@\0\0\0\0\360\365\377\277\0 q@\340`\f\10\0\0\0\200\0\0\0\0\f\0\0\0P\357\22@\f\0\0\0l\365\377\277\\.d@\204\342\22@\354\215\371\7\234\365\377\277\"@f@\314\233\315\4\250\365\377\277\\.d@\240\365\377\277A\245\0@X\340\22@@R\216@\7\0\0\0\216\244\0@\370\227v@\340`\f\10P\234\v\10|\263d@H\236v@D;w@\24\366\377\277\360\246\0@\0 q@2\0\0\0p\232w@x\340\22@", 148) = 148
1854  rt_sigprocmask(SIG_SETMASK, NULL, [RT_0], 8) = 0
1854  rt_sigsuspend([] <unfinished ...>

In the syslog I find the following:

Dec 18 19:07:58 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:58 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:07:59 George kernel: smb_retry: sucessful, new pid=16002, generation=38
Dec 18 19:07:59 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:59 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:07:59 George kernel: smb_retry: sucessful, new pid=16002, generation=39
Dec 18 19:07:59 George kernel: smb_get_length: recv error = 512
Dec 18 19:07:59 George kernel: smb_trans2_request: result=-512, setting invalid
Dec 18 19:08:00 George kernel: smb_retry: sucessful, new pid=16002, generation=40

and so on, endlessly. So, AFAIK,  smbfs thinks it has lost connection and
tells smbmount to re-establish it, which succeeds (at least smbmount
thinks so). This happens several times per second.

However, with processes instead of threads, without the debugger, or
when reading from a local filesystem instead of an SMB filesystem, there
is no problem.

Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.

Regards,
hjb
-- 
http://www.pro-linux.de/ - Germany's largest volunteer Linux support site
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: 2.2.18: Thread problem with smbfs
From: Urban Widmark @ 2000-12-19 10:58 UTC (permalink / raw)
  To: Hans-Joachim Baader; +Cc: linux-kernel

On Tue, 19 Dec 2000, Hans-Joachim Baader wrote:

> and so on, endlessly. So, AFAIK,  smbfs thinks it has lost connection and
> tells smbmount to re-establish it, which succeeds (at least smbmount
> thinks so). This happens several times per second.

-512 (-ERESTARTSYS) means that the recv was interrupted by a signal. Or
rather: the current process has a signal pending, so maybe the recv was
interrupted, or maybe there is a problem with the connection. smbfs
cannot tell which, so it reconnects to be safe.

Still, it's better than pre-2.2.18 where smbmount wouldn't stay alive ...

I don't really know how signal delivery works within the kernel, but
smb_trans2_request tries to disable some signals. That does not work
(completely?) so either it needs fixing or the -512 errno needs to be
handled.

Why is it so bad in gdb? Perhaps gdb causes more signals.
Why does one thread end up in D state? I don't know.


> Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.

A small test program that triggers this would be nice. The -512 is easy
to reproduce, but I haven't seen the 'D' state before.

If anyone is interested, the relevant code is in fs/smbfs/sock.c
(smb_trans2_request, ..., _recvfrom)

/Urban


* Re: 2.2.18: Thread problem with smbfs
From: Hans-Joachim Baader @ 2000-12-20 20:41 UTC (permalink / raw)
  To: Urban Widmark; +Cc: linux-kernel

Hi,

Urban Widmark wrote:

> I don't really know how signal delivery works within the kernel, but
> smb_trans2_request tries to disable some signals. That does not work
> (completely?) so either it needs fixing or the -512 errno needs to be
> handled.
> 
> Why so bad in gdb? perhaps it causes more signals.
> Why does one thread end up in D state? don't know.
> 
> 
> > Kernel 2.2.18, smbfs as a module. I can provide more info if necessary.
> 
> A small testprogram that causes this would be nice. The -512 is easy to
> reproduce but I haven't seen the 'D' before.
> 
> If someone is interested the relevant code is fs/smbfs/sock.c
> (smb_trans2_request, ..., _recvfrom)

Here is a test program to reproduce this. Don't worry about
missing error checks and so on, it's just a quick hack.
Create the required files file1..file5 on an SMB share and edit
the #define accordingly. File sizes of 1-2 MB should suffice.
Then run the program. It should copy the files to the current
directory. Then run it under gdb. It should hang until you kill
gdb.

I tested only with an NT 4 server (SP 5 or 6).

Regards,
hjb

#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of the blocks we read from a file. */
static const int ChunkSize = 8192;

/* Path on the mounted SMB share from which we copy files */
#define SourcePath "/mnt/net/test"

struct CopyThreadInfo
{
    char*        src;
    char*        dst;
};

/* returns 1 on success */
int CopyFile(char* src, char* dst)
{
    char        buffer[ChunkSize];
    int         f, g;
    ssize_t     nRet;
    int         nError;

    if ((f = open(src, O_RDONLY)) < 0)
        return 0;

    g = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (g < 0)
    {
        close(f);
        return 0;
    }

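    for (;;)
    {
        nRet = read(f, buffer, sizeof(buffer));
        if (nRet < 0 && errno == EINTR)
            continue;   /* interrupted by a signal: retry the read */
        if (nRet <= 0)
            break;      /* 0 is end of file, < 0 is a read error */
        nRet = write(g, buffer, nRet);
        if (nRet < 0)
            break;      /* write error */
    }
    /* Leaving the loop with break (rather than returning) lets the
       close() calls below run on the error paths too. */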

    close(g);
    close(f);

    if (nRet < 0)
        return 0;

    return 1;
}

/* Thread entry point; pthread_create() expects void *(*)(void *). */
void* Copy(void *arg)
{
    struct CopyThreadInfo *info = arg;

    CopyFile(info->src, info->dst);
    return NULL;
}

void Fetch(char* name)
{
    char src[4096];
    char dst[4096];

    pthread_attr_t attr;
    pthread_t pid;
    struct CopyThreadInfo* pCopy = (struct CopyThreadInfo *) malloc(sizeof(struct CopyThreadInfo));

    strcpy(src, SourcePath);
    strcat(src, "/");       /* SourcePath has no trailing slash */
    strcat(src, name);
    strcpy(dst, name);

    pCopy->src = strdup(src);
    pCopy->dst = strdup(dst);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&pid, &attr, Copy, pCopy);
}

int main()
{
	Fetch("file1");
	Fetch("file2");
	Fetch("file3");
	Fetch("file4");
	Fetch("file5");
	while(1)
		;
	return 0;
}


-- 
http://www.pro-linux.de/ - Germany's largest volunteer Linux support site

* Re: 2.2.18: Thread problem with smbfs
From: Urban Widmark @ 2001-01-02 21:13 UTC (permalink / raw)
  To: Hans-Joachim Baader; +Cc: linux-kernel

On Wed, 20 Dec 2000, Hans-Joachim Baader wrote:

> Then run the program. It should copy the files to the current
> directory. Then run it under gdb. It should hang until you kill
> gdb.

Hello again
(Sorry for the long response time but this really is the busiest time of
 the year, or maybe it's the food and drink that is slowing me down :)

Anyway,
gdb is doing strange things to your test program on ext2 as well. Does it
work for you? I have not been able to reproduce a gdb hang (you do know
that there is a while(1); in main ... ;-), but it generates a lot of smbfs
messages and in one case made smbfs stop working.

Here is the output of your program, modified to take the path as an
argument and to print a message on entry to and exit from the copy
function, run with the files on ext2:

GNU gdb 5.0
...
(gdb) run tmp
Starting program: /home/puw/src/smbfs/thread-test tmp
[New Thread 1024 (LWP 2456)]
[New Thread 2049 (LWP 2463)]
[New Thread 1026 (LWP 2464)]
copy()  tmp/file1 -> out/file1
copy() -- exit
[New Thread 2051 (LWP 2465)]

Program exited normally.
(gdb) quit

	Hmm, strange. Why does it only copy one file? Looking at the last
	process gives a sleeping process in rt_sigsuspend, like you
	reported in your strace. Am I using gdb incorrectly?

% ps -lw 2465
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY        TIME CMD
040 S   501  2465     1  0  60   0    -  1366 rt_sig pts/3      0:00 /home/puw/src/smbfs/thread-test tmp


The patch below vs 2.2.18 should remove the -512 (-ERESTARTSYS) errors.

But I don't like it at all. It blocks all signals, including SIGKILL, for
a while. The problem is that tcp_recvmsg checks if there is a signal (any
signal) and aborts with -ERESTARTSYS (a comment says it only cares about
SIGURG, maybe that could be changed instead).

Could you test if this fixes the gdb problem? And try gdb with all files
on ext2 too. For me there is no difference between that and smbfs against
an NT4 server.

/Urban


--- linux-2.2.18-orig/fs/smbfs/sock.c	Wed Dec 13 21:27:44 2000
+++ linux/fs/smbfs/sock.c	Tue Jan  2 21:19:03 2001
@@ -30,11 +30,13 @@
 
 static int
 _recvfrom(struct socket *socket, unsigned char *ubuf, int size,
-	  unsigned flags)
+	  unsigned rflags)
 {
 	struct iovec iov;
 	struct msghdr msg;
 	struct scm_cookie scm;
+	sigset_t old_set;
+	unsigned long flags;
 
 	msg.msg_name = NULL;
 	msg.msg_namelen = 0;
@@ -43,11 +45,33 @@
 	msg.msg_control = NULL;
 	iov.iov_base = ubuf;
 	iov.iov_len = size;
-	
+
 	memset(&scm, 0,sizeof(scm));
-	size=socket->ops->recvmsg(socket, &msg, size, flags, &scm);
-	if(size>=0)
-		scm_recv(socket,&msg,&scm,flags);
+
+	/*
+	 * block all signals to avoid -ERESTARTSYS problem in recvmsg
+	 *
+	 * FIXME: changing the signal mask is done elsewhere too.
+	 * This code removes the ability to SIGKILL a process that has hung in
+	 * recvmsg (does it? I'm guessing ...).
+	 * Use poll/timeout to ensure progress?
+	 */
+	spin_lock_irqsave(&current->sigmask_lock, flags);
+	old_set = current->blocked;
+	siginitsetinv(&current->blocked, 0);
+	recalc_sigpending(current);
+	spin_unlock_irqrestore(&current->sigmask_lock, flags);
+
+	size = socket->ops->recvmsg(socket, &msg, size, rflags, &scm);
+	if (size >= 0)
+		scm_recv(socket, &msg, &scm, rflags);
+
+	/* restore old signal mask */
+	spin_lock_irqsave(&current->sigmask_lock, flags);
+	current->blocked = old_set;
+	recalc_sigpending(current);
+	spin_unlock_irqrestore(&current->sigmask_lock, flags);
+
 	return size;
 }
 
@@ -529,7 +553,7 @@
 				buf_len = server->packet_size;
 			buf_len = smb_round_length(buf_len);
 			if (buf_len > SMB_MAX_PACKET_SIZE)
-				goto out_no_mem;
+				goto out_too_long;
 
 			rcv_buf = smb_vmalloc(buf_len);
 			if (!rcv_buf)


* Re: 2.2.18: Thread problem with smbfs
From: Hans-Joachim Baader @ 2001-01-03 10:16 UTC (permalink / raw)
  To: Urban Widmark; +Cc: Hans-Joachim Baader, linux-kernel

Hi Urban,

> Anyway,
> gdb is doing strange things to your testprogram on ext2 as well. Does it
> work for you? I have not been able to reproduce a gdb hang (you do know
> that there is a while(1); in main ... ;-), but it generates a lot of smbfs
> messages and in one case made smbfs stop working.

I put the while(1) there to give all threads time to do their work.
You know, it was just a quick & dirty test case.
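
A cleaner way to wait, sketched here under the assumption that the
threads are created joinable rather than detached, would be to call
pthread_join() on each of them so that main() returns once all copies
are done. The names worker, run_workers_and_wait and NUM_WORKERS are
hypothetical, and the worker only sets a flag where the real program
would copy a file.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 5

/* Stand-in for the copy thread: marks its slot as finished. */
static void *worker(void *arg)
{
    int *done = arg;

    *done = 1;
    return NULL;
}

/* Starts NUM_WORKERS joinable threads, waits for each of them and
 * returns the number of workers that ran to completion. */
int run_workers_and_wait(void)
{
    pthread_t tids[NUM_WORKERS];
    int done[NUM_WORKERS] = {0};
    int started = 0;
    int finished = 0;
    int i;

    for (i = 0; i < NUM_WORKERS; i++)
    {
        if (pthread_create(&tids[i], NULL, worker, &done[i]) != 0)
            break;
        started++;
    }

    /* pthread_join() blocks until the thread has exited, so no busy
     * loop is needed to keep the process alive. */
    for (i = 0; i < started; i++)
    {
        if (pthread_join(tids[i], NULL) == 0)
            finished += done[i];
    }
    return finished;
}
```

Depending on the toolchain this may need to be built with -pthread.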

> 	Hmm, strange. Why does it only copy one file? Looking at the last
> 	process gives a sleeping process in rt_sigsuspend, like you
> 	reported in your strace. Am I using gdb incorrectly?

I don't think so, but I'm not a gdb expert. In any case, I did test
the program on ext2 and it behaved correctly all the time. You don't
need gdb to reproduce the smbfs problems, you can also use strace.
So there's nothing wrong with gdb.

> The patch below vs 2.2.18 should remove the -512 (-ERESTARTSYS) errors.
> 
> But I don't like it at all. It blocks all signals, including SIGKILL, for
> a while. The problem is that tcp_recvmsg checks if there is a signal (any
> signal) and aborts with -ERESTARTSYS (a comment says it only cares about
> SIGURG, maybe that could be changed instead).
> 
> Could you test if this fixes the gdb problem? And try gdb with all files
> on ext2 too. For me there is no difference between that and smbfs vs a
> NT4.

It seems to work perfectly. I tested with up to 10 threads in 2
simultaneous processes with both ext2 and smbfs. I'll do further
testing over the next few days.

Many thanks for fixing that.

Regards,
hjb
-- 
Pro-Linux - Germany's largest volunteer Linux support site
http://www.pro-linux.de/