Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
* (Ubuntu 18.04) NFSv4+sec=krb5 client turns into a DoS device after ticket expires?
@ 2019-09-16  6:33 Peter Eriksson
  2019-09-16 15:12 ` Goetz, Patrick G
  0 siblings, 1 reply; 2+ messages in thread
From: Peter Eriksson @ 2019-09-16  6:33 UTC (permalink / raw)
  To: linux-nfs

We’re investigating a problem where we are seeing a number of Ubuntu 18.04 Linux clients that bombards our NFS Servers with TCP connections. Like 400 requests/second of new TCP connection requests to the NFS server port (with unique source port numbers) which causes the firewall state tracking tables on the server to grow like crazy.

The stream of IP packets we are seeing look like this (times in seconds):

0.001280 Client -> Server SYN
0.001289 Server -> Client SYN+ACK
0.001516 Client -> Server ACK
0.003609 Client -> Server FIN+ACK
0.003615 Server -> Client ACK
0.003620 Server -> Client FIN+ACK
0.003841 Client -> Server ACK
<repeat 200-400 times/s>

Ie, initiate a new connection and then immediately disconnect again.

On the client this coincides with a user which has their home directory mounted via NFS (v4) with sec=krb5 and an expired Kerberos ticket and having an “evolution” process running. This somehow causes rpc.gssd to run a lot...

The rpc.gssd seems to be doing this:

> Tasks: 266 total,   1 running, 193 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  5.0 us,  1.5 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
> KiB Mem : 16303264 total,   955844 free,  3912820 used, 11434600 buff/cache
> KiB Swap:  7812092 total,  7811580 free,      512 used. 11481708 avail Mem 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                      
>   705 root      20   0  328888   4924   3568 S  20.8  0.0   1520:27 rpc.gssd

> epoll_wait(4, [{EPOLLIN, {u32=9, u64=9}}], 32, -1) = 1
> read(9, "mech=krb5 uid=1036985 enctypes=1"..., 32768) = 50
> clone(child_stack=0x7f4746f5ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f4746f609d0, tls=0x7f4746f60700, child_tidptr=0x7f4746f609d0) = 1310
> epoll_wait(4, [{EPOLLIN, {u32=9, u64=9}}], 32, -1) = 1


Evolution has a number of file descriptors open:

> evolution 26916 olaha93  cwd       DIR               0,68       45        4 /home/olaha93 (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93  mem       REG               0,68            229909 /home/olaha93/.cache/mesa_shader_cache/index (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
> evolution 26916 olaha93  mem       REG               0,68            175429 /home/olaha93/.config/dconf/user (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
> evolution 26916 olaha93  mem       REG               0,68            171739 /home/olaha93/.local/share/gvfs-metadata/.nfs0000000000029edb00000333 (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
> evolution 26916 olaha93   15u      REG               0,68    28672   230539 /home/olaha93/.pki/nssdb/cert9.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   16u      REG               0,68    28672   230541 /home/olaha93/.pki/nssdb/key4.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   18u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/e013fa56fec8685e5cb71733b66216b51495a944/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   19u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/1e49c4d51a101a775373e6455c4ce75b1bcc87a9/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   21u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/13f710d8659a79fd42ea3aff53b116f6a67bbbeb/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   22u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/94e55bfdbf6fe8f93ceedef1a7ae247fb97ea2ff/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   23u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/16e1d5d4fcf3c26e45b366759fdd06df1b55b979/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   24u  unknown               0,68                   /home/olaha93/.local/share/evolution/mail/local/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   25u  unknown               0,68                   /home/olaha93/.local/share/evolution/mail/vfolder/folders.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   26u  unknown               0,68                   /home/olaha93/.config/evolution/mail/remote-content.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   27u  unknown               0,68                   /home/olaha93/.config/evolution/mail/properties.db (filur04.it.liu.se:/staff/olaha93)
> evolution 26916 olaha93   29r  unknown               0,68                   /home/olaha93/.local/share/gvfs-metadata/.nfs0000000000029edb00000333 (filur04.it.liu.se:/staff/olaha93)


Repeated errors from evolution in the log files:

> Sep 13 16:01:48 lille71 evolution[26916]: camel_store_summary_disconnect_folder_summary: Store summary 0x559b44194390 is not connected to folder summary 0x7fcaf40428e0
> Sep 13 16:01:48 lille71 evolution[26916]: Unable to load summary: disk I/O error
> Sep 13 16:01:48 lille71 evolution[26916]: GError set over the top of a previous GError or uninitialized memory.
>                                           This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
>                                           The overwriting error message was: disk I/O error

Restarting rpc.gssd doesn’t seem to help. Removing the expired kerberos cache file doesn’t help. So my guess is that evolution somehow triggers new NFS server connections but then the kernel immediately terminates it without ever sending any data to the NFS server.

(Rebooting the client works - but.. :-)


It would have been nice if an expired kerberos cache didn’t cause this aggressive behaviour…

- Peter

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: (Ubuntu 18.04) NFSv4+sec=krb5 client turns into a DoS device after ticket expires?
  2019-09-16  6:33 (Ubuntu 18.04) NFSv4+sec=krb5 client turns into a DoS device after ticket expires? Peter Eriksson
@ 2019-09-16 15:12 ` Goetz, Patrick G
  0 siblings, 0 replies; 2+ messages in thread
From: Goetz, Patrick G @ 2019-09-16 15:12 UTC (permalink / raw)
  To: Peter Eriksson, linux-nfs

This is off topic, but it sounds like you might be accessing IMAP 
folders over NFS?  This is a bad idea if you are -- just use the IMAP 
protocol and use local storage on the IMAP server.  Anything else is a 
recipe for lost mail and endless headaches.

On 9/16/19 1:33 AM, Peter Eriksson wrote:
> We’re investigating a problem where we are seeing a number of Ubuntu 18.04 Linux clients that bombards our NFS Servers with TCP connections. Like 400 requests/second of new TCP connection requests to the NFS server port (with unique source port numbers) which causes the firewall state tracking tables on the server to grow like crazy.
> 
> The stream of IP packets we are seeing look like this (times in seconds):
> 
> 0.001280 Client -> Server SYN
> 0.001289 Server -> Client SYN+ACK
> 0.001516 Client -> Server ACK
> 0.003609 Client -> Server FIN+ACK
> 0.003615 Server -> Client ACK
> 0.003620 Server -> Client FIN+ACK
> 0.003841 Client -> Server ACK
> <repeat 200-400 times/s>
> 
> Ie, initiate a new connection and then immediately disconnect again.
> 
> On the client this coincides with a user which has their home directory mounted via NFS (v4) with sec=krb5 and an expired Kerberos ticket and having an “evolution” process running. This somehow causes rpc.gssd to run a lot...
> 
> The rpc.gssd seems to be doing this:
> 
>> Tasks: 266 total,   1 running, 193 sleeping,   0 stopped,   0 zombie
>> %Cpu(s):  5.0 us,  1.5 sy,  0.0 ni, 93.1 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
>> KiB Mem : 16303264 total,   955844 free,  3912820 used, 11434600 buff/cache
>> KiB Swap:  7812092 total,  7811580 free,      512 used. 11481708 avail Mem
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>    705 root      20   0  328888   4924   3568 S  20.8  0.0   1520:27 rpc.gssd
> 
>> epoll_wait(4, [{EPOLLIN, {u32=9, u64=9}}], 32, -1) = 1
>> read(9, "mech=krb5 uid=1036985 enctypes=1"..., 32768) = 50
>> clone(child_stack=0x7f4746f5ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f4746f609d0, tls=0x7f4746f60700, child_tidptr=0x7f4746f609d0) = 1310
>> epoll_wait(4, [{EPOLLIN, {u32=9, u64=9}}], 32, -1) = 1
> 
> 
> Evolution has a number of file descriptors open:
> 
>> evolution 26916 olaha93  cwd       DIR               0,68       45        4 /home/olaha93 (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93  mem       REG               0,68            229909 /home/olaha93/.cache/mesa_shader_cache/index (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
>> evolution 26916 olaha93  mem       REG               0,68            175429 /home/olaha93/.config/dconf/user (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
>> evolution 26916 olaha93  mem       REG               0,68            171739 /home/olaha93/.local/share/gvfs-metadata/.nfs0000000000029edb00000333 (filur04.it.liu.se:/staff/olaha93) (stat: Permission denied)
>> evolution 26916 olaha93   15u      REG               0,68    28672   230539 /home/olaha93/.pki/nssdb/cert9.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   16u      REG               0,68    28672   230541 /home/olaha93/.pki/nssdb/key4.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   18u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/e013fa56fec8685e5cb71733b66216b51495a944/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   19u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/1e49c4d51a101a775373e6455c4ce75b1bcc87a9/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   21u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/13f710d8659a79fd42ea3aff53b116f6a67bbbeb/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   22u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/94e55bfdbf6fe8f93ceedef1a7ae247fb97ea2ff/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   23u  unknown               0,68                   /home/olaha93/.cache/evolution/mail/16e1d5d4fcf3c26e45b366759fdd06df1b55b979/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   24u  unknown               0,68                   /home/olaha93/.local/share/evolution/mail/local/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   25u  unknown               0,68                   /home/olaha93/.local/share/evolution/mail/vfolder/folders.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   26u  unknown               0,68                   /home/olaha93/.config/evolution/mail/remote-content.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   27u  unknown               0,68                   /home/olaha93/.config/evolution/mail/properties.db (filur04.it.liu.se:/staff/olaha93)
>> evolution 26916 olaha93   29r  unknown               0,68                   /home/olaha93/.local/share/gvfs-metadata/.nfs0000000000029edb00000333 (filur04.it.liu.se:/staff/olaha93)
> 
> 
> Repeated errors from evolution in the log files:
> 
>> Sep 13 16:01:48 lille71 evolution[26916]: camel_store_summary_disconnect_folder_summary: Store summary 0x559b44194390 is not connected to folder summary 0x7fcaf40428e0
>> Sep 13 16:01:48 lille71 evolution[26916]: Unable to load summary: disk I/O error
>> Sep 13 16:01:48 lille71 evolution[26916]: GError set over the top of a previous GError or uninitialized memory.
>>                                            This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
>>                                            The overwriting error message was: disk I/O error
> 
> Restarting rpc.gssd doesn’t seem to help. Removing the expired kerberos cache file doesn’t help. So my guess is that evolution somehow triggers new NFS server connections but then the kernel immediately terminates it without ever sending any data to the NFS server.
> 
> (Rebooting the client works - but.. :-)
> 
> 
> It would have been nice if an expired kerberos cache didn’t cause this aggressive behaviour…
> 
> - Peter
>>> This message is from an external sender. Learn more about why this <<
>>> matters at https://links.utexas.edu/rtyclf.                        <<

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, back to index

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-16  6:33 (Ubuntu 18.04) NFSv4+sec=krb5 client turns into a DoS device after ticket expires? Peter Eriksson
2019-09-16 15:12 ` Goetz, Patrick G

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git