* Re: oxenstored memory leak? seems related with XSA-38
@ 2013-07-04  2:48 Liuqiming (John)
  2013-07-04  8:52 ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Liuqiming (John) @ 2013-07-04  2:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Yanqiangjun

Hi all,

Continuing my tests of oxenstored:

I switched to the original C xenstored and tested my "broken" VM. cxenstored does not have the "memory leak" issue.
So I compared the IO ring handling logic between cxenstored and oxenstored and found the difference:

In cxenstored, after the cons and prod values are fetched, an index check is performed:

	if (!check_indexes(cons, prod)) {
		errno = EIO;
		return -1;
	}

static bool check_indexes(XENSTORE_RING_IDX cons, XENSTORE_RING_IDX prod)
{
	return ((prod - cons) <= XENSTORE_RING_SIZE);
}

So any connection with prod - cons > XENSTORE_RING_SIZE is treated as a "bad client", and cxenstored will not handle its IO ring messages any more.

But in oxenstored, there is just a simple equality check between prod and cons:
    if (prod == cons)
		return 0;

so this leaves a security hole: a guest VM user who can manipulate prod and cons can make oxenstored's memory usage grow without bound, resulting in denial of service.

I managed to create a patch to fix this and I'm testing it on xen4.2.2. Will send out soon.

> -----Original Message-----
> From: Liuqiming (John)
> Sent: Monday, July 01, 2013 9:47 PM
> To: 'xen-devel@lists.xen.org'; 'ian.jackson@eu.citrix.com';
> 'ian.campbell@citrix.com'
> Cc: Yanqiangjun
> Subject: oxenstored memory leak? seems related with XSA-38
> 
> Hi all,
> 
> I tested starting a VM using the xen-4.2.2 release with oxenstored, and hit
> a problem that may be related to XSA-38
> (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).
> 
> When the VM started, oxenstored's memory usage kept increasing, reaching
> 1.5G at last. The VM hung at the OS loading screen.
> 
> Here is the output of top:
> 
> top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916 buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932 cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>   806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14 oxenstored
> 
> top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964 buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380 cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>   806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03 oxenstored
> 
> top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
> Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208 buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968 cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>   806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10 oxenstored
> 
> And the oxenstored log shows these over and over again:
> 
> [20130701T12:27:14.290Z] D8            invalid  device/suspend/event-channel ..
> [20130701T12:27:14.290Z] D8.1937077039 invalid  /event-channel ..
> [20130701T12:27:14.290Z] D8.1852727656 invalid  ..
> [20130701T12:27:14.290Z] D8            debug
> [20130701T12:27:14.290Z] D8            debug
> [20130701T12:27:14.290Z] D8            debug
> [20130701T12:27:14.290Z] D8            debug
> [20130701T12:27:14.290Z] D8            debug
> [20130701T12:27:14.290Z] D8            debug
> 
> My VM is a Windows guest with the GPL PVDriver installed. This problem is
> hard to reproduce, and after a hard reboot everything looks normal.
> 
> I guess something is wrong with the xenbus IO ring, so I investigated the
> code:
> 
> 1) oxenstored and xenbus in the VM use a shared page to communicate with
> each other:
>   struct xenstore_domain_interface {
>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>     XENSTORE_RING_IDX req_cons, req_prod;
>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
> };
> 
> 2) xenbus in the VM puts a request in req and increases req_prod, then
> sends an event to oxenstored.
> 3) oxenstored calculates how much to read using req_cons and req_prod;
> after reading, oxenstored increases req_cons to make it equal req_prod,
> which means no request is pending.
> 4) oxenstored puts responses in rsp and increases rsp_prod, then sends an
> event to the VM; xenbus in the VM uses similar logic to handle the rsp ring.
> 
>  Am I correct?
> 
> So I'm curious about what happens when req_cons is larger than req_prod
> (this can be caused by a buggy PV driver or a malicious guest user); it
> seems oxenstored will fall into an endless loop.
> 
> Is this what XSA-38 talks about?
> 
> I built a pvdriver that sets req_prod to 0 after several xenstore
> operations, and tested it on xen-unstable.hg with all XSA-38 patches
> applied.
> It seems the problem I first met reproduced: oxenstored eventually takes a
> lot of memory.
> 
> Could anyone help me with this issue?
> 

* oxenstored memory leak? seems related with XSA-38
@ 2013-07-01 13:47 Liuqiming (John)
  0 siblings, 0 replies; 12+ messages in thread
From: Liuqiming (John) @ 2013-07-01 13:47 UTC (permalink / raw)
  To: xen-devel, ian.jackson, ian.campbell; +Cc: Yanqiangjun

Hi all,

I tested starting a VM using the xen-4.2.2 release with oxenstored, and hit a problem that may be related to XSA-38 (http://lists.xen.org/archives/html/xen-announce/2013-02/msg00005.html).

When the VM started, oxenstored's memory usage kept increasing, reaching 1.5G at last. The VM hung at the OS loading screen.

Here is the output of top:

top - 20:18:32 up 1 day,  3:09,  5 users,  load average: 0.99, 0.63, 0.32
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.5 us,  1.8 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46699012 used,   220416 free,    36916 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44260932 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0  955m 926m 1068 R  99.9  2.0   4:54.14 oxenstored

  
top - 20:19:05 up 1 day,  3:09,  5 users,  load average: 0.99, 0.67, 0.34
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.6 us,  1.6 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46708564 used,   210864 free,    36964 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 44168380 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1048m 1.0g 1068 R 100.2  2.2   5:27.03 oxenstored

  

top - 20:21:35 up 1 day,  3:12,  5 users,  load average: 1.00, 0.80, 0.44
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.7 us,  1.6 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  46919428 total, 46703052 used,   216376 free,    37208 buffers
KiB Swap:  2103292 total,        0 used,  2103292 free, 43682968 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
  806 root      20   0 1551m 1.5g 1068 R 100.2  3.3   7:56.10 oxenstored

And the oxenstored log shows these over and over again:

[20130701T12:27:14.290Z] D8            invalid  device/suspend/event-channel ..
[20130701T12:27:14.290Z] D8.1937077039 invalid  /event-channel ..
[20130701T12:27:14.290Z] D8.1852727656 invalid  ..
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug
[20130701T12:27:14.290Z] D8           debug

My VM is a Windows guest with the GPL PVDriver installed. This problem is hard to reproduce, and after a hard reboot everything looks normal.

I guess something is wrong with the xenbus IO ring, so I investigated the code:

1) oxenstored and xenbus in the VM use a shared page to communicate with each other:
  struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

2) xenbus in the VM puts a request in req and increases req_prod, then sends an event to oxenstored.
3) oxenstored calculates how much to read using req_cons and req_prod; after reading, oxenstored increases req_cons to make it equal req_prod, which means no request is pending.
4) oxenstored puts responses in rsp and increases rsp_prod, then sends an event to the VM; xenbus in the VM uses similar logic to handle the rsp ring.

 Am I correct?

So I'm curious about what happens when req_cons is larger than req_prod (this can be caused by a buggy PV driver or a malicious guest user); it seems oxenstored will fall into an endless loop.

Is this what XSA-38 talks about?

I built a pvdriver that sets req_prod to 0 after several xenstore operations, and tested it on xen-unstable.hg with all XSA-38 patches applied.
It seems the problem I first met reproduced: oxenstored eventually takes a lot of memory.

Could anyone help me with this issue?



Thread overview: 12+ messages
     [not found] <C462DF05CCBDDC42BC667291C037376D39205981@SZXEML506-MBS.china.huawei.com>
2013-07-05  3:14 ` oxenstored memory leak? seems related with XSA-38 Liuqiming (John)
2013-07-05  9:07 ` Liuqiming (John)
2013-07-15 12:13   ` David Scott
2013-07-16  5:19     ` Liuqiming (John)
2013-07-16  9:46     ` Ian Campbell
2013-07-16  9:56       ` David Scott
2013-07-19 11:56         ` Ian Campbell
2013-07-22 12:08           ` Liuqiming (John)
2013-07-22 21:37             ` Ian Campbell
2013-07-04  2:48 Liuqiming (John)
2013-07-04  8:52 ` Andrew Cooper
  -- strict thread matches above, loose matches on Subject: below --
2013-07-01 13:47 Liuqiming (John)
