Tuesday, May 19, 2009

Minimize Code Explosion of Generic Type

Generics were added to the .NET Framework in version 2.0, and they greatly increase the reusability of commonly used algorithms. It's well known that the JIT compiler generates a concrete type for each set of generic type arguments at run time. So code explosion is possible if a lot of concrete types are created.

What kind of explosion?
In the .NET compilation model, the C#/VB code is first compiled into IL code. The JIT compiler then compiles the IL code into native code on demand, and it also generates the concrete types for the specified type arguments. So there is only one copy of the IL code, with the generic type parameters still in place.
What gets duplicated is the native code generated by the JIT compiler. There is a copy of every method for each concrete type.
The other data that appear to be duplicated are the EEClass and MethodTable structures. They hold type-specific data, so strictly speaking they are not duplicated: they are per-type data rather than copies of one another.

How .net tries to avoid explosion

The .NET Framework uses two techniques to minimize code explosion.
1. Different invocations of a generic method with the same type arguments share the same copy of native code. This only applies when the invocations are in the same AppDomain.
2. The CLR treats all reference type arguments as identical. It can do this because reference variables are, roughly speaking, pointers to objects on the heap, so they can all be manipulated in the same way.

Verify the optimization
To verify that the optimization actually behaves this way, we create the following sample and debug it with WinDbg.

static void Main()
{
    List<int> intList = new List<int>();
    List<object> objList = new List<object>();
    List<System.Delegate> delList = new List<System.Delegate>();
}

Input sxe ld:mscorlib to instruct WinDbg to break when the application loads the mscorlib module
When windbg breaks, input .loadby sos mscorwks to load sos.dll
Input .chain to confirm the sos extension has been successfully loaded
C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos: image 2.0.50727.3053, API 1.0.0, built Fri Jul 25 22:08:38 2008
[path: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll]

Input !bpmd Test.exe Test.Program.Main to set a managed breakpoint in the Main method
Input the p command several times until the !dso command shows the System.Collections.Generic.List`1 objects on the managed stack. The output below shows the objects we are interested in:
ESP/REG  Object   Name
0019e3a4 01e6bc04 System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
0019e5c4 01e6bbec System.Collections.Generic.List`1[[System.Object, mscorlib]]
0019e5c8 01e6bbc8 System.Collections.Generic.List`1[[System.Int32, mscorlib]]

Input !do 01e6bc04 to dump the first object and we get:
Name: System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
MethodTable: 008126a4
EEClass: 698fca68
Size: 24(0x18) bytes
MT    Field   Offset                 Type VT     Attr    Value Name
69b140bc  40009d8        4      System.Object[]  0 instance 01e6bc1c _items
69b42b38  40009d9        c         System.Int32  1 instance        0 _size
69b42b38  40009da       10         System.Int32  1 instance        0 _version
69b40508  40009db        8        System.Object  0 instance 00000000 _syncRoot
69b140bc  40009dc        0      System.Object[]  0   shared   static _emptyArray
Domain:Value dynamic statics NYI

Input !dumpmt -md 008126a4 to dump the method table for this object. We get:
EEClass: 698fca68
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Delegate, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69aef320   69913310   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
69b03f00   69913318   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].System.Collections.IList.Add(System.Object)

We do the same to dump the method tables for the 2nd and 3rd objects. The output is:
EEClass: 698fca68
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Object, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69aef320   69913310   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].Add(System.__Canon)
69b03f00   69913318   PreJIT System.Collections.Generic.List`1[[System.__Canon, mscorlib]].System.Collections.IList.Add(System.Object)

EEClass: 698f6c3c
Module: 698d1000
Name: System.Collections.Generic.List`1[[System.Int32, mscorlib]]
mdToken: 0200028d  (C:\Windows\assembly\GAC_32\mscorlib\\mscorlib.dll)
BaseSize: 0x18
ComponentSize: 0x0
Number of IFaces in IFaceMap: 6
Slots in VTable: 77
MethodDesc Table
Entry MethodDesc      JIT Name
69a96a70   69914934   PreJIT System.Object.ToString()
69a96a90   6991493c   PreJIT System.Object.Equals(System.Object)
69a96b00   6991496c   PreJIT System.Object.GetHashCode()
69b072f0   69914990   PreJIT System.Object.Finalize()
69fd3b60   699ac468   PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].Add(Int32)
69fd2f80   699ac470   PreJIT System.Collections.Generic.List`1[[System.Int32, mscorlib]].System.Collections.IList.Add(System.Object)

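From the output, we can see that the methods of objList and delList are the same: both show the same entry address (69aef320) and MethodDesc (69913310) for Add, instantiated over System.__Canon. The methods of intList are different. So we've verified that the code is shared between concrete types whose type arguments are reference types.
Although the code is shared, and the dumps above even report the same EEClass (698fca68) for both, List<object> and List<System.Delegate> are dumped as separate method tables with different names. So they are actually different types.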

Using the same debugging technique, we can also easily verify that generic instances defined with the same type argument in different scopes have the same EEClass.

Reference: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects

Saturday, May 9, 2009

GoAhead Web Server Hang

Recently we have been experiencing process hangs with the GoAhead web server. The symptom can be reproduced by disconnecting the network cable while the browser is loading a page. When it occurs, the top command shows that the process doesn't use any CPU, and the netstat -atn command shows a lot of connections in the ESTABLISHED, CLOSE_WAIT, TIME_WAIT and FIN_WAIT states.
From the symptoms we observed, there is no doubt that the process is hung. Usually, a process hangs because it is waiting on some condition that is never satisfied or takes an extremely long time to be satisfied. A typical scenario is a deadlock.
We adopted a naive but straightforward method to investigate the cause: printf. We inserted a lot of printf statements into the source code to find out exactly in which function the web server hung. This is time consuming yet effective. Time consuming, because we spent more than two days finding out the calling sequence. Effective, because we finally found out that the web server was hanging in a network operation.
Aside: This does seem inefficient. Actually, we tried to attach a debugger to the hung process with gdbserver (cmd: gdbserver --attach IPADDRESS:PORT PID). But the debugger seemed to be missing the correct symbol information, and even the thread information (cmd: info threads) wasn't correct. This information is correct if we attach the debugger to the web server when it's not hung.
The real cause is that the peer of the socket is forcibly disconnected without ever sending a FIN. So the web server still considers the socket to be in the ESTABLISHED state and operates on it as normal. If the socket is in blocking mode and has no timeout specified, the web server blocks on reading from or writing to the socket indefinitely.
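To make the difference between a silent peer and a cleanly closed peer concrete, here is a minimal sketch. It is not GoAhead code. It assumes Linux (MSG_DONTWAIT is not available everywhere) and uses a local socketpair() as a stand-in for the TCP connection; a peer that stays open but sends nothing plays the role of the unplugged cable. The helper names (probe_socket, probe_silent_peer, probe_closed_peer) are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: look at a socket without blocking and without
 * consuming data. Returns 1 if data is waiting, 0 if the peer closed
 * cleanly (EOF), and -1 if nothing has arrived at all. */
static int probe_socket(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n > 0)
        return 1;
    if (n == 0)
        return 0;
    return -1; /* usually errno is EAGAIN or EWOULDBLOCK here */
}

/* The peer stays open but never sends anything. This stands in for the
 * unplugged cable: no data and no end-of-file ever shows up. */
static int probe_silent_peer(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    int result = probe_socket(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return result;
}

/* The peer closes its end, so the reader is told about it (EOF). */
static int probe_closed_peer(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    close(sv[1]);
    int result = probe_socket(sv[0]);
    close(sv[0]);
    return result;
}
```

With a closed peer the reader gets an end-of-file and can clean up. With a silent peer there is nothing to tell the reader to stop waiting, so a plain blocking recv() would simply sit there.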
Having found the cause, it's easy to solve. We can either specify a timeout on the native socket or set the socket to non-blocking mode. The code below demonstrates both.

1. Specify timeout
void websSSLReadEvent(webs_t wp)
{
    socket_t *sptr = socketPtr(wp->sid);
    struct timeval tv;
    int rc;

    tv.tv_sec = 2;  // timeout is two seconds
    tv.tv_usec = 0; // must be set to 0 explicitly, otherwise it may hold a random value
    rc = setsockopt(sptr->sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    rc = setsockopt(sptr->sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
    // ... the rest of the function is unchanged
}

2. Clear Blocking mode
void websDone(webs_t wp)
{
    // Originally 1, so that everything was flushed to the peer in blocking mode for a graceful close.
    socketSetBlock(wp->sid, 0);
}
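The effect of the two options can be tried outside the web server. The sketch below is a standalone illustration under stated assumptions, not GoAhead code: it uses a local socketpair() whose other end never sends anything, a 200 ms timeout chosen only to keep the run short, and plain fcntl() with O_NONBLOCK as a stand-in for what socketSetBlock(sid, 0) is assumed to achieve. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Option 1: blocking socket with a receive timeout.
 * Returns 1 if recv() gave up with EAGAIN/EWOULDBLOCK instead of hanging. */
static int recv_gives_up_after_timeout(void)
{
    int sv[2];
    char buf[16];
    struct timeval tv;
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    tv.tv_sec = 0;
    tv.tv_usec = 200000; /* 200 ms, an arbitrary value for the demo */
    rc = setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    assert(rc == 0);

    /* The peer (sv[1]) is open but silent, so this waits for the timeout. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    int timed_out = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(sv[0]);
    close(sv[1]);
    return timed_out;
}

/* Option 2: non-blocking socket.
 * Returns 1 if recv() returned at once with EAGAIN/EWOULDBLOCK. */
static int recv_returns_at_once_when_nonblocking(void)
{
    int sv[2];
    char buf[16];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    int flags = fcntl(sv[0], F_GETFL, 0);
    assert(flags != -1);
    rc = fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);
    assert(rc == 0);

    /* Nothing to read, so the call fails immediately instead of blocking. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(sv[0]);
    close(sv[1]);
    return would_block;
}
```

In both cases the caller gets control back and has to handle the error itself, for example by closing the connection. The timeout keeps the rest of the code in simple blocking style, while non-blocking mode means every read and write must be prepared for EAGAIN.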