Ok, this is my super krazy Microsoft week. Today is day 1.
Today was the MIGANG meeting. The rest of my super krazy Microsoft week involves free training on Friday afternoon and a meeting to set up an Ann Arbor Dot Net Users Group on Friday evening. I’ll be doing penance for weeks to come by using only Linux.
MIGANG’s format is that there is a tutorial at 6pm, followed by eats and a more advanced presentation at 7pm. Today was a little different. The tutorial slot was a solicitation about NPower Michigan and the possibility of collaborating on projects with them. NPower helps non-profits put technology to use. Sometimes custom-programmed solutions are required, and most of the time non-profits simply cannot afford to contract a developer to build them. MIGANG is going to contribute the programming effort on a volunteer basis. This is REALLY COOL. I have no idea why more groups aren’t doing things like this, and by more groups I certainly don’t mean dot net groups; I mean the Linux groups and the Python groups and so on. Open Source philosophies just seem to match with non-profits, but that is just my view.
The 7pm presentation was on NGEN and rebasing to improve application load times. (Charles) Stacy Harris presented. (The Charles is silent.) Now I haven’t been to many MIGANG meetings; I have been to two. They were November 2004, which turned out to be a special event, the .NET Mobility Roadshow, and December 2004, which was a normal meeting with an INETA speaker. That one was interesting, but not immediately applicable for me; the topic was something about advanced Web Services this or that. Today’s presentation seemed different. I guess it was the topic.
Stacy says that Claudio Caldato has an article in MSDN magazine this month on the same topic. Poor Stacy had his thunder stolen. 🙂
Stacy started off and I felt like I was back in my undergraduate operating systems course. He drew the memory space of a process on the whiteboard, showed the heap and the stack, and mentioned shared libraries between processes and the virtual vs. physical address space. I felt right at home. Then he went into how DLLs are loaded by the loader. It turns out that the Windows developer tools default all DLLs to the same base address. The loader loads the DLL and has to rebase (relocate, for you UNIX and Linux people) the library to an available location in the process’s virtual address space. I wasn’t immediately sure that rebasing was the same as relocating, but upon getting home I did some background reading and it is indeed the same. The Windows developer tools include (somewhere) a tool called rebase which will let you change the base address of a DLL and all the offsets within it.
At the time of the presentation I was fuzzy about how exactly prelink works in Linux. It turns out I had it wrong as I was trying to remember it; I thought it was somehow related. Nope. It is not merely related, it IS the same thing. What Linux (and presumably other ELF-based operating systems) calls prelinking, Microsoft calls rebasing.
There is an interesting point with respect to rebasing when talking about managed code, and it probably applies to Mono DLLs as well. In the case of a signed assembly, the signature will no longer be valid after rebasing (prelinking). Go read how assembly signing works if you want to know why. So I guess it is best to set the library’s base address at compile time and then sign the assembly.
Inspection of the address space of a process can be done at the command line using a Microsoft tool called vadump, or better yet with a nice GUI tool from sysinternals.com called Process Explorer (which does a lot more too). These tools also show each library’s default (suggested) base address as well as where it ended up in the process. This must be the kind of output prelink uses when it forks ld to obtain its library offset information.
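If you just want a quick peek from managed code, the standard System.Diagnostics API will walk the modules of the current process and show where each one landed. A minimal sketch (the class name and output format are my own):

```csharp
using System;
using System.Diagnostics;

class ModuleDump
{
    public static void Main()
    {
        // Walk the modules mapped into this process -- roughly the same list
        // vadump or Process Explorer shows -- and print each load address.
        foreach (ProcessModule m in Process.GetCurrentProcess().Modules)
        {
            Console.WriteLine("0x{0:x12}  {1}", m.BaseAddress.ToInt64(), m.ModuleName);
        }
    }
}
```

Note this only shows where a module actually ended up, not its preferred base address; comparing the two is where the tools above earn their keep.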
Of course the presentation was not just on prelinking (ugh, rebasing). Another optimization was the MultiDomain loader optimization. This makes the runtime load assemblies as domain-neutral (reentrant), so that they are loaded only once into the process but can be used by more than one app domain. No one could say for sure why this is not the default. I’ve got one program which uses multiple app domains, so I may explore this optimization.
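For what it’s worth, in the .NET Framework this behavior is opted into via the LoaderOptimization attribute on the entry point (or AppDomainSetup.LoaderOptimization when creating domains). A minimal sketch; the program body is just a placeholder of my own:

```csharp
using System;

class Program
{
    // Asks the runtime to load assemblies domain-neutrally, so their code is
    // shared across every app domain in the process instead of loaded per-domain.
    // (It must be applied to the entry point, before other assemblies load.)
    [LoaderOptimization(LoaderOptimization.MultiDomain)]
    public static void Main()
    {
        Console.WriteLine(AppDomain.CurrentDomain.FriendlyName);
    }
}
```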
There was also a brief discussion of JITting vs. NGEN and the pros and cons of each. I didn’t get the sense that NGENing is a preferred optimization. It may help application performance, but it has its cons. I definitely wasn’t clear on how it affected load times, but it is definitely something to investigate if you are having performance issues. This debate seems to be quite popular on blogs and forums, so google for NGEN and I’m sure you will find some lively discussions.
Another load-time optimization had to do with the security of signed assemblies. The use case throughout the presentation was a program which loaded close to 300 DLLs, all at program startup; I’m not sure this next optimization would really make a difference for most programs. When loading a signed DLL, the first thing the loader does is validate the hash. This is expensive. When an assembly is installed into the GAC, the hash is validated at install time, so assemblies loaded from the GAC don’t need to be validated again. The possibility of a security problem was suggested. Happy cracking.
The last optimization mentioned is not limited to program or library load time, and it is the only one I’ve actually run into. The example given was allocating a large (even just 1M) byte array in a multi-threaded (or not) situation, which can lead to lots of allocations on the heap and lots of garbage collection as the allocated objects go out of scope. The suggested solution amounted to what one team actually did, which was essentially writing their own memory manager. It turns out that C# in unsafe mode allows stack allocation, and stackalloc is what they ended up using. Stacy really liked the idea of writing that bit of code in managed C++ instead, but the project leaders, or someone with authority, didn’t agree.
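A minimal stackalloc sketch, with made-up names: the buffer lives on the stack and disappears when the method returns, so the garbage collector never sees it. (In the C# of the time this required an unsafe block and a pointer; modern C# also allows the safe Span<T> form shown here.)

```csharp
using System;

class StackBuffer
{
    // Sum the squares 0..n-1 using a stack-allocated scratch buffer
    // instead of `new int[n]`, so no garbage is produced per call.
    public static int SumSquares(int n)
    {
        Span<int> buf = stackalloc int[n]; // on the stack, freed automatically on return
        for (int i = 0; i < n; i++) buf[i] = i * i;
        int total = 0;
        foreach (int v in buf) total += v;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumSquares(10)); // prints 285
    }
}
```

The usual caveat applies: the stack is small, so this only makes sense for modest, short-lived buffers.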
I ran into exactly the same issue in a very simple single-threaded for loop where, for some foolish reason, I was not paying attention and was doing a bunch of heap allocations on every cycle of the loop. It was fun to connect to the process with Performance Monitor and watch the allocations and garbage collections use 90% of my CPU, leaving only 10% for my program to actually run. That was when I started to understand what Scott Collins had said to me a while before about good program architecture. I knew what he meant at the time, but now it was really sinking in.
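The fix in my case amounted to hoisting the allocation out of the loop. A toy sketch of the anti-pattern and the repair (names and sizes are my own):

```csharp
using System;

class LoopAlloc
{
    // Anti-pattern: a fresh 1 MB buffer on every iteration, so every pass
    // leaves garbage behind for the collector to clean up.
    public static long Churn(int iterations)
    {
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            byte[] buf = new byte[1 << 20]; // allocated every pass
            buf[0] = (byte)i;
            total += buf[0];
        }
        return total;
    }

    // Repair: allocate once, reuse the same buffer each pass.
    public static long Reuse(int iterations)
    {
        byte[] buf = new byte[1 << 20]; // allocated once
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            buf[0] = (byte)i;
            total += buf[0];
        }
        return total;
    }

    public static void Main()
    {
        // Same answer either way; the second version just keeps the GC idle.
        Console.WriteLine(Churn(100) == Reuse(100));
    }
}
```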
All in all it was a very interesting topic. I really enjoyed it.
I ran home and skimmed a Linux prelink paper (ftp://people.redhat.com/jakub/prelink/prelink.pdf) to refresh my knowledge, and now I realize that I really got a presentation on how it all works in Microsoft land.