<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Programmer&#039;s Notebook</title>
	<atom:link href="http://programmersnotebook.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://programmersnotebook.wordpress.com</link>
	<description>My lessons about C#, the Windows API, and anything else</description>
	<lastBuildDate>Thu, 05 May 2011 13:08:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='programmersnotebook.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Programmer&#039;s Notebook</title>
		<link>http://programmersnotebook.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://programmersnotebook.wordpress.com/osd.xml" title="Programmer&#039;s Notebook" />
	<atom:link rel='hub' href='http://programmersnotebook.wordpress.com/?pushpress=hub'/>
		<item>
		<title>rsync and --exclude-from</title>
		<link>http://programmersnotebook.wordpress.com/2010/03/20/rsync-and-exclude-from/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/03/20/rsync-and-exclude-from/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 23:58:00 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[backup]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[rsync]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=185</guid>
		<description><![CDATA[Lately I&#8217;ve been looking for a better backup solution, and I settled on rsync. If you want a good tutorial, you can find one easily enough by googling. But I had a lot of trouble finding good documentation about the --exclude-from option. What I found was either too cursory or just plain wrong. So I [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been looking for a better backup solution, and I settled on rsync. If you want a good tutorial, you can find one easily enough by googling. But I had a lot of trouble finding good documentation about the <code>--exclude-from</code> option. What I found was either too cursory or just plain wrong. So I did some tests, and here are my results. I&#8217;m pleased to say that rsync&#8217;s <code>--exclude-from</code> option has very full functionality!</p>
<p>This option names a file, as in <code>--exclude-from=my-excludes.txt</code>. That file contains a list of patterns, and if one of the patterns matches a given file or directory, rsync leaves that file/directory out of the backup. </p>
<p>These patterns may be full filenames, as in <code>log</code>, or they may use wildcards, as in <code>*.bak</code>. If a directory is excluded, then all the files within it are excluded, too. (This sounds obvious, but I&#8217;ve seen other programs with exclude files that just leave out the directory!)</p>
<p>You can even include bits of a path in your exclude file to further limit what gets excluded. For instance, if you say <code>photos/thumb</code>, then the <code>thumb</code> directory will be excluded whenever it appears inside a folder called <code>photos</code>. It will not be excluded if it appears in, say, <code>body-parts</code>. Also, rsync will still include your <code>photos</code> folders; just their <code>thumb</code> folders will be left out.</p>
<p>Finally (and this is the part I was really interested in and found wrongly-documented elsewhere), you can exclude specific paths. You do this by starting the path with a slash: <code>/root/tmp</code>. This forward slash does <em>not</em> indicate a file at the root of your filesystem, like a normal unix path. Rather it tells rsync to anchor this path to the base of the top-level source folder.</p>
<p>Note that if you&#8217;re using this feature, you must use it consistently with how you specify your source folder. Remember that in rsync, if you name your source folder without a trailing slash, rsync copies that folder and everything in it, whereas if you give it a trailing slash, rsync copies just the folder&#8217;s contents. Well, anchored paths in the <code>--exclude-from</code> file must take account of this. Suppose your source folder is <code>root</code>. Then your excludes file would have a line like <code>/root/tmp</code>. But if your source folder is <code>root/</code>, then your excludes file needs a line like <code>/tmp</code>.</p>
<p>To put this all into an example, suppose you have this file structure and you want to exclude all the red files:</p>
<pre>
<code>
root/
     important.txt
     <span style="color:red;">important.txt.bak</span>
     mysql/
           ecom.db
           <span style="color:red;">log/mysql.log</span>
     <span style="color:red;">photos/thumb/</span>
     <span style="color:red;">tmp/note.txt</span>
     www/
         index.html
         <span style="color:red;">log/www.log</span>
         <span style="color:red;">photos/thumb/</span>
         tmp/index-ver2.html
</code>
</pre>
<p>In other words, you want to exclude:</p>
<ul>
<li>All <code>*.bak</code> files.</li>
<li>All <code>log</code> directories.</li>
<li>All <code>photos/thumb</code> directories.</li>
<li>The top-level <code>tmp</code> directory.</li>
</ul>
<p>Then you could either call rsync like this:</p>
<pre>
<code>
rsync -avz --exclude-from=excludes.txt root backups
</code>
</pre>
<p>with <code>excludes.txt</code> as follows:</p>
<pre>
<code>
*.bak
log
photos/thumb
/root/tmp
</code>
</pre>
<p>Or you could call rsync like this:</p>
<pre>
<code>
rsync -avz --exclude-from=excludes.txt root/ backups
</code>
</pre>
<p>with an <code>excludes.txt</code> like this:</p>
<pre>
<code>
*.bak
log
photos/thumb
/tmp
</code>
</pre>
<p><strong>UPDATE:</strong> One more note: You might think that ending a line in the excludes file with a trailing slash would work analogously to naming a source directory with a trailing slash: &#8220;include this directory but not its contents.&#8221; That way if you were backing up your whole filesystem, you could have a line like <code>/mnt/</code> that would create a <code>/mnt</code> directory but ignore all its contents. But in fact, a trailing slash in the excludes file doesn&#8217;t work that way: it only restricts the pattern to matching directories, and the matched directory is still excluded entirely. If you want to keep the <code>/mnt</code> directory but exclude everything in it, you need a line like this instead: <code>/mnt/*</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/03/20/rsync-and-exclude-from/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>A Linq Catalog</title>
		<link>http://programmersnotebook.wordpress.com/2010/03/16/a-linq-catalog/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/03/16/a-linq-catalog/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 13:56:56 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[c#]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=170</guid>
		<description><![CDATA[I thought I&#8217;d review the special methods available in Linq. In the API reference, they can be found under the Enumerable class. Here they are: static IEnumerable&#60;T&#62; Empty&#60;T&#62;() static IEnumerable&#60;T&#62; Repeat&#60;T&#62;(T element, int count) T[] ToArray() List&#60;T&#62; ToList() Dictionary&#60;K,T&#62; ToDictionary(Func&#60;T,K&#62; keySelector) // puts every element in a dictionary // using keys generated by keySelector Dictionary&#60;K,T&#62; [...]]]></description>
			<content:encoded><![CDATA[<p>I thought I&#8217;d review the special methods available in Linq. In the API reference, they can be found under the static <code>Enumerable</code> class, as extension methods on <code>IEnumerable&lt;T&gt;</code>. Here they are:</p>
<pre>
<code>
static IEnumerable&lt;T&gt; Empty&lt;T&gt;()
static IEnumerable&lt;T&gt; Repeat&lt;T&gt;(T element, int count)

T[] ToArray()
List&lt;T&gt; ToList()
Dictionary&lt;K,T&gt; ToDictionary(Func&lt;T,K&gt; keySelector)    // puts every element in a dictionary
                                                       // using keys generated by keySelector
Dictionary&lt;K,T&gt; ToDictionary(Func&lt;T,K&gt; keySelector, IEqualityComparer&lt;K&gt;)
Dictionary&lt;K,R&gt; ToDictionary(Func&lt;T,K&gt; keySelector, Func&lt;T,R&gt; elemSelector)
                               // transforms each T to an R and puts the Rs in the dictionary
Dictionary&lt;K,R&gt; ToDictionary(Func&lt;T,K&gt; keySelector, Func&lt;T,R&gt; elemSelector, IEqualityComparer&lt;K&gt;)
ILookup&lt;K,T&gt; ToLookup(Func&lt;T,K&gt; keySelector)          // a Lookup is like a Dictionary except
                                                      // each key points to a list of values.
                                                      // In Perl terms it's a hash of arrays.
ILookup&lt;K,T&gt; ToLookup(Func&lt;T,K&gt; keySelector, IEqualityComparer&lt;K&gt;)
ILookup&lt;K,R&gt; ToLookup(Func&lt;T,K&gt; keySelector, Func&lt;T,R&gt; elemSelector)
ILookup&lt;K,R&gt; ToLookup(Func&lt;T,K&gt; keySelector, Func&lt;T,R&gt; elemSelector, IEqualityComparer&lt;K&gt;)

T Aggregate(Func&lt;T,T,T&gt; adder)        // like python's reduce

T ElementAt(int index)
T ElementAtOrDefault(int index)       // returns this[i] or default(T) if i &gt;= this.Count

T First()
T First(Func&lt;T,Boolean&gt;)              // the first element that satisfies the predicate
T FirstOrDefault()
T FirstOrDefault(Func&lt;T,Boolean&gt;)
T Last()
T Last(Func&lt;T,Boolean&gt;)               // the last element that satisfies the predicate
T LastOrDefault()
T LastOrDefault(Func&lt;T,Boolean&gt;)

T Single()                        // returns the only element of the sequence,
                                  // or throws an exception
T Single(Func&lt;T,Boolean&gt; f)       // returns the only element for which f returns true.
                                  // Exception if there is not one and only one.
T SingleOrDefault()               // default(T) if empty. Still exception if more than one T.
T SingleOrDefault(Func&lt;T,Boolean&gt; f)

bool Contains(T)
bool Contains(T, IEqualityComparer&lt;T&gt;)
bool All(Func&lt;T, bool&gt; predicate)       // true if predicate is always true
bool Any(Func&lt;T, bool&gt; predicate)       // true if predicate is ever true
bool Any()                              // true if the sequence contains any elements
bool SequenceEqual(IEnumerable&lt;T&gt;)      // true if all the elements of the two sequences
                                        // are equal, using T's default equality comparer.
bool SequenceEqual(IEnumerable&lt;T&gt;, IEqualityComparer&lt;T&gt;)

int Count()
int Count(Func&lt;T,Boolean&gt;)    // counts how many elements satisfy the predicate.
long LongCount()
long LongCount(Func&lt;T,Boolean&gt;)

double Average()
double Average(Func&lt;T,int&gt; transform) // calls transform to alter each element before
                                      // including it in the average.
                                      // You could use this to get a weighted average, e.g.
int Max()
int Max(Func&lt;T,int&gt; transform)        // uses transform to turn each T into an int
int Min()
int Min(Func&lt;T,int&gt; transform)
int Sum()
int Sum(Func&lt;T,int&gt; transform)
// ... lots of overloads ...

IEnumerable&lt;T&gt; DefaultIfEmpty()          // returns [default(T)] if empty
IEnumerable&lt;T&gt; DefaultIfEmpty(T elem)    // returns [elem] if empty.

IEnumerable&lt;T&gt; AsEnumerable()    // forces the compiler to call IEnumerable&lt;T&gt; methods
IEnumerable&lt;R&gt; Cast&lt;R&gt;()         // casts each element to an R
IEnumerable&lt;R&gt; OfType&lt;R&gt;()       // filters out any elements not of type R

IEnumerable&lt;T&gt; Intersect(IEnumerable&lt;T&gt; compare)    // the standard set operation
IEnumerable&lt;T&gt; Intersect(IEnumerable&lt;T&gt; compare, IEqualityComparer&lt;T&gt;)
IEnumerable&lt;T&gt; Union(IEnumerable&lt;T&gt; compare)        // the standard set operation
IEnumerable&lt;T&gt; Union(IEnumerable&lt;T&gt; compare, IEqualityComparer&lt;T&gt;)
IEnumerable&lt;T&gt; Except(IEnumerable&lt;T&gt; compare)       // returns this - compare
IEnumerable&lt;T&gt; Except(IEnumerable&lt;T&gt; compare, IEqualityComparer&lt;T&gt;)

IEnumerable&lt;T&gt; Distinct()                      // like SQL distinct keyword
IEnumerable&lt;T&gt; Distinct(IEqualityComparer&lt;T&gt;)
IEnumerable&lt;T&gt; Concat(IEnumerable&lt;T&gt;)          // like Union, but with list semantics
                                               // instead of set
static IEnumerable&lt;int&gt; Range(int start, int count)

IEnumerable&lt;T&gt; Skip(int n)                       // skips n elements; returns the rest
IEnumerable&lt;T&gt; SkipWhile(Func&lt;T,Boolean&gt; f)      // skips while f is true; returns the rest
IEnumerable&lt;T&gt; SkipWhile(Func&lt;T,int,Boolean&gt; f)  // f also gets the 0-based index
IEnumerable&lt;T&gt; Take(int n)                       // returns the first n elements
IEnumerable&lt;T&gt; TakeWhile(Func&lt;T,Boolean&gt; f)      // takes while f is true; skips the rest
IEnumerable&lt;T&gt; TakeWhile(Func&lt;T,int,Boolean&gt; f)

IEnumerable&lt;T&gt; Where(Func&lt;T,Boolean&gt; f)     // gets the elements where f is true.
IEnumerable&lt;T&gt; Where(Func&lt;T,int,Boolean&gt; f) // f also knows the 0-based index.

IEnumerable&lt;IGrouping&lt;K,T&gt;&gt; GroupBy(Func&lt;T,K&gt; f)    // f generates a key for each element
IEnumerable&lt;R&gt; GroupBy(Func&lt;T,K&gt; f, Func&lt;K, IEnumerable&lt;T&gt;, R&gt; resultSelector)
                       // resultSelector gets an enumeration of values with a given key,
                       // and it returns a single value of type R, e.g. representing
                       // their min value. Note that you could use anonymous classes
                       // to stick several values into R, e.g. a min, max, and average.
                       // There is a nice example here:
                       // <a href="http://msdn.microsoft.com/en-us/library/bb549393.aspx">http://msdn.microsoft.com/en-us/library/bb549393.aspx</a>.
// ... lots more overloads ...

IOrderedEnumerable&lt;T&gt; OrderBy(Func&lt;T,K&gt; keySelector)    // orders the elements based on keys
                                                        // generated by keySelector
IOrderedEnumerable&lt;T&gt; OrderBy(Func&lt;T,K&gt; keySelector, IComparer&lt;K&gt; comp)
                                // orders the elements based on keys generated by keySelector
                                // and compared by comp
IOrderedEnumerable&lt;T&gt; OrderByDescending(Func&lt;T,K&gt; keySelector)
IOrderedEnumerable&lt;T&gt; OrderByDescending(Func&lt;T,K&gt; keySelector, IComparer&lt;K&gt; comp)
IEnumerable&lt;T&gt; Reverse()
IOrderedEnumerable&lt;T&gt; ThenBy(Func&lt;T,K&gt; keySelector)     // sub-sorts elements deemed equal
                                                        // by a previous sort
IOrderedEnumerable&lt;T&gt; ThenBy(Func&lt;T,K&gt; keySelector, IComparer&lt;K&gt; comp)
IOrderedEnumerable&lt;T&gt; ThenByDescending(Func&lt;T,K&gt; keySelector)
IOrderedEnumerable&lt;T&gt; ThenByDescending(Func&lt;T,K&gt; keySelector, IComparer&lt;K&gt; comp)

IEnumerable&lt;R&gt; Join(IEnumerable&lt;I&gt; inner, Func&lt;O,K&gt; outerKeySelector,
        Func&lt;I,K&gt; innerKeySelector, Func&lt;O,I,R&gt; resultSelector)
                // joins two sequences as an inner equijoin.
                // There is no method for outer joins,
                // but GroupJoin can be used to get that effect.
IEnumerable&lt;R&gt; GroupJoin(IEnumerable&lt;I&gt; inner, Func&lt;O,K&gt; outerKeySelector,
        Func&lt;I,K&gt; innerKeySelector, Func&lt;O, IEnumerable&lt;I&gt;, R&gt; resultSelector)
                // joins the two sequences (see Join) and groups the results (see GroupBy).
                // Unlike with Join, this function can be used to do a left outer join,
                // because even when there is no value in inner matching a value in outer,
                // the resultSelector gets called for each outer value,
                // just with an empty IEnumerable&lt;I&gt;.

IEnumerable&lt;R&gt; Select(Func&lt;T,R&gt; f)                  // uses f to transform each T to an R
IEnumerable&lt;R&gt; Select(Func&lt;T,int,R&gt; f)              // f also knows the 0-based index of each T
IEnumerable&lt;R&gt; SelectMany(Func&lt;T,IEnumerable&lt;R&gt;&gt; f) // f can return many Rs. All the Rs
                                                    // are flattened into one sequence.
IEnumerable&lt;R&gt; SelectMany(Func&lt;T,int,IEnumerable&lt;R&gt;&gt; f)
IEnumerable&lt;R&gt; SelectMany(Func&lt;T,IEnumerable&lt;U&gt;&gt; f, Func&lt;T,U,R&gt; g)
                // f generates an intermediate value of type U; g uses both a T and U
                // to get an R. This is really only useful if U doesn't have access to T.
                // See the example code here:
                // <a href="http://msdn.microsoft.com/en-us/library/bb534631.aspx">http://msdn.microsoft.com/en-us/library/bb534631.aspx</a>.
IEnumerable&lt;R&gt; SelectMany(Func&lt;T,int,IEnumerable&lt;U&gt;&gt; f, Func&lt;T,U,R&gt; g)
</code>
</pre>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/03/16/a-linq-catalog/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>Equals and compareTo in Subclasses</title>
		<link>http://programmersnotebook.wordpress.com/2010/03/15/equals-and-compareto-in-subclasses/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/03/15/equals-and-compareto-in-subclasses/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 23:36:18 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=164</guid>
		<description><![CDATA[The other day I read this interesting paper on constructing a correct equals method when subclassing. It is about Java, but it applies equally well to C#. The authors cite Josh Bloch&#8217;s book Effective Java, which says: There is no way to extend an instantiable class and add a value component while preserving the equals contract, [...]]]></description>
			<content:encoded><![CDATA[<p>The other day I read <a href="http://www.artima.com/lejava/articles/equality.html">this interesting paper</a> on constructing a correct <code>equals</code> method when subclassing. It is about Java, but it applies equally well to C#. The authors cite Josh Bloch&#8217;s book <a href="http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683/">Effective Java</a>, which says:</p>
<blockquote><p>
There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.
</p></blockquote>
<p>I read this book a while back&mdash;maybe six or seven years ago now. At the time I thought it was invaluable. It seemed like the Java version of <a href="http://www.amazon.com/Expert-Programming-Peter-van-Linden/dp/0131774298/">Expert C Programming: Deep C Secrets</a> (the fish book). But when I think back on it now, all I can remember is the author&#8217;s ongoing struggle to overcome Java&#8217;s limitations and contradictions. It seemed he was constantly recommending more and more verbose code to get around problems in the underlying language. I guess that&#8217;s not Bloch&#8217;s fault, but Java just seems to be like that. It&#8217;s the reason my resume has a crowded half-page of Java TLAs. </p>
<p>Anyway, contrary to Bloch&#8217;s commonly-accepted denial, the authors of the paper present a way to write an equals method that preserves the contract of equals even when extending from another non-abstract class and adding more state. Their solution isn&#8217;t even that verbose or twisted. It&#8217;s worth a read. Basically they recommend this (some parts changed for brevity):</p>
<pre>
<code>
public class Point {
    public int x;
    public int y;

    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof Point) {
            Point that = (Point)other;
            result = that.canEqual(this) &amp;&amp; this.x == that.x &amp;&amp; this.y == that.y;
        }
        return result;
    }

    public boolean canEqual(Object other) {
        return (other instanceof Point);
    }

    public int hashCode() {
        return 41 * (41 + x) + y;
    }
}

public class ColoredPoint extends Point {
    public Color color;

    public boolean equals(Object other) {
        boolean result = false;
        if (other instanceof ColoredPoint) {
            ColoredPoint that = (ColoredPoint)other;
            result = that.canEqual(this) &amp;&amp; color.equals(that.color) &amp;&amp; super.equals(that);
        }
        return result;
    }

    public boolean canEqual(Object other) {
        return (other instanceof ColoredPoint);
    }

    public int hashCode() {
        return 41 * super.hashCode() + color.hashCode();
    }
}
</code>
</pre>
<p>The trick here is that <code>canEqual</code> method. It is not really meant as a public method; it is only called from within the <code>equals</code> method. But note that an object doesn&#8217;t call its own <code>canEqual</code> method; it calls it <em>on the other object</em>. This lets the two objects agree that they are really equal, and it solves the problem of non-symmetric implementations of <code>equals</code> (where <code>a.equals(b) != b.equals(a)</code>). This is a common problem, because a Point might think it equals a ColoredPoint, whereas the ColoredPoint knows it doesn&#8217;t equal the Point.</p>
<p>The naive way of ensuring symmetry would be to replace <code>instanceof</code> with a comparison of the object&#8217;s actual class. But this is too crude, because it means you can&#8217;t use anonymous classes. For instance, a Point should still equal an anonymous class instance like this one:</p>
<pre>
<code>
Point pAnon = new Point() {
    public void overrideSomeMethod() {
        // ...
    }
};
</code>
</pre>
<p>With <code>canEqual</code>, the anonymous class simply inherits <code>canEqual</code> from Point, and the two objects will still agree on their equality. I think this is a really nice solution to a thorny problem.</p>
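<p>To see the symmetry argument in action, here is a minimal runnable sketch of the same two classes. It makes a few assumptions of my own for brevity: the wrapper class <code>CanEqualDemo</code> is made up, constructors are added, and a plain <code>String</code> stands in for the paper&#8217;s <code>Color</code> type.</p>

```java
class CanEqualDemo {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object other) {
            if (!(other instanceof Point)) return false;
            Point that = (Point) other;
            // Ask the OTHER object whether it is willing to equal us.
            return that.canEqual(this) && x == that.x && y == that.y;
        }
        public boolean canEqual(Object other) { return other instanceof Point; }
        @Override public int hashCode() { return 41 * (41 + x) + y; }
    }

    static class ColoredPoint extends Point {
        final String color; // assumption: a String stands in for Color
        ColoredPoint(int x, int y, String color) { super(x, y); this.color = color; }

        @Override public boolean equals(Object other) {
            if (!(other instanceof ColoredPoint)) return false;
            ColoredPoint that = (ColoredPoint) other;
            return that.canEqual(this) && color.equals(that.color) && super.equals(that);
        }
        @Override public boolean canEqual(Object other) { return other instanceof ColoredPoint; }
        @Override public int hashCode() { return 41 * super.hashCode() + color.hashCode(); }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point cp = new ColoredPoint(1, 2, "red");
        Point anon = new Point(1, 2) { }; // anonymous subclass, inherits canEqual

        System.out.println(p.equals(cp));   // false: cp.canEqual(p) refuses
        System.out.println(cp.equals(p));   // false: p is not a ColoredPoint
        System.out.println(p.equals(anon)); // true
        System.out.println(anon.equals(p)); // true
    }
}
```

<p>Both directions of each comparison agree, which is the whole point: the subclass that adds state opts out of equality with its parent, while a subclass that adds no state does not.</p>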
<p>The forum discussion about the paper (which is almost as good as the paper itself) argues that Java ought to support an <code>Equalator</code> interface as a parallel to <code>Comparator&lt;T&gt;</code>. The idea is that just as you can override the &#8220;natural ordering&#8221; of a class, you should be able to override its &#8220;natural equivalence.&#8221; This would let you instantiate a <code>Set</code>, <code>HashMap</code>, etc. with an <code>Equalator</code> to get a different notion of equals than usual. Just as objects may sort differently in different contexts, so they may &#8220;be equal&#8221; differently in different contexts, depending on what you care about. Who hasn&#8217;t run into the need for a <code>Set</code> based on reference identity, for example? Apache Collections provides just such a class.</p>
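<p>For the reference-identity case specifically, the JDK can build such a set without a third-party library, by wrapping an <code>IdentityHashMap</code>. A minimal sketch, assuming Java 6 or later; the names <code>IdentitySetDemo</code> and <code>newIdentitySet</code> are made up for illustration:</p>

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Builds a Set that compares elements with == instead of equals().
    static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> set = newIdentitySet();
        String a = new String("key"); // two distinct objects
        String b = new String("key"); // that are equal by equals()

        set.add(a);
        set.add(b);

        System.out.println(a.equals(b));                     // true
        System.out.println(set.size());                      // 2
        System.out.println(set.contains(a));                 // true
        System.out.println(set.contains(new String("key"))); // false
    }
}
```

<p>This only covers one fixed notion of equality, of course. A general <code>Equalator</code> would let you plug in any equivalence you like.</p>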
<p>The need for an <code>Equalator</code> seems most pressing in classes like <code>TreeSet</code> that use <code>compareTo</code> rather than <code>equals</code> to test for set duplicates. If you use a <code>TreeSet</code> with a <code>Comparator</code> that is not consistent with equals, then the <code>TreeSet</code> will appear to violate the set contract, because you could have <code>a.equals(b)</code> but <code>s.contains(a) != s.contains(b)</code>. I went to bed thinking it&#8217;s a shame Sun hasn&#8217;t added this <code>Equalator</code> concept.</p>
<p>But as I was thinking about it overnight, I started to believe Sun is right to leave it out, at least in so far as it pertains to classes like <code>TreeSet</code> that use <code>compareTo</code> instead of <code>equals</code>. Basically, the <code>TreeSet</code>&#8217;s <code>Comparator</code> is already operating as an <code>Equalator</code> here. Why do you need an <code>Equalator</code>, too? What problem would it solve that isn&#8217;t already solved by the <code>Comparator</code>? If you passed an <code>Equalator</code> to a <code>TreeSet</code>, it wouldn&#8217;t change this problem: <code>a.equals(b) &amp;&amp; (s.contains(a) != s.contains(b))</code>. The whole point of an <code>Equalator</code> is to impose a different notion of equality <em>on a limited context</em>, and with <code>TreeSet</code> a <code>Comparator</code> is sufficient for that.</p>
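<p>Here is a small sketch of a <code>Comparator</code> acting as the set&#8217;s notion of equality. It uses the JDK&#8217;s <code>String.CASE_INSENSITIVE_ORDER</code>, which is not consistent with <code>String.equals</code>; the class and method names are my own, chosen for illustration.</p>

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class TreeSetEqualityDemo {
    // A TreeSet decides duplicates with its Comparator, never with equals().
    static Set<String> caseInsensitiveSet() {
        Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        s.add("apple");
        s.add("APPLE"); // compares as 0 against "apple", so it is dropped
        return s;
    }

    // A HashSet uses equals() and hashCode(), so both strings are kept.
    static Set<String> ordinarySet() {
        Set<String> s = new HashSet<String>();
        s.add("apple");
        s.add("APPLE");
        return s;
    }

    public static void main(String[] args) {
        Set<String> tree = caseInsensitiveSet();
        Set<String> hash = ordinarySet();

        System.out.println("apple".equals("Apple")); // false
        System.out.println(tree.size());             // 1
        System.out.println(tree.contains("Apple"));  // true
        System.out.println(hash.size());             // 2
        System.out.println(hash.contains("Apple"));  // false
    }
}
```

<p>Within that one <code>TreeSet</code>, strings are &#8220;equal&#8221; whenever the comparator returns 0, which is exactly the limited-context equality an <code>Equalator</code> would have provided.</p>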
<p>Of course, that&#8217;s not to say an <code>Equalator</code> wouldn&#8217;t be useful in a regular <code>Set</code> or <code>Map</code>. It turns out that C# does have the <code>Equalator</code> idea, but it&#8217;s called <code>IEqualityComparer&lt;T&gt;</code>. It doesn&#8217;t seem to be used much, but <code>Dictionary&lt;K,V&gt;</code> and <code>HashSet&lt;T&gt;</code> both support it.</p>
<p>I actually came across this paper while thinking about how C#&#8217;s <code>IComparable&lt;T&gt;.CompareTo</code> can work in a class hierarchy. As in Java, this method must be &#8220;consistent with <code>Equals</code>.&#8221; That is, whenever <code>Equals</code> returns true, <code>CompareTo</code> must return 0, and whenever <code>CompareTo</code> returns 0, <code>Equals</code> must return true. Put into code, <code>(a.CompareTo(b) == 0) == a.Equals(b)</code>. The <code>CompareTo</code> method is tricky because it&#8217;s parameterized: its signature is <code>int CompareTo(T o)</code>, where <code>T</code> comes from <code>IComparable&lt;T&gt;</code>.</p>
<p>So what happens if you have <code>Base : IComparable&lt;Base&gt;</code> and <code>Subclass : Base, IComparable&lt;Subclass&gt;</code>? My instinct is you&#8217;re asking for trouble, although when I think it through it seems that the compiler will choose the method based on the current static type, not the instance&#8217;s actual type, so maybe you&#8217;d be okay. If your code is interested in comparing Bases, you&#8217;ll call that method; you&#8217;ll only call <code>CompareTo(Subclass o)</code> if you&#8217;re explicitly comparing Subclasses. So maybe everything will work out, but I&#8217;m still uneasy.</p>
<p>I also see that C# 4.0 is supporting new keywords for co- and contra-variance in generic parameters. So we get <code>IEnumerable&lt;out T&gt;</code> and <code>IComparable&lt;in T&gt;</code>. This means that if you implement <code>IEnumerator&lt;Subclass&gt; GetEnumerator</code>, you also fulfill the contract for <code>IEnumerable&lt;Base&gt;</code>, and if you implement <code>CompareTo(Base o)</code>, your subclass doesn&#8217;t have to implement <code>CompareTo(Subclass o)</code> in order to fulfill the contract for <code>IComparable&lt;Subclass&gt;</code>. I hope I&#8217;ve got that right!</p>
<p>The first part&mdash;covariant return types&mdash;seems like the bigger deal here. (I only wish it were <em>full</em> covariant return types as in C++!) But the part about <code>IComparable</code> seems nice, too. It should save a bit of code, because it means that if you have a full-featured base class and you want to write a quick subclass on top of it, you can still use your subclass in things that require an <code>IComparable&lt;Subclass&gt;</code> (like <code>List&lt;Subclass&gt;</code>) without writing another <code>CompareTo</code> implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/03/15/equals-and-compareto-in-subclasses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>Viewing Unit Test Output in Visual Studio</title>
		<link>http://programmersnotebook.wordpress.com/2010/03/11/viewing-unit-test-output-in-visual-studio/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/03/11/viewing-unit-test-output-in-visual-studio/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 19:31:47 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[visual studio]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=162</guid>
		<description><![CDATA[I&#8217;ve been using Visual Studio 2008 to write unit tests. I had a couple failing in very strange ways, and I wanted to add some debugging statements to watch the state of the code. I usually find this more useful than running a debugger. (If that seems weak, then I&#8217;ll just say that K &#38; [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using Visual Studio 2008 to write unit tests. I had a couple failing in very strange ways, and I wanted to add some debugging statements to watch the state of the code. I usually find this more useful than running a debugger. (If that seems weak, then I&#8217;ll just say that K &amp; R agrees with me!) I can print whatever state I want and I don&#8217;t have to deal with breakpoints. Furthermore, once those debugging messages are in there, I can leave them there so they help out next time.</p>
<p>The problem was that I couldn&#8217;t figure out how to see Console output for unit tests. I kept trying to open various Output windows: Build output, Debug output, Refactor output. No Test output! Googling didn&#8217;t turn up anything. Today I was hitting the same problem, so I thought I&#8217;d try Googling one more time. I must have hit the magic search term combination, because this time all my top results were relevant.</p>
<p>It turns out that to see a test&#8217;s output, you just double-click on the test summary line, and all the output is down at the bottom of the window that opens. You get Console.Out messages and (more importantly) {Trace,Debug}.WriteLine(). Wonderful!</p>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/03/11/viewing-unit-test-output-in-visual-studio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>Sort by File Size with du</title>
		<link>http://programmersnotebook.wordpress.com/2010/03/02/sort-by-file-size-with-du/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/03/02/sort-by-file-size-with-du/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 13:41:14 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=139</guid>
		<description><![CDATA[Here is a handy Perl script I wrote a while back to sort the output from du -sh by file size. The standard sort command can&#8217;t do this because it doesn&#8217;t know how to compare values like &#8220;488M&#8221; and &#8220;5.0K.&#8221; My code will sort any lines where these values appear in the first field. I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a handy Perl script I wrote a while back to sort the output from <code>du -sh</code> by file size. The standard <code>sort</code> command can&#8217;t do this because it doesn&#8217;t know how to compare values like &#8220;488M&#8221; and &#8220;5.0K.&#8221; My code will sort any lines where these values appear in the first field. I&#8217;m sure the Perl could be more compressed, but keeping it easy to read like this is more my style:</p>
<pre>
<code>
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @lines;

while (&lt;&gt;) {
        chomp;
        push @lines, [unabbrev($_), $_];
        # print "$_: ", unabbrev($_), "\n";
}

# print Dumper \@lines;

for my $line (reverse sort { return $a-&gt;[0] &lt;=&gt; $b-&gt;[0] } @lines) {
        print $line-&gt;[1], "\n";
}

sub unabbrev {
        my $val = shift;
        if ($val =~ m/^\s*(\d+(\.\d+)?)([KMGB]?)/) {
                if ($3 eq 'K') {
                        # du -h uses powers of 1024, so 1023K must sort below 1.0M
                        $val = $1 * 1024;
                } elsif ($3 eq 'M') {
                        $val = $1 * 1024 ** 2;
                } elsif ($3 eq 'G') {
                        $val = $1 * 1024 ** 3;
                } else { # B or nothing
                        $val = $1;
                }
        }
        return $val;
}
</code>
</pre>
<p>It&#8217;d be fun to re-write this in ruby&mdash;or even better, add it as a feature to GNU sort.</p>
<p>UPDATE: Reading the documentation for GNU coreutils (which contains sort), I see that sort does have an <code>-h</code> option (for <code>--human-numeric-sort</code>). Strangely, this option is not documented in the man page on Ubuntu 9.10 and is unrecognized by /usr/bin/sort. I guess I&#8217;ve got an old version.</p>
<p>If anyone is looking for a small open source project, sort&#8217;s implementation of this option could still be improved. Right now, according to the online docs, &#8220;values with different precisions like 6000K and 5M will be sorted incorrectly.&#8221; It&#8217;d be great if it fully implemented the rules for <a href="http://www.gnu.org/software/coreutils/manual/coreutils.html#Block-size">block size</a> used by other coreutils programs.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/03/02/sort-by-file-size-with-du/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>C# XmlTextReader Tutorial</title>
		<link>http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 15:41:50 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[windows]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=125</guid>
		<description><![CDATA[I needed to read a big XML file into an object structure. I wanted it to be fast and use a low memory footprint. I also wanted the XML to stay pretty clean to make future support easy. Because of the speed and memory requirements, DOM and XPath were out. I toyed around with XmlSerializer, [...]]]></description>
			<content:encoded><![CDATA[<p>I needed to read a big XML file into an object structure. I wanted it to be fast and use a low memory footprint. I also wanted the XML to stay pretty clean to make future support easy. Because of the speed and memory requirements, DOM and XPath were out. I toyed around with XmlSerializer, but it didn&#8217;t quite give me the XML I wanted&mdash;too ugly&mdash;and I didn&#8217;t like cluttering my classes with xml serialization attributes. That doesn&#8217;t belong in the objects, does it? And then the code to structure the XML is scattered around everywhere. And finally, what if I need to serialize the objects in different ways at different times?</p>
<p>So I thought I&#8217;d try my hand at XmlTextReader, which is a bit like SAX but more &#8220;pull&#8221; than &#8220;push.&#8221; You ask the reader for the next node instead of registering callbacks, and you don&#8217;t have to manage state so attentively. Coming from the Java world, I&#8217;m used to SAX. I actually like it. The state machine thing isn&#8217;t so hard, really. So I was pretty excited about XmlTextReader. It looked like it would have the advantages of SAX but be easier to use.</p>
<p>Having now written some code with XmlTextReader, I&#8217;m still pretty happy with it, but I&#8217;m a bit disappointed that Microsoft junked up the API so much. It seems gappy and needlessly complicated. But having learned it, I thought I&#8217;d set it all down in writing.</p>
<p>One approach, which is fairly SAX-like, is to put everything into a read loop and switch on element names. That might look like this:</p>
<pre>
<code>
XmlTextReader r = new XmlTextReader(stream);
r.WhitespaceHandling = WhitespaceHandling.None; // don't report whitespace nodes
while (r.Read()) {
  if (r.NodeType == XmlNodeType.Element) {
    switch (r.LocalName) {
      case "this":
        // processing ...
        break;
      case "that":
        // processing ...
        break;
      // more cases ...
    }
  }
}
</code>
</pre>
<p>If you wanted, you could stop there. The Read() method gives you one node at a time, and you handle each element. You could add code to handle endElements also, or comments, or whatever, just as in SAX. But there are lots of other methods to make things easier.</p>
<p>If you&#8217;ve never done this sort of thing before, you should know that an XML document consists of nodes. A node can be a start element, an end element, a run of text, a comment, a processing instruction, even whitespace. By setting WhitespaceHandling to None above, we told the XmlTextReader to skip whitespace, so Read() doesn&#8217;t report it. Attributes are nodes too, but the Read loop doesn&#8217;t emit them, either. Instead you use special methods to get at attributes when you&#8217;re positioned on their containing element. As you can see, some nodes have children (e.g. some elements), whereas other nodes are leaves (text, attributes, other elements).</p>
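<p>As a rough sketch of those attribute methods: once the reader is on an element, <code>MoveToNextAttribute()</code> walks its attributes and <code>MoveToElement()</code> steps back to the element itself (<code>GetAttribute(name)</code> is the shortcut when you only want one value). The helper name and the sample XML are made up for illustration:</p>
<pre>
<code>
using System;
using System.IO;
using System.Xml;

public class AttributeDemo {
    // Returns "name=value;" for each attribute of the root element,
    // followed by the name of the element we end up back on.
    public static string ListAttributes(string xml) {
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();                  // position on the root element
        string result = "";
        while (r.MoveToNextAttribute()) {   // first call moves to the first attribute
            result += r.Name + "=" + r.Value + ";";
        }
        r.MoveToElement();                  // back to the owning element
        return result + r.LocalName;
    }

    public static void Main() {
        Console.WriteLine(ListAttributes("&lt;name id='asdf' lang='en'&gt;Paul&lt;/name&gt;"));
    }
}
</code>
</pre>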
<p>One tricky thing is to keep track of your current position in the document. In describing the various methods below, I&#8217;ll try to pay attention to how they affect the document position. The Read() method advances one node.</p>
<p>First let&#8217;s talk about some methods that look useful but you should probably avoid. One is ReadElementString(). This advances one element and returns the contents of the next element as a string. A variant is ReadElementString(elemName), which verifies that the current element matches the given name. I didn&#8217;t find these methods too useful. The first gives you no checking to see if you&#8217;re actually reading the right thing. The second checks the wrong thing. I don&#8217;t want to test the current element and then read the contents of the next one. The test needs to be the name of the next element. Both of these methods read the element blindly; the latter just looks backwards a bit.</p>
<p>Another method to avoid is ReadString(). This returns a string when positioned on either an element or a text node. That&#8217;s handy, but it doesn&#8217;t skip comments and processing instructions very well. For that we&#8217;ll need a different method.</p>
<p>One method whose name looks great is MoveToElement(). But if you think this will scan through the document until it finds your desired element, you&#8217;re wrong; it doesn&#8217;t even take an element name. Instead, this is used in attribute processing to move up from the attributes back to their owning element. Alas!</p>
<p>One method that really is handy is MoveToContent. This skips over comments, whitespace, processing instructions, and documentType nodes. In all your processing, you should be aware that users may throw comments into weird places. Robust XML parsing doesn&#8217;t get tripped up by comments. So this call is quite useful.</p>
<p>To get the contents of an element, ignoring any comments, you want the ReadContentAsXXX() methods and the ReadElementContentAsXXX() methods. These methods skip comments and processing instructions and automatically convert entities. This is just the sort of friendly assistance you want from your XML parser. As for the XXX, you have a lot of options. It could be String, Int, DateTime, even Object.</p>
<p>The difference between ReadContentAs and ReadElementContentAs is this: the former must be positioned on the content itself (a text node, say), whereas the latter must be positioned on the text&#8217;s containing element. If you call ReadContentAs on an element, you get an exception. In practice, I think ReadElementContentAs is the more useful family of methods. Also, when it returns, the reader is positioned at whatever follows the endElement node of what you just read. If you&#8217;re ignoring whitespace, this could be the next element. (Or it could be a comment, etc.)</p>
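<p>Here is a small sketch of MoveToContent and ReadElementContentAsInt working together, including where the reader ends up afterwards. The <code>order</code>, <code>count</code>, and <code>note</code> elements are invented for the example:</p>
<pre>
<code>
using System;
using System.IO;
using System.Xml;

public class ReadContentDemo {
    public static string ReadCount(string xml) {
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.WhitespaceHandling = WhitespaceHandling.None;
        r.MoveToContent();   // on the root element
        r.Read();            // step inside; in the sample this lands on the comment
        r.MoveToContent();   // skip the comment; now on the count element
        int count = r.ReadElementContentAsInt();
        // The reader now sits on whatever follows the count end tag.
        return count + ", then " + r.NodeType + " " + r.LocalName;
    }

    public static void Main() {
        Console.WriteLine(ReadCount(
            "&lt;order&gt;&lt;!-- count comes first --&gt;&lt;count&gt;3&lt;/count&gt;&lt;note&gt;rush&lt;/note&gt;&lt;/order&gt;"));
    }
}
</code>
</pre>
<p>Because the call has already advanced to the next node, calling Read() straight afterwards would step past that node.</p>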
<p>Then there are a few methods for moving around in the document. I think this is where the API really falls short, but here&#8217;s what we&#8217;ve got: ReadToFollowing, ReadToNextSibling, and ReadToDescendant. All these methods take an element name and return true if it is found, leaving you positioned on the element node (and ready to call ReadElementContentAs). If they can&#8217;t find what you want, they return false, and then your new position in the document is however far they&#8217;ve searched. You&#8217;ll be at EOF for ReadToFollowing, or the end tag of the current/parent element for the other two. This is a bit of a disappointment. If you&#8217;re looking for a required element, you could just throw an exception, but what about finding optional elements? It&#8217;s no big deal the element wasn&#8217;t there, but you&#8217;re way off course.</p>
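<p>A sketch of that failure mode, using an invented <code>person</code> document where the optional <code>sex</code> element is missing. The search for it returns false and also reads past <code>birthdate</code> on the way:</p>
<pre>
<code>
using System;
using System.IO;
using System.Xml;

public class ReadToDemo {
    public static string FindOptional(string xml) {
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();                             // on the person element
        if (!r.ReadToDescendant("name")) return "no name";
        string name = r.ReadElementContentAsString();  // now on birthdate
        bool foundSex = r.ReadToNextSibling("sex");    // optional, absent in the sample
        return name + "; sex found: " + foundSex
             + "; now on " + r.NodeType + " " + r.LocalName;
    }

    public static void Main() {
        Console.WriteLine(FindOptional(
            "&lt;person&gt;&lt;name&gt;Ann&lt;/name&gt;&lt;birthdate&gt;1975-02-08&lt;/birthdate&gt;&lt;/person&gt;"));
    }
}
</code>
</pre>
<p>After the failed search the reader is on the parent&#8217;s end tag, so the birthdate can no longer be read. Testing LocalName before reading, as in the ParsePerson code later in this post, avoids that.</p>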
<p>One other useful method is Skip(). This skips the children of the current node.</p>
<p>Here is some code making use of these methods to parse a fairly simple document. The document describes a family, with elements for headOfHousehold, spouse, and children. Each person has a name, birthdate, and sex. Let&#8217;s treat the last two of these as optional. The spouse node can also have an optional marriageDate element. We could indicate the XML structure like this, where wildcards have their usual meaning (? = 0 or 1, * = 0 or more, + = 1 or more):</p>
<pre>
<code>
family
  headOfHousehold
    name
    birthdate?
    sex?
  spouse?
    name
    birthdate?
    sex?
    marriageDate?
  children?
    child+
      name
      birthdate?
      sex?
</code>
</pre>
<p>Here is some code that processes such a file by building up an object structure, then prints the results:</p>
<pre>
<code>
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;

namespace XmlTextReaderTest {

    public enum Sex { MALE, FEMALE }

    public class Person {
        public string Name { get; set; }
        public DateTime Birthdate { get; set; }
        public Sex Sex { get; set; }
        public DateTime? MarriageDate { get; set; }

        public override string ToString() {
            return String.Format("{0}: {1}, {2:yyyy-MM-dd}, {3:yyyy-MM-dd}", Name, Sex, Birthdate, MarriageDate);
        }
    }

    public class Family {
        public Person Head { get; set; }
        public Person Spouse { get; set; }
        public List&lt;Person&gt; Children { get; private set; }

        public Family() {
            Children = new List&lt;Person&gt;();
        }

        public override String ToString() {
            String ret = "Head:\n\t" + Head + "\nSpouse:\n\t" + Spouse + "\n";
            foreach (Person p in Children) {
                ret += "Child:\n\t" + p + "\n";
            }
            return ret;
        }
    }

    public class Program {
        static void Main(string[] args) {
            string xmlDoc = @"&lt;?xml version='1.0' encoding='utf-8'?&gt;
&lt;family&gt;
    &lt;headOfHousehold&gt;
      &lt;name id='asdf'&gt;Paul Jungwirth&lt;/name&gt;
      &lt;!-- not my real birthday of course: --&gt;
      &lt;birthdate&gt;1975-02-08&lt;/birthdate&gt;
      &lt;sex&gt;male&lt;/sex&gt;
    &lt;/headOfHousehold&gt;
    &lt;spouse&gt;
      &lt;name&gt;Arielle Jungwirth&lt;/name&gt;
      &lt;birthdate&gt;1979-11-11&lt;/birthdate&gt;
      &lt;sex&gt;female&lt;/sex&gt;
      &lt;marriageDate&gt;2006-09-09&lt;/marriageDate&gt;
    &lt;/spouse&gt;
    &lt;children&gt;
      &lt;child&gt;
        &lt;name&gt;James Jungwirth&lt;/name&gt;
        &lt;birthdate&gt;2007-12-31&lt;/birthdate&gt;
        &lt;sex&gt;male&lt;/sex&gt;
      &lt;/child&gt;
      &lt;child&gt;
        &lt;name&gt;Miriam Jungwirth&lt;/name&gt;
        &lt;birthdate&gt;2010-01-20&lt;/birthdate&gt;
        &lt;sex&gt;female&lt;/sex&gt;
      &lt;/child&gt;
    &lt;/children&gt;
&lt;/family&gt;";
            // Encode as UTF-8 to match the encoding declared in the XML prolog.
            Family f = ParseFamily(new MemoryStream(Encoding.UTF8.GetBytes(xmlDoc)));
            Console.Write(f);
            Console.ReadLine();
        }

        public static Family ParseFamily(Stream stream) {
            Family f = new Family();
            XmlTextReader r = new XmlTextReader(stream);
            r.WhitespaceHandling = WhitespaceHandling.None;
            r.MoveToContent();

            while (r.Read()) {
                // Console.WriteLine(r.NodeType + ": " + r.LocalName + "\n");
                if (r.NodeType.Equals(XmlNodeType.Element)) {
                    switch (r.LocalName) {
                        case "headOfHousehold":
                            f.Head = ParsePerson(r);
                            break;
                        case "spouse":
                            f.Spouse = ParsePerson(r);
                            f.Head.MarriageDate = f.Spouse.MarriageDate;
                            break;
                        case "child":
                            f.Children.Add(ParsePerson(r));
                            break;
                        default:
                            // ignore other nodes
                            break;
                    }
                }
            }

            return f;
        }

        public static Person ParsePerson(XmlTextReader r) {
            Person p = new Person();

            // Right now we're pointing to the person's containing element, e.g. headOfHousehold.
            // Read past that, then read until we get to a new start element.
            r.Read();

            r.MoveToContent();
            if (r.LocalName.Equals("name")) p.Name = r.ReadElementContentAsString();
            else throw new InvalidDataException("no name for person");

            r.MoveToContent();
            if (r.LocalName.Equals("birthdate")) p.Birthdate = r.ReadElementContentAsDateTime();

            r.MoveToContent();
            if (r.LocalName.Equals("sex")) p.Sex = (Sex)Enum.Parse(typeof(Sex), r.ReadElementContentAsString(), true);

            r.MoveToContent();
            if (r.LocalName.Equals("marriageDate")) p.MarriageDate = r.ReadElementContentAsDateTime();

            return p;
        }
    }
}
</code>
</pre>
<p>This code demonstrates on a small scale the pattern I use to parse XML documents and keep the code manageable. I read through the whole document using our Read/switch loop, and I call out to helper functions to build objects representing significant chunks. These methods may call other helper functions or (as here) just navigate through the XML to pick out primitives. Each chunk of code is self-contained, and you&#8217;re never looking at more than a page or so.</p>
<p>You can see that in building each Person, I use MoveToContent and then test the name of the next element. Calling ReadElementContentAs takes me past the endElement, so afterwards I&#8217;m ready to read some more. If I&#8217;m already on an element, MoveToContent won&#8217;t advance at all, so it&#8217;s safe to call twice in the case when an optional element is missing.</p>
<p>You could also implement ParsePerson as a second Read/switch loop. That would mean the child elements can come in any order, but you&#8217;d have to verify at the end that you got data for all the required ones. You also may not know when to exit, if the name of your endElement can vary as in this example (e.g. &#8220;spouse&#8221; vs. &#8220;child&#8221;).</p>
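<p>One way around the exit problem, sketched below, is to remember the Depth of the containing element and stop at the end tag with the same depth, whatever its name. This is an assumption-laden sketch rather than the code I actually use: the method name is made up, it returns a string instead of a Person to stay short, and it only handles name and sex:</p>
<pre>
<code>
using System;
using System.IO;
using System.Xml;

public class AnyOrderDemo {
    // Assumes r is positioned on the person's start tag, whatever it is called.
    public static string ParsePersonAnyOrder(XmlReader r) {
        string name = null, sex = null;
        if (r.IsEmptyElement) throw new InvalidDataException("no name for person");
        int depth = r.Depth;   // the matching end tag reports the same depth
        r.Read();              // step inside the container
        while (!r.EOF &amp;&amp; !(r.NodeType == XmlNodeType.EndElement &amp;&amp; r.Depth == depth)) {
            if (r.NodeType != XmlNodeType.Element) { r.Read(); continue; }
            switch (r.LocalName) {
                // ReadElementContentAs... and Skip already advance past the
                // element, so these branches must not call Read() again.
                case "name": name = r.ReadElementContentAsString(); break;
                case "sex":  sex = r.ReadElementContentAsString(); break;
                default:     r.Skip(); break;
            }
        }
        // Required elements have to be checked once the loop is done.
        if (name == null) throw new InvalidDataException("no name for person");
        return name + " (" + (sex ?? "unknown") + ")";
    }

    public static void Main() {
        string xml = "&lt;child&gt;&lt;sex&gt;female&lt;/sex&gt;&lt;!-- any order --&gt;&lt;name&gt;Miriam&lt;/name&gt;&lt;/child&gt;";
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();
        Console.WriteLine(ParsePersonAnyOrder(r));
    }
}
</code>
</pre>
<p>Like ParsePerson above, this leaves the reader on the container&#8217;s end tag, so it would slot into the same outer Read/switch loop.</p>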
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>Sorting Out the Confusion: 32- vs. 64-Bit, CLR vs. Native, C# vs. C++</title>
		<link>http://programmersnotebook.wordpress.com/2010/02/13/sorting-out-the-confusion-32-vs-64-bit-clr-vs-native-cs-vs-cpp/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/02/13/sorting-out-the-confusion-32-vs-64-bit-clr-vs-native-cs-vs-cpp/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 11:46:22 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[c#]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=117</guid>
		<description><![CDATA[I&#8217;ve been trying to learn how things work on Windows based on whether you write code in C# or C++, target a 32- or 64-bit platform, and produce files with either native code or one of the CLR options. One of my focuses is the interaction between exes and dlls. I think I&#8217;ve got things [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been trying to learn how things work on Windows based on whether you write code in C# or C++, target a 32- or 64-bit platform, and produce files with either native code or one of the CLR options. One of my focuses is the interaction between exes and dlls. I think I&#8217;ve got things mostly straightened out, so this is what I&#8217;ve learned.</p>
<p>First, the basics: a 32-bit platform can run 32-bit apps, but not 64-bit apps. A 64-bit platform can run either, but 32-bit apps run in an emulation environment called WOW64 (Windows on Windows 64). When Windows starts your app, it decides whether WOW64 is necessary. You can <a href="http://msdn.microsoft.com/en-us/library/ms684139%28VS.85%29.aspx">tell whether your app is running in WOW64</a> using this C++ code:</p>
<pre>
<code>
#include "stdafx.h"
#include &lt;windows.h&gt;
#include &lt;stdio.h&gt;
#include &lt;tchar.h&gt;

typedef BOOL (WINAPI *LPFN_ISWOW64PROCESS) (HANDLE, PBOOL);

LPFN_ISWOW64PROCESS fnIsWow64Process;

BOOL isWow64() {
    BOOL ret = FALSE;
    fnIsWow64Process = (LPFN_ISWOW64PROCESS)GetProcAddress(
        GetModuleHandle(TEXT("kernel32")), "IsWow64Process");

    if (NULL != fnIsWow64Process) {
        if (!fnIsWow64Process(GetCurrentProcess(), &amp;ret)) {
            printf("Got some error\n");
        }
    }
    return ret;
}

int _tmain(int argc, _TCHAR* argv[]) {
    if (isWow64()) {
        printf("Running under WOW64.\n");
    } else {
        printf("NOT running under WOW64.\n");
    }
    printf("Press return to exit.\n");
    getchar();
    return 0;
}
</code>
</pre>
<p>It&#8217;s easy enough to call <code>isWow64</code> from C#, like so:</p>
<pre>
<code>
// Needs "using System.Runtime.InteropServices;" and a DLL that exports isWow64 with C linkage.
[DllImport("IsWow64Dll.dll")]
static extern bool isWow64();

static void Main(String[] args) {
    Console.WriteLine(isWow64().ToString());
    Console.ReadLine();
}
</code>
</pre>
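<p>The wrapper DLL is not strictly needed: a sketch like the following calls <code>IsWow64Process</code> in kernel32 directly. It assumes a version of Windows that actually exports that function (the C++ code above looks it up with GetProcAddress precisely because older systems may not). The <code>Describe</code> helper is made up for the example:</p>
<pre>
<code>
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public class Wow64Check {
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool IsWow64Process(IntPtr hProcess,
        [MarshalAs(UnmanagedType.Bool)] out bool wow64Process);

    // Pure helper: turns the two facts we can observe into a description.
    public static string Describe(int pointerSize, bool wow64) {
        if (pointerSize == 8) return "64-bit process";
        return wow64 ? "32-bit process under WOW64"
                     : "32-bit process on 32-bit Windows";
    }

    public static void Main() {
        bool wow64;
        if (!IsWow64Process(Process.GetCurrentProcess().Handle, out wow64)) {
            Console.WriteLine("IsWow64Process failed: " + Marshal.GetLastWin32Error());
            return;
        }
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        Console.WriteLine(Describe(IntPtr.Size, wow64));
    }
}
</code>
</pre>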
<p>Visual Studio lets you build files for either 32- or 64-bit platforms. I&#8217;ve already written about how to <a href="/2010/01/23/enabling-64-bit-support-in-visual-studio-2008/">build for 32 or 64 bits in C++</a>. C# actually provides <em>three</em> options: 32-bit (x86), 64-bit (x64), or &#8220;Any CPU.&#8221; We can use a tool called corflags to see what results we get depending on which option we choose. Corflags comes with Visual Studio and can be run by choosing the special DOS prompt command under Visual Studio in the Start menu. This is a little different from the regular DOS prompt: it has a specially-tailored environment for running Visual Studio&#8217;s command-line utilities. From there, you can ask corflags to report information about any exe or dll, like this:</p>
<pre>
<code>
C:\&gt; corflags myapp.exe
Microsoft (R) .NET Framework CorFlags Conversion Tool.  Version  3.5.21022.8
Copyright (c) Microsoft Corporation.  All rights reserved.

Version   : v2.0.50727
CLR Header: 2.5
PE        : PE32
CorFlags  : 3
ILONLY    : 1
32BIT     : 1
Signed    : 0
</code>
</pre>
<p>We&#8217;re mostly interested in three values: PE, 32BIT, and ILONLY. There is also a line labelled &#8220;Signed,&#8221; which I&#8217;m not interested in right now. Finally, the &#8220;CorFlags&#8221; line is a bitmask that combines the flags listed below it: ILONLY contributes 1 and 32BIT contributes 2, so the 3 above means both are set.</p>
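<p>The CorFlags number can be unpacked as a bitmask. Here is a minimal Python sketch that splits it into the individual flags corflags prints. It assumes the standard bit values from the CLR header (1 for ILONLY, 2 for 32BIT, 8 for Signed); the function name <code>decode_corflags</code> is only an illustration, not part of any tool.</p>

```python
# Bit values of the CLR header's flags field (names shortened for this sketch).
ILONLY = 0x1   # COMIMAGE_FLAGS_ILONLY
BIT32 = 0x2    # COMIMAGE_FLAGS_32BITREQUIRED
SIGNED = 0x8   # COMIMAGE_FLAGS_STRONGNAMESIGNED


def decode_corflags(value):
    """Split a CorFlags number into the three flags that corflags prints."""
    return {
        "ILONLY": int(bool(value & ILONLY)),
        "32BIT": int(bool(value & BIT32)),
        "Signed": int(bool(value & SIGNED)),
    }


if __name__ == "__main__":
    # The example output above shows CorFlags 3.
    print(decode_corflags(3))
```

<p>For the value 3 shown above, this reports ILONLY and 32BIT set and Signed clear, which matches the lines corflags printed.</p>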
<p>PE specifies whether or not the file can run on 32-bit platforms. It is either PE32 or PE32+. A PE32+ file cannot run on a 32-bit machine.</p>
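<p>The PE value comes from a magic number in the file&#8217;s optional header: 0x10b for PE32 and 0x20b for PE32+. As a rough sketch (not a replacement for corflags or dumpbin), the Python below reads just that one field. The names <code>pe_kind</code> and <code>fake_image</code> are hypothetical, and the fake header exists only so the sketch can be tried without a real exe.</p>

```python
import struct

PE32_MAGIC = 0x10B       # optional-header magic for PE32
PE32_PLUS_MAGIC = 0x20B  # optional-header magic for PE32+


def pe_kind(data):
    """Return 'PE32' or 'PE32+' for the raw bytes of an exe or dll."""
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE file")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    if magic == PE32_MAGIC:
        return "PE32"
    if magic == PE32_PLUS_MAGIC:
        return "PE32+"
    raise ValueError("unknown optional-header magic: %#x" % magic)


def fake_image(magic):
    """Build a tiny made-up header holding only the fields pe_kind reads."""
    dos = b"MZ" + b"\0" * 0x3A + struct.pack("<I", 0x40)  # 0x40 bytes in total
    return dos + b"PE\0\0" + b"\0" * 20 + struct.pack("<H", magic)


if __name__ == "__main__":
    print(pe_kind(fake_image(PE32_MAGIC)))
    print(pe_kind(fake_image(PE32_PLUS_MAGIC)))
```

<p>On a real file you would pass in the bytes read from disk, for example <code>pe_kind(open("myapp.exe", "rb").read())</code>.</p>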
<p>Next there is the 32BIT flag. This is a little different from PE. If PE indicates whether your app <em>can</em> run as 32 bits, then 32BIT indicates whether it <em>must</em> run as 32 bits. If this flag is 0, your app can run on a 64-bit machine without WOW64. But if the flag is 1, then your app has to run under WOW64. Here is a table showing how the bits are set depending on your compiler&#8217;s /platform setting:</p>
<table>
<tr>
<th>Compiler Option</th>
<th>PE</th>
<th>32BIT</th>
</tr>
<tr>
<td>x86</td>
<td>PE32</td>
<td>1</td>
</tr>
<tr>
<td>Any CPU</td>
<td>PE32</td>
<td>0</td>
</tr>
<tr>
<td>x64</td>
<td>PE32+</td>
<td>0</td>
</tr>
</table>
<p>From this table, you can see that the corflags example above is inspecting a C# app built for the x86 platform. Note that you could never have a file that is PE32+ with the 32BIT flag set, because then one flag would require 32 bits and the other 64.</p>
<p>To put all this together, a 32-bit machine can run anything with a PE set to PE32, but nothing with a PE of PE32+. A 64-bit machine can run your file in 64-bit mode as long as 32BIT is 0, but if 32BIT is 1 then it must use WOW64.</p>
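<p>Those rules fit in a few lines of Python. This is only a sketch of the reasoning for managed files as described here; the function name and the strings it returns are my own labels, not anything Windows reports.</p>

```python
# PE and 32BIT values for each /platform setting, copied from the table above.
SETTINGS = {"x86": ("PE32", 1), "Any CPU": ("PE32", 0), "x64": ("PE32+", 0)}


def run_mode(pe, bit32, os_bits):
    """Say how a managed exe runs on a 32- or 64-bit machine."""
    if pe == "PE32+" and bit32:
        raise ValueError("PE32+ with 32BIT set is contradictory")
    if os_bits == 32:
        # A 32-bit machine runs PE32 files and nothing else.
        return "native 32-bit" if pe == "PE32" else "cannot run"
    # On a 64-bit machine, only a set 32BIT flag forces WOW64.
    if pe == "PE32+" or not bit32:
        return "native 64-bit"
    return "WOW64"


if __name__ == "__main__":
    for name, (pe, bit32) in SETTINGS.items():
        print(name, "|", run_mode(pe, bit32, 32), "|", run_mode(pe, bit32, 64))
```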
<p>The ILONLY flag indicates that your file contains only MSIL opcodes (recently renamed to CIL), with no native assembly instructions. A C# app will always have this flag set (unless you use something like <a href="http://msdn.microsoft.com/en-us/library/6t9t5wcf%28VS.80%29.aspx">ngen</a> to compile down to machine language&mdash;an approach with <a href="http://www.codeguru.com/Csharp/.NET/net_general/toolsand3rdparty/article.php/c4651">some distribution problems</a>), but a C++ app&#8217;s setting depends on your compiler options (described below).</p>
<p>When it comes to loading dlls, these flags control whether your app loads the dll successfully or gets a BadImageFormatException. Basically, a 32-bit app can only load 32-bit dlls, and a 64-bit app can only load 64-bit dlls. But what about apps compiled as &#8220;Any CPU&#8221;? In that case, you can only load dlls matching whatever bitness you&#8217;re currently running as. Of course, if you&#8217;re running on a 32-bit machine, there is no complication, because everything is 32-bit already.</p>
<p>But on a 64-bit machine, you may have problems. Windows will not use WOW64 for your app, because it claims to support 64-bit operation. But if your app has a dependency on a 32-bit dll, then you&#8217;ll get a BadImageFormatException, because the 32-bit dll only works in WOW64. The choice to use WOW64 happens only when starting your app. You can&#8217;t run an app natively and load just the dlls in WOW64. So you get the exception.</p>
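<p>The dll rule can be sketched the same way: first work out the bitness of the process, then compare it with the dll. The platform labels and function names below are assumptions made for illustration, and a <code>False</code> result stands for the BadImageFormatException you would see in practice.</p>

```python
DLL_BITS = {"x86": 32, "x64": 64}


def process_bits(exe_platform, os_bits):
    """Bitness of the process for a C# exe built as 'x86', 'x64' or 'anycpu'."""
    if exe_platform == "x86":
        return 32  # runs under WOW64 when the OS is 64-bit
    if exe_platform == "x64":
        if os_bits != 64:
            raise ValueError("an x64 exe cannot start on a 32-bit OS")
        return 64
    return os_bits  # Any CPU takes whatever the machine offers


def can_load(exe_platform, dll_platform, os_bits):
    """True if the dll loads; False stands for BadImageFormatException."""
    if dll_platform == "anycpu":
        return True  # an Any CPU dll follows the process
    return DLL_BITS[dll_platform] == process_bits(exe_platform, os_bits)


if __name__ == "__main__":
    # The problem case: Any CPU exe, 32-bit dll, 64-bit machine.
    print(can_load("anycpu", "x86", 64))
    # The fix: build the exe for x86 so it starts in WOW64.
    print(can_load("x86", "x86", 64))
```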
<p>The solution is to tell Windows that your app must start in WOW64 from the beginning. You should probably do this by building your app for x86, not Any CPU, but if that is somehow a problem (e.g. you don&#8217;t have the code), then you can use corflags to set the 32BIT flag. You just type something like this:</p>
<pre>
<code>
corflags /32BIT+ myapp.exe
</code>
</pre>
<p>For C++ applications, you can do something similar with <a href="http://msdn.microsoft.com/en-us/library/31zwwc39%28VS.80%29.aspx">the linker&#8217;s /clrimagetype flag</a>.</p>
<p>Another choice, at least when writing in C++, is how to support the CLR. You can choose among four options: native (the default), /clr, /clr:pure, and /clr:safe. The first one is simple enough: you get a file with machine language instructions. The other three give you a file that is partially or entirely composed of MSIL. Using /clr will produce a CLR header and mostly MSIL code, but with some native code mixed in. Specifically, you get native data types but MSIL functions, unless the function uses something unsupported like function pointers. (Everyday pointers to data are supported.) You can also use <code>#pragma unmanaged</code> to force native code. Because these files have some native code, they must be built for a specific platform, either x86 or x64.</p>
<p>The /clr:pure option does what it sounds like: it gives you a file of entirely MSIL. Nonetheless, it must be built for either x86 or x64. This option is said to be equivalent to a C# project with unsafe code.</p>
<p><a href="http://msdn.microsoft.com/en-us/library/31zwwc39%28VS.80%29.aspx">Microsoft&#8217;s documentation on the /clr and /clr:pure flags</a> says that they can only produce x86 files, but my tests prove this to be false. If I build the C++ version of the WOW64-tester, using x64 and /clr compilation options, then it reports that it is not running in WOW64. So apparently you can in fact produce x64 applications with these options.</p>
<p>The last one, /clr:safe, enforces code that is verifiably type-safe&mdash;but I&#8217;m not sure what all that means. I&#8217;ve read that if you use this option, your file can run on any platform, like building as Any CPU in C#. This option requires that you use Microsoft&#8217;s C++/CLI language, formerly known as Managed C++. I know nothing about this, but people say it&#8217;s virtually a new language. I tried to build a Hello World app with <code>printf</code> and got innumerable compile errors, so I wasn&#8217;t able to run any tests on what this option produces.</p>
<p>There is also a /clr:oldSyntax option, which is like /clr:safe but with the old Managed C++ syntax rather than C++/CLI. Since Managed C++ is deprecated, I&#8217;m not sure why you&#8217;d use this for new code.</p>
<p>I don&#8217;t know what the /clr* options mean for P/Invoke. If I build a dll with /clr or /clr:pure, does that mean I can call its exported functions from C# without a <code>DllImport</code> statement? I haven&#8217;t tried. Using <code>DllImport</code> on these dlls doesn&#8217;t cause problems, though.</p>
<p>You can use a tool called <a href="http://msdn.microsoft.com/en-us/library/ds03hhk8%28VS.80%29.aspx">dumpbin</a> to see which /clr options were used to produce a given file. Dumpbin comes with Visual Studio and runs from the command-line:</p>
<pre>
<code>
dumpbin /CLRHEADER myapp.exe
</code>
</pre>
<p>This will print (among other things) a <code>Flags</code> value, which is 0 if the file was built with /clr, 1 with /clr:safe, and 3 with /clr:pure.</p>
<p>I&#8217;m also curious about the interaction between WOW64 and the CLR. If I run a 32-bit C# app on x64, then which comes first: WOW64 or the CLR? Is there a 64-bit CLR that can JIT-compile to either 32- or 64-bit code? Or do I have two CLRs, one for 32 bits and one for 64, and the former runs under WOW64? I suspect the answer is the latter, but I&#8217;m not sure how to tell. Either way my code is running in WOW64, so the check I described above won&#8217;t tell me anything.</p>
<p>I created a table to keep track of the data from all my tests. Here it is:</p>
<table>
<tr>
<th>exe</th>
<th>machine type</th>
<th>binary contents</th>
<th>managed?</th>
<th>contains assembly?</th>
<th>can run on x86?</th>
<th>can run on x64?</th>
<th>can use x86 C# dll?</th>
<th>can use x64 C# dll?</th>
<th>can use &#8220;Any CPU&#8221; C# dll?</th>
<th>can use x86 C++ native dll?</th>
<th>can use x64 C++ native dll?</th>
<th>can use x86 C++ /clr dll?</th>
<th>can use x64 C++ /clr dll?</th>
<th>can use x86 C++ /clr:pure dll?</th>
<th>can use x64 C++ /clr:pure dll?</th>
</tr>
<tr>
<td>x86 C# exe</td>
<td>i386</td>
<td>MSIL</td>
<td>yes</td>
<td>yes</td>
<td>yes, in CLR</td>
<td>yes, in WOW64+CLR</td>
<td>yes</td>
<td>no</td>
<td>yes</td>
<td>yes</td>
<td>no</td>
<td>yes</td>
<td>no</td>
<td>yes</td>
<td>no</td>
</tr>
<tr>
<td>x64 C# exe</td>
<td>x64</td>
<td>MSIL</td>
<td>yes</td>
<td>yes</td>
<td>no</td>
<td>yes, in CLR</td>
<td>no</td>
<td>yes</td>
<td>yes</td>
<td>no<sup>2</sup></td>
<td>yes</td>
<td>no</td>
<td>yes</td>
<td>no</td>
<td>yes</td>
</tr>
<tr>
<td>&#8220;Any CPU&#8221; C# exe</td>
<td>i386</td>
<td>MSIL</td>
<td>yes</td>
<td>yes</td>
<td>yes, in CLR</td>
<td>yes, in CLR</td>
<td>only on x86<sup>1</sup></td>
<td>only on x64</td>
<td>yes</td>
<td>only on x86<sup>1</sup></td>
<td>only on x64</td>
<td>only on x86</td>
<td>only on x64</td>
<td>only on x86</td>
<td>only on x64</td>
</tr>
<tr>
<td>x86 C++ exe</td>
<td>i386</td>
<td>asm</td>
<td>no</td>
<td>no</td>
<td>yes, natively</td>
<td>yes, in WOW64</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>x64 C++ exe</td>
<td>x64</td>
<td>asm</td>
<td>no</td>
<td>no</td>
<td>no</td>
<td>yes, natively</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>x86 C++ /clr exe</td>
<td>i386</td>
<td>MSIL, mostly</td>
<td>?</td>
<td>?</td>
<td>yes, in CLR</td>
<td>yes, in WOW64+CLR</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>x64 C++ /clr exe</td>
<td>x64</td>
<td>MSIL, mostly</td>
<td>?</td>
<td>?</td>
<td>no</td>
<td>yes, in CLR</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>x86 C++ /clr:pure exe</td>
<td>i386</td>
<td>MSIL</td>
<td>yes</td>
<td>?</td>
<td>yes, in CLR</td>
<td>yes, in WOW64+CLR</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
<tr>
<td>x64 C++ /clr:pure exe</td>
<td>x64</td>
<td>MSIL</td>
<td>yes</td>
<td>?</td>
<td>no</td>
<td>yes, in CLR</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
<td>?</td>
</tr>
</table>
<p><sup>1</sup>There may be some option like the linker&#8217;s /CLRHEADER for C# apps.<br />
<sup>2</sup>Can&#8217;t run on x86, and won&#8217;t be in WOW64 on x64. But see note 1.</p>
<p>There are still some gaps in this table. I&#8217;m not that concerned about the interactions between C++ exes and C++ dlls, so I&#8217;ve left those cells as question marks. I&#8217;ve also left some cells unanswered regarding when C++ files are managed/unmanaged and when they contain an assembly. If I figure any of this out, I&#8217;ll update the table.</p>
<p>One final notable tool is ildasm (IL-disassembler), which also comes with Visual Studio. This lets you inspect the IL of an exe or dll. Most of it is over my head, but it&#8217;s interesting to see what your code becomes.</p>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/02/13/sorting-out-the-confusion-32-vs-64-bit-clr-vs-native-cs-vs-cpp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>Decline in Inverse and Leveraged ETFs</title>
		<link>http://programmersnotebook.wordpress.com/2010/02/05/decline-in-inverse-and-leveraged-etfs/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/02/05/decline-in-inverse-and-leveraged-etfs/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 12:44:05 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[money]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=89</guid>
		<description><![CDATA[So this post isn&#8217;t about programming, but it has a lot of math, and it&#8217;s about understanding how something works. Lately my investing has used some ETFs, but I&#8217;ve read that if your ETF is inverse or leveraged, then it gradually loses money over time, simply because of the mathematics. I wanted to investigate just how [...]]]></description>
			<content:encoded><![CDATA[<p>So this post isn&#8217;t about programming, but it has a lot of math, and it&#8217;s about understanding how something works. Lately my investing has used some ETFs, but I&#8217;ve read that if your ETF is inverse or leveraged, then it gradually loses money over time, simply because of the mathematics. I wanted to investigate just how this worked.</p>
<p>First, <a href="http://www.fool.com/investing/etf/mutual-funds-v-etfs.aspx">a bit of background</a>: an ETF is an Exchange-Traded Fund, a pretty new thing. It&#8217;s not a stock, but a type of derivative used to track something else, such as a sector of stocks. In this regard it&#8217;s a bit like an index, but more flexible. So you could have an ETF tracking the performance of pharmaceutical stocks, or financial stocks, or the S&amp;P 500. They are a little different from index-based mutual funds. You can trade them any time during the day, and they don&#8217;t have the high minimum purchases required by some mutual funds. On the other hand, you pay a commission, unlike a no-load mutual fund. (In <a href="http://www.benzinga.com/110871/ivv-buying-the-s-p-500-without-commissions">some cases</a> this may not be true.)</p>
<p>But the really interesting thing about them is that they can be leveraged or inverse. A leveraged fund aims to achieve some multiple of the daily change of the underlying index. So while the SPY fund tracks the S&amp;P 500, you can get 2x the S&amp;P 500 (both up and down) with the SSO fund. This greater volatility gives you the potential for more profit with each trade, diminishing the relative size of the commission. It&#8217;s a bit like you invested twice as many dollars as you actually have.</p>
<p>An inverse fund is a leveraged fund with a negative multiplier. You can get funds at -1x, -2x, or I think even -3x. This essentially lets you short the market, something difficult for retail investors to manage. Also, unlike with shorting, you don&#8217;t risk an unlimited amount of money. You only risk the money you use to buy the ETF.</p>
<p>The fly in the ointment is that ETFs, when leveraged or inverse, gradually decline in value. Let&#8217;s say we have an index that starts at 100, declines to 80 (-20%), then goes back up to 100 (+25%). Now let&#8217;s say we were tracking it with four ETFs, a 1x, a -1x, a 2x, and a -2x. They all start at 40. Here are the results:</p>
<table>
<tbody>
<tr>
<th>Fund</th>
<th>t<sub>0</sub></th>
<th>t<sub>1</sub></th>
<th>t<sub>2</sub></th>
</tr>
<tr>
<td>Index</td>
<td>100</td>
<td>80</td>
<td>100</td>
</tr>
<tr>
<td>1x ETF</td>
<td>40</td>
<td>32</td>
<td>40</td>
</tr>
<tr>
<td>-1x ETF</td>
<td>40</td>
<td>48</td>
<td>36</td>
</tr>
<tr>
<td>2x ETF</td>
<td>40</td>
<td>24</td>
<td>36</td>
</tr>
<tr>
<td>-2x ETF</td>
<td>40</td>
<td>56</td>
<td>28</td>
</tr>
</tbody>
</table>
<p>As you can see, all our ETFs lost money, except the 1x version. Conspicuously, the -1x and 2x ETF lost the same amount, but the -2x ETF lost more. If we had reversed the order of changes (100 to 125 to 100), we would see the same results.</p>
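<p>The table is easy to reproduce. Here is a small Python sketch that applies each fund&#8217;s multiplier to the index&#8217;s percent change at every step. It assumes an idealized fund with perfect tracking and no fees, and the name <code>track</code> is just for illustration.</p>

```python
def track(index_path, f, start=40.0):
    """Value of an f-times ETF after each step of the index path."""
    values = [start]
    for prev, cur in zip(index_path, index_path[1:]):
        change = (cur - prev) / prev  # the index's percent change this step
        values.append(values[-1] * (1 + f * change))
    return values


if __name__ == "__main__":
    for f in (1, -1, 2, -2):
        print("%+dx" % f, [round(v, 2) for v in track([100, 80, 100], f)])
```

<p>Feeding it the reversed path <code>[100, 125, 100]</code> gives different middle values but ends at the same final ones.</p>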
<p>So I set out to analyze the mathematics behind all this. Let&#8217;s start with some equations describing our targeted fund, the index. I&#8217;ll call its initial value T. Our intermediate value will be T&#8217;. Our final value will be T&#8221;—but of course this is equal to T. Let&#8217;s call the first percent change d, and the second percent change d&#8217;. That gives us these equations:</p>
<blockquote><p>T&#8217; = T + dT<br />
T = T&#8221; = T&#8217; + d&#8217;T&#8217;</p></blockquote>
<p>If we substitute T + dT for T&#8217; and solve for d&#8217;, we get:</p>
<blockquote><p>T = (T + dT) + d&#8217;(T + dT)<br />
0 = dT + d&#8217;(T + dT)<br />
-dT / (T + dT) = d&#8217;<br />
d&#8217; = -d / (1 + d)</p></blockquote>
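<p>As a quick check of that result, the sketch below computes the return move for a given first move. The function name is made up; the formula is the one just derived.</p>

```python
def return_change(d):
    """Percent change d' that undoes an earlier change d: d' = -d / (1 + d)."""
    return -d / (1 + d)


if __name__ == "__main__":
    # A 20% drop needs a 25% rise to get back, and vice versa.
    print(round(return_change(-0.20), 4))
    print(round(return_change(0.25), 4))
```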
<p>Now let&#8217;s write out the equations for our derived fund, D (the ETF). We&#8217;ll use D, D&#8217;, D&#8221; to correspond with T, T&#8217;, T&#8221;. The multiplier of our ETF will be f. So in a 1x ETF, f = 1, but in a -2x ETF, f = -2. That gives us these equations:</p>
<blockquote><p>D&#8217; = D + fdD<br />
D&#8221; = D&#8217; + fd&#8217;D&#8217;</p></blockquote>
<p>Substituting, we get:</p>
<blockquote><p>D&#8221; = (D + fdD) + f * (-d/(1+d)) * (D + fdD)</p></blockquote>
<p>So far so good, but I&#8217;m really interested in the percent lost for each up-and-down cycle. The absolute loss is D &#8211; D&#8221;, so the percent loss is (D &#8211; D&#8221;) / D. Let&#8217;s call this L. That gives us:</p>
<blockquote><p>L = (D &#8211; D&#8221;) / D<br />
L = (D &#8211; ((D + fdD) + f * (-d/(1+d)) * (D + fdD))) / D<br />
L = (D &#8211; D &#8211; fdD &#8211; f * (-d/(1+d)) * (D + fdD)) / D<br />
L = -fd &#8211; f * (-d/(1+d)) * (1 + fd)<br />
L = fd/(1+d) * (1 + fd) &#8211; fd<br />
L = fd(1+fd) / (1+d) &#8211; fd(1+d) / (1+d)<br />
L = (fd(1+fd) &#8211; fd(1+d)) / (1+d)<br />
L = fd(1+fd-1-d) / (1+d)<br />
L = fd(fd-d) / (1+d)<br />
L = fd<sup>2</sup>(f-1) / (1+d)<br />
L = f(f-1) * d<sup>2</sup> / (1+d)</p></blockquote>
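<p>Since the algebra has a lot of steps, it is worth checking the closed form against a direct calculation. This Python sketch does the round trip step by step and compares it with f(f-1) * d<sup>2</sup> / (1+d). The function names are illustrative only.</p>

```python
def cycle_loss(f, d):
    """Closed form from the derivation: L = f(f-1) * d^2 / (1+d)."""
    return f * (f - 1) * d * d / (1 + d)


def simulated_loss(f, d, start=1.0):
    """Same loss, computed by walking the fund through both moves."""
    d_back = -d / (1 + d)            # the move that returns the index to T
    mid = start * (1 + f * d)        # D' = D + fdD
    end = mid * (1 + f * d_back)     # D'' = D' + fd'D'
    return (start - end) / start


if __name__ == "__main__":
    for f in (1, -1, 2, -2):
        print(f, round(cycle_loss(f, -0.2), 6), round(simulated_loss(f, -0.2), 6))
```

<p>With d = -0.2 this gives losses of 0, 10%, 10%, and 30% for the 1x, -1x, 2x, and -2x funds, matching the drop from 40 to 40, 36, 36, and 28 in the earlier table.</p>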
<p>That final equation tells us that the loss depends on both f and d, through the factors f(f-1) and d<sup>2</sup>/(1+d). Well, d is kind of obvious: the more our underlying index changes, the more the ETF will change. But f is quite interesting. The more leveraged we are, the more we lose on each cycle. Although neither factor is as simple to write out as we might like, consider this. For positive f, f(f-1) is almost f*f, or f<sup>2</sup>:</p>
<blockquote><p>f*(f-1) &lt; f*f<br />
f*(f-1) &lt; f<sup>2</sup></p></blockquote>
<p>The effect of f is almost a square. We could imagine this as f<sup>1.9</sup>, although that isn&#8217;t quite right. For an inverse fund (negative f) the inequality flips, and f(f-1) is a little <em>more</em> than f<sup>2</sup>: for f = -1 it is 2, not 1.</p>
<p>At first glance, it seems we could make a similar simplification with d, since d<sup>2</sup>/(d+1) is a little less than d<sup>2</sup>/d:</p>
<blockquote><p>d<sup>2</sup>/(d+1) &lt; d<sup>2</sup>/d<br />
d<sup>2</sup>/(d+1) &lt; d</p></blockquote>
<p>So d&#8217;s effect would be a little less than d: sort of like d<sup>0.9</sup>. But this is misleading, because d<sup>2</sup>/(d+1) only approximates d for large values, and our d will (probably) never go above 1. In fact it&#8217;s only interesting for values like 0.01 or 0.1 or maybe (gulp) 0.5. For values that small, 1+d is close to 1, so the expression behaves more like d<sup>2</sup>. At that range, we get values like:</p>
<table>
<tbody>
<tr>
<th>d</th>
<th>d<sup>2</sup>/(d+1)</th>
</tr>
<tr>
<td>0.01</td>
<td>0.00010</td>
</tr>
<tr>
<td>0.02</td>
<td>0.00039</td>
</tr>
<tr>
<td>0.03</td>
<td>0.00087</td>
</tr>
<tr>
<td>0.04</td>
<td>0.00154</td>
</tr>
<tr>
<td>0.05</td>
<td>0.00238</td>
</tr>
<tr>
<td>0.10</td>
<td>0.00909</td>
</tr>
<tr>
<td>0.20</td>
<td>0.03333</td>
</tr>
<tr>
<td>0.30</td>
<td>0.06923</td>
</tr>
<tr>
<td>0.40</td>
<td>0.11429</td>
</tr>
<tr>
<td>0.50</td>
<td>0.16667</td>
</tr>
</tbody>
</table>
<p>The graph looks like this:</p>
<p><a href="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline.jpg"><img class="alignnone size-full wp-image-112" title="etf_decline" src="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline.jpg?w=600" alt=""   /></a></p>
<p>Or at close range, like this:</p>
<p><a href="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_closeup.jpg"><img class="alignnone size-full wp-image-113" title="etf_decline_closeup" src="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_closeup.jpg?w=600" alt=""   /></a></p>
<p>All this means that comparatively speaking, f has a greater effect than d.</p>
<p>We could chart f&#8217;s effect for different levels of leverage:</p>
<table>
<tbody>
<tr>
<th>f</th>
<th>f(f-1)</th>
</tr>
<tr>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>6</td>
</tr>
<tr>
<td>4</td>
<td>12</td>
</tr>
<tr>
<td>5</td>
<td>20</td>
</tr>
<tr>
<td>6</td>
<td>30</td>
</tr>
<tr>
<td>-1</td>
<td>2</td>
</tr>
<tr>
<td>-2</td>
<td>6</td>
</tr>
<tr>
<td>-3</td>
<td>12</td>
</tr>
<tr>
<td>-4</td>
<td>20</td>
</tr>
<tr>
<td>-5</td>
<td>30</td>
</tr>
</tbody>
</table>
<p>In other words, each &#8220;step&#8221; up in leverage increases your loss, and going inverse counts as one additional &#8220;step.&#8221;</p>
<p>By multiplying these values with those from d&#8217;s table, you can see your loss for each cycle. For a 2x or -1x ETF, you lose about 2% of your investment for each 10% cycle, or 0.02% for each 1% cycle. For a -2x ETF, you lose about 6% for a 10% cycle or 0.06% for a 1% cycle. Best not to stay in these investments for too long!</p>
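<p>To see how these per-cycle losses add up, here is a rough Python sketch that repeats the same cycle many times. The compounding step is my own assumption: it pretends every cycle has the same size d and that the fund tracks perfectly with no fees, which real markets will not do.</p>

```python
def cycle_loss(f, d):
    """Fractional loss for one round trip: f(f-1) * d^2 / (1+d)."""
    return f * (f - 1) * d * d / (1 + d)


def value_after(cycles, f, d, start=1.0):
    """Value left after repeating the same round trip `cycles` times."""
    return start * (1 - cycle_loss(f, d)) ** cycles


if __name__ == "__main__":
    for f in (2, -1, -2):
        for d in (0.01, 0.10):
            print("f=%+d d=%.2f loss/cycle=%.4f%% left after 20 cycles=%.4f"
                  % (f, d, 100 * cycle_loss(f, d), value_after(20, f, d)))
```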
<p>The next question is: how long is right? I guess you could look at the last few years to count the number of cycles, and try to figure out an ETF&#8217;s theoretical decline if the market had no net change. But the conventional wisdom answer seems to be about a week, and this sounds right to me. You&#8217;re really betting on the direction of the next move, and if you get that wrong, it&#8217;s going to cost you. So to make money with these ETFs, you need to get not just the direction right, but the timing, too. It&#8217;s hard enough just to get the direction! This need to be right about timing makes them risky for the same reason options are risky: your bet only has so long to play out.</p>
<p>Now here is another thought. Take a look at a graph of f(f-1):</p>
<p><a href="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_f.png"><img class="alignnone size-full wp-image-115" title="etf_decline_f" src="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_f.png?w=600" alt=""   /></a></p>
<p>As you can see, between 0 and 1, this graph dips below 0. That means that if you had a 0.5x ETF, say, you would actually <em>make</em> money over time. I wonder if there are practical reasons to prevent this, or if the return is just too small, so you&#8217;d do better putting your money in a bank. Anyway, it&#8217;s an interesting thought. . . .</p>
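<p>A worked example makes the point: take a hypothetical 0.5x fund starting at 40 and run it through the same 100 to 80 to 100 cycle used earlier. The sketch below assumes such a fund exists and tracks perfectly; the function name is made up.</p>

```python
def round_trip(f, d, start=40.0):
    """Fund value after the index moves by d and then returns to its start."""
    d_back = -d / (1 + d)  # the move that brings the index back
    return start * (1 + f * d) * (1 + f * d_back)


if __name__ == "__main__":
    # 40 falls 10% to 36, then rises 12.5%.
    print(round(round_trip(0.5, -0.2), 4))  # 40.5
```

<p>The fund ends slightly above where it started, a gain of 1.25% on the cycle, which agrees with f(f-1) = -0.25 times d<sup>2</sup>/(1+d) = 0.05.</p>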
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/02/05/decline-in-inverse-and-leveraged-etfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>

		<media:content url="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline.jpg" medium="image">
			<media:title type="html">etf_decline</media:title>
		</media:content>

		<media:content url="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_closeup.jpg" medium="image">
			<media:title type="html">etf_decline_closeup</media:title>
		</media:content>

		<media:content url="http://programmersnotebook.files.wordpress.com/2010/02/etf_decline_f.png" medium="image">
			<media:title type="html">etf_decline_f</media:title>
		</media:content>
	</item>
		<item>
		<title>Variation on C Password Dictionary Variator</title>
		<link>http://programmersnotebook.wordpress.com/2010/02/03/variation-on-c-password-dictionary-variator/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/02/03/variation-on-c-password-dictionary-variator/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 01:29:34 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=85</guid>
		<description><![CDATA[So I was thinking, since my password app generates too many possibilities, what if we limited its output so that any given letter was translated into the same variant each time it appears in the word? In other words, if your word was &#8220;aa&#8221; and your leet.txt file had &#8220;a A @,&#8221; then you&#8217;d get [...]]]></description>
			<content:encoded><![CDATA[<p>So I was thinking, since my password app generates too many possibilities, what if we limited its output so that any given letter was translated into the same variant each time it appears in the word? In other words, if your word was &#8220;aa&#8221; and your leet.txt file had &#8220;a A @,&#8221; then you&#8217;d get just &#8220;aa,&#8221; &#8220;AA,&#8221; and &#8220;@@,&#8221; not &#8220;aA,&#8221; &#8220;a@,&#8221; &#8220;Aa,&#8221; etc. So I made that change, and it seems so reasonable, I made it the default. To get all combinations, use the &#8220;-a&#8221; option. Here is the code:</p>
<pre>
<code>
/**
 * pw-vary.c - implements pw-vary, a program to obfuscate passwords by replacing letters with leet.
 *
 * Makes a lot of assumptions about limits. Also assumes ASCII strings. Pretty lazy!
 *
 * Copyright (c) 2009 by Paul A Jungwirth
 */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

#define LINE_LEN		1024
#define MAX_VARIANTS	20
#define LONGEST_RESULT	4092

#define is_whitespace(c) (c == ' ' || c == '\t' || c == '\n' || c == '\r')
#define set_or_die(v, s) if (!v) { perror(s); exit(EXIT_FAILURE); }
#define unset_or_die(v, s) if (v) { perror(s); exit(EXIT_FAILURE); }

#ifdef DEBUG
#define DEBUGGING(s) s;
#else
#define DEBUGGING(s)
#endif

typedef struct t_variant {
	char letter;
	char *variants[MAX_VARIANTS];
	int count;
	char line[LINE_LEN];
	int maxlen;
} t_variant;

static t_variant *init_variant(char letter) {
	t_variant* ret;

	// calloc zeroes the struct, so unused variants[] slots are NULL
	ret = (t_variant*)calloc(1, sizeof(t_variant));
	set_or_die(ret, NULL);

	ret-&gt;letter = letter;
	ret-&gt;count = 1;
	ret-&gt;maxlen = 1;
	ret-&gt;variants[0] = ret-&gt;line;

	return ret;
}

static char *find_line_start(char *line) {
	int i = 0;
	char c;

	c = line[i];
	while (c) {
		if (!is_whitespace(c)) return &amp;line[i];
		// advance first, then read, so c always matches line[i]
		i++;
		c = line[i];
	}

	return NULL;
}

static void read_leet(t_variant **variants, FILE *f) {
	char line[LINE_LEN];
	char *line_start;
	int len;
	int i, j;
	char first_letter, c;
	int in_whitespace;
	t_variant *vs;

	while (fgets(line, LINE_LEN, f)) {
		// strip leading white space and check for contents
		line_start = find_line_start(line);
		if (!line_start) continue;

		// Create a new entry in variants for the letter.
		first_letter = line_start[0];
		variants[first_letter] = init_variant(first_letter);
		vs = variants[first_letter];

		// List all the possible variants.
		strncpy(vs-&gt;line, line_start, LINE_LEN);
		line_start = vs-&gt;line;
		len = strlen(line_start);
		i = j = 1;
		in_whitespace = 1;
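		while (i &lt; len &amp;&amp; j &lt; MAX_VARIANTS) {
			c = line_start[i];
			if (is_whitespace(c)) {
				// terminate the previous variant in place
				line_start[i] = '\0';
				in_whitespace = 1;
			} else {
				if (in_whitespace) {
					// first char after a gap starts a new variant
					vs-&gt;variants[j] = &amp;line_start[i];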
					vs-&gt;count++;
					in_whitespace = 0;
					j++;
				}
			}
			i++;
		}

		// find the longest variant for this letter:
		for (i = 0; vs-&gt;variants[i]; i++) {
			len = strlen(vs-&gt;variants[i]);
			if (len &gt; vs-&gt;maxlen) vs-&gt;maxlen = len;
		}
	}
	unset_or_die(ferror(f), "reading leet file");
}

/*@unused@*/
static void print_variants(t_variant **variants) {
	int i, j;

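	for (i = 0; i &lt; 256; i++) {	// table size assumed: one slot per char value
		if (variants[i]) {
			j = 0;
			while (variants[i]-&gt;variants[j]) {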
				puts(variants[i]-&gt;variants[j]);
				j++;
			}
			puts("====");
		}
	}
}

/**
 * do_word_all and do_word_selective:
 * Each iterates through all the variant combinations for the given word.
 *
 * do_word_all does this with two arrays, each the same length as the
 * original string:
 *   - maxes holds the number of variants for each letter in the string.
 *   - indices holds the current variant number to be printed. We keep
 *     incrementing this array, "carrying" when necessary, just like
 *     counting.
 * This algorithm tests all possible combinations.
 *
 * do_word_selective uses a slightly more complex variation:
 * instead of testing all combinations, we assume that all instances of a
 * given letter are written with the same variant, cutting down on the
 * number of combinations to print. In this version, we rely on three arrays:
 *   - maxes &amp; indices are only as long as the number of *unique* letters.
 *     We use them for "counting" just as before.
 *   - poses has one entry for each char in the string, and it stores
 *     an int for indexing into the two other arrays.
 */

static void do_word_selective(char *buffer, t_variant **variants, char *orig) {
	char c;
	char *end;
	int i, j, k;
	int len, newlen, len2, maxes_len;
	int *maxes;
	int *indices;
	int *poses;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '\0';
		// puts(orig);

		// initialize maxes &amp; indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);
		poses = (int*)calloc(len, sizeof(int));
		set_or_die(poses, NULL);

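		newlen = 1; // start with 1 for the terminating '\0'.
		j = 0;
		for (i = 0; i &lt; len; i++) {
			c = orig[i];
			// have we seen this letter earlier in the word?
			for (k = 0; k &lt; i; k++) {
				if (orig[k] == c) break;
			}
			if (k == i) {
				// first occurrence: give the letter its own slot
				vs = variants[(unsigned char)c];
				if (vs) {
					maxes[j] = vs-&gt;count;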
				} else {
					maxes[j] = 1;
				}
				poses[i] = j;
				j++;
			} else {
				// point wherever the first letter points
				poses[i] = poses[k];
			}
			newlen += maxes[poses[i]];
		}
		maxes_len = j;

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen &gt;= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j &gt;= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '\0';
				for (i = 0; i &lt; len; i++) {
					vs = variants[(unsigned char)orig[i]];
					if (vs) {
						strcat(buffer, vs-&gt;variants[indices[poses[i]]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '\0';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = maxes_len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 &amp;&amp; j-- &gt; 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
		free(poses);
	}
}

static void do_word_all(char *buffer, t_variant **variants, char *orig) {
	char *end;
	int i, j;
	int len, newlen, len2;
	int *maxes;
	int *indices;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '\0';
		// puts(orig);

		// initialize maxes &amp; indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);

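		newlen = 1; // start with 1 for the terminating '\0'.
		for (i = 0; i &lt; len; i++) {
			vs = variants[(unsigned char)orig[i]];
			if (vs) {
				maxes[i] = vs-&gt;count;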
				newlen += vs-&gt;maxlen;
			} else {
				maxes[i] = 1;
				newlen++;
			}
		}

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen &gt;= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j &gt;= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '\0';
				for (i = 0; i &lt; len; i++) {
					vs = variants[(unsigned char)orig[i]];
					if (vs) {
						strcat(buffer, vs-&gt;variants[indices[i]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '\0';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 &amp;&amp; j-- &gt; 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
	}
}

static void do_all_words(t_variant **variants, FILE *f, int do_all) {
	char line[1024];
	char buffer[LONGEST_RESULT];

	while (fgets(line, 1024, f)) {
		if (do_all) do_word_all(buffer, variants, line);
		else do_word_selective(buffer, variants, line);
	}
	unset_or_die(ferror(f), "reading file");
}

int main(int argc, char **argv) {
	t_variant *variants[255];		/* 255 pointers, each to an array of 20 pointers to strings. */
	FILE *f;
	int i;
	int arg_start, do_all;

	// Argument processing:
	if (argc &gt; 1 &amp;&amp; (strcmp(argv[1], "-a") == 0)) {
		do_all = 1;
		arg_start = 2;
	} else {
		do_all = 0;
		arg_start = 1;
	}

	bzero(variants, sizeof(variants));

	f = fopen("leet.txt", "r");
	set_or_die(f, "opening leet file");
	read_leet(variants, f);
	unset_or_die(fclose(f), "closing leet file");
	DEBUGGING(print_variants(variants));

	// read words from argv or, if no args, from stdin
	if (argc &gt; arg_start) {
		for (i = arg_start; i &lt; argc; i++) {
			f = fopen(argv[i], &quot;r&quot;);
			set_or_die(f, argv[i]);
			do_all_words(variants, f, do_all);
			unset_or_die(fclose(f), argv[i]);
		}
	} else {
		do_all_words(variants, stdin, do_all);
	}

	return EXIT_SUCCESS;
}
</code>
</pre>
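<p>The <code>poses</code> bookkeeping described in the comment above <code>do_word_selective</code> can be looked at on its own. What follows is only a minimal sketch of one way to build that mapping: <code>build_poses</code> is a hypothetical helper invented for this illustration and is not part of the program, and it uses a simple quadratic scan for earlier occurrences of each letter.</p>
<pre>
<code>
#include &lt;assert.h&gt;
#include &lt;string.h&gt;

/*
 * Fill poses[] so that poses[i] is the slot number shared by every
 * occurrence of word[i]. Returns the number of unique letters.
 * (build_poses is a made-up name for this sketch only.)
 */
static int build_poses(const char *word, int *poses) {
	int len, i, k, unique = 0;

	assert(word != NULL &amp;&amp; poses != NULL);
	len = (int)strlen(word);
	for (i = 0; i &lt; len; i++) {
		/* look for an earlier occurrence of the same letter */
		for (k = 0; k &lt; i; k++) {
			if (word[k] == word[i]) break;
		}
		if (k == i) poses[i] = unique++;	/* first time we see it */
		else poses[i] = poses[k];		/* reuse the earlier slot */
	}
	return unique;
}
</code>
</pre>
<p>For the word &#8220;hello&#8221; this fills <code>poses</code> with 0, 1, 2, 2, 3 and returns 4, so both l&#8217;s share one slot and always get the same variant.</p>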
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/02/03/variation-on-c-password-dictionary-variator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
		<item>
		<title>C Version of Password Dictionary Variator</title>
		<link>http://programmersnotebook.wordpress.com/2010/02/03/c-version-of-password-dictionary-variator/</link>
		<comments>http://programmersnotebook.wordpress.com/2010/02/03/c-version-of-password-dictionary-variator/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 00:28:20 +0000</pubDate>
		<dc:creator>pjungwir</dc:creator>
				<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://programmersnotebook.wordpress.com/?p=79</guid>
		<description><![CDATA[I wrote a program to expand password dictionaries the other day, but due to its recursive algorithm it choked on words greater than 11 characters long, such as &#8220;electroencephalograph&#8217;s,&#8221; the longest word in my /usr/share/dict/words. So now I&#8217;ve tweaked the algorithm and also reimplemented the whole thing in C. It&#8217;s not real pretty C, but [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote a program to <a href="http://programmersnotebook.wordpress.com/2010/01/31/password-dictionary-variations/">expand password dictionaries</a> the other day, but due to its recursive algorithm it choked on words greater than 11 characters long, such as &#8220;electroencephalograph&#8217;s,&#8221; the longest word in my /usr/share/dict/words. So now I&#8217;ve tweaked the algorithm and also reimplemented the whole thing in C. It&#8217;s not real pretty C, but I tried to aim more for performance. It doesn&#8217;t crash on &#8220;electroencephalograph&#8217;s&#8221; anymore, but generating just the combinations for that one word (all 61,917,364,224 of them) took 43,499 seconds of CPU time (42653.61 user, 846.95 system). And that&#8217;s not even the time-consuming part! Just imagine running that many combinations through aircrack, all to test a single word.</p>
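<p>The non-recursive approach boils down to counting like an odometer: keep one index per letter, bump the rightmost one, and carry to the left whenever an index wraps around. Here is a minimal sketch of just that idea, separate from the real program. The names <code>next_combination</code> and <code>count_combinations</code> are made up for this illustration, and the 16-position limit is an arbitrary assumption to keep the sketch short.</p>
<pre>
<code>
#include &lt;assert.h&gt;
#include &lt;stdio.h&gt;

/*
 * Advance indices[] to the next combination, odometer-style.
 * maxes[i] is the number of choices at position i (each at least 1).
 * Returns 1 if a new combination is available, 0 once every position
 * has wrapped back to zero, i.e. all combinations have been visited.
 */
static int next_combination(int *indices, const int *maxes, int n) {
	int j;

	assert(n &gt; 0);
	for (j = n - 1; j &gt;= 0; j--) {
		indices[j] = (indices[j] + 1) % maxes[j];
		if (indices[j] != 0) return 1;	/* no carry needed */
	}
	return 0;	/* carried off the left end */
}

/* Count the combinations by stepping through them one at a time. */
static long count_combinations(const int *maxes, int n) {
	int indices[16] = {0};	/* sketch-only limit: n must be at most 16 */
	long count = 0;

	assert(n &lt;= 16);
	do {
		count++;
	} while (next_combination(indices, maxes, n));
	return count;
}
</code>
</pre>
<p>With <code>maxes</code> of 2, 3, 1 and 4, <code>count_combinations</code> returns 24, the product of the per-letter counts. That product is why long words explode. At the timing quoted above, the run worked out to roughly 1.4 million combinations per CPU second.</p>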
<p>Anyway, here is the code:</p>
<pre>
<code>
/**
 * pw-vary.c - implements pw-vary, a program to obfuscate passwords by replacing letters with leet.
 *
 * Makes a lot of assumptions about limits. Also assumes ASCII strings. Pretty lazy!
 *
 * Copyright (c) 2009 by Paul A Jungwirth
 */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

#define LINE_LEN	1024
#define MAX_VARIANTS	20
#define LONGEST_RESULT	4092

#define is_whitespace(c) (c == ' ' || c == '\t' || c == '\n' || c == '\r')
#define set_or_die(v, s) if (!v) { perror(s); exit(EXIT_FAILURE); }
#define unset_or_die(v, s) if (v) { perror(s); exit(EXIT_FAILURE); }

#ifdef DEBUG
#define DEBUGGING(s) s;
#else
#define DEBUGGING(s)
#endif

typedef struct t_variant {
	char letter;
	char *variants[MAX_VARIANTS];
	int count;
	char line[LINE_LEN];
	int maxlen;
} t_variant;

static t_variant *init_variant(char letter) {
	t_variant* ret;

	// calloc zeroes the struct, so unused variants[] slots are NULL
	ret = (t_variant*)calloc(1, sizeof(t_variant));
	set_or_die(ret, NULL);

	ret-&gt;letter = letter;
	ret-&gt;count = 1;
	ret-&gt;maxlen = 1;
	ret-&gt;variants[0] = ret-&gt;line;

	return ret;
}

static char *find_line_start(char *line) {
	int i = 0;
	char c;

	c = line[i];
	while (c) {
		if (!is_whitespace(c)) return &amp;line[i];
		i++;
		c = line[i];
	}

	return NULL;
}

static void read_leet(t_variant **variants, FILE *f) {
	char line[LINE_LEN];
	char *line_start;
	int len;
	int i, j;
	char first_letter, c;
	int in_whitespace;
	t_variant *vs;

	while (fgets(line, LINE_LEN, f)) {
		// strip leading white space and check for contents
		line_start = find_line_start(line);
		if (!line_start) continue;

		// Create a new entry in variants for the letter.
		first_letter = line_start[0];
		variants[first_letter] = init_variant(first_letter);
		vs = variants[first_letter];

		// List all the possible variants.
		strncpy(vs-&gt;line, line_start, LINE_LEN);
		line_start = vs-&gt;line;
		len = strlen(line_start);
		i = j = 1;
		in_whitespace = 1;
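		while (i &lt; len &amp;&amp; j &lt; MAX_VARIANTS) {
			c = line_start[i];
			if (is_whitespace(c)) {
				// terminate the previous variant
				line_start[i] = '\0';
				in_whitespace = 1;
			} else {
				if (in_whitespace) {
					// first char of a new variant
					vs-&gt;variants[j] = &amp;line_start[i];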
					vs-&gt;count++;
					in_whitespace = 0;
					j++;
				}
			}
			i++;
		}

		// find the longest variant for this letter:
		for (i = 0; vs-&gt;variants[i]; i++) {
			len = strlen(vs-&gt;variants[i]);
			if (len &gt; vs-&gt;maxlen) vs-&gt;maxlen = len;
		}
	}
	unset_or_die(ferror(f), "reading leet file");
}

/*@unused@*/
static void print_variants(t_variant **variants) {
	int i, j;

	for (i = 0; i &lt; 255; i++) {
		if (variants[i]) {
			j = 0;
			while (j &lt; MAX_VARIANTS &amp;&amp; variants[i]-&gt;variants[j]) {
				puts(variants[i]-&gt;variants[j]);
				j++;
			}
			puts("====");
		}
	}
}

static void do_word(char *buffer, t_variant **variants, char *orig) {
	char *end;
	int i, j;
	int len, newlen, len2;
	int *maxes;
	int *indices;
	t_variant *vs;

	orig = find_line_start(orig);
	if (orig) {
		end = strpbrk(orig, "\n\r");
		if (end) end[0] = '\0';
		// puts(orig);

		// initialize maxes &amp; indices
		// TODO: pass these as buffers to speed things up
		len = strlen(orig);
		maxes = (int*)malloc(sizeof(int) * len);
		set_or_die(maxes, NULL);
		indices = (int*)calloc(len, sizeof(int));	// start with all zeroes.
		set_or_die(indices, NULL);

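		newlen = 1; // start with 1 for the terminating '\0'.
		for (i = 0; i &lt; len; i++) {
			vs = variants[(unsigned char)orig[i]];
			if (vs) {
				maxes[i] = vs-&gt;count;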
				newlen += vs-&gt;maxlen;
			} else {
				maxes[i] = 1;
				newlen++;
			}
		}

		// don't proceed if newlen is greater than LONGEST_RESULT (improbable)
		if (newlen &gt;= LONGEST_RESULT) {
			fprintf(stderr, "Result for \"%s\" too long: %d characters\n", orig, newlen);
		} else {
			j = 0;
			while (j &gt;= 0) {
				// Construct and print the string using our current position, represented by indices.
				buffer[0] = '\0';
				for (i = 0; i &lt; len; i++) {
					vs = variants[(unsigned char)orig[i]];
					if (vs) {
						strcat(buffer, vs-&gt;variants[indices[i]]);
					} else {
						len2 = strlen(buffer);
						buffer[len2] = orig[i];
						buffer[len2 + 1] = '\0';
					}
				}
				puts(buffer);

				// Now add one to our position.
				j = len - 1;
				do {
					indices[j] = (indices[j] + 1) % maxes[j];
				} while (indices[j] == 0 &amp;&amp; j-- &gt; 0); // short-circuit: j only decremented when we carry

				// when j is -1, we've overflowed; hence we've printed all combinations.
			}
		}

		free(maxes);
		free(indices);
	}
}

static void do_all_words(t_variant **variants, FILE *f) {
	char line[1024];
	char buffer[LONGEST_RESULT];

	while (fgets(line, 1024, f)) {
		do_word(buffer, variants, line);
	}
	unset_or_die(ferror(f), "reading file");
}

int main(int argc, char **argv) {
	t_variant *variants[255];		/* 255 pointers, each to an array of 20 pointers to strings. */
	FILE *f;
	int i;

	bzero(variants, sizeof(variants));

	f = fopen("leet.txt", "r");
	set_or_die(f, "opening leet file");
	read_leet(variants, f);
	unset_or_die(fclose(f), "closing leet file");
	DEBUGGING(print_variants(variants));

	// read words from argv or, if no args, from stdin
	if (argc &gt; 1) {
		for (i = 1; i &lt; argc; i++) {
			f = fopen(argv[i], &quot;r&quot;);
			set_or_die(f, argv[i]);
			do_all_words(variants, f);
			unset_or_die(fclose(f), argv[i]);
		}
	} else {
		do_all_words(variants, stdin);
	}

	return EXIT_SUCCESS;
}
</code>
</pre>
]]></content:encoded>
			<wfw:commentRss>http://programmersnotebook.wordpress.com/2010/02/03/c-version-of-password-dictionary-variator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">pjungwir</media:title>
		</media:content>
	</item>
	</channel>
</rss>
