Showing posts with label C#. Show all posts
Showing posts with label C#. Show all posts

Wednesday, 23 February 2011

Reverse the polarity of a stream using co-streams

This post
is about

C#

Motivation

There is one very useful thing about streams: you can chain them. You can implement a stream that takes an “underlying” stream, and you end up with a chain of streams that process data like as if it were going through a pipeline.

However, there is a fundamental limitation to this. All the streams need to be either all reading, or all writing. If you have a stream that can only pass data you write to an underlying writable stream, you’re stuffed if you actually want to read the generated data instead.

Specific example: System.IO.Compression.GZipStream. Inexplicably, if you want to compress to gzip, you can only get a write-only stream, and if you want to uncompress from gzip, you can only get a read-only stream. What if you want to read an uncompressed file but compress the data while you’re reading it? GZipStream doesn’t let you do that. So I needed a way to “reverse the polarity” of a stream.

Enter co-streams

I invented this term, so you won’t find it on Google. The idea is very simple. You write two methods that each take a stream. One of them writes the gzipped data to the provided stream (this is almost boilerplate code):

public static void Write(Stream stream)
{
    using (var gzip = new GZipStream(stream, CompressionMode.Compress))
    using (var f = File.Open(@"...", FileMode.Open))
    {
        var buffer = new byte[65536];
        int bytesRead;
        while ((bytesRead = f.Read(buffer, 0, 65536)) > 0)
            gzip.Write(buffer, 0, bytesRead);
    }
}

... and the other one reads from a stream and processes it as if it were expecting to get gzipped data from the stream. It acts as if the provided stream were a gzip compress stream with its polarity reversed:

public static void Read(Stream stream)
{
    // Read from ‘stream’, and gzipped data will come out!
}

Then you put them together, and the magic happens behind the scenes:

Costreams.RunCostreams(Write, Read);

Of course, exactly the same thing works the other way around: If you have already-gzipped data which you want to write to a stream and have it decompressed, write it to the stream in Write() and use a GZipStream in the Read() method to do the decompressing.

Behind the scenes

Behind the scenes, the read and write methods are executed in parallel (in separate threads).

But before we do that, we instantiate two streams: a read-only stream and a write-only stream. We pass the read-only stream to the read method and the write-only stream to the write method.

The two stream objects share a queue of data between them. Every time the write method writes to the write-only stream, it populates that queue and notifies the read thread; when the read method reads from the read-only stream, it waits for that notification and then dequeues the data.

When the write method finishes, we add “null” to the queue to communicate to the read-only stream that this is the end of the stream (which it will then communicate to the read method by returning 0 from Read()). Then we wait for the read thread to return.

Cool things about co-streams

  • This solution maintains the generality of streams. You can use this to reverse the polarity of any stream.
  • I like the fact that the two methods run in parallel. This way they can do whatever computations they want and they get automatically somewhat parallelised.
  • The use of a queue means that the write thread never needs to wait for the read thread to consume the data. Instead, it just puts the data in the queue and lets the read thread get to it whenever it’s convenient. The only time there is need to wait is when the read thread needs data (obviously) and when the write thread finishes and the read thread is still processing.

Implementation

To save you from having to duplicate my implementation, here is the complete implementation.

Room for improvement

The Read() method in the read-only stream implementation is not very clever. If the writing thread writes small amounts of data at a time, the read-only stream will return the same small chunks at a time. It could be improved by consolidating several chunks whenever the provided buffer is large enough to fit them.

Friday, 12 March 2010

Dynamic method invocation without the TargetInvocationException wrapping

This post
is about

C#

Motivation

I find it useful that the Visual Studio debugger stops at the right place where an unhandled exception occurs. That makes it all the more annoying when it doesn’t. One of those cases where it doesn’t is when any of the methods on the call stack happen to have been called dynamically via Reflection. MethodBase.Invoke catches all exceptions thrown by the target method, wraps them in a TargetInvocationException, and then rethrows that. Consequently, the Visual Studio debugger stops at the place where the dynamic invocation happens, which means I can't see the context and call stack where the actual exception really happened.

I am, nota bene, aware of the fact that there is a good reason for having the TargetInvocationException wrapping. I am also aware of the fact that you can instruct Visual Studio, via “Exceptions” in the “Debug” menu, to stop the debugger when the exception happens irrespective of whether it is unhandled or not. Neither of those considerations mitigate the annoyance; the “Exceptions” dialog is very hard to use, especially for custom-defined exception types, which are not automatically listed — and even if it wasn’t, I simply shouldn’t have to do any of that because in most cases the fact that a method was invoked dynamically is irrelevant to the particular problem I might be debugging.

A partial solution

Any solution to this problem must not involve MethodBase.Invoke(). Nor, indeed, can it use Delegate.DynamicInvoke() because that performs the same exception wrapping. However, invoking a normal delegate of the right signature doesn’t do the wrapping, so we’ll start looking there.

public delegate string IntToStringDelegate(int integer);

public string InvokeMethod(MethodInfo method, object instance, int parameter)
{
    var delegate = (IntToStringDelegate) Delegate.CreateDelegate(
        typeof(IntToStringDelegate), instance, method);
    return delegate(parameter);
}

We’ve achieved part of our goal; we can now call any int-to-string method — even private ones! — and have the Visual Studio debugger report exceptions in the place where they occur. But the usefulness is limited by the fact that we can only call single-parameter methods that take an int and return a string. We can easily extend this to any single parameter type by declaring generic methods:

public void InvokeVoidMethod<TParam>(MethodInfo method, object instance,
        TParam parameter)
{
    var delegate = (Action<TParam>) Delegate.CreateDelegate(
        typeof(Action<TParam>), instance, method);
    delegate(parameter);
}

public TResult InvokeMethod<TParam, TResult>(MethodInfo method, object instance,
        TParam parameter)
{
    var delegate = (Func<TParam, TResult>) Delegate.CreateDelegate(
        typeof(Func<TParam, TResult>), instance, method);
    return delegate(parameter);
}

It is straightforward from here to extend this further by adding additional overloads that can take multiple parameters. But the attentive reader will surely see that this solution is still rather useless: it requires the caller to know the exact signature of the method at compile time. In most cases where you know the signature, you generally know the method itself, so you could actually just call it directly (unless, of course, it’s private). We want to cover the case where you know nothing about the method and all you have is an object[] containing the intended parameters for the method (which have the right types, but only at runtime), an object containing the instance on which you want to call the method, and the MethodInfo object describing the method to be called. In other words, we want to replace MethodInfo.Invoke() completely.

This poses several problems. We can’t use Func<> or Action<> because they only exist for up to four parameters. Of course you could declare more of them, but you can’t declare infinitely many of them. We also can’t invoke the delegate directly; the only method on the Delegate type that can take an object[] for the parameters is Delegate.DynamicInvoke(), which we know is unsuitable because it wraps our precious exceptions.

The full solution

We are going to create a whole new assembly. At runtime. We will dynamically declare a delegate type and we will write a method in IL. Sounds crazy? It sure is!

But let’s go slow. Let’s start from the beginning. What do we need? We know that we can’t invoke the delegate directly. We can’t even create the delegate directly. But we certainly need a method that we can invoke directly. Let’s think about what such a method would have to look like by creating an interface declaration. We certainly can’t make any assumptions about the parameter types, the return type, or the instance type on which we want to call the method. Everything has to be unspecific:

public interface IInvokeDirect
{
    object DoInvoke(object instance, object[] parameters, MethodInfo method);
}

Now that we have an interface, let’s think about what a class would have to look like that wants to implement this interface. Since we can theoretically have several different classes implementing this same interface, we can actually start making assumptions. So let’s examine an easy special case first. What would an implementation of this method have to look like if the method has one parameter of type “int” and a return type of “string” (like the IntToStringDelegate we mentioned earlier)? Answer: it would look almost exactly like the InvokeMethod() we had at the very beginning:

public delegate string IntToStringDelegate(int integer);

public class IntToStringInvoker : IInvokeDirect
{
    public object DoInvoke(object instance, object[] parameters, MethodInfo method)
    {
        var delegate = (IntToStringDelegate) Delegate.CreateDelegate(
            typeof(IntToStringDelegate), instance, method);
        return (string) delegate((int) parameters[0]);
    }
}

And now comes the magic. We don’t want to make any assumptions about the parameter types at compile time, but we certainly know the method parameter types at runtime (simply by looking at the MethodInfo object we’re given). Therefore, for any given MethodInfo object, we know exactly what the above delegate and class declarations would have to look like. We could, if we wanted to, generate the C# code at runtime. But that would be laborious: we’d have to make sure we output all the type identifiers in valid C# syntax (including arrays, generic types, etc.) and we’d have to make sure to include references to the client code that declares some of those types. So instead of generating C# code, we generate the actual delegate and class types directly — using System.Reflection.Emit.AssemblyBuilder. This works because the stuff in the System.Reflection.Emit namespace is clever enough to let us pass Type and MethodInfo objects into it that we don’t know anything about. Also, we can be more efficient by using only one assembly instead of generating a new assembly for each different method signature.

So our InvokeDirect() method will look roughly like this:

public static class DirectInvoker
{
    private static Dictionary<string, IInvokeDirect> invokers =
        new Dictionary<string, IInvokeDirect>();

    public static object InvokeDirect(MethodInfo method,
        object instance, params object[] parameters)
    {
        // Put some exception throw statements here to ensure that method != null,
        // method.GetParameters().Length == parameters.Length, etc.

        // Create a dynamic assembly, or re-use the last one
        createAssembly();

        // Generate a string that identifies the method "signature" (the parameter
        // types and return type, not the method name). Different methods which
        // have the same "signature" according to this criterion can re-use the
        // same generated code.
        var sig = string.Join(" : ", method.GetParameters()
            .Select(p => p.ParameterType.FullName)
            .Concat(new[] { method.ReturnType.FullName }).ToArray());

        if (!invokers.ContainsKey(sig))
        {
            // Create a delegate type compatible with the method signature
            var delegateType = createDelegateType(method, sig);

            // Create a class that implements IInvokeDirect
            var classType = createClassType(method, sig, delegateType);

            // Instantiate the new class and remember the instance for future calls
            invokers[sig] = (IInvokeDirect) Activator.CreateInstance(classType);
        }

        // Invoke the interface method, which will call the target method.
        return invokers[sig].DoInvoke(instance, parameters, method);
    }
}

I’ll skip the boring parts (createAssembly() and createDelegateType() are straightforward) and get straight to createClassType(), where we construct our method by emitting IL code. We have to write IL code that is equivalent to the DoInvoke() method above, so let’s first see what this method looks like in IL. We’ll actually look at the IL for a single-line version of the method that doesn’t use a local variable, and I’ll make it two parameters instead of one so we can more easily spot the repeating bit:

public object DoInvoke(object instance, object[] parameters, MethodInfo method)
{
    return ((IntToStringDelegate) Delegate.CreateDelegate(
        typeof(IntToStringDelegate), instance, method))
        (
            (int) parameters[0],
            (string) parameters[1]
        );
}

Here’s the IL. I’ve abbreviated the type names to make it more readable, and I’ve added colours to show which part of the above method each bit corresponds to. Also, notice that “delegate(parameters)” is C# syntactic sugar for “delegate.Invoke(parameters)”, hence the call to an “Invoke” method at the end. Finally, the following IL is for the Release build; if you compile in Debug mode, you’ll see a few redundant extra operations such as nop.

.method public hidebysig static object DoInvoke(
    object instance, object[] parameters, MethodInfo method)
    cil managed
{
    .maxstack 8
    L_0000: ldtoken IntToStringDelegate
    L_0005: call Type Type.GetTypeFromHandle(RuntimeTypeHandle)
    L_000a: ldarg.0
    L_000b: ldarg.2
    L_000c: call Delegate Delegate.CreateDelegate(Type, object, MethodInfo)
    L_0011: castclass IntToStringDelegate
    L_0016: ldarg.1
    L_0017: ldc.i4.0
    L_0018: ldelem.ref            // Notice this one uses unbox.any instead of castclass
    L_0019: unbox.any int32    // because int is a value type
    L_001e: ldarg.1
    L_001f: ldc.i4.1
    L_0020: ldelem.ref            // Notice this one uses castclass instead of unbox.any
    L_0021: castclass string     // because string is a reference type
    L_0026: callvirt string IntToStringDelegate.Invoke(int, string)
    L_002b: ret
}

You can straightforwardly translate this into a series of calls to ILGenerator.Emit(). The only gotchas you need to be aware of are to use unbox.any for value types and castclass otherwise; and to use ldc.i4.s and ldc.i4 appropriately to handle cases where you have more than 9 parameters. Finally, if the return type is void, you have to add an extra ldnull before the ret (just like in C# where you have to add an extra “return null” after the delegate call).

The complete code

To save you some frustration trying to get the boring bits to work, here is the complete implementation.

Caveat

This solution doesn’t work for invoking constructors. Delegate.CreateDelegate() expects a MethodInfo, not a MethodBase (ConstructorInfo derives from MethodBase). If you can think of a way to extend my solution that it would cover constructors too, please e-mail me.

Tuesday, 2 March 2010

Ensure code quality using a post-build event

This post
is about

C#

Motivation

While writing code, have you ever run into a situation where you got this uneasy feeling that there’s something you need to remember for later? Something where you need your code to remain consistent, but it’s not something the compiler can check for you? I’ll give an example so you know what I’m talking about. Suppose you declare an enum, something like this:

public enum Animal
{
   Cat,
   Sheep
}

And then you have a method that uses a switch statement on the enum, something like this:

public static string GetAnimalSound(Animal animal)
{
   switch (animal)
   {
       case Animal.Cat:
           return "meow";
       case Animal.Sheep:
           return "baah";
       default:
           throw new InvalidOperationException("Unrecognised animal: " + animal);
   }
}

Of course, you don’t want that InvalidOperationException to happen in your production code. But if you add a new value to your enum and forget to update that method, that is exactly what’s going to happen. And you knew this while writing the code; you knew that the compiler won’t remind you of this method when you add an enum value. In some simple cases, you can add a comment to the enum to remind yourself, but that still doesn’t stop you from simply missing that comment — and it breaks down if the enum is in a library and the method is in client code.

Solution

A more robust solution is to run an automated validity check and cause the build to fail if the check doesn’t pass. In other words, you cannot run the code while it is inconsistent. First, let’s write the validity check itself.

#if DEBUG
    private static void PostBuildCheck()
    {
        // Ensures that all Animal enum values are covered by Tools.GetAnimalSound()
        foreach (var animal in Enum.GetValues(typeof(Animal)))
            Tools.GetAnimalSound((Animal) animal);
    }
#endif

As you can see, this will simply call GetAnimalSound() on each Animal value and discard the result. The net effect is that the method will throw an InvalidOperationException if any of the enum values is not covered by the method; otherwise it returns normally.

Now we need something that calls this method. We don’t want to call the method directly; we want to be able to add PostBuildCheck() methods to various types as we see fit and not have to remember to enlist them somewhere. So we go through all types in the assembly and call all those methods via reflection:

#if DEBUG
    public static int RunPostBuildChecks(Assembly assembly)
    {
        bool anyError = false;
        foreach (var ty in assembly.GetTypes())
        {
            var meth = ty.GetMethod("PostBuildCheck", BindingFlags.NonPublic | BindingFlags.Static);
            if (meth != null && meth.GetParameters().Length == 0 && meth.ReturnType == typeof(void))
            {
                try
                {
                    meth.Invoke(null, null);
                }
                catch (Exception e)
                {
                    var realException = e;
                    while (realException is TargetInvocationException && realException.InnerException != null)
                    {
                        realException = realException.InnerException;
                    }
                    Console.Error.WriteLine(string.Format("Error: {0} ({1})",
                        realException.Message.Replace("\n", " ").Replace("\r", ""),
                        realException.GetType().FullName));
                    anyError = true;
                }
            }
        }
        return anyError ? 1 : 0;
    }
#endif

Things to note here:

  • We wrap all this code inside #if DEBUG. This way, the code will be absent from the Release build, lest all of you scream and shout that you’d be shipping redundant code.
  • The call to Invoke() has the habit of wrapping all exceptions in a TargetInvocationException. We unravel this to get to the real exception (in our example, the InvalidOperationException).
  • We output the error message to Console.Error and prefix it with the word "Error:". We will explain later why this is important.
  • We return 0 in case of success, 1 in case of failure. Maybe you can already guess where this is going.

Now obviously something needs to call this method. Add the following small piece of code to the beginning of your project’s Main() method:

#if DEBUG
    if (args.Length == 1 && args[0] == "--post-build-check")
        return RunPostBuildChecks(Assembly.GetExecutingAssembly());
#endif

And finally, edit your project properties, and on the Build Events tab, add the following post-build event command line:

    "$(TargetPath)" --post-build-check

Now add a value to the original enum and try to build your project. You should see an error message in the Error Window. The reason this works is because if the Main() method returns 0, the post-build event is considered to have succeeded; otherwise it is considered to have failed, and each line that was output to Console.Error is shown in the Error Window if it starts with "Error:" or "Warning:".

Enhancements

At the moment, the error message you get is not very useful. For one, it only says "Unrecognised animal", but it doesn’t tell you where the error occurred. The good news is, the original exception contains a stacktrace that we can use to find out where it was thrown. Instead of the simple call to Console.Error.WriteLine in RunPostBuildChecks(), use the following code:

    var stackTrace = new StackTrace(e, true);
    var stackFrame = stackTrace.GetFrame(0);
    Console.Error.WriteLine(string.Format("{0}({1},{2}): Error: {3} ({4})",
        stackFrame.GetFileName(),
        stackFrame.GetFileLineNumber(),
        stackFrame.GetFileColumnNumber(),
        realException.Message.Replace("\n", " ").Replace("\r", ""),
        realException.GetType().FullName));

This will cause your console output to look something like this:

Visual Studio correctly recognises this as a filename, line number and column number, allowing you to simply double-click on the error in the Error Window and taking you directly to where the exception occurred in the PostBuildCheck() method.

Advantages of this method

Here are some reasons why I like this idea:

  • You don’t have to create or maintain a new project. All the code is right there in the project itself.
  • Because it’s in the same project, and because you can place the verification code in any class (or struct), it’s trivially possible to access all the types, fields and methods in your project, including the private fields and methods of the class you’re testing.
  • A failing check tells you the place in the code where it is failing. It’s as good as a compiler error.
  • Everyone in your team who builds the project will run the tests. If someone in your team submits a change to revision control that violates a test, it is as inexcusable as submitting a change that doesn’t compile.
  • In theory you can put all your unit tests into a post-build check like this, removing the need for a separate unit-testing framework. In some situations this may be undesirable because it might make the build take too long, but you can do it in simple cases.

Further enhancements

There are some more ways in which I’ve enhanced this idea for myself:

  • The RunPostBuildChecks() method is actually in a library so I can re-use it in several projects.
  • Using exceptions allows each PostBuildCheck() method to return only one error. Therefore, I decided to declare an interface IPostBuildReporter, which has methods to output errors and warnings directly, and pass it as a parameter to the PostBuildCheck() methods. This way I can report any number of errors or warnings.
  • My implementation of IPostBuildReporter can search the source code for any string. Instead of outputting the location of the exception, I use that to output the location of the actual error. (This doesn’t make much sense in the above example because the location of the exception is the location of the error, but imagine a different example where a post-build check verifies that a certain set of classes in your project are all marked with the [Serializable] attribute or something like that. You want the error to point to the class that is missing the attribute, which could be anywhere in the source.)
  • To this end, I pass the $(SolutionDir) as a command-line parameter in the post-build event, allowing the IPostBuildReporter implementation to know where to look for the source code.