A Brand New .NET Language: Top#

Recently, I had to implement a software engine for web use that end-users could extend and customize using some form of scripting. I looked at various .NET-based Lua libraries but found that they lacked functionality or stability, and none of them were thread-safe. I attempted to somehow leverage C# with things like the DLR, Roslyn, Mono’s CSharp.Compiler and so on, but in the end found that the risks of crashing the engine were too high. I needed something that could be controlled completely, so that, for instance, infinite recursion could be handled properly even with multiple scripts running in parallel.

In the end, I decided that a Domain-Specific Language (DSL) would be the right solution, and so implemented a small but completely functional .NET programming language with its own IL compiler. I couldn’t come up with a cool name, so I decided to call it Top#.

The syntax (grammar) is written from the ground up to be C-like, because everybody loves curly braces and the functional style is easily understood. The compiler is also written from the ground up and generates CLI-compliant IL bytecode that can run in both the Microsoft .NET runtime (requires 3.5) and in Mono (2.2). In other words, Top# is an actual .NET language just like C#.

Here’s an example of what a program written in Top# looks like:

<#
 Simple Top# test script.
 Copyright (C) 2013, Topholt Solutions A/S.
#>

script
{
  # Global variable.
  global int counter = 0;

  # Function that takes two ints and returns an int.
  int MultiplyIt(int a, int b)
  {
    return a * b;
  }

  # Function that calls itself recursively.
  void Calc(int a)
  {
    # Nested for-loops.
    for (int i = 0; i <= a; i++)
    {
      for (int t = 0; t <= 10; t++)
      {
        # Call .NET method to output message (automatic conversion int->string)
        say ("i=" + i + ", t=" + t);

        # Nested if-statements.
        if (counter == 300)
        {
          # Call .NET function to output uppercased message.
          shout("we made it to 300 iterations so stop now");

          # Return from inside a nested structure.
          return;
        }
        else
        {
          if(i * t == 25)
          {
            whisper("do something here");
          }
        }

        # Global variable use inside function.
        counter = counter + 1;
      }
    }
    # Recursive call.
    if (a > 10)
    {
      Calc(a + 1);
    }
  }

  # Function call from global scope.
  Calc(MultiplyIt(2, 2));
}

At the time of writing, Top# supports this amazing set of features:

  • int, string and boolean types
  • Void and typed-return functions
  • Addition, subtraction, multiplication, division, increment and decrement operators
  • Equals (all types), greater-than, greater-than-equal, less-than and less-than-equal expressions (ints only)
  • For-loops
  • If-else statements
  • Global and local (scoped) variables
  • Allows global and local calls to functions, including recursion
  • Nested scope blocks for things such as functions, for-loops, if-else, etc.
  • Automatic conversion to string in expressions where the left side is string
  • Stack depth is limited to 1000, so it’s not (easily) possible to do infinite recursive loops
  • Single and multiline comments
  • Complete and utter disregard for whitespace
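
To give an idea of how a stack depth limit like the one in the list above can work, here is a minimal Python sketch. It is not the Top# implementation: the names ScriptContext, StackDepthExceeded, runaway and countdown are made up for illustration, and I am assuming each running script gets its own counter so that parallel scripts don't interfere with each other.

```python
class StackDepthExceeded(Exception):
    """Raised when a script nests calls deeper than its context allows."""


class ScriptContext:
    """Hypothetical per-script state; each running script gets its own instance."""

    def __init__(self, max_depth=1000):
        self.max_depth = max_depth
        self.depth = 0

    def call(self, func, *args):
        # Refuse the call before entering it, so the host never overflows.
        if self.depth >= self.max_depth:
            raise StackDepthExceeded(f"call depth limit of {self.max_depth} reached")
        self.depth += 1
        try:
            return func(self, *args)
        finally:
            # Always unwind the counter, even when the callee raised.
            self.depth -= 1


def runaway(ctx, n):
    # A script function that recurses forever.
    return ctx.call(runaway, n + 1)


def countdown(ctx, n):
    # A well-behaved recursive function.
    return 0 if n == 0 else 1 + ctx.call(countdown, n - 1)


# The demo uses a small limit to stay well inside Python's own recursion limit.
ctx = ScriptContext(max_depth=50)
try:
    ctx.call(runaway, 0)
except StackDepthExceeded as error:
    print("script stopped:", error)
```

The point of the design is that the script fails with an ordinary, catchable error while the host engine keeps running, and the counter is back at zero afterwards so the same context can be reused.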

But wait! There’s more! The Top# compiler generates a collectible DynamicAssembly, which is great because types from these things can be instantiated in the current AppDomain and then be garbage collected when instances are no longer held. In other words, Top# scripts can be compiled, run, re-compiled and re-run endlessly without creating a clutter of .NET Assemblies that can’t be unloaded. Yet at the same time, they are proper .NET Assemblies and the compiler can save them to disk, to be used in other C# projects, for instance. Or opened in Reflector and disassembled into C# 🙂

Since Top# compiles to IL bytecode, it can interoperate with any .NET framework or 3rd party Assemblies, just like you would expect. So for instance, calling System.Console::WriteLine() or System.IO.File::ReadXXX() is a one-liner for the compiler. The same with calling to and from the application hosting the Top# Assembly (i.e. referencing the dll).

No, I haven’t cheated and somehow used one of the above-mentioned Roslyn, DLR, CSharp.Compiler, Cecil or other compilation frameworks. The IL that the Top# compiler emits is hand-written by me. I realize this probably sounds a lot harder than it actually was. The CLR is in fact simply a giant stack-based virtual machine and the IL opcodes are easy to understand and work with. It’s a bit like assembler but without some of the weird parts. For instance, let’s say you have a method that receives an int argument with a value of 5 and you want to return it multiplied by 2. Here are the opcodes for that:

ldarg a  // load argument a (eg integer with value 5) onto stack
ldc.i4 2 // load a 4-byte integer with the value 2 onto stack
mul      // pop values from stack, multiply and push result onto stack
ret      // return
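
If the stack-machine idea is new to you, the four opcodes above can be traced with a toy interpreter. This is only a Python sketch of the evaluation-stack model, not how the CLR actually executes IL (the real runtime JIT-compiles it to machine code); the run function and the tuple encoding of instructions are my own invention for illustration.

```python
def run(program, args):
    """Execute a list of (opcode, operand) pairs on a simple evaluation stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "ldarg":
            stack.append(args[operand])   # push the named argument
        elif opcode == "ldc.i4":
            stack.append(operand)         # push an integer constant
        elif opcode == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)    # pop two values, push their product
        elif opcode == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)    # pop two values, push their sum
        elif opcode == "ret":
            return stack.pop()            # the return value is the top of the stack
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("program ended without ret")


# The same four instructions as in the listing above.
double_it = [
    ("ldarg", "a"),
    ("ldc.i4", 2),
    ("mul", None),
    ("ret", None),
]

print(run(double_it, {"a": 5}))  # prints 10
```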

I have at time of writing not implemented any of the obvious IL optimizations that could be done, and as such Top# runs at (exactly) the same speed as a C# program compiled in Debug mode. For instance, if an expression says:

int a = 2 + 2 * 2;

Top# pushes the three integers onto the stack and multiplies and adds them at run time. The C# compiler, on the other hand, evaluates constant expressions like this one at compile time, in Debug as well as Release mode, and simply pushes the integer value 6 onto the stack, since all the expression values are known up front. Top# could do the same, but doesn’t.
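
For the curious, this kind of constant folding is a small tree rewrite done before code generation. The Python sketch below is an illustration under my own assumptions, not code from the Top# compiler: expressions are nested tuples of the form (operator, left, right), and fold and emit are hypothetical helper names.

```python
import operator

# Maps a source operator to its IL opcode and to the function that evaluates it.
OPS = {"+": ("add", operator.add), "*": ("mul", operator.mul)}


def fold(node):
    """Replace any sub-expression whose operands are both constants with its value."""
    if isinstance(node, tuple):
        op, left, right = node
        left, right = fold(left), fold(right)
        if isinstance(left, int) and isinstance(right, int):
            return OPS[op][1](left, right)
        return (op, left, right)
    return node


def emit(node):
    """Generate stack-machine instructions for an expression tree."""
    if isinstance(node, int):
        return [("ldc.i4", node)]   # integer constant
    if isinstance(node, str):
        return [("ldloc", node)]    # named local variable
    op, left, right = node
    return emit(left) + emit(right) + [(OPS[op][0], None)]


# 2 + 2 * 2, with the multiplication binding tighter than the addition.
expr = ("+", 2, ("*", 2, 2))

unoptimized = emit(expr)        # three loads, then mul, then add
optimized = emit(fold(expr))    # a single load of the constant 6

print(len(unoptimized), "instructions without folding")
print(len(optimized), "instruction with folding")
```

Sub-expressions that involve a variable are left alone, so ("+", "x", ("*", 2, 2)) would fold only the multiplication and still load x at run time.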

I should note that I have NOT written the lexer/parser myself, but instead used Terence Parr’s absolutely ass-kicking ANTLR4 project, which is what everybody else also uses if they are not Microsoft or IBM. ANTLR4 is amazing and “ParrT” is very entertaining both on YouTube and in writing. I highly recommend you use his software and buy his book if you ever find yourself wanting to implement a Domain-Specific Language of any sort.

When I get a chance, I will try to clean up the core Top# compiler and post the source code here. In the meantime, feel free to ask questions in the comment section if you are working on a similar project and I can maybe help out with snippets and such.