.NET

Mono & the .NET Framework

By Miguel de Icaza and Brian Jepson, January 01, 2002


An open-source alternative


Miguel is the founder and leader of the GNOME Foundation and cofounder and CTO of Ximian (http://www.ximian.com/). He can be contacted at [email protected]. Brian is the author of Database Application Programming with Linux (John Wiley & Sons, 2000) and coauthor of the Perl Resource Kit Utilities Guides (O'Reilly & Associates, 1997, 1998). He can be contacted at http://www.jepstone.net/.

Since the early days of the GNOME project (http://www.gnome.org/), there has been tremendous demand to expose its APIs to programming languages other than C: Python, Perl, Ada, and Java, to name a few. The problem with language bindings is that once a new C API is defined, it takes considerable time and effort to make it usable from those other languages, and that clearly drives up development costs. Like most organizations, Ximian (the company Miguel works for) wants to minimize the cost of building large applications such as Evolution, Ximian's groupware suite of approximately 750,000 lines of source code. The .NET Framework appears to offer a way to do this: by providing language independence, garbage collection, and thread support, it appeared to address both the language-binding and the development-cost challenges we face. Furthermore, C# is a great object-oriented programming language for our developers to use.

In March 2001, with this in mind, we started prototyping Mono (http://www.go-mono.com/), an open-source implementation of the .NET Framework, with the specific goal of addressing these cost and language-binding challenges. We were confident that if we made Mono an open-source project, we could implement the framework, a JIT engine, and a C# compiler relatively quickly. Even so, we were pleasantly surprised by how many programmers offered to contribute when we launched Mono with a partial implementation last July, even though it did not yet have a self-hosting environment on Linux. As it turns out, much of the class library code, as well as tools such as the IL assembler and code verifier, came from Windows programmers.

.NET Standardization

The .NET Framework can be divided into two parts:

The portion that Microsoft, Intel, and Hewlett-Packard submitted to the European Computer Manufacturers Association (ECMA) as an open standard: the Common Language Infrastructure (CLI) and the C# language.
The Framework Class Library (FCL), which includes class libraries that go beyond those submitted for standardization.

Mono is based on the ECMA Common Language Infrastructure (CLI) and C# specifications (available at http://msdn.microsoft.com/net/ecma). Mono will also be compatible with the .NET Framework and offer compatible implementations of FCL libraries such as ADO.NET and Windows Forms.

The C# specification describes the lexical structure of C#, the types it supports, and its syntax. C# was designed with the CLI in mind, so its structure complements the CLI.

The CLI specification, in turn, is broken down into five partitions:

Partition I describes the underpinnings of CLI, which include the Common Language Specification (CLS), Common Type System (CTS), and Virtual Execution System (VES).
Partition II describes the metadata that drives the operation of the VES.
Partition III is a complete reference to the Common Intermediate Language (CIL), the instruction set to which C# programs are compiled.
Partition IV describes the run-time libraries that support various CLI profiles. For example, the kernel profile is the smallest possible collection of APIs needed to implement the CLI. The compact profile is a superset of the kernel profile and includes simple XML support, networking (including HTTP), and reflection.
Partition V includes miscellaneous information related to the specification, such as sample CIL programs and class library design guidelines.

The Common Language Specification

The CLS is a set of rules that guarantee language interoperability. A language is said to be a CLS consumer (for example, JScript.NET) if it can instantiate objects and invoke methods on them. A CLS extender language (such as C# and VB.NET) can act as a CLS consumer, but can also define new types and extend existing types. Applications and libraries that conform to the CLS can be consumed by any CLS consumer language and extended by any CLS extender language.

For example, the .NET Framework includes two 4-byte integral types, Int32 and UInt32; the difference is that UInt32 represents an unsigned integer. As it happens, the CLS omits the unsigned integer types wider than a byte (UInt16, UInt32, and UInt64), so UInt32 is marked as not CLS compliant. If your library exposes UInt32 in its public interface, it is not guaranteed to be usable from every CLS-compliant language.

The Common Type System

The CTS specifies two fundamental kinds of data types for the CLI. Reference types live on the heap (a region of memory from which memory is dynamically allocated) and have a unique identity, lifetime management, and the ability to contain other types. Value types, on the other hand, are strongly typed blobs of data that typically live on the stack. In the declarations in Example 1(a), the struct, when viewed in disassembly, has no code, only fields. A value of that type is an 8-byte blob of memory with the layout of MyStruct superimposed on it; see Example 1(b). The reference type declared in Example 1(a), on the other hand, is more complicated. The fields are there, but there is also some code to support the constructor; see Example 1(c).
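C has no CTS, but the distinction can be sketched in C, the language Mono's run time is written in. In this minimal sketch, which illustrates the semantics rather than how a CLI lays out objects, a value type is a plain struct that is copied on assignment, while a reference type is only ever handled through a pointer to a heap allocation, so two variables can share one identity. The names PointValue, PointObject, move_value, move_object, and new_point_object are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical value type: two ints (an 8-byte blob where int is
   32 bits), copied wholesale on assignment or when passed. */
typedef struct {
    int x;
    int y;
} PointValue;

/* Hypothetical reference type: same fields, but always heap-allocated
   and handled through a pointer, which gives it an identity. */
typedef struct {
    int x;
    int y;
} PointObject;

/* Receives a copy; the caller's value is not affected. */
static void move_value(PointValue p) {
    p.x += 10;
    (void)p;   /* the change is visible only inside this function */
}

/* Receives a reference; every holder of the pointer sees the change. */
static void move_object(PointObject *p) {
    p->x += 10;
}

/* Allocate on the heap. A CLI run time would leave freeing to the
   garbage collector; in this sketch the caller must free it. */
static PointObject *new_point_object(int x, int y) {
    PointObject *p = malloc(sizeof *p);
    assert(p != NULL);
    p->x = x;
    p->y = y;
    return p;
}
```

After `PointObject *b = a; move_object(b);`, reading `a->x` shows the change, because `a` and `b` name the same object; calling `move_value(v)` leaves `v` untouched.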

Starting with value and reference types, the CTS supports a variety of derived types. Value types give rise to such types as enumerations and primitive types. Reference types are the basis of delegates, arrays, interfaces, and classes. Reference and value types have common ground in boxing and unboxing operations: A value type can be boxed into an object and later unboxed back into a value type, as in Example 1(d).
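Boxing can be pictured the same way. The C sketch below assumes a hypothetical Box layout (a type tag plus a copy of the value); a real CLI object header carries a full type handle rather than a two-entry enum. Boxing copies the value to the heap, and unboxing checks the type before copying the value back out, which is why changing the box afterwards does not affect an already unboxed copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags standing in for a real type handle. */
typedef enum { TYPE_INT32, TYPE_DOUBLE } TypeTag;

/* A boxed value: a heap object holding a type tag and a copy of the value. */
typedef struct {
    TypeTag type;
    union {
        int i32;
        double f64;
    } value;
} Box;

/* Box: copy a value from the stack into a new heap object. */
static Box *box_int32(int v) {
    Box *b = malloc(sizeof *b);
    assert(b != NULL);
    b->type = TYPE_INT32;
    b->value.i32 = v;
    return b;
}

/* Unbox: verify the boxed type, then copy the value back out.
   Returns 0 on a type mismatch, where the CLI would throw an
   InvalidCastException. */
static int unbox_int32(const Box *b, int *out) {
    if (b->type != TYPE_INT32)
        return 0;
    *out = b->value.i32;
    return 1;
}
```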

The Virtual Execution System

The VES specifies a hypothetical virtual machine that executes an instruction set called Common Intermediate Language (CIL). Like the Java Virtual Machine (JVM), this virtual machine is stack based. Unlike the JVM, however, Microsoft's implementation does not interpret this code: CIL is always Just-In-Time (JIT) compiled into native code before it's executed. (Interpretation is still possible, as Mono's own interpreter, described later, demonstrates.)

The VES supports several primitive integral and floating-point types, as well as an object type and a managed pointer type. The object type is a reference into managed memory (memory that is under the control of the CLI's garbage collector).

Mono Availability

The Mono C# compiler is written in C#, as are the Mono class libraries. The JIT engine is written in C. As of this writing, the Mono compiler runs on Windows as a standalone .NET application. To compile Mono on Windows, you need the Cygwin subsystem (http://www.cygwin.com/) — a Linux-like subsystem for Windows that includes the GNU C compiler and tools. (Mono requires the make utility and possibly some shell utilities that make depends on.) Mint, the Mono run time, runs under Linux or Cygwin.

You can get Mono at http://www.go-mono.com/. Follow the instructions to obtain the source and compile Mono, paying attention to the recommended versions of tools and libraries. Because Mono is likely to be under rapid development for some time, it may be sensitive to those versions.

From C# to CIL

Example 2(a) is a C# program that calculates, increments, and displays a number on the console. As of this writing, the Mono compiler executable is named "compiler" and compiles Example 2(a) without trouble, as Example 2(b) illustrates.

Mono includes a disassembler called "monodis" that dumps the contents of an executable as CIL. Example 2(c) is the CIL code for the Main() method. Lines IL_0000 and IL_0002 push the two integer values 23 and 67 onto the stack. The next line adds those values and leaves the result on the stack (at this writing, the compiler does not perform constant folding). Line IL_0005 stores that result into local variable 0, which corresponds to the variable a.

Line IL_0006 pushes that variable's value back onto the stack, and line IL_0007 pushes the value 1 onto the stack as a 4-byte integer (i4). Next, line IL_0008 adds the values, and line IL_0009 duplicates the result. Finally, the first copy is stored back into the variable, and the second copy is consumed by the call to WriteLine(), leaving the stack empty. Example 2(d) is the output if you run the compiled application through the Mono run time (mint).
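To make the walkthrough concrete, here is a toy C interpreter, in the spirit of an interpreter like mint but not taken from its source, that executes the same instruction sequence as Example 2(c). The opcode enum is a hypothetical simplification: real CIL distinguishes forms such as ldc.i4.s and stloc.0, and a real call instruction carries a method token rather than a hard-wired WriteLine.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opcodes covering the instructions in Example 2(c). */
typedef enum { LDC_I4, ADD, STLOC, LDLOC, DUP, CALL_WRITELINE, RET } OpCode;

typedef struct {
    OpCode op;
    int arg;   /* constant for LDC_I4, local index for STLOC/LDLOC */
} Instr;

#define STACK_MAX 16
#define LOCALS_MAX 4

/* Interpret a method body on an evaluation stack; returns the last
   value passed to WriteLine. */
static int run(const Instr *code, int n) {
    int stack[STACK_MAX];
    int sp = 0;
    int locals[LOCALS_MAX] = {0};
    int last_printed = 0;

    for (int pc = 0; pc < n; pc++) {
        const Instr *in = &code[pc];
        switch (in->op) {
        case LDC_I4:                       /* push a constant */
            assert(sp < STACK_MAX);
            stack[sp++] = in->arg;
            break;
        case ADD:                          /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case STLOC:                        /* pop the top value into a local */
            assert(sp >= 1 && in->arg < LOCALS_MAX);
            locals[in->arg] = stack[--sp];
            break;
        case LDLOC:                        /* push a local's value */
            assert(sp < STACK_MAX && in->arg < LOCALS_MAX);
            stack[sp++] = locals[in->arg];
            break;
        case DUP:                          /* duplicate the top of the stack */
            assert(sp >= 1 && sp < STACK_MAX);
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case CALL_WRITELINE:               /* consume the argument and print it */
            assert(sp >= 1);
            last_printed = stack[--sp];
            printf("%d\n", last_printed);
            break;
        case RET:
            return last_printed;
        }
    }
    return last_printed;
}

/* The sequence walked through above: store 23 + 67 in local 0,
   increment it, store it back, and print the result. */
static const Instr example_main[] = {
    { LDC_I4, 23 }, { LDC_I4, 67 }, { ADD, 0 },   { STLOC, 0 },
    { LDLOC, 0 },   { LDC_I4, 1 },  { ADD, 0 },   { DUP, 0 },
    { STLOC, 0 },   { CALL_WRITELINE, 0 },        { RET, 0 }
};
```

Calling `run(example_main, 11)` prints 91 (that is, 23 + 67 + 1) and leaves the evaluation stack empty, just as the walkthrough describes.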

The Mono C# Compiler

The Mono C# compiler is written in C#. Eventually, Mono will expose the compiler as a component that can be reused by tools such as SharpDevelop (an open-source C# IDE) or the Mono implementation of the System.CodeDom.Compiler classes.

Writing the compiler in C# has a number of advantages. For example, the lexical analyzer can use C# objects to represent the entities that it is parsing. This makes it easier to deal with literals and perform constant folding, since existing C# facilities can do much of that work. The catch is that writing the compiler in C# requires a set of class libraries complete enough to host the compiler on Linux.

The C# parser uses Jay, a Java port of Berkeley Yacc, which we in turn ported to C#. We considered using a more advanced parser generator, but decided the returns on such an investment would be minimal. C# itself is a simple language, and most of the interesting work takes place during the semantic analysis phase (after parsing).

The compiler driver orchestrates the compilation process. The parser and the lexical analyzer create an internal representation of the input files using one class for each construct. For example, the if statement is represented by an If class that derives from the Statement class (all statements derive from this class). As with statements, expressions derive from the Expression abstract class. This organization is similar to that of the Guavac Java compiler.
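The Mono compiler implements this hierarchy as C# classes. As a rough analogue in C, each node below embeds an Expression header as its first member, which lets a pointer to any node be treated as an Expression pointer, much as a derived class can be used wherever its base class is expected. The fold() function shows the kind of constant folding mentioned earlier. Constant, LocalVariable, Binary, and the make_* helpers are hypothetical names, not the compiler's actual classes.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EXPR_CONSTANT, EXPR_LOCAL, EXPR_ADD } ExprKind;

/* Common header: every expression node "derives" from this by
   embedding it as its first member. */
typedef struct Expression {
    ExprKind kind;
} Expression;

typedef struct { Expression base; int value; } Constant;
typedef struct { Expression base; int index; } LocalVariable;
typedef struct { Expression base; Expression *left, *right; } Binary;

static Expression *make_constant(int value) {
    Constant *c = malloc(sizeof *c);
    assert(c != NULL);
    c->base.kind = EXPR_CONSTANT;
    c->value = value;
    return &c->base;
}

static Expression *make_local(int index) {
    LocalVariable *l = malloc(sizeof *l);
    assert(l != NULL);
    l->base.kind = EXPR_LOCAL;
    l->index = index;
    return &l->base;
}

static Expression *make_add(Expression *left, Expression *right) {
    Binary *b = malloc(sizeof *b);
    assert(b != NULL);
    b->base.kind = EXPR_ADD;
    b->left = left;
    b->right = right;
    return &b->base;
}

/* Constant folding: fold the operands first, then replace an addition
   of two constants with a single Constant node. Returns the (possibly
   new) node; nodes that were folded away are freed. */
static Expression *fold(Expression *e) {
    if (e->kind != EXPR_ADD)
        return e;
    Binary *b = (Binary *)e;
    b->left = fold(b->left);
    b->right = fold(b->right);
    if (b->left->kind == EXPR_CONSTANT && b->right->kind == EXPR_CONSTANT) {
        int sum = ((Constant *)b->left)->value + ((Constant *)b->right)->value;
        free(b->left);
        free(b->right);
        free(b);
        return make_constant(sum);
    }
    return e;
}
```

Folding `23 + 67` yields a single Constant with value 90, while `(1 + 2) + a` folds only its left operand, because the local variable's value is unknown at compile time.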

Instead of implementing a complete type system that could cope with all the various features of the C# object model, we used types from the System.Reflection namespace as our type repository and System.Reflection.Emit to create types on the fly.

The types in System.Reflection inspect and manipulate types at run time (for example, you can enumerate all the public methods exposed by System.String). System.Reflection.Emit generates in-memory or on-disk types based on System.Reflection representations. These two namespaces provide the building blocks for types. So, the Mono C# compiler creates a type, adds members (properties, events, methods, and fields), and uses System.Reflection.Emit to write the types out to an assembly, which is an EXE or DLL that has a Portable Executable (PE) header and contains CIL.

The Class Libraries

At this writing, the class libraries are a work in progress. However, we have some pieces implemented that let simple applications be executed under the Mono CLI run time.

The class library is a good place to contribute to Mono, as the work is very compartmentalized. The interfaces are well defined and the communication required between the various groups is small, so different programmers can work on different areas without interfering with each other.

We are using the NUnit framework (http://nunit.sourceforge.net/) to create test cases that exercise the class library. This is also an area where contributions can be made without a lot of communication or a deep understanding of the ever-evolving Mono. Since the Mono class library will be compatible with .NET, you could even develop the unit tests against Microsoft's .NET SDK.

We recently migrated to NAnt (http://nant.sourceforge.net/) as the build system for the class libraries. Other parts of Mono still use a make-based process to compile. At this time, we are working towards completing enough pieces of the class library to have a self-hosting tool chain that can be used to further develop Mono in Linux.

Mono's VES

Mono has two virtual execution systems — the Mono Interpreter (mint) and a JIT compiler — that share a metadata library that accesses and manipulates PE/COFF images containing CIL instructions.
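As an illustration of the first thing such a metadata library has to do, the C sketch below locates the PE signature in an in-memory image: the MS-DOS header begins with "MZ", and the 32-bit little-endian value at offset 0x3C gives the file offset of the "PE\0\0" signature. A real loader would go on through the optional header's data directories to the CLI header and the metadata streams, which this hypothetical find_pe_signature() omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value; PE files are little-endian. */
static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the "PE\0\0" signature, or -1 if the buffer
   does not look like a PE image. */
static long find_pe_signature(const unsigned char *img, size_t len) {
    /* The MS-DOS header is 64 bytes and starts with "MZ". */
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;
    /* e_lfanew at offset 0x3C points at the PE signature. */
    uint32_t off = read_u32_le(img + 0x3C);
    if (off > len - 4 || memcmp(img + off, "PE\0\0", 4) != 0)
        return -1;
    return (long)off;
}
```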

Mint was originally developed as a proof of concept for Mono. It was designed to be easy to debug, easy to study, and comprehensive enough that it could be used as a reference for debugging problems with the JIT engine. Mint is more portable than a JIT, so a nice side effect is that you can run Mono on different architectures without a lot of work. Ideally, we will port the JIT to each supported platform, but the interpreter will be useful for bootstrapping, getting Mono running quickly, and running under systems where speed is not as important.

Currently, the interpreter supports most C# language semantics. We routinely test it against a test suite that includes many test cases, including large bodies of code from the class libraries. Mint has been useful as a prototyping testbed.

Mono's JIT

Mono's JIT translates CIL instructions into native code at run time. It can either compile an entire assembly in one pass or compile each method the first time that method is invoked.
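Method-at-a-time compilation is commonly built around a stub: each method's entry point first refers to code that compiles the method, patches the entry point, and then completes the call, so every later call goes straight to native code. The C sketch below is a hypothetical illustration of that pattern, not Mono's actual mechanism; here the "compiler" merely selects a precompiled C function.

```c
#include <assert.h>

typedef int (*MethodFn)(int);

/* Hypothetical method descriptor: callers always go through 'entry'. */
typedef struct {
    const char *name;
    MethodFn entry;
    int compile_count;   /* how many times the "JIT" ran */
} Method;

/* Stand-in for the native code a JIT would emit for "return x + 1". */
static int increment_native(int x) {
    return x + 1;
}

static int increment_stub(int x);

/* Initially, the entry point refers to the compile-on-first-call stub. */
static Method increment_method = { "Increment", increment_stub, 0 };

/* First call: "compile" the method, patch the entry point so later
   calls bypass the stub, then complete the original call. */
static int increment_stub(int x) {
    increment_method.compile_count++;   /* a real JIT would translate CIL here */
    increment_method.entry = increment_native;
    return increment_method.entry(x);
}
```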

The JIT uses a set of macros that generate code in a memory buffer; Mono needs one set of macros for each architecture. These macros simplify debugging and prototyping of the code generator. The code generation interface for x86-class platforms is in the mono/arch/x86/x86-codegen.h file, and Listing One illustrates the use of those macros. The x86-codegen.h macros originated in Intel's Open Runtime Platform (ORP) Java Virtual Machine; we have converted them for use from C, the language the JIT is written in.
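The idea behind such macros can be sketched with a few hypothetical emitters of our own (not the macros in x86-codegen.h, whose names and signatures differ) that write x86 machine-code bytes into a buffer and advance a pointer. The encodings are the standard register-to-register forms: opcode 0x89 for MOV, 0x01 for ADD, and 0x11 for ADC, each followed by a ModRM byte with mod = 11.

```c
#include <assert.h>

/* x86 register numbers as used in ModRM encoding. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* ModRM byte for register-to-register forms: mod = 11b. */
#define MODRM_REG_REG(reg, rm) ((unsigned char)(0xC0 | ((reg) << 3) | (rm)))

/* mov dst, src  (0x89 /r: MOV r/m32, r32) */
#define emit_mov_reg_reg(p, dst, src) \
    do { *(p)++ = 0x89; *(p)++ = MODRM_REG_REG((src), (dst)); } while (0)

/* add dst, src  (0x01 /r: ADD r/m32, r32) */
#define emit_add_reg_reg(p, dst, src) \
    do { *(p)++ = 0x01; *(p)++ = MODRM_REG_REG((src), (dst)); } while (0)

/* adc dst, src  (0x11 /r: ADC r/m32, r32), adds the carry flag too */
#define emit_adc_reg_reg(p, dst, src) \
    do { *(p)++ = 0x11; *(p)++ = MODRM_REG_REG((src), (dst)); } while (0)

/* Emit "ecx = eax; ecx += edx" into buf; returns the number of bytes. */
static int emit_sample(unsigned char *buf) {
    unsigned char *p = buf;
    emit_mov_reg_reg(p, ECX, EAX);
    emit_add_reg_reg(p, ECX, EDX);
    return (int)(p - buf);
}
```

emit_sample() writes the four bytes 89 C1 01 D1, which disassemble as mov ecx, eax followed by add ecx, edx.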

The conversion of CIL bytecodes into native instructions is where things get interesting. Mono uses an instruction selector based on bottom-up rewrite system (BURS) tree pattern matching — the same technology used by the portable lcc ANSI C compiler.

BURS uses a grammar that maps a set of operations (the terminal nodes) into nonterminal elements that match the target architecture. This grammar is fed into a code generator program, monoburg. If you are familiar with Yacc, you can think of monoburg as a Yacc-like generator. Unlike with Yacc, however, you don't run screaming for the hills in the face of reduce/reduce conflicts. Instead, conflicts are a good thing: they are resolved using the cost functions associated with each production. The pattern matcher's input is a tree of operations, and it maps the tree to the target architecture by selecting the productions with the minimum total cost.
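The cost-driven choice can be shown with a toy grammar. The C sketch below (hypothetical node types and rule numbers, a single nonterminal, and costs counted in emitted instructions) labels a tree bottom-up, recording for each node the cheapest rule that matches it. Two rules match an ADD whose right operand is a constant, and the cost comparison, not a conflict error, decides between them. Real monoburg grammars have several nonterminals (Listing One uses lreg), so each node carries a cost per nonterminal.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical terminals of a toy grammar. */
typedef enum { OP_CONST, OP_LOCAL, OP_ADD } Op;

typedef struct Node {
    Op op;
    struct Node *left, *right;   /* NULL for leaves */
    int cost;                    /* minimum cost to compute into a register */
    int rule;                    /* rule chosen by the labeler */
} Node;

/* Toy grammar with one nonterminal, reg:
     rule 1: reg: CONST            cost 1  (mov reg, imm)
     rule 2: reg: LOCAL            cost 1  (mov reg, [local])
     rule 3: reg: ADD(reg, reg)    cost 1 + children  (add reg, reg)
     rule 4: reg: ADD(reg, CONST)  cost 1 + left      (add reg, imm)   */
static void label(Node *n) {
    if (n->left)  label(n->left);    /* label children first: bottom-up */
    if (n->right) label(n->right);

    n->cost = INT_MAX;
    switch (n->op) {
    case OP_CONST: n->cost = 1; n->rule = 1; break;
    case OP_LOCAL: n->cost = 1; n->rule = 2; break;
    case OP_ADD: {
        int c3 = n->left->cost + n->right->cost + 1;
        if (c3 < n->cost) { n->cost = c3; n->rule = 3; }
        if (n->right->op == OP_CONST) {
            /* The constant becomes an immediate operand: no register needed. */
            int c4 = n->left->cost + 1;
            if (c4 < n->cost) { n->cost = c4; n->rule = 4; }
        }
        break;
    }
    }
}
```

For ADD(LOCAL, CONST), rule 3 costs 3 and rule 4 costs 2, so the labeler picks rule 4; for ADD(LOCAL, LOCAL), only rule 3 applies.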

The first step transforms a sequence of CIL instructions into a forest of trees. Each tree has to be fed to the instruction selector separately. During this forest/tree creation process, the standard CIL instructions are transformed into codes that are deemed better matches by the instruction selector. That is why the BURS grammar does not actually contain real CIL opcodes, but similarly named pseudo opcodes.
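One common way to build such trees, sketched here in C with hypothetical types, is to run the evaluation stack symbolically: loads push tree nodes instead of values, an add pops two subtrees and pushes a new node, and a store, which has a side effect, becomes the root of a new tree. This simplified version ignores hazards that a real converter must handle, such as a load of a local that is still pending on the stack when a later store overwrites that local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stack-code instructions (a subset of CIL). */
typedef enum { I_LDC, I_LDLOC, I_ADD, I_STLOC } InsnOp;
typedef struct { InsnOp op; int arg; } Insn;

/* Tree nodes that would be fed to the instruction selector. */
typedef enum { T_CONST, T_LDLOC, T_ADD, T_STLOC } TreeOp;
typedef struct Tree {
    TreeOp op;
    int arg;                     /* constant value or local index */
    struct Tree *left, *right;
} Tree;

#define STACK_DEPTH 16

static Tree *mk(TreeOp op, int arg, Tree *left, Tree *right) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->op = op;
    t->arg = arg;
    t->left = left;
    t->right = right;
    return t;
}

/* Run the evaluation stack symbolically, holding trees instead of values.
   Each store closes off one tree. Returns the number of trees written. */
static int build_forest(const Insn *code, int n, Tree **forest, int max_trees) {
    Tree *stack[STACK_DEPTH];
    int sp = 0, count = 0;

    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case I_LDC:
            assert(sp < STACK_DEPTH);
            stack[sp++] = mk(T_CONST, code[i].arg, NULL, NULL);
            break;
        case I_LDLOC:
            assert(sp < STACK_DEPTH);
            stack[sp++] = mk(T_LDLOC, code[i].arg, NULL, NULL);
            break;
        case I_ADD: {
            assert(sp >= 2);
            Tree *right = stack[--sp];
            Tree *left = stack[--sp];
            stack[sp++] = mk(T_ADD, 0, left, right);
            break;
        }
        case I_STLOC:
            assert(sp >= 1 && count < max_trees);
            forest[count++] = mk(T_STLOC, code[i].arg, stack[--sp], NULL);
            break;
        }
    }
    return count;
}
```

For the sequence ldc 23, ldc 67, add, stloc 0, ldloc 0, ldc 1, add, stloc 0, build_forest() produces two trees: STLOC 0 (ADD (CONST 23, CONST 67)) and STLOC 0 (ADD (LDLOC 0, CONST 1)).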

To generate code, several passes are performed on the forest of trees. The first pass labels all the nodes and finds the cheapest covering of each tree, the second pass performs register allocation, and the final pass emits the x86 code.

At this writing, the JIT engine supports most of the non-object-oriented features of the virtual machine. By the time you read this, the object-oriented features should be implemented.

Garbage Collection

Garbage collection (GC) in Mono is based on the Intel Open Runtime Platform (ORP; http://orp.sourceforge.net/). The ORP garbage collector provides an interface that can be plugged into existing applications and provides precise GC.

One of the GC modes provided by ORP's precise GC system is a generational, copying, and precise garbage collector. It is possible to control the kind of garbage collection algorithm based on this mode.
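A full generational collector is too large to show here, but its copying step is not. The C sketch below is a textbook Cheney-style semispace collector, not ORP's code: objects reachable from the roots are copied into to-space, forwarding pointers keep shared and cyclic references consistent, and anything left behind is garbage. A generational collector applies the same copying step to a young generation that is collected more often than the rest of the heap. The object layout and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_OBJECTS 64

/* Hypothetical object layout: an int payload and two reference fields. */
typedef struct Obj {
    struct Obj *forward;        /* set once the object has been copied */
    int value;
    struct Obj *ref[2];
} Obj;

static Obj space_a[HEAP_OBJECTS], space_b[HEAP_OBJECTS];
static Obj *from_space = space_a, *to_space = space_b;
static size_t alloc_next = 0;   /* bump-pointer index into from_space */

static Obj *gc_alloc(int value) {
    assert(alloc_next < HEAP_OBJECTS);   /* this sketch never collects implicitly */
    Obj *o = &from_space[alloc_next++];
    o->forward = NULL;
    o->value = value;
    o->ref[0] = o->ref[1] = NULL;
    return o;
}

/* Copy one object into to-space (only once) and return its new address. */
static Obj *evacuate(Obj *o, size_t *to_next) {
    if (o == NULL)
        return NULL;
    if (o->forward != NULL)
        return o->forward;              /* already copied */
    Obj *copy = &to_space[(*to_next)++];
    *copy = *o;
    copy->forward = NULL;
    o->forward = copy;
    return copy;
}

/* Copy the roots, then scan to-space breadth-first, copying whatever the
   copied objects reference. Returns the number of live objects. */
static size_t gc_collect(Obj **roots, size_t nroots) {
    size_t scan = 0, to_next = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = evacuate(roots[i], &to_next);
    while (scan < to_next) {
        Obj *o = &to_space[scan++];
        o->ref[0] = evacuate(o->ref[0], &to_next);
        o->ref[1] = evacuate(o->ref[1], &to_next);
    }
    /* Flip the semispaces: allocation continues after the survivors. */
    Obj *tmp = from_space;
    from_space = to_space;
    to_space = tmp;
    alloc_next = to_next;
    return to_next;
}
```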

P/Invoke

P/Invoke (Platform Invoke) is the bridge between the CLR and any platform that hosts it. Under Windows, P/Invoke lets you call into Win32 DLLs (there is a separate API for calling into COM). Under UNIX, you can use P/Invoke to call into shared libraries.

Any implementation of .NET delegates as much as possible to the underlying platform. For example, the Windows Forms API needs to draw windows and put widgets in them. Under the hood, this chore is delegated to the appropriate Win32 or GNOME APIs. Anyone implementing the .NET Framework will need to rely on P/Invoke to manage this delegation.

P/Invoke uses a combination of attributes and extern declarations to pull functions into the CLR. The DllImport attribute names the shared library (and, optionally, the entry point, if it differs from the method name) and must be attached to an extern method declaration. Example 3(a) imports the puts() function from libc.so.6, while Example 3(b) pulls in several functions from the ncurses library. Figure 1 shows the output of running this program under mint.
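Under UNIX, the core of what a run time does for a DllImport declaration can be approximated with the POSIX dlopen() and dlsym() calls. The C sketch below, built around a hypothetical resolve() helper, looks up puts() in libc.so.6 (the glibc library named in Example 3(a)), falling back to the symbols already loaded into the process. A real P/Invoke layer also marshals arguments, for example converting a managed string into a C char*, which this sketch leaves out. On older glibc systems, link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* C signature of the imported function, as the extern declaration states it. */
typedef int (*puts_fn)(const char *);

/* Hypothetical helper: open 'library' and look up 'symbol', falling back to
   the symbols already loaded into the process. Returns NULL on failure.
   The handle is deliberately kept open so the function pointer stays valid. */
static void *resolve(const char *library, const char *symbol) {
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL)
        handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return NULL;
    return dlsym(handle, symbol);
}
```

A caller would then write `puts_fn p = (puts_fn)resolve("libc.so.6", "puts");` and call `p("Hello")`; POSIX permits this conversion of the dlsym() result to a function pointer.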

Beyond the CLI

Mono is currently not self-hosting: the C# compiler still must be compiled on Windows using Microsoft's C# compiler. When the C# compiler can run under mint and compile itself, the Mono development team will turn its focus to other areas. However, some progress is already being made in those areas:

Gtk#. GNOME's GUI foundation is the Gtk+ toolkit. The Gtk# classes are Mike Kestner's work on a set of Mono bindings for Gtk+. C# properties map nicely to the GtkArgument system; events and delegates propagate Gtk+ signals. Gtk# will become the foundation on which we can build desktop applications for Mono, and will also become the foundation on which the Windows Forms (System.Windows.Forms) classes will be implemented.
Bonobo. GNOME's component system is a set of CORBA interfaces for components and compound documents. By the time we are done with Mono, you should be able to author Bonobo components in C# and make those available to the rest of the desktop with little effort, similar to what .NET does with COM under Windows.
Rafael Teixeira has been working on an implementation of Visual Basic .NET to be integrated with the Mono Compiler Suite. Another effort will yield a free ECMAScript implementation that generates CIL. Sergey Chaban has written an IL assembler that uses System.Reflection.Emit, just as the Mono C# compiler does. He also has contributed a verifier that checks the generated output of the compiler.
Programmers are at work on complementary projects. There is a set of OpenGL bindings for C#, and work is in progress to port the Camel mailer API to C# (Camel is similar in spirit to JavaMail). SharpDevelop, the free IDE mentioned earlier, is being implemented entirely in C# by Mike Krueger. SharpDevelop currently runs on Windows, but we hope to provide enough functionality in Mono to run the binary unmodified. The C# and Visual Basic parsers and integration with the .NET type system should help SharpDevelop support language-aware features (such as autocompletion in the GUI).

Conclusion

Implementing Mono is a big task that would not be possible without the help of its many contributors (you can see a list of them at http://www.go-mono.com/). We are thankful to all the contributors who have helped get Mono where it is today, and to those who will help shape its future.

We are focused on having a complete and correct platform. Optimizations are not part of our initial design goals, since it is difficult to optimize ahead of time without good performance measurements. When we are done with the foundational pieces of Mono, we hope to tackle a number of interesting tasks, such as an ahead-of-time compiler that would compile assemblies for maximum execution speed. An ahead-of-time compiler can perform more expensive optimizations than a JIT engine can, since there is no rush to get the code compiled.

CIL is a good platform for writing code optimizers, as the division between the language and the target is clear at the time an ahead-of-time compiler would be invoked. Various optimizations can be applied to the intermediate forest and the individual trees: enhanced register allocation and more traditional compiler optimization techniques can be applied here, and profile-based optimization also seems convenient at this point. Various peephole optimizations that we are currently missing can be performed at the grammar level and at the code emission level.
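As an example of an emission-level peephole rewrite, the C sketch below scans a list of hypothetical register-move records with a two-instruction window, deleting self-moves and any move that immediately copies a value back to where it came from. Listing One already avoids emitting self-moves with its if tests; a separate pass like this one would catch the pattern wherever it arises. The sketch assumes straight-line code: a real pass must not delete an instruction that is the target of a branch.

```c
#include <assert.h>

/* Hypothetical emitted-instruction records (register numbers only). */
typedef enum { X_MOV, X_ADD } XOp;
typedef struct { XOp op; int dst, src; } XInsn;

/* Two-instruction peephole window:
     mov r, r               -> deleted (no effect)
     mov a, b ; mov b, a    -> mov a, b (the second move changes nothing)
   Rewrites the array in place and returns the new instruction count. */
static int peephole(XInsn *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        XInsn in = code[i];
        if (in.op == X_MOV && in.dst == in.src)
            continue;                              /* self-move */
        if (out > 0 && in.op == X_MOV) {
            XInsn prev = code[out - 1];
            if (prev.op == X_MOV && prev.dst == in.src && prev.src == in.dst)
                continue;                          /* copies the value straight back */
        }
        code[out++] = in;
    }
    return out;
}
```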

The current code generator lacks an instruction scheduler. This is of modest importance on x86 machines, but matters more if Mono is to support the IA-64 instruction set or RISC processors.

DDJ

Listing One

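lreg: ADD (lreg, lreg) {
  /* A long (64-bit) value lives in a register pair:
     reg1 holds the low 32 bits, reg2 the high 32 bits. */
  /* Copy the left operand into the destination pair, unless the
     register allocator already assigned the same registers. */
  if (tree->reg1 != tree->left->reg1)
    x86_mov_reg_reg (s->code, tree->reg1, tree->left->reg1, 4);
  if (tree->reg2 != tree->left->reg2)
    x86_mov_reg_reg (s->code, tree->reg2, tree->left->reg2, 4);
  /* Add the low words, then the high words plus the carry. */
  x86_alu_reg_reg (s->code, X86_ADD, tree->reg1, tree->right->reg1);
  x86_alu_reg_reg (s->code, X86_ADC, tree->reg2, tree->right->reg2);
}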
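The ADD/ADC pair in Listing One implements 64-bit addition on a 32-bit machine. The C sketch below models the same arithmetic with a hypothetical RegPair type: the low halves are added modulo 2^32, the carry out of that addition is detected by checking whether the sum wrapped, and the carry is then added into the high halves, as ADC does. Because two's-complement addition is the same operation for signed and unsigned values, this also covers signed long addition.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value split across two 32-bit "registers", as the lreg
   nonterminal holds it: reg1 = low half, reg2 = high half. */
typedef struct {
    uint32_t lo;   /* reg1 */
    uint32_t hi;   /* reg2 */
} RegPair;

/* Mirror of the emitted ADD/ADC pair. */
static RegPair add_pair(RegPair a, RegPair b) {
    RegPair r;
    r.lo = a.lo + b.lo;               /* ADD: wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;     /* carry out iff the sum wrapped */
    r.hi = a.hi + b.hi + carry;       /* ADC: high halves plus carry */
    return r;
}

static RegPair from_u64(uint64_t v) {
    RegPair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t to_u64(RegPair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}
```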
