Thor - An Ergonomic Systems Programming Language
A New Language
This post introduces a new systems programming language that I call Thor. Thor’s design is based on various pain points that I have found when working in existing languages, both professionally on large-scale systems (e.g. C++ telecom codebases) and in my own hobby projects. Simplicity, expressiveness, and solid developer ergonomics have been the major driving factors in the design of Thor. Thor borrows many ideas from existing languages (namely Rust, C++, Go, and modern functional languages), but combines them in a way that aims to be simpler and more ergonomic for the developer.
I have already implemented many of the features described in this post in a compiler written in Rust (20k+ LOC), but there is still a lot to do and I have even more things planned than what is detailed here!
The compiler is closed source for now, but will become public eventually. Also, for now the compiler compiles Thor to C99 instead of LLVM.
Hello, World!
This example introduces the syntax for classes and methods. Thor is not inherently object-oriented; classes are simply containers for grouping functions and data together. A class cannot inherit another class, but it may implement interfaces. This favors composition over inheritance by default and avoids the pitfalls of deep class hierarchies.
class Printer {
    pub fun print_hello() {
        println("Hello, world!");
    }

    // The destructor is called when an instance goes out of scope
    ~Printer() { println("Goodbye, world!"); }
}

fun main() {
    let printer = Printer{};
    printer.print_hello();
}
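Since the Thor compiler is not public yet, the closest runnable illustration of the destructor behavior is a rough Rust analogue. This is only a sketch: Rust's Drop trait stands in for ~Printer(), and the shared log (not part of the Thor example) is an assumption added so the order of events can be inspected instead of printed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical Rust stand-in for the Thor `Printer` class.
/// Messages are pushed to a shared log instead of being printed.
struct Printer {
    log: Rc<RefCell<Vec<String>>>,
}

impl Printer {
    fn print_hello(&self) {
        self.log.borrow_mut().push("Hello, world!".to_string());
    }
}

// `Drop` plays the role of the `~Printer()` destructor.
impl Drop for Printer {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Goodbye, world!".to_string());
    }
}

/// Mirrors the Thor `main` and returns everything that was logged.
fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let printer = Printer { log: Rc::clone(&log) };
        printer.print_hello();
    } // `printer` goes out of scope here, so `drop` runs
    let events = log.borrow().clone();
    events
}
```

Calling run() yields the greeting followed by the farewell, because the destructor fires as soon as the inner scope ends.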
Generics and Interfaces
Generics and interfaces are two ways of writing generic code. In Thor’s design they are often used in conjunction, as each provides its own benefits:
- Generics allow us to define placeholder types that can be instantiated with various concrete types.
- Interfaces ensure that types in the system follow certain constraints.
Here is a short example demonstrating this:
fun use_printer<T: Printer>(printer: &T) {
    // printer is guaranteed to implement Printer
    // ...
}
This function is said to be generic over the type T. It also has a constraint on T that requires it to implement the interface Printer. When generics and interfaces are used like this, the compiler resolves all types at compile time, so no dynamic dispatch is involved and the calls carry no run-time dispatch overhead.
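For readers who know Rust, the same idea can be sketched with a trait bound. This is an analogy rather than Thor's implementation: the Console and Upper types and the print method are hypothetical, and whether Thor specializes generic functions per type the way Rust does is an assumption.

```rust
/// Hypothetical interface, standing in for Thor's `Printer`.
trait Printer {
    fn print(&self) -> String;
}

struct Console;
struct Upper;

impl Printer for Console {
    fn print(&self) -> String {
        "hello".to_string()
    }
}

impl Printer for Upper {
    fn print(&self) -> String {
        "HELLO".to_string()
    }
}

/// Generic over `T`, constrained to types that implement `Printer`.
/// Rust compiles a specialized copy for each concrete `T` that is used,
/// so the call to `print` is resolved at compile time.
fn use_printer<T: Printer>(printer: &T) -> String {
    printer.print()
}
```

Passing &Console and &Upper to use_printer selects the matching implementation statically, with no lookup at run time.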
The dyn Keyword
While the Thor design generally favors static dispatch through generics and interfaces, dynamic dispatch is supported for the cases where it is needed.
fun use_printer(printer: &dyn Printer) {
    // printer can be of any type that implements Printer
    // Methods invoked on printer will involve vtable lookups
}
The dyn keyword indicates that the concrete type is not known at compile time, and that the object will contain some extra data for tracking a virtual table that is used for dynamic dispatch.
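Rust happens to use the same keyword for the same purpose, so a small Rust sketch can show what dynamic dispatch buys you: one function and one collection that work with types chosen at run time. The Console and File types and the print_all helper are hypothetical, and Rust's representation of dyn values is not necessarily how Thor lays them out.

```rust
/// Hypothetical interface, standing in for Thor's `Printer`.
trait Printer {
    fn print(&self) -> String;
}

struct Console;
struct File;

impl Printer for Console {
    fn print(&self) -> String {
        "to console".to_string()
    }
}

impl Printer for File {
    fn print(&self) -> String {
        "to file".to_string()
    }
}

/// The concrete type behind `printer` is unknown at compile time,
/// so `print` is found through a vtable at run time.
fn use_printer(printer: &dyn Printer) -> String {
    printer.print()
}

/// A single vector can hold different concrete types behind `dyn Printer`.
fn print_all(printers: &[Box<dyn Printer>]) -> Vec<String> {
    printers.iter().map(|p| use_printer(p.as_ref())).collect()
}
```

The trade-off against the generic version is a vtable lookup per call in exchange for being able to mix concrete types in one place.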
Type-Safe Unions
One of my favorite features in modern programming languages is type-safe unions (also known as algebraic data types, discriminated unions, or tagged unions). Unions exist in various forms, but the general idea is that a union is a data container consisting of a number of fields, of which only a single field can be active at a time.
In Thor, declaring an Optional union, which is a type that either contains a value T or nothing, would look like this:
union Optional<T> {
    Some: T;
    None;
}
A union in Thor essentially compiles down to a C union with an extra field that denotes the currently active field/variant. Because the active variant is always tracked through this tag, unions in Thor are type-safe.
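To make the "union plus tag" layout concrete, here is a hand-written sketch in Rust of what such a lowering could look like for an Optional holding an i32. It is an assumption about the general shape, not the code the Thor compiler emits; the names Tag, Payload, and OptionalI32 are hypothetical.

```rust
/// Records which field of the payload is currently active.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Tag {
    Some,
    None,
}

/// Untagged storage, comparable to a plain C union.
#[derive(Clone, Copy)]
union Payload {
    some: i32,
    none: (),
}

/// The tag and the payload together form the type-safe union.
struct OptionalI32 {
    tag: Tag,
    payload: Payload,
}

impl OptionalI32 {
    fn some(value: i32) -> Self {
        OptionalI32 { tag: Tag::Some, payload: Payload { some: value } }
    }

    fn none() -> Self {
        OptionalI32 { tag: Tag::None, payload: Payload { none: () } }
    }

    /// Reads the payload only after checking the tag.
    fn get(&self) -> Option<i32> {
        match self.tag {
            // SAFETY: the tag says `some` is the active field.
            Tag::Some => Some(unsafe { self.payload.some }),
            Tag::None => None,
        }
    }
}
```

The unsafe read is exactly the step a raw C union leaves to the programmer; hiding it behind the tag check is what makes the wrapper safe to use.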
A novel thing in Thor is that unions can implement interfaces! Implementing an interface on a union places a requirement on its variants: they must all implement the same interface.
The intended use of this feature is whenever you’re using unions to represent types that all have similar behavior.
union AstExpression implement Node {
    ident: Identifier;
    integer: IntegerLiteral;
}
This effectively means that with Thor’s unions you get polymorphism over a fixed set of types, but without the cost of dynamic dispatch.
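In Rust the equivalent has to be written by hand: the enum implements the trait by matching on the variant and forwarding the call. The sketch below shows roughly what Thor would be generating for you under that assumption; the describe method and the struct fields are hypothetical.

```rust
/// Hypothetical interface, standing in for Thor's `Node`.
trait Node {
    fn describe(&self) -> String;
}

struct Identifier {
    name: String,
}

struct IntegerLiteral {
    value: i64,
}

impl Node for Identifier {
    fn describe(&self) -> String {
        format!("ident {}", self.name)
    }
}

impl Node for IntegerLiteral {
    fn describe(&self) -> String {
        format!("int {}", self.value)
    }
}

enum AstExpression {
    Ident(Identifier),
    Integer(IntegerLiteral),
}

// Forward each call to the active variant. The set of variants is fixed,
// so this is a branch on the tag rather than a vtable lookup.
impl Node for AstExpression {
    fn describe(&self) -> String {
        match self {
            AstExpression::Ident(ident) => ident.describe(),
            AstExpression::Integer(integer) => integer.describe(),
        }
    }
}
```

Every new variant needs another match arm here, which is the boilerplate that a union-level interface implementation would remove.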
Flow-Sensitive Typing
A union
would be very limited in use if it was not possible to retrieve its stored value. Though, to retrieve its value you have to check which type is actually stored in it. Thor is using a simplified implementation of flow-sensitive typing, this means that the conditions, namely if statements, are used to inspect the type of a union
. Then, in the if statement, its type will be known.
Note how this is not introducing a new variable, value
is the exact same both inside and outside the if statement. Once you check which variant a union holds, the compiler automatically narrows its type within that branch.
let value = get_value();
if (value: Err) {
    // value is known to be of type Err
}
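For contrast, Rust has no flow-sensitive narrowing, so the closest construct is if let, which binds the payload to a new name. The sketch below is there to highlight that difference, not to describe Thor; the Value type and its variants are hypothetical.

```rust
/// Hypothetical result type; the variant names are illustrative.
enum Value {
    Ok(i32),
    Err(String),
}

fn describe(value: &Value) -> String {
    // `if let` checks the variant, but it binds the payload to a new
    // name (`message`); `value` itself still has the type `Value`.
    if let Value::Err(message) = value {
        return format!("error: {}", message);
    }
    // The compiler does not remember that `value` cannot be `Err` here,
    // so a second pattern match is needed to reach the `Ok` payload.
    match value {
        Value::Ok(number) => format!("ok: {}", number),
        Value::Err(_) => unreachable!("handled above"),
    }
}
```

With narrowing as described above, the second match and its unreachable arm would not be necessary.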
Lambdas and Closures
A lambda, also known as an anonymous function, is a function that does not have a name. Unlike regular functions, lambdas are themselves expressions and can generally be used as any other expression in the language.
A lambda is declared with the syntax (param: T1, param2: T2, ...): {}. For example:
fun main(): int {
    let greet = (): {
        println("Hello, world!");
    };
    greet();
    return 0;
}
Closures
Closures are similar to lambdas, but they can also access variables outside of their parameter list by capturing them from the environment. Closures use the syntax |capture_1, capture_2, ...|(): { }.
fun main(): int {
    let foo = Foo {};
    let greet = |foo|(): { // foo is moved into the closure
        // We can access foo here!
        // Only possible in closures, and not lambdas.
        println(foo.message);
    };
    greet();
    return 0;
}
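Rust does not separate lambdas from closures syntactically, but the same distinction exists underneath, and a short sketch may help: a closure that captures nothing can be used as a plain function pointer, while one that captures must carry its environment. The Foo type, its greeting method, and both helper functions are hypothetical, and move is only a rough counterpart to Thor's explicit capture list.

```rust
struct Foo {
    message: String,
}

impl Foo {
    fn greeting(&self) -> String {
        self.message.clone()
    }
}

/// Captures nothing, so it coerces to an ordinary function pointer.
/// This corresponds to what the post calls a lambda.
fn plain_lambda() -> fn() -> String {
    || "Hello, world!".to_string()
}

/// Captures `foo`; `move` transfers ownership of it into the closure,
/// so the closure stays valid after this function returns.
fn make_greeter() -> impl Fn() -> String {
    let foo = Foo { message: "Hello from foo!".to_string() };
    move || foo.greeting()
}
```

Both helpers return something callable with (), but only the second keeps a Foo alive inside it.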
Ownership and Move Semantics
Ownership refers to the rules and principles for how a program manages resources, such as memory. In C, programmers have to allocate and deallocate memory manually. Conceptually, this is easy to grasp, but it can be tricky to get right in large code bases. C++ attempts to improve upon this by introducing move semantics. This enables transfer of ownership, but it also introduces a significant amount of extra complexity to the language. It can be hard for new developers to use move semantics correctly, as the compiler does not help them that much.
Rust has the borrow checker, which, among many other things, ensures that references used in the program are valid before they are used. In particular, relationships between lifetimes (which involve subtyping and variance) quickly get complicated and greatly raise the barrier to entry to the language.
The idea behind Thor is that a model without a full borrow checker, but with move semantics rules that are enforced at compile time, will be enough to provide a good developer experience while still eliminating most of the footguns that you find in older languages like C and C++.
let foo = Foo{};
// This moves foo into another_foo
let another_foo = foo;
// Using foo after the move is a compilation error
// foo.use();
Similar to Rust, when declaring a function as taking a plain parameter (i.e. not a pointer), it is actually used to indicate that the function takes ownership of the parameter. This means that the caller can no longer use the parameter after the function has been called.
fun take_obj(obj: Object) {
}
let obj = Object{};
take_obj(obj);
// obj cannot be used after the function call
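Because the rule is described as similar to Rust's, a Rust sketch can show both sides of it: passing by value hands over ownership, while passing a reference leaves it with the caller. The id field and the borrow_obj and demo helpers are hypothetical, and the compiler error is quoted from Rust, not from Thor.

```rust
struct Object {
    id: u32,
}

/// Takes `obj` by value, so the function becomes its owner.
fn take_obj(obj: Object) -> u32 {
    obj.id
} // `obj` is dropped here

/// Takes a reference, so the caller keeps ownership.
fn borrow_obj(obj: &Object) -> u32 {
    obj.id
}

fn demo() -> (u32, u32) {
    let obj = Object { id: 7 };
    let borrowed = borrow_obj(&obj); // `obj` is still usable afterwards
    let taken = take_obj(obj); // `obj` is moved into the call
    // take_obj(obj); // error[E0382]: use of moved value: `obj`
    (borrowed, taken)
}
```

Uncommenting the second take_obj call makes the Rust compiler reject the program, which is the same kind of compile-time check described here.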
As shown in the first code sample, classes can also declare destructors, which are functions for cleaning up resources when an instance goes out of scope. This makes it possible to follow the RAII pattern.
This might look different from C and C++, but from an ownership perspective they are similar in a way. A function that takes a parameter by value takes, in some sense, ownership of that parameter; it is just that the caller copies the parameter instead of moving it, making the function the owner of a copy.
Combining the Features
Eventually, you will be able to write code like this:
interface Printer {
    fun print(): void;
}

union Message implement Printer {
    text: String;
    number: i32;
}

fun make_messages(): Vector<Message> {
    let messages = Vector<Message>::new();
    messages.push(Message::text("Thor!"));
    messages.push(Message::number(42));
    return messages;
}

fun main() {
    for (let msg in make_messages()) {
        msg.print();
    }
}
In fact, it is almost compiling! The main missing parts are standard library functions and iterator semantics.
Closing Thoughts
Thor is still early in its development. While I have implemented many of the features detailed here, there are still many things left to do. Eventually, I would like to implement the compiler itself in Thor, thereby making the language self-hosting, but that is for another blog post.
Oh, and there are still features left in my own design doc, like the import/module system, defer, variadic generics, tuples, operator overloading, etc.
Actually, one of the features that I am quite excited to eventually introduce is anonymous unions. They would make it more convenient to write functions that return one-off types that you do not want to create a new union for:
fun get_foo(value: *ComplexUnion): Foo | String {
    // Can only return a Foo OR a String.
    // This is instead of creating a new union just for this return type
    // Instead, the compiler creates an anonymous union automatically
}

// `is` is basically a switch statement that enforces handling of each variant
is get_foo(&value) {
    foo: Foo {},
    str: String {}
}
I think that those features not covered here really tie everything together in a way that distinguishes Thor from the languages it has its roots in.
And finally, I have to admit that the compiler, in its current state, probably has more bugs than features. But I am hoping to change that.