July 2022 Introduction To Generics In Strict

July 1, 2022

It is true that programming languages can work without generics: the first versions of Java (until Java 5), C# (until .NET 2.0) and Go (for 12+ years) didn't have them at all, and programmers could still do everything fine using polymorphism and runtime checks. However, generics don't just make things more flexible, they also allow the compiler to do MANY checks before anything even runs. Without generics you can think of everything as "anything can be an object", which then has to be cast down to specific interfaces at runtime.

How did we start implementing Generics in Strict?

For the moment we had to disallow ANY use of "any" to keep the focus on generics very clear. We don't want the "Any" type anymore, except as the automatic base class (like "object" in C#). This made many things stop compiling or working. Those things didn't really work anyway: they compiled but would not run (except when just using "object" in the transpiled C#, but CUDA has no equivalent).

How did we fix it?

The first thing that stopped working was all the methods returning Any, because we had neither generics nor conversions to get the type of an instance or to convert from one instance type to another, e.g. in BinaryOperator.strict (this file is not used directly, it describes how the compiler will parse these operators):

to(type) returns Any
to(any) returns Any

And there are many classes that use unknown things, like HashCode.strict (which will also never be used directly, it just explains how hash codes work):

has any -> these are clearly forbidden now, so we MUST use generics here or have a way to create types in code

to(type) returns Type(type)

The above line makes much more sense: the returned type is the generic Type. We could probably just write "type" and mean "Type(type)", which is a mouthful.

to(any) // this is simply forbidden, you can only have specific types with to operator
to(Type(Number)) returns Number or
to(Type(Text)) returns Text
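To illustrate what "the return type follows the requested type" buys us, here is a rough Python analogue. Python only stands in because Strict code cannot be run here, and the function name `to` and the set of allowed types are assumptions for this sketch. The constrained TypeVar limits targets to specific types (much like forbidding to(any)), and a static checker such as mypy infers the result type from the second argument:

```python
from typing import TypeVar

# Only these concrete targets are allowed, similar to forbidding to(any)
T = TypeVar("T", int, float, str)


def to(value: object, target: type[T]) -> T:
    """Hypothetical conversion: the result type is whatever `target` is."""
    if target not in (int, float, str):
        raise TypeError(f"no conversion to {target.__name__}")
    return target(value)


length = to("3.5", float)  # a type checker infers float here
label = to(42, str)        # and str here
```

The runtime check only exists because Python cannot reject `to(1, list)` before running; in Strict the compiler is meant to catch that.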

For HashCode, just remove "has any": everything is already "any", so there is no need to include it. The only problem is that when "this" or "value" is used inside a method or expression, it is not clear where they come from. But again, this class is NOT used directly, it just explains how hash codes are calculated:

implement Number
Compute returns Number
    if value
        is Number
            return value
        is Text
            return value.Length
    else
        return 0

(Btw: I invented switch statements here by allowing an if expression to be split into multiple choices.) The "if value" can also be shortened to just "if", which forwards the "value" from the scope above to the block below. This is yet to be implemented in the near future.
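The Compute method above boils down to one branch per type. A minimal Python sketch of the same decision, assuming Python numbers stand in for Number and str for Text (`compute_hash` is a hypothetical name, not part of Strict):

```python
def compute_hash(value: object) -> float:
    """Sketch of HashCode.Compute: one branch per type, like the split "if value"."""
    if isinstance(value, (int, float)):
        return value  # is Number -> the number itself
    if isinstance(value, str):
        return len(value)  # is Text -> its length
    return 0  # else branch


hashes = [compute_hash(7), compute_hash("Strict"), compute_hash(None)]
```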

How to use Generics in a Strict program?

The Input.strict base type should be limited to what works for now and what is needed, which is reading Text:

Read returns Text

If we need numbers, bytes or parsing in the future (XML, JSON, YAML, CSV, etc.), we will add them when needed, e.g. ReadNumbers(separators Texts) returns Numbers. But we should probably think about how bytes make it through, and how we can avoid saying "ReadNumbers" and then also having to say "returns Numbers", which is too much fluff.
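As a rough idea of what such a future ReadNumbers could do, here is a Python sketch. It assumes the Text has already been read (as Read returns it) and simply splits it on any of the given separators; `read_numbers` is a hypothetical name, not an existing Strict method:

```python
import re


def read_numbers(text: str, separators: list[str]) -> list[float]:
    """Hypothetical ReadNumbers: split already-read Text on any separator."""
    pattern = "|".join(re.escape(separator) for separator in separators)
    # Empty pieces (e.g. from "1,,2") are skipped; float() tolerates surrounding spaces
    return [float(piece) for piece in re.split(pattern, text) if piece.strip()]


numbers = read_numbers("1, 2;3.5", [",", ";"])
```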

Output.strict is more complicated and could be solved with generics, but it is a really hard example to start with. Also, normally you just have a ton of methods, one for each type you allow to write (e.g. in Java, C#, Go, C++ std, etc.). Instead we just add the methods we need:

Write(text)
Write(number)

Remaining methods like Write(xml), Write(json), Write(type), etc. can be added when needed; we are not that far yet. Currently there is no method overloading, so maybe this is internally just a generic implementation, or we simply allow different parameters for methods with the same name (and point to the correct one, as so far we know each type at compile time). The only problem is polymorphism (which we also don't have yet): then the decision still has to be made at runtime (the compiler can only check that the call makes sense up to the trait/interface).
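Python's `functools.singledispatch` shows the runtime variant of this idea: it picks an implementation from the type of the first argument, which is the decision that has to happen at runtime once polymorphism is involved. Strict would instead resolve Write(text) vs. Write(number) at compile time when the type is known. The `write` name and the output format are assumptions for this sketch:

```python
import io
from functools import singledispatch


@singledispatch
def write(value, out) -> None:
    # Fallback when no overload matches the runtime type
    raise TypeError(f"no Write overload for {type(value).__name__}")


@write.register(str)
def _write_text(value, out) -> None:
    out.write(value + "\n")


@write.register(int)
@write.register(float)
def _write_number(value, out) -> None:
    out.write(f"{value:g}\n")


buffer = io.StringIO()
write("Hello", buffer)
write(42, buffer)
write(3.5, buffer)
```

Adding Write(xml) later would just be one more registration, which matches the "add methods when needed" approach.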

Applications of Generics

The first and most important application of Generics in the Strict language is List, which is the reason for implementing Generics now. After Generics, statements like

has anys -> won't be supported

Instead, we need to mention the specific type, for example:

has numbers -> List of Number
has elements -> Numbers indirectly means the same List of Number type

This means numbers is of type List of Number, so the compiler directly knows the element type without any conditional casts, which also makes all operations on the type faster than before.
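A small Python sketch of a list with one known element type. `TypedList` and `total` are hypothetical names; Python can only enforce the element type at runtime (or via a static checker), whereas Strict's compiler is meant to know it up front:

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class TypedList(Generic[T]):
    """Stand-in for Strict's List: every element shares one element type."""

    def __init__(self, element_type: type[T], *elements: T) -> None:
        self.element_type = element_type
        self.elements: list[T] = []
        for element in elements:
            self.add(element)

    def add(self, element: T) -> None:
        # Python only enforces this at runtime; Strict's compiler checks it up front
        if not isinstance(element, self.element_type):
            raise TypeError(f"expected {self.element_type.__name__}, got {type(element).__name__}")
        self.elements.append(element)


def total(numbers: TypedList[float]) -> float:
    # No casts needed: the element type is already known to be a number
    return sum(numbers.elements)


numbers = TypedList(float, 1.0, 2.5)
```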

May 2021 Strict Newbie Experiences

May 19, 2021

Mikael Egevig

I started on the Strict project two days ago and the past 48 hours have been immensely joyful. Not only do I get to work with my friend Alexandre Bencz, but I am also getting to know several other, highly competent developers, all of whom have a shared passion for programming language design and implementation. The team now consists of five people, including Ben, so stuff is happening and quite fast even!

Obviously, the first few days are very hectic and overwhelming, because of all the new tools, procedures, habits, attitudes, and standards a newcomer has to learn. I personally expect to learn more from joining this project than I’ve learned on my own during the past 5 years. Nothing speeds up your own development like working with people who share your interests and bring all kinds of useful skills.

Strict is a great vision, which we are currently working hard to both document and formalize as the compiler begins to take shape. An example of this is that Ben added a draft grammar in BNF format yesterday. This helps everybody on the team and outside to quickly visualize what the project is all about.

Our next steps are to formalize things like the Intermediate Representation (IR), and discussions about the upcoming VM have begun as well, while Alexandre is busy working on the compiler itself. Harald is looking into IDE support, and things will happen there soon as well.

My role is, given the short period of employment so far, still a bit uncertain, but I do love to write documentation, so perhaps that will be part of what I do. I hope to return to these pages in not too long, with more exciting news from and documentation for the Strict project, and am already working on various updates to the existing documentation.

June 2021 Slow Progress

May 19, 2021

Benjamin Nitschke

Last month some new people started working on Strict, but sadly after a few weeks progress came to a halt again. We got a new tokenizer and a clearer vision of which parts are next and most important (IDE, VM), but the new people are doing their own thing again. Progress is much slower again, and I am still super busy with growing the team and with AI projects. There will be a few months of training to get the new guys up to speed on the AI projects, but hopefully afterwards I will have more time for myself to continue on the next important parts of Strict.

I noticed that even with compiler and programming language experts it was still hard to get the functionality, syntax and vision across. In the beginning everyone was hyped, but once the hard problems come up we need longevity to finish the parts, and when the vision is not in everyone's head, it gets hard. So my main focus is to write a functional prototype from start to finish (which was the plan anyway). Maybe a minimum viable product is possible now to get all the parts working (language parsing, running code, IDE experience, SCrunch, etc.), and then even play around with the AI generating some of this code. I am thinking of 8 kyu/7 kyu CodeWars-level problems: hello world, loops, simple state machines, some conversion of arrays and lists. A huge part is string manipulation and lists/collections/queries/etc., which is probably best left out of this first prototype iteration.

Once that works, I will show it to some programmers, see what they think and whether it clicks more easily than the current iteration of ideas. Maybe I will also focus more on hiring; the C# Compiler Job position is still open in the meantime.

May 2021 BNF Grammar

May 18, 2021

Benjamin Nitschke

In today's meeting we discussed the grammar a bit more and updated our Strict.bnf file. It is still very small, distinct and, most importantly, not done yet (tm):

file ::= {implement} {import} {member} {method}
implement ::= 'implement' type '\n'
import ::= 'import' namespace '\n'
namespace ::= Name | namespace '.' Name
type ::= Name
member ::= 'has' variable '\n'
variable ::= Name [type]
method ::= methodcall ['returns' type] '\n' [block]
methodcall ::= methodname ['(' parameters ')']
methodname ::= Name | binary | unary | 'from' | 'to'
parameters ::= variable | variable ',' parameters
block ::= {'\t'} {expression '\n'} ['return' expression '\n']
expression ::= 'true' | 'false' | 'from' | 'to' | Number | String |
    expression binary expression | unary expression | [namespace] methodcall |
    'let' Name '=' expression |
    variablereference |
    'if' expression '\n' block '\n' ['else' '\n' block] |
    'for' variablereference 'in' expression 'to' expression '\n' expression
variablereference ::= [namespace] Name
binary ::= '+' | '-' | '*' | '/' | '%' |
    '<' | '<=' | '>' | '>=' |
    'is' | 'is' 'not' | 'and' | 'or'
unary ::= '-' | 'not'
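The top-level rule `file ::= {implement} {import} {member} {method}` fixes the order of sections in a file. A rough Python check of just that ordering, assuming unindented lines are declarations and indented lines belong to a method body (`classify` and `follows_file_rule` are hypothetical helpers, not the Strict parser):

```python
SECTION_ORDER = ["implement", "import", "member", "method"]


def classify(line: str) -> str:
    first_word = line.split(" ", 1)[0]
    if first_word in ("implement", "import"):
        return first_word
    if first_word == "has":
        return "member"
    return "method"  # anything else at the top level starts a method


def follows_file_rule(source: str) -> bool:
    """True if top-level lines appear as {implement} {import} {member} {method}."""
    last_rank = 0
    for line in source.splitlines():
        if not line.strip() or line[0] in " \t":
            continue  # blank lines and indented method bodies are skipped
        rank = SECTION_ORDER.index(classify(line))
        if rank < last_rank:
            return False
        last_rank = rank
    return True
```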

Strict Grammar

Strict is easy to read and write, there is usually only one way to do things, and it doesn't need fluff like end-of-line characters. Blocks are indented and have no start, end or brackets (like in Python). All lines are expressions and have to evaluate to true, otherwise execution and even compilation stops at that point. Callers can use catch blocks to check for this.

These grammar files are not really used to generate any lexer, parser, tokenizer. They are here for informational purposes and to generate syntax highlighting like for Textmate (.tmLanguage), which can be imported to Visual Studio Code, Textmate, Atom, Ace, Sublime, etc.

To generate .tmLanguage (for Visual Studio Code or Textmate) or syntax highlighter files for other IDEs or tools use https://eeyo.io/iro/

Other languages

Strict has lots of similarities with C#, Java, C++, F#, Lisp, Scheme, etc. However, syntax-wise Lua and Python are probably the closest because of their simplicity and simpler look.

May 2021 Next Steps

May 10, 2021

Benjamin Nitschke

We recently got some interest again in developing Strict and got some freelancer help. Our job posts for full-time Compiler Engineers and a C# TDD Developer for our main project, the intelligent robot arm, are still open.

Last time I talked about parsing libraries like Pidgin, Sprache and Superpower. The main idea still stands: don't use external lexer/parser code generator tools. Instead, use parser combinators and do everything in one go; the current code base shows this very nicely. To be honest, I was a bit stuck last year with the super simple approach of just fixing one test at a time, until I ran into trouble because the parser could not look forward or backward, which is very much needed for expressions in method bodies. We now have a custom tokenizer (thanks to Alexandre) and parsing solution again, and things seem to work out.
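To show what "combining parsers and doing everything in one go" means, here is a tiny parser-combinator sketch in Python. Pidgin, Sprache and Superpower are C# libraries with their own APIs; this is not their API and not the Strict tokenizer, just the same idea: small parsers that skip whitespace themselves (so there is no separate lexer) are combined into bigger ones. All names here are hypothetical:

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new position) or None on failure
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Lexing happens inside the parser: skip whitespace, then match the pattern."""
    compiled = re.compile(r"\s*(" + pattern + ")")

    def parse(text: str, pos: int):
        match = compiled.match(text, pos)
        return (match.group(1), match.end()) if match else None
    return parse


def sequence(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many(parser: Parser) -> Parser:
    def parse(text: str, pos: int):
        values = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def apply(parser: Parser, transform: Callable[[Any], Any]) -> Parser:
    def parse(text: str, pos: int):
        result = parser(text, pos)
        return None if result is None else (transform(result[0]), result[1])
    return parse


def evaluate(parts):
    total, rest = parts
    for op, value in rest:
        total = total + value if op == "+" else total - value
    return total


number = apply(token(r"\d+"), int)
binary = token(r"[+-]")
# expression ::= Number {('+' | '-') Number}
expression = apply(sequence(number, many(sequence(binary, number))), evaluate)
```

Each grammar rule becomes one small function, which is why changing the grammar stays cheap compared to regenerating a lexer/parser.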

The main reason nothing happened with Strict this year is simply that I have been busy 24/7 with the AI and robotics work; there was absolutely no time for anything else. Plus, we have recently been trying to add some employees, and there is a lot of interviewing, teaching, learning, code reviewing, etc. going on. Abir has been helping a lot with that recently.

Documentation

The Strict documentation is still mostly valid; even my C# Coding Guidelines from 2012 are still used for every new programmer joining the team, and they have not changed much in the past 10 years. However, applicants in recent interviews noticed that we could be clearer about the current state: what works, what is next, and what the immediate next steps are. Hopefully this blog post helps a bit. I will also update the documentation once we have more things working (e.g. the tokenizer work from today), and I hope the other Strict-ers can join the fun and help write up what is going on. The wiki and websites will always be important for Strict, as source code is not allowed to contain any comments; all of that has to be on the web instead (the AI won't read or understand comments anyway atm).

Coverage not at 100%

Instead of pushing coverage back to 100% over the mess I left behind last year (the tokenizer only worked for simple use cases, because I commented out problems just to get it barely up and running again), I moved all the commented-out code and TODOs back in, and we should fix them one by one. Not much work really... however, there is plenty of unfinished stuff in both the backend (e.g. C# or C++ code generation) and the virtual machine (mostly not done, just some low-level tests).

Cuda

I added some CUDA experiments late last year and they are very promising: we could easily parallelize any code where it makes sense (big loops, neural networks, math, matrices) by running it on the GPU, the CPU or both. We also have quite a lot of decent computers in the office and could connect them all with our own networking stack (Tachyon, very much a faster version of SignalR), similar to NCrunch work queue servers. This is not easy and we will probably revisit it much later this year. However, our CUDA work has made some advances this year: we created our own internal repositories for our engine and AI work to handle CUDA code more easily. It is still mostly hand-written, but libraries like cuDNN help a lot by providing most of the math we need for neural networks. Maybe in Q3 we can check this out for Strict as well.

Plan

This month (May 2021) our plan is to get all the important low-level parts up and running. There will be a lot of learning, teaching and discussion around many smaller problems like memory management, string handling, math and numbers. Next up are small hello world programs, expressions, and finally solving some 8kyu and 7kyu codewars.com katas in Strict.

June is all about integrating strict as quickly as possible into IDEs, most importantly Visual Studio Code, but also Visual Studio 2019, IntelliJ and others via Language Server Protocol. We have some early stuff working from last year, but as usual there is a lot of fiddly work to be done to get it all nice and shiny. Especially SCrunch, nice auto-completion, always on compilation, super fast speed and easy refactoring, debugging and all the other great features any decent IDE brings.

In July we want to revisit some old use cases and talk about new use cases we can then accomplish with the language, maybe focusing on compiling Strict with Strict to see what's missing. Maybe networking, maybe parallelization, concurrency or building neural networks with CUDA; who knows, we will find out. Most likely we will have to go through the existing backlog and see if we are ready to give the language to other programmers and let them solve some katas with it.

Obviously everything depends on how much time the freelancers and I can spend on this and how successful we are. The most important goals, as always, are (in this order):

Clean Code with Tests written first!
Super fast always-on compilation (I am talking nanoseconds here; this is not possible with any existing backend, so it happens in our own Virtual Machine)
Very short and easy to understand code (our strict rules will mostly enforce this)
Almost all aspects of the language should be functional (deterministic, no inheritance, composition, most things are only calculated once and reused all the time). About 10% mutable fields, and methods modifying them, will be allowed for special problems and optimizations, but this is not the norm.
Running the code must also be fast, comparable to C++, and all impacted tests are always executed (later with slower integration tests that only run on the CI server or at check-in time). This includes parallelization, concurrency, networking, CUDA and lots and lots of optimizations
And finally our main goal is to build AIs and let Strict be controlled by an AI as well -> we will start with normal Neural Networks like the ones we already write and maintain, up to evolutionary systems and meta parameters.

Till next time. I plan to blog about the progress weekly from now on, which also gives us a good overview of our progress.

Btw: Abir and I do weekly 1h Sharp Clean Code live streams on https://twitch.tv/deltaengine and talk about very related things as well, mostly solving some interesting codewars kata or TDD problem.

Jul 2020 Getting Back Into It

July 29, 2020

Benjamin Nitschke

There are always times when something important has to be fixed or be ready for a presentation, release or milestone. In these times the temptation is very strong to just quickly hack it together and test, test, test until it works. Short term this is fine, and it is pretty much how any Game Jam works; sadly, for most games it leads to throw-away code, which most people only notice when they start the next project.

We just had such a week and I tried to steer the team away from quickly hacking things together for the presentation/milestone. It was still stressful and I didn't really have time to finish my refactoring work on Strict. For the past two weeks I have been switching the parsing to the Pidgin library, which works great, but I still have to go through most lines of code, throw away stuff, fix tests and coverage, etc.

Parsing libraries

Pidgin is a pretty good library, similar to Sprache or Superpower but with even better performance. It is very similar to the monadic Parsec parser from the Haskell world, which combines lexing and parsing into a bunch of functions that find expressions. Debugging and developing parsing this way is much more comfortable than going the lexer/parser route or using external lexer/parser code generator tools. Those tools work great if you want to do exactly what many have done before you: they generate much better code than a newbie can write, and it will perform much better. However, Strict is not doing much in the traditional way and I am still constantly changing how things work; the more complicated it is to change the parsing, the more work each change is. Originally I was writing my own parsing (as you can see from the earlier commits) and I might continue with that later on, but for now it is nicer to have something working quickly to experiment with until the language is more fleshed out. Pidgin is very well tested and fast. Only method bodies need complex parsing in Strict, and they are evaluated lazily when needed: most code is never executed, so there is no point in loading it or getting it ready. This makes loading files in Strict much faster than in any other programming language; you can load as many files as you like in parallel, more like database or JSON loading and less like C++ compiling.
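The lazy method body idea can be sketched in Python with `functools.cached_property`: loading a file only splits it into methods, and a body is parsed the first time something asks for it. `Method` and `load_methods` are hypothetical names, the sketch assumes a file contains only methods, and stripping lines stands in for real expression parsing:

```python
from functools import cached_property


class Method:
    """The signature line is kept eagerly; the body is only parsed on first access."""

    body_parses = 0  # counts how often any body was parsed, to show the laziness

    def __init__(self, lines: list[str]) -> None:
        self.signature = lines[0]
        self._raw_body = lines[1:]

    @cached_property
    def body(self) -> list[str]:
        Method.body_parses += 1
        # A real parser would build expressions here; stripping stands in for that
        return [line.strip() for line in self._raw_body]


def load_methods(source: str) -> list[Method]:
    """Split a file into methods: unindented lines start a method, indented ones belong to it."""
    groups: list[list[str]] = []
    for line in source.splitlines():
        if not line.strip():
            continue
        if line[0] not in " \t":
            groups.append([line])
        elif groups:
            groups[-1].append(line)
    return [Method(group) for group in groups]


methods = load_methods("Compute returns Number\n\treturn 5\nTwice returns Number\n\treturn 10\n")
```

Loading never touches a body, and a body that is never called is never parsed, which is what makes loading many files in parallel cheap.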

A good example is the strict ruleset for source code in Strict: we do not want multiple ways to write code (very similar to Python, just stricter and even more basic). There shouldn't be multiple ways to format your code, write loops, or indent code or blocks for your conditions. Since the end goal is to generate code via the Stricti AI, there should be the least possible number of variations that compile and produce the right results (preferably exactly one solution).

Own company cryptocurrency token

Another small side project over the past two weekends was creating a cryptocurrency token for our company. We have a little internal economy going, basically giving employees a way to earn extra for story points from sprints, or just a thank you for doing good work. Originally I tried creating an Ethereum smart contract, but fees are crazy high, plus things are still very hard to do and test. After looking around a while (I hadn't done much crypto work for about a year), I went back to Neo and some other smart contract coins like Waves, which I immediately liked. It fit very well with our economy and idea, and it also gives new employees an easy way to get started and learn all about crypto. Things are heating up again recently: Ethereum went up 50% this month, and Bitcoin just made a 20% move as well.

However, the token was still not a good idea. The experiment ran fine: I got everyone to be their own bank and handle the tokens, and explained how to use them. One guy played around with trading, but nobody else did anything with it last week; it still felt like Monopoly money to most guys here. So I tried to assign a value to the token, but that didn't really work either, no one was exchanging it or even "getting" it.

So this week I discussed all these points with the team, and we decided to switch to a stablecoin instead (and burn all our tokens). This takes out all the fluctuations and makes it very clear how much each point is worth. If one stablecoin dollar is exactly one USD, it is clear what it means, even if it is still hard to understand for some people who have not done anything in crypto yet... well, learning by doing I guess.

The other change was to move from everyone being their own bank back to the MyDashWallet bot system, where the bot has full control over your account and shares the private keys with you if you also want control. This way the Telegram bot we use internally (like several others we have written before) can do whatever you want very easily and securely: tipping, receiving or sending coins, exchanging, raining, price information and many other cool features.

I gave a short presentation today at our local crypto meetup; everyone there got it immediately and was very impressed. Hopefully the employees will get it as well when using it more :)

Coverage back to 100%

Similar to our company work, where we had to clean up last week's presentation work to get everything nice and clean again (all tests passing, coverage back to 100%, any dummy or hacky code removed immediately), I still have to do the same for Strict. I am still kinda stuck in the MethodBody parsing, which has to be rewritten because the old LineLexer and Tokenizer parsing logic doesn't make much sense anymore. It should hopefully be finished by tomorrow, and I will try to blog more about progress there in August. We are also discussing creating another blog for our Towers game development, which is starting up right now (or for game engine development, VR, games, etc. in general). My old blog had a lot of categories; my focus is still just this blog, and hopefully other team members can take care of the other blogs.

Jul 2020 Optimizing FindType

July 5, 2020

Benjamin Nitschke

Still working on the package loading code from the last blog entry. The main issue was the dummy repository system I built a few days ago to grab code from a fixed folder, which didn't exist on the CI server. So instead of hacking another quick solution, I changed the code to download any repository from GitHub and provide it in a local StrictPackages cache folder. This works very well and is also efficient, but there were many problems to solve: not just the caching and when to re-download cached folders, but also a huge amount of testing and CI issues that took a long time to fix.

All good now: it is very fast for development, and the CI server just re-pulls any cached GitHub repository older than 1h and keeps using it for all its tests. Later, with versioning and https://packages.strict.dev, it will work much better. Packages should also not just be GitHub repositories, but compiled and versioned, which will make them much easier to download and use. Currently package management is not very high on the priority list; it just needs to work so testing can go on.
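The "re-pull when older than 1h" rule is easy to sketch. This Python version assumes the cached folder's modification time marks when it was last downloaded (the post does not say how the real code tracks this), and `needs_download` is a hypothetical name:

```python
import os
import tempfile
import time
from typing import Optional

MAX_CACHE_AGE_SECONDS = 60 * 60  # the "older than 1h" rule from the post


def needs_download(package_folder: str, now: Optional[float] = None) -> bool:
    """Download if the package is not cached yet or the cached copy is too old."""
    if not os.path.isdir(package_folder):
        return True
    now = time.time() if now is None else now
    # Assumption: the folder's modification time marks when it was last downloaded
    age_seconds = now - os.path.getmtime(package_folder)
    return age_seconds > MAX_CACHE_AGE_SECONDS


# Demo: a freshly created temporary folder stands in for a just-downloaded package
fresh_package = tempfile.mkdtemp()
```

A folder's modification time also changes when files inside it are added or removed, so a real implementation would more likely store an explicit timestamp.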

FindType

Once packages work, the first obvious use case is to grab types from them. As explained in the last blog post, any public type (any upper-case type in any publicly available package) is always available in all .strict files; there is no need to import anything, the whole universe is always available. This is pretty cool when writing code and discovering existing types and features, but it makes type discovery quite a challenge and requires a ton of high-level optimizations and caching, plus low-level code that performs very well when going through the code trees.

[Figure from the last blog post: FindType]

The final implementation is actually just one expression body, but took me multiple days to find all the issues and write a lot of tests to cover all the required features. And even with it working now, the performance is not that great yet, see below for more optimizations.

public override Type? FindType(string name, Context? searchingFrom) =>
    FindDirectType(name) ?? (IsPrivateName(name)
        ? null
        : FindTypeInChildrenPackages(name, searchingFrom) ?? Parent.FindType(name, this));

FindDirectType is just a foreach loop over every type defined directly in the package (not in any sub folder, which are sub packages). It is about twice as fast as a similar Find or FirstOrDefault LINQ query. It is also usually inlined and only used in a few places:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public Type? FindDirectType(string name)
{
    foreach (var type in types)
        if (type.Name == name)
            return type;
    return null;
}

Next, FindType skips any private name (a type starting with a lowercase letter), because such a type isn't allowed to be used in any other package anyway. The final line first searches all children packages recursively via FindDirectType again, excluding the context we are coming from (usually the child package that called Parent.FindType).

private Type? FindTypeInChildrenPackages(string name, Context? searchingFromPackage)
{
    foreach (var child in children)
        if (child != searchingFromPackage) // never search the package we came from again
        {
            // Check the child's own types first, then recurse only if it has sub packages
            var childType = child.FindDirectType(name) ??
                (child.children.Count > 0 ? child.FindTypeInChildrenPackages(name, searchingFromPackage) : null);
            if (childType != null)
                return childType;
        }
    return null;
}

Not the prettiest code, but it works and does its job well. This was actually the most difficult part: I initially used FindType here recursively and had a lot of trouble keeping sub trees from searching the same parent again and parents from going into the same children over and over (lots of StackOverflowExceptions).
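The whole search, FindType plus FindDirectType plus FindTypeInChildrenPackages, can be mirrored in a few lines of Python to see why passing "where we came from" prevents endless loops. This is a sketch, not the Strict code: types are plain name strings, the package names are made up, and the search simply ends at a package without a parent (the real Root package also holds the cache):

```python
from typing import List, Optional


class Package:
    """Python sketch of the package tree search; types are just names here."""

    def __init__(self, name: str, parent: Optional["Package"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Package"] = []
        self.types: List[str] = []
        if parent is not None:
            parent.children.append(self)

    def find_direct_type(self, name: str) -> Optional[str]:
        for type_name in self.types:
            if type_name == name:
                return f"{self.name}.{type_name}"
        return None

    def find_type(self, name: str, searching_from: Optional["Package"] = None) -> Optional[str]:
        found = self.find_direct_type(name)
        if found is not None or name[0].islower():
            return found  # private (lowercase) names never leave their own package
        found = self.find_type_in_children(name, searching_from)
        if found is not None:
            return found
        # Walk up, telling the parent which child it should not search again
        return self.parent.find_type(name, self) if self.parent is not None else None

    def find_type_in_children(self, name: str, searching_from: Optional["Package"]) -> Optional[str]:
        for child in self.children:
            if child is searching_from:
                continue
            found = child.find_direct_type(name)
            if found is None:
                found = child.find_type_in_children(name, searching_from)
            if found is not None:
                return found
        return None


root = Package("Root")
strict = Package("Strict", root)
base = Package("Base", strict)
base.types.append("Number")
app = Package("App", strict)
app.types.append("helper")
```

Searching for "Number" from App goes up to Strict, which skips App and finds the type in Base; a private "helper" is only visible inside App itself.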

Performance

The first rule of optimization is to measure. I pretty much knew the main issue would be searching from the root package through all children, so that is where I added the cache. This high-level optimization already gave a good boost (10-100x faster depending on the use case), and it will probably help much more in the long run when there are hundreds of packages and thousands or millions of files.

This is my first line-by-line profiling of the finished working code, with all tests green, using ContextTests.LoadingTypesOverAndOverWillAlwaysQuicklyReturnTheSame to check the performance of 1 million calls to FindType. Without the cache it is around twice as slow (and it sometimes made NCrunch time out, so the cache is really good), but as you can see from the profile result, that is not really the main problem.

[Figure: FindType Performance Initial]

It seems only 39% of the time is even spent in the code I wrote; most of it goes to system, string and collection code. The first order of business is to reduce the number of string manipulations and maybe inline a few more properties and methods that just pass data around (line-by-line profiling has a lot of overhead, so I switched to sample profiling mode).

Digging deeper into the performance results, I saw a lot of enumerators being created and disposed, so I started removing foreach loops and LINQ queries, and removed or simplified any string manipulation or comparison I found. Profiling a bit more after these optimizations showed that most of the time in my example was spent in the Root package checking the cache, which means the cache already works very well: almost no time is spent in the tree, and the only optimization left is to make the cache faster.

[Figure: FindType Performance Dictionary]

After replacing Dictionary with FastDictionary it was time to profile again and, surprise surprise, it was 3 times slower. I guess .NET Core 3.1 is already optimized quite well. I remembered that string operations could be made about twice as fast by using StringComparer.Ordinal, as a blog post describes, but it didn't help either and made the code about 20% slower than just using the non-StringComparer methods. The last thing I tried was replacing char.IsLower with some custom if code, which made that part a bit faster, but I reverted it because .NET Core is quite optimized here and much more capable than a quick if check (it took only about 3% of all string check time, so not important anyway).

This is the final result. I spent over an hour trying out the above optimizations and only made things worse, so I went back to this version, which is good enough (78ms for 1 million FindType calls):

[Figure: FindType Performance Final]

Jul 2020 Package Loading

July 1, 2020

Benjamin Nitschke

Tons of changes have been made in the last few days to load packages with all their types and methods. The reason and use case was trying to put Strict into production already. Having a few unit tests work and experimenting with simplified language ideas is all nice and good, but useless in the long run if I can't prove it works with actual code in the real world. It is obviously way too early to tell. However, nothing prevents us from writing some tests that assume we can already run the existing code.

Initially I tried running some code in a quick self-written interpreter (like a virtual machine), as shown in most compiler/interpreter books; getting a simple state machine and calculator parser and interpreter off the ground is not that hard if you have done it a few times. That was not very exciting for me or Strict, so I was looking for a full solution instead. The much older Strict parser and interpreter was written with NRefactory, later ported to Roslyn (many, many years ago when it first came out), and also used Irony for the SNF parsing. That code still works, but is quite complex and not very similar to the new functional way. We also got the Strict SDK running in Go, and that is working fine too, but there is no interpreter/virtual machine there yet, just some backend code to generate source code in another language, which has quite a lot of issues as well (e.g. for C++ code to compile, each type must work, and currently that just isn't done yet).

After some back and forth and trying different solutions I went back to Roslyn, which I know well enough to quickly generate a bunch of code, classes, methods, statements, etc. and run them. Getting Roslyn up and running at runtime can be slow initially (around 1-2s), but the cool thing about Roslyn is that subsequent runs are super fast (in the 10ms range), which makes it good enough for quick testing and running many NCrunch tests in the background all the time. Six years ago we had 5000+ tests on the Delta Engine backend to generate C++, Objective-C, Java, JavaScript and C# code, initially using NRefactory and later Roslyn on top of our own AST model as well. While they weren't as fast as the 3000+ frontend tests (which ran in a few seconds), they were fast enough to run all the time, especially on each CI commit. Nowadays computers are faster and tools are better too. Our goal is always to have unit tests run in under 10ms (with Roslyn the initialization time needs to be excluded, as it can't be brought up that quickly, especially when generating a new assembly and loading all the required assemblies for analysis, code generation and execution).

Back to the problem at hand: loading packages, which contain types and sub-packages. Types contain methods, and the methods contain all the statements.

Packages

A good example of a package is the Strict.Base package, which gives us all the base types we usually need anyway (I reduced the implementation to what works right now; more types will follow soon).

For now the old Any.strict (which provided the ComputeHashCode and IsEqualTo methods) was removed, as we don't want to force everyone to implement those or autogenerate them for everything. Every type should get a hash code, equality checks and conversion to text (like ToString) automatically anyway.

Strict.Base

Any trait: Basis for all classes, always implemented. Provides to HashCode and to Text (both implemented automatically by the compiler by default, can be overridden)
Mutable trait: Does not implement anything, just tells the compiler that this type changes and is not thread-safe (and should be avoided)
Number class: Most used type for anything that requires computation, provides number manipulation methods, conversion to and from Text, etc.
Character class: Needed for text, basically a number, but will be implemented as a UTF-8 char
Count class: Mutable version of Number, which is only used in a single thread and often optimized away
HashCode class: Just implements Number and stores the hash code in the implementation (usually as an int)
Text class: List of characters with a bunch of helpful text methods (implemented as a string, obviously)
Input trait: For getting data, usually from stdin, but also from files or any input device
Output trait: For writing data, usually to stdout, stderr or any file, display, data, etc.
Log class: Implements Output and by default writes to the Console (but users can provide their own implementation, which would change its usages)
App trait: Entry point for all apps (there can only be one per package, which must be in the main namespace), requires Run to be implemented

This should be enough to create a console app. Whether a file is a class or a trait usually doesn't matter, except when you try to implement it in a new class, where only traits are shown and allowed. Classes are used as members via the has keyword. Most complicated methods and features have been left out on purpose (localization and culture stuff; we always assume international ISO formats for now). There are also no Type, Function or Iterator features yet. Again: we don't want to replace any framework here, just provide the basis so simple programs can be understood and generated by machines.

Any.strict

is(any) returns Boolean
to returns HashCode
to returns Text

Any.strict defines all the methods available in every type (everything automatically implements Any). These methods don't have to be implemented by any class; they will be implemented automatically with default behavior if not provided. In the current iteration I removed the method keyword, as it is obvious that returns is only used for methods (and None methods are easy to spot as well). Often Any is replaced by a specific type or trait to be more useful in an implementation, for example in Input.
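A minimal Python sketch of the "implemented automatically unless provided" rule, assuming a type can be modelled as a dictionary from method names to callables. `ANY_DEFAULTS`, `resolve_method` and the chosen default behaviors are hypothetical illustrations, not the Strict compiler's code.

```python
from typing import Callable, Dict

# Hypothetical default behavior that every type inherits from the Any trait.
ANY_DEFAULTS: Dict[str, Callable[[object], object]] = {
    "to Text": lambda value: str(value),
    "to HashCode": lambda value: hash(value),
}


def resolve_method(type_methods: Dict[str, Callable[[object], object]],
                   name: str) -> Callable[[object], object]:
    """Prefer the type's own implementation, else fall back to the Any default."""
    if name in type_methods:
        return type_methods[name]
    if name in ANY_DEFAULTS:
        return ANY_DEFAULTS[name]
    raise LookupError(f"unknown method: {name}")


# A type that only overrides "to Text"; "to HashCode" still comes from Any.
number_methods = {"to Text": lambda value: f"Number({value})"}
```

Here `resolve_method(number_methods, "to Text")(3)` uses the override and returns "Number(3)", while "to HashCode" silently falls back to the default.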

Mutable.strict

from(Any)

Number.strict

+(other) returns Number
  +(5) is 5
    Number(3) + 4 is 7
    return self + other
-(other) returns Number
  -(5) is -5
    Number(3) - 2 is 1
    return self - other
/(other) returns Number
    /(50) is 0
    Number(1) / 20 is 0.05
    return self / other
*(other) returns Number
    Number(3) * 4 is 12
    return self * other
>(other) returns Is
    test(0) is false
    test(3) is true
    return self > other
>=(other) returns Is
    test(0) is true
    return self >= other
<(other) returns Is
    test(0) is false
    test(3) is true
    return self < other
<=(other) returns Is
    test(0) is true
    return self <= other

Number currently implements all the basic math operations. Conversion to Text is done in the Text class (see its from(number) method below).

Character.strict

implement Number
from(number)
    test(7) is '7'
    return '0' + number
from(text)
    test("b") is 'b'
    return text.Characters[0]
to returns Text
    test('a') is "a"
    yield self

Character literals like '7' are not valid yet. Maybe Character will become private (and thus be named character); I am not sure there are any use cases for it outside of Text. Converting numbers to Characters is helpful, getting the first Character of a text is useful too, and so is converting back to Text.
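The three Character conversions can be mirrored in Python roughly like this. It is only a sketch: Python has no separate char type, so a one-character str stands in for Character, and the single-digit check in `character_from_number` is an assumption (the Strict draft does not say what happens for numbers of 10 or more).

```python
def character_from_number(number: int) -> str:
    """Mirror of Character from(number): '0' + number, for single digits only."""
    if not 0 <= number <= 9:
        raise ValueError("only single digits map to one Character (assumed rule)")
    return chr(ord("0") + number)


def character_from_text(text: str) -> str:
    """Mirror of Character from(text): the first character of the text."""
    return text[0]


def character_to_text(character: str) -> str:
    """Mirror of Character to returns Text: a one-character text."""
    return character
```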

Count.strict

implement Mutable
implement Number
Increase
    Count(5).Increase is 6
    self = self + 1
Decrease
    Count(3).Decrease is 2
    self = self - 1

Here we can test methods that return None because they modify their own state (the Number). We still allow the shortcut testing, because we know we are talking about the instance before the None method call. This works everywhere else just as well (even with chaining). Neither ++ nor -- is a valid operator in Strict.
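A hypothetical Python sketch of how a shortcut test like Count(5).Increase is 6 could be checked: call the None method, make sure it really returns nothing, then compare the state of the instance the call was made on. `Count` and `check_none_method` are illustrative names, not part of the Strict implementation.

```python
from dataclasses import dataclass


@dataclass
class Count:
    """Mutable counter, mirroring Count.strict (Mutable + Number)."""
    value: int

    def increase(self) -> None:
        self.value = self.value + 1

    def decrease(self) -> None:
        self.value = self.value - 1


def check_none_method(instance: Count, method_name: str, expected: int) -> bool:
    """Call a None-returning method, then test the state of the same instance."""
    result = getattr(instance, method_name)()
    assert result is None, "None methods must not return a value"
    return instance.value == expected
```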

HashCode.strict

implement Number

There is nothing here yet except a number. It will probably stay that way, and the automatic Any implementation of to HashCode will just xor each member (with some optimizations for complex things like Text).
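The planned default to HashCode can be sketched in a few lines of Python, assuming the member values of an instance are available as a sequence. `auto_hash_code` is a hypothetical name, and the sketch leaves out the special handling for complex types like Text.

```python
from functools import reduce
from operator import xor
from typing import Iterable


def auto_hash_code(members: Iterable[object]) -> int:
    """Hypothetical default `to HashCode`: xor the hashes of all member values."""
    return reduce(xor, (hash(member) for member in members), 0)
```

Plain xor ignores member order and makes equal members cancel out ([5, 5] hashes to 0), a known weakness worth keeping in mind when picking the final scheme.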

Text.strict

has Characters
from(number)
  test(45) is "45"
  return stream digit from digits(number)
    create Character(digit)
digits(number) returns Iterator<Number>
  test(1) is (1)
  test(123) is (1, 2, 3)
  if number / 10 > 0
    yield digits(number / 10)
  yield number % 10
+(other) returns Text
  +("more") is "more"
  "Hey" + " " + "you" is "Hey you"
  return self.Characters + other.Characters

See the blog post from June 17, 2020, As Simple As Possible, for details. Because Characters ends with s, the type Character is used as an Iterator (a read-only array). The + method adds two texts by using the + method for Iterators, which just creates a new, bigger list containing both parts.
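In Python terms, Text can be pictured as a frozen wrapper around a read-only tuple of characters, where + never touches either operand and instead builds a new, bigger tuple. This is a sketch under that assumption; the `Text.of` helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Text:
    """Immutable text, modelled as a read-only tuple of characters."""
    characters: Tuple[str, ...]

    @staticmethod
    def of(value: str) -> "Text":
        return Text(tuple(value))

    def __add__(self, other: "Text") -> "Text":
        # Neither operand changes; a new, bigger tuple holds both parts.
        return Text(self.characters + other.characters)

    def __str__(self) -> str:
        return "".join(self.characters)
```

For example `Text.of("Hey") + Text.of(" ") + Text.of("you")` gives "Hey you", matching the test in Text.strict.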

Input.strict

Read returns Any

A typical example of a trait in Strict: super short and easy to read. When loading files, Iterator<Text> or Iterator<Number> might be more useful than just Any, but anything is allowed and can be limited when implementing.

Output.strict

Write(any)

Log.strict

implement Output<Text>
Write(text)

Log implements Output through the generic specialization implement Output<Text>, so only text entries (lines) can be written. Log itself is not implemented in Strict yet; the backend will provide a ConsoleLog version that gets injected. For testing we need a MockLog thingy as well, and I am currently thinking about enforcing mock implementation classes written in Strict whenever external classes are used.
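A rough Python analogy for a generic trait specialized to text, with an injected console implementation and a mock for tests. `ConsoleLog` and `MockLog` come from the text above; `Output` as an abstract generic class and the `greet` function are assumptions for illustration. Note that Python only checks `Output[str]` statically (e.g. with a type checker), whereas Strict's compiler would enforce it.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Output(ABC, Generic[T]):
    """Trait: anything that values of type T can be written to."""

    @abstractmethod
    def write(self, value: T) -> None: ...


class ConsoleLog(Output[str]):
    """Backend-provided log that writes text lines to stdout."""

    def write(self, value: str) -> None:
        print(value)


class MockLog(Output[str]):
    """Test double that records lines instead of printing them."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, value: str) -> None:
        self.lines.append(value)


def greet(log: Output[str]) -> None:
    """Hypothetical app code that only depends on the trait."""
    log.write("Hello Strict")
```

Passing a `MockLog` into `greet` lets a test inspect `lines`, while production code would inject `ConsoleLog`.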

App.strict

Run

Another very simple trait, just telling us to implement Run, which is the entry point for our package (in case we want to run it; most packages will just be libraries).

Loading Order

All this was done mainly to force me to implement pre-loading of the types in a package for the current LoadStrictBaseTypes test, and then pre-loading each type's implementations, members and methods (which might use other, not yet loaded types from the same package). The same is then done for the method bodies, which are evaluated lazily, only when they are needed. All types and methods used in a method body need to be available for it to compile correctly.

This is not easy at all. I tried several approaches and had to revisit and update this a few times until it all made sense and worked; luckily, unit tests helped me stay sane. The following picture shows the typical search steps and optimizations. It differs from simple binary searches or finding types in other languages because in Strict any public type can be used anywhere. Much more needs to be done to make this work by discovering types from packages.strict.dev, more on that later.

[Figure: FindType]
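The loading order can be sketched in Python as three phases: register every type name first, then resolve implement/has lines (which may point at types registered later), and only split out method bodies when they are first needed. This is a simplified illustration, not the actual Strict loader: it assumes member lines name their type exactly (e.g. has Log) and that method bodies are tab-indented.

```python
from typing import Dict, List, Optional


class Type:
    """A type whose members are resolved eagerly and whose methods lazily."""

    def __init__(self, name: str, lines: List[str]) -> None:
        self.name = name
        self.lines = lines
        self.members: List["Type"] = []
        self._methods: Optional[Dict[str, List[str]]] = None

    @property
    def methods(self) -> Dict[str, List[str]]:
        # Phase 3: method bodies are only collected when first needed.
        if self._methods is None:
            methods: Dict[str, List[str]] = {}
            signature = ""
            for line in self.lines:
                if line.startswith("\t"):
                    methods[signature].append(line.strip())
                elif not line.startswith(("implement ", "has ")):
                    signature = line
                    methods[signature] = []
            self._methods = methods
        return self._methods


class Package:
    def __init__(self, sources: Dict[str, List[str]]) -> None:
        # Phase 1: register every type name first, so files can reference
        # each other regardless of the order they are loaded in.
        self.types = {name: Type(name, lines) for name, lines in sources.items()}
        # Phase 2: resolve implement/has lines against the registered names.
        for type_ in self.types.values():
            for line in type_.lines:
                if line.startswith(("implement ", "has ")):
                    type_.members.append(self.find_type(line.split()[1]))

    def find_type(self, name: str) -> Type:
        if name not in self.types:
            raise LookupError(f"unknown type {name}")
        return self.types[name]
```

In the test data Count is loaded before Number yet still resolves it, and Count's method bodies stay unparsed until `methods` is first accessed.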

Jun 2020 Parsing Methods

June 25, 2020

Benjamin Nitschke

As described last week, I tried to simplify the Strict syntax and get some low-level type, member and method parsing working in a new, simplified repository: https://github.com/strict-lang/Strict

It took a few evenings to make sense of it; now we have a pretty decent, simple parsing system for packages, types, members and method definitions in around 250 lines. No lexing or actual tokenized parsing is going on: Strict is very strict about the syntax, so we can assume a lot of things and just abort if a file doesn't match the expected pattern.
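A small Python sketch of the "match a known pattern or abort" idea for the top level of a type file. The three regular expressions are hypothetical approximations of implement, has and method signature lines (not the real Strict grammar); tab-indented method bodies are skipped because they need a proper method parser.

```python
import re
from typing import List

# Hypothetical patterns for the few top-level line kinds a type file allows.
LINE_PATTERNS = {
    "implement": re.compile(r"^implement [A-Z][A-Za-z]*(<[A-Z][A-Za-z]*>)?$"),
    "has": re.compile(r"^has [a-z][A-Za-z]*$"),
    "method": re.compile(
        r"^[A-Za-z+\-*/<>=]+(\([a-z][A-Za-z ]*\))?( returns [A-Z][A-Za-z<>]*)?$"),
}


def classify_lines(lines: List[str]) -> List[str]:
    """Label each top-level line, aborting on the first line that fits no pattern."""
    kinds = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("\t"):
            continue  # method bodies are left for the method parser
        for kind, pattern in LINE_PATTERNS.items():
            if pattern.match(line):
                kinds.append(kind)
                break
        else:
            raise ValueError(f"line {number} matches no pattern: {line!r}")
    return kinds
```

Because no whitespace is tolerated beyond single separating spaces, a line like "  has characters" (spaces instead of a tab) is rejected instead of being silently accepted.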

With methods, however, there is obviously a lot more flexibility and even more rules, so this approach isn't going to work there. Using a full parser is not the best choice either, as it allows far too flexible input (ignoring whitespace, comments, tabs, spaces, extra spaces at the end of lines or files), which we want to avoid. The goal is still a 1:1 mapping of compiled packages back to their source code without losing anything going back and forth. Plus we want to find one true solution to a problem and not allow many possible ways to do it (which is impossible to achieve, but at least we can limit the search space a lot).

Alternatives

So I looked around in old code (including the sdk in Go and older Strict versions in C#, Lua, Python and C++). The code I found ranged from domain-specific languages, simple state machine parsers and regex parsers to cool projects like Sprache and Superpower and, of course, the many available full-fledged parsers (ANTLR, Irony, etc.), but nothing really fit out of the box. I tried plugging in some old code and got a few lines working, but I wasn't happy with the extra complexity.

Parsing manually

I started again from the beginning with a very simple lexer that spits out tokens, which are then consumed by the MethodParser. Some parts might even be merged, because the lexer isn't really doing much and the tokens have to be in an expected order anyway. But error reporting is nice this way, I am not sure about the complexity yet, and we might be better off keeping lexer, tokens and parsing separate so that applying things like the Observer pattern stays easy (there is no use case for that yet, so it is not implemented).

Implemented Tokens

We run the lexer on each line and always look at the tabs first: we start at 1 tab and go deeper for nested statements (if, for, stream). No space follows the tabs, but a space must be between every other token, except ( and ).

test is the first one in any method
( and ) are needed to pass arguments to test and method calls
is is our comparison operator (like ==, which doesn't exist in Strict)
let creates scoped assignments (like const; reassignment isn't allowed in Strict without the mutable keyword)
identifier names let assignments; it might also be a type, which is unknown here (we could actually know this and classify it differently)
= assigns values to let, has or parameters; any expression more complex than a constant value can only appear in methods (we don't have a complex parser at the member or parameter level anyway)
+ is an example binary operator for now
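The per-line lexing rules above can be sketched in Python as follows: count the leading tabs first, reject any extra spaces, split on single spaces, and let ( and ) touch their neighbours. `lex_line` and the token kind names are hypothetical, and return is included as a keyword only because the Return statement needs it.

```python
import re
from typing import List, Tuple

KEYWORDS = {"test", "is", "let", "return"}
OPERATORS = {"=", "+"}


def lex_line(line: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (tab depth, tokens) for one line of a method body."""
    depth = len(line) - len(line.lstrip("\t"))
    rest = line[depth:]
    if rest.startswith(" ") or rest.endswith(" ") or "  " in rest:
        raise ValueError("exactly one space between tokens, no extra spaces")
    tokens = []
    for word in rest.split(" "):
        # ( and ) need no surrounding spaces, so split them out of each word.
        for part in re.findall(r"[()]|[^()]+", word):
            if part in ("(", ")"):
                kind = part
            elif part in KEYWORDS:
                kind = "keyword"
            elif part in OPERATORS:
                kind = "operator"
            elif part.isdigit():
                kind = "number"
            elif part.isidentifier():
                kind = "identifier"
            else:
                raise ValueError(f"unknown token {part!r}")
            tokens.append((kind, part))
    return depth, tokens
```

For example, the test line of the Double method below lexes to depth 1 followed by test, (, 1, ), is and 2.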

The returns keyword was also removed in the last post: the last statement in a method must either be a non-return statement, which makes the method return nothing (None), or return a value of a specific type.

if, for, etc. coming soon.

Statements

MethodCall test or any other method call, currently must include () to tell the parser this is a method call as opposed to a member or let
LetAssignment assigns a value or expression to a local field
Return ends the function and can return a value

As you can see all this is still very easy and allows me to experiment around with different ideas very quickly.
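Turning a line's tokens into one of the three statement kinds can be sketched like this in Python. The decision rules (return first, then let name = value, then a call that must contain parentheses) are assumptions based on the list above; the `rest` field simply keeps whatever follows the call, such as the is 2 comparison of a test.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MethodCall:
    name: str
    arguments: List[str]
    rest: List[str]  # tokens after the call, e.g. ["is", "2"] for a test


@dataclass
class LetAssignment:
    name: str
    value: List[str]


@dataclass
class Return:
    value: List[str]


Statement = Union[MethodCall, LetAssignment, Return]


def parse_statement(tokens: List[str]) -> Statement:
    """Pick the statement kind from the first tokens (hypothetical rules)."""
    if tokens[0] == "return":
        return Return(tokens[1:])
    if tokens[0] == "let":
        if len(tokens) < 4 or tokens[2] != "=":
            raise ValueError("expected: let name = value")
        return LetAssignment(tokens[1], tokens[3:])
    # Without "(" a bare name could be a member or let, so it is rejected.
    if len(tokens) >= 3 and tokens[1] == "(" and ")" in tokens:
        close = tokens.index(")")
        return MethodCall(tokens[0], tokens[2:close], tokens[close + 1:])
    raise ValueError(f"not a statement: {' '.join(tokens)}")
```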

Example

method Double(number)
  test(1) is 2
  let doubled = number + number
  return doubled

The let is obviously useless and would be optimized away (meaning the source code would automatically change to return number + number, with more optimizations based on that). The whole method doesn't make much sense and probably won't be allowed in some future version; removing and inlining all of the code would make it much clearer (especially by just replacing it with 2 * number).

All code can be found at the usual location, coverage is 100%, TeamCity does a lot of nice extra checks and the code is still very clean, nice and short: https://github.com/strict-lang/Strict

Jun 2020 As Simple As Possible

June 17, 2020

Benjamin Nitschke

While exploring options yesterday and today for creating a great editor experience for Strict, I discovered some new possibilities. We already have a VSCode integration that provides basic syntax highlighting and works for writing a few lines, but it is not a fun experience at all if you are used to full-fledged IDEs. The Strict IntelliJ plugin we recently got working is good enough for some basic Auto Completion / IntelliSense, but there are a thousand little issues, which make the experience not very good yet (which is why it is not released yet, and no one is using or working on it daily at the moment, as opposed to the sdk, Strict and VSCode code bases). I am by no means a Java guru and don't really like working on top of the IntelliJ platform sdk, so I am unsure when this plugin is going to be improved.

What sounds very promising is the Language Server Protocol and the growing number of implementations. In some early experiments, a Strict language server works in Visual Studio Code, Visual Studio 2019 and even IntelliJ (plus a lot of other IDEs and editors that support it, like Emacs, Vim, Atom, whatever people like to use). More on that in the next blog post.

As Simple As Possible, but not simpler

"Everything should be as simple as it can be but not simpler!" - Albert Einstein

While trying to get the Strict language server plugin up and running for testing, I still noticed some pain points. I am currently preparing the upcoming work for our new employee Mahmoud (starting tomorrow). I can explain away most design decisions, but there are some open issues, plus some simplifications that Merlin and I talked about in the past that are not implemented yet. So instead of continuing with the current sdk in Go, I thought: why not try bootstrapping the Strict compiler directly in Strict? But no, it's not ready yet, I got stuck very quickly.

The sdk code base is already too large for quick experiments, so I just created a new one in C# (where I feel most comfortable until Strict is hopefully more usable later this year) and kept staring at the very old design, the redesign from last year and the redesign from this year (in Go). The main thing I noticed is that many checks are just not needed: Strict is very clear on what is valid code and what isn't, so why not get away with no lexing or parsing at the file level at all?

We know a .strict file is describing a type. A type can either be just a trait (think interface) describing what should be implemented, or it is a class optionally implementing one or multiple traits. From the outside it doesn't really matter, you want to use some functionality like Account, Count, Computation, Number, Iteration, etc.

Everything automatically derives from the Any.strict trait, which looks like this (notice there is no implementation):

method ComputeHashCode() returns Number
method IsEqualTo(target Any) returns Boolean

Either a file contains no implementations, in which case it is a trait, or it has implementations, which is the case for most files. Let's look at some String.strict examples while simplifying the language.
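A tiny Python sketch of the trait-or-class decision, assuming "implementation" means either an indented method body line or a has member. The function name and that exact rule are hypothetical simplifications of the description above.

```python
from typing import List


def is_trait(lines: List[str]) -> bool:
    """A type file counts as a trait when nothing in it is implemented:
    no indented method body lines and no has members (assumed rule)."""
    return not any(line.startswith(("\t", "  ", "has ")) for line in lines)
```

With this rule the Any.strict listing above is a trait, while the String.strict iterations below are classes.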

Iteration 1

implement Sequence<Character>
has characters
factory From(number)
  From(5) is "5"
  From(123) is "123"
  let result = create StringBuilder()
  while number > 0
    result = "0" + (number % 10) + result
    number = number / 10
  return result

This was an early implementation idea, close to the current String.strict code. You can see it starts with a bunch of tests to make sure what we are doing makes sense and works. Strict enforces at least one test condition for every method (which can be any expression returning true; anything else fails the test and thus compilation).
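The "at least one test, and every test must return true" rule can be mimicked in Python as a hypothetical compile step. Representing each test condition as a zero-argument callable is an assumption made for the sketch; `compile_method` and `from_number` are illustrative names only.

```python
from typing import Callable, List


def compile_method(name: str, tests: List[Callable[[], object]]) -> None:
    """Refuse to 'compile' a method unless it has tests and each returns True."""
    if not tests:
        raise ValueError(f"{name} needs at least one test")
    for index, test in enumerate(tests, start=1):
        if test() is not True:  # anything other than true fails compilation
            raise ValueError(f"{name}: test {index} did not return true")


def from_number(number: int) -> str:
    return str(number)


# Mirrors `From(5) is "5"` and `From(123) is "123"`; this call passes silently.
compile_method("From", [lambda: from_number(5) == "5",
                        lambda: from_number(123) == "123"])
```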

Here we implement the generic trait Sequence with the Character class, which is used in the next line to create an array (immutable, like everything else not marked with the mutable trait). Next we have a special factory method called From, which has no method keyword and no return type, as it is a factory method that constructs this type from a number.

Next we create a result, which is not a class name, so here we see a type definition for the first time, as the compiler can't figure out automatically what we mean by result (string, text or name would all be strings, stringBuilder would be a StringBuilder, but that is long and ugly). The StringBuilder internally keeps a mutable array of characters we can append to, which is useful in this use case. Now we use a simple formula to add each base-10 digit at the beginning of result, then reduce the number by a factor of 10. Finally we return the StringBuilder, which has a to method that gives us a String matching the characters defined above.
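The prepend-and-divide loop from Iteration 1 translates to Python roughly as follows. Python strings are immutable, so reassigning result plays the role of the StringBuilder here; the function name is hypothetical.

```python
def text_from_number(number: int) -> str:
    """Iteration 1 idea: prepend the last digit, then drop it, until none is left."""
    result = ""
    while number > 0:
        result = chr(ord("0") + number % 10) + result  # "0" + (number % 10)
        number = number // 10  # requires reassigning number, i.e. mutability
    return result
```

Like the Strict draft, this loop returns an empty text for 0, and it only works because number is reassigned, which is exactly the mutability problem discussed in the next paragraph.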

Now there are several problems with this code. First of all, the number can't be mutated, as everything is immutable by default in Strict; we can change that by making it mutable. Next, we don't even have while loops: there is currently only one form of loop, the good old for loop.

Iteration 2

Let's skin the code another way:

implement Sequence<Character>
has characters
from(mutable number)
  test(5) is "5"
  test(123) is "123"
  create result StringBuilder
  for digit in Range(0, Log10(number))
    result = "0" + (number % 10) + result
    number = number / 10
  return result

Ok, here we removed factory and just named it from, which is a reserved keyword anyway for converting stuff to something else. We also added mutable to the number (which is still of type Number) to allow changing it in our loop. The tests look better, as they directly tell us what we are asserting (btw: complex tests with multiple lines can be written as indented code blocks like everything else). Also, typing the method name again and again in every test isn't productive, so let's just use the test keyword and pass the parameters directly.

Next I have renamed let to create and removed all the assignment stuff and the parentheses as there is nothing we pass as parameters. The loop is now a for loop and got the Range going over the digits of the number and still does the same logic inside the loop.

Iteration 3

This is still not very functional, and it seems I am still trying to optimize at a low level, which should be the job of the compiler and not the coder. Let's try a more functional approach.

implement Sequence<Character>
has characters
from(number)
  test(45) is "45"
  return stream digit from digits(number)
    create Character(digit)
method digits(number)
  test(1) is (1)
  test(123) is (1, 2, 3)
  if number / 10 > 0
    yield digits(number / 10)
  yield number % 10

Here we use streams, which are not documented well yet (I just added the streams page). Basically they grab any array, collection, sequence or data and pass it through the pipe in the lines below. Here we simply create a Character for each digit (which does the "0" + number thing for us). The stream combines it all back into an array of characters, which automatically matches the String we wanted to build (any type can be constructed by supplying its has members, so there is no need to write any method, constructor or factory for that; as usual, this is forbidden in Strict ^^).
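Python generators give a close analogy to the recursive digits method and the stream that turns each digit into a Character. It is a sketch under one stated assumption: that Strict's yield digits(...) flattens the nested results the way Python's yield from does. The function names are hypothetical.

```python
from typing import Iterator, Tuple


def digits(number: int) -> Iterator[int]:
    """Yield the base-10 digits of a non-negative number, most significant first."""
    if number // 10 > 0:
        yield from digits(number // 10)  # assumed to match Strict's nested yield
    yield number % 10


def characters_from_number(number: int) -> Tuple[str, ...]:
    """Stream every digit through the 'pipe', creating one Character per digit."""
    return tuple(chr(ord("0") + digit) for digit in digits(number))
```

Unlike the while loop of Iteration 1, this version also handles 0 correctly and produces "0".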

Iteration X

This is not done yet and will be changed many times. I am currently just experimenting with parsing the above code and see if the AST that pops out makes any sense.

Anyway, methods contain code that needs to be parsed, everything else (implement, has, from, method) we can make up from simple rules, which is what I am currently trying at https://github.com/strict-lang/Strict

One final note: I completely removed imports as the Context that is used to parse a file knows all types already and if any unknown type is used in a .strict file, the parsing (and thus compiler) stops. There is probably some ordering that needs to be done and the optional build.yml file needs to allow users to point to more than just the default repository for all known types.
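Dropping imports means a single context has to know every type name before a file is parsed. A minimal Python sketch, assuming type usage has already been extracted as a list of names; `Context.check_types` and the `add_repository` hook (standing in for extra repositories listed in build.yml) are hypothetical.

```python
from typing import Iterable, Set


class Context:
    """Knows every type name up front, so .strict files need no imports."""

    def __init__(self, known_types: Iterable[str]) -> None:
        self.known_types: Set[str] = set(known_types)

    def add_repository(self, types: Iterable[str]) -> None:
        """Hypothetical hook for extra repositories, e.g. listed in build.yml."""
        self.known_types.update(types)

    def check_types(self, file_name: str, used_types: Iterable[str]) -> None:
        # Any unknown type stops parsing (and thus compilation) immediately.
        for type_name in used_types:
            if type_name not in self.known_types:
                raise LookupError(f"{file_name}: unknown type {type_name}")
```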

Summary

Just two nightly code sessions, with most of the time spent thinking about simplifications and what makes sense. This repository will stay in flux for some time and should not be considered stable (the sdk repo works and is usable, and we will still fix any bugs there until the new repo is remotely usable). The main goal here is to make the editor support and language server implementation much easier and also think about what makes sense, while adding some code we can compile and run soon (using as much as possible from existing blocks).

Today's goal is just to get it all green on TeamCity CI (Continuous Integration), which is still complaining about some ugly comments, some small issues and not having 100% coverage yet .. no biggy.
