Devlog #6 - Pankti Language

Sat, April 22, 2023 | 800 words | 4 Minutes

It’s been quite some time and, as usual, I’ve been inconsistent. Well, I was busy, very busy rewriting the Pankti interpreter.

I have already mentioned in the last few posts that my programming language Pankti is being rewritten from scratch. Previously it was a tree 🌳 walking interpreter, but it now has a compiler and a virtual machine. The compiler generates some gibberish integers (opcodes, instructions) and the VM performs specific tasks based on those integers; fundamentally it is a very simple thing. Unfortunately, those fat academic books made total fools of us by making us believe that writing interpreters and compilers is not a job for 🤹 mere humans.
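To show how small the "integers in, actions out" idea really is, here is a minimal sketch of a stack-based VM loop in C. The opcode names (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function vm_run are made up for this illustration; they are not Pankti's actual instruction set or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; not Pankti's real ones. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Executes `code` and returns the value on top of the stack at OP_HALT
 * (or when the code runs out). Returns 0 if the stack is empty. */
int64_t vm_run(const uint8_t *code, size_t len) {
    int64_t stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    size_t ip = 0; /* index of the next instruction to execute */

    while (ip < len) {
        switch (code[ip++]) {
        case OP_PUSH: /* operand: the single byte after the opcode */
            assert(ip < len && sp < STACK_MAX);
            stack[sp++] = code[ip++];
            break;
        case OP_ADD: {
            assert(sp >= 2);
            int64_t b = stack[--sp];
            int64_t a = stack[--sp];
            stack[sp++] = a + b;
            break;
        }
        case OP_MUL: {
            assert(sp >= 2);
            int64_t b = stack[--sp];
            int64_t a = stack[--sp];
            stack[sp++] = a * b;
            break;
        }
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            assert(0 && "unknown opcode");
            break;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0;
}
```

A compiler for an expression like (2 + 3) * 4 would emit the bytes OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT, and the loop above would work through them one by one. A real VM adds constants, variables, jumps and calls, but the shape stays the same.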

But some people have written nice books that make it easier for us to learn the magical art of writing interpreters and compilers, such as Robert Nystrom’s Crafting Interpreters, Thorsten Ball’s Writing An Interpreter In Go and Writing A Compiler In Go, and many more.

It has always been my goal in life to create a Bengali programming language, ever since the day I first touched a compiler. When I was 12 years old, my parents bought me a computer, and since then I have been trying to put together a programming language. At that time I had very limited time on 2G internet¹, so I had to pull as many reading resources as I could within 24 hours or so. I read about parsers, parser generators, lexers (a.k.a. scanners), compiler-compilers, bootstrapping, and much more. Parser generators seemed like “a way” of reaching my goal; first I tried Bison & Flex, but those were too complicated for me ’cause at that time I was just beginning to learn programming.

By the way, did I ever mention that C++ was my first programming language?

Then I looked up other parser generators and found ANTLR (Java), PLY (Python) and GOLD Parser.

Using PLY and GOLD Parser, I got much closer to my goal than ever before.

Years passed, and I got busy with high school. It is not like I did nothing about building my own language: I built MewMew, an esoteric programming language that lets you program in cats’ language, written with ANTLR4 and C++.

During the lockdown, I made multiple prototypes, at least 6, excluding small single experiments on lexers or parsers; I made most of my attempts in Rust. But as I progressed through the prototypes, Rust’s borrow checker & compiler made my life a living hell. I got huge help from the Rust community, which in my opinion is the friendliest of all the programming communities I have ever interacted with; unfortunately, my field of interest was so niche that I struggled to get optimal solutions. So I started replacing some safe components with unsafe Rust, gradually turning the whole project into unsafe Rust. As a result, the codebase got too complicated and ultimately it became “unfun” to work on the project.

At that time I was reading the book Crafting Interpreters and started rewriting the project in C with guidance from the book. I started having fun again and I almost finished the project.

The most difficult part of the whole project is Bengali support. Programming languages like Rust and Go support Unicode very well, but C is bare bones; I had to write big chunks of helper code to handle Bengali² Unicode characters properly, but I believe I have kept the code simple yet functional.

In the beginning, I used wchar_t to handle Unicode characters but shifted to char32_t, which is available since the C11 standard. I use UTF-32 internally for everything, which keeps the code simpler but makes it consume a little more memory than it would with UTF-8; even so, in some parts of the code characters are converted back and forth between UTF-32 and UTF-8 extensively. Actually, to use UTF-8 I wouldn’t have to change most of the logic; I would mostly swap my custom char32_t counterparts for the standard char functions. The problem I would face is in the lexer (scanner). It’s not like I didn’t try using UTF-8, but its variable length creates extra headaches when detecting Bengali alphabet characters: given a single byte like 0xE0, I have to check the next byte and the one after that to conclude that the 3 bytes form a single Bengali character. It’s too much work and would complicate the lexing process too much.
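To make the difference concrete, here is a rough sketch of the two checks side by side. The function names is_bengali and decode_bengali_utf8 are hypothetical and only for illustration, not taken from Pankti's source; the range U+0980 to U+09FF is the Unicode "Bengali" block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <uchar.h> /* char32_t (C11) */

/* UTF-32: one array element is one code point, so a single range check
 * against the Unicode Bengali block (U+0980..U+09FF) is enough. */
bool is_bengali(char32_t c) {
    return c >= 0x0980 && c <= 0x09FF;
}

/* UTF-8: the same question needs three bytes. Every code point in the
 * Bengali block encodes as 0xE0 followed by two continuation bytes.
 * Returns 3 and stores the code point in *out on success, 0 otherwise. */
size_t decode_bengali_utf8(const unsigned char *s, size_t len, char32_t *out) {
    assert(s != NULL && out != NULL);
    if (len < 3 || s[0] != 0xE0) {
        return 0; /* too short, or not the lead byte we are looking for */
    }
    if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) {
        return 0; /* the next two bytes must look like 10xxxxxx */
    }
    /* Stitch the payload bits together: 4 + 6 + 6 bits. */
    char32_t c = ((char32_t)(s[0] & 0x0F) << 12)
               | ((char32_t)(s[1] & 0x3F) << 6)
               | (char32_t)(s[2] & 0x3F);
    if (!is_bengali(c)) {
        return 0; /* 0xE0 also starts other scripts, e.g. Devanagari */
    }
    *out = c;
    return 3;
}
```

With UTF-32 the lexer can call is_bengali on each element directly. With UTF-8, something like decode_bengali_utf8 has to run before that question can even be asked, and the lexer must then advance by three bytes instead of one.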

I want to, and will, keep the code as simple and dumb as possible. I am aware that in the process I will lose some speed and create some memory bugs, but the code must stay simple, magic-less, and close to the metal.

If in the future I must reimplement the interpreter in another language, my first choices will be Zig or D.

¹ Read more about my experience here - https://palashbauri.in/internet-with-my-eyes/
² Bengali Alphabet - Wikipedia

devlog programming

✏️ Last edited: Sat, April 22, 2023 | 📎 permalink
