Daily Learning Notes for February 7, 2016

I really missed NAND2Tetris these last couple days! I guess it was nice to have a weekend, though. Today is Sunday but I feel like working, since I spent all of Friday socializing and driving around town. Anyway, where did I even leave off? A couple days ago I finished the first version of my assembler, but it wasn’t working. I guess I should try to fix it!

See all the code in my GitHub repository here. (But don’t look at it or read this blog post until after you’ve already completed the project yourself!)

Another note: the code snippets in this post are broken, reflective of my thought process at that particular moment in time. The fixed code is on GitHub.

Fixing version 1 of my assembler

The book suggested building a very basic version of the assembler first, then extending it to handle user-defined symbols. So I’m just trying to get that first version to work.

When I ran one of the sample assembly scripts through my assembler, it spit out “undefined” a bunch of times for the comments at the beginning of the script. So I took another look at my code and sure enough, I wasn’t skipping over comments or empty lines! Oops!

So I added this into my forEach loop:

// if this line was empty space or a comment, skip it
if (instruction === '') {
    return;
}

I ran the sample script through my assembler again, saved the output file, and opened it in the CPU emulator. Now for the moment of truth…

Hurray, it works!!! The CPU emulator drew a rectangle on the screen just like it should! It works exactly the same with the provided sample script and with my assembler’s generated machine code. Woohoo! That was easier than I thought.

This is a good time to make another commit and push it to my GitHub repo.

Assembler version 2

Next, I’ll extend my assembler to handle symbols! There are three kinds of symbols that I need to look out for:

Predefined symbols like SCREEN and KBD, which refer to RAM locations
Label symbols like (LOOP) and (END), which refer to the location of the next line of code
Variable symbols like @sum, which the book says should be “mapped to consecutive memory locations as they are first encountered, starting at RAM address 16”

Instruction memory vs data memory

Hmm, I just realized there’s something I don’t understand about my computer. It looks like the RAM locations and ROM locations are treated the same…

On page 85, the book confirms this:

“the contents of the A-register can be interpreted in three different ways, depending on the instruction’s context: as a data value, as a RAM address, or as a ROM address.”

I don’t get it! How does that work? If RAM locations starting at 16 are reserved for variables, and programs can be longer than 16 lines and can include label symbols that point to other lines of the program, how is it that the label symbols don’t conflict with the variables?

I think I need to go back and review for a moment. (I don’t have to; I could just assume it works fine as it is and just finish my program. But I’m only doing this to learn more about how computers work, so I may as well check my understanding instead of rushing to complete this project.)

What I do understand so far:

The ROM is the instruction memory, where the programs are stored. It contains 32K addressable 16-bit registers.
- This means that my programs can be up to 32768 lines long – that’s 2 to the power of 15, which is also why this computer uses 15-bit addresses.
The RAM refers to the data memory, which contains a 16-bit wide 16K RAM plus an 8K memory map for the screen and a single register for the keyboard’s memory map.
- Memory locations 0 to 16 are used to store predefined symbols, and locations 16 to 16383 can be used to store variable symbols and other data.
- Memory locations 16384 to 24575 are reserved for the screen, and location 24576 is reserved for the keyboard.

OK, so how does my computer know the difference between data values, RAM addresses, and ROM addresses, all of which are stored in the CPU’s A-register?

After staring at the CPU diagram on page 94 for a while and then reviewing the Hack computer’s machine language specification in chapter 4, I think I understand it now. The meaning of the value stored in the A-register is determined by the next command in the program.

The A-register’s contents can be interpreted three different ways:

As a literal data value, if the next instruction uses A, as in D=A-1.
As a data memory address, if the next instruction uses M, as in D=M+1.
As an instruction memory address, if the next instruction causes a jump, as in 0;JMP.

And here’s a very important implication of this design: you can’t do operations with the memory and the A-register at the same time! That’s why the computation mnemonics contain D+A and D+M but not M+A. (I had read about this earlier, but I glossed over it because I didn’t have a reason to really dig into this detail until now.)

So if I have a variable stored in RAM location 16, that will never conflict with the line of my program contained in ROM location 16, because if the binary value for 16 is loaded into the A-register, it will either be routed to the program counter to effect a jump in the next clock cycle, or it will be routed to the CPU’s addressM output to fetch or store a value in the data memory – but never both!

Phew! I’m glad I cleared that up. Now I can get back to building my assembler with a little more confidence.

Handling symbols with a symbol table

Following the specifications in chapter 6, I first made a symbol table object with a couple methods:

// maps symbols to addresses:
function SymbolTable() {
    // add symbol/address pair
    this.addEntry = function(symbol, address) {
        this[symbol] = address;
    };
    
    // check if symbol is already in the table
    this.contains = function(symbol) {
        if (this.hasOwnProperty(symbol)) {
            return true;
        } else {
            return false;
        }
    };
    
    // return address of given symbol
    this.getAddress = function(symbol) {
        if (this.contains(symbol)) {
            return this[symbol];
        }
    };
}

I got caught up in reading about JavaScript’s hasOwnProperty() method for a while. As usual, there are many opinions on the internet, haha. I don’t fully understand the difference between direct and inherited properties, but that’s for another day. Aside from that, this was a very straightforward snippet of code to write and a good review of objects in JavaScript.

Next, I’ll add in the predefined symbols:

// maps symbols to addresses:
function SymbolTable() {
    // predefined symbols:
    this['SP'] = 0;
    this['LCL'] = 1;
    this['ARG'] = 2;
    this['THIS'] = 3;
    this['THAT'] = 4;
    this['SCREEN'] = 16384;
    this['KBD'] = 24576;
    
    // predefined symbols R0-R15
    for (var i=0;i<=15;i++) {
        this['R'+i] = i;
    }
    
    // the rest of the code for SymbolTable...
}

So far, so good! Onto the next piece of this puzzle.

Identifying labels and building the symbol table

The book recommends running the assembler through two passes: the first pass adds user-defined labels to the symbol table, mapping them to their locations in the ROM, and the second pass adds user-defined variables to the symbol table and completes the assembly process.

Here’s a quick outline of the next changes I need to make to my existing code:

Add a loop for the first pass to identify labels and add them to the symbol table
Extend my commandType function to identify labels
Extend my getSymbol function to extract the value or symbol of both A-instructions and labels
Extend loop for the second pass to add new variables to the symbol table and replace symbols with their addresses

First I added a quick and lazy fix to the commandType function:

// takes a command, returns A, L or C for instruction types.
function commandType(str) {
    if (str.charAt(0) === '@') {
        return 'A';
    } else if (str.charAt(0) === '(') {
        return 'L';
    } else {
        return 'C';
    }
}

Yeah, that is indeed pretty lazy, but whatever. My program isn’t checking for syntax, anyway. They don’t even mention that in the specifications! So if the assembly program has syntax errors, whoever wrote the program is on their own when it comes to debugging it. But that’s fine for now.

Next, my getSymbol function now handles labels too:

// takes an instruction string, returns value of A-instruction or L-instruction
function getSymbol(str) {
    if (commandType(str) === 'A') {
        return str.slice(1);
    } else if (commandType(str) === 'L'){
        return str.slice(1,-1);
    }
}

Once again, a very lazy solution prone to errors, but like I said, I don’t care about that right now because my program isn’t doing any error-checking anyway. Assuming the assembly program is written properly, my assembler will work properly.

Next, my loop for the first pass looks like this:

// first pass identifies labels, adds them to symbol table
instructionArray.forEach( function (instruction) {		
    var ROMcounter = 0;
    instruction = removeWhitespace(instruction);
    instruction = removeComments(instruction);			
    // if this line was empty space or a comment, skip it
    if (instruction === '') {
        return;
    } else if (instruction === 'C' || instruction === 'A') {
        // increment counter to keep track of ROM addresses
        ROMcounter++;
    } else if (commandType(instruction) === 'L') {
        // add label to symbol table with current ROM address
        symbolTable.addEntry( getSymbol(instruction), ROMcounter );
    }
});

Yay, that feels right! I have a good feeling about this.

The assembler’s final pass

Now to modify my second loop so that it can put all the final pieces together:

// second pass identifies variables, adds them to symbol table, replaces symbols with addresses, and translates operations into machine code
instructionArray.forEach( function (instruction) {			
    var RAMcounter = 16;
    instruction = removeWhitespace(instruction);
    instruction = removeComments(instruction);			
    // if this line was a full-line comment, empty space, or a label, skip it!
    if (instruction === '' || instruction === 'L') {
        return;
    } else if (commandType(instruction) === 'A') {				
        var AValue = getSymbol(instruction);				
        if ( !Number.isInteger( parseInt(AValue, 10) )) {
            // if it's a symbol (not a decimal integer), save to symbol table and get its value
            if (!symbolTable.contains(AValue)) {
                // if symbol isn't already in symbol table, add it and increase counter
                symbolTable.addEntry( AValue, RAMcounter);
                RAMcounter++;
            }
            // convert from symbol to address (decimal representation)
            AValue = symbolTable.getAddress(AValue);
        }				
        // convert from decimal to binary representation
        assemblerOutput += getBinary16(AValue) + '\n';		
    } else if (commandType(instruction) === 'C') {
        // convert C-instructions to binary representation
        assemblerOutput += getCInstructMachineCode( operationFields(instruction) ) + '\n';
    }
});

Here’s what I added onto version 1:

A counter for assigning RAM locations to user-defined symbols (starting at 16, in keeping with the specs)
It now skips labels in addition to empty lines
For A-instructions, it now checks for symbols and then for newly-defined symbols
It adds new symbols to the symbol table as needed
It replaces all symbols with their addresses before converting the addresses from decimal representation to binary

The first time I wrote that chunk of code, I forgot that I could have a mix of symbols and decimal values, like @100 and @SCREEN. I just had to add another condition and reorder stuff a little bit, but that was easy enough. Now I’m pretty sure I got it right.

Let’s find out! I’m running another sample assembly program through the assembler and loading the output file into the CPU eumlator…

Fingers crossed…

Aw, no luck. It’s doing something weird with the labels and variables. OK, time to debug!

This is always the not-so-fun part, and I’m tempted to just call it a day and spend my Sunday binge-watching TV or something instead. Nah, that wouldn’t make me any happier. I’ll stick with it.

OK, let’s see… Comparing my broken machine code output to the correct output, I see that in three places my code is outputting 0 instead of a number, and those three places correspond to each occurrence of a user-defined label. So that’s the first problem: it seems that my assembler is not actually storing what it should be storing in its first pass, or it’s not saving that data for the second pass.

A couple console.log() statements and checking the browser console should help clear this up… Oh! My counters aren’t counting. Yeah, that explains it. I defined my counter variables inside the loop that increments them, so they just keep resetting to their initial value. Oops!

At first I thought I had some weird scoping issue with my forEach function because it wasn’t working even after defining the variables outside the loop, so I tried using a simple for loop instead but that didn’t fix it either. Then I realized that I was using conditions like instruction === 'C' instead of commandType(instruction) === 'C'. Oops! Now it works.

But does it work correctly? Let’s run it in the CPU emulator again and find out…

Ahem. One more time. Fingers crossed…

Woohoo, success!!! My assembler is working! Just to be absolutely sure, I’m going to test all the other sample scripts too.

Haha, the script for the Pong game is 28374 lines long and my browser took an awfully long time to convert it into machine code! But it works! That is so cool, playing Pong in my CPU emulator, running from the machine code generated by my very first assembler!

Making my last commit and calling it a day. Phew!

Ideas for teaching what I’ve learned and what’s next

I decided to take a break after finishing the first half of this course, so now that it’s done, I feel a bit sad the same way I feel sad after finishing a truly excellent book. Part of me wishes it wasn’t over. A big part of me feels anxious right now, wondering: “Well, now what?!” It doesn’t help that it’s Sunday and I have no plans for the evening and I’m feeling unusually lonely today. Ugh. I’ll try my best to keep busy.

But yeah, what’s next: I started a NAND2Tetris study group through my meetup group, and the first meeting is in two days! So I’ll see how many people show up and what their ideas are. Then I’ll see how much time I want to devote to teaching what I just learned. I could just spend a couple hours a week reviewing the material with my study group, or I could throw myself into it and create something new.

Here are some of my ideas for creative/educational projects related to this course:

Make a whole series of videos about computer science from first principles!
Or just make a couple videos sharing my experience with NAND2Tetris.
Create an outline of lesson plans and supplemental material to help others who want to run NAND2Tetris study groups.
Build a JavaScript hardware simulator so other learners can visualize this stuff right in the web browser.
Write a set of tutorials with interactive visualizations.
Or just make some fun animated GIFs that illustrate the main concepts.
Make a mini-documentary with the help of some of my study group partners to explore the learning process in a collaborative, self-directed sort of environment.
Make a hybrid tutorial-slash-documentary that both teaches the material and documents how different people learn it.

Or I could just drop it, move on and keep learning new things. I’m very tempted to do that, because these other projects feel like slowing down rather than speeding up. Then again, I like the idea of creating content that others can benefit from, and some extra review would help ensure that I don’t forget what I’ve learned.

I know I love learning and teaching, and I know they are two complementary processes. I guess I just need to experiment until I find the right balance.

As for what else I want to learn: I signed up for a free online course called Machine Learning for Musicians and Artists, and that sounds very interesting! Maybe I’ll throw myself into that for a while, and then do the second half of NAND2Tetris in a few weeks from now with the help of my new study group.

OK, time to cook some food and figure out how else I should keep myself busy this Sunday evening. Goodnight!

Learning summary (#TIL)

Finished my very first assembler, completing the project for week 6 and wrapping up the first half of the NAND2Tetris course!

Next steps

Start teaching what I’ve learned over these last 5 weeks or so! Experiment with finding the right balance between learning and teaching.
Prepare for my first Nand2Tetris study group meeting.

Time breakdown

Study time: 4 hours 48 min
- Working on project: 3 hours 48 min
- Reviewing: 1 hour