Daily Learning Notes for February 3, 2016

I can’t believe it was so easy to finish building the final computer for NAND2Tetris! I was pretty happy about that yesterday, but I wish it had been a bit more challenging. So today, starting on week 6 of the course, I’m hoping for a bigger challenge to sink my teeth into!

Chapter 6 reading notes

Learning materials for this week:

I already watched the video lectures (at double speed), but not much of it sank in. So now I’m going to do my usual active reading to help myself stay focused as I read chapter 6.

What the assembler does:

It parses assembly commands into separate fields,
translates each field into machine code,
replaces symbols with memory addresses,
and assembles the generated codes into a complete set of machine code instructions.

Their recommended implementation has a module for each of those four tasks: a parser, a code module for translating fields into machine code, a symbol table, and a main program that puts all the pieces together.

The first, second, and fourth tasks in that list are easy to implement – so they say! The third task is the hard part, so they devote a section of the chapter to it. Let’s read all about it!

Symbol tables for writing an assembler

The main challenge in building an assembler – a program that translates assembly code into machine code – lies in translating the user-defined symbols to physical memory addresses. As they put it, “the mapping of user-defined variable names and symbolic labels on actual memory addresses is not trivial.” I’d say so! To me, it sounds downright scary!

So they give an example on page 106 and introduce the concept of a symbol table, a list of every symbol used in the assembly program, along with each symbol’s corresponding memory address.

It doesn’t matter which memory address is used for which symbol, as long as each symbol always maps to the same address. (And I assume I should only use memory addresses that aren’t already being used, of course!) They note that assemblers for richer machine languages need some more complex features, but this one is pretty straightforward – good for beginners like me!
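Here’s a minimal sketch of what a symbol table could look like in JavaScript, just to make the idea concrete. The names createSymbolTable, addEntry, contains, and getAddress are assumptions for illustration, and the symbols and addresses in the usage lines are made up.

```javascript
// Hypothetical sketch: a symbol table maps each symbol name to exactly one address.
function createSymbolTable() {
	var entries = new Map();
	return {
		// Record a symbol once. A later call with the same name keeps the first
		// address, so every use of the symbol resolves consistently.
		addEntry: function (symbol, address) {
			if (!entries.has(symbol)) {
				entries.set(symbol, address);
			}
		},
		// True if the symbol already has an address
		contains: function (symbol) {
			return entries.has(symbol);
		},
		// The address for a known symbol, or undefined for an unknown one
		getAddress: function (symbol) {
			return entries.get(symbol);
		}
	};
}

// Example usage with made-up symbols and addresses
var table = createSymbolTable();
table.addEntry('sum', 16);
table.addEntry('i', 17);
table.addEntry('sum', 99); // ignored: 'sum' already has an address
```

Keeping the first address on a repeated addEntry call is one design choice among several; throwing an error on a duplicate would be just as reasonable.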

And here’s another important note: usually, humans don’t write assembly programs; compilers do! But apparently it’s pretty common in the C language for programmers to embed some assembly language within their programs. That sounds fun! Something to learn about later.

Specifications for the Hack assembler

Most of this week’s chapter just describes the specifications for the assembler. Here are my notes on the main points:

It converts .asm files to .hack files.
The .hack binary code files contain one 16-bit instruction per line, written as ASCII text using only the characters 0 and 1.
When the code is loaded into the computer’s instruction memory, each line of the .hack file corresponds to an address in the instruction memory starting at 0.
Each line of an assembly language program represents either an instruction or a symbol declaration.
Comments and white space are ignored.
Constants are non-negative decimal numbers and symbols can contain letters, digits, underscores, periods, dollar signs, and colons. Symbols must not begin with a digit, and they’re case-sensitive. Built-in assembly mnemonics must be written in all caps.
Pre-defined symbols map to fixed RAM addresses (R0 through R15, for example, are RAM addresses 0 to 15), and user-defined variables are assigned consecutive RAM addresses beginning at 16.
Symbols are either variables pointing to memory locations, labels pointing to locations in the instruction memory, or pre-defined symbols.
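To make the output format concrete, here’s a minimal sketch of turning a constant A-instruction such as @21 into one line of a .hack file. It relies on the Hack convention that an A-instruction is a leading 0 followed by the value in 15 bits; the function name translateConstant is hypothetical, and symbols are deliberately not handled.

```javascript
// Hypothetical sketch: translate "@<non-negative decimal>" into a 16-character
// string of 0s and 1s. Symbols such as "@sum" are rejected here.
function translateConstant(instruction) {
	if (!/^@\d+$/.test(instruction)) {
		throw new Error('Not a constant A-instruction: ' + instruction);
	}
	var value = parseInt(instruction.slice(1), 10);
	// Only 15 bits are available for the value, so the maximum is 32767.
	if (value > 32767) {
		throw new Error('Constant out of range: ' + instruction);
	}
	var bits = value.toString(2);
	// Pad on the left with zeros until the instruction is 16 bits wide.
	while (bits.length < 16) {
		bits = '0' + bits;
	}
	return bits;
}

translateConstant('@21'); // returns '0000000000010101'
```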

They outline a sort of language-agnostic API for the assembler’s modules, so I could build it in any language I’m most comfortable with. Python or Java would work well but maybe I’ll try doing this in JavaScript…

Anyway, they recommend first writing an assembler that doesn’t handle symbols and then extending it to create the finished assembler that handles symbols as well. Makes sense to me. They provide a few sample programs written both with and without symbols, which I’ll use to test my assembler.

As for handling user-defined symbols, they say that a common solution is to make two passes over the assembly code: first build the symbol table, assigning a memory address to each symbol, and then replace each symbol with its address.
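The two-pass idea can be sketched roughly like this. This is an assumption-heavy illustration, not the book’s reference design: it follows the usual Hack convention that label declarations like (LOOP) are recorded in the first pass, while variables get addresses from 16 upward the first time they appear in the second pass. The function name resolveSymbols is hypothetical, pre-defined symbols are left out for brevity, and the input is assumed to be an array of lines with comments and white space already removed.

```javascript
// Hypothetical sketch of two-pass symbol resolution.
// Pre-defined symbols (R0, SCREEN, ...) are NOT handled here.
function resolveSymbols(lines) {
	var table = Object.create(null); // no inherited keys to collide with symbol names
	var isLabel = /^\((.+)\)$/;

	// Pass 1: a label names the address of the next real instruction.
	var romAddress = 0;
	lines.forEach(function (line) {
		var match = isLabel.exec(line);
		if (match) {
			table[match[1]] = romAddress;
		} else {
			romAddress += 1;
		}
	});

	// Pass 2: replace symbols with addresses, allocating variables from 16 up.
	var nextVariable = 16;
	var resolved = [];
	lines.forEach(function (line) {
		if (isLabel.test(line)) {
			return; // label declarations generate no machine code
		}
		if (line.charAt(0) === '@' && !/^@\d+$/.test(line)) {
			var symbol = line.slice(1);
			if (!(symbol in table)) {
				table[symbol] = nextVariable;
				nextVariable += 1;
			}
			resolved.push('@' + table[symbol]);
		} else {
			resolved.push(line);
		}
	});
	return resolved;
}

var resolvedExample = resolveSymbols(['@i', 'M=1', '(LOOP)', '@i', 'D=M', '@LOOP', 'D;JGT', '@sum', 'M=D']);
// resolvedExample is ['@16', 'M=1', '@16', 'D=M', '@2', 'D;JGT', '@17', 'M=D']
```

The output contains only constants, so it could be fed straight into a symbol-free assembler.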

Writing my first assembler – in JavaScript!

I thought it would be fun to write my assembler in JavaScript, since I’ve been teaching JavaScript workshops for a while now and it’s still pretty fresh in my mind.

First question: can I generate a text file with JavaScript? (I need to create a file to load into my Hack computer in order to test that it works!)

Answer: Yes, I sure can! I’m sure I could do this with Node.js, but apparently I can also do this in the browser! I found this StackOverflow answer explaining that I can create a download link allowing a user to download a generated file, thanks to Blob and URL.createObjectURL()! So cool! They even provide this JSFiddle with working code.
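Here’s a minimal sketch of the Blob and URL.createObjectURL() approach. The function names buildHackBlob and attachDownload, the file name Prog.hack, and the element id in the usage comment are all assumptions for illustration; this is not the code from the StackOverflow answer or the JSFiddle.

```javascript
// Hypothetical sketch: wrap generated machine code in a Blob of plain text.
function buildHackBlob(machineCode) {
	return new Blob([machineCode], { type: 'text/plain' });
}

// Point a link at the generated file so that clicking it downloads the file.
// linkElement is expected to be an <a> element (or any object with href/download).
function attachDownload(linkElement, machineCode, filename) {
	var url = URL.createObjectURL(buildHackBlob(machineCode));
	linkElement.href = url;
	linkElement.download = filename; // suggested file name for the download
	return url;
}

// In the browser, usage might look like this (the id "download-link" is made up):
// attachDownload(document.getElementById('download-link'), '0000000000000010\n', 'Prog.hack');
```

Object URLs stay alive until the page unloads or URL.revokeObjectURL() is called, so a longer-lived app would probably want to revoke old URLs when generating new files.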

First I’m creating a GitHub repository for this project. OK, done! I made a little page for the project.

And I know, I know, I’m not supposed to give away answers to the course projects, but it seems silly to build a web app without publishing it on the web! I just want to share it with friends from my meetup group, and a link is the easiest way to do that. Besides, I’ve found lots of other complete solutions posted online. It’s a good learning resource to compare your solution to other solutions after you’ve given it your best shot.

Anyway, for now I’m going to just paste my assembly code into a text field in my web app, but later I might want to figure out how to read it from a file.

To figure out how to modify the contents of my file with JavaScript, I’m reading about the Blob object (and giggling about the term “blob constructor”). Oh OK, I don’t need to know about blobs! I can just manipulate the value of the text box as a string and then turn that into a blob. Simple solution.

So my next commit for this project is just the container for the soon-to-be-written assembler, which takes text input and spits it out as a .hack file that the user can download by clicking a link.

The parser and code modules

For my next commit, I created a skeleton of an assembler function with parser and translator modules containing functions as outlined in the book – well, more or less. I don’t need an “advance” function if I’m using a forEach loop in JavaScript, for example. I’ll use the handy dandy split() method to turn the giant text string into an array, one element for each line of the assembly program.
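As a rough sketch of that split() step, here’s one way to turn the pasted text into an array of instructions while also dropping comments and white space, which the spec says are ignored. The function name toInstructionLines is hypothetical.

```javascript
// Hypothetical sketch: turn the raw text of an .asm program into an array
// holding one cleaned-up instruction per element.
function toInstructionLines(asmProgram) {
	return asmProgram
		.split('\n')
		.map(function (line) {
			// Keep only the part before any "//" comment, then remove all
			// white space (including a trailing "\r" from Windows line endings).
			return line.split('//')[0].replace(/\s+/g, '');
		})
		.filter(function (line) {
			// Skip lines that were blank or contained only a comment.
			return line.length > 0;
		});
}

toInstructionLines('// Adds 2\n@2\r\n  D=A  // load\n\nM = D\n');
// returns ['@2', 'D=A', 'M=D']
```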

Here’s what the skeleton of my assembler looks like:

// VERSION 1: for .asm files with NO symbols or labels, only constants
	function assemble (asmProgram) {
		var assemblerOutputArray = [];
		var instructionArray = asmProgram.split('\n');
		
		var parser = {};

		parser.commandType = function(str) {
			// takes an instruction string, returns A, C, or L for command types
		};
		
		parser.symbol = function(str) {
			// takes an instruction string, returns symbol/decimal number of A or L
		};
		
		parser.dest = function(str) {
			// takes an instruction string, returns 1 of 8 dest mnemonic strings from C
		};
		
		parser.comp = function(str) {
			// takes an instruction string, returns 1 of 28 comp mnemonic strings from C
		};
		
		parser.jump = function(str) {
			// takes an instruction string, returns 1 of 8 jump mnemonic strings from C
		};
		
		var translator = {};
		
		translator.dest = function(field) {
			// takes a mnemonic field from C instruction, returns 3-bit binary code for dest
		};
		
		translator.comp = function(field) {
			// takes a mnemonic field from C instruction, returns 7-bit binary code for comp
		};
		
		translator.jump = function(field) {
			// takes a mnemonic field from C instruction, returns 3-bit binary code for jump
		};
		
		// this loop runs after parser and translator are defined, so it can call them
		instructionArray.forEach( function (instruction) {
			// do stuff to parse and translate here!
		});
		
		return asmProgram; // need to change this to output newly generated machine code instead!
		
	} // end of assembler
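
To give a feel for how a few of those empty functions might be filled in, here’s a minimal sketch written as standalone functions rather than as methods on the parser and translator objects. The names commandType, parseCInstruction, DEST_BITS, and JUMP_BITS are assumptions for illustration, and the 28-entry comp table is left out to keep things short. It assumes each instruction has already had its comments and white space removed.

```javascript
// Hypothetical sketch: classify an instruction as A, C, or L.
function commandType(instruction) {
	if (instruction.charAt(0) === '@') return 'A'; // e.g. @21 or @sum
	if (instruction.charAt(0) === '(') return 'L'; // e.g. (LOOP)
	return 'C';                                    // e.g. D=M or D;JGT
}

// Split a C-instruction of the form dest=comp;jump into its three fields.
// dest and jump are optional; a missing field becomes the string 'null',
// mirroring the "null" mnemonic in the Hack spec.
function parseCInstruction(instruction) {
	var dest = 'null';
	var jump = 'null';
	var rest = instruction;
	if (rest.indexOf('=') !== -1) {
		dest = rest.split('=')[0];
		rest = rest.split('=')[1];
	}
	if (rest.indexOf(';') !== -1) {
		jump = rest.split(';')[1];
		rest = rest.split(';')[0];
	}
	return { dest: dest, comp: rest, jump: jump };
}

// Lookup tables for the two 3-bit fields.
// dest bits are A, D, M (left to right); jump bits are <0, =0, >0.
var DEST_BITS = {
	'null': '000', M: '001', D: '010', MD: '011',
	A: '100', AM: '101', AD: '110', AMD: '111'
};
var JUMP_BITS = {
	'null': '000', JGT: '001', JEQ: '010', JGE: '011',
	JLT: '100', JNE: '101', JLE: '110', JMP: '111'
};

var fields = parseCInstruction('MD=D+1;JMP');
// fields is { dest: 'MD', comp: 'D+1', jump: 'JMP' }
// DEST_BITS[fields.dest] is '011' and JUMP_BITS[fields.jump] is '111'
```

Plain objects as lookup tables keep the translation step down to a single property access per field, which seems like a good fit for a fixed set of mnemonics.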

I always find it helpful to begin a project by outlining its parts and what each part should do. So I typically write a bunch of comments and empty functions before I start writing my program. It just helps me stay organized.

Unfortunately, I’ve run out of time today! I’ll have to pick this back up again tomorrow.

New concepts and jargon

Software hierarchy
Symbol table
Input and output streams

Questions

Is a compiler also an assembler?
When is it helpful to embed assembly code within a higher-level program?
How can you upload and read the contents of a file with JavaScript and web APIs?

Learning summary (#TIL)

I studied the specifications for the Hack assembler and checked that I understand what it’s supposed to do.
I found and tested some example code for creating downloadable files with JavaScript in the web browser.
I created a structure (scaffolding?) to prepare to start writing my first assembler!

Next steps

Finish building the Hack assembler to complete the project for week 6!

Time breakdown

Study time: 4 hours 15 min
- Active reading: 2 hours 12 min
- Watching video lectures: 29 min
- Working on project: 1 hour 35 min