Monday, November 30, 2009

Another space between updates

So it's been a while again! I went to Texas last week, so I was pretty much out of contact, save the occasional text and Facebook check.

I originally wanted to move on to more video code, but while reading up on the NDS video architecture I found out that I'd made a bit of a mistake. The code I've been writing is what's called the CPU's BIOS: a set of routines that initialize the system, load some code from somewhere else (in this case, the firmware) and let it fly, along with the SWI routines I talked about earlier.

The Nintendo DS has two CPUs in it, the ARM7 and the ARM9, and each one needs its own BIOS. Up until now, I'd been putting all my code in the ARM7 BIOS, including the video code. However, I found out that the ARM7 chip in the DS is unable to access video memory! I had put no check in for that, so essentially I've been writing invalid DS code.

I decided to move to the ARM9 BIOS and rewrite it...taking some stuff from the ARM7 BIOS I've already written. However, I ran into a problem: the ARM9 handles memory a bit differently than the ARM7, and as a result I wasn't sure where to put my data. Essentially, the ARM9 has an internal memory manager that you can configure for memory protection, caching, etc., while with the ARM7 you have to accomplish most of that with external hardware.

I decided to take a detour and implement this memory configuration. It involves writing a set of coprocessor routines that take their parameters from an ordinary assembly language instruction and use them to configure some aspect of the memory manager. For instance, the instruction:

MCR p15, 0, r0, c9, c1, 1

asks coprocessor 15 (the internal memory manager) to write the value of ARM register 0 into the ITCM (Instruction Tightly Coupled Memory) region register. This configures the ITCM: a small block of fast on-chip memory that code can run from without going through the cache.
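To make the value in r0 a little more concrete, here is a minimal C sketch of how it could be built. It assumes the ARM946E-S TCM region register layout as described in GBATEK: the region base address sits in bits 31..12, and a 5-bit size field N in bits 5..1 gives a virtual size of 512 << N bytes (4 KB minimum). tcm_region_value is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: build a CP15 TCM region register value (c9,c1,x).
 * Assumed layout (ARM946E-S, per GBATEK):
 *   bits 31..12  region base address (must be aligned to the region size)
 *   bits  5..1   size field N, where virtual size = 512 << N bytes
 * Returns 0 if the size is not a power of two between 4 KB and 2 GB,
 * or if the base is not aligned to the size. */
uint32_t tcm_region_value(uint32_t base, uint32_t size_bytes)
{
    uint32_t n = 0;

    /* Find N such that 512 << N == size_bytes. */
    while (n < 23 && (512u << n) != size_bytes)
        n++;
    if (n < 3 || n >= 23)              /* N = 3 (4 KB) is the smallest size */
        return 0;
    if (base & (size_bytes - 1u))      /* base must be size-aligned */
        return 0;

    return (base & 0xFFFFF000u) | (n << 1);
}
```

For example, a 16 KB region at 0x00800000 encodes as 0x0080000A, since 16 KB is 512 << 5 and 5 << 1 is 0xA.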

Anyways, I had to implement this in my emulator...there are 15 possible registers to write to, each with its own set of operations. I started working on this, BUT it got boring really quickly. It involved a lot of looking at the datasheet, then writing a line of code, then looking at the datasheet again, trying to remember exactly which bit was which, etc. It was just a lot of implementation without any real indication of what it was for.
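For anyone curious what that emulator-side work looks like, here is a rough C sketch: decode the fields of an MCR/MRC instruction word (using the standard ARM encoding: opc1 in bits 23..21, the load/store bit in bit 20, CRn in 19..16, Rd in 15..12, coprocessor number in 11..8, opc2 in 7..5, CRm in 3..0) and then switch on CRn. The Cp15 struct and both function names are hypothetical, and only three registers are modelled: the control register (c1,c0,0) and the two TCM region registers (c9,c1,0 for DTCM and c9,c1,1 for ITCM). The word 0xEE090F31 is the encoding of MCR p15, 0, r0, c9, c1, 1.

```c
#include <stdint.h>

/* Fields of an ARM coprocessor register transfer (MCR/MRC). */
typedef struct {
    unsigned opc1, crn, rd, cp_num, opc2, crm;
    int is_read;  /* 1 for MRC (coprocessor -> ARM), 0 for MCR (ARM -> coprocessor) */
} CpTransfer;

/* Decode an MCR/MRC instruction word. Returns 0 if it is not one. */
int decode_cp_transfer(uint32_t insn, CpTransfer *out)
{
    /* bits 27..24 == 1110 and bit 4 == 1 identify MCR/MRC */
    if (((insn >> 24) & 0xFu) != 0xEu || ((insn >> 4) & 1u) != 1u)
        return 0;
    out->opc1    = (insn >> 21) & 0x7u;
    out->is_read = (int)((insn >> 20) & 0x1u);
    out->crn     = (insn >> 16) & 0xFu;
    out->rd      = (insn >> 12) & 0xFu;
    out->cp_num  = (insn >> 8) & 0xFu;
    out->opc2    = (insn >> 5) & 0x7u;
    out->crm     = insn & 0xFu;
    return 1;
}

/* Hypothetical slice of an emulator's CP15 state. */
typedef struct {
    uint32_t control;      /* c1,c0,0 */
    uint32_t dtcm_region;  /* c9,c1,0 */
    uint32_t itcm_region;  /* c9,c1,1 */
} Cp15;

/* Handle an MCR to CP15. Returns 0 for registers not modelled yet. */
int cp15_write(Cp15 *cp, const CpTransfer *t, uint32_t value)
{
    switch (t->crn) {
    case 1:
        if (t->crm == 0 && t->opc2 == 0) { cp->control = value; return 1; }
        break;
    case 9:
        if (t->crm == 1 && t->opc2 == 0) { cp->dtcm_region = value; return 1; }
        if (t->crm == 1 && t->opc2 == 1) { cp->itcm_region = value; return 1; }
        break;
    default:
        break;
    }
    return 0;
}
```

Every new register then becomes one more case in the switch, which is exactly the datasheet-line-datasheet-line grind described above.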

This can happen in the world of programming; sometimes your mind just gets overloaded with information that you can't quite use yet, so all you're left with is an attempt to keep it all in your head. Since you can't actually apply it (in this case, by writing code) for a while, you instead just try to keep it fresh in your mind, which can be tough.

Anyways, I spent the entire day sort of kibitzing on this, but I eventually got so bored that I just sat on the sofa and watched TV until I fell asleep for a few hours. When I woke up, I decided not to even look at it and to investigate something else instead; in this case, I dusted the cobwebs off of Facebook application development. This helped my mind stop trying to memorize all the different coprocessor calls and registers, balanced it out again, and also helped me remember WHY I was writing all these funny implementations.

Anyways, I'll pick it up again tomorrow.

Saturday, November 21, 2009

Structs and Syntax

I only realized today that it's been a few days since I last posted. Time flies when stuff is going on; hanging out with someone here and there, telling people about my trip, watching TV, etc.

I got structs into the assembler pretty quickly; now I can have something like:

.struct thing_typ
.word name
.word x
.word y
.endstruct

_variable:
.thing_typ

ldr r0, [_variable.name]
ldr r1, [_variable.x]
ldr r2, [_variable.y]
bl _print

or something similar. This gives me a cleaner way to lay out and access data going forward.

I then had to deal with a few extra features of structs, namely their size and the offset of each variable within the struct. In C, you get the size of a variable by writing "sizeof (variable)", which is evaluated at compile time. To avoid extra parsing and keep the syntax consistent, I decided to implement this with the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

mov r0, .thing_typ.sizeof

The assembler will convert ".thing_typ.sizeof" into an immediate value and pass that back to the instruction parser. I didn't opt for the C way mostly to keep parentheses reserved for arithmetic operations, but it is perfectly implementable.
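One way the assembler could resolve these names is to keep a small table per struct while it reads the .struct block: each ".word" records its starting offset and grows the struct by 4 bytes, and the final size is what ".thing_typ.sizeof" becomes. This is a hedged C sketch of that idea; StructDef, struct_add_word and struct_lookup are hypothetical names, not the assembler's actual code.

```c
#include <string.h>

#define MAX_FIELDS 16

/* Hypothetical assembler-side record for a .struct ... .endstruct block. */
typedef struct {
    const char *name;
    const char *field_names[MAX_FIELDS];
    unsigned field_offsets[MAX_FIELDS];
    unsigned field_count;
    unsigned size;   /* value substituted for .name.sizeof */
} StructDef;

/* Called for each ".word <field>" line: the field starts at the current
 * size and the struct grows by 4 bytes. Returns 0 if the table is full. */
int struct_add_word(StructDef *s, const char *field)
{
    if (s->field_count == MAX_FIELDS)
        return 0;
    s->field_names[s->field_count] = field;
    s->field_offsets[s->field_count] = s->size;
    s->field_count++;
    s->size += 4;
    return 1;
}

/* Resolve ".struct.member" to an immediate: "sizeof" gives the total size,
 * a field name gives its byte offset. Returns -1 if the name is unknown. */
long struct_lookup(const StructDef *s, const char *member)
{
    if (strcmp(member, "sizeof") == 0)
        return (long)s->size;
    for (unsigned i = 0; i < s->field_count; i++)
        if (strcmp(s->field_names[i], member) == 0)
            return (long)s->field_offsets[i];
    return -1;
}
```

For thing_typ (name, x, y) this gives sizeof = 12 and offsets 0, 4 and 8. One design consequence: a field literally named "sizeof" would be shadowed, so the assembler would probably want to reject that name.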

Getting the offsets of variables within structs was a bit different. I pictured looping through an array of structs and accessing the variables of each one; with the syntax I first described, I would need a separate variable for each struct in the array. The better way to iterate is to keep a pointer in a register and just increment that register. In C, when you have a pointer to a struct, you can access each variable within it like this:

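#include <stdio.h>

struct thing_typ
{
    const char* name;
    unsigned int x;
    unsigned int y;
};

int main(void)
{
    struct thing_typ t;
    struct thing_typ* ptr = &t;

    /* -> follows the pointer and selects a member */
    ptr->name = "Andrew";
    printf("%s\n", ptr->name);
    return 0;
}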

In my assembler, I implemented the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

t:
.thing_typ

ldr r0, =t
ldr r1, [r0, .thing_typ.x]

It's a bit funny to look at and wrap your head around at first, but I'll get used to it.
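For comparison, here is roughly the same pattern in C, as a sketch that assumes each .word maps to a 32-bit uint32_t (so the layout has no padding). offsetof plays the role of ".thing_typ.x", and stepping the pointer by one element plays the role of adding ".thing_typ.sizeof" to r0. sum_x is a made-up example function.

```c
#include <stddef.h>
#include <stdint.h>

struct thing_typ {
    uint32_t name;   /* a 32-bit .word, e.g. the address of a string */
    uint32_t x;
    uint32_t y;
};

/* C's compile-time counterpart of .thing_typ.x */
enum { THING_X_OFFSET = offsetof(struct thing_typ, x) };

/* Walk an array with one pointer, like keeping the base address in r0. */
uint32_t sum_x(const struct thing_typ *things, size_t count)
{
    uint32_t total = 0;
    for (const struct thing_typ *p = things; p != things + count; p++) {
        /* p->x reads from p + offsetof(struct thing_typ, x);
         * p++ advances by sizeof(struct thing_typ) bytes */
        total += p->x;
    }
    return total;
}
```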

I'll probably spend some more time refining the syntax (I still need to add support for static arrays), which will also give me a chance to practice ARM/THUMB assembly language some more, and that will be a big help.

Wednesday, November 18, 2009

Sections fixed, start of structs

I fixed the problem with variable access between sections; it was simply a problem of the linker not knowing about the section I was trying to define.

The linker uses an external file with the extension ".ld" to determine how to order sections in the final binary. This file is simply a collection of name-value pairs: the names are section names, and the values are the memory addresses where each section should be placed in the binary.

I was trying to define a section named ".hello", but this wasn't defined in the file, so when it came time to try to find variables from it, their addresses were completely wrong.

I put in some code to deal with sections that aren't defined in the file; if we encounter a section that has no definition, it is simply appended to the end of the binary...a simple and effective fix.
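The fix can be sketched in C as a two-pass layout. This is an illustration under assumptions, not the linker's real code: the .ld pairs are modelled as an in-memory table (the file's actual syntax isn't shown here), and Section, ScriptEntry and layout_sections are hypothetical names. Sections named in the table go at their listed address; anything else is appended after the furthest end address seen.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One section emitted by the assembler. */
typedef struct {
    const char *name;
    uint32_t size;     /* bytes of code/data in the section */
    uint32_t address;  /* filled in by layout_sections() */
} Section;

/* One name-value pair from the linker file: section name -> address. */
typedef struct {
    const char *name;
    uint32_t address;
} ScriptEntry;

static const ScriptEntry *find_entry(const ScriptEntry *script, size_t n,
                                     const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(script[i].name, name) == 0)
            return &script[i];
    return NULL;
}

/* Place every section and return the resulting end of the binary. */
uint32_t layout_sections(Section *secs, size_t nsecs,
                         const ScriptEntry *script, size_t nscript)
{
    uint32_t end = 0;

    /* Pass 1: sections the linker file knows about. */
    for (size_t i = 0; i < nsecs; i++) {
        const ScriptEntry *e = find_entry(script, nscript, secs[i].name);
        if (e != NULL) {
            secs[i].address = e->address;
            if (e->address + secs[i].size > end)
                end = e->address + secs[i].size;
        }
    }
    /* Pass 2: undefined sections are appended to the end of the binary. */
    for (size_t i = 0; i < nsecs; i++) {
        if (find_entry(script, nscript, secs[i].name) == NULL) {
            secs[i].address = end;
            end += secs[i].size;
        }
    }
    return end;
}
```

Doing the defined sections first means an appended section like ".hello" can never land on top of one whose address the file pins down.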

I started work on structs today, more to come later.

Tuesday, November 17, 2009

Europe with Soul

I returned from Europe on Sunday...the time I spent there was the best time I've ever had in my life. It was truly amazing to see the rest of the world; a world with soul, feeling, and experience all in one.

If you want to see some photos, I have them here

I got back into working on my projects yesterday; I had a few unresolved issues from when I left off. The first one had to do with using registers to pass parameters to the various SWI functions. I changed this to use a memory area instead, but then I took a quick look at the libnds definitions, and they seem to use registers. I'll have to look further into this; maybe the caller of the SWI routine expects certain registers to be clobbered and simply adapts to this?
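As a point of reference, here is a hedged C sketch of what a register-based SWI looks like from an emulator's point of view, using a divide call. The register assignment follows what GBATEK documents for the BIOS Div function (in: r0 = numerator, r1 = denominator; out: r0 = quotient, r1 = remainder, r3 = absolute quotient); Cpu and swi_div are hypothetical names. Because r0, r1 and r3 come back overwritten, the caller has to treat them as clobbered, which is the adaptation guessed at above.

```c
#include <stdint.h>

/* Hypothetical slice of emulated CPU state: r0-r15. */
typedef struct {
    uint32_t r[16];
} Cpu;

/* Register-based divide SWI (assumed convention, see lead-in).
 * Returns 0 for inputs this sketch does not model. */
int swi_div(Cpu *cpu)
{
    int32_t num = (int32_t)cpu->r[0];
    int32_t den = (int32_t)cpu->r[1];

    /* divide by zero and INT32_MIN / -1 are not modelled here */
    if (den == 0 || (num == INT32_MIN && den == -1))
        return 0;

    int32_t q = num / den;   /* C truncates toward zero */
    cpu->r[0] = (uint32_t)q;
    cpu->r[1] = (uint32_t)(num % den);
    /* absolute value computed in unsigned arithmetic to avoid overflow */
    cpu->r[3] = q < 0 ? 0u - (uint32_t)q : (uint32_t)q;
    return 1;
}
```

For example, -1234 divided by 10 leaves -123 in r0, -4 in r1 and 123 in r3.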

I then dealt with some rendering issues, which took a bit of time. I wrote some code to draw a colored square to the screen, but it didn't seem to be completing in time for the vertical blank...as a result I saw about half of the square being drawn, followed by the other half. I did some analysis and found that the SWI routine I was using was slowing the rendering code down immensely.

The routine I was using divides two numbers and returns the result along with the remainder. Done naively, this means subtracting the second number from the first until the first is smaller than the second, recording both how many subtractions you made and the final value of the first number; those are the quotient and the remainder, respectively. However, this loop is expensive. On other CPUs, such as Intel's x86, integer division is a single hardware instruction...the DS's ARM cores have no divide instruction, so this loop has to be written by hand, and it can cost hundreds of cycles or more, depending on the two numbers.
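Here is the naive loop next to a common alternative, sketched in C for unsigned 32-bit values. div_subtract is the repeated subtraction described above, whose iteration count grows with the quotient (1000 / 1 loops 1000 times). div_shift is binary long division (shift-and-subtract), a standard technique swapped in for comparison rather than something the BIOS here is known to use: it always takes 32 iterations. Both function names are hypothetical, and both require a nonzero divisor.

```c
#include <stdint.h>

/* Repeated subtraction: one iteration per unit of the quotient. */
void div_subtract(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem)
{
    uint32_t q = 0;
    while (n >= d) {
        n -= d;
        q++;
    }
    *quot = q;
    *rem = n;
}

/* Shift-and-subtract (binary long division): a fixed 32 iterations. */
void div_shift(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;   /* 64-bit so the shift below never drops a bit */

    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);  /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= 1u << bit;
        }
    }
    *quot = q;
    *rem = (uint32_t)r;
}
```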

This is obviously a problem, as my draw-square routine was using the divide to check whether it had hit the end of a line. I changed the routine to one that looks slower, using two nested loops, one for the vertical direction and one for the horizontal direction, and this fixed the slowdown.
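The two approaches can be sketched in C like this, assuming a 256x192 framebuffer of 16-bit pixels (the DS screen size) and a rectangle that is already fully on screen. Here / and % stand in for the SWI divide; on an ARM without a divide instruction, a compiler also turns them into a call to a software division routine. fill_rect_divmod and fill_rect_rows are hypothetical names.

```c
#include <stdint.h>

#define SCREEN_W 256   /* DS screen width in pixels */
#define SCREEN_H 192   /* DS screen height in pixels */

/* Slow: one flat loop that divides on every pixel to find its row and column. */
void fill_rect_divmod(uint16_t *fb, int x0, int y0, int w, int h, uint16_t color)
{
    for (int i = 0; i < w * h; i++) {
        int x = x0 + i % w;
        int y = y0 + i / w;
        fb[y * SCREEN_W + x] = color;
    }
}

/* Nested loops: no division at all, just a row pointer stepped down a line. */
void fill_rect_rows(uint16_t *fb, int x0, int y0, int w, int h, uint16_t color)
{
    uint16_t *row = fb + y0 * SCREEN_W + x0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            row[x] = color;
        row += SCREEN_W;
    }
}
```

The nested version does more loop bookkeeping on paper, but each step is an add or a compare instead of a division, which is why it wins.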

I then ran into another timing issue: at each vertical blank I was copying my draw buffer to the main buffer and clearing the draw buffer in preparation for the next frame. This was far too slow and never finished within the vertical blank...so I had to blit directly to the main buffer and, instead of clearing it entirely, clear only the parts that had changed.
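A minimal C sketch of "clear only the parts that changed", assuming the same 256x192 16-bit framebuffer and that the drawing code records a rectangle for everything it drew last frame. Rect and clear_dirty are hypothetical names. Clearing one 30x40 square touches 1,200 pixels, while clearing the whole screen touches 49,152.

```c
#include <stdint.h>

#define SCREEN_W 256
#define SCREEN_H 192

/* A region drawn during the previous frame. */
typedef struct { int x, y, w, h; } Rect;

/* Erase only the recorded rectangles instead of the whole screen. */
void clear_dirty(uint16_t *fb, const Rect *dirty, int count, uint16_t bg)
{
    for (int i = 0; i < count; i++) {
        uint16_t *row = fb + dirty[i].y * SCREEN_W + dirty[i].x;
        for (int y = 0; y < dirty[i].h; y++) {
            for (int x = 0; x < dirty[i].w; x++)
                row[x] = bg;
            row += SCREEN_W;
        }
    }
}
```

Each frame then becomes: clear last frame's rectangles, draw the new ones directly into the main buffer, and remember them for next time.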

I attribute these slowdowns to the fact that the screen updates at 60 Hz, and the DS's ARM CPU is far too slow to perform long operations such as clearing or copying a whole buffer within that time frame. I've honestly never had to worry about this (being privileged to the confines of my PC and its speed), so it's a good learning experience. Furthermore, this is great practice for writing slick, tight code while keeping track of clock cycle counts...I already stressed this discipline before, but now it actually matters ;)
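To put rough numbers on that budget, here is a small C sketch of the arithmetic. The figures are assumptions taken from public DS hardware documentation rather than measurements: an ARM9 clock of about 67 MHz, about 60 frames per second, and 263 scanlines per frame of which 192 are visible, leaving 71 lines of vertical blank. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed figures (see lead-in); real timings also depend on memory wait states. */
#define ARM9_HZ          67000000u
#define FRAMES_PER_SEC   60u
#define TOTAL_LINES      263u
#define VISIBLE_LINES    192u

uint32_t cycles_per_frame(void)
{
    return ARM9_HZ / FRAMES_PER_SEC;
}

uint32_t cycles_per_vblank(void)
{
    uint32_t vblank_lines = TOTAL_LINES - VISIBLE_LINES;
    return (uint32_t)((uint64_t)cycles_per_frame() * vblank_lines / TOTAL_LINES);
}

/* Cycles available per pixel if all 256x192 pixels are touched during vblank. */
uint32_t vblank_cycles_per_pixel(void)
{
    return cycles_per_vblank() / (256u * 192u);
}
```

That works out to about 1.1 million cycles per frame, about 300,000 of them in the vertical blank, or only about 6 cycles per pixel for a full-screen clear or copy, before any memory wait states are counted.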

I ran into a few bugs with my assembler; it had a problem accessing variables from another section, so I need to take a look at that. Furthermore, I need to add long-awaited structure support to it, so I can represent all my objects with better syntax.