Thursday, December 10, 2009

Background Progress

I was going to write this last night, but I had to sleep early for a 7:30 AM dentist appointment...quite a start to the day.

As I wrote before, the DS has 4 different types of backgrounds: text, rotation, extended, and bitmap. Text backgrounds have about the same capability as what you'll find on the GBC or NES, while rotation backgrounds were very popular on the SNES.

As of now, I have text backgrounds working well: they render correctly, they scroll correctly, and my virtual memory mapper seems to be doing its job. I ran into a bug yesterday where I had loaded tile data into the sub core memory space, but no map data, and stuff was showing up on the sub core screen anyway! Some context: I was trying to get stuff to show up on both screens, and each screen has an associated core. The main core is the primary rendering core of the DS; it supports a few more styles, plus 3D, and can map more memory. The sub core is a secondary rendering core with less memory at its disposal and fewer capabilities. Each core has its own virtual memory space, but for whatever reason, the sub core was taking data from the main core space. This turned out to be a bug in my virtual memory mapper; I wasn't doing a check correctly, so every read from this virtual memory space was going through the main core.
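To make the bug concrete, here is a rough sketch in C of a per-core lookup. Everything in it (core_t, vram_map_t, vram_read) is a hypothetical name for illustration, not my emulator's actual code; the point is only that the read has to select the mapping by core, and that an unmapped region should read as nothing rather than fall through to the main core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: one background memory mapping per rendering core. */
typedef enum { CORE_MAIN = 0, CORE_SUB = 1 } core_t;

typedef struct {
    const uint8_t *bg_vram[2];  /* mapped memory per core, NULL if unmapped */
    size_t         size[2];     /* size of each mapping in bytes */
} vram_map_t;

/* Read one byte from the given core's virtual background space.
 * The mapping is chosen by `core`; unmapped or out-of-range reads
 * return 0 instead of silently using the main core's mapping. */
uint8_t vram_read(const vram_map_t *m, core_t core, size_t offset)
{
    assert(m != NULL);
    assert(core == CORE_MAIN || core == CORE_SUB);
    if (m->bg_vram[core] == NULL || offset >= m->size[core])
        return 0;
    return m->bg_vram[core][offset];
}
```

With only the main core mapped, a sub core read comes back empty, which is what should have happened on the sub screen.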

My next task will be to read up on extended and rotation backgrounds; they seem pretty straightforward, albeit full of more features, so we'll see how that goes.

Monday, December 7, 2009

Start of Backgrounds

So after a little bit of fiddling with the ARM9 memory manager, I decided to dive straight into the next stage of the DS video core...the backgrounds.

Essentially, part of the DS video core allows you to define a background, which is an array of indexes into an array of tiles. The tiles are represented as palettized image data, using either 8-bit or 4-bit indexes. The last time I really dealt with this was when I was writing my NES and GBC emulators; both systems handle backgrounds pretty much the same way, and what they have to offer is quite simple.
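As a rough sketch of that idea in C: a map of tile indexes, plus packed tile data in either 8-bit or 4-bit form. The function name, the 8x8 tile size, the plain tile index per map entry, and the low-nibble-first packing are all assumptions made for illustration; real map entries carry extra bits that this ignores.

```c
#include <assert.h>
#include <stdint.h>

enum { TILE_SIZE = 8 };  /* assumption: tiles are 8x8 pixels */

/* Return the palette index of background pixel (x, y).
 * map:       one tile index per 8x8 cell, map_width cells per row
 * tiles:     packed tile data, 64 bytes per tile at 8 bits per pixel,
 *            32 bytes per tile at 4 bits per pixel
 * bpp:       8 or 4 (4-bit pixels assumed packed low nibble first) */
uint8_t bg_pixel(const uint16_t *map, int map_width,
                 const uint8_t *tiles, int bpp, int x, int y)
{
    assert(bpp == 4 || bpp == 8);
    uint16_t tile = map[(y / TILE_SIZE) * map_width + (x / TILE_SIZE)];
    int pixel = (y % TILE_SIZE) * TILE_SIZE + (x % TILE_SIZE); /* 0..63 */

    if (bpp == 8)
        return tiles[tile * 64 + pixel];

    uint8_t packed = tiles[tile * 32 + pixel / 2];
    return (pixel & 1) ? (uint8_t)(packed >> 4) : (uint8_t)(packed & 0x0F);
}
```

The value that comes back is still only an index; it has to go through the palette before it becomes a color.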

The DS's background core is far more complex; it supports 4 different backgrounds, with two types of palettes, 4 different sizes, and alpha blending between backgrounds and sprites. Furthermore, there is no static memory reserved for the map or tile data; you instead map a VRAM bank to background data. Also, there are 5 different modes of operation, and in each mode, the backgrounds have different types; they can either be text, rotation, or extended.

Text backgrounds are the simplest backgrounds, and the ones I decided to go with first. They are similar to the NES and GBC in that they're just represented by an array of indexes, and the only transformation you can do on them is scroll them.

Rotation backgrounds (also called affine backgrounds) are the same as text backgrounds, except you can perform rotation, shearing, or scaling operations on them for neat visual effects. This is accomplished with a 2x2 matrix which maps screen space to background space.
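Here is a minimal sketch in C of what "a 2x2 matrix which maps screen space to background space" means in practice. The struct layout, the field names, and the 8.8 fixed-point format (256 means 1.0) are assumptions for illustration; I added a reference point so a rotation can be positioned somewhere useful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 2x2 matrix plus a reference point,
 * all in 8.8 fixed point (256 == 1.0). */
typedef struct {
    int32_t pa, pb;   /* first row:  bx = pa*sx + pb*sy + x0 */
    int32_t pc, pd;   /* second row: by = pc*sx + pd*sy + y0 */
    int32_t x0, y0;   /* background position of screen pixel (0, 0) */
} affine_t;

/* Map a screen pixel to a background pixel. Results are assumed to be
 * non-negative here; wrapping and clipping are left out of the sketch. */
void screen_to_bg(const affine_t *m, int sx, int sy, int32_t *bx, int32_t *by)
{
    assert(m != NULL && bx != NULL && by != NULL);
    *bx = (m->x0 + m->pa * sx + m->pb * sy) / 256;
    *by = (m->y0 + m->pc * sx + m->pd * sy) / 256;
}
```

With pa = pd = 256 and pb = pc = 0 this is the identity, so a text background is just the special case where the matrix does nothing. Setting pa = pd = 128 samples every other screen pixel's worth of background, which zooms in by 2x.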

Extended backgrounds I believe are just large bitmaps; I haven't read too much into these yet, but that will come soon.

I ran into a quick hiccup with the palettes; palette memory occupies 1 KB of space, but there are two different palette modes: 256 color and 16x16 color. Each palette entry occupies 2 bytes, so a quick calculation says that there can be 512 possible palette entries. 256 color mode should only use 256, and 16x16 color mode (16 colors split into 16 palettes), should really be 16x32 mode? Was it true that the palette only used 512 bytes of the 1 KB space?

I eventually had to look this up online, and found out that the 1 KB space is in fact split into two; backgrounds occupy the first 512 bytes, and sprites occupy the next 512 bytes!
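The arithmetic works out like this, sketched in C. The constant and function names are mine, for illustration only; the numbers come straight from the layout above (1 KB total, 2 bytes per entry, backgrounds in the first 512 bytes, sprites in the next 512).

```c
#include <assert.h>
#include <stdint.h>

enum {
    PALETTE_RAM_SIZE = 1024, /* total palette memory in bytes */
    BG_PALETTE_BASE  = 0,    /* backgrounds: first 512 bytes */
    OBJ_PALETTE_BASE = 512,  /* sprites: next 512 bytes */
    BYTES_PER_COLOR  = 2
};

/* 256-color mode: one palette of 256 entries. */
int palette_offset_256(int base, int color)
{
    assert(color >= 0 && color < 256);
    return base + color * BYTES_PER_COLOR;
}

/* 16x16 mode: 16 palettes of 16 entries each. */
int palette_offset_16x16(int base, int palette, int color)
{
    assert(palette >= 0 && palette < 16);
    assert(color >= 0 && color < 16);
    return base + (palette * 16 + color) * BYTES_PER_COLOR;
}
```

Both modes cover exactly 256 entries, or 512 bytes, per half, so the last sprite entry lands at byte offset 1022 and the full 1 KB is accounted for.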

This is the sort of answer that's unbelievably obvious when you uncover it. I figured that this had to be the case, but I couldn't quite put together the full explanation. I could have gone with the assumption and figured it out down the road, but lately I'm feeling that the best option is to be as informed as possible about everything that's happening in the system, as you never know when things will break just because an assumption you made was wrong, or didn't factor everything in.

I've tried to avoid running another emulator in my quest to learn how the system works. This may seem like an unusual approach, but I have a couple of key reasons. First off, emulators might not emulate the system completely correctly, so I could end up in situations where I'm not sure whether my code is correct or the emulator's interpretation is correct. Furthermore, I want to keep my investigation to raw datasheets and reverse-engineering; limiting my resources this way is tremendous practice for situations where there really is no other option.

Anyways, I'm getting pretty tired right now, so that's about all I have to write.

Wednesday, December 2, 2009

Assembly Language Ecstasy

One thing about me: If there's ever an excuse to write assembly language, I'll take it and automatically try to shut down everything else. I did it about a year ago, when a friend needed some help with a bubble-sort-in-x86 assignment; I wrote it for him and then started writing everything in assembly language, including Space Invaders.

In messing around with the BIOSes of both chips, I've found that my focus has lacked a bit on the actual emulator. For instance, I made a stupid mistake in the memory manager; for each possible chip, I check to see if the address I'm trying to access can be accessed by this chip. I hacked together a quick two-register chip, and failed to write this validation function correctly (twice).

This sort of focus lapse is common when something is exciting; you just have to keep remembering why you're writing what you're writing, and with that focus, start to pick out the challenges and concentrate on them. I'm doing that just fine with assembly language, and the challenge it presents is an easy distraction!

Monday, November 30, 2009

Another space between updates

So it's been a while again! I went to Texas last week, so I was pretty much out of contact save the occasional text and Facebook check.

I originally wanted to proceed with some more video code and implementation, but I was reading into the NDS video architecture and found out that I'd made a bit of a mistake. The code that I've been writing is what's called the BIOS of the CPU; the BIOS is just composed of routines that initialize the system, load some code from somewhere else (in this case, the firmware), and let it fly, along with the SWI routines that I talked about earlier.

The Nintendo DS has two chips in it, the ARM7 and the ARM9, and they each need a BIOS. Up until now, I'd been putting all my code in the ARM7 BIOS, including the video code. However, I found out that the ARM7 chip in the DS is unable to access video memory! I had put no check in for that, so essentially I've been writing invalid DS code.

I decided to move to the ARM9 BIOS and rewrite it...taking some stuff from the ARM7 BIOS I've already written. However, I ran into a problem; the ARM9 handles memory a bit differently than the ARM7, and as a result I was left unsure as to where I wanted to put my data. Essentially, the ARM9 has more of an internal memory manager which you can configure for memory protection, caching, etc., while with the ARM7, you have to accomplish most of that with external hardware.

I decided to take a detour and implement this memory configuration. It involves writing a set of coprocessor routines that take in some parameters, specified by a normal assembly language instruction, and using those parameters in a way that configures some aspect of the memory manager. For instance, the instruction:

MCR p15, 0, r0, c9, c1, 1

asks coprocessor 15 (the internal memory manager) to write the value of ARM register 0 into the ITCM (Instruction Tightly Coupled Memory) register. This basically configures a feature of the memory manager which allows code to reside in a non-cacheable memory.
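For the emulator side, the first step is pulling the parameters out of the instruction word. Below is a small sketch in C that decodes the standard ARM coprocessor register transfer encoding (MCR/MRC); the struct and function names are hypothetical, and what the emulator does with the decoded fields afterwards is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields of an MCR/MRC instruction. */
typedef struct {
    unsigned cp_num;  /* coprocessor number, e.g. 15 */
    unsigned opc1;    /* first opcode field */
    unsigned rd;      /* ARM register to transfer */
    unsigned crn;     /* primary coprocessor register */
    unsigned crm;     /* secondary coprocessor register */
    unsigned opc2;    /* second opcode field */
    int      is_mcr;  /* 1 = MCR (ARM to coprocessor), 0 = MRC */
} cp_transfer_t;

/* Decode a 32-bit ARM instruction word. Returns 1 and fills *out if it
 * is a coprocessor register transfer (bits 27..24 == 1110, bit 4 == 1),
 * otherwise returns 0 and leaves *out untouched. */
int decode_cp_transfer(uint32_t insn, cp_transfer_t *out)
{
    assert(out != NULL);
    if ((insn & 0x0F000010u) != 0x0E000010u)
        return 0;
    out->opc1   = (insn >> 21) & 0x7;
    out->is_mcr = ((insn >> 20) & 0x1) == 0;
    out->crn    = (insn >> 16) & 0xF;
    out->rd     = (insn >> 12) & 0xF;
    out->cp_num = (insn >> 8)  & 0xF;
    out->opc2   = (insn >> 5)  & 0x7;
    out->crm    = insn & 0xF;
    return 1;
}
```

Assembled with the "always" condition, the instruction above encodes as 0xEE090F31, which decodes to cp_num 15, opc1 0, rd 0, crn 9, crm 1, opc2 1. From there an emulator can switch on crn, crm, and opc2 to pick the register being written.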

Anyways, I had to implement this in my emulator...there are 15 possible registers to write to, each with their own set of operations. I started working on this, BUT it got boring really quickly. It involved a lot of looking at the datasheet, then writing a line of code, then looking at the datasheet, trying to remember exactly what bit was which, etc. It was just a lot of implementation, without any real indication of what it was for.

This can happen in the world of programming; sometimes your mind just gets overloaded with information that you can't quite use, so all you are left with is an attempt to keep it all in your head. Since you can't actually apply it (in this case, in writing code) for a while, you just instead try to keep your mind fresh with it, which can be tough.

Anyways, I spent the entire day sort of kibitzing on this, but I eventually got so bored I just sat on the sofa and watched TV until I fell asleep for a few hours. When I woke up, I decided to not even look at it and just investigate something else; in this case, I dusted the cobwebs off of Facebook application development. This helped my mind stop trying to memorize all the different coprocessor calls and registers, and balanced it out again, which also helped me remember WHY I was writing all these funny implementations.

Anyways, I'll pick it up again tomorrow.

Saturday, November 21, 2009

Structs and Syntax

I only realized today that it's been a few days since I last posted. Time flies when stuff is going on; hanging out with someone here and there, telling people about my trip, watching TV, etc.

I got structs into the assembler pretty quickly; now I can have something like:

.struct thing_typ
.word name
.word x
.word y
.endstruct

_variable:
.thing_typ

ldr r0, [_variable.name]
ldr r1, [_variable.x]
ldr r2, [_variable.y]
bl _print

or something similar. This just basically allows better representation and access of data for the future.

I then had to deal with a few extra features of structs, namely their size and the offset of each variable within the struct. In C, you get the size of a variable by saying "sizeof (variable)", which is an operation completed at compile-time. To avoid too much extra parsing and to keep the syntax consistent, I decided to implement this with the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

mov r0, .thing_typ.sizeof

The assembler will convert ".thing_typ.sizeof" into an immediate value and pass that back to the instruction parser. The reason I didn't opt for the C way was more an effort to keep parentheses only for arithmetic operations, but it is perfectly implementable.

Getting the offset of variables in structs was a bit different. I envisioned a situation where I would loop through an array of them and want to access the variables of each one; with the syntax I first described, I would need a variable for each struct in the array. Therefore, the way I could iterate was to store a pointer in a register and just increment the register. In C, when you have a pointer to a struct, you can access each variable within it like this:

struct thing_typ
{
char* name;
unsigned int x;
unsigned int y;
};

struct thing_typ t;
struct thing_typ* ptr = &t;

ptr->name = "Andrew";

In my assembler, I implemented the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

t:
.thing_typ

ldr r0, =t
ldr r1, [r0, .thing_typ.x]

It's a bit funny to look at and wrap around at first, but I'll get used to it.
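For comparison, here is roughly what the two assembler features correspond to in C, using the standard offsetof and sizeof. This is only a sketch of the idea, with each field assumed to be one 32-bit word to match ".word"; load_word and sum_x are hypothetical helpers, not part of my assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C counterpart of the .struct above, one 32-bit word per field. */
typedef struct {
    uint32_t name;
    uint32_t x;
    uint32_t y;
} thing_typ;

/* Load a word at base + offset, like "ldr r1, [r0, .thing_typ.x]". */
uint32_t load_word(const void *base, size_t offset)
{
    uint32_t value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* Walk an array by bumping a pointer by the struct size each step,
 * the same way a register would be incremented by .thing_typ.sizeof. */
uint32_t sum_x(const thing_typ *things, size_t count)
{
    const unsigned char *ptr = (const unsigned char *)things;
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += load_word(ptr, offsetof(thing_typ, x));
        ptr += sizeof(thing_typ);
    }
    return total;
}
```

Here offsetof(thing_typ, x) is 4 and sizeof(thing_typ) is 12, which are the immediates the assembler would substitute for ".thing_typ.x" and ".thing_typ.sizeof".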

I'll probably spend some more time refining syntax (need to add support for static arrays), which will also lend an opportunity for me to practice ARM/THUMB assembly language some more, which will be big.

Wednesday, November 18, 2009

Sections fixed, start of structs

I fixed the problem with variable access between sections; it was simply a problem of the linker not knowing about the section I was trying to define.

The linker uses an external file with the extension ".ld" to determine how to order sections in the final binary; this file is simply composed of a collection of name-value pairs; names define section names, and values are memory addresses corresponding to where in the binary we want the section to be placed.

I was trying to define a section named ".hello", but this wasn't defined in the file, so when it came time to try to find variables from it, their addresses were completely wrong.

I put in some code to deal with sections that aren't defined in the file; if we encounter a section that has no definition, it is simply appended to the end of the binary...a simple and effective fix.
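The fix can be sketched in a few lines of C. The table type and function below are hypothetical stand-ins for my linker's internals: look the section name up in the name-value pairs from the ".ld" file, and if it isn't there, place the section at the current end of the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One name-value pair from the ".ld" file (hypothetical type). */
typedef struct {
    const char *name;
    uint32_t    address;
} ld_entry_t;

/* Return the address for a section of `size` bytes. Sections listed in
 * the table get their configured address; unlisted sections are appended
 * at *end. *end always tracks the end of everything placed so far. */
uint32_t place_section(const ld_entry_t *table, size_t count,
                       const char *name, uint32_t size, uint32_t *end)
{
    assert(name != NULL && end != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            uint32_t section_end = table[i].address + size;
            if (section_end > *end)
                *end = section_end;
            return table[i].address;
        }
    }
    uint32_t address = *end;   /* not in the file: append to the binary */
    *end += size;
    return address;
}
```

With ".text" at 0x0 and ".data" at 0x1000 in the table, an undefined ".hello" section ends up right after whatever has been placed last, and its variables get real addresses.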

I started work on structs today, more to come later.

Tuesday, November 17, 2009

Europe with Soul

I returned from Europe on Sunday...the time I spent there was the best time I've ever had in my life. It was truly amazing to see the rest of the world; a world with soul, feeling, and experience all in one.

If you want to see some photos, I have them here

I got back into working on my projects yesterday; I had a few unresolved issues from when I left off. The first one had to do with the usage of registers to pass parameters to the various SWI functions. I changed this just to use a memory area, but I then took a quick look at the libnds definitions, and they seem to use registers. I'll have to look further into this; maybe the caller of the SWI routine expects certain registers to be clobbered and simply adapts to this?

I then dealt with some rendering issues, which took a bit of time. I wrote some code to draw a colored square to the screen, but it seemed to not be completing in time for the vertical blank...this resulted in me seeing about half of the square being drawn, followed by the other half. I did some analysis, and found that the SWI routine I was using was slowing the rendering code down immensely.

The routine I was using was a routine that simply divides two numbers and returns the remainder. This is a simple case of subtracting the second number from the first until the first is smaller than the second, and recording both how many subtractions you made and what the final value of the first number is, which are the result and the remainder, respectively. However, this loop is expensive; on other CPUs, such as Intel's, the integer divide is done with a single instruction...with the ARM, this loop has to be written by hand, and it results in an operation that could be 100-1000 cycles, depending on the two numbers.
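Written out in C instead of ARM assembly, the loop looks something like the sketch below. The names are made up for illustration, and I added an iteration counter just to show how the cost scales; the real routine lives behind an SWI call.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t quotient;    /* how many subtractions were made */
    uint32_t remainder;   /* what is left of the first number */
    uint32_t iterations;  /* loop trips, as a rough cost measure */
} div_result_t;

/* Divide by repeated subtraction. The loop runs once per unit of the
 * quotient, so the cost grows with numerator / denominator. */
div_result_t slow_divide(uint32_t numerator, uint32_t denominator)
{
    div_result_t r = {0, 0, 0};
    assert(denominator != 0);
    while (numerator >= denominator) {
        numerator -= denominator;
        r.quotient++;
        r.iterations++;
    }
    r.remainder = numerator;
    return r;
}
```

Dividing a pixel offset by a small number takes one trip through the loop per unit of the result, and that cost was being paid for every pixel drawn.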

This is obviously a problem, as my draw square routine was using the divide to see if it had hit the end of a line to draw. I changed the routine to a seemingly slower one, which uses two loops; one for the vertical direction, and the other for the horizontal direction, and this fixed the slowdown.
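The two-loop version, sketched in C rather than assembly (the function name, the 16-bit pixel type, and the pitch parameter are assumptions for illustration). Knowing the row and column up front means there is never a need to divide to find out where a line ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a w x h rectangle at (x, y) in a framebuffer that is `pitch`
 * pixels wide. One loop per direction; no divide or remainder needed. */
void fill_rect(uint16_t *fb, int pitch, int x, int y, int w, int h,
               uint16_t color)
{
    assert(fb != NULL);
    assert(x >= 0 && y >= 0 && x + w <= pitch);
    for (int row = 0; row < h; row++) {
        uint16_t *dst = fb + (size_t)(y + row) * pitch + x;
        for (int col = 0; col < w; col++)
            dst[col] = color;
    }
}
```

It looks like more code than a single loop with a remainder check, but each pixel now costs a store and a counter update instead of a full software divide.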

I then ran into another timing issue; at each vertical blank I was copying my draw buffer to the main buffer and clearing this draw buffer in preparation for the next frame. This operation is far too slow and never completed in the time it needed to complete...I therefore had to blit directly to the main buffer and instead of clearing it entirely, only clear the parts that had been changed.

I attribute these slowdowns to the fact that the screen is updating at 60 Hz, and the ARM's CPU is far too slow to perform long operations such as clearing or copying within this timeframe. I have honestly never had to worry about this (being privileged to the confines of my PC and its speed), so it's a good learning experience. Furthermore, this is huge practice for writing slick and tight code, while keeping the clock cycle counts known...I already stressed this discipline before, but now it actually matters ;)

I ran into a few bugs with my assembler; it had a problem accessing variables from another section, so I need to take a look at that. Furthermore, I need to add long-awaited structure support to it, so I can represent all my objects with better syntax.