Thursday, December 10, 2009

Background Progress

I was going to write this last night, but I had to go to bed early for a 7:30 AM dentist appointment...quite a start to the day.

As I wrote before, the DS has 4 different types of backgrounds: text, rotation, extended, and bitmap. Text backgrounds have about the same capability as what you'll find on the GBC or NES, while rotation backgrounds were very popular on the SNES.

As of now, I have text backgrounds working well: they render correctly, they scroll correctly, and my virtual memory mapper seems to be doing its job. Yesterday I ran into a bug where I had loaded tile data into the sub core's memory space, but no map data, and stuff was still showing up on the sub core's screen! Some context: I was trying to get stuff to show up on both screens, and each screen has an associated core. The main core is the primary rendering core of the DS; it supports a few more background styles, plus 3D, and can map more memory. The sub core is a secondary rendering core with less memory at its disposal and fewer capabilities. Each core therefore has its own virtual memory space, but for whatever reason, the sub core was taking data from the main core's space. The culprit was my virtual memory mapper: I wasn't doing a check correctly, so every read from this virtual memory space was going through the main core.

My next task will be to read up on extended and rotation backgrounds; they seem pretty straightforward, though they have more features, so we'll see how that goes.

Monday, December 7, 2009

Start of Backgrounds

So after a little bit of fiddling with the ARM9 memory manager, I decided to dive straight into the next stage of the DS video core...the backgrounds.

Part of the DS video core allows you to define a background, which is essentially an array of indexes into an array of tiles. The tiles are stored as palettized image data, using either 8-bit or 4-bit palette indexes. The last time I really dealt with this was when I was writing my NES and GBC emulators; both systems handle backgrounds in pretty much the same way, and what they offer is quite simple.
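Here's a rough sketch of what fetching one pixel from a 4-bit text background looks like. The map entry layout (tile number in bits 0-9, horizontal/vertical flip in bits 10/11, 16-color palette bank in bits 12-15) and the 32-byte 8x8 tile format follow the GBA/DS convention as I understand it, so treat them as assumptions to check against the datasheet. It also assumes a single linear map; larger maps are split into 32x32-tile blocks, which this ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded text-background map entry (layout assumed, see above). */
typedef struct {
    unsigned tile;      /* index into the tile array */
    int hflip, vflip;   /* mirror the tile horizontally / vertically */
    unsigned palette;   /* 16-color palette bank */
} map_entry;

static map_entry decode_map_entry(uint16_t raw)
{
    map_entry e;
    e.tile    = raw & 0x3FFu;
    e.hflip   = (raw >> 10) & 1u;
    e.vflip   = (raw >> 11) & 1u;
    e.palette = (raw >> 12) & 0xFu;
    return e;
}

/* Returns a full palette index (bank * 16 + color), or 0 when the 4-bit
   color is 0, which is treated as transparent. */
static unsigned text_bg_pixel_4bpp(const uint16_t *map, unsigned map_width_tiles,
                                   const uint8_t *tiles, unsigned x, unsigned y)
{
    map_entry e = decode_map_entry(map[(y / 8) * map_width_tiles + (x / 8)]);
    unsigned tx = x % 8, ty = y % 8;
    if (e.hflip) tx = 7 - tx;
    if (e.vflip) ty = 7 - ty;
    /* 8x8 tile at 4 bits per pixel: 32 bytes per tile, 4 bytes per row.
       The low nibble of each byte holds the left pixel of the pair. */
    uint8_t byte = tiles[e.tile * 32 + ty * 4 + tx / 2];
    unsigned color = (tx & 1) ? (byte >> 4) : (byte & 0xFu);
    return color ? e.palette * 16 + color : 0;
}
```

An 8-bit background works the same way, except each tile is 64 bytes with one byte per pixel and the palette bank bits aren't used.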

The DS's background core is far more complex; it supports 4 different backgrounds, with two types of palettes, 4 different sizes, and alpha blending between backgrounds and sprites. Furthermore, there is no static memory reserved for the map or tile data; instead, you map a VRAM bank to hold background data. Also, there are 5 different modes of operation, and in each mode, the backgrounds have different types: they can be text, rotation, or extended.

Text backgrounds are the simplest backgrounds, and the ones I decided to go with first. They are similar to the NES and GBC in that they're just represented by an array of indexes, and the only transformation you can do on them is scrolling.
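Scrolling boils down to adding an offset and wrapping. A small sketch, assuming the background's width and height are powers of two (256 or 512 pixels), so the wraparound is a mask rather than a modulo:

```c
#include <assert.h>

typedef struct { unsigned x, y; } bg_point;

/* Map a screen pixel to background coordinates for a scrolled text
   background. hofs/vofs are the scroll offsets; the background wraps. */
static bg_point screen_to_bg(unsigned sx, unsigned sy,
                             unsigned hofs, unsigned vofs,
                             unsigned bg_width, unsigned bg_height)
{
    /* Sizes must be powers of two for the mask trick to be valid. */
    assert((bg_width & (bg_width - 1)) == 0 && (bg_height & (bg_height - 1)) == 0);
    bg_point p;
    p.x = (sx + hofs) & (bg_width - 1);
    p.y = (sy + vofs) & (bg_height - 1);
    return p;
}
```

For example, screen column 250 with a horizontal scroll of 10 on a 256-pixel-wide background lands on background column 4.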

Rotation backgrounds (also called affine backgrounds) are the same as text backgrounds, except you can perform rotation, shearing, or scaling operations on them for neat visual effects. This is accomplished with a 2x2 matrix that maps screen space to background space.
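A sketch of that mapping in fixed point. I'm assuming the GBA/DS-style layout here: matrix entries pa..pd are signed 8.8 fixed point, and the reference point (the background position of screen pixel (0,0)) also has 8 fractional bits. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Affine background parameters (fixed-point formats assumed, see above). */
typedef struct {
    int16_t pa, pb, pc, pd;   /* 2x2 matrix, 8.8 fixed point */
    int32_t ref_x, ref_y;     /* reference point, 8 fractional bits */
} affine_params;

/* bg = ref + M * screen, then drop the fractional bits. The result may land
   outside the background; the caller decides whether to wrap or skip it. */
static void affine_map(const affine_params *p, int sx, int sy, int *bx, int *by)
{
    int32_t fx = p->ref_x + p->pa * sx + p->pb * sy;
    int32_t fy = p->ref_y + p->pc * sx + p->pd * sy;
    /* >> on a negative int is implementation-defined in C, so floor by hand. */
    *bx = fx >= 0 ? fx >> 8 : -((-fx + 255) >> 8);
    *by = fy >= 0 ? fy >> 8 : -((-fy + 255) >> 8);
}
```

With pa = pd = 256 (1.0) and pb = pc = 0 this is the identity; pa = 0, pb = -256, pc = 256, pd = 0 rotates by 90 degrees, so screen (5, 3) samples background (-3, 5).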

Extended backgrounds, I believe, are just large bitmaps; I haven't read too much into these yet, but that will come soon.

I ran into a quick hiccup with the palettes. Palette memory occupies 1 KB of space, but there are two different palette modes: 256 color and 16x16 color. Each palette entry occupies 2 bytes, so a quick calculation says there can be 512 possible palette entries. 256 color mode should only use 256 of them, and 16x16 color mode (16 palettes of 16 colors each) also only covers 256, so should it really be a 16x32 mode? Or was it true that the palette only used 512 bytes of the 1 KB space?

I eventually had to look this up online, and found out that the 1 KB space is in fact split into two; backgrounds occupy the first 512 bytes, and sprites occupy the next 512 bytes!
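With that split, the address arithmetic works out cleanly: both modes address 256 entries within their 512-byte half. A small sketch, assuming 15-bit BGR entries (red in bits 0-4, green in 5-9, blue in 10-14); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PALETTE_BG_OFFSET  0x000u   /* first 512 bytes: backgrounds */
#define PALETTE_OBJ_OFFSET 0x200u   /* next 512 bytes: sprites */

/* Byte offset of a color in palette RAM. In 256-color mode, bank is ignored
   and color is 0..255; in 16x16 mode, bank picks one of 16 sub-palettes and
   color is 0..15. Each entry is 2 bytes. */
static uint32_t palette_offset(int is_sprite, int is_16color,
                               unsigned bank, unsigned color)
{
    unsigned index = is_16color ? (bank & 0xFu) * 16u + (color & 0xFu)
                                : (color & 0xFFu);
    return (is_sprite ? PALETTE_OBJ_OFFSET : PALETTE_BG_OFFSET) + index * 2u;
}

/* Expand a 15-bit BGR entry to 8 bits per channel for the host display. */
static void bgr555_to_rgb888(uint16_t c, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)((c & 0x1Fu) << 3);
    *g = (uint8_t)(((c >> 5) & 0x1Fu) << 3);
    *b = (uint8_t)(((c >> 10) & 0x1Fu) << 3);
}
```

The last background entry in either mode sits at byte 510, and the first sprite entry at 0x200, which is exactly the 512 + 512 split.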

This is the sort of answer that's unbelievably obvious once you uncover it. I figured this had to be the case, but I couldn't quite put together the full explanation. I could have gone with the assumption and figured it out down the road, but lately I feel the best option is to be as informed as possible about everything that's happening in the system, as you never know when things will break just because an assumption you made was wrong or didn't factor everything in.

I've tried to avoid running another emulator in my quest to learn how the system works. This may seem like an unusual approach, but I have a few key reasons for it. First, other emulators might not emulate the system completely correctly, so I could end up in situations where I'm not sure whether my code is correct or the other emulator's interpretation is. Furthermore, I want to keep my investigation to raw datasheets and reverse-engineering; this deliberately limits my resources, which is tremendous practice for when a situation actually leaves me with no other option.

Anyways, I'm getting pretty tired right now, so that's about all I have to write.

Wednesday, December 2, 2009

Assembly Language Ecstasy

One thing about me: If there's ever an excuse to write assembly language, I'll take it and automatically try to shut down everything else. I did it about a year ago, when a friend needed some help with a bubble-sort-in-x86 assignment; I wrote it for him and then started writing everything in assembly language, including Space Invaders.

In messing around with the BIOSes of both chips, I've found that my focus on the actual emulator has slipped a bit. For instance, I made a stupid mistake in the memory manager: for each chip, I check whether the address being accessed is one that chip is allowed to access. I hacked together a quick two-register chip, and failed to write this validation function correctly (twice).
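For reference, here's one way such a validation function could look: a per-chip table of half-open [start, end) regions, which sidesteps the classic off-by-one at the end of a region. The region lists are illustrative only, not a complete or exact DS memory map, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } mem_region;   /* [start, end) */

/* Illustrative tables; real DS address ranges need checking. */
static const mem_region arm9_regions[] = {
    { 0x02000000u, 0x02400000u },   /* main RAM */
    { 0x05000000u, 0x05000800u },   /* palette RAM */
    { 0x06000000u, 0x07000000u },   /* video memory window */
};

static const mem_region arm7_regions[] = {
    { 0x02000000u, 0x02400000u },   /* main RAM, shared with the ARM9 */
    /* no video memory entry: the ARM7 can't see it */
};

#define REGION_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if addr falls inside any of the chip's regions. */
static int chip_can_access(const mem_region *regions, size_t count, uint32_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return 1;
    }
    return 0;
}
```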

This sort of focus lapse is common when something is exciting. You just have to keep remembering why you're writing what you're writing, and with that focus, pick out the real challenges and concentrate on them. I'm doing that just fine with assembly language, and the challenge it presents is an easy distraction!

Monday, November 30, 2009

Another space between updates

So it's been a while again! I went to Texas last week, so I was pretty much out of contact, save the occasional text and Facebook check.

I originally wanted to proceed with some more video code and implementation, but I was reading up on the NDS video architecture and found out that I'd made a bit of a mistake. The code that I've been writing is what's called the BIOS of the CPU; the BIOS is just a set of routines that initialize the system, load some code from somewhere else (in this case, the firmware), and let it fly, along with the SWI routines that I talked about earlier.

The Nintendo DS has two chips in it, the ARM7 and the ARM9, and each one needs a BIOS. Up until now, I'd been putting all my code in the ARM7 BIOS, including the video code. However, I found out that the ARM7 in the DS is unable to access video memory! I had put in no check for that, so essentially I've been writing invalid DS code.

I decided to move to the ARM9 BIOS and rewrite it...taking some stuff from the ARM7 BIOS I'd already written. However, I ran into a problem: the ARM9 handles memory a bit differently than the ARM7, and as a result I was left unsure where I wanted to put my data. Essentially, the ARM9 has more of an internal memory manager, which you can configure for memory protection, caching, etc., while with the ARM7, you have to accomplish most of that with external hardware.

I decided to take a detour and implement this memory configuration. It involves writing a set of coprocessor routines that take in parameters specified by a normal assembly language instruction and use those parameters to configure some aspect of the memory manager. For instance, the instruction:

MCR p15, 0, r0, c9, c1, 1

asks coprocessor 15 (the internal memory manager) to write the value of ARM register 0 into the ITCM (Instruction Tightly Coupled Memory) region register. This basically configures a feature of the memory manager that allows code to reside in a fast, non-cacheable memory.
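On the emulator side, the first step is pulling the fields out of the instruction word. Here's a sketch of decoding an MCR/MRC using the standard ARM bit layout (cond, 1110, opc1, L, CRn, Rd, coprocessor number, opc2, 1, CRm); the struct and function names are my own. As I work it out, MCR p15, 0, r0, c9, c1, 1 assembles to 0xEE090F31.

```c
#include <stdint.h>

/* Fields of an ARM coprocessor register transfer (MCR/MRC). */
typedef struct {
    unsigned cond, opc1, is_mrc, crn, rd, cp_num, opc2, crm;
} cp_transfer;

/* Returns 0 if insn is not a register transfer (e.g. CDP has bit 4 clear). */
static int decode_cp_transfer(uint32_t insn, cp_transfer *out)
{
    if (((insn >> 24) & 0xFu) != 0xEu || ((insn >> 4) & 1u) != 1u)
        return 0;
    out->cond   = insn >> 28;
    out->opc1   = (insn >> 21) & 0x7u;
    out->is_mrc = (insn >> 20) & 0x1u;   /* L bit: 1 = MRC, 0 = MCR */
    out->crn    = (insn >> 16) & 0xFu;   /* primary coprocessor register */
    out->rd     = (insn >> 12) & 0xFu;   /* ARM register */
    out->cp_num = (insn >> 8)  & 0xFu;   /* 15 = system control coprocessor */
    out->opc2   = (insn >> 5)  & 0x7u;
    out->crm    = insn & 0xFu;           /* secondary coprocessor register */
    return 1;
}
```

From there, a dispatch on (cp_num, crn, crm, opc2) picks the handler; CRn = 9, CRm = 1, opc2 = 1 is the ITCM region case above.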

Anyways, I had to implement this in my emulator...there are 15 possible registers to write to, each with its own set of operations. I started working on this, BUT it got boring really quickly. It involved a lot of looking at the datasheet, writing a line of code, looking at the datasheet again, trying to remember exactly which bit was which, etc. It was just a lot of implementation, without any real indication of what it was for.

This can happen in the world of programming; sometimes your mind just gets overloaded with information that you can't quite use, so all you are left with is an attempt to keep it all in your head. Since you can't actually apply it (in this case, in writing code) for a while, you just instead try to keep your mind fresh with it, which can be tough.

Anyways, I spent the entire day sort of kibitzing on this, but I eventually got so bored I just sat on the sofa and watched TV until I fell asleep for a few hours. When I woke up, I decided not to even look at it and just investigate something else; in this case, I dusted the cobwebs off of Facebook application development. This helped my mind stop trying to memorize all the different coprocessor calls and registers, and balanced it out again, which also helped me remember WHY I was writing all these funny implementations.

Anyways, I'll pick it up again tomorrow.

Saturday, November 21, 2009

Structs and Syntax

I only realized today that it's been a few days since I last posted. Time flies when stuff is going on; hanging out with someone here and there, telling people about my trip, watching TV, etc.

I got structs into the assembler pretty quickly; now I can have something like:

.struct thing_typ
.word name
.word x
.word y
.endstruct

_variable:
.thing_typ

ldr r0, [_variable.name]
ldr r1, [_variable.x]
ldr r2, [_variable.y]
bl _print

or something similar. This basically allows better representation of, and access to, data going forward.

I then had to deal with a few extra features of structs, namely their size and the offset of each variable within the struct. In C, you get the size of a variable by writing "sizeof (variable)", which is evaluated at compile time. To avoid extra parsing and keep the syntax consistent, I decided to implement this with the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

mov r0, .thing_typ.sizeof

The assembler will convert ".thing_typ.sizeof" into an immediate value and pass that back to the instruction parser. The reason I didn't opt for the C way was mostly to keep parentheses reserved for arithmetic operations, but it is perfectly implementable.
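Here's a sketch of the bookkeeping an assembler needs for this: each member gets the running offset, and the final offset is the struct's size. The names (struct_add_member, struct_resolve) are hypothetical, sizes are in bytes (4 for .word), and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; unsigned size; unsigned offset; } struct_member;

typedef struct {
    struct_member members[16];
    size_t count;
    unsigned sizeof_bytes;   /* running size, also the next member's offset */
} struct_def;

/* Called for each member line between .struct and .endstruct. */
static void struct_add_member(struct_def *s, const char *name, unsigned size)
{
    assert(s->count < 16);
    s->members[s->count].name = name;
    s->members[s->count].size = size;
    s->members[s->count].offset = s->sizeof_bytes;
    s->count++;
    s->sizeof_bytes += size;
}

/* Resolve the part after ".type." to an immediate: "sizeof" gives the total
   size, a member name gives its offset, anything else gives -1. */
static long struct_resolve(const struct_def *s, const char *member)
{
    if (strcmp(member, "sizeof") == 0)
        return (long)s->sizeof_bytes;
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->members[i].name, member) == 0)
            return (long)s->members[i].offset;
    return -1;
}
```

For thing_typ with three .word members, .thing_typ.sizeof resolves to 12 and .thing_typ.x to 4.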

Getting the offset of variables in structs was a bit different. I envisioned a situation where I would loop through an array of structs and want to access the variables of each one; with the syntax I first described, I would need a separate label for each struct in the array. Instead, the way to iterate is to store a pointer in a register and just increment the register. In C, when you have a pointer to a struct, you can access each variable within it like this:

struct thing_typ
{
    const char* name;
    unsigned int x;
    unsigned int y;
};

int main(void)
{
    struct thing_typ t;
    struct thing_typ* ptr = &t;

    ptr->name = "Andrew"; /* -> reaches a member through the pointer */
    return 0;
}

In my assembler, I implemented the following syntax:

.struct thing_typ
.word name
.word x
.word y
.endstruct

t:
.thing_typ

ldr r0, =t
ldr r1, [r0, .thing_typ.x]

It's a bit funny to look at and wrap your head around at first, but I'll get used to it.
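For comparison, the C counterpart of [r0, .thing_typ.x] is a base pointer plus offsetof (standard C, from stddef.h). This sketch walks an array the way the assembly would, stepping a byte pointer by the struct size; it's only an illustration, since in normal C you'd just write things[i].x.

```c
#include <stddef.h>
#include <string.h>

struct thing_typ {
    const char *name;
    unsigned int x;
    unsigned int y;
};

/* Sum the x fields by stepping a byte pointer by sizeof(struct thing_typ)
   and reading at a fixed member offset, like incrementing r0 in a loop. */
static unsigned sum_x(const struct thing_typ *things, size_t count)
{
    const unsigned char *base = (const unsigned char *)things;
    unsigned total = 0;
    for (size_t i = 0; i < count; i++, base += sizeof(struct thing_typ)) {
        unsigned int x;
        memcpy(&x, base + offsetof(struct thing_typ, x), sizeof x);
        total += x;
    }
    return total;
}
```

One difference worth noting: on a typical 64-bit host, offsetof(struct thing_typ, x) is 8 rather than 4, because name is a 64-bit pointer there, while the assembler's .word members are always 4 bytes.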

I'll probably spend some more time refining the syntax (I need to add support for static arrays), which will also give me an opportunity to practice ARM/THUMB assembly language some more, which will be a big help.

Wednesday, November 18, 2009

Sections fixed, start of structs

I fixed the problem with variable access between sections; it was simply a problem of the linker not knowing about the section I was trying to define.

The linker uses an external file with the extension ".ld" to determine how to order sections in the final binary. This file is simply a collection of name-value pairs: names are section names, and values are the memory addresses where we want each section placed in the binary.

I was trying to define a section named ".hello", but this wasn't defined in the file, so when it came time to try to find variables from it, their addresses were completely wrong.

I put in some code to deal with sections that aren't defined in the file; if we encounter a section that has no definition, it is simply appended to the end of the binary...a simple and effective fix.
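A sketch of that placement logic, with all names hypothetical: sections listed in the .ld file get their address, and anything missing is appended after the highest end address placed so far. Alignment is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uint32_t address; } ld_entry;
typedef struct { const char *name; uint32_t size; uint32_t address; } section;

static void place_sections(section *secs, size_t nsecs,
                           const ld_entry *ld, size_t nld)
{
    uint32_t end = 0;
    /* First pass: sections the .ld file knows about. */
    for (size_t i = 0; i < nsecs; i++) {
        secs[i].address = UINT32_MAX;   /* marks "not placed yet" */
        for (size_t j = 0; j < nld; j++) {
            if (strcmp(secs[i].name, ld[j].name) == 0) {
                secs[i].address = ld[j].address;
                if (secs[i].address + secs[i].size > end)
                    end = secs[i].address + secs[i].size;
                break;
            }
        }
    }
    /* Second pass: append undefined sections to the end of the binary. */
    for (size_t i = 0; i < nsecs; i++) {
        if (secs[i].address == UINT32_MAX) {
            secs[i].address = end;
            end += secs[i].size;
        }
    }
}
```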

I started work on structs today, more to come later.

Tuesday, November 17, 2009

Europe with Soul

I returned from Europe on Sunday...the time I spent there was the best time I've ever had in my life. It was truly amazing to see the rest of the world; a world with soul, feeling, and experience all in one.

If you want to see some photos, I have them here

I got back into working on my projects yesterday; I had a few unresolved issues from where I left off. The first one had to do with using registers to pass parameters to the various SWI functions. I had changed this to use a memory area instead, but then I took a quick look at the libnds definitions, and they seem to use registers. I'll have to look further into this; maybe the caller of an SWI routine expects certain registers to be clobbered and simply adapts to this?

I then dealt with some rendering issues, which took a bit of time. I wrote some code to draw a colored square to the screen, but it didn't seem to be completing in time for the vertical blank; as a result, I'd see about half of the square drawn, followed by the other half. I did some analysis and found that the SWI routine I was using was slowing the rendering code down immensely.

The routine I was using simply divides two numbers. It works by subtracting the second number from the first until the first is smaller than the second, recording both how many subtractions you made and the final value of the first number; these are the quotient and the remainder, respectively. However, this loop is expensive. On other CPUs, such as Intel's, integer division is a single instruction that takes comparatively few cycles; on the ARM, this loop has to be written by hand, and the operation can easily take hundreds or thousands of cycles, depending on the two numbers.
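Here's the repeated-subtraction routine in C, next to a standard alternative that isn't from the post: restoring (shift-and-subtract) division, which always takes 32 iterations no matter what the operands are. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated subtraction: the loop body runs once per unit of the quotient,
   so 1000 / 1 takes 1000 iterations. */
static uint32_t div_subtract(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0;
    while (n >= d) {
        n -= d;
        q++;
    }
    *rem = n;
    return q;
}

/* Restoring division: bring down one bit of n per step and subtract d when
   it fits. r is 64-bit so the shift can't overflow when d > 2^31. */
static uint32_t div_shift(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);
        if (r >= d) {
            r -= d;
            q |= 1u << bit;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```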

This is obviously a problem, as my draw-square routine was using the divide to check whether it had hit the end of a line. I changed the routine to one that seems slower at first glance, since it uses two loops (one for the vertical direction and one for the horizontal direction), and this fixed the slowdown.
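A sketch of the two-loop version, assuming a 256x192 framebuffer of 16-bit pixels: the row start is computed once per line, and the inner loop just advances a pointer, so there's no per-pixel divide or modulo like a single loop computing i / width and i % width would need. The rectangle is clipped to the screen.

```c
#include <stdint.h>

#define SCREEN_W 256
#define SCREEN_H 192

static void fill_rect(uint16_t *fb, int x, int y, int w, int h, uint16_t color)
{
    int x0 = x < 0 ? 0 : x, y0 = y < 0 ? 0 : y;
    int x1 = x + w > SCREEN_W ? SCREEN_W : x + w;
    int y1 = y + h > SCREEN_H ? SCREEN_H : y + h;
    for (int row = y0; row < y1; row++) {           /* vertical direction */
        uint16_t *p = fb + row * SCREEN_W + x0;
        for (int col = x0; col < x1; col++)         /* horizontal direction */
            *p++ = color;
    }
}
```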

I then ran into another timing issue: at each vertical blank, I was copying my draw buffer to the main buffer and clearing the draw buffer in preparation for the next frame. This operation was far too slow and never finished in the time it had...I therefore had to blit directly to the main buffer and, instead of clearing it entirely, only clear the parts that had changed.
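One simple way to "only clear the parts that changed" is dirty-rectangle bookkeeping: remember what was drawn last frame and erase just that. This is a hypothetical sketch assuming a 256x192 16-bit buffer and rectangles already clipped to the screen.

```c
#include <stdint.h>

#define SCREEN_W 256

typedef struct { int x, y, w, h; } rect;

static void fill_rect(uint16_t *fb, rect r, uint16_t color)
{
    for (int row = r.y; row < r.y + r.h; row++)
        for (int col = r.x; col < r.x + r.w; col++)
            fb[row * SCREEN_W + col] = color;
}

/* Once per vertical blank: erase last frame's square, draw the new one
   straight into the displayed buffer, and remember it for next time. */
static void update_square(uint16_t *fb, rect *prev, rect next,
                          uint16_t color, uint16_t bg)
{
    fill_rect(fb, *prev, bg);
    fill_rect(fb, next, color);
    *prev = next;
}
```

For a small square, that's a few dozen pixel writes per frame instead of clearing all 49,152 pixels of a 256x192 screen.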

I attribute these slowdowns to the fact that the screen updates at 60 Hz, and the ARM CPU is far too slow to perform long operations such as full-screen clears or copies within that timeframe. I have honestly never had to worry about this (having been spoiled by my PC and its speed), so it's a good learning experience. Furthermore, this is great practice for writing slick and tight code while keeping track of clock cycle counts...I already stressed this discipline before, but now it actually matters ;)

I ran into a few bugs with my assembler; it had a problem accessing variables from another section, so I need to take a look at that. Furthermore, I need to add long-awaited structure support to it, so I can represent all my objects with better syntax.

Wednesday, October 28, 2009

Software Interrupt Odyssey

So, I had a few nasty bugs while dealing with software interrupts.

The first one arose with my code triggering a data abort exception once the software interrupt handler was finished. It took me a little bit of thinking to figure this one out, but what was happening requires a bit of a technical and detailed explanation:

When you enter software interrupt mode, your stack pointer changes. Each mode that the CPU supports has its own stack pointer, for security purposes. Therefore, when you are finished in the mode, you have to switch the mode back, but you most likely also have to adjust your stack pointer, since the handler saves the registers it modifies on its stack and has to pop them again. The mode switch and register restore need to happen in one operation, which reloads all the registers and jumps back to where you were before the mode handler was set off.

My implementation of this operation failed to take into account that the mode could be switched during the operation, and if so, the stack pointer cannot be modified. As a result, the stack pointer was being left in an incorrect state, and a "data abort exception" happened.
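Here's a minimal model of that return path, emulating LDMFD sp!, {reglist, pc}^ with banked stack pointers. It's a simplification I'm assuming for illustration (only user and supervisor modes, word-addressed memory, hypothetical names), but it shows the ordering that matters: the writeback goes to the stack pointer of the mode the instruction ran in, and only then is CPSR restored from SPSR. Switching modes first would update the wrong bank's stack pointer.

```c
#include <assert.h>
#include <stdint.h>

enum { MODE_USR = 0x10, MODE_SVC = 0x13 };   /* ARM CPSR mode encodings */

typedef struct {
    uint32_t r[16];              /* r[15] is the PC; SPs live in sp_* */
    uint32_t sp_usr, sp_svc;     /* banked r13 per mode */
    uint32_t cpsr, spsr_svc;
    const uint32_t *mem;         /* word-addressed memory for the sketch */
} cpu;

static uint32_t *current_sp(cpu *c)
{
    return (c->cpsr & 0x1Fu) == MODE_SVC ? &c->sp_svc : &c->sp_usr;
}

static void ldmfd_return(cpu *c, uint16_t reglist)
{
    assert((c->cpsr & 0x1Fu) == MODE_SVC);
    uint32_t *sp = current_sp(c);           /* SVC-mode stack pointer */
    uint32_t addr = *sp;
    for (int i = 0; i < 16; i++) {
        if (reglist & (1u << i)) {
            c->r[i] = c->mem[addr / 4];
            addr += 4;
        }
    }
    *sp = addr;                  /* writeback while still in SVC mode */
    c->cpsr = c->spsr_svc;       /* mode switch happens last */
}
```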

The other bug occurred in my rendering code; it was also data aborting because the stack pointer was invalid. In the end, my rendering code was actually overwriting an instruction that was about to be executed, turning it into one that modified the stack pointer more than it should have! Another bug that took a few minutes to find, but once I saw the modified instruction, I knew exactly what the problem was.

However, my software interrupt handlers are still very up in the air. I'm not entirely sure what parameters each one takes, how it takes them, or how it returns its results. I'll figure this out somehow, but for now it's a bit hacked together, which isn't a good thing and needs to be taken care of.

When programming, odd bugs like the data aborts happen all the time. They're the type of bugs that can really throw off your tempo and rhythm and force you into a debugging session with an unclear understanding of when it will end. The best solution to this is to simply not let it disrupt your tempo, but rather adapt your tempo to the task at hand and just focus on solving it, solving it well, and being willing to change whatever needs to be changed.

I'm heading off to Europe in a few hours, so I'll most likely update this blog when I get back on November 14th.

Saturday, October 24, 2009

Much better progress

Had a much better session today. I implemented the symbol debugger and began work on the BIOS SWI routines. The symbol debugger was easy to finish off, and it's very useful already.

The SWI routines are an interesting batch. They are basically what system calls are to an operating system. The average system has two modes of operation: user mode and kernel mode, which is also known as privileged mode. User mode is where your programs run, and privileged mode is where the operating system runs. Whenever your program wants to do something only the operating system can do, such as open a file, it makes a system call, which passes control to the operating system, which then does its work.

The SWIs function in the same way; they are routines provided by the BIOS that do things such as division (as the ARM has no native divide instruction) or waiting for a vertical blank interrupt. Implementing them all is essential, and as their names suggest, some of them are quite detailed.

Also, for some random reason, if you do an SWI from ARM code, you have to specify the number in the upper 16 bits of the expression, like:

SWI 0x00010000: Perform operation 1

Not quite sure why this is, but it was in one of the DS manuals I've been reading.
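One possible explanation, based on my understanding of how GBA/DS-style BIOS handlers are usually described (so treat it as an assumption), is that the handler reads a single byte at lr - 2. In THUMB state, lr points just past the 2-byte SWI, so lr - 2 is the instruction's low byte, which holds the 8-bit number. In ARM state, lr points just past the 4-byte SWI, so lr - 2 lands on byte 2 of the little-endian instruction word, i.e. bits 16-23. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* The SWI number as a handler reading the byte at lr - 2 would see it. */
static uint8_t swi_number(const uint8_t *mem, uint32_t lr)
{
    return mem[lr - 2];
}

/* Store instructions little-endian, as the DS's ARM cores do. */
static void store32(uint8_t *mem, uint32_t addr, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

static void store16(uint8_t *mem, uint32_t addr, uint16_t v)
{
    mem[addr] = (uint8_t)v;
    mem[addr + 1] = (uint8_t)(v >> 8);
}
```

With ARM's SWI 0x00010000 (encoded 0xEF010000) at address 0, lr is 4 and the byte at 2 is 0x01; with THUMB's SWI 5 (0xDF05) at address 8, lr is 10 and the byte at 8 is 0x05.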

Thursday, October 22, 2009

Progress

I continued work on the symbol debugger today, but I wasn't feeling too well, so I left Starbucks a bit earlier than I wanted to.

I hashed out a few things: the emulator's loader for the debugger file, as well as the command line option to enable/disable the debugger for each chip in the DS. I also put in a command line option to specify the BIOS image to load, as this serves its purpose right away for the debugger file format.

You really have to constantly stay on your toes when you're working on something. A common pitfall is to plan out exactly what you're gonna write code-wise then just write it. This is a pitfall because lots of people don't realize that writing code is like constructing a building; you can have a blueprint, but there are also challenges in using the hammer, saw, and other tools. You have to be on your toes as to the best way to write functions both in the short-term and the long-term, from optimization to cleanliness. The trick is to strike some sort of personal balance between thinking quickly and thinking slowly; a zen that creates confidence within you in the idea that you're doing it your own way.

Tuesday, October 20, 2009

Debugger progress

So it turns out it wasn't an incorrect shift. What was happening was that my count variable was getting set to zero right away as a result of the multiplication op, so by the time the loop checked its status, it had already gone negative.

This was happening because of a bug in my assembler: it wasn't properly checking whether an immediate operand could actually be loaded into a register. The instruction to do so is a bit unique; you pass it an 8-bit number and a 4-bit rotate value, and the 8-bit number is rotated right by twice the rotate value. This means that you can get every power of 2, but not every possible 32-bit number.

I was trying to load my registers with 0x00100010, which cannot be loaded. I have error checking code to make sure this reports an error, but the bug prevented this error from coming up. Once fixed, everything proceeded smoothly.
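Here's a sketch of that encodability check: try all 16 rotations and see whether undoing one leaves a value that fits in 8 bits. Real assemblers also try MVN with the inverted value or fall back to a literal pool; this only covers the plain MOV case, and the function names are hypothetical.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Find imm8 and rot with v == ror32(imm8, 2 * rot). Returns 0 if none. */
static int arm_encode_imm(uint32_t v, uint32_t *imm8, uint32_t *rot)
{
    for (uint32_t r = 0; r < 16; r++) {
        /* Undo the rotation: rotating left by 2r == rotating right by 32 - 2r. */
        uint32_t candidate = ror32(v, 32u - 2u * r);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot = r;
            return 1;
        }
    }
    return 0;
}
```

0x00100010 fails because its two set bits are 16 positions apart, so no rotation fits both into one 8-bit window.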

I also started on the symbol debugger today. I decided on the file format and wrote the code in the assembler to save to it if it's specified on the command line. However, I need to put more thought into how the emulator will handle this. The way the emulator handles debugging currently is a bit of a hack; not very bad, but it needs to be polished up a little bit.

Debugger

So I ran into a bug yesterday where I believe a shift is being done incorrectly, leaving a multiplication result negative. This occurs on a register that stores a count variable, so it never hits zero and I run into a loop that doesn't break until a data abort.

I could easily find the cause of this, but I decided to postpone it for the day to think about a feature I wanted to add to the emulator's debugger: symbol lookup. Currently, if I want to set a breakpoint, I have to enter the exact address, which is fine, but when I just want to jump to a function without going through anything else, I essentially have to run the program twice: once to get the address of the function, and again to actually debug it. Now, I could calculate the address just by looking at the binary, as that's just awesome and fun, but if I want other people to use this, I don't think I want to force that.

Symbol tables in this case are complex. First of all, each binary has a number of different sections, and symbols are local to sections. Therefore, duplicate names must be taken into account, so I'll probably need to prompt the user of the debugger with all possible choices, if there are any, when they want to break at a symbol. Furthermore, the emulator uses many different binaries. You have the BIOS, which loads the firmware, which can load the Pictochat program or a game. You can't just have one giant symbol table for all these binaries, but at the same time, you have to make some sort of distinction. I think what I'm going to do is only allow one binary to be debugged for now...how that will be specified is a challenge for a later date. What needs to be done now is modifying my assembler/linker to place the symbols in the binary and classify it as a "debug" binary instead of a raw one.
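A sketch of the lookup side, assuming a hypothetical record of (section, name, address): since symbols are section-local, a lookup returns every match and the count, so the debugger can prompt the user when there's more than one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *section;
    const char *name;
    uint32_t address;
} debug_symbol;

/* Store up to max matches in out; return the total number of matches. */
static size_t find_symbol(const debug_symbol *syms, size_t count, const char *name,
                          const debug_symbol **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            if (found < max)
                out[found] = &syms[i];
            found++;
        }
    }
    return found;
}
```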

Monday, October 19, 2009

Rain and Starbucks

Had a coding session at Starbucks today, while it was pouring rain outside (unusual for California during a drought). I spent most of my time continuing to dissect the Nintendo DS video architecture; it's definitely the biggest and most detailed challenge I've come up against so far while writing my emulator. It was more of a learning session to get myself totally up to speed on the architecture (all the registers, modes, settings, etc.), but I was able to write a bit of code. The direct-framebuffer mode is the first mode I'm tackling, as it's the least complex and can get things off the ground.
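In direct-framebuffer mode, the screen is essentially a 256x192 array of 16-bit pixels, so plotting is one index computation. This sketch assumes the pixels use the same 15-bit BGR format as palette entries (red in bits 0-4, green in 5-9, blue in 10-14); the names are hypothetical.

```c
#include <stdint.h>

#define FB_WIDTH  256
#define FB_HEIGHT 192

/* Pack 8-bit-per-channel RGB into 15-bit BGR (low 3 bits of each dropped). */
static uint16_t rgb_to_bgr555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}

/* Write one pixel, ignoring coordinates outside the screen. */
static void fb_plot(uint16_t *fb, int x, int y, uint16_t color)
{
    if (x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT)
        fb[y * FB_WIDTH + x] = color;
}
```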

First

This is mostly just a journal of sorts to keep track of my development progress for the day. I have lots of things on my plate; Nintendo DS development, Facebook development, and Android development are the top three.