Saturday, August 15, 2015

Xbox360 emulator - Problem overview

Hey!

The basic principle behind an emulator is to act as a host for an unknowing guest application. From the application's point of view, it should never realize that it's not running on the actual system. There are a few basic considerations:

Speed

The clock speed of the Xbox360 CPU was 3.2 GHz, which is more than most current PCs. Any kind of emulator should be able to keep up with that. The good thing is that the Xbox's CPU uses an in-order RISC architecture, which in practice means lots of pipeline stalls, so that gave me some hope. As always with CPU emulation, there are a few choices:

Runtime interpretation - instructions for the CPU are dynamically decoded and executed, probably using a big "switch" statement, a fancy jump table, etc. While straightforward to implement, this method has huge overhead: besides adding a runtime decoding cost to every instruction, it blocks A LOT of optimization possibilities - for example, I would have to treat all immediate values as dynamic data, and I couldn't do any optimizations between instructions since I would be seeing only one instruction at a time.
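Just to make it concrete, a classic interpreter core looks roughly like this (a minimal sketch; the decode follows the PowerPC field layout, but the CPU state is heavily simplified):

#include <cstdint>

// Heavily simplified guest state - the real CPU also has FPRs, CR, XER, etc.
struct CpuState
{
    uint64_t regs[32];
    uint32_t pc = 0;
};

// One interpreter step: fetch, decode and execute a single instruction.
// The decode cost is paid again on every execution of the same instruction,
// and immediates stay runtime data - nothing can be folded or optimized.
void InterpreterStep(CpuState& cpu, const uint32_t* code)
{
    const uint32_t instr = code[cpu.pc / 4];
    switch (instr >> 26) // primary opcode lives in the top 6 bits
    {
        case 14: // addi rD, rA, IMM (ignoring the rA == 0 special case)
        {
            const uint32_t rD = (instr >> 21) & 31;
            const uint32_t rA = (instr >> 16) & 31;
            const int16_t imm = (int16_t)(instr & 0xFFFF);
            cpu.regs[rD] = cpu.regs[rA] + imm;
            break;
        }
        // ... ~500 more cases ...
    }
    cpu.pc += 4; // every PowerPC instruction is 4 bytes
}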

JIT (Just-In-Time) compiler - initially this seems like a solution to the previous problems: whenever you enter a new block of code and you don't have an optimal version of it prepared, compile it on the spot and run the optimal version. In general this works well, but it has two disadvantages:

- there's a cost every time a new code block is visited, which can cause hitches (see the sketch below)
- any kind of global (cross code-blocks) optimization is not easily achievable
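The first point comes straight from the typical shape of a JIT dispatch loop (another sketch; the block cache and CompileBlock are illustrative names, not a real implementation):

#include <cstdint>
#include <unordered_map>

struct CpuState { uint64_t regs[32]; };
using CompiledBlock = void (*)(CpuState&);

// Hypothetical block cache, keyed by guest address.
std::unordered_map<uint32_t, CompiledBlock> g_blockCache;

// Placeholder: a real implementation would decode the guest block here and
// emit native host code - this is the expensive, hitch-causing step.
CompiledBlock CompileBlock(uint32_t guestAddress)
{
    (void)guestAddress;
    return [](CpuState&) {};
}

void RunBlock(CpuState& cpu, uint32_t pc)
{
    auto it = g_blockCache.find(pc);
    if (it == g_blockCache.end())
        it = g_blockCache.emplace(pc, CompileBlock(pc)).first; // pay the cost now
    it->second(cpu); // run the compiled host code; it only ever sees this one block
}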

The latter seemed important due to the shittiness of the ABI (Application Binary Interface) used on the Xbox360 (and on PowerPC in general), especially the number of non-volatile registers that have to be stored/restored in function calls, plus the general performance of argument passing (when emulated). Being able to optimize around that may make or break the whole system. So, I decided to go with...

Recompilation - which means decompiling all existing code and recompiling it into a new form. This is done in one big step for the whole executable; there's no JIT and no runtime interpretation. The major disadvantage of this solution is that it's the most complex one and requires, in general, a LOT of code to be written before you can even see the first Hello World. Well, I like writing code :)
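To show the idea (a made-up example, assuming the recompiler emits C++-like host code as its output):

#include <cstdint>

struct CpuState { uint64_t regs[32]; };

// Guest code (PowerPC):   addi r11, r10, 100
//
// After recompilation the immediate is a plain compile-time constant, and the
// host compiler is free to optimize across neighboring instructions (and,
// since the whole executable is translated at once, across blocks too):
void Block_82000100(CpuState& cpu)
{
    cpu.regs[11] = cpu.regs[10] + 100;
}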

Memory

Two things here were bugging me:

Endianness - the PowerPC used in the Xbox360 is configured to work in Big Endian mode, unlike the usual Intel architecture, which is Little Endian. This means that the ordering of bytes in memory is different.

The question was: could I emulate a Big Endian (BE) system on a Little Endian (LE) CPU using recompilation (so, with generated code) without wasting too much performance? I wanted to use native operations (addition, multiplication, etc.) on the host system in every case instead of reimplementing BE equivalents.

On Intel x86 there's a bswap instruction (aliased in MSVC under the intrinsics _byteswap_ulong, _byteswap_uint64, etc.: https://msdn.microsoft.com/en-us/library/a3140177.aspx ). It has only one cycle of latency and it hides really well among everything else happening around it anyway - I couldn't see any measurable difference with or without it.

Secondly, the size of every memory access on PowerPC is explicit (byte, half word, word, etc.), so you always know exactly how many bytes to swap.

So, it would seem that in principle it's possible: keep the memory layout consistent with the Xbox (so it stays BE), while all the data on the host side (like values in registers) is LE. Every time there's a load/store or any other instruction accessing memory, we have to swap the byte order.
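In practice this boils down to a pair of helpers wrapped around every guest memory access, something like this (a sketch using the MSVC intrinsics mentioned above; the helper names are mine):

#include <cstdint>
#include <cstring>
#include <intrin.h> // MSVC: _byteswap_ulong, _byteswap_uint64, ...

// Guest memory keeps the Xbox (big endian) layout; host registers are
// little endian, so every load/store swaps the byte order.
inline uint32_t LoadWord(const uint8_t* guestMem, uint32_t addr)
{
    uint32_t value;
    std::memcpy(&value, guestMem + addr, sizeof(value));
    return _byteswap_ulong(value); // a single bswap instruction on x86
}

inline void StoreWord(uint8_t* guestMem, uint32_t addr, uint32_t value)
{
    const uint32_t swapped = _byteswap_ulong(value);
    std::memcpy(guestMem + addr, &swapped, sizeof(swapped));
}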

Consistency - the question here is more low-level: does the PowerPC have any special memory consistency model that would NOT be easily reproducible on my host PC?

So far, the answer seems to be "no" - mostly due to the in-order nature of the CPU, the explicit memory fences and the explicit, very simple atomic operations on the PowerPC. Also, the memory consistency model of a typical x64 machine is not that relaxed by default.
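For illustration, the PowerPC barriers can map onto standard C++ fences roughly like this (my working assumption, not a formally verified equivalence):

#include <atomic>

// 'sync' is a full barrier (orders store->load too): map it to a
// sequentially consistent fence, which becomes mfence on x64.
inline void Emulate_sync() { std::atomic_thread_fence(std::memory_order_seq_cst); }

// 'lwsync' does not order store->load: an acquire-release fence should be
// enough, and on x64 it usually compiles down to a compiler barrier only.
inline void Emulate_lwsync() { std::atomic_thread_fence(std::memory_order_acq_rel); }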

There's another issue of GPU/CPU memory coherency but I will write about this later.

GPU


In general, the GPU used in the Xbox360 was an ATI Radeon from the R500 family. Source code for a Linux driver for that GPU is available on the internet, and A LOT can be taken from it. I don't think the whole project would be doable without that.

Other minor things consist of emulating a slightly different memory organization and, of course, emulating the whole GPU functionality using DX11. In the future, memory management may get simpler with DX12, but honestly I'm a little bit afraid about performance with DX12 in this particular case.

I will not get into details here as they are hairy - I would say the GPU is the most complicated part of the whole thing so far. I will elaborate more in another post.

OS


Any application (including Xbox360 games) will call lots of OS functions. Again, the questions are: will I be able to emulate them all? Do I even need to? In general - no, a lot of the functions can be faked and things work just fine.

Luckily, almost all of the important OS functions are cleanly imported (similarly to function imports in DLLs), and creating fake/substitute implementations for them is very easy in practice. Also, most of the functions are named exactly like (or very close to) the "normal" Windows ones, so at least I didn't feel like I was in a totally alien environment. The rest can be found via Google (and in the DDK if necessary).
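In practice that can be as simple as a name-to-function table of host-side stubs (a sketch; the export name and signature here are illustrative, not a real kernel export):

#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

// A faked kernel import: logs the call and reports success.
// (The name and signature are made up for illustration.)
uint32_t Fake_SomeKernelExport()
{
    std::printf("SomeKernelExport called (faked)\n");
    return 0; // STATUS_SUCCESS-style result
}

// Import name -> host implementation; the recompiled code calls into this
// table instead of the real Xbox kernel.
std::unordered_map<std::string, void*> g_fakedImports =
{
    { "SomeKernelExport", (void*)&Fake_SomeKernelExport },
};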

To sum up


Surprisingly, there's no major roadblock that, to my knowledge, would prevent this from working (in principle). The open-ended question was (and still is): will the whole thing run fast enough to be practical :D

Thanks for reading,
Dex

Thursday, August 13, 2015

Into the Darkness - Xbox360 emulator

So - it was a warm day more than a year ago. At that time I was still a freelancer-contractor working in Bordeaux, France, porting some nameless game to the PS3 for a big company starting with the letter U. Going through the debugger for the n-th time and looking at the assembly (an actually useful thing on the PS3), I realized that I was starting to remember the numerical opcode values for specific instructions :)

This soon triggered a thought - well, maybe it would be cool to write a disassembler for the PS3, just for fun. Sounded like a good idea for a weekend project :) Since I was able to find many more resources for the Xbox360 (and Windows) than for the PS3, I eventually decided to try doing it for that platform.

The idea was to take a compiled executable for the Xbox360 (however obtained :P), understand its internal structure, disassemble it into valid assembly instructions, understand those and create some kind of representation I would be able to run/emulate. The PowerPC CPU powering the Xbox360 is a rather simple RISC processor, so I knew it was on the threshold of being doable in my spare time, but since I didn't do any proper feasibility study for this project, I expected to hit a wall pretty soon.

So, to put it in the right perspective:
- I needed to understand the unknown, undocumented, proprietary XEX format
- I needed to find a way to decrypt it (yep, seemed scary)
- I needed to get to whatever is inside (I was expecting a typical PE/EXE file)
- Write a disassembler that turns bytes into understandable instructions (500+ instructions)
- Find a way to convert those instructions into something runnable, or write a VM for the emulator
- Reimplement kernel and other functions in the hosting environment (to mimic the OS)
- Find out how exactly the GPU works and how communication with it happens
- Find out the command buffer structure that the GPU is using
- Decompile shaders and recreate them in HLSL
- Emulate enough of the GPU's functionality in DX11 to finally get something on the screen.

So, after a year of sparse work on this project, it finally runs well enough end-to-end.

The artifacts in the background are due to a bad EDRAM resolve (I'm working on GPU emulation ATM).

This screen, although simple, shows that all the elements of the application are working (to some extent):
- XEX disassembly, code reconstruction and recompilation
- hosting app with Xbox kernel emulation (threads, semaphores, events, IRQs, DPCs, APCs)
- GPU shader decompilation (although simple ATM)
- GPU command buffer decompilation, translation and execution using the DX11 API

Various parts are missing or plugged with fakes, but all in all I'm happy with the result.

What's interesting is that most of the work went into debugging why stuff wasn't working as expected, and I ended up with quite complicated tools just to make finding bugs faster:




The tool is mostly a fancy IDE where I can preview the disassembled code, but I was using it mainly to investigate traces from application runs (a trace file contains the history of every instruction executed by the CPU, with full information about register values).
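Conceptually, every trace entry holds something like this (a simplified, illustrative layout, not my actual on-disk format):

#include <cstdint>

// One record per executed instruction, carrying the full register state,
// so "when did this register get the wrong value?" becomes a simple
// search over the trace.
struct TraceEntry
{
    uint32_t pc;       // address of the executed instruction
    uint32_t instr;    // raw instruction word
    uint64_t gpr[32];  // general purpose registers after execution
    double   fpr[32];  // floating point registers after execution
};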

I hope I will be able to do more work on this and eventually run a simple (2D) game. I decided to share what I currently have because I believe the whole project is interesting enough to share and, to be honest, I need some motivation to push it forward.

I will describe each part in the next posts.

I'm moving stuff from P4 to GitHub and will make the source code for this public.

DISCLAIMER: This project IS NOT COMMERCIAL IN ANY SENSE; it's done purely for personal fun and amusement. All the knowledge and media used in the development of this project were obtained from the public internet, and the original references will be provided where applicable.

Hello World :)

Hey Random Person From The Internet!

I don't like huge introductions so let's keep it simple :) I'm a software engineer working in the game industry. Currently I work as an engine programmer and architect (a so-called Technical Director) at CD Projekt RED (we recently released The Witcher 3: Wild Hunt). Honestly, the biggest reason for this blog is to vent some of my research ideas at you (and hopefully get some feedback), as I want to spare my colleagues and friends at least some of my constant mumbling about this and that ;) And if any of you find any bit of my work interesting, that would be even more awesome.

My interests span multiple directions but are mostly concentrated in two areas: the low level (performance-critical code, custom compilers, assembly and lower, down to the hardware in some cases) and the architecture level (engine organization and structure, tools and pipelines). Besides these general engine-wide interests, I also sometimes venture into rendering, but I mostly stay in the vicinity of the rendering backend (scene graph, culling, data structures, material system organization, etc.), rarely shaders (for those, I suppose, you can find many good blogs out there).

So, without further ado - Welcome.